Architect IVR WebSocket drop rate spikes at 500 concurrent sessions

What is the standard approach for configuring Architect flow nodes to handle WebSocket termination during high-concurrency load testing without triggering an immediate fallback to voice prompts?

Our team is currently stress-testing the IVR capacity using JMeter scripts that simulate 500 concurrent inbound calls per minute. The environment is a standard Genesys Cloud instance with default scaling policies enabled. We are observing a significant increase in call drops specifically at the ‘Gather Input’ node when the concurrent session count exceeds 450.

The error logs show a 408 Request Timeout from the client side, but the server-side traces indicate the WebSocket connection is being reset by the platform before the timeout threshold is reached. This behavior is inconsistent with the linear scaling we expected from the Architect engine. We have verified that the API rate limits for the underlying REST calls are not being hit, as the IVR flow relies primarily on WebSocket streams for real-time interaction. The issue seems isolated to the media layer rather than the data layer.

We attempted to adjust the ‘Max Wait Time’ in the Gather Input configuration to 30 seconds, but this did not reduce the drop rate. Instead, the connection state appears to be lost during the initial handshake phase under load. The JMeter configuration uses persistent connections with a think time of 2 seconds between calls to simulate realistic user behavior.

The current Architect flow includes a simple menu with three options, followed by a transfer to a queue. No complex integrations or external HTTP requests are involved in this path. We are concerned that this limitation will affect our production stability during peak hours.

Our questions:

1. Is there a specific configuration parameter for WebSocket keep-alive or connection pooling that can be adjusted to improve stability under load?
2. Alternatively, is there a recommended pattern for handling transient connection failures within the Architect flow itself to prevent call drops?
3. What are the best practices for load distribution across IVR nodes to ensure reliable performance at scale?
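
To illustrate the kind of transient-failure handling we are asking about, here is a minimal Python sketch of retry with capped exponential backoff and jitter, as we might apply it on the load-generator side. It is not Architect configuration and does not call any Genesys Cloud API. The names `backoff_delays` and `connect_with_retry`, the `connect` callable, and all delay values are hypothetical placeholders chosen for illustration.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base: float, cap: float, retries: int) -> List[float]:
    """Upper bound of the wait before each retry: base * 2**n, capped at `cap`."""
    return [min(cap, base * (2 ** n)) for n in range(retries)]


def connect_with_retry(
    connect: Callable[[], T],
    retries: int = 4,          # hypothetical default, not a platform setting
    base: float = 0.5,         # seconds; hypothetical
    cap: float = 8.0,          # seconds; hypothetical
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `connect` until it succeeds or `retries` retries have failed.

    Only ConnectionError is treated as transient; anything else propagates.
    """
    rng = rng or random.Random()
    delays = backoff_delays(base, cap, retries)
    for attempt in range(retries + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == retries:
                raise  # retries exhausted: surface the last failure
            # Jitter: wait a random time in [0, bound] so that many
            # simulated callers do not all reconnect at the same instant.
            sleep(rng.uniform(0.0, delays[attempt]))
    raise RuntimeError("unreachable")


if __name__ == "__main__":
    # Hypothetical stand-in for a WebSocket handshake that is reset twice.
    state = {"calls": 0}

    def flaky_handshake() -> str:
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("connection reset during handshake")
        return "connected"

    print(connect_with_retry(flaky_handshake, base=0.01, cap=0.05))
```

What we do not know is whether an equivalent pattern can be expressed inside the Architect flow itself, or whether it has to live entirely on the client side.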