Looking for advice on configuring Genesys Cloud Edge for high-density agent login scenarios. The current setup involves a dedicated Edge instance (v2.16.0) deployed in a private AWS region to handle latency-sensitive voice traffic. The goal is to validate the stability of the WebSocket connection pool when simulating a sudden surge of 500 agents logging in simultaneously via the Java SDK (v1.45.2).
The JMeter script hits the /api/v2/edge/connections endpoint with a ramp-up time of 10 seconds to mimic a shift start. During the initial burst, the Edge logs show a spike in ws_connect_attempts. However, approximately 30% of the connections fail with WebSocket close code 1006 (Abnormal Closure) within the first 5 seconds. The Genesys Cloud platform logs indicate that the connections are being accepted by the core, but the handshake with the Edge node fails intermittently.
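For reference, this is the arrival rate the test implies. A JMeter Thread Group spreads thread starts evenly across the ramp-up period, so 500 threads over 10 seconds works out to roughly one new connection every 20 ms. The sketch below only does that arithmetic; the class and method names are made up for illustration and are not part of JMeter or the Genesys SDK.

```java
// Hypothetical helper: models an evenly spaced ramp-up, nothing more.
class RampSchedule {

    /** Start offset in ms of the i-th thread (0-based) when threads start evenly over rampMillis. */
    static long startOffsetMillis(int i, int threads, long rampMillis) {
        return i * rampMillis / threads;
    }

    /** Average number of new connections opened per second during the ramp-up. */
    static double connectsPerSecond(int threads, long rampMillis) {
        return threads * 1000.0 / rampMillis;
    }

    public static void main(String[] args) {
        int threads = 500;        // simulated agents, from the test plan
        long rampMillis = 10_000; // 10-second ramp-up, from the test plan
        System.out.println("rate/s: " + connectsPerSecond(threads, rampMillis));
        System.out.println("last start (ms): " + startOffsetMillis(threads - 1, threads, rampMillis));
    }
}
```

So the Edge sees about 50 new handshakes per second for 10 seconds, and about half of the agents have started connecting by the 5-second mark, which is when the failures appear.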
The Edge configuration sets max_connections to 1000, which should theoretically handle the load. The network path between the Edge node and the Genesys Cloud region is stable, with no packet loss detected via ping tests. The issue seems specific to the WebSocket handshake phase rather than the subsequent media stream establishment.
Is there a specific timeout parameter or connection pooling setting on the Edge side that needs adjustment to handle this burst pattern? The current idle_timeout is set to the default of 30 seconds. Increasing the thread pool size on the JMeter server did not resolve the issue, which suggests the bottleneck is in the Edge's processing capacity or its handshake validation logic.
Any insights into how the Edge handles concurrent WebSocket handshakes under peak load would be appreciated. The error logs do not provide a specific HTTP status code, only the WebSocket close code, which makes debugging difficult.
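One thing that might help narrow this down: close code 1006 is reserved by RFC 6455 and is never sent on the wire. The client reports it locally whenever the connection ends without a Close frame, so it cannot distinguish a refused TCP connect, a TLS failure, a rejected HTTP upgrade, or a timeout. Below is a minimal sketch, using only the JDK's java.net.http client (Java 11+) rather than the Genesys SDK, of a probe that labels which of those occurred and surfaces the HTTP status when the server rejects the upgrade. The class name, the label strings, and the 5-second timeout are assumptions for illustration, and the URI would need to be whatever WebSocket URL the test actually targets.

```java
import java.net.ConnectException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpConnectTimeoutException;
import java.net.http.HttpTimeoutException;
import java.net.http.WebSocket;
import java.net.http.WebSocketHandshakeException;
import java.time.Duration;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;
import javax.net.ssl.SSLException;

// Hypothetical diagnostic helper; not part of any Genesys library.
class HandshakeProbe {

    /** Unwraps future-related wrapper exceptions and labels the underlying failure. */
    static String classify(Throwable t) {
        while ((t instanceof CompletionException || t instanceof ExecutionException)
                && t.getCause() != null) {
            t = t.getCause();
        }
        if (t instanceof WebSocketHandshakeException) {
            // The server answered the upgrade request with something other than 101.
            int status = ((WebSocketHandshakeException) t).getResponse().statusCode();
            return "HTTP_" + status;
        }
        if (t instanceof HttpConnectTimeoutException) return "CONNECT_TIMEOUT";
        if (t instanceof HttpTimeoutException) return "HANDSHAKE_TIMEOUT";
        if (t instanceof ConnectException) return "TCP_CONNECT_FAILED";
        if (t instanceof SSLException) return "TLS_FAILED";
        return "OTHER_" + t.getClass().getSimpleName();
    }

    /** Attempts one WebSocket opening handshake and returns "OK" or a failure label. */
    static String probe(HttpClient client, URI uri) {
        try {
            WebSocket ws = client.newWebSocketBuilder()
                    .connectTimeout(Duration.ofSeconds(5)) // assumed value, tune as needed
                    .buildAsync(uri, new WebSocket.Listener() {})
                    .join();
            ws.abort(); // only the handshake result matters here
            return "OK";
        } catch (CompletionException e) {
            return classify(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 1) {
            System.out.println(probe(HttpClient.newHttpClient(), URI.create(args[0])));
        }
    }
}
```

Running something like this alongside the JMeter burst, and counting the labels, should show whether the 30% are HTTP-level rejections (for example a 429 or 503 from a rate limiter) or connections that drop before any HTTP response arrives. If every probe returns OK and the 1006 closures still occur, the drop is happening after the handshake rather than during it.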
What configuration changes or troubleshooting steps are recommended to resolve WebSocket 1006 errors during high-concurrency agent login simulations on Genesys Cloud Edge?