Need some help troubleshooting WebSocket disconnects during high-concurrency JMeter test

Looking for some advice on troubleshooting this persistent issue where WebSocket connections drop unexpectedly during our load testing phase. We are trying to validate the platform’s capacity for concurrent inbound calls using the Genesys Cloud Platform API and WebSocket streaming endpoints. The environment is set up as follows:

  • Genesys Cloud Region: Singapore (ap-southeast-1)
  • JMeter Version: 5.6.2
  • Thread Group Configuration: 500 concurrent users, ramp-up 60 seconds, loop count infinite
  • API Endpoint: POST /api/v2/analytics/conversations/details/query and WS wss://webchat-singapore.genesis.com/webchat/agent
  • SDK: Pure JMeter WebSocket sampler (no external libraries)

The problem manifests after approximately 120 seconds into the test run. Initially, all 500 users establish connections successfully. The handshake returns 101 Switching Protocols without issues. However, around the 2-minute mark, we start seeing a spike in connection drops. The JMeter console logs show java.net.SocketException: Connection reset errors on about 15-20% of the active sessions. Interestingly, the HTTP API calls for conversation queries continue to return 200 OK with valid data, but the WebSocket streams for real-time agent events just sever the link.
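To confirm that the drops really cluster around the 2-minute mark, one option is to bucket failed samples by time since the first sample. The sketch below is a rough illustration, not a definitive tool. It assumes results are saved as a JTL file in CSV format with the `timeStamp` and `success` columns. The function name `failures_per_bucket` and the sample rows are made up for the example.

```python
import csv
import io
from collections import Counter


def failures_per_bucket(jtl_text, bucket_seconds=30):
    """Count failed samples per time bucket, measured from the first sample."""
    rows = list(csv.DictReader(io.StringIO(jtl_text)))
    if not rows:
        return {}
    # JMeter writes timeStamp as epoch milliseconds.
    start_ms = min(int(r["timeStamp"]) for r in rows)
    counts = Counter()
    for r in rows:
        if r["success"].strip().lower() != "true":
            offset_s = (int(r["timeStamp"]) - start_ms) // 1000
            bucket_start = (offset_s // bucket_seconds) * bucket_seconds
            counts[bucket_start] += 1
    return dict(sorted(counts.items()))


# Made-up rows for illustration only; real JTL files carry more columns.
sample = """timeStamp,elapsed,label,responseCode,responseMessage,success
1700000000000,35,WS connect,101,Switching Protocols,true
1700000121000,0,WS read,500,Connection reset,false
1700000125000,0,WS read,500,Connection reset,false
1700000155000,0,WS read,500,Connection reset,false
"""

print(failures_per_bucket(sample))  # {120: 2, 150: 1}
```

If nearly all failures land in the bucket starting at 120 seconds, that points toward a fixed timeout rather than a concurrency cap, which would more likely show up during ramp-up.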

We have checked the standard rate limits for our org, and the analytics queries are well within the allowed threshold (approx 500 req/min). The WebSocket connections themselves should not be subject to the same REST API rate limits, but it feels like there is a hidden cap on concurrent WebSocket sessions per organization or per IP range.

The JMeter thread group is configured to reconnect automatically, which works, but the reconnects create a secondary load spike that we are trying to avoid. We are running this from a local machine in Singapore to minimize latency. Is there a known limitation on the number of concurrent WebSocket connections allowed for load testing purposes? Or is this a timeout configuration issue on the Genesys edge servers? We have tried increasing the idle timeout in the sampler, but it does not seem to help. Any insights on how to configure the client-side settings to handle this better would be appreciated.
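One common way to soften the secondary spike from simultaneous reconnects is exponential backoff with random jitter, so that dropped sessions do not all reconnect at the same instant. The sketch below shows only the delay calculation. The function name `reconnect_delay` and the `base_s` and `cap_s` defaults are assumptions for illustration, and how the delay gets wired into the test plan (a timer or a scripted element) depends on the sampler in use.

```python
import random


def reconnect_delay(attempt, base_s=1.0, cap_s=30.0, rng=random):
    """Full-jitter backoff: a uniform delay in [0, min(cap_s, base_s * 2**attempt)]."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


# Seeded generator so repeated runs of this example produce the same delays.
rng = random.Random(42)
delays = [reconnect_delay(attempt, rng=rng) for attempt in range(6)]
print([round(d, 2) for d in delays])
```

With 75 to 100 sessions dropping at once (15-20% of 500), spreading the first retry over even one second already avoids a single burst of handshakes.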

If I remember correctly, migrating from Zendesk’s ticket-based system to Genesys Cloud often reveals that WebSocket stability relies heavily on proper session management in Architect, not just API throughput. Zendesk macros are stateless, but Genesys interactions maintain real-time state. When JMeter hits 500 concurrent users, the default WebSocket keep-alive intervals might be too aggressive for the Singapore region latency.

Try adjusting the pingInterval and pongTimeout in your JMeter WebSocket sampler. The default settings often cause premature drops if the network path has slight jitter. Also, ensure your Architect flow isn’t dropping connections due to timeout nodes before the WebSocket handshake completes. Check that the wss:// endpoint is correctly mapped to your specific media region.

  • Verify WebSocket keep-alive settings in JMeter
  • Review Architect flow timeout configurations
  • Confirm media region routing for Singapore
  • Check for IP allowlist restrictions on WebSocket ports

Check your WebSocket keep-alive configuration. The suggestion above regarding ping/pong intervals is spot on. For high concurrency, ensure the interval aligns with regional latency. Here are the recommended settings for stable connections:

Parameter      Value (ms)
pingInterval   30000
pongTimeout    10000

This prevents premature drops during load testing.
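As a quick sanity check on those two values, the arithmetic can be written out. This is a sketch under stated assumptions: the values are taken to be milliseconds, the 120000 ms idle timeout is hypothetical (inferred only from the roughly 120-second mark reported in the question), and it assumes ping frames count as activity on the server side. The function name `keepalive_summary` is made up for the example.

```python
def keepalive_summary(ping_interval_ms, pong_timeout_ms, idle_timeout_ms):
    """Relate ping/pong settings to an assumed server-side idle timeout."""
    return {
        # A ping must be sent before the idle timer would expire.
        "keeps_connection_active": ping_interval_ms < idle_timeout_ms,
        # Worst case: the peer dies right after a pong, so the client waits
        # one full interval, sends a ping, then waits out the pong timeout.
        "worst_case_detection_ms": ping_interval_ms + pong_timeout_ms,
    }


# 120000 ms is a hypothetical idle timeout, not a documented Genesys value.
print(keepalive_summary(30000, 10000, 120000))
# {'keeps_connection_active': True, 'worst_case_detection_ms': 40000}
```

If the drops persist with a 30-second ping, an idle timeout becomes a less likely explanation and a per-org or per-IP connection cap is worth raising with support.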