Stuck on "STUN binding failed" errors when ramping up to 200 concurrent WebRTC sessions. The JMeter script hits the /v2/communications/conversations endpoint, but audio drops after 30 seconds. Is there a hard limit on STUN allocations per org? I'm running this from Singapore, so latency might be a factor, but the 503 Service Unavailable response seems too aggressive for a beginner setup.
The main issue here is likely not a hard STUN allocation limit but signaling overhead hitting the Genesys Cloud edge rate limits during rapid session creation. When JMeter fires 200 concurrent POST requests to /v2/communications/conversations, the platform throttles the WebSocket handshake phase. The 503 indicates that the signaling server is rejecting new connections before the STUN binding can even occur, which would also explain why the calls fail at around the 30-second mark: the session never fully establishes.
If you are triggering these calls programmatically, check your Data Actions configuration and make sure you are batching the outbound requests. A common fix is to implement exponential backoff retry logic in the JMeter controller rather than relying on a flat ramp-up. A retry policy along these lines captures the idea:
```json
{
  "retry_policy": {
    "max_retries": 3,
    "backoff_strategy": "exponential"
  }
}
```
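The policy above names the strategy but not the actual delays. Here is a minimal Python sketch of what "exponential" could mean, written outside JMeter for clarity. It assumes the first retry waits 1 second, each later retry doubles the wait up to a 30-second cap, and only 429 (the standard HTTP rate-limit status) and 503 are retried; all of these values are illustrative assumptions, and `send` is a hypothetical stand-in for whatever issues the POST in your harness.

```python
import random
import time
from typing import Callable, List

RETRYABLE = (429, 503)  # assumption: retry only rate-limit and unavailable responses


def backoff_delays(max_retries: int = 3, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Wait before each retry: base * 2**attempt, never more than cap seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def post_with_backoff(
    send: Callable[[], int],
    max_retries: int = 3,
    base: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: bool = True,
) -> int:
    """Call send() (hypothetical; returns an HTTP status) and retry on RETRYABLE codes.

    Returns the last status code observed.
    """
    status = send()
    for delay in backoff_delays(max_retries, base, cap):
        if status not in RETRYABLE:
            break
        # "Full jitter": wait a random time up to the delay so that many
        # concurrent threads do not retry in lockstep.
        sleep(random.uniform(0, delay) if jitter else delay)
        status = send()
    return status
```

With `jitter=False`, a sequence of 503, 503, 201 waits 1.0 s and then 2.0 s before succeeding. Leaving jitter on spreads the retries of 200 concurrent threads so they do not all hit the endpoint again at the same instant.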
Also verify the STUN server configuration in your Genesys Cloud instance settings. If you use a custom STUN/TURN provider, make sure it is correctly allowlisted. The default Genesys STUN servers are resilient, but network latency from Singapore might cause timeouts if the initial handshake exceeds the 5-second window.
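To check whether that 5-second window is realistic from Singapore, you can time a single STUN Binding Request yourself. This is a minimal Python sketch of an RFC 5389 Binding Request over UDP using only the standard library. The host you pass in is a placeholder (no specific Genesys STUN hostname is assumed), port 3478 is the default STUN port, and the 5-second timeout simply mirrors the window mentioned above.

```python
import os
import socket
import struct
import time
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442       # fixed value from RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(txn_id: bytes) -> bytes:
    """20-byte STUN header: type, length (0, no attributes), cookie, transaction ID."""
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, txn_id)


def parse_xor_mapped_address(msg: bytes, txn_id: bytes) -> Optional[Tuple[str, int]]:
    """Return (ip, port) from an IPv4 XOR-MAPPED-ADDRESS in a success response, else None."""
    if len(msg) < 20:
        return None
    msg_type, length, cookie, rid = struct.unpack("!HHI12s", msg[:20])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or rid != txn_id:
        return None
    pos, end = 20, min(len(msg), 20 + length)
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", msg[pos:pos + 4])
        value = msg[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attribute values are padded to 4 bytes
    return None


def stun_probe(host: str, port: int = 3478, timeout: float = 5.0):
    """Send one Binding Request; return (mapped address or None, round-trip seconds).

    Raises socket.timeout if nothing arrives within `timeout` seconds.
    """
    txn_id = os.urandom(12)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(build_binding_request(txn_id), (host, port))
        data, _ = sock.recvfrom(2048)
        rtt = time.monotonic() - start
    return parse_xor_mapped_address(data, txn_id), rtt
```

Calling `stun_probe("stun.example.com")` (a hypothetical host) returns the public address the server saw plus the round-trip time. Repeating it a few dozen times gives a rough latency distribution, and a `socket.timeout` means no answer arrived within the window.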
To restate the key point: the 503 errors are not a sign of a STUN allocation cap but a protective measure against signaling overload, so the exponential backoff in your JMeter test plan should be designed around the platform's rate-limiting behavior. When scaling WebRTC integrations, especially multi-tenant AppFoundry solutions, the initial burst of concurrent POST requests to the conversations endpoint often exceeds the default throttle limits. The signaling server then rejects new WebSocket handshakes before the ICE candidates can be gathered and validated.
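One way to keep that initial burst under a throttle is to pace session creation on the client side. The sketch below uses a token bucket, a standard rate-limiting technique chosen here for illustration rather than anything Genesys Cloud documents. The rate and burst capacity you would pick are assumptions, since the platform's actual limits for this endpoint are not stated here.

```python
import time
from typing import Callable


class TokenBucket:
    """Client-side pacing: `rate` tokens per second on average, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so a small initial burst is allowed
        self.last = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate=5` and `capacity=10` (hypothetical values), 200 simultaneous attempts at t = 0 let only 10 through, and one second later 5 more tokens have accumulated. A thread that gets `False` should wait and try again rather than posting anyway.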
The 30-second audio drop you are observing is a direct consequence of the signaling failure. Without a successful handshake, the media stream cannot establish a peer connection, and the STUN binding fails because the control plane never completes the necessary setup. In your load-testing logic, decouple the signaling phase from the media phase, and use a gradual ramp-up period instead of an immediate spike to 200 concurrent sessions.
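For the gradual ramp-up, it helps to see what a given ramp-up period does to the arrival rate. A JMeter Thread Group spreads thread starts evenly over the ramp-up period, so thread i starts at roughly i * ramp_up / threads. This Python sketch reproduces that arithmetic; the 400-second ramp-up in the usage note is a hypothetical figure, not a recommendation.

```python
from typing import List


def ramp_up_offsets(threads: int, ramp_up_seconds: float) -> List[float]:
    """Start time of each thread when starts are spread evenly over the ramp-up
    period: thread i starts at i * ramp_up_seconds / threads."""
    if threads <= 0:
        return []
    step = ramp_up_seconds / threads
    return [i * step for i in range(threads)]


def started_by(offsets: List[float], t: float) -> int:
    """How many sessions have been started at or before time t (seconds)."""
    return sum(1 for start in offsets if start <= t)
```

With 200 threads and a 400-second ramp-up, a new session starts every 2 seconds, so only 6 sessions have started after 10 seconds instead of all 200 at once.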
Adjusting the JMeter thread group to add a delay between request batches will likely resolve the 503 responses. This approach mimics realistic user behavior and gives the Genesys Cloud edge time to process each connection. We have seen similar issues in our own integration deployments, where aggressive load testing triggered false positives for service availability. Monitoring the WebSocket status codes during the ramp-up phase will show more clearly where the bottleneck lies.
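The batched variant, with a fixed pause between groups of sessions, can be sketched the same way in Python. The batch size and pause are hypothetical. The second helper groups whatever status or close codes your test records by batch, which makes it easy to see at which step of the ramp the failures begin.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def batch_schedule(total: int, batch_size: int, pause_seconds: float) -> List[Tuple[float, int]]:
    """(start time, session count) for each batch, with a fixed pause between batches."""
    return [
        (index * pause_seconds, min(batch_size, total - first))
        for index, first in enumerate(range(0, total, batch_size))
    ]


def codes_by_batch(results: Iterable[Tuple[int, int]]) -> Dict[int, Counter]:
    """Group recorded (batch index, status code) pairs so you can see where failures start."""
    grouped: Dict[int, Counter] = {}
    for batch, code in results:
        grouped.setdefault(batch, Counter())[code] += 1
    return grouped
```

For example, 200 sessions in batches of 25 with a 10-second pause gives eight batches, the last one starting at 70 seconds.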