Can anyone clarify the behavior of the WebRTC softphone endpoint under a high number of concurrent connection attempts? We are running a stress test to validate softphone capacity in the us1 region. The goal is to simulate 2000 concurrent agent logins and initial media handshakes.
Environment details:
region: us1
JMeter version: 5.6
Genesys Cloud SDK: 2.3.0
test duration: 15 minutes
concurrency: ramp up to 2000 users over 5 minutes
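For reference, the ramp numbers above work out as follows. This is a minimal sketch that assumes the ramp is linear (the usual behavior of a thread group ramp-up). The function names and the 800-user threshold constant are illustrative only, taken from the rough figure observed in the test.

```python
# Assumption: users are added linearly over the ramp-up period.
TARGET_USERS = 2000        # from the test plan
RAMP_SECONDS = 5 * 60      # 5 minute ramp-up
FAILURE_THRESHOLD = 800    # approximate point where 503s begin (observed, not exact)


def users_at(t_seconds: float) -> int:
    """Number of started users t_seconds into the test, capped at the target."""
    t = min(max(t_seconds, 0.0), RAMP_SECONDS)
    return int(TARGET_USERS * t / RAMP_SECONDS)


def seconds_until(users: int) -> float:
    """Time into the ramp at which the given user count is reached."""
    return users * RAMP_SECONDS / TARGET_USERS


rate = TARGET_USERS / RAMP_SECONDS
print(f"arrival rate: {rate:.2f} new users/s")
print(f"threshold of {FAILURE_THRESHOLD} users reached after "
      f"{seconds_until(FAILURE_THRESHOLD):.0f} s")
```

So under this assumption the 503s would begin about two minutes into the ramp, at a steady arrival rate of roughly 6.7 new handshakes per second.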
The issue is specific to the WebSocket upgrade phase. Initially, connections establish fine. However, once the concurrent user count exceeds roughly 800, we start seeing a spike in 503 Service Unavailable errors from the media server. The error log shows:
“websocket connection failed: 503 service unavailable - media server capacity exceeded”
This happens even though our tenant limits are set much higher. We are using a standard Architect flow for inbound calls routed to agents. The flow logic is simple: just a queue and a routing rule, with no complex scripting.
We checked the API rate limits for the authentication endpoint, and those are fine. The 401s are not increasing. The problem is strictly at the media layer.
Is there a hidden limit on concurrent WebRTC sessions per tenant or per region that is not documented in the standard capacity planning docs? Or is this a known issue with how SDK 2.3.0 handles the handshake under load?
We tried reducing the JMeter thread count to 500, and the 503 errors disappear completely. Since they only show up past roughly 800 concurrent users, this suggests a hard cap or resource exhaustion on the platform side.
Any insights on how to tune the JMeter script to better handle the WebSocket handshake without triggering these 503s? Or is this a platform limitation we need to accept?
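One tuning option would be to retry a failed handshake with exponential backoff and jitter rather than reconnecting immediately, so that rejected clients do not all hit the media server again at the same moment. Below is a minimal sketch of the delay calculation only. The base delay, cap, and attempt count are hypothetical values, and whether the platform expects clients to back off on a 503 is an assumption, not something confirmed by the docs.

```python
import random

# Hypothetical tuning values, not taken from any platform documentation.
BASE_DELAY_S = 0.5   # delay ceiling for the first retry
MAX_DELAY_S = 30.0   # upper bound on any single delay
MAX_ATTEMPTS = 6     # retries before giving up on a virtual user


def backoff_delay(attempt: int, rng: random.Random) -> float:
    """Full-jitter delay in seconds before retry number `attempt` (0-based)."""
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def retry_schedule(rng: random.Random) -> list[float]:
    """Delays one virtual user would wait across all retry attempts."""
    return [backoff_delay(attempt, rng) for attempt in range(MAX_ATTEMPTS)]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only so repeated runs are comparable
    for attempt, delay in enumerate(retry_schedule(rng)):
        print(f"retry {attempt}: wait {delay:.2f} s")
```

The same logic could be ported to a scripted timer in the test plan. If 503s persist even with paced retries, that would point more strongly at a hard cap than at a burst-handling problem.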
Thanks for any help. This is blocking our QA sign-off.