The 500s are most likely caused by WebSocket connection limits being hit during the WFM publish storm. The platform_api gets hammered, and the resulting downstream latency breaks the recording session handshakes. Check your JMeter thread group configuration first: this usually happens when concurrent requests exceed the API rate limits for the recording service.
Try adding a throughput limiter to your load test plan. In JMeter that is a Constant Throughput Timer; note that it takes its target in samples per minute, so 50 per second is 3000. Limiting requests to 50 per second mimics realistic agent behavior and prevents connection pool exhaustion. Also, verify that the WebSocket handshake succeeds before sending media streams: if the handshake fails, the recording service drops the session. A common fix is to implement exponential backoff on retries, which reduces the load on the server and gives the recording service time to catch up.
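For the backoff part, here is a minimal Python sketch of the idea, not anything specific to the recording service. The function names, the 0.5 s base delay, the 30 s cap, and the use of `ConnectionError` to signal a failed handshake are all assumptions for illustration; swap in whatever your client actually raises.

```python
import random
import time


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=5):
    """Return the un-jittered delay (seconds) to wait after each failed attempt."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def retry_with_backoff(operation, attempts=5, base=0.5, cap=30.0,
                       sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds or attempts run out.

    ConnectionError is an assumed stand-in for a failed handshake.
    Each wait is scaled by a random factor in [0, 1) ("full jitter")
    so that many clients do not retry in lockstep.
    """
    delays = backoff_delays(base=base, cap=cap, attempts=attempts)
    for attempt, delay in enumerate(delays, start=1):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries: surface the last failure
            sleep(delay * rng())
```

The jitter matters here: without it, every agent desktop that failed at the same moment retries at the same moment, which recreates the burst you were trying to avoid.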
try this jmeter config instead of just throttling. the issue is likely not just rate limits but the burst pattern during shift swaps. you need to stagger the recording session requests.
(JMeter config snippet; only two values survived: 120 and true)
add a constant timer of 2000ms between requests in the recording sampler. this helps avoid the 500s by smoothing the load on the recording service. the wfm publish is fine because it handles bursts better than the media engine.
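to see why staggering helps, here is a rough Python sketch that spreads session starts evenly over a ramp window and counts the worst one-second burst. the 500 sessions and the 120 s ramp are illustrative numbers picked for the example, and both function names are made up; this is not a model of how JMeter schedules threads internally.

```python
from collections import Counter


def staggered_offsets(n_sessions, ramp_seconds):
    """Evenly spaced start offsets (seconds) across the ramp window."""
    return [i * ramp_seconds / n_sessions for i in range(n_sessions)]


def peak_per_second(offsets):
    """Largest number of session starts that land in any one-second bucket."""
    buckets = Counter(int(offset) for offset in offsets)
    return max(buckets.values())


# Illustrative comparison: everything at once vs. spread over 120 s.
burst = peak_per_second(staggered_offsets(500, 0))
smooth = peak_per_second(staggered_offsets(500, 120))
print(f"no ramp: {burst} starts/s, 120 s ramp: {smooth} starts/s")
```

the same total number of sessions goes through either way; only the peak rate the recording service has to absorb changes.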
also check if your tenant has custom websocket limits. sometimes the default 500 connections get saturated by the agent desktops trying to reconnect. if you see 503s mixed in with the 500s, that points strongly to a connection pool issue. increasing the ramp time helps distribute the load more evenly across the ramp window.
I'd start by checking the S3 bucket permissions across all orgs. Bulk exports often fail silently if the cross-account IAM role lacks s3:GetObject for the secondary regions. Then verify the recording_metadata_sync settings in the WFM optimization config to make sure the scheduler correctly applies voice handle time to digital queues during peak swaps.