What is the correct way to handle Screen Recording API 500 errors during high-concurrency load tests?

We are currently running a performance benchmark against the Genesys Cloud platform to validate our screen recording capabilities under peak load conditions. The environment is set up with JMeter 5.4.1, simulating 500 concurrent agents initiating screen shares simultaneously via the POST /api/v2/recording/screen/recordings endpoint.

The issue arises when the request rate exceeds approximately 200 requests per second. At this threshold, the API begins returning HTTP 500 Internal Server Error responses for roughly 15% of requests. The errors are returned immediately on initiation, which suggests a backend processing bottleneck rather than a network timeout.
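To pin down where the failures start, a per-second breakdown of the JMeter results file can help. The following is a minimal sketch, assuming the default CSV (JTL) output with a header row and the `timeStamp` (epoch milliseconds) and `responseCode` columns; the function name and the sample rows are illustrative only, not taken from our actual run.

```python
import csv
import io
from collections import defaultdict


def error_rate_by_second(jtl_csv_text):
    """Return {epoch_second: (total_requests, http_500_count)} from JTL CSV text."""
    buckets = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(io.StringIO(jtl_csv_text)):
        # Assumes JMeter's default timestamp format: milliseconds since epoch.
        second = int(row["timeStamp"]) // 1000
        buckets[second][0] += 1
        if row["responseCode"] == "500":
            buckets[second][1] += 1
    return {s: (total, errors) for s, (total, errors) in sorted(buckets.items())}


# Illustrative rows only; a real file would be read from disk.
sample = (
    "timeStamp,elapsed,label,responseCode\n"
    "1700000000100,35,start recording,200\n"
    "1700000000900,12,start recording,500\n"
    "1700000001200,40,start recording,200\n"
)
rates = error_rate_by_second(sample)
for second, (total, errors) in rates.items():
    print(second, total, errors, f"{errors / total:.0%}")
```

Plotting the 500 count against the achieved requests per second would show whether the failures begin at a fixed rate or build up gradually over the ramp-up.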

Here is the specific error payload received:

{
  "status": 500,
  "code": "INTERNAL_SERVER_ERROR",
  "message": "Failed to initialize screen recording session due to resource exhaustion. Please retry later."
}
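Since the message asks the caller to retry later, one client-side pattern we are considering is retrying with exponential backoff and jitter. This is only a sketch under stated assumptions: `send` is a hypothetical callable that performs the POST and returns `(status, body_text)`, and the base delay, cap, and attempt count are placeholder values rather than anything documented for this endpoint.

```python
import json
import random
import time


def is_retryable(status, body_text):
    """Treat only the 500 / INTERNAL_SERVER_ERROR payload shown above as retryable."""
    if status != 500:
        return False
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("code") == "INTERNAL_SERVER_ERROR"


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return rng() * min(cap, base * 2 ** attempt)


def start_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-retryable response or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if not is_retryable(status, body):
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return status, body


# Demo with a fake sender: two failures, then success. No real waiting is done.
error_body = '{"status": 500, "code": "INTERNAL_SERVER_ERROR", "message": "retry later"}'
fake_responses = iter([(500, error_body), (500, error_body), (200, "{}")])
demo_result = start_with_retry(lambda: next(fake_responses), sleep=lambda s: None)
print(demo_result)
```

The jitter is there so that hundreds of agents failing at the same instant do not all retry at the same instant, which would recreate the spike.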

We have verified that the WebSocket connections for the actual screen stream are established correctly for the successful requests, so the failure appears to be isolated to the API handshake phase. Our current JMeter configuration uses a Constant Throughput Timer to hold the 200 rps target, with a ramp-up period of 60 seconds.
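For reference, the arithmetic behind that configuration can be sketched as follows. JMeter's Constant Throughput Timer takes its target in samples per minute, so the 200 rps goal has to be converted; the helper names here are made up for illustration.

```python
def ctt_samples_per_minute(target_rps):
    """Constant Throughput Timer target, which JMeter expects in samples per minute."""
    return target_rps * 60


def per_thread_interval_s(threads, target_rps):
    """Average gap between requests from a single thread at the target rate."""
    return threads / target_rps


def threads_started_per_second(threads, ramp_up_s):
    """How quickly the thread group adds threads during ramp-up."""
    return threads / ramp_up_s


timer_target = ctt_samples_per_minute(200)      # 12000 samples/minute
thread_gap = per_thread_interval_s(500, 200)    # 2.5 seconds between requests per thread
ramp_rate = threads_started_per_second(500, 60) # about 8.3 new threads per second
print(timer_target, thread_gap, round(ramp_rate, 1))
```

With 500 threads and a 200 rps target, each simulated agent sends a new initiation request every 2.5 seconds on average.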

Because we are testing from the Asia/Singapore timezone, regional edge node capacity may be a constraint, but the documentation does not state per-region limits for screen recording initiation. We are also checking whether any rate limits or quotas apply to the screen recording endpoints that differ from those for the standard call recording APIs.

Has anyone encountered similar resource exhaustion errors when scaling screen recording initiations? Are there recommended JMeter configurations or API patterns to mitigate this issue without reducing the concurrent user count? We need to ensure our architecture can support sudden spikes in agent activity without dropping recording sessions.

Have you tried isolating the payload size issue from the routing logic? A common fix is to align JMeter thread counts with the 10 req/s tenant limit. Monitor Queue Performance dashboards for correlation with recording spikes. Verify Service Level metrics remain stable during these tests to ensure no downstream impact on agent capacity.
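As a rough illustration of what pacing to that limit would look like, here is a sketch that spreads session starts evenly over time. It takes the 10 req/s figure quoted above at face value (it is not verified against the documentation here), and `paced_start_offsets` is a hypothetical helper, not part of JMeter or any SDK.

```python
def paced_start_offsets(session_count, max_rate_per_s):
    """Offsets in seconds from t=0 at which each session start should be sent."""
    if max_rate_per_s <= 0:
        raise ValueError("max_rate_per_s must be positive")
    return [i / max_rate_per_s for i in range(session_count)]


# 500 agents paced at the assumed 10 req/s limit.
offsets = paced_start_offsets(500, 10)
print(len(offsets), offsets[:3], offsets[-1])
```

At that rate, starting all 500 sessions takes just under 50 seconds, so all agents stay in the test but their initiation calls are staggered instead of fired at once.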