We are setting up the load parameters for a test against the WFM Evaluation API endpoints. The goal is to simulate a burst of quality evaluation submissions and find the limits of the backend processing queue. The environment details are listed below.
- Environment: Genesys Cloud US1
- Tool: JMeter 5.6.2
- Thread Count: 200 concurrent threads
- Ramp-up: 10 seconds
- Endpoint: /api/v2/wfm/evaluation/evaluations
- Auth: OAuth2 Client Credentials flow
- Payload Size: Approx 2KB JSON per request
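For context, here is a rough back-of-the-envelope sketch of what these settings imply in requests per second. It assumes the threads send requests back-to-back with no think time (no timers in the thread group) and takes a 300ms response time as the working figure; both are assumptions, and the function names are purely illustrative.

```python
def thread_start_rate(threads: int, ramp_up_s: float) -> float:
    """Threads started per second during a linear ramp-up."""
    return threads / ramp_up_s


def steady_state_rps(threads: int, response_time_s: float) -> float:
    """Approximate requests per second once every thread is running.

    Assumes no think time: each thread sends its next request as soon as
    the previous response arrives (Little's law, N = X * R, so X = N / R).
    """
    return threads / response_time_s


def upload_kb_per_s(rps: float, payload_kb: float) -> float:
    """Approximate request-body volume per second."""
    return rps * payload_kb


if __name__ == "__main__":
    # 0.3 s is an assumed response time; it will not hold once errors start.
    for threads in (50, 200):
        rps = steady_state_rps(threads, 0.3)
        print(f"{threads} threads: ~{rps:.0f} req/s, "
              f"~{upload_kb_per_s(rps, 2.0):.0f} KB/s of payload")
    print(f"ramp-up: {thread_start_rate(200, 10):.0f} new threads per second")
```

Under these assumptions, going from 50 to 200 threads is a fourfold jump in request rate, reached within 10 seconds.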
The issue appears once the thread count exceeds 50. At 50 threads, the response time is stable at around 300ms. At 200 threads, the API returns 503 Service Unavailable for roughly 40% of the requests, with this response body: {"code":"service_unavailable","message":"The service is temporarily unavailable. Please retry later."}. This happens even though the rate limit headers (X-RateLimit-Remaining) show sufficient quota remaining, so the standard rate limiter does not appear to be throttling these requests. Our guess is a downstream capacity issue or a WebSocket connection pool exhaustion on the Genesys side, but we have not been able to confirm either.
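To keep the 503s apart from ordinary rate limiting when analyzing results, and to act on the "retry later" hint, something like the following could be used. This is only a sketch: it assumes rate limiting would show up as HTTP 429, that a Retry-After header may or may not be present, and that headers arrive in a plain dict with exactly these key names. The function names and the backoff defaults are made up for illustration.

```python
import random


def classify_failure(status: int, headers: dict) -> str:
    """Label a response so 503s are not lumped in with rate limiting."""
    if 200 <= status < 300:
        return "ok"
    if status == 429:
        return "rate_limited"
    if status == 503:
        remaining = str(headers.get("X-RateLimit-Remaining", ""))
        # Quota left but still rejected: points at capacity, not the limiter.
        if remaining.isdigit() and int(remaining) > 0:
            return "capacity"
        return "unavailable"
    return "other_error"


def backoff_delay(attempt: int, headers: dict, base_s: float = 0.5,
                  cap_s: float = 30.0, rng=random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Uses Retry-After when it is a whole number of seconds; otherwise
    exponential backoff with full jitter, capped at `cap_s`.
    """
    retry_after = str(headers.get("Retry-After", ""))
    if retry_after.isdigit():
        return float(retry_after)
    ceiling = min(cap_s, base_s * 2 ** attempt)
    return rng() * ceiling


if __name__ == "__main__":
    # Hypothetical header values, for illustration only.
    print(classify_failure(503, {"X-RateLimit-Remaining": "120"}))
    print(backoff_delay(2, {"Retry-After": "7"}))
```

The jitter is there so that 200 threads hitting a 503 at the same moment do not all retry at the same moment.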
Is there a specific configuration in Architect or the WFM settings that limits concurrent processing of evaluation submissions? The documentation mentions rate limits per application, but it does not clearly define a hard cap on concurrent POST requests for this endpoint. We have ruled out the obvious client-side causes: the JMeter script refreshes the OAuth2 client credentials token automatically, so authentication is not the bottleneck, and network latency is negligible.

We need to understand whether this 503 reflects a hard limit on the WFM Evaluation service or whether there is a setting to raise the concurrency threshold. Any advice on how to structure the load test to avoid this error would also be helpful. We want to ensure the system can handle peak volume during end-of-day evaluation processing.
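One structure we are considering is a stepped ramp between 50 and 200 threads instead of a single 10 second ramp, so the point where 503s begin is visible. The sketch below only generates the schedule and picks out the first step whose error rate crosses a threshold. The step size, hold time, threshold, and all names are hypothetical choices, not anything taken from Genesys or JMeter documentation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Step:
    threads: int   # concurrent threads held during this step
    start_s: int   # offset from test start, in seconds
    hold_s: int    # how long the step is held, in seconds


def stepped_plan(start_threads: int, max_threads: int,
                 step_threads: int, hold_s: int) -> List[Step]:
    """Build a staircase from start_threads up to max_threads."""
    if start_threads <= 0 or step_threads <= 0 or max_threads < start_threads:
        raise ValueError("invalid step parameters")
    plan: List[Step] = []
    threads, offset = start_threads, 0
    while True:
        plan.append(Step(threads, offset, hold_s))
        if threads >= max_threads:
            break
        threads = min(max_threads, threads + step_threads)
        offset += hold_s
    return plan


def first_failing_step(plan: Sequence[Step], error_rates: Sequence[float],
                       threshold: float = 0.01) -> Optional[Step]:
    """Return the first step whose measured error rate exceeds threshold."""
    for step, rate in zip(plan, error_rates):
        if rate > threshold:
            return step
    return None


if __name__ == "__main__":
    # 25-thread steps held for 120 s each are arbitrary example values.
    for step in stepped_plan(50, 200, 25, 120):
        print(f"t+{step.start_s:>3}s: {step.threads} threads for {step.hold_s}s")
```

Feeding the per-step error rates from the results file into first_failing_step would give the concurrency level to quote when asking about the limit.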