We are stuck on a weird failure mode in our outbound dialing load tests and could use some help troubleshooting it. We are running a Genesys Cloud instance in the ap-southeast-1 region (Singapore) and trying to validate the system’s capacity for high-volume predictive dialing campaigns.
The setup involves a JMeter script that simulates 500 concurrent agents initiating calls via the standard outbound campaign flow. We are hitting the /api/v2/outbound/campaigns/{id}/status endpoint to poll for status updates and using the predictive dialer configuration to drive the call volume. The initial ramp-up phase works fine. We can push about 200 concurrent calls without issues. The WebSocket connections stay stable, and the API responses are coming back in under 200ms.
However, once we hit the 300-call mark, the JMeter dashboard starts showing a spike in 503 Service Unavailable errors. This does not look like rate limiting: the status is 503 rather than 429, and the responses carry no Retry-After header, which we would expect to see on a rate-limited response. Instead, we get a generic 503 with an empty body. The interesting part is that the calls themselves seem to go through. The agents receive the audio, and the conversations are logged. Only the status polling requests at the API layer are failing.
We have checked the server logs on our load balancer, and there are no backend timeouts. The Genesys Cloud admin console shows the tenant is well within the licensed concurrent call limits. We are using the latest version of the JMeter HTTP Request sampler with keep-alive connections enabled. We also verified that our JWT tokens are valid and not expiring during the test duration.
Has anyone seen this specific 503 behavior when polling outbound campaign status under heavy load? Is there a hidden limit on the number of concurrent status poll requests per campaign, or is this a known issue with the Singapore region’s outbound dialing infrastructure? Any insights on how to tune the polling frequency or batch the requests to avoid this would be greatly appreciated.
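For context, the mitigation we are considering is to treat a 503 on the status poll like a rate-limit signal and slow the poller down. Below is a minimal sketch of the delay logic in Python. It is illustrative only, since our real test plan is in JMeter. The names `next_delay`, `base`, and `cap` and their values are hypothetical, and we are assuming that backing off on 503 is the right response here, which is part of what we would like confirmed.

```python
import random


def next_delay(status, headers, attempt, base=1.0, cap=30.0, rng=random.random):
    """Return how many seconds to wait before the next status poll.

    status  : HTTP status code of the last poll response
    headers : response headers as a dict
    attempt : number of consecutive failed polls so far (0 on first failure)
    base    : normal polling interval in seconds (assumed value)
    cap     : upper bound on the backoff in seconds (assumed value)
    rng     : source of randomness in [0, 1), injectable for testing
    """
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}

    if status == 429:
        # Honor Retry-After when it is given as whole seconds.
        # (The HTTP-date form is not handled in this sketch.)
        retry_after = lowered.get("retry-after")
        if retry_after is not None and retry_after.strip().isdigit():
            return float(retry_after.strip())

    if status in (429, 503):
        # Exponential backoff with full jitter: pick a random delay
        # between 0 and min(cap, base * 2^attempt).
        ceiling = min(cap, base * (2 ** attempt))
        return rng() * ceiling

    # Any other status: keep the normal polling interval.
    return base
```

With these assumed defaults, a third consecutive 503 (attempt=3) would wait at most 8 seconds, and the wait never exceeds 30 seconds. The jitter is there so that 500 simulated agents do not all retry in the same instant.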
Is there a specific concurrency limit for outbound campaign status polling endpoints that triggers a 503 error instead of a 429, and how can we mitigate this in our load test scripts?