I’m completely stumped as to why the predictive routing queue depth spikes to over 500 agents during a controlled load test, even though the api/v2/analytics/queue/conversations endpoint shows stable throughput.
The environment is US1 with JMeter 5.4 simulating 200 concurrent inbound calls via WebSocket. The Architect flow is straightforward: inbound → predict → queue. No complex data actions. However, once the concurrent volume hits 150, the predicted answer time jumps from 2s to 45s. The WebSocket connections remain stable, and no 429 Too Many Requests errors are logged on the client side.
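As a rough sanity check on those numbers, Little's law (L = λW) relates queue depth, arrival rate, and time in queue. The sketch below assumes the system is near steady state and treats the predicted answer time as a stand-in for actual wait, neither of which is confirmed here. The function name is made up for illustration. A depth of 500 at a 45s wait implies roughly 11 interactions per second entering the queue, which can be compared against what JMeter is actually sending.

```python
def implied_arrival_rate(queue_depth: float, avg_wait_seconds: float) -> float:
    """Little's law: L = lambda * W, so lambda = L / W."""
    if avg_wait_seconds <= 0:
        raise ValueError("avg_wait_seconds must be positive")
    return queue_depth / avg_wait_seconds


# Figures reported from the load test: depth of about 500, answer time of about 45 s.
rate = implied_arrival_rate(500, 45)
print(f"implied arrival rate: {rate:.1f} interactions/s")
```

If the implied rate is far above the rate the test plan generates, the depth or answer-time figure is probably not measuring what it appears to.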
Is there a specific rate limit on the predictive routing engine itself that differs from the standard API limits? The api/v2/predictiverouting/settings endpoint returns default values, but I suspect the load balancer might be dropping connections before they reach the routing logic. I need to know whether this is a configuration issue or a platform capacity ceiling for predictive queues.
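One way to rule the standard API limits in or out is to log how close each REST response is to the limit, instead of waiting for a 429. This is a minimal sketch, not a confirmed method: the header names `inin-ratelimit-allowed` and `inin-ratelimit-count` are an assumption and should be checked against a real response, the helper name is hypothetical, and the sample values are made up.

```python
from typing import Mapping, Optional

# Assumed header names; verify against an actual API response before relying on them.
ALLOWED_HEADER = "inin-ratelimit-allowed"
COUNT_HEADER = "inin-ratelimit-count"


def rate_limit_headroom(headers: Mapping[str, str]) -> Optional[int]:
    """Return requests remaining in the current window, or None if unknown."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return int(lowered[ALLOWED_HEADER]) - int(lowered[COUNT_HEADER])
    except (KeyError, ValueError):
        # Headers missing or not numeric: headroom cannot be determined.
        return None


# Made-up sample headers, for illustration only.
sample = {"ININ-RateLimit-Allowed": "300", "ININ-RateLimit-Count": "120"}
print(rate_limit_headroom(sample))
```

If headroom stays comfortably above zero while the answer time climbs, the standard API limits are unlikely to be the cause, which would point back at the routing engine or the load balancer.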