looking for advice on handling rate limiting when stress testing a simple architect flow. running jmeter from singapore to simulate concurrent inbound calls hitting a specific ivr path. the goal is to validate capacity planning for a peak volume event.
setup:
- jmeter 5.6.2 running locally in asia/singapore.
- thread group: 200 threads, ramp up 10s, loop count 5.
- using the `/api/v2/architect/flows/{flow_id}/simulate` endpoint to mimic call flow execution.
- auth: oauth2 client credentials grant, refreshing token every 55 minutes.
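for reference, a rough sketch of the ramp-up arithmetic behind that thread group. it assumes jmeter spaces thread starts evenly across the ramp-up period and that no thread has finished its loops yet; the function names are my own, not anything from jmeter.

```python
def active_threads(t_seconds: float, threads: int = 200, ramp_up_s: float = 10.0) -> int:
    """Threads started by time t, assuming evenly spaced starts and none finished."""
    if t_seconds < 0:
        return 0
    # The first thread starts at t = 0, then one more every ramp_up_s / threads seconds.
    started = int(t_seconds * threads / ramp_up_s) + 1
    return min(threads, started)


def time_to_reach(n: int, threads: int = 200, ramp_up_s: float = 10.0) -> float:
    """Seconds into the test at which the n-th thread starts."""
    return (n - 1) * ramp_up_s / threads


def total_samples(threads: int = 200, loops: int = 5) -> int:
    """Total requests sent if every thread completes every loop."""
    return threads * loops


if __name__ == "__main__":
    for n in (51, 60, 70, 200):
        print(f"thread {n} starts at {time_to_reach(n):.2f}s")
    print("total samples:", total_samples())
```

with these numbers a new thread starts every 50 ms, so the run passes 50 threads about 2.5 seconds in and sends 1000 requests in total.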
issue:
when concurrent threads exceed 50, i start seeing 429 too many requests responses. the response headers include retry-after: 2. initially, i assumed this was just the standard api rate limit (100 requests per second per client id). however, even with a constant throughput timer set to 80 requests per second, the 429s persist.
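to make the retry-after handling concrete, here is a minimal sketch of honoring that header outside jmeter. everything in it is illustrative: `send` is a hypothetical callable returning `(status, headers)`, and the fallback backoff values are my own assumptions, not documented limits. note that retrying inside a load test changes the offered load, so it is only meant to show the header logic.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def retry_delay(headers: Mapping[str, str], attempt: int,
                base_s: float = 0.5, cap_s: float = 30.0) -> float:
    """Seconds to wait before the next attempt.

    Uses a numeric Retry-After value when present; otherwise falls back to
    capped exponential backoff (an assumption, also used for HTTP-date values).
    """
    value = None
    for name, raw in headers.items():
        if name.lower() == "retry-after":
            value = raw
    try:
        return max(0.0, min(cap_s, float(value)))
    except (TypeError, ValueError):
        return min(cap_s, base_s * 2 ** attempt)


def send_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, int]:
    """Call send() until it returns something other than 429.

    Returns (final status, number of retries performed).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 0
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status, attempt
        if attempt < max_attempts - 1:
            sleep(retry_delay(headers, attempt))
    return status, max_attempts - 1
```

passing `sleep` as a parameter keeps the helper testable without real waiting.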
i noticed that the error rate spikes specifically when the simulate endpoint tries to process the transfer action within the flow. is there a different rate limit bucket for flow simulation versus standard api calls? or is the bottleneck related to the underlying websocket connections required for real-time flow execution?
i have tried:
- adding a `random timer` (50-150ms) to stagger requests.
- splitting the load across two different client ids with separate oauth tokens.
- reducing payload size in the simulate request body.
none of these seem to help. the 429s continue to appear around the 60-70 concurrent request mark.
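since the 429s seem to track concurrency rather than request rate, this is a rough sketch of how the jmeter csv results could be checked for that. it assumes the default csv column names (`timeStamp`, `elapsed`, `responseCode`) and that `timeStamp` marks the sample start in epoch milliseconds, which should be confirmed against your jmeter configuration; the helper names are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple

Sample = Tuple[int, int, str]  # (start ms, elapsed ms, response code)


def load_samples(csv_text: str) -> List[Sample]:
    """Parse JMeter CSV results text into (start, elapsed, code) tuples."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(int(r["timeStamp"]), int(r["elapsed"]), r["responseCode"]) for r in rows]


def in_flight_at(samples: List[Sample], t_ms: int) -> int:
    """Number of samples whose [start, start + elapsed) window covers t_ms."""
    return sum(1 for start, elapsed, _ in samples if start <= t_ms < start + elapsed)


def summarize_429s(samples: List[Sample]) -> List[Tuple[int, int, int]]:
    """For each 429: (start ms, requests in flight, requests started that second)."""
    per_second = Counter(start // 1000 for start, _, _ in samples)
    return [
        (start, in_flight_at(samples, start), per_second[start // 1000])
        for start, _, code in samples
        if code == "429"
    ]
```

if the in-flight count at each 429 clusters around one value while the per-second count varies, that would point at a concurrency cap rather than a requests-per-second bucket.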
is there a way to increase the rate limit for load testing purposes? or should i be using a different endpoint/method for high-volume flow simulation? any insights on how to accurately model architect flow capacity without hitting these limits would be appreciated. currently blocked on finalizing our capacity report.