409 Conflict
{
"message": "Resource conflict detected. The campaign cannot be started because it is already in a transitional state or has active reservations that conflict with the new configuration."
}
We are getting the 409 above while running a load test against the predictive outbound platform using JMeter 5.6.2. The goal is to simulate 2000 concurrent agents logging in and triggering predictive campaigns simultaneously. The environment is us-east-1. We are using the Python SDK 3.4.1 for some pre-test setup, but the main hammering is via raw HTTP requests to the /api/v2/outbound/campaigns/{campaignId}/start endpoint.
The issue happens when we ramp up the load. Initially, the campaigns start fine, but once the concurrent thread count reaches around 500 we start seeing these 409 errors. While this is happening, the campaign state shown in the UI flickers between ‘paused’ and ‘running’ and never settles.
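To pin down where the 409s begin, the JMeter results file can be grouped by active thread count. Below is a rough sketch, assuming results are saved as CSV with a header row and with the `responseCode` and `allThreads` columns enabled (we believe both are on in the default save configuration, but check yours). The function name and bucket size are our own choices for illustration.

```python
import csv
import io
from collections import Counter, defaultdict


def codes_by_concurrency(jtl_file, bucket_size=100):
    """Count response codes per bucket of active threads.

    jtl_file is any open text file (or file-like object) holding a
    JMeter CSV results file with a header row.
    """
    buckets = defaultdict(Counter)
    for row in csv.DictReader(jtl_file):
        active = int(row["allThreads"])
        lower = (active // bucket_size) * bucket_size  # e.g. 510 -> 500
        # responseCode is kept as a string because JMeter can also
        # write non-numeric values for non-HTTP failures.
        buckets[lower][row["responseCode"]] += 1
    return {lower: dict(counts) for lower, counts in sorted(buckets.items())}


if __name__ == "__main__":
    # Tiny made-up sample, not real test output.
    sample = io.StringIO(
        "timeStamp,elapsed,label,responseCode,allThreads\n"
        "1,10,start,200,120\n"
        "2,12,start,409,510\n"
    )
    for lower, counts in codes_by_concurrency(sample).items():
        print(lower, counts)
```

If the 409s only appear in the buckets at or above 500 threads, that supports the threshold described above; if they are spread across all buckets, the trigger is probably something other than raw concurrency.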
We checked the Architect flow attached to the campaign. It is a simple flow with just a disconnect node and a log event, no complex logic. The queue associated with the campaign has a capacity of 50 agents. We are deliberately logging in more agents than the queue can handle, expecting the system to queue the calls or drop them gracefully. Instead, the API throws a conflict.
Is there a specific rate limit on the campaign start endpoint that causes this state lock? We are hitting the endpoint once per campaign instance, not in a loop. The JMeter config uses a Concurrency Thread Group with a ramp-up period of 10 seconds. We also noticed that the last_modified_date on the campaign resource does not update during the error window, which suggests the write is being blocked.
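The flicker and the frozen timestamp can be captured side by side by polling the campaign during the error window. This is only a sketch: `fetch_campaign` is a hypothetical callable that returns the campaign resource as a dict (for example a wrapper around a GET on the campaign), and the `state` key is an assumed field name, so adjust both keys to match the real payload.

```python
import time


def record_transitions(fetch_campaign, polls=10, interval=1.0,
                       sleep=time.sleep,
                       state_key="state",                # assumed field name
                       modified_key="last_modified_date"):
    """Poll a campaign and keep only samples where something changed.

    Returns a list of (state, last_modified) tuples, one per change.
    """
    transitions = []
    previous = None
    for i in range(polls):
        campaign = fetch_campaign()
        sample = (campaign.get(state_key), campaign.get(modified_key))
        if sample != previous:
            transitions.append(sample)
            previous = sample
        if i < polls - 1:
            sleep(interval)  # no sleep after the final poll
    return transitions
```

A result where the state alternates while the timestamp stays the same would match what we are seeing: the displayed state changes, but no write to the resource is recorded.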
We have tried adding a retry mechanism in JMeter with exponential backoff, but the 409 persists for up to 5 minutes after the initial failure. This looks like a backend state management issue rather than a simple rate limit, since we are not seeing any 429s.
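For anyone who wants to reproduce the retry outside JMeter, a retry of this kind can be sketched in Python as follows. It is an illustration, not our actual JMeter configuration: `send_start` is a hypothetical callable that issues the POST to the start endpoint and returns the HTTP status code, and the delay values are arbitrary.

```python
import random
import time


def start_with_backoff(send_start, max_attempts=6, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep, rng=random.random):
    """Retry send_start() on 409 with exponential backoff and full jitter.

    Returns (last_status, attempts_made). Any status other than 409,
    including 429 or 5xx, is returned immediately without retrying.
    """
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send_start()
        if status != 409:
            return status, attempt
        if attempt < max_attempts:
            # The cap doubles each attempt: 1s, 2s, 4s, ... up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng() * cap)  # full jitter: random wait in [0, cap)
    return status, max_attempts
```

With these defaults the worst-case total wait is 31 seconds (1 + 2 + 4 + 8 + 16), far short of the 5 minutes the conflict can last, so covering that window would need more attempts or a larger base delay.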
Can anyone confirm whether there is a hidden lock on the campaign resource during high-concurrency starts? We are on the standard predictive routing license tier. The test data includes 1000 unique contact lists, each with 1000 contacts. The dial plan is set to ‘predictive’ with a target interval of 5 seconds.