Predictive routing campaign stuck in 'paused' state with 409 Conflict during high-throughput JMeter test

409 Conflict
{
 "message": "Resource conflict detected. The campaign cannot be started because it is already in a transitional state or has active reservations that conflict with the new configuration."
}

running a load test against the predictive outbound platform using jmeter 5.6.2. the goal is to simulate 2000 concurrent agents logging in and triggering predictive campaigns simultaneously. the environment is us-east-1. we are using the python sdk 3.4.1 for some pre-test setup, but the main hammering is via raw http requests to the /api/v2/outbound/campaigns/{campaignId}/start endpoint.

the issue happens when we ramp up the load. initially, the campaigns start fine. but as the concurrent thread count hits around 500, we start seeing these 409 errors. the logs show that the campaign state in the ui flickers between ‘paused’ and ‘running’ but never settles.

we checked the architect flow attached to the campaign. it is a simple flow with just a disconnect node and a log event. no complex logic. the queue associated with the campaign has a capacity of 50 agents, but we are testing with more agents logging in than the queue can handle, expecting the system to queue the calls or drop them gracefully. instead, the api throws a conflict.

is there a specific rate limit on the campaign start endpoint that causes this state lock? we are hitting the endpoint once per campaign instance, not in a loop. the jmeter config uses a concurrent thread group with a ramp-up period of 10 seconds. we also noticed that the last_modified_date on the campaign resource does not update during the error window, suggesting the write is being blocked.

we have tried adding a retry mechanism in jmeter with an exponential backoff, but the 409 persists for up to 5 minutes after the initial failure. this seems like a backend state management issue rather than a simple rate limit, as we are not seeing 429s.

can anyone confirm if there is a hidden lock on the campaign resource during high-concurrency starts? we are using the standard predictive routing license tier. the test data includes 1000 unique contact lists, each with 1000 contacts. the dial plan is set to ‘predictive’ with a target interval of 5 seconds.

You need to decouple the agent login events from the predictive campaign start triggers. The 409 Conflict arises because the predictive outbound engine locks the campaign resource during state transitions, and your JMeter script is likely firing concurrent POST /api/v2/outbound/campaigns/{campaignId}/start requests before the previous state change fully commits. In high-throughput scenarios, the API returns this conflict to prevent data corruption.
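One way to keep a second start request from racing the first is to track in-flight starts per campaign on the client side. The sketch below is a minimal illustration, not anything provided by the platform or the SDK: `CampaignStartGate` and the `send_start` callable are hypothetical names, and `send_start` is assumed to wrap whatever HTTP call your harness makes and return the HTTP status code.

```python
import threading
from typing import Callable, Optional, Set


class CampaignStartGate:
    """Allow at most one in-flight start request per campaign ID (hypothetical helper)."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._in_flight: Set[str] = set()

    def try_start(
        self, campaign_id: str, send_start: Callable[[str], int]
    ) -> Optional[int]:
        """Call send_start(campaign_id) unless a start is already in flight.

        Returns the HTTP status from send_start, or None if the call was
        skipped because another start for the same campaign is pending.
        """
        with self._guard:
            if campaign_id in self._in_flight:
                return None
            self._in_flight.add(campaign_id)
        try:
            # The guard is released here so other campaigns are not blocked
            # while this request is on the wire.
            return send_start(campaign_id)
        finally:
            with self._guard:
                self._in_flight.discard(campaign_id)
```

This only protects against duplicate starts issued from a single process. If several JMeter workers or services can start the same campaign, the same idea would need a shared lock (for example in a database or cache), which is outside this sketch.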

Instead of triggering the campaign start directly from the agent login webhook, put the start requests behind a rate-limited queue, or use a Genesys Cloud data action to check the current campaign status via GET /api/v2/outbound/campaigns/{campaignId} before attempting to start. Only proceed if the status is stopped; this closes the window in which two start requests race each other. Refer to the official documentation on predictive campaign lifecycle management for more details on state handling: https://developer.genesys.cloud/api/v2/outbound/campaigns. Also, make sure your Python client code retries 429 and 409 responses with exponential backoff, so that the retries themselves do not overwhelm the API gateway.
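The check-then-start flow with backoff could look roughly like the sketch below. It is deliberately transport-agnostic: `get_status` and `send_start` are hypothetical callables you would implement with the SDK or raw HTTP, and the literal status value `"stopped"` is taken from the wording above, so confirm the actual field name and value against the API response before relying on it.

```python
import random
import time
from typing import Callable

RETRYABLE = {409, 429}  # conflict and rate-limit responses are worth retrying


def start_when_stopped(
    campaign_id: str,
    get_status: Callable[[str], str],   # hypothetical: returns the campaign status
    send_start: Callable[[str], int],   # hypothetical: returns the HTTP status code
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Start the campaign only once it reports 'stopped'.

    Retries with exponential backoff and full jitter on 409/429.
    Returns True on a 2xx response, False if all attempts are used up.
    """
    for attempt in range(max_attempts):
        if get_status(campaign_id) == "stopped":
            code = send_start(campaign_id)
            if 200 <= code < 300:
                return True
            if code not in RETRYABLE:
                raise RuntimeError(f"start failed with HTTP {code}")
        if attempt < max_attempts - 1:
            # Double the ceiling each attempt, cap it, then pick a random
            # delay below it so parallel clients do not retry in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    return False
```

The jitter matters in a load test: without it, every thread that received a 409 at the same moment retries at the same moment too. Note that the status check and the start are still two separate requests, so a 409 remains possible and is treated as retryable rather than fatal.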

If I remember correctly, adding X-Genesys-Client: jmeter to the request headers bypasses some queueing logic. It helped in our SG load tests when we were hammering the state endpoints.