Python SDK token refresh race condition in async batch job

I’m running into a weird race condition with the Genesys Cloud Python SDK (genesyscloud) while processing a large batch of interaction updates asynchronously. The job starts fine, but halfway through, a chunk of requests fails with 401 Unauthorized.

The SDK has an internal refresh mechanism, but it seems to be lagging when multiple async tasks hit the expiration window at the same time. Here’s the pattern:

  • Task A sends request → 401
  • SDK triggers refresh → gets new token
  • Task B sends request (started before refresh completed) → 401
  • Task C sends request (after refresh) → 200

It looks like either the refresh doesn’t block outgoing requests while it is in flight, or the token cache is updated asynchronously, so requests that were already started still carry the old token.
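For reference, this is roughly what I would expect "locking" to look like if I did it myself. It is only a sketch of a single-flight refresh with asyncio.Lock, not what the SDK does internally. SingleFlightToken, fetch_token and the generation counter are names I made up, and fake_fetch stands in for the real token request.

```python
import asyncio


class SingleFlightToken:
    """Holds a token and lets only one refresh run at a time."""

    def __init__(self, fetch_token):
        # fetch_token: hypothetical async callable returning a new token string
        self._fetch_token = fetch_token
        self._lock = asyncio.Lock()
        self.token = None
        self.generation = 0  # bumped after every successful refresh

    async def refresh(self, seen_generation):
        """Refresh unless another task already did so since seen_generation."""
        async with self._lock:
            if self.generation == seen_generation:
                self.token = await self._fetch_token()
                self.generation += 1
            # Tasks that queued on the lock fall through and reuse the new token.
            return self.token


async def demo():
    calls = 0

    async def fake_fetch():
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulate the round trip to the auth server
        return f"token-{calls}"

    holder = SingleFlightToken(fake_fetch)
    # Ten tasks all saw a 401 while the holder was still at generation 0.
    tokens = await asyncio.gather(*(holder.refresh(0) for _ in range(10)))
    return calls, set(tokens)
```

Each task remembers the generation it sent its request with, so a task that got a 401 on the old token does not trigger a second refresh once somebody else has already fetched a new one.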

I’ve tried:

  • Setting retry_on_401=True in the client config. It helps but causes massive delays as each task retries individually.
  • Manually refreshing the token before the batch loop. Doesn’t help because the batch takes longer than the 3600s expiry.
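A variation on the second attempt that I am considering is checking the remaining lifetime before each chunk instead of once before the loop. A minimal sketch, assuming I track the expiry myself: ExpiringToken and the 300s margin are my own choices, not SDK features, and the 3600s value is the expiry mentioned above.

```python
import time


class ExpiringToken:
    """Tracks when a token should be replaced, with a safety margin."""

    def __init__(self, value, expires_in, margin=300, clock=time.monotonic):
        self.value = value
        self._clock = clock    # injectable so the logic can be tested
        self._margin = margin  # refresh this many seconds before expiry
        self._expires_at = clock() + expires_in

    def needs_refresh(self):
        # True once we are inside the margin before the expiry time.
        return self._clock() >= self._expires_at - self._margin


# Usage with a fake clock: a 3600s token becomes due at 3300s.
now = [0.0]
token = ExpiringToken("abc", expires_in=3600, margin=300, clock=lambda: now[0])
fresh_at_start = token.needs_refresh()
now[0] = 3300.0
due_near_expiry = token.needs_refresh()
```

This would not remove the race on its own, but refreshing between chunks, when no requests are in flight, should make the expiry window much harder to hit.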

Is there a way to hook into the SDK’s token refresh event to pause the queue? Or am I better off implementing a custom middleware that catches 401s and retries the specific request with a fresh token fetched via /oauth/token?
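To make the second option concrete, here is a rough sketch of the kind of wrapper I have in mind. It does not hook into the SDK at all: RefreshGate, Unauthorized, send and fetch_token are hypothetical names, and in the real job send would wrap the SDK call and translate the 401 ApiException. An asyncio.Event acts as the gate that pauses new requests while one refresh is in flight.

```python
import asyncio


class Unauthorized(Exception):
    """Stand-in for the SDK's 401 ApiException (hypothetical)."""


class RefreshGate:
    def __init__(self, fetch_token):
        self._fetch_token = fetch_token  # hypothetical async token fetcher
        self._open = asyncio.Event()
        self._open.set()                 # gate starts open
        self.token = None

    async def call(self, send):
        """Run send(token); on 401, refresh once for everyone, retry once."""
        await self._open.wait()          # paused while a refresh is in flight
        used = self.token
        try:
            return await send(used)
        except Unauthorized:
            await self._refresh_if_stale(used)
            return await send(self.token)

    async def _refresh_if_stale(self, used):
        if not self._open.is_set():      # another task is already refreshing
            await self._open.wait()
            return
        if self.token != used:           # token was replaced after we sent
            return
        self._open.clear()               # close the gate: new calls now wait
        try:
            self.token = await self._fetch_token()
        finally:
            self._open.set()             # reopen even if the fetch failed


async def demo():
    refreshes = 0

    async def fake_fetch():
        nonlocal refreshes
        refreshes += 1
        await asyncio.sleep(0.01)        # simulate the /oauth/token round trip
        return "fresh"

    async def fake_send(token):
        await asyncio.sleep(0.001)
        if token != "fresh":
            raise Unauthorized()
        return 200

    gate = RefreshGate(fake_fetch)
    gate.token = "expired"
    results = await asyncio.gather(*(gate.call(fake_send) for _ in range(20)))
    return refreshes, results
```

In this simulation twenty tasks all fail on the expired token, but only the first one fetches a new token; the rest wait on the gate and retry once with the result. The check and the clear() happen without an await in between, so only one task can close the gate on a single event loop.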

Here’s the error I’m seeing in the logs:

genesyscloud.rest.ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'WWW-Authenticate': 'Bearer error="invalid_token", error_description="Token expired"'})

The environment is Python 3.10, SDK version 8.6.2.