Why does this config cause intermittent 401 Unauthorized errors during secret rotation?
I am implementing zero-downtime rotation of our Genesys Cloud OAuth client secrets in a FastAPI microservice. The goal is to update the client_secret in our configuration store without restarting the service or dropping active requests. I am using requests_oauthlib (on top of oauthlib) with a custom token cache.
The documentation states: “The client_credentials grant type is suitable for server-to-server communication where no user context is required.” However, when I update the secret in the database and trigger a refresh, I see a race condition where requests using the old token fail with 401 Unauthorized before the new token is fully cached.
Here is my current configuration and refresh logic:
oauth:
  provider: genesys_cloud
  client_id: "my-app-client-id"
  client_secret: "old-secret-value"
  token_url: "https://api.mypurecloud.com/oauth/token"
  scopes:
    - "conversation:read"
    - "analytics:read"
import httpx
import time
from fastapi import FastAPI
from oauthlib.oauth2 import BackendApplicationClient
from requests_oauthlib import OAuth2Session

app = FastAPI()

# Global session holder
oauth_session = None

def get_oauth_session():
    global oauth_session
    # expires_at lives in the session's token dict, not on the session itself
    if oauth_session is None or time.time() > oauth_session.token.get("expires_at", 0):
        client = BackendApplicationClient(client_id="my-app-client-id")
        # This block is not thread-safe: the global is re-assigned
        # before fetch_token() has populated the new session
        oauth_session = OAuth2Session(client=client)
        oauth_session.fetch_token(
            token_url="https://api.mypurecloud.com/oauth/token",
            # client_credentials grant: send the client ID and secret
            client_id="my-app-client-id",
            client_secret="new-secret-value",  # Updated secret
        )
    return oauth_session
When I deploy the new secret, the fetch_token call succeeds, but concurrent requests sometimes grab a None, expired, or not-yet-populated session object before the refresh completes, leading to failures. The requests_oauthlib library does not seem to handle concurrent token refreshes gracefully when the session is shared as a module-level global.
Is there a recommended pattern for thread-safe token rotation in Python that avoids this race condition? I need to ensure that the old token remains valid until the new one is fully cached, or that all requests wait for the refresh to complete without blocking the entire event loop.
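For reference, here is a minimal sketch of the kind of pattern I have in mind, not a tested production implementation. It uses only the standard library: a lock with a double check so that only one thread refreshes, and a single assignment that publishes the new token only after it has been fetched completely. TokenCache, fetch, skew, and refresh are names I made up for illustration; fetch is assumed to be any callable that returns a dict with access_token and expires_at keys.

```python
import threading
import time
from typing import Any, Callable, Dict, Optional

Token = Dict[str, Any]


class TokenCache:
    """Caches one token; refreshes under a lock and swaps the result in atomically."""

    def __init__(self, fetch: Callable[[], Token], skew: float = 30.0) -> None:
        self._fetch = fetch      # hypothetical callable that performs the token request
        self._skew = skew        # refresh this many seconds before the real expiry
        self._lock = threading.Lock()
        self._token: Optional[Token] = None

    def _is_fresh(self, token: Optional[Token]) -> bool:
        return token is not None and time.time() < token["expires_at"] - self._skew

    def get(self) -> Token:
        token = self._token      # read the shared reference exactly once
        if self._is_fresh(token):
            return token         # fast path: no lock while the token is valid
        with self._lock:
            token = self._token  # re-check: another thread may have refreshed already
            if not self._is_fresh(token):
                token = self._fetch()  # build the new token completely first
                self._token = token    # then publish it with one assignment
            return token

    def refresh(self) -> Token:
        """Fetch unconditionally, e.g. right after the secret was rotated.

        The cached token is never cleared, so readers on the fast path keep
        using the old token until the new one has been stored.
        """
        with self._lock:
            token = self._fetch()
            self._token = token
            return token
```

In my service, fetch would wrap the OAuth2Session.fetch_token call shown above and read the current client_secret from the configuration store on every call, so a rotation would only need to call refresh(). Because get() blocks waiting threads while a refresh is in flight, I assume async handlers would have to call it through something like asyncio.to_thread so the event loop itself is not blocked.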