We’re building a daily analytics export job that pulls interaction data from Genesys Cloud and writes it to an S3 bucket. The stack is Python 3.11, genesys-cloud-python-sdk v1.0.6, and boto3 for the storage layer.
The goal is straightforward: fetch all voice interactions from the previous 24 hours, flatten the JSON structure, and dump it as a CSV file in S3. The issue pops up when the volume spikes. On a quiet Tuesday, the script runs in under 5 minutes. On a Friday with roughly 50,000 interactions, it hangs on the pagination loop.
Here’s the relevant snippet:
from genesyscloud.api import ConversationApi
from genesyscloud.conversations.models import ConversationSearchQuery

api_instance = ConversationApi()
query = ConversationSearchQuery(
    query='type:voice and startTime:[now-1d TO now]',
    size=1000
)

results = []
while True:
    # Fetch one page and accumulate its entities in memory.
    response = api_instance.post_conversations_search(query=query)
    results.extend(response.entities)
    # Stop when the server returns no cursor for a further page.
    if not response.next_page_id:
        break
    query.page_id = response.next_page_id
The script stops responding after about 4,000 pages. I’ve checked the x-gc-request-id on the last successful request, and there’s no server-side error logged in Genesys. It looks like the Python SDK client is timing out on the HTTP keep-alive connection, or perhaps the next_page_id is becoming stale due to the volume of data shifting under the cursor.
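To tell a stale or looping cursor apart from a network stall, I'm thinking of wrapping the loop in a guard like the one below. This is only a rough sketch: fetch_page is a hypothetical stand-in for the SDK call (it is not part of the SDK), and max_pages is an arbitrary cap I made up. It raises as soon as the same page id comes back twice instead of spinning forever.

```python
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical signature: takes a page id (None for the first page) and
# returns (entities, next_page_id). It stands in for the SDK search call.
FetchPage = Callable[[Optional[str]], Tuple[list, Optional[str]]]


def paginate(fetch_page: FetchPage, max_pages: int = 10_000) -> Iterator[list]:
    """Yield pages until the cursor runs out, failing fast on a bad cursor."""
    seen = set()
    page_id = None
    for _ in range(max_pages):
        entities, next_page_id = fetch_page(page_id)
        yield entities
        if not next_page_id:
            return  # no further cursor: normal end of the result set
        if next_page_id in seen:
            # The server handed back a cursor we already followed.
            raise RuntimeError(f"cursor repeated: {next_page_id!r}")
        seen.add(next_page_id)
        page_id = next_page_id
    raise RuntimeError(f"gave up after {max_pages} pages")
```

Because it yields one page at a time, each page could be written out as it arrives rather than held in results, which would also show exactly which page the job dies on.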
I’ve tried increasing the timeout parameter in the SDK configuration, but that just makes the hang last longer before it eventually throws a ConnectionError. I can’t just batch this into smaller chunks because the search query needs to cover the full 24-hour window to ensure we don’t miss anything.
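That said, if coverage is the only worry, contiguous half-open slices should in principle tile the 24-hour window with no gaps. Here is a minimal sketch of what I mean, using only the standard library. split_window is a name I invented for illustration, and I'm assuming the search query can take explicit start and end timestamps, which I haven't verified. If the API treats both bounds as inclusive, a record sitting exactly on a boundary could show up twice, so results would need de-duplicating by conversation id.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def split_window(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield contiguous half-open [lo, hi) slices that exactly cover [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    lo = start
    while lo < end:
        hi = min(lo + step, end)  # clamp the final slice to the window end
        yield lo, hi
        lo = hi  # next slice starts exactly where this one stopped


# Example: the previous 24 hours cut into one-hour slices.
end = datetime.now(timezone.utc)
hourly = list(split_window(end - timedelta(days=1), end, timedelta(hours=1)))
```

Each slice would then get its own short pagination run, so no single cursor has to stay alive for the whole export.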
Has anyone hit this wall with large-scale pagination? Should I be switching to the async API endpoints, or is there a way to configure the underlying requests library to handle long-lived pagination loops better? The S3 write part works fine once the data is in memory.
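In the meantime, the stopgap I'm considering is retrying each page fetch with exponential backoff, so one dropped connection doesn't kill the whole run. A rough sketch follows. with_retries is a hypothetical helper, not something the SDK provides, and I'm assuming the failure surfaces as Python's built-in ConnectionError. If the SDK raises its own exception type, that type would have to be passed in retry_on instead.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")
```

In the loop above this would wrap the search call, e.g. with_retries(lambda: api_instance.post_conversations_search(query=query)). It does not help if the cursor itself has expired, since retrying would just resend the same stale page id.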