Hey folks,
Looking for the best way to chunk requests when pulling historical agent summary data via the Python SDK and dumping it to S3. We’ve got a daily job that runs at 2 AM EST to archive the previous day’s metrics. It works fine for small teams, but when I scale it up to our full 500-agent pool, the script keeps timing out.
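The job runs at 2 AM Eastern, but the sample query below uses UTC midnight boundaries. If "the previous day" is supposed to mean the Eastern calendar day, a helper along these lines could compute the window. This is a minimal sketch: `previous_day_window` is a hypothetical name, and it assumes the `now` you pass in is timezone-aware.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import Tuple


def previous_day_window(now: datetime, tz: tzinfo) -> Tuple[str, str]:
    """Return (date_from, date_to) as UTC ISO-8601 strings covering the
    calendar day before `now`, where "day" is measured in timezone `tz`."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    local_today = now.astimezone(tz).date()
    local_yesterday = local_today - timedelta(days=1)
    # Midnight-to-midnight in the local zone (an open-ended [start, end) window)
    start_local = datetime.combine(local_yesterday, time.min, tzinfo=tz)
    end_local = datetime.combine(local_today, time.min, tzinfo=tz)

    def to_utc_z(dt: datetime) -> str:
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    return to_utc_z(start_local), to_utc_z(end_local)
```

In the job you would pass `datetime.now(timezone.utc)` and `zoneinfo.ZoneInfo("America/New_York")` (standard library, Python 3.9+), which takes care of the EST/EDT switch so the window stays aligned to the local day year-round.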
I’m using the genesyscloud Python SDK to hit /api/v2/analytics/agents/summary. The endpoint caps responses at 1,000 records per request, but the real killer seems to be processing time on the Genesys side for large date ranges combined with high cardinality on the groupBy fields. I’m getting 504 Gateway Timeout errors after about 2 to 3 minutes of execution.
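If large date ranges are what drive the server-side processing time, one common pattern to try is splitting the range into smaller windows and sending one request per window. A sketch, assuming the `YYYY-MM-DDTHH:MM:SSZ` timestamp format used in the snippet below; `split_range` is a hypothetical helper, and whether shorter windows actually avoid the 504s on this endpoint is something to measure rather than a documented guarantee.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

ISO_FMT = "%Y-%m-%dT%H:%M:%SZ"  # 'Z' is matched literally; times are treated as UTC


def split_range(date_from: str, date_to: str, step: timedelta) -> List[Tuple[str, str]]:
    """Split [date_from, date_to) into consecutive windows no longer than `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    start = datetime.strptime(date_from, ISO_FMT).replace(tzinfo=timezone.utc)
    end = datetime.strptime(date_to, ISO_FMT).replace(tzinfo=timezone.utc)
    windows = []
    cursor = start
    while cursor < end:
        # The last window is truncated so it never runs past date_to
        window_end = min(cursor + step, end)
        windows.append((cursor.strftime(ISO_FMT), window_end.strftime(ISO_FMT)))
        cursor = window_end
    return windows
```

Keeping `step` a whole multiple of the query interval (PT1H here) means no hourly bucket gets split across two requests.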
Here’s roughly what the call looks like; right now it sends all of the agent IDs in a single request:
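```python
from genesyscloud import analytics_api
import boto3  # used later to write the results to S3

# ... auth setup ...

# One-day window, expressed in UTC
start_date = "2023-10-01T00:00:00Z"
end_date = "2023-10-02T00:00:00Z"
agent_ids = ["id1", "id2", "...id500"]  # Fetched separately

# Single request covering all ~500 agents, grouped by agent in hourly buckets
response = analytics_api.post_analytics_agents_summary(
    analytics_api.PostAnalyticsAgentsSummaryRequest(
        query=analytics_api.SummaryQuery(
            date_from=start_date,
            date_to=end_date,
            group_by=["agent.id"],
            interval="PT1H",  # ISO 8601 duration: one-hour buckets
        ),
        agent_ids=agent_ids,
        size=1000,  # maximum records per request
    )
)
```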
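Some rough sizing helps explain why the single request is heavy. Assuming the response holds one row per agent per interval bucket (which `group_by=["agent.id"]` with `interval="PT1H"` suggests, though I haven't checked it against the actual response shape), a day of hourly buckets for 500 agents is about 12,000 rows. `estimate_pages` is a hypothetical helper for that arithmetic.

```python
import math
from datetime import timedelta
from typing import Tuple


def estimate_pages(num_agents: int, window: timedelta, interval: timedelta,
                   page_size: int) -> Tuple[int, int]:
    """Estimate (rows, pages), assuming one row per agent per interval bucket."""
    buckets = math.ceil(window / interval)  # e.g. 1 day / 1 hour = 24 buckets
    rows = num_agents * buckets
    pages = math.ceil(rows / page_size)
    return rows, pages
```

By this estimate, `estimate_pages(500, timedelta(days=1), timedelta(hours=1), 1000)` gives 12,000 rows over 12 pages, while a batch of 50 agents comes to about 1,200 rows, or 2 pages at `size=1000`.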
The call takes minutes to return, and sometimes the connection just drops. I’ve tried reducing size to 100, but that just means more API calls, so we hit the rate limits faster.
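Since smaller pages mean more calls and more rate limiting, it may help to retry 429 and 504 responses with exponential backoff and jitter instead of failing the whole run. A generic sketch: `RetryableError` and `call_with_backoff` are hypothetical, and you would wrap the SDK call so that it raises `RetryableError` for those status codes. How the status code is exposed depends on the SDK's exception type, which this sketch doesn't assume.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. HTTP 429 or 504)."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on RetryableError with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # "Full jitter": wait a random time between 0 and the current cap,
            # where the cap doubles each attempt up to max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

The jitter spreads retries out so parallel workers don't all hit the API at the same moment. If the 429 responses carry a `Retry-After` header, waiting that long is usually a better choice than a computed delay.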
Should I be slicing the agent_ids list into smaller batches of 50 before hitting the API? Or is there a better pattern for handling these bulk exports without timing out? We’re using boto3 to write the resulting JSON to S3, which is fast enough, so the bottleneck is definitely the Genesys API call.
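Batching the agent IDs (50 per request, as proposed above) and writing each batch to its own S3 object might look like this. A sketch, assuming a caller-supplied `fetch_batch` function that wraps the SDK call with `agent_ids=batch`; `chunked`, `export_in_batches`, and the bucket/key layout are all hypothetical. `s3_client` is expected to be a boto3 S3 client (`boto3.client("s3")`), whose `put_object` accepts `Bucket`, `Key`, `Body`, and `ContentType`. The batch size of 50 is a starting point to tune, not a documented optimum.

```python
import json
from typing import Any, Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for i in range(0, len(items), size):
        yield items[i : i + size]


def export_in_batches(
    agent_ids: Sequence[str],
    fetch_batch: Callable[[Sequence[str]], Any],
    s3_client: Any,
    bucket: str,
    key_prefix: str,
    batch_size: int = 50,
) -> List[str]:
    """Fetch each batch and write it to its own S3 object; return the keys written."""
    keys = []
    for n, batch in enumerate(chunked(agent_ids, batch_size)):
        result = fetch_batch(batch)  # one API call per batch of agent IDs
        key = f"{key_prefix}/batch-{n:04d}.json"
        s3_client.put_object(
            Bucket=bucket,
            Key=key,
            Body=json.dumps(result).encode("utf-8"),
            ContentType="application/json",
        )
        keys.append(key)
    return keys
```

Writing per batch means a failed batch can be re-run on its own without refetching the whole pool, and a rerun overwrites the same deterministic keys instead of producing duplicates.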
Any tips on handling the pagination or request timing would be appreciated. The docs aren’t super clear on the hard timeout limits for this specific endpoint.
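For the pagination itself, a generator that keeps requesting pages until it gets a short one avoids holding every response in memory at once. This assumes 1-based page-number paging through a hypothetical `fetch_page(page_number, page_size)` callable; if the endpoint paginates with a cursor or returns a total count instead, the stop condition would need to change.

```python
from typing import Any, Callable, Iterator, List


def iter_pages(
    fetch_page: Callable[[int, int], List[Any]],
    page_size: int = 100,
    max_pages: int = 10_000,
) -> Iterator[Any]:
    """Yield records page by page until a short (or empty) page signals the end."""
    for page_number in range(1, max_pages + 1):
        records = fetch_page(page_number, page_size)
        yield from records
        if len(records) < page_size:
            return  # a partial page means there is nothing left to fetch
    # Guard against an endpoint that never returns a short page
    raise RuntimeError(f"stopped after {max_pages} pages; check the paging logic")
```

When the total is an exact multiple of the page size, this makes one extra call that returns an empty page, which is the usual cost of not relying on a total-count field.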