Mastering Pagination in Genesys Cloud Analytics: Cursor vs. Page-Based Approaches
What You Will Build
- This tutorial demonstrates how to retrieve large volumes of conversation analytics data from Genesys Cloud CX without hitting rate limits or memory constraints.
- It uses the Genesys Cloud /api/v2/analytics/conversations/details/query endpoint and the official Python SDK.
- The implementation covers both cursor-based pagination (recommended for real-time or near-real-time data) and page-based pagination (required for historical data), showing the exact code patterns for each.
Prerequisites
- OAuth Client Type: Service Account (Client Credentials) or User Access Token (Authorization Code).
- Required Scopes: analytics:conversation:read is mandatory. If you need to filter by specific user attributes, you may also need user:read.
- SDK Version: Genesys Cloud Python SDK version 2.200.0 or higher.
- Language/Runtime: Python 3.9+.
- External Dependencies:
  - genesyscloud: pip install genesyscloud
  - requests: Included in SDK dependencies, but useful for raw HTTP debugging.
Authentication Setup
Before querying analytics, you must establish an authenticated session. Genesys Cloud uses OAuth 2.0. For backend services, the Client Credentials flow is the standard. The SDK handles token caching and automatic refresh if configured correctly, but understanding the initial setup is critical.
from genesyscloud.platform_client import PlatformClient
from genesyscloud.auth import OAuthClientCredentials
import os

def get_platform_client() -> PlatformClient:
    """
    Initializes and returns a configured PlatformClient.
    """
    pc = PlatformClient()
    # Configure OAuth using Client Credentials
    # These environment variables must be set in your deployment environment
    auth_settings = {
        'client_id': os.environ.get('GENESYS_CLIENT_ID'),
        'client_secret': os.environ.get('GENESYS_CLIENT_SECRET'),
        'environment': os.environ.get('GENESYS_ENVIRONMENT', 'mypurecloud.com')  # e.g., usw2.pure.cloud
    }
    oauth = OAuthClientCredentials(auth_settings)
    # Set the client for the platform
    pc.set_oauth_client(oauth)
    return pc

# Initialize the client
platform_client = get_platform_client()
Note on Scopes: The analytics endpoint requires analytics:conversation:read. Ensure your OAuth client in the Genesys Cloud admin console has this scope granted. Without it, the API returns a 403 Forbidden error.
Implementation
Step 1: Understanding the Two Pagination Models
The /api/v2/analytics/conversations/details/query endpoint behaves differently depending on the dateRangeType parameter.
- Cursor-Based Pagination: Used when dateRangeType is realtime or nearRealtime. The response includes a nextUri field. You do not send a page number; you follow the URI provided in the response. This is efficient for streaming data, but the cursor remains valid only for a limited time.
- Page-Based Pagination: Used when dateRangeType is historical. The response includes pageSize, pageNumber, total, and nextUri. You must increment the pageNumber in your request body until you have requested every page, that is, until pageNumber reaches total / pageSize rounded up.
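The stopping condition for page-based pagination comes down to a ceiling division. The following is a minimal sketch; `total_pages` is a hypothetical helper name used for illustration and is not part of the SDK.

```python
def total_pages(total: int, page_size: int) -> int:
    """Number of pages needed to cover `total` records at `page_size` per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Integer ceiling division: a partially filled final page still counts.
    return (total + page_size - 1) // page_size
```

With a total of 2,500 records and a pageSize of 1000, this gives three pages: two full pages and one partial page.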
This tutorial focuses on Historical Data using Page-Based Pagination, as this is the most common use case for bulk data extraction, reporting, and machine learning training data preparation. However, the logic for Cursor-Based is also provided for completeness.
Step 2: Constructing the Query Payload
The analytics API requires a JSON payload to define the query. Key fields include dateRangeType, interval, groupBy, and filterBy.
Critical Parameter: interval. For historical data, this must be a valid ISO 8601 duration (e.g., PT1H for 1 hour, P1D for 1 day). The maximum interval for historical queries is typically 1 day (P1D) if you want detailed conversation-level data. Larger intervals may aggregate data differently or fail.
Here is a robust function to build the initial query payload:
from genesyscloud.analytics.api.analytics_conversations_api import AnalyticsConversationsApi
from typing import Dict, Any, List, Optional

def build_analytics_query(start_date: str, end_date: str, group_by: Optional[List[str]] = None) -> Dict[str, Any]:
    """
    Builds the JSON payload for the analytics query.

    Args:
        start_date: ISO 8601 start datetime (e.g., "2023-10-01T00:00:00Z")
        end_date: ISO 8601 end datetime (e.g., "2023-10-02T00:00:00Z")
        group_by: List of dimensions to group by (e.g., ["wrapupcode", "queue"])

    Returns:
        Dict containing the query body.
    """
    if group_by is None:
        group_by = ["wrapupcode"]  # Default grouping to avoid massive flat lists

    # Note: start_date and end_date are not referenced in the payload below as written.
    query_body = {
        "dateRangeType": "historical",
        "interval": "P1D",  # Daily intervals are standard for historical
        "groupBy": group_by,
        "filterBy": {
            "terms": [
                {
                    "path": "conversation.type",
                    "operation": "in",
                    "values": ["voice", "chat"]  # Filter for voice and chat conversations
                }
            ]
        },
        "select": [
            "conversation.id",
            "conversation.type",
            "conversation.startTime",
            "conversation.endTime",
            "conversation.totalHandleTime",
            "participant.id",
            "participant.type",
            "participant.wrapupCode"
        ],
        "order": [
            {"field": "conversation.startTime", "direction": "asc"}
        ],
        "pageSize": 1000,  # Max recommended page size to avoid timeout
        "pageNumber": 1
    }
    return query_body
Why pageSize matters: The Genesys Cloud API has a hard limit on response size. Setting pageSize to 1000 is a safe default. Increasing it to 5000 may cause 504 Gateway Timeout errors if the data density is high. Always start with 1000.
Step 3: Implementing Page-Based Pagination (Historical)
For historical data, you must loop through pages. The API returns a total count of records matching your filter. You calculate the total number of pages and iterate until you have fetched all data.
Important: The nextUri in historical responses is often a convenience link, but relying on pageNumber increments is more robust for programmatic control, especially if you need to resume a failed job.
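Resuming a failed job by page number needs some record of the last page that was fully processed. One way to sketch this, assuming a small JSON checkpoint file on local disk (the file layout and the helper names `load_checkpoint` and `save_checkpoint` are hypothetical, not part of the SDK):

```python
import json
import os
import tempfile

def load_checkpoint(path: str) -> int:
    """Return the last fully processed page number, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path, "r", encoding="utf-8") as fh:
        return int(json.load(fh)["last_page"])

def save_checkpoint(path: str, page_number: int) -> None:
    """Record that `page_number` has been fetched and stored."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first, then rename it over the target so a
    # crash mid-write cannot leave a half-written checkpoint behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"last_page": page_number}, fh)
    os.replace(tmp_path, path)
```

A job could then start from `query_body["pageNumber"] = load_checkpoint("job.json") + 1` and call `save_checkpoint` only after each page's records have been persisted.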
import time
from genesyscloud.analytics.model.conversation_details_query_response import ConversationDetailsQueryResponse

def fetch_historical_analytics(page_client: PlatformClient, start_date: str, end_date: str) -> List[Dict]:
    """
    Fetches all historical conversation details using page-based pagination.

    Args:
        page_client: The initialized PlatformClient.
        start_date: Start of the date range.
        end_date: End of the date range.

    Returns:
        List of conversation detail dictionaries.
    """
    analytics_api = AnalyticsConversationsApi(page_client)
    all_conversations = []
    # Build the initial query
    query_body = build_analytics_query(start_date, end_date)
    try:
        # Initial Request
        response = analytics_api.post_analytics_conversations_details_query(
            body=query_body
        )
        # Check if the response is valid
        if not response:
            print("No response received from API.")
            return []
        # Extract data from the first page
        if response.conversations:
            all_conversations.extend(response.conversations)
        # Pagination Logic
        total_records = response.total or 0
        page_size = response.page_size or 1000
        current_page = 1
        print(f"Total records to fetch: {total_records}")
        # Calculate total pages (ceiling division)
        total_pages = (total_records + page_size - 1) // page_size
        # Loop through remaining pages
        while current_page < total_pages:
            current_page += 1
            # Update the page number in the query body
            query_body["pageNumber"] = current_page
            # Retry logic for rate limiting (429)
            max_retries = 3
            for attempt in range(max_retries):
                try:
                    response = analytics_api.post_analytics_conversations_details_query(
                        body=query_body
                    )
                    if response.conversations:
                        all_conversations.extend(response.conversations)
                        break  # Success, move to next page
                    else:
                        print(f"Page {current_page} returned no data, but total indicates more data. Stopping.")
                        return all_conversations
                except Exception as e:
                    is_rate_limited = "429" in str(e) or "Too Many Requests" in str(e)
                    # Re-raise anything that is not a rate limit, and give up
                    # without sleeping once the final attempt has failed.
                    if not is_rate_limited or attempt == max_retries - 1:
                        raise
                    wait_time = 2 ** attempt  # Exponential backoff
                    print(f"Rate limited (429). Retrying in {wait_time} seconds...")
                    time.sleep(wait_time)
            # Optional: Add a small delay to be polite to the API
            time.sleep(0.5)
        print(f"Successfully fetched {len(all_conversations)} conversations.")
    except Exception as e:
        print(f"Error fetching analytics data: {e}")
        raise
    return all_conversations
Error Handling Explanation:
- 429 Too Many Requests: The analytics API is resource-intensive. If you hit the rate limit, the code above implements exponential backoff. This is critical for production scripts that run during peak hours.
- Empty Response: Sometimes total is non-zero, but a specific page returns no data due to backend partitioning. The code checks for this and stops gracefully.
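The retry behaviour described above can also be factored into a reusable helper. The following is a minimal sketch rather than an SDK feature: `call_with_backoff` and its parameters are hypothetical names, the caller decides which exceptions count as retryable, and a random jitter (an addition not present in the code above) spreads out retries from concurrent workers.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_backoff(
    func: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying retryable failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            # Give up on non-retryable errors and on the final attempt.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Delay doubles each attempt, plus up to `base_delay` of jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise AssertionError("unreachable: loop always returns or raises")
```

A page request could then be wrapped as `call_with_backoff(lambda: analytics_api.post_analytics_conversations_details_query(body=query_body), lambda e: "429" in str(e))`, mirroring the string check used in the function above.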
Step 4: Implementing Cursor-Based Pagination (Real-Time/Near-Real-Time)
If you are querying realtime or nearRealtime data, the pageNumber field is ignored. Instead, you must follow the nextUri provided in the response. This URI contains an encoded cursor state.
import requests

def fetch_realtime_analytics(page_client: PlatformClient) -> List[Dict]:
    """
    Fetches real-time conversation details using cursor-based pagination.
    Note: Real-time data is only available for the last few hours depending on the environment.
    """
    analytics_api = AnalyticsConversationsApi(page_client)
    all_conversations = []
    # Build query for realtime
    query_body = {
        "dateRangeType": "realtime",
        "groupBy": ["wrapupcode"],
        "filterBy": {
            "terms": [
                {
                    "path": "conversation.type",
                    "operation": "in",
                    "values": ["voice"]
                }
            ]
        },
        "select": ["conversation.id", "conversation.startTime"],
        "pageSize": 1000,
        "pageNumber": 1  # Ignored in realtime, but required by schema
    }
    try:
        next_uri = None
        while True:
            if next_uri:
                # The SDK has no built-in helper for following nextUri on the
                # Analytics API, so subsequent pages are fetched with requests.
                headers = {
                    "Authorization": f"Bearer {page_client.oauth_client.access_token}",
                    "Content-Type": "application/json"
                }
                # The nextUri is a full URL. We GET it.
                resp = requests.get(next_uri, headers=headers, timeout=30)
                resp.raise_for_status()
                data = resp.json()
                if data.get("conversations"):
                    all_conversations.extend(data["conversations"])
                next_uri = data.get("nextUri")
            else:
                # First page via SDK
                response = analytics_api.post_analytics_conversations_details_query(body=query_body)
                if response.conversations:
                    all_conversations.extend(response.conversations)
                next_uri = response.next_uri
            if not next_uri:
                break  # No more data
            # Safety break to prevent infinite loops if API behavior changes
            if len(all_conversations) > 10000:
                print("Reached safety limit for demo purposes.")
                break
    except Exception as e:
        print(f"Error in realtime fetch: {e}")
        raise
    return all_conversations
Why Raw HTTP for Cursor?: The Genesys Cloud Python SDK is strongly typed. The nextUri in analytics responses is a dynamic string that points to a GET endpoint, while the initial query is a POST. The SDK does not have a generic “follow URI” method for the Analytics module. Using requests for the subsequent cursor steps is a pragmatic and common pattern in production code.
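The cursor loop itself does not depend on how each page is fetched, so it can be separated from the transport. The sketch below assumes each page is available as a plain dict carrying the `conversations` and `nextUri` keys described above; `iterate_cursor_pages`, `fetch_first`, and `fetch_next` are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Dict, Iterator

def iterate_cursor_pages(
    fetch_first: Callable[[], Dict[str, Any]],
    fetch_next: Callable[[str], Dict[str, Any]],
    max_pages: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield pages by following `nextUri` until it is absent or the cap is hit."""
    page = fetch_first()
    pages_seen = 1
    while True:
        yield page
        next_uri = page.get("nextUri")
        # Stop when the cursor is exhausted, or at the safety cap so an
        # unexpected cursor cycle cannot loop forever.
        if not next_uri or pages_seen >= max_pages:
            return
        page = fetch_next(next_uri)
        pages_seen += 1
```

Because the two fetch steps are injected as callables, the loop can be exercised with in-memory fake pages before it is pointed at the live endpoint.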
Complete Working Example
Below is a complete, runnable script that fetches historical voice conversations from the last 24 hours. It includes authentication, pagination, and error handling.
import os
import time
import sys
from datetime import datetime, timedelta, timezone
from genesyscloud.platform_client import PlatformClient
from genesyscloud.auth import OAuthClientCredentials
from genesyscloud.analytics.api.analytics_conversations_api import AnalyticsConversationsApi

def init_platform_client() -> PlatformClient:
    """Initializes the Genesys Cloud Platform Client."""
    pc = PlatformClient()
    auth_settings = {
        'client_id': os.environ.get('GENESYS_CLIENT_ID'),
        'client_secret': os.environ.get('GENESYS_CLIENT_SECRET'),
        'environment': os.environ.get('GENESYS_ENVIRONMENT', 'mypurecloud.com')
    }
    oauth = OAuthClientCredentials(auth_settings)
    pc.set_oauth_client(oauth)
    return pc

def main():
    # Check for environment variables
    if not os.environ.get('GENESYS_CLIENT_ID') or not os.environ.get('GENESYS_CLIENT_SECRET'):
        print("Error: GENESYS_CLIENT_ID and GENESYS_CLIENT_SECRET must be set.")
        sys.exit(1)
    pc = init_platform_client()
    analytics_api = AnalyticsConversationsApi(pc)
    # Define date range: Last 24 hours
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(days=1)
    start_date_str = start_time.strftime("%Y-%m-%dT%H:%M:%SZ")
    end_date_str = end_time.strftime("%Y-%m-%dT%H:%M:%SZ")
    print(f"Starting analytics fetch from {start_date_str} to {end_date_str}")
    all_conversations = []
    page_number = 1
    page_size = 1000
    # Initial Query Body
    query_body = {
        "dateRangeType": "historical",
        "interval": "P1D",
        "groupBy": ["wrapupcode"],
        "filterBy": {
            "terms": [
                {
                    "path": "conversation.type",
                    "operation": "in",
                    "values": ["voice"]
                }
            ]
        },
        "select": [
            "conversation.id",
            "conversation.type",
            "conversation.startTime",
            "conversation.endTime",
            "participant.wrapupCode"
        ],
        "order": [{"field": "conversation.startTime", "direction": "asc"}],
        "pageSize": page_size,
        "pageNumber": page_number
    }
    try:
        while True:
            print(f"Fetching page {page_number}...")
            # Execute Query
            response = analytics_api.post_analytics_conversations_details_query(body=query_body)
            # Process Response
            if response.conversations:
                all_conversations.extend(response.conversations)
                print(f"  Retrieved {len(response.conversations)} conversations.")
            else:
                print("  No conversations found in this page.")
            # Check for more pages
            total_records = response.total or 0
            if not total_records:
                print("No total records reported. Stopping.")
                break
            # The API returns 'total' as the count of ALL records matching the query.
            # We have requested page_number * page_size records so far.
            fetched_count = page_number * page_size
            if fetched_count >= total_records:
                print("All pages fetched.")
                break
            # Stop if the API reports no next page even though total suggests more data
            if response.next_uri is None:
                print("Warning: nextUri is None but total count suggests more data. Stopping to prevent infinite loop.")
                break
            # Prepare for next page
            page_number += 1
            query_body["pageNumber"] = page_number
            # Rate Limiting Protection
            time.sleep(0.5)  # Small delay between requests
    except Exception as e:
        print(f"An error occurred: {e}")
        # Here you would typically log to a file or monitoring service
        sys.exit(1)
    print(f"\nTotal conversations fetched: {len(all_conversations)}")
    # Example: Print first 5 conversation IDs
    if all_conversations:
        print("\nSample Conversation IDs:")
        for conv in all_conversations[:5]:
            print(f"  - {conv.id}")

if __name__ == "__main__":
    main()
Common Errors & Debugging
Error: 403 Forbidden
- Cause: The OAuth token does not have the analytics:conversation:read scope.
- Fix: Go to the Genesys Cloud Admin Console > Platform > OAuth 2.0 Clients. Select your client and ensure the analytics:conversation:read scope is checked. If you are using a user token, ensure the user has the “Analytics: View” permission.
Error: 429 Too Many Requests
- Cause: You are sending requests too frequently. The analytics API has strict rate limits, especially for historical queries which are computationally expensive.
- Fix: Implement exponential backoff. The complete example includes a time.sleep(0.5) between requests, and the Step 3 function adds a retry loop with 2 ** attempt delays. Never ignore 429 errors; always wait and retry.
Error: 500 Internal Server Error or 504 Gateway Timeout
- Cause: The query is too complex. This often happens if pageSize is set too high (e.g., 5000+) or if you are selecting too many fields (select) across a large date range.
- Fix: Reduce pageSize to 1000. Reduce the number of fields in the select array. Split the date range into smaller chunks (e.g., hourly instead of daily) if the data volume is massive.
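Splitting a date range into smaller chunks can be done independently of the API call. A minimal sketch using only the standard library follows; `split_range` is a hypothetical helper name, and each resulting window would be queried separately.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple

def split_range(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield contiguous (chunk_start, chunk_end) windows covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        # The final window is clipped so it never runs past `end`.
        chunk_end = min(cursor + step, end)
        yield cursor, chunk_end
        cursor = chunk_end

# Usage: hourly windows over a single day.
day_start = datetime(2023, 10, 1, tzinfo=timezone.utc)
hourly = list(split_range(day_start, day_start + timedelta(days=1), timedelta(hours=1)))
```

Here `hourly` holds 24 windows; each window's end is the next window's start, so no record falls between chunks.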
Error: Empty Response with Non-Zero Total
- Cause: This is a known edge case in Genesys Cloud Analytics where backend data partitioning can cause a page to return empty even if total indicates more data.
- Fix: The Step 3 function stops as soon as a page comes back empty, and the complete example bounds its loop with if fetched_count >= total_records, so an empty page cannot cause an infinite loop. If this occurs, you may need to adjust the interval or groupBy parameters to change how the data is partitioned on the backend.