Mastering Genesys Cloud Analytics API Pagination: pageSize, pageNumber, and Total
What You Will Build
- A Python script that retrieves conversation detail records from the Genesys Cloud Analytics API, correctly handling pagination to fetch all available data without hitting rate limits.
- This tutorial uses the Genesys Cloud Python SDK (purecloudplatformclientv2) and the raw REST API via httpx to demonstrate both approaches.
- The programming language covered is Python 3.8+.
Prerequisites
- OAuth Client Type: An OAuth client configured for the Client Credentials grant.
- Required Permissions: With Client Credentials, access is governed by the roles assigned to the OAuth client rather than by token scopes. The role must include analytics:conversationDetail:view.
- SDK Version: purecloudplatformclientv2 >= 180.0.0.
- Runtime Requirements: Python 3.8 or higher.
- External Dependencies:
  - httpx for async HTTP requests in the raw API example.
  - purecloudplatformclientv2 for the SDK example.
  - python-dotenv for secure credential management.
Install dependencies via pip:
pip install purecloudplatformclientv2 httpx python-dotenv
Authentication Setup
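Genesys Cloud uses OAuth 2.0 for authentication. For server-side integrations, the Client Credentials Grant is the standard flow. You must obtain an access token before making any API calls. The token response includes an expires_in value in seconds, and the lifetime is configured on the OAuth client, so do not assume a fixed duration. This grant issues no refresh token; your application must request a new token before the current one expires.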
Below is a robust helper function to acquire a token using httpx. This function includes basic error handling for 4xx and 5xx responses.
import os

import httpx
from dotenv import load_dotenv

load_dotenv()

GENESYS_CLIENT_ID = os.getenv("GENESYS_CLIENT_ID")
GENESYS_CLIENT_SECRET = os.getenv("GENESYS_CLIENT_SECRET")
GENESYS_ENVIRONMENT = os.getenv("GENESYS_ENVIRONMENT", "https://api.mypurecloud.com")
# Tokens are issued by the login host (login.<region>), not the API host.
GENESYS_LOGIN_URL = GENESYS_ENVIRONMENT.replace("://api.", "://login.", 1)


async def get_access_token() -> str:
    """
    Acquires an OAuth2 access token using Client Credentials Grant.
    """
    url = f"{GENESYS_LOGIN_URL}/oauth/token"
    data = {"grant_type": "client_credentials"}
    async with httpx.AsyncClient() as client:
        try:
            # httpx form-encodes `data` and sends the credentials as HTTP Basic auth.
            response = await client.post(
                url, data=data, auth=(GENESYS_CLIENT_ID, GENESYS_CLIENT_SECRET)
            )
            response.raise_for_status()
            token_data = response.json()
            return token_data["access_token"]
        except httpx.HTTPStatusError as e:
            print(f"Authentication failed: {e.response.status_code} - {e.response.text}")
            raise
        except Exception as e:
            print(f"An error occurred during authentication: {e}")
            raise
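Since the application has to re-authenticate when a token lapses, one option is to cache the token together with its expiry. The sketch below is a hypothetical helper, not part of the Genesys SDK: TokenCache, the fetch callable, and the 60-second safety margin are all assumptions made for illustration. It expects a callable that returns the parsed token response containing access_token and expires_in.

```python
import time
from typing import Callable, Dict, Optional


class TokenCache:
    """Hypothetical helper: caches a token until shortly before it expires."""

    def __init__(self, fetch: Callable[[], Dict], margin_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch            # returns {"access_token": ..., "expires_in": ...}
        self._margin = margin_seconds  # assumed safety margin before expiry
        self._clock = clock            # injectable so the logic can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            payload = self._fetch()
            self._token = payload["access_token"]
            # A missing expires_in is treated as 0, which forces a refetch next time.
            lifetime = float(payload.get("expires_in", 0))
            self._expires_at = now + max(lifetime - self._margin, 0.0)
        return self._token
```

This version is synchronous to keep the expiry logic easy to follow; in an async script the same check would wrap an awaited token request.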
Implementation
Step 1: Understanding the Pagination Model
The Genesys Cloud Analytics API (specifically /api/v2/analytics/conversations/details/query) uses page-based pagination, not cursors. For this endpoint the paging settings are not URL query parameters: they are sent inside the POST body as a paging object containing pageSize and pageNumber.
Key constraints:
- pageSize: The number of records per page. The maximum for this endpoint is 100.
- pageNumber: The 1-based index of the page.
- totalHits: The total number of conversations matching the query, returned in the response body.
- Page count: The response does not include a pageCount field. Derive it as ceil(totalHits / pageSize).
A common mistake is assuming the page count is static. It depends on pageSize, so changing pageSize changes the number of pages, and totalHits itself can move if new conversations land inside the interval while you are paging. Recompute the page count from the totalHits in each response, and also stop when a page comes back empty.
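The relationship between the total, the page size, and the page count can be sketched as a small helper. This is illustrative only: page_count_for is a hypothetical name, and it assumes you take the total record count from the response and choose the page size yourself.

```python
import math


def page_count_for(total: int, page_size: int) -> int:
    """Number of pages needed to cover `total` records at `page_size` per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(max(total, 0) / page_size)


# The same total yields a different page count when the page size changes.
print(page_count_for(250, 100))  # 3
print(page_count_for(250, 50))   # 5
print(page_count_for(0, 100))    # 0
```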
Step 2: Raw API Implementation with httpx
This section demonstrates fetching data using raw HTTP requests. This approach gives you full control over headers and retry logic.
The endpoint requires a POST body with the query definition, including the paging object. The response body contains the conversations array and the totalHits count.
import asyncio
import math

import httpx


async def fetch_conversations_raw(token: str, start_time: str, end_time: str) -> list:
    """
    Fetches all conversation details using raw HTTP requests and pagination logic.

    Args:
        token: Valid OAuth2 access token.
        start_time: ISO 8601 start time (e.g., "2023-10-01T00:00:00.000Z").
        end_time: ISO 8601 end time (e.g., "2023-10-01T23:59:59.999Z").

    Returns:
        A list of conversation detail dictionaries.
    """
    url = f"{GENESYS_ENVIRONMENT}/api/v2/analytics/conversations/details/query"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }
    all_conversations = []
    page_number = 1
    page_size = 100  # Maximum page size accepted by this endpoint

    async with httpx.AsyncClient() as client:
        while True:
            # Define the query payload; paging is part of the body, not the URL
            query_body = {
                "interval": f"{start_time}/{end_time}",
                "segmentFilters": [
                    {
                        "type": "and",
                        "predicates": [
                            {
                                "type": "dimension",
                                "dimension": "mediaType",
                                "operator": "matches",
                                "value": "voice"
                            }
                        ]
                    }
                ],
                "paging": {"pageSize": page_size, "pageNumber": page_number}
            }
            try:
                response = await client.post(
                    url,
                    headers=headers,
                    json=query_body,
                    timeout=60.0  # Analytics queries can take time
                )
                # Handle specific status codes
                if response.status_code == 401:
                    print("Token expired or invalid. Please re-authenticate.")
                    break
                elif response.status_code == 403:
                    print("Insufficient permissions. Check the roles assigned to the OAuth client.")
                    break
                elif response.status_code == 429:
                    # Honor the server's Retry-After hint when present
                    delay = int(response.headers.get("Retry-After", 5))
                    print(f"Rate limited. Retrying in {delay}s.")
                    await asyncio.sleep(delay)
                    continue
                else:
                    response.raise_for_status()
                data = response.json()
                # Extract this page's conversations
                conversations = data.get("conversations") or []
                if not conversations:
                    print("No more conversations found.")
                    break
                all_conversations.extend(conversations)
                # Derive the page count; the response reports totalHits, not pageCount
                total_records = data.get("totalHits", 0)
                page_count = math.ceil(total_records / page_size)
                print(f"Fetched page {page_number} of {page_count}. Total records so far: {len(all_conversations)}")
                # Stop if we have fetched all pages
                if page_number >= page_count:
                    break
                # Increment for next iteration
                page_number += 1
            except httpx.HTTPStatusError as e:
                print(f"HTTP error occurred: {e.response.status_code} - {e.response.text}")
                break
            except Exception as e:
                print(f"Unexpected error: {e}")
                break
    return all_conversations
Critical Explanation:
- The while True loop continues until page_number >= page_count.
- We check for an empty result list because, if the query returns zero results, there is nothing to collect and we break immediately.
- We use response.raise_for_status() to catch unexpected 5xx errors, but we explicitly handle 401, 403, and 429 to provide actionable feedback.
Step 3: SDK Implementation with purecloudplatformclientv2
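The SDK wraps the same endpoint in typed request and response models, but it does not remove the need to page: you set the page number on the query's paging object and read total_hits from each response. Driving the loop yourself also gives you full control over batching and memory usage, which is often preferred in production scripts. Note that the Python import name is PureCloudPlatformClientV2 (case-sensitive), even though pip accepts the lowercase package name.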
First, initialize the client:
import PureCloudPlatformClientV2


def init_sdk_client():
    """
    Initializes the Genesys Cloud SDK client with environment variables.
    """
    # Point the SDK at the same API host used by the raw example.
    PureCloudPlatformClientV2.configuration.host = GENESYS_ENVIRONMENT
    # Runs the Client Credentials grant and returns an authenticated ApiClient.
    api_client = PureCloudPlatformClientV2.api_client.ApiClient()
    return api_client.get_client_credentials_token(GENESYS_CLIENT_ID, GENESYS_CLIENT_SECRET)
Now, implement the paginated fetch using the SDK:
import math

import PureCloudPlatformClientV2


def fetch_conversations_sdk() -> list:
    """
    Fetches all conversation details using the SDK with manual pagination control.
    Model and method names follow the SDK's generated naming; verify them
    against your installed SDK version.
    """
    api_client = init_sdk_client()
    analytics_api = PureCloudPlatformClientV2.AnalyticsApi(api_client)

    # Construct the query object (SDK models are populated attribute by attribute)
    predicate = PureCloudPlatformClientV2.SegmentDetailQueryPredicate()
    predicate.type = "dimension"
    predicate.dimension = "mediaType"
    predicate.operator = "matches"
    predicate.value = "voice"

    segment_filter = PureCloudPlatformClientV2.SegmentDetailQueryFilter()
    segment_filter.type = "and"
    segment_filter.predicates = [predicate]

    page_size = 100  # Maximum page size accepted by this endpoint
    paging = PureCloudPlatformClientV2.PagingSpec()
    paging.page_size = page_size

    query = PureCloudPlatformClientV2.ConversationDetailQuery()
    query.interval = "2023-10-01T00:00:00.000Z/2023-10-01T23:59:59.999Z"
    query.segment_filters = [segment_filter]
    query.paging = paging

    all_conversations = []
    page_number = 1
    total_pages = 1
    while page_number <= total_pages:
        try:
            # Paging travels inside the request body
            paging.page_number = page_number
            response = analytics_api.post_analytics_conversations_details_query(query)
            # Stop when a page comes back empty
            if not response.conversations:
                break
            all_conversations.extend(response.conversations)
            # Derive the page count from total_hits; the response has no page_count
            total_pages = math.ceil((response.total_hits or 0) / page_size)
            print(f"SDK Page {page_number} of {total_pages}. Total records: {len(all_conversations)}")
            page_number += 1
        except Exception as e:
            print(f"SDK Error on page {page_number}: {e}")
            break
    return all_conversations
Why Manual Pagination in SDK?
Accumulating every page in a single list means the whole result set sits in memory at once. For large datasets (e.g., millions of conversation details), this can exhaust memory and raise a MemoryError. By manually controlling page_number, you can process each batch (e.g., write to a file or database) before fetching the next, keeping memory usage roughly constant.
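One way to keep memory flat is to yield one page at a time instead of building a list. The sketch below is independent of the SDK: fetch_page is a hypothetical callable, assumed for illustration, that takes a 1-based page number and returns that page's records together with the total record count. The fake data source exists only to make the example runnable.

```python
import math
from typing import Callable, Iterator, List, Tuple


def iter_pages(fetch_page: Callable[[int], Tuple[List[dict], int]],
               page_size: int) -> Iterator[List[dict]]:
    """Yield one page of records at a time until the last page or an empty page."""
    page_number = 1
    while True:
        records, total = fetch_page(page_number)
        if not records:
            return
        yield records
        if page_number >= math.ceil(total / page_size):
            return
        page_number += 1


# Stand-in data source for demonstration only: 5 records, 2 per page.
_DATA = [{"id": i} for i in range(5)]


def fake_fetch_page(page_number: int) -> Tuple[List[dict], int]:
    start = (page_number - 1) * 2
    return _DATA[start:start + 2], len(_DATA)


for batch in iter_pages(fake_fetch_page, page_size=2):
    # Process and discard each batch here (e.g., write it to a file).
    print(len(batch))
```

Because the generator holds only the current page, peak memory depends on the page size rather than on the total number of records.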
Complete Working Example
Below is the complete, runnable Python script that combines authentication, raw API pagination, and error handling. It uses httpx for non-blocking I/O, which is crucial when dealing with potentially slow analytics queries.
import asyncio
import math
import os

import httpx
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

GENESYS_CLIENT_ID = os.getenv("GENESYS_CLIENT_ID")
GENESYS_CLIENT_SECRET = os.getenv("GENESYS_CLIENT_SECRET")
GENESYS_ENVIRONMENT = os.getenv("GENESYS_ENVIRONMENT", "https://api.mypurecloud.com")
# Tokens are issued by the login host (login.<region>), not the API host
GENESYS_LOGIN_URL = GENESYS_ENVIRONMENT.replace("://api.", "://login.", 1)


async def get_access_token() -> str:
    url = f"{GENESYS_LOGIN_URL}/oauth/token"
    async with httpx.AsyncClient() as client:
        try:
            # Credentials go in the HTTP Basic auth header; httpx form-encodes `data`
            response = await client.post(
                url,
                data={"grant_type": "client_credentials"},
                auth=(GENESYS_CLIENT_ID, GENESYS_CLIENT_SECRET)
            )
            response.raise_for_status()
            return response.json()["access_token"]
        except httpx.HTTPStatusError as e:
            print(f"Auth failed: {e.response.status_code}")
            raise


async def fetch_all_conversations(token: str) -> list:
    url = f"{GENESYS_ENVIRONMENT}/api/v2/analytics/conversations/details/query"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }
    page_size = 100  # Maximum page size accepted by this endpoint
    query_body = {
        "interval": "2023-10-01T00:00:00.000Z/2023-10-01T23:59:59.999Z",
        "segmentFilters": [{
            "type": "and",
            "predicates": [{
                "type": "dimension",
                "dimension": "mediaType",
                "operator": "matches",
                "value": "voice"
            }]
        }],
        "paging": {"pageSize": page_size, "pageNumber": 1}
    }
    all_data = []
    page = 1
    total_pages = 1
    async with httpx.AsyncClient() as client:
        while page <= total_pages:
            try:
                # Paging is part of the request body, so update it on each pass
                query_body["paging"]["pageNumber"] = page
                resp = await client.post(url, headers=headers, json=query_body, timeout=60.0)
                if resp.status_code == 429:
                    # Honor the server's Retry-After hint when present
                    delay = int(resp.headers.get("Retry-After", 5))
                    print(f"Rate limited. Waiting {delay}s...")
                    await asyncio.sleep(delay)
                    continue
                resp.raise_for_status()
                data = resp.json()
                conversations = data.get("conversations") or []
                if not conversations:
                    break
                all_data.extend(conversations)
                # The response reports totalHits; derive the page count from it
                total_pages = math.ceil(data.get("totalHits", 0) / page_size)
                print(f"Processed page {page}/{total_pages}. Total items: {len(all_data)}")
                page += 1
            except httpx.HTTPStatusError as e:
                print(f"HTTP Error {e.response.status_code}: {e.response.text}")
                break
            except Exception as e:
                print(f"Error: {e}")
                break
    return all_data


async def main():
    try:
        token = await get_access_token()
        conversations = await fetch_all_conversations(token)
        print(f"\nTotal conversations fetched: {len(conversations)}")
        if conversations:
            print(f"Sample record: {conversations[0]}")
    except Exception as e:
        print(f"Main execution failed: {e}")


if __name__ == "__main__":
    asyncio.run(main())
Common Errors & Debugging
Error: 429 Too Many Requests
- Cause: You are exceeding the rate limit for the Analytics API. Analytics endpoints have stricter rate limits than other Genesys Cloud APIs.
- Fix: Implement exponential backoff. Do not retry immediately. Start with a 1-second delay, doubling it with each subsequent 429.
- Code Fix:
if resp.status_code == 429:
    retry_after = int(resp.headers.get("Retry-After", 5))
    await asyncio.sleep(retry_after)
    continue
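The doubling delay described in the fix can be combined with the Retry-After header. A minimal sketch, assuming the server may send Retry-After as a number of seconds; backoff_delay, its 1-second base, and its 60-second cap are hypothetical choices for illustration:

```python
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Prefer the server's hint when it is a plain number of seconds.
            return max(0.0, min(float(retry_after), cap))
        except ValueError:
            pass  # Not numeric (e.g., an HTTP date); fall back to doubling.
    return min(base * (2 ** attempt), cap)


print([backoff_delay(n) for n in range(4)])  # [1.0, 2.0, 4.0, 8.0]
print(backoff_delay(2, retry_after="7"))     # 7.0
```

In the pagination loop, this would replace the fixed sleep: keep an attempt counter, sleep for backoff_delay(attempt, resp.headers.get("Retry-After")), and reset the counter after a successful page.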
Error: 400 Bad Request - “Invalid pageSize”
- Cause: The pageSize value exceeds the maximum allowed for the specific endpoint. For /analytics/conversations/details/query, the max is 100.
- Fix: Cap your pageSize at 100.
- Code Fix:
page_size = min(requested_page_size, 100)
Error: 403 Forbidden - “Insufficient Scopes”
- Cause: The OAuth client lacks the permission to view conversation details (analytics:conversationDetail:view).
- Fix: Ensure the roles assigned to your OAuth client in Genesys Cloud Admin include that permission. Request a new token after updating the roles.
Error: Empty Entities List
- Cause: The query filters are too restrictive, or the date range contains no data.
- Fix: Verify that the query's start and end times are in the past (analytics data is not real-time; there is a latency of up to 15-30 minutes). Check the filter logic to ensure it matches existing data types.
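To rule out a range that reaches into the future or into the ingestion lag, the query window can be clamped before it is sent. A sketch, assuming a 30-minute lag (the upper figure quoted above) and timezone-aware datetimes; clamp_range is a hypothetical helper, not part of any Genesys library:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def clamp_range(start: datetime, end: datetime,
                now: Optional[datetime] = None,
                lag: timedelta = timedelta(minutes=30)) -> Tuple[str, str]:
    """Return ISO 8601 UTC strings with `end` clamped to now - lag.

    Both datetimes must be timezone-aware.
    """
    now = now or datetime.now(timezone.utc)
    end = min(end, now - lag)
    if start >= end:
        raise ValueError("start must be earlier than the clamped end time")
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"
    return (start.astimezone(timezone.utc).strftime(fmt),
            end.astimezone(timezone.utc).strftime(fmt))
```

If the helper raises ValueError, the whole requested window lies inside the lag, and an empty result is expected rather than a sign of a broken filter.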