Mastering Analytics API Pagination: pageSize, pageNumber, and pageCount
What You Will Build
- You will build a robust data extraction script that iterates through paginated results from the Genesys Cloud Analytics API to retrieve complete conversation detail records.
- This tutorial uses the Genesys Cloud CX REST API (/api/v2/analytics/conversations/details/query) and the official Python SDK.
- The code examples are written in Python 3.9+ using the genesys-cloud SDK.
Prerequisites
- OAuth Client: A Genesys Cloud OAuth client with the analytics:conversation:read scope.
- SDK Version: genesys-cloud Python SDK v1.0.0 or later.
- Runtime: Python 3.9 or higher.
- Dependencies: Install the SDK via pip: pip install genesys-cloud
- Environment Variables: You must have GENESYS_CLIENT_ID, GENESYS_CLIENT_SECRET, and GENESYS_REGION (e.g., us-east-1) set in your environment.
Authentication Setup
Genesys Cloud uses OAuth 2.0 for authentication. The Python SDK handles the token acquisition and refresh automatically when you initialize the PlatformClient. However, understanding the underlying flow is critical for debugging 401 Unauthorized errors.
The SDK uses the Client Credentials Grant flow. You initialize the client with your ID, Secret, and Region. The SDK caches the access token and refreshes it before expiration.
import os

from genesyscloud.platform.client import PlatformClient
from genesyscloud.platform.client.exceptions import ApiClientException


def get_platform_client() -> PlatformClient:
    """
    Initializes and returns a configured Genesys Cloud PlatformClient.
    """
    client_id = os.getenv("GENESYS_CLIENT_ID")
    client_secret = os.getenv("GENESYS_CLIENT_SECRET")
    region = os.getenv("GENESYS_REGION", "us-east-1")

    if not client_id or not client_secret:
        raise ValueError("GENESYS_CLIENT_ID and GENESYS_CLIENT_SECRET must be set.")

    try:
        # The SDK handles OAuth token acquisition and caching internally
        platform_client = PlatformClient(
            client_id=client_id,
            client_secret=client_secret,
            region=region
        )
        return platform_client
    except ApiClientException as e:
        print(f"Failed to initialize platform client: {e}")
        raise
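When debugging a 401 Unauthorized error, it can help to see what a Client Credentials request looks like without the SDK in the way. The sketch below only builds the request and does not send it. The function name build_token_request is hypothetical, and the default login host login.mypurecloud.com is an assumption; the correct host depends on your region.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(
    client_id: str,
    client_secret: str,
    login_host: str = "login.mypurecloud.com",  # assumption: varies by region
) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 Client Credentials token request."""
    # The client ID and secret travel as HTTP Basic credentials.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")

    # The grant type is sent as a form-encoded body.
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")

    return urllib.request.Request(
        url=f"https://{login_host}/oauth/token",
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


request = build_token_request("my-client-id", "my-client-secret")
print(request.get_method(), request.full_url)
```

If a hand-built request like this is rejected with a 401, the credentials or the login host are the likely cause rather than the pagination code.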
Implementation
Step 1: Understanding the Analytics Query Structure
Analytics endpoints in Genesys Cloud do not all paginate the same way. Depending on the endpoint, results are paged with a cursor or with page numbers. This tutorial treats conversations/details/query as page-numbered and works with three values: pageSize (the number of items per page), pageNumber (the current page index, 1-based), and pageCount (the total number of pages available for the query).
Critical Concept: pageCount is calculated based on the total number of records matching your query filter and the requested pageSize. If you request pageSize=100 and there are 500 records, pageCount will be 5.
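The relationship between record count, pageSize, and pageCount is a ceiling division. The helper below is a minimal sketch of that arithmetic; the name expected_page_count is hypothetical and is not part of the SDK.

```python
def expected_page_count(total_records: int, page_size: int) -> int:
    """Return how many pages are needed to hold total_records."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Ceiling division with integers only: a partial last page still counts.
    return -(-total_records // page_size)


print(expected_page_count(500, 100))  # 5
print(expected_page_count(501, 100))  # 6
print(expected_page_count(0, 100))    # 0
```

Note that one extra record beyond a full page adds a whole page, which is why the last page is usually shorter than pageSize.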
Required Scope: analytics:conversation:read
First, we define the query body. This body filters the data. Without a proper filter, the API may return empty results or hit rate limits if the dataset is too large.
def build_query_body(start_date: str, end_date: str) -> dict:
    """
    Constructs the request body for the analytics conversation details query.

    Args:
        start_date: ISO 8601 start date string (e.g., "2023-10-01T00:00:00.000Z")
        end_date: ISO 8601 end date string (e.g., "2023-10-02T00:00:00.000Z")

    Returns:
        A dictionary representing the JSON body for the API request.
    """
    query_body = {
        "interval": f"{start_date}/{end_date}",
        "pageSize": 100,
        "view": "default",
        "filter": {
            "type": "AND",
            "clauses": [
                {
                    "type": "EQ",
                    "field": "mediaType",
                    "values": ["voice"]
                }
            ]
        }
    }
    return query_body
Step 2: Handling the Pagination Loop
A common mistake is to treat pageCount as a value known in advance, or to ignore it and loop indefinitely. For the conversations/details/query endpoint, the response body contains pageSize, pageNumber, and pageCount, so read pageCount from each response instead of hard-coding it.
The Logic:
- Request Page 1.
- Check response.pageCount.
- If current_page < pageCount, increment current_page and repeat.
- If current_page >= pageCount, stop.
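The steps above can be tried without network access. The following sketch runs the same loop against an in-memory stand-in for the API. Page, make_fake_source, and fetch_all are hypothetical names used only for illustration; they are not SDK types.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Page:
    """Hypothetical stand-in for one API response."""
    entities: List[str]
    page_count: int


def make_fake_source(records: List[str], page_size: int) -> Callable[[int], Page]:
    """Return a function that serves 1-based pages from an in-memory list."""
    page_count = -(-len(records) // page_size)  # ceiling division

    def fetch_page(page_number: int) -> Page:
        start = (page_number - 1) * page_size
        return Page(records[start:start + page_size], page_count)

    return fetch_page


def fetch_all(fetch_page: Callable[[int], Page]) -> List[str]:
    """Request page 1, then keep going until current_page >= page_count."""
    collected: List[str] = []
    current_page = 1
    while True:
        page = fetch_page(current_page)
        collected.extend(page.entities)
        # Stop on an empty page or once the last page has been read.
        if not page.entities or current_page >= page.page_count:
            break
        current_page += 1
    return collected


records = [f"conv-{i}" for i in range(250)]
result = fetch_all(make_fake_source(records, page_size=100))
print(len(result))  # 250
```

Keeping the loop separate from the transport makes the stopping condition easy to test before pointing it at a rate-limited API.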
Error Handling:
- 429 Too Many Requests: The Analytics API is heavily rate-limited. You must implement exponential backoff.
- 400 Bad Request: Usually indicates an invalid date range or malformed filter.
import time

from genesyscloud.platform.client import PlatformClient
from genesyscloud.platform.client.exceptions import ApiClientException


def fetch_all_conversations(platform_client: PlatformClient, query_body: dict) -> list:
    """
    Iterates through all pages of conversation details.

    Args:
        platform_client: An authenticated PlatformClient instance.
        query_body: The query body dictionary.

    Returns:
        A list of all conversation detail objects.
    """
    all_conversations = []
    current_page = 1
    max_retries = 3
    base_delay = 2  # seconds
    retry_count = 0  # consecutive 429 responses for the current page

    # Read pageSize from the query body so the log output matches the request
    page_size = query_body.get("pageSize", 100)

    while True:
        # Update the query body with the current page number
        # Note: The SDK expects pageNumber to be passed in the body for this specific endpoint
        query_body["pageNumber"] = current_page
        print(f"Fetching page {current_page} (size: {page_size})...")

        try:
            # Endpoint: POST /api/v2/analytics/conversations/details/query
            response = platform_client.analytics.post_analytics_conversations_details_query(
                body=query_body
            )
            retry_count = 0  # the request succeeded, so reset the backoff

            # Append results
            if response.entities:
                all_conversations.extend(response.entities)
                print(f"Retrieved {len(response.entities)} records. Total so far: {len(all_conversations)}")
            else:
                print("No more records found.")
                break

            # response.pageCount is the total number of pages available
            if response.pageCount is not None and current_page >= response.pageCount:
                print(f"Reached last page ({current_page}/{response.pageCount}).")
                break

            current_page += 1

            # Polite delay between pages to reduce the chance of being rate limited
            time.sleep(0.5)

        except ApiClientException as e:
            status_code = getattr(e, "status", 500)
            if status_code == 429:
                if retry_count >= max_retries:
                    print(f"Still rate limited after {max_retries} retries. Giving up.")
                    raise
                wait_time = base_delay * (2 ** retry_count)  # 2s, 4s, 8s
                print(f"Rate limited (429). Retrying in {wait_time} seconds...")
                time.sleep(wait_time)
                retry_count += 1
                continue  # Retry the same page
            elif status_code == 400:
                print(f"Bad Request (400). Check your query body. Error: {e.body}")
                break  # Stop on bad request, as retrying will fail
            else:
                print(f"API Error ({status_code}): {e.body}")
                raise

    return all_conversations
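Retry handling for 429 responses can also be isolated in a small helper, which makes the delay schedule easy to verify. This is a minimal sketch: RateLimitError and call_with_backoff are hypothetical names, and RateLimitError stands in for whatever exception your client raises on a 429. The sleep function is injectable so the example runs instantly.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 error."""


def call_with_backoff(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on RateLimitError with delays of 2s, 4s, 8s, ..."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable when max_retries >= 0")


# Demo: a call that is throttled twice and then succeeds.
delays: List[float] = []
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise RateLimitError()
    return "ok"


outcome = call_with_backoff(flaky, sleep=delays.append)
print(outcome, delays)  # ok [2.0, 4.0]
```

The retry counter is scoped to a single call, so each page starts again from the base delay and a persistent 429 cannot loop forever.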
Step 3: Processing and Validating Results
Once the data is retrieved, you must validate that the pagination completed correctly. A common edge case is when pageCount returns 0 but entities are not empty (rare, but possible in cached responses) or when pageCount is 1 but no entities are returned (empty result set).
def process_results(conversations: list) -> None:
    """
    Processes the retrieved conversation data.
    """
    if not conversations:
        print("No conversations found for the specified criteria.")
        return

    print(f"\nProcessing {len(conversations)} conversations...")

    # Example aggregation: Count conversations by wrap-up code
    wrap_up_counts = {}
    for conv in conversations:
        # conv is a ConversationDetail object
        # Accessing attributes safely
        if hasattr(conv, 'wrapUpCode') and conv.wrapUpCode:
            code = conv.wrapUpCode
            wrap_up_counts[code] = wrap_up_counts.get(code, 0) + 1
        else:
            wrap_up_counts['None'] = wrap_up_counts.get('None', 0) + 1

    print("\nWrap-up Code Distribution:")
    for code, count in sorted(wrap_up_counts.items(), key=lambda x: x[1], reverse=True):
        print(f"  {code}: {count}")
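The edge cases described at the start of this step can be checked mechanically once the loop has finished. The sketch below assumes you have recorded how many pages were fetched and the last pageCount seen; check_pagination is a hypothetical helper, not part of the SDK.

```python
from typing import List


def check_pagination(
    records_collected: int,
    pages_fetched: int,
    page_count: int,
    page_size: int,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means consistent."""
    problems: List[str] = []
    if page_count == 0 and records_collected > 0:
        problems.append("pageCount is 0 but records were returned")
    if page_count > 0 and records_collected == 0:
        problems.append("pageCount is positive but no records were returned")
    if pages_fetched < page_count:
        problems.append(f"stopped early: fetched {pages_fetched} of {page_count} pages")
    if records_collected > pages_fetched * page_size:
        problems.append("more records than the fetched pages can hold")
    return problems


print(check_pagination(250, 3, 3, 100))  # []
print(check_pagination(0, 1, 1, 100))
# ['pageCount is positive but no records were returned']
```

An empty result set is not necessarily an error, so treat the returned messages as prompts for investigation rather than as failures.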
Complete Working Example
This script combines all components into a single executable module. It initializes the client, builds the query, fetches all pages with retry logic, and processes the results.
import os
import sys
import time
from datetime import datetime, timedelta, timezone

from genesyscloud.platform.client import PlatformClient
from genesyscloud.platform.client.exceptions import ApiClientException


def get_platform_client() -> PlatformClient:
    client_id = os.getenv("GENESYS_CLIENT_ID")
    client_secret = os.getenv("GENESYS_CLIENT_SECRET")
    region = os.getenv("GENESYS_REGION", "us-east-1")

    if not client_id or not client_secret:
        raise ValueError("GENESYS_CLIENT_ID and GENESYS_CLIENT_SECRET must be set.")

    try:
        return PlatformClient(
            client_id=client_id,
            client_secret=client_secret,
            region=region
        )
    except ApiClientException as e:
        print(f"Failed to initialize platform client: {e}")
        raise


def build_query_body(start_date: str, end_date: str, page_size: int = 100) -> dict:
    return {
        "interval": f"{start_date}/{end_date}",
        "pageSize": page_size,
        "view": "default",
        "filter": {
            "type": "AND",
            "clauses": [
                {
                    "type": "EQ",
                    "field": "mediaType",
                    "values": ["voice"]
                }
            ]
        }
    }


def fetch_all_conversations(platform_client: PlatformClient, query_body: dict) -> list:
    all_conversations = []
    current_page = 1
    max_retries = 3
    base_delay = 2  # seconds
    retry_count = 0  # consecutive 429 responses for the current page
    page_size = query_body.get("pageSize", 100)

    while True:
        query_body["pageNumber"] = current_page
        print(f"Fetching page {current_page} (size: {page_size})...")

        try:
            response = platform_client.analytics.post_analytics_conversations_details_query(
                body=query_body
            )
            retry_count = 0  # the request succeeded, so reset the backoff

            if response.entities:
                all_conversations.extend(response.entities)
                print(f"Retrieved {len(response.entities)} records. Total so far: {len(all_conversations)}")
            else:
                print("No more records found.")
                break

            if response.pageCount is not None and current_page >= response.pageCount:
                print(f"Reached last page ({current_page}/{response.pageCount}).")
                break

            current_page += 1
            time.sleep(0.5)

        except ApiClientException as e:
            status_code = getattr(e, "status", 500)
            if status_code == 429:
                if retry_count >= max_retries:
                    print(f"Still rate limited after {max_retries} retries. Giving up.")
                    raise
                wait_time = base_delay * (2 ** retry_count)  # 2s, 4s, 8s
                print(f"Rate limited (429). Retrying in {wait_time} seconds...")
                time.sleep(wait_time)
                retry_count += 1
                continue
            elif status_code == 400:
                print(f"Bad Request (400). Check your query body. Error: {e.body}")
                break
            else:
                print(f"API Error ({status_code}): {e.body}")
                raise

    return all_conversations


def main():
    # Define date range: Last 24 hours (timezone-aware UTC)
    end_date = datetime.now(timezone.utc)
    start_date = end_date - timedelta(days=1)

    # Format to ISO 8601 with Z suffix for UTC
    start_str = start_date.strftime("%Y-%m-%dT%H:%M:%S.000Z")
    end_str = end_date.strftime("%Y-%m-%dT%H:%M:%S.000Z")
    print(f"Querying analytics from {start_str} to {end_str}")

    platform_client = get_platform_client()
    query_body = build_query_body(start_str, end_str, page_size=100)

    try:
        conversations = fetch_all_conversations(platform_client, query_body)
        print(f"\nTotal conversations fetched: {len(conversations)}")

        # Simple processing example
        if conversations:
            print("Sample Conversation ID:", getattr(conversations[0], "conversationId", "N/A"))
    except Exception as e:
        print(f"Fatal error: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
Common Errors & Debugging
Error: 429 Too Many Requests
- What causes it: The Analytics API has strict rate limits. If you request pages too quickly, or if your pageSize is too large, causing the server to work harder, you will be throttled.
- How to fix it: Implement exponential backoff. Never retry immediately. Start with a 2-second delay and double it on each subsequent 429 for the same request.
- Code Showing the Fix:
    if status_code == 429:
        # Exponential backoff: 2s, 4s, 8s, etc.
        wait_time = base_delay * (2 ** retry_count)
        time.sleep(wait_time)
        retry_count += 1
Error: 400 Bad Request - Invalid Interval
- What causes it: The interval field in the query body is malformed or the date range exceeds the retention policy (usually 12 months for detail data).
- How to fix it: Ensure your dates are in ISO 8601 format (YYYY-MM-DDTHH:MM:SS.000Z). Ensure the start date is before the end date.
- Debugging Tip: Print the exact interval string being sent to the API to verify formatting.
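Building the interval string in one place makes both checks above (format and ordering) hard to get wrong. The following is a minimal sketch; build_interval is a hypothetical helper name, and it assumes timezone-aware datetime inputs.

```python
from datetime import datetime, timedelta, timezone


def build_interval(start: datetime, end: datetime) -> str:
    """Return 'start/end' in ISO 8601 UTC, validating the inputs first."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if start >= end:
        raise ValueError("start must be before end")

    fmt = "%Y-%m-%dT%H:%M:%S.000Z"
    # Convert to UTC so the trailing Z is accurate for any input timezone.
    start_utc = start.astimezone(timezone.utc)
    end_utc = end.astimezone(timezone.utc)
    return f"{start_utc.strftime(fmt)}/{end_utc.strftime(fmt)}"


start = datetime(2023, 10, 1, tzinfo=timezone.utc)
interval = build_interval(start, start + timedelta(days=1))
print(interval)
# 2023-10-01T00:00:00.000Z/2023-10-02T00:00:00.000Z
```

Rejecting naive datetimes avoids a silent shift when the script runs on a machine whose local timezone is not UTC.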
Error: Empty entities but pageCount > 0
- What causes it: This is rare but can happen if the data is still being indexed or if there is a transient server-side issue.
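- How to fix it: Add a check: if pageCount > current_page but entities is empty, wait 5 seconds and retry the same page. If this happens 3 times, break the loop to prevent an infinite hang.
- Code Showing the Fix:
    # empty_retries starts at 0 before the loop and is reset after any non-empty page
    if not response.entities and response.pageCount > current_page:
        empty_retries += 1
        if empty_retries >= 3:
            print("Page is still empty after 3 attempts. Stopping.")
            break
        print("Empty page but more pages expected. Retrying...")
        time.sleep(5)
        continue  # Retry same page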
Error: pageCount is None
- What causes it: Some older analytics endpoints or specific views may not return pageCount in the body. Instead, they may rely on the absence of entities to signal completion.
- How to fix it: Always check if response.pageCount is None. If it is, switch your termination condition to if not response.entities: break.
- Code Showing the Fix:
    if response.pageCount is None:
        # Fallback logic for endpoints that do not support pageCount
        if not response.entities:
            break
    else:
        if current_page >= response.pageCount:
            break