Mastering Genesys Cloud Analytics API Paging: pageSize, pageNumber, and Expansion
What You Will Build
- You will build a robust data extraction utility that iterates through paginated results from the Genesys Cloud Analytics API without hitting rate limits or missing data.
- This tutorial uses the Genesys Cloud Platform API v2, specifically the Analytics endpoints.
- The implementation is provided in Python using the official genesys-cloud-sdk and raw requests for comparative clarity.
Prerequisites
- OAuth Client Type: Client Credentials Grant.
- Required Scopes: analytics:conversation:read, analytics:report:read.
- SDK Version: genesys-cloud-sdk >= 140.0.0 (Python).
- Runtime: Python 3.9+.
- Dependencies: pip install genesys-cloud-sdk requests httpx.
Authentication Setup
The Genesys Cloud Analytics API relies heavily on server-side processing. A single query can take seconds to minutes. If your authentication token expires during a long-running query or while fetching subsequent pages, the entire operation fails. You must implement token caching and automatic refresh.
The official SDK handles this automatically if configured correctly. For raw HTTP calls, you must manage the access_token lifecycle manually.
import os

from PureCloudPlatformClientV2 import (
    Configuration,
    ApiClient,
    AnalyticsApi,
)


def get_analytics_api_instance() -> AnalyticsApi:
    """
    Configures and returns an authenticated AnalyticsApi client.
    Uses environment variables for credentials.
    """
    configuration = Configuration()
    configuration.host = "https://api.mypurecloud.com"
    # The SDK handles token acquisition and refresh automatically
    # when these environment variables are set.
    # Indexing os.environ raises KeyError early if a credential is missing.
    configuration.client_id = os.environ["GENESYS_CLIENT_ID"]
    configuration.client_secret = os.environ["GENESYS_CLIENT_SECRET"]
    api_client = ApiClient(configuration)
    return AnalyticsApi(api_client)
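For the raw HTTP path, the token lifecycle described above can be sketched as a small cache that refreshes the token shortly before it expires. This is a minimal sketch under stated assumptions, not the SDK's own mechanism: TokenCache, fetch_token, and the 60-second refresh margin are hypothetical names and values chosen for illustration, and fetch_token stands in for whatever code actually requests a token from your OAuth endpoint.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        refresh_margin_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # fetch_token is a hypothetical callable returning (token, lifetime_seconds).
        self._fetch_token = fetch_token
        self._margin = refresh_margin_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a cached token, fetching a new one only when it is missing or near expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = now + lifetime
        return self._token

    def invalidate(self) -> None:
        """Force a refresh on the next get(), for example after a 401 response."""
        self._token = None
```

Calling get() before every request keeps the paging loop from starting a request with a token that is about to expire, while invalidate() covers the case where the server rejects a token earlier than expected.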
Implementation
Step 1: Understanding the Paging Model
Genesys Cloud Analytics uses a paging model that is cursor-based on the server but is exposed to clients as offset paging: you specify a pageSize and a pageNumber, and the API returns a pageCount in the response metadata. The examples in this tutorial read pageCount from the response body.
However, there is a critical distinction between Query Execution and Result Retrieval:
- POST /api/v2/analytics/conversations/details/query: This endpoint starts a query. It returns an id and a status. It does not return data rows.
- GET /api/v2/analytics/conversations/details/query/{queryId}: This endpoint retrieves the results. This is where paging parameters apply.
Many developers make the mistake of passing pageSize to the POST endpoint. This has no effect on the result set size. You must pass paging parameters to the GET endpoint.
The pageSize Constraint
The maximum pageSize for most Analytics endpoints is 10,000. If you request more, the API returns a 400 Bad Request.
The pageCount Calculation
The API calculates pageCount based on the total number of matching records and the pageSize.
$$ \text{pageCount} = \lceil \frac{\text{totalRecords}}{\text{pageSize}} \rceil $$
You must fetch pages from 1 to pageCount. Note that Genesys Cloud uses 1-based indexing for pageNumber. Page 0 is invalid.
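The ceiling formula and the 1-based page range can be checked with a few lines of arithmetic. This is a small illustrative sketch; the function names page_count and page_numbers are hypothetical helpers, not part of the SDK.

```python
def page_count(total_records: int, page_size: int) -> int:
    """Ceiling of total_records / page_size using integer arithmetic."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if total_records < 0:
        raise ValueError("total_records must not be negative")
    return (total_records + page_size - 1) // page_size


def page_numbers(total_records: int, page_size: int) -> range:
    """1-based page numbers to request: 1 through pageCount inclusive."""
    return range(1, page_count(total_records, page_size) + 1)


print(page_count(25_001, 10_000))          # 3
print(list(page_numbers(25_001, 10_000)))  # [1, 2, 3]
print(list(page_numbers(0, 10_000)))       # []
```

Integer arithmetic avoids floating-point rounding for very large record counts, and an empty range falls out naturally when there are no matching records.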
Step 2: Constructing the Query and Handling Asynchronous Execution
Before paging, you must submit the query. The response indicates whether the query is ready. If the query is still running, you must poll. If it is ready, you can begin paging.
import time

from PureCloudPlatformClientV2 import AnalyticsApi, ConversationQuery


def submit_and_wait_for_query(api: AnalyticsApi, query_body: ConversationQuery) -> str:
    """
    Submits a query and polls until it is ready or fails.
    Returns the query ID.
    """
    # Submit the query
    response = api.post_analytics_conversations_details_query(body=query_body)
    query_id = response.id
    status = response.status
    print(f"Query submitted: {query_id}, Initial Status: {status}")

    # Polling loop
    max_wait_seconds = 300  # 5 minutes max wait
    start_time = time.time()
    while status not in ["ready", "failed", "error"]:
        if time.time() - start_time > max_wait_seconds:
            raise TimeoutError(f"Query {query_id} did not complete within {max_wait_seconds} seconds.")
        time.sleep(2)  # Wait 2 seconds between polls
        response = api.get_analytics_conversations_details_query(query_id=query_id)
        status = response.status
        print(f"Polling status: {status}")

    # Both terminal failure states must raise; otherwise an "error" status
    # would fall through and return an unusable query ID.
    if status in ("failed", "error"):
        message = getattr(response, "message", None)
        raise RuntimeError(f"Query {query_id} ended with status '{status}': {message}")
    return query_id
Step 3: Iterating Through Pages Correctly
This is the core logic. You must read the pageCount from the first page of results. Then, loop from 1 to pageCount.
Critical Edge Case: If the total record count changes between pages (e.g., new data arrives), pageCount might increase. However, for historical analytics queries, the dataset is static once the query is marked “ready”. Therefore, reading pageCount from the first page is safe.
import time

from PureCloudPlatformClientV2 import AnalyticsApi


def fetch_all_pages(api: AnalyticsApi, query_id: str, page_size: int = 1000) -> list:
    """
    Iterates through all pages of an analytics query result.

    Args:
        api: The AnalyticsApi instance.
        query_id: The ID of the completed query.
        page_size: Number of records per page. Max 10,000.

    Returns:
        A list of all conversation objects.
    """
    all_results = []

    # First, fetch page 1 to determine pageCount
    # Note: pageNumber is 1-based
    first_page_response = api.get_analytics_conversations_details_query_result(
        query_id=query_id,
        page_number=1,
        page_size=page_size
    )

    # Extract metadata
    page_count = first_page_response.page_count
    if page_count is None or page_count == 0:
        print("No pages found.")
        return all_results
    print(f"Total pages to fetch: {page_count}")

    # Add results from page 1
    if first_page_response.entities:
        all_results.extend(first_page_response.entities)

    # Fetch remaining pages
    for page_num in range(2, page_count + 1):
        try:
            page_response = api.get_analytics_conversations_details_query_result(
                query_id=query_id,
                page_number=page_num,
                page_size=page_size
            )
            if page_response.entities:
                all_results.extend(page_response.entities)
            print(f"Fetched page {page_num} of {page_count}")
            # Optional: Add a small delay to be polite to the API
            # This helps avoid 429s if you are running many queries in parallel
            time.sleep(0.1)
        except Exception as e:
            print(f"Error fetching page {page_num}: {e}")
            # Decide whether to break or continue based on business logic.
            # Breaking here returns a partial result set to the caller.
            break
    return all_results
Step 4: Handling Large Datasets and Rate Limits
If you are extracting millions of records, fetching them in Python lists will consume excessive memory. You should process records in batches or stream them to a file/database.
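One way to avoid holding every record in memory is to yield records page by page and write them out as they arrive. The sketch below assumes a hypothetical fetch_page callable that takes a 1-based page number and returns a tuple of (records, page_count); in practice it would wrap whichever SDK or HTTP call you use. The JSON Lines output format is also just an illustrative choice.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List, TextIO, Tuple

# Hypothetical signature: 1-based page number in, (records, page_count) out.
FetchPage = Callable[[int], Tuple[List[Dict[str, Any]], int]]


def iter_records(fetch_page: FetchPage) -> Iterator[Dict[str, Any]]:
    """Yield records one page at a time so only one page is held in memory."""
    records, page_count = fetch_page(1)
    yield from records
    for page_number in range(2, page_count + 1):
        records, _ = fetch_page(page_number)
        yield from records


def write_jsonl(records: Iterable[Dict[str, Any]], out: TextIO) -> int:
    """Write each record as one JSON line and return the number written."""
    written = 0
    for record in records:
        out.write(json.dumps(record) + "\n")
        written += 1
    return written
```

Because iter_records is a generator, the next page is only requested once the consumer has finished with the current one, so memory use is bounded by the page size rather than by the total result size.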
Additionally, Genesys Cloud imposes rate limits. For Analytics, the limit is typically around 100 requests per minute per user/client. If you are paging through 10,000 pages with pageSize=100, you will hit this limit quickly.
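The effect of page size on request volume is easy to quantify. The sketch below uses the approximate figure of 100 requests per minute quoted above as an assumption, not as a documented limit, and estimate_paging_cost is a hypothetical helper.

```python
import math
from typing import Tuple


def estimate_paging_cost(
    total_records: int, page_size: int, requests_per_minute: int
) -> Tuple[int, float]:
    """Return (request_count, minimum_minutes) for paging through a result set."""
    if page_size <= 0 or requests_per_minute <= 0:
        raise ValueError("page_size and requests_per_minute must be positive")
    request_count = math.ceil(total_records / page_size)
    minimum_minutes = request_count / requests_per_minute
    return request_count, minimum_minutes


# One million records at pageSize=100 versus pageSize=10,000.
print(estimate_paging_cost(1_000_000, 100, 100))     # (10000, 100.0)
print(estimate_paging_cost(1_000_000, 10_000, 100))  # (100, 1.0)
```

Under these assumptions, the same extraction drops from at least 100 minutes of rate-limited requests to about one minute.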
Strategy:
- Use the largest possible pageSize (10,000) to minimize the number of HTTP requests.
- Implement exponential backoff on 429 Too Many Requests.
import time

import httpx


def fetch_page_with_retry(client: httpx.Client, url: str, headers: dict, max_retries: int = 5) -> dict:
    """
    Fetches a page with exponential backoff for 429 errors.
    """
    for attempt in range(max_retries):
        response = client.get(url, headers=headers)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:
            # Prefer the server's Retry-After hint (in seconds);
            # fall back to exponential backoff: 1, 2, 4, 8, ... seconds.
            retry_after = int(response.headers.get("Retry-After", 2 ** attempt))
            print(f"Rate limited. Waiting {retry_after} seconds...")
            time.sleep(retry_after)
        else:
            # Any other status is treated as a hard failure.
            response.raise_for_status()
    raise Exception("Max retries exceeded for 429 error.")
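Two refinements are worth considering for the wait time. The Retry-After header may carry an HTTP-date instead of a number of seconds, which a plain int() conversion would reject, and many clients retrying on the same schedule can collide again. The sketch below shows a tolerant header parser and a "full jitter" backoff, a different technique from the fixed doubling schedule used above. Both function names, the 1-second base, and the 60-second cap are illustrative assumptions.

```python
import random
from typing import Callable, Optional


def parse_retry_after(value: Optional[str], fallback: float) -> float:
    """Return Retry-After in seconds, or fallback if it is absent or not numeric."""
    if value is None:
        return fallback
    try:
        seconds = float(value)
    except ValueError:
        # Retry-After may also be an HTTP-date; this sketch does not parse that form.
        return fallback
    return max(0.0, seconds)


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Full-jitter backoff: a random delay below min(cap, base * 2**attempt)."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

A retry loop could then sleep for parse_retry_after(response.headers.get("Retry-After"), backoff_delay(attempt)), honoring the server's hint when it is usable and falling back to a capped, randomized delay otherwise.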
Complete Working Example
This script combines authentication, query submission, polling, and paginated retrieval into a single runnable module. It extracts the last 24 hours of conversation details.
import os
import time
from datetime import datetime, timedelta, timezone

from PureCloudPlatformClientV2 import (
    Configuration,
    ApiClient,
    AnalyticsApi,
    ConversationQuery,
    ConversationQueryFilters,
)


def main():
    # 1. Setup Authentication
    configuration = Configuration()
    configuration.host = "https://api.mypurecloud.com"
    configuration.client_id = os.environ["GENESYS_CLIENT_ID"]
    configuration.client_secret = os.environ["GENESYS_CLIENT_SECRET"]
    api_client = ApiClient(configuration)
    analytics_api = AnalyticsApi(api_client)

    # 2. Define Query Parameters
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(hours=24)
    # Format both ends of the interval as UTC ISO-8601 timestamps ending in "Z"
    timestamp_format = "%Y-%m-%dT%H:%M:%SZ"
    interval = f"{start_time.strftime(timestamp_format)}/{end_time.strftime(timestamp_format)}"

    # Define filters
    query_filters = ConversationQueryFilters(
        interval=interval,
        group_by=["conversationId"],
        include_counts=True
    )

    # Define query body
    query_body = ConversationQuery(
        filters=query_filters,
        size=1000,  # This 'size' is ignored by the POST endpoint for paging purposes,
                    # but some endpoints use it for preview. For details/query, use paging on GET.
    )
    print("Submitting query...")

    # 3. Submit and Wait
    try:
        query_id = submit_and_wait_for_query(analytics_api, query_body)
        print(f"Query ready: {query_id}")
    except Exception as e:
        print(f"Failed to complete query: {e}")
        return

    # 4. Fetch All Pages
    # Use max page size to minimize API calls
    PAGE_SIZE = 10000
    try:
        all_conversations = fetch_all_pages(analytics_api, query_id, page_size=PAGE_SIZE)
        print(f"Total conversations retrieved: {len(all_conversations)}")
        # Example: Process first 5 records
        for conv in all_conversations[:5]:
            print(f"Conversation ID: {conv.conversation_id}, Type: {conv.type}")
    except Exception as e:
        print(f"Error fetching pages: {e}")


def submit_and_wait_for_query(api: AnalyticsApi, query_body: ConversationQuery) -> str:
    response = api.post_analytics_conversations_details_query(body=query_body)
    query_id = response.id
    status = response.status
    max_wait_seconds = 300
    start_time = time.time()
    while status not in ["ready", "failed", "error"]:
        if time.time() - start_time > max_wait_seconds:
            raise TimeoutError("Query timed out.")
        time.sleep(2)
        response = api.get_analytics_conversations_details_query(query_id=query_id)
        status = response.status
    # Raise on either terminal failure state, including one reported by the initial POST
    if status in ("failed", "error"):
        raise RuntimeError(f"Query {status}: {getattr(response, 'message', None)}")
    return query_id


def fetch_all_pages(api: AnalyticsApi, query_id: str, page_size: int = 1000) -> list:
    all_results = []
    # Fetch first page to get pageCount
    first_page = api.get_analytics_conversations_details_query_result(
        query_id=query_id,
        page_number=1,
        page_size=page_size
    )
    page_count = first_page.page_count
    if not page_count:
        return all_results
    if first_page.entities:
        all_results.extend(first_page.entities)
    for page_num in range(2, page_count + 1):
        page_response = api.get_analytics_conversations_details_query_result(
            query_id=query_id,
            page_number=page_num,
            page_size=page_size
        )
        if page_response.entities:
            all_results.extend(page_response.entities)
        print(f"Processed page {page_num}/{page_count}")
    return all_results


if __name__ == "__main__":
    main()
Common Errors & Debugging
Error: 400 Bad Request - “Page size exceeds maximum”
- Cause: You set pageSize greater than 10,000.
- Fix: Cap pageSize at 10,000. If you need more data, increase the number of pages, not the page size.
Error: 404 Not Found - “Query not found”
- Cause: The query ID is invalid, or the query has expired. Analytics query results are temporary. They typically expire after 24 hours.
- Fix: Ensure you are using a query ID from a recent submission. Do not store query IDs in long-term storage. Re-submit the query if the ID is expired.
Error: 401 Unauthorized - “Token expired”
- Cause: The OAuth token expired during a long polling interval or paging loop.
- Fix: The genesys-cloud-sdk handles this automatically if you use the ApiClient correctly. If using raw requests, check the token's expiry before every request and refresh it when it is about to expire or after a 401 response.
Error: 504 Gateway Timeout
- Cause: The query is taking too long to execute, and the polling request timed out.
- Fix: Increase the timeout on your HTTP client. For the SDK, you can configure the timeout in Configuration. For raw requests, increase the timeout parameter in requests.get().
Error: Missing Data in Final Page
- Cause: You stopped paging before pageCount.
- Fix: Ensure your loop runs from 1 to pageCount inclusive. Remember that pageCount is the integer ceiling of total / pageSize.
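A cheap sanity check after paging can catch a loop that stopped early. If every page except the last is full, the number of retrieved records must fall between (pageCount - 1) * pageSize + 1 and pageCount * pageSize. The helper below is a hypothetical sketch built only on that ceiling relationship; it assumes the result set does not change while you page through it.

```python
def is_plausible_total(record_count: int, page_count: int, page_size: int) -> bool:
    """Check that record_count is consistent with having read all page_count pages."""
    if page_count == 0:
        return record_count == 0
    lowest = (page_count - 1) * page_size + 1  # last page holds at least one record
    highest = page_count * page_size           # last page is completely full
    return lowest <= record_count <= highest


print(is_plausible_total(25_001, 3, 10_000))  # True
print(is_plausible_total(20_000, 3, 10_000))  # False: only two pages were read
```

A False result does not say which page is missing, but it is a quick signal that the loop bounds or error handling should be reviewed.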