Mastering Genesys Cloud Analytics API Paging: pageSize, pageNumber, and Expansion

Mastering Genesys Cloud Analytics API Paging: pageSize, pageNumber, and Expansion

What You Will Build

  • You will build a robust data extraction utility that iterates through paginated results from the Genesys Cloud Analytics API without hitting rate limits or missing data.
  • This tutorial uses the Genesys Cloud Platform API v2, specifically the Analytics endpoints.
  • The implementation is provided in Python using the official genesys-cloud-sdk and raw requests for comparative clarity.

Prerequisites

  • OAuth Client Type: Client Credentials Grant.
  • Required Scopes: analytics:conversation:read, analytics:report:read.
  • SDK Version: genesys-cloud-sdk >= 140.0.0 (Python).
  • Runtime: Python 3.9+.
  • Dependencies: pip install genesys-cloud-sdk requests httpx.

Authentication Setup

The Genesys Cloud Analytics API relies heavily on server-side processing. A single query can take seconds to minutes. If your authentication token expires during a long-running query or while fetching subsequent pages, the entire operation fails. You must implement token caching and automatic refresh.

The official SDK handles this automatically if configured correctly. For raw HTTP calls, you must manage the access_token lifecycle manually.

import os
from purecloudplatformclientv2 import (
    Configuration,
    ApiClient,
    AnalyticsApi,
    ConversationQuery
)

def get_analytics_api_instance() -> AnalyticsApi:
    """
    Configures and returns an authenticated AnalyticsApi client.
    Uses environment variables for credentials.
    """
    configuration = Configuration()
    configuration.host = "https://api.mypurecloud.com"
    
    # The SDK handles token acquisition and refresh automatically
    # when these environment variables are set.
    configuration.client_id = os.environ.get("GENESYS_CLIENT_ID")
    configuration.client_secret = os.environ.get("GENESYS_CLIENT_SECRET")
    
    api_client = ApiClient(configuration)
    return AnalyticsApi(api_client)

Implementation

Step 1: Understanding the Paging Model

Genesys Cloud Analytics uses a cursor-based paging model disguised as offset paging. You specify a pageSize and a pageNumber. The API returns a pageCount in the response header or body metadata.

However, there is a critical distinction between Query Execution and Result Retrieval:

  1. POST /api/v2/analytics/conversations/details/query: This endpoint starts a query. It returns an id. This endpoint does not return data rows. It returns a status.
  2. GET /api/v2/analytics/conversations/details/query/{queryId}: This endpoint retrieves the results. This is where paging parameters apply.

Many developers make the mistake of passing pageSize to the POST endpoint. This has no effect on the result set size. You must pass paging parameters to the GET endpoint.

The pageSize Constraint

The maximum pageSize for most Analytics endpoints is 10,000. If you request more, the API returns a 400 Bad Request.

The pageCount Calculation

The API calculates pageCount based on the total number of matching records and the pageSize.
$$ \text{pageCount} = \lceil \frac{\text{totalRecords}}{\text{pageSize}} \rceil $$

You must fetch pages from 1 to pageCount. Note that Genesys Cloud uses 1-based indexing for pageNumber. Page 0 is invalid.

Step 2: Constructing the Query and Handling Asynchronous Execution

Before paging, you must submit the query. The response indicates whether the query is ready. If the query is still running, you must poll. If it is ready, you can begin paging.

from purecloudplatformclientv2 import ConversationQuery, ConversationQueryFilters
from datetime import datetime, timedelta
import time

def submit_and_wait_for_query(api: AnalyticsApi, query_body: ConversationQuery) -> str:
    """
    Submits a query and polls until it is ready or fails.
    Returns the query ID.
    """
    # Submit the query
    response = api.post_analytics_conversations_details_query(body=query_body)
    
    query_id = response.id
    status = response.status
    
    print(f"Query submitted: {query_id}, Initial Status: {status}")
    
    # Polling loop
    max_wait_seconds = 300  # 5 minutes max wait
    start_time = time.time()
    
    while status not in ["ready", "failed", "error"]:
        if time.time() - start_time > max_wait_seconds:
            raise TimeoutError(f"Query {query_id} did not complete within {max_wait_seconds} seconds.")
        
        time.sleep(2)  # Wait 2 seconds between polls
        
        poll_response = api.get_analytics_conversations_details_query(query_id=query_id)
        status = poll_response.status
        print(f"Polling status: {status}")
        
        if status == "failed":
            raise Exception(f"Query {query_id} failed: {poll_response.message}")
            
    return query_id

Step 3: Iterating Through Pages Correctly

This is the core logic. You must read the pageCount from the first page of results. Then, loop from 1 to pageCount.

Critical Edge Case: If the total record count changes between pages (e.g., new data arrives), pageCount might increase. However, for historical analytics queries, the dataset is static once the query is marked “ready”. Therefore, reading pageCount from the first page is safe.

from purecloudplatformclientv2 import ConversationQueryResult

def fetch_all_pages(api: AnalyticsApi, query_id: str, page_size: int = 1000) -> list:
    """
    Iterates through all pages of an analytics query result.
    
    Args:
        api: The AnalyticsApi instance.
        query_id: The ID of the completed query.
        page_size: Number of records per page. Max 10,000.
        
    Returns:
        A list of all conversation objects.
    """
    all_results = []
    
    # First, fetch page 1 to determine pageCount
    # Note: pageNumber is 1-based
    first_page_response = api.get_analytics_conversations_details_query_result(
        query_id=query_id,
        page_number=1,
        page_size=page_size
    )
    
    # Extract metadata
    page_count = first_page_response.page_count
    
    if page_count is None or page_count == 0:
        print("No pages found.")
        return all_results
        
    print(f"Total pages to fetch: {page_count}")
    
    # Add results from page 1
    if first_page_response.entities:
        all_results.extend(first_page_response.entities)
        
    # Fetch remaining pages
    for page_num in range(2, page_count + 1):
        try:
            page_response = api.get_analytics_conversations_details_query_result(
                query_id=query_id,
                page_number=page_num,
                page_size=page_size
            )
            
            if page_response.entities:
                all_results.extend(page_response.entities)
                
            print(f"Fetched page {page_num} of {page_count}")
            
            # Optional: Add a small delay to be polite to the API
            # This helps avoid 429s if you are running many queries in parallel
            time.sleep(0.1)
            
        except Exception as e:
            print(f"Error fetching page {page_num}: {str(e)}")
            # Decide whether to break or continue based on business logic
            break
            
    return all_results

Step 4: Handling Large Datasets and Rate Limits

If you are extracting millions of records, fetching them in Python lists will consume excessive memory. You should process records in batches or stream them to a file/database.

Additionally, Genesys Cloud imposes rate limits. For Analytics, the limit is typically around 100 requests per minute per user/client. If you are paging through 10,000 pages with pageSize=100, you will hit this limit quickly.

Strategy:

  1. Use the largest possible pageSize (10,000) to minimize the number of HTTP requests.
  2. Implement exponential backoff on 429 Too Many Requests.
import httpx

def fetch_page_with_retry(client: httpx.Client, url: str, headers: dict, max_retries: int = 5) -> dict:
    """
    Fetches a page with exponential backoff for 429 errors.
    """
    for attempt in range(max_retries):
        response = client.get(url, headers=headers)
        
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 2 ** attempt))
            print(f"Rate limited. Waiting {retry_after} seconds...")
            time.sleep(retry_after)
        else:
            response.raise_for_status()
            
    raise Exception("Max retries exceeded for 429 error.")

Complete Working Example

This script combines authentication, query submission, polling, and paginated retrieval into a single runnable module. It extracts the last 24 hours of conversation details.

import os
import time
from datetime import datetime, timedelta
from purecloudplatformclientv2 import (
    Configuration,
    ApiClient,
    AnalyticsApi,
    ConversationQuery,
    ConversationQueryFilters,
    ConversationQuerySorting
)

def main():
    # 1. Setup Authentication
    configuration = Configuration()
    configuration.host = "https://api.mypurecloud.com"
    configuration.client_id = os.environ.get("GENESYS_CLIENT_ID")
    configuration.client_secret = os.environ.get("GENESYS_CLIENT_SECRET")
    
    api_client = ApiClient(configuration)
    analytics_api = AnalyticsApi(api_client)
    
    # 2. Define Query Parameters
    end_time = datetime.utcnow()
    start_time = end_time - timedelta(hours=24)
    
    # Define filters
    query_filters = ConversationQueryFilters(
        interval=f"{start_time.isoformat()}Z/{end_time.isoformat()}Z",
        group_by=["conversationId"],
        include_counts=True
    )
    
    # Define query body
    query_body = ConversationQuery(
        filters=query_filters,
        size=1000, # This 'size' is ignored by the POST endpoint for paging purposes, 
                   # but some endpoints use it for preview. For details/query, use paging on GET.
    )
    
    print("Submitting query...")
    
    # 3. Submit and Wait
    try:
        query_id = submit_and_wait_for_query(analytics_api, query_body)
        print(f"Query ready: {query_id}")
    except Exception as e:
        print(f"Failed to complete query: {e}")
        return

    # 4. Fetch All Pages
    # Use max page size to minimize API calls
    PAGE_SIZE = 10000 
    
    try:
        all_conversations = fetch_all_pages(analytics_api, query_id, page_size=PAGE_SIZE)
        print(f"Total conversations retrieved: {len(all_conversations)}")
        
        # Example: Process first 5 records
        for conv in all_conversations[:5]:
            print(f"Conversation ID: {conv.conversation_id}, Type: {conv.type}")
            
    except Exception as e:
        print(f"Error fetching pages: {e}")

def submit_and_wait_for_query(api: AnalyticsApi, query_body: ConversationQuery) -> str:
    response = api.post_analytics_conversations_details_query(body=query_body)
    query_id = response.id
    status = response.status
    
    max_wait_seconds = 300
    start_time = time.time()
    
    while status not in ["ready", "failed", "error"]:
        if time.time() - start_time > max_wait_seconds:
            raise TimeoutError("Query timed out.")
        time.sleep(2)
        poll_response = api.get_analytics_conversations_details_query(query_id=query_id)
        status = poll_response.status
        
        if status == "failed":
            raise Exception(f"Query failed: {poll_response.message}")
            
    return query_id

def fetch_all_pages(api: AnalyticsApi, query_id: str, page_size: int = 1000) -> list:
    all_results = []
    
    # Fetch first page to get pageCount
    first_page = api.get_analytics_conversations_details_query_result(
        query_id=query_id,
        page_number=1,
        page_size=page_size
    )
    
    page_count = first_page.page_count
    if not page_count:
        return all_results
        
    if first_page.entities:
        all_results.extend(first_page.entities)
        
    for page_num in range(2, page_count + 1):
        page_response = api.get_analytics_conversations_details_query_result(
            query_id=query_id,
            page_number=page_num,
            page_size=page_size
        )
        if page_response.entities:
            all_results.extend(page_response.entities)
        print(f"Processed page {page_num}/{page_count}")
        
    return all_results

if __name__ == "__main__":
    main()

Common Errors & Debugging

Error: 400 Bad Request - “Page size exceeds maximum”

  • Cause: You set pageSize greater than 10,000.
  • Fix: Cap pageSize at 10,000. If you need more data, increase the number of pages, not the page size.

Error: 404 Not Found - “Query not found”

  • Cause: The query ID is invalid, or the query has expired. Analytics query results are temporary. They typically expire after 24 hours.
  • Fix: Ensure you are using a query ID from a recent submission. Do not store query IDs in long-term storage. Re-submit the query if the ID is expired.

Error: 401 Unauthorized - “Token expired”

  • Cause: The OAuth token expired during a long polling interval or paging loop.
  • Fix: The genesys-cloud-sdk handles this automatically if you use the ApiClient correctly. If using raw requests, ensure you refresh the token before every request or after a 401 response.

Error: 504 Gateway Timeout

  • Cause: The query is taking too long to execute, and the polling request timed out.
  • Fix: Increase the timeout on your HTTP client. For the SDK, you can configure the timeout in Configuration. For raw requests, increase the timeout parameter in requests.get().

Error: Missing Data in Final Page

  • Cause: You stopped paging before pageCount.
  • Fix: Ensure your loop runs from 1 to pageCount inclusive. Remember that pageCount is an integer ceiling of total / pageSize.

Official References