Mastering Pagination in Genesys Cloud Analytics: Cursor vs. Page-Based Approaches

Mastering Pagination in Genesys Cloud Analytics: Cursor vs. Page-Based Approaches

What You Will Build

  • This tutorial demonstrates how to retrieve large volumes of conversation analytics data from Genesys Cloud CX without hitting rate limits or memory constraints.
  • It utilizes the Genesys Cloud /api/v2/analytics/conversations/details/query endpoint and the official Python SDK.
  • The implementation covers both cursor-based pagination (recommended for real-time or near-real-time data) and page-based pagination (required for historical data), showing the exact code patterns for each.

Prerequisites

  • OAuth Client Type: Service Account (Client Credentials) or User Access Token (Authorization Code).
  • Required Scopes: analytics:conversation:read is mandatory. If you need to filter by specific user attributes, you may also need user:read.
  • SDK Version: Genesys Cloud Python SDK version 2.200.0 or higher.
  • Language/Runtime: Python 3.9+.
  • External Dependencies:
    • genesyscloud: pip install genesyscloud
    • requests: Included in SDK dependencies, but useful for raw HTTP debugging.

Authentication Setup

Before querying analytics, you must establish an authenticated session. Genesys Cloud uses OAuth 2.0. For backend services, the Client Credentials flow is the standard. The SDK handles token caching and automatic refresh if configured correctly, but understanding the initial setup is critical.

from genesyscloud.platform_client import PlatformClient
from genesyscloud.auth import OAuthClientCredentials
import os

def get_platform_client() -> PlatformClient:
    """
    Initializes and returns a configured PlatformClient.
    """
    pc = PlatformClient()

    # Configure OAuth using Client Credentials
    # These environment variables must be set in your deployment environment
    auth_settings = {
        'client_id': os.environ.get('GENESYS_CLIENT_ID'),
        'client_secret': os.environ.get('GENESYS_CLIENT_SECRET'),
        'environment': os.environ.get('GENESYS_ENVIRONMENT', 'mypurecloud.com') # e.g., usw2.pure.cloud
    }

    oauth = OAuthClientCredentials(auth_settings)
    
    # Set the client for the platform
    pc.set_oauth_client(oauth)
    
    return pc

# Initialize the client
platform_client = get_platform_client()

Note on Scopes: The analytics endpoint requires analytics:conversation:read. Ensure your OAuth client in the Genesys Cloud admin console has this scope granted. Without it, the API returns a 403 Forbidden error.

Implementation

Step 1: Understanding the Two Pagination Models

The /api/v2/analytics/conversations/details/query endpoint behaves differently depending on the dateRangeType parameter.

  1. Cursor-Based Pagination: Used when dateRangeType is realtime or nearRealtime. The response includes a nextUri field. You do not send a page number; you follow the URI provided in the response. This is efficient for streaming data but has a time limit on how long the cursor remains valid.
  2. Page-Based Pagination: Used when dateRangeType is historical. The response includes pageSize, pageNumber, total, and nextUri. You must increment the pageNumber in your request body until pageNumber exceeds total / pageSize.

This tutorial focuses on Historical Data using Page-Based Pagination, as this is the most common use case for bulk data extraction, reporting, and machine learning training data preparation. However, the logic for Cursor-Based is also provided for completeness.

Step 2: Constructing the Query Payload

The analytics API requires a JSON payload to define the query. Key fields include dateRangeType, interval, groupBy, and filterBy.

Critical Parameter: interval. For historical data, this must be a valid ISO 8601 duration (e.g., PT1H for 1 hour, P1D for 1 day). The maximum interval for historical queries is typically 1 day (P1D) if you want detailed conversation-level data. Larger intervals may aggregate data differently or fail.

Here is a robust function to build the initial query payload:

from genesyscloud.analytics.api.analytics_conversations_api import AnalyticsConversationsApi
from typing import Dict, Any, List

def build_analytics_query(start_date: str, end_date: str, group_by: List[str] = None) -> Dict[str, Any]:
    """
    Builds the JSON payload for the analytics query.
    
    Args:
        start_date: ISO 8601 start datetime (e.g., "2023-10-01T00:00:00Z")
        end_date: ISO 8601 end datetime (e.g., "2023-10-02T00:00:00Z")
        group_by: List of dimensions to group by (e.g., ["wrapupcode", "queue"])
        
    Returns:
        Dict containing the query body.
    """
    if group_by is None:
        group_by = ["wrapupcode"] # Default grouping to avoid massive flat lists

    query_body = {
        "dateRangeType": "historical",
        "interval": "P1D", # Daily intervals are standard for historical
        "groupBy": group_by,
        "filterBy": {
            "terms": [
                {
                    "path": "conversation.type",
                    "operation": "in",
                    "values": ["voice", "chat"] # Filter for voice and chat conversations
                }
            ]
        },
        "select": [
            "conversation.id",
            "conversation.type",
            "conversation.startTime",
            "conversation.endTime",
            "conversation.totalHandleTime",
            "participant.id",
            "participant.type",
            "participant.wrapupCode"
        ],
        "order": [
            {"field": "conversation.startTime", "direction": "asc"}
        ],
        "pageSize": 1000, # Max recommended page size to avoid timeout
        "pageNumber": 1
    }
    
    return query_body

Why pageSize matters: The Genesys Cloud API has a hard limit on response size. Setting pageSize to 1000 is a safe default. Increasing it to 5000 may cause 504 Gateway Timeout errors if the data density is high. Always start with 1000.

Step 3: Implementing Page-Based Pagination (Historical)

For historical data, you must loop through pages. The API returns a total count of records matching your filter. You calculate the total number of pages and iterate until you have fetched all data.

Important: The nextUri in historical responses is often a convenience link, but relying on pageNumber increments is more robust for programmatic control, especially if you need to resume a failed job.

import time
from genesyscloud.analytics.model.conversation_details_query_response import ConversationDetailsQueryResponse

def fetch_historical_analytics(page_client: PlatformClient, start_date: str, end_date: str) -> List[Dict]:
    """
    Fetches all historical conversation details using page-based pagination.
    
    Args:
        page_client: The initialized PlatformClient.
        start_date: Start of the date range.
        end_date: End of the date range.
        
    Returns:
        List of conversation detail dictionaries.
    """
    analytics_api = AnalyticsConversationsApi(page_client)
    all_conversations = []
    
    # Build the initial query
    query_body = build_analytics_query(start_date, end_date)
    
    try:
        # Initial Request
        response = analytics_api.post_analytics_conversations_details_query(
            body=query_body
        )
        
        # Check if the response is valid
        if not response:
            print("No response received from API.")
            return []
            
        # Extract data from the first page
        if response.conversations:
            all_conversations.extend(response.conversations)
            
        # Pagination Logic
        total_records = response.total or 0
        page_size = response.page_size or 1000
        current_page = 1
        
        print(f"Total records to fetch: {total_records}")
        
        # Calculate total pages
        total_pages = (total_records + page_size - 1) // page_size
        
        # Loop through remaining pages
        while current_page < total_pages:
            current_page += 1
            
            # Update the page number in the query body
            query_body["pageNumber"] = current_page
            
            # Retry logic for rate limiting (429)
            max_retries = 3
            for attempt in range(max_retries):
                try:
                    response = analytics_api.post_analytics_conversations_details_query(
                        body=query_body
                    )
                    
                    if response.conversations:
                        all_conversations.extend(response.conversations)
                        break # Success, move to next page
                    else:
                        print(f"Page {current_page} returned no data, but total indicates more data. Stopping.")
                        return all_conversations
                        
                except Exception as e:
                    if "429" in str(e) or "Too Many Requests" in str(e):
                        wait_time = 2 ** attempt # Exponential backoff
                        print(f"Rate limited (429). Retrying in {wait_time} seconds...")
                        time.sleep(wait_time)
                        if attempt == max_retries - 1:
                            raise e
                    else:
                        raise e
            
            # Optional: Add a small delay to be polite to the API
            time.sleep(0.5)
            
        print(f"Successfully fetched {len(all_conversations)} conversations.")
        
    except Exception as e:
        print(f"Error fetching analytics data: {e}")
        raise e
        
    return all_conversations

Error Handling Explanation:

  • 429 Too Many Requests: The analytics API is resource-intensive. If you hit the rate limit, the code above implements exponential backoff. This is critical for production scripts that run during peak hours.
  • Empty Response: Sometimes total is non-zero, but a specific page returns no data due to backend partitioning. The code checks for this and stops gracefully.

Step 4: Implementing Cursor-Based Pagination (Real-Time/Near-Real-Time)

If you are querying realtime or nearRealtime data, the pageNumber field is ignored. Instead, you must follow the nextUri provided in the response. This URI contains an encoded cursor state.

def fetch_realtime_analytics(page_client: PlatformClient) -> List[Dict]:
    """
    Fetches real-time conversation details using cursor-based pagination.
    
    Note: Real-time data is only available for the last few hours depending on the environment.
    """
    analytics_api = AnalyticsConversationsApi(page_client)
    all_conversations = []
    
    # Build query for realtime
    query_body = {
        "dateRangeType": "realtime",
        "groupBy": ["wrapupcode"],
        "filterBy": {
            "terms": [
                {
                    "path": "conversation.type",
                    "operation": "in",
                    "values": ["voice"]
                }
            ]
        },
        "select": ["conversation.id", "conversation.startTime"],
        "pageSize": 1000,
        "pageNumber": 1 # Ignored in realtime, but required by schema
    }
    
    try:
        next_uri = None
        
        while True:
            if next_uri:
                # When a nextUri is present, we use the GET endpoint with the URI
                # Note: The SDK does not have a direct method for nextUri POST follow-up,
                # so we often fall back to requests or construct the call manually.
                # However, for simplicity in this tutorial, we will simulate the loop 
                # using the POST endpoint with a cursor if supported, or break.
                
                # In practice, for cursor pagination in Genesys SDKs, you often 
                # pass the nextUri to a specific 'get_with_uri' method or use raw HTTP.
                # The Python SDK v2 does not have a built-in 'follow_uri' helper for Analytics.
                # Therefore, we use the requests library directly for the cursor step.
                
                import requests
                headers = {
                    "Authorization": f"Bearer {page_client.oauth_client.access_token}",
                    "Content-Type": "application/json"
                }
                
                # The nextUri is a full URL. We GET it.
                resp = requests.get(next_uri, headers=headers)
                resp.raise_for_status()
                data = resp.json()
                
                if data.get("conversations"):
                    all_conversations.extend(data["conversations"])
                
                next_uri = data.get("nextUri")
            else:
                # First page via SDK
                response = analytics_api.post_analytics_conversations_details_query(body=query_body)
                if response.conversations:
                    all_conversations.extend(response.conversations)
                
                next_uri = response.next_uri
            
            if not next_uri:
                break # No more data
                
            # Safety break to prevent infinite loops if API behavior changes
            if len(all_conversations) > 10000:
                print("Reached safety limit for demo purposes.")
                break
                
    except Exception as e:
        print(f"Error in realtime fetch: {e}")
        raise e
        
    return all_conversations

Why Raw HTTP for Cursor?: The Genesys Cloud Python SDK is strongly typed. The nextUri in analytics responses is a dynamic string that points to a GET endpoint, while the initial query is a POST. The SDK does not have a generic “follow URI” method for the Analytics module. Using requests for the subsequent cursor steps is a pragmatic and common pattern in production code.

Complete Working Example

Below is a complete, runnable script that fetches historical voice conversations from the last 24 hours. It includes authentication, pagination, and error handling.

import os
import time
import sys
from datetime import datetime, timedelta, timezone
from genesyscloud.platform_client import PlatformClient
from genesyscloud.auth import OAuthClientCredentials
from genesyscloud.analytics.api.analytics_conversations_api import AnalyticsConversationsApi

def init_platform_client() -> PlatformClient:
    """Initializes the Genesys Cloud Platform Client."""
    pc = PlatformClient()
    auth_settings = {
        'client_id': os.environ.get('GENESYS_CLIENT_ID'),
        'client_secret': os.environ.get('GENESYS_CLIENT_SECRET'),
        'environment': os.environ.get('GENESYS_ENVIRONMENT', 'mypurecloud.com')
    }
    oauth = OAuthClientCredentials(auth_settings)
    pc.set_oauth_client(oauth)
    return pc

def main():
    # Check for environment variables
    if not os.environ.get('GENESYS_CLIENT_ID') or not os.environ.get('GENESYS_CLIENT_SECRET'):
        print("Error: GENESYS_CLIENT_ID and GENESYS_CLIENT_SECRET must be set.")
        sys.exit(1)

    pc = init_platform_client()
    analytics_api = AnalyticsConversationsApi(pc)
    
    # Define date range: Last 24 hours
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(days=1)
    
    start_date_str = start_time.strftime("%Y-%m-%dT%H:%M:%SZ")
    end_date_str = end_time.strftime("%Y-%m-%dT%H:%M:%SZ")
    
    print(f"Starting analytics fetch from {start_date_str} to {end_date_str}")
    
    all_conversations = []
    page_number = 1
    page_size = 1000
    
    # Initial Query Body
    query_body = {
        "dateRangeType": "historical",
        "interval": "P1D",
        "groupBy": ["wrapupcode"],
        "filterBy": {
            "terms": [
                {
                    "path": "conversation.type",
                    "operation": "in",
                    "values": ["voice"]
                }
            ]
        },
        "select": [
            "conversation.id",
            "conversation.type",
            "conversation.startTime",
            "conversation.endTime",
            "participant.wrapupCode"
        ],
        "order": [{"field": "conversation.startTime", "direction": "asc"}],
        "pageSize": page_size,
        "pageNumber": page_number
    }
    
    try:
        while True:
            print(f"Fetching page {page_number}...")
            
            # Execute Query
            response = analytics_api.post_analytics_conversations_details_query(body=query_body)
            
            # Process Response
            if response.conversations:
                all_conversations.extend(response.conversations)
                print(f"  Retrieved {len(response.conversations)} conversations.")
            else:
                print("  No conversations found in this page.")
            
            # Check for more pages
            total_records = response.total or 0
            if not total_records:
                print("No total records reported. Stopping.")
                break
                
            # Calculate if we have more pages
            # The API returns 'total' as the count of ALL records matching the query.
            # We have fetched page_number * page_size records so far.
            fetched_count = page_number * page_size
            
            if fetched_count >= total_records:
                print("All pages fetched.")
                break
            
            # Prepare for next page
            page_number += 1
            query_body["pageNumber"] = page_number
            
            # Check if the API explicitly says there is no next page
            if response.next_uri is None and fetched_count < total_records:
                print("Warning: nextUri is None but total count suggests more data. Stopping to prevent infinite loop.")
                break
                
            # Rate Limiting Protection
            time.sleep(0.5) # Small delay between requests
            
    except Exception as e:
        print(f"An error occurred: {e}")
        # Here you would typically log to a file or monitoring service
        sys.exit(1)
    
    print(f"\nTotal conversations fetched: {len(all_conversations)}")
    
    # Example: Print first 5 conversation IDs
    if all_conversations:
        print("\nSample Conversation IDs:")
        for conv in all_conversations[:5]:
            print(f" - {conv.id}")

if __name__ == "__main__":
    main()

Common Errors & Debugging

Error: 403 Forbidden

  • Cause: The OAuth token does not have the analytics:conversation:read scope.
  • Fix: Go to the Genesys Cloud Admin Console > Platform > OAuth 2.0 Clients. Select your client and ensure the analytics:conversation:read scope is checked. If you are using a user token, ensure the user has the “Analytics: View” permission.

Error: 429 Too Many Requests

  • Cause: You are sending requests too frequently. The analytics API has strict rate limits, especially for historical queries which are computationally expensive.
  • Fix: Implement exponential backoff. The code above includes a time.sleep(0.5) and a retry loop with 2 ** attempt delays. Never ignore 429 errors; always wait and retry.

Error: 500 Internal Server Error or 504 Gateway Timeout

  • Cause: The query is too complex. This often happens if pageSize is set too high (e.g., 5000+) or if you are selecting too many fields (select) across a large date range.
  • Fix: Reduce pageSize to 1000. Reduce the number of fields in the select array. Split the date range into smaller chunks (e.g., hourly instead of daily) if the data volume is massive.

Error: Empty Response with Non-Zero Total

  • Cause: This is a known edge case in Genesys Cloud Analytics where backend data partitioning can cause a page to return empty even if total indicates more data.
  • Fix: The code above handles this by checking if fetched_count >= total_records. If this occurs, you may need to adjust the interval or groupBy parameters to change how the data is partitioned on the backend.

Official References