Using the Analytics API paging object correctly — pageSize, pageNumber, and pageCount

What You Will Build

  • A Python script that queries the Genesys Cloud Analytics API to retrieve conversation details in batches without hitting rate limits or truncating data.
  • Logic that reads the total number of pages from the pageCount response field and iterates until all data is fetched.
  • A complete example using the requests library to handle pagination, error retries, and JSON payload processing.

Prerequisites

  • OAuth Client: A Genesys Cloud OAuth client with the scopes analytics:conversation:read and analytics:report:read. The examples below use the Client Credentials grant.
  • SDK/Library: Python 3.8+ with requests (v2.28+). pydantic is optional, for type validation.
  • Environment: Access to a Genesys Cloud organization with recorded conversation data.
  • Dependencies:
    pip install requests python-dateutil
    

Authentication Setup

Before interacting with the Analytics API, you must obtain a valid OAuth 2.0 access token. The Analytics endpoints are strict about token validity and expiration. If your token expires mid-pagination loop, the subsequent requests will fail with 401 Unauthorized.

The following function handles the token acquisition using the Client Credentials flow, which is ideal for server-to-server integrations.

import requests
import time
from datetime import datetime, timedelta

# Configuration
GENESYS_BASE_URL = "https://api.mypurecloud.com"
CLIENT_ID = "your_client_id"
CLIENT_SECRET = "your_client_secret"

def get_access_token() -> dict:
    """
    Retrieves an OAuth 2.0 access token from Genesys Cloud.
    
    Returns:
        dict: Token response containing 'access_token' and 'expires_in'.
    """
    token_url = f"{GENESYS_BASE_URL}/oauth/token"
    payload = {
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET
    }
    
    headers = {
        "Content-Type": "application/x-www-form-urlencoded"
    }
    
    response = requests.post(token_url, data=payload, headers=headers)
    
    if response.status_code != 200:
        raise Exception(f"Failed to get token: {response.status_code} - {response.text}")
        
    return response.json()

# Example usage
token_data = get_access_token()
access_token = token_data['access_token']
expires_in = token_data['expires_in']
token_expiry = datetime.now() + timedelta(seconds=expires_in)

Note on Token Refresh: In a production pagination loop, compare datetime.now() with token_expiry before each API call. If the token has expired or is about to (for example, within the next 60 seconds), request a new one before continuing, so the loop does not fail with a 401 partway through the data retrieval.
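
The refresh check described in the note can be sketched as a small cache that sits in front of get_access_token(). This is a minimal sketch, not part of the original script: the TokenCache class, the skew_seconds margin, and the injectable clock are all hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Dict, Optional


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Dict],
        skew_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch_token = fetch_token  # e.g. get_access_token from above
        self._skew = skew_seconds        # hypothetical safety margin
        self._clock = clock              # injectable so the logic is testable
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a usable token, fetching a new one if the cached one is stale."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            data = self._fetch_token()
            self._token = data["access_token"]
            self._expires_at = now + float(data["expires_in"])
        return self._token
```

Calling cache.get() before every request, with cache = TokenCache(get_access_token), keeps the expiry check in one place instead of scattering it through the loop.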

Implementation

Step 1: Understanding the Analytics Paging Object

The Genesys Cloud Analytics API (specifically the POST /api/v2/analytics/conversations/details/query endpoint) takes its paging settings in the request body. Unlike simple GET endpoints, which accept pageNumber and pageSize as query parameters, this endpoint requires a paging object inside the JSON payload.

Three paging fields matter. You send the first two in the request, and the third comes back in the response:

  1. pageSize: The maximum number of records to return per page (this guide assumes a cap of 1000; confirm the limit for your endpoint in the API reference).
  2. pageNumber: The page you are requesting (1-based index).
  3. pageCount: The total number of pages available for this query. The server calculates it from your pageSize and the total number of matching records.

Crucial Insight: You do not set pageCount in the request. You set pageSize and pageNumber, and the server returns the actual pageCount in the response, which tells you how many iterations your loop needs.
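
To make the relationship between pageSize, the number of matching records, and pageCount concrete, here is a small sketch. It assumes the server rounds up to the next whole page, which is the usual convention but is not confirmed by this guide. Both helper names are hypothetical.

```python
def expected_page_count(total_records: int, page_size: int) -> int:
    """Pages needed to hold total_records at page_size per page (rounding up)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if total_records < 0:
        raise ValueError("total_records must not be negative")
    # Integer ceiling division avoids floating-point rounding.
    return -(-total_records // page_size)


def request_paging(page_number: int, page_size: int) -> dict:
    """Paging object for the request body. pageCount is deliberately absent:
    the server supplies it in the response."""
    return {"pageSize": page_size, "pageNumber": page_number}
```

For instance, 5,750 matching records at a pageSize of 500 would need 12 pages under this assumption, with the last page holding 250 records.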

Step 2: Constructing the Initial Query

To fetch conversation details, you must define a time range and a filter. The paging object must be included in the root of the JSON body.

OAuth Scope Required: analytics:conversation:read

import json

def build_query_payload(page_number: int, page_size: int = 1000) -> dict:
    """
    Constructs the JSON payload for the Analytics Conversation Details Query.
    
    Args:
        page_number (int): The current page number (1-based).
        page_size (int): Number of records per page (max 1000).
        
    Returns:
        dict: The JSON payload ready for POST request.
    """
    # Define the time range (last 24 hours)
    now = datetime.utcnow()
    start_time = (now - timedelta(hours=24)).isoformat() + "Z"
    end_time = now.isoformat() + "Z"
    
    payload = {
        "interval": f"{start_time}/{end_time}",
        "groupBy": ["conversationId"],
        "paging": {
            "pageSize": page_size,
            "pageNumber": page_number
        },
        "filters": {
            "from": {
                "type": "any",
                "items": [
                    {"type": "any", "items": [{"type": "string", "value": "webchat"}]}
                ]
            }
        },
        "select": ["conversationId", "channelType", "direction", "startTime", "endTime", "wrapUpCode"]
    }
    
    return payload

# Example payload for page 1
payload_page_1 = build_query_payload(page_number=1, page_size=500)
print(json.dumps(payload_page_1, indent=2))
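
One detail to watch: build_query_payload reads the current time on every call, so in a long run each page is requested against a slightly shifted 24-hour window. A minimal sketch that computes the interval once and reuses it for every page follows. make_interval and paged_payload are hypothetical helpers, and the payload is trimmed to the two fields relevant here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def _iso_utc(dt: datetime) -> str:
    """Format a UTC datetime with millisecond precision and a 'Z' suffix."""
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def make_interval(hours: int = 24, now: Optional[datetime] = None) -> str:
    """Return 'start/end' for the last `hours` hours.

    `now` is expected to be in UTC; it defaults to the current UTC time.
    """
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return f"{_iso_utc(start)}/{_iso_utc(end)}"


def paged_payload(interval: str, page_number: int, page_size: int) -> dict:
    """Same interval on every page; only the paging object varies."""
    return {
        "interval": interval,
        "paging": {"pageSize": page_size, "pageNumber": page_number},
    }
```

Compute the interval once before the loop (interval = make_interval()) and pass it to every page request, so that all pages describe the same result set.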

Expected Response Structure:
The response will contain a data array and a paging object. The paging object in the response confirms the pageSize, the pageNumber, and provides the pageCount.

{
  "paging": {
    "pageSize": 500,
    "pageNumber": 1,
    "pageCount": 12
  },
  "data": [
    {
      "conversationId": "12345678-1234-1234-1234-123456789012",
      "channelType": "webchat",
      "direction": "inbound",
      "startTime": "2023-10-27T10:00:00.000Z",
      "endTime": "2023-10-27T10:05:00.000Z",
      "wrapUpCode": "Resolved"
    },
    ...
  ]
}
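
Reading the paging object defensively avoids silent mistakes when a field is missing. The sketch below assumes the response shape shown above (a top-level paging object with pageSize, pageNumber, and pageCount). PagingInfo and parse_paging are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PagingInfo:
    page_size: int
    page_number: int
    page_count: int

    @property
    def has_next(self) -> bool:
        """True while there are pages after the current one."""
        return self.page_number < self.page_count


def parse_paging(response_json: dict) -> PagingInfo:
    """Extract the paging object, raising ValueError if it is incomplete."""
    paging = response_json.get("paging")
    if not isinstance(paging, dict):
        raise ValueError("response has no 'paging' object")
    try:
        return PagingInfo(
            page_size=int(paging["pageSize"]),
            page_number=int(paging["pageNumber"]),
            page_count=int(paging["pageCount"]),
        )
    except KeyError as exc:
        raise ValueError(f"paging object is missing field {exc}") from exc
```

Failing loudly here is a design choice: defaulting a missing pageCount to 1 would end the loop after the first page and truncate the result without any warning.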

Step 3: Implementing the Pagination Loop

The core logic involves:

  1. Fetching Page 1.
  2. Extracting pageCount from the response.
  3. Looping from pageNumber = 2 to pageCount.
  4. Handling rate limits (429) and transient errors (5xx).

import requests
import time
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class AnalyticsPaginator:
    def __init__(self, base_url: str, access_token: str):
        self.base_url = base_url
        self.access_token = access_token
        self.headers = {
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
            "Accept": "application/json"
        }
        self.endpoint = f"{base_url}/api/v2/analytics/conversations/details/query"

    def fetch_page(self, payload: dict, retries: int = 3) -> dict:
        """
        Fetches a single page of analytics data with retry logic for 429/5xx errors.
        """
        for attempt in range(retries):
            try:
                response = requests.post(self.endpoint, json=payload, headers=self.headers)
                
                if response.status_code == 200:
                    return response.json()
                
                elif response.status_code == 429:
                    # Rate limited. Wait for Retry-After header or default backoff.
                    retry_after = int(response.headers.get("Retry-After", 5))
                    logger.warning(f"Rate limited (429). Waiting {retry_after}s before retry.")
                    time.sleep(retry_after)
                    continue
                
                elif response.status_code in [500, 502, 503, 504]:
                    # Server error. Exponential backoff.
                    wait_time = 2 ** attempt
                    logger.warning(f"Server error {response.status_code}. Retrying in {wait_time}s...")
                    time.sleep(wait_time)
                    continue
                
                else:
                    # Client error (400, 401, 403) or unexpected status. Do not retry.
                    logger.error(f"Request failed with status {response.status_code}: {response.text}")
                    raise Exception(f"API Error: {response.status_code} - {response.text}")
                    
            except requests.exceptions.RequestException as e:
                logger.error(f"Network error: {e}")
                raise

        raise Exception("Max retries exceeded.")

    def fetch_all_conversations(self, page_size: int = 1000) -> list:
        """
        Iterates through all pages of conversation data.
        """
        all_data = []
        current_page = 1
        total_pages = 1 # Start with assumption of 1 page
        
        while current_page <= total_pages:
            logger.info(f"Fetching page {current_page}...")
            
            # Build payload for current page
            payload = build_query_payload(page_number=current_page, page_size=page_size)
            
            # Fetch data
            response_data = self.fetch_page(payload)
            
            # Extract paging info from response
            paging_info = response_data.get("paging", {})
            total_pages = paging_info.get("pageCount", 1)
            
            # Append data
            page_data = response_data.get("data", [])
            all_data.extend(page_data)
            
            logger.info(f"Retrieved {len(page_data)} records. Total pages: {total_pages}")
            
            # Increment page
            current_page += 1
            
            # Optional: Small delay between pages to be polite to the API
            if current_page <= total_pages:
                time.sleep(0.5)

        return all_data

# Usage
paginator = AnalyticsPaginator(GENESYS_BASE_URL, access_token)
try:
    conversations = paginator.fetch_all_conversations(page_size=500)
    print(f"Total conversations fetched: {len(conversations)}")
except Exception as e:
    print(f"Error fetching data: {e}")
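
If the full result set is too large to hold in one list, the same loop can be written as a generator that yields one page at a time. This is a sketch under the same assumption as above, namely that each response carries paging.pageCount and a data array. iter_pages and its max_pages safety cap are hypothetical.

```python
from typing import Callable, Dict, Iterator, List


def iter_pages(
    fetch_page: Callable[[int], Dict],
    max_pages: int = 10_000,
) -> Iterator[List[dict]]:
    """Yield each page's 'data' list until pageCount is reached.

    fetch_page takes a 1-based page number and returns the parsed response.
    """
    page_number = 1
    page_count = 1  # revised from the response after every request
    while page_number <= page_count:
        if page_number > max_pages:
            raise RuntimeError(f"stopped after {max_pages} pages (safety cap)")
        body = fetch_page(page_number)
        page_count = int(body.get("paging", {}).get("pageCount", 1))
        yield body.get("data", [])
        page_number += 1
```

With the class above, fetch_page could be supplied as lambda n: paginator.fetch_page(build_query_payload(n, 500)), and each yielded page can be written to disk or a database before the next one is requested.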

Complete Working Example

This script combines authentication, payload construction, and pagination into a single runnable module.

import requests
import json
import time
from datetime import datetime, timedelta
import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# --- Configuration ---
GENESYS_BASE_URL = "https://api.mypurecloud.com"
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"
MAX_PAGE_SIZE = 1000

# --- Authentication ---
def get_access_token() -> str:
    token_url = f"{GENESYS_BASE_URL}/oauth/token"
    payload = {
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET
    }
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    
    response = requests.post(token_url, data=payload, headers=headers)
    if response.status_code != 200:
        raise Exception(f"Auth failed: {response.status_code} - {response.text}")
    return response.json()['access_token']

# --- Payload Construction ---
def build_query_payload(page_number: int, page_size: int) -> dict:
    now = datetime.utcnow()
    start_time = (now - timedelta(hours=24)).isoformat() + "Z"
    end_time = now.isoformat() + "Z"
    
    return {
        "interval": f"{start_time}/{end_time}",
        "groupBy": ["conversationId"],
        "paging": {
            "pageSize": page_size,
            "pageNumber": page_number
        },
        "filters": {
            "from": {
                "type": "any",
                "items": [
                    {"type": "any", "items": [{"type": "string", "value": "voice"}]}
                ]
            }
        },
        "select": ["conversationId", "channelType", "startTime", "endTime", "duration"]
    }

# --- Core Logic ---
def fetch_analytics_data(access_token: str, page_size: int = 500) -> list:
    endpoint = f"{GENESYS_BASE_URL}/api/v2/analytics/conversations/details/query"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json"
    }
    
    all_records = []
    current_page = 1
    total_pages = 1
    
    while current_page <= total_pages:
        logger.info(f"Requesting page {current_page}...")
        payload = build_query_payload(current_page, page_size)
        
        response = requests.post(endpoint, json=payload, headers=headers)
        
        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 5))
            logger.warning(f"Rate limited. Waiting {retry_after}s.")
            time.sleep(retry_after)
            continue
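        elif response.status_code != 200:
            # Fail loudly: returning a partial list would silently truncate the data
            logger.error(f"Error {response.status_code}: {response.text}")
            raise RuntimeError(f"API error {response.status_code} on page {current_page}")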
            
        resp_json = response.json()
        paging = resp_json.get("paging", {})
        total_pages = paging.get("pageCount", 1)
        data = resp_json.get("data", [])
        
        all_records.extend(data)
        logger.info(f"Page {current_page}: {len(data)} records. Total pages: {total_pages}")
        
        current_page += 1
        if current_page <= total_pages:
            time.sleep(0.2) # Rate limiting courtesy
            
    return all_records

# --- Main Execution ---
if __name__ == "__main__":
    try:
        token = get_access_token()
        records = fetch_analytics_data(token, page_size=500)
        print(f"\nSuccess! Fetched {len(records)} conversation records.")
        
        # Sample output of first record
        if records:
            print("Sample record:", json.dumps(records[0], indent=2))
            
    except Exception as e:
        logger.error(f"Fatal error: {e}")

Common Errors & Debugging

Error: 400 Bad Request - “Invalid paging object”

Cause: The paging object is missing pageSize or pageNumber, or pageSize exceeds the maximum allowed (usually 1000 for analytics details).
Fix: Ensure pageSize is between 1 and 1000. Ensure pageNumber is a positive integer.

# Correct
"paging": { "pageSize": 100, "pageNumber": 1 }

# Incorrect (PageSize too large)
"paging": { "pageSize": 5000, "pageNumber": 1 }

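Checking the paging object locally before sending the request turns this 400 into an immediate, readable error. A minimal sketch, using the 1000-record cap stated in this guide as a default; validate_paging is a hypothetical helper, and the cap should be adjusted if your endpoint documents a different limit.

```python
MAX_PAGE_SIZE = 1000  # limit stated in this guide; adjust if your endpoint differs


def validate_paging(paging: dict, max_page_size: int = MAX_PAGE_SIZE) -> dict:
    """Raise ValueError for a paging object the API would likely reject."""
    for field in ("pageSize", "pageNumber"):
        value = paging.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{field} must be an integer, got {value!r}")
    if not 1 <= paging["pageSize"] <= max_page_size:
        raise ValueError(
            f"pageSize must be between 1 and {max_page_size}, got {paging['pageSize']}"
        )
    if paging["pageNumber"] < 1:
        raise ValueError(f"pageNumber must be >= 1, got {paging['pageNumber']}")
    return paging
```

Calling validate_paging(payload["paging"]) just before the POST keeps the check close to the request.
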
Error: 401 Unauthorized - “Token expired”

Cause: The access token expired during a long pagination loop.
Fix: Implement token refresh logic. Check the expires_in field from the initial token response. If datetime.now() > token_expiry, call get_access_token() again before the next request.
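
As a safety net on top of the expiry check, a request that still comes back with 401 can be retried once with a fresh token. The sketch below keeps the HTTP call abstract so it can be tested offline: send is any callable that takes a token and returns (status_code, body), and refresh_token returns a new token string. Both are hypothetical stand-ins for the requests calls in this guide.

```python
from typing import Callable, Dict, Tuple


def send_with_reauth(
    send: Callable[[str], Tuple[int, Dict]],
    token: str,
    refresh_token: Callable[[], str],
) -> Tuple[int, Dict, str]:
    """Send once; on 401, refresh the token and send exactly one more time.

    Returns (status_code, body, token_in_use) so the caller can keep
    using the refreshed token for later pages.
    """
    status, body = send(token)
    if status == 401:
        token = refresh_token()
        status, body = send(token)
    return status, body, token
```

Retrying only once is deliberate: a second 401 with a brand-new token usually points at a scope or client configuration problem rather than expiry.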

Error: 429 Too Many Requests

Cause: You are sending requests too quickly. Genesys Cloud has strict rate limits for Analytics APIs.
Fix: Honor the Retry-After header when the response includes one, and fall back to exponential backoff when it does not.

if response.status_code == 429:
    retry_after = int(response.headers.get("Retry-After", 10))
    time.sleep(retry_after)
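
The int(...) conversion above raises ValueError if the header holds an HTTP-date, a form the HTTP specification also allows for Retry-After. A more tolerant sketch that accepts both forms and otherwise falls back to exponential backoff with jitter follows. retry_delay and its base and cap defaults are hypothetical and not taken from Genesys documentation.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_delay(
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    now: Optional[datetime] = None,
) -> float:
    """Seconds to wait before the next retry (attempt is 0-based)."""
    raw = headers.get("Retry-After")
    if raw is not None:
        raw = raw.strip()
        if raw.isdigit():
            return float(raw)  # delta-seconds form, e.g. "7"
        try:
            when = parsedate_to_datetime(raw)  # HTTP-date form
        except (TypeError, ValueError):
            when = None  # unparseable header: fall through to backoff
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            current = now or datetime.now(timezone.utc)
            return max((when - current).total_seconds(), 0.0)
    # No usable header: exponential backoff with full jitter, capped.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

In the loop, time.sleep(retry_delay(response.headers, attempt)) would replace the fixed conversion.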

Error: 500 Internal Server Error

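Cause: Usually a transient server-side issue. A request body the server cannot process may occasionally surface as a 500 as well, although malformed JSON is normally rejected with a 400.
Fix: Implement retry logic for 5xx errors, as they are often temporary. If the error persists for the same request, validate your JSON payload with a JSON linter.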

Official References