Retrieving the Full Conversation Transcript via Genesys Cloud Speech Analytics API

What You Will Build

  • A Python script that queries the Genesys Cloud Analytics API for voice conversations, retrieves their associated speech analytics data, and extracts the verbatim text transcript.
  • This tutorial uses the Genesys Cloud CX REST API (/api/v2/analytics/conversations/voice/details/query) and the Speech Analytics API (/api/v2/analytics/conversations/voice/details/{conversationId}/speech).
  • The implementation is written in Python 3.9+ using the requests library for HTTP communication.

Prerequisites

  • OAuth Client: A Genesys Cloud OAuth 2.0 client with the following scopes:
    • analytics:conversation:view (to query conversation details)
    • analytics:speech:view (to access speech analytics data)
    • user:login (optional, if using user context)
  • Environment: Python 3.9 or higher.
  • Dependencies:
    • requests (HTTP client)
    • pyjwt (optional, for token parsing/debugging)
  • Data Requirement: Voice conversations must have been processed by Genesys Cloud Speech Analytics. Transcripts are not available immediately after a call ends; they require asynchronous processing. Ensure your organization has Speech Analytics enabled and that the specific conversations you are querying have completed processing.

Authentication Setup

Genesys Cloud uses OAuth 2.0 for authentication. For server-to-server integrations, the Client Credentials Grant flow is the standard approach. This flow exchanges your client ID and client secret for an access token.

The following Python code obtains an access token and caches it until shortly before it expires, so repeated calls reuse the same token instead of requesting a new one each time.

import requests
import time
from typing import Optional

class GenesysAuth:
    def __init__(self, client_id: str, client_secret: str, environment: str = "mypurecloud.com"):
        self.client_id = client_id
        self.client_secret = client_secret
        self.environment = environment
        self.base_url = f"https://{environment}"
        self.access_token: Optional[str] = None
        self.token_expiry: float = 0

    def get_token(self) -> str:
        # Return cached token if valid
        if self.access_token and time.time() < self.token_expiry:
            return self.access_token

        # Request new token
        token_url = f"{self.base_url}/oauth/token"
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        data = {
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret
        }

        response = requests.post(token_url, headers=headers, data=data)
        
        if response.status_code != 200:
            raise Exception(f"Failed to obtain OAuth token: {response.status_code} - {response.text}")

        token_data = response.json()
        self.access_token = token_data["access_token"]
        # Set expiry slightly before actual expiry to prevent edge-case failures
        self.token_expiry = time.time() + token_data["expires_in"] - 10
        
        return self.access_token

    def get_auth_header(self) -> dict:
        return {
            "Authorization": f"Bearer {self.get_token()}",
            "Content-Type": "application/json"
        }

Note on Scopes: If you receive a 403 Forbidden error when accessing speech data later, verify that your OAuth client has the analytics:speech:view scope. The analytics:conversation:view scope alone is insufficient for speech transcripts.

Implementation

Step 1: Query Voice Conversations

The first step is to identify the conversations you want to transcribe. Genesys Cloud does not store transcripts in the basic conversation summary. You must first query the Analytics API to get the conversationId for voice interactions.

The endpoint /api/v2/analytics/conversations/voice/details/query accepts a JSON body defining the query criteria (date range, queues, etc.) and returns a list of conversations.

import requests

def get_voice_conversations(auth: GenesysAuth, days_back: int = 1) -> list:
    """
    Queries Genesys Cloud for voice conversations from the last N days.
    
    Args:
        auth: GenesysAuth instance
        days_back: Number of days to look back
        
    Returns:
        List of conversation objects (dicts with id, startTime, and endTime)
    """
    url = f"https://{auth.environment}/api/v2/analytics/conversations/voice/details/query"
    
    # Define the query body
    # Request only the fields we need to keep the payload small
    query_body = {
        "dateRangeType": "relative",
        "interval": f"P{days_back}D",
        "view": "realtime",
        "filters": {
            "types": ["voice"]
        },
        "groupBy": [],
        "select": ["id", "startTime", "endTime"],
        "limit": 100
    }

    headers = auth.get_auth_header()
    response = requests.post(url, headers=headers, json=query_body)

    if response.status_code != 200:
        raise Exception(f"Failed to query conversations: {response.status_code} - {response.text}")

    data = response.json()
    conversations = data.get("conversations", [])
    
    if not conversations:
        print("No voice conversations found in the specified time range.")
        return []

    print(f"Found {len(conversations)} conversations.")
    return conversations
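
The query above uses a relative interval ("P1D"). Some analytics queries expect an absolute ISO-8601 interval in the form start/end instead. If the relative form is rejected in your environment, a helper along these lines could build one. This is a minimal sketch: build_interval is a hypothetical name, and whether the endpoint accepts this format is an assumption to verify against the API documentation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_interval(days_back: int, now: Optional[datetime] = None) -> str:
    """Return an ISO-8601 'start/end' interval covering the last N days in UTC.

    Hypothetical helper: `now` can be injected for testing and should be
    timezone-aware; it defaults to the current UTC time.
    """
    if days_back < 1:
        raise ValueError("days_back must be at least 1")
    end = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    start = end - timedelta(days=days_back)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"{start.strftime(fmt)}/{end.strftime(fmt)}"
```

The result could then replace the "interval" value in query_body, for example query_body["interval"] = build_interval(days_back).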

Key Parameters:

  • view: The example uses "realtime", which is generally recommended for recent data. In some environments, data older than 30 days may require "historical" instead.
  • select: Always specify the fields you need. Requesting * can lead to performance issues and larger payloads.
  • limit: The API returns a maximum of 100 items per request. If you need more, you must implement pagination using the after parameter found in the response headers.
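
The pagination described in the limit bullet can be sketched independently of the HTTP details. The generator below is a minimal illustration, not the API's own mechanism: iter_all_conversations and fetch_page are hypothetical names, and fetch_page is assumed to be a function you write that performs one request and returns the page's items together with the cursor for the next page (or None when there are no more pages).

```python
from typing import Callable, Iterator, List, Optional, Tuple

# One page of results: (items, cursor for the next page or None)
Page = Tuple[List[dict], Optional[str]]


def iter_all_conversations(
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: int = 50,
) -> Iterator[dict]:
    """Yield conversations across pages until the cursor runs out.

    `fetch_page(cursor)` is a caller-supplied function (hypothetical) that
    requests one page. `max_pages` is a safety cap against endless loops.
    """
    cursor: Optional[str] = None
    for _ in range(max_pages):
        items, cursor = fetch_page(cursor)
        yield from items
        if not cursor:
            return
```

In practice, fetch_page would wrap the requests.post call from get_voice_conversations and read the next cursor from wherever the response exposes it. Keeping that detail in one function means the loop does not change if the cursor location differs from what is described here.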

Step 2: Retrieve Speech Analytics Data

Once you have the conversationId, you must fetch the speech analytics data. The transcript is not part of the standard conversation detail object. It resides in the Speech Analytics endpoint.

The endpoint is: /api/v2/analytics/conversations/voice/details/{conversationId}/speech

This endpoint returns a JSON object containing metadata about the speech analysis, including the transcript field.

def get_speech_transcript(auth: GenesysAuth, conversation_id: str) -> Optional[dict]:
    """
    Retrieves the speech analytics data for a specific conversation.
    
    Args:
        auth: GenesysAuth instance
        conversation_id: The ID of the voice conversation
        
    Returns:
        Dictionary containing speech analytics data, including the transcript,
        or None if the data is not available (404)
    """
    url = f"https://{auth.environment}/api/v2/analytics/conversations/voice/details/{conversation_id}/speech"
    
    headers = auth.get_auth_header()
    
    # Retry logic for 429 Too Many Requests
    max_retries = 3
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:
            # Rate limited. Wait and retry.
            wait_time = 2 ** attempt  # Exponential backoff
            print(f"Rate limited (429). Retrying in {wait_time} seconds...")
            time.sleep(wait_time)
            continue
        elif response.status_code == 404:
            # Speech data not available yet or conversation not found
            print(f"Speech data not available for conversation {conversation_id}. It may still be processing.")
            return None
        else:
            raise Exception(f"Failed to retrieve speech data: {response.status_code} - {response.text}")
            
    raise Exception("Max retries exceeded for speech data retrieval.")

Important: The speech analytics data is processed asynchronously. If a conversation just ended, this endpoint may return 404 or an empty result. In production, you should implement a polling mechanism or rely on webhooks to know when speech processing is complete.
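
A polling mechanism of the kind mentioned above might look like the following. This is a minimal sketch under stated assumptions: poll_for_speech_data is a hypothetical helper, the default delay values are arbitrary, and fetch is any zero-argument function that returns the speech data or None, such as a lambda wrapping get_speech_transcript.

```python
import time
from typing import Callable, Optional


def poll_for_speech_data(
    fetch: Callable[[], Optional[dict]],
    max_attempts: int = 5,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call `fetch` until it returns non-empty data or attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    `sleep` is injectable so the loop can be tested without real waiting.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if result:  # treats both None and an empty dict as "not ready yet"
            return result
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    return None
```

A call could look like poll_for_speech_data(lambda: get_speech_transcript(auth, conversation_id)). The caller still has to decide what to do when None comes back after the final attempt.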

Step 3: Extract and Format the Transcript

The response from the speech analytics endpoint contains a transcript object. This object includes a text field with the full transcript and a segments array with timestamped utterances.

The text field is the easiest way to get the full conversation. The segments array allows you to distinguish between agent and customer speech, which is often critical for analytics.

def extract_transcript_text(speech_data: dict) -> str:
    """
    Extracts the full text transcript from speech analytics data.
    
    Args:
        speech_data: The JSON response from the speech analytics endpoint
        
    Returns:
        A formatted string of the conversation transcript
    """
    if not speech_data or "transcript" not in speech_data:
        return "No transcript available."

    transcript_obj = speech_data["transcript"]
    
    # Option 1: Simple full text
    full_text = transcript_obj.get("text", "")
    
    # Option 2: Structured output with speaker labels
    # This is more useful for analysis
    structured_transcript = []
    
    if "segments" in transcript_obj:
        for segment in transcript_obj["segments"]:
            speaker = segment.get("speaker", "Unknown")
            text = segment.get("text", "")
            start_time = segment.get("start", 0)
            end_time = segment.get("end", 0)
            
            # Format time as MM:SS
            start_min = int(start_time // 60)
            start_sec = int(start_time % 60)
            end_min = int(end_time // 60)
            end_sec = int(end_time % 60)
            
            time_str = f"{start_min:02d}:{start_sec:02d}-{end_min:02d}:{end_sec:02d}"
            structured_transcript.append(f"[{time_str}] {speaker}: {text}")
            
    return "\n".join(structured_transcript) if structured_transcript else full_text
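
To illustrate why the segments array is useful for analytics, the sketch below totals speaking time per speaker. It assumes the response shape described in this step (a transcript object whose segments carry speaker, start, and end, with times in seconds). The sample payload is invented for illustration, and talk_time_by_speaker is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Optional


def talk_time_by_speaker(speech_data: Optional[dict]) -> Dict[str, float]:
    """Sum segment durations (end - start) for each speaker label."""
    totals: Dict[str, float] = defaultdict(float)
    segments = (speech_data or {}).get("transcript", {}).get("segments", [])
    for segment in segments:
        speaker = segment.get("speaker", "Unknown")
        duration = segment.get("end", 0) - segment.get("start", 0)
        totals[speaker] += max(duration, 0)  # ignore negative durations
    return dict(totals)


# Invented sample data in the assumed response shape
sample = {
    "transcript": {
        "text": "Thanks for calling. Hi, I need help. Sure.",
        "segments": [
            {"speaker": "Agent", "text": "Thanks for calling.", "start": 0.0, "end": 2.5},
            {"speaker": "Customer", "text": "Hi, I need help.", "start": 2.5, "end": 5.0},
            {"speaker": "Agent", "text": "Sure.", "start": 5.0, "end": 6.0},
        ],
    }
}

print(talk_time_by_speaker(sample))  # {'Agent': 3.5, 'Customer': 2.5}
```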

Speaker Identification:
Genesys Cloud Speech Analytics attempts to identify speakers based on voice profiles. The speaker field in each segment usually contains values like "Agent" or "Customer". If voice profiling is not configured, it may use generic identifiers like "Speaker 1" and "Speaker 2".
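
When the speaker field holds generic identifiers, you may want to map them to roles before further analysis. The following is a minimal sketch: relabel_speakers is a hypothetical helper, and the mapping itself (which generic label is the agent) is something you would have to determine for each conversation.

```python
from typing import Dict, List


def relabel_speakers(segments: List[dict], label_map: Dict[str, str]) -> List[dict]:
    """Return copies of the segments with speaker labels replaced via label_map.

    Labels missing from label_map are kept unchanged. The input list and
    its dictionaries are not modified.
    """
    relabeled = []
    for segment in segments:
        original = segment.get("speaker", "Unknown")
        relabeled.append({**segment, "speaker": label_map.get(original, original)})
    return relabeled


# Invented example segments using generic speaker identifiers
segments = [
    {"speaker": "Speaker 1", "text": "Thanks for calling."},
    {"speaker": "Speaker 2", "text": "Hi, I need help."},
]
for seg in relabel_speakers(segments, {"Speaker 1": "Agent", "Speaker 2": "Customer"}):
    print(f"{seg['speaker']}: {seg['text']}")
```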

Complete Working Example

The following script combines all the previous steps into a single, runnable module. It authenticates, queries for recent voice conversations, retrieves the speech data for each, and prints the formatted transcript.

import requests
import time
import sys
from typing import Optional, List, Dict

# ==============================================================================
# Authentication Module
# ==============================================================================

class GenesysAuth:
    def __init__(self, client_id: str, client_secret: str, environment: str = "mypurecloud.com"):
        self.client_id = client_id
        self.client_secret = client_secret
        self.environment = environment
        self.base_url = f"https://{environment}"
        self.access_token: Optional[str] = None
        self.token_expiry: float = 0

    def get_token(self) -> str:
        if self.access_token and time.time() < self.token_expiry:
            return self.access_token

        token_url = f"{self.base_url}/oauth/token"
        headers = {"Content-Type": "application/x-www-form-urlencoded"}
        data = {
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret
        }

        response = requests.post(token_url, headers=headers, data=data)
        
        if response.status_code != 200:
            raise Exception(f"Failed to obtain OAuth token: {response.status_code} - {response.text}")

        token_data = response.json()
        self.access_token = token_data["access_token"]
        self.token_expiry = time.time() + token_data["expires_in"] - 10
        
        return self.access_token

    def get_auth_header(self) -> dict:
        return {
            "Authorization": f"Bearer {self.get_token()}",
            "Content-Type": "application/json"
        }

# ==============================================================================
# API Interaction Module
# ==============================================================================

def get_voice_conversations(auth: GenesysAuth, days_back: int = 1) -> List[Dict]:
    url = f"https://{auth.environment}/api/v2/analytics/conversations/voice/details/query"
    
    query_body = {
        "dateRangeType": "relative",
        "interval": f"P{days_back}D",
        "view": "realtime",
        "filters": {"types": ["voice"]},
        "groupBy": [],
        "select": ["id", "startTime", "endTime"],
        "limit": 10
    }

    headers = auth.get_auth_header()
    response = requests.post(url, headers=headers, json=query_body)

    if response.status_code != 200:
        raise Exception(f"Failed to query conversations: {response.status_code} - {response.text}")

    data = response.json()
    conversations = data.get("conversations", [])
    
    print(f"Found {len(conversations)} conversations in the last {days_back} day(s).")
    return conversations

def get_speech_transcript(auth: GenesysAuth, conversation_id: str) -> Optional[Dict]:
    url = f"https://{auth.environment}/api/v2/analytics/conversations/voice/details/{conversation_id}/speech"
    
    headers = auth.get_auth_header()
    max_retries = 3
    
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:
            wait_time = 2 ** attempt
            print(f"Rate limited (429). Retrying in {wait_time} seconds...")
            time.sleep(wait_time)
            continue
        elif response.status_code == 404:
            print(f"Speech data not available for conversation {conversation_id}. Skipping.")
            return None
        else:
            print(f"Failed to retrieve speech data for {conversation_id}: {response.status_code} - {response.text}")
            return None
            
    return None

def format_transcript(speech_data: Dict) -> str:
    if not speech_data or "transcript" not in speech_data:
        return "No transcript data."

    transcript_obj = speech_data["transcript"]
    segments = transcript_obj.get("segments", [])
    
    if not segments:
        return transcript_obj.get("text", "No segments found.")

    lines = []
    for segment in segments:
        speaker = segment.get("speaker", "Unknown")
        text = segment.get("text", "")
        start = segment.get("start", 0)
        
        # Format start time as MM:SS
        mins = int(start // 60)
        secs = int(start % 60)
        time_str = f"{mins:02d}:{secs:02d}"
        
        lines.append(f"[{time_str}] {speaker}: {text}")
        
    return "\n".join(lines)

# ==============================================================================
# Main Execution
# ==============================================================================

def main():
    # Configuration
    CLIENT_ID = "YOUR_CLIENT_ID"
    CLIENT_SECRET = "YOUR_CLIENT_SECRET"
    ENVIRONMENT = "mypurecloud.com" # Change if using a different region
    
    if CLIENT_ID == "YOUR_CLIENT_ID":
        print("Error: Please configure CLIENT_ID and CLIENT_SECRET.")
        sys.exit(1)

    # Initialize Auth
    auth = GenesysAuth(CLIENT_ID, CLIENT_SECRET, ENVIRONMENT)
    
    try:
        # Step 1: Get Conversations
        conversations = get_voice_conversations(auth, days_back=1)
        
        if not conversations:
            print("No conversations to process.")
            return

        # Step 2 & 3: Get and Format Transcripts
        for conv in conversations:
            conv_id = conv["id"]
            start_time = conv.get("startTime", "Unknown")
            
            print(f"\n{'='*60}")
            print(f"Conversation ID: {conv_id}")
            print(f"Start Time: {start_time}")
            print(f"{'='*60}")
            
            speech_data = get_speech_transcript(auth, conv_id)
            
            if speech_data:
                transcript_text = format_transcript(speech_data)
                print(transcript_text)
            else:
                print("Transcript unavailable or still processing.")
            
            # Small delay to respect rate limits
            time.sleep(0.5)
            
    except Exception as e:
        print(f"An error occurred: {e}")
        sys.exit(1)

if __name__ == "__main__":
    main()
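
The script above keeps the client ID and secret in source code for simplicity. A common alternative is to read them from environment variables. The sketch below shows one way to do that. The variable names (GENESYS_CLIENT_ID, GENESYS_CLIENT_SECRET, GENESYS_ENVIRONMENT) and the load_credentials helper are assumptions made for this example, not names defined by Genesys Cloud.

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str, str]:
    """Return (client_id, client_secret, environment) from environment variables.

    The variable names are hypothetical. `env` is injectable for testing.
    """
    required = ("GENESYS_CLIENT_ID", "GENESYS_CLIENT_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return (
        env["GENESYS_CLIENT_ID"],
        env["GENESYS_CLIENT_SECRET"],
        env.get("GENESYS_ENVIRONMENT", "mypurecloud.com"),
    )
```

In main(), the three configuration constants could then be replaced by a single call: CLIENT_ID, CLIENT_SECRET, ENVIRONMENT = load_credentials().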

Common Errors & Debugging

Error: 403 Forbidden

  • Cause: The OAuth client lacks the required scope.
  • Fix: Verify that the client has the analytics:speech:view scope. The analytics:conversation:view scope is not sufficient for accessing speech transcripts.
  • Code Check: Ensure you are using the correct client ID and secret associated with a client that has these scopes.

Error: 404 Not Found on Speech Endpoint

  • Cause: The speech analytics processing has not completed for the conversation.
  • Fix: Speech analytics is asynchronous. Transcripts are typically available within minutes, but can take longer for complex calls or during peak processing loads.
  • Debugging: Check the status field in the speech analytics response if available, or poll the endpoint repeatedly with exponential backoff.

Error: 429 Too Many Requests

  • Cause: You have exceeded the API rate limits.
  • Fix: Implement exponential backoff and retry logic. The example code includes a basic retry mechanism for the speech data retrieval.
  • Best Practice: Cache access tokens and avoid re-authenticating unnecessarily. Batch requests where possible, though the speech endpoint is per-conversation.
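
The retry logic in the example uses a fixed exponential schedule. If a 429 response carries the standard HTTP Retry-After header, honoring it is usually preferable. The sketch below combines both approaches. retry_delay is a hypothetical helper, and it assumes the header value, when present, is given in seconds.

```python
from typing import Mapping


def retry_delay(headers: Mapping[str, str], attempt: int, max_delay: float = 60.0) -> float:
    """Return how many seconds to wait before the next retry.

    Uses the Retry-After header when it holds a number of seconds;
    otherwise falls back to exponential backoff (1, 2, 4, ... seconds).
    The result is capped at max_delay. Note that response.headers from
    requests is case-insensitive, while a plain dict is not.
    """
    raw = headers.get("Retry-After")
    if raw is not None:
        try:
            return min(max(float(raw), 0.0), max_delay)
        except ValueError:
            pass  # HTTP-date form or malformed value: use backoff instead
    return min(float(2 ** attempt), max_delay)
```

Inside the retry loop, the fixed wait_time = 2 ** attempt could be replaced with wait_time = retry_delay(response.headers, attempt).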

Error: Transcript is Empty or “No transcript available”

  • Cause: The conversation did not have speech analytics enabled, or the call duration was too short to generate a transcript.
  • Fix: Verify that Speech Analytics is enabled for the queues or skills associated with the conversation. Ensure the conversation had sufficient audio data to process.

Official References