Retrieving the Full Conversation Transcript via Genesys Cloud Speech Analytics API
What You Will Build
- A Python script that queries the Genesys Cloud Analytics API for voice conversations, retrieves their associated speech analytics data, and extracts the verbatim text transcript.
- This tutorial uses the Genesys Cloud CX REST API (/api/v2/analytics/conversations/voice/details/query) and the Speech Analytics API (/api/v2/analytics/conversations/voice/details/{conversationId}/speech).
- The implementation is written in Python 3.9+ using the requests library for HTTP communication.
Prerequisites
- OAuth Client: A Genesys Cloud OAuth 2.0 client with the following scopes:
  - analytics:conversation:view (to query conversation details)
  - analytics:speech:view (to access speech analytics data)
  - user:login (optional, if using user context)
- Environment: Python 3.9 or higher.
- Dependencies:
  - requests (HTTP client)
  - pyjwt (optional, for token parsing/debugging)
- Data Requirement: Voice conversations must have been processed by Genesys Cloud Speech Analytics. Transcripts are not available immediately after a call ends; they require asynchronous processing. Ensure your organization has Speech Analytics enabled and that the specific conversations you are querying have completed processing.
Authentication Setup
Genesys Cloud uses OAuth 2.0 for authentication. For server-to-server integrations, the Client Credentials Grant flow is the standard approach. This flow exchanges your client ID and client secret for an access token.
The following Python code obtains and caches an access token. It checks the token's expiry before each use, so a new token is requested only when the cached one is about to expire.
import requests
import time
from typing import Optional


class GenesysAuth:
    def __init__(self, client_id: str, client_secret: str, environment: str = "mypurecloud.com"):
        self.client_id = client_id
        self.client_secret = client_secret
        self.environment = environment
        # Tokens are issued by the region's login host, e.g. login.mypurecloud.com
        self.base_url = f"https://login.{environment}"
        self.access_token: Optional[str] = None
        self.token_expiry: float = 0

    def get_token(self) -> str:
        # Return cached token if valid
        if self.access_token and time.time() < self.token_expiry:
            return self.access_token

        # Request new token
        token_url = f"{self.base_url}/oauth/token"
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        data = {
            "grant_type": "client_credentials"
        }
        # The client ID and secret are sent as HTTP Basic credentials
        response = requests.post(
            token_url,
            headers=headers,
            data=data,
            auth=(self.client_id, self.client_secret),
        )
        if response.status_code != 200:
            raise Exception(f"Failed to obtain OAuth token: {response.status_code} - {response.text}")

        token_data = response.json()
        self.access_token = token_data["access_token"]
        # Set expiry slightly before actual expiry to prevent edge-case failures
        self.token_expiry = time.time() + token_data["expires_in"] - 10
        return self.access_token

    def get_auth_header(self) -> dict:
        return {
            "Authorization": f"Bearer {self.get_token()}",
            "Content-Type": "application/json"
        }
Note on Scopes: If you receive a 403 Forbidden error when accessing speech data later, verify that your OAuth client has the analytics:speech:view scope. The analytics:conversation:view scope alone is insufficient for speech transcripts.
Implementation
Step 1: Query Voice Conversations
The first step is to identify the conversations you want to transcribe. Genesys Cloud does not store transcripts in the basic conversation summary. You must first query the Analytics API to get the conversationId for voice interactions.
The endpoint /api/v2/analytics/conversations/voice/details/query accepts a JSON body defining the query criteria (date range, queues, etc.) and returns a list of conversations.
import requests


def get_voice_conversations(auth: GenesysAuth, days_back: int = 1) -> list:
    """
    Queries Genesys Cloud for voice conversations from the last N days.

    Args:
        auth: GenesysAuth instance
        days_back: Number of days to look back

    Returns:
        List of conversation records (dicts with id, startTime, endTime)
    """
    # API calls go to the region's api host, e.g. api.mypurecloud.com
    url = f"https://api.{auth.environment}/api/v2/analytics/conversations/voice/details/query"

    # Define the query body
    # We request only the ID and the start/end times to minimize payload size
    query_body = {
        "dateRangeType": "relative",
        "interval": f"P{days_back}D",
        "view": "realtime",
        "filters": {
            "types": ["voice"]
        },
        "groupBy": [],
        "select": ["id", "startTime", "endTime"],
        "limit": 100
    }

    headers = auth.get_auth_header()
    response = requests.post(url, headers=headers, json=query_body)
    if response.status_code != 200:
        raise Exception(f"Failed to query conversations: {response.status_code} - {response.text}")

    data = response.json()
    conversations = data.get("conversations", [])
    if not conversations:
        print("No voice conversations found in the specified time range.")
        return []

    print(f"Found {len(conversations)} conversations.")
    return conversations
Key Parameters:
- view: The examples use "realtime". Use "historical" if you are querying data older than 30 days in some environments; "realtime" is generally recommended for recent data.
- select: Always specify the fields you need. Requesting * can lead to performance issues and larger payloads.
- limit: The API returns a maximum of 100 items per request. If you need more, you must implement pagination using the after parameter found in the response headers.
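The pagination note above can be sketched as a simple cursor loop. This is a minimal illustration, not the API's documented behavior: fetch_page is a hypothetical callable you would write yourself, taking a cursor (None for the first page) and returning the page's items together with the next cursor. How the cursor is read from a real response is deliberately left out.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher type: takes a cursor (None for the first page)
# and returns (items, next_cursor). next_cursor is None when no pages remain.
PageFetcher = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def iter_all_conversations(fetch_page: PageFetcher, max_pages: int = 50) -> Iterator[dict]:
    """Yield items from successive pages until the cursor runs out."""
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard cap guards against an endless cursor loop
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            break


# Stand-in fetcher over in-memory pages, for illustration only.
_FAKE_PAGES = {
    None: ([{"id": "a"}, {"id": "b"}], "p2"),
    "p2": ([{"id": "c"}], None),
}


def fake_fetch(cursor: Optional[str]) -> Tuple[List[dict], Optional[str]]:
    return _FAKE_PAGES[cursor]


all_ids = [conv["id"] for conv in iter_all_conversations(fake_fetch)]
```

Keeping the loop separate from the HTTP call means the page cap and stop condition can be tested without touching the network.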
Step 2: Retrieve Speech Analytics Data
Once you have the conversationId, you must fetch the speech analytics data. The transcript is not part of the standard conversation detail object. It resides in the Speech Analytics endpoint.
The endpoint is: /api/v2/analytics/conversations/voice/details/{conversationId}/speech
This endpoint returns a JSON object containing metadata about the speech analysis, including the transcript field.
import time
from typing import Optional

import requests


def get_speech_transcript(auth: GenesysAuth, conversation_id: str) -> Optional[dict]:
    """
    Retrieves the speech analytics data for a specific conversation.

    Args:
        auth: GenesysAuth instance
        conversation_id: The ID of the voice conversation

    Returns:
        Dictionary containing speech analytics data, including the transcript,
        or None if no speech data is available yet
    """
    # API calls go to the region's api host, e.g. api.mypurecloud.com
    url = f"https://api.{auth.environment}/api/v2/analytics/conversations/voice/details/{conversation_id}/speech"
    headers = auth.get_auth_header()

    # Retry logic for 429 Too Many Requests
    max_retries = 3
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:
            # Rate limited. Wait and retry.
            wait_time = 2 ** attempt  # Exponential backoff
            print(f"Rate limited (429). Retrying in {wait_time} seconds...")
            time.sleep(wait_time)
            continue
        elif response.status_code == 404:
            # Speech data not available yet or conversation not found
            print(f"Speech data not available for conversation {conversation_id}. It may still be processing.")
            return None
        else:
            raise Exception(f"Failed to retrieve speech data: {response.status_code} - {response.text}")

    raise Exception("Max retries exceeded for speech data retrieval.")
Important: The speech analytics data is processed asynchronously. If a conversation just ended, this endpoint may return 404 or an empty result. In production, you should implement a polling mechanism or rely on webhooks to know when speech processing is complete.
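One way to approach the polling mentioned above is a small helper that retries with growing delays. This is a sketch under stated assumptions: fetch is a hypothetical zero-argument callable that returns None while the transcript is still processing and a dict once it is ready, and the attempt count and delays are illustrative defaults rather than recommended values.

```python
import time
from typing import Callable, Optional


def poll_until_ready(
    fetch: Callable[[], Optional[dict]],
    max_attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch() until it returns data or the attempts run out.

    fetch is a hypothetical callable: it returns None while speech data is
    still processing and a dict once the data is available.
    """
    for attempt in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_attempts - 1:
            # Delay doubles each round: base_delay, 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
    return None
```

With the functions from this tutorial, the call might look like poll_until_ready(lambda: get_speech_transcript(auth, conversation_id)). The sleep parameter exists so the delay can be replaced in tests.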
Step 3: Extract and Format the Transcript
The response from the speech analytics endpoint contains a transcript object. This object includes a text field with the full transcript and a segments array with timestamped utterances.
The text field is the easiest way to get the full conversation. The segments array allows you to distinguish between agent and customer speech, which is often critical for analytics.
def extract_transcript_text(speech_data: dict) -> str:
    """
    Extracts the full text transcript from speech analytics data.

    Args:
        speech_data: The JSON response from the speech analytics endpoint

    Returns:
        A formatted string of the conversation transcript
    """
    if not speech_data or "transcript" not in speech_data:
        return "No transcript available."

    transcript_obj = speech_data["transcript"]

    # Option 1: Simple full text
    full_text = transcript_obj.get("text", "")

    # Option 2: Structured output with speaker labels
    # This is more useful for analysis
    structured_transcript = []
    if "segments" in transcript_obj:
        for segment in transcript_obj["segments"]:
            speaker = segment.get("speaker", "Unknown")
            text = segment.get("text", "")
            start_time = segment.get("start", 0)
            end_time = segment.get("end", 0)

            # Format time as MM:SS
            start_min = int(start_time // 60)
            start_sec = int(start_time % 60)
            end_min = int(end_time // 60)
            end_sec = int(end_time % 60)
            time_str = f"{start_min:02d}:{start_sec:02d}-{end_min:02d}:{end_sec:02d}"

            structured_transcript.append(f"[{time_str}] {speaker}: {text}")

    # Prefer the structured form; fall back to the plain text
    return "\n".join(structured_transcript) if structured_transcript else full_text
Speaker Identification:
Genesys Cloud Speech Analytics attempts to identify speakers based on voice profiles. The speaker field in each segment usually contains values like "Agent" or "Customer". If voice profiling is not configured, it may use generic identifiers like "Speaker 1" and "Speaker 2".
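Since the speaker labels may vary, it can help to group utterances by whatever label is present rather than hard-coding "Agent" and "Customer". The sketch below assumes the segment shape described in this step (dicts with speaker and text keys); the sample data is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def group_text_by_speaker(segments: List[dict]) -> Dict[str, List[str]]:
    """Collect utterances per speaker label, preserving their order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for segment in segments:
        speaker = segment.get("speaker") or "Unknown"
        text = (segment.get("text") or "").strip()
        if text:  # skip empty utterances
            grouped[speaker].append(text)
    return dict(grouped)


# Made-up segments in the shape assumed above.
sample_segments = [
    {"speaker": "Agent", "text": "Thank you for calling."},
    {"speaker": "Customer", "text": "Hi, I need help with my bill."},
    {"speaker": "Agent", "text": "Sure, let me look that up."},
    {"text": "  "},
]

by_speaker = group_text_by_speaker(sample_segments)
```

Here by_speaker maps each label to its utterances in order, so agent-only or customer-only text can be analyzed separately.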
Complete Working Example
The following script combines all the previous steps into a single, runnable module. It authenticates, queries for recent voice conversations, retrieves the speech data for each, and prints the formatted transcript.
import requests
import time
import sys
from typing import Optional, List, Dict

# ==============================================================================
# Authentication Module
# ==============================================================================

class GenesysAuth:
    def __init__(self, client_id: str, client_secret: str, environment: str = "mypurecloud.com"):
        self.client_id = client_id
        self.client_secret = client_secret
        self.environment = environment
        # Tokens are issued by the region's login host, e.g. login.mypurecloud.com
        self.base_url = f"https://login.{environment}"
        self.access_token: Optional[str] = None
        self.token_expiry: float = 0

    def get_token(self) -> str:
        if self.access_token and time.time() < self.token_expiry:
            return self.access_token

        token_url = f"{self.base_url}/oauth/token"
        headers = {"Content-Type": "application/x-www-form-urlencoded"}
        data = {"grant_type": "client_credentials"}
        # The client ID and secret are sent as HTTP Basic credentials
        response = requests.post(
            token_url,
            headers=headers,
            data=data,
            auth=(self.client_id, self.client_secret),
        )
        if response.status_code != 200:
            raise Exception(f"Failed to obtain OAuth token: {response.status_code} - {response.text}")

        token_data = response.json()
        self.access_token = token_data["access_token"]
        self.token_expiry = time.time() + token_data["expires_in"] - 10
        return self.access_token

    def get_auth_header(self) -> dict:
        return {
            "Authorization": f"Bearer {self.get_token()}",
            "Content-Type": "application/json"
        }

# ==============================================================================
# API Interaction Module
# ==============================================================================

def get_voice_conversations(auth: GenesysAuth, days_back: int = 1) -> List[Dict]:
    # API calls go to the region's api host, e.g. api.mypurecloud.com
    url = f"https://api.{auth.environment}/api/v2/analytics/conversations/voice/details/query"
    query_body = {
        "dateRangeType": "relative",
        "interval": f"P{days_back}D",
        "view": "realtime",
        "filters": {"types": ["voice"]},
        "groupBy": [],
        "select": ["id", "startTime", "endTime"],
        "limit": 10
    }
    headers = auth.get_auth_header()
    response = requests.post(url, headers=headers, json=query_body)
    if response.status_code != 200:
        raise Exception(f"Failed to query conversations: {response.status_code} - {response.text}")

    data = response.json()
    conversations = data.get("conversations", [])
    print(f"Found {len(conversations)} conversations in the last {days_back} day(s).")
    return conversations


def get_speech_transcript(auth: GenesysAuth, conversation_id: str) -> Optional[Dict]:
    url = f"https://api.{auth.environment}/api/v2/analytics/conversations/voice/details/{conversation_id}/speech"
    headers = auth.get_auth_header()
    max_retries = 3
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.json()
        elif response.status_code == 429:
            # Rate limited: back off exponentially, then retry
            wait_time = 2 ** attempt
            print(f"Rate limited (429). Retrying in {wait_time} seconds...")
            time.sleep(wait_time)
            continue
        elif response.status_code == 404:
            print(f"Speech data not available for conversation {conversation_id}. Skipping.")
            return None
        else:
            print(f"Failed to retrieve speech data for {conversation_id}: {response.status_code} - {response.text}")
            return None
    # All retries were rate limited
    return None


def format_transcript(speech_data: Dict) -> str:
    if not speech_data or "transcript" not in speech_data:
        return "No transcript data."

    transcript_obj = speech_data["transcript"]
    segments = transcript_obj.get("segments", [])
    if not segments:
        return transcript_obj.get("text", "No segments found.")

    lines = []
    for segment in segments:
        speaker = segment.get("speaker", "Unknown")
        text = segment.get("text", "")
        start = segment.get("start", 0)
        # Format start time as MM:SS
        mins = int(start // 60)
        secs = int(start % 60)
        time_str = f"{mins:02d}:{secs:02d}"
        lines.append(f"[{time_str}] {speaker}: {text}")
    return "\n".join(lines)

# ==============================================================================
# Main Execution
# ==============================================================================

def main():
    # Configuration
    CLIENT_ID = "YOUR_CLIENT_ID"
    CLIENT_SECRET = "YOUR_CLIENT_SECRET"
    ENVIRONMENT = "mypurecloud.com"  # Change if using a different region

    if CLIENT_ID == "YOUR_CLIENT_ID":
        print("Error: Please configure CLIENT_ID and CLIENT_SECRET.")
        sys.exit(1)

    # Initialize Auth
    auth = GenesysAuth(CLIENT_ID, CLIENT_SECRET, ENVIRONMENT)

    try:
        # Step 1: Get Conversations
        conversations = get_voice_conversations(auth, days_back=1)
        if not conversations:
            print("No conversations to process.")
            return

        # Step 2 & 3: Get and Format Transcripts
        for conv in conversations:
            conv_id = conv["id"]
            start_time = conv.get("startTime", "Unknown")
            print(f"\n{'='*60}")
            print(f"Conversation ID: {conv_id}")
            print(f"Start Time: {start_time}")
            print(f"{'='*60}")

            speech_data = get_speech_transcript(auth, conv_id)
            if speech_data:
                transcript_text = format_transcript(speech_data)
                print(transcript_text)
            else:
                print("Transcript unavailable or still processing.")

            # Small delay to respect rate limits
            time.sleep(0.5)
    except Exception as e:
        print(f"An error occurred: {e}")
        sys.exit(1)


if __name__ == "__main__":
    main()
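The script above keeps credentials in placeholder constants. A common alternative is to read them from environment variables so the secret never lands in source control. The following is a minimal sketch; the variable names GENESYS_CLIENT_ID, GENESYS_CLIENT_SECRET, and GENESYS_ENVIRONMENT are assumptions chosen for this example, not names defined by Genesys Cloud.

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str, str]:
    """Read client credentials from environment variables.

    The variable names below are assumptions made for this example.
    """
    try:
        client_id = env["GENESYS_CLIENT_ID"]
        client_secret = env["GENESYS_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing.args[0]}") from None
    # Fall back to the same default region the script uses
    region = env.get("GENESYS_ENVIRONMENT", "mypurecloud.com")
    return client_id, client_secret, region
```

In main(), the three constants could then be replaced with CLIENT_ID, CLIENT_SECRET, ENVIRONMENT = load_credentials().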
Common Errors & Debugging
Error: 403 Forbidden
- Cause: The OAuth client lacks the required scope.
- Fix: Verify that the client has the analytics:speech:view scope. The analytics:conversation:view scope is not sufficient for accessing speech transcripts.
- Code Check: Ensure you are using the correct client ID and secret associated with a client that has these scopes.
Error: 404 Not Found on Speech Endpoint
- Cause: The speech analytics processing has not completed for the conversation.
- Fix: Speech analytics is asynchronous. Transcripts are typically available within minutes, but can take longer for complex calls or during peak processing loads.
- Debugging: Check the status field in the speech analytics response if available, or poll the endpoint repeatedly with exponential backoff.
Error: 429 Too Many Requests
- Cause: You have exceeded the API rate limits.
- Fix: Implement exponential backoff and retry logic. The example code includes a basic retry mechanism for the speech data retrieval.
- Best Practice: Cache access tokens and avoid re-authenticating unnecessarily. Batch requests where possible, though the speech endpoint is per-conversation.
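The backoff described above can be refined by honoring the standard HTTP Retry-After header when the server provides one. The sketch below assumes the header, if present, carries a number of seconds; whether a given 429 response includes it is an assumption to verify against real responses. The function name and the 60-second cap are illustrative.

```python
from typing import Mapping


def retry_delay(headers: Mapping[str, str], attempt: int, cap: float = 60.0) -> float:
    """Pick a wait time in seconds for the next retry.

    Uses Retry-After when it is a number of seconds; otherwise falls back
    to exponential backoff (2 ** attempt). Both are limited to cap.
    """
    raw = headers.get("Retry-After")
    if raw is not None:
        try:
            return min(max(float(raw), 0.0), cap)
        except ValueError:
            pass  # e.g. an HTTP-date value; fall through to backoff
    return min(float(2 ** attempt), cap)
```

In the retry loop, wait_time = retry_delay(response.headers, attempt) would replace the fixed 2 ** attempt. The headers object on a requests response is case-insensitive, while a plain dict is not.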
Error: Transcript is Empty or “No transcript available”
- Cause: The conversation did not have speech analytics enabled, or the call duration was too short to generate a transcript.
- Fix: Verify that Speech Analytics is enabled for the queues or skills associated with the conversation. Ensure the conversation had sufficient audio data to process.
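To tell an empty transcript apart from a missing one before formatting, a small check can help. This sketch assumes the response shape described in Step 3 (a transcript object with a text field and a segments array); the function name is hypothetical.

```python
from typing import Optional


def transcript_is_empty(speech_data: Optional[dict]) -> bool:
    """Return True when the response carries no usable transcript text."""
    if not speech_data:
        return True
    transcript = speech_data.get("transcript") or {}
    # Any non-blank full text counts as content
    if (transcript.get("text") or "").strip():
        return False
    # Otherwise look for at least one non-blank segment
    segments = transcript.get("segments") or []
    return not any((seg.get("text") or "").strip() for seg in segments)
```

Logging the conversation ID whenever this returns True makes it easier to check later whether those calls were too short or came from queues without Speech Analytics enabled.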