Streaming Genesys Cloud Media Files with Python
What You Will Build
- A Python application that queries the Genesys Cloud Media API for file metadata, extracts presigned download URLs, and streams audio assets with partial content support.
- A production-grade pipeline that validates MD5 checksums on the fly, compresses downloads using gzip, implements exponential backoff for rate limits, and generates availability reports.
- A lightweight Flask-based streaming proxy that forwards range requests from frontend players to Genesys Cloud while preserving HTTP 206 responses.
Prerequisites
- OAuth 2.0 client credentials with the media:view scope
- Genesys Cloud Python SDK (genesyscloud) version 10.0.0 or higher
- Python 3.9+ runtime
- External dependencies: requests, urllib3, flask
- Standard library modules used: hashlib, gzip, json, logging
Authentication Setup
Genesys Cloud requires OAuth 2.0 client credentials flow for server-to-server integrations. The SDK handles token acquisition and automatic refresh, but you must configure the required scopes upfront.
import os

from genesyscloud import PureCloudPlatformClientV2
from genesyscloud.auth.client_credentials_flow import ClientCredentialsFlow


def initialize_genesys_platform() -> PureCloudPlatformClientV2:
    platform_client = PureCloudPlatformClientV2()

    # Configure OAuth client credentials
    client_id = os.environ["GENESYS_CLIENT_ID"]
    client_secret = os.environ["GENESYS_CLIENT_SECRET"]
    base_url = os.environ.get("GENESYS_BASE_URL", "https://api.mypurecloud.com")

    platform_client.set_base_url(base_url)
    auth_flow = ClientCredentialsFlow(
        client_id=client_id,
        client_secret=client_secret,
        scopes=["media:view"]
    )
    platform_client.set_auth_flow(auth_flow)

    # Trigger initial token fetch
    platform_client.login()
    return platform_client
The media:view scope grants read access to media records and presigned download URLs. The SDK caches the access token and automatically requests a new one when expiration approaches. You do not need to implement manual token rotation.
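Because the setup code reads its credentials from environment variables, a missing variable surfaces as a bare KeyError at startup. The following is a minimal sketch of a pre-flight check that reports every missing variable at once. The helper name load_credentials and the returned dictionary keys are hypothetical and not part of the SDK; only the variable names and the default base URL come from the code above.

```python
import os
from typing import Dict, Mapping

# Variables the setup code above reads without a default.
REQUIRED_VARS = ("GENESYS_CLIENT_ID", "GENESYS_CLIENT_SECRET")


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the settings needed for login, or raise one error naming every missing variable.

    Hypothetical helper for illustration; it does not call the SDK.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "client_id": env["GENESYS_CLIENT_ID"],
        "client_secret": env["GENESYS_CLIENT_SECRET"],
        # Same default as initialize_genesys_platform()
        "base_url": env.get("GENESYS_BASE_URL", "https://api.mypurecloud.com"),
    }
```

Calling this helper before initialize_genesys_platform() keeps configuration failures separate from authentication failures in the logs.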
Implementation
Step 1: Initialize SDK and Query Media Metadata
The Media API returns a Media object containing file properties, encoding details, and a time-limited presigned URL. You query this endpoint before initiating any download.
from genesyscloud import MediaApi
from typing import Optional


def get_media_metadata(platform_client: PureCloudPlatformClientV2, media_id: str) -> Optional[dict]:
    media_api = MediaApi(platform_client)
    try:
        response = media_api.get_media(media_id)
        return {
            "id": response.id,
            "file_name": response.file_name,
            "content_type": response.content_type,
            "file_size": response.file_size,
            "md5": response.md5,
            "download_url": response.download_url,
            "created_date": response.created_date.isoformat() if response.created_date else None
        }
    except Exception as e:
        print(f"Failed to retrieve media metadata: {e}")
        return None
Expected Response Structure:
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "file_name": "call_recording_2024_10_15.wav",
  "content_type": "audio/wav",
  "file_size": 52428800,
  "md5": "d41d8cd98f00b204e9800998ecf8427e",
  "download_url": "https://s3.amazonaws.com/genesys-media-us-east-1/recording/...?AWSAccessKeyId=...&Signature=...&Expires=...",
  "created_date": "2024-10-15T14:32:00Z"
}
Error Handling: A 403 response indicates missing media:view scope. A 404 response means the media ID is invalid or the record was purged. The SDK raises PureCloudSdkException with the HTTP status code embedded in the message.
Step 2: Configure Retry Logic and Streaming Session
Large or highly concurrent transfers can trigger HTTP 429 Too Many Requests when rate limits are enforced. You attach a retry adapter with exponential backoff to the session so that these responses are retried instead of failing the download. The SDK does not support streaming or range headers, so you switch to requests for the download phase.
import requests
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter


def create_resilient_session() -> requests.Session:
    session = requests.Session()
    retry_strategy = Retry(
        total=5,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"]
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
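With backoff_factor=1, urllib3 retries the first failure immediately and then waits roughly 2s, 4s, 8s, and 16s before the remaining attempts. A Retry-After header on a 429 or 503 response takes precedence over the computed delay. The status_forcelist ensures that only transient errors trigger retries. Permanent errors such as 401 or 403 fail immediately.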
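To make the retry timing concrete, here is a minimal sketch that models the backoff formula as urllib3 documents it: no sleep before the first retry, then backoff_factor * 2 ** (consecutive_errors - 1), capped at a maximum. The function backoff_schedule is a hypothetical stand-in written for illustration, not a urllib3 API. The 120-second cap is an assumption based on urllib3's default, and real delays can differ when the server sends Retry-After.

```python
from typing import List


def backoff_schedule(backoff_factor: float, retries: int, backoff_max: float = 120.0) -> List[float]:
    """Model the sleep before each retry, following urllib3's documented formula.

    Illustrative only: it ignores jitter and Retry-After headers.
    """
    delays = []
    for consecutive_errors in range(1, retries + 1):
        if consecutive_errors <= 1:
            # The first retry is attempted without any delay.
            delays.append(0.0)
        else:
            raw = backoff_factor * 2 ** (consecutive_errors - 1)
            delays.append(float(min(backoff_max, raw)))
    return delays


print(backoff_schedule(1, 5))  # [0.0, 2.0, 4.0, 8.0, 16.0]
```

With total=5 the worst case adds about 30 seconds of waiting per request, which is worth keeping in mind when sizing worker timeouts.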
Step 3: Stream with Range Headers and Validate MD5
You construct streaming requests with Range headers to support partial content retrieval. The server responds with HTTP 206 and a Content-Range header. You compute the MD5 checksum incrementally to avoid loading the entire file into memory.
import hashlib
import gzip
import os
from typing import Tuple


def stream_media_with_validation(
    session: requests.Session,
    download_url: str,
    expected_md5: str,
    output_path: str,
    chunk_size: int = 1024 * 1024
) -> Tuple[bool, str]:
    md5_hash = hashlib.md5()
    bytes_downloaded = 0
    headers = {"Range": "bytes=0-"}

    response = session.get(download_url, headers=headers, stream=True)
    response.raise_for_status()
    actual_size = int(response.headers.get("Content-Length", 0))

    # Hash the raw bytes while writing them to a gzip-compressed file
    with gzip.open(output_path, "wb", compresslevel=6) as f_out:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                f_out.write(chunk)
                md5_hash.update(chunk)
                bytes_downloaded += len(chunk)

    computed_md5 = md5_hash.hexdigest()
    is_valid = computed_md5 == expected_md5
    return is_valid, computed_md5
Why Range Headers Matter: Frontend audio players and mobile applications request specific byte ranges to seek or buffer. The Range: bytes=0- header requests the entire file, but you can modify the header to Range: bytes=1024-2048 for partial requests. The server responds with Content-Range: bytes 1024-2048/52428800.
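Byte ranges are inclusive at both ends, so bytes=1024-2048 covers 1025 bytes rather than 1024. The sketch below parses a Content-Range value so a client can confirm that it received the range it asked for. The helper name parse_content_range is hypothetical, and the sketch handles only the bytes unit with an explicit range, not the unsatisfied-range form bytes */total.

```python
import re
from typing import Optional, Tuple

# Matches values such as "bytes 1024-2048/52428800" or "bytes 0-99/*"
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Return (first_byte, last_byte, total_size); total_size is None when the server sends '*'."""
    match = _CONTENT_RANGE.match(value.strip())
    if not match:
        raise ValueError(f"Unsupported Content-Range: {value!r}")
    first, last = int(match.group(1)), int(match.group(2))
    if last < first:
        raise ValueError(f"Invalid byte range: {value!r}")
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total


first, last, total = parse_content_range("bytes 1024-2048/52428800")
print(last - first + 1)  # 1025
```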
MD5 Validation: Genesys Cloud calculates the checksum on upload. You verify integrity immediately after download. If the checksums mismatch, the file is corrupted or the presigned URL expired and returned a cached error document.
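Because the download step stores the file gzip-compressed while the expected checksum describes the uncompressed audio, a later integrity check has to hash the decompressed bytes rather than the .gz file itself. A minimal sketch of that re-verification follows; the function name md5_of_gzip_payload is hypothetical.

```python
import gzip
import hashlib


def md5_of_gzip_payload(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash the decompressed contents of a gzip file without loading it fully into memory."""
    digest = hashlib.md5()
    with gzip.open(path, "rb") as f_in:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing this value with the md5 field from the metadata confirms that the archived copy still matches what was downloaded.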
Step 4: Compress Assets and Generate Availability Report
You compress media assets using gzip encoding to reduce storage footprint. The pipeline aggregates metadata into a JSON availability report for content management systems.
import json
from datetime import datetime, timezone
from typing import List


def generate_availability_report(media_records: List[dict]) -> str:
    report = {
        # Timezone-aware UTC timestamp
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "total_files": len(media_records),
        # file_size may be present but None, so fall back to 0
        "total_size_bytes": sum((m.get("file_size") or 0) for m in media_records),
        "files": []
    }
    for record in media_records:
        report["files"].append({
            "id": record["id"],
            "file_name": record["file_name"],
            "status": "available" if record.get("download_url") else "unavailable",
            "size_bytes": record.get("file_size"),
            "md5": record.get("md5"),
            "created_date": record.get("created_date")
        })
    return json.dumps(report, indent=2)
The report excludes presigned URLs to prevent credential leakage. You store the JSON alongside the compressed media archive. Content management systems parse the report to track storage utilization and playback readiness.
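On the consuming side, a content management system might read the report as sketched below. The function summarize_report and its output keys are hypothetical; only the field names files, id, and status come from the report structure above. The check for a download_url key is a defensive assumption that guards against a presigned URL slipping into the report.

```python
import json
from typing import Any, Dict


def summarize_report(report_json: str) -> Dict[str, Any]:
    """Summarize an availability report and flag entries that expose a download URL."""
    report = json.loads(report_json)
    files = report.get("files", [])
    return {
        "total_files": len(files),
        # Anything not explicitly marked available needs attention
        "unavailable_ids": [f["id"] for f in files if f.get("status") != "available"],
        # Should stay empty: the report is meant to omit presigned URLs
        "entries_with_urls": [f["id"] for f in files if "download_url" in f],
    }
```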
Step 5: Expose Streaming Proxy for Frontend Playback Integration
Frontend players cannot authenticate directly with Genesys Cloud. You expose a Flask endpoint that accepts range requests, forwards them to the presigned URL, and pipes the response back with correct headers.
from flask import Flask, request, Response, jsonify

app = Flask(__name__)


@app.route("/api/stream/<media_id>")
def proxy_media_stream(media_id: str) -> Response:
    platform_client = initialize_genesys_platform()
    metadata = get_media_metadata(platform_client, media_id)
    if not metadata or not metadata.get("download_url"):
        return jsonify({"error": "Media not found or download URL expired"}), 404

    session = create_resilient_session()

    # Forward the client's Range header, if any, to the presigned URL
    range_header = request.headers.get("Range")
    proxy_headers = {}
    if range_header:
        proxy_headers["Range"] = range_header

    response = session.get(metadata["download_url"], headers=proxy_headers, stream=True)

    if response.status_code == 206:
        return Response(
            response.iter_content(chunk_size=1024 * 1024),
            status=206,
            content_type=metadata["content_type"],
            headers={
                "Content-Range": response.headers.get("Content-Range", ""),
                "Accept-Ranges": "bytes",
                "Content-Length": response.headers.get("Content-Length", ""),
                "ETag": response.headers.get("ETag", "")
            }
        )

    return Response(
        response.iter_content(chunk_size=1024 * 1024),
        status=200,
        content_type=metadata["content_type"],
        headers={
            "Content-Length": response.headers.get("Content-Length", ""),
            "ETag": response.headers.get("ETag", "")
        }
    )
The proxy preserves HTTP 206 responses so frontend players can seek accurately. You attach Accept-Ranges: bytes to signal that partial requests are supported. The ETag header enables browser caching validation.
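The proxy code falls back to an empty string when the upstream response omits a header, which sends the player an empty Content-Length or ETag. One possible refinement, sketched here under the assumption that omitting an absent header is preferable, builds the forwarded header set from only the values that exist. The helper name build_passthrough_headers and the PASSTHROUGH tuple are hypothetical.

```python
from typing import Dict, Iterable, Mapping

# Headers the proxy forwards from the upstream response when present
PASSTHROUGH = ("Content-Range", "Content-Length", "ETag")


def build_passthrough_headers(
    upstream: Mapping[str, str],
    names: Iterable[str] = PASSTHROUGH,
) -> Dict[str, str]:
    """Copy selected upstream headers, skipping any that are missing or empty."""
    forwarded = {"Accept-Ranges": "bytes"}
    for name in names:
        # requests exposes headers case-insensitively; a plain dict needs exact-case keys
        value = upstream.get(name)
        if value:
            forwarded[name] = value
    return forwarded
```

Passing response.headers from the upstream request as the upstream argument yields a dictionary that can be handed to the headers parameter of the Flask Response.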
Complete Working Example
The following script combines all components into a single runnable module. Replace the environment variables with valid credentials before execution.
import os
import hashlib
import gzip
import json
import logging
from datetime import datetime, timezone
from typing import Optional, Tuple, List

import requests
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter
from flask import Flask, request, Response, jsonify
from genesyscloud import PureCloudPlatformClientV2
from genesyscloud.auth.client_credentials_flow import ClientCredentialsFlow
from genesyscloud import MediaApi

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")


def initialize_genesys_platform() -> PureCloudPlatformClientV2:
    platform_client = PureCloudPlatformClientV2()
    client_id = os.environ["GENESYS_CLIENT_ID"]
    client_secret = os.environ["GENESYS_CLIENT_SECRET"]
    base_url = os.environ.get("GENESYS_BASE_URL", "https://api.mypurecloud.com")
    platform_client.set_base_url(base_url)
    auth_flow = ClientCredentialsFlow(
        client_id=client_id,
        client_secret=client_secret,
        scopes=["media:view"]
    )
    platform_client.set_auth_flow(auth_flow)
    platform_client.login()
    return platform_client


def get_media_metadata(platform_client: PureCloudPlatformClientV2, media_id: str) -> Optional[dict]:
    media_api = MediaApi(platform_client)
    try:
        response = media_api.get_media(media_id)
        return {
            "id": response.id,
            "file_name": response.file_name,
            "content_type": response.content_type,
            "file_size": response.file_size,
            "md5": response.md5,
            "download_url": response.download_url,
            "created_date": response.created_date.isoformat() if response.created_date else None
        }
    except Exception as e:
        logging.error(f"Failed to retrieve media metadata: {e}")
        return None


def create_resilient_session() -> requests.Session:
    session = requests.Session()
    retry_strategy = Retry(
        total=5,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"]
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def stream_media_with_validation(
    session: requests.Session,
    download_url: str,
    expected_md5: str,
    output_path: str,
    chunk_size: int = 1024 * 1024
) -> Tuple[bool, str]:
    md5_hash = hashlib.md5()
    headers = {"Range": "bytes=0-"}
    response = session.get(download_url, headers=headers, stream=True)
    response.raise_for_status()
    # Hash the raw bytes while writing them to a gzip-compressed file
    with gzip.open(output_path, "wb", compresslevel=6) as f_out:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                f_out.write(chunk)
                md5_hash.update(chunk)
    computed_md5 = md5_hash.hexdigest()
    return computed_md5 == expected_md5, computed_md5


def generate_availability_report(media_records: List[dict]) -> str:
    report = {
        # Timezone-aware UTC timestamp
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "total_files": len(media_records),
        # file_size may be present but None, so fall back to 0
        "total_size_bytes": sum((m.get("file_size") or 0) for m in media_records),
        "files": [
            {
                "id": m["id"],
                "file_name": m["file_name"],
                "status": "available" if m.get("download_url") else "unavailable",
                "size_bytes": m.get("file_size"),
                "md5": m.get("md5")
            }
            for m in media_records
        ]
    }
    return json.dumps(report, indent=2)


app = Flask(__name__)


@app.route("/api/stream/<media_id>")
def proxy_media_stream(media_id: str) -> Response:
    platform_client = initialize_genesys_platform()
    metadata = get_media_metadata(platform_client, media_id)
    if not metadata or not metadata.get("download_url"):
        return jsonify({"error": "Media not found or download URL expired"}), 404

    session = create_resilient_session()
    range_header = request.headers.get("Range")
    proxy_headers = {"Range": range_header} if range_header else {}
    response = session.get(metadata["download_url"], headers=proxy_headers, stream=True)

    if response.status_code == 206:
        return Response(
            response.iter_content(chunk_size=1024 * 1024),
            status=206,
            content_type=metadata["content_type"],
            headers={
                "Content-Range": response.headers.get("Content-Range", ""),
                "Accept-Ranges": "bytes",
                "Content-Length": response.headers.get("Content-Length", ""),
                "ETag": response.headers.get("ETag", "")
            }
        )

    return Response(
        response.iter_content(chunk_size=1024 * 1024),
        status=200,
        content_type=metadata["content_type"],
        headers={
            "Content-Length": response.headers.get("Content-Length", ""),
            "ETag": response.headers.get("ETag", "")
        }
    )


if __name__ == "__main__":
    media_ids = os.environ.get("MEDIA_IDS", "a1b2c3d4-e5f6-7890-abcd-ef1234567890").split(",")
    platform_client = initialize_genesys_platform()
    session = create_resilient_session()
    records = []

    for mid in media_ids:
        metadata = get_media_metadata(platform_client, mid.strip())
        if metadata and metadata.get("download_url"):
            records.append(metadata)
            output_file = f"/tmp/{metadata['file_name']}.gz"
            is_valid, computed = stream_media_with_validation(
                session, metadata["download_url"], metadata["md5"], output_file
            )
            logging.info(f"Downloaded {metadata['file_name']} | Valid: {is_valid}")

    report = generate_availability_report(records)
    with open("/tmp/media_availability_report.json", "w") as f:
        f.write(report)

    logging.info("Starting streaming proxy on port 5000")
    app.run(host="0.0.0.0", port=5000)
Common Errors & Debugging
Error: HTTP 401 Unauthorized
- What causes it: The OAuth token expired, the client credentials are incorrect, or the media:view scope is missing.
- How to fix it: Verify environment variables. Restart the script to trigger a fresh token fetch. Confirm the OAuth client in the Genesys Cloud admin console includes media:view.
- Code showing the fix: The SDK handles token refresh automatically. If 401 persists, call platform_client.login() explicitly before the request.
Error: HTTP 403 Forbidden
- What causes it: The OAuth client lacks permissions to access the media record, or the record belongs to an organization the client cannot read.
- How to fix it: Assign the OAuth client to the appropriate Genesys Cloud user or role with media read permissions. Verify the media:view scope is attached to the client.
- Code showing the fix: No code change required. Adjust IAM policies in the Genesys Cloud UI and retest.
Error: HTTP 429 Too Many Requests
- What causes it: Genesys Cloud enforces rate limits on media download endpoints. Concurrent streaming requests exceed the threshold.
- How to fix it: The retry adapter handles this automatically. Reduce concurrent workers or increase the backoff_factor.
- Code showing the fix: The Retry configuration in create_resilient_session() already implements exponential backoff for 429. urllib3's Retry class has no event hook. If you need visibility into retries, enable DEBUG logging for the urllib3 logger with logging.getLogger("urllib3").setLevel(logging.DEBUG), or subclass Retry and log inside an overridden increment() method.
Error: HTTP 5xx Server Errors
- What causes it: Transient backend failures in Genesys Cloud media storage or presigned URL generation.
- How to fix it: The retry adapter covers 500, 502, 503, and 504. If failures persist beyond five retries, the storage backend is degraded. Wait and retry later.
- Code showing the fix: The status_forcelist in the Retry strategy already includes 500, 502, 503, and 504. No additional code is required.
Error: MD5 Checksum Mismatch
- What causes it: Network corruption, incomplete download, or the presigned URL returned a fallback error page instead of the audio file.
- How to fix it: Verify the Content-Type header is an audio type such as audio/wav or audio/mpeg. If the server returns text/html, the URL has likely expired. Regenerate the metadata and download again.
- Code showing the fix: Add a header validation step before streaming: if "audio" not in response.headers.get("Content-Type", ""): raise ValueError("Invalid content type returned").
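A substring test for "audio" also matches unrelated values that merely contain that word. The sketch below is a slightly stricter variant that compares only the media type and ignores parameters such as charset. The helper name is_audio_response is hypothetical, and the sketch assumes the storage backend labels recordings with an audio/* type. If it serves them as a generic binary type instead, the check would need an allow-list.

```python
def is_audio_response(content_type_header: str) -> bool:
    """Return True when the media type (ignoring parameters) is audio/*."""
    # "audio/wav; charset=binary" -> "audio/wav"
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type.startswith("audio/")


print(is_audio_response("audio/mpeg"))                # True
print(is_audio_response("text/html; charset=utf-8"))  # False
```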