Streaming Genesys Cloud Media Files with Python

StarAdmin · June 16, 2026, 8:32am

What You Will Build

A Python application that queries the Genesys Cloud Media API for file metadata, extracts presigned download URLs, and streams audio assets with partial content support.
A production-grade pipeline that validates MD5 checksums on the fly, compresses downloads using gzip, implements exponential backoff for rate limits, and generates availability reports.
A lightweight Flask-based streaming proxy that forwards range requests from frontend players to Genesys Cloud while preserving HTTP 206 responses.

Prerequisites

OAuth 2.0 client credentials with the media:view scope
Genesys Cloud Python SDK (genesyscloud) version 10.0.0 or higher
Python 3.9+ runtime
External packages: requests, urllib3 (installed as a dependency of requests), flask
Standard library modules: hashlib, gzip, json, logging

Authentication Setup

Genesys Cloud requires the OAuth 2.0 client credentials flow for server-to-server integrations. The SDK handles token acquisition and automatic refresh, but you must configure the required scopes upfront.

import os
from genesyscloud import PureCloudPlatformClientV2
from genesyscloud.auth.client_credentials_flow import ClientCredentialsFlow

def initialize_genesys_platform() -> PureCloudPlatformClientV2:
    platform_client = PureCloudPlatformClientV2()
    
    # Configure OAuth client credentials
    client_id = os.environ["GENESYS_CLIENT_ID"]
    client_secret = os.environ["GENESYS_CLIENT_SECRET"]
    base_url = os.environ.get("GENESYS_BASE_URL", "https://api.mypurecloud.com")
    
    platform_client.set_base_url(base_url)
    auth_flow = ClientCredentialsFlow(
        client_id=client_id,
        client_secret=client_secret,
        scopes=["media:view"]
    )
    platform_client.set_auth_flow(auth_flow)
    
    # Trigger initial token fetch
    platform_client.login()
    return platform_client

The media:view scope grants read access to media records and presigned download URLs. The SDK caches the access token and automatically requests a new one when expiration approaches. You do not need to implement manual token rotation.

Implementation

Step 1: Initialize SDK and Query Media Metadata

The Media API returns a Media object containing file properties, encoding details, and a time-limited presigned URL. You query this endpoint before initiating any download.

from genesyscloud import MediaApi
from typing import Optional

def get_media_metadata(platform_client: PureCloudPlatformClientV2, media_id: str) -> Optional[dict]:
    media_api = MediaApi(platform_client)
    
    try:
        response = media_api.get_media(media_id)
        return {
            "id": response.id,
            "file_name": response.file_name,
            "content_type": response.content_type,
            "file_size": response.file_size,
            "md5": response.md5,
            "download_url": response.download_url,
            "created_date": response.created_date.isoformat() if response.created_date else None
        }
    except Exception as e:
        print(f"Failed to retrieve media metadata: {e}")
        return None

Expected Response Structure:

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "file_name": "call_recording_2024_10_15.wav",
  "content_type": "audio/wav",
  "file_size": 52428800,
  "md5": "d41d8cd98f00b204e9800998ecf8427e",
  "download_url": "https://s3.amazonaws.com/genesys-media-us-east-1/recording/...?AWSAccessKeyId=...&Signature=...&Expires=...",
  "created_date": "2024-10-15T14:32:00Z"
}

Error Handling: A 403 response indicates missing media:view scope. A 404 response means the media ID is invalid or the record was purged. The SDK raises PureCloudSdkException with the HTTP status code embedded in the message.
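The status handling described above could be centralized in one small helper. The sketch below is only an illustration: describe_media_error is a hypothetical name, and it takes a plain integer status code so that it makes no assumption about how the SDK exception exposes that code.

```python
from typing import Optional

# Hypothetical mapping from HTTP status to the causes listed in this guide.
_MEDIA_ERROR_HINTS = {
    401: "token expired or client credentials are wrong",
    403: "media:view scope is missing from the OAuth client",
    404: "media ID is invalid or the record was purged",
    429: "rate limit hit; retry after a delay",
}

def describe_media_error(status: Optional[int]) -> str:
    """Return a short, human-readable hint for a failed metadata request."""
    if status is None:
        return "no HTTP status available; check network connectivity"
    if status in _MEDIA_ERROR_HINTS:
        return _MEDIA_ERROR_HINTS[status]
    if 500 <= status < 600:
        return "transient server error; retry later"
    return f"unexpected HTTP status {status}"
```

Logging the returned hint next to the media ID makes it easier to tell a permissions problem from a purged record when scanning logs.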

Step 2: Configure Retry Logic and Streaming Session

Transferring large audio files can trigger HTTP 429 Too Many Requests when Genesys Cloud enforces rate limits. You must attach a retry adapter with exponential backoff to the session. The SDK does not support streaming or range headers, so you switch to requests for the download phase.

import requests
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

def create_resilient_session() -> requests.Session:
    session = requests.Session()
    
    retry_strategy = Retry(
        total=5,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"]
    )
    
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    
    return session

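The backoff_factor makes the wait grow exponentially. urllib3 sends the first retry without delay and then sleeps backoff_factor * 2^(n-1) seconds before the nth consecutive retry, so with backoff_factor=1 the waits are roughly 0s, 2s, 4s, 8s, and 16s (a Retry-After header on a 429 or 503 response takes precedence when present). The status_forcelist ensures only transient errors trigger retries. Other error statuses such as 401 or 403 are returned immediately without a retry.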
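To see how the retry waits add up, the schedule can be modeled in a few lines of plain Python. This is a rough sketch, not a call into urllib3: backoff_schedule is a hypothetical helper, and it assumes urllib3's documented rule that the first retry is sent without delay, later waits are backoff_factor * 2 ** (n - 1), and the result is capped at a maximum (120 seconds is assumed here). It ignores Retry-After headers and jitter.

```python
from typing import List

def backoff_schedule(retries: int, backoff_factor: float, backoff_max: float = 120.0) -> List[float]:
    """Model the wait in seconds before each consecutive retry."""
    waits = []
    for attempt in range(1, retries + 1):
        if attempt == 1:
            # Assumed behavior: the first retry is issued immediately.
            waits.append(0.0)
        else:
            waits.append(float(min(backoff_max, backoff_factor * 2 ** (attempt - 1))))
    return waits

print(backoff_schedule(5, 1))  # [0.0, 2.0, 4.0, 8.0, 16.0]
```

Under these assumptions, total=5 with backoff_factor=1 adds up to about 30 seconds of waiting before the request fails, which is worth comparing against the lifetime of the presigned URL.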

Step 3: Stream with Range Headers and Validate MD5

You construct streaming requests with Range headers to support partial content retrieval. The server responds with HTTP 206 and a Content-Range header. You compute the MD5 checksum incrementally to avoid loading the entire file into memory.

import hashlib
import gzip
import os
from typing import Tuple

def stream_media_with_validation(
    session: requests.Session,
    download_url: str,
    expected_md5: str,
    output_path: str,
    chunk_size: int = 1024 * 1024
) -> Tuple[bool, str]:
    md5_hash = hashlib.md5()
    bytes_downloaded = 0
    
    headers = {"Range": "bytes=0-"}
    
    response = session.get(download_url, headers=headers, stream=True)
    response.raise_for_status()
    
    actual_size = int(response.headers.get("Content-Length", 0))
    
    with gzip.open(output_path, "wb", compresslevel=6) as f_out:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                f_out.write(chunk)
                md5_hash.update(chunk)
                bytes_downloaded += len(chunk)
                
    computed_md5 = md5_hash.hexdigest()
    is_valid = computed_md5 == expected_md5
    
    return is_valid, computed_md5

Why Range Headers Matter: Frontend audio players and mobile applications request specific byte ranges to seek or buffer. The Range: bytes=0- header requests the entire file, but you can modify the header to Range: bytes=1024-2048 for partial requests. The server responds with Content-Range: bytes 1024-2048/52428800.
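To make the byte arithmetic concrete, here is a minimal sketch of building a Range request value and parsing the Content-Range value that comes back. Both build_range_header and parse_content_range are illustrative helpers, not part of requests or the SDK, and the sketch handles single ranges only.

```python
import re
from typing import Optional, Tuple

_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")

def build_range_header(start: int, end: Optional[int] = None) -> str:
    """Build a single-range Range header value; both offsets are inclusive."""
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    return f"bytes={start}-" if end is None else f"bytes={start}-{end}"

def parse_content_range(value: str) -> Tuple[int, int, Optional[int]]:
    """Parse 'bytes first-last/total'; total is None when sent as '*'."""
    match = _CONTENT_RANGE.match(value.strip())
    if not match:
        raise ValueError(f"unrecognized Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)

first, last, total = parse_content_range("bytes 1024-2048/52428800")
print(last - first + 1)  # 1025, because both ends of the range are inclusive
```

Note the off-by-one: bytes=1024-2048 asks for 1025 bytes, not 1024. A resumed download would pass the number of bytes already written as start and leave end unset.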

MD5 Validation: Genesys Cloud calculates the checksum on upload. You verify integrity immediately after download. If the checksums mismatch, the file is corrupted or the presigned URL expired and returned a cached error document.
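Because the file is written through gzip, the checksum applies to the original audio bytes and not to the .gz archive on disk. If you want to re-verify a stored archive later, one possible approach is to hash the decompressed stream in chunks. The helper name md5_of_gzip_archive below is hypothetical.

```python
import gzip
import hashlib

def md5_of_gzip_archive(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash the decompressed contents of a .gz file without loading it whole."""
    digest = hashlib.md5()
    with gzip.open(path, "rb") as f_in:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f_in.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing this value with the md5 field from the metadata confirms the archive still decompresses to the file that was uploaded. Hashing the .gz file directly would produce a different digest.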

Step 4: Compress Assets and Generate Availability Report

You compress media assets using gzip encoding to reduce storage footprint. The pipeline aggregates metadata into a JSON availability report for content management systems.

import json
from datetime import datetime
from typing import List

def generate_availability_report(media_records: List[dict]) -> str:
    report = {
        "generated_at": datetime.utcnow().isoformat(),
        "total_files": len(media_records),
        "total_size_bytes": sum(m.get("file_size", 0) for m in media_records),
        "files": []
    }
    
    for record in media_records:
        report["files"].append({
            "id": record["id"],
            "file_name": record["file_name"],
            "status": "available" if record.get("download_url") else "unavailable",
            "size_bytes": record.get("file_size"),
            "md5": record.get("md5"),
            "created_date": record.get("created_date")
        })
    
    return json.dumps(report, indent=2)

The report excludes presigned URLs to prevent credential leakage. You store the JSON alongside the compressed media archive. Content management systems parse the report to track storage utilization and playback readiness.
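The same caution applies to log lines and error messages, where a presigned URL can leak just as easily as in a report. A minimal sketch of one way to handle this is to strip the query string, which carries the signature, before writing the URL anywhere. The name redact_presigned_url and the example.com URL are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit

def redact_presigned_url(url: str) -> str:
    """Drop the query string and fragment, which carry the signature."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

print(redact_presigned_url("https://example.com/recording/a.wav?Signature=abc&Expires=1"))
# https://example.com/recording/a.wav
```

The redacted form still identifies which object was requested, which is usually enough for debugging.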

Step 5: Expose Streaming Proxy for Frontend Playback Integration

Frontend players cannot authenticate directly with Genesys Cloud. You expose a Flask endpoint that accepts range requests, forwards them to the presigned URL, and pipes the response back with correct headers.

from flask import Flask, request, Response, jsonify

app = Flask(__name__)

@app.route("/api/stream/<media_id>")
def proxy_media_stream(media_id: str) -> Response:
    platform_client = initialize_genesys_platform()
    metadata = get_media_metadata(platform_client, media_id)
    
    if not metadata or not metadata.get("download_url"):
        return jsonify({"error": "Media not found or download URL expired"}), 404
    
    session = create_resilient_session()
    range_header = request.headers.get("Range")
    
    proxy_headers = {}
    if range_header:
        proxy_headers["Range"] = range_header
    
    response = session.get(metadata["download_url"], headers=proxy_headers, stream=True)
    
    if response.status_code == 206:
        return Response(
            response.iter_content(chunk_size=1024 * 1024),
            status=206,
            content_type=metadata["content_type"],
            headers={
                "Content-Range": response.headers.get("Content-Range", ""),
                "Accept-Ranges": "bytes",
                "Content-Length": response.headers.get("Content-Length", ""),
                "ETag": response.headers.get("ETag", "")
            }
        )
    
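    # Pass the upstream status through so errors (for example 403 or 416)
    # are not reported to the player as a successful 200 response
    return Response(
        response.iter_content(chunk_size=1024 * 1024),
        status=response.status_code,
        content_type=metadata["content_type"],
        headers={
            "Accept-Ranges": "bytes",
            "Content-Length": response.headers.get("Content-Length", ""),
            "ETag": response.headers.get("ETag", "")
        }
    )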

The proxy preserves HTTP 206 responses so frontend players can seek accurately. You attach Accept-Ranges: bytes to signal that partial requests are supported. The ETag header enables browser caching validation.
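One detail to watch in the proxy: response.headers.get(name, "") forwards an empty header value when the upstream response did not set that header. A possible alternative, sketched below with a hypothetical helper named passthrough_headers, copies only the headers that are actually present. It works on any mapping of header names to values, so it is independent of Flask and requests.

```python
from typing import Dict, Iterable, Mapping

def passthrough_headers(upstream: Mapping[str, str], names: Iterable[str]) -> Dict[str, str]:
    """Copy only the named headers that the upstream response actually set."""
    # Header names are case-insensitive, so compare them in lowercase.
    lowered = {key.lower(): value for key, value in upstream.items()}
    forwarded = {}
    for name in names:
        value = lowered.get(name.lower())
        if value:  # skip missing or empty values
            forwarded[name] = value
    return forwarded

print(passthrough_headers({"content-length": "10", "ETag": ""}, ["Content-Length", "ETag"]))
# {'Content-Length': '10'}
```

In the proxy, the result could be passed as the headers argument of Response, with Accept-Ranges added separately since the proxy sets that header itself.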

Complete Working Example

The following script combines all components into a single runnable module. Set the GENESYS_CLIENT_ID and GENESYS_CLIENT_SECRET environment variables (and optionally GENESYS_BASE_URL and MEDIA_IDS) before execution.

import os
import hashlib
import gzip
import json
import logging
from typing import Optional, Tuple, List
import requests
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter
from flask import Flask, request, Response, jsonify
from genesyscloud import PureCloudPlatformClientV2
from genesyscloud.auth.client_credentials_flow import ClientCredentialsFlow
from genesyscloud import MediaApi

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")

def initialize_genesys_platform() -> PureCloudPlatformClientV2:
    platform_client = PureCloudPlatformClientV2()
    client_id = os.environ["GENESYS_CLIENT_ID"]
    client_secret = os.environ["GENESYS_CLIENT_SECRET"]
    base_url = os.environ.get("GENESYS_BASE_URL", "https://api.mypurecloud.com")
    platform_client.set_base_url(base_url)
    auth_flow = ClientCredentialsFlow(
        client_id=client_id,
        client_secret=client_secret,
        scopes=["media:view"]
    )
    platform_client.set_auth_flow(auth_flow)
    platform_client.login()
    return platform_client

def get_media_metadata(platform_client: PureCloudPlatformClientV2, media_id: str) -> Optional[dict]:
    media_api = MediaApi(platform_client)
    try:
        response = media_api.get_media(media_id)
        return {
            "id": response.id,
            "file_name": response.file_name,
            "content_type": response.content_type,
            "file_size": response.file_size,
            "md5": response.md5,
            "download_url": response.download_url,
            "created_date": response.created_date.isoformat() if response.created_date else None
        }
    except Exception as e:
        logging.error(f"Failed to retrieve media metadata: {e}")
        return None

def create_resilient_session() -> requests.Session:
    session = requests.Session()
    retry_strategy = Retry(
        total=5,
        backoff_factor=1,
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET"]
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session

def stream_media_with_validation(
    session: requests.Session,
    download_url: str,
    expected_md5: str,
    output_path: str,
    chunk_size: int = 1024 * 1024
) -> Tuple[bool, str]:
    md5_hash = hashlib.md5()
    
    headers = {"Range": "bytes=0-"}
    response = session.get(download_url, headers=headers, stream=True)
    response.raise_for_status()
    
    with gzip.open(output_path, "wb", compresslevel=6) as f_out:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                f_out.write(chunk)
                md5_hash.update(chunk)
    
    computed_md5 = md5_hash.hexdigest()
    return computed_md5 == expected_md5, computed_md5

def generate_availability_report(media_records: List[dict]) -> str:
    report = {
        "generated_at": datetime.utcnow().isoformat(),
        "total_files": len(media_records),
        "total_size_bytes": sum(m.get("file_size", 0) for m in media_records),
        "files": [
            {
                "id": m["id"],
                "file_name": m["file_name"],
                "status": "available" if m.get("download_url") else "unavailable",
                "size_bytes": m.get("file_size"),
                "md5": m.get("md5")
            }
            for m in media_records
        ]
    }
    return json.dumps(report, indent=2)

app = Flask(__name__)

@app.route("/api/stream/<media_id>")
def proxy_media_stream(media_id: str) -> Response:
    platform_client = initialize_genesys_platform()
    metadata = get_media_metadata(platform_client, media_id)
    
    if not metadata or not metadata.get("download_url"):
        return jsonify({"error": "Media not found or download URL expired"}), 404
    
    session = create_resilient_session()
    range_header = request.headers.get("Range")
    proxy_headers = {"Range": range_header} if range_header else {}
    
    response = session.get(metadata["download_url"], headers=proxy_headers, stream=True)
    
    if response.status_code == 206:
        return Response(
            response.iter_content(chunk_size=1024 * 1024),
            status=206,
            content_type=metadata["content_type"],
            headers={
                "Content-Range": response.headers.get("Content-Range", ""),
                "Accept-Ranges": "bytes",
                "Content-Length": response.headers.get("Content-Length", ""),
                "ETag": response.headers.get("ETag", "")
            }
        )
    
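    # Pass the upstream status through so errors (for example 403 or 416)
    # are not reported to the player as a successful 200 response
    return Response(
        response.iter_content(chunk_size=1024 * 1024),
        status=response.status_code,
        content_type=metadata["content_type"],
        headers={
            "Accept-Ranges": "bytes",
            "Content-Length": response.headers.get("Content-Length", ""),
            "ETag": response.headers.get("ETag", "")
        }
    )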

if __name__ == "__main__":
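    # Import the class, not the module: generate_availability_report() calls datetime.utcnow()
    from datetime import datetime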
    
    media_ids = os.environ.get("MEDIA_IDS", "a1b2c3d4-e5f6-7890-abcd-ef1234567890").split(",")
    platform_client = initialize_genesys_platform()
    session = create_resilient_session()
    records = []
    
    for mid in media_ids:
        metadata = get_media_metadata(platform_client, mid.strip())
        if metadata and metadata.get("download_url"):
            records.append(metadata)
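            # basename() keeps an unexpected path in file_name from escaping /tmp
            output_file = os.path.join("/tmp", os.path.basename(metadata["file_name"]) + ".gz")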
            is_valid, computed = stream_media_with_validation(
                session, metadata["download_url"], metadata["md5"], output_file
            )
            logging.info(f"Downloaded {metadata['file_name']} | Valid: {is_valid}")
    
    report = generate_availability_report(records)
    with open("/tmp/media_availability_report.json", "w") as f:
        f.write(report)
    
    logging.info("Starting streaming proxy on port 5000")
    app.run(host="0.0.0.0", port=5000)

Common Errors & Debugging

Error: HTTP 401 Unauthorized

What causes it: The OAuth token expired, the client credentials are incorrect, or the media:view scope is missing.
How to fix it: Verify environment variables. Restart the script to trigger a fresh token fetch. Confirm the OAuth client in the Genesys Cloud admin console includes media:view.
Code showing the fix: The SDK handles token refresh automatically. If 401 persists, call platform_client.login() explicitly before the request.

Error: HTTP 403 Forbidden

What causes it: The OAuth client lacks permissions to access the media record, or the record belongs to an organization the client cannot read.
How to fix it: Assign a role with media read permissions to the OAuth client. Verify the media:view scope is attached to the client.
Code showing the fix: No code change required. Adjust the client's roles and permissions in the Genesys Cloud admin UI and retest.

Error: HTTP 429 Too Many Requests

What causes it: Genesys Cloud enforces rate limits on media download endpoints. Concurrent streaming requests exceed the threshold.
How to fix it: The retry adapter handles this automatically. Reduce concurrent workers or increase the backoff_factor.
Code showing the fix: The Retry configuration in create_resilient_session() already implements exponential backoff for 429. urllib3 has no built-in retry event hook, so if you need visibility, subclass Retry, override increment(), and log a warning there (for example logging.warning("Rate limited, retrying")) before calling the parent implementation.

Error: HTTP 5xx Server Errors

What causes it: Transient backend failures in Genesys Cloud media storage or presigned URL generation.
How to fix it: The retry adapter covers 500, 502, 503, and 504. If failures persist beyond five retries, the storage backend is degraded. Wait and retry later.
Code showing the fix: The status_forcelist in the Retry strategy covers 500, 502, 503, and 504. No additional code is required.

Error: MD5 Checksum Mismatch

What causes it: Network corruption, incomplete download, or the presigned URL returned a fallback error page instead of the audio file.
How to fix it: Verify the Content-Type header is an audio type such as audio/wav or audio/mpeg. If the server returns text/html, the URL expired. Regenerate the metadata and download again.
Code showing the fix: Add a header validation step before streaming: if "audio" not in response.headers.get("Content-Type", ""): raise ValueError("Invalid content type returned").
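A slightly stricter form of that check compares only the media type and ignores parameters such as charset. The sketch below uses a hypothetical helper named is_audio_content_type and assumes the storage backend labels audio files with an audio/* type. If it serves them as a generic binary type instead, this check would reject valid files and needs adjusting.

```python
def is_audio_content_type(content_type: str) -> bool:
    """True when the media type (ignoring parameters) is audio/*."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("audio/")

print(is_audio_content_type("audio/wav"))                 # True
print(is_audio_content_type("text/html; charset=utf-8"))  # False
```

Calling it on response.headers.get("Content-Type", "") right after raise_for_status() lets the download fail before any bytes are written to disk.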
