Handling File Attachments in Genesys Cloud Web Messaging: MIME Validation and Size Limits

What You Will Build

  • You will build a server-side middleware component that validates incoming file attachments from Genesys Cloud Web Messaging before processing them.
  • This tutorial retrieves attachment metadata from the Genesys Cloud api/v2/conversations/messaging/{conversationId} endpoint, which is the endpoint called by every code example below.
  • The code examples are in Python. They call the REST API directly with httpx rather than through the genesyscloud SDK, which keeps each HTTP request visible.

Prerequisites

  • OAuth Client Type: Service Account or Confidential Client with webchat:participant and webchat:conversation:read scopes.
  • SDK Version: genesyscloud Python SDK v2.15.0 or later.
  • Runtime: Python 3.9+ with pip.
  • Dependencies:
    pip install genesyscloud httpx python-multipart python-magic
    
  • Web Messaging Configuration: A Genesys Cloud Web Messaging widget configured to allow file attachments in the Admin Console (Experience > Web Messaging > Settings > Attachments).

Authentication Setup

Genesys Cloud APIs require OAuth 2.0 Bearer tokens. For server-side integration, the Client Credentials flow is the standard pattern. You must cache the token and refresh it before expiration to avoid 401 Unauthorized errors during long-running file processing jobs.

The following Python snippet demonstrates a robust token retrieval mechanism using httpx. This replaces the internal SDK token caching for demonstration purposes, allowing you to see the exact HTTP flow.

import httpx
import time
from typing import Optional, Dict

class GenesysAuth:
    def __init__(self, client_id: str, client_secret: str, org_id: str):
        self.client_id = client_id
        self.client_secret = client_secret
        self.org_id = org_id
        self.token_url = f"https://{org_id}.mypurecloud.com/oauth/token"
        self.access_token: Optional[str] = None
        self.token_expiry: float = 0

    def get_token(self) -> str:
        # Return cached token if not expired
        if self.access_token and time.time() < self.token_expiry:
            return self.access_token

        # Request new token
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        data = {
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret
        }

        response = httpx.post(self.token_url, headers=headers, data=data)
        response.raise_for_status()
        
        token_data = response.json()
        self.access_token = token_data["access_token"]
        # Cache for slightly less than the actual expiry to avoid edge-case 401s
        self.token_expiry = time.time() + (token_data["expires_in"] - 10)
        
        return self.access_token

    def get_headers(self) -> Dict[str, str]:
        return {
            "Authorization": f"Bearer {self.get_token()}",
            "Content-Type": "application/json"
        }

Required Scope: webchat:participant is required to read participant details and attachment metadata.
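
The caching rule used by get_token can be isolated and tested without any network access. The sketch below is a hypothetical helper, not part of the Genesys SDK: CachedToken, its fetch callback, and the injectable clock are names introduced here for illustration, and the 10-second safety margin mirrors the value in the snippet above.

```python
import time
from typing import Callable, Optional, Tuple


class CachedToken:
    """Caches a token until shortly before it expires (hypothetical helper)."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        clock: Callable[[], float] = time.time,
        safety_margin: float = 10.0,
    ):
        self._fetch = fetch    # returns (access_token, expires_in_seconds)
        self._clock = clock    # injectable so tests can control time
        self._margin = safety_margin
        self._token: Optional[str] = None
        self._expiry = 0.0

    def get(self) -> str:
        # Fetch when nothing is cached or the cached token is at/after its expiry
        if self._token is None or self._clock() >= self._expiry:
            token, expires_in = self._fetch()
            self._token = token
            # Never let the margin push the expiry into the past
            self._expiry = self._clock() + max(expires_in - self._margin, 0.0)
        return self._token

    def invalidate(self) -> None:
        # Call this after a 401 so the next get() requests a fresh token
        self._token = None
```

Injecting the clock and the fetch function keeps the expiry logic independent of httpx, so the same rule can be unit-tested or reused with a different HTTP client.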

Implementation

Step 1: Retrieving Conversation and Attachment Metadata

When a customer uploads a file in Web Messaging, Genesys Cloud does not send the file content directly to your webhook immediately. Instead, it creates a message event with an attachment object. This object contains metadata: fileName, contentType (MIME type), size (bytes), and a url pointing to the temporary storage of the file.

You must retrieve this metadata to validate the file before downloading it.

import httpx
import json

class GenesysMessagingHandler:
    def __init__(self, auth: GenesysAuth):
        self.auth = auth
        self.base_url = f"https://{auth.org_id}.mypurecloud.com/api/v2"

    def get_conversation_details(self, conversation_id: str, _retried: bool = False) -> dict:
        """
        Retrieves detailed conversation history including attachment metadata.
        """
        url = f"{self.base_url}/conversations/messaging/{conversation_id}"
        
        try:
            response = httpx.get(url, headers=self.auth.get_headers())
            response.raise_for_status()
            return response.json()
        except httpx.HTTPStatusError as e:
            if e.response.status_code == 401 and not _retried:
                # Token rejected: clear the cache and retry exactly once,
                # so a persistent 401 cannot recurse forever
                self.auth.access_token = None
                return self.get_conversation_details(conversation_id, _retried=True)
            raise

    def extract_attachments(self, conversation_data: dict) -> list:
        """
        Parses the conversation object to find all attachments.
        """
        attachments = []
        participants = conversation_data.get("participants", [])
        
        for participant in participants:
            # Filter for customer-side participants only if needed
            if participant.get("direction") == "outbound":
                continue
                
            messages = participant.get("messages", [])
            for msg in messages:
                if msg.get("type") == "attachment":
                    attachments.append(msg.get("attachment", {}))
                    
        return attachments

Expected Response Structure:
The attachment object within a message looks like this:

{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "fileName": "invoice_final.pdf",
  "contentType": "application/pdf",
  "size": 245890,
  "url": "https://<org-id>.mypurecloud.com/api/v2/conversations/messaging/attachments/a1b2c3d4...",
  "thumbUrl": "https://<org-id>.mypurecloud.com/api/v2/conversations/messaging/attachments/a1b2c3d4.../thumb"
}
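
Before validating, it can help to convert the raw dictionary into a typed object so that missing or malformed fields fail early. This is a minimal sketch that assumes the field names shown in the JSON above; the Attachment dataclass and the parse_attachment function are hypothetical names, not SDK types.

```python
from dataclasses import dataclass
from typing import Optional

# Fields assumed to be mandatory, based on the example payload above
_REQUIRED = ("id", "fileName", "contentType", "size", "url")


@dataclass(frozen=True)
class Attachment:
    id: str
    file_name: str
    content_type: str
    size: int
    url: str
    thumb_url: Optional[str] = None


def parse_attachment(raw: dict) -> Attachment:
    """Builds an Attachment from a raw payload, raising ValueError on bad input."""
    missing = [key for key in _REQUIRED if key not in raw]
    if missing:
        raise ValueError(f"Attachment is missing fields: {missing}")

    size = raw["size"]
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(size, bool) or not isinstance(size, int) or size < 0:
        raise ValueError(f"Attachment size must be a non-negative integer, got {size!r}")

    return Attachment(
        id=raw["id"],
        file_name=raw["fileName"],
        content_type=raw["contentType"],
        size=size,
        url=raw["url"],
        thumb_url=raw.get("thumbUrl"),
    )
```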

Step 2: Validating MIME Types and Size Limits

Genesys Cloud enforces its own server-side limits: a maximum file size (often 5 MB or 10 MB, depending on configuration) and a list of allowed MIME types, both configurable in the Admin Console. Your application should still enforce its own, stricter business rules. For example, you may accept only .pdf and .jpg files for security reasons, or cap the size at 2 MB to suit your internal storage.

This step implements the validation logic.

import mimetypes
import os

class FileValidator:
    # Define allowed MIME types and their extensions
    ALLOWED_MIMES = {
        "application/pdf": [".pdf"],
        "image/jpeg": [".jpg", ".jpeg"],
        "image/png": [".png"],
        "text/plain": [".txt"]
    }
    
    MAX_FILE_SIZE = 2 * 1024 * 1024  # 2 MB in bytes

    @classmethod
    def validate_attachment(cls, attachment: dict) -> tuple[bool, str]:
        """
        Validates an attachment object against business rules.
        Returns (is_valid, error_message).
        """
        file_name = attachment.get("fileName", "")
        content_type = attachment.get("contentType", "")
        file_size = attachment.get("size", 0)

        # 1. Validate File Size
        if file_size > cls.MAX_FILE_SIZE:
            return False, f"File size {file_size} bytes exceeds limit of {cls.MAX_FILE_SIZE} bytes."

        # 2. Validate MIME Type
        if content_type not in cls.ALLOWED_MIMES:
            return False, f"MIME type '{content_type}' is not allowed. Allowed: {list(cls.ALLOWED_MIMES.keys())}."

        # 3. Validate File Extension matches MIME Type (Basic Spoofing Check)
        _, ext = os.path.splitext(file_name)
        ext = ext.lower()
        
        if ext not in cls.ALLOWED_MIMES.get(content_type, []):
            return False, f"File extension '{ext}' does not match MIME type '{content_type}'."

        return True, "Valid"

Critical Note on MIME Spoofing:
A user can rename a .exe file to .pdf. The Genesys Cloud API reports the contentType provided by the client browser. The fileName is also provided by the client. Never trust these fields alone for security. Always validate the actual file content after download. The code above checks consistency between name and MIME, but Step 3 handles the actual content verification.
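
Because fileName is client-supplied, it should also not be used directly as a path on disk. The following is a minimal sketch of one way to derive a safe storage name; safe_filename, its allowed_exts default, and the random prefix scheme are assumptions made for illustration, not part of any Genesys API.

```python
import os
import re
import uuid

DEFAULT_EXTS = (".pdf", ".jpg", ".jpeg", ".png", ".txt")


def safe_filename(client_name: str, allowed_exts=DEFAULT_EXTS) -> str:
    """Returns a storage-safe name derived from an untrusted file name."""
    # Normalize Windows separators, then drop any directory components
    base = os.path.basename(client_name.replace("\\", "/"))
    stem, ext = os.path.splitext(base)
    ext = ext.lower()
    if ext not in allowed_exts:
        raise ValueError(f"Extension {ext!r} is not allowed")

    # Keep only a conservative character set and bound the length
    stem = re.sub(r"[^A-Za-z0-9_-]", "_", stem).strip("_")[:64] or "file"

    # A random prefix avoids collisions and makes names unguessable
    return f"{uuid.uuid4().hex}_{stem}{ext}"
```

With this approach, a name such as ../../etc/invoice final.pdf is reduced to a random prefix followed by _invoice_final.pdf, so it cannot escape the target directory.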

Step 3: Downloading and Verifying File Content

Once metadata passes validation, download the file using the url provided in the attachment object. This URL is a temporary, signed URL that expires. You must download the file immediately upon receipt of the event.

After downloading, verify the actual magic bytes (file signature) to ensure the content matches the expected MIME type.

import httpx
import magic  # python-magic library for file signature detection

class FileProcessor:
    def __init__(self, auth: GenesysAuth):
        self.auth = auth

    def download_and_verify(self, attachment: dict) -> tuple[bytes, bool, str]:
        """
        Downloads the file and verifies its actual content signature.
        Returns (file_bytes, is_valid, error_message).
        """
        attachment_url = attachment.get("url")
        expected_mime = attachment.get("contentType")
        
        if not attachment_url:
            return b"", False, "Missing attachment URL."

        try:
            # Download the file
            # Note: The attachment URL is pre-signed, so no auth header is needed for the download itself
            response = httpx.get(attachment_url, timeout=30.0)
            response.raise_for_status()
            
            file_bytes = response.content
            
            # Verify Magic Bytes
            detected_mime = magic.from_buffer(file_bytes, mime=True)
            
            if detected_mime != expected_mime:
                return b"", False, f"MIME mismatch. Expected {expected_mime}, detected {detected_mime}."
                
            return file_bytes, True, "Downloaded and verified."
            
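        except httpx.HTTPError as e:
            # httpx.HTTPError covers both transport failures (RequestError)
            # and non-2xx responses raised by raise_for_status()
            return b"", False, f"Download failed: {str(e)}"
        except Exception as e:
            return b"", False, f"Verification failed: {str(e)}"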
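
The snippet above reads the whole response into memory and relies on the size reported in the metadata. As a defensive complement, the download can be capped while it is being read. This is a sketch under assumptions: read_capped and AttachmentTooLarge are hypothetical names, and the function accepts any iterable of byte chunks so that it does not depend on a particular HTTP client.

```python
from typing import Iterable


class AttachmentTooLarge(Exception):
    """Raised when the body exceeds the configured byte limit."""


def read_capped(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Concatenates chunks, aborting as soon as the total would exceed max_bytes."""
    buf = bytearray()
    for chunk in chunks:
        if len(buf) + len(chunk) > max_bytes:
            raise AttachmentTooLarge(f"Body exceeds {max_bytes} bytes")
        buf.extend(chunk)
    return bytes(buf)


# Possible usage with httpx streaming (attachment_url is a placeholder):
#
#   with httpx.stream("GET", attachment_url, timeout=30.0) as response:
#       response.raise_for_status()
#       file_bytes = read_capped(response.iter_bytes(), 2 * 1024 * 1024)
```

Stopping at the limit means an oversized body is never fully buffered, even if the reported size was wrong.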

Dependencies:
You need python-magic installed. On Linux, ensure libmagic is installed (sudo apt-get install libmagic1). On macOS, brew install libmagic.
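
Where libmagic cannot be installed, a small signature table covers the three binary types allowed in this tutorial. This is a deliberately narrow sketch, not a replacement for python-magic: sniff_mime is a hypothetical helper, it recognizes only PDF, JPEG, and PNG by their leading bytes, and it returns None for anything else (including plain text, which has no signature).

```python
from typing import Optional

# Leading-byte signatures for the binary formats accepted in this tutorial
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def sniff_mime(data: bytes) -> Optional[str]:
    """Returns the MIME type implied by the file signature, or None if unknown."""
    for prefix, mime in _SIGNATURES:
        if data.startswith(prefix):
            return mime
    return None
```

A None result should be treated the same way as a mismatch: reject the file rather than guess its type.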

Complete Working Example

This script ties together authentication, retrieval, validation, and download. It assumes you have a webhook listener or a poller that provides the conversation_id.

import os
import sys
import httpx
import time
import magic
from typing import Dict, Optional, Tuple

# --- Authentication Module ---
class GenesysAuth:
    def __init__(self, client_id: str, client_secret: str, org_id: str):
        self.client_id = client_id
        self.client_secret = client_secret
        self.org_id = org_id
        self.token_url = f"https://{org_id}.mypurecloud.com/oauth/token"
        self.access_token: Optional[str] = None
        self.token_expiry: float = 0

    def get_token(self) -> str:
        if self.access_token and time.time() < self.token_expiry:
            return self.access_token

        headers = {"Content-Type": "application/x-www-form-urlencoded"}
        data = {
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret
        }

        response = httpx.post(self.token_url, headers=headers, data=data)
        response.raise_for_status()
        
        token_data = response.json()
        self.access_token = token_data["access_token"]
        self.token_expiry = time.time() + (token_data["expires_in"] - 10)
        
        return self.access_token

    def get_headers(self) -> Dict[str, str]:
        return {
            "Authorization": f"Bearer {self.get_token()}",
            "Content-Type": "application/json"
        }

# --- Validation Module ---
class FileValidator:
    # Same allow-list and size limit as the validator in Step 2
    ALLOWED_MIMES = {
        "application/pdf": [".pdf"],
        "image/jpeg": [".jpg", ".jpeg"],
        "image/png": [".png"],
        "text/plain": [".txt"]
    }
    MAX_FILE_SIZE = 2 * 1024 * 1024  # 2 MB in bytes

    @classmethod
    def validate_metadata(cls, attachment: dict) -> Tuple[bool, str]:
        file_name = attachment.get("fileName", "")
        content_type = attachment.get("contentType", "")
        file_size = attachment.get("size", 0)

        if file_size > cls.MAX_FILE_SIZE:
            return False, f"Size limit exceeded: {file_size} > {cls.MAX_FILE_SIZE}"

        if content_type not in cls.ALLOWED_MIMES:
            return False, f"Blocked MIME type: {content_type}"

        _, ext = os.path.splitext(file_name)
        ext = ext.lower()
        if ext not in cls.ALLOWED_MIMES.get(content_type, []):
            return False, f"Extension {ext} mismatch for {content_type}"

        return True, "OK"

# --- Main Handler ---
def process_attachment(conversation_id: str, auth: GenesysAuth) -> None:
    base_url = f"https://{auth.org_id}.mypurecloud.com/api/v2"
    url = f"{base_url}/conversations/messaging/{conversation_id}"
    
    # 1. Fetch Conversation
    try:
        resp = httpx.get(url, headers=auth.get_headers())
        resp.raise_for_status()
        conv_data = resp.json()
    except httpx.HTTPStatusError as e:
        print(f"Failed to fetch conversation: {e.response.status_code}")
        return

    # 2. Extract Attachments
    attachments = []
    for participant in conv_data.get("participants", []):
        for msg in participant.get("messages", []):
            if msg.get("type") == "attachment":
                attachments.append(msg["attachment"])

    if not attachments:
        print("No attachments found in this conversation.")
        return

    # 3. Process Each Attachment
    for att in attachments:
        print(f"Processing: {att['fileName']}")
        
        # Validate Metadata
        is_valid, msg = FileValidator.validate_metadata(att)
        if not is_valid:
            print(f"  [REJECTED] {msg}")
            continue
        
        # Download and Verify Content
        try:
            file_resp = httpx.get(att["url"], timeout=30.0)
            file_resp.raise_for_status()
            file_bytes = file_resp.content
            
            # Check Magic Bytes
            detected_mime = magic.from_buffer(file_bytes, mime=True)
            expected_mime = att["contentType"]
            
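            if detected_mime == expected_mime:
                print("  [SUCCESS] Verified. Saving file...")
                # fileName is client-supplied: keep only the base name so a value
                # like "../../app.py" cannot write outside the working directory
                safe_name = os.path.basename(att["fileName"].replace("\\", "/"))
                with open(safe_name, "wb") as f:
                    f.write(file_bytes)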
            else:
                print(f"  [REJECTED] MIME spoofing detected. Expected {expected_mime}, got {detected_mime}")
                
        except Exception as e:
            print(f"  [ERROR] {str(e)}")

if __name__ == "__main__":
    # Configuration
    CLIENT_ID = os.getenv("GENESYS_CLIENT_ID")
    CLIENT_SECRET = os.getenv("GENESYS_CLIENT_SECRET")
    ORG_ID = os.getenv("GENESYS_ORG_ID")
    CONVERSATION_ID = os.getenv("TEST_CONVERSATION_ID")

    if not all([CLIENT_ID, CLIENT_SECRET, ORG_ID, CONVERSATION_ID]):
        print("Missing environment variables.")
        sys.exit(1)

    auth = GenesysAuth(CLIENT_ID, CLIENT_SECRET, ORG_ID)
    process_attachment(CONVERSATION_ID, auth)

Common Errors & Debugging

Error: 403 Forbidden on Attachment URL

  • Cause: The attachment URL is a temporary, pre-signed URL. It expires after a short period (usually 15-30 minutes) or after a single use.
  • Fix: Ensure you download the file immediately after receiving the webhook or polling the conversation. Do not store the URL for later use. If the URL has expired, you must re-fetch the conversation details to get a new URL.
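
The re-fetch step in the fix above can be sketched as a small retry loop. The helper below is hypothetical: download_with_refresh, get_url, and fetch are names introduced for illustration. get_url is assumed to re-read the conversation and return the current attachment URL, and fetch is assumed to perform the GET and return a (status_code, body) pair.

```python
from typing import Callable, Optional, Tuple


def download_with_refresh(
    get_url: Callable[[], str],
    fetch: Callable[[str], Tuple[int, bytes]],
    max_attempts: int = 2,
) -> bytes:
    """Downloads an attachment, asking for a fresh URL after each 403."""
    last_status: Optional[int] = None
    for _ in range(max_attempts):
        url = get_url()              # re-resolved on every attempt
        status, body = fetch(url)
        if status == 200:
            return body
        last_status = status
        if status != 403:
            break                    # only an expired URL is worth refreshing
    raise RuntimeError(f"Download failed with HTTP {last_status}")
```

Passing the two operations in as callables keeps the loop independent of how the conversation is fetched, and bounds the number of refresh attempts.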

Error: 429 Too Many Requests

  • Cause: You are polling the conversation endpoint too frequently or processing many files in parallel without respecting rate limits.
  • Fix: Implement exponential backoff.
    import time
    import httpx
    
    def fetch_with_retry(url, headers, max_retries=3):
        for i in range(max_retries):
            resp = httpx.get(url, headers=headers)
            if resp.status_code != 429:
                return resp
            if i < max_retries - 1:
                # Back off 1s, 2s, 4s, ... but do not sleep after the final attempt
                wait_time = 2 ** i
                print(f"Rate limited. Waiting {wait_time}s...")
                time.sleep(wait_time)
        return resp
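
A fixed doubling schedule can be refined in two ways: honoring a Retry-After header when the server supplies one, and adding jitter so parallel workers do not retry in lockstep. The sketch below assumes the 429 response may carry a standard Retry-After header expressed in seconds; retry_delay and its base and cap parameters are hypothetical names chosen for illustration.

```python
import math
import random
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Returns how many seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            seconds = float(retry_after)
        except ValueError:
            seconds = math.nan   # e.g. the HTTP-date form; fall back to backoff
        if math.isfinite(seconds):
            # Trust the server's hint, clamped to a sane range
            return min(max(seconds, 0.0), cap)

    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)]
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)
```

In a retry loop, the value of resp.headers.get("Retry-After") can be passed as retry_after, and the result given to time.sleep.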
    

Error: magic.from_buffer returns application/octet-stream

  • Cause: The file is encrypted, compressed, or has an unrecognized signature.
  • Fix: If you expect a PDF, and magic returns octet-stream, reject the file. It is likely not a valid PDF. Do not attempt to force-interpret it.

Error: MIME Type Mismatch Between Client and Server

  • Cause: The browser reported image/png but the file content is image/jpeg.
  • Fix: This is a client-side error or a spoofing attempt. Reject the file. Do not trust the contentType from the Genesys API payload alone.

Official References