Handling File Attachments in Genesys Cloud Web Messaging: MIME Validation and Size Limits
What You Will Build
- You will build a server-side middleware component that validates incoming file attachments from Genesys Cloud Web Messaging before processing them.
- This tutorial uses the Genesys Cloud api/v2/conversations/messaging/participants endpoint and the api/v2/analytics/conversations/details/query endpoint to retrieve attachment metadata.
- The code examples are provided in Python using the official genesyscloud SDK and raw HTTP requests (via httpx) for granular control.
Prerequisites
- OAuth Client Type: Service Account or Confidential Client with webchat:participant and webchat:conversation:read scopes.
- SDK Version: genesyscloud Python SDK v2.15.0 or later.
- Runtime: Python 3.9+ with pip.
- Dependencies: pip install genesyscloud httpx python-multipart python-magic
- Web Messaging Configuration: A Genesys Cloud Web Messaging widget configured to allow file attachments in the Admin Console (Experience > Web Messaging > Settings > Attachments).
Authentication Setup
Genesys Cloud APIs require OAuth 2.0 Bearer tokens. For server-side integration, the Client Credentials flow is the standard pattern. You must cache the token and refresh it before expiration to avoid 401 Unauthorized errors during long-running file processing jobs.
The following Python snippet demonstrates a robust token retrieval mechanism using httpx. This replaces the internal SDK token caching for demonstration purposes, allowing you to see the exact HTTP flow.
import httpx
import time
from typing import Optional, Dict


class GenesysAuth:
    def __init__(self, client_id: str, client_secret: str, org_id: str):
        self.client_id = client_id
        self.client_secret = client_secret
        self.org_id = org_id
        self.token_url = f"https://{org_id}.mypurecloud.com/oauth/token"
        self.access_token: Optional[str] = None
        self.token_expiry: float = 0

    def get_token(self) -> str:
        # Return cached token if not expired
        if self.access_token and time.time() < self.token_expiry:
            return self.access_token

        # Request new token
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        data = {
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret
        }
        response = httpx.post(self.token_url, headers=headers, data=data)
        response.raise_for_status()
        token_data = response.json()
        self.access_token = token_data["access_token"]
        # Cache for slightly less than the actual expiry to avoid edge-case 401s
        self.token_expiry = time.time() + (token_data["expires_in"] - 10)
        return self.access_token

    def get_headers(self) -> Dict[str, str]:
        return {
            "Authorization": f"Bearer {self.get_token()}",
            "Content-Type": "application/json"
        }
Required Scope: webchat:participant is required to read participant details and attachment metadata.
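To make the caching rule concrete, here is a minimal sketch of the expiry arithmetic in isolation. The names `TokenCache`, `fetch`, and `clock` are hypothetical and introduced only for illustration; `fetch` stands in for the HTTP call made in `get_token`, and the injectable clock exists so the behavior can be checked without waiting for real time to pass.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches a token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        safety_margin: float = 10.0,
        clock: Callable[[], float] = time.time,
    ):
        # fetch() is assumed to return (access_token, expires_in_seconds).
        self._fetch = fetch
        self._margin = safety_margin
        self._clock = clock
        self._token: Optional[str] = None
        self._expiry = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expiry:
            token, expires_in = self._fetch()
            self._token = token
            # Never let the margin push the expiry into the past.
            self._expiry = now + max(expires_in - self._margin, 0.0)
        return self._token
```

With a 60-second token and a 10-second margin, the cache serves the same token for 50 seconds and then calls `fetch` again.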
Implementation
Step 1: Retrieving Conversation and Attachment Metadata
When a customer uploads a file in Web Messaging, Genesys Cloud does not send the file content directly to your webhook immediately. Instead, it creates a message event with an attachment object. This object contains metadata: fileName, contentType (MIME type), size (bytes), and a url pointing to the temporary storage of the file.
You must retrieve this metadata to validate the file before downloading it.
import httpx


class GenesysMessagingHandler:
    def __init__(self, auth: GenesysAuth):
        self.auth = auth
        self.base_url = f"https://{auth.org_id}.mypurecloud.com/api/v2"

    def get_conversation_details(self, conversation_id: str, _retry: bool = True) -> dict:
        """
        Retrieves detailed conversation history including attachment metadata.
        """
        url = f"{self.base_url}/conversations/messaging/{conversation_id}"
        try:
            response = httpx.get(url, headers=self.auth.get_headers())
            response.raise_for_status()
            return response.json()
        except httpx.HTTPStatusError as e:
            if e.response.status_code == 401 and _retry:
                # Token rejected: drop the cached token and retry exactly once.
                # A single retry avoids unbounded recursion on a persistent 401.
                self.auth.access_token = None
                return self.get_conversation_details(conversation_id, _retry=False)
            raise

    def extract_attachments(self, conversation_data: dict) -> list:
        """
        Parses the conversation object to find all attachments.
        """
        attachments = []
        participants = conversation_data.get("participants", [])
        for participant in participants:
            # Filter for customer-side participants only if needed
            if participant.get("direction") == "outbound":
                continue
            messages = participant.get("messages", [])
            for msg in messages:
                if msg.get("type") == "attachment":
                    attachments.append(msg.get("attachment", {}))
        return attachments
Expected Response Structure:
The attachment object within a message looks like this:
{
  "id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "fileName": "invoice_final.pdf",
  "contentType": "application/pdf",
  "size": 245890,
  "url": "https://<org-id>.mypurecloud.com/api/v2/conversations/messaging/attachments/a1b2c3d4...",
  "thumbUrl": "https://<org-id>.mypurecloud.com/api/v2/conversations/messaging/attachments/a1b2c3d4.../thumb"
}
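Working with raw dictionaries makes it easy to mistype a key, so it can help to convert the payload into a typed object first. The following is a sketch only: the `Attachment` dataclass and `parse_attachment` helper are hypothetical names, and the field names are taken from the sample object above rather than from a verified schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Attachment:
    id: str
    file_name: str
    content_type: str
    size: int
    url: Optional[str]


def parse_attachment(raw: Dict[str, Any]) -> Attachment:
    """Build an Attachment from a raw payload, tolerating missing fields."""
    size = raw.get("size")
    # bool is a subclass of int in Python, so exclude it explicitly.
    size_ok = isinstance(size, int) and not isinstance(size, bool)
    return Attachment(
        id=str(raw.get("id", "")),
        file_name=str(raw.get("fileName", "")),
        content_type=str(raw.get("contentType", "")),
        # A missing or non-integer size becomes -1 so callers can reject it.
        size=size if size_ok else -1,
        url=raw.get("url"),
    )
```

A caller can then treat `size < 0` or `url is None` as an immediate rejection before any validation rules run.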
Step 2: Validating MIME Types and Size Limits
Genesys Cloud enforces its own server-side limits: the default maximum file size is often 5 MB or 10 MB depending on configuration, and the allowed MIME types are configurable in the Admin Console. Your application logic should still enforce its own, stricter business rules. For example, you may accept only .pdf and .jpg files for security reasons, or cap the size at 2 MB to fit your internal storage.
This step implements the validation logic.
import os


class FileValidator:
    # Define allowed MIME types and their extensions
    ALLOWED_MIMES = {
        "application/pdf": [".pdf"],
        "image/jpeg": [".jpg", ".jpeg"],
        "image/png": [".png"],
        "text/plain": [".txt"]
    }
    MAX_FILE_SIZE = 2 * 1024 * 1024  # 2 MB in bytes

    @classmethod
    def validate_attachment(cls, attachment: dict) -> tuple[bool, str]:
        """
        Validates an attachment object against business rules.
        Returns (is_valid, error_message).
        """
        file_name = attachment.get("fileName", "")
        content_type = attachment.get("contentType", "")
        file_size = attachment.get("size", 0)

        # 1. Validate File Size
        if file_size > cls.MAX_FILE_SIZE:
            return False, f"File size {file_size} bytes exceeds limit of {cls.MAX_FILE_SIZE} bytes."

        # 2. Validate MIME Type
        if content_type not in cls.ALLOWED_MIMES:
            return False, f"MIME type '{content_type}' is not allowed. Allowed: {list(cls.ALLOWED_MIMES.keys())}."

        # 3. Validate File Extension matches MIME Type (Basic Spoofing Check)
        _, ext = os.path.splitext(file_name)
        ext = ext.lower()
        if ext not in cls.ALLOWED_MIMES.get(content_type, []):
            return False, f"File extension '{ext}' does not match MIME type '{content_type}'."

        return True, "Valid"
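The MIME check in this step is an exact string comparison, so a value such as `Image/JPEG` or `text/plain; charset=utf-8` would be rejected even though its media type is on the allow list. Whether Genesys Cloud ever reports values in that form is not confirmed here; if it does in your environment, a small normalization step before the lookup may help. The helper name `normalize_mime` is hypothetical.

```python
def normalize_mime(content_type: str) -> str:
    """Lower-case a Content-Type value and drop any parameters."""
    # Everything after the first ';' is a parameter such as charset.
    media_type = content_type.split(";", 1)[0]
    return media_type.strip().lower()


def is_allowed(content_type: str, allowed: set) -> bool:
    """Check a raw Content-Type value against a set of bare media types."""
    return normalize_mime(content_type) in allowed
```

For example, `is_allowed("Image/JPEG; charset=binary", {"image/jpeg"})` evaluates to `True`, while an empty string is still rejected.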
Critical Note on MIME Spoofing:
A user can rename a .exe file to .pdf. The Genesys Cloud API reports the contentType provided by the client browser. The fileName is also provided by the client. Never trust these fields alone for security. Always validate the actual file content after download. The code above checks consistency between name and MIME, but Step 3 handles the actual content verification.
Step 3: Downloading and Verifying File Content
Once metadata passes validation, download the file using the url provided in the attachment object. This URL is a temporary, signed URL that expires. You must download the file immediately upon receipt of the event.
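The `size` field is metadata, so the size limit is only truly enforced if the download itself is capped. The sketch below shows one way to do that independently of any HTTP client: it consumes an iterable of byte chunks and aborts as soon as the running total exceeds the limit. The names `read_capped` and `FileTooLargeError` are hypothetical.

```python
from typing import Iterable


class FileTooLargeError(Exception):
    """Raised when the downloaded body exceeds the configured limit."""


def read_capped(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Concatenate chunks, aborting as soon as the total exceeds max_bytes."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) > max_bytes:
            # Stop reading immediately instead of buffering the whole body.
            raise FileTooLargeError(f"Body exceeds {max_bytes} bytes")
    return bytes(buffer)
```

With httpx, the chunks could come from `response.iter_bytes()` inside a `with httpx.stream("GET", url) as response:` block, so an oversized body is abandoned partway through rather than fully loaded into memory.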
After downloading, verify the actual magic bytes (file signature) to ensure the content matches the expected MIME type.
import httpx
import magic  # python-magic library for file signature detection


class FileProcessor:
    def __init__(self, auth: GenesysAuth):
        self.auth = auth

    def download_and_verify(self, attachment: dict) -> tuple[bytes, bool, str]:
        """
        Downloads the file and verifies its actual content signature.
        Returns (file_bytes, is_valid, error_message).
        """
        attachment_url = attachment.get("url")
        expected_mime = attachment.get("contentType")
        if not attachment_url:
            return b"", False, "Missing attachment URL."

        try:
            # Download the file
            # Note: The attachment URL is pre-signed, so no auth header is needed for the download itself
            response = httpx.get(attachment_url, timeout=30.0)
            response.raise_for_status()
            file_bytes = response.content
        except httpx.HTTPError as e:
            # httpx.HTTPError covers both transport failures (RequestError)
            # and non-2xx responses raised by raise_for_status (HTTPStatusError).
            return b"", False, f"Download failed: {str(e)}"

        try:
            # Verify Magic Bytes
            detected_mime = magic.from_buffer(file_bytes, mime=True)
        except Exception as e:
            return b"", False, f"Verification failed: {str(e)}"

        if detected_mime != expected_mime:
            return b"", False, f"MIME mismatch. Expected {expected_mime}, detected {detected_mime}."
        return file_bytes, True, "Downloaded and verified."
Dependencies:
You need python-magic installed. On Linux, ensure libmagic is installed (sudo apt-get install libmagic1). On macOS, brew install libmagic.
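Where libmagic cannot be installed, the idea behind signature checking can still be illustrated with a hand-written table. The following is a deliberately small sketch, not a replacement for python-magic: it recognizes only the leading bytes of PDF, JPEG, and PNG files, and the name `sniff_mime` is hypothetical.

```python
from typing import Optional

# Leading byte signatures for three common types.
SIGNATURES = {
    "application/pdf": b"%PDF-",
    "image/jpeg": b"\xff\xd8\xff",
    "image/png": b"\x89PNG\r\n\x1a\n",
}


def sniff_mime(data: bytes) -> Optional[str]:
    """Return the MIME type whose signature the data starts with, if any."""
    for mime, signature in SIGNATURES.items():
        if data.startswith(signature):
            return mime
    # Unknown signature: the caller should treat this as a rejection.
    return None
```

A Windows executable renamed to `invoice.pdf` begins with the bytes `MZ`, so `sniff_mime` returns `None` for it regardless of the reported name or content type.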
Complete Working Example
This script ties together authentication, retrieval, validation, and download. It assumes you have a webhook listener or a poller that provides the conversation_id.
import os
import sys
import httpx
import time
import magic
from typing import Dict, Optional, Tuple


# --- Authentication Module ---
class GenesysAuth:
    def __init__(self, client_id: str, client_secret: str, org_id: str):
        self.client_id = client_id
        self.client_secret = client_secret
        self.org_id = org_id
        self.token_url = f"https://{org_id}.mypurecloud.com/oauth/token"
        self.access_token: Optional[str] = None
        self.token_expiry: float = 0

    def get_token(self) -> str:
        if self.access_token and time.time() < self.token_expiry:
            return self.access_token
        headers = {"Content-Type": "application/x-www-form-urlencoded"}
        data = {
            "grant_type": "client_credentials",
            "client_id": self.client_id,
            "client_secret": self.client_secret
        }
        response = httpx.post(self.token_url, headers=headers, data=data)
        response.raise_for_status()
        token_data = response.json()
        self.access_token = token_data["access_token"]
        self.token_expiry = time.time() + (token_data["expires_in"] - 10)
        return self.access_token

    def get_headers(self) -> Dict[str, str]:
        return {
            "Authorization": f"Bearer {self.get_token()}",
            "Content-Type": "application/json"
        }
# --- Validation Module ---
class FileValidator:
    ALLOWED_MIMES = {
        "application/pdf": [".pdf"],
        "image/jpeg": [".jpg", ".jpeg"],
        "image/png": [".png"]
    }
    MAX_FILE_SIZE = 5 * 1024 * 1024  # 5 MB

    @classmethod
    def validate_metadata(cls, attachment: dict) -> Tuple[bool, str]:
        file_name = attachment.get("fileName", "")
        content_type = attachment.get("contentType", "")
        file_size = attachment.get("size", 0)

        if file_size > cls.MAX_FILE_SIZE:
            return False, f"Size limit exceeded: {file_size} > {cls.MAX_FILE_SIZE}"
        if content_type not in cls.ALLOWED_MIMES:
            return False, f"Blocked MIME type: {content_type}"

        _, ext = os.path.splitext(file_name)
        ext = ext.lower()
        if ext not in cls.ALLOWED_MIMES.get(content_type, []):
            return False, f"Extension {ext} mismatch for {content_type}"
        return True, "OK"
# --- Main Handler ---
def process_attachment(conversation_id: str, auth: GenesysAuth) -> None:
    base_url = f"https://{auth.org_id}.mypurecloud.com/api/v2"
    url = f"{base_url}/conversations/messaging/{conversation_id}"

    # 1. Fetch Conversation
    try:
        resp = httpx.get(url, headers=auth.get_headers())
        resp.raise_for_status()
        conv_data = resp.json()
    except httpx.HTTPStatusError as e:
        print(f"Failed to fetch conversation: {e.response.status_code}")
        return

    # 2. Extract Attachments
    attachments = []
    for participant in conv_data.get("participants", []):
        for msg in participant.get("messages", []):
            # Skip attachment messages that carry no attachment object
            if msg.get("type") == "attachment" and msg.get("attachment"):
                attachments.append(msg["attachment"])
    if not attachments:
        print("No attachments found in this conversation.")
        return

    # 3. Process Each Attachment
    for att in attachments:
        print(f"Processing: {att.get('fileName', '<unnamed>')}")

        # Validate Metadata
        is_valid, msg = FileValidator.validate_metadata(att)
        if not is_valid:
            print(f"  [REJECTED] {msg}")
            continue

        # Download and Verify Content
        try:
            file_resp = httpx.get(att["url"], timeout=30.0)
            file_resp.raise_for_status()
            file_bytes = file_resp.content

            # Check Magic Bytes
            detected_mime = magic.from_buffer(file_bytes, mime=True)
            expected_mime = att["contentType"]
            if detected_mime == expected_mime:
                print("  [SUCCESS] Verified. Saving file...")
                # fileName is client-supplied: keep only the base name so a value
                # such as "../../x.pdf" cannot escape the working directory.
                safe_name = os.path.basename(att["fileName"]) or "attachment.bin"
                # Save to disk
                with open(safe_name, "wb") as f:
                    f.write(file_bytes)
            else:
                print(f"  [REJECTED] MIME spoofing detected. Expected {expected_mime}, got {detected_mime}")
        except Exception as e:
            print(f"  [ERROR] {str(e)}")
if __name__ == "__main__":
    # Configuration
    CLIENT_ID = os.getenv("GENESYS_CLIENT_ID")
    CLIENT_SECRET = os.getenv("GENESYS_CLIENT_SECRET")
    ORG_ID = os.getenv("GENESYS_ORG_ID")
    CONVERSATION_ID = os.getenv("TEST_CONVERSATION_ID")

    if not all([CLIENT_ID, CLIENT_SECRET, ORG_ID, CONVERSATION_ID]):
        print("Missing environment variables.")
        sys.exit(1)

    auth = GenesysAuth(CLIENT_ID, CLIENT_SECRET, ORG_ID)
    process_attachment(CONVERSATION_ID, auth)
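Because `fileName` is supplied by the client, it deserves the same suspicion as the content type before it is used as a path on disk. The sketch below shows one conservative way to derive a storage name; `safe_filename` is a hypothetical helper, and the character allow list and random prefix are design choices rather than requirements.

```python
import os
import re
import uuid


def safe_filename(client_name: str) -> str:
    """Return a name that is safe to join onto a storage directory."""
    # Treat backslashes as separators too, then keep only the last component.
    base = os.path.basename(client_name.replace("\\", "/"))
    # Replace anything outside a conservative character set, and drop
    # leading dots so the result is never hidden or a relative reference.
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    # Fall back to a generic name if nothing usable remains.
    base = base or "attachment"
    # Prefix a random token so two uploads with the same name cannot collide.
    return f"{uuid.uuid4().hex}_{base}"
```

The random prefix also means an attacker cannot predict or overwrite the path of a previously stored file by reusing its name.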
Common Errors & Debugging
Error: 403 Forbidden on Attachment URL
- Cause: The attachment URL is a temporary, pre-signed URL. It expires after a short period (usually 15-30 minutes) or after a single use.
- Fix: Ensure you download the file immediately after receiving the webhook or polling the conversation. Do not store the URL for later use. If the URL has expired, you must re-fetch the conversation details to get a new URL.
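The re-fetch step can be sketched as a small loop that asks for a fresh URL before each attempt. Both callables here are hypothetical stand-ins: `get_url` would re-read the conversation and return the current attachment URL, and `download` would perform the HTTP GET and raise `UrlExpiredError` when it sees a 403.

```python
from typing import Callable, Optional


class UrlExpiredError(Exception):
    """Raised by the download callable when the URL is rejected (e.g. 403)."""


def download_with_refresh(
    get_url: Callable[[], Optional[str]],
    download: Callable[[str], bytes],
    max_attempts: int = 2,
) -> bytes:
    """Try to download, fetching a fresh URL before every attempt."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        url = get_url()  # re-fetch conversation details for a fresh URL
        if not url:
            raise LookupError("Attachment URL not found")
        try:
            return download(url)
        except UrlExpiredError as exc:
            # Remember the failure and loop to obtain a new URL.
            last_error = exc
    raise RuntimeError("Attachment URL kept expiring") from last_error
```

Bounding the number of attempts keeps a permanently invalid attachment from turning into an endless polling loop.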
Error: 429 Too Many Requests
- Cause: You are polling the conversation endpoint too frequently or processing many files in parallel without respecting rate limits.
- Fix: Implement exponential backoff.
import time

import httpx


def fetch_with_retry(url, headers, max_retries=3):
    for i in range(max_retries):
        resp = httpx.get(url, headers=headers)
        if resp.status_code != 429:
            return resp
        # Exponential backoff: 1s, 2s, 4s, ...
        wait_time = 2 ** i
        print(f"Rate limited. Waiting {wait_time}s...")
        time.sleep(wait_time)
    # All retries were rate limited; return the last 429 response to the caller
    return resp
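Two common refinements to plain exponential backoff are random jitter, which keeps parallel workers from retrying in lockstep, and honoring a `Retry-After` header when the response carries one. The sketch below computes only the delay, so it can be dropped into a retry loop like the one above; `backoff_delay` and its parameters are hypothetical, and only the numeric (seconds) form of `Retry-After` is handled.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 30.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Honor a numeric Retry-After header when the server provides one.
    if retry_after is not None:
        try:
            return min(max(float(retry_after), 0.0), cap)
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to computed backoff
    # Full jitter: pick uniformly between 0 and the capped exponential delay.
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

In a loop using httpx, the header value could be read with `resp.headers.get("Retry-After")` and passed straight through as `retry_after`.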
Error: magic.from_buffer returns application/octet-stream
- Cause: The file is encrypted, compressed, or has an unrecognized signature.
- Fix: If you expect a PDF and magic returns application/octet-stream, reject the file. It is likely not a valid PDF. Do not attempt to force-interpret it.
Error: MIME Type Mismatch Between Client and Server
- Cause: The browser reported image/png but the file content is image/jpeg.
- Fix: This is a client-side error or a spoofing attempt. Reject the file. Do not trust the contentType from the Genesys API payload alone.