Handle Customer File Uploads in Genesys Cloud Web Messaging: MIME Validation and Size Limits
What You Will Build
- A server-side middleware component that intercepts incoming file attachments from Genesys Cloud Web Messaging channels.
- Validation logic that enforces strict MIME type allowlists and file size limits before storing the file in persistent storage.
- A Python implementation using the requests library to download the file from the temporary URL provided in the webhook payload.
Prerequisites
- Genesys Cloud Environment: An active Genesys Cloud organization with Web Messaging enabled.
- Webhook Configuration: An outbound webhook configured in Genesys Cloud for the conversation:message event or specifically for file attachments.
- Python Runtime: Python 3.8 or higher.
- Dependencies:
  - requests: For HTTP operations.
  - python-magic: For robust MIME type detection (requires the libmagic system library).
  - fastapi or flask: For hosting the webhook endpoint (this tutorial uses a generic HTTP handler structure compatible with both).
- Required OAuth Scopes: None for the webhook receiver, as the webhook is pushed from Genesys. However, if you need to reply to the conversation, you will need conversation:message:write or conversation:write.
Authentication Setup
This tutorial focuses on the webhook ingestion side of the equation. Genesys Cloud pushes data to your server; your server does not need to authenticate to Genesys to receive the webhook.
However, to prevent spoofing, you should verify the source of every request. Genesys Cloud webhooks do not include a standard HMAC signature header by default for all event types, but you can secure your endpoint using one of two methods:
- IP Whitelisting: Restrict your endpoint to only accept traffic from Genesys Cloud IP ranges.
- Secret Token in Header: Configure a custom header in your Genesys Cloud webhook definition that contains a shared secret, and validate this header in your code.
For this tutorial, we assume a shared secret approach for simplicity and security.
import os
import hmac

SHARED_SECRET = os.environ.get("GENESYS_WEBHOOK_SECRET", "your-secret-key")


def verify_webhook_source(headers: dict, payload_body: bytes) -> bool:
    """
    Validates that the request originated from Genesys Cloud.

    Note: Genesys Cloud standard webhooks do not sign the body by default,
    so payload_body is not inspected here.
    This function checks for a custom header 'X-Genesys-Secret' if configured.
    """
    # If you configured a custom header in the webhook definition:
    provided_secret = headers.get("X-Genesys-Secret")
    if not provided_secret:
        # Fallback: If no header, you might rely on IP whitelisting at the firewall level.
        # For this code example, we will proceed but log a warning.
        print("Warning: No security header found. Ensure IP whitelisting is enabled.")
        return True
    # Compare in constant time so the check does not leak timing information.
    return hmac.compare_digest(provided_secret.encode(), SHARED_SECRET.encode())
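The IP whitelisting option can also be checked in application code as a second layer behind the firewall. The following is a minimal sketch using the standard-library ipaddress module. The CIDR ranges are reserved documentation ranges used as placeholders (an assumption, not real Genesys Cloud ranges), and is_allowed_source is a hypothetical helper name.

```python
import ipaddress
from typing import Iterable

# Placeholder ranges (RFC 5737 documentation blocks), NOT real Genesys Cloud ranges.
# Replace them with the ranges published for your Genesys Cloud region.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed_source(client_ip: str, networks: Iterable = ALLOWED_NETWORKS) -> bool:
    """Return True if client_ip falls inside one of the allowed networks."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        # Malformed address strings are rejected outright.
        return False
    return any(address in network for network in networks)
```

If the endpoint sits behind a reverse proxy or load balancer, the socket address is the proxy's, so the client address has to come from a forwarding header that the proxy itself sets and that you trust.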
Implementation
Step 1: Parse the Webhook Payload and Extract File Metadata
When a customer uploads a file in Web Messaging, Genesys Cloud sends a webhook event. The payload contains a temporary URL to the file and metadata about the file.
The relevant part of the JSON payload looks like this:
{
  "event": "conversation:message",
  "data": {
    "id": "conversation-id-message-id",
    "type": "message",
    "channel_id": "channel-id",
    "text": "Please see the attached document.",
    "attachments": [
      {
        "id": "attachment-id",
        "name": "invoice.pdf",
        "mimeType": "application/pdf",
        "size": 124500,
        "url": "https://files.us-east-1.genesiscloud.com/...temporary-url...",
        "expiresAt": "2023-10-27T12:00:00.000Z"
      }
    ]
  }
}
Critical Note: The url provided is temporary and stops working at the time given in expiresAt. Download the file as soon as the webhook arrives. Do not store the URL for later use.
Here is the code to parse the incoming request and extract the attachment details.
from typing import List


class AttachmentData:
    def __init__(self, name: str, mime_type: str, size: int, url: str, attachment_id: str):
        self.name = name
        self.mime_type = mime_type
        self.size = size
        self.url = url
        self.attachment_id = attachment_id


def extract_attachments(payload: dict) -> List[AttachmentData]:
    """
    Extracts attachment objects from the Genesys Cloud webhook payload.
    """
    data = payload.get("data", {})
    attachments_list = data.get("attachments", [])
    parsed_attachments = []
    for att in attachments_list:
        parsed_attachments.append(AttachmentData(
            name=att.get("name", "unknown"),
            mime_type=att.get("mimeType", "application/octet-stream"),
            size=att.get("size", 0),
            url=att.get("url", ""),
            attachment_id=att.get("id", "")
        ))
    return parsed_attachments
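Because the URL expires, it can help to check expiresAt before attempting a download, so that an expired link is reported clearly instead of surfacing as a failed request. This is a minimal sketch that assumes the field is an ISO 8601 UTC timestamp as in the sample payload above. The helper name is_url_still_valid and the five-second safety margin are illustrative choices, not part of any Genesys API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_url_still_valid(attachment: dict, now: Optional[datetime] = None,
                       safety_margin: timedelta = timedelta(seconds=5)) -> bool:
    """Return True if the attachment's temporary URL has not expired yet.

    `attachment` is the raw attachment dict from the webhook payload.
    A malformed timestamp raises ValueError.
    """
    expires_raw = attachment.get("expiresAt")
    if not expires_raw:
        # No expiry information: let the download attempt decide.
        return True
    # fromisoformat() on older Python versions does not accept a trailing "Z".
    expires_at = datetime.fromisoformat(expires_raw.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now + safety_margin < expires_at
```

The optional now parameter exists only to make the function easy to test with a fixed clock.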
Step 2: Validate MIME Type and File Size
This is the core security and compliance step. You must define an allowlist of acceptable MIME types and a maximum file size.
Genesys Cloud Web Messaging supports a wide range of MIME types, but your application may only need specific ones (e.g., PDFs for invoices, images for ID verification). Relying on the mimeType field in the webhook payload is insufficient because it can be spoofed by the client. You must verify the actual content of the file after downloading it.
We will use the python-magic package (imported as magic) to detect the MIME type from the file bytes.
import magic
from typing import Tuple

# Configuration
ALLOWED_MIME_TYPES = {
    "application/pdf",
    "image/png",
    "image/jpeg",
    "image/gif",
    "text/plain"
}
MAX_FILE_SIZE_BYTES = 5 * 1024 * 1024  # 5 MB


# typing.Tuple keeps the annotation valid on Python 3.8.
def validate_file_content(file_bytes: bytes, declared_mime_type: str) -> Tuple[bool, str]:
    """
    Validates the downloaded file bytes against security policies.

    Returns:
        tuple: (is_valid, error_message)
    """
    # 1. Check Size
    if len(file_bytes) > MAX_FILE_SIZE_BYTES:
        return False, f"File size {len(file_bytes)} exceeds limit of {MAX_FILE_SIZE_BYTES} bytes."
    # 2. Detect Actual MIME Type
    # magic.from_buffer analyzes the binary signature of the file
    detected_mime = magic.from_buffer(file_bytes, mime=True)
    # 3. Verify against Allowlist
    if detected_mime not in ALLOWED_MIME_TYPES:
        allowed = ', '.join(sorted(ALLOWED_MIME_TYPES))
        return False, f"File type '{detected_mime}' is not allowed. Allowed types: {allowed}."
    # 4. (Optional) Verify consistency with declared type
    # It is common for clients to mislabel files.
    # If strict consistency is required, uncomment the following:
    # if detected_mime != declared_mime_type:
    #     return False, f"MIME mismatch: declared '{declared_mime_type}' but detected '{detected_mime}'."
    return True, "Valid"
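To make "binary signature" concrete, here is a small standard-library illustration of the idea behind content sniffing: matching the first bytes of the data against known file signatures. It is a sketch for understanding only, not a replacement for libmagic, which knows far more formats and edge cases. SIGNATURES and sniff_mime are hypothetical names, and text/plain is left out because plain text has no fixed signature.

```python
from typing import Optional

# Leading byte signatures ("magic numbers") for the PDF and image types in the allowlist.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_mime(file_bytes: bytes) -> Optional[str]:
    """Return the MIME type whose signature matches the start of the data, or None."""
    for signature, mime_type in SIGNATURES:
        if file_bytes.startswith(signature):
            return mime_type
    return None
```

A Windows executable renamed to invoice.pdf and declared as application/pdf starts with the bytes MZ, so sniff_mime returns None for it regardless of the declared type, which is why the declared mimeType alone cannot be trusted.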
Step 3: Download and Store the File
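Now we combine the parsing, validation, and storage logic. We will use requests to download the file. For simplicity, this version reads the whole response body into memory before validating it, which is acceptable because Web Messaging limits are usually small; for larger files, stream the response and stop once the size limit is exceeded.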
import requests
import logging

logger = logging.getLogger(__name__)


def process_attachment(attachment: AttachmentData) -> dict:
    """
    Downloads the attachment, validates it, and returns status.
    In a real scenario, you would save file_bytes to S3, Azure Blob, or local disk here.
    """
    # 1. Download the file
    try:
        # Genesys temporary URLs are typically accessible without auth
        # but are short-lived.
        response = requests.get(attachment.url, timeout=10)
        response.raise_for_status()
        file_bytes = response.content
    except requests.exceptions.RequestException as e:
        # AttachmentData stores the identifier as attachment_id, not id.
        logger.error(f"Failed to download attachment {attachment.attachment_id}: {e}")
        return {"status": "error", "message": "Download failed", "attachment_id": attachment.attachment_id}
    # 2. Validate the file
    is_valid, validation_message = validate_file_content(file_bytes, attachment.mime_type)
    if not is_valid:
        logger.warning(f"Attachment {attachment.attachment_id} rejected: {validation_message}")
        # Optional: Send a message back to the customer informing them of the rejection
        # send_rejection_message(channel_id, validation_message)
        return {"status": "rejected", "message": validation_message, "attachment_id": attachment.attachment_id}
    # 3. Save the file (Placeholder)
    # save_to_storage(file_bytes, attachment.name, attachment.mime_type)
    logger.info(f"Successfully processed and saved attachment {attachment.attachment_id}: {attachment.name}")
    return {"status": "success", "message": "File processed", "attachment_id": attachment.attachment_id, "size": len(file_bytes)}


def handle_webhook(payload: dict) -> List[dict]:
    """
    Main handler for the webhook.
    """
    attachments = extract_attachments(payload)
    results = []
    for att in attachments:
        result = process_attachment(att)
        results.append(result)
    return results
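Since the size check in Step 2 only runs after the whole body has been downloaded, an oversized file still consumes memory and bandwidth first. One way to avoid that, sketched below, is to stream the response and abort as soon as the running total passes the limit. FileTooLargeError, read_capped, and download_capped are hypothetical names introduced for this example; requests.get(..., stream=True) and Response.iter_content are the standard requests streaming calls.

```python
from typing import Iterable


class FileTooLargeError(Exception):
    """Raised when a download exceeds the configured size limit."""


def read_capped(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Concatenate chunks, aborting as soon as the total exceeds max_bytes."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) > max_bytes:
            raise FileTooLargeError(f"Download exceeded {max_bytes} bytes")
    return bytes(buffer)


def download_capped(url: str, max_bytes: int, timeout: int = 10) -> bytes:
    """Stream a URL with requests and stop early if it is larger than max_bytes."""
    import requests  # imported here so read_capped stays dependency-free

    # stream=True defers the body download until iter_content is consumed.
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        return read_capped(response.iter_content(chunk_size=8192), max_bytes)
```

In process_attachment, download_capped(attachment.url, MAX_FILE_SIZE_BYTES) could stand in for the requests.get call, with FileTooLargeError mapped to a "rejected" result. At most one extra chunk beyond the limit is read before the download stops.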
Complete Working Example
This is a complete, runnable FastAPI application that exposes a webhook endpoint. It includes the validation logic, download mechanism, and error handling.
Prerequisites:
pip install fastapi uvicorn requests python-magic
import os
import hmac
import logging
from typing import List, Tuple
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse
import requests
import magic
import json

# Configure Logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI(title="Genesys Web Messaging File Handler")

# Configuration
ALLOWED_MIME_TYPES = {
    "application/pdf",
    "image/png",
    "image/jpeg",
    "image/gif"
}
MAX_FILE_SIZE_BYTES = 5 * 1024 * 1024  # 5 MB
WEBHOOK_SECRET = os.environ.get("GENESYS_WEBHOOK_SECRET", "dev-secret")


class AttachmentData:
    def __init__(self, name: str, mime_type: str, size: int, url: str, attachment_id: str):
        self.name = name
        self.mime_type = mime_type
        self.size = size
        self.url = url
        self.attachment_id = attachment_id


def extract_attachments(payload: dict) -> List[AttachmentData]:
    data = payload.get("data", {})
    attachments_list = data.get("attachments", [])
    parsed_attachments = []
    for att in attachments_list:
        parsed_attachments.append(AttachmentData(
            name=att.get("name", "unknown"),
            mime_type=att.get("mimeType", "application/octet-stream"),
            size=att.get("size", 0),
            url=att.get("url", ""),
            attachment_id=att.get("id", "")
        ))
    return parsed_attachments


def validate_file_content(file_bytes: bytes, declared_mime_type: str) -> Tuple[bool, str]:
    if len(file_bytes) > MAX_FILE_SIZE_BYTES:
        return False, f"File size {len(file_bytes)} exceeds limit of {MAX_FILE_SIZE_BYTES} bytes."
    detected_mime = magic.from_buffer(file_bytes, mime=True)
    if detected_mime not in ALLOWED_MIME_TYPES:
        return False, f"File type '{detected_mime}' is not allowed."
    return True, "Valid"


def process_attachment(attachment: AttachmentData) -> dict:
    try:
        response = requests.get(attachment.url, timeout=10)
        response.raise_for_status()
        file_bytes = response.content
    except requests.exceptions.RequestException as e:
        logger.error(f"Failed to download attachment {attachment.attachment_id}: {e}")
        return {"status": "error", "message": "Download failed", "attachment_id": attachment.attachment_id}
    is_valid, validation_message = validate_file_content(file_bytes, attachment.mime_type)
    if not is_valid:
        logger.warning(f"Attachment {attachment.attachment_id} rejected: {validation_message}")
        return {"status": "rejected", "message": validation_message, "attachment_id": attachment.attachment_id}
    # Simulate saving to storage
    logger.info(f"Saved {attachment.name} ({len(file_bytes)} bytes)")
    return {"status": "success", "message": "File processed", "attachment_id": attachment.attachment_id}


@app.post("/webhook/genesys/messaging")
async def webhook_receiver(request: Request):
    # 1. Verify Security Header (constant-time comparison; a missing header is rejected)
    secret = request.headers.get("X-Genesys-Secret", "")
    if not hmac.compare_digest(secret.encode(), WEBHOOK_SECRET.encode()):
        raise HTTPException(status_code=403, detail="Forbidden: Invalid Secret")
    # 2. Parse Payload
    try:
        body = await request.json()
    except json.JSONDecodeError:
        raise HTTPException(status_code=400, detail="Invalid JSON")
    # 3. Process Attachments
    attachments = extract_attachments(body)
    if not attachments:
        return JSONResponse(content={"message": "No attachments found"}, status_code=200)
    results = []
    for att in attachments:
        result = process_attachment(att)
        results.append(result)
    return JSONResponse(content={"results": results}, status_code=200)


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
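The example above only simulates saving to storage. When you replace that placeholder, avoid building the storage path from the customer-supplied file name, since it can contain path separators or a misleading extension. The following is one possible sketch that derives the path from the attachment ID and the detected MIME type instead. EXTENSION_BY_MIME, build_storage_path, and the base_dir argument are hypothetical names for illustration.

```python
import re
from pathlib import Path

# Extension chosen from the detected MIME type, never from the client-supplied name.
EXTENSION_BY_MIME = {
    "application/pdf": ".pdf",
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
}


def build_storage_path(base_dir: str, attachment_id: str, detected_mime: str) -> Path:
    """Build a storage path from the attachment ID and the detected MIME type."""
    extension = EXTENSION_BY_MIME.get(detected_mime)
    if extension is None:
        raise ValueError(f"No extension mapping for MIME type {detected_mime!r}")
    # Replace every character outside a conservative set, which removes
    # path separators and dots that could be used for directory traversal.
    safe_id = re.sub(r"[^A-Za-z0-9_-]", "_", attachment_id)
    if not safe_id:
        raise ValueError("attachment_id is empty")
    return Path(base_dir) / f"{safe_id}{extension}"
```

For example, build_storage_path("uploads", "att-123", "image/jpeg") yields a path whose file name is att-123.jpg. The original name can still be kept as metadata next to the stored object if agents need to see it.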
Common Errors & Debugging
Error: 403 Forbidden on File Download
What causes it: The temporary URL provided in the webhook payload has expired or is being accessed from an IP address that is not whitelisted (if you have strict network policies).
How to fix it:
- Ensure your webhook handler processes the event immediately. Do not queue the webhook for hours.
- Check the expiresAt field in the webhook payload. Retrying the download can help with transient failures, but it cannot recover a URL that has already expired, because Genesys does not provide a refresh token for these URLs.
- Verify that your server can reach files.us-east-1.genesiscloud.com (or your region’s equivalent).
Error: magic.from_buffer throws MagicException
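What causes it: The libmagic system library is not installed on the server, or python-magic cannot find it. Depending on the python-magic version, the same problem can also surface earlier as an ImportError when import magic runs.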
How to fix it:
- On Ubuntu/Debian: sudo apt-get install libmagic1
- On CentOS/RHEL: sudo yum install file-libs
- On macOS: brew install libmagic
Error: File Size Mismatch
What causes it: The size field in the webhook payload does not match the actual content length of the downloaded file.
How to fix it: This is usually a benign discrepancy caused by compression or metadata. However, if the difference is significant, verify that the Content-Length header in the requests.get response matches the downloaded bytes. If the file is truncated, increase the timeout parameter in requests.get.
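The Content-Length comparison can be sketched as a small helper. content_length_matches is a hypothetical name, and the function accepts any mapping of headers; with requests you would pass response.headers, which is case-insensitive, whereas a plain dict must use the exact key "Content-Length".

```python
from typing import Mapping, Optional


def content_length_matches(headers: Mapping[str, str], body: bytes) -> Optional[bool]:
    """Compare the Content-Length header with the number of bytes received.

    Returns None when the header is missing or not a number, because no
    comparison is possible in that case.
    """
    raw_value = headers.get("Content-Length")
    if raw_value is None:
        return None
    try:
        expected = int(raw_value)
    except ValueError:
        return None
    return expected == len(body)
```

One caveat: requests transparently decompresses gzip and deflate responses, so when the server sends a Content-Encoding header, len(response.content) can legitimately differ from Content-Length. Treat a mismatch as a signal to investigate, not as proof of truncation.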
Error: 429 Too Many Requests
What causes it: You are downloading files too rapidly from the Genesys file storage endpoints.
How to fix it: Implement exponential backoff in your requests.get call.
import logging
import time

import requests

logger = logging.getLogger(__name__)


def download_with_retry(url: str, max_retries: int = 3) -> bytes:
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response.content
        except requests.exceptions.HTTPError as e:
            # Only retry on rate limiting; re-raise every other HTTP error.
            if e.response is not None and e.response.status_code == 429:
                wait_time = 2 ** attempt  # 1s, 2s, 4s, ...
                logger.warning(f"Rate limited. Waiting {wait_time} seconds before retry {attempt + 1}")
                time.sleep(wait_time)
            else:
                raise
    raise Exception("Max retries exceeded for file download")