We are seeing a consistent pattern of 502 Bad Gateway errors when Genesys Cloud attempts to deliver outbound conversation events to our internal webhook endpoint. The endpoint is a simple Flask app hosted on AWS ECS, and while the application logic is sound, the load balancer occasionally drops the connection during peak hours, causing the platform to mark the delivery as failed.
The current setup relies on Genesys Cloud’s built-in retry mechanism, which attempts delivery up to five times with exponential backoff. However, if all retries fail, the event is dropped, and we lose critical data for our downstream analytics pipeline. We need a way to capture these failed events for manual reprocessing or to trigger a separate alert.
We’ve tried implementing a local retry queue in our Flask app, but it cannot help here: when the load balancer fails, the request never reaches the app at all. We need a component that sits between Genesys Cloud and our application, acknowledges receipt of the webhook immediately so that Genesys Cloud does not retry, and stores the payload for later processing.
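To make the "acknowledge, then process later" requirement concrete, here is a minimal sketch of what such a receiver could look like. It assumes a Lambda-style handler that receives a proxy-format event (`body`, `isBase64Encoded`) and an SQS queue as the buffer; `make_handler`, `sqs_enqueue`, and the status codes chosen are hypothetical names and choices for illustration, not anything Genesys Cloud or our current setup defines. The key idea is that 200 is returned only after the payload has been stored, so a storage failure still surfaces as an error the sender can retry.

```python
import base64
import json
from typing import Any, Callable, Dict

Handler = Callable[..., Dict[str, Any]]


def make_handler(enqueue: Callable[[str], None]) -> Handler:
    """Build a handler that stores the raw body first and acknowledges second.

    `enqueue` is any callable that durably stores one message and raises on
    failure (hypothetical interface, injected so the logic can be tested).
    """

    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
        body = event.get("body") or ""
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")

        # Cheap sanity check only; real processing happens downstream.
        try:
            json.loads(body)
        except ValueError:
            return {"statusCode": 400, "body": "invalid JSON"}

        try:
            enqueue(body)
        except Exception:
            # Not stored, so do NOT acknowledge: let the sender retry.
            return {"statusCode": 503, "body": "buffer unavailable"}

        # Stored durably, safe to acknowledge.
        return {"statusCode": 200, "body": "accepted"}

    return handler


def sqs_enqueue(queue_url: str) -> Callable[[str], None]:
    """Return an `enqueue` callable backed by SQS (assumes boto3 is available)."""
    import boto3  # imported lazily so the handler logic has no hard dependency

    sqs = boto3.client("sqs")

    def enqueue(body: str) -> None:
        sqs.send_message(QueueUrl=queue_url, MessageBody=body)

    return enqueue
```

In a deployment the entry point would be something like `handler = make_handler(sqs_enqueue(QUEUE_URL))`, with `QUEUE_URL` read from configuration. Whether a 400 is the right response for a malformed body depends on how the sender treats 4xx responses, which this sketch does not assume.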
Here is the current webhook configuration (HCL, not JSON) we are using via the Terraform CX as Code provider:
resource "genesyscloud_webhook" "outbound_events" {
  name = "Outbound Event Collector"
  uri  = "https://api.internal.company.com/webhooks/genesys"

  request_headers = {
    "Content-Type" = "application/json"
    "X-API-Key"    = var.webhook_api_key
  }

  event_type = "conversation:all"

  delivery_config {
    retry_count    = 5
    retry_interval = 60
  }
}
Is there a recommended pattern for implementing a dead letter queue in this scenario? We are considering using AWS SQS as an intermediate buffer, but we are unsure how to configure the webhook to send directly to SQS, as Genesys Cloud expects an HTTP 200 response from a standard endpoint. Can we use a Lambda function as the webhook target to handle the SQS push, and how do we manage the acknowledgment logic to ensure Genesys Cloud does not retry successfully processed events?
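For the dead letter queue part of the question, one shape we are considering is sketched below. It assumes the buffered payloads sit in an SQS queue that triggers a Lambda consumer, that the event source mapping has `ReportBatchItemFailures` enabled, and that the queue has a redrive policy pointing at a second queue acting as the DLQ. `make_consumer` and `process` are hypothetical names; `process` stands in for whatever forwards an event to the analytics pipeline. The consumer reports only the messages that failed, so SQS redelivers those and, after the configured `maxReceiveCount`, moves them to the DLQ for manual reprocessing or alerting.

```python
import json
from typing import Any, Callable, Dict, List


def make_consumer(process: Callable[[Dict[str, Any]], None]) -> Callable[..., Dict[str, Any]]:
    """Build an SQS-triggered handler that reports per-message failures.

    `process` handles one decoded event and raises on failure
    (hypothetical interface, injected so the logic can be tested).
    """

    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
        failures: List[Dict[str, str]] = []
        for record in event.get("Records", []):
            try:
                process(json.loads(record["body"]))
            except Exception:
                # Report only this message as failed; successfully handled
                # messages in the same batch are deleted from the queue.
                failures.append({"itemIdentifier": record["messageId"]})
        # Partial batch response format expected by the SQS event source.
        return {"batchItemFailures": failures}

    return handler
```

With this split, the webhook-facing side never depends on the health of the ECS service: the acknowledgment depends only on the queue write, and anything that repeatedly fails downstream ends up in the DLQ instead of being dropped.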