We’ve got a Rust service listening to /api/v2/analytics/events and pushing to a custom webhook handler. Genesys Cloud keeps dropping a 502 when our downstream queue spikes. The webhook config is set to retry three times. Events just vanish after that limit hits. I need to spin up a dead letter queue pattern in Tokio to catch the failed payloads before GC marks them as permanently delivered.
Also, should I be catching the 5xx at the webhook listener level or pushing the retry logic into the subscription settings via the PATCH /api/v2/notification/eventsubscriptions endpoint? The payload GC sends doesn’t include a delivery attempt counter. Tracking retries client-side feels messy. Right now the handler looks something like this:
async fn handle_webhook(payload: JsonValue) -> Result<(), AppError> {
    let batch = parse_batch(payload)?;
    for event in batch {
        if let Err(e) = process_event(event).await {
            tracing::warn!("failed to process: {}", e);
        }
    }
    Ok(())
}
The problem is GC expects a 2xx within a two-second window. If the downstream Kafka producer backlogs, the handler stalls waiting on it and we hit a 504 from our load balancer. GC sees that as a 5xx failure and, once the three attempts are used up, stops retrying. I’ve tried adding a bounded channel to offload the processing, but the channel fills up faster than the consumer can drain it. Dropping messages isn’t an option here: we’ve got compliance requirements to keep every agent_state_changed and interaction_created event.
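To make the "bounded channel that must never drop" idea concrete, here is a minimal sketch of what I have in mind: the handler never waits on the channel, and anything that doesn't fit gets spilled to a dead letter store instead of being dropped. Everything named here (Event, DeadLetterStore, Accepted, accept) is hypothetical. It uses std::sync::mpsc only so the snippet compiles without dependencies; tokio::sync::mpsc::Sender::try_send reports a full channel in a similar way (TrySendError::Full). The in-memory Vec is a stand-in and is not durable, so a real version would have to finish a durable write (Redis, local file) before answering 2xx.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hypothetical event envelope; `raw` is the untouched payload body.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub raw: String,
}

/// Stand-in for a durable dead letter store (Redis list, local file, ...).
/// Assumption: an in-memory Vec is used only to keep the sketch self-contained.
#[derive(Default)]
pub struct DeadLetterStore {
    pub entries: Vec<Event>,
}

impl DeadLetterStore {
    pub fn push(&mut self, event: Event) {
        self.entries.push(event);
    }
}

/// What happened to an event when the handler accepted it.
#[derive(Debug, PartialEq)]
pub enum Accepted {
    Queued,
    DeadLettered,
}

/// Never waits on the channel: the event either enters the bounded queue
/// or is spilled to the dead letter store. It is never silently dropped.
pub fn accept(tx: &SyncSender<Event>, dlq: &mut DeadLetterStore, event: Event) -> Accepted {
    match tx.try_send(event) {
        Ok(()) => Accepted::Queued,
        // try_send hands the event back on failure, so it can be preserved.
        Err(TrySendError::Full(ev)) | Err(TrySendError::Disconnected(ev)) => {
            dlq.push(ev);
            Accepted::DeadLettered
        }
    }
}
```

With this shape the handler can return 2xx as soon as accept returns for every event in the batch, and the Kafka producer backlog only affects how fast the queue and the dead letter store drain.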
How are others structuring the dead letter queue in Rust for this exact flow? I’m looking at using tokio::sync::mpsc with a separate retry worker that reads from a Redis list. Worried about event ordering when we replay. The notification_api docs mention idempotency keys, but they’re only exposed on the subscription level, not per event. Anyone got a clean pattern for this? The deadline for the staging rollout is Thursday.
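For the replay ordering worry, this is roughly the retry-worker logic I'm considering, as a sketch under assumptions rather than a finished design. Since there's no per-event idempotency key, it derives one client-side from fields I'm assuming can be pulled out of each payload (entity_id, kind, timestamp_ms are hypothetical names, not the GC schema). It replays in event-time order, skips keys already processed, and holds back later events for an entity once one of its events fails so they don't overtake it. The Redis read is left out; pending is whatever the worker popped from the list.

```rust
use std::collections::HashSet;

/// Hypothetical normalized event; field names are assumptions, not the GC schema.
#[derive(Debug, Clone, PartialEq)]
pub struct StoredEvent {
    pub entity_id: String, // e.g. a user or conversation id
    pub kind: String,      // e.g. "agent_state_changed"
    pub timestamp_ms: u64, // event time taken from the payload
}

impl StoredEvent {
    /// Client-derived idempotency key (assumes this triple is unique per event).
    pub fn dedupe_key(&self) -> String {
        format!("{}:{}:{}", self.kind, self.entity_id, self.timestamp_ms)
    }
}

/// Replays dead-lettered events in event-time order.
/// `seen` holds keys of events already processed; duplicates are skipped.
/// Returns the events that still need another retry.
pub fn replay<F>(
    mut pending: Vec<StoredEvent>,
    seen: &mut HashSet<String>,
    mut process: F,
) -> Vec<StoredEvent>
where
    F: FnMut(&StoredEvent) -> Result<(), String>,
{
    // Stable sort: events with equal timestamps keep their arrival order.
    pending.sort_by_key(|e| e.timestamp_ms);

    let mut blocked: HashSet<String> = HashSet::new();
    let mut still_failed = Vec::new();

    for event in pending {
        let key = event.dedupe_key();
        if seen.contains(&key) {
            continue; // already delivered downstream, skip the duplicate
        }
        if blocked.contains(&event.entity_id) {
            // An earlier event for this entity failed; keep per-entity order.
            still_failed.push(event);
            continue;
        }
        match process(&event) {
            Ok(()) => {
                seen.insert(key);
            }
            Err(_) => {
                blocked.insert(event.entity_id.clone());
                still_failed.push(event);
            }
        }
    }
    still_failed
}
```

In the real worker, seen would have to live somewhere shared and durable (a Redis set, for example) rather than in process memory, and process would be the Kafka publish.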