We have a Node.js service consuming Genesys Cloud webhooks for conversation analytics. The endpoint is registered via the UI. When the service is healthy, it logs the event and returns 200 OK. The problem happens during brief network hiccups or when our downstream database is slow. The Genesys Cloud platform retries the webhook, but if our server is down for more than a few minutes, the event is lost. We want to implement a Dead Letter Queue (DLQ) pattern to catch these failures.
I tried adding a try-catch block around the database insert. If it fails, I push the payload to an SQS queue. The issue is that Genesys Cloud expects an immediate 2xx response. If I push to SQS and then return 200, the platform assumes success. If the SQS push fails, I return 500, which triggers a retry. This seems correct, but I am seeing duplicate events in my analytics because the platform retries on any 5xx, and my SQS push is sometimes flaky.
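To make the retries harmless, I am considering deduplicating on an event identifier. This is a minimal sketch, assuming each payload carries a unique string `id` (I have not confirmed the field name for Genesys payloads). The in-memory Map stands in for a Postgres table with a unique constraint, and `createIdempotentStore` and `saveOnce` are names I made up for illustration:

```javascript
// In-memory stand-in for a table with a UNIQUE constraint on the event id.
// In Postgres a similar effect would come from INSERT ... ON CONFLICT DO NOTHING.
function createIdempotentStore() {
  const rows = new Map();
  return {
    // Returns true if the event was newly stored, false if it was a duplicate.
    saveOnce(event) {
      // Assumption: the payload exposes a unique string `id`.
      if (typeof event.id !== 'string' || event.id === '') {
        throw new TypeError('event.id is required for deduplication');
      }
      if (rows.has(event.id)) return false;
      rows.set(event.id, event);
      return true;
    },
    // Number of distinct events stored so far.
    count() {
      return rows.size;
    },
  };
}
```

With something like this in place, a retried delivery of the same event would be stored once no matter how many times Genesys resends it.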
Here is the simplified handler:
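app.post('/webhook/genesys', async (req, res) => {
  // Declared outside the try block so the catch block can still see it.
  const payload = req.body;
  try {
    await db.saveEvent(payload); // This can time out
    res.status(200).send('OK');
  } catch (err) {
    // Fall back to the DLQ. If this also throws, nothing in this handler catches it.
    await sqs.sendToDeadLetterQueue(payload);
    res.status(500).send('Processing failed');
  }
});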
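The logic feels wrong. As written, the handler returns 500 even when the DLQ push succeeds, so Genesys retries an event I have already queued, which would explain some of the duplicates. And if sqs.sendToDeadLetterQueue itself fails, the event is in neither Postgres nor SQS, and I am relying entirely on the platform retry to get it back. The platform will retry, but my local state is messy.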
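What I think I want instead is to pick the status code based on whether some durable store accepted the event. This is a rough sketch, not tested against the real services: `handleEvent`, `saveEvent`, and `sendToDlq` are hypothetical names, and the two functions are injected so the sketch runs without Postgres or SQS:

```javascript
// Decide the HTTP status from whether *some* durable store accepted the event.
// saveEvent and sendToDlq are hypothetical async functions that reject on failure.
async function handleEvent(payload, { saveEvent, sendToDlq }) {
  try {
    await saveEvent(payload);
    return { status: 200, storedIn: 'db' };
  } catch (dbErr) {
    try {
      // Primary store failed, so try the dead letter queue.
      await sendToDlq(payload);
      return { status: 200, storedIn: 'dlq' };
    } catch (dlqErr) {
      // Neither store has the event: ask the sender to retry.
      return { status: 500, storedIn: null };
    }
  }
}
```

In the Express route this would become something like `const { status } = await handleEvent(req.body, deps); res.sendStatus(status);`, so a 5xx only goes back when neither store has the event.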
Environment:
- Node.js 18 Express server
- AWS SQS for DLQ
- Genesys Cloud Webhook v2
- Postgres for primary storage
Is there a standard way to handle this? Should I buffer the requests in memory and process them asynchronously? I don’t want to block the webhook response. The documentation says to return 200 as soon as possible. But how do I ensure durability? I’m stuck on the error handling strategy.
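The alternative I keep coming back to is to enqueue first and let a separate worker do the database write. This is a minimal sketch under my own assumptions: `createIngest`, `drain`, `enqueue`, and `saveEvent` are hypothetical names, `enqueue` stands in for an SQS send, and the array of message bodies stands in for a batch received from the queue:

```javascript
// ingest: persist the raw payload to a durable queue, then acknowledge.
// `enqueue` is a hypothetical async function that rejects if the write fails.
function createIngest(enqueue) {
  return async function ingest(payload) {
    try {
      await enqueue(JSON.stringify(payload));
      return 200; // a durable copy exists, safe to acknowledge
    } catch (err) {
      return 503; // nothing durable yet, let the sender retry
    }
  };
}

// drain: write each queued message to the database.
// `saveEvent` is a hypothetical async function wrapping the Postgres insert.
// Bodies whose save fails are returned so the caller can leave them on the queue.
async function drain(messages, saveEvent) {
  const failed = [];
  for (const body of messages) {
    try {
      await saveEvent(JSON.parse(body));
    } catch (err) {
      failed.push(body);
    }
  }
  return failed;
}
```

The webhook handler would only call `ingest` and return its status, so a slow database never delays the response. The worker can retry failed messages on its own schedule. Is this roughly the standard approach, or am I missing a failure mode?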