We’ve got a custom agent desktop app built on the embeddable SDK that listens to routing queue member status changes via a Genesys Cloud webhook. The endpoint sits behind Nginx and occasionally throws a 502 Bad Gateway when the upstream Node service restarts. Genesys keeps retrying for a bit, then marks the delivery as failed. I want to catch those 5xx responses, push the payload to a local SQS queue, and retry from there without blocking the webhook thread.
Right now the handler looks like this:
app.post('/webhooks/genesys/events', (req, res) => {
  const payload = req.body;
  if (payload.eventType === 'genesyscloud:routing:queue:memberstatus:updated') {
    processMemberUpdate(payload);
  }
  res.status(200).send();
});
The problem is that processMemberUpdate sometimes blows up with a 503 from our internal screen pop service. I tried wrapping it in a try/catch and returning 200 anyway, but that loses the event if the downstream call actually fails. I need a way to acknowledge Genesys immediately, then queue the raw JSON for retry logic.

I’m thinking of switching to a DLQ pattern where the webhook handler just writes to SQS and returns 200. The logic would go: grab req.headers['x-genesys-webhook-id'], attach it to the body, then run await sqs.sendMessage({ QueueUrl: dlqUrl, MessageBody: JSON.stringify(event) }).promise();. But I’m not sure how to handle duplicate events or preserve the original header for idempotency. The client app SDK doesn’t expose a retry interface, and I don’t want to manually parse the Genesys retry headers.

Has anyone wired up a DLQ for this exact flow? The queue consumer would need to call our internal /screenpop endpoint with a backoff strategy. Not sure how to map the Genesys retry delay without hardcoding intervals. The Retry-After header usually comes in seconds, and the SQS visibility timeout is also set in seconds, so the units match, but I still don’t have sensible values to plug in. Just throwing the raw JSON into the queue feels messy.
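For reference, this is roughly the envelope I have in mind instead of the raw JSON. It is an untested sketch: buildEnvelope and toSendMessageParams are names I made up, and I’m assuming the x-genesys-webhook-id header is actually present on every delivery, which I haven’t confirmed.

```javascript
// Hypothetical helper: wrap the raw webhook body together with the
// delivery id so the consumer has something to dedupe on.
// Assumption: the header name is the one mentioned above.
function buildEnvelope(headers, body) {
  const deliveryId = headers['x-genesys-webhook-id'] || null;
  return {
    deliveryId,                           // null if the header is missing
    receivedAt: new Date().toISOString(), // when our endpoint saw it
    body,                                 // untouched Genesys payload
  };
}

// Hypothetical helper: build the params object for sqs.sendMessage.
function toSendMessageParams(queueUrl, envelope) {
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify(envelope),
  };
}
```

The handler would then call sqs.sendMessage(toSendMessageParams(dlqUrl, buildEnvelope(req.headers, req.body))).promise() and the consumer would read deliveryId back out of the parsed body.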
Genesys Cloud won’t retry if you return anything other than 500-599, so your handler needs to throw immediately.
try {
  await processMessage(body);
} catch (e) {
  console.error(e);
  throw new Error('Processing failed');
}
Don’t block the request with SQS logic.
The advice to throw immediately is only half the battle. If you just throw, Genesys sees a 500 and retries. That’s fine for a few seconds, but if your Node service is actually restarting or under load, those retries pile up and you get the exact 502/503 spikes you’re trying to avoid. You need to signal “I got it, but I’m not processing it yet” vs “I failed.”
The trick is using async/await correctly. You acknowledge the webhook instantly (200 OK) to stop Genesys retrying, then spawn the SQS push as a fire-and-forget background task. If that background task fails, that’s when you push to your local dead letter queue.
Here’s how that looks in an Express handler. Note that res.status(200).send() happens before the heavy lifting.
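app.post('/genesys-webhook', async (req, res) => {
  // 1. Acknowledge immediately. Genesys stops retrying.
  res.status(200).send('Received');
  try {
    // 2. Push to SQS. The response has already gone out, so this await
    //    does not delay Genesys; it only lets the catch below see a failure.
    await sendToSQS(req.body);
  } catch (error) {
    // 3. If SQS fails, this is your actual DLQ logic.
    //    Log to file, push to a separate 'failed' SQS queue, or write to DB.
    console.error('Failed to push to SQS, sending to DLQ:', error);
    // Guard the fallback as well, so a DLQ failure is not an unhandled rejection.
    await pushToDLQ(req.body, error).catch((dlqError) => {
      console.error('DLQ write failed, event lost:', dlqError);
    });
  }
});

async function sendToSQS(payload) {
  // Your AWS SDK logic here
  // If this throws, it bubbles up to the catch above
}

Because the 200 has already been sent, awaiting the SQS call doesn’t slow down the response to Genesys, and it doesn’t block Node’s event loop either; it just keeps this handler’s promise pending until AWS responds. If you drop the await, you lose the error context unless you attach a .catch(). Pick your poison, but don’t let the webhook handler do the processing itself. One caveat: acknowledging before the message is safely in SQS means a crash in between loses the event, so if that matters more than response time, enqueue first and then return 200.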
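The no-await variant with a .catch() could look roughly like this. It is a sketch, not a drop-in: enqueueInBackground is a made-up name, and sendToSQS and pushToDLQ are passed in as arguments only so the snippet stands on its own.

```javascript
// Hypothetical helper: start the enqueue without making the handler wait.
// sendToSQS and pushToDLQ are whatever functions you already have.
function enqueueInBackground(payload, sendToSQS, pushToDLQ) {
  return Promise.resolve()
    .then(() => sendToSQS(payload))
    // First fallback: SQS failed, try the DLQ with the error attached.
    .catch((err) => pushToDLQ(payload, err))
    // Last resort: the DLQ failed too. Log and swallow so there is no
    // unhandled rejection; the event is lost at this point.
    .catch((err) => {
      console.error('Enqueue and DLQ both failed:', err);
    });
}

// In the handler:
//   res.status(200).send();
//   enqueueInBackground(req.body, sendToSQS, pushToDLQ); // deliberately not awaited
```

The returned promise never rejects, so it is safe to ignore in the handler and still easy to await in a test.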
Also check your retry policy in Genesys. If you have “Immediate” retries set to 5, you’ll hammer that Nginx instance hard during a restart. Set the backoff to exponential.
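On the consumer side, one way to avoid hardcoding a table of intervals is to derive the delay from how many times SQS has already delivered the message (the ApproximateReceiveCount attribute) and pass the result as the VisibilityTimeout when you call ChangeMessageVisibility. A minimal sketch, where backoffSeconds and its default base and cap are made-up values to tune, not anything Genesys or AWS prescribes:

```javascript
// Hypothetical helper: exponential backoff in seconds, capped.
// SQS visibility timeouts are expressed in seconds, with a hard
// maximum of 43200 (12 hours), so the result is clamped to that too.
function backoffSeconds(receiveCount, baseSeconds = 5, capSeconds = 900) {
  const attempt = Math.max(1, receiveCount);       // first delivery counts as 1
  const delay = baseSeconds * 2 ** (attempt - 1);  // 5, 10, 20, 40, ...
  return Math.min(delay, capSeconds, 43200);
}
```

With the defaults, the first three failed attempts wait 5, 10, and 20 seconds, and anything past the eighth attempt waits the 900-second cap. After a failed call to /screenpop, the consumer would set the message’s visibility timeout to this value instead of deleting it.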