We have a critical workflow that triggers a webhook whenever a specific custom data attribute is updated on a contact. The endpoint is hosted on an AWS ALB with a Lambda function behind it.
The issue is that under high load the Lambda gets throttled or the ALB returns a 502 Bad Gateway. CXone marks these deliveries as failed and retries them, but it stops once the maximum number of attempts is reached. Those events are then lost, which breaks our downstream billing logic.
I want to implement a dead letter queue (DLQ) pattern. The idea is for the webhook endpoint to do nothing except push the payload to an SQS queue and return a 200 OK to CXone right away to acknowledge receipt. A separate consumer process reads from SQS and processes the data. If the consumer keeps failing on a message, that message moves to a second SQS queue (the DLQ) for manual inspection or retry.
Here is the basic flow I’m trying to set up in the Lambda (Node.js 18):
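// The Node.js 18 Lambda runtime bundles AWS SDK v3, not v2 (aws-sdk).
const { SQSClient, SendMessageCommand } = require('@aws-sdk/client-sqs');

// Create the client once so warm invocations reuse it.
const sqs = new SQSClient({});

exports.handler = async (event) => {
  try {
    // Parse inside the try so a malformed body is handled here as well.
    const payload = JSON.parse(event.body);

    await sqs.send(new SendMessageCommand({
      QueueUrl: process.env.SQS_QUEUE_URL,
      MessageBody: JSON.stringify(payload)
    }));

    return {
      statusCode: 200,
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ status: 'accepted' })
    };
  } catch (error) {
    console.error('Failed to parse payload or push to SQS:', error);
    // Should I return 500 here? Or still 200 and log?
    return {
      statusCode: 500,
      body: JSON.stringify({ error: 'Internal Server Error' })
    };
  }
};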
My concern is the retry behavior and idempotency. CXone might retry the webhook if it gets a timeout or a non-2xx status. If I return 200, CXone stops retrying. That’s good. But if the SQS send fails inside the Lambda, I return 500 and CXone retries. If SQS is still failing, every retry fails as well, until CXone reaches its max attempts and the event is lost, which is the same problem I started with.
Should I always return 200 and log the SQS failure locally? Or is there a better way to handle this in the webhook configuration? Also, does CXone support idempotency keys in the webhook payload to prevent duplicate processing if we do get duplicates from retries?
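In case the payload carries no idempotency key, the fallback I'm considering is deriving one from the payload itself. The sketch below is an assumption on my part, not a CXone feature: it hashes a key-sorted serialization of the payload with SHA-256, and it only works if a retried delivery carries the same content as the original (no per-attempt timestamp or attempt counter). `canonicalize` and `dedupKey` are names I made up.

```javascript
const { createHash } = require('node:crypto');

// Serializes a JSON value with object keys sorted, so key order
// does not change the resulting string.
function canonicalize(value) {
  if (Array.isArray(value)) {
    return '[' + value.map((item) => canonicalize(item)).join(',') + ']';
  }
  if (value !== null && typeof value === 'object') {
    return '{' + Object.keys(value).sort()
      .map((key) => JSON.stringify(key) + ':' + canonicalize(value[key]))
      .join(',') + '}';
  }
  return JSON.stringify(value);
}

// Returns a 64-character hex digest that is stable for equivalent payloads.
function dedupKey(payload) {
  return createHash('sha256').update(canonicalize(payload)).digest('hex');
}
```

The consumer could record each key with a conditional write to a durable store and skip payloads it has already seen. The key would also fit as `MessageDeduplicationId` on a FIFO queue, but that only deduplicates within a 5-minute window, which may be shorter than the gap between CXone retries.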