What is the standard approach to process high-volume interaction events from EventBridge without hitting Lambda concurrency limits?
- We are currently ingesting interaction lifecycle events from Genesys Cloud via AWS EventBridge and routing them to an AWS Lambda function for real-time metric calculation in New Relic.
- During peak call center hours, our Lambda function consistently hits the account-level concurrency limit, resulting in dropped events and data gaps in our NRQL dashboards.
- The current implementation processes each event individually, which is inefficient given the bursty nature of the Genesys Cloud events arriving through EventBridge.
- The error observed in CloudWatch logs is:
2023-10-27T14:35:12.456Z ERROR: Unhandled rejection. Error: Lambda concurrency limit exceeded.
- We have attempted to increase the reserved concurrency, but this only delays the issue rather than solving the root cause of inefficient processing.
- The current Lambda handler code is as follows:
import json

def lambda_handler(event, context):
    # Process each record in the incoming event one at a time
    for record in event['Records']:
        payload = json.loads(record['body'])
        interaction_id = payload['interactionId']
        # Heavy processing logic here
        calculate_metrics(interaction_id)
        send_to_new_relic(interaction_id)
- We suspect that batching the events or using a queue-based architecture (such as SQS) might help, but we are unsure of the best practice for maintaining event order and ensuring at-least-once delivery.
- Specifically, we need to know if we should implement a batch window in the EventBridge rule to aggregate events before sending them to Lambda.
- Alternatively, should we offload the heavy processing to an SQS queue and have the Lambda function poll the queue with a larger batch size?
- We are looking for a code-level solution that balances throughput with cost efficiency, while ensuring no interaction events are lost during peak loads.
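
For reference, here is a rough sketch of the SQS-based variant we are considering, not something we have running. It assumes EventBridge delivers to an SQS queue and that Lambda reads the queue in batches. The handler groups records by interaction, computes metrics once per interaction, and returns only the failed message IDs so the rest of the batch is not retried. The two-argument `calculate_metrics`, the list-based `send_to_new_relic`, and the `extract_interaction_id` helper are hypothetical placeholders, as is the assumption that `interactionId` sits either at the top level of the message body or under `detail`.

```python
import json
from collections import defaultdict


def calculate_metrics(interaction_id, events):
    """Placeholder for the heavy metric logic (hypothetical signature)."""
    return {"interactionId": interaction_id, "eventCount": len(events)}


def send_to_new_relic(metrics):
    """Placeholder: a real version would post the whole list in one request."""
    return len(metrics)


def extract_interaction_id(body):
    # Assumption: the ID is at the top level (as in our current handler)
    # or nested inside the EventBridge "detail" envelope.
    if "interactionId" in body:
        return body["interactionId"]
    return body["detail"]["interactionId"]


def lambda_handler(event, context):
    grouped = defaultdict(list)  # interaction_id -> [(message_id, body), ...]
    failures = []                # message IDs that SQS should redeliver

    # Pass 1: parse every record and group by interaction, keeping batch order.
    for record in event["Records"]:
        try:
            body = json.loads(record["body"])
            key = extract_interaction_id(body)
            grouped[key].append((record["messageId"], body))
        except (KeyError, TypeError, ValueError):
            failures.append(record["messageId"])

    # Pass 2: one metrics computation per interaction instead of per event.
    metrics = []
    for interaction_id, items in grouped.items():
        try:
            bodies = [body for _, body in items]
            metrics.append(calculate_metrics(interaction_id, bodies))
        except Exception:
            # Retry every message of this interaction together.
            failures.extend(message_id for message_id, _ in items)

    # A single outbound call per batch. If this raises, the invocation fails
    # and the whole batch becomes visible on the queue again.
    if metrics:
        send_to_new_relic(metrics)

    # Partial batch response: only the listed messages are retried.
    return {"batchItemFailures": [{"itemIdentifier": m} for m in failures]}
```

As far as we understand, the partial response only takes effect if the SQS event source mapping lists `ReportBatchItemFailures` in `FunctionResponseTypes`. Batch size and `MaximumBatchingWindowInSeconds` are also set on that mapping, and its `MaximumConcurrency` setting could cap how many concurrent invocations the queue drives. Ordering across batches would still depend on the queue type, which this sketch does not address.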