We’ve got a high-volume environment pushing interaction events to EventBridge. The goal is to process these in a Lambda function to update an external CRM. Everything works fine during quiet hours, but as soon as call volume spikes, we hit the Lambda concurrency limits. The function times out, events drop into the dead-letter queue, and we lose data.
I’m trying to figure out the best way to handle this without just increasing the concurrency limit to something ridiculous. Here’s the current handler structure:
import json
import boto3

def lambda_handler(event, context):
    # Log the event for debugging
    print(json.dumps(event))
    # Process each record
    for record in event['records']:
        body = json.loads(record['data'])
        try:
            # Call external API
            update_crm(body)
        except Exception as e:
            print(f"Error processing record: {e}")
            # Need to handle partial failures here
The issue is that when 500 events come in at once, the Lambda function gets stuck. I’ve tried batching, but the EventBridge put_events API only accepts 10 entries per call (on top of a cap on total request size), and we’re hitting that fast.
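For reference, this is roughly how I'm chunking on the producer side. It's a minimal sketch that assumes the 10-entries-per-call cap on put_events; `chunk_entries` and `put_in_batches` are my own helper names, and `events_client` is expected to be something like `boto3.client("events")`.

```python
def chunk_entries(entries, size=10):
    """Split a list of PutEvents entries into lists of at most `size` items."""
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def put_in_batches(events_client, entries):
    """Send entries in chunks and return the ones EventBridge rejected."""
    failed = []
    for chunk in chunk_entries(entries):
        response = events_client.put_events(Entries=chunk)
        # Result entries come back in the same order as the request entries;
        # rejected ones carry an ErrorCode instead of an EventId.
        for entry, result in zip(chunk, response.get("Entries", [])):
            if "ErrorCode" in result:
                failed.append(entry)
    return failed
```

Whatever comes back from `put_in_batches` still needs a retry with backoff, which I haven't wired up yet.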
Is there a way to dynamically scale the Lambda concurrency based on the EventBridge queue depth? Or should I be using SQS as a buffer between EventBridge and Lambda? I don’t want to add latency, but data loss is worse.
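To make the SQS option concrete, this is roughly what I imagine the handler turning into. It's an untested sketch with a few assumptions: the EventBridge rule targets an SQS queue with no input transformer (so each message body is the full EventBridge event as JSON), the event source mapping has ReportBatchItemFailures enabled, and `update_crm` here is just a stand-in for our real CRM call.

```python
import json


def update_crm(detail):
    """Stand-in for the real CRM client call (hypothetical); raise on failure."""
    print(f"updating CRM with {detail}")


def lambda_handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            # The SQS message body holds the EventBridge envelope;
            # the interaction payload sits under "detail".
            envelope = json.loads(record["body"])
            update_crm(envelope["detail"])
        except Exception as exc:
            print(f"Error processing {record['messageId']}: {exc}")
            failures.append({"itemIdentifier": record["messageId"]})
    # Only the listed messages go back to the queue for retry;
    # everything else in the batch is treated as processed.
    return {"batchItemFailures": failures}
```

If I understand the partial batch response correctly, one bad record would no longer force the whole batch to be retried.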
Any code examples for handling this pattern? I’m stuck on the architecture side of the Lambda config.
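On the config side, the closest I've gotten is capping concurrency on the SQS event source mapping instead of on the function itself. A rough sketch of the arguments I'd pass to `create_event_source_mapping`; the helper name and the specific values (batch size 10, 1 second batching window, max concurrency 5) are placeholders I picked, not tuned numbers.

```python
def build_sqs_mapping_config(queue_arn, function_name, max_concurrency=5):
    """Build keyword arguments for Lambda's CreateEventSourceMapping call."""
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        # Up to 10 messages per invocation, waiting at most 1 second to fill a batch.
        "BatchSize": 10,
        "MaximumBatchingWindowInSeconds": 1,
        # Lets the handler report individual failed messages.
        "FunctionResponseTypes": ["ReportBatchItemFailures"],
        # Caps how many concurrent invocations this queue can drive.
        "ScalingConfig": {"MaximumConcurrency": max_concurrency},
    }
```

The idea would be to call `boto3.client("lambda").create_event_source_mapping(**config)` with the result, so the queue absorbs the spike while the CRM only ever sees a bounded number of concurrent callers.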