EventBridge Lambda concurrency spikes on high-volume interactions

Dropping a snippet. Our Lambda consumer for InteractionCreated events is hitting its reserved concurrency limit during peak hours. The payload looks fine, but we’re getting throttling errors.

{"error": "Service Unavailable", "message": "Handler throttled"}

We’ve tried increasing the batch size in the EventBridge rule, but it’s not helping. Is there a way to batch these events in the Lambda handler itself before processing? Or should we be using SQS as a buffer? Current handler processes one event at a time.

You’re fighting the wrong end of the stick. As far as I know, an EventBridge rule with a Lambda target has no batch size to tune: each matched event is delivered as its own asynchronous invocation, so whatever you raised isn’t reducing the number of invocations. The real fix is to stop running heavy logic in the handler EventBridge invokes and push the work to a queue. Genesys Cloud fires InteractionCreated events rapidly during peaks, and if each invocation runs long, concurrent executions stack up until you hit the reserved concurrency limit. Throttled and failed invocations are then retried on top of the new traffic, creating a thundering herd.

Move the payload to an SQS queue with a FIFO configuration to preserve order if needed, then have a separate, scalable consumer process the actual business logic. Here’s how you structure the immediate handler to just acknowledge and queue:

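import json
import boto3
from botocore.exceptions import ClientError

sqs = boto3.client('sqs')
# FIFO queue names must end in ".fifo"
QUEUE_URL = 'https://sqs.ap-southeast-2.amazonaws.com/123456789012/gc-interaction-queue.fifo'

def lambda_handler(event, context):
    # EventBridge delivers one event per invocation, and 'detail' is a dict,
    # so forward it as a single message instead of iterating over it
    detail = event.get('detail', {})
    try:
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps(detail),
            MessageGroupId='interaction-group',  # Required for FIFO
            # FIFO also needs a deduplication id unless content-based
            # deduplication is enabled; the EventBridge event id is unique
            MessageDeduplicationId=event['id']
        )
    except ClientError as e:
        # Log, then re-raise so the invocation fails and is retried
        # instead of silently dropping the event
        print(f"Failed to send to SQS: {e}")
        raise

    return {
        'statusCode': 200,
        'body': json.dumps('Queued')
    }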
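One caveat on the FIFO option: a single constant MessageGroupId puts every message in one ordered group, so the consumer ends up working through them one group at a time. If you only need ordering within a conversation, a rough sketch like the one below derives the group from the payload instead. The `conversationId` key and the `build_fifo_message` helper are assumptions for illustration, so check where your payload actually carries the id.

```python
import json

def build_fifo_message(event, queue_url):
    """Build send_message keyword arguments for one EventBridge event."""
    detail = event.get("detail", {})
    # Hypothetical field name: adjust to wherever your payload keeps the id.
    conversation_id = detail.get("conversationId")
    return {
        "QueueUrl": queue_url,
        "MessageBody": json.dumps(detail),
        # Fall back to one shared group when no conversation id is present.
        "MessageGroupId": str(conversation_id or "interaction-group"),
        # The EventBridge event id is unique per event.
        "MessageDeduplicationId": event["id"],
    }
```

You’d then call `sqs.send_message(**build_fifo_message(event, QUEUE_URL))` in the handler. Messages in different groups can be processed in parallel, while order is still kept inside each conversation.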

Keep the Lambda timeout low, like 1-2 seconds. If the handler hangs until the timeout, the invocation counts as failed and gets retried. Don’t try to validate complex schema in this handler. Just pass it through. Set the SQS visibility timeout higher than the maximum processing time of the downstream consumer, otherwise a message can become visible again and be picked up a second time while it’s still being worked on. This decouples the ingestion rate from the processing rate. If you still see spikes, check whether you’re missing the Authorization header validation in the Genesys Cloud webhook config, which causes silent drops and retries.
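For the downstream consumer, here’s a minimal sketch assuming it’s a second Lambda triggered by the queue through an SQS event source mapping with ReportBatchItemFailures enabled. `process_interaction` is a hypothetical stand-in for your business logic and `consumer_handler` is just an illustrative name. On the first failure it stops and reports that message plus everything after it, so nothing later in the batch jumps ahead on a FIFO queue.

```python
import json

def process_interaction(detail):
    """Hypothetical placeholder for the real business logic."""
    if "conversationId" not in detail:
        raise ValueError("missing conversationId")

def consumer_handler(event, context):
    failures = []
    records = event.get("Records", [])
    for index, record in enumerate(records):
        try:
            process_interaction(json.loads(record["body"]))
        except Exception as exc:
            print(f"Failed to process {record['messageId']}: {exc}")
            # Stop here and hand back this message and all later ones,
            # so they are retried in their original order.
            failures = [{"itemIdentifier": r["messageId"]} for r in records[index:]]
            break
    # Messages not listed here are deleted from the queue as successful.
    return {"batchItemFailures": failures}
```

Only the reported messages come back after the visibility timeout. The ones that succeeded earlier in the batch aren’t reprocessed.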