EventBridge batch size blowing up Lambda concurrency for Genesys Cloud interactions

Hey folks,

Running into a wall with our EventBridge consumer for Genesys Cloud interaction events. We’re seeing massive spikes in Lambda concurrency during peak hours, and we suspect the batching window is too aggressive.

Our current setup triggers on the genesyscloud:interaction:created event. The default batch size is 100, but during traffic surges we’re seeing batches much larger than that. When the Lambda times out (our timeout is already at the 15-minute maximum), every record in the batch fails and gets retried, which only adds more concurrency.
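
For a rough sense of why retries make this worse: AWS describes Lambda concurrency as approximately the average invocation rate multiplied by the average duration in seconds. The sketch below applies that estimate, assuming (as a simplification) that each failed batch is redelivered exactly once; the retry_fraction parameter and the example numbers are hypothetical, not measurements from our system.

```python
def estimated_concurrency(batches_per_second: float,
                          avg_duration_s: float,
                          retry_fraction: float = 0.0) -> float:
    """Approximate steady-state Lambda concurrency.

    concurrency ~= invocation rate * average duration (seconds).
    retry_fraction is a hypothetical share of batches that fail and are
    redelivered once, which raises the effective invocation rate.
    """
    if not 0.0 <= retry_fraction <= 1.0:
        raise ValueError("retry_fraction must be between 0 and 1")
    effective_rate = batches_per_second * (1.0 + retry_fraction)
    return effective_rate * avg_duration_s


# Hypothetical load: 2 batches/s, each running close to the 900 s timeout
print(estimated_concurrency(2, 900))       # 1800.0
# Same load with half of the batches timing out and being retried once
print(estimated_concurrency(2, 900, 0.5))  # 2700.0
```

Long-running batches dominate the estimate, so shortening per-invocation duration matters as much as lowering the arrival rate.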

Here’s the rough skeleton of our handler:

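```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    # SQS-style batches arrive under the "Records" key (capital R)
    records = event.get('Records', [])
    failed_count = 0

    for record in records:
        try:
            # process_interaction() is our own helper, defined elsewhere
            process_interaction(record['body'])
        except Exception as e:
            failed_count += 1
            logger.error(f"Failed processing: {e}")

    # Raising fails the whole invocation, so every record in the batch,
    # including the ones that succeeded, is retried
    if failed_count > 0:
        raise Exception(f"{failed_count} records failed")
```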
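
One thing worth checking before tuning batch sizes: when an EventBridge rule targets a Lambda function directly, EventBridge invokes it asynchronously with a single event per invocation (payload under "detail"), so there is no batch to size. Batch size and batching window only come into play when something like an SQS queue sits in between and Lambda polls it through an event source mapping. The sketch below, assuming JSON payloads, accepts either shape; extract_payloads and _unwrap are hypothetical names, and the sample events are placeholders.

```python
import json
from typing import Any


def _unwrap(message: Any) -> Any:
    # An EventBridge rule that targets SQS puts the whole event envelope in
    # the message body; the interaction data itself sits under "detail".
    if isinstance(message, dict) and "detail" in message:
        return message["detail"]
    return message


def extract_payloads(event: dict) -> list:
    """Return the interaction payloads carried by one Lambda invocation."""
    if "Records" in event:
        # SQS event source mapping: a batch of messages, each body a JSON string
        return [_unwrap(json.loads(record["body"])) for record in event["Records"]]
    if "detail" in event:
        # EventBridge rule targeting Lambda directly: one event per invocation
        return [event["detail"]]
    return []


sqs_event = {"Records": [
    {"messageId": "m1", "body": json.dumps({"detail": {"id": "a"}})},
    {"messageId": "m2", "body": json.dumps({"detail": {"id": "b"}})},
]}
eventbridge_event = {"detail-type": "example", "source": "example", "detail": {"id": "c"}}

print(extract_payloads(sqs_event))          # [{'id': 'a'}, {'id': 'b'}]
print(extract_payloads(eventbridge_event))  # [{'id': 'c'}]
```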

I’ve tried adjusting the batch size in the EventBridge destination settings, but Genesys Cloud seems to send bursts that ignore the configured maximum batch size once throughput is high enough. The documentation mentions the MaximumBatchingWindowInSeconds setting, but I’m not sure whether that applies to the Genesys Cloud (producer) side or only to the consumer.
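
If there is an SQS queue between EventBridge and the function (an assumption; the setup above doesn't say), BatchSize and MaximumBatchingWindowInSeconds are properties of the Lambda event source mapping on the consumer side, not of the Genesys Cloud producer. The sketch below builds keyword arguments for boto3's update_event_source_mapping; the UUID and all values are placeholders. ScalingConfig's MaximumConcurrency caps how many concurrent invocations that one SQS source can drive, and FunctionResponseTypes enables per-message failure reporting.

```python
from typing import Optional


def build_mapping_update(uuid: str,
                         batch_size: int,
                         batching_window_s: int,
                         max_concurrency: Optional[int] = None) -> dict:
    """Build kwargs for update_event_source_mapping on a standard SQS source."""
    if not 1 <= batch_size <= 10000:
        raise ValueError("BatchSize must be 1-10000 for a standard SQS queue")
    if not 0 <= batching_window_s <= 300:
        raise ValueError("MaximumBatchingWindowInSeconds must be 0-300")
    if batch_size > 10 and batching_window_s < 1:
        raise ValueError("BatchSize > 10 needs a batching window of at least 1 s")
    params = {
        "UUID": uuid,
        "BatchSize": batch_size,
        "MaximumBatchingWindowInSeconds": batching_window_s,
        # Lets the handler report only failed messages instead of failing the batch
        "FunctionResponseTypes": ["ReportBatchItemFailures"],
    }
    if max_concurrency is not None:
        if max_concurrency < 2:
            raise ValueError("MaximumConcurrency must be at least 2")
        params["ScalingConfig"] = {"MaximumConcurrency": max_concurrency}
    return params


def apply_mapping_update(params: dict) -> dict:
    import boto3  # imported here so the builder can be tested without AWS access
    return boto3.client("lambda").update_event_source_mapping(**params)
```

Usage would look like apply_mapping_update(build_mapping_update("<mapping-uuid>", 10, 5, 50)), with the numbers chosen from your own load profile rather than these placeholders.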

Is there a way to force smaller batch sizes from the Genesys Cloud webhook destination config? Or should the Lambda return a partial batch response instead of failing the whole batch, to prevent the retry storm? Right now the retries are just making the concurrency problem worse.
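
With ReportBatchItemFailures enabled on an SQS event source mapping, the handler can return the messageIds that failed instead of raising, so only those messages are retried. The sketch below assumes SQS records; the processing function is passed in (in the real handler it would be process_interaction) so the sketch can run locally, and SAFETY_MARGIN_MS, FakeContext, and fail_on_bad are hypothetical. It also stops early when the invocation is close to its timeout and reports the unprocessed records as failures, because a timeout would otherwise fail the whole batch.

```python
import logging
from typing import Callable

logger = logging.getLogger()

# Hypothetical margin: stop starting new records when less time than this remains
SAFETY_MARGIN_MS = 30_000


def handle_batch(event: dict, context, process: Callable[[str], None]) -> dict:
    """Process an SQS batch and report per-message failures.

    Returns {"batchItemFailures": [{"itemIdentifier": <messageId>}, ...]},
    the response shape Lambda expects when ReportBatchItemFailures is enabled.
    """
    failures = []
    records = event.get("Records", [])
    for index, record in enumerate(records):
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            # Nearly out of time: hand the rest back for redelivery
            failures.extend({"itemIdentifier": r["messageId"]} for r in records[index:])
            break
        try:
            process(record["body"])
        except Exception:
            logger.exception("Failed processing message %s", record["messageId"])
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


class FakeContext:
    """Stand-in for the Lambda context object, for local testing only."""
    def __init__(self, remaining_ms: int):
        self.remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self) -> int:
        return self.remaining_ms


def fail_on_bad(body: str) -> None:
    if body == "bad":
        raise ValueError("cannot process")


event = {"Records": [{"messageId": "1", "body": "ok"},
                     {"messageId": "2", "body": "bad"},
                     {"messageId": "3", "body": "ok"}]}
print(handle_batch(event, FakeContext(600_000), fail_on_bad))
# {'batchItemFailures': [{'itemIdentifier': '2'}]}
```

In the deployed function this would be return handle_batch(event, context, process_interaction). An empty batchItemFailures list tells Lambda the whole batch succeeded.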

Any ideas on how to tune this without rewriting the whole consumer?