Optimizing Lambda concurrency for high-volume Genesys Cloud EventBridge consumers

I am trying to understand the best practice for managing AWS Lambda concurrency limits when processing high-throughput interaction events from Genesys Cloud via EventBridge.

We are deploying a Terraform module that provisions the infrastructure for real-time analytics. The current implementation uses a single Lambda function triggered by an EventBridge rule matching genesyscloud:interaction:*. During peak business hours, we observe significant throttling. The function is configured with a reserved concurrency of 100, but the event rate spikes beyond what 100 concurrent executions can absorb, and events are dropped.

The Terraform configuration for the trigger is standard:

resource "aws_cloudwatch_event_rule" "genesys_events" {
  name        = "genesys-interaction-events"
  description = "Capture Genesys Cloud interaction events"

  event_pattern = jsonencode({
    source      = ["genesyscloud"]
    detail-type = ["Interaction Event"]
  })
}

resource "aws_cloudwatch_event_target" "lambda_target" {
  rule      = aws_cloudwatch_event_rule.genesys_events.name
  target_id = "ProcessInteraction"
  arn       = aws_lambda_function.analytics_handler.arn
}

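The Lambda function itself performs synchronous API calls to update a downstream data warehouse, with a latency of approximately 200ms per invocation. At that latency, 100 concurrent executions can sustain roughly 100 / 0.2 s = 500 invocations per second, which matches what we see: when the event rate exceeds 500 events per second, we encounter the following error in CloudWatch Logs: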

“Uncaught Exception: Error: 20053 - Concurrency limit reached for function: analytics-handler. Please retry with exponential backoff.”

I have reviewed the Genesys Cloud documentation on the EventBridge integration, but it lacks specific guidance on consumer-side scaling patterns. Increasing the reserved concurrency to 1000 is not a viable long-term solution for us because of cost and cold start implications.

Is there a recommended architectural pattern using Terraform to decouple the ingestion from the processing? Specifically, should I introduce an SQS queue as an intermediate target in the EventBridge rule, and if so, how do I configure the batch size and retry policy to prevent data loss while maintaining near-real-time processing? I need a robust, code-defined solution that handles backpressure gracefully.
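
For reference, this is roughly the shape I have in mind: a minimal, untested sketch, not a working configuration. The queue names, the new target_id, and every numeric value (batch size, batching window, maxReceiveCount, visibility timeout, maximum concurrency) are placeholders I picked for illustration, and the visibility timeout assumes a 30-second function timeout. It reuses the aws_cloudwatch_event_rule.genesys_events and aws_lambda_function.analytics_handler resources from above.

```hcl
# Dead-letter queue for messages that keep failing (hypothetical name).
resource "aws_sqs_queue" "genesys_events_dlq" {
  name                      = "genesys-interaction-events-dlq"
  message_retention_seconds = 1209600 # 14 days, the SQS maximum
}

# Buffer queue between EventBridge and the Lambda function.
resource "aws_sqs_queue" "genesys_events" {
  name = "genesys-interaction-events"

  # Assumption: 30 s function timeout. AWS suggests a visibility timeout
  # of at least six times the function timeout for SQS event sources.
  visibility_timeout_seconds = 180

  # After 5 failed receives a message moves to the DLQ instead of being lost.
  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.genesys_events_dlq.arn
    maxReceiveCount     = 5
  })
}

# Allow only this EventBridge rule to send messages to the queue.
data "aws_iam_policy_document" "allow_eventbridge" {
  statement {
    effect    = "Allow"
    actions   = ["sqs:SendMessage"]
    resources = [aws_sqs_queue.genesys_events.arn]

    principals {
      type        = "Service"
      identifiers = ["events.amazonaws.com"]
    }

    condition {
      test     = "ArnEquals"
      variable = "aws:SourceArn"
      values   = [aws_cloudwatch_event_rule.genesys_events.arn]
    }
  }
}

resource "aws_sqs_queue_policy" "genesys_events" {
  queue_url = aws_sqs_queue.genesys_events.id
  policy    = data.aws_iam_policy_document.allow_eventbridge.json
}

# Would replace the direct Lambda target: the rule delivers to SQS instead.
resource "aws_cloudwatch_event_target" "sqs_target" {
  rule      = aws_cloudwatch_event_rule.genesys_events.name
  target_id = "BufferInteraction"
  arn       = aws_sqs_queue.genesys_events.arn
}

# Lambda polls the queue in batches, with a cap on concurrent invocations.
resource "aws_lambda_event_source_mapping" "genesys_events" {
  event_source_arn                   = aws_sqs_queue.genesys_events.arn
  function_name                      = aws_lambda_function.analytics_handler.arn
  batch_size                         = 10
  maximum_batching_window_in_seconds = 1

  # Lets the handler report individual failed messages so that only
  # those are retried, rather than the whole batch.
  function_response_types = ["ReportBatchItemFailures"]

  scaling_config {
    maximum_concurrency = 100
  }
}
```

My understanding is that the function's execution role would also need sqs:ReceiveMessage, sqs:DeleteMessage, and sqs:GetQueueAttributes on the buffer queue, and that the handler must return a batchItemFailures list for ReportBatchItemFailures to have any effect. I also suspect batching alone will not raise throughput if the handler still makes one 200ms warehouse call per message, so part of my question is whether the batch should be written downstream in a single bulk call.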