I'm trying to understand the best practice for managing AWS Lambda concurrency limits when processing high-throughput interaction events from Genesys Cloud via EventBridge.
We are deploying a Terraform module that provisions the necessary infrastructure for real-time analytics. The current implementation uses a single Lambda function triggered by an EventBridge rule matching genesyscloud:interaction:*. During peak business hours, we observe significant throttling. The Lambda function is configured with a reserved concurrency of 100, yet the event rate spikes well beyond this threshold, causing dropped events.
The Terraform configuration for the trigger is standard:
resource "aws_cloudwatch_event_rule" "genesys_events" {
  name        = "genesys-interaction-events"
  description = "Capture Genesys Cloud interaction events"
  event_pattern = jsonencode({
    source      = ["genesyscloud"]
    detail-type = ["Interaction Event"]
  })
}

resource "aws_cloudwatch_event_target" "lambda_target" {
  rule      = aws_cloudwatch_event_rule.genesys_events.name
  target_id = "ProcessInteraction"
  arn       = aws_lambda_function.analytics_handler.arn
}
The Lambda function itself makes synchronous API calls to update a downstream data warehouse, at approximately 200 ms per invocation. At that latency, 100 concurrent executions can sustain roughly 100 / 0.2 s = 500 invocations per second, which matches what we see: when the event rate exceeds 500 events per second, we encounter the following error in CloudWatch Logs:
“Uncaught Exception: Error: 20053 - Concurrency limit reached for function: analytics-handler. Please retry with exponential backoff.”
I have reviewed the Genesys Cloud documentation regarding EventBridge integration, but it lacks specific guidance on consumer-side scaling patterns. Increasing the reserved concurrency to 1000 is not a viable long-term solution due to cost and cold start implications.
Is there a recommended architectural pattern using Terraform to decouple the ingestion from the processing? Specifically, should I introduce an SQS queue as an intermediate target in the EventBridge rule, and if so, how do I configure the batch size and retry policy to prevent data loss while maintaining near-real-time processing? I need a robust, code-defined solution that handles backpressure gracefully.
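To make the question concrete, here is a rough, untested sketch of the SQS-buffered layout I have in mind. It would replace the lambda_target resource above. The queue names, the 30-second function timeout, and every numeric value (batch size, batching window, maxReceiveCount, maximum concurrency) are placeholders I picked for illustration, not settings I have validated.

```hcl
# Sketch only: names and numeric values are assumptions, not tested settings.

# Dead-letter queue for messages that keep failing.
resource "aws_sqs_queue" "interaction_dlq" {
  name                      = "genesys-interaction-events-dlq"
  message_retention_seconds = 1209600 # 14 days, the SQS maximum
}

# Buffer queue that absorbs bursts from EventBridge.
resource "aws_sqs_queue" "interaction_buffer" {
  name = "genesys-interaction-events-buffer"

  # AWS guidance is at least 6x the function timeout; 180 assumes a 30 s timeout.
  visibility_timeout_seconds = 180

  redrive_policy = jsonencode({
    deadLetterTargetArn = aws_sqs_queue.interaction_dlq.arn
    maxReceiveCount     = 5 # placeholder: receives before a message moves to the DLQ
  })
}

# Allow only this EventBridge rule to send messages to the buffer queue.
resource "aws_sqs_queue_policy" "allow_eventbridge" {
  queue_url = aws_sqs_queue.interaction_buffer.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "events.amazonaws.com" }
      Action    = "sqs:SendMessage"
      Resource  = aws_sqs_queue.interaction_buffer.arn
      Condition = {
        ArnEquals = { "aws:SourceArn" = aws_cloudwatch_event_rule.genesys_events.arn }
      }
    }]
  })
}

# The rule now targets the queue instead of the function.
resource "aws_cloudwatch_event_target" "sqs_target" {
  rule      = aws_cloudwatch_event_rule.genesys_events.name
  target_id = "BufferInteraction"
  arn       = aws_sqs_queue.interaction_buffer.arn
}

# Lambda polls the queue in batches; concurrency is capped at the mapping.
resource "aws_lambda_event_source_mapping" "from_buffer" {
  event_source_arn                   = aws_sqs_queue.interaction_buffer.arn
  function_name                      = aws_lambda_function.analytics_handler.arn
  batch_size                         = 10 # placeholder
  maximum_batching_window_in_seconds = 1  # placeholder: adds up to 1 s of latency

  # Lets the handler report individual failed messages instead of failing the whole batch.
  function_response_types = ["ReportBatchItemFailures"]

  scaling_config {
    maximum_concurrency = 100 # placeholder
  }
}
```

This assumes the function's execution role is also granted sqs:ReceiveMessage, sqs:DeleteMessage, and sqs:GetQueueAttributes on the buffer queue, and that the handler is changed to return a batchItemFailures list. What I am unsure about is whether these are sensible values for batch_size, the batching window, and maxReceiveCount given the 200 ms per-call latency.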