EventBridge batch processing in Kotlin Lambda hits concurrent execution limits

TessaTalks · January 14, 2026, 6:40pm

We’re routing Genesys Cloud interaction events through EventBridge to a Kotlin Lambda function for real-time analytics aggregation. The setup works fine during off-peak hours, but as soon as the incoming rate spikes past ~500 events/second, the Lambda starts dropping messages due to concurrency limits. I’m using the aws-lambda-java-events library with an SQSEvent input, but the overhead of deserializing the EventBridge envelope for each batch item is killing the throughput.

Here’s the core handler logic:

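import com.amazonaws.services.lambda.runtime.Context
import com.amazonaws.services.lambda.runtime.events.SQSBatchResponse
import com.amazonaws.services.lambda.runtime.events.SQSEvent
import kotlinx.serialization.json.Json

// GcInteractionEvent, processEvent and log are defined elsewhere in the project.
// Partial batch responses only take effect if ReportBatchItemFailures is
// enabled on the SQS event source mapping.
fun handler(event: SQSEvent, context: Context): SQSBatchResponse {
    val failures = mutableListOf<SQSBatchResponse.BatchItemFailure>()

    event.records.forEach { record ->
        try {
            // This deserialization is taking ~80ms per record
            val gcEvent = Json.decodeFromString<GcInteractionEvent>(record.body)
            processEvent(gcEvent)
        } catch (e: Exception) {
            log.error("Failed to process record ${record.messageId}", e)
            // Report only this message as failed so SQS retries it alone
            failures.add(SQSBatchResponse.BatchItemFailure(record.messageId))
        }
    }

    return SQSBatchResponse(failures)
}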

The processEvent call itself is fast, but the JSON parsing of the nested EventBridge detail object is the bottleneck. I’ve tried increasing the Lambda memory to 1024MB, which helps slightly, but we’re still hitting the account-level concurrency cap during peak call center hours.

Is there a way to batch process the EventBridge payloads more efficiently in Kotlin without rewriting the entire handler in Java? Or should I be looking at changing the EventBridge source to push directly to SQS with a larger batch size and handling the deserialization asynchronously? The current SQSEvent structure seems rigid.

sip_wanderer88 · January 15, 2026, 6:40am

You might want to check if you’re actually hitting the Lambda concurrency limit or just the SQS batch processing timeout. The overhead of deserializing the EventBridge envelope in Java/Kotlin is brutal. The aws-lambda-java-events library isn’t optimized for high-throughput JSON parsing.

Try switching to a streaming approach or at least optimizing the deserialization. If you’re stuck with the SqsEvent model, you can bypass the heavy object mapping by accessing the raw JSON payload directly. It’s a bit hacky but saves a ton of CPU cycles.

Here’s how you can grab the raw payload in Kotlin without inflating the full object graph:

import com.amazonaws.services.lambda.runtime.Context
import com.amazonaws.services.lambda.runtime.events.SQSEvent

fun handleRequest(event: SQSEvent, context: Context) {
    // The runtime has already split the batch into records; the cost to avoid
    // is inflating a full object graph from each record's body.
    event.records.forEach { record ->
        // record.body is the raw EventBridge envelope as a String.
        // Use it directly for simple regex or lightweight parsing rather than
        // binding every message to a full object graph with ObjectMapper.
        val rawPayload = record.body

        // If you need specific fields, consider a tree-style parse such as
        // Jackson's JsonNode, or even a custom string split if the format
        // is predictable.
        processPayload(rawPayload) // processPayload is your own function
    }
}
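
A minimal sketch of the "only read the fields you need" idea, using the JSON tree API from kotlinx.serialization instead of binding to a full data class. The `detail-type` and `detail` keys are standard EventBridge envelope fields; `conversationId` and the `InteractionSummary` type are assumptions for illustration, not taken from the Genesys schema. Whether a tree parse is actually faster than typed decoding for this payload should be measured rather than assumed.

```kotlin
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.contentOrNull
import kotlinx.serialization.json.jsonObject
import kotlinx.serialization.json.jsonPrimitive

// Hypothetical projection of the few fields the aggregation needs.
data class InteractionSummary(val detailType: String?, val conversationId: String?)

// Parse the SQS message body (the EventBridge envelope) into a JSON tree and
// read only selected fields; everything else in the payload is ignored.
fun extractSummary(body: String): InteractionSummary {
    val envelope = Json.parseToJsonElement(body).jsonObject
    val detail = envelope["detail"]?.jsonObject
    return InteractionSummary(
        detailType = envelope["detail-type"]?.jsonPrimitive?.contentOrNull,
        // "conversationId" is an assumed field name inside detail.
        conversationId = detail?.get("conversationId")?.jsonPrimitive?.contentOrNull,
    )
}

fun main() {
    val body = """{"detail-type":"example.type","source":"example.source","detail":{"conversationId":"abc-123","participants":[]}}"""
    println(extractSummary(body))
}
```

Missing keys come back as null instead of throwing, so a record with an unexpected shape can still be routed to the failure list by the caller.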

Also, make sure your Lambda’s reserved concurrency is actually set. By default, it shares the account limit. If you’re processing analytics, you might want to increase the timeout too. The default 3 seconds is often too short for batch processing.

Set your function timeout to 15 or 30 seconds in the AWS console. Then, tune the SQS trigger’s batch size. Start with 10 and see if the error rate drops. If it’s still dropping messages, you’re likely hitting the reserved or account-level concurrency cap (provisioned concurrency is a separate setting that pre-warms instances and does not cap anything). Check the CloudWatch metrics Throttles and ConcurrentExecutions to confirm.
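
As a rough sanity check on the numbers in this thread, Little's law (average concurrency = arrival rate × average processing time) gives a back-of-the-envelope estimate. The sketch below assumes records in a batch are processed one after another and ignores per-invocation overhead; the function names are made up for illustration.

```kotlin
import kotlin.math.ceil

// Steady-state concurrency needed to keep up with the arrival rate.
// With sequential processing the batch size cancels out:
// (rate / batch) invocations per second, each lasting (batch * perRecord).
fun estimateConcurrency(eventsPerSecond: Double, perRecordMillis: Double): Int =
    ceil(eventsPerSecond * perRecordMillis / 1000.0).toInt()

// How long one invocation runs for a given batch size, to compare
// against the function timeout.
fun batchDurationMillis(batchSize: Int, perRecordMillis: Double): Double =
    batchSize * perRecordMillis

fun main() {
    // Figures quoted in the original post: ~500 events/s, ~80 ms per record.
    println(estimateConcurrency(500.0, 80.0))
    println(batchDurationMillis(10, 80.0))
}
```

With the figures quoted in the question, this works out to about 40 concurrent executions spent on parsing alone, and a batch of 10 takes roughly 800 ms. A larger batch does not lower the required concurrency on its own; only reducing the per-record cost does.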

The 500 errors you might see downstream are probably because the Lambda timed out and SQS retried the batch. Fix the timeout first.

QueueBreaker · January 18, 2026, 6:40am

The suggestion above hits the mark. Deserializing the full envelope is expensive. In my Kotlin Lambdas, I use kotlinx.serialization with Json.decodeFromString directly on the raw JSON string. It’s faster than the Java library. Also, check your SQS visibility timeout. If processing takes longer than the timeout, you get duplicate events. That kills performance fast.
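
For sizing the visibility timeout, AWS's guidance for SQS event sources is to set it to at least six times the function timeout, plus the batching window if one is configured. A small sketch of that arithmetic (the function name is made up for illustration):

```kotlin
// Minimum SQS visibility timeout following the "6x function timeout plus
// batching window" guidance for Lambda event source mappings.
fun minVisibilityTimeoutSeconds(functionTimeoutSeconds: Int, batchingWindowSeconds: Int = 0): Int =
    6 * functionTimeoutSeconds + batchingWindowSeconds

fun main() {
    // A 30 second function timeout with no batching window.
    println(minVisibilityTimeoutSeconds(30))
}
```

So with the 30 second function timeout suggested earlier in the thread, the queue's visibility timeout should be at least 180 seconds.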