We’re ingesting high-volume interaction events from Genesys Cloud via EventBridge to trigger downstream tracing logic. The goal is to maintain a single OpenTelemetry trace across the EventBridge → Lambda → Data Action flow.
Currently, we’re hitting Lambda concurrency limits during peak hours because we’re processing each event in the batch individually before the context propagates correctly. We’re trying to batch the processing to reduce invocations, but the X-Amzn-Trace-Id header isn’t present in the EventBridge payload, so we can’t easily correlate the batch back to the originating conversation.
Here’s the Lambda handler structure we’re using:
import json

from opentelemetry import trace

tracer = trace.get_tracer(__name__)


def lambda_handler(event, context):
    # event['Records'] is the batch of EventBridge records
    for record in event['Records']:
        body = json.loads(record['body'])
        # Trying to extract conversationId for tracing
        conv_id = body.get('data', {}).get('conversationId')
        # Create new span for each event - this is causing the concurrency spike
        with tracer.start_as_current_span(f"process-event-{conv_id}") as span:
            span.set_attribute("genesys.conversation.id", conv_id)
            process_interaction(body)  # our own function, defined elsewhere
The issue is that when we receive a batch of 100 events, we fire 100 separate spans and potentially 100 downstream calls if the logic isn’t optimized. We want to batch these into a single Data Action call or a single HTTP request to /api/v2/conversations/interactions to reduce overhead, but we lose the individual trace context for each interaction in the batch.
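To make the batching goal concrete, here is a rough sketch of the shape we have in mind. It assumes the same Records/body layout as the handler above. The function names and the payload fields (`interactions`, `events`) are hypothetical placeholders of ours, not something the Genesys API defines.

```python
import json
from collections import defaultdict


def group_by_conversation(event):
    """Group parsed record bodies by conversationId (None if it is missing)."""
    grouped = defaultdict(list)
    for record in event.get("Records", []):
        body = json.loads(record["body"])
        conv_id = body.get("data", {}).get("conversationId")
        grouped[conv_id].append(body)
    return dict(grouped)


def build_batch_payload(event):
    """Collapse one Lambda batch into a single request body.

    The payload layout here is a hypothetical placeholder.
    """
    grouped = group_by_conversation(event)
    return {
        "interactions": [
            {"conversationId": conv_id, "events": bodies}
            for conv_id, bodies in grouped.items()
        ]
    }
```

With this, a batch of 100 records becomes one outbound request, but nothing in the payload says which trace each interaction belongs to, which is the gap we are asking about.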
Is there a way to propagate the parent trace ID from the EventBridge trigger into a batched API call while keeping individual interaction spans? Or are we missing a header in the EventBridge event format that allows us to correlate these without individual invocations?
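For clarity, this is roughly the ID layout we are after: one trace ID shared by the whole batch, plus a distinct span ID per interaction, written in the W3C `traceparent` format (`version-traceid-parentid-flags`). It is a stdlib-only sketch to show the layout. In real code we would expect the OpenTelemetry SDK to generate and inject these IDs. The helper names and the per-item `traceparent` field are our own assumptions, and we do not know whether the downstream endpoint would honor such a field.

```python
import secrets


def new_traceparent(trace_id=None):
    """Build a W3C traceparent string; pass trace_id to stay in the same trace."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)                # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"          # flags 01 = sampled


def tag_batch(conversation_ids):
    """One trace ID for the batch, one span ID per interaction (hypothetical layout)."""
    batch_tp = new_traceparent()
    trace_id = batch_tp.split("-")[1]
    items = [
        {"conversationId": cid, "traceparent": new_traceparent(trace_id)}
        for cid in conversation_ids
    ]
    return {"traceparent": batch_tp, "items": items}
```

The batch-level value would go out as the HTTP `traceparent` header on the single request, while each item carries its own value so the individual interaction spans could still be reconstructed downstream.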
We’ve tried using the aws_request_id as a trace parent, but it doesn’t map back to the Genesys conversation ID cleanly for our dashboarding.
Any pointers on how to structure the batching logic without losing the trace lineage?