We have built a custom NLP pipeline using AWS Comprehend for Japanese language processing because the built-in Genesys NLU does not support Japanese intent classification well enough for our 800-agent environment. The architecture uses the Genesys Bot Connector framework to route utterances to an AWS Lambda function that calls Comprehend and returns the structured intent/slot response.
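The post does not say which Comprehend API the Lambda calls. If it uses a custom classifier deployed to a real-time endpoint, the intent step might look like the minimal sketch below. `classify_document` and the `Classes`/`Name`/`Score` response fields come from the boto3 Comprehend client (multi-class mode). `top_intent`, `classify`, `MIN_INTENT_SCORE`, and the returned dict shape are hypothetical. The returned dict is not the Bot Connector response schema, and slot extraction is not covered.

```python
from typing import Optional

# Hypothetical confidence floor; tune it against your own Japanese test set.
MIN_INTENT_SCORE = 0.6


def top_intent(classify_response: dict, min_score: float = MIN_INTENT_SCORE) -> Optional[dict]:
    """Pick the highest-scoring class from a ClassifyDocument (multi-class) response.

    Returns None when nothing clears min_score, so the flow can fall back
    to a clarifying prompt or an agent.
    """
    classes = classify_response.get("Classes", [])
    if not classes:
        return None
    best = max(classes, key=lambda c: c["Score"])
    if best["Score"] < min_score:
        return None
    # Illustrative output shape only; map it to whatever your connector expects.
    return {"intent": best["Name"], "confidence": best["Score"]}


def classify(comprehend_client, text: str, endpoint_arn: str) -> Optional[dict]:
    """Send one utterance to a Comprehend custom-classifier endpoint and map the result."""
    response = comprehend_client.classify_document(Text=text, EndpointArn=endpoint_arn)
    return top_intent(response)
```

In the handler, `comprehend_client` would be something like `boto3.client("comprehend", region_name="ap-northeast-1")`, created once at module level so warm invocations reuse it.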
The Lambda function works perfectly when tested directly, with an average response time of 800ms. But when called through the Bot Connector, approximately 40% of requests are timing out with a “Bot connector response timeout” error in the Architect flow execution logs.
Our Lambda configuration:
- Runtime: Python 3.12
- Memory: 512MB
- Timeout: 30 seconds
- Region: ap-northeast-1 (Tokyo, same as our GC org)
The Bot Connector integration is configured with the Lambda ARN. The IAM role has the correct lambda:InvokeFunction permission. The successful 60% of calls return in under 2 seconds.
Is there a hidden timeout on the Bot Connector side that is shorter than our Lambda timeout?
Yep, this is the Bot Connector’s internal timeout and it is way shorter than most people expect. The Bot Connector enforces a 3-second hard timeout on the external bot response, regardless of what your Lambda timeout is set to.
Your Lambda averages 800ms but clearly has a tail latency that pushes some calls past the 3-second mark. Cold starts on Python 3.12 Lambdas in ap-northeast-1 can add 1.5-2.5 seconds on the first invocation, which puts you right at or beyond the 3-second cutoff.
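An average of 800ms says little about how many calls cross a hard cutoff, so it helps to look at percentiles instead. The following is a minimal sketch that assumes you collect latencies yourself, for example by wrapping a boto3 `lambda_client.invoke(FunctionName=..., Payload=...)` call in `measure`. `measure`, `percentile`, and `summarize` are hypothetical helpers, and the 3.0-second default mirrors the cutoff described above.

```python
import math
import statistics
import time
from typing import Callable, List


def measure(call: Callable[[], object], n: int) -> List[float]:
    """Invoke `call` n times and return wall-clock latencies in seconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return latencies


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile (p in 0..100) of an already-sorted list."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(latencies: List[float], cutoff_s: float = 3.0) -> dict:
    """Mean, tail percentiles, and the share of calls slower than cutoff_s."""
    values = sorted(latencies)
    return {
        "mean": statistics.mean(values),
        "p50": percentile(values, 50),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
        # Fraction of calls that would miss a hard response deadline
        "over_cutoff": sum(v > cutoff_s for v in values) / len(values),
    }
```

For example, nine 0.5 s calls plus one 3.2 s cold start give a mean of 0.77 s, yet 10% of those calls would still miss a 3-second deadline.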
Two fixes:
- Eliminate cold starts. Enable Provisioned Concurrency on your Lambda with a minimum of 5 instances. This keeps warm execution environments ready, so requests served by those instances skip the cold start penalty (requests beyond the provisioned count still fall back to on-demand environments and can cold start). The cost for 5 provisioned instances in ap-northeast-1 is roughly $15/month.
- Reduce payload size. The Bot Connector serializes the full conversation context (all previous turns) in the request payload. For long conversations with 10+ turns, this payload can exceed 50KB, which adds serialization overhead. In your Lambda handler, read only the currentInput field and ignore the conversation history if you do not need it for context.
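Provisioned Concurrency can be enabled from the console or from code. The following is a minimal boto3 sketch. `put_provisioned_concurrency_config` and `get_provisioned_concurrency_config` are real Lambda client methods. The function name `my-nlp-fn` and alias `live` in the usage note are hypothetical, and the alias is assumed to already exist. Provisioned Concurrency attaches to a published version or alias rather than `$LATEST`, so the Bot Connector integration has to invoke the qualified ARN (ending in `:live`) for the warm instances to serve its traffic.

```python
import time


def enable_provisioned_concurrency(lambda_client, function_name, alias, instances=5):
    """Request `instances` pre-initialized execution environments on an alias."""
    # The Lambda API rejects Provisioned Concurrency on the unpublished $LATEST version.
    if alias == "$LATEST":
        raise ValueError("Provisioned Concurrency needs a published version or alias, not $LATEST")
    return lambda_client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=alias,
        ProvisionedConcurrentExecutions=instances,
    )


def wait_until_ready(lambda_client, function_name, alias, poll_s=10.0, max_polls=60):
    """Poll until the provisioned environments report READY; raise on FAILED."""
    for _ in range(max_polls):
        status = lambda_client.get_provisioned_concurrency_config(
            FunctionName=function_name, Qualifier=alias
        )["Status"]
        if status == "READY":
            return status
        if status == "FAILED":
            raise RuntimeError("Provisioned Concurrency allocation failed")
        time.sleep(poll_s)  # allocation still IN_PROGRESS
    raise TimeoutError("Provisioned Concurrency did not become READY in time")
```

Usage: `client = boto3.client("lambda", region_name="ap-northeast-1")`, then `enable_provisioned_concurrency(client, "my-nlp-fn", "live")` followed by `wait_until_ready(client, "my-nlp-fn", "live")`.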
The 3-second timeout is not configurable. Genesys has stated it will not increase it because the timeout protects the overall flow execution SLA.
Adding to the cold start mitigation: if Provisioned Concurrency cost is a concern, there is an alternative approach we use in our German environment.
We set up a CloudWatch Events (now Amazon EventBridge) rule that invokes the Lambda every 5 minutes with a warm-up payload. The Lambda handler detects the warm-up event and returns immediately without calling Comprehend:
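```python
def handler(event, context):
    # Warm-up pings from the scheduled CloudWatch Events/EventBridge rule
    # arrive with source "aws.events"; return before touching Comprehend.
    if event.get('source') == 'aws.events':
        return {'statusCode': 200, 'body': 'warm'}

    # Normal Bot Connector processing
    utterance = event['currentInput']['text']
    # ... Comprehend call ...
```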
This is not as reliable as Provisioned Concurrency because AWS can still recycle the execution environment between pings, but it reduced our cold start rate from 40% to under 5% at zero additional cost.
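The schedule itself can be created with boto3. The following is a minimal sketch using the real EventBridge `put_rule`/`put_targets` and Lambda `add_permission` calls. The rule name, statement ID, and target ID are hypothetical. No custom `Input` is set on the target, so the function receives the standard scheduled-event envelope, whose `source` is `"aws.events"`, which is exactly what the handler checks for.

```python
WARMUP_RULE = "nlp-lambda-warmup"  # hypothetical rule name


def schedule_warmup(events_client, lambda_client, function_arn, rule_name=WARMUP_RULE):
    """Create a 5-minute schedule that pings the Lambda to keep it warm."""
    # 1. Scheduled rule, matching the 5-minute interval described above
    rule_arn = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression="rate(5 minutes)",
        State="ENABLED",
    )["RuleArn"]

    # 2. Allow EventBridge (events.amazonaws.com) to invoke the function from this rule only
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # 3. Target the function; without a custom Input, the default event
    #    envelope (source "aws.events") is delivered to the handler.
    resp = events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "warmup-target", "Arn": function_arn}],
    )
    if resp.get("FailedEntryCount"):
        raise RuntimeError(f"put_targets failed: {resp.get('FailedEntries')}")
    return rule_arn
```

One ping every 5 minutes keeps roughly one execution environment warm. Bursts that need several concurrent environments can still hit cold starts, which is part of why this is less reliable than Provisioned Concurrency.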
Also important from a GDPR perspective: if you are processing customer utterances through AWS Comprehend, ensure your Comprehend endpoint is in the same region as your GC org and that your Data Processing Agreement with AWS covers the NLP processing. Customer utterances sent to Comprehend can contain PII, which falls under Article 28 processor obligations.
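One low-cost safeguard for the region requirement is to pin the Comprehend client's region explicitly and check it when the Lambda starts, rather than relying on whatever region the environment resolves. The following is a minimal sketch. `client.meta.region_name` is a real attribute of boto3 clients, while `assert_same_region` and `GC_ORG_REGION` are hypothetical names, with the region taken from the original post (Tokyo).

```python
GC_ORG_REGION = "ap-northeast-1"  # Tokyo, per the original post; change for other orgs


def assert_same_region(comprehend_client, expected_region=GC_ORG_REGION):
    """Fail fast if the Comprehend client would send utterances to another region."""
    actual = comprehend_client.meta.region_name
    if actual != expected_region:
        raise RuntimeError(
            f"Comprehend client targets {actual}, expected {expected_region}"
        )
    return actual
```

Usage at module level: `comprehend = boto3.client("comprehend", region_name=GC_ORG_REGION)` followed by `assert_same_region(comprehend)`. This only covers the technical routing; the DPA coverage still has to be confirmed contractually.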