Designing Asynchronous Digital Interaction SLA Tracking with Escalation Automation
What This Guide Covers
- Architecting strict Service Level Agreement (SLA) timers for asynchronous messaging channels (Email, WhatsApp, SMS) in Genesys Cloud.
- Building automated escalation workflows using EventBridge and AWS Step Functions to identify “stale” interactions that agents have parked and forgotten about.
- The end result is a highly disciplined digital back-office where no customer email or WhatsApp message sits unresolved for more than 4 hours, and managers receive automated Slack alerts for impending SLA breaches.
Prerequisites, Roles & Licensing
- Licensing: Genesys Cloud CX 2 or 3 (Digital).
- Permissions: Routing > Queue > Edit, Integrations > Integration > Edit, Analytics > Conversation Detail > View.
- Infrastructure: AWS EventBridge, AWS Step Functions (or Lambda), and an outbound notification channel (e.g., a Slack Webhook).
The Implementation Deep-Dive
1. The Trap of Asynchronous Concurrency
Unlike voice interactions, which are synchronous (the agent must finish the call before taking the next one), digital interactions are asynchronous. An agent can hold several at once, for example 5 emails simultaneously.
The Trap:
If an agent opens a difficult email, they might “Park” it or simply leave the interaction open while they wait for an internal SME to respond. Hours go by. The agent goes home for the day, leaving the interaction locked in their workspace. The customer receives no reply, but from the perspective of standard Genesys Cloud routing, the interaction is “Handled” because it is assigned to an agent, so it does not flag as an Abandon. Your SLAs rot silently.
2. EventBridge: Listening for the “AcdStart” and “AcdEnd”
To track true asynchronous SLAs, you cannot rely purely on the ACD queue timer. You must track the total elapsed time of the interaction.
Implementation Steps:
- Configure an Amazon EventBridge Integration in Genesys Cloud.
- Subscribe to the v2.detail.events.conversation.{id}.acw and v2.detail.events.conversation.{id}.customer.end topics.
- In AWS EventBridge, route these events to an AWS Step Function.
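The routing rule in the last step can be sketched as an EventBridge event pattern. This is a minimal sketch, not a definitive configuration: it assumes that Genesys Cloud events arrive on a partner event bus with a source beginning with aws.partner/genesys.com and with the topic name in the detail-type field. Confirm both against a sample event from your own integration before relying on the pattern. The helper name build_event_pattern is hypothetical.

```python
import json

# Topic names as listed in this guide.
TOPICS = [
    "v2.detail.events.conversation.{id}.acw",
    "v2.detail.events.conversation.{id}.customer.end",
]


def build_event_pattern(topics, source_prefix="aws.partner/genesys.com"):
    """Return an EventBridge event pattern matching the given topics.

    Assumption: the topic name appears in "detail-type" and the source
    starts with the partner prefix. Verify against a real event.
    """
    return {
        "source": [{"prefix": source_prefix}],
        "detail-type": list(topics),
    }


if __name__ == "__main__":
    # Paste the printed JSON into the rule's "Event pattern" field.
    print(json.dumps(build_event_pattern(TOPICS), indent=2))
```

The pattern matches on a source prefix rather than a full source string so that the same rule definition can be reused across organizations or environments.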
3. Architecting the AWS Step Function Timer
AWS Step Functions excels at long-running asynchronous workflows (a Standard workflow can run for up to 1 year). We will use a Step Function to act as our ticking SLA bomb.
Architectural Reasoning:
When the interaction starts, a Step Function execution starts with it. If the interaction finishes normally, the execution ends without alerting anyone. If the timer hits zero while the interaction is still open, the execution triggers the escalation.
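One way to express this wait, check, escalate flow is an Amazon States Language definition, built here as a Python dictionary so it can be serialized to JSON. This is a sketch under assumptions: the state names, the two Lambda ARNs, and the isOpen result field are hypothetical, and the default wait of 12600 seconds corresponds to a 3.5-hour warning threshold.

```python
import json


def build_sla_state_machine(check_lambda_arn, escalate_lambda_arn,
                            wait_seconds=12600):
    """Return an Amazon States Language definition for the SLA timer.

    wait_seconds defaults to 12600 (3.5 hours). The check Lambda is
    assumed to return {"isOpen": true|false}; that contract is
    hypothetical and must match your own function.
    """
    return {
        "Comment": "SLA timer sketch: wait, check, escalate if still open",
        "StartAt": "WaitForWarningSla",
        "States": {
            "WaitForWarningSla": {
                "Type": "Wait",
                "Seconds": wait_seconds,
                "Next": "CheckConversation",
            },
            "CheckConversation": {
                "Type": "Task",
                "Resource": check_lambda_arn,
                # Keep the original input and attach the result under $.check.
                "ResultPath": "$.check",
                "Next": "IsStillOpen",
            },
            "IsStillOpen": {
                "Type": "Choice",
                "Choices": [{
                    "Variable": "$.check.isOpen",
                    "BooleanEquals": True,
                    "Next": "Escalate",
                }],
                "Default": "Done",
            },
            "Escalate": {
                "Type": "Task",
                "Resource": escalate_lambda_arn,
                "End": True,
            },
            "Done": {"Type": "Succeed"},
        },
    }


if __name__ == "__main__":
    definition = build_sla_state_machine("arn:check", "arn:escalate")
    print(json.dumps(definition, indent=2))
```

Letting the execution run to its check and exit quietly avoids having to look up and stop a running execution when the conversation ends, at the cost of keeping idle executions alive until the wait expires.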
Implementation Steps:
- The Trigger: When EventBridge detects a new inbound email or message, it triggers the Step Function execution, passing the conversationId.
- The Wait State: The first step is a Wait state set to 3.5 hours (your Warning SLA).
- The Check State: After 3.5 hours, the Step Function wakes up. It executes an AWS Lambda function that queries the Genesys Cloud Analytics API: GET /api/v2/analytics/conversations/{conversationId}/details.
- The Evaluation: The Lambda checks if the conversation endTime exists.
  - If it exists, the agent successfully handled the interaction. The Step Function succeeds and terminates quietly.
  - If endTime is null, the interaction is still open. Proceed to Escalation.
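The check and evaluation steps above could look roughly like the following Lambda handler. Treat it as a sketch: the environment variable names GENESYS_API_BASE and GENESYS_TOKEN are hypothetical, token acquisition is left out, and the end-timestamp field name should be verified against a real response. The guide refers to it as endTime, while the Analytics payload may name it differently (for example conversationEnd), so the helper checks both.

```python
import json
import os
import urllib.request


def is_conversation_open(details, end_fields=("endTime", "conversationEnd")):
    """Return True when none of the candidate end-timestamp fields is set."""
    return not any(details.get(field) for field in end_fields)


def fetch_conversation_details(conversation_id, api_base, token):
    """Call the Analytics conversation details endpoint named in this guide."""
    url = f"{api_base}/api/v2/analytics/conversations/{conversation_id}/details"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def handler(event, context):
    """Lambda entry point; expects {"conversationId": "..."} as input."""
    conversation_id = event["conversationId"]
    details = fetch_conversation_details(
        conversation_id,
        os.environ["GENESYS_API_BASE"],  # hypothetical variable name
        os.environ["GENESYS_TOKEN"],     # hypothetical variable name
    )
    # Return a small result only; Step Functions limits state payload size.
    return {
        "conversationId": conversation_id,
        "isOpen": is_conversation_open(details),
    }
```

Keeping is_conversation_open separate from the HTTP call makes the decision logic testable without network access.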
4. The Escalation Automation
If the interaction is still open after 3.5 hours, the Step Function executes the Escalation sequence.
Implementation Steps:
- Identify the Culprit: The Lambda parses the Analytics data to find the participant object where purpose == "agent" and endTime == null. This identifies the specific agent who is holding the interaction.
- Send the Manager Alert: The Lambda executes a POST request to your Slack/Teams webhook:
  - Message: SLA WARNING: Conversation {conversationId} has been open for 3.5 hours. It is currently locked by agent {AgentName}. Please intervene.
- Optional: Force Disconnect (Extreme Measure): If the interaction hits 48 hours, you can use the API (PATCH /api/v2/conversations/emails/{conversationId}/participants/{participantId}) to forcefully disconnect the agent and throw the interaction back into the queue for re-routing.
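The first two escalation steps can be sketched as three small functions. The participant keys used here (purpose and endTime from the text above, plus participantName and userId as fallbacks for the agent's name) are assumptions about the payload shape and should be checked against real data. The {"text": ...} body is the simple format accepted by Slack incoming webhooks; a Teams webhook expects a different payload.

```python
import json
import urllib.request


def find_holding_agent(details):
    """Return the first agent participant with no end time, or None."""
    for participant in details.get("participants", []):
        if (participant.get("purpose") == "agent"
                and participant.get("endTime") is None):
            return participant
    return None


def build_alert(conversation_id, agent_name, hours_open=3.5):
    """Build the Slack webhook payload using the message from this guide."""
    return {
        "text": (
            f"SLA WARNING: Conversation {conversation_id} has been open "
            f"for {hours_open} hours. It is currently locked by agent "
            f"{agent_name}. Please intervene."
        )
    }


def post_to_webhook(webhook_url, payload):
    """POST the JSON payload to the webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def escalate(details, conversation_id, webhook_url):
    """Alert the manager if an agent is still holding the conversation."""
    agent = find_holding_agent(details)
    if agent is None:
        return None
    # Hypothetical name fields; fall back to a placeholder if both are absent.
    name = agent.get("participantName") or agent.get("userId", "unknown")
    return post_to_webhook(webhook_url, build_alert(conversation_id, name))
```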
Validation, Edge Cases & Troubleshooting
Edge Case 1: The “Parked” Interaction Loophole
- The Failure Condition: Your agent receives a WhatsApp message. They don’t know the answer, so they use the native Genesys Cloud “Park” feature. Your SLA timer keeps ticking and alerts the manager, but the manager says “That’s fine, it’s parked waiting for the customer to send a photo.”
- The Root Cause: Not all open interactions are stalled by the agent. Sometimes they are stalled waiting for the customer.
- The Solution: Update your Step Function Lambda logic. When evaluating the open interaction, check the last message timestamp in the thread. If the last message was sent by the agent (e.g., “Please send a photo of the serial number”), then the SLA clock should be paused. Only escalate if the last message was sent by the customer and has been unread/unanswered for 3.5 hours.
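The "who spoke last" rule can be sketched as a pure function. The message shape is hypothetical: each message is assumed to be a dictionary with a direction ("inbound" from the customer, "outbound" from the agent) and a timezone-aware timestamp. Map these from whatever fields your message data actually exposes.

```python
from datetime import datetime, timedelta, timezone


def should_escalate(messages, now, threshold=timedelta(hours=3.5)):
    """Escalate only if the customer spoke last and has waited too long.

    messages: list of {"direction": "inbound"|"outbound",
                       "timestamp": aware datetime} (hypothetical shape).
    """
    if not messages:
        return False
    last = max(messages, key=lambda m: m["timestamp"])
    if last["direction"] != "inbound":
        # The agent replied last: the ball is in the customer's court,
        # so the SLA clock is treated as paused.
        return False
    return now - last["timestamp"] >= threshold


if __name__ == "__main__":
    now = datetime(2024, 3, 1, 14, 0, tzinfo=timezone.utc)
    thread = [
        {"direction": "inbound",
         "timestamp": datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)},
        {"direction": "outbound",
         "timestamp": datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc)},
    ]
    # The agent asked a question last, so no escalation is raised.
    print(should_escalate(thread, now))
```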
Edge Case 2: Weekend and Off-Hours Slippage
- The Failure Condition: A customer emails on Friday at 4:45 PM. Your business hours end at 5:00 PM. At 8:15 PM, your Slack channel lights up with critical SLA breach alerts.
- The Root Cause: Step Functions use absolute time (UTC). They do not natively understand Genesys Cloud Schedule Groups.
- The Solution: In your Step Function Lambda, before initiating the 3.5-hour Wait state, query the Genesys Cloud Architect API (GET /api/v2/architect/schedules) to evaluate your business hours. If the current time + 3.5 hours crosses the end of the business day, calculate the remaining delta and add it to the start of the next business day, effectively “pausing” the Step Function timer over the weekend.
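The carry-over arithmetic can be sketched as follows. This is an illustration under assumptions: business hours are hard-coded as parameters (09:00 to 17:00, Monday to Friday) as a stand-in for schedule data fetched from Genesys Cloud, all times are in one local timezone, and daylight-saving transitions are ignored. The function name business_deadline is hypothetical.

```python
from datetime import datetime, time, timedelta


def business_deadline(start, sla, open_at=time(9, 0), close_at=time(17, 0),
                      workdays=(0, 1, 2, 3, 4)):
    """Return the moment at which `sla` of business time has elapsed.

    workdays uses datetime.weekday() numbering (0 = Monday).
    """
    if not workdays:
        raise ValueError("at least one workday is required")
    remaining = sla
    cursor = start
    while True:
        day_open = datetime.combine(cursor.date(), open_at, tzinfo=cursor.tzinfo)
        day_close = datetime.combine(cursor.date(), close_at, tzinfo=cursor.tzinfo)
        if cursor.weekday() in workdays and cursor < day_close:
            # Clock only runs inside the business window.
            cursor = max(cursor, day_open)
            available = day_close - cursor
            if remaining <= available:
                return cursor + remaining
            remaining -= available
        # Carry the unused time over to the next calendar day.
        cursor = datetime.combine(
            cursor.date() + timedelta(days=1), time(0, 0), tzinfo=cursor.tzinfo
        )


if __name__ == "__main__":
    friday = datetime(2024, 3, 1, 16, 45)  # Friday, 4:45 PM
    print(business_deadline(friday, timedelta(hours=3.5)))
```

For the Friday 4:45 PM example, 15 minutes of the SLA elapse before closing and the remaining 3 hours 15 minutes resume on Monday at 9:00 AM, giving a deadline of Monday 12:15 PM. The computed deadline, converted to an ISO 8601 UTC timestamp, can be handed to a Wait state through its TimestampPath field instead of a fixed Seconds value.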