Architecting a Multi-Cloud Interaction Routing Strategy across AWS and Azure
What This Guide Covers
- Breaking out of a single-vendor cloud dependency by architecting a true active-active multi-cloud routing strategy.
- Leveraging Genesys Cloud (hosted on AWS) as the primary CCaaS engine, while using Microsoft Azure Communication Services (ACS) and Azure Functions as an automated failover and secondary routing brain.
- The end result is a highly resilient enterprise telephony architecture that can seamlessly survive a total region-wide failure of Amazon Web Services without dropping a single customer call.
Prerequisites, Roles & Licensing
- Licensing: Genesys Cloud CX 2 or 3 (BYOC-Cloud).
- Permissions: Telephony > Trunk > Edit, Architect > Flow > Edit.
- Infrastructure: An active AWS account, an active Microsoft Azure account, and a Tier-1 SIP carrier capable of advanced SIP routing (e.g., Bandwidth, Twilio, or AT&T).
The Implementation Deep-Dive
1. The Reality of Public Cloud Outages
Genesys Cloud boasts a 99.99% SLA. However, Genesys Cloud runs on Amazon Web Services (AWS). If the AWS us-east-1 region suffers a catastrophic failure (as happened in 2017, 2020, and 2021), any Genesys Cloud organization hosted in that region goes down with it.
The Trap:
Most architects build a failover strategy within AWS (e.g., routing calls to a backup AWS region). But if the underlying AWS networking or IAM infrastructure fails globally, your backup region fails too. For Fortune 500 banks or emergency services, you cannot have a single point of failure at the foundational cloud layer. You must build a multi-cloud failover.
2. The Carrier-Level Split (The Traffic Cop)
The multi-cloud split must occur before the call ever reaches AWS or Azure. It must happen at your telecom carrier.
Architectural Reasoning:
You will configure your SIP Carrier (e.g., Twilio Programmable Voice or Bandwidth) to act as the traffic cop. Under normal conditions, the carrier routes 100% of your inbound toll-free calls to Genesys Cloud (AWS). If the carrier detects that Genesys Cloud is unreachable (SIP 503 Service Unavailable or a Timeout), the carrier automatically routes the call to a backup SIP trunk terminating in Microsoft Azure.
Implementation Steps (Carrier Side):
- In your carrier’s routing portal, configure a Primary SIP Trunk pointing to the Genesys Cloud BYOC-Cloud AWS edge (e.g., sip:yourorg.mypurecloud.com).
- Configure a Secondary Failover SIP Trunk pointing to your Azure Communication Services (ACS) endpoint.
- Set the failover timeout to 3 seconds. If AWS does not respond with a 100 Trying within 3 seconds, the carrier executes the failover.
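The carrier-side decision described in these steps can be sketched as a small pure function. This is an illustrative model only, not any carrier's API: the trunk labels, `FailoverPolicy`, and `select_trunk` are hypothetical names, and a real carrier would express the same rule through its portal or routing configuration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical trunk labels; real carriers define these in their own routing config.
PRIMARY_TRUNK = "genesys-cloud-aws"
SECONDARY_TRUNK = "azure-acs-lifeboat"


@dataclass(frozen=True)
class FailoverPolicy:
    timeout_seconds: float = 3.0                   # failover timer from the steps above
    failover_codes: frozenset = frozenset({503})   # SIP 503 Service Unavailable


def select_trunk(policy: FailoverPolicy,
                 first_response_code: Optional[int],
                 seconds_to_first_response: Optional[float]) -> str:
    """Return the trunk that should carry the call.

    first_response_code is None when the primary never answered at all.
    """
    # No response from the primary: treat as a timeout.
    if first_response_code is None or seconds_to_first_response is None:
        return SECONDARY_TRUNK
    # The first response (e.g. 100 Trying) arrived after the failover timer expired.
    if seconds_to_first_response > policy.timeout_seconds:
        return SECONDARY_TRUNK
    # The primary answered in time but explicitly rejected the call.
    if first_response_code in policy.failover_codes:
        return SECONDARY_TRUNK
    return PRIMARY_TRUNK
```

For example, `select_trunk(FailoverPolicy(), 100, 0.2)` keeps the call on the primary trunk, while a 503 or a first response after more than 3 seconds selects the secondary trunk.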
3. The Azure Failover Environment (The Lifeboat)
When AWS is down, the call lands in Microsoft Azure. You cannot duplicate the entire Genesys Cloud platform in Azure; you just need a “Lifeboat” IVR and routing engine to keep the business alive.
Implementation Steps (Azure Side):
- Provision Azure Communication Services (ACS) and configure Direct Routing to accept the SIP trunk from your carrier.
- Build an Azure Function (in Node.js or C#) to act as the IVR controller using the ACS Call Automation API.
- The Lifeboat Flow: When a call lands in the Azure Function, the function must play a synthesized voice prompt: “We are currently experiencing a system outage. To reach the emergency dispatch team, press 1. To leave a voicemail, press 2.”
- The Routing: If the customer presses 1, the Azure Function executes a SIP REFER or INVITE to route the call to a predefined list of cell phone numbers (e.g., the duty managers).
- The Data Store: If the customer leaves a voicemail, the Azure Function records the audio and saves it to an Azure Blob Storage container, completely bypassing AWS S3.
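The Lifeboat's menu logic is simple enough to keep separate from the telephony plumbing. The sketch below models only the decision step, in Python for brevity, and makes no ACS Call Automation calls. `IvrAction`, `handle_dtmf`, `voicemail_blob_name`, the blob path layout, and the phone numbers are all hypothetical placeholders; the actual play, transfer, and record operations would be issued by whatever SDK the Azure Function uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

PROMPT = ("We are currently experiencing a system outage. "
          "To reach the emergency dispatch team, press 1. "
          "To leave a voicemail, press 2.")

# Hypothetical placeholder numbers for the duty managers.
DUTY_MANAGER_NUMBERS = ["+15555550101", "+15555550102"]


@dataclass
class IvrAction:
    kind: str                            # "transfer", "record", or "replay_prompt"
    targets: Optional[List[str]] = None  # numbers to try for a transfer
    blob_name: Optional[str] = None      # where a voicemail recording would be stored
    prompt: Optional[str] = None         # text to synthesize when replaying the menu


def voicemail_blob_name(caller: str, now: datetime) -> str:
    """Build a blob name from a UTC timestamp and the caller's number."""
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"voicemail/{stamp}_{caller.lstrip('+')}.wav"


def handle_dtmf(digit: str, caller: str, now: datetime) -> IvrAction:
    """Map the caller's key press to the next Lifeboat action."""
    if digit == "1":
        return IvrAction(kind="transfer", targets=list(DUTY_MANAGER_NUMBERS))
    if digit == "2":
        return IvrAction(kind="record", blob_name=voicemail_blob_name(caller, now))
    # Any other key: play the menu again.
    return IvrAction(kind="replay_prompt", prompt=PROMPT)
```

Keeping this mapping free of SDK calls means the outage menu can be unit-tested without placing a call, which matters for a code path that only runs during an emergency.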
4. Data Synchronization (Active-Active Customer Profiles)
If you are running a multi-cloud architecture, the Azure Lifeboat needs to know who is calling to prioritize VIPs, even when the primary AWS database is offline.
Implementation Steps:
- Do not rely on Genesys Cloud’s internal External Contacts database as your sole source of truth.
- Implement a unified Customer Data Platform (CDP) or use a multi-cloud database like MongoDB Atlas or CockroachDB, deployed across both AWS and Azure.
- During normal operations, Genesys Cloud Architect Data Actions query the AWS node of the database.
- During an outage, the Azure Function Lifeboat queries the Azure node of the database. Because the database replicates across clouds at the storage layer, the Azure IVR instantly recognizes the VIP customer by their phone number and plays a customized emergency greeting, ensuring a premium experience even during a total platform failure.
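The VIP lookup in the last step might look like the sketch below. An in-memory dictionary stands in for the Azure node of the replicated database, so no MongoDB Atlas or CockroachDB client calls appear here. `CUSTOMER_PROFILES`, `normalize_caller_id`, `greeting_for`, and the greeting wording are hypothetical, and the normalization assumes that bare 10-digit caller IDs are North American numbers.

```python
import re
from typing import Dict

# Stand-in for the Azure-side replica of the customer store (hypothetical data).
CUSTOMER_PROFILES: Dict[str, dict] = {
    "+15555550123": {"name": "Example VIP", "tier": "vip"},
    "+15555550199": {"name": "Example Caller", "tier": "standard"},
}

STANDARD_GREETING = "We are currently experiencing a system outage."


def normalize_caller_id(raw: str, default_country_code: str = "1") -> str:
    """Reduce a caller ID to +<digits> so both clouds use the same lookup key."""
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    # Assumption: a bare 10-digit number is a national number in the default country.
    if len(digits) == 10:
        return f"+{default_country_code}{digits}"
    return "+" + digits


def greeting_for(raw_caller_id: str,
                 profiles: Dict[str, dict] = CUSTOMER_PROFILES) -> str:
    """Return a VIP greeting when the caller is a known VIP, else the standard one."""
    profile = profiles.get(normalize_caller_id(raw_caller_id))
    if profile and profile.get("tier") == "vip":
        return f"Hello {profile['name']}. {STANDARD_GREETING} You are being prioritized."
    return STANDARD_GREETING
```

Normalizing the number before the lookup matters because the carrier may present the same caller in different formats on the AWS and Azure trunks.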
Validation, Edge Cases & Troubleshooting
Edge Case 1: The “Split-Brain” State
- The Failure Condition: AWS experiences a “brownout.” It is not completely dead, but it is extremely slow. The carrier routes a call to AWS. AWS takes 4 seconds to respond. The carrier assumes AWS is dead and routes the call to Azure. AWS eventually responds. You now have two active call legs for the same customer: one stuck in a broken Genesys queue, and one talking to the Azure Lifeboat.
- The Root Cause: Loose failover timers and lack of SIP CANCEL propagation.
- The Solution: You must ensure your carrier explicitly sends a SIP CANCEL to the primary AWS trunk before initiating the failover INVITE to Azure. This forces Genesys Cloud to tear down the ghost call, preventing inaccurate queue metrics and “zombie” interactions from appearing in your analytics when the system finally recovers.
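The required ordering can be illustrated with a toy sequencer that records which SIP methods the carrier would send to which trunk. This is a model for reasoning and testing, not a SIP stack: `FailoverSequencer` and the trunk labels are hypothetical, and real implementations apply further SIP transaction rules (RFC 3261) that are omitted here. It also covers the race in which the primary's 200 OK arrives after the CANCEL, in which case the leg is acknowledged and then ended with a BYE.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (SIP method, trunk label)


class FailoverSequencer:
    """Records the order of SIP requests sent during a failover (toy model)."""

    def __init__(self, primary: str, secondary: str) -> None:
        self.primary = primary
        self.secondary = secondary
        self.sent: List[Message] = []
        self.primary_cancelled = False

    def start_call(self) -> None:
        self.sent.append(("INVITE", self.primary))

    def fail_over(self) -> None:
        # CANCEL goes out first so the primary tears down its pending leg,
        # and only then is the call offered to the secondary trunk.
        self.sent.append(("CANCEL", self.primary))
        self.primary_cancelled = True
        self.sent.append(("INVITE", self.secondary))

    def on_primary_answer(self) -> None:
        """Handle a 200 OK from the primary, including one that arrives late."""
        self.sent.append(("ACK", self.primary))
        if self.primary_cancelled:
            # The answer raced the CANCEL: end the ghost leg explicitly.
            self.sent.append(("BYE", self.primary))
```

Calling `start_call()`, `fail_over()`, and then `on_primary_answer()` yields INVITE and CANCEL to the primary, INVITE to the secondary, then ACK and BYE to the primary, so only the Azure leg survives.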
Edge Case 2: Agent Access During an Outage
- The Failure Condition: The carrier successfully routes calls to the Azure Lifeboat. The Azure IVR tries to transfer the calls to your agents. However, the agents cannot answer because they are staring at a spinning “Connecting…” screen on their Genesys Cloud WebRTC browser tab (which is hosted in the dead AWS region).
- The Root Cause: A multi-cloud routing strategy only solves the inbound customer path, not the agent access path.
- The Solution: If you require live agents during an AWS outage, you must have a secondary softphone deployed to their laptops (e.g., Microsoft Teams or a generic SIP client like Zoiper). The Azure Function must be configured to route emergency calls to the agents’ Microsoft Teams direct-routing extensions, entirely bypassing the Genesys Cloud desktop application.
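One way to picture the agent-side routing is a lookup from on-shift agents to their secondary softphone numbers, with the duty-manager list as a fallback. The sketch is illustrative only: `TEAMS_EXTENSIONS`, `emergency_targets`, the agent IDs, and the numbers are hypothetical, and how the on-shift list is obtained during an outage is left open.

```python
from typing import Dict, List

# Hypothetical mapping of agent IDs to their Teams direct-routing numbers.
TEAMS_EXTENSIONS: Dict[str, str] = {
    "agent-001": "+15555550111",
    "agent-002": "+15555550112",
}


def emergency_targets(on_shift_agents: List[str],
                      teams_extensions: Dict[str, str],
                      fallback_numbers: List[str]) -> List[str]:
    """Return the numbers the Lifeboat should try, in order.

    Agents without a secondary softphone are skipped; if none remain,
    the call falls back to the duty-manager numbers.
    """
    targets = [teams_extensions[a] for a in on_shift_agents if a in teams_extensions]
    return targets or list(fallback_numbers)
```

The mapping has to live somewhere that stays reachable when AWS is down, for the same reason the customer profiles do.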