I am pulling my hair out. I inherited an org where the old admin set up four different SIP trunks for PSTN failover.
I think he literally just copy-pasted the config files four times. There is zero routing logic or priority weighting. If the primary BYOC Cloud trunk goes down, the system just throws a 503 error instead of attempting the secondary carrier. Where is the actual failover logic configured in GC?
When our trunk failover happens, my custom WebRTC Chrome extension completely loses access to the microphone.
I know it is a browser thing, but I have to go into chrome://flags and reset the WebRTC bindings every time the SIP path drops. The extension logs just show a massive ICE negotiation failure.
In NICE CXone, you literally just set an ‘Alternate Routing Path’ on the Point of Contact in the Studio canvas.
Genesys Cloud’s BYOC architecture is much more traditional. You configure failover either by adding multiple trunks to an ‘Outbound Route’ and relying on its sequential evaluation order, or by using standard SIP DNS SRV records so that your SBC handles carrier failover transparently.
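To make "sequential evaluation order" concrete, here is a minimal Java sketch of what it buys you. This is not Genesys Cloud code: the trunk names and the `dial` function are hypothetical stand-ins, and the set of status codes that trigger fall-through is an assumption. The underlying rule comes from RFC 3261, which says a client receiving a 503 SHOULD try an alternate server rather than give up, which is exactly the behavior missing in the original setup.

```java
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;

/** Sketch: try trunks in route order, falling through only on carrier-side failures. */
final class SequentialFailover {
    // Assumed carrier-side failures: try the next trunk. Callee-side answers
    // such as 486 Busy or 404 Not Found are returned as-is, since another
    // carrier would get the same answer.
    static final Set<Integer> FAILOVER_CODES = Set.of(408, 500, 503, 504);

    /** Result of routing one call. */
    static final class Outcome {
        final String trunk;   // trunk that produced the final answer (null if none tried)
        final int sipStatus;  // final SIP status code

        Outcome(String trunk, int sipStatus) {
            this.trunk = trunk;
            this.sipStatus = sipStatus;
        }
    }

    /**
     * @param trunksInOrder trunk names in route order, primary first
     * @param dial          hypothetical: places the call on a trunk, returns the final SIP status
     */
    static Outcome route(List<String> trunksInOrder, ToIntFunction<String> dial) {
        Outcome last = new Outcome(null, 503); // empty route: report unavailable
        for (String trunk : trunksInOrder) {
            int status = dial.applyAsInt(trunk);
            last = new Outcome(trunk, status);
            if (!FAILOVER_CODES.contains(status)) {
                return last; // 2xx or a callee-side failure: stop here
            }
            // carrier-side failure: continue with the next trunk in order
        }
        return last; // every trunk failed; surface the last error
    }
}
```

For example, with the primary returning 503, `route(List.of("byoc-primary", "carrier-b"), dial)` ends on `carrier-b`. A 486 from the primary stops immediately, because a busy callee is busy on every carrier.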
I ran some empirical tests on failover speed to document the API behavior.
When a primary BYOC trunk was abruptly disconnected (simulating a hard network drop), it took 1.2 seconds for the /api/v2/telephony/providers/edges/trunks/{trunkId}/metrics endpoint to report the state change from IN_SERVICE to OUT_OF_SERVICE. The Edge then took approximately 800 ms to negotiate the next trunk in the logical route group.
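Putting those two measurements together, assuming they run back to back as described: new calls are affected for roughly 1.2 s + 0.8 s = 2.0 s. Anything that polls the metrics endpoint sees the change, at worst, one full poll interval after the 1.2 s reporting lag. A small sketch of that arithmetic follows. The poll interval is a hypothetical parameter, and the figures are the ones quoted above, not general guarantees.

```java
/** Back-of-the-envelope failover timing using the measurements quoted above. */
final class FailoverTiming {
    static final long STATE_REPORT_LAG_MS = 1_200; // /metrics flips IN_SERVICE -> OUT_OF_SERVICE
    static final long NEXT_TRUNK_MS = 800;          // Edge negotiating the next trunk

    /** Approximate impact window for new calls, assuming the two stages are sequential. */
    static long outageWindowMs() {
        return STATE_REPORT_LAG_MS + NEXT_TRUNK_MS;
    }

    /** Worst case before a poller sees OUT_OF_SERVICE: the reporting lag plus one full interval. */
    static long worstCaseDetectionMs(long pollIntervalMs) {
        return STATE_REPORT_LAG_MS + pollIntervalMs;
    }
}
```

With a 30-second poll, for example, `worstCaseDetectionMs(30_000)` is 31,200 ms, so the poll interval rather than the API lag dominates alerting latency.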
From a user experience standpoint, you can catch this failure natively in Architect.
If the SIP response from the carrier is a 503 Service Unavailable, your outbound routing block in Architect will take the ‘Failure’ path. I always put a containment prompt on that path (something like ‘We are experiencing temporary network turbulence, please hold while we reroute your call’) before attempting a secondary data action or fallback queue.
Please ensure you test the audio quality on the secondary PSTN trunk if you manage to fix the failover.
We discovered that our fallback carrier used an aggressive G.729 compression codec to save money. It sounded like the caller was underwater, which completely broke our topic detection and sentiment calibration models during the 3-hour outage.
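One way to catch this before it degrades your models is to check which codec the secondary carrier actually answers with. Below is a minimal sketch, assuming you can capture the SDP answer from that trunk (for example from an SBC trace or an Edge packet capture). It reads the payload types on the first `m=audio` line, where the first one listed is the answerer's preferred codec. It then flags G.729 by its static payload type 18 from RFC 3551 (PCMU is 0 and PCMA is 8). Codecs signaled on dynamic payload types would need an `a=rtpmap` lookup, which this sketch skips.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: inspect an SDP answer for the preferred audio codec. */
final class SdpCodecCheck {
    /** Payload types from the first m=audio line, in the order listed (preference order). */
    static List<Integer> audioPayloadTypes(String sdp) {
        for (String line : sdp.split("\r?\n")) {
            if (line.startsWith("m=audio ")) {
                // Format: m=audio <port> <proto> <fmt> <fmt> ...
                String[] parts = line.trim().split("\\s+");
                List<Integer> pts = new ArrayList<>();
                for (int i = 3; i < parts.length; i++) {
                    pts.add(Integer.parseInt(parts[i]));
                }
                return pts;
            }
        }
        return List.of(); // no audio media line found
    }

    /** True if the preferred audio codec is G.729 (static payload type 18 only). */
    static boolean prefersG729(String sdp) {
        List<Integer> pts = audioPayloadTypes(sdp);
        return !pts.isEmpty() && pts.get(0) == 18;
    }
}
```

Running this against a test call's answer from each fallback trunk is a cheap gate before you route production traffic there.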
Just to follow up on the Chrome extension comment above:
If the WebRTC path fails to reconnect after a trunk failover, instruct your agents to do a hard refresh (Ctrl+F5) in the browser. Sometimes the GC client caches a stale ICE candidate, and clearing the cache forces the browser to pull the new SDP details from the secondary media path.
Before you reconfigure the failover logic, verify the encryption standards of the secondary carriers.
If your primary BYOC trunk enforces TLS/SRTP and you fail over to a legacy PSTN carrier trunk that only negotiates unencrypted UDP/RTP, any cardholder data transmitted on those calls travels unencrypted, which can take you out of PCI DSS compliance. Ensure every failover trunk in your route group meets your baseline security requirements.
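A pre-change audit can be as simple as listing each trunk in the route with its SIP signaling transport and its SDP media profile. In SDP, SRTP shows up as `RTP/SAVP` or `RTP/SAVPF`, and DTLS-SRTP as `UDP/TLS/RTP/SAVP` or `UDP/TLS/RTP/SAVPF`. Plain RTP is `RTP/AVP`. The `TrunkProfile` type and trunk names below are hypothetical, not a Genesys Cloud model. Fill them in from your own trunk configurations or call captures.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: flag trunks in a route that lack TLS signaling or SRTP media. */
final class TrunkEncryptionAudit {
    /** Hypothetical summary of one trunk's negotiated settings. */
    static final class TrunkProfile {
        final String name;
        final String sipTransport; // e.g. "TLS", "TCP", "UDP"
        final String mediaProto;   // SDP m= protocol, e.g. "RTP/SAVP" or "RTP/AVP"

        TrunkProfile(String name, String sipTransport, String mediaProto) {
            this.name = name;
            this.sipTransport = sipTransport;
            this.mediaProto = mediaProto;
        }
    }

    /** SRTP profiles end in SAVP or SAVPF; plain RTP profiles end in AVP or AVPF. */
    static boolean isSrtp(String mediaProto) {
        return mediaProto.endsWith("/SAVP") || mediaProto.endsWith("/SAVPF");
    }

    /** Names of trunks that do not use both TLS signaling and SRTP media, in route order. */
    static List<String> nonCompliant(List<TrunkProfile> route) {
        List<String> failing = new ArrayList<>();
        for (TrunkProfile t : route) {
            boolean tls = "TLS".equalsIgnoreCase(t.sipTransport);
            if (!tls || !isSrtp(t.mediaProto)) {
                failing.add(t.name);
            }
        }
        return failing;
    }
}
```

An empty result means every trunk in the route meets the TLS + SRTP baseline; anything listed should come out of the route (or be fixed) before it can carry card data.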
To automate the monitoring of this, we wrote a Java service that constantly polls the trunk metrics.
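```java
import com.mypurecloud.sdk.v2.api.TelephonyProvidersEdgeApi;
import com.mypurecloud.sdk.v2.model.TrunkMetrics;
// Uses the SDK's default ApiClient, which must already be authenticated
TelephonyProvidersEdgeApi edgeApi = new TelephonyProvidersEdgeApi();
TrunkMetrics metrics = edgeApi.getTelephonyProvidersEdgesTrunkMetrics(trunkId); // throws IOException, ApiException
// String.valueOf tolerates a null or enum-typed status; check getLogicalStatus() against your SDK's TrunkMetrics model
if ("OUT_OF_SERVICE".equals(String.valueOf(metrics.getLogicalStatus()))) {
    // Fire alert to MuleSoft / PagerDuty (alertSystem is your own alerting client)
    alertSystem.triggerSev1("Primary SIP Trunk Down");
}
```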
It is methodical and prevents us from finding out about the outage from angry customers.
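One refinement to consider: if that check runs in a loop, it fires a Sev1 on every poll for as long as the trunk stays down. Below is a minimal sketch that alerts only on the transition into OUT_OF_SERVICE. `statusSource` and `alert` are hypothetical hooks; in practice `statusSource` would wrap the SDK call above and `alert` your PagerDuty/MuleSoft client. It also catches exceptions inside the poll, because `ScheduledExecutorService` suppresses all later runs of a periodic task once one run throws.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Sketch: poll a trunk status and alert once per transition into OUT_OF_SERVICE. */
final class TrunkStatusWatcher {
    private final Supplier<String> statusSource; // hypothetical: wraps the metrics call
    private final Consumer<String> alert;        // hypothetical: e.g. a PagerDuty hook
    private String lastStatus = null;

    TrunkStatusWatcher(Supplier<String> statusSource, Consumer<String> alert) {
        this.statusSource = statusSource;
        this.alert = alert;
    }

    /** One poll; returns true if an alert fired on this poll. */
    synchronized boolean pollOnce() {
        String status;
        try {
            status = statusSource.get();
        } catch (RuntimeException e) {
            // Keep the previous status and skip this poll; an uncaught exception
            // here would silently stop a scheduleAtFixedRate task.
            return false;
        }
        boolean fire = "OUT_OF_SERVICE".equals(status) && !"OUT_OF_SERVICE".equals(lastStatus);
        lastStatus = status;
        if (fire) {
            alert.accept("Primary SIP Trunk Down");
        }
        return fire;
    }

    /** Starts fixed-rate polling; the caller is responsible for shutting the executor down. */
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(this::pollOnce, 0, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

A trunk that flaps down, up, and down again produces two alerts rather than one per poll, which keeps the Sev1 channel meaningful during a long outage.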