Quick question: has anyone seen odd behavior with failover logic on BYOC trunks? Ours are in Singapore. We have configured a primary carrier and a backup carrier, with a strict 408 Request Timeout threshold to trigger the switch. The SBC logs clearly show that the primary carrier is consistently returning 408s for outbound calls originating from the +65 DID pool.
Despite these failures, Genesys Cloud continues to route traffic to the primary trunk. The health status in the Telephony administration console shows the primary trunk as ‘Healthy’ with green indicators, while the backup trunk remains idle. I have verified that the SIP credentials are correct and that the SBC is properly registered. The issue persists across all 15 trunks in the APAC region, suggesting a regional configuration quirk rather than a single trunk misconfiguration.
Has anyone seen this behavior where the platform ignores specific SIP error codes for failover decisions? I suspect the platform might be evaluating the trunk health based on registration status rather than transactional success rates. Any insights into how the failover algorithm weights 408 responses versus registration keepalives would be appreciated.
it depends, but generally the issue here is likely that the failover logic in the telephony admin is not tied to the specific 408 response codes from the SBC in the way you might expect. Genesys Cloud’s BYOC health checks often rely on a broader set of metrics, including overall call attempt success rates and latency, rather than just individual SIP timeout responses.
if the primary trunk is still showing as “healthy” in the console, it might be because the aggregate health score hasn’t dropped below the threshold defined in your routing configuration. you might need to adjust the sensitivity of the health check or explicitly define the 408 response as a critical failure event in your SBC configuration.
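to make that concrete, here's a toy calculation. it is not Genesys Cloud's actual health algorithm (I don't know how that is computed internally); the traffic mix, the 80% threshold, and the function name are all made-up assumptions, just to show how an aggregate score can stay "healthy" while one DID pool fails every call:

```python
def aggregate_success_rate(response_codes):
    """Fraction of call attempts whose final SIP response was not an error (< 400)."""
    ok = sum(1 for code in response_codes if code < 400)
    return ok / len(response_codes)


# hypothetical mix: 100 attempts on the primary trunk,
# 15 of them from the +65 pool and all timing out with 408
response_codes = [200] * 85 + [408] * 15

HEALTH_THRESHOLD = 0.80  # hypothetical cutoff, not a documented platform value

rate = aggregate_success_rate(response_codes)
is_healthy = rate >= HEALTH_THRESHOLD
print(f"success rate {rate:.2f}, healthy: {is_healthy}")
```

in this made-up case every +65 call fails, yet the trunk as a whole still clears the threshold, which would match the green indicator you're seeing.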
another approach is to monitor the 408 responses yourself, directly from the SBC logs, instead of relying on the platform's health check. you can set up a webhook to send these events to a service like ServiceNow, where you can trigger an automated response (for example, one that calls the Genesys Cloud Platform API) to take the primary trunk out of rotation. this way, failover is triggered as soon as the 408 errors are detected.
here’s a quick example of how you might structure the webhook payload:
{
  "event": "call_failure",
  "carrier": "primary",
  "response_code": 408,
  "timestamp": "2023-10-01T12:34:56Z",
  "did_pool": "+65"
}
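this payload can be used by the receiving service to kick off the failover, for example by calling the Platform API to change the trunk or outbound route configuration so calls go to the backup carrier. check the API documentation for which trunk settings can actually be changed programmatically; the health indicator shown in the console may not be something you can set directly.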
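a rough sketch of the detection side, in case it helps. it counts 408s in a sliding time window and builds the payload shown above once a threshold is hit. the class name, the 60-second window, and the threshold of 3 are my own assumptions, not anything Genesys Cloud or your SBC defines, and actually posting the payload to your webhook is left out:

```python
import json
from collections import deque
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(seconds=60)  # hypothetical look-back window
THRESHOLD = 3                   # hypothetical number of 408s that counts as a failure


class TimeoutMonitor:
    """Tracks recent 408 responses for one carrier (hypothetical helper)."""

    def __init__(self, carrier, did_pool, window=WINDOW, threshold=THRESHOLD):
        self.carrier = carrier
        self.did_pool = did_pool
        self.window = window
        self.threshold = threshold
        self._recent = deque()  # timestamps of recent 408s, oldest first

    def record(self, response_code, when):
        """Record one SIP response seen at `when` (a UTC datetime).

        Returns the webhook payload once the threshold is reached, else None.
        """
        if response_code != 408:
            return None
        self._recent.append(when)
        # drop 408s that have fallen out of the look-back window
        while self._recent and when - self._recent[0] > self.window:
            self._recent.popleft()
        if len(self._recent) < self.threshold:
            return None
        return {
            "event": "call_failure",
            "carrier": self.carrier,
            "response_code": 408,
            "timestamp": when.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "did_pool": self.did_pool,
        }


# usage: three 408s one second apart trip the threshold on the third
monitor = TimeoutMonitor("primary", "+65")
start = datetime(2023, 10, 1, 12, 34, 54, tzinfo=timezone.utc)
payload = None
for i in range(3):
    payload = monitor.record(408, start + timedelta(seconds=i))
print(json.dumps(payload, indent=2))
```

using a window rather than reacting to a single 408 avoids flapping between carriers on one slow call; tune both numbers to your own traffic.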
make sure to test this setup in a non-production environment first to ensure that the failover logic works as expected.