SIP Trunk Failover Metrics Discrepancy in EU-West Queue Performance

PlatformOps · May 8, 2026, 1:07pm

What is the standard way to reconcile the SIP trunk failure rate reported in the Telephony dashboard with the call drop metrics shown in the Queue Performance view for the EU-West environment?

The infrastructure team recently added a secondary SIP trunk provider to ensure business continuity during peak hours. For the last reporting period, the SIP trunk monitoring panel shows a 2% failure rate on the primary trunk, with the failures logged as timeouts returning SIP 503 Service Unavailable. However, the Queue Performance dashboard for the same period shows a Service Level of 98%, implying that nearly all calls were routed and answered within the defined SLA threshold.

This divergence makes it unclear what the actual customer experience and system reliability are. If the primary trunk is failing at a 2% rate, one would expect a measurable impact on wait times or abandon rates in the associated routing queues. The Architect flow is configured to automatically retry failed connections on the secondary trunk, but the latency introduced by this retry does not appear to be reflected in the standard performance metrics.
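To make the expected impact concrete, here is a rough back-of-the-envelope sketch of the arithmetic. Only the 2% primary failure rate comes from the dashboard. The secondary trunk failure rate and the per-call retry delay are hypothetical placeholders, the function names are made up for illustration, and the calculation assumes the two trunks fail independently, which may not hold in practice.

```python
def effective_failure_rate(primary_failure: float, secondary_failure: float) -> float:
    """Fraction of calls that fail on both trunks, assuming independent failures."""
    return primary_failure * secondary_failure


def mean_added_setup_delay(primary_failure: float, retry_delay_s: float) -> float:
    """Average extra setup time per call, spread across all calls, caused by retries."""
    return primary_failure * retry_delay_s


if __name__ == "__main__":
    primary_failure = 0.02    # from the SIP trunk monitoring panel
    secondary_failure = 0.01  # hypothetical placeholder, not measured
    retry_delay_s = 4.0       # hypothetical failover latency for a retried call

    lost = effective_failure_rate(primary_failure, secondary_failure)
    delay = mean_added_setup_delay(primary_failure, retry_delay_s)

    print(f"calls failing on both trunks: {lost:.4%}")
    print(f"mean added setup delay per call: {delay:.2f} s")
```

With placeholder values like these, both the share of calls lost end to end and the average added delay come out very small, which might explain why the queue metrics barely move. That is only a guess until the real secondary failure rate and failover timing are known, and it also depends on whether trunk setup time is counted toward Service Level at all.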

How should one interpret these conflicting metrics to accurately report system availability and customer impact to stakeholders?