Discrepancy in SIP Trunk Failure Metrics Between Queue Activity and Telephony Reports

PlatformOps · January 29, 2026, 6:47pm

I'm seeing a significant divergence in reported failure rates for inbound calls routed through one specific SIP trunk configuration. The environment is hosted in the EU-West region and uses a standard queue structure with predictive routing enabled. The divergence shows up when correlating the Queue Activity Performance View against the detailed Telephony Reports.

The Queue Activity dashboard shows a steady stream of inbound interactions with a healthy distribution of answered calls and standard abandon rates. However, the Telephony Report for the corresponding SIP trunk entity logs a high volume of “Call Failed” events with the error code SIP_503_Service_Unavailable. These failures occur during peak load, between 09:00 and 11:00 CET. The discrepancy suggests that the initial SIP INVITE is failing or being retried, and that the Telephony Report records each failed attempt even when the interaction is later routed to an agent queue and answered. The Queue Activity view, by contrast, treats the interaction as a successful ingress once the agent accepts.

The Architect flow is configured with a standard “Get Input” block followed by a “Queue” block. No custom retry logic or complex branching is present at the ingress point. The SIP trunk is configured with a maximum concurrent session limit of 500. It appears that when this threshold is approached, the CPE (Cloud Phone Engine) may be rejecting initial SIP requests with a 503, triggering a retry mechanism that eventually succeeds. This results in a single successful interaction in the Queue view but multiple failure events in the Telephony logs.
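To test this hypothesis against exported data, the attempts can be grouped per call and the 503s split into "recovered by a later attempt" versus "never succeeded". The following is only a minimal sketch: the `(correlation_key, sip_status)` row shape and the `classify_attempts` helper are hypothetical and not part of any platform export. A retried INVITE may also arrive with a new Call-ID, so the correlation key is assumed to be something that survives retries (for example, caller number plus a short time window).

```python
from collections import defaultdict


def classify_attempts(events):
    """Group SIP attempts per call and separate transient 503s from real failures.

    events: iterable of (correlation_key, sip_status) tuples.
    The row shape is an assumption made for illustration only.
    """
    by_call = defaultdict(list)
    for key, status in events:
        by_call[key].append(status)

    summary = {
        "calls": 0,                 # distinct calls seen
        "clean": 0,                 # succeeded with no 503 at all
        "recovered_after_503": 0,   # at least one 503, then a 200
        "failed": 0,                # never reached a 200
        "transient_503_events": 0,  # 503 events belonging to recovered calls
    }
    for statuses in by_call.values():
        summary["calls"] += 1
        count_503 = statuses.count(503)
        if 200 in statuses:
            if count_503:
                summary["recovered_after_503"] += 1
                summary["transient_503_events"] += count_503
            else:
                summary["clean"] += 1
        else:
            summary["failed"] += 1
    return summary


sample = [("a", 503), ("a", 503), ("a", 200), ("b", 200), ("c", 503)]
result = classify_attempts(sample)
print(result)
```

If `transient_503_events` accounts for most of the “Call Failed” rows while `failed` stays low, that would be consistent with one successful interaction in the Queue view producing several failure events in the Telephony logs.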

Has anyone encountered this specific mismatch between Queue Performance metrics and Telephony SIP error reporting? Is there a configuration setting within the SIP trunk or the Architect flow to suppress these transient 503 errors from the Telephony Report, or should they be considered valid capacity constraints requiring infrastructure scaling? The current reporting divergence is causing confusion in operational dashboards regarding true service level adherence.
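One way to judge whether the 503s reflect a real capacity constraint is to compute peak concurrent sessions for the 09:00 to 11:00 window and compare the result with the 500-session trunk limit. The sketch below assumes session start and end times can be exported as numeric timestamps; `peak_concurrency`, `near_limit`, and the 90% headroom figure are illustrative choices, not platform settings.

```python
TRUNK_SESSION_LIMIT = 500  # concurrent session limit configured on the trunk


def peak_concurrency(sessions):
    """Return the maximum number of simultaneously active sessions.

    sessions: iterable of (start, end) timestamps in seconds (assumed export shape).
    """
    points = []
    for start, end in sessions:
        points.append((start, 1))   # session opens
        points.append((end, -1))    # session closes
    # At equal timestamps, closes (-1) sort before opens (+1), so
    # back-to-back sessions are not counted as overlapping.
    points.sort()
    current = peak = 0
    for _, delta in points:
        current += delta
        peak = max(peak, current)
    return peak


def near_limit(peak, limit=TRUNK_SESSION_LIMIT, headroom=0.9):
    """True when the peak reaches the chosen fraction of the limit (0.9 is arbitrary)."""
    return peak >= limit * headroom


sample_sessions = [(0, 10), (5, 15), (10, 20)]
sample_peak = peak_concurrency(sample_sessions)
print(sample_peak, near_limit(sample_peak))
```

If the measured peak sits close to 500 at the same times the 503s cluster, that points toward a genuine capacity limit rather than a reporting artifact.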

QmAnalyst · January 29, 2026, 8:44pm

The Telephony Report captures the SIP-level disconnect, while Queue Activity stops tracking once the call hits the IVR or queue entry point. Check if your trunk is configured to send a BYE before the queue metrics register the abandonment.

CacheCommander · January 30, 2026, 8:44pm

Spot on about the SIP-level disconnect. When running load tests with JMeter, I’ve seen similar gaps. The fix is ensuring the trunk sends a BYE before the queue registers abandonment. Also, check if your WebSocket connection limits are being hit during high concurrency, which can cause metric drops.

Guinevere · February 2, 2026, 8:44pm

The BYE timing is definitely the culprit. Aligning the SIP disconnect with the queue abandonment registration usually resolves the metric drift.