I'm seeing a significant divergence in reported failure rates for inbound calls routed through a specific SIP trunk configuration. The environment is hosted in the EU-West region and uses a standard queue structure with predictive routing enabled. The issue shows up when correlating data from the Queue Activity Performance View against the detailed Telephony Reports.
The Queue Activity dashboard shows a steady stream of inbound interactions with a healthy distribution of answered calls and standard abandon rates. However, the Telephony Report for the corresponding SIP trunk entity logs a high volume of “Call Failed” events with the error code SIP_503_Service_Unavailable. These failures occur during peak load, between 09:00 and 11:00 CET. The discrepancy suggests that the interaction is eventually routed to an agent queue and answered, but that the initial SIP INVITE fails or is retried first. The Telephony Report appears to record each failed attempt as a failure, whereas the Queue Activity view counts the interaction as a successful ingress once the agent accepts.
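One way to check that the 503 events really cluster in the 09:00 to 11:00 window is to bucket exported Telephony Report rows by hour of day. This is a minimal sketch, assuming the report can be exported to rows with `timestamp` (ISO 8601, local time) and `error_code` fields. Both field names and the sample rows are hypothetical, not the actual export schema.

```python
from collections import Counter
from datetime import datetime


def failures_by_hour(rows, error_code="SIP_503_Service_Unavailable"):
    """Count rows carrying the given error code, keyed by hour of day."""
    counts = Counter()
    for row in rows:
        if row["error_code"] == error_code:
            # "timestamp" and "error_code" are assumed field names.
            hour = datetime.fromisoformat(row["timestamp"]).hour
            counts[hour] += 1
    return dict(counts)


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"timestamp": "2024-03-04T09:15:00", "error_code": "SIP_503_Service_Unavailable"},
    {"timestamp": "2024-03-04T09:40:00", "error_code": "SIP_503_Service_Unavailable"},
    {"timestamp": "2024-03-04T10:05:00", "error_code": "SIP_503_Service_Unavailable"},
    {"timestamp": "2024-03-04T14:30:00", "error_code": "OTHER"},
]

hourly = failures_by_hour(sample_rows)
for hour in sorted(hourly):
    print(f"{hour:02d}:00  {hourly[hour]} failures")
```

If the counts outside the peak window are close to zero, that supports a load-related cause rather than a persistent trunk misconfiguration.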
The Architect flow is configured with a standard “Get Input” block followed by a “Queue” block. No custom retry logic or complex branching is present at the ingress point. The SIP trunk is configured with a maximum concurrent session limit of 500. It appears that when this threshold is approached, the CPE (Cloud Phone Engine) may be rejecting initial SIP requests with a 503, triggering a retry mechanism that eventually succeeds. That would produce a single successful interaction in the Queue view but multiple failure events in the Telephony logs.
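To test the retry hypothesis, the telephony events can be grouped per call so that calls that failed and then connected are separated from calls that never connected. This is a rough sketch under stated assumptions: `call_key` and `status` are hypothetical fields, and `call_key` must be something that survives a retry (a retried INVITE may carry a new Call-ID, so a key built from caller number, dialed number, and a short time window may be needed).

```python
from collections import defaultdict


def classify_calls(events):
    """Group telephony events per call and label each call's outcome.

    Returns {call_key: (outcome, failed_attempts)}, where outcome is
    "clean" (connected, no failures), "recovered" (failed at least once,
    then connected), or "lost" (never connected).
    """
    by_call = defaultdict(list)
    for ev in events:
        # "call_key" and "status" are assumed field names.
        by_call[ev["call_key"]].append(ev["status"])

    result = {}
    for key, statuses in by_call.items():
        failed = statuses.count("failed")
        connected = "connected" in statuses
        if connected and failed == 0:
            outcome = "clean"
        elif connected:
            outcome = "recovered"
        else:
            outcome = "lost"
        result[key] = (outcome, failed)
    return result


# Hypothetical events, for illustration only.
sample_events = [
    {"call_key": "a", "status": "failed"},
    {"call_key": "a", "status": "failed"},
    {"call_key": "a", "status": "connected"},
    {"call_key": "b", "status": "connected"},
    {"call_key": "c", "status": "failed"},
]

outcomes = classify_calls(sample_events)
```

If most 503 events belong to "recovered" calls, the two reports are consistent and the failures are transient retries. Any "lost" calls would never reach the Queue view at all, which would point to a real capacity problem rather than a reporting artifact.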
Has anyone encountered this specific mismatch between Queue Performance metrics and Telephony SIP error reporting? Is there a configuration setting within the SIP trunk or the Architect flow to suppress these transient 503 errors from the Telephony Report, or should they be considered valid capacity constraints requiring infrastructure scaling? The current reporting divergence is causing confusion in operational dashboards regarding true service level adherence.