How does the SIP keepalive retry logic actually behave when a BYOC trunk hits a carrier-side NAT timeout? The Ohio BYOC pool dropped into failover mode around 2 PM ET yesterday. The primary carrier’s session border controller started dropping OPTIONS packets after roughly 90 seconds of idle time. The console shows a cascade of SIP 408 Request Timeout responses followed by a hard SIP 503 Service Unavailable. Architect v2024.3.1 is routing outbound calls through the fl-8821-out flow. The trunk group has the "Use failover on error" toggle switched on, but the secondary Twilio trunk never catches the traffic. Instead, the platform keeps hammering the primary with REGISTER refreshes every 15 seconds. The outbound queue backed up for three hours while agents watched calls bounce to voicemail.
Checked the trunk group settings via GET /api/v2/architect/trunkgroups/ohio-byoc-01. The keepalive interval is set to 30 seconds; the carrier expects 60. Docs say the platform is supposed to scale the retry window, but the logs show a flat 15-second cadence. Outbound routing rules point to the correct trunk group, but the failover trigger seems stuck on the 408 instead of waiting for the 503. Tried adjusting the SIP URI formatting in the carrier portal. Nothing changed. The secondary trunk just sits there idle with a healthy 200 OK on its own REGISTER cycle. Console metrics flatline, and the platform doesn’t even attempt a re-route.
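For anyone wanting to line the numbers up, here is a quick sanity check of the timings quoted above. It is only a sketch: survives_nat is a made-up helper, and modelling the SBC as a simple idle timer is my assumption, not documented platform or carrier behavior.

```python
# Timings quoted in this post, all in seconds.
NAT_IDLE_TIMEOUT = 90      # SBC starts dropping after roughly this much idle time
CONFIGURED_KEEPALIVE = 30  # keepalive interval on the trunk group
CARRIER_EXPECTED = 60      # interval the carrier expects
OBSERVED_CADENCE = 15      # REGISTER refresh cadence seen in the logs


def survives_nat(interval_s: float, idle_timeout_s: float) -> bool:
    """Hypothetical helper: True if a packet goes out before the idle timer expires."""
    return interval_s < idle_timeout_s


for label, interval in [
    ("configured", CONFIGURED_KEEPALIVE),
    ("carrier expected", CARRIER_EXPECTED),
    ("observed", OBSERVED_CADENCE),
]:
    ok = survives_nat(interval, NAT_IDLE_TIMEOUT)
    print(f"{label}: {interval}s -> inside the idle window: {ok}")
```

All three intervals sit inside the 90-second window under that simple model, so idle time alone may not explain the drops if the packets are actually reaching the SBC.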
Just need to know if the keepalive backoff is hardcoded or if there’s a hidden parameter in the trunk group payload that controls the multiplier. Here’s the raw REGISTER header dump from the last failed cycle:
REGISTER sip:34.210.xx.xx SIP/2.0
To: sip:byoc-ohio-01@34.210.xx.xx
From: sip:byoc-ohio-01@34.210.xx.xx;tag=gbk7721
Call-ID: 8842f1a2@34.210.xx.xx
CSeq: 4 REGISTER
Contact: sip:34.210.xx.xx:5060
Expires: 15
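
The dump above can also be checked programmatically. A minimal Python sketch follows; parse_sip_headers is a hypothetical helper written for this post, not part of any SDK, and it only handles simple one-line headers like the ones shown.

```python
RAW_REGISTER = """\
REGISTER sip:34.210.xx.xx SIP/2.0
To: sip:byoc-ohio-01@34.210.xx.xx
From: sip:byoc-ohio-01@34.210.xx.xx;tag=gbk7721
Call-ID: 8842f1a2@34.210.xx.xx
CSeq: 4 REGISTER
Contact: sip:34.210.xx.xx:5060
Expires: 15
"""


def parse_sip_headers(raw: str) -> dict[str, str]:
    """Hypothetical helper: map lowercase header names to values, skipping the request line."""
    headers = {}
    for line in raw.splitlines()[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        # Split on the first colon only, since SIP URIs contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


headers = parse_sip_headers(RAW_REGISTER)
expires = int(headers["expires"])
print(f"Requested registration lifetime: {expires}s")
```

The request itself asks for a 15-second registration lifetime, which matches the cadence in the logs, so the refresh rate may be following the Expires value rather than the keepalive interval.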