SIP Trunk Registration Flapping During WFM Schedule Publish Peak

Quick question about the correlation between our WFM schedule publish events and SIP trunk registration stability. We operate a BYOC Edge deployment in the America/Chicago timezone, and our weekly schedule publishing window coincides with a significant spike in SIP registration failures on our primary trunk provider (Twilio). The error logs show a cascade of 408 Request Timeout errors followed by 401 Unauthorized errors for approximately 15% of our agent endpoints right when the /v2/wfm/schedules API call completes.

Agents can still log in, but their SIP devices lose registration with the Edge node, so inbound calls route to voicemail or fail entirely. This is seriously hurting our adherence metrics, because agents cannot take calls during the first hour of their shift while the trunk is unstable.

The Architect flow handling the schedule publish is straightforward, just a simple API integration, but the timing suggests resource contention on the Edge node as it processes both the WFM payload and the SIP signaling traffic. We have checked the network latency between the Edge node and the Twilio endpoint, and it stays under 50ms throughout, which seems to rule out general network issues. The problem appears isolated to the window where the WFM service is writing large schedule objects to the database while the Edge is handling SIP REGISTER requests for 500+ agents. I suspect the Edge node is hitting a thread pool limit or a memory threshold that causes it to drop SIP keep-alives during the heavy WFM writes.

We have already tried the following steps to mitigate the issue:

  • Increased the SIP registration timeout and retry intervals in the Edge configuration to allow a longer grace period during the publish window. This only delayed the failures rather than preventing them.
  • Split the weekly schedule publish into two smaller batches to flatten the load spike. This cut the failure rate from roughly 15% to 5%, but we need zero failures for compliance reporting.
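
To make the batching step concrete, here is a minimal Python sketch of splitting a publish into smaller batches with a pause between them. It is an illustration only: `publish_fn` is a hypothetical placeholder for whatever actually publishes the schedule for a group of agents (it is not a real platform API), and the batch size and pause length are assumptions that would need tuning.

```python
import time
from typing import Callable, Iterator, List, Sequence


def chunk(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def publish_in_batches(
    agent_ids: Sequence[str],
    batch_size: int,
    publish_fn: Callable[[List[str]], None],
    pause_s: float = 60.0,
    sleep_fn: Callable[[float], None] = time.sleep,
) -> int:
    """Call publish_fn once per batch, pausing between batches.

    publish_fn is a hypothetical stand-in for the real publish call.
    Returns the number of batches published.
    """
    batches = list(chunk(agent_ids, batch_size))
    for index, batch in enumerate(batches):
        publish_fn(batch)
        # Pause after every batch except the last, so SIP REGISTER
        # traffic gets a quiet interval between write bursts.
        if index < len(batches) - 1:
            sleep_fn(pause_s)
    return len(batches)
```

Passing `sleep_fn` as a parameter keeps the pacing testable without real delays. Going from two batches to more, smaller ones is the obvious next experiment, though whether that reaches zero failures depends on what is actually contending on the Edge.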

Does anyone have experience tuning Edge node resources specifically for concurrent heavy WFM writes and SIP signaling? Are there specific JVM flags or connection pool settings for the SIP stack that we should adjust to prevent this registration flapping? Any insight into how the Edge prioritizes WFM API calls versus SIP signaling would be greatly appreciated.
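
In case it helps anyone reproduce the correlation, below is a rough Python sketch of how the share of endpoints failing in the publish window could be measured from SIP logs. The log line format is an assumption invented for illustration (it is not the actual Edge log format), and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Set

# Status codes treated as registration failures (408 timeout, 401 unauthorized).
FAILURE_CODES = {"408", "401"}


def failed_endpoints(
    log_lines: Iterable[str],
    publish_done: datetime,
    window: timedelta,
) -> Set[str]:
    """Return endpoints with a 408 or 401 REGISTER response in the window.

    Assumed (hypothetical) line format:
        <ISO-8601 timestamp> REGISTER <endpoint-id> <status-code>
    Timestamps are assumed to be naive and in the same timezone as
    publish_done.
    """
    failed: Set[str] = set()
    window_end = publish_done + window
    for line in log_lines:
        parts = line.split()
        if len(parts) != 4 or parts[1] != "REGISTER":
            continue  # skip lines that do not match the assumed format
        try:
            stamp = datetime.fromisoformat(parts[0])
        except ValueError:
            continue  # skip lines with an unparseable timestamp
        if publish_done <= stamp <= window_end and parts[3] in FAILURE_CODES:
            failed.add(parts[2])
    return failed


def failure_rate(failed: Set[str], total_endpoints: int) -> float:
    """Fraction of all endpoints that failed at least once in the window."""
    if total_endpoints <= 0:
        raise ValueError("total_endpoints must be positive")
    return len(failed) / total_endpoints
```

Counting distinct endpoints rather than raw error lines avoids inflating the rate when a single device retries and logs both a 408 and a 401.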