Predictive outbound campaigns stall with 408 during Edge survivability switchover

Running Edge 2024.6.1 on Dell R750 nodes here in Tokyo. Primary WAN link flapped again during the nightly ISP maintenance window. Cluster flips straight into local survivability mode. Local SIP desk phones keep ringing fine, but the outbound predictive dialer campaigns just freeze. Dual NIC bond holds steady at 10GbE. BIOS stays locked in performance mode with C-states disabled. The dialer UI shows campaigns suspended after about 45 seconds. API returns a hard timeout on the status endpoint. Logs from the edge-dialer service show repeated connection resets to the cloud routing tier. Network path to the Genesys public endpoints stays open via the secondary ISP, but the media proxy doesn’t handshake properly. Checked the iptables rules on the R750. It’s not blocking port 443 or 5060. DNS resolution on the appliance points to internal resolvers. Switching to 8.8.8.8 didn’t change anything. Campaigns can’t restart until the primary WAN link comes back and the cluster drops out of survivability mode. The outbound routing table seems to ignore the secondary gateway IP during local mode.

{
 "status": "SUSPENDED",
 "error_code": "ERR_OUTBOUND_DIALER_TIMEOUT",
 "message": "Connection to cloud dialer tier timed out after 45000ms",
 "gateway_ip": "10.20.30.4",
 "survivability_mode": true
}

Looking at the edge-dialer systemd status, the process keeps restarting every 12 seconds. Maybe the local media proxy needs a specific route metric adjustment for the secondary bond.