Hey everyone, I’ve run into a really strange issue with our predictive dialer campaign in the ap1 region. We are using fifteen BYOC trunks via an AWS SIP media application, and outbound routing is configured with a primary and a secondary carrier for failover. The issue is intermittent but frequent during peak hours (9 AM to 11 AM SGT). The Architect flow initiates the dial, and the SIP trace shows a 100 Trying followed by a 200 OK from the carrier. However, the call state in Genesys Cloud stays in “dialing” for about 15 seconds before timing out with a generic “call failed” error in the analytics dashboard. No 486 Busy Here or 480 Temporarily Unavailable is returned, just a hard timeout.
We have verified that the AWS security groups allow all UDP traffic on ports 5060-5061 from the Genesys Cloud ap1 egress IPs. The SIP registrar timeout is set to 32 seconds, which should be sufficient. We also checked the carrier logs, and they confirm that the media stream is established successfully on their end, which suggests the issue might be in the ACK handling or the early media negotiation within the Genesys Cloud platform. For monitoring we are using the Genesys Cloud JS SDK 3.x, but the actual dialing happens through the native outbound campaign engine.
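Since ACK handling is my main suspect, here is a rough sketch of what I would expect to see in the trace if the ACK never reaches the carrier. Per RFC 3261, a UAS that gets no ACK resends its 200 OK at an interval that starts at T1, doubles up to T2, and stops after 64 × T1. The values below are the RFC defaults (T1 = 0.5 s, T2 = 4 s); I am assuming the carrier has not tuned them, and the function name is just mine for illustration.

```python
# Sketch of the 2xx retransmission schedule from RFC 3261 (section 13.3.1.4).
# Assumption: the carrier uses the default timer values T1 = 0.5 s and T2 = 4 s.
T1 = 0.5  # default round-trip time estimate, in seconds
T2 = 4.0  # default cap on the retransmit interval, in seconds


def ok_retransmit_times(t1=T1, t2=T2):
    """Offsets (seconds after the first 200 OK) at which the UAS resends it
    while no ACK has arrived. The UAS gives up once 64 * t1 has elapsed."""
    times = []
    interval = t1
    elapsed = interval
    while elapsed < 64 * t1:
        times.append(elapsed)
        interval = min(interval * 2, t2)  # double the interval, capped at T2
        elapsed += interval
    return times


schedule = ok_retransmit_times()
print(schedule)
```

If the carrier-side capture shows repeated 200 OKs at roughly these offsets for the failing calls, that would point at the ACK being lost or never sent rather than at media.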
Has anyone seen this specific behavior, where the 200 OK is received but the call does not progress to “connected”? We are considering changing the pacing logic to use a pause action with a dynamic duration to see if it helps with a possible race condition, but that feels like a workaround. We need to know whether this is a known issue with the ap1 region’s SIP proxy or whether there is a specific configuration in the trunk settings that we are missing. The error code in the API response is 408 Request Timeout, which is confusing given the 200 OK status. Any insights on how to debug the SIP dialog state between Genesys Cloud and the BYOC trunk would be appreciated.
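To make the dialog-state question concrete, this is roughly the check I have in mind for our traces: for every INVITE that got a 200 OK, is there an ACK with the same Call-ID and CSeq number? It is only a sketch. The `SipEvent` record, the `unacked_invites` helper, and the sample data are all made up for illustration, and the events would first have to be extracted from the real trace or pcap.

```python
from dataclasses import dataclass


@dataclass
class SipEvent:
    """Hypothetical, simplified record for one SIP message in a trace."""
    ts: float          # capture timestamp in seconds
    call_id: str       # Call-ID header value
    cseq: int          # numeric part of the CSeq header
    cseq_method: str   # method part of the CSeq header, e.g. "INVITE"
    kind: str          # request method ("INVITE", "ACK") or status code ("200")


def unacked_invites(events):
    """Return (call_id, cseq) pairs where an INVITE got a 200 but no ACK followed."""
    answered = {}  # (call_id, cseq) -> timestamp of the first 200 OK
    acked = set()
    for e in sorted(events, key=lambda ev: ev.ts):
        key = (e.call_id, e.cseq)
        if e.kind == "200" and e.cseq_method == "INVITE":
            answered.setdefault(key, e.ts)
        elif e.kind == "ACK" and key in answered:
            # The ACK for a 2xx carries the same CSeq number as the INVITE.
            acked.add(key)
    return sorted(k for k in answered if k not in acked)


# Made-up sample: call-a completes the handshake, call-b never sees an ACK.
sample = [
    SipEvent(0.00, "call-a", 1, "INVITE", "INVITE"),
    SipEvent(0.05, "call-a", 1, "INVITE", "100"),
    SipEvent(1.20, "call-a", 1, "INVITE", "200"),
    SipEvent(1.25, "call-a", 1, "ACK", "ACK"),
    SipEvent(2.00, "call-b", 1, "INVITE", "INVITE"),
    SipEvent(2.04, "call-b", 1, "INVITE", "100"),
    SipEvent(3.10, "call-b", 1, "INVITE", "200"),
    SipEvent(3.60, "call-b", 1, "INVITE", "200"),  # retransmitted 200 OK
]

print(unacked_invites(sample))
```

Running this against captures from both sides (Genesys Cloud edge and carrier) should at least tell us whether the ACK is never sent or is sent but dropped somewhere in between.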