WebRTC Softphone Connection Stability and Queue Metrics Discrepancy

PlatformOps · April 29, 2026, 8:48pm

My current setup is failing. The WebRTC softphone client drops connections intermittently during peak hours in the EU-West region, specifically while agents are handling high-volume inbound calls. The softphone logs show a sudden loss of media stream connectivity with no prior warning signs such as increased latency or jitter. Agents report that the drops occur randomly, with no consistent pattern tied to specific queues or time intervals.

Environment: Genesys Cloud version 23.12, WebRTC softphone version 2.5.1, Chrome 119 on Windows 11. The configuration uses standard WebRTC settings with no custom changes to the media handling parameters. The issue has been observed across multiple agents and devices, which rules out individual hardware or software configurations as the root cause.

The Performance Dashboard shows a significant discrepancy between expected call volume and calls actually handled, which suggests the dropped calls are not being logged or attributed to their queues correctly. The business impact is substantial, because the inaccurate metrics feed into performance evaluation and resource allocation.

The technical team has already reviewed network connectivity and firewall settings and confirmed that nothing is restricting or blocking WebRTC traffic. The issue persists despite these checks, which points to a potential problem in the Genesys Cloud platform or the WebRTC softphone client itself. A detailed analysis of the softphone logs and Performance Dashboard metrics is needed to identify the root cause and implement a fix.
Is there a known issue or a specific configuration parameter that needs to be adjusted to improve the stability of the WebRTC softphone and ensure accurate queue metrics?

QmAnalyst · April 29, 2026, 10:05pm

The way I solve this is by forcing the WebRTC client to use UDP-only transport and disabling the TCP fallback in the admin settings, because TCP retransmission adds head-of-line delay that can let the media stream fall out of sync with the signaling layer when jitter is high. Also check the STUN/TURN server configuration for your EU-West region to make sure candidate gathering isn’t timing out before the ICE connection is established.
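
To see whether sessions are actually falling back to TCP, it can help to tally the ICE candidates by transport and type. Below is a minimal sketch that parses the standard SDP candidate line format (foundation, component, transport, priority, address, port, then `typ <type>`). The function names and the sample candidate lines are made up for illustration, and the addresses come from documentation ranges.

```typescript
type Transport = "udp" | "tcp" | "unknown";

// Split a candidate line into whitespace-separated fields, dropping an optional "a=" prefix.
function candidateFields(line: string): string[] {
  return line.trim().replace(/^a=/, "").split(/\s+/);
}

// The transport is the third field of a candidate line.
function candidateTransport(line: string): Transport {
  const fields = candidateFields(line);
  if (fields.length < 8 || !(fields[0] ?? "").startsWith("candidate:")) {
    return "unknown";
  }
  const transport = (fields[2] ?? "").toLowerCase();
  return transport === "udp" || transport === "tcp" ? transport : "unknown";
}

// The candidate type (host, srflx, prflx, relay) follows the "typ" keyword.
function candidateType(line: string): string {
  const fields = candidateFields(line);
  const typIndex = fields.indexOf("typ");
  return typIndex >= 0 ? fields[typIndex + 1] ?? "unknown" : "unknown";
}

// Count candidates per "transport/type" pair, e.g. "udp/srflx".
function summarizeCandidates(lines: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const line of lines) {
    const key = `${candidateTransport(line)}/${candidateType(line)}`;
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

// Illustrative sample only, not taken from real softphone logs.
const sampleCandidates = [
  "candidate:1 1 udp 2122260223 192.0.2.10 54321 typ host",
  "candidate:2 1 udp 1686052607 198.51.100.7 54321 typ srflx raddr 192.0.2.10 rport 54321",
  "candidate:3 1 tcp 1518280447 192.0.2.10 9 typ host tcptype active",
];
const candidateSummary = summarizeCandidates(sampleCandidates);
console.log(candidateSummary);
```

In a browser the candidate strings would typically come from the `icecandidate` event on `RTCPeerConnection` (`event.candidate.candidate`). If no `udp/srflx` or `udp/relay` entries ever appear for EU-West agents, that would be consistent with gathering timing out against the STUN/TURN servers.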

CacheCommander · May 1, 2026, 10:05pm

The way I solve this is by validating the WebSocket handshake parameters against the actual load test results from JMeter. The suggestion about UDP-only transport is technically sound for media jitter, but it doesn’t address signaling layer stability, which often fails under high concurrent connection counts in EU-West. The platform has strict rate limits on WebSocket reconnections, and if the softphone client retries aggressively on minor network blips, it can trigger a temporary ban or a connection drop at the edge.

You need to adjust the reconnection backoff strategy in your WebRTC configuration. A fixed retry interval often causes a thundering herd problem during peak hours. Instead, implement an exponential backoff with jitter. Here is a sample config payload that helps stabilize the connection by spacing out the retry attempts:

{
  "webrtc_config": {
    "transport_policy": "udp_preferred",
    "reconnection": {
      "enabled": true,
      "strategy": "exponential_backoff",
      "initial_delay_ms": 1000,
      "max_delay_ms": 30000,
      "jitter_factor": 0.5,
      "max_attempts": 5
    },
    "ice_servers": [
      {
        "urls": ["stun:stun-eu-west-1.genesys.cloud:3478"]
      }
    ]
  }
}
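
For anyone who wants to see what retry schedule a payload like this implies, here is a rough sketch of exponential backoff with jitter. The interface and function names are hypothetical, and the way `jitter_factor` is applied (a uniform spread of plus or minus that fraction around the base delay) is an assumption, since the payload itself does not define it.

```typescript
// Hypothetical mirror of the "reconnection" block in the payload.
interface ReconnectionConfig {
  initialDelayMs: number;
  maxDelayMs: number;
  jitterFactor: number; // assumed meaning: +/- fraction of the base delay
  maxAttempts: number;
}

// Delay before retry number `attempt` (0-based).
// `random` is injectable so the schedule can be made deterministic in tests.
function backoffDelayMs(
  cfg: ReconnectionConfig,
  attempt: number,
  random: () => number = Math.random,
): number {
  // Double the delay on every attempt, capped at maxDelayMs.
  const base = Math.min(cfg.initialDelayMs * 2 ** attempt, cfg.maxDelayMs);
  // Spread the delay uniformly over [base - spread, base + spread].
  const spread = base * cfg.jitterFactor;
  const jittered = base - spread + random() * 2 * spread;
  // Never exceed the configured ceiling, even after jitter.
  return Math.min(Math.round(jittered), cfg.maxDelayMs);
}

// Full list of delays for one reconnect cycle.
function backoffSchedule(
  cfg: ReconnectionConfig,
  random: () => number = Math.random,
): number[] {
  return Array.from({ length: cfg.maxAttempts }, (_, attempt) =>
    backoffDelayMs(cfg, attempt, random),
  );
}

const reconnection: ReconnectionConfig = {
  initialDelayMs: 1000,
  maxDelayMs: 30000,
  jitterFactor: 0.5,
  maxAttempts: 5,
};

// With the random source pinned to 0.5 the jitter cancels out:
// [1000, 2000, 4000, 8000, 16000]
console.log(backoffSchedule(reconnection, () => 0.5));
```

Because each client draws its own random offset, agents that lose their connection at the same moment end up retrying at different times instead of all at once.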

Also, check your API throughput metrics in the Performance Dashboard. If you see a spike in 429 errors coinciding with the connection drops, rate limiting on the signaling channel is the most likely cause. Reducing the initial delay to 500 ms might seem faster, but it increases the risk of hitting the edge rate limits during a mass reconnect event. Keep the max attempts low so the client fails fast and lets the UI notify the user, rather than hanging on a dead connection. This approach usually reduces the queue metric discrepancies because the system then reflects the active session state accurately.
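
One way to check whether the 429s and the drops really line up is to compare their timestamps. This is only a sketch: it assumes you have already exported both sets of timestamps (in epoch milliseconds) from your own logs, and the function name and the sample values are invented for illustration.

```typescript
interface CorrelationResult {
  matchedDrops: number; // drops with a 429 inside the window
  totalDrops: number;
}

// Count how many connection drops have at least one 429 response
// within `windowMs` milliseconds before or after the drop.
function dropsNearRateLimits(
  dropTimesMs: number[],
  rateLimitTimesMs: number[],
  windowMs: number,
): CorrelationResult {
  const matchedDrops = dropTimesMs.filter((drop) =>
    rateLimitTimesMs.some((hit) => Math.abs(drop - hit) <= windowMs),
  ).length;
  return { matchedDrops, totalDrops: dropTimesMs.length };
}

// Invented sample data: three drops, three 429 responses, 2-second window.
const correlation = dropsNearRateLimits(
  [1000, 50000, 90000],
  [900, 49000, 200000],
  2000,
);
console.log(correlation); // { matchedDrops: 2, totalDrops: 3 }
```

If most drops have a 429 nearby, that supports the rate limiting explanation. If few do, the media path is the more likely place to look.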