Looking for advice on why our custom BYOC media server is dropping connections when we push past 1,500 concurrent calls in our US-East region cluster.
We are running a JMeter script to validate the capacity of a new Bring Your Own Container deployment. The goal is to simulate a peak load of 2,000 concurrent voice calls using the Genesys Cloud BYOC API. The test setup uses a single EC2 instance running a custom media server application written in Node.js, which handles both WebSocket signaling and RTP media streams. The Genesys Cloud tenant is configured with a BYOC edge in the US-East region, and the media server is registered through the /api/v2/byoc/mediaservers endpoint.
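If each call gets its own RTP port, then under the RFC 3550 convention (RTP on an even port, RTCP on the next odd port) the configured UDP port range caps concurrency no matter how much CPU and memory is free. The minimal sketch below illustrates this. The `RtpPortPool` class and the port ranges are hypothetical and are not taken from the server described here. It shows that a range of N ports carries at most about N/2 simultaneous calls, and that exhaustion shows up as a call that must be refused.

```typescript
// Hypothetical RTP/RTCP port-pair allocator (illustrative, not the server's real code).
// Assumes one port pair per call: RTP on an even port, RTCP on the next odd port.
class RtpPortPool {
  private readonly free: number[] = [];
  private readonly inUse = new Set<number>();

  constructor(minPort: number, maxPort: number) {
    // Start at the first even port >= minPort; stop when the RTCP port would exceed maxPort.
    const start = minPort % 2 === 0 ? minPort : minPort + 1;
    for (let p = start; p + 1 <= maxPort; p += 2) {
      this.free.push(p);
    }
  }

  /** Maximum number of calls this range can carry at the same time. */
  get capacity(): number {
    return this.free.length + this.inUse.size;
  }

  /** RTP port for a new call, or null when the range is exhausted (the call must be refused). */
  allocate(): number | null {
    const port = this.free.pop();
    if (port === undefined) return null;
    this.inUse.add(port);
    return port;
  }

  /** Returns a finished call's port pair to the pool. */
  release(port: number): void {
    if (this.inUse.delete(port)) this.free.push(port);
  }
}
```

With a range of 10000 to 19999, `capacity` is 5,000 pairs. A range narrower than twice the target call count would start rejecting calls while the host still had spare CPU and memory.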
The issue occurs consistently once the number of concurrent calls exceeds 1,500. Beyond that point, the media server starts returning HTTP 503 Service Unavailable for new connection requests, and the JMeter logs show WebSocket handshakes failing with a Connection Refused error from the media server side. We have verified that the EC2 instance has sufficient CPU and memory headroom and that network bandwidth is not saturated. The Genesys Cloud side shows no errors related to the BYOC integration, which suggests that the issue is isolated to our media server application or to the network path between the Genesys Cloud edge and our container.
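The two symptoms come from different layers. An HTTP 503 means the TCP connection was accepted and something (the Node.js process, or a proxy in front of it) answered the upgrade request with a status code. Connection Refused (`ECONNREFUSED` in Node.js) means the TCP connection attempt itself was rejected before any HTTP was exchanged. Counting the two separately in a test harness can show which layer reaches its limit first. The tally below is a hypothetical sketch. `ProbeResult`, the outcome names, and the idea of a TypeScript probe alongside JMeter are illustrative assumptions, not part of JMeter or any Genesys SDK.

```typescript
// Hypothetical helper: tag each connection attempt by the layer that answered it.
type Outcome = "ok" | "tcp-refused" | "tcp-timeout" | "http-503" | "other";

interface ProbeResult {
  errorCode?: string;  // Node.js system error code, e.g. "ECONNREFUSED"
  httpStatus?: number; // status of the WebSocket upgrade response, if one arrived
}

function classify(r: ProbeResult): Outcome {
  if (r.errorCode === "ECONNREFUSED") return "tcp-refused"; // rejected before any HTTP exchange
  if (r.errorCode === "ETIMEDOUT") return "tcp-timeout";     // no reply to the connection attempt
  if (r.httpStatus === 101) return "ok";                      // 101 Switching Protocols: upgrade accepted
  if (r.httpStatus === 503) return "http-503";               // TCP accepted, server declined the request
  return "other";
}

function tally(results: ProbeResult[]): Record<Outcome, number> {
  const counts: Record<Outcome, number> = {
    ok: 0,
    "tcp-refused": 0,
    "tcp-timeout": 0,
    "http-503": 0,
    other: 0,
  };
  for (const r of results) counts[classify(r)] += 1;
  return counts;
}
```

If refusals climb while 503s stay flat (or the reverse), that points at the socket or listener layer rather than the application's own admission logic (or the reverse).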
We are using the latest version of the Genesys Cloud BYOC SDK for Node.js. The media server application is configured to handle up to 5,000 concurrent connections, but in practice it tops out at around 1,500 concurrent calls. We have checked the firewall rules and security groups and found nothing blocking the traffic. DNS resolution for the media server endpoint also works correctly.
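A ceiling below the application's configured connection limit can come from an operating-system limit rather than from the application setting. On Linux, one such limit is the per-process open-file limit, because every TCP and UDP socket consumes a file descriptor. The sketch below parses the `Max open files` row of `/proc/self/limits` and compares it with a rough descriptor budget. The three-sockets-per-call figure (one WebSocket plus RTP and RTCP) and the baseline of 100 are assumptions for illustration. The server's real per-call socket count may differ.

```typescript
// Sketch: compare the Linux per-process open-file limit with a rough socket budget.
interface FdLimits {
  soft: number;
  hard: number;
}

/** Parses the "Max open files" row of /proc/self/limits; "unlimited" becomes Infinity. */
function parseMaxOpenFiles(limitsText: string): FdLimits | null {
  const m = /^Max open files\s+(\S+)\s+(\S+)/m.exec(limitsText);
  if (m === null) return null;
  const toNum = (v: string): number => (v === "unlimited" ? Infinity : Number(v));
  return { soft: toNum(m[1]), hard: toNum(m[2]) };
}

/**
 * Rough descriptor budget. ASSUMPTION: socketsPerCall = 3 models one WebSocket (TCP)
 * plus one RTP and one RTCP (UDP) socket per call; baseline covers listeners, logs, etc.
 */
function descriptorsNeeded(calls: number, socketsPerCall = 3, baseline = 100): number {
  return calls * socketsPerCall + baseline;
}

/** True when the soft limit leaves room for the estimated budget. */
function fitsSoftLimit(limits: FdLimits, calls: number): boolean {
  return descriptorsNeeded(calls) <= limits.soft;
}
```

On Linux, the text can be read with `readFileSync("/proc/self/limits", "utf8")` from `node:fs` inside the media server process itself. `ulimit -n` in the launching shell reports the soft limit, although services started by systemd take theirs from `LimitNOFILE=` instead.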
Has anyone encountered similar issues with BYOC media servers under high load? Are there specific configuration settings or best practices for scaling media servers to handle 2,000+ concurrent calls? Any insights into potential bottlenecks in WebSocket handling or RTP stream management would be greatly appreciated. We are also considering splitting the load across multiple media server instances, but we want to understand whether that is necessary or whether there is a configuration issue on our end.
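If the load does end up split across instances, the sizing arithmetic is simple once a per-instance ceiling is known. The sketch below is hypothetical. It sizes the pool with a headroom factor and routes each new call to the least-loaded instance that is not yet full. The 1,500 figure is the ceiling observed in this test, not a recommended capacity, and the 20% headroom is an arbitrary example value. Where the selection would actually run (a load balancer or a signaling front end) is left open.

```typescript
// Hypothetical sizing and routing helpers for a pool of media server instances.
interface MediaServerInstance {
  id: string;
  activeCalls: number;
  maxCalls: number; // per-instance ceiling, e.g. as measured in a load test
}

/** Instances needed so each stays at or below (1 - headroom) of its ceiling. */
function instancesNeeded(targetCalls: number, perInstanceCeiling: number, headroom: number): number {
  const budget = Math.floor(perInstanceCeiling * (1 - headroom));
  if (budget <= 0) throw new Error("headroom leaves no usable capacity per instance");
  return Math.ceil(targetCalls / budget);
}

/** Least-connections choice; returns null when every instance is at its ceiling. */
function pickInstance(pool: MediaServerInstance[]): MediaServerInstance | null {
  let best: MediaServerInstance | null = null;
  for (const inst of pool) {
    if (inst.activeCalls >= inst.maxCalls) continue; // full: skip instead of overloading it
    if (best === null || inst.activeCalls < best.activeCalls) best = inst;
  }
  return best;
}
```

With `instancesNeeded(2000, 1500, 0.2)`, each instance is budgeted for 1,200 calls, so two instances are needed. Splitting only helps, though, if the 1,500 ceiling is per process or per host rather than a limit somewhere upstream of the instances.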