Hello. I am currently exploring the AudioHook API to stream real-time dual-channel audio from our calls to a custom speaker diarization engine. I have successfully established the WebSocket connection, but I am observing significant packet loss on the audio stream. About fifteen percent of the frames are being dropped, which is making the diarization results extremely inaccurate. I have verified our internal network bandwidth. Is there a specific TCP/WebSocket optimization for AudioHook that I should be applying, or is the platform media gateway struggling to maintain the stream concurrency?
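To put a number on the loss before tuning anything, one rough approach is to compare the audio bytes actually received with what the elapsed stream time should have produced. This is only a sketch: it assumes PCMU at 8000 Hz with two channels (one byte per sample per channel), and `expected_audio_bytes` and `loss_ratio` are hypothetical helper names, not part of AudioHook.

```python
def expected_audio_bytes(elapsed_seconds, sample_rate=8000, channels=2, bytes_per_sample=1):
    """Bytes of audio a continuous stream should deliver in elapsed_seconds.

    Defaults are an assumption: 8 kHz PCMU, two channels, 1 byte per sample.
    """
    return int(elapsed_seconds * sample_rate * channels * bytes_per_sample)


def loss_ratio(received_bytes, elapsed_seconds, **audio_format):
    """Fraction of expected audio that never arrived (0.0 means nothing missing)."""
    expected = expected_audio_bytes(elapsed_seconds, **audio_format)
    if expected <= 0:
        return 0.0
    missing = max(expected - received_bytes, 0)
    return missing / expected
```

For example, receiving 136000 bytes over 10 seconds of a stream in that format gives a ratio of 0.15, which matches the fifteen percent described above.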
I am not a developer, but I track our system performance using the dashboards and I noticed a huge spike in ‘Media Latency’ when the technical team turned on that AudioHook integration! It was making our real-time dashboards look very laggy. I don’t know about ‘Diarization’, but I know that if your audio stream is dropping packets, it’s usually because the server on the other end can’t keep up with the data rate! Once our team increased the CPU on the receiving server, the ‘Media Latency’ on my dashboards went back to normal. Maybe check if your AI engine is struggling to process the stream in real-time?
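One way to test the "receiver can't keep up" theory is to put a bounded buffer between the WebSocket receive loop and the diarization engine and count what overflows. The class below is a minimal sketch, not anything provided by Genesys; `FrameBuffer` and its fields are hypothetical names.

```python
from collections import deque


class FrameBuffer:
    """Bounded FIFO between the receive loop and the processing engine."""

    def __init__(self, max_frames):
        self.max_frames = max_frames
        self._frames = deque()
        self.dropped = 0      # frames rejected because the buffer was full
        self.high_water = 0   # deepest backlog seen so far

    def push(self, frame):
        """Called by the receive loop; returns False if the frame was dropped."""
        if len(self._frames) >= self.max_frames:
            self.dropped += 1
            return False
        self._frames.append(frame)
        self.high_water = max(self.high_water, len(self._frames))
        return True

    def pop(self):
        """Called by the engine; returns None when there is nothing to process."""
        return self._frames.popleft() if self._frames else None
```

If `high_water` keeps climbing toward `max_frames` or `dropped` is non-zero, the backlog is building on the receiving side, which points at the engine's processing speed instead of the network.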
Good afternoon. I manage our historical data ETLs and I have seen similar packet loss issues with media streaming. AudioHook sends the audio as binary WebSocket frames. If your network has any ‘Deep Packet Inspection’ or an SSL-inspecting proxy in the path, it will likely buffer or drop those binary frames because they don’t look like standard HTTP traffic. You must ensure that the AudioHook WebSocket traffic is explicitly bypassed by all firewalls and proxies. We found that even a few milliseconds of jitter introduced by a proxy can cause the Genesys media gateway to drop the frame rather than risk a buffer overrun.
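To see whether a proxy is adding jitter, it can help to log the arrival time of each binary frame on the receiving server and look at how uneven the gaps are. The function below is a rough sketch under that assumption; `interarrival_jitter` is a hypothetical helper, and the timestamps can be in any unit as long as it is consistent.

```python
from statistics import mean


def interarrival_jitter(arrival_times):
    """Return (mean_interval, worst_deviation) for a list of frame arrival times."""
    if len(arrival_times) < 2:
        raise ValueError("need at least two arrival times")
    # Gap between each frame and the one before it.
    intervals = [later - earlier for earlier, later in zip(arrival_times, arrival_times[1:])]
    average = mean(intervals)
    # Largest distance of any single gap from the average gap.
    worst = max(abs(interval - average) for interval in intervals)
    return average, worst
```

Comparing the worst deviation with and without the proxy in the path gives a simple before-and-after check on the bypass.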
I am completely frustrated with the AudioHook documentation! I was trying to evaluate if Agent Assist could use these streams to provide better suggestions, but I couldn’t even get a stable connection! The problem is likely the ‘MTU’ size. If your WebSocket frames are being fragmented at the network layer, the AudioHook engine will often discard the entire segment. Try reducing your receiving server’s MTU to 1400. It sounds crazy, but it solved a similar ‘Silent Audio’ issue for me. I wish Genesys would just provide a standard troubleshooting guide for these advanced media APIs!
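For anyone weighing the MTU suggestion, the arithmetic behind it can be sketched as follows. It assumes IPv4 with a 20-byte IP header and a 20-byte TCP header and no options, and `frame_bytes` is taken to be the full WebSocket frame including its header; `tcp_mss` and `segments_for_frame` are hypothetical helper names.

```python
import math

IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options (assumption)
TCP_HEADER_BYTES = 20   # minimum TCP header, no options (assumption)


def tcp_mss(mtu):
    """Largest TCP payload that fits in one packet at the given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def segments_for_frame(frame_bytes, mtu):
    """Number of TCP segments needed to carry one WebSocket frame."""
    return math.ceil(frame_bytes / tcp_mss(mtu))
```

With these assumptions an MTU of 1500 leaves 1460 bytes per segment and an MTU of 1400 leaves 1360, so a 1400-byte frame fits in one segment at 1500 but needs two at 1400. TCP normally reassembles segments before the WebSocket layer sees them, so this is mainly useful for checking whether some link on the path has a smaller MTU than the endpoints expect.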