AudioHook Stream Latency Impacting Voice Biometrics Verification

Hello everyone! I am super excited to be experimenting with the AudioHook API to stream live agent and customer audio to a third-party voice biometrics engine. It is such a cool feature! However, I am noticing that our biometrics engine is struggling to return a “Verified” result before the agent finishes their standard greeting. When I look at the logs, there seems to be a consistent 400 to 500-millisecond delay between the customer speaking and the audio frames arriving at our WebSocket server. Is this level of latency standard for AudioHook, or are there specific audio formats or chunk sizes I should request in the connection parameters to minimize the streaming delay?
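One way to check how much of that delay comes from chunking is to look at how much audio each frame carries. The sketch below is a rough aid, not part of any Genesys SDK. It assumes PCMU at 8000 Hz (one byte per sample per channel) and interleaved channels, and the names `audio_ms` and `DriftTracker` are made up for illustration. A frame cannot be sent until it is full, so the frame duration is a floor on how old its first sample is when it arrives.

```python
# Assumption: PCMU (G.711 mu-law) at 8000 Hz, one byte per sample per channel.
SAMPLE_RATE_HZ = 8000


def audio_ms(num_bytes: int, channels: int = 2) -> float:
    """Milliseconds of audio represented by num_bytes of interleaved PCMU."""
    return num_bytes * 1000.0 / (SAMPLE_RATE_HZ * channels)


class DriftTracker:
    """Compares wall-clock time since the first frame with audio received.

    This does not measure absolute mouth-to-server latency. It only shows
    whether the stream falls further behind real time after it has started.
    """

    def __init__(self, channels: int = 2) -> None:
        self.channels = channels
        self.first_arrival = None  # arrival time of the first frame, in seconds
        self.total_bytes = 0

    def on_frame(self, payload: bytes, now: float) -> float:
        """Record one binary audio frame; return drift in milliseconds.

        A steady stream returns a constant negative value equal to one
        frame duration. A value that keeps growing means frames are
        arriving later and later relative to the audio they carry.
        """
        if self.first_arrival is None:
            self.first_arrival = now
        self.total_bytes += len(payload)
        elapsed_ms = (now - self.first_arrival) * 1000.0
        return elapsed_ms - audio_ms(self.total_bytes, self.channels)
```

For example, a 1600-byte two-channel frame carries 100 ms of audio, so its first sample is at least 100 ms old on arrival before any network delay is counted. Calling `on_frame(payload, time.monotonic())` from the WebSocket binary-message handler would log the drift per frame.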

I have seen this latency too. Part of the delay usually comes from how the audio payload is chunked, and part from any transcoding in the path. Make sure your AudioHook integration receives PCMU (G.711 mu-law) and that nothing converts the audio to another format with an external transcoder before it reaches your engine. Genesys Cloud processes PCMU natively, so it adds much less overhead.
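As a rough illustration of the format point, here is a sketch of how a server might pick a PCMU entry from the media list offered when the session opens. It assumes, based on my reading of the AudioHook protocol, that each offered entry is an object with `format`, `channels`, and `rate` fields and that `external` is the customer side; please verify those names against the protocol reference. The helper name `choose_media` is hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def choose_media(
    offered: List[Dict],
    want_channels: Sequence[str] = ("external",),
) -> Optional[Dict]:
    """Pick a PCMU media entry, preferring the requested channel set.

    Field names ("format", "channels") are assumptions about the shape
    of the offered media list, not confirmed API details.
    """
    wanted = list(want_channels)
    # First choice: PCMU with exactly the channels we asked for.
    for entry in offered:
        if entry.get("format") == "PCMU" and entry.get("channels") == wanted:
            return entry
    # Fallback: any PCMU entry, so no transcoding is needed downstream.
    for entry in offered:
        if entry.get("format") == "PCMU":
            return entry
    # Nothing usable was offered.
    return None
```

If only the customer's voice is verified, taking the single-channel entry also halves the bytes per frame compared with the two-channel stream.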

I manage predictive dialers where millisecond latency matters. The previous reply is right about the format. Also check which geographic region hosts your WebSocket server. If your Genesys Cloud organization is in us-east-1 but your biometrics server is in eu-central-1, every audio frame crosses the Atlantic and you are fighting pure network latency on top of any chunking delay. Deploy your biometrics listener in the same AWS region as your Genesys Cloud organization to keep the round-trip time down.
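To put a number on the regional effect, a quick probe like the one below can be run from a host in the same AWS region as the Genesys Cloud organization, pointed at the biometrics listener. It is a minimal sketch that times TCP connection setup, which takes roughly one network round trip (plus DNS lookup when a hostname is used). The function name `connect_rtt_ms` is made up for illustration.

```python
import socket
import statistics
import time


def connect_rtt_ms(host: str, port: int, samples: int = 5, timeout: float = 2.0) -> float:
    """Median time in milliseconds to open a TCP connection to host:port.

    TCP connect completes after SYN and SYN-ACK, so this approximates one
    round trip. It does not include TLS or WebSocket handshake time.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open and immediately close the connection; only setup is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

Running `connect_rtt_ms("biometrics.example.com", 443)` (a placeholder hostname) from each candidate region and comparing the medians shows how much of the 400 to 500 ms is network distance rather than chunking.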