Configuring Text-to-Speech Pacing and Barge-In for Wait Time Announcements

chess_nerd · April 1, 2026, 5:59pm

Hello everyone! As a workforce coordinator, I constantly update our schedules! I recently learned how to use the ‘Play Audio’ action in Architect to dynamically announce our current Estimated Wait Time to callers! It is such a cool feature! However, we are getting complaints that the default text-to-speech voice reads the wait time incredibly fast, and there is no option for the customer to press a button to skip the announcement if they already heard it. Is there a way to slow down the voice and allow the caller to interrupt the wait time announcement?

overflow_err · April 1, 2026, 6:40pm

When designing audio for accessibility compliance, you must address both audio pacing and user control. To slow down the text-to-speech output, wrap the whole announcement, including your dynamic variable, in an SSML prosody tag. For example, <prosody rate="slow">Your wait time is {Flow.EWT}</prosody>.
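If you generate the announcement text outside Architect (for example in a script that feeds a data action), here is a minimal sketch of how the markup could be assembled safely. The function name build_wait_time_ssml is hypothetical, and the <speak> root element is an assumption: the SSML spec expects it, but check whether your TTS engine wants it or adds it for you. The named rate values come from the SSML spec; engine support can vary.

```python
from xml.sax.saxutils import escape

# Named prosody rates defined by the SSML specification.
ALLOWED_RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}


def build_wait_time_ssml(wait_text: str, rate: str = "slow") -> str:
    """Wrap a wait time announcement in <speak>/<prosody> markup.

    Hypothetical helper for illustration; not part of any vendor API.
    """
    if rate not in ALLOWED_RATES:
        raise ValueError(f"unsupported rate: {rate!r}")
    # Escape &, < and > so the dynamic text cannot break the markup.
    safe_text = escape(wait_text)
    return (
        f'<speak><prosody rate="{rate}">'
        f"Your wait time is {safe_text}"
        f"</prosody></speak>"
    )


print(build_wait_time_ssml("5 minutes"))
```

Escaping matters because a stray & or < in the dynamic value would make the markup invalid.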

To allow interruption, you must not use the standard ‘Play Audio’ action. You must replace it with a ‘Collect Input’ action, configure the audio as the prompt, and enable the ‘Barge-In’ setting.

This allows the caller to press a key to skip the audio, which supports the user-control goals behind accessibility guidelines such as WCAG.
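To make the barge-in behavior concrete, here is a toy simulation of what the setting does conceptually: the prompt plays until a key arrives, then playback stops and the digit is handed to the flow. This is only an illustrative model, not Architect code; the function play_with_barge_in and its key_presses argument are made up for the example.

```python
from typing import Dict, List, Optional, Tuple


def play_with_barge_in(
    prompt_words: List[str], key_presses: Dict[int, str]
) -> Tuple[List[str], Optional[str]]:
    """Simulate a prompt that stops as soon as the caller presses a key.

    key_presses maps a word position to the DTMF digit pressed just
    before that word would have been spoken (hypothetical test input).
    Returns the words actually spoken and the interrupting digit, if any.
    """
    spoken: List[str] = []
    for position, word in enumerate(prompt_words):
        digit = key_presses.get(position)
        if digit is not None:
            # Barge-in: stop playback and pass the digit on.
            return spoken, digit
        spoken.append(word)
    # Prompt finished without interruption.
    return spoken, None


prompt = "Your wait time is five minutes".split()
print(play_with_barge_in(prompt, {2: "1"}))
print(play_with_barge_in(prompt, {}))
```

Without barge-in, the equivalent behavior would be to ignore the key press until the last word has played.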

lisa90 · April 3, 2026, 6:40pm

I evaluate vendor platforms for enterprise RFPs, and the lack of native, user-friendly pacing controls for text-to-speech in the base interface is a massive oversight. The fact that an administrator has to manually write XML-based SSML tags just to make the bot speak at a normal human speed is completely ridiculous. The competitors offer simple slider bars for speech rate in their graphical interfaces.

SSML works, but if you make a single syntax error in your tags, such as an unclosed tag or a missing angle bracket, the flow completely breaks and plays an error message to the customer. Test your SSML extensively before publishing.
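One cheap pre-publish check is to run the string through an XML parser, as in the rough sketch below using Python's standard library. The helper name check_ssml is hypothetical. It only catches malformed XML (unclosed or mismatched tags, bad quoting); it does not confirm that the tags are valid SSML or supported by your TTS engine, so you still need to test in the flow itself.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_ssml(ssml: str) -> Optional[str]:
    """Return None if the string is well-formed XML, else an error message.

    Hypothetical helper: checks well-formedness only, not SSML semantics.
    """
    try:
        ET.fromstring(ssml)
    except ET.ParseError as exc:
        return f"not well-formed: {exc}"
    return None


good = '<speak><prosody rate="slow">Your wait time is 5 minutes</prosody></speak>'
bad = '<speak><prosody rate="slow">Your wait time is 5 minutes</speak>'

print(check_ssml(good))
print(check_ssml(bad))
```

The second string is missing the closing </prosody> tag, which is exactly the kind of slip that is easy to make when typing SSML by hand.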