Hello everyone! As a workforce coordinator, I constantly update our schedules! I recently learned how to use the ‘Play Audio’ action in Architect to dynamically announce our current Estimated Wait Time to callers! It is such a cool feature! However, we are getting complaints that the default text-to-speech voice reads the wait time incredibly fast, and there is no option for the customer to press a button to skip the announcement if they already heard it. Is there a way to slow down the voice and allow the caller to interrupt the wait time announcement?
When designing audio for accessibility compliance, you must address both audio pacing and user control. To slow down the text-to-speech output, you must wrap your dynamic variable in SSML tags. For example, <prosody rate="slow">Your wait time is {Flow.EWT}</prosody>.
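If you want to draft or review the SSML string outside Architect before pasting it in, here is a minimal Python sketch of the wrapping described above. The names `build_wait_time_ssml` and `ALLOWED_RATES` are hypothetical helpers, not part of any Genesys API, and `wait_text` stands in for whatever your wait time variable resolves to. The rate labels are the standard SSML ones; whether your TTS engine honors each of them is something to confirm on your side.

```python
from xml.sax.saxutils import escape

# Named rate labels defined by the SSML specification for <prosody rate="...">.
# Assumption: the TTS engine behind your flow accepts these labels.
ALLOWED_RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}


def build_wait_time_ssml(wait_text: str, rate: str = "slow") -> str:
    """Wrap a wait-time phrase in an SSML prosody tag (hypothetical helper)."""
    if rate not in ALLOWED_RATES:
        raise ValueError(f"unsupported prosody rate: {rate!r}")
    # Escape &, < and > so dynamic text cannot break the markup.
    spoken = escape(wait_text)
    return f'<prosody rate="{rate}">Your wait time is {spoken}</prosody>'


if __name__ == "__main__":
    print(build_wait_time_ssml("about 5 minutes"))
```

Calling `build_wait_time_ssml("about 5 minutes")` returns `<prosody rate="slow">Your wait time is about 5 minutes</prosody>`, which matches the shape of the example above.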
To allow interruption, you must not use the standard ‘Play Audio’ action. You must replace it with a ‘Collect Input’ action, configure the audio as the prompt, and enable the ‘Barge-In’ setting.
This lets the caller press a key to skip the announcement, which is in the spirit of WCAG's audio control guidance.
I evaluate vendor platforms for enterprise RFPs, and the lack of native, user-friendly pacing controls for text-to-speech in the base interface is a massive oversight. The fact that an administrator has to manually write XML-based SSML tags just to make the bot speak at a normal human speed is completely ridiculous. The competitors offer simple slider bars for speech rate in their graphical interfaces.
SSML works, but if you make a single syntax error, such as an unclosed tag or a missing quote, the flow completely breaks and plays an error message to the customer. Test your SSML extensively before publishing.
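One cheap pre-publish check is to run the SSML through an XML parser first. The sketch below is Python using only the standard library; `ssml_error` is a hypothetical helper name. It only confirms that the markup is well-formed XML. It does not know which tags or attribute values your TTS engine supports, so it complements testing in the flow rather than replacing it.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def ssml_error(fragment: str) -> Optional[str]:
    """Return a parser error message if the fragment is malformed, else None."""
    try:
        # Wrap in a dummy root so fragments with several top-level tags
        # or leading plain text still parse as a single document.
        ET.fromstring(f"<root>{fragment}</root>")
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    good = '<prosody rate="slow">Your wait time is {Flow.EWT}</prosody>'
    bad = '<prosody rate="slow">Your wait time is {Flow.EWT}'  # missing close tag
    for sample in (good, bad):
        problem = ssml_error(sample)
        print("OK" if problem is None else f"Malformed: {problem}")
```

The placeholder `{Flow.EWT}` is treated as plain text here, so the check runs on the template before any variable is substituted.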