Implementing Automated Bot Performance Auditing using Ground-Truth Human Evaluations
What This Guide Covers
- Architecting a continuous feedback loop to objectively measure Voice and Chatbot performance using Genesys Cloud Quality Management (QM).
- Routing “Bot Hand-Offs” to a specialized human auditing queue where human evaluators score the bot’s accuracy, empathy, and containment failure reasons.
- Producing a dashboard that quantifies why your bots are failing, driven by human ground-truth data rather than guesses based on raw NLU confidence scores.
Prerequisites, Roles & Licensing
- Licensing: Genesys Cloud CX 2 or 3 (Quality Management).
- Permissions: Quality > Evaluation > Edit, Architect > Flow > Edit, Routing > Queue > Edit.
- Infrastructure: A designated team of QA evaluators trained to audit Bot transcripts.
The Implementation Deep-Dive
1. The Fallacy of “Containment Rate”
Most contact centers judge their bots by a single metric: Containment Rate (e.g., “The bot handled 40% of calls without an agent”).
The Trap:
A high containment rate does not equal a good customer experience. A poorly designed bot that frustrates the customer until they hang up is technically “contained.” Relying solely on NLU Confidence Scores is equally flawed: a bot can be 99% confident it understood the customer and still execute the wrong business logic. You need qualitative human review.
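To make the trap concrete, here is a minimal sketch comparing raw containment with containment that a human reviewer judged successful. The `BotInteraction` record, the `goal_achieved` field, and the sample numbers are illustrative assumptions, not Genesys Cloud data structures or real results.

```python
from dataclasses import dataclass


@dataclass
class BotInteraction:
    """Hypothetical record of one bot interaction (not a Genesys Cloud type)."""
    transferred_to_agent: bool  # the bot handed off to a human
    goal_achieved: bool         # a human reviewer judged the customer's need met


def containment_rate(interactions: list[BotInteraction]) -> float:
    """Share of interactions that never reached a human agent."""
    contained = [i for i in interactions if not i.transferred_to_agent]
    return len(contained) / len(interactions)


def resolved_containment_rate(interactions: list[BotInteraction]) -> float:
    """Share of interactions that were contained AND met the customer's need."""
    resolved = [
        i for i in interactions
        if not i.transferred_to_agent and i.goal_achieved
    ]
    return len(resolved) / len(interactions)


# Ten made-up interactions: 4 are contained, but only 1 of those was resolved.
sample = (
    [BotInteraction(False, True)]
    + [BotInteraction(False, False)] * 3   # customer gave up and hung up
    + [BotInteraction(True, True)] * 6     # escalated, then solved by an agent
)

print(f"Containment rate:          {containment_rate(sample):.0%}")
print(f"Resolved containment rate: {resolved_containment_rate(sample):.0%}")
```

In this made-up sample the headline containment figure is four times higher than the share of customers the bot actually helped, and only the human judgment in `goal_achieved` exposes the gap.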
2. Designing the Bot Evaluation Form
You must treat the Bot as an “Agent” and evaluate it using a Quality Management form.
Implementation Steps:
- Navigate to Admin > Quality > Evaluation Forms.
- Create a new form titled Bot Performance Audit v1.
- Do not use standard agent questions (like “Did they say the company name?”). Instead, use bot-specific questions:
- Q1: Did the NLU correctly identify the primary intent? (Yes/No)
- Q2: Did the Bot extract the correct slot entities (e.g., Account Number)? (Yes/No)
- Q3: Was the Bot’s response clear and contextually accurate? (Yes/No)
- Q4: Why did the customer escalate to a human? (Multiple Choice: NLU Failure / Logic Loop / Complex Request / System Error / Asked for Agent Immediately)
- Set the form to evaluate the Bot participant, not the Agent participant.
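Once evaluators start completing the form, the answers can be scored and the Q4 reasons tallied to feed the dashboard. The sketch below is one possible way to do that offline. The question keys (`Q1_intent_correct` and so on), the equal weighting of the Yes/No questions, and the sample answers are all assumptions made for illustration; they are not the export format of Genesys Cloud.

```python
from collections import Counter

# Hypothetical in-code mirror of the Bot Performance Audit v1 form.
YES_NO_QUESTIONS = ["Q1_intent_correct", "Q2_slots_correct", "Q3_response_clear"]
ESCALATION_REASONS = [
    "NLU Failure",
    "Logic Loop",
    "Complex Request",
    "System Error",
    "Asked for Agent Immediately",
]


def score_evaluation(answers: dict[str, str]) -> float:
    """Return the percentage of Yes/No questions answered "Yes" (equal weights)."""
    yes_count = sum(1 for q in YES_NO_QUESTIONS if answers[q] == "Yes")
    return 100 * yes_count / len(YES_NO_QUESTIONS)


def tally_escalation_reasons(evaluations: list[dict[str, str]]) -> Counter:
    """Count how often each Q4 reason was selected, rejecting unknown values."""
    reasons = [e["Q4_escalation_reason"] for e in evaluations]
    unknown = set(reasons) - set(ESCALATION_REASONS)
    if unknown:
        raise ValueError(f"Unknown escalation reasons: {sorted(unknown)}")
    return Counter(reasons)


# Made-up completed evaluations, for illustration only.
completed = [
    {"Q1_intent_correct": "Yes", "Q2_slots_correct": "Yes",
     "Q3_response_clear": "No", "Q4_escalation_reason": "Logic Loop"},
    {"Q1_intent_correct": "No", "Q2_slots_correct": "No",
     "Q3_response_clear": "Yes", "Q4_escalation_reason": "NLU Failure"},
    {"Q1_intent_correct": "No", "Q2_slots_correct": "No",
     "Q3_response_clear": "No", "Q4_escalation_reason": "NLU Failure"},
]

scores = [score_evaluation(e) for e in completed]
reason_counts = tally_escalation_reasons(completed)
print("Scores:", [round(s, 1) for s in scores])
print("Top escalation reason:", reason_counts.most_common(1)[0])
```

Keeping Q4 as a fixed list of choices rather than free text is what makes this tally possible: every escalation lands in exactly one bucket.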
3. Architecting the Escalation Audit Flow
You cannot evaluate every bot interaction. You need a targeted sampling mechanism. We will focus on “Escalations” (when the bot hands off to a human) because each hand-off represents a containment failure.
Architectural Reasoning:
If you force a QA auditor to manually search for bot interactions, they will waste hours. You must push the interactions to them using a specialized Queue.
Implementation Steps:
- In your Architect Bot Flow, locate the path where the bot escalates to an agent (e.g., the Transfer to ACD block).
- Just before the transfer, use a Set Participant Data action to stamp the interaction with BotFailureReason = "Escalated".
- Create a dedicated Quality Management Policy (Admin > Quality > Policies).
- Set the condition: If Participant Data BotFailureReason exists AND the interaction contains a Bot participant.
- Set the Action: Assign an Evaluation using the Bot Performance Audit v1 form.
- Route these evaluations directly to the inbox of your specialized “Bot QA” team.
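The policy condition above is a simple conjunction, and it can help to write it out before configuring it in the UI. The following is only a sketch of that logic for reasoning and offline testing. The `Interaction` class and the purpose labels such as `"bot"` are hypothetical simplifications, not the Genesys Cloud conversation model, and the policy itself is still configured in Admin rather than in code.

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    """Hypothetical, simplified view of a conversation (assumed shape)."""
    conversation_id: str
    participant_purposes: list[str]  # e.g. ["customer", "bot", "agent"]
    participant_data: dict[str, str] = field(default_factory=dict)


def matches_escalation_audit(interaction: Interaction) -> bool:
    """True when BotFailureReason is stamped AND a bot took part."""
    has_failure_stamp = "BotFailureReason" in interaction.participant_data
    has_bot = "bot" in interaction.participant_purposes
    return has_failure_stamp and has_bot


interactions = [
    # Bot escalated: stamped by the Set Participant Data action.
    Interaction("c-1", ["customer", "bot", "agent"],
                {"BotFailureReason": "Escalated"}),
    # Bot contained the interaction: no stamp, so no escalation audit.
    Interaction("c-2", ["customer", "bot"]),
    # Stamp present but no bot participant: excluded by the second clause.
    Interaction("c-3", ["customer", "agent"],
                {"BotFailureReason": "Escalated"}),
]

to_audit = [i.conversation_id for i in interactions if matches_escalation_audit(i)]
print(to_audit)
```

The second clause guards against stray participant data set by other flows, so only genuine bot hand-offs reach the Bot QA team.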
4. Evaluating “Contained” Interactions
Escalations tell you why the bot failed. But you must also audit a random sample of contained interactions to ensure the bot isn’t “containing” customers by frustrating them into hanging up.
Implementation Steps:
- Create a second Quality Management Policy.
- Set the condition: If interaction contains a Bot participant AND did NOT transfer to an ACD queue.
- Set the Action: Randomly assign 5% of matching interactions for Evaluation.
- In the Evaluation Form for contained calls, add a specific fatal question: Did the customer abandon the interaction due to bot failure/looping? If Yes, the bot’s score for that interaction is 0%.
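The two mechanics in these steps, a 5% sample and a fatal question, can be sketched as follows. The policy performs its own random selection, so the hash-based sampler here is a swapped-in, deterministic stand-in that is useful for reproducing the same sample offline. The function names and the conversation ID format are assumptions.

```python
import hashlib

SAMPLE_RATE = 0.05  # 5% of contained interactions, matching the policy above


def is_sampled(conversation_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministically select roughly `rate` of IDs by hashing them.

    The first 8 bytes of the SHA-256 digest are read as a 64-bit integer,
    which is selected when it falls below `rate` of the 64-bit range.
    """
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < rate * 2**64


def contained_score(base_score: float, abandoned_due_to_bot: bool) -> float:
    """Apply the fatal question: a bot-caused abandon zeroes the score."""
    return 0.0 if abandoned_due_to_bot else base_score


# Hypothetical conversation IDs, for illustration only.
ids = [f"conv-{n}" for n in range(20000)]
sampled = [cid for cid in ids if is_sampled(cid)]
print(f"Sampled {len(sampled)} of {len(ids)} contained interactions")

print(contained_score(85.0, abandoned_due_to_bot=False))  # score kept
print(contained_score(85.0, abandoned_due_to_bot=True))   # fatal: zeroed
```

Hashing the ID instead of calling a random number generator means the same conversation is always in or out of the sample, which makes audits repeatable.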
Validation, Edge Cases & Troubleshooting
Edge Case 1: The “Silent Abandon” in Chat
- The Failure Condition: An auditor is reviewing a Web Chat. The bot asked “Please enter your account number.” The customer never replied, and the chat timed out after 15 minutes. The auditor cannot tell whether the customer resolved the issue on their own or gave up in frustration.
- The Root Cause: Asynchronous channels often end in “silent abandons” that are hard to classify.
- The Solution: Train auditors to look at the preceding steps. Did the bot ask for the account number in a confusing way? Did the customer provide the number, only for the bot to fail to parse it (for example, with a Regex) and ask again? If the bot repeated a prompt immediately before the abandon, it is a bot failure. If the bot provided the correct knowledge article and the customer abandoned, it is a successful containment.
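The repeated-prompt rule is mechanical enough to pre-screen transcripts before an auditor opens them. The sketch below flags a transcript as a bot failure when the last two bot messages are identical, and leaves everything else to a human, since judging whether a knowledge article was correct is not something this check can do. The `Turn` type, the exact-match comparison, and the label strings are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One message in a chat transcript (hypothetical shape)."""
    speaker: str  # "bot" or "customer"
    text: str


def classify_silent_abandon(transcript: list[Turn]) -> str:
    """Pre-screen a transcript that ended without a final customer reply.

    Returns "bot_failure" when the bot repeated its previous prompt right
    before the abandon, otherwise "needs_human_review".
    """
    bot_prompts = [
        t.text.strip().lower() for t in transcript if t.speaker == "bot"
    ]
    repeated = len(bot_prompts) >= 2 and bot_prompts[-1] == bot_prompts[-2]
    return "bot_failure" if repeated else "needs_human_review"


looping_chat = [
    Turn("bot", "Please enter your account number."),
    Turn("customer", "It's 12345"),
    Turn("bot", "Please enter your account number."),  # failed to parse, asked again
]
article_chat = [
    Turn("bot", "How can I help?"),
    Turn("customer", "How do I reset my password?"),
    Turn("bot", "Here is our password reset article."),
]

print(classify_silent_abandon(looping_chat))
print(classify_silent_abandon(article_chat))
```

A real deployment would likely need fuzzier matching, because bots often rephrase a re-prompt ("Sorry, I didn't get that. Please enter your account number.").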
Edge Case 2: Bot Form Calibration
- The Failure Condition: Two QA evaluators review the exact same bot transcript. Evaluator A gives the bot a 90%, arguing the NLU worked perfectly. Evaluator B gives the bot a 40%, arguing the backend API returned the wrong data.
- The Root Cause: Subjective interpretation of “Bot Performance.”
- The Solution: You must run Calibration Sessions for your Bot QA team just like you do for human agents. Because bots are deterministic, any failure is ultimately a design failure. Your evaluation form must strictly separate NLU failures (the bot didn’t understand) from Logic failures (the bot understood, but the backend data dip failed).
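One way to prepare a calibration session is to measure, per question, how often two evaluators agreed on the same transcripts. The sketch below assumes each evaluation is a simple mapping from question key to answer. The keys `nlu_understood` and `logic_correct`, the 80% threshold, and the sample answers are illustrative assumptions.

```python
from collections import defaultdict

Answers = dict[str, str]  # question key -> selected answer


def agreement_by_question(pairs: list[tuple[Answers, Answers]]) -> dict[str, float]:
    """For each question, return the share of transcripts where both
    evaluators gave the same answer."""
    matches: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for eval_a, eval_b in pairs:
        # Only compare questions that both evaluators answered.
        for question in eval_a.keys() & eval_b.keys():
            totals[question] += 1
            matches[question] += int(eval_a[question] == eval_b[question])
    return {q: matches[q] / totals[q] for q in totals}


def needs_calibration(rates: dict[str, float], threshold: float = 0.8) -> list[str]:
    """List questions whose agreement rate falls below the threshold."""
    return sorted(q for q, rate in rates.items() if rate < threshold)


# Each tuple holds Evaluator A's and Evaluator B's answers for one transcript.
pairs = [
    ({"nlu_understood": "Yes", "logic_correct": "Yes"},
     {"nlu_understood": "Yes", "logic_correct": "No"}),
    ({"nlu_understood": "Yes", "logic_correct": "No"},
     {"nlu_understood": "Yes", "logic_correct": "No"}),
]

rates = agreement_by_question(pairs)
print(rates)
print("Discuss in calibration:", needs_calibration(rates))
```

Because the NLU and Logic questions are scored separately, the disagreement shows up on the Logic question alone, which points the calibration session at the definition that needs tightening.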