Designing Customer Deduplication Workflows with Manual Review Queues for Ambiguous Matches

What This Guide Covers

This guide details the architectural implementation of a hybrid deduplication workflow that automatically merges high-confidence duplicate records while routing low-confidence matches to a specialized manual review queue. You will configure Genesys Cloud CX Architect flows, Data Connector logic, and Omni-Channel routing rules to create a closed-loop system where agents resolve identity conflicts without interrupting the primary customer journey. The end result is a scalable data hygiene pipeline that reduces database bloat while ensuring human oversight for complex identity resolution scenarios.

Prerequisites, Roles & Licensing

  • Licensing Tiers:
    • Genesys Cloud CX 1 (or higher) for Omni-Channel routing and Agent capabilities.
    • Genesys Cloud CX 3 (or higher) for Advanced Architect features (specifically Data Connector integration and complex expression handling).
    • Optional: Genesys Cloud Speech Analytics if audio context is required for review decisions.
  • Permissions:
    • Routing > Flow > Edit (to create the deduplication flow).
    • Routing > Queue > Edit (to create the manual review queue).
    • Data > Data Connector > Edit (to manage external system writes).
    • Admin > User > Edit (to assign specific roles to review agents).
  • External Dependencies:
    • A target CRM or CDP (e.g., Salesforce, Microsoft Dynamics, or a custom PostgreSQL instance) capable of accepting merge or update API calls.
    • A defined “Golden Record” strategy within the target system (e.g., which record ID survives a merge).

The Implementation Deep-Dive

1. Establishing the Deterministic Matching Engine

The foundation of any deduplication strategy is the matching algorithm. In Genesys Cloud, we do not rely solely on the platform to “guess” duplicates. Instead, we leverage the Data Connector to push potential match sets to an external logic layer or use Architect Expressions for simple deterministic checks. For this workflow, we assume a hybrid approach: Genesys performs a preliminary check using email and phone number, and if a conflict exists, it evaluates a confidence score.

Architect Flow Configuration

Create a new Architect Flow named Deduplication_Engine_v1.

  1. Trigger: Set the trigger to Data Event or API Request depending on whether this is batch-processed or real-time. For real-time customer interaction, use Interaction triggers (e.g., Web Chat or Voice) that initiate a lookup.
  2. Lookup Block: Use a Data Lookup block to query the external CRM.
    • Endpoint: POST /api/v2/customers/lookup
    • Payload:
    {
      "email": "{{interaction.customer.email}}",
      "phone": "{{interaction.customer.phoneNumber}}"
    }
    
  3. Expression Evaluation: Create an expression to calculate a MatchScore.
    • Expression Name: CalculateMatchConfidence
    • Expression Code:
    var emailMatch = false;
    var phoneMatch = false;
    var nameSimilarity = 0;
    
    // Assume lookupResults is an array of potential matches from the CRM
    if (lookupResults.length > 0) {
        // Check for exact email match
        if (lookupResults[0].email === interaction.customer.email) {
            emailMatch = true;
        }
        
        // Check for exact phone match
        if (lookupResults[0].phone === interaction.customer.phoneNumber) {
            phoneMatch = true;
        }
    
        // Simple Levenshtein distance approximation for name
        // Note: In production, use a dedicated string similarity library or external API
        nameSimilarity = calculateSimilarity(lookupResults[0].firstName, interaction.customer.firstName);
    }
    
    var score = 0;
    if (emailMatch) score += 50;
    if (phoneMatch) score += 40;
    if (nameSimilarity > 0.8) score += 10;
    
    return score;
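The expression above calls calculateSimilarity without defining it. A minimal sketch of one possible implementation follows, assuming a normalized Levenshtein similarity is acceptable for first names. The case-folding, whitespace trimming, and the choice to return 0 for missing values are assumptions, not part of the original expression.

```javascript
// Hypothetical helper: normalized Levenshtein similarity in the range [0, 1].
// 1 means identical after trimming and lower-casing; 0 means nothing in common.
function calculateSimilarity(a, b) {
  const s = String(a ?? "").trim().toLowerCase();
  const t = String(b ?? "").trim().toLowerCase();

  // Assumption: a missing name should never earn similarity points.
  if (s.length === 0 || t.length === 0) return 0;

  // prev[j] = edit distance between the first i-1 chars of s and the first j chars of t.
  let prev = Array.from({ length: t.length + 1 }, (_, j) => j);
  for (let i = 1; i <= s.length; i++) {
    const curr = [i];
    for (let j = 1; j <= t.length; j++) {
      const cost = s[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution
      );
    }
    prev = curr;
  }

  const distance = prev[t.length];
  return 1 - distance / Math.max(s.length, t.length);
}
```

With this definition, "Katherine" and "Catherine" differ by one edit over nine characters and clear the 0.8 threshold, while "Jon" and "John" score 0.75 and do not.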
    

The Trap: The “Exact Match” Fallacy

A common misconfiguration is treating any match on a single attribute (e.g., email) as a definitive duplicate. This can cause irreversible data loss when two distinct customers share the same email address (e.g., info@company.com or a shared family inbox), because the automated merge overwrites one of the two records.

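Architectural Reasoning: We enforce a Confidence Threshold. If MatchScore >= 90, the system treats the pair as a deterministic duplicate. If MatchScore is above 50 but below 90, it is an ambiguous match. If MatchScore <= 50, the record is treated as a unique customer. Only the ambiguous range triggers the manual review queue. This prevents automated merges from overwriting distinct records while avoiding manual review for obvious duplicates such as identical email and phone combinations. Note that with the weights above, an email-only match scores exactly 50 and is therefore treated as unique rather than ambiguous; if shared-inbox cases should reach a reviewer, make the lower bound inclusive (MatchScore >= 50).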
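The three bands can be sketched as a small routing function. The constant names and the returned labels are hypothetical; only the numeric boundaries (90 and 50) come from the text above.

```javascript
// Boundaries taken from the guide; names are illustrative.
const AUTO_MERGE_MIN = 90;          // score >= 90 is a deterministic duplicate
const AMBIGUOUS_MIN_EXCLUSIVE = 50; // score > 50 and < 90 is ambiguous

function classifyMatch(score) {
  if (score >= AUTO_MERGE_MIN) return "AUTO_MERGE";
  if (score > AMBIGUOUS_MIN_EXCLUSIVE) return "MANUAL_REVIEW";
  return "UNIQUE";
}

// Scores produced by the weights in CalculateMatchConfidence:
classifyMatch(50 + 40); // email + phone  -> "AUTO_MERGE"
classifyMatch(50 + 10); // email + name   -> "MANUAL_REVIEW"
classifyMatch(40 + 10); // phone + name   -> "UNIQUE"
```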

2. Routing Ambiguous Matches to a Manual Review Queue

When the MatchScore falls into the ambiguous zone, the workflow must pause the automated process and engage a human agent. We do not use standard IVR queues for this. We create a dedicated Task Queue within Genesys Cloud Omni-Channel.

Queue Configuration

  1. Navigate to Admin > Routing > Queues.
  2. Create a new Queue named Identity_Review_High_Priority.
  3. Skills: Assign a specific skill, e.g., Data_Stewardship_L1.
  4. Wrap-Up Codes: Configure mandatory wrap-up codes that map to CRM actions.
    • MERGE_RECORDS: Agent confirms these are the same person.
    • SEPARATE_RECORDS: Agent confirms these are different people.
    • ESCALATE_L2: Agent needs senior data analyst input.

Architect Flow: Creating the Work Item

In the Deduplication_Engine_v1 flow, add a Set Variable block to prepare the context for the agent.

  • Variable: ReviewContext
  • Value:
{
  "potentialDuplicateId": "{{lookupResults[0].id}}",
  "newCustomerId": "{{interaction.customer.id}}",
  "matchScore": "{{CalculateMatchConfidence}}",
  "conflictingFields": {
     "address": "{{lookupResults[0].address}}",
     "preferredName": "{{lookupResults[0].firstName}}"
  }
}

Next, use the Create Task block (or API Request to the Task API) to generate a work item.

  • API Endpoint: POST /api/v2/tasks
  • Payload:
{
  "type": "task",
  "priority": 1,
  "targetAddress": "queue:Identity_Review_High_Priority",
  "callbackAddress": "flow:Deduplication_Engine_v1:step:WaitForAgentDecision",
  "context": {
    "ReviewContext": "{{ReviewContext}}",
    "interactionId": "{{interaction.id}}"
  },
  "media": {
    "type": "chat",
    "from": {
      "address": "system@deduplication.genesis.com",
      "name": "Deduplication Engine"
    },
    "to": [
      {
        "address": "queue:Identity_Review_High_Priority",
        "name": "Identity Review Queue"
      }
    ]
  }
}

The Trap: Blocking the Customer Journey

If this flow is triggered during a live customer interaction (e.g., a web chat), creating a task that blocks the flow until the agent responds will cause a timeout. The customer will experience a “silent hang” or a disconnect.

Architectural Reasoning: Implement a Decoupled Pattern. The Architect flow should create the task and then immediately proceed to a Wait block with a short timeout (e.g., 30 seconds) or, preferably, transition the customer to a standard “Thank You” state while the background task is processed. The “Manual Review” is a post-interaction data hygiene process, not a real-time conversation blocker. If real-time resolution is required, use a Callback Address in the task creation that points to a separate “Resume” flow, but only initiate this if the customer has explicitly opted into extended wait times. For the vast majority of deduplication cases, asynchronous processing is the correct engineering choice.
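As a rough illustration of the decoupled pattern, the sketch below starts the task request and returns to the customer journey without waiting for the agent. The createTask and onError parameters are hypothetical injected functions (for example, a wrapper around the task endpoint and a logger), and the returned object is illustrative rather than a platform structure.

```javascript
// Hypothetical sketch: queue the review task, then continue the customer journey.
function enqueueReviewWithoutBlocking(reviewContext, interactionId, createTask, onError) {
  const payload = {
    type: "task",
    priority: 1,
    targetAddress: "queue:Identity_Review_High_Priority",
    context: { ReviewContext: reviewContext, interactionId },
  };

  // The Promise executor runs synchronously, so the request starts now.
  // The result is deliberately not awaited; failures are routed to onError.
  new Promise((resolve) => resolve(createTask(payload))).catch(onError);

  // The customer moves on regardless of when (or whether) an agent responds.
  return { customerState: "THANK_YOU", reviewQueued: true };
}
```

The key design choice is that nothing in the return path depends on the agent's decision, so a slow review queue cannot cause a timeout in the live interaction.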

3. The Agent Workspace for Identity Resolution

Agents assigned to the Identity_Review_High_Priority queue need a specialized view. They are not having a conversation; they are performing a data audit.

Configuring the Task View

  1. Ensure agents have the Data Steward role.
  2. Customize the Agent Desktop to display the ReviewContext JSON clearly.
  3. Integrate a Side Panel application (using Genesys Cloud App Framework) that renders the two conflicting records side-by-side.

The Resolution Action

When the agent makes a decision, they must trigger a specific action that the Architect flow can interpret.

  1. Wrap-Up Code Integration: When the agent selects MERGE_RECORDS, the wrap-up code is attached to the task.
  2. API Update: The task completion triggers a webhook or an Architect Task Completed trigger.

The Trap: Subjective Agent Decisions

Agents often make inconsistent merge decisions based on incomplete information. One agent might merge two records because the names are similar, while another might split them because the zip codes differ. This leads to “ping-pong” duplicates where records are merged and split repeatedly.

Architectural Reasoning: Enforce Deterministic Rules in the UI. The Agent Desktop app should highlight why the match is ambiguous. If the email matches but the phone differs, the UI should prompt: “Email matches. Phone differs. Do you want to update the phone number on the existing record?” This guides the agent toward a specific data action (Update vs. Merge) rather than a binary Yes/No. This reduces cognitive load and increases data consistency.
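A minimal sketch of how the side-panel app might derive that prompt from the two records follows. The function name, the field list, and the record shape are assumptions for illustration; note that a value missing on both sides compares as a match in this simplified version.

```javascript
// Hypothetical helper for the side-panel app: explain why a pair is ambiguous.
function buildReviewPrompt(existing, incoming, fields) {
  const norm = (v) => String(v ?? "").trim().toLowerCase();
  const label = (f) => f.charAt(0).toUpperCase() + f.slice(1);

  const matches = fields.filter((f) => norm(existing[f]) === norm(incoming[f]));
  const differs = fields.filter((f) => !matches.includes(f));

  const parts = [
    ...matches.map((f) => `${label(f)} matches.`),
    ...differs.map((f) => `${label(f)} differs.`),
  ];

  // Only suggest an update when there is both agreement and disagreement.
  if (matches.length > 0 && differs.length > 0) {
    parts.push(`Do you want to update ${differs.join(", ")} on the existing record?`);
  }
  return { matches, differs, prompt: parts.join(" ") };
}

const review = buildReviewPrompt(
  { email: "a@x.com", phone: "111" },
  { email: "A@x.com", phone: "222" },
  ["email", "phone"]
);
// review.prompt is:
// "Email matches. Phone differs. Do you want to update phone on the existing record?"
```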

4. Executing the Merge via Data Connector

Once the agent completes the task, the Deduplication_Engine_v1 flow resumes (or is triggered by the task completion event). It must now execute the actual database change.

The Merge Logic

  1. Event Trigger: Task Completed
  2. Expression: Check the wrapupCode.
    • If MERGE_RECORDS:
      • Construct a merge payload for the CRM.
      • Use Data Connector to call the CRM Merge API.
    • If SEPARATE_RECORDS:
      • Log the event for audit.
      • Optionally, add a tag to the new record: Verified_Unique.
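The branching above can be sketched as a function that maps a wrap-up code to the next action. The action names and the returned object shape are hypothetical; the wrap-up codes, the Verified_Unique tag, and the merge payload fields come from this guide. Treating ESCALATE_L2 as a simple hand-off is an assumption, since the guide does not describe that path.

```javascript
// Hypothetical dispatcher: decide what the flow does after the agent wraps up.
function planResolution(wrapUpCode, reviewContext) {
  switch (wrapUpCode) {
    case "MERGE_RECORDS":
      return {
        action: "CALL_MERGE_API",
        payload: {
          recordsToKeep: [{ id: reviewContext.potentialDuplicateId, type: "Contact" }],
          recordsToDelete: [{ id: reviewContext.newCustomerId, type: "Contact" }],
        },
      };
    case "SEPARATE_RECORDS":
      // Log for audit and optionally tag the new record as verified.
      return { action: "LOG_AND_TAG", recordId: reviewContext.newCustomerId, tag: "Verified_Unique" };
    case "ESCALATE_L2":
      // Assumption: escalation performs no CRM write.
      return { action: "ESCALATE" };
    default:
      // Fail loudly so an unmapped code never results in a silent merge.
      throw new Error(`Unknown wrap-up code: ${wrapUpCode}`);
  }
}
```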

API Payload for Merge (Illustrative Salesforce-Style Example)

{
  "recordsToKeep": [
    {
      "id": "{{ReviewContext.potentialDuplicateId}}",
      "type": "Contact"
    }
  ],
  "recordsToDelete": [
    {
      "id": "{{ReviewContext.newCustomerId}}",
      "type": "Contact"
    }
  ]
}

The Trap: Race Conditions

If two ambiguous matches occur for the same customer record within a short timeframe (e.g., two different agents reviewing two different new contacts against the same old record), a race condition can occur. One agent merges Record A into Record B. Simultaneously, another agent merges Record C into Record B. If the first merge has already modified Record B (or another merge has deleted it), the second API call can fail or act on stale data.

Architectural Reasoning: Implement Optimistic Locking or Idempotency Keys. Every merge API call must include a unique Idempotency-Key header so that a retried request is not executed twice: assuming the CRM supports idempotency keys, a request with a key it has already processed returns a success status without re-executing the merge. Idempotency keys alone do not protect against two different merges touching the same record, so the Architect flow should also check the status of the target record before attempting a merge. If the target record is marked as Deleted or Merged, the flow should abort and log an error. This prevents cascading API failures during high-volume data ingestion periods.
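The sketch below simulates both safeguards against an in-memory record store, purely to show the control flow. In a real deployment the idempotency store lives in the CRM or integration layer; the key format, the status values, and the result shape here are assumptions.

```javascript
// Hypothetical in-memory simulation of an idempotent, status-checked merge.
function createMergeExecutor(records) {
  const completed = new Map(); // idempotency key -> result of the first execution

  return function merge(survivorId, loserId) {
    // Assumption: a deterministic key derived from the pair being merged.
    const key = `merge:${survivorId}:${loserId}`;

    // A repeated key returns the earlier result without re-executing.
    if (completed.has(key)) return { ...completed.get(key), replayed: true };

    // Status check: abort if either record was already merged or deleted.
    for (const id of [survivorId, loserId]) {
      const record = records.get(id);
      if (!record || record.status !== "Active") {
        return { ok: false, reason: "TARGET_NOT_ACTIVE", recordId: id, replayed: false };
      }
    }

    const loser = records.get(loserId);
    loser.status = "Merged";
    loser.mergedInto = survivorId;

    const result = { ok: true, key, replayed: false };
    completed.set(key, result);
    return result;
  };
}
```

A retried merge of the same pair is answered from the completed map, while a merge that targets an already merged record is rejected before any write happens.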

Validation, Edge Cases & Troubleshooting

Edge Case 1: The “Phantom” Duplicate

The Failure Condition: The system flags a duplicate, the agent reviews it, and selects SEPARATE_RECORDS. However, the next day, the same customer interacts again, and the system flags the same pair as duplicates again.

The Root Cause: The “Separate” decision was logged as a wrap-up code on the task, but it was not persisted as a “Do Not Merge” rule in the CRM or Genesys Customer Data. The deduplication engine re-evaluates the raw data (email/phone) and sees the same ambiguity.

The Solution: Implement a Negative Cache. When an agent selects SEPARATE_RECORDS, create a custom attribute on both records in the CRM: Genesys_Dedup_Exclude_ID. This attribute stores the ID of the other record. The initial lookup expression must check for this attribute. If RecordA.Genesys_Dedup_Exclude_ID == RecordB.ID, the system bypasses the match entirely. This ensures that human decisions are respected in future automated runs.
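A minimal sketch of the exclusion check follows. The record shape and function names are hypothetical; the attribute name comes from the text above. Accepting either a single ID or a list of IDs in the attribute is an assumption, added because one record may be separated from several others over time.

```javascript
// Hypothetical check: has a reviewer already declared these two records distinct?
function isExcludedPair(recordA, recordB) {
  // Normalize the attribute to an array whether it holds one ID, a list, or nothing.
  const excludes = (record, otherId) =>
    [].concat(record.Genesys_Dedup_Exclude_ID ?? []).includes(otherId);

  // The attribute is written to both records, but checking both directions
  // tolerates a partially failed write.
  return excludes(recordA, recordB.id) || excludes(recordB, recordA.id);
}

// Drop previously separated candidates before any score is calculated.
function filterCandidates(incoming, lookupResults) {
  return lookupResults.filter((candidate) => !isExcludedPair(incoming, candidate));
}
```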

Edge Case 2: Agent Queue Saturation

The Failure Condition: During a marketing campaign, thousands of new leads are ingested. The ambiguous match rate spikes to 40%. The Identity_Review_High_Priority queue depth exceeds 5,000 tasks. Agents are overwhelmed, and the backlog grows. The data quality degrades as old tasks expire.

The Root Cause: The threshold for “Ambiguous” is too low, or the campaign data is particularly dirty (e.g., typos in email addresses causing near-matches).

The Solution: Implement Dynamic Thresholding. Use a Genesys Cloud Prediction or Analytics API to monitor queue depth. If the queue depth exceeds a defined limit (e.g., 500 tasks), automatically raise the lower bound of the ambiguous range in the Architect flow from 50 to 70. This pushes borderline matches into the “Automatic Unique” bucket and reduces the burden on agents; pushing more matches into “Automatic Merge” instead would require lowering the upper bound (90), which carries a higher risk of incorrect merges. Alternatively, implement Batch Processing for non-real-time interactions. Instead of creating a task for every ambiguous match, aggregate them into a single daily digest for data stewards to review in bulk.
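The threshold adjustment can be sketched as follows. The function and option names are hypothetical, and the queue depth is assumed to be supplied by whatever monitoring call is in place; the numbers (500, 50, 70, 90) are the ones used in this guide.

```javascript
// Hypothetical: pick the ambiguous-range bounds from the current queue depth.
function reviewBounds(queueDepth, options = {}) {
  const { depthLimit = 500, normalLower = 50, relaxedLower = 70, autoMergeMin = 90 } = options;
  const lowerExclusive = queueDepth > depthLimit ? relaxedLower : normalLower;
  return { lowerExclusive, autoMergeMin };
}

function classifyWithBounds(score, bounds) {
  if (score >= bounds.autoMergeMin) return "AUTO_MERGE";
  if (score > bounds.lowerExclusive) return "MANUAL_REVIEW";
  return "UNIQUE";
}

// A score of 60 is reviewed under normal load but treated as unique under saturation.
classifyWithBounds(60, reviewBounds(120));  // "MANUAL_REVIEW"
classifyWithBounds(60, reviewBounds(5000)); // "UNIQUE"
```

Only the lower bound moves in this sketch, so saturation never increases the number of automated merges.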

Edge Case 3: API Rate Limiting on Merge

The Failure Condition: The Architect flow attempts to merge 100 records per minute. The CRM API returns 429 Too Many Requests. The flow enters a retry loop, eventually timing out.

The Root Cause: The flow processes tasks sequentially or in parallel without respecting the downstream system’s rate limits.

The Solution: Use a bulk merge API if the CRM offers one. If not, implement a Throttling Mechanism in the Architect flow. Use a Wait block with a dynamic duration based on the API response headers. If the CRM returns a Retry-After header, parse it and pause the flow for that duration. For high-volume scenarios, move the merge logic out of Genesys Architect entirely. Instead, have the Architect flow write the merge requests to a Message Queue (e.g., AWS SQS or Azure Service Bus). A separate microservice consumes the queue and handles rate limiting, retries, and error handling. This decouples the contact center platform from the performance constraints of the data integration.
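For the consuming microservice, the wait duration could be derived as sketched below. Retry-After may carry either a number of seconds or an HTTP date, so both forms are handled. The function name, the exponential fallback, and the 60-second cap on the fallback are assumptions.

```javascript
// Hypothetical helper: how long to wait before retrying a 429 response.
// retryAfter is the raw Retry-After header value (or undefined if absent).
function retryDelayMs(retryAfter, attempt, now = Date.now()) {
  if (typeof retryAfter === "string" && retryAfter.trim() !== "") {
    // Form 1: delay in seconds, e.g. "30".
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;

    // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    const retryAt = Date.parse(retryAfter);
    if (!Number.isNaN(retryAt)) return Math.max(retryAt - now, 0);
  }

  // Fallback when the header is missing or unparseable:
  // exponential backoff (1s, 2s, 4s, ...) capped at 60 seconds.
  return Math.min(1000 * 2 ** attempt, 60000);
}
```

The server-provided value is used as-is, on the assumption that the CRM knows its own limits better than the client does; only the fallback is capped.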
