Architecting a dbt Transformation Layer for Genesys Cloud Raw Analytics Data

Architecting a dbt Transformation Layer for Genesys Cloud Raw Analytics Data

What This Guide Covers

You are building a dbt (data build tool) transformation layer that converts raw Genesys Cloud analytics exports into a clean, business-analytics-ready dimensional data model. When complete, your analytics engineers will work with curated, well-documented dbt models-fct_conversations, dim_agents, dim_queues, fct_evaluations-that abstract away the complex, normalized API response structures, handle common data quality issues (null handling, late-arriving data, duplicate records), and produce metrics tables with pre-computed KPIs ready for Looker, Tableau, or Power BI consumption.


Prerequisites, Roles & Licensing

  • Genesys Cloud: Any CX tier with Analytics API access.
  • Infrastructure:
    • A data warehouse (Snowflake, BigQuery, or AWS Redshift) already populated with raw Genesys Cloud data (either via the Apache Iceberg pipeline from the companion guide, or Genesys Cloud’s native data export to S3).
    • dbt Core (open source) or dbt Cloud installed and connected to your warehouse.

The Implementation Deep-Dive

1. The Raw Analytics Data Problem

Genesys Cloud’s Analytics API returns highly normalized, nested data. A single conversation has:

  • Multiple participants (customer, agent, IVR, conference parties).
  • Multiple segments per participant (alerting, connected, hold, ACW).
  • Multiple sessions per segment (for transfers).
  • Evaluation data attached separately to the conversation ID.

Analysts who query raw data directly often make mistakes:

  • Double-counting handle time by summing all segments without deduplication.
  • Missing transferred interactions by filtering only on agentId IS NOT NULL.
  • Incorrectly attributing conversations to queues when a transfer occurred.

dbt models encode these business logic rules as SQL transformations, creating a single authoritative source of truth.


2. dbt Project Structure

genesys_dbt/
├── dbt_project.yml
├── models/
│   ├── staging/
│   │   ├── stg_conversations.sql          # Raw API data, minimal cleaning
│   │   ├── stg_participants.sql            # Exploded participant rows
│   │   └── stg_evaluations.sql            # Raw quality evaluations
│   ├── intermediate/
│   │   ├── int_conversation_segments.sql  # Segment deduplication
│   │   └── int_agent_handling.sql         # Attributed agent metrics
│   ├── marts/
│   │   ├── fct_conversations.sql          # One row per conversation
│   │   ├── fct_evaluations.sql            # One row per evaluation
│   │   ├── dim_agents.sql                 # Agent dimension table
│   │   ├── dim_queues.sql                 # Queue dimension table
│   │   └── metrics/
│   │       ├── mrt_daily_queue_kpis.sql   # Pre-computed daily KPIs
│   │       └── mrt_agent_scorecard.sql    # Agent performance summary
├── tests/
│   ├── generic/
│   │   └── assert_conversation_duration_positive.sql
│   └── singular/
│       └── assert_no_orphan_evaluations.sql
└── sources.yml

3. The Staging Layer - Minimal Cleaning

-- models/staging/stg_conversations.sql
-- Selects and renames raw columns from the Iceberg/data warehouse source table.
-- No business logic - only type casting, renaming, and null coalescing.

{{ config(materialized='view') }}

WITH source AS (
    SELECT * FROM {{ source('genesys_raw', 'conversations') }}
),

renamed AS (
    SELECT
        conversation_id,
        -- Standardize timestamps
        CAST(start_time AS TIMESTAMP) AS conversation_start_at,
        CAST(end_time AS TIMESTAMP) AS conversation_end_at,
        
        -- Standardize channel to lowercase enum
        LOWER(channel) AS channel,
        
        -- Queue identity
        queue_id,
        queue_name,
        
        -- Agent identity (may be NULL for bot-only interactions)
        agent_id,
        agent_name,
        
        -- Duration fields
        COALESCE(handle_seconds, 0) AS handle_seconds,
        COALESCE(talk_seconds, 0) AS talk_seconds,
        COALESCE(hold_seconds, 0) AS hold_seconds,
        COALESCE(acw_seconds, 0) AS acw_seconds,
        
        -- Flags
        COALESCE(abandoned, FALSE) AS is_abandoned,
        COALESCE(transferred, FALSE) AS is_transferred,
        
        -- Wrap-up
        wrapup_code,
        
        -- Metadata
        ingested_at,
        
        -- Data quality flag
        CASE 
            WHEN handle_seconds < 0 THEN 'INVALID_DURATION'
            WHEN conversation_end_at < conversation_start_at THEN 'INVALID_TIMESTAMPS'
            ELSE 'OK'
        END AS dq_flag

    FROM source
)

SELECT * FROM renamed

4. The Intermediate Layer - Business Logic

-- models/intermediate/int_agent_handling.sql
-- Computes attributable handle time per agent, accounting for transfers.
-- Rule: Credit goes to the last agent who handled the interaction to resolution,
-- unless it was abandoned, in which case credit goes to no agent.

{{ config(materialized='ephemeral') }}

WITH conversations AS (
    SELECT * FROM {{ ref('stg_conversations') }}
    WHERE dq_flag = 'OK'
),

agent_interactions AS (
    SELECT
        conversation_id,
        channel,
        conversation_start_at,
        conversation_end_at,
        queue_id,
        queue_name,
        agent_id,
        agent_name,
        handle_seconds,
        talk_seconds,
        hold_seconds,
        acw_seconds,
        is_abandoned,
        is_transferred,
        wrapup_code,
        
        -- Handle time bucketing (for heatmap analysis)
        CASE
            WHEN handle_seconds < 120 THEN '0-2min'
            WHEN handle_seconds < 300 THEN '2-5min'
            WHEN handle_seconds < 600 THEN '5-10min'
            WHEN handle_seconds < 1200 THEN '10-20min'
            ELSE '20min+'
        END AS handle_time_bucket,
        
        -- Hour of day for temporal analysis
        EXTRACT(HOUR FROM conversation_start_at) AS hour_of_day,
        DAYOFWEEK(conversation_start_at) AS day_of_week,
        DATE_TRUNC('day', conversation_start_at) AS interaction_date

    FROM conversations
    -- Only include agent-handled interactions
    WHERE agent_id IS NOT NULL
      AND NOT is_abandoned
)

SELECT * FROM agent_interactions

5. The Mart Layer - KPI Tables

-- models/marts/metrics/mrt_daily_queue_kpis.sql
-- Pre-computed daily KPI table consumed directly by BI tools.

{{ config(
    materialized='table',
    post_hook="ANALYZE TABLE {{ this }}"
) }}

WITH base AS (
    SELECT * FROM {{ ref('int_agent_handling') }}
),

daily_queue_summary AS (
    SELECT
        interaction_date,
        queue_id,
        queue_name,
        channel,
        
        -- Volume metrics
        COUNT(conversation_id) AS total_interactions,
        SUM(CASE WHEN is_abandoned THEN 1 ELSE 0 END) AS abandoned_count,
        SUM(CASE WHEN is_transferred THEN 1 ELSE 0 END) AS transferred_count,
        
        -- Efficiency metrics
        ROUND(AVG(handle_seconds), 1) AS avg_handle_seconds,
        ROUND(AVG(talk_seconds), 1) AS avg_talk_seconds,
        ROUND(AVG(acw_seconds), 1) AS avg_acw_seconds,
        
        -- Percentile metrics (for SLA distribution analysis)
        PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY handle_seconds) AS p50_handle_seconds,
        PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY handle_seconds) AS p90_handle_seconds,
        PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY handle_seconds) AS p95_handle_seconds,
        
        -- Answer rate
        ROUND(
            (COUNT(*) - SUM(CASE WHEN is_abandoned THEN 1 ELSE 0 END)) * 100.0 / NULLIF(COUNT(*), 0),
            2
        ) AS answer_rate_pct

    FROM base
    GROUP BY 1, 2, 3, 4
)

SELECT * FROM daily_queue_summary
ORDER BY interaction_date DESC, total_interactions DESC

Validation, Edge Cases & Troubleshooting

Edge Case 1: Duplicate Conversations from Re-queuing

If a customer is transferred between queues, some raw data pipelines may create duplicate conversation_id rows in the staging table (one per queue leg). Aggregating without deduplication inflates total_interactions.
Solution: Add a dbt test unique: conversation_id to stg_conversations. If duplicates are detected, add a deduplication CTE that selects only the final agent/queue assignment based on conversation_end_at.

Edge Case 2: Model Freshness SLA

BI dashboards powered by mrt_daily_queue_kpis are only as fresh as the last dbt run. If the dbt job fails silently, executives may be making decisions based on 3-day-old data without knowing it.
Solution: Configure dbt source freshness checks in sources.yml with a max freshness of 4 hours: loaded_at_field: ingested_at and freshness: warn_after: {count: 4, period: hour}. Add a Slack notification on freshness warnings in your CI pipeline.

Edge Case 3: Long-Running Models on Full Historical Data

Running mrt_daily_queue_kpis on 2 years of data takes 20 minutes in Snowflake, causing BI tool timeout errors when engineers refresh data.
Solution: Add an incremental materialization strategy. Change materialized='table' to materialized='incremental' with a unique_key='interaction_date || queue_id || channel'. dbt will only process new/modified rows since the last run, reducing runtime from 20 minutes to under 2 minutes for daily runs.

Official References