Architecting a dbt Transformation Layer for Genesys Cloud Raw Analytics Data
What This Guide Covers
You are building a dbt (data build tool) transformation layer that converts raw Genesys Cloud analytics exports into a clean, business-analytics-ready dimensional data model. When complete, your analytics engineers will work with curated, well-documented dbt models-fct_conversations, dim_agents, dim_queues, fct_evaluations-that abstract away the complex, normalized API response structures, handle common data quality issues (null handling, late-arriving data, duplicate records), and produce metrics tables with pre-computed KPIs ready for Looker, Tableau, or Power BI consumption.
Prerequisites, Roles & Licensing
- Genesys Cloud: Any CX tier with Analytics API access.
- Infrastructure:
- A data warehouse (Snowflake, BigQuery, or AWS Redshift) already populated with raw Genesys Cloud data (either via the Apache Iceberg pipeline from the companion guide, or Genesys Cloud’s native data export to S3).
- dbt Core (open source) or dbt Cloud installed and connected to your warehouse.
The Implementation Deep-Dive
1. The Raw Analytics Data Problem
Genesys Cloud’s Analytics API returns highly normalized, nested data. A single conversation has:
- Multiple participants (customer, agent, IVR, conference parties).
- Multiple segments per participant (alerting, connected, hold, ACW).
- Multiple sessions per segment (for transfers).
- Evaluation data attached separately to the conversation ID.
Analysts who query raw data directly often make mistakes:
- Double-counting handle time by summing all segments without deduplication.
- Missing transferred interactions by filtering only on
agentId IS NOT NULL. - Incorrectly attributing conversations to queues when a transfer occurred.
dbt models encode these business logic rules as SQL transformations, creating a single authoritative source of truth.
2. dbt Project Structure
genesys_dbt/
├── dbt_project.yml
├── models/
│ ├── staging/
│ │ ├── stg_conversations.sql # Raw API data, minimal cleaning
│ │ ├── stg_participants.sql # Exploded participant rows
│ │ └── stg_evaluations.sql # Raw quality evaluations
│ ├── intermediate/
│ │ ├── int_conversation_segments.sql # Segment deduplication
│ │ └── int_agent_handling.sql # Attributed agent metrics
│ ├── marts/
│ │ ├── fct_conversations.sql # One row per conversation
│ │ ├── fct_evaluations.sql # One row per evaluation
│ │ ├── dim_agents.sql # Agent dimension table
│ │ ├── dim_queues.sql # Queue dimension table
│ │ └── metrics/
│ │ ├── mrt_daily_queue_kpis.sql # Pre-computed daily KPIs
│ │ └── mrt_agent_scorecard.sql # Agent performance summary
├── tests/
│ ├── generic/
│ │ └── assert_conversation_duration_positive.sql
│ └── singular/
│ └── assert_no_orphan_evaluations.sql
└── sources.yml
3. The Staging Layer - Minimal Cleaning
-- models/staging/stg_conversations.sql
-- Selects and renames raw columns from the Iceberg/data warehouse source table.
-- No business logic - only type casting, renaming, and null coalescing.
{{ config(materialized='view') }}
WITH source AS (
SELECT * FROM {{ source('genesys_raw', 'conversations') }}
),
renamed AS (
SELECT
conversation_id,
-- Standardize timestamps
CAST(start_time AS TIMESTAMP) AS conversation_start_at,
CAST(end_time AS TIMESTAMP) AS conversation_end_at,
-- Standardize channel to lowercase enum
LOWER(channel) AS channel,
-- Queue identity
queue_id,
queue_name,
-- Agent identity (may be NULL for bot-only interactions)
agent_id,
agent_name,
-- Duration fields
COALESCE(handle_seconds, 0) AS handle_seconds,
COALESCE(talk_seconds, 0) AS talk_seconds,
COALESCE(hold_seconds, 0) AS hold_seconds,
COALESCE(acw_seconds, 0) AS acw_seconds,
-- Flags
COALESCE(abandoned, FALSE) AS is_abandoned,
COALESCE(transferred, FALSE) AS is_transferred,
-- Wrap-up
wrapup_code,
-- Metadata
ingested_at,
-- Data quality flag
CASE
WHEN handle_seconds < 0 THEN 'INVALID_DURATION'
WHEN conversation_end_at < conversation_start_at THEN 'INVALID_TIMESTAMPS'
ELSE 'OK'
END AS dq_flag
FROM source
)
SELECT * FROM renamed
4. The Intermediate Layer - Business Logic
-- models/intermediate/int_agent_handling.sql
-- Computes attributable handle time per agent, accounting for transfers.
-- Rule: Credit goes to the last agent who handled the interaction to resolution,
-- unless it was abandoned, in which case credit goes to no agent.
{{ config(materialized='ephemeral') }}
WITH conversations AS (
SELECT * FROM {{ ref('stg_conversations') }}
WHERE dq_flag = 'OK'
),
agent_interactions AS (
SELECT
conversation_id,
channel,
conversation_start_at,
conversation_end_at,
queue_id,
queue_name,
agent_id,
agent_name,
handle_seconds,
talk_seconds,
hold_seconds,
acw_seconds,
is_abandoned,
is_transferred,
wrapup_code,
-- Handle time bucketing (for heatmap analysis)
CASE
WHEN handle_seconds < 120 THEN '0-2min'
WHEN handle_seconds < 300 THEN '2-5min'
WHEN handle_seconds < 600 THEN '5-10min'
WHEN handle_seconds < 1200 THEN '10-20min'
ELSE '20min+'
END AS handle_time_bucket,
-- Hour of day for temporal analysis
EXTRACT(HOUR FROM conversation_start_at) AS hour_of_day,
DAYOFWEEK(conversation_start_at) AS day_of_week,
DATE_TRUNC('day', conversation_start_at) AS interaction_date
FROM conversations
-- Only include agent-handled interactions
WHERE agent_id IS NOT NULL
AND NOT is_abandoned
)
SELECT * FROM agent_interactions
5. The Mart Layer - KPI Tables
-- models/marts/metrics/mrt_daily_queue_kpis.sql
-- Pre-computed daily KPI table consumed directly by BI tools.
{{ config(
materialized='table',
post_hook="ANALYZE TABLE {{ this }}"
) }}
WITH base AS (
SELECT * FROM {{ ref('int_agent_handling') }}
),
daily_queue_summary AS (
SELECT
interaction_date,
queue_id,
queue_name,
channel,
-- Volume metrics
COUNT(conversation_id) AS total_interactions,
SUM(CASE WHEN is_abandoned THEN 1 ELSE 0 END) AS abandoned_count,
SUM(CASE WHEN is_transferred THEN 1 ELSE 0 END) AS transferred_count,
-- Efficiency metrics
ROUND(AVG(handle_seconds), 1) AS avg_handle_seconds,
ROUND(AVG(talk_seconds), 1) AS avg_talk_seconds,
ROUND(AVG(acw_seconds), 1) AS avg_acw_seconds,
-- Percentile metrics (for SLA distribution analysis)
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY handle_seconds) AS p50_handle_seconds,
PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY handle_seconds) AS p90_handle_seconds,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY handle_seconds) AS p95_handle_seconds,
-- Answer rate
ROUND(
(COUNT(*) - SUM(CASE WHEN is_abandoned THEN 1 ELSE 0 END)) * 100.0 / NULLIF(COUNT(*), 0),
2
) AS answer_rate_pct
FROM base
GROUP BY 1, 2, 3, 4
)
SELECT * FROM daily_queue_summary
ORDER BY interaction_date DESC, total_interactions DESC
Validation, Edge Cases & Troubleshooting
Edge Case 1: Duplicate Conversations from Re-queuing
If a customer is transferred between queues, some raw data pipelines may create duplicate conversation_id rows in the staging table (one per queue leg). Aggregating without deduplication inflates total_interactions.
Solution: Add a dbt test unique: conversation_id to stg_conversations. If duplicates are detected, add a deduplication CTE that selects only the final agent/queue assignment based on conversation_end_at.
Edge Case 2: Model Freshness SLA
BI dashboards powered by mrt_daily_queue_kpis are only as fresh as the last dbt run. If the dbt job fails silently, executives may be making decisions based on 3-day-old data without knowing it.
Solution: Configure dbt source freshness checks in sources.yml with a max freshness of 4 hours: loaded_at_field: ingested_at and freshness: warn_after: {count: 4, period: hour}. Add a Slack notification on freshness warnings in your CI pipeline.
Edge Case 3: Long-Running Models on Full Historical Data
Running mrt_daily_queue_kpis on 2 years of data takes 20 minutes in Snowflake, causing BI tool timeout errors when engineers refresh data.
Solution: Add an incremental materialization strategy. Change materialized='table' to materialized='incremental' with a unique_key='interaction_date || queue_id || channel'. dbt will only process new/modified rows since the last run, reducing runtime from 20 minutes to under 2 minutes for daily runs.