Architecting Data Warehouse Star Schema Designs Optimized for Contact Center Fact Tables
What This Guide Covers
This guide details the construction of a partitioned, surrogate-keyed star schema optimized for contact center interaction fact tables ingested from Genesys Cloud CX and NICE CXone. You will build a dimensional model that prevents metric inflation across multi-step journeys, handles slowly changing agent metadata without historical breaks, and aligns with platform-specific API pagination and rate limits. The end result is a query-ready warehouse layer that supports sub-second SLA, AHT, and occupancy aggregations at enterprise scale.
Prerequisites, Roles & Licensing
- Licensing Tiers: Genesys Cloud CX requires CX 2 or CX 3 licensing for full interaction analytics access. NICE CXone requires Data Mart Standard or Premium tier for historical interaction exports. Workforce metrics require the WEM Add-on on both platforms.
- Platform Permissions:
- Genesys:
analytics:report:view,analytics:report:export,data:export:view,ucm:interaction:view - NICE:
Data Mart > Export,Reporting > Analytics Export,Interactions > View
- Genesys:
- OAuth Scopes:
analytics:report:view,data:export:view,ucm:interaction:view(Genesys).export:read,analytics:read,interaction:read(NICE). - External Dependencies: Cloud data warehouse (Snowflake, BigQuery, or Redshift), orchestration layer (Airflow, Dagster, or dbt Cloud), and a secure credential vault for OAuth token rotation. You must also have network egress configured for API calls or SFTP pull endpoints.
- Cross-Platform Note: Both platforms deliver interaction data as flat, denormalized exports. The star schema requires explicit decomposition to prevent query degradation. Refer to the WFM Integration Guide for how this fact table feeds into schedule adherence calculations, and consult the Speech Analytics Pipeline Guide for transcript-to-fact joining strategies.
The Implementation Deep-Dive
1. Dimensional Modeling Strategy
Contact center platforms assign natural keys to agents, queues, skills, and channels. These identifiers are volatile. Agents change departments, queues get archived and recreated, and skill groups undergo rebranding. If you anchor your warehouse to platform-generated IDs, you will experience referential integrity breaks during org migrations or skill reassignments.
Build all dimension tables with integer surrogate keys generated in the warehouse. The surrogate key serves as the immutable primary key. The platform ID becomes a source_system_id column. Implement Slowly Changing Dimension Type 2 (SCD2) for dim_agent, dim_queue, and dim_skill. SCD2 preserves historical state by creating new rows when attributes change, bounded by valid_start_date and valid_end_date.
Dimension Table Structure Example:
CREATE TABLE dim_agent (
agent_sk INT PRIMARY KEY,
source_system_id VARCHAR(64),
agent_name VARCHAR(256),
department_code VARCHAR(64),
shift_type VARCHAR(32),
valid_start_date TIMESTAMP,
valid_end_date TIMESTAMP,
is_current BOOLEAN
);
The Trap: Using updated_after timestamps from the platform API as the sole trigger for SCD2 updates. Platform exports frequently backfill historical records without changing the updated_after field, or they push soft attribute changes (like a queue manager reassignment) that do not trigger a timestamp update. If your ETL only checks timestamps, you will miss dimension drift. Historical AHT calculations will suddenly attribute performance to the wrong department or shift type.
Architectural Reasoning: We decouple warehouse identity from vendor identity because vendor IDs are operational, not analytical. The surrogate key guarantees stable joins across fact tables. SCD2 is mandatory for any attribute that impacts reporting segmentation. We implement a hash-diff column (attribute_hash) in the staging layer. The hash compares the current record against the previous warehouse state. Only when the hash changes do we close the prior row and insert a new one. This eliminates false positives from timestamp drift while capturing every meaningful attribute change.
2. Fact Table Architecture
Interaction data arrives as a stream of events. A single customer request often spans multiple steps: initial voice call, callback chat, email follow-up, and internal disposition routing. If you flatten all steps into a single wide row, you will double-count handle time, misalign SLA breaches, and break first-contact resolution metrics.
Design a unified fact_interaction table at the interaction level, not the step level. Each row represents one customer request lifecycle. Multi-step journeys are handled through explicit parent_interaction_id and step_sequence columns. This approach preserves analytical simplicity while maintaining full journey visibility.
Fact Table Schema:
CREATE TABLE fact_interaction (
interaction_sk BIGINT PRIMARY KEY,
interaction_date DATE,
queue_sk INT,
agent_sk INT,
channel_sk INT,
parent_interaction_id VARCHAR(64),
step_sequence INT,
is_primary_step BOOLEAN,
wait_time_seconds INT,
handle_time_seconds INT,
total_duration_seconds INT,
disposition_code VARCHAR(32),
sla_breached BOOLEAN,
created_at TIMESTAMP,
updated_at TIMESTAMP
);
The Trap: Storing handle_time and wait_time at the step level and summing them without filtering for is_primary_step. When a voice call escalates to a supervisor, the platform exports two steps. Summing both steps inflates AHT by 30 to 40 percent. SLA calculations break because the wait time is counted twice.
Architectural Reasoning: We isolate metrics at the interaction level because business KPIs are defined per request, not per touchpoint. The is_primary_step flag identifies the initial contact that triggers the customer journey. All duration metrics attach to that step. Subsequent steps carry zero duration values for aggregation purposes, but retain their own disposition and routing metadata for journey analytics. This design allows standard SUM() and AVG() queries to return accurate results without complex window functions. For platforms like NICE CXone that export step-level granularity by default, the ELT layer must pivot steps into a single interaction row using MAX() or LAST_VALUE() window functions keyed on interaction_id.
3. ETL Pipeline Construction
Platform APIs enforce strict rate limits and cursor-based pagination. Genesys Cloud CX limits analytics exports to 1,000 records per page with a maximum of 10 concurrent requests per OAuth token. NICE CXone uses token-based pagination with a 500-request-per-minute cap on Data Mart endpoints. Your pipeline must handle delta loads, idempotent inserts, and token rotation without dropping records.
Configure the extraction layer to pull interaction details using deterministic cursors. Never rely on offset-based pagination. Offsets degrade exponentially as datasets grow and cause skipped records when concurrent updates occur.
Genesys Cloud CX Extraction Payload:
POST https://api.mypurecloud.com/api/v2/analytics/interactions/queues/details
Authorization: Bearer <oauth_token>
Content-Type: application/json
{
"dateFrom": "2024-01-01T00:00:00.000Z",
"dateTo": "2024-01-01T23:59:59.999Z",
"interval": "PT1H",
"groupings": ["queue", "stepType", "dispositionCode"],
"metrics": ["totalHandleTime", "totalWaitTime", "interactions"],
"size": 1000,
"cursor": "eyJvZmZzZXQiOjAsInNvcnQiOiJpbnRlcmFjdGlvbl9pZCJ9"
}
NICE CXone Extraction Payload:
GET https://api.nice-incontact.com/api/v2/data-mart/exports/interactions?dateFrom=2024-01-01&dateTo=2024-01-02&cursor=abc123
Authorization: Bearer <oauth_token>
Accept: application/json
The Trap: Using updated_after as the sole delta load trigger without checking interaction status fields. Both platforms reprocess interactions when dispositions change, when recordings attach, or when routing rules retroactively update metadata. A reprocessed record gets a new updated_at timestamp but retains the same interaction_id. If your pipeline performs INSERT only, you will create duplicate fact rows. Aggregations will inflate. SLA breach counts will double.
Architectural Reasoning: We implement idempotent MERGE operations keyed on interaction_id and step_sequence. The staging layer lands raw JSON/CSV into a transient table. A transformation job hashes the entire row payload. If the hash matches the current warehouse state, the record is skipped. If the hash differs, the old row is updated, and SCD2 logic applies to any joined dimensions. Token rotation is handled by refreshing the OAuth bearer token 5 minutes before expiration. Failed API calls trigger exponential backoff with a maximum retry of 3. We partition the staging layer by extraction_timestamp to isolate bad batches without blocking downstream consumption.
4. Query Optimization & Storage Tiering
Contact center fact tables grow rapidly. A 500-seat center generates 2 to 3 million interaction rows monthly. At this volume, full table scans destroy query performance and inflate cloud warehouse costs. You must implement partitioning, clustering, and materialized views aligned with actual reporting patterns.
Partition the fact_interaction table on interaction_date. This enables partition pruning for date-range queries, which covers 80 percent of standard reporting use cases. Cluster the table on queue_sk and channel_sk. These columns have medium cardinality and align with how analysts filter data. Avoid clustering on high-cardinality fields like interaction_id or agent_sk. High cardinality clustering prevents effective data skipping and increases storage recomputation costs.
Snowflake Clustering Configuration:
ALTER TABLE fact_interaction CLUSTER BY (queue_sk, channel_sk, interaction_date);
BigQuery Partition & Cluster Configuration:
CREATE TABLE fact_interaction (
-- columns...
)
PARTITION BY DATE(interaction_date)
CLUSTER BY queue_sk, channel_sk;
The Trap: Building materialized views on top of unpartitioned staging tables. When analysts run ad-hoc queries against raw exports, the warehouse scans terabytes of historical data. Query costs spike. Dashboard timeouts occur. The underlying fact table remains unoptimized because the materialized view masks the performance debt until a schema change invalidates it.
Architectural Reasoning: We tier storage by access frequency. Current month data resides in the primary partition. Data older than 12 months moves to a cold storage tier via warehouse-specific lifecycle policies. Materialized views are built only on the optimized fact table, not on staging layers. We pre-aggregate common KPIs into a fact_interaction_daily_summary table that rolls up total_handle_time, total_wait_time, and interaction_count by queue_sk and interaction_date. This summary table powers executive dashboards with sub-second response times. The detailed fact table remains available for drill-through. This separation prevents report writers from accidentally triggering full fact scans when they only need daily trends.
Validation, Edge Cases & Troubleshooting
Edge Case 1: Multi-Step Interaction Double Counting in SLA Calculations
- The failure condition: SLA breach reports show 65 percent of interactions failing to meet the 20-second wait target. Operations leadership disputes the number because real-time dashboards show 42 percent.
- The root cause: The ETL pipeline ingests both the initial voice step and the subsequent callback chat step as independent interaction rows. Both steps carry non-zero
wait_time_seconds. The aggregation query sums all steps without filtering foris_primary_step. The callback step inherits the original wait time or records a secondary wait time depending on platform routing logic. - The solution: Enforce
is_primary_step = TRUEin all SLA and AHT aggregation queries. Update the ELT transformation to setwait_time_seconds = 0andhandle_time_seconds = 0on all steps wherestep_sequence > 1. Add a validation test in your dbt/Airflow pipeline that comparesCOUNT(*)from the fact table againstSUM(CASE WHEN is_primary_step THEN 1 ELSE 0 END). The delta should equal the number of multi-step interactions. Alert if the difference exceeds 5 percent.
Edge Case 2: Agent Dimension Slowly Changing Dimension Drift During Shift Handoffs
- The failure condition: End-of-month occupancy reports show agents working 14 hours in a single day. Schedule adherence scores drop to 60 percent. WEM compliance flags trigger automatically.
- The root cause: The platform exports agent metadata updates asynchronously. When an agent switches from a day queue to an evening queue, the
queue_idchanges, but theupdated_attimestamp on the agent record lags by 15 to 30 minutes. The ETL captures the old queue assignment for interactions that occurred during the switch. The warehouse joins the interaction fact to the stale agent dimension row. Historical occupancy calculations attribute time to the wrong shift type. - The solution: Implement a time-window reconciliation job that runs daily at 02:00 UTC. The job compares
agent_skandqueue_skin the fact table against thedim_agentvalid date ranges. When a mismatch is detected, the job updates the fact table’sagent_skto match the dimension row that was active atinteraction_date. Disable SCD2 backfilling during live hours. Route all dimension updates through a staging queue that applies changes only after platform export completion windows close. Cross-reference the WFM Integration Guide for how to align this reconciliation with shift boundary timestamps.