Implementing Type-2 Slowly Changing Dimensions for Agent Skill History in Genesys Cloud CX

Implementing Type-2 Slowly Changing Dimensions for Agent Skill History in Genesys Cloud CX

What This Guide Covers

This guide details the implementation of a robust data architecture to track agent skill assignments and proficiency levels over time using Genesys Cloud Platform Events and an external relational database. The end result is a fully versioned history table that allows auditing of skill changes, accurate historical reporting on agent utilization, and compliance with data retention policies for regulated industries.

Prerequisites, Roles & Licensing

To implement this architecture successfully, the following components must be in place:

  • Licensing Tier: Genesys Cloud CX (Premium or Enterprise) is required to access Platform Events and granular skill assignment events. Basic tiers may lack event subscription capabilities for real-time tracking.
  • Permissions: The integration user requires specific scopes within the OAuth client configuration. You must enable skills:read to query current states and events:subscriptions to register event listeners. Additionally, users:read is necessary if you need to map internal Genesys IDs to external HR identifiers during the ETL process.
  • External Dependencies: A relational database capable of handling high-write throughput (e.g., PostgreSQL 14+, Amazon RDS for PostgreSQL) is required for the history store. This system must support ACID transactions to ensure data integrity during bulk inserts.
  • API Credentials: You will need a Client ID and Client Secret from the Genesys Cloud OAuth application configuration, along with an access token management strategy (e.g., short-lived tokens refreshed via refresh tokens).

The Implementation Deep-Dive

1. Designing the SCD Type-2 Schema

The core of this solution is defining how data persists when attributes change. A Slowly Changing Dimension Type 2 (SCD Type 2) strategy preserves history by creating a new row for every change event rather than overwriting existing data. This allows queries to retrieve the state of an agent’s skills at any specific point in time.

You must design a schema that captures the transactional nature of skill changes. Do not rely on Genesys Cloud reporting exports alone, as they aggregate data and may smooth out transient assignments. The external table must include effective start times, end times, and a current state flag to optimize read performance.

Schema Definition:

CREATE TABLE agent_skill_history (
    record_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id VARCHAR(50) NOT NULL,
    skill_id VARCHAR(50) NOT NULL,
    skill_name VARCHAR(100) NOT NULL,
    proficiency_level INTEGER NOT NULL CHECK (proficiency_level BETWEEN 0 AND 5),
    effective_start TIMESTAMP WITH TIME ZONE NOT NULL,
    effective_end TIMESTAMP WITH TIME ZONE DEFAULT '9999-12-31 23:59:59',
    is_current BOOLEAN DEFAULT TRUE,
    change_type VARCHAR(20) NOT NULL CHECK (change_type IN ('ASSIGNED', 'UNASSIGNED', 'LEVEL_CHANGED')),
    raw_payload JSONB,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_agent_skill_active ON agent_skill_history (user_id, skill_id, effective_end);
CREATE INDEX idx_agent_skill_time ON agent_skill_history (effective_start, effective_end);

The Trap: A common misconfiguration occurs when the effective_end timestamp is not set correctly during an update. If you fail to close the previous record (set is_current to FALSE and update effective_end) before inserting the new active record, you create overlapping time ranges. This leads to ambiguous query results where a user appears to have two different proficiency levels simultaneously. Always ensure that the logic for closing the old record executes atomically with the insertion of the new record within a database transaction.

Architectural Reasoning:
We use SCD Type 2 instead of Type 1 because contact center analytics often require “as-of” reporting. For example, if an agent was assigned to a critical compliance skill last week but removed today, you need to prove they were qualified for that skill during the specific shift window. Storing only the current state (Type 1) makes this impossible without relying on Genesys Cloud’s internal historical reports, which may have latency or retention limits.

2. Configuring Platform Event Subscriptions

Genesys Cloud provides a robust event bus for real-time data ingestion. To capture skill changes without incurring API rate limit penalties from constant polling, you must register for specific skill assignment events. These events trigger when an agent is assigned a skill, the proficiency level changes, or the assignment is removed.

You will create an OAuth application and subscribe to the SkillAssignmentAdded, SkillAssignmentUpdated, and SkillAssignmentRemoved events. The integration user must have sufficient privileges to subscribe to these events without triggering audit flags for suspicious activity.

API Endpoint for Event Subscription:

POST /api/v2/notifications/subscriptions HTTP/1.1
Host: {env}.purecloud.com
Content-Type: application/json
Authorization: Bearer <access_token>

{
    "name": "AgentSkillHistoryTracker",
    "callbackUrl": "https://your-secure-gateway.example.com/webhook/skill-events",
    "eventTypes": [
        "skillassignment.added",
        "skillassignment.updated",
        "skillassignment.removed"
    ],
    "filters": {
        "userIds": null, 
        "skills": null
    },
    "format": "JSON"
}

The Trap: Do not configure the callback URL with a public IP address or a load balancer that does not support persistent connections. If the webhook endpoint is unreachable for more than 30 seconds, Genesys Cloud will stop retrying and may mark the subscription as failed. Furthermore, ensure your webhook handler returns an HTTP 200 status code immediately upon receipt. Failing to acknowledge receipt quickly causes the platform to retransmit events, potentially flooding your database with duplicate records if deduplication logic is not in place.

Architectural Reasoning:
Polling the GET /api/v2/users/{userId}/skills endpoint every minute is a viable fallback strategy for small deployments (under 50 agents). However, this approach scales poorly and introduces data latency. Under load, API rate limits will throttle your requests, causing gaps in history tracking during peak operational hours. Event-driven architecture decouples the ingestion of state changes from your database write logic, ensuring that every change is captured at the source before any downstream processing occurs.

3. Implementing Idempotency and Deduplication Logic

Event systems are eventually consistent. Network blips or retries can result in the same event payload being delivered multiple times to your webhook handler. Without robust deduplication, you will create duplicate rows in your history table, corrupting the time-series integrity of your data.

You must implement a check that validates if the event has already been processed. This typically involves storing a unique identifier from the platform (often found in the event payload headers or body) and checking it against a processed events cache or database index before writing to the history table.

Webhook Processing Logic (Pseudocode):

def handle_skill_event(payload, headers):
    event_id = headers.get('X-Event-ID')
    
    # Check if already processed within 5 minutes
    if is_duplicate(event_id):
        return HTTPResponse(status=200)

    user_id = payload['userId']
    skill_id = payload['skillId']
    change_type = payload['type'] # added, updated, removed
    
    record_timestamp = get_current_timestamp()
    
    if change_type == 'removed':
        update_history_record(user_id, skill_id, end_time=record_timestamp)
    else:
        insert_new_skill_record(
            user_id=user_id, 
            skill_id=skill_id, 
            level=payload['proficiencyLevel'],
            start_time=record_timestamp
        )

    mark_as_processed(event_id)

The Trap: The most frequent failure in this step is relying solely on the event ID for deduplication without considering time-window overlaps. If an agent’s skill assignment expires and is immediately renewed, you may receive two separate events with distinct IDs that represent the same logical state (current active assignment). Your logic must verify the state transition. Specifically, do not insert a new record if the effective_start timestamp of the incoming event is identical to the effective_end of an existing current record for that specific user-skill pair without a change in proficiency level or assignment status.

Architectural Reasoning:
Idempotency is non-negotiable for financial and compliance reporting. If your system writes two records for the same skill assignment event, your calculation of total hours worked in that skill will be double-counted. By validating against the effective_start timestamp relative to the current active record’s effective_end, you ensure that state transitions are linear and non-overlapping. This also allows you to handle “late arriving” events gracefully by comparing timestamps rather than just matching IDs.

Validation, Edge Cases & Troubleshooting

Edge Case 1: Agent Deactivation and Skill Retention

When an agent is deactivated or removed from the contact center, their skill assignments are typically cleared in Genesys Cloud immediately. Your SCD logic must handle this gracefully to maintain historical integrity. If you simply delete the active record upon receiving a deactivation event, you lose the context of which skills they held at the time of termination.

The Failure Condition:
An agent leaves the company. Their records are purged from the database because the system interprets “Skill Assignment Removed” as a signal to close the data without preserving the historical state of their last active assignment.

The Solution:
When processing a skillassignment.removed event for a user who has been deactivated (verified via an external HR feed or Genesys Cloud user status), ensure that the effective_end timestamp is set to the moment of removal, but do not delete the record. Mark the record as historical. This allows auditors to query “What skills did this agent hold in their last 30 days of employment?” without needing access to a deleted state.

**Edge Case 2: Skill Proficiency Level Resets
Skill proficiency levels (0-5) can fluctuate frequently due to training or recertification cycles. If an agent’s skill level drops from 5 to 0 and then immediately back to 5, your history table should reflect this as a distinct transition event rather than assuming the state is unchanged.

The Failure Condition:
A deduplication check compares the incoming payload with the current active record. If the proficiencyLevel matches exactly, the system discards the update event. This causes a loss of data regarding when the skill was reassigned or if the agent failed a recertification attempt.

The Solution:
Implement a logic check that distinguishes between a “no-op” state and a “re-assignment.” If the proficiency level is identical but the change_type indicates an assignment event, you must record the timestamp of this specific action. In Genesys Cloud, skill reassignments often trigger an updated event even if values remain static due to system resets or manual overrides. Log these events with a flag indicating “State Unchanged but Event Triggered” to maintain an audit trail of all administrative actions.

**Edge Case 3: Backfilling Historical Data
New integrations will miss historical data prior to deployment. You cannot rely on past Platform Events to populate the initial state of your SCD table.

The Failure Condition:
You deploy the solution today. The database is empty. You query for an agent’s skill history from six months ago, and the system returns no results because no events were captured during that period.

The Solution:
Perform an initial backfill operation using the GET /api/v2/users/{userId}/skills endpoint for all active agents at deployment time. For each record returned, insert a row into your history table with an effective_start timestamp of zero or a specific initialization date, and set is_current to TRUE. This establishes the baseline state. Document this initialization date clearly in your data dictionary so that downstream analytics know that any data prior to the deployment timestamp is derived from a snapshot rather than real-time event tracking.

Official References