Implementing Screen Recording Playback Integration with Time-Synced Evaluation Annotations

StarAdmin · May 19, 2026, 9:06am

What This Guide Covers

This guide details the architectural implementation of a custom integration that synchronizes agent desktop screen recordings with NICE CXone Interaction Recording (IR) audio and NICE CXone Workforce Engagement Management (WEM) evaluation scores. You will build a system where a quality analyst can play back an agent interaction, view the associated screen activity, and see evaluation criteria results appear exactly at the timestamp where the relevant behavior occurred. The end result is a unified playback interface that eliminates manual timestamp hunting and provides contextual evidence for coaching sessions.

Prerequisites, Roles & Licensing

Licensing Tiers:
- NICE CXone Base License (for Interaction Recording).
- NICE CXone WEM License (for Evaluations and Recording Access).
- NICE CXone Screen Recording License (optional, if using native screen capture; otherwise, a third-party screen capture solution with API access is required).
Permissions:
- Recordings > Recording > Read
- Recordings > ScreenRecording > Read
- WEM > Evaluation > Read
- WEM > Evaluation > Write (if pushing annotations back)
- Users > User > Read
OAuth Scopes:
- urn:nicemedia:recordings:read
- urn:nicemedia:wem:read
- urn:nicemedia:user:read
External Dependencies:
- A middleware layer (e.g., AWS Lambda, Azure Function, or Node.js service) to handle the correlation logic.
- A frontend component (React/Vue/Angular) for the unified playback UI.
- Access to the NICE CXone Developer Portal for API key generation.

The Implementation Deep-Dive

1. Correlating Interaction Events Across Data Silos

The fundamental challenge in NICE CXone is that Interaction Recordings, Screen Recordings, and WEM Evaluations are stored in distinct data domains. Interaction Recordings are tied to interactionId, Screen Recordings are tied to userAgentId and timestamp, and Evaluations are tied to interactionId but contain discrete criteria objects without inherent temporal data.

To achieve time-synced playback, you must establish a deterministic mapping between these three entities. The anchor point is the interactionId.

Step 1.1: Retrieve the Interaction Metadata
First, fetch the interaction details to obtain the start time, duration, and associated user agent. This provides the temporal bounds for the screen recording search.

GET /api/v2/recordings/interactions/{interactionId}/details
Authorization: Bearer <access_token>

The response payload includes:

{
  "id": "interaction-12345",
  "startTime": "2023-10-27T14:30:00.000Z",
  "endTime": "2023-10-27T14:35:00.000Z",
  "userAgent": {
    "id": "agent-67890",
    "name": "Jane Doe"
  }
}

Step 1.2: Fetch Screen Recordings by Time Window
NICE CXone Screen Recording APIs do not support direct filtering by interactionId. You must query for screen recordings based on the userAgentId and the interaction’s startTime and endTime.

GET /api/v2/recordings/screenrecordings?userId={agentId}&startTime={startTime}&endTime={endTime}
Authorization: Bearer <access_token>

The Trap: Relying on exact timestamp matches.
Screen recording chunks are often segmented by activity or fixed intervals (e.g., every 5 minutes). An interaction starting at 14:30:01 might fall into a screen recording chunk that started at 14:29:55. If you query for exact start times, you will miss the recording.

Architectural Reasoning: You must implement a sliding window query. Expand the startTime and endTime parameters by a buffer (e.g., +/- 5 minutes) to capture the surrounding screen activity. Then, filter the results client-side to find the screen recording chunk that overlaps with the interaction timeline.

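// Query with a buffered window, then filter client-side for overlap
const BUFFER_MS = 5 * 60 * 1000;
const toMs = (iso) => new Date(iso).getTime();
const windowStart = new Date(toMs(interaction.startTime) - BUFFER_MS).toISOString();
const windowEnd = new Date(toMs(interaction.endTime) + BUFFER_MS).toISOString();
const screenRecordings = await fetchScreenRecordings(agentId, windowStart, windowEnd);
const matchingScreenRec = screenRecordings.find(rec =>
  toMs(rec.startTime) <= toMs(interaction.endTime) && toMs(rec.endTime) >= toMs(interaction.startTime)
);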
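
An interaction can also straddle a chunk boundary, in which case a single find call returns only the first piece. The sketch below shows one way to keep every overlapping chunk in playback order. The helper names overlapMs and selectOverlappingChunks are hypothetical, and the chunk shape (startTime and endTime as ISO strings) is assumed from the filtering logic above rather than taken from the API documentation.

```javascript
// Hypothetical helpers; chunk and interaction objects are assumed to carry
// ISO 8601 `startTime` and `endTime` strings.

// Milliseconds of overlap between a chunk and the interaction (0 if none).
function overlapMs(chunk, interaction) {
  const start = Math.max(Date.parse(chunk.startTime), Date.parse(interaction.startTime));
  const end = Math.min(Date.parse(chunk.endTime), Date.parse(interaction.endTime));
  return Math.max(0, end - start);
}

// Keep only chunks that overlap the interaction, ordered by start time.
function selectOverlappingChunks(chunks, interaction) {
  return chunks
    .filter((chunk) => overlapMs(chunk, interaction) > 0)
    .sort((a, b) => Date.parse(a.startTime) - Date.parse(b.startTime));
}
```

With the results in hand, the player can queue the chunks back to back instead of going blank when the first one ends.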

Step 1.3: Retrieve WEM Evaluation Data
Fetch the evaluation associated with the interaction. The evaluation object contains a list of criteria, each with a score and potentially a comment. However, the native API does not return timestamps for when each criterion was assessed.

GET /api/v2/wem/evaluations/{evaluationId}
Authorization: Bearer <access_token>

Response payload:

{
  "id": "eval-98765",
  "interaction": {
    "id": "interaction-12345"
  },
  "criteria": [
    {
      "id": "crit-001",
      "name": "Greeting",
      "score": 10,
      "comment": "Excellent greeting"
    },
    {
      "id": "crit-002",
      "name": "Problem Resolution",
      "score": 8,
      "comment": "Missed one step"
    }
  ]
}
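
The three lookups in Steps 1.1 through 1.3 can be combined in the middleware into one correlated object. The following is a minimal sketch, not a definitive implementation: fetchJson is a hypothetical injected function that takes a path and resolves to parsed JSON (authentication is left to it), the paths simply reuse the endpoints quoted in this guide, and the 5-minute buffer is the example value from Step 1.2.

```javascript
// Sketch only: `fetchJson(path)` is a hypothetical injected helper returning parsed JSON.
async function loadCorrelatedInteraction(fetchJson, interactionId, evaluationId, bufferMs = 5 * 60 * 1000) {
  const interaction = await fetchJson(
    `/api/v2/recordings/interactions/${encodeURIComponent(interactionId)}/details`
  );

  // Buffer the window so chunks that started before the interaction are returned.
  const query = new URLSearchParams({
    userId: interaction.userAgent.id,
    startTime: new Date(Date.parse(interaction.startTime) - bufferMs).toISOString(),
    endTime: new Date(Date.parse(interaction.endTime) + bufferMs).toISOString(),
  });

  // The screen recording and evaluation lookups are independent, so run them together.
  const [screenRecordings, evaluation] = await Promise.all([
    fetchJson(`/api/v2/recordings/screenrecordings?${query}`),
    fetchJson(`/api/v2/wem/evaluations/${encodeURIComponent(evaluationId)}`),
  ]);

  // Guard against pairing an evaluation with the wrong interaction.
  if (evaluation.interaction.id !== interaction.id) {
    throw new Error(`Evaluation ${evaluation.id} does not belong to interaction ${interaction.id}`);
  }

  return { interaction, screenRecordings, evaluation };
}
```

Injecting fetchJson keeps the correlation logic testable without network access and lets the same function run in Lambda, an Azure Function, or a plain Node.js service.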

2. Establishing Time-Sync Logic for Evaluation Annotations

Since WEM evaluations do not natively store timestamps for individual criteria, you must derive these timestamps. There are two approaches: Heuristic Estimation (using audio transcription) or Explicit Annotation (using a custom evaluation form with timestamp fields).

For a robust production system, the Explicit Annotation approach is superior because it removes ambiguity.

Step 2.1: Design a Custom Evaluation Form with Timestamp Fields
Modify the WEM Evaluation Template to include a custom field for each criterion: timestampOffsetSeconds. This field should accept a numeric value representing seconds from the start of the interaction.

When an evaluator completes the form, they must enter the timestamp where the behavior occurred. For example, for the “Greeting” criterion, the evaluator enters 15 (15 seconds into the call).
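
Because the offset is typed by hand, it is worth validating before it is saved. The sketch below checks that the value is numeric and falls within the interaction's duration. The function name validateTimestampOffset and its return shape are hypothetical; only the field semantics (seconds from interaction start) come from the text.

```javascript
// Hypothetical validator for an evaluator-entered offset (seconds from interaction start).
function validateTimestampOffset(rawValue, interaction) {
  // Number("") and Number(null) are 0, so reject empty input explicitly.
  if (rawValue === null || rawValue === undefined || String(rawValue).trim() === "") {
    return { valid: false, reason: "Offset is required." };
  }
  const offset = Number(rawValue);
  if (!Number.isFinite(offset)) {
    return { valid: false, reason: "Offset must be a number of seconds." };
  }
  const durationSeconds =
    (Date.parse(interaction.endTime) - Date.parse(interaction.startTime)) / 1000;
  if (offset < 0 || offset > durationSeconds) {
    return { valid: false, reason: `Offset must be between 0 and ${durationSeconds} seconds.` };
  }
  return { valid: true, offsetSeconds: offset };
}
```

For the five-minute sample interaction in Step 1.1, an entry of 15 passes and an entry of 301 is rejected.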

Step 2.2: Store Timestamps in Custom Metadata
NICE CXone WEM API allows you to attach custom data to evaluation criteria. You can use the customData field to store the timestamp.

PATCH /api/v2/wem/evaluations/{evaluationId}
Content-Type: application/json

{
  "criteria": [
    {
      "id": "crit-001",
      "score": 10,
      "customData": {
        "timestampOffsetSeconds": 15
      }
    },
    {
      "id": "crit-002",
      "score": 8,
      "customData": {
        "timestampOffsetSeconds": 120
      }
    }
  ]
}

The Trap: Assuming customData is indexed or searchable.
The customData field is opaque to the NICE CXone search engine. You cannot query for “all evaluations where Greeting timestamp is < 10”. You must retrieve the full evaluation object and parse the JSON locally. This requires careful handling of large payloads.

Architectural Reasoning: Use a lightweight middleware service to fetch the evaluation, parse the customData, and reconstruct a timeline array. Each entry pairs a criterion's timestampOffsetSeconds with its name, score, and comment, sorted by time.

function buildEvaluationTimeline(evaluation) {
  return evaluation.criteria
    .filter(crit => crit.customData && crit.customData.timestampOffsetSeconds !== undefined)
    .map(crit => ({
      timestamp: crit.customData.timestampOffsetSeconds,
      name: crit.name,
      score: crit.score,
      comment: crit.comment
    }))
    .sort((a, b) => a.timestamp - b.timestamp);
}

3. Building the Unified Playback Interface

The frontend must synchronize three media streams: Audio (Interaction Recording), Video (Screen Recording), and Overlay (Evaluation Annotations).

Step 3.1: Audio Playback Integration
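Use the NICE CXone Recording Player SDK or direct HLS stream URLs. The audio player must expose a time-update callback (the equivalent of the HTML media timeupdate event) that reports the current playback position in seconds.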

const audioPlayer = new NICECXoneAudioPlayer({
  recordingUrl: audioStreamUrl,
  onTimeUpdate: (currentTime) => {
    updateAnnotations(currentTime);
  }
});

Step 3.2: Screen Recording Synchronization
Screen recordings are typically MP4 or WebM files. You must calculate the offset between the interaction start time and the screen recording start time.

const interactionStart = new Date(interaction.startTime).getTime();
const screenRecStart = new Date(screenRecording.startTime).getTime();
const offsetMs = interactionStart - screenRecStart;

// When audio player is at currentTime (seconds from interaction start)
const screenPlaybackTime = (currentTime * 1000) + offsetMs;
videoPlayer.currentTime = screenPlaybackTime / 1000;
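
The offset arithmetic above can be wrapped in a function that also reports when the requested moment is not covered by the chunk, for example when the screen recording started after the call did. This is a minimal sketch; toScreenTime is a hypothetical name, and the chunk is assumed to expose startTime and endTime as ISO strings.

```javascript
// Map a position in the interaction (seconds) to a position in the screen
// recording chunk (seconds). Returns null when the chunk does not cover it.
function toScreenTime(audioSeconds, interaction, screenRecording) {
  const interactionStartMs = Date.parse(interaction.startTime);
  const chunkStartMs = Date.parse(screenRecording.startTime);
  const chunkEndMs = Date.parse(screenRecording.endTime);

  // Absolute wall-clock instant the audio position corresponds to.
  const wallClockMs = interactionStartMs + audioSeconds * 1000;
  if (wallClockMs < chunkStartMs || wallClockMs > chunkEndMs) {
    return null;
  }
  return (wallClockMs - chunkStartMs) / 1000;
}
```

Using the example from Step 1.2, an interaction starting at 14:30:00 inside a chunk that began at 14:29:55 maps audio second 0 to video second 5. A null result is the cue to show a placeholder instead of seeking.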

The Trap: Drift between audio and video streams.
Network latency and decoding differences can cause audio and video to drift out of sync over long interactions. If the interaction is 10 minutes long, a 100ms drift per minute results in a 1-second total drift, which is noticeable.

Architectural Reasoning: Implement a periodic resync mechanism. Every 30 seconds, check the difference between the audio player’s currentTime and the video player’s currentTime (adjusted for offset). If the difference exceeds a threshold (e.g., 500ms), reset the video player’s time to match the audio player.

setInterval(() => {
  const audioTime = audioPlayer.currentTime;
  const expectedVideoTime = (audioTime * 1000) + offsetMs;
  const actualVideoTime = videoPlayer.currentTime * 1000;
  
  if (Math.abs(expectedVideoTime - actualVideoTime) > 500) {
    videoPlayer.currentTime = expectedVideoTime / 1000;
  }
}, 30000);

Step 3.3: Rendering Time-Synced Annotations
Create an overlay component that receives currentTime from the audio player's time-update callback. When the current time matches or passes a timestamp in the evaluation timeline, display the annotation.

function AnnotationOverlay({ currentTime, timeline }) {
  const activeAnnotations = timeline.filter(
    annot => currentTime >= annot.timestamp && currentTime < annot.timestamp + 5
  );

  return (
    <div className="annotation-overlay">
      {activeAnnotations.map(annot => (
        <div key={annot.name} className="annotation-card">
          <strong>{annot.name}</strong>: {annot.score}/10
          <p>{annot.comment}</p>
        </div>
      ))}
    </div>
  );
}

The Trap: Annotation flickering.
If multiple annotations have the same timestamp or close timestamps, the UI may flicker as it rapidly switches between them.

Architectural Reasoning: Implement a debounce or a “hold time” for annotations. Once an annotation is displayed, keep it visible for a minimum duration (e.g., 5 seconds) or until the user manually dismisses it. Use a queue system for annotations that occur within the same time window.
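
One way to realize the hold time and queue is to precompute non-overlapping display windows, so that the visible annotation is a pure function of the playback position and stays stable when the user seeks. The sketch below is an assumption about how this could be done, not a prescribed design; scheduleAnnotations and activeAnnotation are hypothetical names, and the timeline entries are the { timestamp, name, score, comment } objects built in Step 2.2.

```javascript
// Give every annotation its own display window of `holdSeconds`.
// An annotation that would collide with the previous one is queued behind it.
function scheduleAnnotations(timeline, holdSeconds = 5) {
  const sorted = [...timeline].sort((a, b) => a.timestamp - b.timestamp);
  let previousHideAt = -Infinity;
  return sorted.map((annotation) => {
    const showAt = Math.max(annotation.timestamp, previousHideAt);
    const hideAt = showAt + holdSeconds;
    previousHideAt = hideAt;
    return { ...annotation, showAt, hideAt };
  });
}

// The single annotation to render at `currentTime`, or null if none is due.
function activeAnnotation(schedule, currentTime) {
  return schedule.find((s) => currentTime >= s.showAt && currentTime < s.hideAt) || null;
}
```

The trade-off is that a queued annotation appears later than the moment it describes, so the card should still print its original timestamp.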

Validation, Edge Cases & Troubleshooting

Edge Case 1: Missing Screen Recording Chunks

The Failure Condition:
The interaction audio plays, but the screen recording area is blank or shows an error.

The Root Cause:
The agent’s screen recording feature was disabled, crashed, or the recording chunk was deleted due to retention policies. Alternatively, the time window query did not capture the correct chunk due to timezone mismatches.

The Solution:

- Verify that the Screen Recording license is active for the user.
- Check the NICE CXone Admin Console > Settings > Recordings > Screen Recording to ensure the feature is enabled.
- Implement a fallback UI state. If no screen recording is found, display a message: “Screen recording unavailable for this interaction.” Do not crash the player.
- Ensure all timestamps are expressed in UTC before querying the API. NICE CXone APIs use UTC, but JavaScript parses a date-time string that lacks a zone designator (no trailing Z or offset) as local time.

// Correct UTC conversion
const utcStartTime = new Date(interaction.startTime).toISOString();
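
The conversion above is only safe when the input string already carries a zone designator. The sketch below normalizes strings that lack one, under the explicit assumption that such strings are meant as UTC; toUtcIso is a hypothetical helper, and that assumption should be checked against the system that produced the timestamp.

```javascript
// Assumption: a date-time string without a zone designator is meant as UTC.
function toUtcIso(timestamp) {
  const hasTime = timestamp.includes("T");
  const hasZone = /(Z|[+-]\d{2}:?\d{2})$/i.test(timestamp);
  // Without a designator, Date would interpret the value in the local zone.
  const normalized = hasTime && !hasZone ? `${timestamp}Z` : timestamp;
  const ms = Date.parse(normalized);
  if (Number.isNaN(ms)) {
    throw new RangeError(`Unparseable timestamp: ${timestamp}`);
  }
  return new Date(ms).toISOString();
}
```

Throwing on unparseable input keeps a malformed timestamp from silently turning into an empty query window.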

Edge Case 2: Evaluation Criteria Without Timestamps

The Failure Condition:
The evaluation scores are displayed, but they do not appear at the correct time. They may appear at the start of the call or not at all.

The Root Cause:
The evaluator did not populate the timestampOffsetSeconds field in the customData of the evaluation criteria. This is a human error during the evaluation process.

The Solution:

- Enforce mandatory fields in the WEM Evaluation Template. Configure the timestampOffsetSeconds field as required.
- Implement a validation check in the frontend. If a criterion lacks a timestamp, exclude it from the timeline or display it in a “Summary” section rather than the time-synced overlay.
- Provide training to evaluators on the importance of timestamp accuracy for coaching effectiveness.
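
The frontend validation check can be sketched as a single pass that splits criteria into a time-synced list and a summary list. The function name partitionCriteria is hypothetical; the criterion shape follows the evaluation payloads shown in Steps 1.3 and 2.2.

```javascript
// Split criteria into those with a usable offset (for the overlay)
// and those without (for a "Summary" section).
function partitionCriteria(evaluation) {
  const timed = [];
  const summary = [];
  for (const criterion of evaluation.criteria) {
    const offset = (criterion.customData || {}).timestampOffsetSeconds;
    const base = { name: criterion.name, score: criterion.score, comment: criterion.comment };
    if (typeof offset === "number" && Number.isFinite(offset) && offset >= 0) {
      timed.push({ timestamp: offset, ...base });
    } else {
      summary.push(base);
    }
  }
  timed.sort((a, b) => a.timestamp - b.timestamp);
  return { timed, summary };
}
```

Nothing is dropped this way: a criterion with a missing or malformed offset still reaches the analyst, just outside the timeline.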

Edge Case 3: Audio-Video Drift in Long Interactions

The Failure Condition:
During a 15-minute interaction, the on-screen activity in the screen recording (clicks, typing, window changes) no longer lines up with what is being said in the audio.

The Root Cause:
Accumulated clock drift between the audio decoder and the video decoder.

The Solution:

- Implement the periodic resync mechanism described in Step 3.2.
- Use the requestVideoFrameCallback API (if supported by the browser) to achieve more precise synchronization than setInterval.
- Consider using a single combined media stream if possible. NICE CXone does not natively combine audio and screen recording into one file, but you can use a server-side tool like FFmpeg to merge them on-demand for critical audits. This is computationally expensive and not recommended for real-time playback.

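# Example FFmpeg command for server-side merging (not for real-time)
# Assumes both files begin at the same instant; if not, trim the screen recording
# to the interaction start first (the offset calculated in Step 3.2).
ffmpeg -i audio.mp3 -i screen.mp4 -map 0:a -map 1:v -c copy -shortest output.mp4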
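
For the requestVideoFrameCallback suggestion in the list above, a minimal sketch follows. It assumes video is an HTMLVideoElement, audio is any object exposing currentTime in seconds, and offsetMs is the offset calculated in Step 3.2; startFrameSync is a hypothetical name. The callback receives frame metadata whose mediaTime is the presentation time of the frame in seconds.

```javascript
// Re-check sync on every presented video frame instead of on a timer.
// Returns false when the browser lacks the API so the caller can fall back
// to the setInterval approach.
function startFrameSync(video, audio, offsetMs, thresholdMs = 500) {
  if (typeof video.requestVideoFrameCallback !== "function") {
    return false;
  }
  const onFrame = (now, metadata) => {
    const expectedMs = audio.currentTime * 1000 + offsetMs;
    const actualMs = metadata.mediaTime * 1000;
    if (Math.abs(expectedMs - actualMs) > thresholdMs) {
      video.currentTime = expectedMs / 1000;
    }
    // Callbacks are one-shot, so register again for the next frame.
    video.requestVideoFrameCallback(onFrame);
  };
  video.requestVideoFrameCallback(onFrame);
  return true;
}
```

This sketch never stops on its own. A production version would keep the handle returned by requestVideoFrameCallback and pass it to cancelVideoFrameCallback when the player is torn down.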
