Building a Real-Time Agent Assist Suggestion Engine with Node.js

What You Will Build

  • A Node.js service that subscribes to Genesys Cloud conversation transcripts, extracts speaker fragments, and sends them to an LLM for knowledge base matching.
  • The system uses the Genesys Cloud Conversation Stream WebSocket and the Agent Assist REST API.
  • This tutorial covers authentication, WebSocket message parsing, LLM integration, and suggestion injection in JavaScript.

Prerequisites

  • OAuth 2.0 confidential client application with conversation:stream and agentassist:interaction:write scopes
  • Genesys Cloud API v2
  • Node.js 18 or higher
  • Dependencies: ws, axios, dotenv, uuid

Authentication Setup

The Genesys Cloud platform requires OAuth 2.0 bearer tokens for all API and WebSocket connections. You must implement a token manager that caches credentials and handles expiration gracefully. The following class manages the client credentials flow, stores the token with a time-to-live buffer, and automatically refreshes when necessary.

import axios from 'axios';
import dotenv from 'dotenv';
dotenv.config();

export class GenesysAuthManager {
  constructor() {
    this.clientId = process.env.GENESYS_CLIENT_ID;
    this.clientSecret = process.env.GENESYS_CLIENT_SECRET;
    this.environment = process.env.GENESYS_ENVIRONMENT || 'mypurecloud.com';
    // Genesys Cloud issues OAuth tokens from the login host, not the api host.
    this.tokenEndpoint = `https://login.${this.environment}/oauth/token`;
    this.accessToken = null;
    this.tokenExpiry = 0;
    this.refreshBufferMs = 60000; // Refresh 1 minute before expiry
  }

  async getAccessToken() {
    const now = Date.now();
    if (this.accessToken && now < (this.tokenExpiry - this.refreshBufferMs)) {
      return this.accessToken;
    }

    try {
      const response = await axios.post(
        this.tokenEndpoint,
        new URLSearchParams({
          grant_type: 'client_credentials',
          client_id: this.clientId,
          client_secret: this.clientSecret,
        }),
        {
          headers: {
            'Content-Type': 'application/x-www-form-urlencoded',
            'Accept': 'application/json',
          },
        }
      );

      this.accessToken = response.data.access_token;
      this.tokenExpiry = now + (response.data.expires_in * 1000);
      return this.accessToken;
    } catch (error) {
      if (error.response) {
        throw new Error(`OAuth authentication failed: ${error.response.status} - ${error.response.statusText}`);
      }
      throw new Error(`OAuth network error: ${error.message}`);
    }
  }
}

Required OAuth Scopes: conversation:stream, agentassist:interaction:write

The token manager returns a valid bearer token for downstream WebSocket and REST calls. You must cache the token to avoid triggering rate limits on the OAuth endpoint during high-volume conversation streams.
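
Caching alone leaves one gap: if several callers ask for a token while the cached one is expired, each of them starts its own OAuth request. The following is a minimal sketch of one way to share a single in-flight refresh. The createSingleFlight helper and its fetchToken parameter are hypothetical names introduced for illustration; they are not part of any Genesys SDK.

```javascript
// Hypothetical helper: wraps an async function so that concurrent callers
// share one pending promise instead of each starting a new request.
function createSingleFlight(fetchToken) {
  let inFlight = null;
  return function getToken() {
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(fetchToken)
        // Clear the slot on success or failure so the next call can retry.
        .finally(() => { inFlight = null; });
    }
    return inFlight;
  };
}

// Usage sketch with a stand-in for the real OAuth request.
let requestCount = 0;
const getToken = createSingleFlight(async () => {
  requestCount += 1;
  return `token-${requestCount}`;
});

// Three concurrent callers, one underlying request.
const demo = Promise.all([getToken(), getToken(), getToken()]).then((tokens) => {
  console.log(tokens, 'requests made:', requestCount);
  return tokens;
});
```

In the token manager, the request inside getAccessToken could be passed as fetchToken so that the cache check stays in front and only the network call is shared.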

Implementation

Step 1: Connect to Genesys Cloud Conversation Stream WebSocket

Genesys Cloud exposes real-time conversation data through a WebSocket endpoint. The connection requires a valid bearer token in the Authorization header. The stream delivers JSON messages containing transcript events, participant updates, and channel metadata.

import WebSocket from 'ws';

export class TranscriptStream {
  constructor(authManager, environment) {
    this.authManager = authManager;
    this.wsUrl = `wss://api.${environment}/api/v2/conversations/stream`;
    this.ws = null;
    this.onTranscriptFragment = null;
  }

  async connect(onFragmentCallback) {
    this.onTranscriptFragment = onFragmentCallback;
    const token = await this.authManager.getAccessToken();

    this.ws = new WebSocket(this.wsUrl, {
      headers: {
        'Authorization': `Bearer ${token}`,
        'Accept': 'application/json',
      },
    });

    this.ws.on('open', () => {
      console.log('WebSocket connected to conversation stream');
    });

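    this.ws.on('message', (data) => {
      // A single malformed frame must not crash the process.
      let message;
      try {
        message = JSON.parse(data.toString());
      } catch (error) {
        console.error('Skipping non-JSON stream message:', error.message);
        return;
      }
      this.handleStreamMessage(message);
    });

    this.ws.on('error', (error) => {
      // ws emits 'close' after 'error', so reconnection is scheduled only in
      // the 'close' handler. Reconnecting here too would open duplicate sockets.
      console.error('WebSocket error:', error.message);
      if (error.message.includes('401') || error.message.includes('403')) {
        console.log('Token expired or invalid. Reconnecting after the socket closes...');
      }
    });

    this.ws.on('close', () => {
      console.log('WebSocket closed. Reconnecting in 5 seconds...');
      setTimeout(() => this.reconnect(), 5000);
    });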
  }

  handleStreamMessage(message) {
    if (message.type !== 'transcript') return;
    if (!message.data?.text || !message.conversationId) return;

    const fragment = {
      conversationId: message.conversationId,
      channelId: message.channelId,
      speaker: message.data.speaker || 'unknown',
      text: message.data.text,
      timestamp: message.timestamp,
    };

    if (this.onTranscriptFragment) {
      this.onTranscriptFragment(fragment);
    }
  }

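  async reconnect() {
    try {
      await this.connect(this.onTranscriptFragment);
    } catch (error) {
      // connect() rejects if the token request fails; retry instead of
      // leaving an unhandled promise rejection.
      console.error('Reconnect failed:', error.message);
      setTimeout(() => this.reconnect(), 5000);
    }
  }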
}

HTTP/WebSocket Cycle:

  • Method: WebSocket Upgrade
  • Path: /api/v2/conversations/stream
  • Headers: Authorization: Bearer <token>, Accept: application/json
  • Response: Continuous JSON stream. Transcript messages contain type: "transcript", conversationId, channelId, and data.text.

You must filter for type: "transcript" and validate that data.text exists. The stream delivers fragments incrementally as agents or customers speak. You will buffer these fragments before sending them to the LLM.
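
The filtering and validation rules above can be kept in one pure function, which makes them easy to unit test without a live socket. This is a minimal sketch: parseTranscriptMessage is a hypothetical helper, and the field names simply mirror the message shape described in this step.

```javascript
// Hypothetical helper: turns a raw WebSocket frame into a transcript
// fragment, or returns null when the frame should be ignored.
function parseTranscriptMessage(raw) {
  let message;
  try {
    message = JSON.parse(typeof raw === 'string' ? raw : raw.toString());
  } catch {
    return null; // not JSON
  }
  if (message === null || typeof message !== 'object') return null;
  if (message.type !== 'transcript') return null;

  const text = typeof message.data?.text === 'string' ? message.data.text.trim() : '';
  if (!text || !message.conversationId) return null;

  return {
    conversationId: message.conversationId,
    channelId: message.channelId ?? null,
    speaker: message.data.speaker || 'unknown',
    text,
    timestamp: message.timestamp ?? null,
  };
}

const sample = JSON.stringify({
  type: 'transcript',
  conversationId: 'conv-1',
  channelId: 'ch-1',
  data: { speaker: 'customer', text: '  I cannot log in.  ' },
  timestamp: '2024-01-15T10:30:00Z',
});

console.log(parseTranscriptMessage(sample));
console.log(parseTranscriptMessage('not json'));            // null
console.log(parseTranscriptMessage('{"type":"presence"}')); // null
```

Returning null for every rejected frame keeps the caller to a single check before it hands the fragment to the buffer.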

Step 2: Extract Transcript Fragments and Route to LLM

Raw transcript fragments arrive rapidly. You must group them by conversation, wait for a natural pause, and submit the accumulated text to an LLM. The following class manages conversation buffers, enforces a debounce timeout, and calls an LLM with a structured prompt.

import axios from 'axios';

export class LLMProcessor {
  constructor() {
    this.buffers = new Map();
    this.debounceMs = 3000;
    this.llmEndpoint = 'https://api.openai.com/v1/chat/completions';
    this.llmApiKey = process.env.OPENAI_API_KEY;
  }

  addFragment(fragment) {
    const { conversationId, text, speaker } = fragment;
    if (!this.buffers.has(conversationId)) {
      this.buffers.set(conversationId, {
        text: [],
        timer: null,
        interactionId: null,
      });
    }

    const buffer = this.buffers.get(conversationId);
    buffer.text.push(`${speaker}: ${text}`);

    if (buffer.timer) clearTimeout(buffer.timer);
    buffer.timer = setTimeout(() => this.processBuffer(conversationId), this.debounceMs);
  }

  async processBuffer(conversationId) {
    const buffer = this.buffers.get(conversationId);
    if (!buffer || buffer.text.length === 0) return;

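    const transcript = buffer.text.join('\n');
    // Drop the entry so finished conversations do not accumulate in memory;
    // addFragment recreates it if more speech arrives.
    this.buffers.delete(conversationId);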

    try {
      const suggestions = await this.callLLM(transcript);
      if (suggestions && suggestions.length > 0) {
        await this.onSuggestionsReady(conversationId, suggestions);
      }
    } catch (error) {
      console.error(`LLM processing failed for ${conversationId}:`, error.message);
    }
  }

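  async callLLM(transcript) {
    const payload = {
      model: 'gpt-4o-mini',
      // JSON mode returns a single JSON object, so the prompt asks for an
      // object that wraps the array rather than a bare top-level array.
      response_format: { type: 'json_object' },
      messages: [
        { role: 'system', content: 'You are a knowledge base assistant. Return a JSON object of the form {"suggestions": [...]} listing relevant articles for the transcript. Each suggestion must have: title, url, preview, score (0-1). Return {"suggestions": []} if nothing matches.' },
        { role: 'user', content: transcript }
      ],
      temperature: 0.2,
    };

    const response = await axios.post(this.llmEndpoint, payload, {
      headers: {
        'Authorization': `Bearer ${this.llmApiKey}`,
        'Content-Type': 'application/json',
      },
    });

    const content = response.data.choices?.[0]?.message?.content ?? '';
    try {
      const parsed = JSON.parse(content);
      // Accept the wrapped form, and a bare array in case the model returns one.
      const suggestions = Array.isArray(parsed) ? parsed : parsed?.suggestions;
      return Array.isArray(suggestions) ? suggestions : [];
    } catch {
      console.error('LLM returned invalid JSON:', content);
      return [];
    }
  }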

  onSuggestionsReady(conversationId, suggestions) {
    console.log(`LLM returned ${suggestions.length} suggestions for ${conversationId}`);
    // Override in subclass or assign callback
  }
}

Required LLM Parameters:

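  • response_format: { type: "json_object" } constrains the model to emit a syntactically valid JSON object. It does not enforce your schema, so you still validate the fields.
  • temperature: 0.2 lowers sampling randomness, which makes the output more repeatable for knowledge base matching. It does not guarantee that a suggested article exists.
  • The system prompt explicitly defines the output schema to reduce parsing failures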
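
Because JSON mode guarantees syntax and not shape, it helps to validate each suggestion before it leaves the LLM layer. The sketch below is one possible validator. The normalizeSuggestions name, the maxItems limit of 5, and the choice to drop items that lack a title or url are assumptions made for illustration.

```javascript
// Hypothetical validator: keeps only well-formed suggestions and clamps
// the score into the 0-1 range that the system prompt asks for.
function normalizeSuggestions(parsed, maxItems = 5) {
  const list = Array.isArray(parsed) ? parsed : parsed?.suggestions;
  if (!Array.isArray(list)) return [];

  return list
    .filter((item) => item && typeof item.title === 'string' && typeof item.url === 'string')
    .map((item) => {
      const score = Number(item.score);
      return {
        title: item.title.trim(),
        url: item.url.trim(),
        preview: typeof item.preview === 'string' ? item.preview : '',
        // Non-numeric scores become 0; numeric ones are clamped to [0, 1].
        score: Number.isFinite(score) ? Math.min(1, Math.max(0, score)) : 0,
      };
    })
    .filter((item) => item.title !== '' && item.url !== '')
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, maxItems);
}

const cleaned = normalizeSuggestions({
  suggestions: [
    { title: 'Reset a password', url: 'https://kb.example.com/article/123', score: 1.7 },
    { title: 'Missing URL', score: 0.9 },
    { title: 'Billing FAQ', url: 'https://kb.example.com/article/456', preview: 'Invoices', score: '0.4' },
  ],
});
console.log(cleaned);
```

In this example the out-of-range score is clamped to 1, the entry without a url is dropped, and the string score is converted to a number.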

You must handle LLM network errors and malformed JSON responses. The debounce timer prevents flooding the LLM endpoint with partial sentences.
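
A plain debounce has one edge case: if a speaker talks without pausing, every fragment resets the timer and the buffer never flushes. The following sketch adds a maximum wait to the debounce. The createDebouncedBuffer name and the quietMs, maxWaitMs, and onFlush options are hypothetical, and the 10-second cap in the usage example is an arbitrary value to tune.

```javascript
// Hypothetical buffer: flushes after `quietMs` of silence, or after
// `maxWaitMs` since the first unflushed fragment, whichever comes first.
function createDebouncedBuffer({ quietMs, maxWaitMs, onFlush }) {
  let lines = [];
  let quietTimer = null;
  let firstAt = null;

  function flush() {
    clearTimeout(quietTimer);
    quietTimer = null;
    firstAt = null;
    if (lines.length === 0) return;
    const batch = lines;
    lines = [];
    onFlush(batch.join('\n'));
  }

  function add(line, now = Date.now()) {
    lines.push(line);
    if (firstAt === null) firstAt = now;
    clearTimeout(quietTimer);
    // Wait for a pause, but never longer than the remaining max-wait budget.
    const remaining = Math.max(0, firstAt + maxWaitMs - now);
    quietTimer = setTimeout(flush, Math.min(quietMs, remaining));
  }

  return { add, flush };
}

// Usage sketch: one buffer per conversation.
const buffer = createDebouncedBuffer({
  quietMs: 3000,
  maxWaitMs: 10000,
  onFlush: (transcript) => console.log('Send to LLM:\n' + transcript),
});
buffer.add('customer: I forgot my password.');
buffer.add('agent: I can help with that.');
buffer.flush(); // force a flush, for example when the conversation ends
```

Exposing flush also gives you a hook for conversation-end events, so the last few fragments are not lost while a timer is still pending.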

Step 3: Inject Results Through the Agent Assist API

Genesys Cloud Agent Assist requires an active interaction context before you can submit suggestions. You must create an interaction using the conversation ID, then POST the LLM results to the suggestions endpoint. The following class handles interaction lifecycle management, implements exponential backoff for rate limits, and formats payloads to match the Genesys schema.

import axios from 'axios';
import { v4 as uuidv4 } from 'uuid';

export class AgentAssistInjector {
  constructor(authManager, environment) {
    this.authManager = authManager;
    this.baseUrl = `https://api.${environment}/api/v2`;
    this.interactionCache = new Map();
  }

  async createInteraction(conversationId) {
    const token = await this.authManager.getAccessToken();

    const payload = {
      externalConversationId: conversationId,
      name: `AgentAssist-${conversationId}`,
      type: 'agent-assist',
    };

    const response = await axios.post(
      `${this.baseUrl}/agentassist/interactions`,
      payload,
      {
        headers: {
          'Authorization': `Bearer ${token}`,
          'Content-Type': 'application/json',
          'Accept': 'application/json',
        },
      }
    );

    // Prefer the ID assigned by the server. The `id` field name is an
    // assumption about the response body; fall back to a local UUID otherwise.
    const interactionId = response.data?.id ?? uuidv4();
    this.interactionCache.set(conversationId, interactionId);
    return interactionId;
  }

  async postSuggestions(conversationId, suggestions) {
    // Reuse the cached interaction, creating one on first use.
    const interactionId = this.interactionCache.get(conversationId)
      ?? await this.createInteraction(conversationId);

    const token = await this.authManager.getAccessToken();
    
    const formattedSuggestions = suggestions.map((item, index) => ({
      id: `${interactionId}-suggestion-${index}`,
      title: item.title || 'Knowledge Base Article',
      url: item.url || '#',
      preview: item.preview || '',
      score: item.score ?? 0.8,
      metadata: {
        source: 'llm-knowledge-base',
        timestamp: new Date().toISOString(),
      },
    }));

    const payload = {
      suggestions: formattedSuggestions,
    };

    const url = `${this.baseUrl}/agentassist/interactions/${interactionId}/suggestions`;
    await this.executeWithRetry(() => axios.post(url, payload, {
      headers: {
        'Authorization': `Bearer ${token}`,
        'Content-Type': 'application/json',
        'Accept': 'application/json',
      },
    }));
  }

  async executeWithRetry(requestFn, maxRetries = 3) {
    let lastError;
    for (let attempt = 0; attempt < maxRetries; attempt++) {
      try {
        return await requestFn();
      } catch (error) {
        if (error.response?.status === 401) {
          throw new Error('Authentication failed. Token refresh required.');
        }
        if (error.response?.status !== 429) throw error;

        lastError = error;
        // Retry-After is expressed in seconds; convert it to milliseconds.
        // Without the header, back off exponentially: 1s, 2s, 4s, ...
        const retryAfterSec = parseInt(error.response.headers['retry-after'], 10);
        const delayMs = Number.isFinite(retryAfterSec)
          ? retryAfterSec * 1000
          : Math.pow(2, attempt) * 1000;
        console.log(`Rate limited (429). Retrying in ${delayMs}ms...`);
        await new Promise(resolve => setTimeout(resolve, delayMs));
      }
    }
    // Surface the failure instead of silently resolving to undefined.
    throw lastError;
  }
}

HTTP Request/Response Cycle:

  • Method: POST
  • Path: /api/v2/agentassist/interactions/{interactionId}/suggestions
  • Headers: Authorization: Bearer <token>, Content-Type: application/json, Accept: application/json
  • Request Body:
{
  "suggestions": [
    {
      "id": "int-123-suggestion-0",
      "title": "How to reset customer password",
      "url": "https://kb.example.com/article/123",
      "preview": "Step-by-step guide for password reset...",
      "score": 0.92,
      "metadata": {
        "source": "llm-knowledge-base",
        "timestamp": "2024-01-15T10:30:00Z"
      }
    }
  ]
}
  • Response: 204 No Content on success. 400 Bad Request if payload schema is invalid. 429 Too Many Requests if rate limited.

The executeWithRetry method retries 429 responses, using exponential backoff when the server gives no guidance. Genesys Cloud enforces strict rate limits on Agent Assist endpoints. You must respect the Retry-After header when present; its numeric value is a delay in seconds, not milliseconds. The interaction cache ensures you reuse the same interaction ID for a single conversation.
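
The delay calculation can be isolated in a small function. The sketch below handles both forms that HTTP allows for Retry-After (a number of seconds or an HTTP date) and adds random jitter to the fallback backoff. The retryDelayMs name and the baseMs, capMs, now, and random options are hypothetical, and the jitter strategy is a common technique rather than something the Genesys documentation prescribes.

```javascript
// Hypothetical helper: returns how long to wait, in milliseconds, before
// retrying a 429 response.
function retryDelayMs(retryAfterHeader, attempt, options = {}) {
  const { baseMs = 1000, capMs = 30000, now = Date.now(), random = Math.random } = options;

  if (retryAfterHeader !== undefined && retryAfterHeader !== null) {
    const value = String(retryAfterHeader).trim();
    // Form 1: delay in whole seconds, e.g. "Retry-After: 5".
    if (/^\d+$/.test(value)) return Number(value) * 1000;
    // Form 2: an HTTP date, e.g. "Retry-After: Wed, 21 Oct 2015 07:28:00 GMT".
    const date = Date.parse(value);
    if (!Number.isNaN(date)) return Math.max(0, date - now);
  }

  // No usable header: exponential backoff with full jitter, so that many
  // clients throttled at the same moment do not retry in lockstep.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

console.log(retryDelayMs('5', 0));       // 5000
console.log(retryDelayMs(undefined, 3)); // a random value below 8000
```

The now and random options exist only so the function can be tested deterministically; production callers can omit them.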

Complete Working Example

The following script combines authentication, streaming, LLM processing, and Agent Assist injection into a single runnable module. Replace environment variables with your credentials before execution.

import dotenv from 'dotenv';
dotenv.config();

import { GenesysAuthManager } from './auth.js';
import { TranscriptStream } from './stream.js';
import { LLMProcessor } from './llm.js';
import { AgentAssistInjector } from './injector.js';

const ENV = process.env.GENESYS_ENVIRONMENT || 'mypurecloud.com';

async function main() {
  const auth = new GenesysAuthManager();
  const stream = new TranscriptStream(auth, ENV);
  const llm = new LLMProcessor();
  const injector = new AgentAssistInjector(auth, ENV);

  llm.onSuggestionsReady = async (conversationId, suggestions) => {
    try {
      await injector.postSuggestions(conversationId, suggestions);
      console.log(`Suggestions injected successfully for ${conversationId}`);
    } catch (error) {
      console.error(`Failed to inject suggestions for ${conversationId}:`, error.message);
    }
  };

  // Await the connection so that a failed token request reaches main().catch.
  await stream.connect((fragment) => {
    llm.addFragment(fragment);
  });

  console.log('Agent Assist LLM proxy running. Press Ctrl+C to stop.');
}

main().catch((error) => {
  console.error('Fatal error:', error);
  process.exit(1);
});

Dependencies:

npm install ws axios dotenv uuid

Environment Variables:

GENESYS_CLIENT_ID=your_client_id
GENESYS_CLIENT_SECRET=your_client_secret
GENESYS_ENVIRONMENT=mypurecloud.com
OPENAI_API_KEY=your_openai_key

Run the script with node index.js. The modules use ES import syntax, so package.json must contain "type": "module". The service will maintain the WebSocket connection, buffer transcript fragments, query the LLM, and push suggestions to the active Agent Assist session.
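
A missing variable otherwise shows up only as a failed OAuth or LLM request some time after startup. The sketch below checks the environment up front. The missingEnvVars and assertEnv helpers are hypothetical; GENESYS_ENVIRONMENT is left out of the required list because the code falls back to mypurecloud.com.

```javascript
// Hypothetical startup check: fail fast with one clear message instead of
// discovering a missing credential on the first API call.
function missingEnvVars(env, required) {
  return required.filter((name) => !env[name] || String(env[name]).trim() === '');
}

const REQUIRED_VARS = ['GENESYS_CLIENT_ID', 'GENESYS_CLIENT_SECRET', 'OPENAI_API_KEY'];

function assertEnv(env = process.env) {
  const missing = missingEnvVars(env, REQUIRED_VARS);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
}

// Example with a stand-in environment object: the secret is absent and the
// API key is empty, so both are reported.
console.log(missingEnvVars({ GENESYS_CLIENT_ID: 'abc', OPENAI_API_KEY: '' }, REQUIRED_VARS));
```

Calling assertEnv() as the first statement of main() would route the error through the existing fatal-error handler.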

Common Errors & Debugging

Error: 401 Unauthorized on WebSocket or REST Calls

  • Cause: The OAuth token expired, was revoked, or lacks required scopes.
  • Fix: Verify the confidential client has conversation:stream and agentassist:interaction:write scopes. Ensure the token manager refreshes tokens before expiry. Check environment variables for typos.
  • Code Fix: The GenesysAuthManager class includes a 60-second refresh buffer. If 401 persists, revoke and regenerate the OAuth client secret in the Genesys Cloud admin console.
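
The refresh buffer does not help when a token is revoked before it expires, because the cache still considers it valid. One way to recover is to discard the cached token on a 401 and retry once. This is a sketch under assumptions: withAuthRetry is a hypothetical wrapper, and it relies on an invalidate() method that GenesysAuthManager does not have as written (resetting accessToken to null and tokenExpiry to 0 would be enough).

```javascript
// Hypothetical wrapper: runs a request with a token and, on a 401, discards
// the cached token and retries exactly once with a fresh one.
async function withAuthRetry(auth, requestFn) {
  const token = await auth.getAccessToken();
  try {
    return await requestFn(token);
  } catch (error) {
    if (error.response?.status !== 401) throw error;
    auth.invalidate(); // assumed method, not part of the class shown earlier
    const freshToken = await auth.getAccessToken();
    return requestFn(freshToken);
  }
}

// Stand-in auth object; a real one would call the OAuth endpoint.
const fakeAuth = {
  issued: 0,
  cached: null,
  async getAccessToken() {
    if (!this.cached) this.cached = `token-${++this.issued}`;
    return this.cached;
  },
  invalidate() { this.cached = null; },
};

// Stand-in request that rejects the first token, as a revoked token would be.
async function fakeRequest(token) {
  if (token === 'token-1') {
    const error = new Error('Unauthorized');
    error.response = { status: 401 };
    throw error;
  }
  return { status: 204, usedToken: token };
}

const authDemo = withAuthRetry(fakeAuth, fakeRequest).then((result) => {
  console.log(result);
  return result;
});
```

Limiting the retry to one attempt keeps a genuinely misconfigured client (wrong secret or missing scope) from looping against the OAuth endpoint.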

Error: 429 Too Many Requests on Agent Assist Suggestions

  • Cause: You exceeded the Genesys Cloud rate limit for /api/v2/agentassist/interactions/{id}/suggestions.
  • Fix: Implement exponential backoff and respect the Retry-After header. Reduce LLM polling frequency by increasing the debounce timeout.
  • Code Fix: The executeWithRetry method in AgentAssistInjector handles 429 responses automatically. Adjust maxRetries and base delay if your deployment requires stricter throttling.

Error: WebSocket 403 Forbidden

  • Cause: The OAuth token lacks conversation:stream scope, or the client IP is blocked by Genesys Cloud network policies.
  • Fix: Add conversation:stream to the OAuth client scopes. Verify your server IP is allowed through Genesys Cloud firewall rules. Ensure the region subdomain matches your organization.
  • Code Fix: Update GENESYS_ENVIRONMENT to the region domain of your organization (e.g., mypurecloud.ie or usw2.pure.cloud). The value is the bare domain, without https:// or the api. prefix. The WebSocket constructor passes the Authorization header correctly.
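
A frequent cause of region mismatches is pasting a full URL into GENESYS_ENVIRONMENT, which the code then prefixes with api. a second time. The sketch below reduces such values to a bare domain before building URLs. The normalizeEnvironment and buildEndpoints helpers are hypothetical, and the stream path is the one used throughout this tutorial.

```javascript
// Hypothetical helper: reduces common variants of the environment value to a
// bare region domain.
function normalizeEnvironment(value) {
  const host = String(value ?? '')
    .trim()
    .toLowerCase()
    .replace(/^[a-z]+:\/\//, '')         // drop a scheme such as https://
    .replace(/\/.*$/, '')                // drop any path
    .replace(/^(api|login|apps)\./, ''); // drop a service subdomain
  if (!/^[a-z0-9-]+(\.[a-z0-9-]+)+$/.test(host)) {
    throw new Error(`GENESYS_ENVIRONMENT is not a valid domain: "${value}"`);
  }
  return host;
}

// Derives the two URLs that the tutorial classes build from the environment.
function buildEndpoints(value) {
  const env = normalizeEnvironment(value);
  return {
    apiBase: `https://api.${env}/api/v2`,
    streamUrl: `wss://api.${env}/api/v2/conversations/stream`,
  };
}

console.log(buildEndpoints('https://api.mypurecloud.ie/'));
```

This only catches formatting mistakes. It cannot tell whether the domain is the region your organization actually lives in, so a 403 after this check still points at scopes or network policy.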

Error: LLM Returns Malformed JSON

  • Cause: The model ignored the response_format constraint, or the response was cut off at the token limit and the JSON is incomplete.
  • Fix: Enforce JSON mode in the request payload. Add fallback parsing logic that extracts arrays using regex if strict parsing fails.
  • Code Fix: The callLLM method already sets response_format: { type: "json_object" } and wraps JSON.parse in a try-catch block, returning an empty array on failure so that a bad response cannot crash the pipeline.
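
The regex fallback mentioned in the fix can be sketched as follows. It tries strict parsing first and only then searches the text for an array. The parseSuggestionsLoosely name is hypothetical, and the greedy pattern is a deliberately simple heuristic that assumes the response contains at most one array of interest.

```javascript
// Hypothetical fallback: tries strict parsing first, then looks for the
// outermost [...] span in the text (for example when the model wraps the
// JSON in a sentence).
function parseSuggestionsLoosely(content) {
  const text = String(content ?? '');
  try {
    const parsed = JSON.parse(text);
    if (Array.isArray(parsed)) return parsed;
    if (Array.isArray(parsed?.suggestions)) return parsed.suggestions;
  } catch {
    // Fall through to the regex-based attempt.
  }

  const match = text.match(/\[[\s\S]*\]/); // greedy: first "[" to last "]"
  if (!match) return [];
  try {
    const parsed = JSON.parse(match[0]);
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return []; // truncated or otherwise unrecoverable output
  }
}

const chatty = 'Here are the matches: [{"title":"Reset a password","url":"https://kb.example.com/article/123","score":0.9}] Let me know if you need more.';
console.log(parseSuggestionsLoosely(chatty));
console.log(parseSuggestionsLoosely('[{"title": "cut off')); // []
```

Truncated output still yields an empty array, which is the safer outcome: a partially recovered suggestion could carry a broken URL to the agent.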

Official References