Importing NICE Cognigy.AI NLU Training Utterances via REST API with Node.js

What You Will Build

  • A Node.js module that ingests bulk NLU training utterances into a Cognigy.AI project with schema validation, text normalization, and duplicate detection.
  • Uses the Cognigy.AI REST API v2 endpoint /api/v2/nlu/intents/{intentId}/utterances with OAuth 2.0 authentication.
  • Implemented in modern JavaScript (Node.js 18+) using native fetch, chunked batching, and sequential processing with rate-limit handling.

Prerequisites

  • OAuth 2.0 Client Credentials flow with nlu:write and nlu:read scopes registered in Cognigy.AI.
  • Cognigy.AI API v2 base URL formatted as https://{PROJECT_ID}.cognigy.ai/api/v2.
  • Node.js 18+ runtime (required for native fetch and AbortController).
  • No external npm dependencies required. All logic uses built-in modules.

Authentication Setup

Cognigy.AI uses standard OAuth 2.0 for API authentication. The client credentials flow exchanges a base64-encoded client ID and secret for a bearer token. The token expires after 3600 seconds and requires caching to avoid redundant authentication calls.

// CLIENT_ID and CLIENT_SECRET are read from environment variables (see the complete example)
const BASE64_CREDENTIALS = Buffer.from(`${CLIENT_ID}:${CLIENT_SECRET}`).toString('base64');
const TOKEN_URL = 'https://auth.cognigy.ai/oauth/token';

let cachedToken = null;
let tokenExpiry = 0;

async function getAccessToken() {
  if (cachedToken && Date.now() < tokenExpiry) {
    return cachedToken;
  }

  const response = await fetch(TOKEN_URL, {
    method: 'POST',
    headers: {
      'Authorization': `Basic ${BASE64_CREDENTIALS}`,
      'Content-Type': 'application/x-www-form-urlencoded'
    },
    body: new URLSearchParams({
      grant_type: 'client_credentials',
      scope: 'nlu:write nlu:read'
    })
  });

  if (!response.ok) {
    const errorBody = await response.text();
    throw new Error(`OAuth token fetch failed (${response.status}): ${errorBody}`);
  }

  const data = await response.json();
  cachedToken = data.access_token;
  tokenExpiry = Date.now() + (data.expires_in * 1000) - 60000; // Refresh 60s early
  return cachedToken;
}

OAuth Scope Requirement: nlu:write is mandatory for utterance ingestion. The nlu:read scope is required if you verify entity registry integrity before import.
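The caching logic above trusts that the token response always carries a numeric expires_in. As a minimal sketch, the expiry calculation could be isolated in a small helper that tolerates a missing or very short lifetime. The name computeTokenExpiry, the 300-second fallback, and the 60-second skew are assumptions for illustration, not values documented by Cognigy.AI.

```javascript
// Hypothetical helper: compute the cache deadline (epoch ms) for a token response.
// expiresInSeconds is the expires_in field; fallbackSeconds and skewMs are assumed defaults.
function computeTokenExpiry(expiresInSeconds, nowMs = Date.now(), skewMs = 60_000, fallbackSeconds = 300) {
  const seconds = Number.isFinite(expiresInSeconds) && expiresInSeconds > 0
    ? expiresInSeconds
    : fallbackSeconds;
  // Never place the deadline before "now", even for very short-lived tokens.
  return nowMs + Math.max(seconds * 1000 - skewMs, 0);
}
```

Inside getAccessToken, the assignment would then read tokenExpiry = computeTokenExpiry(data.expires_in), which avoids storing NaN when the field is absent.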

Implementation

Step 1: Payload Construction & Schema Validation

Cognigy.AI enforces strict schema rules for utterance ingestion. Each utterance must contain a text field and may contain an optional entities array. The platform rejects payloads where text exceeds 255 characters or where entity references do not match registered entities. Validating locally catches these records before they are sent, so they never trigger a 400 Bad Request response.

const MAX_UTTERANCE_LENGTH = 255;

function validateUtterancePayload(utterance, registeredEntities) {
  const errors = [];

  if (typeof utterance.text !== 'string' || utterance.text.length === 0) {
    errors.push('Missing or empty text field');
  } else if (utterance.text.length > MAX_UTTERANCE_LENGTH) {
    errors.push(`Text exceeds ${MAX_UTTERANCE_LENGTH} character limit`);
  }

  if (utterance.entities && !Array.isArray(utterance.entities)) {
    errors.push('Entities must be an array');
  } else {
    utterance.entities?.forEach(entity => {
      if (!registeredEntities.includes(entity.name)) {
        errors.push(`Unknown entity reference: ${entity.name}`);
      }
      if (typeof entity.start !== 'number' || typeof entity.end !== 'number') {
        errors.push('Entity start/end must be numeric indices');
      } else if (entity.end <= entity.start) {
        // Compare the indices only once both are known to be numbers
        errors.push('Entity end index must exceed start index');
      }
    });
  }

  return errors;
}

Expected Response: Returns an empty array if valid, or an array of descriptive error strings. This isolates schema failures from network failures.
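The validator above checks that entity indices are numeric and ordered, but not that they fall inside the text. A possible complementary check is sketched below. The function name checkEntityBounds is hypothetical, and the sketch assumes that end is an exclusive index, which the source does not state explicitly.

```javascript
// Hypothetical companion check: every entity span must lie within the utterance text.
// Assumes `end` is exclusive, so a span covering the whole text is [0, text.length).
function checkEntityBounds(utterance) {
  const errors = [];
  const length = typeof utterance.text === 'string' ? utterance.text.length : 0;
  const entities = Array.isArray(utterance.entities) ? utterance.entities : [];

  for (const entity of entities) {
    const { start, end } = entity;
    if (!Number.isInteger(start) || !Number.isInteger(end)) {
      errors.push(`Entity ${entity.name}: start/end must be integers`);
    } else if (start < 0 || end > length || end <= start) {
      errors.push(`Entity ${entity.name}: span [${start}, ${end}) is invalid for text of length ${length}`);
    }
  }
  return errors;
}
```

Its result can be concatenated with the output of validateUtterancePayload, so that both kinds of problems end up in the same error array.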

Step 2: Text Normalization & Duplicate Detection

Raw training data often contains inconsistent casing, trailing whitespace, and duplicate entries. Duplicate utterances skew training weights and cause model bias. A normalization pipeline standardizes text before ingestion, while a Set tracks seen patterns for O(1) duplicate detection.

function normalizeUtteranceText(text) {
  return text
    .trim()
    .replace(/\s+/g, ' ')
    .toLowerCase();
}

function deduplicateUtterances(utterances) {
  const seen = new Set();
  const unique = [];
  const duplicates = [];

  utterances.forEach(utterance => {
    const normalized = normalizeUtteranceText(utterance.text);
    if (seen.has(normalized)) {
      duplicates.push(utterance);
    } else {
      seen.add(normalized);
      // Copy instead of mutating, so the caller's raw data stays untouched
      unique.push({ ...utterance, text: normalized });
    }
  });

  return { unique, duplicates };
}

Edge Case Handling: Entity start and end indices remain valid only if normalization leaves every character at its original position; trimming and whitespace collapsing shift them. For production pipelines, recalculate entity boundaries after normalization or skip normalization when entity annotations are present. This tutorial assumes entity-free normalization for simplicity. The validation step checks only that indices are numeric and ordered, not that they still point at the intended span.
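One way to recalculate entity boundaries is to build an offset map while normalizing. The sketch below is illustrative only: normalizeWithOffsets and remapEntities are hypothetical names, end is assumed to be exclusive, and lowercasing is applied one UTF-16 code unit at a time, which can differ from String.prototype.toLowerCase on whole strings for a few context-sensitive characters.

```javascript
// Normalize text (trim, collapse whitespace, lowercase) and record, for every
// original offset i, the corresponding offset map[i] in the normalized string.
function normalizeWithOffsets(text) {
  let out = '';
  const map = new Array(text.length + 1);
  let pendingSpace = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (/\s/.test(ch)) {
      map[i] = out.length;                  // whitespace maps to the end of the text so far
      if (out.length > 0) pendingSpace = true; // leading whitespace is dropped entirely
      continue;
    }
    if (pendingSpace) {                      // emit one space per whitespace run
      out += ' ';
      pendingSpace = false;
    }
    map[i] = out.length;
    out += ch.toLowerCase();
  }
  map[text.length] = out.length;            // lets an exclusive end index be mapped too
  return { text: out, map };
}

// Shift entity spans into the coordinate system of the normalized text.
function remapEntities(entities, map) {
  return entities.map(e => ({ ...e, start: map[e.start], end: map[e.end] }));
}

// Usage: the span for "Flight" moves from [11, 17) to [7, 13).
const raw = { text: '  Book   a Flight ', entities: [{ name: 'entity_product', start: 11, end: 17 }] };
const normalized = normalizeWithOffsets(raw.text);
const remapped = remapEntities(raw.entities, normalized.map);
console.log(normalized.text.slice(remapped[0].start, remapped[0].end)); // prints "flight"
```

A trailing whitespace run is never emitted, so the result matches what trim() followed by whitespace collapsing would produce.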

Step 3: Chunked Streaming & Error Isolation

Bulk imports must handle rate limits and transient failures without halting the entire pipeline. Cognigy.AI returns 429 Too Many Requests when concurrent calls exceed project quotas. Splitting the data into chunks, sending them one at a time, and waiting for the Retry-After interval on a 429 response isolates failures to individual batches.

function chunkArray(array, size) {
  const chunks = [];
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size));
  }
  return chunks;
}

const MAX_RETRIES = 3;

async function importChunk(chunk, intentId, baseUrl, token, attempt = 0) {
  // baseUrl already ends in /api/v2 (see Prerequisites), so append only the resource path
  const url = `${baseUrl}/nlu/intents/${intentId}/utterances`;

  const response = await fetch(url, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(chunk)
  });

  if (response.status === 429 && attempt < MAX_RETRIES) {
    // Honor Retry-After (seconds); fall back to 2s if the header is missing or not numeric
    const retryAfter = parseInt(response.headers.get('Retry-After') || '2', 10) || 2;
    await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
    return importChunk(chunk, intentId, baseUrl, token, attempt + 1); // Bounded retry
  }

  if (!response.ok) {
    const errorBody = await response.text();
    throw new Error(`Import failed (${response.status}): ${errorBody}`);
  }

  return await response.json();
}
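When the server sends no usable Retry-After header, an exponential backoff with jitter is a common fallback. The helper below is a sketch under assumptions: the name backoffDelayMs, the 500 ms base, and the 30-second cap are illustrative choices, not Cognigy.AI recommendations.

```javascript
// Hypothetical helper: decide how long to wait before retry number `attempt` (0-based).
// Prefers a numeric Retry-After header (seconds); otherwise uses capped exponential
// backoff with full jitter. A Retry-After given as an HTTP date is not parsed here
// and falls through to the exponential branch.
function backoffDelayMs(attempt, retryAfterHeader, { baseMs = 500, capMs = 30_000, random = Math.random } = {}) {
  const retryAfterSeconds = Number.parseInt(retryAfterHeader ?? '', 10);
  if (Number.isFinite(retryAfterSeconds) && retryAfterSeconds >= 0) {
    return retryAfterSeconds * 1000;
  }
  const ceiling = Math.min(baseMs * 2 ** attempt, capMs);
  // Full jitter: a random delay in [0, ceiling) spreads out retries from parallel clients.
  return Math.floor(random() * ceiling);
}
```

The random parameter exists only so the delay can be made deterministic in tests; production callers can omit it.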

Concurrency Control: Process chunks sequentially or with a small fixed pool size, because Cognigy.AI enforces per-project rate limits. Sequential processing with a wait on 429 responses is the most conservative option for high-volume datasets and is what the complete example uses.
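For the fixed pool size option, a minimal sketch of a bounded worker pool follows. The function runWithPool is a hypothetical name, and worker stands for any async function such as a wrapper around importChunk. Failures are captured per item so that one rejected chunk does not stop the others.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at any time.
// Resolves to an array of { status, value | reason } objects in input order.
async function runWithPool(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item, shared by all lanes

  async function lane() {
    while (next < items.length) {
      const index = next++; // safe: JavaScript runs this synchronously between awaits
      try {
        results[index] = { status: 'fulfilled', value: await worker(items[index], index) };
      } catch (reason) {
        results[index] = { status: 'rejected', reason };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

With limit set to 1 this degenerates to the sequential loop; values of 2 to 4 are a cautious starting point when the project's quota is unknown.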

Step 4: Webhook Synchronization & Audit Logging

External content management systems require lifecycle alignment. The importer emits webhook notifications upon completion and generates a structured audit log for governance compliance. Metrics tracking captures throughput and validation error rates.

async function sendWebhook(webhookUrl, payload) {
  await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload)
  });
}
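The sendWebhook function above neither checks the response status nor bounds the wait time. A more defensive variant might look like the sketch below. The name sendWebhookSafe, the 5-second timeout, and the fetchImpl parameter (included so a stub can be injected in tests) are assumptions for illustration.

```javascript
// Hypothetical variant of sendWebhook that reports the outcome instead of throwing.
async function sendWebhookSafe(webhookUrl, payload, { timeoutMs = 5000, fetchImpl = fetch } = {}) {
  try {
    const response = await fetchImpl(webhookUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
      signal: AbortSignal.timeout(timeoutMs) // abort if the receiver does not answer in time
    });
    return { delivered: response.ok, status: response.status };
  } catch (err) {
    // Network errors and timeouts land here; the import result is not affected.
    return { delivered: false, status: null, error: err.message };
  }
}
```

The caller can record the returned object in the audit log, so a failed notification stays visible without failing the import.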

// Requires `import crypto from 'crypto';` at the top of the module
function generateAuditLog(utterances, results, metrics) {
  // Guard against a zero duration so the rate never becomes NaN or Infinity
  const minutes = metrics.durationMs / 60000;
  return {
    timestamp: new Date().toISOString(),
    intentId: results.intentId,
    totalSubmitted: utterances.length,
    totalImported: metrics.successful,
    totalFailed: metrics.failed,
    totalDuplicates: metrics.duplicates,
    validationErrors: metrics.validationErrors,
    throughputPerMinute: minutes > 0 ? (metrics.successful / minutes).toFixed(2) : '0.00',
    recordHash: crypto.createHash('sha256').update(JSON.stringify(utterances)).digest('hex')
  };
}

Throughput Tracking: throughputPerMinute divides the number of successful imports by the elapsed time in minutes (durationMs / 60000) to give a sustained ingestion rate. This data feeds into capacity planning and pipeline optimization dashboards.
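As a small worked example of the calculation, 500 utterances imported in 30 seconds give 500 / (30000 / 60000) = 1000 utterances per minute. The standalone helper below is a sketch with a hypothetical name that also guards against a zero duration.

```javascript
// Utterances per minute as a string with two decimals; '0.00' when no time has elapsed.
function throughputPerMinute(successful, durationMs) {
  if (!(durationMs > 0)) return '0.00';
  return (successful / (durationMs / 60000)).toFixed(2);
}

console.log(throughputPerMinute(500, 30000)); // prints "1000.00"
```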

Complete Working Example

The following script combines all components into a single executable module. Configure the environment variables before execution.

import crypto from 'crypto';

// Configuration
const CLIENT_ID = process.env.COGNIGY_CLIENT_ID;
const CLIENT_SECRET = process.env.COGNIGY_CLIENT_SECRET;
const PROJECT_ID = process.env.COGNIGY_PROJECT_ID;
const INTENT_ID = process.env.COGNIGY_INTENT_ID;
const WEBHOOK_URL = process.env.WEBHOOK_URL;
const CHUNK_SIZE = 50;
const REGISTERED_ENTITIES = ['intent', 'entity_date', 'entity_product'];

const BASE_URL = `https://${PROJECT_ID}.cognigy.ai/api/v2`;
const TOKEN_URL = 'https://auth.cognigy.ai/oauth/token';
const BASE64_CREDENTIALS = Buffer.from(`${CLIENT_ID}:${CLIENT_SECRET}`).toString('base64');

let cachedToken = null;
let tokenExpiry = 0;

async function getAccessToken() {
  if (cachedToken && Date.now() < tokenExpiry) return cachedToken;
  const res = await fetch(TOKEN_URL, {
    method: 'POST',
    headers: { 'Authorization': `Basic ${BASE64_CREDENTIALS}`, 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({ grant_type: 'client_credentials', scope: 'nlu:write nlu:read' })
  });
  if (!res.ok) throw new Error(`OAuth failed (${res.status})`);
  const data = await res.json();
  cachedToken = data.access_token;
  tokenExpiry = Date.now() + (data.expires_in * 1000) - 60000;
  return cachedToken;
}

function normalizeText(text) {
  return text.trim().replace(/\s+/g, ' ').toLowerCase();
}

function deduplicate(utterances) {
  const seen = new Set();
  const unique = [];
  const dupes = [];
  utterances.forEach(u => {
    const norm = normalizeText(u.text);
    if (seen.has(norm)) dupes.push(u);
    else { seen.add(norm); unique.push({ ...u, text: norm }); } // copy, so rawUtterances is not mutated
  });
  return { unique, duplicates: dupes };
}

function validate(utterance) {
  const errors = [];
  if (!utterance.text || utterance.text.length > 255) errors.push('Invalid text length');
  if (utterance.entities) {
    utterance.entities.forEach(e => {
      if (!REGISTERED_ENTITIES.includes(e.name)) errors.push(`Unknown entity: ${e.name}`);
      if (e.end <= e.start) errors.push('Invalid entity indices');
    });
  }
  return errors;
}

const MAX_RETRIES = 3;

async function importChunk(chunk, token, attempt = 0) {
  // BASE_URL already ends in /api/v2, so append only the resource path
  const url = `${BASE_URL}/nlu/intents/${INTENT_ID}/utterances`;
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(chunk)
  });
  if (res.status === 429 && attempt < MAX_RETRIES) {
    // Wait for Retry-After (seconds, default 2) and retry a bounded number of times
    const retry = parseInt(res.headers.get('Retry-After') || '2', 10) || 2;
    await new Promise(r => setTimeout(r, retry * 1000));
    return importChunk(chunk, token, attempt + 1);
  }
  if (!res.ok) {
    const err = await res.text();
    throw new Error(`HTTP ${res.status}: ${err}`);
  }
  return await res.json();
}

export async function runImporter(rawUtterances) {
  const startTime = Date.now();
  const metrics = { successful: 0, failed: 0, duplicates: 0, validationErrors: 0, durationMs: 0 };
  const auditLogs = [];

  const { unique, duplicates } = deduplicate(rawUtterances);
  metrics.duplicates = duplicates.length;

  const validated = unique.filter(u => {
    const errs = validate(u);
    if (errs.length > 0) {
      metrics.validationErrors += errs.length;
      auditLogs.push({ status: 'rejected', reason: errs.join('; '), text: u.text });
      return false;
    }
    return true;
  });

  const chunks = [];
  for (let i = 0; i < validated.length; i += CHUNK_SIZE) {
    chunks.push(validated.slice(i, i + CHUNK_SIZE));
  }

  for (const chunk of chunks) {
    try {
      // Ask for the token per chunk so the cache can refresh it during long imports
      const token = await getAccessToken();
      await importChunk(chunk, token);
      metrics.successful += chunk.length;
      auditLogs.push({ status: 'success', count: chunk.length, timestamp: new Date().toISOString() });
    } catch (err) {
      metrics.failed += chunk.length;
      auditLogs.push({ status: 'error', error: err.message, timestamp: new Date().toISOString() });
      console.error('Chunk failed:', err.message);
    }
  }

  metrics.durationMs = Date.now() - startTime;

  const auditReport = {
    timestamp: new Date().toISOString(),
    intentId: INTENT_ID,
    totalSubmitted: rawUtterances.length,
    ...metrics,
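    // Guard against a zero duration (for example, an empty input) to avoid NaN
    throughputPerMinute: metrics.durationMs > 0 ? (metrics.successful / (metrics.durationMs / 60000)).toFixed(2) : '0.00',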
    datasetHash: crypto.createHash('sha256').update(JSON.stringify(rawUtterances)).digest('hex')
  };

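  if (WEBHOOK_URL) {
    try {
      await fetch(WEBHOOK_URL, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ event: 'nlu_import_completed', payload: auditReport })
      });
    } catch (err) {
      // A failed notification should not discard the report of a completed import
      console.error('Webhook delivery failed:', err.message);
    }
  }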

  return auditReport;
}
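Because the script reads all of its configuration from environment variables, a fail-fast check before the first request can save a confusing 401 or a malformed URL later. The sketch below uses the variable names from the example above; the helper names findMissingEnv and assertEnv are hypothetical.

```javascript
// Variables the importer cannot run without (WEBHOOK_URL is optional in the example).
const REQUIRED_ENV = ['COGNIGY_CLIENT_ID', 'COGNIGY_CLIENT_SECRET', 'COGNIGY_PROJECT_ID', 'COGNIGY_INTENT_ID'];

// Return the names that are unset or blank in the given environment object.
function findMissingEnv(names, env = process.env) {
  return names.filter(name => typeof env[name] !== 'string' || env[name].trim() === '');
}

// Throw one descriptive error listing everything that is missing.
function assertEnv(env = process.env) {
  const missing = findMissingEnv(REQUIRED_ENV, env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}
```

Calling assertEnv() as the first statement of runImporter would surface configuration problems before any network traffic occurs.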

Common Errors & Debugging

Error: HTTP 401 Unauthorized

  • Cause: Expired OAuth token or invalid client credentials.
  • Fix: Verify CLIENT_ID and CLIENT_SECRET match the Cognigy.AI application settings. Ensure the token cache refreshes before expires_in elapses. getAccessToken subtracts 60 seconds from the expiry so that a cached token is never reused right at its expiry; call it before each chunk rather than once per run so that long imports pick up a refreshed token.
  • Code Fix: getAccessToken fetches a new token whenever it is called after the cached expiry. Log tokenExpiry to verify caching behavior.

Error: HTTP 400 Bad Request (Schema/Length)

  • Cause: Utterance text exceeds 255 characters or entity indices are malformed.
  • Fix: Run the validate function before chunking. Truncate or split long utterances at the source. Recalculate entity start and end values after any text normalization.
  • Code Fix: The validate function rejects invalid records and increments metrics.validationErrors. Review auditLogs for specific rejection reasons.
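If truncating at the source is the chosen approach for overlong utterances, cutting at a word boundary avoids leaving half a word in the training data. The helper below is only a sketch: truncateAtWordBoundary is a hypothetical name, and it assumes the utterance has no entity annotations beyond the cut, since those would need to be dropped or adjusted separately.

```javascript
// Shorten text to at most maxLength characters, preferring to cut at the last space.
function truncateAtWordBoundary(text, maxLength = 255) {
  if (text.length <= maxLength) return text;
  // Look one character past the limit so a word ending exactly at the limit is kept whole.
  const window = text.slice(0, maxLength + 1);
  const lastSpace = window.lastIndexOf(' ');
  // No space found: fall back to a hard cut at the limit.
  const cut = lastSpace > 0 ? lastSpace : maxLength;
  return text.slice(0, cut).trimEnd();
}

console.log(truncateAtWordBoundary('book a flight to berlin', 10)); // prints "book a"
```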

Error: HTTP 429 Too Many Requests

  • Cause: Exceeded Cognigy.AI rate limits for the project.
  • Fix: Reduce CHUNK_SIZE or increase the delay between sequential requests. The implementation reads the Retry-After header and waits that many seconds before retrying the same chunk.
  • Code Fix: If throttling persists, add a base delay between chunks in the runImporter loop: await new Promise(r => setTimeout(r, 1000)); before the next iteration.

Error: HTTP 403 Forbidden

  • Cause: OAuth token lacks nlu:write scope or the client is restricted from modifying the target project.
  • Fix: Update the OAuth application scopes in the Cognigy.AI admin console. Verify the project ID matches the authenticated tenant.
  • Code Fix: Ensure the scope parameter in the token request explicitly includes nlu:write nlu:read.

Official References