Implementing Retry Logic for NICE Cognigy.AI Webhook Failures with TypeScript

What You Will Build

This tutorial builds an Express middleware that intercepts incoming Cognigy.AI webhook requests, detects transient downstream failures, and queues the original payloads for retry with exponential backoff. A background worker reconstructs dialog context from session snapshots and updates Cognigy runtime session variables to track retry state, and a paginated dashboard endpoint reports retry success rates. The implementation uses the NICE Cognigy.AI Runtime REST API, is written in TypeScript, and runs on Node.js.

Prerequisites

  • NICE Cognigy.AI OAuth2 client credentials with scopes runtime:session:read and runtime:session:write
  • Cognigy.AI API version v1 (Runtime endpoints)
  • Node.js 18+ with TypeScript 5+
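  • A reachable Redis server for the BullMQ queue
  • Dependencies: express, axios, bullmq, zod, dotenv, uuid (BullMQ depends on ioredis as its Redis client, and the code below never imports a separate Redis package)
npm install express axios bullmq zod uuid dotenv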
npm install -D typescript @types/express @types/node @types/uuid

Authentication Setup

Cognigy.AI uses standard OAuth2 client credentials flow for programmatic access. The middleware must acquire a bearer token before calling the Runtime API. Token caching prevents unnecessary authentication requests, and automatic refresh handles expiration.

// auth.ts
import axios, { AxiosInstance, AxiosError } from "axios";
import dotenv from "dotenv";

dotenv.config();

const COGNIGY_BASE_URL = process.env.COGNIGY_INSTANCE_URL || "https://your-instance.cognigy.ai";
const COGNIGY_CLIENT_ID = process.env.COGNIGY_CLIENT_ID!;
const COGNIGY_CLIENT_SECRET = process.env.COGNIGY_CLIENT_SECRET!;

let cachedToken: string | null = null;
let tokenExpiry: number = 0;
let axiosClient: AxiosInstance;

async function getToken(): Promise<string> {
  const now = Date.now();
  if (cachedToken && now < tokenExpiry - 60000) {
    return cachedToken;
  }

  const tokenUrl = `${COGNIGY_BASE_URL}/api/v1/oauth/token`;
  
  // HTTP Request Cycle
  // POST /api/v1/oauth/token
  // Headers: Content-Type: application/x-www-form-urlencoded, Accept: application/json
  // Body: grant_type=client_credentials&scope=runtime:session:read runtime:session:write
  
  try {
    const response = await axios.post(tokenUrl, new URLSearchParams({
      grant_type: "client_credentials",
      scope: "runtime:session:read runtime:session:write"
    }), {
      auth: { username: COGNIGY_CLIENT_ID, password: COGNIGY_CLIENT_SECRET },
      headers: { "Content-Type": "application/x-www-form-urlencoded" }
    });

    // Use a local string so the return type is not widened to string | null
    const token: string = response.data.access_token;
    cachedToken = token;
    tokenExpiry = now + (response.data.expires_in * 1000);
    return token;
  } catch (error) {
    if (axios.isAxiosError(error) && error.response?.status === 401) {
      throw new Error("OAuth authentication failed. Verify client credentials.");
    }
    throw error;
  }
}

function getCognigyApiClient(): AxiosInstance {
  if (!axiosClient) {
    axiosClient = axios.create({
      baseURL: `${COGNIGY_BASE_URL}/api/v1/runtime`,
      headers: { "Content-Type": "application/json" }
    });

    axiosClient.interceptors.request.use(async (config) => {
      const token = await getToken();
      config.headers.Authorization = `Bearer ${token}`;
      return config;
    });

    // 429 retry logic with exponential backoff, limited to MAX_429_RETRIES retries per request
    const MAX_429_RETRIES = 3;
    axiosClient.interceptors.response.use(
      (response) => response,
      async (error: AxiosError) => {
        const originalRequest = error.config as any;
        if (!originalRequest) return Promise.reject(error);

        const retryCount: number = originalRequest._retryCount ?? 0;
        if (error.response?.status === 429 && retryCount < MAX_429_RETRIES) {
          // Track the count on the request config so each retry backs off longer
          originalRequest._retryCount = retryCount + 1;

          // Retry-After may be missing or an HTTP date; fall back to capped backoff in that case
          const headerSeconds = parseInt(String(error.response.headers["retry-after"]), 10);
          const retryAfter = Number.isFinite(headerSeconds)
            ? headerSeconds * 1000
            : Math.min(1000 * Math.pow(2, originalRequest._retryCount), 10000);

          await new Promise((resolve) => setTimeout(resolve, retryAfter));
          return axiosClient(originalRequest);
        }
        return Promise.reject(error);
      }
    );
  }
  return axiosClient;
}

export { getCognigyApiClient };
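
The delay calculation inside the 429 interceptor can be isolated into a pure function, which makes it easy to unit test. The following is a minimal sketch, not part of the axios or Cognigy.AI APIs: the name computeRetryDelayMs and its default values are hypothetical. It assumes that a numeric Retry-After value is in seconds and that any non-numeric value (for example an HTTP date) should fall back to capped exponential backoff.

```typescript
// Hypothetical helper; the name and defaults are illustrative assumptions.
function computeRetryDelayMs(
  attempt: number,
  retryAfterHeader?: string,
  baseMs: number = 1000,
  capMs: number = 10000
): number {
  // Prefer the server-provided Retry-After value when it parses as seconds.
  if (retryAfterHeader !== undefined) {
    const seconds = Number.parseInt(retryAfterHeader, 10);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return seconds * 1000;
    }
  }
  // Otherwise double the base delay per attempt, never exceeding capMs.
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// Usage: delay before the second retry when no header was sent.
const delayMs = computeRetryDelayMs(2);
console.log(`Waiting ${delayMs} ms before retrying`);
```

Keeping the cap separate from the base delay lets you tune how aggressive the client is without touching the interceptor itself.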

Implementation

Step 1: Intercept Webhook Requests and Detect Transient Errors

The Express middleware receives webhook payloads from Cognigy.AI. It attempts to process the request against your downstream service. If the downstream service returns a 5xx status or throws a network error, the middleware classifies it as transient, stores the payload, and returns a graceful fallback to Cognigy.AI to prevent dialog timeout.

// middleware.ts
import axios from "axios";
import { Request, Response, NextFunction } from "express";
import { v4 as uuidv4 } from "uuid";
import { Queue } from "bullmq";
import { z } from "zod";

// Cognigy.AI webhook payload schema
const WebhookPayloadSchema = z.object({
  sessionId: z.string(),
  context: z.record(z.any()),
  input: z.object({ text: z.string().optional() }),
  session: z.object({
    variables: z.record(z.any()).optional(),
    snapshot: z.any().optional()
  })
});

type WebhookPayload = z.infer<typeof WebhookPayloadSchema>;

// Durable queue connection (Redis)
const connection = { host: process.env.REDIS_HOST || "localhost", port: 6379 };
const retryQueue = new Queue("cognigy-webhook-retries", { connection });

export async function cognigyWebhookInterceptor(
  req: Request,
  res: Response,
  next: NextFunction
) {
  const parsed = WebhookPayloadSchema.safeParse(req.body);
  if (!parsed.success) {
    return res.status(400).json({ error: "Invalid Cognigy.AI webhook payload" });
  }

  const payload: WebhookPayload = parsed.data;

  try {
    // Simulate downstream processing
    await processDownstreamService(payload);
    res.status(200).json({ status: "processed", sessionId: payload.sessionId });
  } catch (error) {
    const isTransient = axios.isAxiosError(error) && error.response?.status && error.response.status >= 500;
    const isNetworkFailure = axios.isAxiosError(error) && !error.response;

    if (isTransient || isNetworkFailure) {
      const retryId = uuidv4();
      await retryQueue.add("retry-webhook", {
        id: retryId,
        payload,
        attempt: 1,
        maxAttempts: 5,
        timestamp: Date.now()
      }, {
        jobId: retryId,
        // Keep recent completions; the dashboard reads completed jobs to compute success rates
        removeOnComplete: { count: 1000 },
        attempts: 5,
        backoff: { type: "exponential", delay: 2000 }
      });

      // Return fallback to Cognigy.AI to keep dialog alive
      return res.status(200).json({
        status: "deferred",
        retryId,
        message: "Processing queued for retry"
      });
    }

    next(error);
  }
}

async function processDownstreamService(payload: WebhookPayload): Promise<void> {
  // Replace with actual downstream API call
  const axios = require("axios");
  await axios.post("https://your-downstream-api.com/process", payload, { timeout: 4000 });
}
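
The transient-versus-permanent decision made in the catch block above can also be written as a small standalone classifier. This is a sketch under stated assumptions: HttpLikeError is a hypothetical, reduced stand-in for an axios error, and classifyFailure is an illustrative name. It expects an error that has already been identified as an HTTP client error; other exceptions should still go to the Express error handler as in the middleware.

```typescript
// Reduced error shape for illustration; real axios errors carry more fields.
interface HttpLikeError {
  response?: { status: number };
}

type FailureClass = "transient" | "permanent";

function classifyFailure(error: HttpLikeError): FailureClass {
  // No response at all: connection refused, DNS failure, or timeout.
  if (!error.response) return "transient";
  // 5xx: the downstream service failed and may recover on a later attempt.
  if (error.response.status >= 500) return "transient";
  // Anything else (typically 4xx) is unlikely to succeed when retried unchanged.
  return "permanent";
}

// Usage: only transient failures are worth queueing.
const shouldQueue = classifyFailure({ response: { status: 503 } }) === "transient";
console.log(`Queue for retry: ${shouldQueue}`);
```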

Step 2: Durable Queue with Exponential Backoff Scheduling

BullMQ handles persistence, job locking, and exponential backoff automatically. The worker processes queued jobs, reconstructs the original request, and attempts the downstream call again. If it succeeds, the worker updates Cognigy session variables. If it fails again, BullMQ schedules the next attempt according to the backoff policy.

// worker.ts
import { Worker, Job } from "bullmq";
import { updateCognigySessionVariables } from "./cognigy-session";
import { z } from "zod";

const WebhookPayloadSchema = z.object({
  sessionId: z.string(),
  context: z.record(z.any()),
  input: z.object({ text: z.string().optional() }),
  session: z.object({
    variables: z.record(z.any()).optional(),
    snapshot: z.any().optional()
  })
});

type WebhookPayload = z.infer<typeof WebhookPayloadSchema>;

const connection = { host: process.env.REDIS_HOST || "localhost", port: 6379 };

const worker = new Worker("cognigy-webhook-retries", async (job: Job) => {
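  const { payload } = job.data;
  // The attempt value stored in job data never changes, so derive it from BullMQ instead.
  // Assumption: attemptsMade counts finished attempts; verify this against your BullMQ version.
  const attempt = job.attemptsMade + 1;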
  const parsed = WebhookPayloadSchema.safeParse(payload);
  if (!parsed.success) throw new Error("Corrupted payload in queue");

  const webhookData: WebhookPayload = parsed.data;

  // Reconstruct dialog context from session snapshot
  const reconstructedContext = {
    ...webhookData.context,
    ...webhookData.session.snapshot,
    _retryAttempt: attempt,
    _lastRetryTimestamp: new Date().toISOString()
  };

  try {
    await processDownstreamService(webhookData, reconstructedContext);
    
    // Update Cognigy session variables to indicate success
    await updateCognigySessionVariables(webhookData.sessionId, {
      __retryAttempt: attempt,
      __retryStatus: "success",
      __lastRetryTimestamp: new Date().toISOString()
    });

    return { success: true, attempt };
  } catch (error) {
    // Update Cognigy session variables to indicate failure
    await updateCognigySessionVariables(webhookData.sessionId, {
      __retryAttempt: attempt,
      __retryStatus: "failed",
      __lastError: String(error)
    });

    throw error; // BullMQ will schedule backoff automatically
  }
}, { connection, concurrency: 3 });

async function processDownstreamService(payload: WebhookPayload, context: any): Promise<void> {
  const axios = require("axios");
  await axios.post("https://your-downstream-api.com/process", { ...payload, context }, { timeout: 4000 });
}

export { worker };
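
To see what the backoff policy in Step 1 (attempts: 5, delay: 2000) means in wall-clock terms, the retry delays can be computed ahead of time. The sketch below assumes that BullMQ's built-in exponential strategy waits delay * 2^(attemptsMade - 1) milliseconds before each retry; treat that formula as an assumption and confirm it against the BullMQ version you run. The function name backoffSchedule is hypothetical.

```typescript
// Delays before each retry under the assumed exponential formula.
function backoffSchedule(baseDelayMs: number, attempts: number): number[] {
  const delays: number[] = [];
  // The first attempt runs without a backoff delay, so there are attempts - 1 retries.
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(Math.round(baseDelayMs * 2 ** (attemptsMade - 1)));
  }
  return delays;
}

const schedule = backoffSchedule(2000, 5);
const totalMs = schedule.reduce((sum, d) => sum + d, 0);
console.log(schedule, totalMs); // [2000, 4000, 8000, 16000] 30000
```

Under that assumption a job is abandoned roughly 30 seconds after its first queued attempt, plus processing time, which is worth comparing with how long a Cognigy.AI session is expected to remain useful.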

Step 3: Reconstruct Dialog Context and Update Session Variables

Cognigy.AI stores session state in its runtime. The middleware updates session variables via the Runtime API to maintain visibility into retry state. This allows bot designers to read __retryStatus and __retryAttempt directly in the flow.

// cognigy-session.ts
import { getCognigyApiClient } from "./auth";
import axios from "axios";

export async function updateCognigySessionVariables(
  sessionId: string,
  variables: Record<string, any>
): Promise<void> {
  const client = getCognigyApiClient();
  
  // HTTP Request Cycle
  // PUT /api/v1/runtime/sessions/{sessionId}/variables
  // Headers: Authorization: Bearer <token>, Content-Type: application/json
  // Body: { "variables": { "__retryAttempt": 2, "__retryStatus": "success" } }
  // Response: { "status": "updated", "sessionId": "abc-123" }

  try {
    await client.put(`/sessions/${sessionId}/variables`, { variables });
  } catch (error) {
    if (axios.isAxiosError(error)) {
      if (error.response?.status === 404) {
        console.warn(`Session ${sessionId} not found. Skipping variable update.`);
        return;
      }
      if (error.response?.status === 403) {
        throw new Error("Missing runtime:session:write scope. Check OAuth configuration.");
      }
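      if (error.response?.status === 429) {
        // Reaching this branch means the client interceptor's retries were exhausted.
        // Log and skip so a rate-limited status update does not fail the job itself.
        console.warn(`Rate limited updating session ${sessionId}. Skipping variable update.`);
        return;
      }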
    }
    throw error;
  }
}
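
The context reconstruction performed in the worker relies on object spread order, so it helps to see the precedence explicitly. The following sketch mirrors that merge as a pure function; reconstructContext and the Dict alias are hypothetical names, and the example assumes the snapshot is a plain object when present.

```typescript
type Dict = Record<string, unknown>;

// Later spreads win: snapshot values override context values,
// and the retry metadata overrides both.
function reconstructContext(
  context: Dict,
  snapshot: Dict | undefined,
  attempt: number
): Dict {
  return {
    ...context,
    ...(snapshot ?? {}),
    _retryAttempt: attempt,
    _lastRetryTimestamp: new Date().toISOString(),
  };
}

// Usage: "step" comes from the snapshot, "locale" survives from the context.
const merged = reconstructContext(
  { locale: "en-US", step: "start" },
  { step: "payment" },
  2
);
console.log(merged.step, merged.locale, merged._retryAttempt);
```

If the live context should take precedence over the stored snapshot, swap the order of the first two spreads.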

Step 4: Dashboard Endpoint for Monitoring Retry Success Rates

The dashboard endpoint aggregates queue metrics and recent job outcomes. It supports pagination to handle high-volume environments.

// dashboard.ts
import { Router, Request, Response } from "express";
import { Queue } from "bullmq";

const router = Router();
const connection = { host: process.env.REDIS_HOST || "localhost", port: 6379 };
const retryQueue = new Queue("cognigy-webhook-retries", { connection });

router.get("/retry/dashboard", async (req: Request, res: Response) => {
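  // Clamp query parameters so negative, zero, or very large values cannot skew the range
  const page = Math.max(1, parseInt(req.query.page as string, 10) || 1);
  const limit = Math.min(100, Math.max(1, parseInt(req.query.limit as string, 10) || 20));
  const offset = (page - 1) * limit;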

  try {
    // Fetch counts only instead of loading every job into memory
    const counts = await retryQueue.getJobCounts("waiting", "active", "failed", "completed");

    // Page of finished jobs, newest first; the end index is inclusive
    const recentJobs = await retryQueue.getJobs(["completed", "failed"], offset, offset + limit - 1, false);
    // Job instances expose no `state` property, so resolve each state explicitly
    const states = await Promise.all(recentJobs.map(j => j.getState()));

    const successCount = states.filter(s => s === "completed").length;
    const failureCount = recentJobs.length - successCount;
    const successRate = recentJobs.length > 0 ? (successCount / recentJobs.length) * 100 : 0;

    const dashboardData = {
      queueDepth: { waiting: counts.waiting, active: counts.active, failed: counts.failed },
      metrics: {
        successRate: parseFloat(successRate.toFixed(2)),
        totalRecentJobs: recentJobs.length,
        successes: successCount,
        failures: failureCount
      },
      recentJobs: recentJobs.map((j, i) => ({
        id: j.id,
        state: states[i],
        attempt: j.attemptsMade,
        timestamp: j.finishedOn
      })),
      pagination: { page, limit, total: counts.completed + counts.failed }
    };

    res.status(200).json(dashboardData);
  } catch (error) {
    console.error("Dashboard fetch failed:", error);
    res.status(500).json({ error: "Failed to retrieve retry dashboard metrics" });
  }
});

export { router as dashboardRouter };
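
The pagination arithmetic and the success-rate calculation are easy to get subtly wrong, so it can help to keep them in pure functions that are testable without Redis. This is a minimal sketch: pageRange and successRate are hypothetical helpers, the maximum page size of 100 is an arbitrary assumption, and the inclusive end index reflects the assumption that the queue's range lookup includes its end position.

```typescript
// Converts a 1-based page and a page size into an inclusive index range.
function pageRange(page: number, limit: number): { start: number; end: number } {
  const safePage = Math.max(1, Math.floor(page) || 1);
  const safeLimit = Math.min(100, Math.max(1, Math.floor(limit) || 20));
  const start = (safePage - 1) * safeLimit;
  return { start, end: start + safeLimit - 1 };
}

// Percentage of jobs in the "completed" state, rounded to two decimals.
function successRate(states: string[]): number {
  if (states.length === 0) return 0;
  const completed = states.filter((s) => s === "completed").length;
  return Number(((completed / states.length) * 100).toFixed(2));
}

const range = pageRange(2, 20);
const rate = successRate(["completed", "failed", "completed"]);
console.log(range, rate); // { start: 20, end: 39 } 66.67
```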

Complete Working Example

The following module combines authentication, middleware, worker, and dashboard routing into a single runnable Express application. Replace environment variables with your Cognigy.AI instance credentials and Redis connection details.

// index.ts
import express, { Request, Response } from "express";
import dotenv from "dotenv";
import { cognigyWebhookInterceptor } from "./middleware";
// Side-effect import: TypeScript can elide an unused named import, which would leave the worker unstarted
import "./worker";
import { dashboardRouter } from "./dashboard";

dotenv.config();

const app = express();
app.use(express.json());

// Webhook interception endpoint
app.post("/webhook/cognigy", cognigyWebhookInterceptor);

// Monitoring dashboard
app.use("/api", dashboardRouter);

// Health check
app.get("/health", (_req: Request, res: Response) => {
  res.status(200).json({ status: "operational" });
});

// Global error handler
app.use((err: Error, _req: Request, res: Response, _next: express.NextFunction) => {
  console.error("Unhandled error:", err);
  res.status(500).json({ error: "Internal server error", message: err.message });
});

const PORT = process.env.PORT || 3000;

app.listen(PORT, () => {
  console.log(`Cognigy.AI Retry Middleware running on port ${PORT}`);
  console.log("Worker initialized. Processing queued retries with exponential backoff.");
});

export default app;

Common Errors & Debugging

Error: 401 Unauthorized

  • Cause: OAuth token expired, client credentials incorrect, or missing Authorization header.
  • Fix: Verify COGNIGY_CLIENT_ID and COGNIGY_CLIENT_SECRET match your Cognigy.AI integration settings. Ensure the token refresh logic runs before each Runtime API call.
  • Code Fix: The interceptor in auth.ts automatically refreshes tokens when now >= tokenExpiry - 60000.

Error: 403 Forbidden

  • Cause: Missing required OAuth scopes. Cognigy.AI requires runtime:session:write to update session variables.
  • Fix: Regenerate the OAuth token with the correct scope string. Update the scope parameter in the token request.
  • Code Fix: The token request explicitly requests runtime:session:read runtime:session:write.

Error: 429 Too Many Requests

  • Cause: Cognigy.AI Runtime API rate limits triggered by high retry volume.
  • Fix: The axiosClient interceptor implements exponential backoff with retry-after header parsing. Ensure your downstream service does not generate retry storms.
  • Code Fix: The response interceptor checks error.response?.status === 429 and delays retries up to 10 seconds.

Error: 5xx Downstream Failure

  • Cause: Target service unavailable, timeout, or malformed request.
  • Fix: The middleware catches 5xx and network failures, queues the payload, and returns a 200 fallback to Cognigy.AI. BullMQ handles up to 5 attempts with exponential delay.
  • Code Fix: cognigyWebhookInterceptor classifies error.response?.status >= 500 as transient and adds the job to retryQueue.
