Implementing Retry Logic for NICE Cognigy.AI Webhook Failures with TypeScript
What You Will Build
This tutorial builds a production-grade Express middleware that intercepts incoming Cognigy.AI webhook requests, catches transient downstream failures, queues the original payloads for retry with exponential backoff, reconstructs dialog context from session snapshots, updates Cognigy runtime session variables to track retry state, and exposes a paginated dashboard endpoint for monitoring success rates. The implementation is written in TypeScript, runs on Node.js, and uses the NICE Cognigy.AI Runtime REST API.
Prerequisites
- NICE Cognigy.AI OAuth2 client credentials with scopes runtime:session:read and runtime:session:write
- Cognigy.AI API version v1 (Runtime endpoints)
- Node.js 18+ with TypeScript 5+
- Dependencies: express, axios, bullmq, redis, zod, dotenv, uuid
npm install express axios bullmq redis zod uuid dotenv
npm install -D typescript @types/express @types/node @types/uuid
Authentication Setup
Cognigy.AI uses standard OAuth2 client credentials flow for programmatic access. The middleware must acquire a bearer token before calling the Runtime API. Token caching prevents unnecessary authentication requests, and automatic refresh handles expiration.
// auth.ts
import axios, { AxiosInstance, AxiosError } from "axios";
import dotenv from "dotenv";
dotenv.config();
const COGNIGY_BASE_URL = process.env.COGNIGY_INSTANCE_URL || "https://your-instance.cognigy.ai";
const COGNIGY_CLIENT_ID = process.env.COGNIGY_CLIENT_ID!;
const COGNIGY_CLIENT_SECRET = process.env.COGNIGY_CLIENT_SECRET!;
let cachedToken: string | null = null;
let tokenExpiry: number = 0;
let axiosClient: AxiosInstance;
async function getToken(): Promise<string> {
const now = Date.now();
if (cachedToken && now < tokenExpiry - 60000) {
return cachedToken;
}
const tokenUrl = `${COGNIGY_BASE_URL}/api/v1/oauth/token`;
// HTTP Request Cycle
// POST /api/v1/oauth/token
// Headers: Content-Type: application/x-www-form-urlencoded, Accept: application/json
// Body: grant_type=client_credentials&scope=runtime:session:read runtime:session:write
try {
const response = await axios.post(tokenUrl, new URLSearchParams({
grant_type: "client_credentials",
scope: "runtime:session:read runtime:session:write"
}), {
auth: { username: COGNIGY_CLIENT_ID, password: COGNIGY_CLIENT_SECRET },
headers: { "Content-Type": "application/x-www-form-urlencoded" }
});
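// Hold the token in a local string so the return type stays Promise<string> under strict mode
const accessToken: string = response.data.access_token;
cachedToken = accessToken;
tokenExpiry = now + (response.data.expires_in * 1000);
return accessToken;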
} catch (error) {
if (axios.isAxiosError(error) && error.response?.status === 401) {
throw new Error("OAuth authentication failed. Verify client credentials.");
}
throw error;
}
}
function getCognigyApiClient(): AxiosInstance {
if (!axiosClient) {
axiosClient = axios.create({
baseURL: `${COGNIGY_BASE_URL}/api/v1/runtime`,
headers: { "Content-Type": "application/json" }
});
axiosClient.interceptors.request.use(async (config) => {
const token = await getToken();
config.headers.Authorization = `Bearer ${token}`;
return config;
});
// 429 Retry Logic with Exponential Backoff
axiosClient.interceptors.response.use(
(response) => response,
async (error: AxiosError) => {
const originalRequest = error.config as any;
if (!originalRequest) return Promise.reject(error);
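// Allow several retries so the backoff can actually grow between attempts
const MAX_429_RETRIES = 3;
const retryCount: number = originalRequest._retryCount || 0;
if (error.response?.status === 429 && retryCount < MAX_429_RETRIES) {
originalRequest._retryCount = retryCount + 1;
// Prefer the server's Retry-After hint (seconds); otherwise use capped exponential backoff
const retryAfterSeconds = parseInt(String(error.response.headers["retry-after"] ?? ""), 10);
const retryAfter = Number.isFinite(retryAfterSeconds) && retryAfterSeconds >= 0
? retryAfterSeconds * 1000
: Math.min(1000 * Math.pow(2, originalRequest._retryCount), 10000);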
await new Promise((resolve) => setTimeout(resolve, retryAfter));
return axiosClient(originalRequest);
}
return Promise.reject(error);
}
);
}
return axiosClient;
}
export { getCognigyApiClient };
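The delay calculation inside the 429 interceptor is easier to reason about, and to unit test, when it is pulled out into a pure function. The following is a minimal sketch of that idea, not part of auth.ts itself: computeRetryDelayMs is a hypothetical helper name, and the 10-second cap mirrors the value used in the interceptor. It assumes Retry-After carries a number of seconds; an HTTP-date value falls through to the exponential fallback.

```typescript
// Hypothetical helper (not part of axios or Cognigy.AI): decides how long to
// wait before retrying a request that was rejected with HTTP 429.
// retryAfterHeader: raw Retry-After header value, if the server sent one.
// retryCount: 1-based number of the retry about to be made.
function computeRetryDelayMs(
  retryAfterHeader: string | undefined,
  retryCount: number,
  capMs: number = 10000
): number {
  const seconds =
    retryAfterHeader !== undefined ? parseInt(retryAfterHeader, 10) : NaN;
  // Honour a numeric Retry-After value (seconds) when it is present and valid.
  if (!isNaN(seconds) && seconds >= 0) {
    return seconds * 1000;
  }
  // Otherwise fall back to exponential backoff: 2s, 4s, 8s, ... up to capMs.
  return Math.min(1000 * Math.pow(2, retryCount), capMs);
}
```

With no header, successive retries wait 2 s, 4 s, 8 s, and then stay at the cap. A server-provided value always takes precedence, even when it is longer than the cap.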
Implementation
Step 1: Intercept Webhook Requests and Detect Transient Errors
The Express middleware receives webhook payloads from Cognigy.AI. It attempts to process the request against your downstream service. If the downstream service returns a 5xx status or throws a network error, the middleware classifies it as transient, stores the payload, and returns a graceful fallback to Cognigy.AI to prevent dialog timeout.
// middleware.ts
import axios from "axios"; // needed for axios.isAxiosError in the error classifier
import { Request, Response, NextFunction } from "express";
import { v4 as uuidv4 } from "uuid";
import { Queue } from "bullmq";
import { z } from "zod";
// Cognigy.AI webhook payload schema
const WebhookPayloadSchema = z.object({
sessionId: z.string(),
context: z.record(z.any()),
input: z.object({ text: z.string().optional() }),
session: z.object({
variables: z.record(z.any()).optional(),
snapshot: z.any().optional()
})
});
type WebhookPayload = z.infer<typeof WebhookPayloadSchema>;
// Durable queue connection (Redis)
const connection = { host: process.env.REDIS_HOST || "localhost", port: 6379 };
const retryQueue = new Queue("cognigy-webhook-retries", { connection });
export async function cognigyWebhookInterceptor(
req: Request,
res: Response,
next: NextFunction
) {
const parsed = WebhookPayloadSchema.safeParse(req.body);
if (!parsed.success) {
return res.status(400).json({ error: "Invalid Cognigy.AI webhook payload" });
}
const payload: WebhookPayload = parsed.data;
try {
// Simulate downstream processing
await processDownstreamService(payload);
res.status(200).json({ status: "processed", sessionId: payload.sessionId });
} catch (error) {
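// 5xx responses and requests that never received a response are treated as transient
const status = axios.isAxiosError(error) ? error.response?.status : undefined;
const isTransient = status !== undefined && status >= 500;
const isNetworkFailure = axios.isAxiosError(error) && !error.response;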
if (isTransient || isNetworkFailure) {
const retryId = uuidv4();
await retryQueue.add("retry-webhook", {
id: retryId,
payload,
attempt: 1,
maxAttempts: 5,
timestamp: Date.now()
}, {
jobId: retryId,
// Keep recent completed jobs so the dashboard can compute success rates
removeOnComplete: { count: 1000 },
attempts: 5,
backoff: { type: "exponential", delay: 2000 }
});
// Return fallback to Cognigy.AI to keep dialog alive
return res.status(200).json({
status: "deferred",
retryId,
message: "Processing queued for retry"
});
}
next(error);
}
}
async function processDownstreamService(payload: WebhookPayload): Promise<void> {
// Replace with actual downstream API call
const axios = require("axios");
await axios.post("https://your-downstream-api.com/process", payload, { timeout: 4000 });
}
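The transient-versus-permanent decision made in the catch block above can be expressed as a small standalone classifier. This is a sketch under stated assumptions: DownstreamFailure, FailureKind, and classifyFailure are illustrative names that do not come from axios or Cognigy.AI, and the caller is assumed to have already extracted the status code and the "no response received" flag from the error.

```typescript
// Minimal description of a failed downstream call (illustrative shape only).
interface DownstreamFailure {
  status?: number;       // HTTP status code, when a response was received
  networkError: boolean; // true when the request never produced a response
}

type FailureKind = "transient" | "permanent";

// Mirrors the middleware's rule: 5xx responses and network failures are worth
// retrying; everything else (for example 4xx) is passed on as a normal error.
function classifyFailure(failure: DownstreamFailure): FailureKind {
  if (failure.networkError) {
    return "transient";
  }
  if (failure.status !== undefined && failure.status >= 500 && failure.status <= 599) {
    return "transient";
  }
  return "permanent";
}
```

Keeping the rule in one place makes it straightforward to extend later, for example to treat 408 or 429 from the downstream service as retryable, without touching the queueing code.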
Step 2: Durable Queue with Exponential Backoff Scheduling
BullMQ handles persistence, job locking, and exponential backoff automatically. The worker processes queued jobs, reconstructs the original request, and attempts the downstream call again. Whether the attempt succeeds or fails, the worker records the outcome in Cognigy session variables. On failure, BullMQ schedules the next attempt according to the backoff policy until the configured attempts are exhausted.
// worker.ts
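import { Worker, Job } from "bullmq";
// The worker calls updateCognigySessionVariables, which lives in cognigy-session.ts
import { updateCognigySessionVariables } from "./cognigy-session";
import { z } from "zod";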
const WebhookPayloadSchema = z.object({
sessionId: z.string(),
context: z.record(z.any()),
input: z.object({ text: z.string().optional() }),
session: z.object({
variables: z.record(z.any()).optional(),
snapshot: z.any().optional()
})
});
type WebhookPayload = z.infer<typeof WebhookPayloadSchema>;
const connection = { host: process.env.REDIS_HOST || "localhost", port: 6379 };
const worker = new Worker("cognigy-webhook-retries", async (job: Job) => {
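const { payload } = job.data;
// job.data.attempt is fixed at enqueue time, so derive the current attempt from BullMQ's counter
const attempt = job.attemptsMade + 1;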
const parsed = WebhookPayloadSchema.safeParse(payload);
if (!parsed.success) throw new Error("Corrupted payload in queue");
const webhookData: WebhookPayload = parsed.data;
// Reconstruct dialog context from session snapshot
const reconstructedContext = {
...webhookData.context,
...webhookData.session.snapshot,
_retryAttempt: attempt,
_lastRetryTimestamp: new Date().toISOString()
};
try {
await processDownstreamService(webhookData, reconstructedContext);
} catch (error) {
// Record the failure without letting a reporting error mask the original one
await updateCognigySessionVariables(webhookData.sessionId, {
__retryAttempt: attempt,
__retryStatus: "failed",
__lastError: String(error)
}).catch((reportError) => console.warn("Could not record retry failure:", reportError));
throw error; // BullMQ will schedule backoff automatically
}
// The downstream call succeeded; a failed status update must not trigger a duplicate downstream call
await updateCognigySessionVariables(webhookData.sessionId, {
__retryAttempt: attempt,
__retryStatus: "success",
__lastRetryTimestamp: new Date().toISOString()
}).catch((reportError) => console.warn("Could not record retry success:", reportError));
return { success: true, attempt };
}, { connection, concurrency: 3 });
async function processDownstreamService(payload: WebhookPayload, context: any): Promise<void> {
const axios = require("axios");
await axios.post("https://your-downstream-api.com/process", { ...payload, context }, { timeout: 4000 });
}
export { worker };
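To see what the backoff policy configured in Step 1 (attempts: 5, delay: 2000) means in wall-clock terms, the schedule can be computed up front. The sketch below assumes BullMQ's built-in exponential strategy waits delay × 2^(n − 1) after the n-th failed attempt; verify this against the BullMQ version you run. exponentialBackoffSchedule is a hypothetical helper, not a BullMQ API.

```typescript
// Hypothetical helper: lists the wait (in ms) that precedes each retry,
// assuming a delay of baseDelayMs * 2^(n - 1) after the n-th failed attempt.
function exponentialBackoffSchedule(baseDelayMs: number, maxAttempts: number): number[] {
  const delays: number[] = [];
  // The first attempt runs immediately, so maxAttempts allows maxAttempts - 1 retries.
  for (let failedAttempts = 1; failedAttempts < maxAttempts; failedAttempts++) {
    delays.push(Math.round(Math.pow(2, failedAttempts - 1) * baseDelayMs));
  }
  return delays;
}

// Schedule for the queue options used in this tutorial.
const schedule = exponentialBackoffSchedule(2000, 5); // [2000, 4000, 8000, 16000]
const totalWaitMs = schedule.reduce((sum, delay) => sum + delay, 0); // 30000
```

Under that assumption, a payload that keeps failing is abandoned roughly 30 seconds after the first retry is scheduled, plus processing time. That figure is useful when deciding how long the bot flow should wait before giving the user a final fallback message.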
Step 3: Reconstruct Dialog Context and Update Session Variables
Cognigy.AI stores session state in its runtime. The worker updates session variables via the Runtime API to maintain visibility into retry state. This allows bot designers to read __retryStatus and __retryAttempt directly in the flow.
// cognigy-session.ts
import { getCognigyApiClient } from "./auth";
import axios from "axios"; // default import is required for axios.isAxiosError below
export async function updateCognigySessionVariables(
sessionId: string,
variables: Record<string, any>
): Promise<void> {
const client = getCognigyApiClient();
// HTTP Request Cycle
// PUT /api/v1/runtime/sessions/{sessionId}/variables
// Headers: Authorization: Bearer <token>, Content-Type: application/json
// Body: { "variables": { "__retryAttempt": 2, "__retryStatus": "success" } }
// Response: { "status": "updated", "sessionId": "abc-123" }
try {
await client.put(`/sessions/${sessionId}/variables`, { variables });
} catch (error) {
if (axios.isAxiosError(error)) {
if (error.response?.status === 404) {
console.warn(`Session ${sessionId} not found. Skipping variable update.`);
return;
}
if (error.response?.status === 403) {
throw new Error("Missing runtime:session:write scope. Check OAuth configuration.");
}
if (error.response?.status === 429) {
// Handled by interceptor, but log for observability
console.warn(`Rate limited updating session ${sessionId}. Retry handled by client.`);
return;
}
}
throw error;
}
}
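The context reconstruction performed in the worker relies on object spread order: snapshot values override values from the original webhook context, and the retry metadata is written last. The following sketch isolates that merge so the precedence is explicit. reconstructContext and the Dict alias are illustrative names, and the timestamp is passed in as a parameter only to keep the function deterministic.

```typescript
type Dict = Record<string, unknown>;

// Illustrative helper: merges the original webhook context with the session
// snapshot and stamps retry metadata on top. Later spreads win on conflicts.
function reconstructContext(
  context: Dict,
  snapshot: Dict | undefined,
  attempt: number,
  now: Date
): Dict {
  return {
    ...context,          // lowest precedence: context sent with the webhook
    ...(snapshot ?? {}), // snapshot values override stale context values
    _retryAttempt: attempt,
    _lastRetryTimestamp: now.toISOString()
  };
}

const merged = reconstructContext(
  { orderId: "A-1", step: "payment" },
  { step: "confirmation" },
  2,
  new Date(0)
);
// merged.step is "confirmation" because the snapshot is spread after the context
```

If the snapshot should never overwrite certain context keys, reverse the spread order for those keys or merge them explicitly.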
Step 4: Dashboard Endpoint for Monitoring Retry Success Rates
The dashboard endpoint aggregates queue metrics and recent job outcomes. It supports pagination to handle high-volume environments.
// dashboard.ts
import { Router, Request, Response } from "express";
import { Queue } from "bullmq";
const router = Router();
const connection = { host: process.env.REDIS_HOST || "localhost", port: 6379 };
const retryQueue = new Queue("cognigy-webhook-retries", { connection });
router.get("/retry/dashboard", async (req: Request, res: Response) => {
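// Clamp query parameters so negative or oversized values cannot produce invalid ranges
const page = Math.max(1, parseInt(req.query.page as string, 10) || 1);
const limit = Math.min(100, Math.max(1, parseInt(req.query.limit as string, 10) || 20));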
const offset = (page - 1) * limit;
try {
// Counts are read from Redis directly, so no job payloads are loaded into memory
const counts = await retryQueue.getJobCounts("waiting", "active", "delayed", "completed", "failed");
// Paginated finished jobs; the end index passed to getJobs is inclusive
const recentJobs = await retryQueue.getJobs(["completed", "failed"], offset, offset + limit - 1, true);
// Job instances do not expose a state property, so resolve each state explicitly
const states = await Promise.all(recentJobs.map(j => j.getState()));
const successCount = states.filter(s => s === "completed").length;
const failureCount = recentJobs.length - successCount;
const successRate = recentJobs.length > 0 ? (successCount / recentJobs.length) * 100 : 0;
const dashboardData = {
queueDepth: { waiting: counts.waiting, active: counts.active, delayed: counts.delayed, failed: counts.failed },
metrics: {
successRate: parseFloat(successRate.toFixed(2)),
totalRecentJobs: recentJobs.length,
successes: successCount,
failures: failureCount
},
recentJobs: recentJobs.map((j, i) => ({
id: j.id,
state: states[i],
attempt: j.attemptsMade,
timestamp: j.finishedOn
})),
pagination: { page, limit, total: counts.completed + counts.failed }
};
res.status(200).json(dashboardData);
} catch (error) {
console.error("Dashboard fetch failed:", error);
res.status(500).json({ error: "Failed to retrieve retry dashboard metrics" });
}
});
export { router as dashboardRouter };
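The pagination arithmetic and the success-rate calculation in the dashboard handler can be checked in isolation. The sketch below uses two hypothetical helpers, pageRange and successRate. It assumes the job range is addressed with an inclusive end index (as Redis range commands are), and the default limit of 20 and maximum of 100 are illustrative choices.

```typescript
interface PageRange {
  start: number; // zero-based index of the first job on the page
  end: number;   // zero-based index of the last job on the page (inclusive)
}

// Converts 1-based page/limit query values into an inclusive index range,
// falling back to page 1 and a limit of 20 for missing or invalid input.
function pageRange(page: number, limit: number): PageRange {
  const safePage = Math.max(1, Math.floor(page) || 1);
  const safeLimit = Math.min(100, Math.max(1, Math.floor(limit) || 20));
  const start = (safePage - 1) * safeLimit;
  return { start, end: start + safeLimit - 1 };
}

// Percentage of jobs in the given list whose state is "completed",
// rounded to two decimals. An empty list yields 0 rather than NaN.
function successRate(states: string[]): number {
  if (states.length === 0) {
    return 0;
  }
  const successes = states.filter((state) => state === "completed").length;
  return parseFloat(((successes / states.length) * 100).toFixed(2));
}
```

Note that a rate computed this way covers only the jobs on the current page. For a queue-wide figure, divide the completed count by the sum of the completed and failed counts instead.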
Complete Working Example
The following module combines authentication, middleware, worker, and dashboard routing into a single runnable Express application. Replace environment variables with your Cognigy.AI instance credentials and Redis connection details.
// index.ts
import express, { Request, Response } from "express";
import dotenv from "dotenv";
import { cognigyWebhookInterceptor } from "./middleware";
import { worker } from "./worker";
import { dashboardRouter } from "./dashboard";
dotenv.config();
const app = express();
app.use(express.json());
// Webhook interception endpoint
app.post("/webhook/cognigy", cognigyWebhookInterceptor);
// Monitoring dashboard
app.use("/api", dashboardRouter);
// Health check
app.get("/health", (_req: Request, res: Response) => {
res.status(200).json({ status: "operational" });
});
// Global error handler
app.use((err: Error, _req: Request, res: Response, _next: express.NextFunction) => {
console.error("Unhandled error:", err);
res.status(500).json({ error: "Internal server error", message: err.message });
});
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
console.log(`Cognigy.AI Retry Middleware running on port ${PORT}`);
console.log("Worker initialized. Processing queued retries with exponential backoff.");
});
export default app;
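Because the application reads its settings from environment variables scattered across several modules, a missing value only shows up when the first request fails. One option is to validate them once at startup. The sketch below is an assumption-laden example rather than part of the application above: loadConfig and AppConfig are hypothetical names, and the variable names are the ones the modules in this tutorial already read.

```typescript
interface AppConfig {
  cognigyInstanceUrl: string;
  cognigyClientId: string;
  cognigyClientSecret: string;
  redisHost: string;
  port: number;
}

// Hypothetical startup check: fails fast with one message listing every
// missing required variable, and applies the same defaults the modules use.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const required = ["COGNIGY_INSTANCE_URL", "COGNIGY_CLIENT_ID", "COGNIGY_CLIENT_SECRET"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const port = parseInt(env["PORT"] ?? "3000", 10);
  if (isNaN(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT value: ${env["PORT"]}`);
  }
  return {
    cognigyInstanceUrl: env["COGNIGY_INSTANCE_URL"] as string,
    cognigyClientId: env["COGNIGY_CLIENT_ID"] as string,
    cognigyClientSecret: env["COGNIGY_CLIENT_SECRET"] as string,
    redisHost: env["REDIS_HOST"] ?? "localhost",
    port
  };
}
```

In index.ts this could be called as loadConfig(process.env) right after dotenv.config(), before the Express app and the worker are created.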
Common Errors & Debugging
Error: 401 Unauthorized
- Cause: OAuth token expired, client credentials incorrect, or missing Authorization header.
- Fix: Verify COGNIGY_CLIENT_ID and COGNIGY_CLIENT_SECRET match your Cognigy.AI integration settings. Ensure the token refresh logic runs before each Runtime API call.
- Code Fix: The request interceptor in auth.ts calls getToken(), which refreshes the token when now >= tokenExpiry - 60000.
Error: 403 Forbidden
- Cause: Missing required OAuth scopes. Cognigy.AI requires runtime:session:write to update session variables.
- Fix: Regenerate the OAuth token with the correct scope string. Update the scope parameter in the token request.
- Code Fix: The token request explicitly requests runtime:session:read runtime:session:write.
Error: 429 Too Many Requests
- Cause: Cognigy.AI Runtime API rate limits triggered by high retry volume.
- Fix: The axiosClient interceptor implements exponential backoff with retry-after header parsing. Ensure your downstream service does not generate retry storms.
- Code Fix: The response interceptor checks error.response?.status === 429, waits for the retry-after interval when the header is present, and otherwise falls back to an exponential delay capped at 10 seconds.
Error: 5xx Downstream Failure
- Cause: Target service unavailable, timeout, or malformed request.
- Fix: The middleware catches 5xx and network failures, queues the payload, and returns a 200 fallback to Cognigy.AI. BullMQ handles up to 5 attempts with exponential delay.
- Code Fix: cognigyWebhookInterceptor classifies error.response?.status >= 500 as transient and adds the job to retryQueue.