Generating Custom Vector Embeddings for Cognigy.AI Knowledge Bases with TypeScript
What You Will Build
- A TypeScript utility that extracts raw text from PDF and DOCX documents, splits the content into overlapping chunks, and generates vector embeddings via a model API.
- The script stores the vectors in a Pinecone index with structured metadata, retrieves the top-k matches for natural language queries, and injects the results into an active Cognigy.AI conversation via the Session API.
- All logic is implemented in modern TypeScript with explicit type definitions, error handling, and retry logic with exponential backoff for rate limits.
Prerequisites
- Cognigy.AI API token configured with the sessions:read and sessions:write scopes
- Pinecone API key and a pre-created index whose dimension matches your embedding model (1536 for text-embedding-3-small)
- OpenAI API key (or any endpoint supporting the OpenAI embeddings schema)
- Node.js 18 or higher
- Dependencies: @pinecone-database/pinecone, openai, pdf-parse, mammoth, axios, dotenv, plus the @types/pdf-parse and @types/node type packages
- TypeScript 5.0+ with esModuleInterop and strict enabled in tsconfig.json
Authentication Setup
Cognigy.AI uses bearer token authentication. The token must be generated in the Cognigy platform under Settings > API Tokens and assigned the required scopes. Pinecone and OpenAI use standard API key authentication. The following initialization block loads credentials from environment variables and configures the HTTP clients with explicit timeout and header defaults.
import * as dotenv from 'dotenv';
import axios from 'axios';
import OpenAI from 'openai';
import { Pinecone } from '@pinecone-database/pinecone';
// Load variables from .env into process.env before anything reads them
dotenv.config();
// Validate required environment variables
const requiredVars = ['COGNIGY_API_TOKEN', 'PINECONE_API_KEY', 'PINECONE_INDEX_NAME', 'OPENAI_API_KEY'];
for (const v of requiredVars) {
  if (!process.env[v]) throw new Error(`Missing required environment variable: ${v}`);
}
// Cognigy Session API client
export const cognigyClient = axios.create({
  baseURL: 'https://api.cognigy.ai/api/v1',
  headers: {
    'Authorization': `Bearer ${process.env.COGNIGY_API_TOKEN}`,
    'Content-Type': 'application/json'
  },
  timeout: 10000 // milliseconds
});
// OpenAI client for embeddings
export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 15000 // milliseconds
});
// Pinecone client and index reference
export const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
export const index = pc.Index(process.env.PINECONE_INDEX_NAME!);
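If you would rather report every missing variable at once and get a typed configuration object back, you can turn the validation loop above into a small function. The sketch below is hypothetical: loadConfig, PipelineConfig and ENV_KEYS are illustrative names, not part of any library, and it assumes the COGNIGY_API_TOKEN spelling for the Cognigy variable. It takes the environment as a parameter, so you can test it without touching process.env.

```typescript
// Hypothetical typed configuration for the pipeline (names are illustrative)
export interface PipelineConfig {
  cognigyApiToken: string;
  pineconeApiKey: string;
  pineconeIndexName: string;
  openaiApiKey: string;
}

// Maps each config field to the environment variable it is read from
const ENV_KEYS: Record<keyof PipelineConfig, string> = {
  cognigyApiToken: 'COGNIGY_API_TOKEN',
  pineconeApiKey: 'PINECONE_API_KEY',
  pineconeIndexName: 'PINECONE_INDEX_NAME',
  openaiApiKey: 'OPENAI_API_KEY',
};

// Collects every missing or blank variable before failing, instead of stopping at the first one
export function loadConfig(env: Record<string, string | undefined>): PipelineConfig {
  const missing: string[] = [];
  const config = {} as PipelineConfig;
  for (const field of Object.keys(ENV_KEYS) as (keyof PipelineConfig)[]) {
    const value = env[ENV_KEYS[field]]?.trim();
    if (!value) {
      missing.push(ENV_KEYS[field]);
    } else {
      config[field] = value;
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')}`);
  }
  return config;
}
```

In the setup module, you would call loadConfig(process.env) once and pass the resulting fields to the axios, OpenAI and Pinecone constructors.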
Implementation
Step 1: Text Extraction from PDF and DOCX
Document parsing requires binary buffer handling. The pdf-parse library resolves to an object with a text property, while mammoth.extractRawText resolves to an object whose value property holds the raw text of a DOCX file. Both libraries are promise-based. The function reads the file into a buffer, selects a parser based on the file extension, and rejects anything other than PDF or DOCX.
import * as fs from 'fs';
import * as path from 'path';
// pdf-parse exports a single function; a default import requires esModuleInterop
import pdf from 'pdf-parse';
import * as mammoth from 'mammoth';
export async function extractText(filePath: string): Promise<string> {
  if (!fs.existsSync(filePath)) {
    throw new Error(`File not found: ${filePath}`);
  }
  const ext = path.extname(filePath).slice(1).toLowerCase();
  // Reject unsupported formats before parsing so this error is not re-wrapped as a parser failure
  if (ext !== 'pdf' && ext !== 'docx') {
    throw new Error(`Unsupported file format "${ext}". Only PDF and DOCX are allowed.`);
  }
  const buffer = await fs.promises.readFile(filePath);
  try {
    if (ext === 'pdf') {
      const data = await pdf(buffer);
      return data.text.replace(/\r\n/g, '\n').trim();
    }
    // mammoth resolves to { value, messages }; value holds the raw text
    const result = await mammoth.extractRawText({ buffer });
    return result.value.replace(/\r\n/g, '\n').trim();
  } catch (error) {
    throw new Error(`Failed to parse ${ext} file: ${error instanceof Error ? error.message : 'Unknown error'}`);
  }
}
Expected Response: A clean string containing the document text with normalized line endings.
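Error Handling: The function throws explicit errors for missing files, unsupported formats, and parser failures. Parser failures are rethrown as new Error objects, so the parser's original stack trace is lost. If you need that detail in production, catch and log the underlying error, or attach it with the ES2022 cause option.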
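Routing by extension can also be written as a lookup table, so adding a format means adding an entry rather than editing the function. The sketch below is a hypothetical alternative: createExtractor and Parser are illustrative names. The parsers are injected, which keeps the real pdf-parse and mammoth calls at the call site.

```typescript
// Hypothetical sketch: a parser receives raw bytes and resolves to extracted text
type Parser = (bytes: Uint8Array) => Promise<string>;

export function createExtractor(parsers: Record<string, Parser>) {
  return async (filePath: string, bytes: Uint8Array): Promise<string> => {
    // Use only the last path segment so dots in directory names are ignored
    const fileName = filePath.split(/[\\/]/).pop() ?? '';
    const dot = fileName.lastIndexOf('.');
    const ext = dot > 0 ? fileName.slice(dot + 1).toLowerCase() : '';
    const parser = parsers[ext];
    if (!parser) {
      throw new Error(`Unsupported file format "${ext || '(none)'}". Supported: ${Object.keys(parsers).join(', ')}`);
    }
    const text = await parser(bytes);
    // Same normalization as extractText: CRLF to LF, then trim
    return text.replace(/\r\n/g, '\n').trim();
  };
}
```

With the libraries from this step, the wiring might look like this: createExtractor({ pdf: async (b) => (await pdf(Buffer.from(b))).text, docx: async (b) => (await mammoth.extractRawText({ buffer: Buffer.from(b) })).value }).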
Step 2: Sliding-Window Chunking with Overlap
Retrieval quality drops when a chunk is too large to stay on a single topic, or when split points cut through words and sentences. This function implements a character-based sliding window with a configurable size and overlap, so text near a boundary appears in both neighboring chunks. It is a heuristic rather than true semantic segmentation: it moves each split point back to the nearest preceding whitespace so that words are not cut in half.
export function chunkText(text: string, chunkSize: number = 500, overlap: number = 50): string[] {
  if (text.length === 0) return [];
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('chunkSize must be positive and overlap must be in [0, chunkSize)');
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = start + chunkSize;
    if (end >= text.length) {
      const finalChunk = text.slice(start).trim();
      if (finalChunk.length > 0) chunks.push(finalChunk);
      break;
    }
    // Move the split point back to the last space or newline so words stay intact
    const lastBreak = Math.max(text.lastIndexOf(' ', end), text.lastIndexOf('\n', end));
    if (lastBreak > start) {
      end = lastBreak;
    }
    const chunk = text.slice(start, end).trim();
    if (chunk.length > 0) chunks.push(chunk);
    // Step back by `overlap` characters, but always move forward to guarantee termination
    start = end - overlap > start ? end - overlap : end;
  }
  return chunks;
}
Expected Response: An array of strings, each containing a segment of the original document. Neighboring chunks share up to overlap characters.
Error Handling: The function returns an empty array for empty input and rejects a non-positive chunk size or an overlap outside [0, chunkSize). It also always moves the window forward, so the loop terminates even when a split point lands within overlap characters of the chunk start.
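Each chunk starts chunkSize - overlap characters after the previous one. That means you can estimate the chunk count, and therefore the number of embedding inputs, before calling any API. The helper below is an illustrative sketch (estimateChunkCount is not used elsewhere in this article). It assumes no whitespace alignment, so real counts are usually slightly higher, because aligning to whitespace shortens each step.

```typescript
// Estimates the number of chunks chunkText() produces for `length` characters,
// ignoring whitespace alignment: each window advances by chunkSize - overlap.
export function estimateChunkCount(length: number, chunkSize = 500, overlap = 50): number {
  if (overlap >= chunkSize) throw new Error('overlap must be smaller than chunkSize');
  if (length <= 0) return 0;
  if (length <= chunkSize) return 1;
  const stride = chunkSize - overlap;
  // The last window starts at k * stride, where k is the first step with k * stride + chunkSize >= length
  return Math.ceil((length - overlap) / stride);
}
```

With the defaults, a 10,000-character document yields about 23 chunks. That fits in a single embedding request at the batch size of 100 used in Step 3.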
Step 3: Embedding Generation and Pinecone Upsert
Embedding APIs enforce per-request limits. OpenAI accepts at most 2048 inputs in a single embeddings request, and each input must fit within the model's token limit. This step sends chunks in batches of 100, generates vectors, and upserts them into Pinecone with structured metadata. A retry wrapper handles transient 429 rate-limit responses.
import { PineconeRecord } from '@pinecone-database/pinecone';
// './clients' is an assumed file name for the Authentication Setup module
import { openai, index } from './clients';
// Retries only rate-limit errors; maxRetries is the total number of attempts
export async function retryWithBackoff<T>(fn: () => Promise<T>, maxRetries: number = 3, baseDelay: number = 1000): Promise<T> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      const isRateLimit = error?.status === 429 || error?.code === 'rate_limit_exceeded';
      if (isRateLimit && attempt < maxRetries) {
        const delay = baseDelay * Math.pow(2, attempt - 1);
        console.log(`Rate limit hit. Retrying in ${delay}ms (attempt ${attempt}/${maxRetries})`);
        await new Promise(resolve => setTimeout(resolve, delay));
      } else {
        throw error;
      }
    }
  }
  // Only reached if maxRetries < 1; keeps the compiler's return-path check satisfied
  throw new Error('Max retries exceeded');
}
export async function generateAndIndexEmbeddings(chunks: string[], sourceId: string): Promise<void> {
  const batchSize = 100;
  const namespace = 'cognigy-kb';
  for (let i = 0; i < chunks.length; i += batchSize) {
    const batch = chunks.slice(i, i + batchSize);
    const embeddingResponse = await retryWithBackoff(() =>
      openai.embeddings.create({
        model: 'text-embedding-3-small',
        input: batch,
        dimensions: 1536 // Must equal the Pinecone index dimension
      })
    );
    const timestamp = new Date().toISOString();
    const records: PineconeRecord[] = embeddingResponse.data.map((item) => {
      // item.index is the position of the input within this batch
      const chunkIndex = i + item.index;
      return {
        // Deterministic IDs: re-indexing the same source overwrites its vectors instead of duplicating them
        id: `${sourceId}-chunk-${chunkIndex}`,
        values: item.embedding,
        metadata: {
          source: sourceId,
          chunkIndex,
          content: batch[item.index],
          timestamp,
          chunkSize: batch[item.index].length
        }
      };
    });
    await index.namespace(namespace).upsert(records);
    console.log(`Indexed batch ${Math.floor(i / batchSize) + 1}: ${records.length} vectors`);
  }
}
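Expected Response: In the Pinecone TypeScript SDK, upsert resolves without a return value; the underlying REST endpoint reports an upsertedCount. The console log after each batch confirms progress.
Error Handling: The retryWithBackoff wrapper retries 429 responses with exponential backoff (1 s, then 2 s with the defaults). The official openai Node SDK already retries 429 and 5xx responses twice by default (configurable through its maxRetries option), so this wrapper adds attempts on top of those. Dimension mismatches and invalid metadata values fail immediately and are not retried; Pinecone metadata accepts strings, numbers, booleans, and lists of strings. A dimension mismatch can only be resolved by recreating the index or changing the requested embedding dimensions.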
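Every record stores its full chunk text in metadata, so large chunks can run into Pinecone's per-record metadata limit. Pinecone's documentation gives that limit as 40 KB at the time of writing. The following is a hypothetical pre-upsert check (assertMetadataFits and metadataSizeBytes are illustrative names). It approximates the stored size by the UTF-8 length of the JSON-serialized metadata, which may not match exactly how Pinecone measures it.

```typescript
// Per-record metadata ceiling as documented by Pinecone at the time of writing (verify for your setup)
const MAX_METADATA_BYTES = 40 * 1024;

// Metadata value types Pinecone accepts: strings, numbers, booleans, and lists of strings
export type Metadata = Record<string, string | number | boolean | string[]>;

// Approximates the stored size as the UTF-8 byte length of the JSON serialization
export function metadataSizeBytes(metadata: Metadata): number {
  return new TextEncoder().encode(JSON.stringify(metadata)).length;
}

// Hypothetical guard: fail with the record ID before the whole batch is rejected
export function assertMetadataFits(id: string, metadata: Metadata): void {
  const size = metadataSizeBytes(metadata);
  if (size > MAX_METADATA_BYTES) {
    throw new Error(`Record ${id}: metadata is ${size} bytes, over the ${MAX_METADATA_BYTES}-byte limit`);
  }
}
```

Calling assertMetadataFits on each record's ID and metadata before upsert turns an opaque API rejection into an error that names the offending chunk.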
Step 4: Retrieval and Cognigy Session API Update
Retrieval takes three steps: embed the user query with the same model used for indexing, query Pinecone for the nearest stored vectors (by cosine similarity, if the index uses the cosine metric), and push the top-k results into the active Cognigy session. The Cognigy Session API accepts a PUT request to update context variables. The following function demonstrates the full request cycle.
import axios from 'axios';
// './clients' is an assumed file name for the Authentication Setup module
import { openai, index, cognigyClient } from './clients';
// Assumes retryWithBackoff is exported from the Step 3 module ('./indexing')
import { retryWithBackoff } from './indexing';
export async function retrieveAndPushToCognigy(query: string, sessionId: string, topK: number = 3) {
  // 1. Embed the query with the same model and dimensions used for indexing
  const queryResponse = await retryWithBackoff(() =>
    openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: query,
      dimensions: 1536
    })
  );
  const queryVector = queryResponse.data[0].embedding;
  // 2. Query Pinecone
  const pineconeResults = await index.namespace('cognigy-kb').query({
    vector: queryVector,
    topK,
    includeMetadata: true
  });
  // 3. Format snippets for Cognigy
  const snippets = (pineconeResults.matches ?? []).map(match => ({
    text: match.metadata?.content as string,
    relevanceScore: match.score,
    sourceId: match.metadata?.source as string,
    chunkIndex: match.metadata?.chunkIndex as number
  }));
  // 4. Update Cognigy Session Context
  const cognigyPayload = {
    context: {
      vectorRetrieval: {
        query: query,
        snippets: snippets,
        retrievalTimestamp: new Date().toISOString()
      }
    }
  };
  // HTTP Request Cycle Documentation:
  // Method: PUT
  // Path: /api/v1/sessions/{sessionId}
  // Headers: Authorization: Bearer <token>, Content-Type: application/json
  // Body: { "context": { "vectorRetrieval": { ... } } }
  try {
    await cognigyClient.put(`/sessions/${encodeURIComponent(sessionId)}`, cognigyPayload);
  } catch (error) {
    // axios rejects on any non-2xx status; rethrow with the status and session ID
    if (axios.isAxiosError(error) && error.response) {
      throw new Error(`Cognigy API returned status ${error.response.status} for session ${sessionId}`);
    }
    throw error;
  }
  // Expected Response Body:
  // {
  //   "sessionId": "abc123",
  //   "context": {
  //     "vectorRetrieval": {
  //       "query": "How do I reset my password?",
  //       "snippets": [ ... ],
  //       "retrievalTimestamp": "2024-05-20T10:30:00.000Z"
  //     }
  //   },
  //   "updated": true
  // }
  return snippets;
}
Expected Response: The Cognigy API returns the updated session object with the injected context variables. The function returns the matched snippets for local logging or further processing.
Error Handling: By default, axios rejects the promise for any non-2xx response, so a failed update surfaces as an exception that carries the HTTP status. An expired session or an invalid ID returns 404, which the calling layer should catch.
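The formatting step above casts metadata fields with as string. A match without content would therefore silently push undefined into the session. A hypothetical post-processing helper can drop weak or malformed matches first; selectSnippets, the Match and Snippet shapes, and the 0.3 threshold are all illustrative assumptions. It also assumes a cosine or dot-product index, where a higher score means a closer match.

```typescript
// Minimal shape of a Pinecone query match, limited to the fields used here
export interface Match {
  id: string;
  score?: number;
  metadata?: Record<string, unknown>;
}

export interface Snippet {
  text: string;
  relevanceScore: number;
  sourceId: string;
  chunkIndex: number;
}

// Hypothetical helper: keep matches that have text and clear a score threshold.
// The 0.3 default is illustrative only; tune it against your own queries.
export function selectSnippets(matches: Match[], minScore = 0.3): Snippet[] {
  const snippets: Snippet[] = [];
  for (const match of matches) {
    const meta: Record<string, unknown> = match.metadata ?? {};
    const { content, source, chunkIndex } = meta;
    const score = match.score;
    // Skip matches without usable text or below the threshold
    if (typeof content !== 'string' || content.length === 0) continue;
    if (score === undefined || score < minScore) continue;
    snippets.push({
      text: content,
      relevanceScore: score,
      sourceId: typeof source === 'string' ? source : 'unknown',
      chunkIndex: typeof chunkIndex === 'number' ? chunkIndex : -1,
    });
  }
  // Highest score first, in case the input was merged from several queries
  return snippets.sort((a, b) => b.relevanceScore - a.relevanceScore);
}
```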
Complete Working Example
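The following entry point combines all components into a single executable script. It assumes each step above lives in its own module (extractors.ts, chunking.ts, indexing.ts, and retrieval.ts). Pass the document path, an active session ID, and a query as command-line arguments, or replace the placeholder defaults.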
import * as dotenv from 'dotenv';
dotenv.config();
import { extractText } from './extractors';
import { chunkText } from './chunking';
import { generateAndIndexEmbeddings } from './indexing';
import { retrieveAndPushToCognigy } from './retrieval';
async function main() {
  try {
    const filePath = process.argv[2] || './docs/faq.pdf';
    const sessionId = process.argv[3] || 'your-active-session-id';
    const userQuery = process.argv[4] || 'What is the refund policy?';
    console.log('Extracting text from document...');
    const rawText = await extractText(filePath);
    console.log('Chunking content with 500 char limit and 50 char overlap...');
    const chunks = chunkText(rawText, 500, 50);
    console.log(`Generated ${chunks.length} chunks`);
    console.log('Generating embeddings and upserting to Pinecone...');
    await generateAndIndexEmbeddings(chunks, 'faq-doc-001');
    console.log(`Retrieving top matches for: "${userQuery}"`);
    const results = await retrieveAndPushToCognigy(userQuery, sessionId, 3);
    console.log('Successfully pushed snippets to Cognigy session.');
    console.log('Retrieved snippets:', JSON.stringify(results, null, 2));
  } catch (error) {
    console.error('Pipeline failed:', error instanceof Error ? error.message : error);
    process.exit(1);
  }
}
main();
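The entry point hard-codes the source ID 'faq-doc-001'. Vector IDs are built as ${sourceId}-chunk-${n}, so indexing a second document under the same ID would overwrite the first document's vectors. A hypothetical helper (sourceIdFromPath is not part of the original script) could derive the ID from the file name instead:

```typescript
// Hypothetical helper: derive a URL-safe source ID from a file path,
// e.g. "./docs/Refund Policy.pdf" becomes "refund-policy".
export function sourceIdFromPath(filePath: string): string {
  // Last path segment, handling both / and \ separators
  const base = filePath.split(/[\\/]/).pop() ?? '';
  // Drop the final extension
  const withoutExt = base.replace(/\.[^.]+$/, '');
  const slug = withoutExt
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse runs of other characters into one hyphen
    .replace(/^-+|-+$/g, '');    // trim leading and trailing hyphens
  if (!slug) throw new Error(`Cannot derive a source ID from "${filePath}"`);
  return slug;
}
```

In main(), calling generateAndIndexEmbeddings(chunks, sourceIdFromPath(filePath)) would give each document its own ID prefix. Two files with the same base name in different folders would still collide, so include part of the directory if that can happen.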
Common Errors & Debugging
Error: 400 Bad Request - Dimension Mismatch
- What causes it: Pinecone requires every vector's dimension to exactly match the index dimension. Upserting or querying 768-dimensional vectors against an index created with 1536 dimensions triggers this error.
- How to fix it: Delete the existing Pinecone index and recreate it with the correct dimension parameter. Verify that the dimensions field in your openai.embeddings.create call matches the index configuration.
- Code showing the fix:
// Ensure dimensions match exactly
const response = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: batch,
  dimensions: 1536 // Must match the dimension the Pinecone index was created with
});
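Pinecone computes similarity on the server, but a local version makes the failure mode concrete: a similarity score is only defined for two vectors of the same length. The cosineSimilarity function below is an illustration, not part of the pipeline. It throws in the situation where Pinecone would answer with a 400.

```typescript
// Illustration only: cosine similarity with an explicit dimension check
export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error('Cannot compare a zero vector');
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

OpenAI states that its embeddings are normalized to length 1, so for these vectors cosine similarity and dot product are equal.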
Error: 401 Unauthorized - Invalid Cognigy Token
- What causes it: The API token is expired, revoked, or lacks the sessions:write scope.
- How to fix it: Regenerate the token in the Cognigy platform. Verify scope assignments in the API token configuration panel. Ensure the Authorization header uses the Bearer prefix.
- Code showing the fix:
// Fail fast on a missing or empty token; this checks presence only, not a specific token format
const cognigyToken = process.env.COGNIGY_API_TOKEN?.trim();
if (!cognigyToken) {
  throw new Error('COGNIGY_API_TOKEN is missing or empty');
}
Error: 404 Not Found - Session ID Expired
- What causes it: Cognigy sessions have a TTL (time-to-live). Querying a session after expiration returns 404.
- How to fix it: Capture the session ID during the initial webhook or trigger event and pass it to the retrieval function immediately. Validate the session before writing context to it.
- Code showing the fix:
try {
  await cognigyClient.get(`/sessions/${sessionId}`); // Verify the session exists
} catch (err: any) {
  if (err.response?.status === 404) {
    throw new Error(`Session ${sessionId} has expired or does not exist`);
  }
  throw err;
}
Error: 429 Too Many Requests - Rate Limit Cascade
- What causes it: Rapid batch uploads or concurrent retrieval requests exceed OpenAI or Pinecone throughput limits.
- How to fix it: Implement exponential backoff with jitter. Reduce batch sizes. Add a delay between sequential upsert calls.
- Code showing the fix:
// Jittered exponential backoff: add up to 500 ms of random delay to each wait
const delay = (baseDelay * Math.pow(2, attempt - 1)) + (Math.random() * 500);
await new Promise(resolve => setTimeout(resolve, delay));
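The fix above adds a bounded amount of random delay on top of the exponential wait. A common variant, usually called full jitter, instead draws the whole delay at random from zero up to a capped exponential bound. This tends to spread retries from many concurrent clients more evenly. The sketch below is a hypothetical generic wrapper: jitteredDelay, retryWithJitter, and the 30-second cap are assumptions. The caller decides which errors are retryable.

```typescript
// Full jitter: a random delay between 0 and a capped exponential bound.
// `random` is injectable so the delay can be tested deterministically.
export function jitteredDelay(
  attempt: number,
  baseDelay = 1000,
  maxDelay = 30000,
  random: () => number = Math.random,
): number {
  const bound = Math.min(maxDelay, baseDelay * 2 ** (attempt - 1));
  return Math.floor(random() * bound);
}

// Hypothetical generic wrapper: retries while isRetryable(error) is true,
// up to maxAttempts total attempts, then rethrows the last error.
export async function retryWithJitter<T>(
  fn: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxAttempts = 5,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxAttempts || !isRetryable(error)) throw error;
      await new Promise((resolve) => setTimeout(resolve, jitteredDelay(attempt)));
    }
  }
}
```

For the OpenAI calls in this article, isRetryable could be (e) => (e as { status?: number }).status === 429.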