Orchestrating NICE Cognigy.AI LLM Tool Calling with Java

StarAdmin · June 16, 2026, 8:35am

What You Will Build

You will build a production-grade Java Spring Boot service that receives LLM function call payloads from the Cognigy.AI gateway, validates arguments against JSON Schema, executes downstream APIs with strict sanitization and timeout controls, manages asynchronous execution via correlation contexts, returns structured outputs for natural language synthesis, and records invocation metrics for cost attribution. This tutorial covers the complete request lifecycle using a Spring MVC controller for ingress, the reactive WebClient (from spring-boot-starter-webflux) for outbound calls, and Micrometer for metrics.

Prerequisites

Cognigy.AI API key with the llm:tools:execute scope
Java 17 or higher
Spring Boot 3.2+
Maven or Gradle
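Dependencies: spring-boot-starter-web (the filter and controller in this tutorial use the servlet stack), spring-boot-starter-webflux (for WebClient), spring-boot-starter-actuator, com.networknt:json-schema-validator:1.0.87, com.fasterxml.jackson.core:jackson-databind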

Authentication Setup

Cognigy.AI routes LLM tool calls to a registered webhook endpoint. The platform authenticates incoming requests using the X-Cognigy-AI-API-Key header. You must configure this header in the Cognigy.AI LLM agent settings and validate it on ingress. The llm:tools:execute scope is required for the gateway to invoke your endpoint.

import org.springframework.web.filter.OncePerRequestFilter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;

public class CognigyAuthFilter extends OncePerRequestFilter {
    private final String expectedApiKey;

    public CognigyAuthFilter(String expectedApiKey) {
        this.expectedApiKey = expectedApiKey;
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response, FilterChain filterChain)
            throws ServletException, IOException {
        String apiKey = request.getHeader("X-Cognigy-AI-API-Key");
        if (apiKey == null || !apiKey.equals(expectedApiKey)) {
            response.setStatus(HttpServletResponse.SC_UNAUTHORIZED);
            response.setContentType("application/json");
            response.getWriter().write("{\"error\": \"Invalid or missing X-Cognigy-AI-API-Key header. Scope llm:tools:execute required.\"}");
            return;
        }
        filterChain.doFilter(request, response);
    }
}

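This filter rejects unauthenticated traffic before any payload processing occurs. Cognigy.AI retries failed webhooks with exponential backoff, so a misconfigured key produces repeated 401 responses; rejecting them in the filter keeps that traffic cheap. Keep the expected API key out of source control: expose it as the property cognigy.ai.webhook-secret in application.yml (for example, resolved from the COGNIGY_API_KEY environment variable) and inject it via @Value.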
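
The filter compares the header with String.equals, which can return as soon as it finds a mismatching character. A minimal sketch of a constant-time comparison using only the JDK follows; the class name ApiKeyComparator is hypothetical and not part of the original code, and whether this hardening matters depends on your threat model.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical helper: compares a presented API key with the expected one.
final class ApiKeyComparator {
    private ApiKeyComparator() {}

    /** Returns true only when both keys are present and byte-for-byte identical. */
    static boolean matches(String presented, String expected) {
        if (presented == null || expected == null) {
            return false;
        }
        byte[] a = presented.getBytes(StandardCharsets.UTF_8);
        byte[] b = expected.getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual does not stop at the first mismatching byte
        return MessageDigest.isEqual(a, b);
    }
}
```

Inside doFilterInternal, the condition could then read `!ApiKeyComparator.matches(apiKey, expectedApiKey)` in place of the null check and equals call.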

Implementation

Step 1: Parse LLM Gateway Request and Validate JSON Schema

The Cognigy.AI gateway sends a JSON payload containing the tool_call_id, name, and arguments. You must validate arguments against a pre-defined JSON Schema to prevent malformed data from reaching downstream services. Schema validation occurs before any business logic executes.

import com.networknt.schema.JsonSchema;
import com.networknt.schema.JsonSchemaFactory;
import com.networknt.schema.SpecVersion;
import com.networknt.schema.ValidationMessage;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.InputStream;
import java.util.Set;
import java.util.stream.Collectors;

public class ToolSchemaValidator {
    private final JsonSchema schema;
    private final ObjectMapper mapper;

    public ToolSchemaValidator(String schemaResourcePath, ObjectMapper mapper) throws Exception {
        this.mapper = mapper;
        // try-with-resources closes the stream once the schema has been parsed
        try (InputStream is = ToolSchemaValidator.class.getResourceAsStream(schemaResourcePath)) {
            if (is == null) {
                throw new IllegalArgumentException("Schema resource not found: " + schemaResourcePath);
            }
            this.schema = JsonSchemaFactory.getInstance(SpecVersion.VersionFlag.V7)
                    .getSchema(is);
        }
    }

    // Throws only an unchecked exception, so callers can catch IllegalArgumentException directly
    public void validate(Object arguments) {
        Set<ValidationMessage> errors = schema.validate(mapper.valueToTree(arguments));
        if (!errors.isEmpty()) {
            String message = errors.stream()
                    .map(ValidationMessage::getMessage)
                    .collect(Collectors.joining("; "));
            throw new IllegalArgumentException("Schema validation failed: " + message);
        }
    }
}

The com.networknt library loads Draft-7 schemas directly from classpath resources. Place a file such as schemas/get_order_status.json in src/main/resources. The validator throws an IllegalArgumentException on mismatch; the controller catches it and returns the validation message in the output field so the LLM can correct its arguments. Schema validation rejects unexpected object shapes and reduces downstream parsing errors.

Step 2: Sanitize Parameters and Execute Backend Services

LLM-generated arguments often contain trailing whitespace, control characters, or unexpected casing. You must sanitize inputs before sending them to internal APIs. This step also establishes the WebClient call with explicit timeouts and retry logic for 429 responses.

import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.reactive.function.client.WebClientResponseException;
import reactor.core.publisher.Mono;
import java.util.regex.Pattern;

public class BackendServiceExecutor {
    private final WebClient webClient;
    private static final Pattern DANGEROUS_CHARS = Pattern.compile("[^a-zA-Z0-9@._\\- ]");

    public BackendServiceExecutor(WebClient.Builder builder, String baseUrl) {
        this.webClient = builder.baseUrl(baseUrl).build();
    }

    public String sanitize(String input) {
        if (input == null) return null;
        String cleaned = DANGEROUS_CHARS.matcher(input).replaceAll("").trim();
        // Truncate to 250 chars
        return cleaned.length() > 250 ? cleaned.substring(0, 250) : cleaned;
    }

    public Mono<String> executeTool(String endpoint, Object sanitizedPayload) {
        return webClient.post()
                .uri(endpoint)
                .header("Content-Type", "application/json")
                .bodyValue(sanitizedPayload)
                .retrieve()
                .bodyToMono(String.class)
                .timeout(java.time.Duration.ofSeconds(5)) // Applies to each attempt
                // Retry only on 429, up to 3 times, starting at 500 ms and capped at 2 s
                .retryWhen(reactor.util.retry.Retry
                        .backoff(3, java.time.Duration.ofMillis(500))
                        .maxBackoff(java.time.Duration.ofSeconds(2))
                        .filter(throwable -> throwable instanceof WebClientResponseException e
                                && e.getStatusCode().value() == 429));
    }
}

The sanitize method strips every character outside the allow-list (letters, digits, @, ., _, -, and space), trims whitespace, and enforces a 250-character limit. This reduces the risk of log injection and of oversized values reaching downstream systems. The WebClient call enforces a 5-second timeout per attempt and retries 429 Too Many Requests responses up to three times with exponential backoff. Other failures, including 500 errors, are not retried because repeating a request against a faulting server usually wastes resources.
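
To see how these sanitization rules behave on concrete inputs, here is a standalone sketch that uses only the JDK. The class name ArgumentSanitizer is hypothetical; the allow-list and the 250-character limit are taken from the text above.

```java
import java.util.regex.Pattern;

// Hypothetical standalone version of the sanitization rules described above.
final class ArgumentSanitizer {
    // Anything that is not a letter, digit, @, ., _, -, or space is removed
    private static final Pattern DISALLOWED = Pattern.compile("[^a-zA-Z0-9@._\\- ]");
    private static final int MAX_LENGTH = 250;

    private ArgumentSanitizer() {}

    static String sanitize(String input) {
        if (input == null) {
            return null;
        }
        String cleaned = DISALLOWED.matcher(input).replaceAll("").trim();
        // Enforce the length limit after stripping and trimming
        return cleaned.length() > MAX_LENGTH ? cleaned.substring(0, MAX_LENGTH) : cleaned;
    }
}
```

For example, sanitize("  ORD-123\n<script> ") yields "ORD-123script", and an email address such as jane.doe@example.com passes through unchanged. Note that the allow-list also alters legitimate values: "O'Brien" becomes "OBrien", so the pattern may need tuning per field.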

Step 3: Maintain Correlation Contexts for Asynchronous Responses

Cognigy.AI expects a synchronous HTTP response, but the actual tool execution may span multiple downstream calls or require background processing. You maintain a correlation context using CompletableFuture and a thread-safe map keyed by tool_call_id. This lets the request handler wait, with a timeout, while your service processes the request asynchronously.

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeoutException;

public class CorrelationContextManager {
    // Pairs each pending future with its creation time so abandoned entries can be evicted
    private record PendingContext(CompletableFuture<String> future, Instant createdAt) {}

    private final ConcurrentHashMap<String, PendingContext> pendingContexts = new ConcurrentHashMap<>();
    private static final int MAX_CONTEXTS = 10000;
    private static final Duration MAX_AGE = Duration.ofSeconds(30);

    public CompletableFuture<String> createContext(String toolCallId) {
        if (pendingContexts.size() >= MAX_CONTEXTS) {
            throw new IllegalStateException("Correlation context limit reached. Clear stale entries.");
        }
        return pendingContexts
                .computeIfAbsent(toolCallId, id -> new PendingContext(new CompletableFuture<>(), Instant.now()))
                .future();
    }

    public void resolveContext(String toolCallId, String result) {
        PendingContext context = pendingContexts.remove(toolCallId);
        if (context != null) {
            context.future().complete(result);
        }
    }

    public void failContext(String toolCallId, Throwable error) {
        PendingContext context = pendingContexts.remove(toolCallId);
        if (context != null) {
            context.future().completeExceptionally(error);
        }
    }

    // Evicts entries that are already done or older than MAX_AGE, without blocking
    public void cleanupStaleContexts() {
        Instant cutoff = Instant.now().minus(MAX_AGE);
        pendingContexts.entrySet().removeIf(entry -> {
            PendingContext context = entry.getValue();
            boolean stale = context.future().isDone() || context.createdAt().isBefore(cutoff);
            if (stale) {
                // No effect if the future has already completed
                context.future().completeExceptionally(new TimeoutException("Correlation context expired"));
            }
            return stale;
        });
    }
}

The ConcurrentHashMap stores pending futures. When the downstream call completes, resolveContext removes the entry and fulfills the future. This pattern decouples the HTTP response lifecycle from the tool execution lifecycle. An in-memory map is lost on pod restart and is not shared across replicas; in production, keep the correlation state in a shared store such as Redis (for example, HSET and HGET on a hash keyed by tool_call_id). Run cleanupStaleContexts from a scheduled task to prevent memory leaks from abandoned calls.
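
The correlation pattern can be tried in isolation with the JDK alone. The sketch below is an illustration rather than the class above: the name CorrelationDemo and its methods are hypothetical, and it uses CompletableFuture.completeOnTimeout (Java 9+) so that a fallback value is supplied without blocking a thread while waiting.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of a correlation map keyed by tool_call_id.
final class CorrelationDemo {
    private static final ConcurrentHashMap<String, CompletableFuture<String>> PENDING =
            new ConcurrentHashMap<>();

    private CorrelationDemo() {}

    /** Registers a future that falls back to a default if nobody resolves it in time. */
    static CompletableFuture<String> register(String toolCallId, long timeoutMillis, String fallback) {
        CompletableFuture<String> future = new CompletableFuture<>();
        PENDING.put(toolCallId, future);
        return future
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                // Remove the entry however the future ends: result, error, or timeout
                .whenComplete((value, error) -> PENDING.remove(toolCallId, future));
    }

    /** Completes the pending future for this id, if one exists. */
    static void resolve(String toolCallId, String result) {
        CompletableFuture<String> future = PENDING.get(toolCallId);
        if (future != null) {
            future.complete(result);
        }
    }

    static int pendingCount() {
        return PENDING.size();
    }
}
```

A call registered with a 5-second timeout and resolved immediately returns the resolved value, while a call that is never resolved returns the fallback once its timeout elapses. In both cases the map entry is removed, so no separate cleanup pass is needed for this variant.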

Step 4: Return Structured Results and Log Invocation Metrics

Cognigy.AI requires a specific JSON structure for tool responses: {"tool_call_id": "...", "output": "..."}. You must format the result exactly, handle fallback responses when timeouts occur, and record metrics for cost attribution. Micrometer provides the Timer, Counter, and DistributionSummary meter types for this.

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.Counter;
import org.springframework.http.ResponseEntity;
import java.time.Duration;
import java.util.Map;

public class ToolResponseHandler {
    private final MeterRegistry meterRegistry;
    private final Counter toolInvocationCounter;
    private final Timer toolExecutionTimer;

    public ToolResponseHandler(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        this.toolInvocationCounter = Counter.builder("cognigy.tool.invocations")
                .tag("status", "success")
                .register(meterRegistry);
        this.toolExecutionTimer = Timer.builder("cognigy.tool.execution.duration")
                .publishPercentileHistogram()
                .register(meterRegistry);
    }

    public ResponseEntity<Map<String, String>> buildResponse(String toolCallId, String output, boolean success) {
        Map<String, String> body = Map.of(
                "tool_call_id", toolCallId,
                "output", output
        );
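        if (!success) {
            // Tags on a registered meter are immutable, so look up the fallback-tagged counter
            meterRegistry.counter("cognigy.tool.invocations", "status", "fallback").increment();
        } else {
            toolInvocationCounter.increment();
        }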
        return ResponseEntity.ok(body);
    }

    public void recordMetrics(Duration duration, String toolName, boolean success) {
        toolExecutionTimer.record(duration);
        meterRegistry.counter("cognigy.tool.cost.attribution", 
                "tool", toolName, 
                "status", success ? "success" : "fallback")
                .increment();
    }
}

The response handler constructs the exact JSON structure Cognigy.AI expects for natural language synthesis. The output field contains the plain text or structured string the LLM will parse. Metrics track invocation counts, execution duration percentiles, and cost attribution tags. You export these metrics to Prometheus or Datadog via Spring Boot Actuator. The fallback counter increments when timeouts or schema errors force a default response.
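
One detail worth guarding against: Map.of throws a NullPointerException when any key or value is null, which happens in buildResponse if the payload lacks tool_call_id. A minimal null-safe sketch follows; the class name ToolResponseBodies is hypothetical, and substituting "unknown" for a missing id is an assumption that mirrors the placeholder used by the generic error handler in this tutorial.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds the {"tool_call_id", "output"} body without failing on nulls.
final class ToolResponseBodies {
    private ToolResponseBodies() {}

    static Map<String, String> of(String toolCallId, String output) {
        Map<String, String> body = new LinkedHashMap<>();
        // Assumption: "unknown" stands in for a missing id, an empty string for missing output
        body.put("tool_call_id", toolCallId != null ? toolCallId : "unknown");
        body.put("output", output != null ? output : "");
        return Collections.unmodifiableMap(body);
    }
}
```

The LinkedHashMap keeps the two fields in a stable order when serialized, which makes responses easier to compare in logs and tests.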

Complete Working Example

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;
import org.springframework.web.servlet.config.annotation.CorsRegistry;
import io.micrometer.core.instrument.MeterRegistry;
import com.fasterxml.jackson.databind.ObjectMapper;
import jakarta.servlet.Filter;
import java.util.Map;

@SpringBootApplication
public class CognigyToolOrchestratorApplication implements WebMvcConfigurer {

    public static void main(String[] args) {
        SpringApplication.run(CognigyToolOrchestratorApplication.class, args);
    }

    @Bean
    public Filter cognigyAuthFilter() {
        return new CognigyAuthFilter(System.getenv("COGNIGY_API_KEY"));
    }

    @Bean
    public ToolSchemaValidator schemaValidator(ObjectMapper mapper) throws Exception {
        return new ToolSchemaValidator("/schemas/default_tool.json", mapper);
    }

    @Bean
    public BackendServiceExecutor backendExecutor(WebClient.Builder builder) {
        return new BackendServiceExecutor(builder, "https://internal-api.example.com");
    }

    @Bean
    public CorrelationContextManager correlationManager() {
        return new CorrelationContextManager();
    }

    @Bean
    public ToolResponseHandler responseHandler(MeterRegistry meterRegistry) {
        return new ToolResponseHandler(meterRegistry);
    }

    @Override
    public void addCorsMappings(CorsRegistry registry) {
        registry.addMapping("/api/v1/llm/tool-callback")
                .allowedOrigins("https://app.cognigy.ai")
                .allowedMethods("POST")
                .allowedHeaders("X-Cognigy-AI-API-Key", "Content-Type");
    }
}

import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

@RestController
@RequestMapping("/api/v1/llm/tool-callback")
public class ToolCallbackController {

    private final ToolSchemaValidator schemaValidator;
    private final BackendServiceExecutor backendExecutor;
    private final CorrelationContextManager correlationManager;
    private final ToolResponseHandler responseHandler;
    private final ObjectMapper mapper;

    public ToolCallbackController(ToolSchemaValidator schemaValidator,
                                  BackendServiceExecutor backendExecutor,
                                  CorrelationContextManager correlationManager,
                                  ToolResponseHandler responseHandler,
                                  ObjectMapper mapper) {
        this.schemaValidator = schemaValidator;
        this.backendExecutor = backendExecutor;
        this.correlationManager = correlationManager;
        this.responseHandler = responseHandler;
        this.mapper = mapper;
    }

    @PostMapping
    public ResponseEntity<Map<String, String>> handleToolCall(@RequestBody Map<String, Object> payload) {
        String toolCallId = (String) payload.get("tool_call_id");
        String toolName = (String) payload.get("name");
        Object arguments = payload.get("arguments");

        long start = System.nanoTime();
        try {
            schemaValidator.validate(arguments);
        } catch (IllegalArgumentException e) {
            return responseHandler.buildResponse(toolCallId, "Invalid arguments: " + e.getMessage(), false);
        }

        Map<String, Object> sanitized = sanitizeMap(arguments);
        CompletableFuture<String> context = correlationManager.createContext(toolCallId);

        backendExecutor.executeTool("/v1/exec", sanitized)
                .subscribe(
                        result -> correlationManager.resolveContext(toolCallId, result),
                        error -> correlationManager.failContext(toolCallId, error)
                );

        try {
            String output = context.get(5, java.util.concurrent.TimeUnit.SECONDS);
            Duration duration = Duration.ofNanos(System.nanoTime() - start);
            responseHandler.recordMetrics(duration, toolName, true);
            return responseHandler.buildResponse(toolCallId, output, true);
        } catch (Exception e) {
            // Remove the pending entry so a timed-out call does not linger in the map
            correlationManager.failContext(toolCallId, e);
            Duration duration = Duration.ofNanos(System.nanoTime() - start);
            responseHandler.recordMetrics(duration, toolName, false);
            return responseHandler.buildResponse(toolCallId, "Tool execution timed out or failed. Please retry.", false);
        }
    }

    private Map<String, Object> sanitizeMap(Object arguments) {
        if (arguments instanceof Map) {
            Map<String, Object> map = (Map<String, Object>) arguments;
            map.replaceAll((k, v) -> v instanceof String ? backendExecutor.sanitize((String) v) : v);
            return map;
        }
        return Map.of();
    }
}

This controller receives the Cognigy.AI payload, validates the schema, sanitizes string values, dispatches the backend call asynchronously, and blocks on the correlation context for up to 5 seconds. If the context resolves, it returns the result and records metrics. If it times out or fails, it returns a structured fallback response and records a failure metric. The downstream call itself is non-blocking, but the request thread waits for the result, so size the servlet thread pool for the expected concurrency.

Common Errors & Debugging

Error: 400 Bad Request (Schema Validation Failure)

What causes it: The LLM gateway sends arguments that do not match the JSON Schema definition. Missing required fields, wrong data types, or invalid enum values trigger this.
How to fix it: Align the Cognigy.AI tool definition with your schema. Use schemaValidator.validate() to catch mismatches early. Rather than surfacing a bare 400, the handler returns HTTP 200 with a descriptive error in the output field so the LLM can self-correct.
Code showing the fix:

try {
    schemaValidator.validate(arguments);
} catch (IllegalArgumentException e) {
    // Returns 200 with error string in output, allowing LLM to retry
    return responseHandler.buildResponse(toolCallId, "Schema mismatch: " + e.getMessage(), false);
}

Error: 408 Request Timeout (Correlation Context Expiry)

What causes it: The downstream service takes longer than 5 seconds to respond. The context.get() call throws TimeoutException.
How to fix it: Increase the timeout for non-critical tools or implement streaming responses. The WebClient call has its own 5-second timeout, so raise it together with the context wait. For synchronous callbacks, return a fallback message and let Cognigy.AI retry. Monitor cognigy.tool.execution.duration percentiles to identify slow endpoints.
Code showing the fix:

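try {
    String output = context.get(10, java.util.concurrent.TimeUnit.SECONDS); // Extended timeout
    return responseHandler.buildResponse(toolCallId, output, true);
} catch (java.util.concurrent.TimeoutException e) {
    correlationManager.failContext(toolCallId, e); // Drop the abandoned context
    return responseHandler.buildResponse(toolCallId, "Service processing delay. Retrying.", false);
} catch (InterruptedException | java.util.concurrent.ExecutionException e) {
    // get() also declares these checked exceptions; restore the interrupt flag if set
    if (e instanceof InterruptedException) Thread.currentThread().interrupt();
    return responseHandler.buildResponse(toolCallId, "Tool execution failed.", false);
}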

Error: 429 Too Many Requests (Rate Limit Cascade)

What causes it: The downstream API enforces rate limits. Without retry logic, the tool fails immediately.
How to fix it: The WebClient retry policy automatically handles 429 responses with exponential backoff. Ensure your Cognigy.AI agent configures reasonable concurrency limits to avoid overwhelming the webhook.
Code showing the fix:

.retryWhen(reactor.util.retry.Retry
        .backoff(3, java.time.Duration.ofMillis(500))
        .maxBackoff(java.time.Duration.ofSeconds(2))
        .filter(throwable -> throwable instanceof WebClientResponseException e && e.getStatusCode().value() == 429)
)
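
To reason about how long the 429 retry policy can delay a response, it helps to write out the nominal schedule for 3 retries starting at 500 ms with a 2-second cap. The sketch below is an illustration only: the class name BackoffSchedule is hypothetical, it assumes the delay doubles on each retry, and it does not model the random jitter that Reactor applies on top.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical calculator for a capped, doubling backoff schedule (jitter not modelled).
final class BackoffSchedule {
    private BackoffSchedule() {}

    /** Nominal delay before each retry: min, 2*min, 4*min, ... never exceeding max. */
    static List<Duration> nominalDelays(int maxRetries, Duration min, Duration max) {
        List<Duration> delays = new ArrayList<>();
        Duration next = min;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            delays.add(next.compareTo(max) > 0 ? max : next);
            // Stop doubling once the cap is reached to avoid overflow on long schedules
            next = next.compareTo(max) >= 0 ? max : next.multipliedBy(2);
        }
        return delays;
    }
}
```

With the values above the schedule is 500 ms, 1 s, and 2 s, which adds up to 3.5 seconds of waiting before the final attempt. That leaves little room inside a 5-second wait on the correlation context, so a tool that is frequently rate limited may return the fallback response before its retries finish.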

Error: 500 Internal Server Error (Unhandled Exception)

What causes it: Null pointer exceptions, JSON parsing failures, or missing environment variables.
How to fix it: Wrap the entire handler in a global exception resolver. Log the stack trace to your observability platform. Return a generic fallback to prevent LLM hallucination loops.
Code showing the fix:

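// Declare this in the controller or in a @RestControllerAdvice class
@ExceptionHandler(Exception.class)
public ResponseEntity<Map<String, String>> handleGenericError(Exception e) {
    // Log e to your observability platform here; the caller only sees the generic fallback
    return responseHandler.buildResponse("unknown", "Internal processing error. Contact support.", false);
}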
