Orchestrating NICE Cognigy.AI LLM Tool Calling with Java
What You Will Build
A production-grade Java Spring Boot service that receives LLM function call payloads from the Cognigy.AI gateway, validates arguments against JSON Schema, executes downstream APIs with strict sanitization and timeout controls, manages asynchronous execution via correlation contexts, returns structured outputs for natural language synthesis, and records invocation metrics for cost attribution. This tutorial covers the complete request lifecycle using Spring WebFlux, WebClient, and Micrometer.
Prerequisites
- Cognigy.AI API key with the llm:tools:execute scope
- Java 17 or higher
- Spring Boot 3.2+
- Maven or Gradle
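- Dependencies: spring-boot-starter-web (the auth filter and CORS configuration use the Servlet API), spring-boot-starter-webflux (for WebClient), spring-boot-starter-actuator, com.networknt:json-schema-validator:1.0.87, com.fasterxml.jackson.core:jackson-databind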
Authentication Setup
Cognigy.AI routes LLM tool calls to a registered webhook endpoint. The platform authenticates incoming requests using the X-Cognigy-AI-API-Key header. You must configure this header in the Cognigy.AI LLM agent settings and validate it on ingress. The llm:tools:execute scope is required for the gateway to invoke your endpoint.
import org.springframework.web.filter.OncePerRequestFilter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;
public class CognigyAuthFilter extends OncePerRequestFilter {
private final String expectedApiKey;
public CognigyAuthFilter(String expectedApiKey) {
this.expectedApiKey = expectedApiKey;
}
@Override
protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response, FilterChain filterChain)
throws ServletException, IOException {
String apiKey = request.getHeader("X-Cognigy-AI-API-Key");
if (apiKey == null || !apiKey.equals(expectedApiKey)) {
response.setStatus(HttpServletResponse.SC_UNAUTHORIZED);
response.setContentType("application/json");
response.getWriter().write("{\"error\": \"Invalid or missing X-Cognigy-AI-API-Key header. Scope llm:tools:execute required.\"}");
return;
}
filterChain.doFilter(request, response);
}
}
This filter rejects unauthenticated traffic with 401 before any payload processing takes place. Cognigy.AI will retry failed webhooks with exponential backoff, so an early rejection keeps those retries cheap. Keep the expected API key out of source control: store it in application.yml under cognigy.ai.webhook-secret and inject it via @Value, or read it from an environment variable as the complete example does.
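The filter compares the header with String.equals, which can return as soon as it finds a mismatching character. As a minimal sketch of a constant-time alternative, the helper below uses the JDK's MessageDigest.isEqual. The class name ApiKeyMatcher and its method are illustrative assumptions, not part of any Cognigy.AI or Spring API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical helper: compares the presented API key with the expected one.
final class ApiKeyMatcher {
    private ApiKeyMatcher() {}

    // Returns true only when both keys are present and byte-for-byte equal.
    static boolean matches(String presented, String expected) {
        if (presented == null || expected == null) {
            return false;
        }
        byte[] a = presented.getBytes(StandardCharsets.UTF_8);
        byte[] b = expected.getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual does not stop at the first differing byte.
        return MessageDigest.isEqual(a, b);
    }

    public static void main(String[] args) {
        System.out.println(matches("secret-123", "secret-123")); // true
        System.out.println(matches("secret-124", "secret-123")); // false
        System.out.println(matches(null, "secret-123"));         // false
    }
}
```

Inside doFilterInternal, the condition could then read `if (!ApiKeyMatcher.matches(apiKey, expectedApiKey))`, which also covers the missing-header case.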
Implementation
Step 1: Parse LLM Gateway Request and Validate JSON Schema
The Cognigy.AI gateway sends a JSON payload containing the tool_call_id, name, and arguments. You must validate arguments against a pre-defined JSON Schema to prevent malformed data from reaching downstream services. Schema validation occurs before any business logic executes.
import com.networknt.schema.JsonSchema;
import com.networknt.schema.JsonSchemaFactory;
import com.networknt.schema.SpecVersion;
import com.networknt.schema.ValidationMessage;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.InputStream;
import java.util.Set;
import java.util.stream.Collectors;
public class ToolSchemaValidator {
private final JsonSchema schema;
private final ObjectMapper mapper;
public ToolSchemaValidator(String schemaResourcePath, ObjectMapper mapper) throws Exception {
this.mapper = mapper;
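InputStream is = ToolSchemaValidator.class.getResourceAsStream(schemaResourcePath);
if (is == null) {
// Fail fast with a clear message instead of an obscure error from the schema factory
throw new IllegalStateException("Schema resource not found on classpath: " + schemaResourcePath);
}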
this.schema = JsonSchemaFactory.getInstance(SpecVersion.VersionFlag.V7)
.getSchema(is);
}
public void validate(Object arguments) throws Exception {
Set<ValidationMessage> errors = schema.validate(mapper.valueToTree(arguments));
if (!errors.isEmpty()) {
String message = errors.stream()
.map(ValidationMessage::getMessage)
.collect(Collectors.joining("; "));
throw new IllegalArgumentException("Schema validation failed: " + message);
}
}
}
The com.networknt library loads Draft-7 schemas directly from classpath resources. Place a file such as schemas/get_order_status.json in src/main/resources. The validator throws an IllegalArgumentException on mismatch, which the controller catches and returns as an error string in the output field so the LLM can correct its arguments. Schema validation rejects unexpected object shapes early and reduces downstream parsing errors.
Step 2: Sanitize Parameters and Execute Backend Services
LLM-generated arguments often contain trailing whitespace, control characters, or unexpected casing. You must sanitize inputs before sending them to internal APIs. This step also establishes the WebClient call with explicit timeouts and retry logic for 429 responses.
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.reactive.function.client.WebClientResponseException;
import reactor.core.publisher.Mono;
import java.util.regex.Pattern;
public class BackendServiceExecutor {
private final WebClient webClient;
private static final Pattern DANGEROUS_CHARS = Pattern.compile("[^a-zA-Z0-9@._\\- ]");
public BackendServiceExecutor(WebClient.Builder builder, String baseUrl) {
this.webClient = builder.baseUrl(baseUrl).build();
}
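public String sanitize(String input) {
if (input == null) return null;
// Drop every character outside the allow-list, then trim surrounding whitespace
String cleaned = DANGEROUS_CHARS.matcher(input).replaceAll("").trim();
// Truncate to 250 chars
return cleaned.length() > 250 ? cleaned.substring(0, 250) : cleaned;
}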
public Mono<String> executeTool(String endpoint, Object sanitizedPayload) {
return webClient.post()
.uri(endpoint)
.header("Content-Type", "application/json")
.bodyValue(sanitizedPayload)
.retrieve()
.bodyToMono(String.class)
.timeout(java.time.Duration.ofSeconds(5))
.retryWhen(reactor.util.retry.Retry
.backoff(3, java.time.Duration.ofMillis(500))
.maxBackoff(java.time.Duration.ofSeconds(2))
.filter(throwable -> throwable instanceof WebClientResponseException e && e.getStatusCode().value() == 429)
);
}
}
The sanitize method removes every character outside an allow-list (letters, digits, @, ., _, -, and space), trims whitespace, and truncates the result to 250 characters. This reduces the risk of log injection and of oversized values reaching downstream systems. The WebClient call enforces a 5-second timeout per attempt and retries 429 Too Many Requests responses with exponential backoff. The retry filter matches only 429, so 5xx responses and timeouts are not retried, because retrying server faults usually wastes resources.
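To make the sanitization rules concrete, here is a small standalone sketch that applies the same allow-list pattern and 250-character limit to a few sample inputs. The class name SanitizeDemo and the sample strings are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Standalone sketch of the allow-list sanitizer described above.
final class SanitizeDemo {
    // Same character class as in BackendServiceExecutor: anything not listed is removed.
    private static final Pattern DISALLOWED = Pattern.compile("[^a-zA-Z0-9@._\\- ]");
    private static final int MAX_LENGTH = 250;

    private SanitizeDemo() {}

    static String sanitize(String input) {
        if (input == null) return null;
        String cleaned = DISALLOWED.matcher(input).replaceAll("").trim();
        return cleaned.length() > MAX_LENGTH ? cleaned.substring(0, MAX_LENGTH) : cleaned;
    }

    public static void main(String[] args) {
        // Control characters and surrounding whitespace are removed.
        System.out.println(sanitize("  ORD-1042\r\n "));                   // ORD-1042
        // Only the semicolon is stripped; the remaining words pass through.
        System.out.println(sanitize("jane.doe@example.com; DROP TABLE")); // jane.doe@example.com DROP TABLE
        // Non-ASCII letters are outside the allow-list and are dropped.
        System.out.println(sanitize("M\u00fcller"));                       // Mller
        // Long values are cut to the limit.
        System.out.println(sanitize("x".repeat(300)).length());            // 250
    }
}
```

Two limitations are visible in the output. The filter removes characters, not meaning, so downstream services still need parameterized queries. It also drops accented letters, which may be too strict for names or addresses.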
Step 3: Maintain Correlation Contexts for Asynchronous Responses
Cognigy.AI expects a synchronous HTTP response, but the actual tool execution may span multiple downstream calls or require background processing. You maintain a correlation context using CompletableFuture and a thread-safe map keyed by tool_call_id. The request handler can then wait on the future, with a deadline, while the downstream call completes on another thread.
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
public class CorrelationContextManager {
private final ConcurrentHashMap<String, CompletableFuture<String>> pendingContexts = new ConcurrentHashMap<>();
private static final int MAX_CONTEXTS = 10000;
public CompletableFuture<String> createContext(String toolCallId) {
if (pendingContexts.size() >= MAX_CONTEXTS) {
throw new IllegalStateException("Correlation context limit reached. Clear stale entries.");
}
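// orTimeout (Java 9+) fails the future after 30 seconds so abandoned calls do not stay pending forever
return pendingContexts.computeIfAbsent(toolCallId, id -> new CompletableFuture<String>().orTimeout(30, TimeUnit.SECONDS));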
}
public void resolveContext(String toolCallId, String result) {
CompletableFuture<String> future = pendingContexts.remove(toolCallId);
if (future != null) {
future.complete(result);
}
}
public void failContext(String toolCallId, Throwable error) {
CompletableFuture<String> future = pendingContexts.remove(toolCallId);
if (future != null) {
future.completeExceptionally(error);
}
}
public void cleanupStaleContexts() {
// Non-blocking sweep: drop entries whose future has already completed, normally or exceptionally.
// Calling get(30, SECONDS) here would block the scheduler for up to 30 seconds per pending entry.
pendingContexts.values().removeIf(CompletableFuture::isDone);
}
}
The ConcurrentHashMap stores pending futures. When the downstream call completes, resolveContext fulfills the future and removes the entry. This pattern decouples the HTTP response lifecycle from the tool execution lifecycle. In production, replace the in-memory map with Redis HSET and HGET to survive pod restarts and distribute state across replicas. The cleanupStaleContexts method runs via a scheduled task to prevent memory leaks from abandoned calls.
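One way to keep abandoned calls from accumulating is to give each future a deadline and let a periodic sweep remove whatever has finished. The sketch below is a simplified, standalone variant of the manager. The class name DeadlineContexts, the configurable TTL, and the sweep method are assumptions for illustration, and the in-memory map still does not survive restarts.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical variant of the correlation manager with a per-context deadline.
final class DeadlineContexts {
    private final ConcurrentHashMap<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final long ttlMillis;

    DeadlineContexts(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Registers a future that fails with TimeoutException once the TTL elapses.
    CompletableFuture<String> create(String toolCallId) {
        return pending.computeIfAbsent(toolCallId,
                id -> new CompletableFuture<String>().orTimeout(ttlMillis, TimeUnit.MILLISECONDS));
    }

    // Completes and removes the context; a late or unknown id is ignored.
    void resolve(String toolCallId, String result) {
        CompletableFuture<String> future = pending.remove(toolCallId);
        if (future != null) {
            future.complete(result);
        }
    }

    // Removes every finished context (completed, failed, or timed out) without blocking.
    // The returned count is approximate under concurrent modification.
    int sweep() {
        int before = pending.size();
        pending.values().removeIf(CompletableFuture::isDone);
        return before - pending.size();
    }

    int size() {
        return pending.size();
    }
}
```

A scheduled task, for example a method annotated with @Scheduled, could call sweep() every few seconds. Because isDone never blocks, the sweep cost stays proportional to the number of entries.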
Step 4: Return Structured Results and Log Invocation Metrics
Cognigy.AI requires a specific JSON structure for tool responses: {"tool_call_id": "...", "output": "..."}. You must format the result exactly, handle fallback responses when timeouts occur, and record metrics for cost attribution. Micrometer provides the Timer, Counter, and DistributionSummary meter types, all registered through a MeterRegistry.
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.Counter;
import org.springframework.http.ResponseEntity;
import java.time.Duration;
import java.util.Map;
public class ToolResponseHandler {
private final MeterRegistry meterRegistry;
private final Counter toolInvocationCounter;
private final Timer toolExecutionTimer;
public ToolResponseHandler(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
this.toolInvocationCounter = Counter.builder("cognigy.tool.invocations")
.tag("status", "success")
.register(meterRegistry);
this.toolExecutionTimer = Timer.builder("cognigy.tool.execution.duration")
.publishPercentileHistogram()
.register(meterRegistry);
}
public ResponseEntity<Map<String, String>> buildResponse(String toolCallId, String output, boolean success) {
Map<String, String> body = Map.of(
"tool_call_id", toolCallId,
"output", output
);
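if (!success) {
// A registered Counter has fixed tags, so look up (or create) the series tagged status=fallback
meterRegistry.counter("cognigy.tool.invocations", "status", "fallback").increment();
} else {
toolInvocationCounter.increment();
}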
return ResponseEntity.ok(body);
}
public void recordMetrics(Duration duration, String toolName, boolean success) {
toolExecutionTimer.record(duration);
meterRegistry.counter("cognigy.tool.cost.attribution",
"tool", toolName,
"status", success ? "success" : "fallback")
.increment();
}
}
The response handler constructs the exact JSON structure Cognigy.AI expects for natural language synthesis. The output field contains the plain text or structured string the LLM will parse. Metrics track invocation counts, execution duration percentiles, and cost attribution tags. You export these metrics to Prometheus or Datadog via Spring Boot Actuator. The fallback counter increments when timeouts or schema errors force a default response.
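Map.of throws a NullPointerException for null keys or values, so a payload without tool_call_id would crash buildResponse before a fallback can be returned. As a sketch of a null-safe body builder, the helper below substitutes placeholders. The class name ToolResponseBody and the fallback text are illustrative assumptions; the "unknown" identifier mirrors the one used by the generic error handler in this tutorial.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds the {"tool_call_id", "output"} body without failing on nulls.
final class ToolResponseBody {
    static final String FALLBACK_OUTPUT = "Tool execution failed. Please retry.";

    private ToolResponseBody() {}

    static Map<String, String> of(String toolCallId, String output) {
        // LinkedHashMap keeps the field order stable when serialized.
        Map<String, String> body = new LinkedHashMap<>();
        body.put("tool_call_id", toolCallId == null ? "unknown" : toolCallId);
        body.put("output", (output == null || output.isBlank()) ? FALLBACK_OUTPUT : output);
        return body;
    }

    public static void main(String[] args) {
        System.out.println(of("call_1", "Order shipped")); // {tool_call_id=call_1, output=Order shipped}
        System.out.println(of(null, null));
    }
}
```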
Complete Working Example
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;
import org.springframework.web.servlet.config.annotation.CorsRegistry;
import io.micrometer.core.instrument.MeterRegistry;
import com.fasterxml.jackson.databind.ObjectMapper;
import jakarta.servlet.Filter;
import java.util.Map;
@SpringBootApplication
public class CognigyToolOrchestratorApplication implements WebMvcConfigurer {
public static void main(String[] args) {
SpringApplication.run(CognigyToolOrchestratorApplication.class, args);
}
@Bean
public Filter cognigyAuthFilter() {
return new CognigyAuthFilter(System.getenv("COGNIGY_API_KEY"));
}
@Bean
public ToolSchemaValidator schemaValidator(ObjectMapper mapper) throws Exception {
return new ToolSchemaValidator("/schemas/default_tool.json", mapper);
}
@Bean
public BackendServiceExecutor backendExecutor(WebClient.Builder builder) {
return new BackendServiceExecutor(builder, "https://internal-api.example.com");
}
@Bean
public CorrelationContextManager correlationManager() {
return new CorrelationContextManager();
}
@Bean
public ToolResponseHandler responseHandler(MeterRegistry meterRegistry) {
return new ToolResponseHandler(meterRegistry);
}
@Override
public void addCorsMappings(CorsRegistry registry) {
registry.addMapping("/api/v1/llm/tool-callback")
.allowedOrigins("https://app.cognigy.ai")
.allowedMethods("POST")
.allowedHeaders("X-Cognigy-AI-API-Key", "Content-Type");
}
}
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
@RestController
@RequestMapping("/api/v1/llm/tool-callback")
public class ToolCallbackController {
private final ToolSchemaValidator schemaValidator;
private final BackendServiceExecutor backendExecutor;
private final CorrelationContextManager correlationManager;
private final ToolResponseHandler responseHandler;
private final ObjectMapper mapper;
public ToolCallbackController(ToolSchemaValidator schemaValidator,
BackendServiceExecutor backendExecutor,
CorrelationContextManager correlationManager,
ToolResponseHandler responseHandler,
ObjectMapper mapper) {
this.schemaValidator = schemaValidator;
this.backendExecutor = backendExecutor;
this.correlationManager = correlationManager;
this.responseHandler = responseHandler;
this.mapper = mapper;
}
@PostMapping
public ResponseEntity<Map<String, String>> handleToolCall(@RequestBody Map<String, Object> payload) {
String toolCallId = (String) payload.get("tool_call_id");
String toolName = (String) payload.get("name");
Object arguments = payload.get("arguments");
long start = System.nanoTime();
try {
schemaValidator.validate(arguments);
} catch (IllegalArgumentException e) {
return responseHandler.buildResponse(toolCallId, "Invalid arguments: " + e.getMessage(), false);
}
Map<String, Object> sanitized = sanitizeMap(arguments);
CompletableFuture<String> context = correlationManager.createContext(toolCallId);
backendExecutor.executeTool("/v1/exec", sanitized)
.subscribe(
result -> correlationManager.resolveContext(toolCallId, result),
error -> correlationManager.failContext(toolCallId, error)
);
try {
String output = context.get(5, java.util.concurrent.TimeUnit.SECONDS);
Duration duration = Duration.ofNanos(System.nanoTime() - start);
responseHandler.recordMetrics(duration, toolName, true);
return responseHandler.buildResponse(toolCallId, output, true);
} catch (Exception e) {
Duration duration = Duration.ofNanos(System.nanoTime() - start);
responseHandler.recordMetrics(duration, toolName, false);
return responseHandler.buildResponse(toolCallId, "Tool execution timed out or failed. Please retry.", false);
}
}
private Map<String, Object> sanitizeMap(Object arguments) {
if (arguments instanceof Map) {
Map<String, Object> map = (Map<String, Object>) arguments;
map.replaceAll((k, v) -> v instanceof String ? backendExecutor.sanitize((String) v) : v);
return map;
}
return Map.of();
}
}
This controller receives the Cognigy.AI payload, validates the schema, sanitizes string values, dispatches the backend call asynchronously, and waits on the correlation context for up to 5 seconds. If the context resolves, it returns the result and records metrics. If it times out or fails, it returns a structured fallback response and records a failure metric. The downstream call itself is non-blocking, but context.get() blocks the request thread while it waits. That is acceptable on a servlet thread pool, but it must never run on a WebFlux event-loop thread.
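If holding a request thread for the full wait is a concern, the timeout and the fallback can be composed on the future itself, so that no thread blocks. The sketch below uses only JDK CompletableFuture operations (Java 9+). The class name NonBlockingFallback and the method withFallback are assumptions for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: derives a future that always completes with a usable output string.
final class NonBlockingFallback {
    static final String FALLBACK = "Tool execution timed out or failed. Please retry.";

    private NonBlockingFallback() {}

    static CompletableFuture<String> withFallback(CompletableFuture<String> context, long timeoutMillis) {
        return context
                // copy() yields a dependent future, so the timeout below never completes the original.
                .copy()
                // If no result arrives in time, complete the copy with the fallback text.
                .completeOnTimeout(FALLBACK, timeoutMillis, TimeUnit.MILLISECONDS)
                // If the downstream call failed, substitute the same fallback text.
                .exceptionally(error -> FALLBACK);
    }

    public static void main(String[] args) {
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();
        System.out.println(withFallback(neverCompleted, 100).join()); // prints the fallback text
    }
}
```

Spring controllers can return a CompletableFuture, so the handler could map the derived future to a ResponseEntity with thenApply instead of calling get(). Treat that wiring as a direction to explore, not as tested code for this service.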
Common Errors & Debugging
Error: 400 Bad Request (Schema Validation Failure)
What causes it: The LLM gateway sends arguments that do not match the JSON Schema definition. Missing required fields, wrong data types, or invalid enum values trigger this.
How to fix it: Align the Cognigy.AI tool definition with your schema. Use schemaValidator.validate() to catch mismatches early. Return a descriptive error in the output field so the LLM can self-correct.
Code showing the fix:
try {
schemaValidator.validate(arguments);
} catch (IllegalArgumentException e) {
// Returns 200 with error string in output, allowing LLM to retry
return responseHandler.buildResponse(toolCallId, "Schema mismatch: " + e.getMessage(), false);
}
Error: 408 Request Timeout (Correlation Context Expiry)
What causes it: The downstream service takes longer than 5 seconds to respond. The context.get() call throws TimeoutException.
How to fix it: Increase the timeout for non-critical tools or implement streaming responses. For synchronous callbacks, return a fallback message and let Cognigy.AI retry. Monitor cognigy.tool.execution.duration percentiles to identify slow endpoints.
Code showing the fix:
try {
String output = context.get(10, java.util.concurrent.TimeUnit.SECONDS); // Extended timeout
return responseHandler.buildResponse(toolCallId, output, true);
} catch (java.util.concurrent.TimeoutException e) {
return responseHandler.buildResponse(toolCallId, "Service processing delay. Retrying.", false);
}
Error: 429 Too Many Requests (Rate Limit Cascade)
What causes it: The downstream API enforces rate limits. Without retry logic, the tool fails immediately.
How to fix it: The WebClient retry policy automatically handles 429 responses with exponential backoff. Ensure your Cognigy.AI agent configures reasonable concurrency limits to avoid overwhelming the webhook.
Code showing the fix:
.retryWhen(reactor.util.retry.Retry
.backoff(3, java.time.Duration.ofMillis(500))
.maxBackoff(java.time.Duration.ofSeconds(2))
.filter(throwable -> throwable instanceof WebClientResponseException e && e.getStatusCode().value() == 429)
)
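To see how much waiting this policy can add, the sketch below computes the nominal delay before each retry for 3 attempts, a 500 ms starting delay, and a 2-second cap. It assumes plain doubling and ignores the random jitter that Reactor applies by default, so real delays will vary around these values. The class name BackoffSchedule is an illustrative assumption.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Illustrative calculation of capped exponential backoff delays (jitter ignored).
final class BackoffSchedule {
    private BackoffSchedule() {}

    static List<Duration> nominalDelays(int maxRetries, Duration minBackoff, Duration maxBackoff) {
        List<Duration> delays = new ArrayList<>();
        Duration next = minBackoff;
        for (int i = 0; i < maxRetries; i++) {
            // Cap each delay at maxBackoff, then double the uncapped value for the next retry.
            delays.add(next.compareTo(maxBackoff) > 0 ? maxBackoff : next);
            next = next.multipliedBy(2);
        }
        return delays;
    }

    public static void main(String[] args) {
        List<Duration> delays = nominalDelays(3, Duration.ofMillis(500), Duration.ofSeconds(2));
        System.out.println(delays); // [PT0.5S, PT1S, PT2S]
        long totalMillis = delays.stream().mapToLong(Duration::toMillis).sum();
        System.out.println(totalMillis); // 3500
    }
}
```

The nominal waits alone add up to 3.5 seconds, which is most of the 5-second window the controller allows. A call that is rate-limited more than once is therefore likely to end in the fallback response unless the controller timeout is raised or the retry count is lowered.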
Error: 500 Internal Server Error (Unhandled Exception)
What causes it: Null pointer exceptions, JSON parsing failures, or missing environment variables.
How to fix it: Wrap the entire handler in a global exception resolver. Log the stack trace to your observability platform. Return a generic fallback to prevent LLM hallucination loops.
Code showing the fix:
// Declare this in the controller or in a @RestControllerAdvice class, and log e before returning
@ExceptionHandler(Exception.class)
public ResponseEntity<Map<String, String>> handleGenericError(Exception e) {
return responseHandler.buildResponse("unknown", "Internal processing error. Contact support.", false);
}