NLU Confidence Drops During JMeter Concurrent Load Test

I am stuck on a strange degradation in bot accuracy that appears when I move from single-threaded testing to high-concurrency load testing.

The environment is Genesys Cloud (US1), using the standard Conversations API endpoints for simulation. The bot is built in Architect with a simple intent structure.

When I run a JMeter script with 10 concurrent users, the NLU returns high confidence scores (0.9+) for all intents. The latency is acceptable, around 200ms. However, when I increase the thread count to 100 concurrent users, sending 50 requests per second, the NLU behavior changes significantly.
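As a sanity check on those numbers, Little's law (average in-flight requests = throughput × latency) gives a rough idea of how many requests are actually open at once, which is not the same thing as the JMeter thread count. The sketch below is plain arithmetic. It assumes the 50 requests per second is the aggregate rate across all threads, and it reuses the 200 ms latency from the 10-user run, which may not hold at 100 threads.

```python
def avg_in_flight(throughput_rps: float, latency_ms: float) -> float:
    """Little's law: average number of requests in flight at once."""
    return throughput_rps * latency_ms / 1000


def per_thread_rate(throughput_rps: float, threads: int) -> float:
    """Requests per second each thread sends if load is spread evenly."""
    return throughput_rps / threads


if __name__ == "__main__":
    # Assumed inputs: 50 req/s aggregate, 200 ms latency, 100 threads.
    print(avg_in_flight(50, 200))    # 10.0
    print(per_thread_rate(50, 100))  # 0.5
```

If latency really stayed near 200 ms, only about 10 requests would be in flight on average even with 100 threads, so the measured latency under load is worth recording alongside the scores.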

Many requests return a low confidence score (0.4) or fail to match any intent, even though the input text is identical and correct. The API still responds with HTTP 200, but the nluScore in the payload is unexpectedly low. No 429 rate-limit errors are returned, which is confusing.
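To put a number on "many requests", the response bodies can be saved and tallied offline. This is a minimal sketch, assuming each saved line is one JSON response body with a top-level nluScore field (the nesting in the real payload may differ). The 0.5 cut-off and the function names are hypothetical, chosen only for illustration.

```python
import json
from collections import Counter

LOW_CONFIDENCE = 0.5  # hypothetical cut-off, not a platform value


def bucket(score):
    """Label one score as 'missing', 'low', or 'ok'."""
    if score is None:
        return "missing"
    return "low" if score < LOW_CONFIDENCE else "ok"


def tally_scores(json_lines):
    """Count responses per bucket from an iterable of JSON strings.

    Assumes a top-level 'nluScore' key; adjust the lookup if nested.
    """
    counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the dump
        payload = json.loads(line)
        counts[bucket(payload.get("nluScore"))] += 1
    return counts


if __name__ == "__main__":
    sample = ['{"nluScore": 0.93}', '{"nluScore": 0.4}', "{}"]
    print(dict(tally_scores(sample)))
```

Running the same tally on the 10-user and 100-user dumps gives a like-for-like comparison of how often the score drops or goes missing.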

I have checked the Architect trace logs. The flow logic is not the issue; the problem happens before the routing decision. It seems like the NLU engine itself is struggling under the load, or there is some caching mechanism that is being invalidated too quickly.

I am using the standard REST API for simulation, not the WebSocket interface. The request body includes the conversationId and the message text. No custom entities are involved, only system intents and basic keywords.

Is there a known throughput limit for the NLU inference engine that causes accuracy drops? Or is this a configuration issue in the JMeter script? I have tried adding a small delay between requests, but the issue persists as long as the concurrency is above 50.
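To check whether the drop really tracks concurrency rather than elapsed test time, the scores can be grouped by how many requests were in flight when each one started. The sketch below assumes each sample is a (start_ms, elapsed_ms, score) tuple, for example the timeStamp and elapsed columns from a JMeter CSV results file joined with a score extracted from the response. The join step and the function names are hypothetical. The counting is quadratic in the number of samples, which is fine for a short run but slow for very large result files.

```python
from collections import defaultdict
from statistics import mean


def in_flight_at_start(samples):
    """Pair each score with the number of requests open when it started.

    samples: list of (start_ms, elapsed_ms, score) tuples.
    The count includes the request itself when elapsed_ms > 0.
    """
    pairs = []
    for start, _, score in samples:
        open_now = sum(1 for s, e, _ in samples if s <= start < s + e)
        pairs.append((open_now, score))
    return pairs


def mean_score_by_load(samples, bucket_size=10):
    """Mean score per in-flight bucket (0-9, 10-19, ... by default)."""
    groups = defaultdict(list)
    for open_now, score in in_flight_at_start(samples):
        groups[(open_now // bucket_size) * bucket_size].append(score)
    return {k: mean(v) for k, v in sorted(groups.items())}


if __name__ == "__main__":
    demo = [(0, 100, 0.9), (50, 100, 0.4), (500, 100, 0.9)]  # made-up data
    print(mean_score_by_load(demo, bucket_size=1))
```

If the mean score falls steadily as the in-flight count rises, that points at load on the service side; if it is flat, the JMeter script or the test data becomes the more likely suspect.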

How can I ensure consistent NLU confidence scores during high-concurrency load tests without hitting hidden throttling limits?