open source · MIT

Your agent said
it sent the email.
It didn't.

Catch hallucinated tool-call arguments before they hit production. Auto-repair in a single round-trip, with under a millisecond of overhead.

pip install cruxialcopied! · Python 3.10+ · MIT

cruxial · intercepting live
any LLM
OpenAI Anthropic Azure OpenAI LiteLLM
Native adapters for the first three. LiteLLM unlocks 100+ more providers (Gemini, Mistral, Cohere, Bedrock, Groq, Together, …).
any MCP
github slack notion salesforce airtable supabase kubernetes playwright
Plug in any MCP-compliant server — public, private, or one you wrote this morning.
0
silent passes across 342 live LLM tool calls
Across 2 runs · Wilson 95% CI: 0–1.1%
5.85%
schema-violation rate caught on Azure gpt-4o
Pooled across 342 calls · 95% CI: 3.8–8.9%
877
real production schemas validated synthetically
52 public MCP servers · 26 domains
<1ms
p99 cruxial overhead per call
243 tests · MIT licensed · fail-open
numbers, not vibes

Real runs. Public servers.
Clone it and get your own number.

Real benchmarks on 51 public MCP servers. Every number below is reproducible from the repo.

Benchmark Model Intercept rate One-shot repair Sample
Real public MCP servers, pooled across 2 runs
github · kubernetes · notion · salesforce · airtable · slack · ms-teams · atlassian · playwright · firecrawl · hubspot · zendesk · supabase · +38 more
Azure gpt-4o 5.85%95% CI 3.8 – 8.9% 90.0%18 / 20 intercepts 342 calls
51 servers
603 tools
352 prompts
Constraint-heavy production schemas
enums · formats · regex · nested objects · datetime ranges
Azure gpt-4o 17.1% 66.7% 15 tools
70 prompts
Same schemas, smaller model
what changing model tier alone does on the same prompts
Azure gpt-5-mini-2 1.4%92% fewer than gpt-4o 100% 74 calls
same 15 tools
Simple-schema control group
filesystem, memory, time, fetch (the easy half of MCP)
Azure gpt-4o 0.0% 7 servers
25 prompts
Robustness audit (no LLM)
synthetic violation classification across the full real-world schema corpus
classifier only 100%rejection · 0 silent passes 877 real schemas
1,947 violations
"
Zero silent passes across 342 live LLM tool calls. Twice. Same corpus, same result on the one metric that matters for a validation layer.
pooled across 2 independent runs · gpt-4o · 51 real public MCP servers · Wilson 95% upper bound: 1.1%
reproduce yourself pip install cruxial && python examples/azure_mcp_suite.pycopied!
the problem

You've shipped the integration.
It works in staging. Then production.

Three weeks later a customer tells you the email was never sent. Your logs show HTTP 200.

01 · wrong arguments
hallucination

The model invents its own arguments

It passes "sample_id" instead of the real ID. Or an integer where an email is required. The tool fails silently. The model writes "done." Your user never gets what they asked for.

02 · regressions
the fix loop

Every fix breaks something else

You patch the retry logic. The integration works. Two days later a different tool fails. You spend another afternoon on a problem that should be solved once — at the layer level, not per-feature.

03 · silent errors
no signal

Failures your logs will never show

The agent wrote "done" without calling the tool. No error. No log entry. HTTP 200. You find out three days later from a customer — after the damage is already done.

integration

Add it in 30 seconds.
Remove it just as fast.

One import. Every tool call validated from that moment on. Nothing to configure.

before — failing silently
# your existing code               
response = client.chat(
  model="gpt-4o",
  tools=my_tools,
  messages=messages
)

# wrong args pass through.
# tool never called.
# user finds out last.
after — caught before it fires
# one import, nothing else changes
from cruxial import guard

response = client.chat(
  model="gpt-4o",
  tools=guard(my_tools),
  messages=messages
)

# validated before execution.
# bad args corrected + retried.
# every failure logged.
mechanics

Five things happen
on every tool call.

Valid calls pass through in under 50ms. Invalid calls get caught, corrected, and logged before the tool ever fires.

01
Intercept
always · zero overhead
02
Validate
before execution
03
Correct
on failure
04
Retry
on failure
05
Log
always
01 / 05
Intercept
Wraps your tool list in one function call. Sits between the model response and your execution layer. Zero changes to your tool definitions, prompts, or app logic.
guard(my_tools) → validation layer → execution
interception log · 14ms ago
{
  "tool": "send_email",
  "status": "intercepted",
  "original_args": { "recipient": 12345, "subject": "Follow up" },
  "failure": "type_error: recipient must be email string, got int",
  "retry_args": { "recipient": "[email protected]", "subject": "Follow up" },
  "retry_outcome": "success",
  "latency_added_ms": 340
}
what the community is reporting

It's not just us saying this.

Three numbers from independent research. One line from every developer who's shipped an agent.

~0%
GPT-4's success rate on real-world multi-step agent tasks. Most failures trace back to tool calls, not reasoning.
τ-bench · Anthropic + Princeton · 2024 · arxiv 2406.12045
0–19%
Accuracy drop for top frontier models when the user simply rephrases the same request.
Berkeley Function Calling Leaderboard · gorilla.cs.berkeley.edu
0%
Schema reliability OpenAI added in Structured Outputs. They shipped it because the unenforced baseline wasn't good enough.
openai.com · Structured Outputs launch · Aug 2024
"
The hardest part of running an LLM agent in production isn't getting it to call the right tool. It's everything after: the call succeeds, returns 200, and silently does the wrong thing.
recurring sentiment across developer discussions · Hacker News · 2024–2025
deployment models

Open Core for dev.
Cloud for scale.

The SDK is free and open-source. The hosted dashboard is what we manage.

cruxial core
Free
MIT licensed · Self-Hosted
  • JSON Schema validation on every tool call
  • Auto-repair via one corrective round-trip to the model
  • Fail-open — never crashes your app
  • Local SQLite telemetry + cruxial stats CLI
View GitHub Repository →
cruxial cloud
Waitlist
Fully Managed Middleware Service
  • Everything in Cruxial Core
  • Hosted dashboard for cross-app interception analytics
  • Schema drift detection across your tool registry
  • Slack, webhook, and PagerDuty alerts

We'll only email you about Cruxial Cloud. No newsletters.

Know your interception rate
in 5 minutes.

Free forever. No credit card. Works on your existing code without changes.

pip install cruxial → copied!