Black-Box LLM Forensics

Detect Reasoning Compromise Before It Becomes a Liability

The only system that audits LLM output stability from text alone. No logits. No embeddings. No weights. No runtime access.

Current safety tools filter inputs or require white-box access. We detect when reasoning itself has been destabilized—mapping the full adversarial kill chain from RECON to SUSTAIN, on any model.

252

Reasoning Collapses Detected

31%

Policy Adherence Observed

64-bit

Semantic Precision

Model Access Required

Explore Platform → For Developers

The Blind Spot

Your Safety Stack Has a Blind Spot the Size of the Reasoning Layer

When an LLM produces bad output, you can't tell whether the cause was a single bad token, gradual drift, or sudden collapse. You're debugging blind, and the attack may already be in the SUSTAIN phase.

Input filters stop known signatures — not engineered drift

Input guardrails stop known attack signatures at the gate. They cannot detect reasoning drift that emerges mid-chain, after the guard has waved the request through. By that point the model is already compromised.

LLM-as-judge introduces a second attack surface

Asking a language model to audit a language model means the evaluator can be socially engineered by the output it's evaluating. You've added complexity, not safety.

White-box interpretability doesn't work on the models you actually run

TransformerLens requires weights you don't have. The models your organisation deploys — GPT, Claude, Gemini — are black boxes by design. Interpretability tools cannot touch them.

Market Gap

What Exists vs. What's Missing

Five categories of AI safety tools exist. None answer the critical question: was the model's reasoning destabilized?

Current Solutions

What the market offers

Input Filters: Miss engineered attacks

Output Filters: Content only, not reasoning

Interpretability: Requires white-box access

LLM-as-Judge: No stability metrics

Hallucination Detection: Facts only, not reasoning

NCF Audit Runtime

The missing layer

Semantic Likelihood: Token-level fit scoring

Stability Index: Coherence-velocity ratio

Alignment Gradient: Reasoning chain pressure

Black-Box Compatible: Any model, any vendor

Post-Hoc Forensics: Audit historical logs

Demonstrated Evidence

Multi-Agent LLM Audit — Reasoning Collapse Under Adversarial Load

A commercial LLM's output across 3 medium-complexity prompts was processed by NCF Audit Runtime v5. The audit detected sustained reasoning collapse invisible to standard safety tooling.

Target LLM (Multi-Agent, 3 Prompts) COMPROMISED

252

Reasoning Collapses

108

High-Variance Events

311

Instability Events

-0.276

Mean Stability

NCF Baseline (Stable Model) STABLE

Reasoning Collapses

High-Variance Events

Instability Events

-0.076

Mean Stability

For Development Teams

Observability for LLM Reasoning — Including Deliberation Depth

Distributed tracing gave microservices observability. NCF Audit gives LLM pipelines the same visibility, including deliberation fingerprinting, backtrack scoring, and Monte Carlo tree search (MCTS) pattern detection.

🔍

Reasoning Chain Debugging

Token-level visibility into WHERE reasoning collapsed, not just THAT it did. Stability Basin per token: STABLE / TRANSITIONAL / CHAOTIC.
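To make the per-token basin labels concrete, here is a minimal sketch of how a stability score might be mapped to STABLE, TRANSITIONAL, or CHAOTIC. The score scale, both thresholds, and the names `classify_basin` and `label_tokens` are assumptions made for illustration only; they are not NCF Audit's published definitions.

```python
from enum import Enum


class Basin(Enum):
    STABLE = "STABLE"
    TRANSITIONAL = "TRANSITIONAL"
    CHAOTIC = "CHAOTIC"


# Hypothetical cut-offs, chosen only for this example.
# Assumption: scores closer to zero are more stable.
TRANSITIONAL_BELOW = -0.10
CHAOTIC_BELOW = -0.25


def classify_basin(score: float) -> Basin:
    """Map one per-token stability score to a basin label."""
    if score < CHAOTIC_BELOW:
        return Basin.CHAOTIC
    if score < TRANSITIONAL_BELOW:
        return Basin.TRANSITIONAL
    return Basin.STABLE


def label_tokens(scores):
    """Label every token score in a trace, preserving order."""
    return [classify_basin(s).value for s in scores]
```

Calling `label_tokens` on a list of per-token scores returns one label per token, which is the kind of data a stability heatmap could be drawn from.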

📊

Version Comparison

Quantifiable stability metrics across fine-tuning iterations. Did v2 improve or degrade? Measured, not guessed.
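A version comparison of this kind could look roughly like the sketch below, which compares mean stability between two fine-tuning iterations. It assumes that a higher (less negative) mean means a more stable model, and the function name `compare_versions` and the `tolerance` parameter are hypothetical.

```python
from statistics import mean


def compare_versions(scores_v1, scores_v2, tolerance=0.01):
    """Return (delta, verdict) for two lists of per-token stability scores.

    Assumption: a higher mean score indicates more stable reasoning.
    Differences smaller than `tolerance` are treated as noise.
    """
    delta = mean(scores_v2) - mean(scores_v1)
    if delta > tolerance:
        verdict = "improved"
    elif delta < -tolerance:
        verdict = "degraded"
    else:
        verdict = "unchanged"
    return round(delta, 3), verdict
```

In practice the two score lists would come from running both model versions on the same prompt set, so that the difference reflects the model and not the inputs.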

🧪

Deliberation Fingerprinting

Detect when a model is performing speculative search versus confident generation. Identify MCTS signatures in output geometry — no sampler access required.

🔗

Agent Handoff Integrity

Track semantic coherence across every agent boundary. Turbulence events at handoffs are measured, not inferred.

⚡

Cascade Failure Detection

Identify WHERE the chain broke when one agent's instability propagates downstream. Kill chain phase: RECON → PROBE → EXPLOIT → SUSTAIN.
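The four kill chain phases form an ordered progression, which a sketch like the following can represent. The numeric ordering and the helper `furthest_phase` are illustrative assumptions; how NCF Audit assigns a phase to a given event is not described here.

```python
from enum import IntEnum


class Phase(IntEnum):
    """Kill chain phases in order of escalation."""
    RECON = 1
    PROBE = 2
    EXPLOIT = 3
    SUSTAIN = 4


def furthest_phase(observed):
    """Return the most advanced phase seen, or None if nothing was observed."""
    return max(observed, default=None)
```

Reporting the furthest phase reached gives responders a single severity signal, even when events arrive out of order.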

🛡️

Adversarial Propagation Tracing

Trace prompt injection through your entire pipeline. Adversarial Risk Index and Gradient Force Signal isolate the injection point to a specific token.

✗ Without NCF Audit

Output is wrong
Check each agent's logs manually
Re-run with print statements
Guess which agent broke
Trial and error until fixed

✓ With NCF Audit

Output is wrong
Open stability heatmap
See: "Agent 3 collapsed at token 847"
Drill into Agent 3's reasoning trace
Fix the specific failure point
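The "which agent, which token" lookup in the workflow above can be sketched as a scan over per-agent stability traces. The trace format (agent name mapped to a list of per-token scores, in pipeline order), the collapse threshold, and the name `first_collapse` are all assumptions for this example.

```python
def first_collapse(traces, threshold=-0.25):
    """Find the earliest collapse in a multi-agent pipeline.

    traces: dict mapping agent name to a list of per-token stability
            scores, inserted in pipeline order (hypothetical format).
    threshold: scores below this value count as a collapse (assumed).
    Returns (agent_name, token_index), or None if no token collapsed.
    """
    for agent, scores in traces.items():
        for index, score in enumerate(scores):
            if score < threshold:
                return agent, index
    return None
```

Because Python dictionaries preserve insertion order, the first match is the most upstream agent whose reasoning crossed the threshold.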

Applications

Who Uses NCF Audit

From regulatory compliance to incident response, NCF Audit serves teams who need proof their AI behaved correctly.

📋

Compliance Teams

Cryptographically sealed audit trail per token: a SHA-256 state integrity hash for EU AI Act Article 9, NIST AI RMF GOVERN 1.1, and ISO/IEC 42001 clause 9.1.
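One common way to build a tamper-evident per-token trail is a hash chain, where each record's SHA-256 digest covers the previous record's digest. The sketch below shows that general technique using Python's standard `hashlib`; the record fields and the names `seal_tokens` and `verify_trail` are hypothetical, and this is not a description of NCF Audit's actual sealing format.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder digest for the first record


def _digest(index, token, score, prev):
    """SHA-256 over a canonical JSON encoding of one record."""
    payload = json.dumps(
        {"index": index, "token": token, "score": score, "prev": prev},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def seal_tokens(tokens_with_scores):
    """Build a hash-chained audit trail from (token, score) pairs."""
    records, prev = [], GENESIS
    for index, (token, score) in enumerate(tokens_with_scores):
        digest = _digest(index, token, score, prev)
        records.append({"index": index, "token": token, "score": score,
                        "prev": prev, "hash": digest})
        prev = digest
    return records


def verify_trail(records):
    """Recompute every digest; any edited record breaks the chain."""
    prev = GENESIS
    for r in records:
        expected = _digest(r["index"], r["token"], r["score"], prev)
        if r["prev"] != prev or r["hash"] != expected:
            return False
        prev = r["hash"]
    return True
```

Changing any token or score after sealing causes `verify_trail` to return `False`, which is the property an auditor would rely on.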

🔒

Security Operations

Full kill-chain reconstruction — RECON → PROBE → EXPLOIT → SUSTAIN — from output text alone. Detect successful jailbreaks without model access.

💼

Insurance Underwriters

Quantifiable risk scores for AI deployments. Stability Basin Classification, Adversarial Risk Index, and composite attack scores — all deterministic and repeatable.

🚨

Incident Response

Token-level forensic autopsy of historical LLM output. Pinpoint the exact token where reasoning collapsed and reconstruct the attack vector post-hoc.

Ready to see inside your LLM's reasoning?

Request a demonstration audit on your production outputs. We'll show you what your current tools are missing.

Request Audit →