Black-Box LLM Forensics

Detect Reasoning Compromise Before It Becomes a Liability

The only system that audits LLM output stability from text alone. No logits. No embeddings. No weights. No runtime access.

Current safety tools filter inputs or require white-box access. We detect when reasoning itself has been destabilized, mapping the full adversarial kill chain (RECON → PROBE → EXPLOIT → SUSTAIN) on any model.

252 Reasoning Collapses Detected
31% Policy Adherence Observed
64-bit Semantic Precision
0 Model Access Required

Your Safety Stack Has a Blind Spot the Size of the Reasoning Layer

When an LLM produces bad output, you can't tell whether the cause was a single bad token, gradual drift, or sudden collapse. You're debugging blind, and the attack may already be in the SUSTAIN phase.

Input filters stop known signatures, not engineered drift

Input guardrails stop known attack signatures at the gate. They cannot detect reasoning drift that emerges mid-chain, after the guard has waved the request through. By that point the model is already compromised.

LLM-as-judge introduces a second attack surface

Asking a language model to audit a language model means the evaluator can be socially engineered by the output it's evaluating. You've added complexity, not safety.

White-box interpretability doesn't work on the models you actually run

White-box tools such as TransformerLens require model weights you don't have. The models your organisation deploys (GPT, Claude, Gemini) are black boxes by design, so weight-level interpretability tools cannot touch them.

What Exists vs. What's Missing

Five categories of AI safety tools exist. None answer the critical question: was the model's reasoning destabilized?

Current Solutions

What the market offers
Input Filters: Miss engineered attacks
Output Filters: Content only, not reasoning
Interpretability: Requires white-box access
LLM-as-Judge: No stability metrics
Hallucination Detection: Facts only, not reasoning

NCF Audit Runtime

The missing layer
Semantic Likelihood: Token-level fit scoring
Stability Index: Coherence-velocity ratio
Alignment Gradient: Reasoning chain pressure
Black-Box Compatible: Any model, any vendor
Post-Hoc Forensics: Audit historical logs

Multi-Agent LLM Audit — Reasoning Collapse Under Adversarial Load

A commercial LLM's output across 3 medium-complexity prompts was processed by NCF Audit Runtime v5. The audit detected sustained reasoning collapse invisible to standard safety tooling.

Target LLM (Multi-Agent, 3 Prompts): COMPROMISED
252 Reasoning Collapses
108 High-Variance Events
311 Instability Events
-0.276 Mean Stability

NCF Baseline (Stable Model): STABLE
0 Reasoning Collapses
1 High-Variance Events
0 Instability Events
-0.076 Mean Stability

Observability for LLM Reasoning — Including Deliberation Depth

Distributed tracing gave microservices observability. NCF Audit gives LLM pipelines the same visibility, including deliberation fingerprinting, backtrack scoring, and Monte Carlo tree search (MCTS) pattern detection.

🔍

Reasoning Chain Debugging

Token-level visibility into WHERE reasoning collapsed, not just THAT it did. Every token is assigned a Stability Basin: STABLE, TRANSITIONAL, or CHAOTIC.
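To make the per-token basin labels concrete, here is a minimal sketch of how a stability score might be bucketed. The score scale, both thresholds, and the names `classify_basin` and `first_collapse` are assumptions made for illustration. They are not NCF Audit's actual values or API.

```python
from enum import Enum


class Basin(Enum):
    STABLE = "STABLE"
    TRANSITIONAL = "TRANSITIONAL"
    CHAOTIC = "CHAOTIC"


# Hypothetical cut-offs on a hypothetical per-token stability score,
# where higher (closer to zero) means more stable.
STABLE_MIN = -0.1
CHAOTIC_MAX = -0.5


def classify_basin(score: float) -> Basin:
    """Bucket one per-token stability score into a basin label."""
    if score >= STABLE_MIN:
        return Basin.STABLE
    if score <= CHAOTIC_MAX:
        return Basin.CHAOTIC
    return Basin.TRANSITIONAL


def first_collapse(scores):
    """Return the index of the first CHAOTIC token, or None if there is none."""
    for index, score in enumerate(scores):
        if classify_basin(score) is Basin.CHAOTIC:
            return index
    return None
```

Calling `first_collapse` on a list of per-token scores returns the position to start debugging from, which is the "where, not just that" idea in miniature.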

📊

Version Comparison

Quantifiable stability metrics across fine-tuning iterations. Did v2 improve or degrade? Measured, not guessed.
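A comparison between two fine-tuning iterations could be as simple as the sketch below. It assumes you already have per-token stability scores for the same prompts from both versions, and that a higher mean (closer to zero) means more stable. The function name `compare_versions` and the `tolerance` parameter are hypothetical.

```python
from statistics import mean


def compare_versions(v1_scores, v2_scores, tolerance=0.01):
    """Compare mean stability of two model versions on the same prompts.

    Returns (delta, verdict), where delta = mean(v2) - mean(v1).
    Assumes higher scores mean more stable output.
    """
    delta = mean(v2_scores) - mean(v1_scores)
    if delta > tolerance:
        verdict = "improved"
    elif delta < -tolerance:
        verdict = "degraded"
    else:
        verdict = "unchanged"
    return delta, verdict
```

The tolerance band keeps small run-to-run differences from being reported as a regression. A real comparison would also want a significance test across many prompts.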

🧪

Deliberation Fingerprinting

Detect when a model is performing speculative search versus confident generation. Identify MCTS signatures in output geometry, with no sampler access required.

🔗

Agent Handoff Integrity

Track semantic coherence across every agent boundary. Turbulence events at handoffs are measured, not inferred.
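As a rough, text-only illustration of measuring coherence at a handoff, the sketch below scores each agent boundary by word overlap (Jaccard similarity) between the upstream and downstream messages. This is a deliberately crude stand-in chosen for illustration, not the metric NCF Audit uses. The function names and the `threshold` default are assumptions.

```python
import re


def _words(text):
    """Lower-cased set of alphanumeric words in a message."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def handoff_coherence(upstream, downstream):
    """Jaccard overlap between two messages: 1.0 = same vocabulary, 0.0 = disjoint."""
    a, b = _words(upstream), _words(downstream)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_handoffs(messages, threshold=0.2):
    """Return each index i where the handoff messages[i] -> messages[i + 1] scores below threshold."""
    return [
        i
        for i in range(len(messages) - 1)
        if handoff_coherence(messages[i], messages[i + 1]) < threshold
    ]
```

For a three-agent chain in which the last agent answers about something unrelated, `flag_handoffs` returns the index of the boundary where the topic was lost.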

Cascade Failure Detection

Identify WHERE the chain broke when one agent's instability propagates downstream. Kill chain phase: RECON → PROBE → EXPLOIT → SUSTAIN.
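The sketch below shows one way to locate the first break in a multi-agent chain and to represent the four kill-chain phases as an ordered type. The collapse threshold, the input shape, and the names `locate_break` and `furthest_phase` are assumptions for illustration only.

```python
from enum import IntEnum


class KillChainPhase(IntEnum):
    """Phases in the order named above; a larger value is a later phase."""
    RECON = 1
    PROBE = 2
    EXPLOIT = 3
    SUSTAIN = 4


def furthest_phase(phases):
    """Return the latest phase observed, or None if nothing was observed."""
    return max(phases) if phases else None


def locate_break(pipeline_scores, collapse_below=-0.5):
    """Find the first collapse in a pipeline.

    pipeline_scores: list of (agent_name, [per-token stability scores]),
    in pipeline order. Returns (agent_name, token_index) or None.
    The threshold is a hypothetical value, not a product default.
    """
    for agent, scores in pipeline_scores:
        for index, score in enumerate(scores):
            if score < collapse_below:
                return agent, index
    return None
```

Because agents are scanned in pipeline order, the result points at the earliest unstable agent rather than at a downstream agent that merely inherited the problem.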

🛡️

Adversarial Propagation Tracing

Trace prompt injection through your entire pipeline. Adversarial Risk Index and Gradient Force Signal isolate the injection point to a specific token.

✗ Without NCF Audit

  • Output is wrong
  • Check each agent's logs manually
  • Re-run with print statements
  • Guess which agent broke
  • Trial and error until fixed

✓ With NCF Audit

  • Output is wrong
  • Open stability heatmap
  • See: "Agent 3 collapsed at token 847"
  • Drill into Agent 3's reasoning trace
  • Fix the specific failure point

Who Uses NCF Audit

From regulatory compliance to incident response, NCF Audit serves teams that need proof that their AI behaved correctly.

📋

Compliance Teams

Cryptographically sealed audit trail per token, with a SHA-256 state integrity hash, to support evidence for EU AI Act Article 9, NIST AI RMF GOVERN 1.1, and ISO/IEC 42001 clause 9.1.
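One common way to seal a per-token trail is a hash chain, where each record's SHA-256 digest also covers the previous digest, so altering any earlier record invalidates everything after it. The sketch below uses only Python's standard `hashlib` and `json` modules. The record fields, the all-zero genesis value, and the function names are assumptions for illustration, not NCF Audit's actual format.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain


def seal_record(prev_hash, record):
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_trail(records):
    """Chain the records so that each hash depends on every earlier record."""
    trail, prev = [], GENESIS
    for record in records:
        prev = seal_record(prev, record)
        trail.append({"record": record, "hash": prev})
    return trail


def verify_trail(trail):
    """Recompute the chain; any edited, removed, or reordered record fails."""
    prev = GENESIS
    for entry in trail:
        if seal_record(prev, entry["record"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

An auditor who holds only the final hash can detect changes anywhere in the trail. A production design would also sign or externally timestamp that final hash, since a chain alone does not stop someone from rebuilding it end to end.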

🔒

Security Operations

Full kill-chain reconstruction (RECON → PROBE → EXPLOIT → SUSTAIN) from output text alone. Detect successful jailbreaks without model access.

💼

Insurance Underwriters

Quantifiable risk scores for AI deployments: Stability Basin Classification, Adversarial Risk Index, and composite attack scores, all deterministic and repeatable.

🚨

Incident Response

Token-level forensic autopsy of historical LLM output. Pinpoint the exact token where reasoning collapsed and reconstruct the attack vector post-hoc.

Ready to see inside your LLM's reasoning?

Request a demonstration audit on your production outputs. We'll show you what your current tools are missing.

Request Audit →