The Assurance Lab

An adversarial wind tunnel that mutates approved claims into labeled traps and runs them through the real Gate versus a single number-blind judge, reporting decomposed reliability.

The Assurance Lab is a wind tunnel for the checker. It generates adversarial traps from the approved claims, runs them through both the real Gate and a single number-blind judge, and reports decomposed reliability rather than a single pass rate (pipeline/assurance/harness.py). See it live on the Assurance dashboard.

Four mutation families

Each verified claim is mutated into labeled traps, plus a clean control for measuring false rejects (pipeline/assurance/traps.py):

  • number_drift — change the asserted number (a material trap that looks almost identical lexically, so a similarity judge passes it).
  • unsupported_superlative — inject a #1 / best-in-class claim that no source supports.
  • false_equivalence — append an unsubstantiated competitor comparison.
  • true_but_unsayable — a true statistic wrapped in a guarantee, so it is entailed but impermissible.
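As a rough illustration of the four families, the sketch below mutates one claim into labeled traps plus a clean control. It is not the code in pipeline/assurance/traps.py: the `Trap` dataclass, the `make_traps` helper, the injected wording, and the severity assigned to every family other than number_drift are assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Trap:
    text: str
    family: str          # mutation family name, or "clean_control"
    severity: str        # "material", "puffery", or "none" (assumed labels)
    should_reject: bool  # ground-truth label used when scoring the checker


def number_drift(claim: str) -> str:
    # Shift the first integer in the claim by 7; the wording stays identical.
    # A claim with no digits is returned unchanged.
    return re.sub(r"\d+", lambda m: str(int(m.group()) + 7), claim, count=1)


def unsupported_superlative(claim: str) -> str:
    # Inject a "#1" ranking that no source supports.
    return claim.rstrip(".") + ", making it the #1 product in its class."


def false_equivalence(claim: str) -> str:
    # Append a competitor comparison with nothing to back it up.
    return claim.rstrip(".") + ", exactly like the market leader."


def true_but_unsayable(claim: str) -> str:
    # Wrap the (true) statistic in a guarantee.
    return "We guarantee it: " + claim


def make_traps(claim: str) -> list[Trap]:
    # Severity per family is an assumption, except number_drift (material).
    return [
        Trap(number_drift(claim), "number_drift", "material", True),
        Trap(unsupported_superlative(claim), "unsupported_superlative", "puffery", True),
        Trap(false_equivalence(claim), "false_equivalence", "material", True),
        Trap(true_but_unsayable(claim), "true_but_unsayable", "material", True),
        Trap(claim, "clean_control", "none", False),  # measures false rejects
    ]


traps = make_traps("Cuts onboarding time by 40% for new teams.")
for trap in traps:
    print(f"{trap.family:<24} {trap.text}")
```

Carrying the ground-truth label on each trap is what lets a harness later separate catch-rate (on traps) from false rejects (on the clean control).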

Decomposed reliability

Rather than a single score, the harness reports catch-rate at a fixed false-reject rate, broken down by mutation family and by severity (material vs. puffery). It also fits an isotonic calibrator on the labeled mix and reports expected calibration error (ECE) with a reliability diagram. The same harness is sliced per channel (email and website): one lab, not two.
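To make these metrics concrete, here is a minimal sketch of catch-rate at a fixed false-reject rate and of ECE. It assumes the checker emits a risk score in [0, 1] where higher means "more likely a trap", which may not match how the real harness represents verdicts. The function names and toy scores are hypothetical, and the isotonic fit is not shown: the probabilities passed to `expected_calibration_error` stand in for calibrated outputs.

```python
from collections import defaultdict


def threshold_at_frr(clean_scores, max_frr):
    """Lowest threshold t such that the share of clean controls with
    score > t (the false-reject rate) does not exceed max_frr."""
    for t in [float("-inf")] + sorted(set(clean_scores)):
        frr = sum(s > t for s in clean_scores) / len(clean_scores)
        if frr <= max_frr:
            return t
    # Unreachable: the largest clean score always yields a false-reject rate of 0.


def catch_rate_by_family(trap_scores, threshold):
    """trap_scores: iterable of (family, score). A trap is caught if score > threshold."""
    caught, total = defaultdict(int), defaultdict(int)
    for family, score in trap_scores:
        total[family] += 1
        caught[family] += score > threshold
    return {family: caught[family] / total[family] for family in total}


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean over equal-width bins of |mean predicted prob - observed trap rate|."""
    bins = defaultdict(list)
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    n = len(probs)
    return sum(
        len(b) / n * abs(sum(p for p, _ in b) / len(b) - sum(y for _, y in b) / len(b))
        for b in bins.values()
    )


# Toy scores, invented for illustration only.
clean = [0.05, 0.10, 0.20, 0.30]
trap_scores = [
    ("number_drift", 0.90),
    ("number_drift", 0.80),
    ("unsupported_superlative", 0.70),
    ("unsupported_superlative", 0.15),
]

t = threshold_at_frr(clean, max_frr=0.25)
rates = catch_rate_by_family(trap_scores, t)
ece = expected_calibration_error([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1])
print(t, rates, round(ece, 3))
```

Fixing the false-reject rate first and only then reading off catch-rate keeps the comparison between checkers fair: a checker cannot look better simply by rejecting more of everything.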

Why the ensemble beats one judge

The baseline is a single number-blind similarity judge, the same coverage lens the Gate uses, but on its own. Because it ignores numbers, it is blind to number-drift: a claim that quietly changes a number still looks topically similar, so the baseline passes it. The Gate's numeric and rule lenses catch it. That is the honest, principled reason the diverse ensemble beats a single judge: the gap comes from lens diversity, not from a tuned number.
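The blind spot can be reproduced with a toy example. The sketch below is not the Gate's implementation: the similarity judge is a simple word-overlap (Jaccard) score that ignores digits, the numeric lens is a check that every number in the claim also appears in the source, and all function names and the 0.8 threshold are assumptions. The rule lens is omitted.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def number_blind_similarity(claim: str, source: str) -> float:
    """Jaccard overlap of alphabetic word tokens; digits never enter the comparison."""
    a = set(re.findall(r"[a-z]+", claim.lower()))
    b = set(re.findall(r"[a-z]+", source.lower()))
    return len(a & b) / len(a | b) if a | b else 1.0


def numeric_lens_ok(claim: str, source: str) -> bool:
    """Pass only if every number asserted in the claim also appears in the source."""
    return set(NUMBER.findall(claim)) <= set(NUMBER.findall(source))


def single_judge_rejects(claim: str, source: str, min_sim: float = 0.8) -> bool:
    # Baseline: the similarity lens on its own.
    return number_blind_similarity(claim, source) < min_sim


def ensemble_rejects(claim: str, source: str, min_sim: float = 0.8) -> bool:
    # Toy ensemble: reject if either the similarity lens or the numeric lens objects.
    return single_judge_rejects(claim, source, min_sim) or not numeric_lens_ok(claim, source)


source = "Cuts onboarding time by 40% for new teams."
drifted = "Cuts onboarding time by 47% for new teams."

print("similarity:", number_blind_similarity(drifted, source))
print("single judge rejects:", single_judge_rejects(drifted, source))
print("ensemble rejects:", ensemble_rejects(drifted, source))
```

The drifted claim shares every word with the source, so no similarity threshold separates it from the clean control. Only a lens that actually reads the numbers can tell the two apart.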

This is headline property P5, proven in tests/test_assurance.py: at a fixed false-reject ceiling the Gate's catch-rate exceeds the single-judge baseline by a wide margin, and the Gate catches all number-drift traps while the baseline catches almost none.