Glossary
Precise, code-grounded definitions of the key terms used throughout Provenance.
These are the load-bearing terms in Provenance, defined the way the system actually uses them. Skim it once and the rest of the app reads more clearly.
Core concepts
| claim | An atomic, single-fact assertion extracted from an approved source — the smallest unit the Gate verifies and the unit a verdict attaches to. |
| evidence | The verbatim source span (character offsets into a source document) that a claim is bound to — what the Gate retrieves and checks the claim against. |
| the ledger | The Gate's output: the per-claim record of each claim's verdict, confidence, source, and reasons — idempotent and cached per (claim_id, source_version, rules_version). |
| verdict | The Gate's ruling on a claim: green = entailed by source and permissible (cited), amber = uncertain / needs a disclaimer (repaired), red = unsupported or impermissible (blocked, never a sendable variant). |
| provenance class | Where a fact came from (declared, behavioral, modeled, broker, OAuth, etc.) — the source dimension that, together with the surface policy, governs how a fact may be used. |
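
To make the relationship between claim, evidence, and the ledger concrete, here is a minimal sketch. The class and field names (`Evidence`, `Claim`, `LedgerEntry`, `Ledger`, `verify_calls`) are hypothetical and are not Provenance's actual API. The only details taken from the definitions above are the character-offset evidence span, the recorded fields, and the `(claim_id, source_version, rules_version)` cache key.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Evidence:
    source_id: str
    start: int  # character offset into the source, inclusive
    end: int    # character offset into the source, exclusive

    def span(self, source_text: str) -> str:
        """Return the verbatim span the claim is bound to."""
        return source_text[self.start:self.end]


@dataclass(frozen=True)
class Claim:
    claim_id: str
    text: str
    evidence: Evidence


@dataclass(frozen=True)
class LedgerEntry:
    claim_id: str
    verdict: str            # "green" | "amber" | "red"
    confidence: float
    source_id: str
    reasons: Tuple[str, ...]


class Ledger:
    """Caches one entry per (claim_id, source_version, rules_version)."""

    def __init__(self, verify: Callable[[Claim], LedgerEntry]) -> None:
        self._verify = verify
        self._cache: Dict[Tuple[str, str, str], LedgerEntry] = {}
        self.verify_calls = 0  # counts cache misses, for illustration only

    def record(self, claim: Claim, source_version: str, rules_version: str) -> LedgerEntry:
        key = (claim.claim_id, source_version, rules_version)
        if key not in self._cache:
            self.verify_calls += 1
            self._cache[key] = self._verify(claim)
        return self._cache[key]


# Illustrative usage with made-up data and a stand-in verifier.
def fake_verify(claim: Claim) -> LedgerEntry:
    return LedgerEntry(claim.claim_id, "green", 0.93,
                       claim.evidence.source_id, ("entailed by span",))


source_text = "Acme cut onboarding time by 40% in 2023."
start = source_text.index("40%")
claim = Claim("c1", "Onboarding time fell 40%.", Evidence("doc-1", start, start + 3))

ledger = Ledger(fake_verify)
first = ledger.record(claim, "v1", "r1")
second = ledger.record(claim, "v1", "r1")  # same key: served from cache
third = ledger.record(claim, "v2", "r1")   # new source version: re-verified
```

Re-recording the same key returns the cached entry without calling the verifier again, which is the sense in which the ledger is idempotent; changing either version produces a fresh key.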
Surface policy
| say / allude / hold | The surface policy on a fact: say = may appear verbatim in copy (e.g. a name they typed), allude = may shape the message but not be recited (e.g. de-anonymized firmographics), hold = may steer selection but can never appear in copy (e.g. modeled income). |
| the truth boundary | The set of moves the system is allowed to make — a variant is admitted only if it has no red claim and respects surface policy; enforced at pool construction, not by a downstream filter. |
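
The admission rule can be sketched as a predicate applied while the pool is being built. This is an illustration only: the usage levels (`steer`, `shape`, `recite`), the `FactUse` and `Variant` types, and the function names are assumptions introduced here, not names from Provenance.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical usage levels, ordered from least to most exposed in copy.
USAGE_RANK = {"steer": 0, "shape": 1, "recite": 2}
# The most exposed usage each surface policy permits.
POLICY_CEILING = {"hold": "steer", "allude": "shape", "say": "recite"}


@dataclass(frozen=True)
class FactUse:
    fact_id: str
    policy: str  # "say" | "allude" | "hold"
    usage: str   # "steer" | "shape" | "recite"


@dataclass(frozen=True)
class Variant:
    variant_id: str
    claim_verdicts: Tuple[str, ...]  # one verdict per claim in the variant
    fact_uses: Tuple[FactUse, ...]


def respects_surface_policy(use: FactUse) -> bool:
    """A fact may be used up to, but not beyond, its policy's ceiling."""
    return USAGE_RANK[use.usage] <= USAGE_RANK[POLICY_CEILING[use.policy]]


def admitted(variant: Variant) -> bool:
    """Inside the truth boundary: no red claim and every fact use is allowed."""
    no_red = "red" not in variant.claim_verdicts
    return no_red and all(respects_surface_policy(u) for u in variant.fact_uses)


def build_pool(candidates: List[Variant]) -> List[Variant]:
    """Exclusion happens here, at construction, not in a later filter."""
    return [v for v in candidates if admitted(v)]


# Made-up candidates for illustration.
candidates = [
    Variant("clean", ("green", "amber"), (FactUse("first_name", "say", "recite"),)),
    Variant("has_red", ("green", "red"), (FactUse("first_name", "say", "recite"),)),
    Variant("recites_income", ("green",), (FactUse("modeled_income", "hold", "recite"),)),
    Variant("steers_on_income", ("green",), (FactUse("modeled_income", "hold", "steer"),)),
]
pool_ids = [v.variant_id for v in build_pool(candidates)]
```

In this sketch a `hold` fact can still steer selection, but any variant that recites it never reaches the pool.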
The Gate
| the Gate | The verification module: decompose → retrieve → NLI → calibrated ensemble → compliance rules → ledger; idempotent and claim-level cached, and a compliance rule can only make a verdict more restrictive. |
| NLI ensemble | A diverse set of entailment judges ("is this claim entailed by its evidence?") whose agreement is a better-calibrated signal than any single judge, because different lenses fail differently — the LLM judge is one ensemble member, not the verifier. |
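
The "more restrictive only" property of compliance rules can be illustrated with a severity ordering over verdicts. In the sketch below, averaging the judges' scores and the `0.9` / `0.6` thresholds are placeholder assumptions, not the Gate's actual ensemble or cut-offs; the function names are hypothetical as well.

```python
from statistics import fmean
from typing import Sequence

# Verdicts ordered from least to most restrictive.
SEVERITY = {"green": 0, "amber": 1, "red": 2}


def ensemble_probability(judge_probs: Sequence[float]) -> float:
    """Combine per-judge entailment probabilities (a plain mean, as an assumption)."""
    return fmean(judge_probs)


def verdict_from_probability(p: float, green_at: float = 0.9, amber_at: float = 0.6) -> str:
    """Map a calibrated probability to a verdict using placeholder thresholds."""
    if p >= green_at:
        return "green"
    if p >= amber_at:
        return "amber"
    return "red"


def apply_compliance(verdict: str, rule_verdicts: Sequence[str]) -> str:
    """Return the most restrictive verdict among the base verdict and all rules.

    Because the result is a maximum over severity, a rule can escalate
    green to amber or red but can never relax a verdict.
    """
    return max([verdict, *rule_verdicts], key=lambda v: SEVERITY[v])


base = verdict_from_probability(ensemble_probability([0.97, 0.95, 0.93]))
final = apply_compliance(base, ["amber"])  # a rule requires a disclaimer
```

Taking the maximum over severity makes the monotonicity structural: no rule output can lower the result below the ensemble's own verdict.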
The Optimizer
| the Optimizer | A contextual-bandit campaign that assigns verified variants to recipients, segment by segment, learning from a simulated CTA oracle. It can only ever pull arms in its pool, and the blocked-lie arm is never in it. |
| bandit | The Thompson-sampling policy: it draws one sample from each active arm's Beta posterior, pulls the arm with the highest sample, and updates that arm's posterior from the observed reward. |
| arm | One verified variant the bandit can choose for a segment. An ungated or red-claim variant is structurally excluded from the arm pool, so it can never be selected (0 pulls). |
| regret | The cumulative shortfall versus always playing the best honest arm. Its per-round growth trends toward 0 as the constrained bandit converges, while the unconstrained twin chases the planted lie. |
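
The Optimizer terms fit together as in the following sketch of Beta-Bernoulli Thompson sampling over a constrained pool. The arm names, click rates, and `run_campaign` function are made up for illustration, and the simulated reward stands in for the CTA oracle. A per-segment campaign would keep one such set of posteriors per segment.

```python
import random
from typing import Dict, List, Tuple


class BetaArm:
    """One arm with a Beta(alpha, beta) posterior over its click rate."""

    def __init__(self, arm_id: str) -> None:
        self.arm_id = arm_id
        self.alpha = 1.0  # uniform prior
        self.beta = 1.0
        self.pulls = 0

    def sample(self, rng: random.Random) -> float:
        return rng.betavariate(self.alpha, self.beta)

    def update(self, reward: int) -> None:
        self.pulls += 1
        self.alpha += reward
        self.beta += 1 - reward


def run_campaign(pool: List[str], true_rates: Dict[str, float],
                 rounds: int, seed: int = 0) -> Tuple[Dict[str, int], float]:
    """Run Thompson sampling over `pool`; return pulls per arm and cumulative regret."""
    rng = random.Random(seed)
    arms = [BetaArm(arm_id) for arm_id in pool]
    best_rate = max(true_rates[arm_id] for arm_id in pool)  # best arm in the pool
    regret = 0.0
    for _ in range(rounds):
        # Sample every arm's posterior once and pull the highest sample.
        arm = max(arms, key=lambda a: a.sample(rng))
        reward = 1 if rng.random() < true_rates[arm.arm_id] else 0  # simulated oracle
        arm.update(reward)
        regret += best_rate - true_rates[arm.arm_id]
    return {a.arm_id: a.pulls for a in arms}, regret


# Made-up rates: the blocked lie would "win" if it were ever allowed in the pool.
true_rates = {"honest_a": 0.10, "honest_b": 0.05, "blocked_lie": 0.30}
honest_pool = ["honest_a", "honest_b"]  # the red-claim variant is never added
pulls, regret = run_campaign(honest_pool, true_rates, rounds=2000)
```

Because the bandit only iterates over the arms it was given, the blocked variant has no posterior to sample and cannot be pulled, whatever its true click rate.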
Drift & Assurance
| drift | A change to a source, or a legal-hold flip, that invalidates affected claims. Drift re-verifies exactly those claims (no under- or over-invalidation) and pauses the dependent variants. |
| trap | An adversarially mutated claim (number drift, unsupported superlative, false equivalence, or a true-but-unsayable guarantee) used to test whether the Gate catches what a shallow judge would miss. |
| catch-rate | The fraction of bad (trap) claims the Gate correctly catches — the headline number the Assurance Lab reports against a single-judge baseline. |
| false-reject | The fraction of clean, approved claims that get wrongly blocked. It is measured alongside catch-rate so the comparison stays fair: a high catch-rate is meaningful only when it is achieved at zero false-reject. |
| ECE / calibration | Expected Calibration Error — how far the Gate's stated confidence is from observed accuracy across bins; calibration (an isotonic PAV calibrator) turns a raw entailment score into a probability you can trust, so "0.9" means roughly 90%. |
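
The three Assurance metrics above reduce to short computations. The sketch below uses the standard equal-width-bin form of Expected Calibration Error; the function names, the choice of 10 bins, and the input shapes (lists of booleans and confidences) are assumptions for illustration, not the Assurance Lab's actual interface.

```python
from typing import List, Sequence, Tuple


def catch_rate(trap_caught: Sequence[bool]) -> float:
    """Fraction of trap claims the Gate caught."""
    if not trap_caught:
        raise ValueError("need at least one trap claim")
    return sum(trap_caught) / len(trap_caught)


def false_reject_rate(clean_blocked: Sequence[bool]) -> float:
    """Fraction of clean, approved claims that were wrongly blocked."""
    if not clean_blocked:
        raise ValueError("need at least one clean claim")
    return sum(clean_blocked) / len(clean_blocked)


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Weighted mean of |accuracy - mean confidence| over equal-width bins."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("confidences and correct must be non-empty and equal length")
    n = len(confidences)
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Made-up data: ten claims all scored 0.9.
well_calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
```

With nine of ten claims correct at a stated 0.9, the error is essentially zero; with only five correct it rises to about 0.4, which is the gap a calibrator is meant to close.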