What Provenance is
The problem, the thesis, the five modules, and the two channels: what Provenance actually is.
Welcome to Provenance — a system of record for claims, built around one idea: outreach that can't say what it can't prove. This article is the front door. It explains the problem we're solving, the thesis behind the design, and the pieces you'll meet as you explore.
The problem
Ultra-personalization breaks human review. You can't put 100,000 unique messages through legal review, so in claims-heavy domains (health-tech, finance, anything regulated) you're forced to choose: generic-but-reviewed, or personal-but-unverified. The real bottleneck isn't writing; it's verification at scale. Once every message is unique, no human can stand behind what each one says.
The thesis
Provenance flips the trade-off: instead of reviewing output after the fact, it makes the system unable to assert anything it can't ground in an approved source. Personalization is then free to optimize, but only inside the truth boundary. An AI-generated variant can't win by saying something it can't prove, because an unprovable variant is never admitted to the pool in the first place.
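To make "never admitted to the pool" concrete, here is a minimal sketch of an admission filter. It is not Provenance's actual code: the `Variant` type, the `admit` function, and the dictionary-shaped ledger are hypothetical names chosen for illustration, and the sketch assumes that only a green verdict admits a claim (amber and unknown claims block the variant).

```python
from dataclasses import dataclass

# Hypothetical verdict labels, modelled on a green/amber/red ledger.
GREEN, AMBER, RED = "green", "amber", "red"


@dataclass(frozen=True)
class Variant:
    variant_id: str
    claim_ids: tuple[str, ...]  # every claim the message text asserts


def admit(variants, ledger):
    """Return only the variants whose every claim is green in the ledger.

    Assumption: a claim missing from the ledger counts as unverified,
    so it blocks the variant just like an amber or red verdict.
    """
    return [
        v for v in variants
        if all(ledger.get(claim_id) == GREEN for claim_id in v.claim_ids)
    ]


# Made-up ledger and candidates, for illustration only.
ledger = {"c1": GREEN, "c2": RED, "c3": AMBER}
candidates = [
    Variant("v-safe", ("c1",)),
    Variant("v-bold", ("c1", "c2")),   # asserts a red claim
    Variant("v-unknown", ("c9",)),     # asserts a claim the ledger has never seen
]

pool = admit(candidates, ledger)
print([v.variant_id for v in pool])  # ['v-safe']
```

Because the filter runs before any optimization, whatever selects a winner downstream only ever sees `pool`, so an unverified variant has no path to being chosen.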
The five modules
Provenance is built from five modules. Each one has its own help article — follow the link to go deeper.
| Module | What it does |
|---|---|
| Claims Library | Approved source documents become atomic claims, each bound to a verbatim source span, forming a versioned claim–evidence graph. Read more → |
| The Gate | Decompose → retrieve → NLI (natural language inference) → calibrated ensemble → compliance rules → a green/amber/red ledger. A blocked claim never becomes a sendable variant. Read more → |
| The Optimizer | A contextual bandit that learns the best verified variant per segment. Any ungated arm is excluded from the pool by construction. Read more → |
| Drift Monitor | On a source change or legal-hold flip, it surgically re-verifies only the affected claims and pauses the dependent variants. Read more → |
| Assurance Lab | An adversarial harness that proves the Gate works: it runs trap claims through the real Gate and beats a single-judge baseline. Read more → |
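
As a rough illustration of how a claim bound to a verbatim span supports targeted re-verification, the sketch below re-checks only the claims tied to a changed source and lists the variants that depend on them. Everything in it is an assumption made for the example: the `Claim` type, the `on_source_change` function, and the substring check, which stands in for whatever the real Gate does when it re-verifies a claim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    claim_id: str
    source_id: str
    span: str  # verbatim text the claim is bound to


def span_still_present(claim, sources):
    """Stand-in for re-verification: the verbatim span must still appear in its source."""
    return claim.span in sources.get(claim.source_id, "")


def on_source_change(changed_source_id, claims, variants, sources):
    """Re-check only claims bound to the changed source.

    Returns (affected claim ids, failed claim ids, variant ids to pause).
    Assumption: every variant that depends on an affected claim is paused
    until re-verification has run.
    """
    affected = {c.claim_id for c in claims if c.source_id == changed_source_id}
    failed = {
        c.claim_id for c in claims
        if c.claim_id in affected and not span_still_present(c, sources)
    }
    paused = sorted(v for v, cids in variants.items() if affected & set(cids))
    return sorted(affected), sorted(failed), paused


# Made-up data: doc-a was edited and now says 15% instead of 20%.
claims = [
    Claim("c1", "doc-a", "reduces wait times by 20%"),
    Claim("c2", "doc-a", "available in 12 regions"),
    Claim("c3", "doc-b", "Certified since 2021"),
]
sources = {
    "doc-a": "The service reduces wait times by 15% and is available in 12 regions.",
    "doc-b": "Certified since 2021.",
}
variants = {"v1": ["c1"], "v2": ["c2", "c3"], "v3": ["c3"]}

result = on_source_change("doc-a", claims, variants, sources)
print(result)  # (['c1', 'c2'], ['c1'], ['v1', 'v2'])
```

Claim `c3` and variant `v3` are never touched, which is the point of keying the re-check on the changed source rather than re-running everything.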
The two channels
The same Gate governs outreach across two channels: the email campaign (the bandit optimizes verified variants per segment) and the website (a personalized page that renders only Gate-passed claims). The same claim_id gets the same verdict on both channels — the truth boundary doesn't change when the surface does.
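One way to picture the shared truth boundary is a single verdict lookup that both channels call. This is only a sketch under assumed names (`verdict`, `email_variant_sendable`, `render_page`) and an assumed dictionary ledger; it does not describe Provenance's real interfaces.

```python
GREEN = "green"


def verdict(ledger, claim_id):
    """Single lookup shared by every channel; unknown claims count as blocked."""
    return ledger.get(claim_id, "red")


def email_variant_sendable(ledger, claim_ids):
    """Email channel: a variant is sendable only if all of its claims are green."""
    return all(verdict(ledger, c) == GREEN for c in claim_ids)


def render_page(ledger, blocks):
    """Website channel: keep only the page blocks whose claim is green."""
    return [text for claim_id, text in blocks if verdict(ledger, claim_id) == GREEN]


# Made-up ledger and content, for illustration only.
ledger = {"c1": "green", "c2": "red"}
blocks = [("c1", "Open in 12 regions."), ("c2", "Cuts costs in half.")]

print(email_variant_sendable(ledger, ["c1", "c2"]))  # False
print(render_page(ledger, blocks))                   # ['Open in 12 regions.']
```

Neither channel keeps its own copy of a verdict, so a claim cannot be green in email and red on the website.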
What kind of demo this is
Everything here is a deterministic, offline-by-default, 100%-synthetic demo. "Live" means a seed-locked replay — given the global seed, every run is byte-identical. It runs with no API key and sends no real email. There is no real PII: the recipients, profiles, and facts are synthetic.
- Deterministic — seed-locked; re-running the pipeline produces identical artifacts.
- Offline by default — no network and no API key required (an optional "rich" profile can use real models, and an optional live news connector exists, both off by default).
- Synthetic — no real PII; the demo tenant and its data are fabricated for the showcase.
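
The seed-locked property can be checked mechanically. The sketch below is a toy stand-in for the pipeline, not the real one: `run_pipeline`, `GLOBAL_SEED`, and the record fields are hypothetical. It shows the general technique of using a private seeded random generator plus a canonical serialization, then comparing the bytes of two runs.

```python
import hashlib
import json
import random

GLOBAL_SEED = 1234  # hypothetical value; the demo's actual seed is not stated here


def run_pipeline(seed):
    """Toy pipeline: generate synthetic recipients and serialize them canonically."""
    rng = random.Random(seed)  # private generator, independent of global random state
    recipients = [
        {
            "recipient_id": f"r{i:03d}",
            "segment": rng.choice(["a", "b", "c"]),
            "score": rng.random(),
        }
        for i in range(5)
    ]
    # Sorted keys and fixed separators keep the byte output stable across runs.
    return json.dumps(recipients, sort_keys=True, separators=(",", ":")).encode("utf-8")


def digest(artifact):
    """SHA-256 hex digest of an artifact's bytes."""
    return hashlib.sha256(artifact).hexdigest()


first = run_pipeline(GLOBAL_SEED)
second = run_pipeline(GLOBAL_SEED)
print(first == second)                   # True
print(digest(first) == digest(second))   # True
```

Comparing digests rather than raw bytes is convenient when artifacts are large or stored in different places; either check fails as soon as any unseeded randomness, timestamp, or unordered serialization leaks into the output.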