Say / allude / hold — the surface policy
Every fact carries a provenance class and a surface policy; a HOLD fact may steer selection but may never be recited in copy — the privacy core of the system. · 4 min read
Knowing a fact and being allowed to say it are two different things. Every fact in the system carries both a provenance class (where it came from) and a surface policy (how it may be used). The surface policy is the privacy core: it decides whether a fact may be spoken, merely hinted at, or held silent. It is defined as SurfacePolicy in pipeline/customer/schemas.py and reused across the personalization signal catalog (pipeline/personalization/signals.py).
The three policies
| say | Reference it directly. May be inlined in mass copy, with its receipt. |
| allude | Never state it; raise an adjacent topic. May steer which claim/class is featured, but is never recited literally. |
| hold | Never surface and never allude. May not even steer selection. |
These aren't loose conventions — they're derived properties on the fact, so the 1:1 view and the mass-copy view read from one source of truth. A CustomerFact is mass_inlineable only when its policy is say ("only facts the customer stated to us directly may be put in mass copy verbatim"), and selection_ok for any policy except hold ("allude/say may silently steer which claim we feature; hold may not").
A HOLD fact may steer, never recite
This is the whole point. The signal catalog assigns the responsible default per provenance class: things the visitor declared (their name, email, stated goal) are say; passively observed or inferred signals (neighborhood-level location, device economics, battery level) are typically hold; a few mid-risk signals (approximate city, languages-as-second-language) sit at allude. A held fact can shift which already-verified claim leads — it can never appear as a sentence in the copy.
Provenance classes — where a fact comes from
- observed — collected by us passively at page load (HTTP request + a little JavaScript)
- first_party — our own logs / CRM; behavior we recorded, data we hold
- declared — the visitor typed it / handed it to us (the cleanest data there is)
- oauth — granted via "Sign in with Google" scopes
- broker — purchased from a third-party data broker / append
- identity_graph — cross-device + offline match bought from an identity-resolution vendor
basis per source (for example public_record for news, legitimate_interest_b2b for the email domain, contract_vendor_dpa for simulated vendors); on the customer side, each CustomerFact carries its basis alongside its surface policy. A fact you can't justify isn't surfaced.