Provenance · Enrichment catalog

What we can enrich a lead with — and the receipt each fact must carry

Between the form and a personalized email or return visit, we can learn a lot about a lead from outside sources. In Provenance, every enriched fact is gated like a claim: it must come from a human-allow-listed source, carry a lawful basis, be fresh, and pass the Enrichment Gate before it can appear in a message. This page catalogs the sources honestly: what each one gives us, what it costs, and which lawful basis it rests on. In the "In demo" column, "used live" means the demo really queries that (free) source, "used" means its output is simulated, and "cataloged" means it is listed here but not wired in. Nothing here calls a paid API.
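The four gate conditions can be sketched as one check over a fact's receipt. This is a minimal sketch, not the Provenance implementation: `SOURCE_POLICY`, `FactReceipt`, `gate`, the TTL values, and the `disclaim` flag (one guess at how the "disclaimer" verdict in the live example below might arise) are all hypothetical stand-ins for whatever helix_tenant.yaml actually encodes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical enrichment-source policy standing in for helix_tenant.yaml.
# Source names, accepted bases, TTLs, and the disclaim flag are illustrative.
SOURCE_POLICY = {
    "email_domain": {"bases": {"legitimate_interest_b2b"}, "ttl": timedelta(days=365), "disclaim": False},
    "company_news_rss": {"bases": {"public_record"}, "ttl": timedelta(hours=48), "disclaim": False},
    "intent_sim": {"bases": {"contract_vendor_dpa"}, "ttl": timedelta(days=7), "disclaim": True},
}


@dataclass(frozen=True)
class FactReceipt:
    fact: str
    value: str
    source: str
    basis: str
    fetched_at: datetime  # timezone-aware fetch time


def gate(receipt: FactReceipt, now: datetime = None) -> str:
    """Return 'usable', 'disclaimer', or 'blocked' for one enriched fact."""
    now = now or datetime.now(timezone.utc)
    policy = SOURCE_POLICY.get(receipt.source)
    if policy is None:  # source is not on the human allow-list
        return "blocked"
    if receipt.basis not in policy["bases"]:  # basis not accepted for this source
        return "blocked"
    if now - receipt.fetched_at > policy["ttl"]:  # fact is stale
        return "blocked"
    return "disclaimer" if policy["disclaim"] else "usable"
```

Anything from an unlisted source, with an unlisted basis, or older than its window comes back `blocked`, so the sketch fails closed rather than open.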

Free / low-cost — between form → email

Source | Gives us | Cost | Lawful basis / caution | Freshness | In demo
Email domain parse | company domain, B2B vs personal | $0 | user-provided (low risk) | instant | used live
DNS / MX / WHOIS | mail provider, domain age, registrar | $0 | public record | days | used live
Company website scrape | products, locations, tech hints | $0 | ToS varies; respect robots.txt | weeks | cataloged
News / RSS / Google News | recent events, funding, initiatives | $0 | public; attribute the source | hours | used live
SEC EDGAR | revenue, risk factors (public companies) | $0 | public record | quarterly | cataloged
GitHub / job boards | tech stack, hiring signals | $0 | public; ToS applies | days | cataloged
LinkedIn public profile | title, seniority | $0 | ToS-restricted; use with care | weeks | cataloged
BuiltWith (free tier) | website tech stack | free tier | vendor ToS | weeks | cataloged
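The first row, email domain parse, is simple enough to show whole. A sketch, assuming a hand-maintained list of consumer mailbox domains; `FREE_MAIL_DOMAINS` and `parse_email_domain` are hypothetical names, and the list shown is far from complete.

```python
# Hypothetical, deliberately tiny list of consumer mailbox domains.
FREE_MAIL_DOMAINS = frozenset(
    {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com", "icloud.com", "proton.me"}
)


def parse_email_domain(email: str) -> dict:
    """Split an address into its domain and a coarse B2B/personal label."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    domain = domain.lower().rstrip(".")  # normalize case and a trailing dot
    kind = "personal" if domain in FREE_MAIL_DOMAINS else "b2b"
    # Only a B2B domain tells us anything about the lead's company.
    return {"company_domain": domain if kind == "b2b" else None, "kind": kind}
```

For example, `parse_email_domain("pat@NorthwindHealth.org")` returns `{"company_domain": "northwindhealth.org", "kind": "b2b"}`.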

Paid — between form → email, and return-visit refresh

Source | Gives us | Cost | Lawful basis / caution | Freshness | In demo
Clearbit / HubSpot Enrichment | firmographics, role, seniority | per-enrichment / seat | vendor AUP + your basis | weeks | used
ZoomInfo | contact + company, org chart | seat + credits ($$$) | AUP is strict; record basis | weeks | cataloged
Apollo.io | contact + intent | seat + credits | AUP | weeks | cataloged
People Data Labs | person/company graph | per-record | API; basis required | weeks | cataloged
Cognism / Lusha | EU-compliant B2B contacts | seat + credits | GDPR-positioned | weeks | cataloged
6sense / Demandbase | account intent, buying stage | platform ($$$$) | account-level, lower PII risk | days | used
Bombora | topic surge / intent | subscription | account-level | weekly | cataloged
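Because paid sources bill per enrichment, per record, or in credits, a return-visit refresh only needs to re-query facts whose freshness window has lapsed. A minimal sketch of that selection, assuming each receipt carries a fetch timestamp; `FRESHNESS`, `facts_to_refresh`, and the concrete windows are hypothetical, loosely following the Freshness column.

```python
from datetime import datetime, timedelta, timezone

# Illustrative freshness windows per source; the durations are assumptions.
FRESHNESS = {
    "company_news_rss": timedelta(hours=24),
    "intent_sim": timedelta(days=7),
    "firmographic_sim": timedelta(weeks=4),
}


def facts_to_refresh(receipts, now=None):
    """Return names of facts whose source window has lapsed.

    receipts: iterable of (fact, source, fetched_at) tuples with aware datetimes.
    Facts from sources without a known window are treated as stale, so they
    are re-checked instead of silently reused.
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for fact, source, fetched_at in receipts:
        window = FRESHNESS.get(source)
        if window is None or now - fetched_at > window:
            stale.append(fact)
    return stale
```

Fresh facts are reused as-is, so a lead who returns the next day costs nothing for firmographics that are still inside their window.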

Engagement — email sent → click → website

Source | Gives us | Cost | Lawful basis / caution | Freshness | In demo
Email open pixel | opened? when? which client? | $0 | disclose tracking; some clients block | realtime | used
Click tracking (wrapped links) | which CTA, when | $0 | first-party, low risk | realtime | used
First-party site analytics | pages, dwell, return | $0 | first-party cookie + notice | realtime | used
Reverse-IP (Clearbit Reveal / KickFire) | company of an anonymous visit | per-lookup | account-level, no PII | realtime | cataloged
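Wrapped-link click tracking points each CTA at a first-party redirect that logs the click and then forwards to the real page. A sketch using only the Python standard library; `REDIRECT_BASE` (an example.com placeholder), `wrap_link`, `unwrap_click`, and the `to` / `rid` / `cta` parameter names are hypothetical, not Provenance routes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical first-party redirect endpoint (placeholder domain).
REDIRECT_BASE = "https://example.com/r"


def wrap_link(target_url: str, recipient_id: str, cta_id: str) -> str:
    """Rewrite a CTA link so a click hits our redirect before the real page."""
    query = urlencode({"to": target_url, "rid": recipient_id, "cta": cta_id})
    return f"{REDIRECT_BASE}?{query}"


def unwrap_click(wrapped_url: str) -> dict:
    """Recover what the redirect handler would log (e.g. as a CTA event)."""
    params = parse_qs(urlsplit(wrapped_url).query)
    return {
        "target": params["to"][0],
        "recipient_id": params["rid"][0],
        "cta_id": params["cta"][0],
    }
```

Carrying an opaque recipient id instead of the email address keeps PII out of the URL. A real handler should also check `to` against an allow-list; otherwise the endpoint becomes an open redirect.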

Where the data lives

Store | Path | Holds
provenance.sqlite | /data/provenance.sqlite | recipients · form_events · cta_events · verdict_cache · llm_cache
profiles.sqlite | /data/profiles.sqlite | synthesized profiles + every fact receipt (source · basis · verdict)
claims/library.json | /app/data/demo/claims/library.json | versioned claim-evidence graph
observe/*.jsonl | /app/data/demo/observe | append-only observability event ledger (one file per lane)
helix_tenant.yaml | /app/rules/helix_tenant.yaml | human-owned claim policy + enrichment-source policy
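Keeping a fact and its receipt in the same row means a message builder cannot pick up a value without its source, basis, and verdict. One hypothetical shape for that table, using Python's built-in `sqlite3` with an in-memory database; the `fact_receipts` table, its columns, and the helper functions are assumptions, and the real /data/profiles.sqlite schema may differ.

```python
import sqlite3

# Hypothetical receipt table; not the documented profiles.sqlite schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS fact_receipts (
    profile_id TEXT NOT NULL,
    fact       TEXT NOT NULL,
    value      TEXT NOT NULL,
    source     TEXT NOT NULL,
    basis      TEXT NOT NULL,
    verdict    TEXT NOT NULL CHECK (verdict IN ('usable', 'disclaimer', 'blocked')),
    PRIMARY KEY (profile_id, fact)
)
"""


def open_profiles(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_fact(conn, profile_id, fact, value, source, basis, verdict):
    """Store a fact with its receipt; re-enriching a fact replaces its row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO fact_receipts VALUES (?, ?, ?, ?, ?, ?)",
            (profile_id, fact, value, source, basis, verdict),
        )


def shippable(conn, profile_id):
    """Facts a message may use: everything whose verdict is not 'blocked'."""
    rows = conn.execute(
        "SELECT fact, value, verdict FROM fact_receipts "
        "WHERE profile_id = ? AND verdict != 'blocked' ORDER BY fact",
        (profile_id,),
    )
    return rows.fetchall()
```

The `CHECK` constraint makes the database itself reject a verdict outside the three the gate can emit.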

All of it is local, synthetic, and gitignored, so the demo's $0 / offline / no-PII guarantee holds.

Live example — the profile synthesized for Northwind Health (synthetic mode)

Verdict | Fact | Value | Source | Basis
usable | company_domain | northwindhealth.org | email_domain | legitimate_interest_b2b
usable | recent_news | Northwind Health reported a push to cut administrative cost | company_news_rss | public_record
usable | num_facilities | 19 | firmographic_sim | contract_vendor_dpa
usable | ehr_vendor | Allscripts/Veradigm | firmographic_sim | contract_vendor_dpa
usable | size_band | 9+ hospital IDN | firmographic_sim | contract_vendor_dpa
disclaimer | intent_topic | reducing length of stay | intent_sim | contract_vendor_dpa
disclaimer | in_market | true | intent_sim | contract_vendor_dpa

7 usable (5 as-is, 2 with a disclaimer) · 0 blocked · the fact audit caught 100.0% of un-shippable traps at a 0.0% false-block rate. These facts are readable on the Observatory and via /api/observe/profiles.
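The summary line can be reproduced from the verdict column (five plain usable facts, two carrying a disclaimer, none blocked) and from a seeded audit. A sketch assuming the audit yields one `(is_trap, was_blocked)` pair per fact, with catch rate taken as blocked traps over all traps and false-block rate as blocked legitimate facts over all legitimate facts; these definitions and the helper names `summarize` and `audit_rates` are assumptions, not the product's documented metrics.

```python
from collections import Counter


def summarize(verdicts):
    """Count a profile's verdicts, e.g. ["usable", "disclaimer", ...]."""
    counts = Counter(verdicts)
    return {v: counts[v] for v in ("usable", "disclaimer", "blocked")}


def audit_rates(results):
    """Return (catch %, false-block %) from (is_trap, was_blocked) pairs.

    A rate is None when its denominator is empty (no traps, or no
    legitimate facts), rather than a misleading 0.0 or 100.0.
    """
    traps = caught = legit = false_blocks = 0
    for is_trap, was_blocked in results:
        if is_trap:
            traps += 1
            caught += int(was_blocked)
        else:
            legit += 1
            false_blocks += int(was_blocked)
    catch = round(100.0 * caught / traps, 1) if traps else None
    false_block = round(100.0 * false_blocks / legit, 1) if legit else None
    return catch, false_block
```

Applied to the Northwind verdicts, `summarize(["usable"] * 5 + ["disclaimer"] * 2)` gives `{"usable": 5, "disclaimer": 2, "blocked": 0}`.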