Provenance
100% synthetic persona — no real PII

How personalized can a website get — and where does each fact come from?

One visitor, Maya Chen. Climb the tiers to see how much a site can know — from an anonymous landing page, to a Google sign-in, to a purchased data append. Every fact is tagged with its provenance: collected by us, declared, granted via Google, or bought. Then flip tasteful → creepy to see the difference a surface policy makes.

0 · none · Anonymous landing
1 · cookie · Returning visitor
2 · email · Identified (gave email)
3 · google · Signed in with Google
4 · google + · Purchased append
5 · google + · Existing customer
6 · google · Full identity graph

We've seen this browser before (a cookie, or a device fingerprint that survives clearing it).

Creepy mode 👁: surface policy switched off. Every available fact is printed, tagged with where it came from.
18 of 52 facts knowable at this tier
18 printed on the page
18 that a policy would have hidden
2 bought from third parties

We've been expecting you. 👁

Anonymous landing
You're in Austin, Texas. Collected ourselves (passive)
MaxMind GeoIP2 / IP2Location · $0 (free GeoIP DB)
…within about two miles of the 78722 ZIP — the Mueller neighborhood. Collected ourselves (passive)
IP geolocation (ZIP-level) · $0
On a residential Spectrum line — so you're at home, not the office. Collected ourselves (passive)
IP → ISP / ASN lookup · $0
If this were an office IP, we'd already know your employer. Yours is residential, so you're working from home. Collected ourselves (passive)
Clearbit Reveal / KickFire · ~$0.01 / lookup
On an iPhone 13 running iOS 17.4, in Safari. Collected ourselves (passive)
User-Agent + Client Hints · $0
A two-year-old, non-Pro phone — we can guess your budget from your hardware. Collected ourselves (passive)
Model → release-date + price tier · $0
Dark mode, battery-saver on, one tab open. Collected ourselves (passive)
JS: screen, prefers-color-scheme · $0
Your battery's at 18% and falling — better make this quick. Collected ourselves (passive)
Battery Status API · $0
It's 11:47 PM on a Tuesday where you are — a late-night browse. Collected ourselves (passive)
JS Date + IANA timezone · $0
Your browser lists Chinese as a second language — a hint about who you are. Collected ourselves (passive)
Accept-Language header · $0
You came from our Instagram ad — the 'from barista to AI engineer' creative. Collected ourselves (passive)
Referrer + UTM parameters · $0
Even with cookies off, this exact device hashes to fp_9f3a — we'll recognize you on your next 'anonymous' visit. Collected ourselves (passive)
FingerprintJS · ~$0.005 / match (pro tier)
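Most of the passive facts above arrive with the request itself. A minimal Python sketch, assuming a server that sees the raw `Accept-Language` header and the landing URL; the function names, and `toy_fingerprint` in particular, are hypothetical. Real fingerprinting libraries such as FingerprintJS gather far more signals in the browser; this only shows the core idea that hashing a stable set of signals yields the same ID with no cookie involved.

```python
import hashlib
from urllib.parse import parse_qs, urlsplit


def parse_accept_language(header: str) -> list[str]:
    """Return language tags from an Accept-Language header, highest q-value first."""
    ranked = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        tag = fields[0]
        if not tag:
            continue
        quality = 1.0  # q defaults to 1 when absent
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            # Sort by descending quality, keeping header order for ties.
            ranked.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(ranked)]


def utm_params(url: str) -> dict[str, str]:
    """Pull the utm_* campaign parameters out of a landing URL."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items() if key.startswith("utm_")}


def toy_fingerprint(signals: dict[str, str]) -> str:
    """Hypothetical: hash a canonical form of device signals into a short ID."""
    canonical = "|".join(f"{key}={signals[key]}" for key in sorted(signals))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "fp_" + digest[:4]
```

Because the ID depends only on the signals, clearing cookies does not change it; only a different device or browser configuration does.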
Returning visitor
This is your 4th visit in 6 days. Our own database
First-party cookie / fingerprint · $0
You keep returning to the night cohort and the pricing page. Our own database
Site analytics · $0
You read 90% of the financing FAQ and lingered two minutes. Our own database
Scroll-depth + dwell tracking · $0
You started the application Sunday and didn't finish it. Our own database
Form analytics · $0
You're in our Meta retargeting pool — that's why we've chased you across Instagram seven times. Purchased (data broker append)
Meta Pixel / Google Ads tag · ad spend
We can see you comparison-shopped two competitors this week. Purchased (data broker append)
DMP / cookie-sync (Lotame, Oracle BlueKai) · $$ subscription
😬 Every line above is a real kind of data with a real way to obtain it. The only thing separating this from the tasteful page is the surface policy; none of this is *new* collection.
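The surface policy can be sketched as one decision per fact. This Python version is an assumption, not the page's code: it borrows the policy values `say`/`allude`/`hold` and the outcome names `said`/`steer`/`withheld`/`shown`/`locked` from the provenance ledger, and models tiers as the integers 0 to 6 from the tier selector.

```python
from enum import Enum


class Policy(Enum):
    SAY = "say"        # may be stated literally
    ALLUDE = "allude"  # may shape what we show, never stated
    HOLD = "hold"      # known, but held back entirely


# Outcome of each policy when the tasteful surface policy is on.
TASTEFUL_OUTCOME = {
    Policy.SAY: "said",
    Policy.ALLUDE: "steer",
    Policy.HOLD: "withheld",
}


def surface_state(policy: Policy, fact_tier: int, visitor_tier: int, creepy: bool) -> str:
    """Decide how one fact reaches the page."""
    if fact_tier > visitor_tier:
        return "locked"  # not knowable yet: needs a higher tier
    if creepy:
        return "shown"   # policy switched off: print everything knowable
    return TASTEFUL_OUTCOME[policy]


def is_printed(state: str) -> bool:
    """Only literal statements (tasteful) or creepy mode put text on the page."""
    return state in ("said", "shown")
```

Flipping `creepy` changes only the last step; the tier check, and therefore what is knowable, stays the same.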

Where the data comes from

Six provenance classes. The count is how many facts each one yields at this tier. See the full paid/free source list in the enrichment catalog.

Collected ourselves (passive) 12
The HTTP request + a few lines of JavaScript, the instant the page loads. No login, no form, often no cookie. You give it to every site you visit.
💵 $0 (optionally a few ¢ for a reverse-IP or fingerprint lookup)
Our own database 4
Behavior we logged on previous visits (cookies / device match) and records in our CRM. We already own it; no third party is involved.
💵 $0 (we collected it)
Declared by the visitor 0
A field they filled in — a newsletter box, a lead form, an account signup. The cleanest data there is: they chose to tell us.
💵 $0
Granted via Google sign-in 0
"Sign in with Google" hands us OAuth scopes. The basic profile is one click, but the same consent screen can also request calendar, Gmail metadata, YouTube, contacts and location history: an enormous jump for one tap.
💵 $0 (the price is the permission)
Purchased (data broker append) 2
Match a name+email+address against a data broker (Acxiom, Experian, Epsilon, Oracle Data Cloud) and append what they've compiled: income, life events, demographics, propensities. Billions of attributes, sold per record.
💵 ~$0.05–0.25 per record matched
Identity graph (cross-device + offline) 0
An identity-resolution vendor (LiveRamp, Tapad) ties this browser to your other devices, your household, and offline data sold by retailers — loyalty cards, credit-card panels, smart-TV viewing. The complete picture.
💵 $$ platform subscription
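How much a Google sign-in grants comes down to the scope strings the site puts in its request. A hedged Python sketch that builds a Google OAuth 2.0 authorization URL: the endpoint and scope strings are Google's, while the client ID, redirect URI, state value and helper name are placeholders. Location history is omitted because we could not confirm a matching public scope. Sensitive and restricted scopes such as `gmail.metadata` also go through Google's app verification before ordinary users can grant them.

```python
from urllib.parse import urlencode

# Google's OAuth 2.0 authorization endpoint for web-server apps.
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"

# What a plain "Sign in with Google" button asks for.
BASIC_SCOPES = ["openid", "email", "profile"]

# Extra access of the kind the page describes (no location-history scope here).
EXTRA_SCOPES = [
    "https://www.googleapis.com/auth/calendar.readonly",
    "https://www.googleapis.com/auth/gmail.metadata",
    "https://www.googleapis.com/auth/youtube.readonly",
    "https://www.googleapis.com/auth/contacts.readonly",
]


def authorization_url(client_id: str, redirect_uri: str, scopes: list[str], state: str) -> str:
    """Build the URL that sends the visitor to Google's consent screen."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",           # authorization-code flow
        "scope": " ".join(scopes),         # space-delimited, per OAuth 2.0
        "state": state,                    # anti-CSRF value echoed back to us
        "include_granted_scopes": "true",  # incremental authorization
    }
    return f"{AUTH_ENDPOINT}?{urlencode(params)}"
```

Passing only `BASIC_SCOPES` yields a plain profile sign-in; the consent screen lists whatever ends up in `scope`, which is why one extra line in the code becomes one extra checkbox for the visitor.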

Unlocked by reaching “Returning visitor”

The provenance ledger — every fact, where it's from, what we did with it

Here column: said = stated literally · steer = shapes selection, not stated · withheld = known, held back · shown = printed (creepy mode) · locked = needs a higher tier

Data point · Value (synthetic) · Where from · Cost · Creepy · Policy · Here
Approximate location
Resolve the visitor's IP address to a city.
Austin, Texas (metro) Collected ourselves (passive)
MaxMind GeoIP2 / IP2Location
$0 (free GeoIP DB) ●● allude shown
Neighborhood
Same IP lookup, at neighborhood resolution.
~2 mi of ZIP 78722 (Mueller) Collected ourselves (passive)
IP geolocation (ZIP-level)
$0 ●●● hold shown
Connection
Map the IP to its owning network and connection type.
Spectrum cable · residential Collected ourselves (passive)
IP → ISP / ASN lookup
$0 ●● allude shown
Company (reverse-IP)
Match the IP to a company's network to de-anonymize B2B visits.
(residential — no company match) Collected ourselves (passive)
Clearbit Reveal / KickFire
~$0.01 / lookup ●● allude shown
Device & OS
The browser announces device, OS and version on every request.
Apple iPhone 13 · iOS 17.4 · Safari Collected ourselves (passive)
User-Agent + Client Hints
$0 allude shown
Device economics
Infer age and price bracket from the device model.
≈2-yr-old, non-Pro model Collected ourselves (passive)
Model → release-date + price tier
$0 ●●● hold shown
Screen & theme
A few JS properties read on load.
390×844 · dark mode · battery-saver on Collected ourselves (passive)
JS: screen, prefers-color-scheme
$0 allude shown
Battery level
JS reads the device battery level and charging state.
18% and dropping (not charging) Collected ourselves (passive)
Battery Status API
$0 ●●●● hold shown
Local time
The browser's clock and timezone, read in JS.
11:47 PM, Tuesday Collected ourselves (passive)
JS Date + IANA timezone
$0 ●● allude shown
Languages
The ranked language list your browser sends.
en-US, then zh-CN Collected ourselves (passive)
Accept-Language header
$0 ●● hold shown
Where you came from
The link that sent you carries the campaign + creative.
Instagram ad · campaign 'career_switch_q3' Collected ourselves (passive)
Referrer + UTM parameters
$0 ●● allude shown
Device fingerprint
Hash your canvas, fonts and GPU into an ID that survives clearing cookies.
fp_9f3a… (canvas+fonts+GPU hash) Collected ourselves (passive)
FingerprintJS
~$0.005 / match (pro tier) ●●●●● hold shown
Visit history
Tie this session to prior ones we logged.
4th visit in 6 days Our own database
First-party cookie / fingerprint
$0 ●● allude shown
What you looked at
Every page, in order, time-stamped.
'Night cohort' ×3, pricing ×2 Our own database
Site analytics
$0 ●●● allude shown
How you read
JS records scroll position and time on each block.
Read 90% of the financing FAQ, 2m11s Our own database
Scroll-depth + dwell tracking
$0 ●●● allude shown
Unfinished actions
Partial form state captured field-by-field, before submit.
Started the application, didn't submit Our own database
Form analytics
$0 ●●● allude shown
Ad retargeting pool
A 3rd-party pixel adds you to ad audiences across the web.
Meta audience 'career-switch-warm' · shown 7 ads Purchased (data broker append)
Meta Pixel / Google Ads tag
ad spend ●●●● hold shown
Comparison shopping
Cross-site browsing bought from a data-management platform.
Visited 2 competitor bootcamps this week Purchased (data broker append)
DMP / cookie-sync (Lotame, Oracle BlueKai)
$$ subscription ●●●●● hold shown
Name
They typed it into a field.
Maya Declared by the visitor
Newsletter / lead form
$0 say locked
Email
They typed it.
maya.chen@gmail.com Declared by the visitor
Form field
$0 say locked
Stated goal
They wrote it in their own words.
"switch into AI without going broke" Declared by the visitor
Free-text form field
$0 say locked
Linked photo & accounts
Hash the email and look it up across services.
Gravatar photo + 9 sites tied to this email Collected ourselves (passive)
Gravatar / hash lookup
$0 ●●●● hold locked
Breach exposure
Check the email against breach corpora.
Appears in 3 known breaches Collected ourselves (passive)
Have I Been Pwned
paid API key (monthly subscription) ●●●● hold locked
Verified name & photo
One click grants name, photo, locale.
Maya Chen + verified profile photo Granted via Google sign-in
Google OAuth · profile scope
$0 (consented) say locked
Verified email + recovery
Verified address, and that a recovery phone exists.
maya.chen@gmail.com (verified) · recovery phone on file Granted via Google sign-in
Google OAuth · email scope
$0 (consented) ●● allude locked
Account maturity
Account age and activity hints.
Google account since 2009 · 'power user' Granted via Google sign-in
Profile metadata
$0 ●● allude locked
Your calendar
The consent screen can include calendar read — most people don't notice.
'OB checkup Thu 2pm', 'daycare tour Sat' Granted via Google sign-in
Google OAuth · calendar.readonly
$0 (one extra checkbox) ●●●●● hold locked
Inbox metadata
Senders + subjects reveal purchases without reading bodies.
Receipts from Pampers, BuyBuyBaby, a fertility clinic Granted via Google sign-in
Google OAuth · gmail.metadata
$0 ●●●●● hold locked
Watch history
Watch + search history as interest signals.
Recently: 'newborn sleep', 'career switch at 34' Granted via Google sign-in
Google OAuth · YouTube Data API
$0 ●●●● hold locked
Location history
Months of timestamped places.
Home: Mueller · work: downtown · 2 hospital visits this month Granted via Google sign-in
Google OAuth · Maps Timeline
$0 ●●●●● hold locked
Contacts graph
Your whole address book, with labels.
1,840 contacts · partner 'Jordan', an OB-GYN, your mom Granted via Google sign-in
Google OAuth · People API
$0 ●●●● hold locked
Household income
Append modeled income to a name+address match.
Modeled $115–135K band Purchased (data broker append)
Experian / Acxiom income model
~$0.05–0.25 / record ●●●● hold locked
Home & residence
Public deeds + broker compilation.
Homeowner · ~$540K home · 4 yrs in residence Purchased (data broker append)
Property records / Acxiom
bundled in append ●●● allude locked
Net worth band
Modeled from assets, home, credit.
$250–500K Purchased (data broker append)
Experian wealth model
bundled ●●●● hold locked
Life event: separation
Brokers sell change-of-status triggers as they happen.
Trigger: 'recently separated' Purchased (data broker append)
Epsilon / Experian life-event triggers
premium trigger ●●●●● hold locked
Life event: new baby
New-parent is one of the most-traded triggers.
Trigger: 'new parent', infant 0–6mo Purchased (data broker append)
Epsilon life-event triggers
premium trigger ●●●●● hold locked
Vehicle
Registration + service data compiled and sold.
Drives a 2021 Subaru Outback Purchased (data broker append)
Oracle Data Cloud / Polk auto
bundled ●●● allude locked
Education & occupation
Compiled demographics.
BS · occupation: software/IT Purchased (data broker append)
Acxiom InfoBase
bundled ●● allude locked
Political profile
Voter rolls + modeling, sold for targeting.
Leans Democrat · high turnout · past donor Purchased (data broker append)
L2 / Aristotle voter file
voter-file license ●●●●● hold locked
Health ad audiences
Inferred condition audiences sold for ad targeting.
'Expectant/new parent', 'seasonal allergy sufferer' Purchased (data broker append)
Health-adjacent ad audiences
audience license ●●●●● hold locked
Ethnic affinity
Name + geography modeled into an 'affinity'.
Modeled: Chinese Purchased (data broker append)
Acxiom ethnic-affinity model
bundled ●●●●● hold locked
In-market signals
Predicted near-term purchases.
Minivan, baby gear, term life insurance Purchased (data broker append)
Bombora / Oracle propensity
subscription ●●● hold locked
Customer history
We already have an account record.
Took 'Intro to Python' with us, 2023 Our own database
Our CRM
$0 (we own it) say locked
Spend & support
Lifetime value and support load.
LTV $349 · 1 purchase · 3 support tickets Our own database
CRM / billing
$0 ●● allude locked
Risk segment
A score we computed on our own data.
Churn-risk 0.31 · 'price-sensitive' Our own database
Internal model
$0 ●●● hold locked
Payment on file
Stored from the last purchase.
Visa •••• 4242, exp 11/26 Our own database
Billing system
$0 ●●● hold locked
Advocacy
Your own feedback to us.
NPS 9 · left a public 5★ review Our own database
Survey + reviews
$0 say locked
Cross-device
Deterministic + probabilistic linking of all your devices.
This iPhone + a work MacBook + a home iPad = one you Identity graph (cross-device + offline)
LiveRamp / Tapad
$$ subscription ●●●●● hold locked
Household
Devices + addresses clustered into a household.
2 adults (Maya, Jordan) + 1 infant Identity graph (cross-device + offline)
LiveRamp household graph
$$ ●●●●● hold locked
Offline purchases
Loyalty-card baskets sold and matched to your identity.
Target loyalty: diapers + formula weekly; Whole Foods 3×/wk Identity graph (cross-device + offline)
Retail loyalty data resold to brokers
$$ ●●●●● hold locked
Card-spend panel
Anonymized-then-rematched card data sold to marketers.
Baby-gear spike; $0 restaurants since April Identity graph (cross-device + offline)
Credit-card transaction panels
$$ ●●●●● hold locked
Smart-TV viewing
Your TV reports what's on screen, frame-matched.
Heavy HGTV + late-night cartoons Identity graph (cross-device + offline)
Smart-TV ACR (Samba, Vizio Inscape)
$$ ●●●●● hold locked
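The "Linked photo & accounts" row works because an email hash is a lookup key, not an anonymizer: anyone holding the address computes the same hash. A minimal sketch, assuming Gravatar's documented scheme of hashing the trimmed, lowercased address with SHA-256 (older integrations used MD5); the helper names are hypothetical. The `d=404` parameter asks Gravatar to return a 404 instead of a default image, so the status code alone says whether a profile exists. No request is made here.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """SHA-256 of the trimmed, lowercased address."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def gravatar_avatar_url(email: str) -> str:
    """Avatar URL that 404s when no Gravatar exists for this address."""
    return f"https://gravatar.com/avatar/{gravatar_hash(email)}?d=404"
```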

Text only, for now — the same signals would drive the layout next: reorder sections, swap hero imagery, change the offer. The provenance ledger is the point — nothing reaches the page without a receipt for where it came from.
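The receipt rule can be enforced in the renderer itself. A Python sketch under stated assumptions: `Fact`, `render`, `tier_stats` and the four-row `SAMPLE_LEDGER` (values copied from the ledger above) are hypothetical, and "hidden by a policy" is taken to mean any knowable fact whose policy is not `say`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    name: str
    value: str
    provenance: str  # provenance class, e.g. "Collected ourselves (passive)"
    method: str      # how it was obtained, e.g. "Accept-Language header"
    tier: int        # lowest tier (0-6) at which the fact is knowable
    policy: str      # "say", "allude" or "hold"


def render(facts: list[Fact], visitor_tier: int, creepy: bool) -> list[str]:
    """Return the lines that would be printed, each with its receipt attached."""
    lines = []
    for fact in facts:
        # Every fact needs a receipt, even one that will not be printed.
        if not fact.provenance or not fact.method:
            raise ValueError(f"{fact.name!r} has no receipt; refusing to render")
        if fact.tier > visitor_tier:
            continue  # locked: needs a higher tier
        if creepy or fact.policy == "say":
            lines.append(f"{fact.value}  [{fact.provenance} · {fact.method}]")
    return lines


def tier_stats(facts: list[Fact], visitor_tier: int) -> dict[str, int]:
    """Counters like the ones at the top of the page."""
    knowable = [f for f in facts if f.tier <= visitor_tier]
    return {
        "knowable": len(knowable),
        "total": len(facts),
        "hidden_by_policy": sum(f.policy != "say" for f in knowable),
        "bought": sum(f.provenance.startswith("Purchased") for f in knowable),
    }


SAMPLE_LEDGER = [
    Fact("Approximate location", "Austin, Texas (metro)", "Collected ourselves (passive)",
         "MaxMind GeoIP2 / IP2Location", 0, "allude"),
    Fact("Languages", "en-US, then zh-CN", "Collected ourselves (passive)",
         "Accept-Language header", 0, "hold"),
    Fact("Comparison shopping", "Visited 2 competitor bootcamps this week",
         "Purchased (data broker append)", "DMP / cookie-sync (Lotame, Oracle BlueKai)", 1, "hold"),
    Fact("Name", "Maya", "Declared by the visitor", "Newsletter / lead form", 2, "say"),
]
```

Applied to all 52 rows at the Returning visitor tier, the same counting rules give the figures at the top of the page: 18 knowable, 18 held back or alluded to, 2 bought.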