
Does personalization actually move the metrics?

The agent-swarm research on personalization → reply and booking, graded by an adversarial fact-check. These are external benchmarks (sourced, labeled by confidence), not Provenance's own measured numbers.

We ran the question through a 15-agent research swarm: does personalizing outreach actually move the numbers, and by how much? Seven researchers gathered the evidence; an adversarial verifier then graded every figure and threw out the inflated ones. What follows is what survived, labeled by confidence. These are external, sourced research benchmarks, not Provenance's own metrics; the product enforces the principles via the Gate but does not claim the numbers as its own.

What held up — verified across sources

Personalization → the metric (graded)

Lever	Effect on the metric	Confidence	Source
Signal-based opener vs a generic blast	≈ 2–2.5× reply rate	cross-source	multiple datasets
Soft interest CTA vs a hard meeting ask	≈ 3× reply (4.2% vs 1.4%); 15% cold-stage booking	robust	Gong · Puzzle Inbox
One contact per account vs 10+	7.8% vs 3.8% reply	supported	Belkins
Personal / career value vs company ROI	≈ 2× the impact; ≈ 50% more likely to buy	supported	Google / CEB (B2B)
Not pitching in the first email	up to −57% reply when you do	supported	Gong · 28M emails
Following up vs a single touch	42–55% of replies come from follow-ups; the first follow-up lifts replies by 50–66%	cross-source	multiple
Loss framing vs upside-only	a loss is felt ≈ 2× an equal gain	supported	prospect theory
Voice: rapport + a stated reason	opening with “did I catch you at a bad time?” → 0.9% success	supported	Gong · 90,380 calls
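
The "×" multipliers in the table are ratios of two reply rates. As a minimal sketch of that arithmetic, using only the rate pairs quoted in the rows above (the function name `relative_lift` is an illustrative assumption, not part of Provenance):

```python
def relative_lift(treated_rate_pct: float, baseline_rate_pct: float) -> float:
    """Return the treated rate as a multiple of the baseline rate."""
    if baseline_rate_pct <= 0:
        raise ValueError("baseline_rate_pct must be positive")
    return treated_rate_pct / baseline_rate_pct


# Rate pairs taken from the table rows above.
soft_cta_lift = relative_lift(4.2, 1.4)        # soft interest CTA vs hard meeting ask
single_contact_lift = relative_lift(7.8, 3.8)  # one contact per account vs 10+

print(f"soft CTA: {soft_cta_lift:.2f}x")
print(f"one contact per account: {single_contact_lift:.2f}x")
```

The first pair reproduces the ≈ 3× figure in the table; the second works out to roughly 2×, which the table reports as the raw rates rather than a multiplier.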

Calibration — what “good” looks like

Cold-outbound reply benchmarks

Benchmark	Value
Average cold reply rate	≈ 3.4%
Good	5–10%
Elite	10.7%+
Optimize on	reply / positive-reply rate, not opens (Apple Mail Privacy Protection inflates opens)
Deliverability ceilings	spam complaints < 0.3% · bounce < 2%
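
One way to apply these benchmarks to a campaign is sketched below. The thresholds come from the table above; the function names and tier labels are illustrative assumptions. The table leaves the bands between 3.4% and 5% and between 10% and 10.7% unassigned, so this sketch assumes each gap belongs to the lower tier.

```python
# Thresholds from the benchmark table, in percent.
AVERAGE_REPLY_PCT = 3.4
GOOD_FLOOR_PCT = 5.0
ELITE_FLOOR_PCT = 10.7
MAX_SPAM_COMPLAINT_PCT = 0.3
MAX_BOUNCE_PCT = 2.0


def reply_rate_pct(replies: int, delivered: int) -> float:
    """Reply rate as a percentage of delivered emails (not opens)."""
    if delivered <= 0:
        raise ValueError("delivered must be positive")
    return 100.0 * replies / delivered


def reply_tier(rate_pct: float) -> str:
    """Map a reply rate to a tier; gap bands fall to the lower tier (assumption)."""
    if rate_pct >= ELITE_FLOOR_PCT:
        return "elite"
    if rate_pct >= GOOD_FLOOR_PCT:
        return "good"
    if rate_pct >= AVERAGE_REPLY_PCT:
        return "average"
    return "below average"


def deliverability_ok(spam_complaint_pct: float, bounce_pct: float) -> bool:
    """True only if both rates sit strictly under the stated ceilings."""
    return spam_complaint_pct < MAX_SPAM_COMPLAINT_PCT and bounce_pct < MAX_BOUNCE_PCT


print(reply_tier(reply_rate_pct(60, 1000)))  # 6.0% reply rate
print(deliverability_ok(0.1, 1.5))
```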

The takeaway isn't a single magic number — it's the direction and the order of operations: relevance first (it gates everything), then a soft ask, match the peer, frame the loss, don't pitch first, and follow up. Those are exactly the rules the Gate now enforces.
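
The order of operations above can be pictured as an ordered checklist in which relevance is evaluated first and blocks everything after it. The sketch below is purely hypothetical: the `Draft` fields, the rule names, and `first_failed_rule` are invented for illustration and are not the Gate's actual checks or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Draft:
    """Hypothetical flags describing one outreach draft."""
    has_relevant_signal: bool
    uses_soft_ask: bool
    matches_peer: bool
    frames_loss: bool
    pitches_in_first_email: bool
    has_follow_up_planned: bool


# Rules in the order given in the takeaway; relevance comes first.
RULES: List[Tuple[str, Callable[[Draft], bool]]] = [
    ("relevance", lambda d: d.has_relevant_signal),
    ("soft ask", lambda d: d.uses_soft_ask),
    ("match the peer", lambda d: d.matches_peer),
    ("frame the loss", lambda d: d.frames_loss),
    ("don't pitch first", lambda d: not d.pitches_in_first_email),
    ("follow up", lambda d: d.has_follow_up_planned),
]


def first_failed_rule(draft: Draft) -> Optional[str]:
    """Return the name of the first rule the draft fails, or None if all pass."""
    for name, passes in RULES:
        if not passes(draft):
            return name
    return None


draft = Draft(False, False, True, True, True, False)
print(first_failed_rule(draft))  # relevance is reported before any later failure
```

Stopping at the first failure mirrors the "relevance gates everything" idea: a draft with no relevant signal is rejected on that rule alone, whatever else is wrong with it.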

What we threw out — inflated or untraceable

Dropped by the fact-check

Claim we found	Verdict	Why
“5.2× from signal personalization”	dropped	a cross-stitch of two unrelated studies; the real lift is ≈ 2–2.5×
“287% from multichannel”	dropped	misattributed — untraceable to a primary source
“25–40% reply” (vendor tiers)	dropped	single-vendor marketing, well above the ≈ 7.8% real-world ceiling
“44% time-ask penalty”	dropped	untraceable

The full swarm playbook (all 7 dimensions, every source and verdict) is persisted in the repo at docs/research/copy-personalization-research.md. For how these findings became enforced code, see Copy that drives action and the engineering writeup.

