Does personalization actually move the metrics?
The agent-swarm research on how personalization affects reply and booking rates, graded by an adversarial fact-check. These are external benchmarks (sourced, labeled by confidence), not Provenance's own measured numbers.
We ran the question through a 15-agent research swarm: does personalizing outreach actually move the numbers, and by how much? Seven researchers gathered the evidence; an adversarial verifier then graded every figure and threw out the inflated ones. What follows is what survived, labeled by confidence. These are external research benchmarks (sourced), not Provenance's own metrics: the product enforces the principles via the Gate but does not claim the numbers as its own.
What held up — verified across sources
Personalization → the metric (graded)
| Lever | Effect on the metric | Confidence | Source |
|---|---|---|---|
| Signal-based opener vs a generic blast | ≈ 2–2.5× reply rate | cross-source | multiple datasets |
| Soft interest CTA vs a hard meeting ask | ≈ 3× reply (4.2% vs 1.4%); 15% cold-stage booking | robust | Gong · Puzzle Inbox |
| One contact per account vs 10+ | 7.8% vs 3.8% reply | supported | Belkins |
| Personal / career value vs company ROI | ≈ 2× the impact; ≈ 50% more likely to buy | supported | Google / CEB (B2B) |
| Not pitching in the first email | up to −57% reply when you do | supported | Gong · 28M emails |
| Following up vs a single touch | 42–55% of replies come from follow-ups; the first follow-up alone adds +50–66% | cross-source | multiple |
| Loss framing vs upside-only | a loss is felt ≈ 2× an equal gain | supported | prospect theory |
| Voice: rapport + a stated reason | “did I catch you at a bad time?” → 0.9% success | supported | Gong · 90,380 calls |
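
The multiples in the table are ratios of two reply rates. As a quick sanity check, the sketch below recomputes two of them from the rates quoted in the rows above. The `lift` helper is a hypothetical name used for illustration, not part of any Provenance code.

```python
def lift(treatment_rate: float, baseline_rate: float) -> float:
    """Relative lift expressed as a multiple: treatment / baseline."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return treatment_rate / baseline_rate


# Rates are percentages taken from the table rows above.
soft_vs_hard = lift(4.2, 1.4)  # soft interest CTA vs hard meeting ask
one_vs_many = lift(7.8, 3.8)   # one contact per account vs 10+

print(f"{soft_vs_hard:.1f}x")  # 3.0x
print(f"{one_vs_many:.2f}x")   # 2.05x
```

The first ratio matches the ≈ 3× in the table; the second shows that single-contact targeting roughly doubles the reply rate in the Belkins figures.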
Calibration — what “good” looks like
Cold-outbound reply benchmarks
| Benchmark | Value |
|---|---|
| Average cold reply rate | ≈ 3.4% |
| Good | 5–10% |
| Elite | 10.7%+ |
| Optimize on | reply / positive-reply — not opens (Apple MPP inflates opens) |
| Deliverability ceilings | spam complaints < 0.3% · bounce < 2% |
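
One way to read a campaign against these benchmarks is sketched below. It is a minimal illustration, not Provenance's implementation: the `CampaignStats` type, the function names, and the binning rules are assumptions. In particular, the table leaves 3.4–5% and 10–10.7% unassigned, so this sketch treats the first gap as "average" and the second as "good".

```python
from dataclasses import dataclass

# Thresholds (in percent) copied from the benchmark table above.
AVERAGE_REPLY = 3.4
GOOD_FLOOR = 5.0
ELITE_FLOOR = 10.7
MAX_SPAM_COMPLAINTS = 0.3
MAX_BOUNCE = 2.0


@dataclass
class CampaignStats:  # hypothetical container, for illustration only
    sent: int
    replies: int
    spam_complaints: int
    bounces: int


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part` (0.0 if nothing was sent)."""
    return 100.0 * part / whole if whole else 0.0


def reply_tier(reply_pct: float) -> str:
    """Bin a reply rate; how the gaps are binned is an assumption (see lead-in)."""
    if reply_pct >= ELITE_FLOOR:
        return "elite"
    if reply_pct >= GOOD_FLOOR:
        return "good"
    if reply_pct >= AVERAGE_REPLY:
        return "average"
    return "below average"


def deliverability_ok(stats: CampaignStats) -> bool:
    """True when both rates sit under the ceilings in the table."""
    return (
        pct(stats.spam_complaints, stats.sent) < MAX_SPAM_COMPLAINTS
        and pct(stats.bounces, stats.sent) < MAX_BOUNCE
    )


stats = CampaignStats(sent=1000, replies=62, spam_complaints=1, bounces=12)
print(reply_tier(pct(stats.replies, stats.sent)), deliverability_ok(stats))
```

The sketch scores replies only and ignores opens, in line with the table's advice to optimize on reply or positive-reply rate.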
The takeaway isn't a single magic number — it's the direction and the order of operations: relevance first (it gates everything), then a soft ask, match the peer, frame the loss, don't pitch first, and follow up. Those are exactly the rules the Gate now enforces.
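
The order of operations above can be sketched as a small deterministic check in which relevance short-circuits everything else. This is only an illustration under stated assumptions: the `Draft` fields and the `review` function are hypothetical and do not describe how the Gate is actually implemented.

```python
from dataclasses import dataclass


@dataclass
class Draft:  # hypothetical fields, named after the rules in the takeaway
    has_verified_signal: bool      # relevance: opener tied to a checkable signal
    cta_is_soft: bool              # interest question rather than a meeting ask
    matches_peer: bool
    frames_loss: bool
    pitches_in_first_email: bool
    follow_ups_planned: int


def review(draft: Draft) -> list[str]:
    """Return the names of the rules a draft fails, in order of operations."""
    # Relevance gates everything: if it fails, the other rules are not evaluated.
    if not draft.has_verified_signal:
        return ["relevance"]
    checks = [
        ("soft ask", draft.cta_is_soft),
        ("match the peer", draft.matches_peer),
        ("frame the loss", draft.frames_loss),
        ("don't pitch first", not draft.pitches_in_first_email),
        ("follow up", draft.follow_ups_planned >= 1),
    ]
    return [name for name, passed in checks if not passed]


draft = Draft(True, True, True, False, True, 0)
print(review(draft))  # ['frame the loss', "don't pitch first", 'follow up']
```

Returning every failed rule, rather than stopping at the first one, lets a writer fix a draft in a single pass; only the relevance check stops early, since the later rules mean little without it.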
What we threw out — inflated or untraceable
Dropped by the fact-check
| Claim we found | Verdict | Why |
|---|---|---|
| “5.2× from signal personalization” | dropped | a cross-stitch of two unrelated studies; the real lift is ≈ 2–2.5× |
| “287% from multichannel” | dropped | misattributed — untraceable to a primary source |
| “25–40% reply” (vendor tiers) | dropped | single-vendor marketing, well above the ≈ 7.8% real-world ceiling |
| “44% time-ask penalty” | dropped | untraceable |
The full swarm playbook (all 7 dimensions, every source and verdict) is persisted in the repo at docs/research/copy-personalization-research.md. For how these findings became enforced code, see Copy that drives action and the engineering writeup.
Related
Copy that drives action — and the Gate that enforces it
What a research swarm taught us about outreach that converts, and how every lesson became a deterministic Gate check, because relevance you can't verify reads as spam.
The Gate
How one claim is verified (decompose, retrieve, NLI, a calibrated diverse ensemble, compliance rules) into a green/amber/red claim ledger.
Agent graph & decision trees
How a visitor is routed: classify the network, score confidence, pick a personalization tier, then the agent graph. The router is deterministic; the LLM only drafts; the Gate disposes.