The Optimizer

A per-segment Thompson-sampling bandit that can only ever pull Gate-cleared arms, so a planted lie is structurally unreachable. This is proven against an unconstrained twin that converges to the lie.

The Optimizer is a contextual bandit that learns the best message per micro-segment from a simulated click-through oracle. The context is the recipient's segment (role and company size). Within a segment, it draws one sample from each active arm's Beta posterior (Thompson sampling) and pulls the arm with the highest draw (pipeline/optimizer/bandit.py). You can watch it converge on the Optimizer page.
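The selection step can be sketched as follows. This is a minimal illustration, not the code in pipeline/optimizer/bandit.py: the class name SegmentBandit, its methods, and the uniform Beta(1, 1) prior are all assumptions made for the example.

```python
import random
from collections import defaultdict


class SegmentBandit:
    """Hypothetical per-segment Thompson sampler (illustration only)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        # (segment, arm) -> [alpha, beta]; assumed uniform Beta(1, 1) prior.
        self.posterior = defaultdict(lambda: [1.0, 1.0])

    def select(self, segment, active_arms):
        # Draw one sample per active arm from its Beta posterior,
        # then pull the arm with the largest draw.
        draws = {
            arm: self.rng.betavariate(*self.posterior[(segment, arm)])
            for arm in active_arms
        }
        return max(draws, key=draws.get)

    def update(self, segment, arm, clicked):
        # A click increments alpha; a non-click increments beta.
        self.posterior[(segment, arm)][0 if clicked else 1] += 1.0
```

Keying the posterior by (segment, arm) means each segment learns independently, so an arm that works for one role and company size does not bias the choice for another.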

The truth boundary is structural

The bandit can only ever pull arms that are in the action pool, and the Gate decides what goes into the pool. The pool is the set of Gate-cleared variant ids per segment; a blocked or unverified variant is never added (pipeline/common/store.py). So a planted lie is unreachable by construction — there is no special case in the bandit that filters it out. It simply is not an option to sample.
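One way to picture the boundary is a pool that admits a variant only when the Gate has cleared it. The sketch below is hypothetical and is not the code in pipeline/common/store.py: the ActionPool class, its method names, and the status strings "cleared", "blocked", and "unverified" are assumptions.

```python
from collections import defaultdict


class ActionPool:
    """Hypothetical store of Gate-cleared variant ids per segment."""

    def __init__(self):
        self._pool = defaultdict(set)

    def admit(self, segment, variant_id, gate_status):
        # Only a cleared variant enters the pool; anything else is
        # dropped here and never reaches the bandit.
        if gate_status == "cleared":
            self._pool[segment].add(variant_id)
            return True
        return False

    def arms(self, segment):
        # The bandit samples from this list and nothing else.
        return sorted(self._pool.get(segment, ()))


pool = ActionPool()
segment = ("engineer", "smb")  # assumed (role, company size) key
pool.admit(segment, "honest_a", "cleared")
pool.admit(segment, "planted_lie", "blocked")
pool.admit(segment, "draft_c", "unverified")
```

After these calls, pool.arms(segment) contains only honest_a. The bandit needs no filter of its own because the lie was never handed to it.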

The unconstrained twin

The proof is a contrast on identical data. The planted-lie arm is given the highest latent click-through rate by the oracle (pipeline/optimizer/oracle.py), so a bandit that is allowed to pull it will converge to it. The constrained run (verified arms only) and the unconstrained twin (the lie is in the pool) use the same recipients and the same oracle — only the truth boundary differs.

Same data, two outcomes
Constrained bandit: the lie is absent from the pool; 0 selections of the lie; converges to the best honest arm.
Unconstrained twin: the lie is in the pool and has the highest reward; converges to the lie.
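The contrast can be reproduced in miniature. The sketch below is an assumption-laden toy, not the project's campaign code: the arm names, the latent click-through rates, the two segments, and the run_campaign helper are all invented for illustration. Both runs use the same recipients and the same simulated oracle, and differ only in whether the lie is in the pool.

```python
import random

# Hypothetical latent click-through rates; the planted lie gets the highest.
LATENT_CTR = {"honest_a": 0.05, "honest_b": 0.20, "planted_lie": 0.45}


def run_campaign(pool, recipients, seed=0):
    """Thompson-sample over `pool` per recipient; return pulls per arm."""
    rng = random.Random(seed)
    posterior = {}  # (segment, arm) -> [alpha, beta]
    pulls = {arm: 0 for arm in pool}
    for segment in recipients:
        draws = {}
        for arm in pool:
            a, b = posterior.setdefault((segment, arm), [1.0, 1.0])
            draws[arm] = rng.betavariate(a, b)
        arm = max(draws, key=draws.get)
        clicked = rng.random() < LATENT_CTR[arm]  # simulated oracle
        posterior[(segment, arm)][0 if clicked else 1] += 1.0
        pulls[arm] += 1
    return pulls


recipients = [("engineer", "smb"), ("executive", "enterprise")] * 1500
honest_pool = ["honest_a", "honest_b"]

constrained = run_campaign(honest_pool, recipients)
twin = run_campaign(honest_pool + ["planted_lie"], recipients)
```

In the constrained run the lie has no entry in the pull counts at all, because it was never an option. In the twin, the arm with the highest latent rate ends up with the most pulls.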

Regret

Regret is measured against the best arm available in each run's own pool. So both campaigns converge (per-round regret tends toward zero). The difference is what they converge to: the constrained run to the best honest arm, the twin to the lie (pipeline/optimizer/campaign.py). That contrast is the whole point: you cannot win by lying when lies never enter the reward loop.
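As a rough sketch of pool-relative regret, the function below computes the running average of per-round regret against the best arm in a given pool. The function name, the arm names, and the latent rates are assumptions for illustration; the actual calculation in pipeline/optimizer/campaign.py may differ.

```python
def average_regret(pulled_arms, latent_ctr, pool):
    """Running mean of (best-in-pool rate minus pulled arm's rate)."""
    best = max(latent_ctr[arm] for arm in pool)  # benchmark is pool-relative
    total = 0.0
    curve = []
    for i, arm in enumerate(pulled_arms, start=1):
        total += best - latent_ctr[arm]
        curve.append(total / i)
    return curve


# Hypothetical latent rates; the lie has the highest.
latent = {"honest_a": 0.1, "honest_b": 0.2, "planted_lie": 0.4}

# Constrained run: the benchmark is honest_b, the best arm in its pool.
honest_curve = average_regret(
    ["honest_a", "honest_b", "honest_b", "honest_b"],
    latent,
    pool=["honest_a", "honest_b"],
)

# Twin: the benchmark is the lie, so pulling the lie incurs zero regret.
twin_curve = average_regret(
    ["planted_lie", "planted_lie"],
    latent,
    pool=["honest_a", "honest_b", "planted_lie"],
)
```

Because each run is scored against its own pool, a low regret curve says only that the bandit found the best arm it was offered. It says nothing about whether that arm is true, which is why the pool boundary, not the regret number, carries the guarantee.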

This is headline property P1, proven in tests/test_optimizer.py: the constrained bandit records 0 selections of the lie while the twin selects it and converges to it across most segments.