
May 26, 2026

Geometric illustration of purple octopus tentacles rising from the bottom of the frame, each gripping a different email marketing object: an envelope, a bar chart, a coin, a subject line bar, a slot machine lever, a phone, and a clock. Deep navy background.


Meet Multi-Armed Bandits: The Smartest Way to Test Email

Your ESP has been hiding a smarter way to test email and it's already running under the hood. Meet the algorithm that earns revenue while it learns, and how it compares to good ol' A/B testing.


Bridget Johnston

Marketing


In 1919, English statistician Ronald Fisher got into an argument about tea. 

His colleague claimed she could taste whether the milk went into a teacup before or after the tea itself. Fisher built an eight-cup, fully randomized, controlled experiment to test her claims. Eight cups. Four with milk first. Four with tea first. With the cups served in random order and the pouring hidden from her, she aced his test, guessing correctly all eight times.

From that argument, the modern A/B test was born.

Geometric illustration of purple octopus tentacles rising from the bottom of the frame, each holding a different tea-making object: a kettle, a teacup on a saucer, a milk jug, a sugar cube, a teabag, and a spoon. Deep navy background.

It's a great story, but the problem is that Fisher's experiment was designed to test one variable, one time, on a sample of eight cups of tea. Now, a century later, we're using the same basic logic to decide which subject line to send to 400,000 people on Cyber Monday.

This isn’t to say that A/B tests are useless. Testing technology has simply advanced. Brands today are using tests to make decisions that move faster and hit harder than anything Ronald Fisher ever imagined. And with newer testing methodologies available, it’s harder to stomach the fact that every time you run a classic A/B test on an email, you knowingly send the worse version to thousands of paying customers.

There's a better way and it's already running inside Klaviyo and other major ESPs, whether you know it or not. It's called a multi-armed bandit (MAB). Put simply, a multi-armed bandit is a decision-making problem where you choose among several options with unknown payoffs, trying to maximize your total reward over time. 

And put simply again, if your ecommerce email program isn't using MABs, you're leaving real money in a losing arm.

What Is a Multi-Armed Bandit? 

Picture a row of slot machines. You don't know which one pays out the best. 

Classic A/B testing would tell you to do something like: "Put $100 in every machine, count what comes out, then play only the best one going forward." That methodology is sensible, albeit slow and expensive.

A multi-armed bandit would instead tell you: "Drop a quarter in each, watch closely and start feeding more quarters into the one that's clearly hot. Keep dropping the occasional quarter in the others, in case things change."
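For the curious, here is a minimal Python sketch of that quarter-dropping strategy. It uses a simple rule called epsilon-greedy, which is one of several ways to run a bandit and not necessarily what any particular ESP uses. The payout rates, the 10% exploration rate and the function name are all made up for illustration.

```python
import random

def epsilon_greedy(payout_rates, pulls=10_000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy player on slot machines with hidden payout rates."""
    rng = random.Random(seed)
    n_arms = len(payout_rates)
    plays = [0] * n_arms  # quarters dropped into each machine
    wins = [0] * n_arms   # payouts seen from each machine

    for t in range(pulls):
        if t < n_arms:
            arm = t                      # drop one quarter in each machine first
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # the occasional quarter in a random machine
        else:
            # Otherwise feed the machine with the best observed win rate so far.
            arm = max(range(n_arms), key=lambda a: wins[a] / plays[a])
        plays[arm] += 1
        wins[arm] += rng.random() < payout_rates[arm]  # True counts as 1
    return plays, wins

# Hypothetical machines; the player never sees these rates directly.
plays, wins = epsilon_greedy([0.02, 0.03, 0.05])
print(plays)  # how many quarters each machine received
```

Raising `epsilon` means more exploring and less earning; lowering it does the opposite.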

Now swap "slot machine" for "subject line." Or "hero image." Or "send time." Or "20% off vs. free shipping." That's MAB in email marketing. It allows you to run proactive testing and optimization on your email marketing creative.

Multi-Armed Bandits vs. A/B Tests: How Do They Differ?

Here’s how a traditional Klaviyo or Mailchimp A/B test works. An email marketer will:

  • Send each variant to a fixed slice (such as 20% / 20%) of their list.

  • Wait for statistical significance before trusting the result. The exact sample size depends on your traffic, effect size and confidence level.

  • Declare a winner once the test completes, then ship the winning creative to the remaining 60% of their list.
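To make "the exact sample size depends" a little more concrete, here is a rough Python sketch using the textbook normal-approximation formula for comparing two rates. It is not how Klaviyo or Mailchimp decide significance; the 20% baseline open rate, the 10% relative lift, the 95% confidence level and the 80% power are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, lift, alpha=0.05, power=0.8):
    """Approximate recipients needed per variant to detect a relative lift.

    Two-sided test on two proportions, normal approximation.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # confidence level
    z_beta = NormalDist().inv_cdf(power)           # chance of catching a real effect
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical test: 20% open rate today, hoping to detect a 10% relative lift.
n = sample_size_per_variant(baseline_rate=0.20, lift=0.10)
print(n)  # on the order of 6,500 recipients per variant
```

Under these assumptions, a 20% / 20% split only reaches that number on a list of roughly 32,500 or more, and smaller lifts need far bigger samples.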

Split illustration comparing A/B testing and multi-armed bandit testing. Left side shows two sparse purple tentacles each holding a single object — an envelope and a toggle switch. Right side shows six active tentacles reaching in every direction, each gripping a different object including a bar chart, clock, phone, coin, envelope, and lever.

A multi-armed bandit works differently. When deploying this testing method, an email marketer will:

  • Begin the test with their creative splits, but every few minutes, ask: "Which variant is winning right now?"

  • Quietly shift more sends toward the winningest creative in that moment, while still serving the underdogs enough to detect a performance comeback.

  • Declare no fixed "end," and instead keep optimizing based on performance.
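The loop above can be sketched in a few lines of Python. This is a deliberately simple version: every variant keeps a guaranteed minimum share of the next batch and the current leader gets the rest. The 10% floor, the function name and the numbers are hypothetical, and real systems typically use smarter allocation rules.

```python
def allocate_batch(stats, batch_size, floor=0.1):
    """Split the next batch of sends across variants.

    stats maps variant name -> (sends, opens) so far.
    Each variant keeps at least `floor` of the batch (the underdogs' lifeline);
    the remainder goes to whichever variant is winning right now.
    Assumes floor * number_of_variants <= 1.
    """
    names = list(stats)
    rates = {name: (opens / sends if sends else 0.0)
             for name, (sends, opens) in stats.items()}
    leader = max(names, key=rates.get)

    shares = {name: floor for name in names}
    shares[leader] += 1.0 - floor * len(names)

    counts = {name: int(batch_size * shares[name]) for name in names}
    counts[leader] += batch_size - sum(counts.values())  # rounding leftovers go to the leader
    return counts

# Hypothetical results after the first wave of sends.
stats = {"A": (1000, 180), "B": (1000, 220), "C": (1000, 150)}
print(allocate_batch(stats, batch_size=5000))
# {'A': 500, 'B': 4000, 'C': 500}
```

Calling this again after every batch, with updated `stats`, is what gives the test no fixed "end."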

The difference between these two approaches comes down to exploring-then-exploiting (A/B) versus exploring-while-exploiting (MAB). That matters for email campaigns because you can’t add recipients after the send: if your list is too small to reach statistical significance, a classic A/B test never produces a winner you can trust. But don’t worry; there’s a solution.

The Bayesian Engine Under the Hood

When you hear "MAB," you'll often hear "Bayesian optimization" in the same breath. Here’s a simple explanation.

Bayesian methods don’t stop their processes to ask: “Is variant B statistically significant?” Rather, they ask a more human question: “How likely is variant B to be the better choice?” 

A lot of multi-armed bandit systems keep updating their guess for each email version as new results come in, then send the next message based on what looks strongest at that moment. The more data they collect, the sharper those guesses become. Ultimately, this is just a smarter way to learn as you go, instead of waiting around for a final verdict.
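Here is a small Python sketch of that idea, assuming a Beta-Bernoulli model with a uniform prior and a technique called Thompson sampling. That is one common Bayesian approach, not a confirmed description of what any specific ESP runs, and the send and open counts are invented.

```python
import random

def prob_b_beats_a(a_sends, a_opens, b_sends, b_opens, draws=20_000, seed=0):
    """Estimate how likely variant B's true open rate is higher than A's."""
    rng = random.Random(seed)
    b_wins = 0
    for _ in range(draws):
        # Each draw is one plausible open rate given the data seen so far.
        a = rng.betavariate(1 + a_opens, 1 + a_sends - a_opens)
        b = rng.betavariate(1 + b_opens, 1 + b_sends - b_opens)
        b_wins += b > a
    return b_wins / draws

def thompson_pick(stats, rng):
    """Choose the variant for the next send: draw one plausible rate each, take the best."""
    sampled = {name: rng.betavariate(1 + opens, 1 + sends - opens)
               for name, (sends, opens) in stats.items()}
    return max(sampled, key=sampled.get)

# Hypothetical results: A opened 200 of 1,000, B opened 230 of 1,000.
p = prob_b_beats_a(1000, 200, 1000, 230)
print(round(p, 2))  # around 0.95
print(thompson_pick({"A": (1000, 200), "B": (1000, 230)}, random.Random(1)))
```

Because each pick is a random draw from the current guess, B gets most of the sends here while A still gets the occasional one. As data piles up, the draws tighten and the stronger variant wins more often.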

Can I Actually Use MAB Testing Today?

Yes! Several ESPs support it. In fact, Klaviyo supports multi-armed bandit-style testing through its “distribute automatically” option in A/B tests. In practice, that means Klaviyo can keep shifting more sends toward the variant performing best instead of splitting traffic evenly the whole time. This is especially useful when you want the campaign to optimize as results come in.

One important caveat is that this feature is not available to everyone. Klaviyo’s documentation notes that it’s gated to accounts with more than 400,000 profiles, so for some users it’s not an option.

When Isn’t MAB the Right Answer?

We're not here to gaslight you. MABs have real limits and a good old-fashioned A/B test may still suit some specific campaigns! Their shortfalls?

  • Most basic MABs assume the world doesn't change. Seasonality, Black Friday or a viral TikTok trend can all suddenly shift user behavior mid-test, confusing a bandit. For example, it'll happily exploit a "winner" from Tuesday that's already obsolete by Friday.

  • You don’t get a clean final proof. If your CFO asks: “Are we sure the new flow won the test?” a bandit can’t give the kind of fixed statistical result a standard A/B test provides. It’s built to improve performance in real time, not to deliver a verdict.

  • Tiny lists struggle. If you only have a few hundred recipients per send, bandits will starve apparent losers of traffic before you really know they're losers.
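The first limit can be softened, though not eliminated. One known technique, sketched below in Python, is to discount old results so that recent sends count for more. This is an illustration of the general idea, not something the ESPs mentioned here are confirmed to do; the 0.5 discount and the daily numbers are made up.

```python
def running_rate(batches, discount=1.0):
    """Open rate across batches, optionally fading older batches.

    batches is a list of (sends, opens) in time order.
    discount=1.0 treats all history equally; lower values forget faster.
    """
    sends = opens = 0.0
    for batch_sends, batch_opens in batches:
        sends = sends * discount + batch_sends
        opens = opens * discount + batch_opens
    return opens / sends if sends else 0.0

# Hypothetical variant: a 25% open rate for three days, then 10% for three days.
daily = [(1000, 250)] * 3 + [(1000, 100)] * 3

plain = running_rate(daily)                # still flattered by the early days
faded = running_rate(daily, discount=0.5)  # much closer to what is happening now
print(round(plain, 3), round(faded, 3))
```

The trade-off is that forgetting old data also shrinks the evidence the bandit has to work with, which makes the tiny-list problem worse.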

Our guidance is to: 

  • Use A/B testing when the stakes are huge and you need a defensible answer. Consider it when sending materials on a full rebrand, pricing change or brand-voice pivot.

  • Use MAB for everything else, like subject lines, hero creative, send time, offer depth or CTA copy. Consider it for basically the 90% of email decisions you make every week.

A critical takeaway is that a multi-armed bandit doesn't replace creativity or taste, but amplifies the best parts of it for your brand. An AI agent picks the winner. The marketer picks what's worth winning with, long before testing begins.

That's why MAB pairs so well with generative-AI email tools that produce dozens of on-brand variants in seconds. You can't bandit-test what you don't have. Twelve hero images make a better experiment than two. And that combination of fast generation + adaptive testing is the ultimate unlock for ecommerce brands today.

The Backstroke Take

So much of today’s "AI email testing" in the wild is just A/B testing with a chatbot taped to it. Real AI email marketing means agentic decisioning. Brands need a system that generates the variants, allocates the sends, learns from every interaction and keeps learning even after the test "ends." It earns while it learns. It treats every campaign like a live optimization, not a static announcement.

Backstroke is built around multi-armed bandits and our system is informed by data from 20,000+ ecommerce brands.

If your current setup says: "let's wait two weeks and pick a winner," that’s not testing. At this point, it’s stalling.

See an AI-native, bandit-powered email program in motion.

Book a Backstroke demo →