Skip to main content

A/B testing & experiments

Why this matters for your business

This page covers site-wide A/B testing — the surface that manages experiments across all platform features (campaigns, journeys, content, storefront, ad creatives) in one coordinated place. Specific surfaces have their own A/B implementations (see storefront A/B block, sales-engine experiments), but this is where they're orchestrated and reported on.

The strategic value is coordination. Without a single testing surface, teams run conflicting experiments (a subject- line test + a journey-version test on the same audience makes both meaningless). They also run nothing because "it's hard to set up." This page surface is the experiment-program backbone.

What this typically unlocks

OutcomeResult
Tests/quarter (vs. ad-hoc)+4× typical
Validated insights/year30-40 with discipline
Conflicting experiments0 with coordination
Time setup → live test5 minutes

What you actually get

CapabilityDescription
Test catalogEvery active + completed test in one view
Conflict detectionWon't let you test contradicting variables on overlapping audiences
Sample-size calculator"How many recipients to detect 5% lift?"
Live significancep-value + confidence interval as data accumulates
Auto-stoppingOptional: stop test when significance reached
Hypothesis logPre-registered hypotheses; the "expected" before "actual"
Cross-surface testsSame test running on email + WA + storefront

Real merchant scenarios

Scenario A — Brand commits to weekly test

Setup. $5M brand. Goal: 1 test per week × 52 weeks.

Year-1 result: 47 tests run. 31 had clear winners. 8 overlapped on audience and were re-run separately. 8 inconclusive. +18% compounded conversion lift from the ones that won.

Scenario B — Avoiding conflict

Setup. Marketing manager scheduled a subject-line test on welcome email. Same week, a designer scheduled a content test on welcome-email step 2.

Conflict alert: "Both tests target the same audience on the same journey. Suggested: stagger the tests, or coordinate under one multivariate test."

Action: Combined into one multivariate test. Both factors measured cleanly.

Scenario C — Stopping a clear winner early

Setup. A/B test on campaign subject. After 12,000 sends, variant B was up 38% with p < 0.001.

Auto-stop fired: Test concluded; remaining sends went to the winner. Captured an extra ~$8K in conversion vs. running test to full duration.

Best practices

Pre-register hypotheses. Writing "I think X will win because Y" before testing makes the result mean something.

Use the conflict detection. Saves you from reading inconclusive tests later.

Run holdout-based tests for journeys (sales-engine experiments); they measure causal lift, not just correlation.

Don't run too many concurrent tests without coordination.

Don't act on tests that haven't reached power.

Plan tiers

CapabilityFreeStarterProAgencyEnterprise
Test catalog + management
Conflict detection
Sample-size calculator
Live significance
Auto-stop on significance
Multivariate tests
Hypothesis log

See also