A/B testing & experiments
Why this matters for your business
This page covers site-wide A/B testing — the surface that manages experiments across all platform features (campaigns, journeys, content, storefront, ad creatives) in one coordinated place. Specific surfaces have their own A/B implementations (see storefront A/B block, sales-engine experiments), but this is where they're orchestrated and reported on.
The strategic value is coordination. Without a single testing surface, teams run conflicting experiments: a subject-line test and a journey-version test on the same audience confound each other, so neither result can be trusted. Or they run nothing at all because "it's hard to set up." This surface is the backbone of the experiment program.
What this typically unlocks
| Outcome | Result |
|---|---|
| Tests per quarter (vs. ad hoc) | About 4× more (typical) |
| Validated insights per year | 30-40 with a disciplined program |
| Conflicting experiments | 0 with coordination |
| Time from setup to live test | About 5 minutes |
What you actually get
| Capability | Description |
|---|---|
| Test catalog | Every active + completed test in one view |
| Conflict detection | Won't let you test contradicting variables on overlapping audiences |
| Sample-size calculator | "How many recipients do I need to detect a 5% lift?" |
| Live significance | p-value + confidence interval as data accumulates |
| Auto-stopping | Optional: stop test when significance reached |
| Hypothesis log | Pre-registered hypotheses; the "expected" before "actual" |
| Cross-surface tests | Same test running on email + WA + storefront |
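The question the sample-size calculator answers can be sketched with the standard two-proportion formula. This is a minimal illustration, not the platform's actual calculator: the function name, the 10% baseline conversion rate, the 5% significance level, and the 80% power are all assumptions, and "5% lift" is read here as a relative lift.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Recipients needed in EACH variant to detect the given relative lift.

    Hypothetical helper using the normal-approximation formula for
    comparing two proportions with a two-sided test.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Assumed inputs: 10% baseline conversion, 5% relative lift (10% -> 10.5%).
n = sample_size_per_variant(baseline_rate=0.10, relative_lift=0.05)
print(f"Recipients needed per variant: {n:,}")
```

Under these assumptions a small relative lift needs tens of thousands of recipients per variant, which is why checking sample size before launch matters: a larger expected lift or a higher baseline rate reduces the requirement sharply.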
Real merchant scenarios
Scenario A — Brand commits to weekly test
Setup. $5M brand. Goal: 1 test per week × 52 weeks.
Year-1 result: 47 tests run. 31 had clear winners. 8 overlapped on audience and were re-run separately. 8 inconclusive. +18% compounded conversion lift from the ones that won.
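To put the +18% figure in perspective, the arithmetic below works out the average lift per winning test. It assumes that "compounded" means the 31 winning lifts multiply together, which the scenario does not state explicitly; the function name is hypothetical.

```python
def average_lift_per_winner(total_lift, winners):
    """Geometric-mean lift per winning test, assuming lifts compound multiplicatively."""
    return (1 + total_lift) ** (1 / winners) - 1


# Figures from Scenario A: +18% total across 31 winning tests.
per_test = average_lift_per_winner(total_lift=0.18, winners=31)
print(f"Average lift per winning test: {per_test:.2%}")
```

Under that assumption each winner contributes roughly half a percent on average. The value of a weekly cadence comes from accumulation, not from any single test.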
Scenario B — Avoiding conflict
Setup. Marketing manager scheduled a subject-line test on welcome email. Same week, a designer scheduled a content test on welcome-email step 2.
Conflict alert: "Both tests target the same audience on the same journey. Suggested: stagger the tests, or coordinate under one multivariate test."
Action: Combined into one multivariate test. Both factors measured cleanly.
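The check behind an alert like this can be sketched as follows. The data model is hypothetical (the `Experiment` fields, the `conflicts` function, and the sample audience IDs and dates are all assumptions for illustration): two tests are flagged when they run on the same journey, their dates overlap, and they share at least one recipient.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Experiment:
    # Hypothetical record; the platform's real schema is not documented here.
    name: str
    journey: str
    audience: frozenset  # recipient IDs targeted by the test
    start: date
    end: date


def conflicts(a: Experiment, b: Experiment) -> bool:
    """True when two tests would confound each other."""
    same_journey = a.journey == b.journey
    dates_overlap = a.start <= b.end and b.start <= a.end
    shared_recipients = a.audience & b.audience
    return same_journey and dates_overlap and bool(shared_recipients)


subject_test = Experiment(
    "Welcome subject line", "welcome", frozenset({1, 2, 3, 4}),
    date(2024, 3, 4), date(2024, 3, 10),
)
content_test = Experiment(
    "Welcome step 2 content", "welcome", frozenset({3, 4, 5, 6}),
    date(2024, 3, 6), date(2024, 3, 12),
)

if conflicts(subject_test, content_test):
    print("Conflict: stagger the tests or combine them into one multivariate test.")
```

Staggering the tests removes the date overlap, and splitting the audience removes the shared recipients; either change clears the flag in this sketch.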
Scenario C — Stopping a clear winner early
Setup. A/B test on campaign subject. After 12,000 sends, variant B was up 38% with p < 0.001.
Auto-stop fired: The test concluded and the remaining sends went to the winner. This captured roughly $8K in additional conversions compared with running the test for its full duration.
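A result like the one in Scenario C can be checked with a two-proportion z-test. The sketch below is illustrative only: the conversion counts are hypothetical, chosen to give a 38% relative lift over 12,000 sends split evenly, and the function name is an assumption. The method the platform uses for live significance is not documented here.

```python
import math
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Two-sided z-test for a difference in conversion rates, plus a confidence interval."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    diff = rate_b - rate_a

    # The pooled standard error is used for the hypothesis test.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # The unpooled standard error is used for the interval on the difference.
    se_diff = math.sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)

    return {"relative_lift": diff / rate_a, "p_value": p_value, "ci": ci}


# Hypothetical counts: 6,000 sends per variant, 300 vs. 414 conversions.
result = two_proportion_test(conv_a=300, n_a=6000, conv_b=414, n_b=6000)
print(f"Relative lift: {result['relative_lift']:.0%}")
print(f"p < 0.001: {result['p_value'] < 0.001}")
print(f"95% CI for the absolute difference: {result['ci'][0]:.4f} to {result['ci'][1]:.4f}")
```

One caveat on this sketch: rechecking a fixed-sample p-value every time new data arrives inflates the false-positive rate, so automated early stopping is normally paired with a sequential testing correction or a minimum sample size.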
Best practices
✅ Pre-register hypotheses. Writing "I think X will win because Y" before testing makes the result mean something.
✅ Use conflict detection. It saves you from trying to interpret confounded, inconclusive results later.
✅ Run holdout-based tests for journeys (sales-engine experiments); they measure causal lift, not just correlation.
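❌ Don't run concurrent tests on overlapping audiences without coordination; each test confounds the others.
❌ Don't act on tests that haven't reached their planned sample size. An underpowered test can show a large apparent lift purely by chance.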
Plan tiers
| Capability | Free | Starter | Pro | Agency | Enterprise |
|---|---|---|---|---|---|
| Test catalog + management | — | ✓ | ✓ | ✓ | ✓ |
| Conflict detection | — | ✓ | ✓ | ✓ | ✓ |
| Sample-size calculator | — | ✓ | ✓ | ✓ | ✓ |
| Live significance | — | ✓ | ✓ | ✓ | ✓ |
| Auto-stop on significance | — | — | ✓ | ✓ | ✓ |
| Multivariate tests | — | — | ✓ | ✓ | ✓ |
| Hypothesis log | — | — | ✓ | ✓ | ✓ |
See also
- Sales engine experiments + holdouts — formal statistical testing
- Storefront A/B block — drop-in storefront A/B
- Decision Intelligence — suggests what to test next