Convert your Z-Score to a confidence level
What is a z-score?
A z-score is a standardized score that describes how many standard deviations an observation is from the mean. In A/B testing terms, each of your visitors is an observation, and the results for the Control experience form one bell curve. The Variant Recipe and all of the visitors in it form a second bell curve. We use the Z-score calculator to test how far the center of the Variant bell curve is from the center of the Control bell curve, measured in standard errors.
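As a rough illustration of what such a calculator does, here is a minimal sketch that assumes the metric is a conversion rate and uses the common pooled two-proportion formula. The function name and the visitor counts are hypothetical, and the calculator described here may use a different formula.

```python
from math import sqrt

def two_proportion_z(control_conversions, control_visitors,
                     variant_conversions, variant_visitors):
    """Z-score for the gap between two conversion rates (pooled standard error)."""
    p_control = control_conversions / control_visitors
    p_variant = variant_conversions / variant_visitors
    # Pooled rate: the single conversion rate assumed if there is no real difference.
    p_pooled = (control_conversions + variant_conversions) / (
        control_visitors + variant_visitors)
    # Standard error of the difference between the two observed rates.
    standard_error = sqrt(
        p_pooled * (1 - p_pooled) * (1 / control_visitors + 1 / variant_visitors))
    return (p_variant - p_control) / standard_error

# Hypothetical data: 500 of 10,000 Control visitors convert, 560 of 10,000 Variant visitors.
z = two_proportion_z(500, 10_000, 560, 10_000)
print(round(z, 2))
```

A positive result means the Variant converted better than Control in the sample; a negative result means it converted worse.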
Is my test one-sided or two-sided?
We typically recommend two-sided tests. A two-sided hypothesis test lets you be mathematically confident in either direction: that your Variant Recipe is greater than your Control Recipe, or that it is less than it. With a one-sided test, you can only be mathematically confident about one of those directions, never both. We believe it’s just as important to know if your test is statistically underperforming as it is to know if it’s performing better than Control.
What does my confidence level mean to me in a business sense?
Each z-score corresponds to a confidence level. If your two-sided test has a z-score of 1.96, you are 95% confident that the Variant Recipe is different from the Control Recipe. Put another way, if the two recipes truly performed the same, there would be only a one in 20 chance of seeing a difference this large.
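The conversion from z-score to confidence level comes from the standard normal curve. The sketch below assumes the usual normal approximation and uses Python's standard-library NormalDist; the function name is hypothetical.

```python
from statistics import NormalDist

def confidence_from_z(z, two_sided=True):
    """Confidence level implied by a z-score under the standard normal curve."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    if two_sided:
        # Share of the curve lying between -|z| and +|z|.
        return 2 * standard_normal.cdf(abs(z)) - 1
    # One-sided: share of the curve lying below z.
    return standard_normal.cdf(z)

print(round(confidence_from_z(1.96), 3))                     # 0.95
print(round(confidence_from_z(1.645, two_sided=False), 3))   # 0.95
```

The same z-score maps to a higher confidence level in a one-sided test than in a two-sided test, because a one-sided test only guards against a mistake in one direction.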
What are common confidence levels?
The most commonly used confidence level is 95%. This is the standard confidence level in the scientific community. It means accepting a one in twenty chance of an alpha error (a false positive), where the observations in the experiment look different but in fact are not.
Common Confidence Levels and their Z-Score Equivalents
- 95% confidence: Two-Sided Z-Score 1.96, One-Sided Z-Score 1.645
- 99% confidence: Two-Sided Z-Score 2.58, One-Sided Z-Score 2.33
- 90% confidence: Two-Sided Z-Score 1.645, One-Sided Z-Score 1.28
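The z-scores listed above correspond to the 95%, 99%, and 90% confidence levels. As a sketch of where those numbers come from, the following inverts the standard normal curve using Python's standard-library NormalDist; the function name is hypothetical.

```python
from statistics import NormalDist

def z_from_confidence(confidence, two_sided=True):
    """Critical z-score required to reach a given confidence level."""
    # A two-sided test splits the leftover probability across both tails.
    tail = (1 - confidence) / 2 if two_sided else (1 - confidence)
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    two = z_from_confidence(level)
    one = z_from_confidence(level, two_sided=False)
    print(f"{level:.0%}: two-sided {two:.3f}, one-sided {one:.3f}")
```

Note that the one-sided z-score for 95% confidence and the two-sided z-score for 90% confidence are the same number, since both leave 5% of the curve in a single tail.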
In the digital community, it’s not uncommon to see A/B testing tools make calls at only 80% or 85% confidence. While there is a limited set of situations when this is OK, it is never ideal. Making decisions too early is one of the most common mistakes we see in A/B testing. If you make ROI projections based on 80% confidence and roll out that experience, you accept up to a one in five chance that the difference you measured was only noise, in which case you will miss those projections completely. If you run one test a month at that threshold, two or more of a year’s twelve results were likely erroneous.
Of course, we don’t recommend waiting for 99% confidence either. We recommend setting standards based on available traffic levels, risk appetite, and the willingness to back test.
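To make the trade-off between thresholds concrete, here is a small sketch of the false-positive arithmetic for a year of monthly tests. It rests on two simplifying assumptions that are not from the text above: every Variant truly performs the same as Control, and the tests are independent. The function name is hypothetical.

```python
def false_positive_outlook(confidence, tests):
    """Expected false positives, and the chance of at least one, across several tests.

    Assumes no Variant has a real effect and that the tests are independent.
    """
    alpha = 1 - confidence                   # chance of a false positive per test
    expected = alpha * tests                 # average number of false positives
    at_least_one = 1 - confidence ** tests   # chance that any test misleads you
    return expected, at_least_one

for level in (0.80, 0.95, 0.99):
    expected, at_least_one = false_positive_outlook(level, tests=12)
    print(f"{level:.0%} confidence: {expected:.1f} expected false positives, "
          f"{at_least_one:.0%} chance of at least one")
```

Under these assumptions, twelve tests at 80% confidence produce about 2.4 false positives on average, compared with about 0.6 at 95% confidence.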