Evolytics Blog

Risk Tolerance & Error in AB Testing

Minimizing Type I and Type II Error

Understanding Risk in AB Testing

There’s much more subjectivity in statistics than we give it credit for. All of the mathematical variables put in place are based upon assumptions. One of the most important assumptions made during the AB Testing process is the risk tolerance. 0% risk is an impossibility in the world of statistics, and, generally speaking, less risk equates to more time and money. It’s a careful balancing act between minimizing risk and maximizing resources.

If you’re not familiar with statistics, the error terms being thrown around can be confusing.

  • Alpha Error = Type I Error | Most commonly referred to as a false positive
  • Beta Error = Type II Error | Most commonly referred to as a false negative

Let’s detour for a moment to discuss how a hypothesis is designed.

If you’re launching an AB Test and want to see Version B convert better than Version A, your null hypothesis would be:

  • A = B, or A ≥ B.

The reason? In statistical terms, rejecting a hypothesis is a stronger showing of evidence than accepting one, because “acceptance” is merely a failure to find sufficient evidence to say otherwise. You never prove the null hypothesis true; you either reject it or fail to reject it.

If you reject the null hypothesis above, you conclude that its opposite, the Alternative Hypothesis, is true:

  • A < B
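This decision can be sketched as a one-sided two-proportion z-test. The conversion numbers below are hypothetical, chosen only to illustrate the mechanics; a real test would use your own traffic and conversion counts.

```python
from statistics import NormalDist

# Hypothetical conversion data (illustrative only, not from the article)
conv_a, n_a = 200, 5000   # Version A: 200 conversions from 5,000 visitors (4.0%)
conv_b, n_b = 240, 5000   # Version B: 240 conversions from 5,000 visitors (4.8%)

p_a, p_b = conv_a / n_a, conv_b / n_b

# Pooled conversion rate under the null hypothesis (A = B)
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5

# One-sided test of the alternative hypothesis A < B
z = (p_b - p_a) / se
p_value = 1 - NormalDist().cdf(z)

alpha = 0.05  # 95% confidence level
if p_value < alpha:
    print(f"Reject the null (p = {p_value:.4f}): evidence that B beats A")
else:
    print(f"Fail to reject the null (p = {p_value:.4f})")
```

With these made-up numbers the p-value lands just under 0.05, so the null is (barely) rejected; shave a few conversions off Version B and you would instead fail to reject.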

Understanding Error, utilizing the example above

Type I Error, or rejecting when you should not have, occurs when, by chance, your evidence shows you that A < B. You would spend resources to roll out the new version of the website, version B, assuming it converts better than your current version, Version A. Unfortunately, if you made a Type I Error, you would be incorrect, and conversions may either remain static, or, in a worst case scenario, conversions would decrease.

Type II Error, or failing to reject when you should have, occurs when, by chance, your evidence shows you that A ≥ B. You would not spend the resources to roll out the new, better performing version of your website, when, in fact, it would improve your conversions.

It isn’t difficult to quickly discern the negative business effect of making one of these errors.

Minimizing Type I Error Risk

The term confidence level is associated with Type I Error. Being 95% confident means that you are allowing a 5% chance for a false positive, or one in 20. If the risk associated with a false positive is very high, you may increase your confidence level to 99%. This must be balanced with business resources, as a higher level of confidence means increasing the sample size or requiring a larger observed difference, which puts you at risk of failing to detect small gains.

It is not uncommon for some online testing agencies to lower the confidence level to something like 85% or 90%. After all, those are still good odds, right? Perhaps, but by dropping your confidence level from 95% to 90%, your chance of error doubles from one in 20 to one in 10. The likelihood of making the wrong call should not be taken lightly here.
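The trade-off above can be made concrete. The loop below, a small pure-Python sketch, prints the false-positive odds and the one-sided critical z-value at a few common confidence levels; required sample size scales roughly with the square of that z-value, which is why higher confidence costs more traffic.

```python
from statistics import NormalDist

# False-positive odds and one-sided critical z-value by confidence level
for confidence in (0.85, 0.90, 0.95, 0.99):
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(confidence)
    print(f"{confidence:.0%} confident: about 1 false positive "
          f"in {round(1 / alpha)} tests, critical z = {z:.2f}")
```

Dropping from 95% to 90% really does double the false-positive odds from one in 20 to one in 10, exactly as described above.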

Minimizing Type II Error Risk

A lesser known term associated with hypothesis testing is power. In fact, if you’re using an easy online calculator and not a statistics tool such as R, you may not even be asked to enter a power assumption. Like the confidence level, power denotes the level of risk you are willing to accept, but it is associated with Type II Error.

The best method for controlling Type II Error is to plan ahead and base your sample size on the level of risk you’re comfortable with. There are only two ways to reduce Type II Error risk:

  1. Accept more Type I Error risk (e.g., lower your confidence level)
  2. Increase your sample size
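The planning step can be sketched with the standard normal-approximation sample-size formula for a one-sided two-proportion test. The conversion rates below are hypothetical; the point is how the required sample size moves with the power assumption.

```python
import math
from statistics import NormalDist


def sample_size_per_variation(p_a, p_b, confidence=0.95, power=0.80):
    """Approximate visitors needed per variation for a one-sided
    two-proportion test (normal approximation; a planning sketch)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(confidence)  # controls Type I Error risk
    z_beta = nd.inv_cdf(power)        # controls Type II Error risk
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_b - p_a) ** 2)


# Hypothetical goal: detect a lift from a 4.0% to a 4.8% conversion rate
for power in (0.80, 0.90):
    n = sample_size_per_variation(0.04, 0.048, power=power)
    print(f"power = {power:.0%}: about {n:,} visitors per variation")
```

Raising power from 80% to 90% (cutting Type II Error risk from 20% to 10%) pushes the required sample size up by roughly a third, which is the time-and-money cost the article describes.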

Understanding the potential costs of making a mistake here is valuable. For instance, if your monthly online revenue is $100,000 and a new experience would lift conversions by 2%, but a Type II Error leads you not to roll it out, you are leaving $24,000 on the table over the course of a year.
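The arithmetic behind that figure, using the article's own numbers:

```python
# The article's scenario: $100,000 monthly revenue, and a 2% lift missed
# because a Type II Error caused the winning version to be shelved
monthly_revenue = 100_000
lift = 0.02
annual_opportunity_cost = monthly_revenue * lift * 12
print(f"${annual_opportunity_cost:,.0f} left on the table per year")  # $24,000
```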

A skilled agency such as Evolytics can help you evaluate the costs and risks associated with your AB Testing needs.

Krissy Tripp

Krissy Tripp is a member of the Marketing Analytics team. Curious and creative, she enjoys applying statistics to discover actionable insights that increase client success and profitability. She has supported analytics initiatives for brands such as Sephora, Intuit Markets, and TurboTax.

