How to Determine Your A/B Testing Sample Size & Time Frame

I remember running my first A/B test after college. It wasn’t till then that I understood the basics of getting a big enough A/B test sample size or running the test long enough to get statistically significant results.

But figuring out what “big enough” and “long enough” were was not easy.

Googling for answers didn’t help me, as I got information that only applied to the ideal, theoretical, and non-marketing world.

Turns out I wasn’t alone, because asking how to determine A/B testing sample size and time frame is a common question from our customers.

So, I figured I’d do the research to help answer this question for all of us. In this post, I’ll share what I’ve learned to help you confidently determine the right sample size and time frame for your next A/B test.

A/B Test Sample Size Formula

When I first saw the A/B test sample size formula, I was like, woah!!!!

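Here’s how it looks, in its standard form for comparing two conversion rates:

𝑛 = (𝑍𝛼/2 + 𝑍𝛽)² × [𝑝1(1 − 𝑝1) + 𝑝2(1 − 𝑝2)] / (𝑝2 − 𝑝1)²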

n is the sample size
𝑝1 is the Baseline Conversion Rate
𝑝2 is the conversion rate lifted by Absolute “Minimum Detectable Effect”, which means 𝑝1+Absolute Minimum Detectable Effect
𝑍𝛼/2 means Z Score from the z table that corresponds to 𝛼/2 (e.g., 1.96 for a 95% confidence interval).
𝑍𝛽 means Z Score from the z table that corresponds to 𝛽 (e.g., 0.84 for 80% power).

Pretty complicated formula, right?

Luckily, there are tools that let us plug in as little as three numbers to get our results, and I will cover them in this guide.
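If you’d like to see the formula in action before reaching for a tool, here’s a minimal Python sketch. It assumes the standard two-proportion form, n = (Zα/2 + Zβ)² × [p1(1 − p1) + p2(1 − p2)] / (p2 − p1)², which matches the symbols defined above. The function name and the example inputs are my own (hypothetical), and real calculators may use different statistical methods, so expect their numbers to differ.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variation(p1, absolute_mde, confidence=0.95, power=0.80):
    """Approximate sample size needed in EACH variation (two-sided test)."""
    p2 = p1 + absolute_mde  # conversion rate lifted by the absolute MDE
    if absolute_mde == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must be between 0 and 1, and the MDE non-zero")

    # Z scores that would otherwise be looked up in a z table.
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 at 95%
    z_beta = NormalDist().inv_cdf(power)  # about 0.84 at 80% power

    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical inputs: 10% baseline, 2-point absolute lift (10% -> 12%).
print(sample_size_per_variation(0.10, 0.02))
```

Notice that the result is per variation, so a test with one control and one variant needs twice that many people in total.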

A/B Testing Sample Size & Time Frame

In theory, to conduct a perfect A/B test and determine a winner between Variation A and Variation B, you need to wait until you have enough results to see if there is a statistically significant difference between the two.

Many A/B test experiments prove this is true.

Depending on your company, sample size, and how you execute the A/B test, getting statistically significant results could happen in hours or days or weeks — and you have to stick it out until you get those results.

For many A/B tests, waiting is no problem. Testing headline copy on a landing page? It’s cool to wait a month for results. Same goes for blog CTA creative: you’d be going for the long-term lead generation play, anyway.

But certain aspects of marketing demand shorter timelines with A/B testing. Take email as an example. With email, waiting for an A/B test to conclude can be a problem for several practical reasons I’ve identified below.

1. Each email send has a finite audience.

Unlike a landing page (where you can continue to gather new audience members over time), once you run an email A/B test, that’s it: you can’t “add” more people to that A/B test.

So you’ve got to figure out how to squeeze the most juice out of your emails.

This will usually require you to send an A/B test to the smallest portion of your list needed to get statistically significant results, pick a winner, and send the winning variation to the rest of the list.

2. Running an email marketing program means you’re juggling at least a few email sends per week. (In reality, probably way more than that.)

If you spend too much time collecting results, you could miss out on sending your next email — which could have worse effects than if you sent a non-statistically significant winner email on to one segment of your database.

3. Email sends need to be timely.

Your marketing emails are optimized to deliver at a certain time of day. They might be supporting the timing of a new campaign launch and/or landing in your recipients’ inboxes at a time they’d love to receive them.

So if you wait for your email to be fully statistically significant, you might miss out on being timely and relevant — which could defeat the purpose of sending the emails in the first place.

That’s why email A/B testing programs have a “timing” setting built in: At the end of that time frame, if neither result is statistically significant, one variation (which you choose ahead of time) will be sent to the rest of your list.

That way, you can still run A/B tests in email, but you can also work around your email marketing scheduling demands and ensure people are always getting timely content.

So, to run email A/B tests while optimizing your sends for the best results, consider both your A/B test sample size and timing.

Next up — how to figure out your sample size and timing using data.

How to Determine Sample Size for an A/B Test

For this guide, I’m going to use email to show how you’ll determine sample size and timing for an A/B test. However, note that you can apply the steps in this list for any A/B test, not just email.

As I mentioned above, you can only send an A/B test to a finite audience — so you need to figure out how to maximize the results from that A/B test.

To do that, you must know the smallest portion of your total list needed to get statistically significant results.

Let me show you how you calculate it.

1. Check if your contact list is large enough to conduct an A/B test.

To A/B test a sample of your list, you need a list size of at least 1,000 contacts.

From my experience, if you have fewer than 1,000 contacts, the proportion of your list that you need to A/B test to get statistically significant results gets larger and larger.

For example, if I have a small list of 500 subscribers, I might have to test 85% or 95% of them to get statistically significant results.

Once I’m done, the remaining number of subscribers who I didn’t test will be so small that I might as well send half of my list one email version, and the other half another, and then measure the difference.

Your results might not be statistically significant at the end of it all, but at least you’re gathering learnings while you grow your email list.

Pro tip: If you use HubSpot, you’ll find that 1,000 contacts is your benchmark for running A/B tests on samples of email sends. If you have fewer than 1,000 contacts in your selected list, Version A of your test will automatically go to half of your list and Version B goes to the other half.
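If you do end up splitting a small list 50/50 and measuring the difference yourself, a two-proportion z-test is one common way to check whether that difference is statistically significant. Here’s a minimal sketch using only Python’s standard library. The function name and the open counts are hypothetical, and I’m not claiming this is the exact test any particular email tool runs.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conversions_a, size_a, conversions_b, size_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conversions_a / size_a
    rate_b = conversions_b / size_b

    # Pooled rate: the conversion rate if both versions performed the same.
    pooled = (conversions_a + conversions_b) / (size_a + size_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / size_a + 1 / size_b))
    if std_error == 0:
        return 1.0  # nobody (or everybody) converted, so there is no evidence

    z = (rate_b - rate_a) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical 500-contact list split in half: 50 opens vs. 75 opens.
p_value = two_proportion_p_value(50, 250, 75, 250)
print(f"p-value: {p_value:.4f}")
print("significant at 95%" if p_value < 0.05 else "not significant at 95%")
```

A p-value below 0.05 corresponds to the 95% significance level. With a small list, modest differences will often land above that threshold.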

2. Use a sample size calculator.

HubSpot’s A/B Testing Kit has a fantastic and free A/B testing sample size calculator.

During my research, I also found two web-based A/B testing calculators that work well. The first is Optimizely’s A/B test sample size calculator. The second is Evan Miller’s.

For our illustration, though, I’ll use the HubSpot calculator. Here’s what it looks like when I download it:

3. Input your baseline conversion rate, minimum detectable effect, and statistical significance into the calculator.

This is a lot of statistical jargon, but don’t worry, I’ll explain them in layman’s terms.

Statistical significance: This tells you how sure you can be that the difference you see between your variations is real and not down to random chance. The lower the percentage, the less sure you can be about the results. The higher the percentage, the more people you’ll need in your sample, too.

Baseline conversion rate (BCR): BCR is the conversion rate of the control version. For example, if I email 10,000 contacts and 6,000 opened the email, the conversion rate (BCR) of the email opens is 60%.

Minimum detectable effect (MDE): MDE is the minimum relative change in conversion rate that I want the experiment to detect between version A (original or control sample) and version B (new variant).

For example, if my BCR is 60%, I could set my MDE at 5%. Because MDE is relative, that works out to 3 percentage points (5% of 60%). This means I want the experiment to be able to detect a difference between my new variant and the control of at least that size.

If the conversion rate of my new variant is, for example, 63% or higher, or 57% or lower, the test is sized to pick that up as a statistically significant difference.

But if the difference is smaller than that (for example, 58% or 62%), then the test might not be statistically significant, as the change could be because of random chance rather than the variant itself.

MDE has real implications for your sample size, and therefore for the time and traffic your test requires. Think of MDE as water in a cup. As the amount of water increases, you need less time and effort (traffic) to get the result you want.

The translation: a higher MDE means a smaller sample and a shorter test, because bigger differences are easier to spot. The downside to higher MDEs is that the test can miss smaller improvements that are still real.

It’s a trade-off you’ll have to make. For our purposes, it’s not worth getting too caught up in MDE. When you’re just getting started with A/B tests, I’d recommend choosing a smaller MDE (e.g., around 5%).
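To make the trade-off concrete, here’s a rough Python sketch that shows how the required sample shrinks as the MDE grows. It treats MDE as a relative change, converts it to an absolute one, and plugs it into the standard two-proportion sample size formula at 95% confidence and 80% power. Those settings, the function name, and the 60% baseline are assumptions for illustration, so a given calculator may report different numbers.

```python
from math import ceil
from statistics import NormalDist


def contacts_needed(bcr, relative_mde, confidence=0.95, power=0.80):
    """Approximate contacts per variation for a given relative MDE."""
    lift = bcr * relative_mde  # relative MDE converted to an absolute change
    variant_rate = bcr + lift
    if lift == 0 or not (0 < bcr < 1 and 0 < variant_rate < 1):
        raise ValueError("rates must be between 0 and 1, and the MDE non-zero")

    z_total = (NormalDist().inv_cdf(1 - (1 - confidence) / 2)
               + NormalDist().inv_cdf(power))
    variance = bcr * (1 - bcr) + variant_rate * (1 - variant_rate)
    return ceil(z_total ** 2 * variance / lift ** 2)


# Hypothetical 60% baseline: a larger MDE needs far fewer contacts.
for relative_mde in (0.05, 0.10, 0.20):
    needed = contacts_needed(0.60, relative_mde)
    print(f"MDE {relative_mde:.0%}: {needed} contacts per variation")
```

Doubling the MDE cuts the required sample to roughly a quarter, which is why the MDE you pick has such a big effect on how long a test runs.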

Note for HubSpot customers: The HubSpot Email A/B tool automatically uses the 85% confidence level to determine a winner.

Email A/B Test Example

Let’s say I want to run an email A/B test. First, I need to determine the size of each sample of the test.

Here’s what I’d put in the Optimizely A/B testing sample size calculator:

Ta-da! The calculator has shown me my sample.

In this example, it is 2,700 contacts per variation.

This is the size that one of my variations needs to be. So for my email send, if I have one control and one variation, I’ll need to double this number. If I had a control and two variations, I’d triple it.

Here’s how this looks in the HubSpot A/B testing kit.

4. Depending on your email program, you may need to calculate the sample size’s percentage of the whole email list.

HubSpot customers, I’m looking at you for this section. When you’re running an email A/B test, you’ll need to select the percentage of contacts to send the test to, not just the raw sample size.

To do that, you need to divide the number in your sample by the total number of contacts in your list. Here’s what that math looks like, using the example numbers above:

2,700 / 10,000 = 27%

This means that each sample (both my control AND variation) needs to be sent to 27-28% of my audience, or roughly 55% of my list in total. And once a winner is determined, the winning version goes to the rest of my list.
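The same math can be sketched in a few lines of Python. The function name is hypothetical, and the 1,000-contact cutoff simply mirrors the benchmark from step 1. When the list is too small, or the test would use up the whole list, the sketch falls back to an even split, which is my own simplification.

```python
def ab_split_percentages(sample_per_variation, list_size,
                         variations=2, min_list_size=1000):
    """Return (percent of list per variation, percent left for the winner)."""
    needed = sample_per_variation * variations
    if list_size < min_list_size or needed >= list_size:
        # Nothing meaningful would be left over, so split the whole list evenly.
        return 100 / variations, 0.0

    per_variation = 100 * sample_per_variation / list_size
    return per_variation, 100 - per_variation * variations


# Example numbers from above: 2,700 per variation out of 10,000 contacts.
per_variation, remainder = ab_split_percentages(2700, 10_000)
print(f"{per_variation:.0f}% per variation, {remainder:.0f}% gets the winner")
```

With the example numbers, that leaves a little under half of the list to receive the winning version.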

And that’s it! Now you are ready to select your sending time.

How to Choose the Right Timeframe for Your A/B Test for a Landing Page

If I want to test a landing page, the timeframe I’ll choose will vary depending on my business’ goals.

So let’s say I’d like to design a new landing page by Q1 2025 and it’s Q4 2024. To have the best version ready, I need to have finished my A/B test by December so I can use the results to build the winning page.

Calculating the time I need is easy. Here’s an example:

Landing page traffic: 7,000 per week
BCR: 10%
MDE: 5%
Statistical significance: 80%

When I plug the BCR, MDE, and statistical significance into the Optimizely A/B test Sample Size Calculator, I get 53,000 as the result.

This means 53,000 people need to visit each version of my landing page if I am experimenting with two versions.

So the time frame for the test will be:

53,000*2/7,000 = 15.14 weeks

This implies I should start running this test within the first two weeks of September.
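Here’s the same time frame calculation as a small Python sketch, in case you want to rerun it with your own traffic numbers. The function name is hypothetical, and so is the December 31, 2024 deadline, which I picked only to stand in for “by December.”

```python
from datetime import date, timedelta
from math import ceil


def duration_in_weeks(sample_per_variation, weekly_traffic, variations=2):
    """Weeks needed for every variation to reach its required sample size."""
    return sample_per_variation * variations / weekly_traffic


weeks = duration_in_weeks(53_000, 7_000)

# Hypothetical deadline; round up so the test finishes a full week early at most.
deadline = date(2024, 12, 31)
latest_start = deadline - timedelta(weeks=ceil(weeks))

print(f"Test length: {weeks:.2f} weeks")
print(f"Start no later than: {latest_start}")
```

Rounding up to whole weeks is a deliberate choice here: it keeps each day of the week equally represented in the results.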

Choosing the Right Timeframe for Your A/B Test for Email

For emails, you have to figure out how long to run your email A/B test before sending a (winning) version on to the rest of your list.

Knowing the timing aspect is a little less statistically driven, but you should definitely use past data to make better decisions. Here’s how you can do that.

If you don’t have timing restrictions on when to send the winning email to the rest of the list, head to your analytics.

Figure out when your email opens/clicks (or whatever your success metrics are) start dropping. Look at your past email sends to figure this out.

For example, what percentage of total clicks did you get on your first day?

If you found you got 70% of your clicks in the first 24 hours, and then 5% each day after that, it’d make sense to cap your email A/B testing timing window to 24 hours because it wouldn’t be worth delaying your results just to gather a little extra data.
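If you can export daily click counts from a past send, a short script can find that cutoff for you. This is only a sketch: the function name, the click counts, and the 70% target are hypothetical, so swap in your own data and whatever share of clicks you consider “enough.”

```python
def days_to_reach_share(clicks_per_day, target_share=0.70):
    """First day (1-based) by which the target share of all clicks has arrived."""
    total = sum(clicks_per_day)
    if total == 0:
        raise ValueError("no clicks recorded")

    running = 0
    for day, clicks in enumerate(clicks_per_day, start=1):
        running += clicks
        if running / total >= target_share:
            return day
    return len(clicks_per_day)


# Hypothetical past send: 70% of clicks on day one, then 5% each day after.
past_send = [700, 50, 50, 50, 50, 50, 50]
print(days_to_reach_share(past_send))
```

Running this across several past sends, rather than just one, should give you a more reliable picture of where the drop-off usually happens.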

After 24 hours, your email marketing tool should let you know if it can determine a statistically significant winner. Then, it’s up to you what to do next.

If you have a large sample size and found a statistically significant winner at the end of the testing time frame, many email marketing tools will automatically and immediately send the winning variation.

If you have a large enough sample size and there’s no statistically significant winner at the end of the testing time frame, email marketing tools might also allow you to send a variation of your choice automatically.

If you have a smaller sample size or are running a 50/50 A/B test, when to send the next email based on the initial email’s results is entirely up to you.

If you have time restrictions on when to send the winning email to the rest of the list, figure out how late you can send the winner without it being untimely or affecting other email sends.

For example, if you’ve sent emails out at 3 PM EST for a flash sale that ends at midnight EST, you wouldn’t want to determine an A/B test winner at 11 PM. Instead, you’d want to send the winner closer to 6 or 7 PM. That’ll give the people not involved in the A/B test enough time to act on your email.
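You can work backward from the deadline with a couple of lines of Python. In this sketch, the calendar date and the five-hour buffer for recipients to act are both assumptions, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def latest_winner_send(deadline, hours_to_act):
    """Latest time to send the winner so recipients still have time to act."""
    return deadline - timedelta(hours=hours_to_act)


# Hypothetical date for the flash sale example: test goes out at 3 PM,
# and the sale ends at midnight.
ab_test_sent = datetime(2025, 1, 15, 15, 0)
sale_ends = datetime(2025, 1, 16, 0, 0)

winner_by = latest_winner_send(sale_ends, hours_to_act=5)
testing_window = winner_by - ab_test_sent

print(f"Send the winner by {winner_by:%I:%M %p}")
print(f"That leaves a testing window of {testing_window}")
```

If the testing window comes out shorter than the time it usually takes for most of your clicks to arrive, that’s a sign the test may not reach significance before you have to pick a version.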

Pumped to run A/B tests?

What I have shared here is pretty much everything you need to know about your A/B test sample size and timeframe.

After doing these calculations and examining your data, I’m positive you’ll be in a much better state to conduct successful A/B tests — ones that are statistically valid and help you move the needle on your goals.

Editor’s note: This post was originally published in December 2014 and has been updated for comprehensiveness.
