Binomial distribution and normal approximation part II

Jun 26, 2021 (5 min read)

1. Introduction
2. Normal approximation
3. Continuity correction
4. Example

Introduction

Previously, we learned what the binomial distribution is and how to use it to solve coin toss problems. If you need a refresher on the binomial distribution, I recommend reviewing the previous blog post.

Part II focuses on mitigating the limitations of the binomial distribution introduced previously, using the normal approximation and continuity correction.

Normal approximation

The goal of the normal approximation is this: if a binomial distribution satisfies certain conditions, we can treat it like a normal distribution and apply the tricks that work for normal distributions.

We know that the normal distribution is symmetric, so the binomial distribution must be roughly symmetric for the normal approximation to be reasonable.

Also, for a binomial distribution, mean = np and standard deviation = √(np(1-p)).

The closer the probability of success (p) is to 0.5, the more symmetric the binomial distribution will be; otherwise it will be skewed right (p < 0.5) or left (p > 0.5).

There is a rough guideline for when the normal approximation is reasonable: both np ≥ 10 and nq ≥ 10, where q = 1 - p. It is only a rough guideline; just remember that as np and nq become larger, the binomial distribution looks more like a normal distribution.
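To make the guideline concrete, here is a minimal sketch that computes the mean, the standard deviation, and the np/nq check for a few (n, p) pairs. The function name `binomial_summary` and the example values are my own illustrative choices, not part of any library.

```python
import math

def binomial_summary(n, p):
    """Return (mean, standard deviation, rule-of-thumb check) for Binomial(n, p)."""
    q = 1 - p
    mean = n * p
    std = math.sqrt(n * p * q)
    # Rough guideline from the text: both np >= 10 and nq >= 10.
    rule_ok = (n * p >= 10) and (n * q >= 10)
    return mean, std, rule_ok

# Illustrative (n, p) pairs, chosen only for demonstration.
for n, p in [(100, 0.5), (20, 0.2), (200, 0.9)]:
    mean, std, ok = binomial_summary(n, p)
    print(f"n={n}, p={p}: mean={mean:.1f}, std={std:.2f}, guideline met: {ok}")
```

Since the guideline is only a rule of thumb, a pair that fails the check can still look reasonably bell-shaped, and a pair that barely passes can still be visibly skewed.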

You should play around with different n and p to see how the binomial distribution changes. I’ve plotted some binomial distributions with different n and p; you can play around with the parameters using the code → here.

Using the code, I’ve plotted the binomial distribution with p = 0.2, 0.4, 0.5, and 0.9, with different n for each p.

When p = 0.2, the distribution is right skewed for every n, and the larger n is, the more it looks like a normal distribution, so it seems to follow the guideline stated above.

When p = 0.4, it starts to look a lot like a normal distribution; however, for smaller n we can see it is still right skewed.

When p = 0.5, it looks like a normal distribution even with very small n.

Finally, with p = 0.9, we can see that the distribution is left skewed.
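If you prefer numbers to plots, the same pattern can be checked by computing the skewness straight from the probability mass function. This is a rough sketch using only the standard library; the values of n are my own assumption (the plots may have used different ones), and `binomial_pmf` and `binomial_skewness` are illustrative names.

```python
import math

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_skewness(n, p):
    """Third standardized moment, summed directly over the PMF."""
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    return sum(((k - mean) / std) ** 3 * binomial_pmf(n, p, k) for k in range(n + 1))

# Positive skewness = right skewed, negative = left skewed, near zero = symmetric.
for p in (0.2, 0.4, 0.5, 0.9):
    for n in (10, 50, 200):  # assumed sample sizes, for illustration only
        print(f"p={p}, n={n}: skewness={binomial_skewness(n, p):+.3f}")
```

The sign should match the direction of the skew described above, and the magnitude should shrink as n grows, which is another way of seeing why larger np and nq make the approximation more reasonable.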

Continuity correction

When the guideline is satisfied, we can standardize (calculate z): z = (x - mean)/standard deviation. Since Z is then approximately standard normal, Z ~ N(0,1), we can use it to solve binomial distribution problems just like normal distribution problems.

However, note that the normal distribution is continuous whereas the binomial distribution is discrete (0, 1, 2, 3, … etc.). So there is one last thing we must consider before using the normal approximation: continuity correction.

If you want to find the probability of getting exactly 2 heads, you need to consider values between 1.5 and 2.5 to fully cover getting 2 heads, because unlike in the discrete case, 2 in the continuous case means 2.000000…, which is different from 2.00000001…, and the probability of any single exact value is zero.

And if you want the probability of getting 2 or more heads in 1000 coin tosses, you would consider values ≥ 1.5. For strictly more than 2 heads (that is, 3 or more), you would consider values ≥ 2.5.
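Here is a small sketch of the continuity correction in code, comparing the corrected normal approximation with the exact binomial sum. The numbers (100 fair coin tosses) are a hypothetical example I picked so that the guideline holds, and the function names are illustrative.

```python
import math

def normal_cdf(x, mean, std):
    """P(Y <= x) for Y ~ N(mean, std^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (std * math.sqrt(2))))

def approx_between(n, p, lo, hi):
    """Normal approximation to P(lo <= X <= hi) with continuity correction."""
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    # Widen the integer range by 0.5 on each side to cover the discrete bars.
    return normal_cdf(hi + 0.5, mean, std) - normal_cdf(lo - 0.5, mean, std)

def exact_between(n, p, lo, hi):
    """Exact binomial P(lo <= X <= hi), summed term by term."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

n, p = 100, 0.5  # hypothetical example: 100 fair coin tosses
print("P(X = 50):       approx", approx_between(n, p, 50, 50), "exact", exact_between(n, p, 50, 50))
print("P(45 <= X <= 55): approx", approx_between(n, p, 45, 55), "exact", exact_between(n, p, 45, 55))
```

Without the 0.5 adjustment, the approximation to P(X = 50) would be the area of a zero-width interval, which is exactly the problem the correction is meant to solve.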

Example

Lastly, remember that in the previous blog post we said it was cumbersome to calculate the probability of getting greater than or equal to 50 heads in 100 coin tosses using the binomial distribution, because our calculation would look like this:
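Written out with the binomial formula, and assuming a fair coin (p = 0.5, which is my reading of the coin toss setup), the sum would be:

```latex
P(X \ge 50) = \sum_{k=50}^{100} \binom{100}{k} \, p^{k} (1-p)^{100-k},
\qquad p = 0.5
```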

It requires 51 calculations, one for each number of heads from 50 to 100.

Even though it could easily be done using code, it creates an unnecessary bottleneck in computation.

The traditional way of calculating it (using the formula from the previous blog post):
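A minimal sketch of that direct calculation, assuming a fair coin (p = 0.5) and using only Python's standard library. The names `binomial_pmf` and `binomial_at_least` are illustrative, not from any particular package.

```python
import math

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_at_least(n, p, k_min):
    """P(X >= k_min): one PMF term for every k from k_min to n."""
    return sum(binomial_pmf(n, p, k) for k in range(k_min, n + 1))

exact = binomial_at_least(100, 0.5, 50)
print(f"P(X >= 50) = {exact:.5f}")
```

The loop makes the cost visible: the number of PMF evaluations grows with the size of the range being summed.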

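Now, using the normal approximation, we just need to do one calculation: with mean = 50, standard deviation = 5, and the continuity correction, z = (49.5 - 50)/5 = -0.1.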

Looking at the z-table, we know that there is a 1 - 0.46017 = 53.983% chance of getting at least 50 heads in 100 coin tosses.
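The same z-table lookup can be sketched in code, with the error function standing in for the table. This assumes a fair coin (p = 0.5) and applies the continuity correction by starting at 49.5; the function names are my own.

```python
import math

def standard_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1), via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_approx_at_least(n, p, k_min):
    """Approximate P(X >= k_min) for X ~ Binomial(n, p) with continuity correction."""
    mean = n * p
    std = math.sqrt(n * p * (1 - p))
    z = (k_min - 0.5 - mean) / std  # continuity correction: start at k_min - 0.5
    return z, 1 - standard_normal_cdf(z)

z, prob = normal_approx_at_least(100, 0.5, 50)
print(f"z = {z:.1f}, P(X >= 50) is approximately {prob:.5f}")
```

Whatever n is, this is a single standardization plus one CDF evaluation, which is where the savings come from.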

We can see that the normal approximation to the binomial distribution does a very good job of estimating the cumulative probability (the exact binomial value is about 53.979%). Notice how much simpler it became, such that we can even do it by hand!

In terms of the number of terms evaluated, this is roughly 50 times less work than summing the binomial distribution function. With more coin tosses, say the probability of at least 5,000 heads in 10,000 tosses, the normal approximation would replace roughly 5,000 terms with a single calculation.

Conclusion

Even though the normal approximation may not be needed when you are doing data analysis, it can help computation speed when you deploy a statistical model in production. Also note how easy the calculation becomes compared to using the binomial distribution function for cumulative probabilities. Even if you do not use it in practice, I was blown away by its beauty. Isn’t it beautiful to see mathematics in action? Anyway, thanks for reading, and please comment if there is any misinformation :)
