Everyone knows and loves the normal distribution. It is used in a huge variety of applications such as investment modeling, A/B testing, and manufacturing process improvement (six sigma). But people are less familiar with the binomial distribution. That’s a shame because the binomial distribution is
really useful. Have you ever been asked something like: “Given 10 flips of a fair coin, what is the probability of getting 6 heads?” Probability (especially back-of-the-napkin probability calculations) is not my favorite thing in the world. So when I first learned about the binomial distribution, I thought, “Yes, I never have to worry about coin flip probability questions again!” That’s because the results of coin flips
follow the binomial distribution. I should emphasize that the law of large numbers applies here. To be technically correct, I should say that if we were to repeatedly perform the same set of experiments (flipping the coin 10 times) over and over, the number of heads that we observe across all those sets would follow the binomial distribution. Don’t worry, I
will illustrate this in detail shortly.

What is the Binomial Distribution

First, let’s start with the slightly more technical definition: the binomial distribution is the probability distribution of the number of successes in a sequence of experiments where each experiment produces a binary outcome with the same probability of success and where each of the outcomes is independent of all the others. A single coin flip is an example of an experiment with a binary outcome. Coin flips meet the other binomial distribution requirement as well: the outcome of each individual coin flip is independent of all the others. Just to be clear, the outcomes of the experiment don’t need to be equally likely as they are with flips of a fair coin. The following things also meet the prerequisites of the binomial distribution:
One thing that may trouble newcomers to probability and statistics is the idea of a probability distribution. We tend to think deterministically, such as “I flipped a coin 10 times and produced 6 heads”. So the outcome is 6; where is the distribution then? The probability distribution derives from variance. If both you and I flipped 10 coins, it’s pretty likely that we would get different results (you might get 5 heads and I get 7). This variance, a.k.a. uncertainty around the outcome, produces a probability distribution, which basically tells us what outcomes are relatively more likely (such as 5 heads) and which outcomes are relatively less likely (such as 10 heads). We can produce such a probability distribution through simulation, such as in the image below:

Illustration of a Sequence of Trials that Would Produce a Binomial Distribution

Before we go into some Python code that would run this simulation and produce a binomial distribution, let’s first get some definitions out of the way. When you see binomial distributions and the experiments that underlie them described in textbooks, the descriptions always include the following key parameters:
Binomial Distributions with Python

Let’s go through some Python code that runs the simulation we described above. The code below (also available on my GitHub) does the following:
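A minimal sketch of such a simulation follows. It is not the original listing: it assumes n = 10 flips per trial, p = 0.5 for a fair coin, and 1,000 repeated trials, and the variable names are illustrative. It prints a text tally rather than drawing a histogram.

```python
import numpy as np

n = 10          # coin flips per trial (assumed, from the running example)
p = 0.5         # probability of heads for a fair coin
trials = 1000   # how many times the 10-flip trial is repeated

rng = np.random.default_rng(seed=0)  # seeded so the sketch is reproducible

# Each row is one trial; each entry is True when that flip lands heads.
flips = rng.random((trials, n)) < p

# Number of heads observed in each trial (an integer from 0 to n).
heads = flips.sum(axis=1)

# Tally how many trials produced 0, 1, ..., n heads.
counts = np.bincount(heads, minlength=n + 1)
for k, c in enumerate(counts):
    print(f"{k:2d} heads: {c:4d} trials")
```

The tally should peak around 5 heads and thin out toward 0 and 10, which is the shape the histogram in the article describes.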
# Import libraries

So that’s the code. So what happens when we repeat our 10 coin toss trial 1,000 times? We get the histogram plotted below:

Let’s modify the plotting section of our previous code so that our plot also shows the actual binomial distribution (using the stats.binom function from the SciPy library):

# Plot the actual binomial distribution as a sanity check

The following plot shows our original simulated distribution in blue and the actual binomial distribution in red. The takeaway is that the binomial distribution is a pretty good approximation of what we would have observed if we had actually repeated our 10 coin tosses 1,000 times, so instead of wasting tons of time tossing coins and recording the results, we can just use the binomial distribution!

Actual Binomial Distribution (Red) vs. Our Simulation Results (Blue)

And if we wanted to simulate the result of a single sequence of n experiments (recall that n=10 for our example but really it could be any positive integer), we could generate that using a binomially distributed random variable like so:

np.random.binomial(n, p)

Finally, let’s answer our original question (probability of getting 6 heads with 10 coin flips) by running 10,000 simulations of our 10 coin flips:

# Probability of getting 6 heads
runs = 10000

We find the probability to be around 20% (we can also see that in our earlier histogram via the height of the red vertical line above 6 on the x axis).

Real World Applications of the Binomial Distribution

Cool, but what if we want to analyze things beyond coin flips? Let’s run through a stylized real-world use case for the binomial distribution. Imagine that we are data scientists tasked with improving the ROI (Return on Investment) of our company’s call center, where employees attempt to cold call potential customers and get them to purchase our product. You look at some historical data and find the following:
The following code simulates our call center:

# Call Center Simulation
# Number of employees to simulate

If you run the code, you get outputs like the following key metrics for your call center (they change from run to run because conversions is an array of binomially distributed random variables):
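A rough sketch of a call center simulation along these lines is shown here. Every number in it is a placeholder assumption rather than a figure from the text, chosen so that expected revenue roughly equals expected cost, which is consistent with the near break-even picture described. The names (employees, calls_per_day, and so on) are illustrative.

```python
import numpy as np

# All values below are illustrative assumptions, not figures from the text.
employees = 100           # hypothetical head count
calls_per_day = 50        # hypothetical calls each employee completes (n)
p_convert = 0.04          # hypothetical chance a call ends in a sale (p)
revenue_per_sale = 100.0  # hypothetical revenue per conversion
daily_wage = 200.0        # hypothetical cost per employee per day

rng = np.random.default_rng(seed=2)  # seeded for reproducibility


def simulate_day() -> float:
    """Return the profit of one simulated day."""
    # One binomial draw per employee: sales made out of calls_per_day calls.
    conversions = rng.binomial(calls_per_day, p_convert, size=employees)
    revenue = conversions.sum() * revenue_per_sale
    cost = employees * daily_wage
    return float(revenue - cost)


# Repeat the day many times to see how daily profit varies.
profits = np.array([simulate_day() for _ in range(1000)])
loss_share = np.mean(profits < 0)
print(f"mean profit: {profits.mean():.0f}, share of losing days: {loss_share:.2f}")
```

Under these assumed inputs the expected daily profit is zero, so a large share of simulated days lose money. Raising p_convert or calls_per_day shifts the whole profit distribution to the right.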
Profits are pretty slim compared to expenses. But these are results for just one randomly generated day. Let’s look at the profit of our call center over 1,000 simulations and see how the daily profit varies:

Profit Distribution of Our Call Center

Wow, there is a very high chance of loss given the current operating metrics of our call center (nearly half the simulated profits are negative). What should we do then? Recalling that each employee’s results follow a binomial distribution, we realize that we can do one or more of the following to improve things:
Eventually, we develop a lead generation tool that allows our call center employees to identify people that are more likely to purchase our product. This results in shorter phone calls (less sweet talking required) and an uptick in our conversion probability, ultimately producing these revisions to our parameters:
Let’s run the following lines of code that simulate 1,000 potential days of our new and improved call center to see how these changes impact the statistical distribution of our future daily profit.

# Call Center Simulation (Higher Conversion Rate)
# Number of employees to simulate

The above code also plots the distribution of our improved results (red) against our old ones (blue). We don’t need to run an A/B test (though we really should) to see that our lead generation tool has significantly improved the operation and profitability of our call center.

Improved Call Center Simulation

Conclusion

I should end by making the following points:
Cheers!

Which NumPy method is used to simulate a binomial distribution?
binomial (np.random.binomial). It draws samples from a binomial distribution.
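As a small illustration of that answer, the sketch below draws binomial samples and estimates the probability of 6 heads in 10 fair coin flips. It uses NumPy's Generator interface (np.random.default_rng) instead of the legacy np.random.binomial call shown in the article; the seed and variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # seeded so results are repeatable

# One draw: number of heads in a single run of 10 fair coin flips.
single = rng.binomial(n=10, p=0.5)

# Many draws at once: 10,000 runs of 10 flips each.
draws = rng.binomial(n=10, p=0.5, size=10_000)

# Share of runs with exactly 6 heads, an estimate of P(6 heads).
estimate = np.mean(draws == 6)
print(single, round(float(estimate), 3))
```

The estimate should land close to the "around 20%" figure quoted in the article, with small run-to-run variation if the seed is changed.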
How do you do binomial distribution in Python?
Binomial test in Python (example):
Import the function: from scipy.stats import binomtest
Define the number of successes (k), the number of trials (n), and the expected probability of success (p): k=5 n=12 p=0.17
Perform the binomial test: res = binomtest(k, n, p) print(res.pvalue)

What is binom in Python?
The binom.pmf function is part of Python's SciPy library and is used to model probabilistic experiments with the help of the binomial distribution.
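For reference, the probability mass function that binom.pmf evaluates can also be computed by hand. The sketch below uses only the standard library; binomial_pmf is a hypothetical helper name introduced for this example, not a SciPy function.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    # Ways to choose which k trials succeed, times the chance of each such pattern.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 6 heads in 10 flips of a fair coin.
prob_six_heads = binomial_pmf(6, 10, 0.5)
print(f"{prob_six_heads:.4f}")  # 0.2051
```

This exact value, 210/1024, agrees with the roughly 20% found by simulation in the article.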
What is binary distribution in Python?
The binomial distribution model deals with finding the probability of success of an event which has only two possible outcomes in a series of experiments. For example, tossing a coin always gives a head or a tail.