
Understanding the Lesser Known Cousin of the Normal Distribution and How to Apply It

Everyone knows and loves the normal distribution. It is used in a huge variety of applications such as investment modeling, A/B testing, and manufacturing process improvement (six sigma).

But people are less familiar with the binomial distribution. That’s a shame because the binomial distribution is really useful. Have you ever been asked something like:

“Given 10 flips of a fair coin, what is the probability of getting 6 heads?”

Probability (especially back of the napkin probability calculations) is not my favorite thing in the world. So when I first learned about the binomial distribution, I thought, “Yes, I never have to worry about coin flip probability questions again!”

That’s because the results of coin flips follow the binomial distribution. I should emphasize that the law of large numbers applies here. To be technically correct, I should say that if we were to repeatedly perform the same set of experiments (flipping the coin 10 times) over and over, the number of heads we observe across all those sets would follow the binomial distribution.

Don’t worry, I will illustrate this in detail shortly.

What Is the Binomial Distribution?

First let’s start with the slightly more technical definition — the binomial distribution is the probability distribution of a sequence of experiments where each experiment produces a binary outcome and where each of the outcomes is independent of all the others.

A single coin flip is an example of an experiment with a binary outcome. Coin flips meet the other binomial distribution requirement as well — the outcome of each individual coin flip is independent of all the others. Just to be clear, the outcomes of the experiment don’t need to be equally likely as they are with flips of a fair coin — the following things also meet the prerequisites of the binomial distribution:

  • An unfair coin (e.g. one with an 80% probability of coming up heads).
  • Randomly picking people on the street to answer a yes or no question.
  • Attempting to convince visitors of a website to buy a product — the yes or no outcome is whether they purchased or not.

One thing that may trouble newcomers to probability and statistics is the idea of a probability distribution. We tend to think deterministically, as in “I flipped a coin 10 times and produced 6 heads.” So the outcome is 6 heads; where is the distribution, then?

The probability distribution derives from variance. If both you and I flipped 10 coins, it’s pretty likely that we would get different results (you might get 5 heads and I get 7). This variance, a.k.a. uncertainty around the outcome, produces a probability distribution, which basically tells us what outcomes are relatively more likely (such as 5 heads) and which outcomes are relatively less likely (such as 10 heads).

We can produce such a probability distribution through simulation, such as in the image below:

Illustration of a Sequence of Trials that Would Produce a Binomial Distribution

Before we go into some Python code that would run this simulation and produce a binomial distribution, let’s first get some definitions out of the way. When you see binomial distributions and the experiments that underlie them described in textbooks, the descriptions always include the following key parameters:

  1. n: the number of times we perform our experiment. In our coin example, n is equal to 10 (each experiment is 1 flip of the coin).
  2. p: the probability of success. For a fair coin, it would be 50%.
  3. k: the target number of successes. Earlier we mentioned that we were looking for 6 successes.
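With these three parameters pinned down, the coin question from the introduction actually has a closed form, the binomial PMF: P(k successes) = C(n, k) * p**k * (1 - p)**(n - k). A minimal sketch using only Python’s standard library (independent of the simulation code that follows):

```python
from math import comb

# Binomial PMF: probability of exactly k successes in n independent
# trials, each with success probability p
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Our opening question: 10 flips of a fair coin, exactly 6 heads
prob = binom_pmf(k=6, n=10, p=0.5)
print(round(prob, 4))  # about 0.2051
```

For a fair coin this reduces to C(10, 6) / 2**10 = 210 / 1024, roughly 20% — the same answer the simulation below arrives at.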

Binomial Distributions with Python

Let’s go through some Python code that runs the simulation we described above. The code below (also available on my Github here) does the following:

  1. Generate a random number between 0 and 1. If that number is 0.5 or more, count it as heads; otherwise tails. Do this n times using a Python list comprehension. This happens within the function run_binom via the variable tosses.
  2. Repeat this a specified number of times (the number of trials is specified by the input variable trials). We will perform 1,000 trials.
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Input variables

# Number of trials
trials = 1000
# Number of independent experiments in each trial
n = 10
# Probability of success for each experiment
p = 0.5

# Function that runs our coin toss trials
# heads is a list of the number of successes from each trial of n experiments
def run_binom(trials, n, p):
    heads = []
    for i in range(trials):
        # Simulate n coin flips as uniform draws on [0, 1)
        tosses = [np.random.random() for _ in range(n)]
        # Count a toss of 0.50 or more as heads (this threshold corresponds to p = 0.5)
        heads.append(len([toss for toss in tosses if toss >= 0.50]))
    return heads

# Run the function
heads = run_binom(trials, n, p)

# Plot the results as a histogram
fig, ax = plt.subplots(figsize=(14,7))
ax = sns.distplot(heads, bins=11, label='simulation results')
ax.set_xlabel("Number of Heads", fontsize=16)
ax.set_ylabel("Frequency", fontsize=16)

That’s the code. So what happens when we repeat our 10-coin-toss trial 1,000 times? We get the histogram plotted below:

Our Simulation Results

Let’s modify the plotting section of our previous code so that our plot also shows the actual binomial distribution (using the stats.binom function from the scipy library):

# Plot the actual binomial distribution as a sanity check
from scipy.stats import binom
x = range(0,11)
ax.plot(x, binom.pmf(x, n, p), 'ro', label='actual binomial distribution')
ax.vlines(x, 0, binom.pmf(x, n, p), colors='r', lw=5, alpha=0.5)
plt.legend()
plt.show()

The following plot shows our original simulated distribution in blue and the actual binomial distribution in red. The takeaway is that the binomial distribution is a pretty good approximation of what we would have observed if we had actually repeated our 10 coin tosses 1,000 times — so instead of wasting tons of time tossing coins and recording the results, we can just use the binomial distribution!

Actual Binomial Distribution (Red) vs. Our Simulation Results (Blue)

And if we wanted to simulate the result of a single sequence of n experiments (recall that n=10 for our example but really it could be any positive integer), we could generate that using a binomially distributed random variable like so:

np.random.binomial(n, p)
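A quick sketch of how this call behaves (shown here with NumPy’s newer Generator API, seeded only so the run is reproducible; the legacy np.random.binomial call above works the same way):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded Generator for reproducibility

# One draw: the number of heads in a single run of 10 fair coin flips
single = rng.binomial(n=10, p=0.5)

# Many draws at once: 1,000 runs of the same 10-flip experiment
draws = rng.binomial(n=10, p=0.5, size=1000)

print(single)        # an integer between 0 and 10
print(draws.mean())  # close to n * p = 5
```

Each element of draws is one full trial’s head count, which is exactly what the run_binom function above builds by hand.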

Finally, let’s answer our original question (probability of getting 6 heads with 10 coin flips) by running 10,000 simulations of our 10 coin flips:

# Probability of getting 6 heads
runs = 10000
prob_6 = sum([1 for i in np.random.binomial(n, p, size=runs) if i==6])/runs
print('The probability of 6 heads is: ' + str(prob_6))

We find the probability to be around 20% (we can also see that in our earlier histogram via the height of the red vertical line above 6 on the x axis).

Real World Applications of the Binomial Distribution

Cool, but what if we want to analyze things beyond coin flips? Let’s run through a stylized real world use case for the binomial distribution. Imagine that we are data scientists tasked with improving the ROI (Return on Investment) of our company’s call center, where employees attempt to cold call potential customers and get them to purchase our product.

You look at some historical data and find the following:

  • The typical call center employee completes on average 50 calls per day.
  • The probability of a conversion (purchase) for each call is 4%.
  • The average revenue to your company for each conversion is $100.
  • The call center you are analyzing has 100 employees.
  • Each employee is paid $200 per day of work.

We can think of each employee as a binomially distributed random variable with the following parameters:

  • n = 50
  • p = 4%
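Before simulating, a quick expected-value check (using the $100-per-conversion revenue that the simulation code applies) explains why this call center will struggle: each employee’s expected revenue exactly matches the $200 wage, so on average the operation only breaks even.

```python
# Back-of-the-envelope expected values for one employee
n, p = 50, 0.04
revenue_per_conversion = 100  # revenue figure used in the simulation code
wage = 200

expected_conversions = n * p                                      # 2 conversions per day
expected_revenue = expected_conversions * revenue_per_conversion  # $200 per day
expected_profit = expected_revenue - wage                         # $0: break-even on average
print(expected_conversions, expected_revenue, expected_profit)
```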

The following code simulates our call center:

# Call Center Simulation

# Number of employees to simulate
employees = 100
# Cost per employee
wage = 200
# Number of independent calls per employee
n = 50
# Probability of success for each call
p = 0.04
# Revenue per conversion
revenue = 100
# Binomial random variables of call center employees
conversions = np.random.binomial(n, p, size=employees)
# Print some key metrics of our call center
print('Average Conversions per Employee: ' + str(round(np.mean(conversions), 2)))
print('Standard Deviation of Conversions per Employee: ' + str(round(np.std(conversions), 2)))
print('Total Conversions: ' + str(np.sum(conversions)))
print('Total Revenues: ' + str(np.sum(conversions)*revenue))
print('Total Expense: ' + str(employees*wage))
print('Total Profits: ' + str(np.sum(conversions)*revenue - employees*wage))

If you run the code, you get key metrics for your call center like the following (the numbers change from run to run because conversions is an array of binomially distributed random variables):

  • Average Conversions per Employee: 2.13
  • Standard Deviation of Conversions per Employee: 1.48
  • Total Conversions: 213
  • Total Revenues: $21,300
  • Total Expenses: $20,000
  • Total Profits: $1,300

Profits are pretty slim compared to expenses. But these are results for just one randomly generated day. Let’s look at the profit of our call center over 1,000 simulations and see how the daily profit varies:
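The 1,000-day loop behind this plot can be sketched as follows (reusing the parameters from the simulation code; the seed is arbitrary and only makes the run reproducible):

```python
import numpy as np

# Parameters from the call center simulation above
employees = 100
wage = 200
n = 50
p = 0.04
revenue = 100

# Simulate 1,000 independent days; each day, every employee makes n calls
sims = 1000
rng = np.random.default_rng(1)
sim_conversions = [rng.binomial(n, p, size=employees).sum() for _ in range(sims)]
sim_profits = np.array(sim_conversions) * revenue - employees * wage

print(round(sim_profits.mean(), 2))  # hovers near 0
print((sim_profits < 0).mean())      # roughly half the days lose money
```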

Profit Distribution of Our Call Center

Wow, there is a very high chance of loss given the current operating metrics of our call center (nearly half the simulated profits are negative). What should we do then?

Recalling that each employee’s results follow a binomial distribution, we realize that we can do one or more of the following to improve things:

  • Make more cold calls (increase n).
  • Convert at a higher percentage (increase p).
  • Pay our employees less (we won’t do this because we are nice).

Eventually, we develop a lead generation tool that allows our call center employees to identify people that are more likely to purchase our product. This results in shorter phone calls (less sweet talking required) and an uptick in our conversion probability, ultimately producing these revisions to our parameters:

  • n = 55
  • p = 5%
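A quick expected-value check (again using the $100-per-conversion revenue from the simulation code) shows why these new parameters help: each employee now produces 2.75 expected conversions, or $275 of expected revenue against a $200 wage.

```python
# Expected daily profit under the improved parameters
employees = 100
wage = 200
revenue = 100  # revenue per conversion, as in the simulation code
n_new, p_new = 55, 0.05

profit_per_employee = n_new * p_new * revenue - wage      # $275 revenue - $200 wage = $75
total_expected_profit = profit_per_employee * employees   # $7,500 expected profit per day
print(profit_per_employee, total_expected_profit)
```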

Let’s run the following code, which simulates 1,000 potential days of our new and improved call center, to see how these changes affect the statistical distribution of our future daily profit.

# Call Center Simulation (Higher Conversion Rate)

# Number of employees to simulate
employees = 100
# Cost per employee
wage = 200
# Number of independent calls per employee
n = 55
# Probability of success for each call
p = 0.05
# Revenue per conversion
revenue = 100

# Simulate 1,000 days for our call center

# Number of days to simulate
sims = 1000
# Original call center (50 calls per employee, 4% conversion rate) for comparison
sim_conversions = [np.sum(np.random.binomial(50, 0.04, size=employees)) for i in range(sims)]
sim_profits = np.array(sim_conversions)*revenue - employees*wage
# Improved call center
sim_conversions_up = [np.sum(np.random.binomial(n, p, size=employees)) for i in range(sims)]
sim_profits_up = np.array(sim_conversions_up)*revenue - employees*wage

# Plot and save the results as a histogram
fig, ax = plt.subplots(figsize=(14,7))
ax = sns.distplot(sim_profits, bins=20, label='original call center simulation results')
ax = sns.distplot(sim_profits_up, bins=20, label='improved call center simulation results', color='red')
ax.set_xlabel("Profits", fontsize=16)
ax.set_ylabel("Frequency", fontsize=16)
plt.legend()

The above code also plots the distribution of our improved results (red) against our old ones (blue). We don’t need to run an A/B test (though we really should) to see that our lead generation tool has significantly improved the operation and profitability of our call center.

We successfully recognized that the profit produced by each employee follows a binomial distribution — so if we could increase both the n (number of cold calls made per day) and p (probability of conversion for each call) parameters, we could generate higher profits.

Improved Call Center Simulation

Conclusion

I should end by making the following points:

  • Probability distributions are models that approximate reality. If we can identify a distribution that closely models the outcome we care about, that’s really powerful: as we saw above, with just a few parameters (n and p), we could model the profits produced by hundreds of people.
  • However, it is also important to be cognizant of where and how the model fails to match the reality of our situation — this way we know in what situations our model is likely to underperform.

Cheers!
