How to calculate probability in Python

Why loops are integral to building thorough statistical models

Source: Photo by stevepb from Pixabay

Loops are quite an important part of learning how to code in Python, and this is particularly true when it comes to implementing calculations across a large array of numbers.

All too often, the temptation for statisticians and data scientists is to skip over the more mundane aspects of coding such as this — we assume that software engineers can simply reformat the code in the proper way.

However, there are many situations where the person writing the code needs to understand both the statistics underlying the model as well as how to iterate the model output through loops — these two processes simply cannot be developed independently.

Here is one example of how the use of for loops in Python can greatly enhance statistical analysis.

Background to Cumulative Binomial Probabilities

In conducting probability analysis, the two variables that determine the chance of an event happening are N (the number of trials) and λ (lambda, our hit rate, i.e. the chance of occurrence in a single trial). When we talk about a cumulative binomial probability, we mean the probability that the event occurs at least once across N independent trials: the greater the number of trials, the higher this overall probability.

probability = 1 - ((1 - λ)^N)
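
As a sanity check on the formula above, the short sketch below compares it with the sum of the binomial probabilities of one or more occurrences. The function names `at_least_once` and `binomial_pmf`, and the values of `lam` and `n`, are illustrative choices rather than anything from the original script.

```python
from math import comb, isclose


def at_least_once(lam, n):
    """Probability of one or more occurrences in n independent trials."""
    return 1 - (1 - lam) ** n


def binomial_pmf(k, n, lam):
    """Probability of exactly k occurrences in n independent trials."""
    return comb(n, k) * lam ** k * (1 - lam) ** (n - k)


lam, n = 0.02, 50  # illustrative values only

shortcut = at_least_once(lam, n)
# Summing P(X = k) for k = 1..n should give the same number
summed = sum(binomial_pmf(k, n, lam) for k in range(1, n + 1))

print(shortcut, summed)
print(isclose(shortcut, summed, rel_tol=1e-9))
```

Both routes give the same value because "at least one occurrence" is the complement of "no occurrences", whose probability is (1 - λ)^N.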

For instance, the probability of rolling a 6 on a single roll of a fair die is 1/6. However, suppose that same die is rolled 10 times:

1 - ((1 - 0.1667)^10) = 0.8385

We see that the probability of rolling a 6 at least once now increases to 83.85%.
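
The die example can be reproduced with a small for loop. This is only a sketch: the variable names and the choice of roll counts other than 10 are assumptions made for illustration.

```python
p_six = 1 / 6  # chance of a 6 on a single roll of a fair die

# Probability of at least one 6 for several numbers of rolls
prob_by_rolls = {}
for rolls in (1, 5, 10, 20):
    prob_by_rolls[rolls] = 1 - (1 - p_six) ** rolls

for rolls, prob in prob_by_rolls.items():
    print(rolls, round(prob, 4))
```

For 10 rolls this prints 0.8385, matching the figure above, and the probability keeps rising as the number of rolls grows.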

Because (1 - λ)^N shrinks toward zero as N grows, the larger the number of trials, the larger the probability of an event happening at least once, even if the probability within a single trial is very low. So, let us generate a cumulative binomial probability to demonstrate how probability increases given an increase in the number of trials.

Model without Loops

Here is a script that calculates the cumulative binomial probabilities without the use of loops.

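import numpy as np
import pandas as pd

# Three individual (single-trial) probabilities
l = 0.02
m = 0.04
n = 0.06

# Number of trials: 0, 1, ..., 99
p = np.arange(0, 100, 1)

# Probability of no occurrence in a single trial
h = 1 - l
j = 1 - m
k = 1 - n

# Probability of at least one occurrence within p trials
q = 1 - (h**p)
r = 1 - (j**p)
s = 1 - (k**p)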
  • l, m, and n represent three individual probabilities.
  • p represents the number of trials (0 to 99)
  • q, r, and s represent the cumulative binomial probabilities, i.e. the increase in probability for every unit increase in the number of trials

Here is a sample of the generated output:

>>> q
array([0., 0.02, 0.0396, 0.058808, 0.07763184, 0.0960792, 0.11415762, 0.13187447, 0.14923698, 0.16625224, ..., 0.8532841, 0.85621842, 0.85909405, 0.86191217, 0.86467392])
>>> r
array([0., 0.04, 0.0784, 0.115264, 0.15065344, 0.1846273, 0.21724221, 0.24855252, 0.27861042, 0.307466, ..., 0.97930968, 0.9801373, 0.9809318, 0.98169453, 0.98242675])
>>> s
array([0., 0.06, 0.1164, 0.169416, 0.21925104, 0.26609598, 0.31013022, 0.35152241, 0.39043106, 0.4270052, 0.46138489, 0.49370179, 0.52407969, 0.5526349, 0.57947681, ..., 0.99720008, 0.99736807, 0.99752599, 0.99767443, 0.99781396])

We see that the cumulative probabilities q, r, and s increase at different rates for a given number of trials: the higher the individual probability, the faster the cumulative probability approaches 1.

That said, developing this model without using loops has a key disadvantage — namely that the individual probabilities can only take on the values as specified by the end user. What if we wish to iterate from 0.01 to 0.99 in succession?
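
One way to loosen that restriction is to loop over a list of individual probabilities instead of naming each one. The sketch below is a minimal illustration, assuming the same trial counts as before; the names `trials`, `rates`, and `cumulative` are hypothetical and do not appear in the original script.

```python
import numpy as np

trials = np.arange(0, 100)   # same trial counts as p in the script above
rates = [0.02, 0.04, 0.06]   # any list of single-trial probabilities

# Map each individual probability to its array of cumulative probabilities
cumulative = {}
for rate in rates:
    cumulative[rate] = 1 - (1 - rate) ** trials

print(cumulative[0.02][:4])
```

Extending `rates` to more values requires no further changes to the loop body.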

Model with Loops: List Comprehensions and 2D Arrays

This time, the model will be built by using one individual probability variable that iterates through values 0.01 to 0.99, and the cumulative binomial probability will be calculated for every number of trials from 1 to 99.

import numpy as np
import pandas as pd
# List comprehension
probability=[x*0.01 for x in range(1,100)]
probability=np.array(probability)
probability
h = 1 - probability
h
# Construct 2D array
result = 1-h[:, np.newaxis] ** np.arange(1,100)
result

The output of the generated probability variable is as follows:

>>> probability
array([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, ... 0.96, 0.97, 0.98, 0.99])

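Note that a list comprehension is used here to build the probability variable. This is because Python’s built-in range() function can only work with integers, not float values, so the integers 1 to 99 are generated first and each is multiplied by 0.01. (NumPy's own range functions do accept floats, so this is one convenient option rather than the only one.) More information is provided at the following Stack Overflow guide.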
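
As an alternative to the list comprehension, NumPy's `np.linspace` can generate evenly spaced float values directly. The sketch below builds the same 99 probabilities both ways and compares them; the variable names are illustrative.

```python
import numpy as np

# List comprehension over integers, scaled to floats (as in the script above)
from_list = np.array([x * 0.01 for x in range(1, 100)])

# 99 evenly spaced values from 0.01 to 0.99, both endpoints included
from_linspace = np.linspace(0.01, 0.99, 99)

print(from_linspace.shape)
print(np.allclose(from_list, from_linspace))
```

`np.allclose` is used for the comparison rather than `==` because the two constructions can differ by tiny floating-point rounding errors.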

You will note when looking at the last two lines in the code that a 2D array is constructed to calculate the cumulative binomial probabilities. An initial attempt to calculate these with a plain for loop, rather than a 2D array, produced the right values, but not in the desired order. The output below comes from a run in which h held only the first two probabilities:

>>> for i in range(1,100,1):
...     print(1-(h**i))
[0.01 0.02]
[0.0199 0.0396]
...
[0.62653572 0.86191217]
[0.63027036 0.86467392]

Instead, we wish to have the arrays in the order [0.01, 0.0199, …, 0.62653572, 0.63027036] and [0.02, 0.0396, …, 0.86191217, 0.86467392].

As explained in the following Reddit thread, transposing the above will not be of any use since h is a one-dimensional array.
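
For comparison, the loop output can be reordered if each printed array is first collected into a 2D array, since a 2D array can be transposed. This is a sketch under the assumption that h holds the two probabilities 0.01 and 0.02, as in the printed output above; the names `rows`, `by_trial`, and `by_probability` are hypothetical.

```python
import numpy as np

# Assumed: the two probabilities behind the printed output above
h = 1 - np.array([0.01, 0.02])

# Collect one array per trial count instead of printing it
rows = [1 - h ** i for i in range(1, 100)]

by_trial = np.array(rows)       # shape (99, 2): one row per trial count
by_probability = by_trial.T     # shape (2, 99): one row per probability

print(by_probability[0][:2])
print(by_probability[1][:2])
```

Each row of `by_probability` now follows a single probability across all trial counts, which is the ordering described above.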

An alternative is to calculate a 2D array and then print it directly:

>>> result = 1-h[:, np.newaxis] ** np.arange(1,100)
>>> result
array([[0.01, 0.0199, 0.029701, ..., 0.62653572, 0.63027036], [0.02, 0.0396, 0.058808, ..., 0.86191217, 0.86467392], [0.03, 0.0591, 0.087327, ..., 0.94946061, 0.9509768], ..., [0.97, 0.9991, 0.999973, ..., 1., 1., 1.], [0.98, 0.9996, 0.999992, ..., 1., 1., 1.], [0.99, 0.9999, 0.999999, ..., 1., 1.,1.]])

As can be seen from the above, the cumulative binomial probabilities for individual probabilities from 0.01 right up to 0.99 are calculated, with one row per probability and one column per number of trials.
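
To make the layout of the 2D array explicit, the sketch below rebuilds it and inspects the shapes involved. The names `column`, `trials`, and `value` are illustrative additions; the rest follows the script above.

```python
import numpy as np

probability = np.array([x * 0.01 for x in range(1, 100)])
h = 1 - probability
trials = np.arange(1, 100)

# np.newaxis turns h into a column, so NumPy broadcasts it against trials
column = h[:, np.newaxis]
result = 1 - column ** trials

print(column.shape, trials.shape, result.shape)  # (99, 1) (99,) (99, 99)

# Row i corresponds to probability[i], column j to trials[j]
value = result[1, 98]  # individual probability 0.02, 99 trials
print(value)
```

Here `result[1, 98]` is approximately 0.86467392, the final value in the second row of the output above.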

Using a for loop inside a list comprehension, combined with a 2D array, has allowed us to iterate from 0.01 to 0.99 automatically. Attempting to do this manually would have been far too cumbersome and error-prone.

Conclusion

In this example, you have seen how to:

  • Calculate cumulative binomial probabilities in Python
  • Use for loops to iterate across a large range of values
  • Employ list comprehensions to work with a range of float values
  • Devise 2D arrays when unable to transpose values contained in a 1D array

Many thanks for your time, and any questions or feedback are greatly appreciated.

Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice. The findings and interpretations in this article are those of the author and are not endorsed by or affiliated with any third-party mentioned in this article.