Monte Carlo simulation of correlated variables in Python

Question

import typing

import numpy as np
import scipy.stats


def run_gaussian_copula_simulation_and_get_samples(
        ppfs: typing.List[typing.Callable[[np.ndarray], np.ndarray]],  # List of $num_dims percent point functions (inverse CDFs)
        cov_matrix: np.ndarray,  # covariance matrix with unit diagonal (i.e. a correlation matrix), shape($num_dims, $num_dims)
        num_samples: int,  # number of random samples to draw
) -> np.ndarray:
    num_dims = len(ppfs)

    # Draw random samples from multidimensional normal distribution -> shape($num_samples, $num_dims)
    ran = np.random.multivariate_normal(np.zeros(num_dims), cov_matrix, (num_samples,), check_valid="raise")

    # Transform back into a uniform distribution, i.e. the space [0,1]^$num_dims
    U = scipy.stats.norm.cdf(ran)

    # Apply ppf to transform samples into the desired distribution
    # Each row of the returned array will represent one random sample -> access with a[i]
    return np.array([ppfs[i](U[:, i]) for i in range(num_dims)]).T  # shape($num_samples, $num_dims)

# Example 1. Uncorrelated data, i.e. both distributions are independent
f1 = run_gaussian_copula_simulation_and_get_samples(
    [lambda x: scipy.stats.norm.ppf(x, loc=100, scale=15), scipy.stats.norm.ppf],
    [[1, 0], [0, 1]],
    6
)
# Example 2. Completely correlated data, i.e. both percentiles match
f2 = run_gaussian_copula_simulation_and_get_samples(
    [lambda x: scipy.stats.norm.ppf(x, loc=100, scale=15), scipy.stats.norm.ppf],
    [[1, 1], [1, 1]],
    6
)
np.set_printoptions(suppress=True)  # suppress scientific notation
print(f1)
print(f2)

A few notes on this function. np.random.multivariate_normal does a lot of the heavy lifting for us; in particular, we do not need to decompose the correlation matrix ourselves. ppfs is passed as a list of functions, each of which takes one input and returns one value.

In my particular use case I needed to generate multivariate-t-distributed random variables (in addition to normally distributed ones). Consult this answer on how to do that: https://stackoverflow.com/a/41967819/2111778. Additionally, I used scipy.stats.t.cdf for the back-transform part.
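A minimal sketch of that variant, assuming the standard construction of a multivariate t sample as a multivariate normal sample divided by sqrt(chi2 / df). The helper name sample_t_copula_uniforms, its parameters, and the example correlation of 0.7 are hypothetical and not part of the original code.

```python
import numpy as np
import scipy.stats


def sample_t_copula_uniforms(corr_matrix, df, num_samples, rng=None):
    """Draw uniforms on [0, 1]^d whose dependence follows a t copula (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    corr_matrix = np.asarray(corr_matrix, dtype=float)
    num_dims = corr_matrix.shape[0]
    # Correlated standard normal draws -> shape (num_samples, num_dims)
    z = rng.multivariate_normal(np.zeros(num_dims), corr_matrix, size=num_samples)
    # One chi-square draw per sample, shared by all dimensions of that sample
    w = rng.chisquare(df, size=(num_samples, 1))
    t_samples = z / np.sqrt(w / df)
    # Back-transform with the univariate t CDF instead of the normal CDF
    return scipy.stats.t.cdf(t_samples, df)


u = sample_t_copula_uniforms(
    [[1.0, 0.7], [0.7, 1.0]], df=4, num_samples=10_000, rng=np.random.default_rng(0)
)
```

The resulting columns of u can then be passed through the ppfs exactly as in the Gaussian version.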

In my particular use case the desired distributions were empirical distributions representing expected financial loss. The final data points then had to be added together to get a total financial loss across all of the individual-but-correlated financial events. Thus, np.array(...).T is actually replaced by sum(...) in my code base.
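As a rough illustration of that setup, the sketch below builds a ppf from empirical data with np.quantile and sums the transformed samples per scenario. The loss data, the 0.6 correlation, and the helper name empirical_ppf are made up for illustration and are not taken from the original code base.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)

# Hypothetical historical loss observations for two correlated events
observed_losses_a = rng.lognormal(mean=10.0, sigma=0.5, size=500)
observed_losses_b = rng.lognormal(mean=9.0, sigma=0.8, size=500)


def empirical_ppf(observations):
    """Return a ppf that interpolates the empirical quantiles of the data."""
    sorted_obs = np.sort(observations)
    return lambda u: np.quantile(sorted_obs, u)


ppfs = [empirical_ppf(observed_losses_a), empirical_ppf(observed_losses_b)]
corr = np.array([[1.0, 0.6], [0.6, 1.0]])  # assumed correlation matrix

normals = rng.multivariate_normal(np.zeros(2), corr, size=10_000)
uniforms = scipy.stats.norm.cdf(normals)

# Sum across events instead of stacking: one total loss per simulated scenario
total_loss = sum(ppf(uniforms[:, i]) for i, ppf in enumerate(ppfs))
```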

When conducting a Monte Carlo simulation, correlation among input variables is an important factor to consider. If input random variables are treated as independent when they are actually correlated, risk can be under- or overestimated.

Let's think about how this occurs. When two input variables are positively correlated, both values should be relatively high in one simulation iteration and both relatively low in another. For negatively correlated inputs, one should be at the high end of its possible values while the other is at the low end in a given iteration.

We will consider three simple examples to illustrate how input variable correlation affects simulation output.

Revenue Model

Consider a very simple model of revenue that has demand and price as inputs. Demand and price are negatively correlated. When price increases, demand decreases and vice versa. The model is below with formulas shown. Of course, knowing if correlation is present may be a difficult question to answer, but for our example we will assume we know the correlation.

We will run two simulations to compare independent inputs versus correlated inputs. For demand, we will assume a triangular distribution with 10 as worst case, 20 as most likely, and 35 as best case. For price we will assume a triangular distribution with 125 as worst case, 150 as most likely, and 190 as best case.

In the first simulation we will assume that demand and price are independent.

[Figure: Revenue Model Without Correlation]

[Figure: Revenue Results Without Correlation]

In the second simulation we will assume a Spearman rank correlation coefficient of -0.5 between demand and price. We will use rank order correlation to simulate the input variable correlation.

[Figure: Revenue Model With Correlation Matrix]

[Figure: Revenue Results With Correlation]

After running the simulations we see that independent inputs resulted in a wider spread of outcomes.

Independent input simulation revenue range: 1375 to 6443

Correlated input simulation revenue range: 1588 to 5750

As expected, the revenue variance using independent inputs is greater as well. In this example, if we assumed independent inputs, we would be overestimating risk.
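The comparison can be approximated in Python. The sketch below is not the rank order correlation procedure used in the article. It swaps in a Gaussian copula and assumes that revenue is simply demand times price. For a Gaussian copula, a target Spearman coefficient rho_s corresponds to a Pearson correlation of 2*sin(pi*rho_s/6) between the underlying normals. The helper names and the sample count are illustrative choices, and the simulated ranges will not match the article's figures exactly.

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(42)
num_samples = 100_000


def triangular(low, mode, high):
    """Frozen scipy triangular distribution built from (low, mode, high)."""
    return scipy.stats.triang(c=(mode - low) / (high - low), loc=low, scale=high - low)


demand = triangular(10, 20, 35)
price = triangular(125, 150, 190)


def simulate_revenue(spearman_rho):
    # Convert the target Spearman coefficient to the Pearson correlation
    # of the underlying normals (exact for a Gaussian copula)
    pearson_rho = 2.0 * np.sin(np.pi * spearman_rho / 6.0)
    corr = np.array([[1.0, pearson_rho], [pearson_rho, 1.0]])
    normals = rng.multivariate_normal(np.zeros(2), corr, size=num_samples)
    uniforms = scipy.stats.norm.cdf(normals)
    d = demand.ppf(uniforms[:, 0])
    p = price.ppf(uniforms[:, 1])
    return d, p, d * p  # assumption: revenue = demand * price


_, _, revenue_independent = simulate_revenue(0.0)
d_corr, p_corr, revenue_correlated = simulate_revenue(-0.5)

print("std, independent inputs:", revenue_independent.std())
print("std, correlated inputs: ", revenue_correlated.std())
print("Spearman rho achieved:  ", scipy.stats.spearmanr(d_corr, p_corr)[0])
```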

Generic Model 1

In the second example, we have a generic model with two random variables. The output is the product of the two random variables. One random variable follows the logistic distribution and the second is normally distributed.

[Figure: Generic Model 1]

First, we simulate the model with the random variables being independent.

[Figure: Generic 1 Results Without Correlation]

Next, we'll simulate assuming the random variables are correlated with a 0.5 correlation coefficient.

[Figure: Generic 1 Results With Correlation]

After running the simulations we have an interesting difference.

Output range with independent inputs: -11.35 to 105.48

Output range with correlated inputs: -4.37 to 107.96

Note that the mean of each simulation is nearly identical. In this example, correlation shifts risk to the right: both ends of the output range move upward. The range of outcomes is still larger without correlation.

Generic Model 2

In the third example, we have a generic model with two random variables. The output is the product of the two random variables. Both random variables are normally distributed.

[Figure: Generic Model 2]

First, we simulate the model with the random variables being independent.

[Figure: Generic 2 Results Without Correlation]

Next, we'll simulate assuming the random variables are correlated with a 0.5 correlation coefficient.

[Figure: Generic 2 Results With Correlation]

After running the simulations we see that there is more risk in the correlated simulation.

Output range with independent inputs: 29.65 to 73.32

Output range with correlated inputs: 26.87 to 78.96

Note that the mean of each simulation is nearly identical. In this example, if we assumed independent inputs, we would underestimate risk.
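A small sketch of this third case, with made-up parameters since the article does not state the means and standard deviations it used. For two normal variables, the mean of the product is mu_x * mu_y + rho * sigma_x * sigma_y, so the mean barely moves while the spread grows with positive correlation.

```python
import numpy as np

rng = np.random.default_rng(7)
num_samples = 100_000

# Hypothetical parameters; the article does not state the ones it used
mean = np.array([10.0, 5.0])
std = np.array([1.0, 0.5])


def simulate_product(rho):
    """Simulate the product of two normal variables with correlation rho."""
    cov = np.array([
        [std[0] ** 2, rho * std[0] * std[1]],
        [rho * std[0] * std[1], std[1] ** 2],
    ])
    x = rng.multivariate_normal(mean, cov, size=num_samples)
    return x[:, 0] * x[:, 1]


for rho in (0.0, 0.5):
    out = simulate_product(rho)
    print(f"rho={rho}: mean={out.mean():.2f}, std={out.std():.2f}, "
          f"range=({out.min():.2f}, {out.max():.2f})")
```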

Final Thoughts on Input Variable Correlation

We've looked at three examples where input variable correlation affects the outcome differently. By assuming independence of inputs, we could be under- or overestimating risk. There could also be a shift in the outcomes to the left or right.

Which variables can you simulate with Monte Carlo simulation?

A Monte Carlo simulation takes the variable that has uncertainty and assigns it a random value.

How do you construct a correlated random variable?

To generate correlated normally distributed random samples, one can first generate uncorrelated samples, and then multiply them by a matrix C such that C C^T = R, where R is the desired covariance matrix. C can be created, for example, by using the Cholesky decomposition of R, or from the eigenvalues and eigenvectors of R.
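The Cholesky route can be sketched with NumPy as follows. The matrix values and the sample count are example choices only.

```python
import numpy as np

rng = np.random.default_rng(1)

R = np.array([[1.0, 0.8],
              [0.8, 1.0]])  # desired covariance matrix (example values)

C = np.linalg.cholesky(R)  # lower-triangular factor with C @ C.T == R

uncorrelated = rng.standard_normal(size=(50_000, 2))
# Each sample z (a row) becomes C @ z; for row vectors that is z @ C.T
correlated = uncorrelated @ C.T

print(np.cov(correlated, rowvar=False))  # should be close to R
```

np.random.multivariate_normal performs an equivalent factorization internally, which is why the function at the top does not need this step.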

What data do you need for a Monte Carlo simulation?

Monte Carlo simulation is a numerical method that uses random draws to perform calculations and solve complex problems.

Step 1: Dice Rolling Events. ...

Step 2: Range of Outcomes. ...

Step 3: Conclusions. ...

Step 4: Number of Dice Rolls. ...

Step 5: Simulation.

What is Monte Carlo method in Python?

A Monte Carlo simulation is a type of computational algorithm that estimates the probability of occurrence of an undeterminable event due to the involvement of random variables. The algorithm relies on repeated random sampling in an attempt to determine the probability.
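As a minimal illustration of repeated random sampling, the sketch below estimates the probability that two fair dice sum to 7. The choice of event and the number of rolls are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(3)
num_rolls = 200_000

# Roll two fair six-sided dice num_rolls times (upper bound 7 is exclusive)
dice = rng.integers(1, 7, size=(num_rolls, 2))

# Fraction of rolls whose sum is 7 approximates the true probability
estimate = np.mean(dice.sum(axis=1) == 7)

print(f"estimated P(sum == 7) = {estimate:.4f} (exact value is 1/6)")
```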
