import typing

import numpy as np
import scipy.stats


def run_gaussian_copula_simulation_and_get_samples(
    ppfs: typing.List[typing.Callable[[np.ndarray], np.ndarray]],  # List of $num_dims percentile point functions
    cov_matrix: np.ndarray,  # covariance matrix, shape($num_dims, $num_dims)
    num_samples: int,  # number of random samples to draw
) -> np.ndarray:
    num_dims = len(ppfs)

    # Draw random samples from multidimensional normal distribution -> shape($num_samples, $num_dims)
    ran = np.random.multivariate_normal(
        np.zeros(num_dims), cov_matrix, (num_samples,), check_valid="raise"
    )

    # Transform back into a uniform distribution, i.e. the space [0,1]^$num_dims
    U = scipy.stats.norm.cdf(ran)

    # Apply ppf to transform samples into the desired distribution
    # Each row of the returned array will represent one random sample -> access with a[i]
    return np.array([ppfs[i](U[:, i]) for i in range(num_dims)]).T  # shape($num_samples, $num_dims)


# Example 1. Uncorrelated data, i.e. both distributions are independent
f1 = run_gaussian_copula_simulation_and_get_samples(
    [lambda x: scipy.stats.norm.ppf(x, loc=100, scale=15), scipy.stats.norm.ppf],
    [[1, 0], [0, 1]],
    6,
)

# Example 2. Completely correlated data, i.e. both percentiles match
f2 = run_gaussian_copula_simulation_and_get_samples(
    [lambda x: scipy.stats.norm.ppf(x, loc=100, scale=15), scipy.stats.norm.ppf],
    [[1, 1], [1, 1]],
    6,
)

np.set_printoptions(suppress=True)  # suppress scientific notation
print(f1)
print(f2)
A few notes on this function. np.random.multivariate_normal does a lot of the heavy lifting for us; in particular, we do not need to decompose the correlation matrix ourselves. ppfs is passed as a list of functions, each taking one input array and returning one array.
In my particular use case I needed to generate multivariate-t-distributed random variables (in addition to normally distributed ones); consult this answer on how to do that: //stackoverflow.com/a/41967819/2111778. Additionally, I used scipy.stats.t.cdf for the back-transform step.
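For reference, the t-distributed variant could be sketched roughly as follows. This is not my production code; it uses the standard construction of multivariate-t samples (normal draws scaled by an independent chi-squared variate, as described in the linked answer), and the function name is my own:

```python
import numpy as np
import scipy.stats


def t_copula_uniforms(cov_matrix: np.ndarray, df: float, num_samples: int) -> np.ndarray:
    """Draw multivariate-t samples and back-transform them to uniforms.

    Construction: divide multivariate-normal draws by sqrt(chi2/df),
    then map each margin through the univariate t CDF.
    """
    num_dims = cov_matrix.shape[0]
    normal = np.random.multivariate_normal(np.zeros(num_dims), cov_matrix, (num_samples,))
    chi2 = np.random.chisquare(df, (num_samples, 1))  # one chi2 draw per sample row
    t_samples = normal / np.sqrt(chi2 / df)  # shape (num_samples, num_dims)
    return scipy.stats.t.cdf(t_samples, df)  # uniforms in [0,1]^num_dims
```

The returned uniforms can then be fed through the same per-dimension ppf step as in the Gaussian version above.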
In my particular use case the desired distributions were empirical distributions representing expected financial loss. The final data points then had to be added together to get a total financial loss across all of the individual-but-correlated financial events. Thus, np.array(...).T is actually replaced by sum(...) in my code base.
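The summing variant mentioned above might look like this; a minimal sketch, assuming the same signature as the original function (the function name is mine):

```python
import typing

import numpy as np
import scipy.stats


def run_gaussian_copula_simulation_and_sum(
    ppfs: typing.List[typing.Callable[[np.ndarray], np.ndarray]],
    cov_matrix: np.ndarray,
    num_samples: int,
) -> np.ndarray:
    num_dims = len(ppfs)
    ran = np.random.multivariate_normal(
        np.zeros(num_dims), cov_matrix, (num_samples,), check_valid="raise"
    )
    U = scipy.stats.norm.cdf(ran)
    # Summing across dimensions yields one total loss per simulation run
    return sum(ppfs[i](U[:, i]) for i in range(num_dims))  # shape (num_samples,)
```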
When conducting a Monte Carlo simulation, correlation among input variables is an important factor to consider. If input random variables are treated as independent when they are actually correlated, risk can be under- or overestimated. Let's think about how this occurs. When two input variables are positively correlated, both should be relatively high in a given simulation iteration and both relatively low in another. For negatively correlated inputs, one should be at the high end of its possible values while the other is at the low end in a given iteration.

We will consider three simple examples to illustrate how input variable correlation affects simulation output. First, consider a very simple model of revenue that has demand and price as inputs. Demand and price are negatively correlated: when price increases, demand decreases, and vice versa. The model is below with formulas shown. Of course, knowing whether correlation is present may be a difficult question to answer, but for our example we will assume we know the correlation. We will run two simulations to compare independent inputs
versus correlated inputs. For demand, we will assume a triangular distribution with 10 as the worst case, 20 as the most likely value, and 35 as the best case. For price, we will assume a triangular distribution with 125 as the worst case, 150 as the most likely value, and 190 as the best case. In the first simulation we will assume that demand and price are independent.
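The article's simulations are run in a spreadsheet, but the independent case can be sketched in Python with scipy's triangular distribution (note that scipy parameterizes the mode as a fraction c of the [loc, loc + scale] range):

```python
import numpy as np
import scipy.stats


def triangular(worst, likely, best):
    # scipy.stats.triang: support [loc, loc + scale], mode at loc + c * scale
    return scipy.stats.triang(c=(likely - worst) / (best - worst), loc=worst, scale=best - worst)


n = 10_000  # number of iterations (the article does not state its iteration count)
demand = triangular(10, 20, 35).rvs(n)
price = triangular(125, 150, 190).rvs(n)
revenue = demand * price  # independent inputs
print(revenue.min(), revenue.max())
```

With independent draws, extreme demand and extreme price can coincide freely, which is what produces the wider revenue range discussed below.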
Revenue Model
Revenue Model Without Correlation
Revenue Results Without Correlation
In the second simulation we will assume a Spearman rank correlation coefficient of -0.5 between demand and price. We will use rank order correlation to simulate the input variable correlation.
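A rough Python equivalent of the correlated case can use a Gaussian copula, as in the function at the top of this page. Note this is an approximation of the article's method: the Gaussian-copula parameter of -0.5 is close to, but not exactly equal to, a Spearman rank correlation of -0.5:

```python
import numpy as np
import scipy.stats

n = 10_000  # iteration count assumed; the article does not state it
corr = np.array([[1.0, -0.5], [-0.5, 1.0]])

# Gaussian copula: correlated normals -> uniforms -> triangular percentiles
z = np.random.multivariate_normal(np.zeros(2), corr, n)
u = scipy.stats.norm.cdf(z)
demand = scipy.stats.triang.ppf(u[:, 0], c=(20 - 10) / 25, loc=10, scale=25)
price = scipy.stats.triang.ppf(u[:, 1], c=(150 - 125) / 65, loc=125, scale=65)
revenue = demand * price
print(revenue.min(), revenue.max())
```

Because high demand now tends to coincide with low price (and vice versa), the extremes of the revenue distribution are pulled inward.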
Revenue Model With Correlation Matrix
Revenue Results With Correlation
After running the simulations we see that independent inputs resulted in a wider spread of outcomes.
Independent input simulation revenue range: 1375 to 6443
Correlated input simulation revenue range: 1588 to 5750
As expected, the revenue variance using independent inputs is greater as well. In this example, if we assume independent inputs, we would be overestimating risk.
Generic Model 1
In the second example, we have a generic model with two random variables. The output is the product of the two random variables. One random variable follows the logistic distribution and the second is normally distributed.
Generic Model 1
First, we simulate the model with the random variables being independent.
Generic 1 Results Without Correlation
Next, we'll simulate assuming the random variables are correlated with a 0.5 correlation coefficient.
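The correlated version of this model can be sketched with the same Gaussian-copula approach. The article does not state the logistic and normal parameters it used, so the values below are hypothetical placeholders:

```python
import numpy as np
import scipy.stats

n = 10_000
corr = np.array([[1.0, 0.5], [0.5, 1.0]])

# Correlated normals -> uniforms -> desired marginals
z = np.random.multivariate_normal(np.zeros(2), corr, n)
u = scipy.stats.norm.cdf(z)

# Hypothetical parameters; the article does not specify them
x1 = scipy.stats.logistic.ppf(u[:, 0], loc=5, scale=1)
x2 = scipy.stats.norm.ppf(u[:, 1], loc=8, scale=2)

output = x1 * x2  # model output: product of the two random variables
print(output.min(), output.mean(), output.max())
```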
Generic 1 Results With Correlation
After running the simulations we have an interesting difference.
Output range with independent inputs: -11.35 to 105.48
Output range with correlated inputs: -4.37 to 107.96
Note that the mean of each simulation is nearly identical. In this example we have a risk shift to the right: the correlated outcomes move toward higher values, while the range without correlation is still wider.
Generic Model 2
In the third example, we have a generic model with two random variables. The output is the product of the two random variables. Both random variables are normally distributed.
Generic Model 2
First, we simulate the model with the random variables being independent.
Generic 2 Results Without Correlation
Next, we'll simulate assuming the random variables are correlated with a 0.5 correlation coefficient.
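The correlated two-normal case can be sketched the same way. Again, the article does not give the normal parameters, so the ones below are hypothetical:

```python
import numpy as np
import scipy.stats

n = 10_000
corr = np.array([[1.0, 0.5], [0.5, 1.0]])

z = np.random.multivariate_normal(np.zeros(2), corr, n)
u = scipy.stats.norm.cdf(z)

# Hypothetical parameters; the article does not specify them
a = scipy.stats.norm.ppf(u[:, 0], loc=7, scale=1)
b = scipy.stats.norm.ppf(u[:, 1], loc=7, scale=1)

out = a * b  # model output: product of the two random variables
print(out.min(), out.mean(), out.max())
```

When both marginals are normal, the copula detour is actually unnecessary: drawing directly from np.random.multivariate_normal with the desired covariance matrix gives the same correlated normal inputs in one step.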
Generic 2 Results With Correlation
After running the simulations we see that there is more risk in the correlated simulation.
Output range with independent inputs: 29.65 to 73.32
Output range with correlated inputs: 26.87 to 78.96
Note that the mean of each simulation is nearly identical. In this example if we assumed independent inputs, we would have underestimated risk.
Final Thoughts on Input Variable Correlation
We've looked at three examples where input variable correlation affects the outcome differently. By assuming independence of inputs, we could be under- or overestimating risk. There could also be a shift in the outcomes to the left or right.
Are you looking for a Monte Carlo simulation add-in packed with features at a reasonable price? If so, visit the Simulation Master product page.
To learn about input variable correlation methods in Simulation Master, check out this article.
Excel is a registered trademark of Microsoft Corporation. Used with permission from Microsoft.