2 Roll the Dice: Monte Carlo Simulation

“Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.” — John von Neumann

Here is the core insight behind Monte Carlo simulation: instead of guessing a single answer, roll the dice thousands of times and let the distribution speak for itself. The method was named after the famous casino in Monaco, not because it involves gambling, but because it involves random sampling. If you can describe your uncertainty with a probability distribution, Monte Carlo can turn that uncertainty into a full picture of possible project outcomes.

Monte Carlo simulation turns that randomness into insight.

Learning Objectives

By the end of this chapter, you will be able to:

Explain the five steps of a Monte Carlo simulation
Run mcs() with multiple distributions and a correlation matrix
Interpret P50, P80, and P95 percentiles in project schedule context
Compute a schedule contingency reserve from simulation output
Identify which tasks drive the most variance using a tornado chart

2.1 How Monte Carlo Simulation Works

Monte Carlo (MC) simulation is a quantitative risk analysis technique that models uncertainty by running thousands of simulated project outcomes. Instead of using single-point estimates for task durations or costs, each task is described by a probability distribution. The simulation draws random samples from these distributions, sums them to get a total outcome, and repeats this thousands of times to build a full picture of possible project results (Vose 2008).

The Five Steps of Monte Carlo Simulation

Model Definition: Define project tasks and the variables that drive uncertainty (durations, costs).
Assign Distributions: Choose a probability distribution for each uncertain variable (e.g., triangular for tasks with optimistic/likely/pessimistic estimates).
Specify Correlations: If tasks are related (e.g., both affected by a shared risk), set a correlation coefficient between them.
Run Simulation: Draw random samples and compute the total outcome for each iteration (typically 10,000+).
Analyze Results: Summarize the distribution of totals using percentiles, mean, and variance.

2.2 Example

library(PRA)
library(ggplot2)
set.seed(42)

We model a 3-task project (in weeks). Task A follows a normal distribution, Task B has a triangular distribution (optimistic/most-likely/pessimistic), and Task C is uniformly distributed.

num_simulations <- 10000
task_distributions <- list(
  list(type = "normal", mean = 10, sd = 2),        # Task A
  list(type = "triangular", a = 5, b = 10, c = 15), # Task B
  list(type = "uniform", min = 8, max = 12)          # Task C
)

2.2.1 Correlation Matrix

Tasks often move together due to shared resources or external risks. The correlation matrix captures this. Values range from −1 (perfectly opposed) to +1 (perfectly aligned); 0 means independent. Here Tasks A and B have moderate positive correlation (0.5), meaning delays in one tend to coincide with delays in the other.

correlation_matrix <- matrix(c(
  1.0, 0.5, 0.3,
  0.5, 1.0, 0.4,
  0.3, 0.4, 1.0
), nrow = 3, byrow = TRUE)

2.2.2 Run the Simulation

results <- mcs(num_simulations, task_distributions, correlation_matrix)

cat("Mean Total Duration:  ", round(results$total_mean, 2), "weeks\n")

Mean Total Duration:   38.6 weeks

cat("Variance of Duration: ", round(results$total_variance, 2), "\n")

Variance of Duration:  20.01

cat("Std Dev of Duration:  ", round(results$total_sd, 2), "weeks\n")

Std Dev of Duration:   4.47 weeks

2.2.3 Distribution of Outcomes

The histogram below shows all 10,000 simulated total durations. The overlaid density curve reveals the shape of the distribution, notice the slight right skew from the triangular task.

hist_data <- results$total_distribution

hist(hist_data,
  breaks = 50, freq = FALSE,
  main = "Monte Carlo Simulation: Total Project Duration",
  xlab = "Total Duration (weeks)", col = "steelblue", border = "white"
)
lines(density(hist_data), col = "tomato", lwd = 2)
abline(v = results$total_mean, col = "black", lty = 2, lwd = 1.5)
legend("topright",
  legend = c("Density", paste0("Mean = ", round(results$total_mean, 1), " wks")),
  col = c("tomato", "black"), lty = c(1, 2), lwd = 2, bty = "n"
)

Monte Carlo simulation results showing the full distribution of possible total project durations.

2.3 Interpreting Percentiles

The mcs() function returns key percentiles of the total distribution. These answer the question: “What duration has X% probability of not being exceeded?”

knitr::kable(
  data.frame(
    Percentile = c("P5", "P50 (Median)", "P95"),
    Duration   = round(results$percentiles, 1),
    Meaning    = c(
      "5% chance of finishing this fast or faster",
      "Equal chance of finishing above or below this",
      "95% chance of finishing by this date"
    )
  ),
  caption = "Simulation Percentiles"
)

Simulation Percentiles
	Percentile	Duration	Meaning
5%	P5	31.2	5% chance of finishing this fast or faster
50%	P50 (Median)	38.6	Equal chance of finishing above or below this
95%	P95	46.0	95% chance of finishing by this date

The P50 is your base estimate, the median, not the mean (though for symmetric distributions they’re close). The P95 is the date you can be 95% confident in. The gap between P50 and P95 is your schedule risk.

2.4 Contingency Analysis

Contingency is the buffer added above the base estimate to cover uncertainty. A common approach is to use the difference between the P95 (or chosen confidence level) outcome and the P50 (base estimate) (Project Management Institute 2021).

contingency_val <- contingency(results, phigh = 0.95, pbase = 0.50)
cat("Schedule contingency (P95 − P50):", round(contingency_val, 2), "weeks\n")

Schedule contingency (P95 − P50): 7.4 weeks

cat(
  "There is a 95% chance the project will finish within",
  round(results$percentiles["95%"], 1), "weeks.\n"
)

There is a 95% chance the project will finish within 46 weeks.

Adding this contingency to the P50 estimate gives a 95% confidence of on-time delivery. Teams with low risk tolerance should use P95; those with higher tolerance might use P80.

2.5 Sensitivity Analysis

Sensitivity analysis identifies which tasks drive the most variability in the total outcome, the tasks that deserve the most management attention. The result is a tornado chart: the wider the bar, the bigger the impact on total schedule risk.

sensitivity_results <- sensitivity(task_distributions, correlation_matrix)

sens_data <- data.frame(
  Task        = c("Task A (Normal)", "Task B (Triangular)", "Task C (Uniform)"),
  Sensitivity = sensitivity_results
)

p <- ggplot2::ggplot(
  sens_data,
  ggplot2::aes(x = Sensitivity, y = reorder(Task, Sensitivity))
) +
  ggplot2::geom_col(fill = "steelblue") +
  ggplot2::geom_text(ggplot2::aes(label = round(Sensitivity, 3)),
    hjust = -0.1, size = 3.5
  ) +
  ggplot2::labs(
    title = "Tornado Chart: Task Sensitivity",
    x     = "Sensitivity Coefficient",
    y     = NULL
  ) +
  ggplot2::xlim(0, max(sensitivity_results) * 1.2) +
  ggplot2::theme_minimal()

print(p)

Tornado chart showing each task’s contribution to total schedule variance. Focus risk mitigation on the tallest bars.

Tasks with larger bars contribute more variance to the total. Prioritize risk mitigation efforts on the highest-sensitivity task. Even a small reduction in its uncertainty can meaningfully reduce overall project risk.

2.6 Summary

Key Takeaways

Monte Carlo simulation runs thousands of iterations to produce a full distribution of project outcomes, not just a single number.
Percentiles (P50, P80, P95) translate simulation output into concrete schedule or cost commitments at a stated confidence level.
Contingency = P_high − P50: the buffer needed to achieve a given confidence level above the base estimate.
Sensitivity analysis (tornado chart) reveals which tasks drive the most variance, so focus mitigation there first.
Positive correlation between tasks always increases total variance; ignoring it leads to underestimating risk.

For early-stage estimates when full distributions are unavailable, the Second Moment Method (Chapter 3) provides a fast analytical alternative using only means and variances.

2.7 Exercises

Conceptual check. What does increasing the number of simulations from 10,000 to 100,000 do to the result? Does it change the mean? The P95? The smoothness of the histogram? Try it.
New task. Add a 4th task with a Beta distribution (hint: look at ?mcs to see if Beta is supported, or use a normal approximation). Run mcs() with your updated task list and compare the P80 to the 3-task version. Did the project get riskier or safer?
Correlation experiment. Re-run the simulation with all off-diagonal correlation values set to 0 (fully independent tasks). Then re-run with all off-diagonal values set to 0.9 (strongly correlated). How does total variance change in each case? Explain intuitively why strong positive correlation increases portfolio risk. ★
Choosing a confidence level. Your client asks for a “90% confident” delivery date. What percentile do you use, and what contingency does that imply? Recalculate using contingency(results, phigh = 0.90, pbase = 0.50).
Real-world application. ★ Think of a project you know. Identify three tasks with significant uncertainty. Estimate (or guess) a distribution type and rough parameters for each. Run mcs() and report the P50 and P80 schedule. What’s the contingency at P80?

# Roll the Dice: Monte Carlo Simulation {#sec-mcs} > *"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin."* > — John von Neumann Here is the core insight behind Monte Carlo simulation: instead of guessing a single answer, roll the dice thousands of times and let the distribution speak for itself. The method was named after the famous casino in Monaco, not because it involves gambling, but because it involves random sampling. If you can describe your uncertainty with a probability distribution, Monte Carlo can turn that uncertainty into a full picture of possible project outcomes. Monte Carlo simulation turns that randomness into insight. ::: {.callout-note icon=false} ## Learning Objectives By the end of this chapter, you will be able to: 1. Explain the five steps of a Monte Carlo simulation 2. Run `mcs()` with multiple distributions and a correlation matrix 3. Interpret P50, P80, and P95 percentiles in project schedule context 4. Compute a schedule contingency reserve from simulation output 5. Identify which tasks drive the most variance using a tornado chart ::: ## How Monte Carlo Simulation Works Monte Carlo (MC) simulation is a quantitative risk analysis technique that models uncertainty by running thousands of simulated project outcomes. Instead of using single-point estimates for task durations or costs, each task is described by a probability distribution. The simulation draws random samples from these distributions, sums them to get a total outcome, and repeats this thousands of times to build a full picture of possible project results [@vose2008]. ::: {.callout-note} ## The Five Steps of Monte Carlo Simulation 1. **Model Definition**: Define project tasks and the variables that drive uncertainty (durations, costs). 2. **Assign Distributions**: Choose a probability distribution for each uncertain variable (e.g., triangular for tasks with optimistic/likely/pessimistic estimates). 3. **Specify Correlations**: If tasks are related (e.g., both affected by a shared risk), set a correlation coefficient between them. 4. **Run Simulation**: Draw random samples and compute the total outcome for each iteration (typically 10,000+). 5. **Analyze Results**: Summarize the distribution of totals using percentiles, mean, and variance. ::: ## Example ```{r setup} #| code-fold: false library(PRA) library(ggplot2) set.seed(42) ``` We model a 3-task project (in weeks). Task A follows a normal distribution, Task B has a triangular distribution (optimistic/most-likely/pessimistic), and Task C is uniformly distributed. ```{r} num_simulations <- 10000 task_distributions <- list( list(type = "normal", mean = 10, sd = 2), # Task A list(type = "triangular", a = 5, b = 10, c = 15), # Task B list(type = "uniform", min = 8, max = 12) # Task C ) ``` ### Correlation Matrix Tasks often move together due to shared resources or external risks. The correlation matrix captures this. Values range from −1 (perfectly opposed) to +1 (perfectly aligned); 0 means independent. Here Tasks A and B have moderate positive correlation (0.5), meaning delays in one tend to coincide with delays in the other. ```{r} correlation_matrix <- matrix(c( 1.0, 0.5, 0.3, 0.5, 1.0, 0.4, 0.3, 0.4, 1.0 ), nrow = 3, byrow = TRUE) ``` ### Run the Simulation ```{r} results <- mcs(num_simulations, task_distributions, correlation_matrix) ``` ```{r} cat("Mean Total Duration: ", round(results$total_mean, 2), "weeks\n") cat("Variance of Duration: ", round(results$total_variance, 2), "\n") cat("Std Dev of Duration: ", round(results$total_sd, 2), "weeks\n") ``` ### Distribution of Outcomes The histogram below shows all 10,000 simulated total durations. The overlaid density curve reveals the shape of the distribution, notice the slight right skew from the triangular task. ```{r} #| fig-cap: "Monte Carlo simulation results showing the full distribution of possible total project durations." hist_data <- results$total_distribution hist(hist_data, breaks = 50, freq = FALSE, main = "Monte Carlo Simulation: Total Project Duration", xlab = "Total Duration (weeks)", col = "steelblue", border = "white" ) lines(density(hist_data), col = "tomato", lwd = 2) abline(v = results$total_mean, col = "black", lty = 2, lwd = 1.5) legend("topright", legend = c("Density", paste0("Mean = ", round(results$total_mean, 1), " wks")), col = c("tomato", "black"), lty = c(1, 2), lwd = 2, bty = "n" ) ``` ## Interpreting Percentiles The `mcs()` function returns key percentiles of the total distribution. These answer the question: *"What duration has X% probability of not being exceeded?"* ```{r} knitr::kable( data.frame( Percentile = c("P5", "P50 (Median)", "P95"), Duration = round(results$percentiles, 1), Meaning = c( "5% chance of finishing this fast or faster", "Equal chance of finishing above or below this", "95% chance of finishing by this date" ) ), caption = "Simulation Percentiles" ) ``` The P50 is your base estimate, the median, not the mean (though for symmetric distributions they're close). The P95 is the date you can be 95% confident in. The gap between P50 and P95 is your schedule risk. ## Contingency Analysis Contingency is the buffer added above the base estimate to cover uncertainty. A common approach is to use the difference between the P95 (or chosen confidence level) outcome and the P50 (base estimate) [@pmbok2021]. ```{r} contingency_val <- contingency(results, phigh = 0.95, pbase = 0.50) cat("Schedule contingency (P95 − P50):", round(contingency_val, 2), "weeks\n") cat( "There is a 95% chance the project will finish within", round(results$percentiles["95%"], 1), "weeks.\n" ) ``` Adding this contingency to the P50 estimate gives a 95% confidence of on-time delivery. Teams with low risk tolerance should use P95; those with higher tolerance might use P80. ## Sensitivity Analysis Sensitivity analysis identifies which tasks drive the most variability in the total outcome, the tasks that deserve the most management attention. The result is a **tornado chart**: the wider the bar, the bigger the impact on total schedule risk. ```{r} #| fig-cap: "Tornado chart showing each task's contribution to total schedule variance. Focus risk mitigation on the tallest bars." sensitivity_results <- sensitivity(task_distributions, correlation_matrix) sens_data <- data.frame( Task = c("Task A (Normal)", "Task B (Triangular)", "Task C (Uniform)"), Sensitivity = sensitivity_results ) p <- ggplot2::ggplot( sens_data, ggplot2::aes(x = Sensitivity, y = reorder(Task, Sensitivity)) ) + ggplot2::geom_col(fill = "steelblue") + ggplot2::geom_text(ggplot2::aes(label = round(Sensitivity, 3)), hjust = -0.1, size = 3.5 ) + ggplot2::labs( title = "Tornado Chart: Task Sensitivity", x = "Sensitivity Coefficient", y = NULL ) + ggplot2::xlim(0, max(sensitivity_results) * 1.2) + ggplot2::theme_minimal() print(p) ``` Tasks with larger bars contribute more variance to the total. Prioritize risk mitigation efforts on the highest-sensitivity task. Even a small reduction in its uncertainty can meaningfully reduce overall project risk. ## Summary ::: {.callout-tip icon=false} ## Key Takeaways - Monte Carlo simulation runs thousands of iterations to produce a full distribution of project outcomes, not just a single number. - **Percentiles** (P50, P80, P95) translate simulation output into concrete schedule or cost commitments at a stated confidence level. - **Contingency** = P_high − P50: the buffer needed to achieve a given confidence level above the base estimate. - **Sensitivity analysis** (tornado chart) reveals which tasks drive the most variance, so focus mitigation there first. - Positive correlation between tasks *always* increases total variance; ignoring it leads to underestimating risk. ::: For early-stage estimates when full distributions are unavailable, the Second Moment Method (@sec-smm) provides a fast analytical alternative using only means and variances. ## Exercises 1. **Conceptual check.** What does increasing the number of simulations from 10,000 to 100,000 do to the result? Does it change the mean? The P95? The smoothness of the histogram? Try it. 2. **New task.** Add a 4th task with a Beta distribution (hint: look at `?mcs` to see if Beta is supported, or use a normal approximation). Run `mcs()` with your updated task list and compare the P80 to the 3-task version. Did the project get riskier or safer? 3. **Correlation experiment.** Re-run the simulation with all off-diagonal correlation values set to 0 (fully independent tasks). Then re-run with all off-diagonal values set to 0.9 (strongly correlated). How does total variance change in each case? Explain intuitively why strong positive correlation increases portfolio risk. ★ 4. **Choosing a confidence level.** Your client asks for a "90% confident" delivery date. What percentile do you use, and what contingency does that imply? Recalculate using `contingency(results, phigh = 0.90, pbase = 0.50)`. 5. **Real-world application.** ★ Think of a project you know. Identify three tasks with significant uncertainty. Estimate (or guess) a distribution type and rough parameters for each. Run `mcs()` and report the P50 and P80 schedule. What's the contingency at P80?