3 May I Have a (Second) Moment? The Second Moment Method

“All models are wrong, but some are useful.” — George E. P. Box

The Second Moment Method (SMM) is a deliberately simplified model. It assumes the total project duration is normally distributed, and it ignores the shape of individual task distributions. Yet it is, in Box’s sense, useful: fast enough to run before a meeting ends, and honest enough to use in a risk report.

Sometimes you don’t need a full Monte Carlo simulation. Sometimes you need an answer in thirty seconds, not thirty minutes. You have means and variances, a correlation matrix, and a deadline. Enter the Second Moment Method, the analyst’s equivalent of a napkin calculation that is actually defensible.

The name comes from statistics: the “first moment” of a distribution is its mean, and the “second moment” (more precisely, the second central moment) is its variance. The method works with exactly these two quantities, which makes it the fastest way to get from “I have means and variances” to “I have a project risk estimate.”

Learning Objectives

By the end of this chapter, you will be able to:

Explain when the Second Moment Method is preferable to Monte Carlo simulation
Apply the SMM formulas for total mean and total variance by hand
Run smm() and interpret the output
Construct a 95% confidence interval for total project duration
Compare SMM and MCS results and understand where they diverge

3.1 When to Use SMM

SMM vs. Monte Carlo: The Decision Rule

Use SMM when:

You need results in seconds, not minutes
You only have mean and variance estimates (no full distribution)
Tasks are approximately normal and correlations are well-characterized
You want a quick sanity check before investing in a full MCS

Use Monte Carlo (Chapter 2) when:

Tasks have non-normal distributions (triangular, uniform, lognormal)
You need accurate tail behavior (P90, P95, P99)
Skewness matters, as asymmetric distributions require simulation to capture correctly

Think of SMM as a reconnaissance tool. It tells you the territory before you commit to the full expedition.
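To see why skewness pushes you toward simulation, the sketch below compares the 99th percentile from a normal approximation with a simulated one for three independent, right-skewed tasks. It uses base R only; the lognormal task parameters (mean 10, standard deviation 8) are illustrative assumptions, not data from this chapter.

```r
# Sketch: how a normal (SMM-style) approximation can understate the upper tail
# when tasks are right-skewed. All task parameters are hypothetical.
set.seed(123)

n_sims <- 100000
m <- 10   # mean of each task (weeks), assumed
s <- 8    # standard deviation of each task (weeks), assumed

# Lognormal parameters that reproduce the chosen mean and standard deviation
sdlog   <- sqrt(log(1 + (s / m)^2))
meanlog <- log(m) - sdlog^2 / 2

# Simulate three independent lognormal tasks and sum them
totals <- rlnorm(n_sims, meanlog, sdlog) +
  rlnorm(n_sims, meanlog, sdlog) +
  rlnorm(n_sims, meanlog, sdlog)

# Normal approximation built from the first two moments only
smm_mean <- 3 * m
smm_sd   <- sqrt(3 * s^2)

p99_normal    <- qnorm(0.99, mean = smm_mean, sd = smm_sd)
p99_simulated <- unname(quantile(totals, 0.99))

cat("P99, normal approximation:", round(p99_normal, 1), "weeks\n")
cat("P99, simulated lognormal: ", round(p99_simulated, 1), "weeks\n")
```

Both approaches share the same mean and variance by construction, so any gap between the two percentiles reflects distribution shape alone.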

3.2 How It Works

For a project with $n$ tasks, SMM computes:

Total mean: Sum of individual task means: \[E[X] = \sum_{i=1}^{n} E[X_i]\]

Total variance: Sum of variances plus twice the sum of all pairwise covariances: \[\text{Var}(X) = \sum_{i=1}^{n} \text{Var}(X_i) + 2 \sum_{i<j} \text{Cov}(X_i, X_j)\]

Covariance: Derived from the correlation matrix: \[\text{Cov}(X_i, X_j) = \rho_{ij} \cdot \sigma_i \cdot \sigma_j\]

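By the Central Limit Theorem, the total is approximately normally distributed when there are many tasks and no single task dominates, so an interval for the total follows directly from its mean and standard deviation (Benjamin and Cornell 2000). With only a handful of tasks, normality of the total is an assumption to check rather than a guarantee.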
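The three formulas combine into a few lines of base R. The helper below is a hand-rolled sketch, not the PRA implementation; the name smm_by_hand and the two-task inputs are made up for illustration.

```r
# Sketch of the SMM formulas in base R (hypothetical helper, not PRA::smm).
smm_by_hand <- function(means, vars, cor_mat) {
  sds <- sqrt(vars)
  # Cov(X_i, X_j) = rho_ij * sigma_i * sigma_j; the diagonal holds the variances
  cov_mat <- cor_mat * outer(sds, sds)
  # Summing every entry gives sum(Var) + 2 * sum_{i<j} Cov
  total_var <- sum(cov_mat)
  list(
    total_mean = sum(means),
    total_var  = total_var,
    total_std  = sqrt(total_var)
  )
}

# Two illustrative tasks: means 8 and 12, variances 4 and 9, correlation 0.5
example <- smm_by_hand(
  means   = c(8, 12),
  vars    = c(4, 9),
  cor_mat = matrix(c(1, 0.5, 0.5, 1), nrow = 2)
)
example$total_mean  # 8 + 12 = 20
example$total_var   # 4 + 9 + 2 * (0.5 * 2 * 3) = 19
```

Summing the full covariance matrix counts each off-diagonal pair twice, which is where the factor of 2 in the variance formula comes from.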

3.3 Example

library(PRA)
set.seed(42)

We analyze a 3-task project with task durations in weeks. Each task has a known mean and variance, and correlations between tasks are provided.

task_means <- c(10, 15, 20)  # Expected duration for each task (weeks)
task_vars  <- c(4, 9, 16)    # Variance of each task duration
cor_mat <- matrix(c(
  1.0, 0.5, 0.3,
  0.5, 1.0, 0.4,
  0.3, 0.4, 1.0
), nrow = 3, byrow = TRUE)

result <- smm(task_means, task_vars, cor_mat)
cat("Total Mean Duration:  ", round(result$total_mean, 2), "weeks\n")

Total Mean Duration:   45 weeks

cat("Total Variance:       ", round(result$total_var, 2), "\n")

Total Variance:        49.4

cat("Total Std Deviation:  ", round(result$total_std, 2), "weeks\n")

Total Std Deviation:   7.03 weeks
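The reported variance can be checked by hand from the formulas in Section 3.2. With standard deviations of 2, 3, and 4 weeks, the pairwise covariances are:

\[\text{Cov}(X_1, X_2) = 0.5 \cdot 2 \cdot 3 = 3.0, \quad \text{Cov}(X_1, X_3) = 0.3 \cdot 2 \cdot 4 = 2.4, \quad \text{Cov}(X_2, X_3) = 0.4 \cdot 3 \cdot 4 = 4.8\]

\[\text{Var}(X) = (4 + 9 + 16) + 2\,(3.0 + 2.4 + 4.8) = 29 + 20.4 = 49.4, \qquad \sqrt{49.4} \approx 7.03\]

Correlation therefore adds 20.4 to the variance of 29 that independent tasks would give.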

3.4 Implied Distribution and Confidence Interval

SMM assumes the total project duration is approximately normally distributed. This allows us to construct a confidence interval directly from the mean and standard deviation.

A 95% confidence interval for total project duration is approximately:

\[E[X] \pm 1.96 \cdot \sigma_X, \qquad \sigma_X = \sqrt{\text{Var}(X)}\]

total_mean <- result$total_mean
total_sd   <- result$total_std
ci_lower   <- total_mean - 1.96 * total_sd
ci_upper   <- total_mean + 1.96 * total_sd
cat("95% CI: [", round(ci_lower, 1), ",", round(ci_upper, 1), "] weeks\n")

95% CI: [ 31.2 , 58.8 ] weeks
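Because the implied distribution is normal, other percentiles and deadline probabilities follow from qnorm() and pnorm() in base R. The sketch below hard-codes the totals reported above so it runs on its own; the 52-week deadline is a made-up value for illustration.

```r
# Sketch: percentiles and deadline probabilities from the implied normal.
# Totals are copied from the SMM output above; the deadline is hypothetical.
mu_total <- 45           # total mean (weeks)
sd_total <- sqrt(49.4)   # total standard deviation (weeks)

# Central 95% interval via exact normal quantiles (z = 1.96 to two decimals)
interval_95 <- qnorm(c(0.025, 0.975), mean = mu_total, sd = sd_total)

# P80 and P90 durations under the normality assumption
p80_p90 <- qnorm(c(0.80, 0.90), mean = mu_total, sd = sd_total)

# Probability of finishing within an assumed 52-week deadline
deadline  <- 52
p_on_time <- pnorm(deadline, mean = mu_total, sd = sd_total)

cat("95% interval:", round(interval_95, 1), "weeks\n")
cat("P80, P90:    ", round(p80_p90, 1), "weeks\n")
cat("P(duration <= 52 weeks):", round(p_on_time, 3), "\n")
```

These figures inherit the normality assumption, so treat the upper percentiles with caution when task durations are skewed.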

The plot below shows the implied normal distribution of total project duration, with the 95% confidence interval shaded:

x_range <- seq(total_mean - 4 * total_sd, total_mean + 4 * total_sd, length.out = 300)
y_range <- dnorm(x_range, mean = total_mean, sd = total_sd)

plot(x_range, y_range,
  type = "l", lwd = 2, col = "steelblue",
  main = "SMM: Implied Project Duration Distribution",
  xlab = "Total Duration (weeks)", ylab = "Density"
)

x_ci <- x_range[x_range >= ci_lower & x_range <= ci_upper]
y_ci <- dnorm(x_ci, mean = total_mean, sd = total_sd)
polygon(c(ci_lower, x_ci, ci_upper), c(0, y_ci, 0),
  col = "lightblue", border = NA
)

abline(v = total_mean, col = "black", lty = 2, lwd = 1.5)
legend("topright",
  legend = c("Normal density", "95% CI", "Mean"),
  col    = c("steelblue", "lightblue", "black"),
  lty    = c(1, NA, 2), lwd = c(2, NA, 1.5),
  pch    = c(NA, 15, NA), pt.cex = 1.5,
  bty    = "n"
)

SMM implied normal distribution for total project duration. The shaded region is the 95% confidence interval.

3.5 Comparison with Monte Carlo Simulation

What This Comparison Is Testing

To check the SMM arithmetic against simulation, the MCS below uses the same task means and variances as the SMM, models every task as normal, and omits the correlation matrix (i.e., tasks are treated as independent). This gives a clean apples-to-apples test: both methods sum three independent normal tasks, so any remaining difference comes from Monte Carlo sampling error alone.

The earlier SMM result (total variance = 49.4, total SD = 7.03 weeks) is not the right benchmark here, because it includes covariance from the correlation matrix, which the MCS below omits. The independent-task benchmark is the plain sum of the task variances, 4 + 9 + 16 = 29. To compare correlated results, you would pass the same cor_mat to mcs().

Running Monte Carlo simulation with the same task distributions validates the SMM. The two methods should yield very similar total means and variances; with independent normal tasks, any gap is sampling error that shrinks as the number of runs grows.

task_dists_for_mcs <- list(
  list(type = "normal", mean = task_means[1], sd = sqrt(task_vars[1])),
  list(type = "normal", mean = task_means[2], sd = sqrt(task_vars[2])),
  list(type = "normal", mean = task_means[3], sd = sqrt(task_vars[3]))
)

mcs_result <- mcs(10000, task_dists_for_mcs)

smm_var_nocor <- sum(task_vars)

comparison <- data.frame(
  Method         = c("SMM (independent)", "Monte Carlo (10,000 runs)"),
  Total_Mean     = round(c(result$total_mean, mcs_result$total_mean), 2),
  Total_Variance = round(c(smm_var_nocor, mcs_result$total_variance), 2),
  Total_StdDev   = round(c(sqrt(smm_var_nocor), mcs_result$total_sd), 2)
)
knitr::kable(comparison, caption = "SMM vs. Monte Carlo Comparison (independent tasks)")

SMM vs. Monte Carlo Comparison (independent tasks)
Method	Total_Mean	Total_Variance	Total_StdDev
SMM (independent)	45.00	29.00	5.39
Monte Carlo (10,000 runs)	45.01	29.83	5.46

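The two methods agree closely: the means differ by 0.01 weeks and the variances by under 3% (29.00 vs. 29.83). Because both model the same independent normal tasks, that gap is attributable to sampling error. SMM is faster but assumes normality; Monte Carlo is more flexible and can use any distribution type.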
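For the correlated case, one way to cross-check the SMM variance of 49.4 without relying on any particular mcs() signature is to sample correlated normals directly. The base R sketch below does this with a Cholesky factor of the covariance matrix. It illustrates the idea only and is not a description of how mcs() works internally; the variable names are chosen here to avoid clashing with the chapter's objects.

```r
# Sketch: simulate correlated normal task durations with a Cholesky factor
# and compare the simulated total variance with the SMM value (49.4).
set.seed(42)

mu  <- c(10, 15, 20)        # task means (weeks)
sds <- sqrt(c(4, 9, 16))    # task standard deviations
rho <- matrix(c(
  1.0, 0.5, 0.3,
  0.5, 1.0, 0.4,
  0.3, 0.4, 1.0
), nrow = 3, byrow = TRUE)

sigma <- rho * outer(sds, sds)   # covariance matrix
u     <- chol(sigma)             # upper triangular, t(u) %*% u equals sigma

n_sims <- 100000
z <- matrix(rnorm(n_sims * 3), ncol = 3)   # independent standard normals
x <- sweep(z %*% u, 2, mu, "+")            # rows now have covariance sigma

sim_totals <- rowSums(x)
cat("Simulated mean:    ", round(mean(sim_totals), 2), "\n")
cat("Simulated variance:", round(var(sim_totals), 2), "\n")
cat("SMM variance:      ", sum(sigma), "\n")
```

With correlated normal inputs the total is exactly normal, so the simulated variance should settle near the analytical value as the number of runs increases.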

3.6 Benefits and Limitations

	SMM	Monte Carlo
Speed	Instant (analytical)	Slow (thousands of iterations)
Inputs needed	Mean + variance per task	Full distribution per task
Distribution assumption	Normal (by CLT)	Any distribution
Correlation handling	Explicit covariance formula	Cholesky decomposition
Skewness / tails	Ignored	Captured accurately
Best for	Early estimates, quick checks	Detailed risk analysis, non-normal tasks

3.7 Summary

Key Takeaways

The Second Moment Method propagates uncertainty analytically using only means, variances, and correlations; no simulation is required.
The covariance formula $\text{Cov}(X_i, X_j) = \rho_{ij} \cdot \sigma_i \cdot \sigma_j$ turns a correlation matrix into a contribution to total variance.
By the Central Limit Theorem, the total is approximately normal, enabling confidence intervals via $E[X] \pm z \cdot \sigma_X$, where $\sigma_X = \sqrt{\text{Var}(X)}$.
SMM and Monte Carlo agree closely on the mean and variance when given the same inputs; they diverge in the tails when task distributions are skewed, and in the variance when correlations are handled differently.
SMM is ideal for rapid early-stage estimates; use Monte Carlo (Chapter 2) when distribution shape and tail accuracy matter.

For projects where risks are interconnected through shared root causes, the Bayesian approach in Chapter 6 provides a richer updating framework beyond means and variances alone.

3.8 Exercises

By hand. Compute the project mean and total variance by hand for two tasks with means 5 and 10 weeks, variances 1 and 4, and a correlation of 0.3. Then verify your answer using smm().
Effect of correlation. Run smm() for the 3-task example above with three different correlation matrices: (a) identity matrix (all tasks independent), (b) the original matrix (moderate correlation), and (c) a matrix where all off-diagonal entries are 0.9. Plot the three implied normal distributions on the same graph. What does correlation do to the spread?
Normality check. ★ The SMM assumes normality via the Central Limit Theorem. This works best when there are many tasks. Run mcs() for the same 3-task project, then overlay the SMM normal distribution on the MCS histogram. How well does normality hold? What if two of the tasks followed exponential distributions instead of normal?
SMM for costs. Your project has four cost items with means $50K, $80K, $30K, and $60K and standard deviations $10K, $15K, $5K, and $12K. Assume moderate positive correlation (0.3) between all pairs. Use smm() to compute the P90 cost estimate (mean + 1.28 × SD).
When to stop. ★ Under what conditions would you trust the SMM result over Monte Carlo? Under what conditions would you distrust it? Write a one-paragraph decision rule for choosing between the two methods.
