6 I Had a Feeling: Bayesian Risk Inference

“When the facts change, I change my mind. What do you do, sir?” — Attributed to John Maynard Keynes

Here is the situation: you have a risk assessment based on historical data and expert judgment. Then something happens on site: a root cause you were watching turns out to be present, or a supplier sends a warning signal. Your prior assessment is now out of date. Do you stick with the original numbers, or update them?

Bayesian inference gives you the formal machinery to update. Not because it feels right, but because it is the mathematically correct way to incorporate new evidence.

Learning Objectives

By the end of this chapter, you will be able to:

Explain prior and posterior probability in plain English
Use risk_prob() to compute a prior risk probability from root causes
Use risk_post_prob() to update that probability given new observations
Sample prior and posterior cost distributions with cost_pdf() and cost_post_pdf()
Interpret the shift between prior and posterior distributions

6.1 The Core Idea

Bayes’ Theorem in Plain English

\[P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)}\]

In plain English: Your updated belief (posterior) equals your initial belief (prior) multiplied by how well the evidence fits that belief, normalized to sum to 1.

In project risk terms:

Prior $P(H)$: Your initial estimate of risk probability, before any field observations
Likelihood $P(E \mid H)$: How likely is the observed evidence if the risk/cause is present?
Posterior $P(H \mid E)$: Your updated estimate after incorporating the evidence

Observing a root cause always moves the posterior. The direction and magnitude depend on how strongly that cause is linked to the risk event.

Bayes’ theorem states:

\[P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)}\]

In English: the probability of your hypothesis ($H$) given new evidence ($E$) equals the likelihood of seeing that evidence if the hypothesis were true, times your prior probability, divided by the total probability of the evidence. The result is the posterior probability, your updated belief.

In project risk terms:

Prior: Your initial estimate of risk probability before any field observations
Evidence: Observations about root causes (is the labor market tight? is weather bad?)
Posterior: Your updated estimate after incorporating those observations

The PRA package provides four Bayesian functions organized into two stages:

Stage	Function	Purpose
Prior	`risk_prob()`	Compute risk probability from root causes (no observations yet)
Prior	`cost_pdf()`	Sample prior cost distribution based on risk probabilities
Posterior	`risk_post_prob()`	Update risk probability after observing cause status
Posterior	`cost_post_pdf()`	Sample posterior cost distribution based on observed risks

library(PRA)
set.seed(42)

6.2 Causal Structure

Before computing probabilities, it helps to see the causal graph. Two independent root causes each point to the risk event, and observing either one updates the posterior risk probability.

library(igraph)
g <- graph_from_data_frame(
  data.frame(from = c("Cause-1", "Cause-2"), to = c("Risk", "Risk")),
  directed = TRUE
)
V(g)$color <- c("steelblue", "steelblue", "tomato")
plot(g, vertex.size = 40, vertex.label.cex = 0.9,
     edge.arrow.size = 0.6, layout = layout_with_sugiyama(g)$layout)

DAG: Cause-1 and Cause-2 independently drive the risk event R.

6.3 Step 1: Prior Risk Probability

Before any observations are made, risk_prob() computes the probability of a risk event R occurring given two potential root causes. For each cause, we supply:

cause_probs: prior probability that the cause is present
risks_given_causes: P(R | cause present)
risks_given_not_causes: P(R | cause absent)

cause_probs            <- c(0.3, 0.2)
risks_given_causes     <- c(0.8, 0.6)
risks_given_not_causes <- c(0.2, 0.4)

prior_risk <- risk_prob(cause_probs, risks_given_causes, risks_given_not_causes)
cat("Prior probability of risk event R:", round(prior_risk, 3), "\n")

Prior probability of risk event R: 0.653

This is your starting point, an estimate of how likely the risk event is before you’ve looked at anything on site.

6.4 Step 2: Prior Cost Distribution

Given the prior risk probabilities, cost_pdf() samples the cost distribution before any field observations. Three independent risk events can each contribute cost if they occur.

risk_probs        <- c(0.3, 0.5, 0.2)
means_given_risks <- c(10000, 15000, 5000)
sds_given_risks   <- c(2000, 1000, 1000)
base_cost         <- 2000

prior_samples <- cost_pdf(
  num_sims          = 5000,
  risk_probs        = risk_probs,
  means_given_risks = means_given_risks,
  sds_given_risks   = sds_given_risks,
  base_cost         = base_cost
)

We’ll compare this prior distribution to the posterior in Step 4.

6.5 Step 3: Posterior Risk Probability (Bayesian Update)

After inspecting the project site, we observe that Cause 1 is present (= 1). Cause 2 has not yet been assessed (= NA). risk_post_prob() updates the risk probability using only the available evidence; NA causes are treated as unobserved and do not contribute to the update.

observed_causes <- c(1, NA)  # C1 observed as present; C2 not yet assessed

posterior_risk <- risk_post_prob(
  cause_probs, risks_given_causes,
  risks_given_not_causes, observed_causes
)
cat("Posterior probability of risk event R:", round(posterior_risk, 3), "\n")

Posterior probability of risk event R: 0.888

Observing Cause 1 (which has a strong link to R) raises the risk probability substantially. The NA for Cause 2 is simply ignored; only confirmed observations drive the update.

6.5.1 Prior vs. Posterior Probability

prob_data <- data.frame(
  Stage       = c("Prior", "Posterior"),
  Probability = c(prior_risk, posterior_risk)
)

p <- ggplot2::ggplot(prob_data, ggplot2::aes(x = Stage, y = Probability, fill = Stage)) +
  ggplot2::geom_col(width = 0.5, show.legend = FALSE) +
  ggplot2::geom_text(ggplot2::aes(label = round(Probability, 3)),
    vjust = -0.4, size = 4.5
  ) +
  ggplot2::scale_fill_manual(values = c("Prior" = "steelblue", "Posterior" = "tomato")) +
  ggplot2::scale_y_continuous(limits = c(0, 1), labels = scales::percent) +
  ggplot2::labs(
    title = "Bayesian Update: Risk Probability",
    x     = NULL, y = "P(Risk Event R)"
  ) +
  ggplot2::theme_minimal(base_size = 13)

print(p)

Bayesian update: observing Cause 1 nearly doubles the estimated probability of the risk event.

6.6 Step 4: Posterior Cost Distribution

Now that we know Cause 1 is present (Risk 1 occurs), and one risk remains unobserved (Risk 2 = NA), cost_post_pdf() samples the posterior cost distribution. Observed risks that occurred (= 1) add their cost; unobserved risks (= NA) are excluded from the simulation.

observed_risks <- c(1, NA, 1)  # Risk 1 and 3 confirmed; Risk 2 not yet assessed

posterior_samples <- cost_post_pdf(
  num_sims          = 5000,
  observed_risks    = observed_risks,
  means_given_risks = means_given_risks,
  sds_given_risks   = sds_given_risks,
  base_cost         = base_cost
)

6.6.1 Prior vs. Posterior Cost Distribution

Plotting both distributions on the same axes shows how the evidence shifts the cost estimate:

xlim_range <- range(c(prior_samples, posterior_samples))

hist(prior_samples,
  breaks = 40, freq = FALSE,
  col    = rgb(0.27, 0.51, 0.71, 0.5),
  border = "white",
  xlim   = xlim_range,
  main   = "Prior vs. Posterior Cost Distribution",
  xlab   = "Total Cost ($)", ylab = "Density"
)

hist(posterior_samples,
  breaks = 40, freq = FALSE,
  col    = rgb(0.84, 0.24, 0.31, 0.5),
  border = "white",
  add    = TRUE
)

abline(v = mean(prior_samples),     col = "steelblue", lty = 2, lwd = 2)
abline(v = mean(posterior_samples), col = "tomato",    lty = 2, lwd = 2)

legend("topright",
  legend = c(
    paste0("Prior  (mean = $", format(round(mean(prior_samples)),     big.mark = ",")),
    paste0("Posterior (mean = $", format(round(mean(posterior_samples)), big.mark = ","))
  ),
  fill = c(rgb(0.27, 0.51, 0.71, 0.5), rgb(0.84, 0.24, 0.31, 0.5)),
  bty  = "n"
)

Prior vs. posterior cost distribution. Confirming which risks materialized narrows and shifts the distribution.

Interpretation: The posterior distribution is narrower and shifted upward, as observing which risks materialized eliminates uncertainty about some cost components while raising the estimated exposure from confirmed events. The remaining spread reflects uncertainty from the unobserved risk and cost variability in the confirmed ones.

6.7 Summary

The Bayesian workflow in PRA follows a natural before-and-after structure:

Before observations: use risk_prob() and cost_pdf() to characterize the risk landscape.
As evidence arrives: use risk_post_prob() and cost_post_pdf() to update estimates.
NA values represent causes/risks not yet assessed; they are correctly excluded from the update.

This approach is particularly powerful in phased projects where information about risk drivers becomes available progressively, allowing cost forecasts to be refined at each stage.

6.8 Summary

Key Takeaways

Bayesian inference is the principled way to update risk estimates as new evidence arrives; it is not a heuristic, it is the mathematically correct rule.
Prior → Posterior: the posterior always moves in the direction of the evidence; the size of the move depends on how strongly the cause links to the risk.
NA values for unobserved causes or risks are correctly handled; they do not contribute to the update, reflecting genuine ignorance.
The posterior cost distribution is typically narrower than the prior, because confirmed risk states eliminate uncertainty about some cost components.
This workflow is particularly powerful in phased projects: update at each phase gate as more information becomes available.

For risks that propagate through a network of shared resources and tasks, Chapter 8 extends these ideas into a full probabilistic graph where Bayesian conditioning and structural dependencies interact.

6.9 Exercises

Prior vs. posterior. In plain English, what is the difference between a prior probability and a posterior probability? Give a project example where each would be used.
Change the observation. Re-run Step 3 with Cause 2 observed as present (= 1) instead of Cause 1. How does the posterior change compared to when Cause 1 was observed? Why?
Both causes observed. What happens if you observe both causes as present (= 1, = 1)? Compute the posterior. What is the maximum possible posterior probability given the model parameters?
Intuition check. ★ Explain in plain English why observing one root cause changes your belief about the risk event, even though we didn’t observe the risk event itself. What is the role of the conditional probabilities risks_given_causes in this reasoning?
Sequential updating. ★ Start with the prior. Observe Cause 1. Update. Then observe Cause 2. Update again. Does it matter what order you observe the causes? Should it? (Hint: think about whether the causes are assumed independent in this model.)

# I Had a Feeling: Bayesian Risk Inference {#sec-bayes} > *"When the facts change, I change my mind. What do you do, sir?"* > — Attributed to John Maynard Keynes Here is the situation: you have a risk assessment based on historical data and expert judgment. Then something happens on site: a root cause you were watching turns out to be present, or a supplier sends a warning signal. Your prior assessment is now out of date. Do you stick with the original numbers, or update them? Bayesian inference gives you the formal machinery to update. Not because it feels right, but because it is the mathematically correct way to incorporate new evidence. ::: {.callout-note icon=false} ## Learning Objectives By the end of this chapter, you will be able to: 1. Explain prior and posterior probability in plain English 2. Use `risk_prob()` to compute a prior risk probability from root causes 3. Use `risk_post_prob()` to update that probability given new observations 4. Sample prior and posterior cost distributions with `cost_pdf()` and `cost_post_pdf()` 5. Interpret the shift between prior and posterior distributions ::: ## The Core Idea ::: {.callout-note} ## Bayes' Theorem in Plain English $$P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)}$$ **In plain English:** Your updated belief (posterior) equals your initial belief (prior) multiplied by how well the evidence fits that belief, normalized to sum to 1. In project risk terms: - **Prior** $P(H)$: Your initial estimate of risk probability, before any field observations - **Likelihood** $P(E \mid H)$: How likely is the observed evidence if the risk/cause is present? - **Posterior** $P(H \mid E)$: Your updated estimate after incorporating the evidence Observing a root cause *always* moves the posterior. The direction and magnitude depend on how strongly that cause is linked to the risk event. ::: Bayes' theorem states: $$P(H \mid E) = \frac{P(E \mid H) \cdot P(H)}{P(E)}$$ In English: the probability of your hypothesis ($H$) given new evidence ($E$) equals the likelihood of seeing that evidence if the hypothesis were true, times your prior probability, divided by the total probability of the evidence. The result is the **posterior** probability, your updated belief. In project risk terms: - **Prior**: Your initial estimate of risk probability before any field observations - **Evidence**: Observations about root causes (is the labor market tight? is weather bad?) - **Posterior**: Your updated estimate after incorporating those observations The PRA package provides four Bayesian functions organized into two stages: | Stage | Function | Purpose | |-------|---------|---------| | **Prior** | `risk_prob()` | Compute risk probability from root causes (no observations yet) | | **Prior** | `cost_pdf()` | Sample prior cost distribution based on risk probabilities | | **Posterior** | `risk_post_prob()` | Update risk probability after observing cause status | | **Posterior** | `cost_post_pdf()` | Sample posterior cost distribution based on observed risks | ```{r setup} #| code-fold: false library(PRA) set.seed(42) ``` ## Causal Structure Before computing probabilities, it helps to see the causal graph. Two independent root causes each point to the risk event, and observing either one updates the posterior risk probability. ```{r} #| fig-cap: "DAG: Cause-1 and Cause-2 independently drive the risk event R." library(igraph) g <- graph_from_data_frame( data.frame(from = c("Cause-1", "Cause-2"), to = c("Risk", "Risk")), directed = TRUE ) V(g)$color <- c("steelblue", "steelblue", "tomato") plot(g, vertex.size = 40, vertex.label.cex = 0.9, edge.arrow.size = 0.6, layout = layout_with_sugiyama(g)$layout) ``` ## Step 1: Prior Risk Probability Before any observations are made, `risk_prob()` computes the probability of a risk event R occurring given two potential root causes. For each cause, we supply: - `cause_probs`: prior probability that the cause is present - `risks_given_causes`: P(R | cause present) - `risks_given_not_causes`: P(R | cause absent) ```{r} cause_probs <- c(0.3, 0.2) risks_given_causes <- c(0.8, 0.6) risks_given_not_causes <- c(0.2, 0.4) ``` ```{r} prior_risk <- risk_prob(cause_probs, risks_given_causes, risks_given_not_causes) cat("Prior probability of risk event R:", round(prior_risk, 3), "\n") ``` This is your starting point, an estimate of how likely the risk event is before you've looked at anything on site. ## Step 2: Prior Cost Distribution Given the prior risk probabilities, `cost_pdf()` samples the cost distribution before any field observations. Three independent risk events can each contribute cost if they occur. ```{r} risk_probs <- c(0.3, 0.5, 0.2) means_given_risks <- c(10000, 15000, 5000) sds_given_risks <- c(2000, 1000, 1000) base_cost <- 2000 ``` ```{r} prior_samples <- cost_pdf( num_sims = 5000, risk_probs = risk_probs, means_given_risks = means_given_risks, sds_given_risks = sds_given_risks, base_cost = base_cost ) ``` We'll compare this prior distribution to the posterior in Step 4. ## Step 3: Posterior Risk Probability (Bayesian Update) After inspecting the project site, we observe that Cause 1 is present (= 1). Cause 2 has not yet been assessed (= NA). `risk_post_prob()` updates the risk probability using only the available evidence; NA causes are treated as unobserved and do not contribute to the update. ```{r} observed_causes <- c(1, NA) # C1 observed as present; C2 not yet assessed ``` ```{r} posterior_risk <- risk_post_prob( cause_probs, risks_given_causes, risks_given_not_causes, observed_causes ) cat("Posterior probability of risk event R:", round(posterior_risk, 3), "\n") ``` Observing Cause 1 (which has a strong link to R) raises the risk probability substantially. The NA for Cause 2 is simply ignored; only confirmed observations drive the update. ### Prior vs. Posterior Probability ```{r} #| fig-cap: "Bayesian update: observing Cause 1 nearly doubles the estimated probability of the risk event." prob_data <- data.frame( Stage = c("Prior", "Posterior"), Probability = c(prior_risk, posterior_risk) ) p <- ggplot2::ggplot(prob_data, ggplot2::aes(x = Stage, y = Probability, fill = Stage)) + ggplot2::geom_col(width = 0.5, show.legend = FALSE) + ggplot2::geom_text(ggplot2::aes(label = round(Probability, 3)), vjust = -0.4, size = 4.5 ) + ggplot2::scale_fill_manual(values = c("Prior" = "steelblue", "Posterior" = "tomato")) + ggplot2::scale_y_continuous(limits = c(0, 1), labels = scales::percent) + ggplot2::labs( title = "Bayesian Update: Risk Probability", x = NULL, y = "P(Risk Event R)" ) + ggplot2::theme_minimal(base_size = 13) print(p) ``` ## Step 4: Posterior Cost Distribution Now that we know Cause 1 is present (Risk 1 occurs), and one risk remains unobserved (Risk 2 = NA), `cost_post_pdf()` samples the posterior cost distribution. Observed risks that occurred (= 1) add their cost; unobserved risks (= NA) are excluded from the simulation. ```{r} observed_risks <- c(1, NA, 1) # Risk 1 and 3 confirmed; Risk 2 not yet assessed ``` ```{r} posterior_samples <- cost_post_pdf( num_sims = 5000, observed_risks = observed_risks, means_given_risks = means_given_risks, sds_given_risks = sds_given_risks, base_cost = base_cost ) ``` ### Prior vs. Posterior Cost Distribution Plotting both distributions on the same axes shows how the evidence shifts the cost estimate: ```{r} #| fig-cap: "Prior vs. posterior cost distribution. Confirming which risks materialized narrows and shifts the distribution." xlim_range <- range(c(prior_samples, posterior_samples)) hist(prior_samples, breaks = 40, freq = FALSE, col = rgb(0.27, 0.51, 0.71, 0.5), border = "white", xlim = xlim_range, main = "Prior vs. Posterior Cost Distribution", xlab = "Total Cost ($)", ylab = "Density" ) hist(posterior_samples, breaks = 40, freq = FALSE, col = rgb(0.84, 0.24, 0.31, 0.5), border = "white", add = TRUE ) abline(v = mean(prior_samples), col = "steelblue", lty = 2, lwd = 2) abline(v = mean(posterior_samples), col = "tomato", lty = 2, lwd = 2) legend("topright", legend = c( paste0("Prior (mean = $", format(round(mean(prior_samples)), big.mark = ",")), paste0("Posterior (mean = $", format(round(mean(posterior_samples)), big.mark = ",")) ), fill = c(rgb(0.27, 0.51, 0.71, 0.5), rgb(0.84, 0.24, 0.31, 0.5)), bty = "n" ) ``` **Interpretation:** The posterior distribution is narrower and shifted upward, as observing which risks materialized eliminates uncertainty about some cost components while raising the estimated exposure from confirmed events. The remaining spread reflects uncertainty from the unobserved risk and cost variability in the confirmed ones. ## Summary The Bayesian workflow in PRA follows a natural before-and-after structure: 1. **Before observations**: use `risk_prob()` and `cost_pdf()` to characterize the risk landscape. 2. **As evidence arrives**: use `risk_post_prob()` and `cost_post_pdf()` to update estimates. 3. **NA values** represent causes/risks not yet assessed; they are correctly excluded from the update. This approach is particularly powerful in phased projects where information about risk drivers becomes available progressively, allowing cost forecasts to be refined at each stage. ## Summary ::: {.callout-tip icon=false} ## Key Takeaways - Bayesian inference is the principled way to update risk estimates as new evidence arrives; it is not a heuristic, it is the mathematically correct rule. - **Prior → Posterior**: the posterior always moves in the direction of the evidence; the size of the move depends on how strongly the cause links to the risk. - `NA` values for unobserved causes or risks are correctly handled; they do not contribute to the update, reflecting genuine ignorance. - The posterior cost distribution is typically narrower than the prior, because confirmed risk states eliminate uncertainty about some cost components. - This workflow is particularly powerful in **phased projects**: update at each phase gate as more information becomes available. ::: For risks that propagate through a network of shared resources and tasks, @sec-network extends these ideas into a full probabilistic graph where Bayesian conditioning and structural dependencies interact. ## Exercises 1. **Prior vs. posterior.** In plain English, what is the difference between a prior probability and a posterior probability? Give a project example where each would be used. 2. **Change the observation.** Re-run Step 3 with Cause 2 observed as present (= 1) instead of Cause 1. How does the posterior change compared to when Cause 1 was observed? Why? 3. **Both causes observed.** What happens if you observe both causes as present (= 1, = 1)? Compute the posterior. What is the maximum possible posterior probability given the model parameters? 4. **Intuition check.** ★ Explain in plain English why observing one root cause changes your belief about the risk event, even though we didn't observe the risk event itself. What is the role of the conditional probabilities `risks_given_causes` in this reasoning? 5. **Sequential updating.** ★ Start with the prior. Observe Cause 1. Update. Then observe Cause 2. Update again. Does it matter what order you observe the causes? Should it? (Hint: think about whether the causes are assumed independent in this model.)