
Introduction

Bayesian inference is a statistical approach built on Bayes’ theorem, which describes how to update beliefs in light of new evidence. It provides a framework for reasoning about probabilities in the presence of uncertainty.

Bayes’ theorem states that:

P(H | E) = \frac{P(E | H) P(H)}{P(E)}

where:

  • P(H | E) is the posterior probability of hypothesis H given evidence E.
  • P(E | H) is the likelihood of observing evidence E given that hypothesis H is true.
  • P(H) is the prior probability of hypothesis H before observing evidence E.
  • P(E) is the probability of evidence E occurring.
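
As a quick numerical illustration (the numbers here are made up for this paragraph only), the theorem can be evaluated directly in R:

p_h <- 0.3          # prior probability P(H)
p_e_given_h <- 0.8  # likelihood P(E | H)
p_e <- 0.38         # marginal probability of the evidence P(E)
p_e_given_h * p_h / p_e  # posterior P(H | E), approximately 0.632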

This document explores Bayesian methods for risk probability and cost probability density estimation.

Inference for Risk Probability

Consider a risk event R that may be caused by multiple root causes C_1, C_2, …, C_n. The probability of R occurring can be computed as:

P(R) = \sum_{i=1}^{n} \left[ P(R | C_i) P(C_i) + P(R | \neg C_i) P(\neg C_i) \right]

where:

  • P(R | C_i) is the probability of R occurring given that C_i is present.
  • P(C_i) is the prior probability of the root cause C_i.
  • P(R | ¬C_i) is the probability of R occurring when C_i is absent.
  • P(¬C_i) = 1 - P(C_i) is the probability that C_i does not occur.

The function risk_prob calculates the probability of the risk event given the root causes and their conditional probabilities.

Example

First, load the package.

Suppose there are two root causes with probabilities P(C_1) = 0.3 and P(C_2) = 0.2.

cause_probs <- c(0.3, 0.2)

The conditional probabilities of the risk event given each cause are P(R | C_1) = 0.8 and P(R | C_2) = 0.6, respectively. The conditional probabilities of the risk event in the absence of each cause are P(R | ¬C_1) = 0.2 and P(R | ¬C_2) = 0.4.

risks_given_causes <- c(0.8, 0.6)
risks_given_not_causes <- c(0.2, 0.4)

To calculate the probability of the risk event, use the risk_prob function:

risk_prob_value <- risk_prob(cause_probs, risks_given_causes, risks_given_not_causes)
cat(risk_prob_value)

0.82
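
This value can be verified by hand: the following lines evaluate the sum above directly from the inputs just defined (a check of the formula, not the internal code of risk_prob):

sum(risks_given_causes * cause_probs +
  risks_given_not_causes * (1 - cause_probs))
# 0.8 * 0.3 + 0.2 * 0.7 + 0.6 * 0.2 + 0.4 * 0.8 = 0.82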

Inference for Cost Probability Density

The cost_pdf function uses Bayesian inference to model the probability distribution of cost outcomes based on the occurrence of risk events. It assumes that each risk event contributes to the total cost according to a normal distribution, leading to a mixture model representation:

P(A) = \sum_{i=1}^{n} \left[ P(R_i) \, N(A | \mu_i, \sigma_i) + P(\neg R_i) \, N(A | \text{base\_cost}, 0) \right]

where:

  • P(R_i) is the probability of risk event R_i.
  • N(A | μ_i, σ_i) is the normal distribution with mean μ_i and standard deviation σ_i.
  • P(¬R_i) = 1 - P(R_i) is the probability that risk event R_i does not occur.
  • N(A | base_cost, 0) is a point mass at the baseline cost base_cost.

The function cost_pdf generates random samples from the mixture model to estimate the cost distribution.
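
One way to draw from a mixture of this form is to pick a risk term at random and then sample either its normal component or the baseline point mass. The sketch below illustrates that idea under the assumption of equal weight on each risk term; it is not the actual body of cost_pdf:

sample_cost_mixture <- function(num_sims, risk_probs, means_given_risks,
                                sds_given_risks, base_cost) {
  n_risks <- length(risk_probs)
  replicate(num_sims, {
    i <- sample.int(n_risks, 1)  # pick one risk term, equal weights
    if (runif(1) < risk_probs[i]) {
      rnorm(1, means_given_risks[i], sds_given_risks[i])  # the risk occurs
    } else {
      base_cost  # the risk is absent: baseline point mass
    }
  })
}

mixture_sketch <- sample_cost_mixture(1000, c(0.3, 0.5, 0.2),
  c(10000, 15000, 5000), c(2000, 1000, 1000), 2000)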

Example

Suppose there are three risk events with probabilities P(R_1) = 0.3, P(R_2) = 0.5, and P(R_3) = 0.2.

risk_probs <- c(0.3, 0.5, 0.2)

The means and standard deviations of the normal distributions for cost given each risk event are:

means_given_risks <- c(10000, 15000, 5000)
sds_given_risks <- c(2000, 1000, 1000)

The baseline cost is base_cost = 2000.

base_cost <- 2000

To generate random samples from the cost distribution, use the cost_pdf function:

num_sims <- 1000
samples <- cost_pdf(num_sims, risk_probs, means_given_risks, sds_given_risks, base_cost)
hist(samples, breaks = 30, col = "skyblue", main = "Histogram of Cost", xlab = "Cost")

The histogram above shows the distribution of cost outcomes based on the risk events and their associated costs.

Posterior Risk Probability

Bayesian updating is the process of revising prior beliefs in light of new evidence. The risk_post_prob function calculates the posterior probability of a risk event given observations of its root causes. This is achieved by applying Bayes’ theorem to update the prior probabilities of root causes based on the observed data.

Example

Suppose there are two root causes with prior probabilities P(C_1) = 0.3 and P(C_2) = 0.2.

cause_probs <- c(0.3, 0.2)

The conditional probabilities of the risk event given each cause are P(R | C_1) = 0.8 and P(R | C_2) = 0.6, respectively. The conditional probabilities of the risk event in the absence of each cause are P(R | ¬C_1) = 0.2 and P(R | ¬C_2) = 0.4.

risks_given_causes <- c(0.8, 0.6)
risks_given_not_causes <- c(0.2, 0.4)

Suppose the observed root causes are C_1 = 1 (observed to be present) and C_2 = NA (unobserved).

observed_causes <- c(1, NA)

To calculate the posterior probability of the risk event, use the risk_post_prob function:

risk_post_prob_value <- risk_post_prob(cause_probs, risks_given_causes,
  risks_given_not_causes, observed_causes)
cat(risk_post_prob_value)

0.6315789

The posterior probability of the risk event is updated based on the observed root causes.
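
For intuition, the reported value coincides with a single-cause application of Bayes’ theorem that uses only the observed cause C_1, renormalising the joint probability P(R, C_1) by the marginal built from C_1 alone. Whether risk_post_prob handles the unobserved (NA) cause in exactly this way is an assumption, so the following lines are a sketch rather than the function’s internal code:

p_c1 <- cause_probs[1]                         # prior P(C_1) = 0.3
p_r_given_c1 <- risks_given_causes[1]          # P(R | C_1) = 0.8
p_r_given_not_c1 <- risks_given_not_causes[1]  # P(R | ¬C_1) = 0.2

joint <- p_r_given_c1 * p_c1                       # 0.24
marginal <- joint + p_r_given_not_c1 * (1 - p_c1)  # 0.38
joint / marginal                                   # 0.6315789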

Posterior Cost Probability Density

The cost_post_pdf function generates a posterior probability density function (PDF) for costs, given observed risk events. This function simulates random samples from a mixture model based on Bayesian updating principles.
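
A minimal sketch of such a sampler is shown below. It assumes that a risk observed as 1 always contributes its normal cost component, a risk observed as 0 contributes the baseline point mass, and an unobserved (NA) risk falls back to a user-supplied prior probability; the prior_risk_probs argument is introduced here purely for illustration, so this is a sketch of the idea rather than the implementation of cost_post_pdf:

sample_posterior_cost <- function(num_sims, observed_risks, prior_risk_probs,
                                  means_given_risks, sds_given_risks, base_cost) {
  # Posterior occurrence probability per risk: 1 if observed to occur,
  # 0 if observed not to occur, the assumed prior if unobserved (NA).
  post_probs <- ifelse(is.na(observed_risks), prior_risk_probs, observed_risks)
  n_risks <- length(post_probs)
  replicate(num_sims, {
    i <- sample.int(n_risks, 1)  # pick one mixture term, equal weights
    if (runif(1) < post_probs[i]) {
      rnorm(1, means_given_risks[i], sds_given_risks[i])  # the risk occurs
    } else {
      base_cost  # the risk is absent: baseline point mass
    }
  })
}

posterior_sketch <- sample_posterior_cost(
  num_sims = 1000, observed_risks = c(1, NA, 1),
  prior_risk_probs = c(0.3, 0.5, 0.2),  # assumed priors, reused from the earlier example
  means_given_risks = c(10000, 15000, 5000),
  sds_given_risks = c(2000, 1000, 1000), base_cost = 2000
)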

Example

Suppose there are three risk events with observed values R1=1R_1 = 1, R2=NAR_2 = \text{NA}, and R3=1R_3 = 1.

observed_risks <- c(1, NA, 1)

The means and standard deviations of the normal distributions for cost given each risk event are:

means_given_risks <- c(10000, 15000, 5000)
sds_given_risks <- c(2000, 1000, 1000)

The baseline cost is base_cost=2000\text{base_cost} = 2000.

base_cost <- 2000

To generate random samples from the posterior cost distribution, use the cost_post_pdf function:

num_sims <- 1000
posterior_samples <- cost_post_pdf(
  num_sims = num_sims,
  observed_risks = observed_risks,
  means_given_risks = means_given_risks,
  sds_given_risks = sds_given_risks,
  base_cost = base_cost
)

hist(posterior_samples, breaks = 30, col = "skyblue", main = "Posterior Cost PDF", xlab = "Cost")

The histogram above approximates the posterior distribution of cost outcomes, given the observed risk events.

Conclusion

Bayesian methods provide a powerful framework for updating beliefs and making inferences based on observed data. By incorporating prior knowledge and new evidence, these methods can help quantify uncertainty and make informed decisions in a wide range of applications.