Bayesian Networks • PRA

Introduction

A Bayesian network is a probabilistic graphical model that represents dependencies and uncertainties using probability theory and a graph structure. Formally, it is a directed acyclic graph (DAG) in which nodes represent random variables and directed edges represent conditional dependencies between those variables.
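
The graph structure corresponds to a factorization of the joint distribution in which each variable depends directly only on its parents. As a brief illustration in standard notation, where pa(X_i) denotes the parents of node X_i in the DAG (notation introduced here, not taken from the PRA package):

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
```

For a node without parents, the corresponding factor is simply its marginal distribution.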

This document explores Bayesian networks for project risk analysis and decision making.

Project

Tasks

Suppose there is a simple roadway project. The project consists of 8 tasks, and the cost of each task is modeled through the resource it requires. The tasks are as follows:

roadway_tasks <- data.frame(
  ID = c("L", "M", "N", "O", "P", "Q", "R", "S"),
  Label = c(
    "Task-1",
    "Task-2",
    "Task-3",
    "Task-4",
    "Task-5", 
    "Task-6",
    "Task-7",
    "Task-8"
  ),
  Task = c(
    "Survey and Site Assessment",
    "Design and Planning",
    "Permitting and Approvals",
    "Excavation and Grading",
    "Pavement Installation",
    "Drainage and Utilities Installation",
    "Signage and Markings",
    "Final Inspection and Handover"
  ), 
  Project_ID = rep("P", 8)
)

knitr::kable(roadway_tasks, caption = "Roadway Tasks")

Roadway Tasks
ID	Label	Task	Project_ID
L	Task-1	Survey and Site Assessment	P
M	Task-2	Design and Planning	P
N	Task-3	Permitting and Approvals	P
O	Task-4	Excavation and Grading	P
P	Task-5	Pavement Installation	P
Q	Task-6	Drainage and Utilities Installation	P
R	Task-7	Signage and Markings	P
S	Task-8	Final Inspection and Handover	P

Resources

The project requires various resources to complete the tasks. The resources include surveyors, engineers, regulatory support, heavy machinery, pavement and related machinery, drainage material and equipment, painters, traffic signs, road markers, inspectors, and quality control support. The resources are allocated to specific tasks based on their need and availability.

roadway_resources <- data.frame(
  ID = c("D", "E", "F", "G", "H", "I", "J", "K"),
  Label = c(
    "Resource-1",
    "Resource-2",
    "Resource-3",
    "Resource-4",
    "Resource-5",
    "Resource-6",
    "Resource-7",
    "Resource-8"
  ),
  Resource = c(
    "Surveyor",
    "Engineer",
    "Regulatory Support",
    "Heavy Machinery",
    "Pavement and Related Machinery",
    "Drainage Material and Equipment",
    "Painters, Traffic Signs, Road Markers",
    "Inspectors and Quality Control Support"
  ),
  Task_ID = c("L", "M", "N", "O", "P", "Q", "R", "S"),
  Task = c(
    "Survey and Site Assessment",
    "Design and Planning",
    "Permitting and Approvals",
    "Excavation and Grading",
    "Pavement Installation",
    "Drainage and Utilities Installation",
    "Signage and Markings",
    "Final Inspection and Handover"
  ),
  Mean = c(
    10000,
    20000,
    3500,
    35000,
    100000,
    25000,
    6500,
    2000
  ), 
  SD = c(
    2000,
    5000,
    1000,
    10000,
    20000,
    5000,
    1500,
    500
  )
)

knitr::kable(roadway_resources, caption = "Roadway Resources")

Roadway Resources
ID	Label	Resource	Task_ID	Task	Mean	SD
D	Resource-1	Surveyor	L	Survey and Site Assessment	10000	2000
E	Resource-2	Engineer	M	Design and Planning	20000	5000
F	Resource-3	Regulatory Support	N	Permitting and Approvals	3500	1000
G	Resource-4	Heavy Machinery	O	Excavation and Grading	35000	10000
H	Resource-5	Pavement and Related Machinery	P	Pavement Installation	100000	20000
I	Resource-6	Drainage Material and Equipment	Q	Drainage and Utilities Installation	25000	5000
J	Resource-7	Painters, Traffic Signs, Road Markers	R	Signage and Markings	6500	1500
K	Resource-8	Inspectors and Quality Control Support	S	Final Inspection and Handover	2000	500
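
As a quick sanity check on the table, the baseline project cost (before any risk is considered) can be approximated directly from the Mean and SD columns. The sketch below is base R only and assumes the resource costs are independent, which is what allows the variances to be added; the variable names are illustrative.

```r
# Baseline cost per resource, copied from the resources table
resource_mean <- c(D = 10000, E = 20000, F = 3500, G = 35000,
                   H = 100000, I = 25000, J = 6500, K = 2000)
resource_sd <- c(D = 2000, E = 5000, F = 1000, G = 10000,
                 H = 20000, I = 5000, J = 1500, K = 500)

# Means add regardless of any dependence between resources
baseline_mean <- sum(resource_mean)

# Variances add only if the costs are independent (an assumption here)
baseline_sd <- sqrt(sum(resource_sd^2))

baseline_mean        # 202000
round(baseline_sd)   # roughly 23600
```

These figures ignore the risk events, so they serve only as a lower reference point for the simulated totals.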

Risks

The project is subject to various risks that may affect its cost, duration, and quality. Such risks can include delays in permitting and approvals, unforeseen site conditions, material price fluctuations, labor shortages, weather disruptions, equipment breakdowns, design changes, and regulatory changes. This example models the first three. Each risk event has a probability of occurrence and an impact on the cost of one resource.

roadway_risks <- data.frame(
  Risk_ID = c("A", "B", "C"),
  Name = c(
    "Risk-1",
    "Risk-2",
    "Risk-3"
  ),
  Risk = c(
    "Delays in Permitting and Approvals",
    "Unforeseen Site Conditions",
    "Material Price Fluctuations"
  ),
  Probability = c(
    0.9,
    0.95,
    0.8
  ),
  Resource_ID = c("F", "G", "H"),
  Resource_Impacted = c(
    "Regulatory Support",
    "Heavy Machinery",
    "Pavement and Related Machinery"
  ),
  Mean = c(
    7000,
    70000,
    200000
  ),
  SD = c(
    2000,
    20000,
    40000
  )
)

knitr::kable(roadway_risks, caption = "Roadway Risks")

Roadway Risks
Risk_ID	Name	Risk	Probability	Resource_ID	Resource_Impacted	Mean	SD
A	Risk-1	Delays in Permitting and Approvals	0.90	F	Regulatory Support	7e+03	2000
B	Risk-2	Unforeseen Site Conditions	0.95	G	Heavy Machinery	7e+04	20000
C	Risk-3	Material Price Fluctuations	0.80	H	Pavement and Related Machinery	2e+05	40000
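
Reading the Mean column of the risks table as the cost of the impacted resource when the risk occurs, and the resources table as its cost otherwise, the expected cost of each impacted resource follows from the law of total expectation. This is a small base R sketch under that reading; the vector names are illustrative.

```r
# Probability that each risk occurs (risks table)
risk_prob <- c(A = 0.9, B = 0.95, C = 0.8)

# Mean cost of the impacted resource with and without the risk
cost_if_risk <- c(F = 7000, G = 70000, H = 200000)   # risks table
cost_if_none <- c(F = 3500, G = 35000, H = 100000)   # resources table

# E[cost] = p * E[cost | risk] + (1 - p) * E[cost | no risk]
p <- unname(risk_prob)
expected_cost <- p * cost_if_risk + (1 - p) * cost_if_none

expected_cost   # F = 6650, G = 68250, H = 180000
```

Because the risk probabilities are high, the expected costs sit much closer to the risk-case means than to the baseline means.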

Bayesian Network

A Bayesian network can be used to model the relationships between tasks, resources, and risks in the project. The network can help in analyzing the impact of risks on the project outcomes and in making informed decisions.

Nodes

First, define the nodes of the Bayesian network. The nodes represent the risks, resources, and tasks in the project, plus one node for the project as a whole.

nodes <- data.frame(
  id = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J",
         "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T"),
  label = c(
    "Risk-1",
    "Risk-2",
    "Risk-3",
    "Resource-1",
    "Resource-2",
    "Resource-3",
    "Resource-4",
    "Resource-5",
    "Resource-6",
    "Resource-7",
    "Resource-8",
    "Task-1",
    "Task-2",
    "Task-3",
    "Task-4",
    "Task-5", 
    "Task-6",
    "Task-7",
    "Task-8",
    "Project"
  ),
  group = c(
    "Risk",
    "Risk",
    "Risk",
    "Resource",
    "Resource",
    "Resource",
    "Resource",
    "Resource",
    "Resource",
    "Resource",
    "Resource",
    "Task",
    "Task",
    "Task",
    "Task",
    "Task",
    "Task",
    "Task",
    "Task",
    "Project"
  ),
  stringsAsFactors = FALSE
)

Edges

Next, define the edges between the nodes in the Bayesian network. The edges represent the dependencies between the nodes.

links <- data.frame(
  source = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S"
  ),
  target = c("F", "G", "H", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "T", "T", "T", "T", "T", "T", "T"
  ),
  value = rep(1, 19),
  stringsAsFactors = FALSE
)
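
Before building the network, it can be worth checking that every edge refers to a known node and that the edges form a DAG. The sketch below does this in base R with Kahn's algorithm; `is_acyclic()` is a hypothetical helper written for this illustration and is not part of the PRA package.

```r
# Node ids and edges as defined in this document
node_ids <- LETTERS[1:20]  # "A" through "T"
links <- data.frame(
  source = LETTERS[1:19],
  target = c("F", "G", "H", "L", "M", "N", "O", "P", "Q", "R", "S",
             rep("T", 8)),
  stringsAsFactors = FALSE
)

# Every edge endpoint must be a known node
endpoints_ok <- all(c(links$source, links$target) %in% node_ids)

# Kahn's algorithm: repeatedly remove nodes that have no incoming edges.
# If every node can be removed this way, there is no directed cycle.
is_acyclic <- function(ids, edges) {
  remaining <- ids
  while (length(remaining) > 0) {
    roots <- remaining[!(remaining %in% edges$target)]
    if (length(roots) == 0) return(FALSE)  # all remaining nodes have a parent
    remaining <- setdiff(remaining, roots)
    edges <- edges[!(edges$source %in% roots), , drop = FALSE]
  }
  TRUE
}

endpoints_ok
is_acyclic(node_ids, links)
```

Both checks should return TRUE for the roadway network, since risks feed resources, resources feed tasks, and tasks feed the project node.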

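Then, define the distributions for the nodes in the Bayesian network. Risk nodes are discrete (1 means the risk occurs, 0 means it does not), resource nodes are normal, resources affected by a risk are conditional on that risk, and task and project nodes aggregate their parent nodes.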

distributions <- list(
  A = list(
    type = "discrete",
    values = c(1, 0),
    probs = c(0.9, 0.1)
  ),
  B = list(
    type = "discrete",
    values = c(1, 0),
    probs = c(0.95, 0.05)
  ),
  C = list(
    type = "discrete",
    values = c(1, 0),
    probs = c(0.8, 0.2)
  ),
  D = list(
    type = "normal",
    mean = 10000,
    sd = 2000
  ),
  E = list(
    type = "normal",
    mean = 20000,
    sd = 5000
  ),
  F = list(
    type = "conditional", condition = "A",
    true_dist = list(
      type = "normal",
      mean = 7000,
      sd = 2000
    ),
    false_dist = list(
      type = "normal",
      mean = 3500,
      sd = 1000
    )
  ),
  G = list(
    type = "conditional", condition = "B",
    true_dist = list(
      type = "normal",
      mean = 70000,
      sd = 20000
    ),
    false_dist = list(
      type = "normal",
      mean = 35000,
      sd = 10000
    )
  ),
  H = list(
    type = "conditional", condition = "C",
    true_dist = list(
      type = "normal",
      mean = 200000,
      sd = 40000
    ),
    false_dist = list(
      type = "normal",
      mean = 100000,
      sd = 20000
    )
  ),
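  # Resource-6: Drainage Material and Equipment
  I = list(
    type = "normal",
    mean = 25000,
    sd = 5000
  ),
  # Resource-7: Painters, Traffic Signs, Road Markers
  J = list(
    type = "normal",
    mean = 6500,
    sd = 1500
  ),
  # Resource-8: Inspectors and Quality Control Support
  K = list(
    type = "normal",
    mean = 2000,
    sd = 500
  ),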
  L = list(
    type = "aggregate",
    nodes = c("D")
  ),
  M = list(
    type = "aggregate",
    nodes = c("E")
  ),
  N = list(
    type = "aggregate",
    nodes = c("F")
  ),
  O = list(
    type = "aggregate",
    nodes = c("G")
  ),
  P = list(
    type = "aggregate",
    nodes = c("H")
  ),
  Q = list(
    type = "aggregate",
    nodes = c("I")
  ),
  R = list(
    type = "aggregate",
    nodes = c("J")
  ),
  S = list(
    type = "aggregate",
    nodes = c("K")
  ),
  T = list(
    type = "aggregate",
    nodes = c("L", "M", "N", "O", "P", "Q", "R", "S")
  )
)
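
To make the "conditional" entries concrete, the following base R sketch samples Resource F given Risk A by hand. It assumes that `true_dist` applies when the risk node takes the value 1 and `false_dist` when it takes the value 0; this is an interpretation of the list above, not a description of PRA internals.

```r
set.seed(1)
n <- 10000

# Risk A: 1 = occurs (probability 0.9), 0 = does not occur
a <- sample(c(1, 0), size = n, replace = TRUE, prob = c(0.9, 0.1))

# Resource F: draw from the risk-case or baseline normal depending on A
f <- ifelse(a == 1,
            rnorm(n, mean = 7000, sd = 2000),   # true_dist
            rnorm(n, mean = 3500, sd = 1000))   # false_dist

# Should be close to 0.9 * 7000 + 0.1 * 3500 = 6650
mean(f)
```

The resulting distribution of F is a two-component normal mixture, which is why its histogram can look skewed or bimodal rather than bell-shaped.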

Finally, create the Bayesian network using the nodes, edges, and distributions defined above.

library(PRA)
graph <- prob_net(nodes, links, distributions = distributions)

Graph

The Bayesian network can be visualized using the igraph and networkD3 packages. The igraph package provides functions for creating and analyzing graph structures, and the networkD3 package provides functions for creating interactive network visualizations.

library(igraph)
library(networkD3)
g <- graph_from_data_frame(graph$links, vertices = graph$nodes, directed = TRUE)
d3g <- igraph_to_networkD3(g, group = graph$nodes$group)
forceNetwork(Links = d3g$links, Nodes = d3g$nodes, NodeID = "name", Group = "group", Value = "value",
             zoom = TRUE, legend = TRUE, arrows = TRUE, opacity = 0.8, fontSize = 14)

Inference

To analyze the Bayesian network, draw samples from it with prob_net_sim(). The sampled values approximate the distribution of each node and help in assessing the impact of the risks on project outcomes.

simulation_results <- prob_net_sim(graph, num_samples = 1000)

The simulation results provide an estimate of the distribution of the total project cost (node T), given the probabilities of the risk events.

# Use a name other than `hist` so the base function is not shadowed
total_hist <- hist(simulation_results$T, breaks = 50, plot = FALSE)
plot(total_hist, main = "Total Project Cost", xlab = "Project Cost", col = "skyblue", border = "white")
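
Beyond a histogram, decision makers usually want a few summary numbers, such as cost percentiles or the chance of exceeding a budget. The sketch below shows one way to compute them. To stay self-contained it builds its own stand-in for the total cost in base R, using the means and SDs from the tables above; `risky_cost()` and the `budget` value are hypothetical and introduced only for this illustration.

```r
set.seed(42)
n <- 10000

# Hypothetical helper: cost of a resource whose distribution depends on a risk
risky_cost <- function(p, mean_risk, sd_risk, mean_base, sd_base) {
  occurs <- runif(n) < p
  ifelse(occurs, rnorm(n, mean_risk, sd_risk), rnorm(n, mean_base, sd_base))
}

# Stand-in for the total project cost: the sum of all resource costs
total <- rnorm(n, 10000, 2000) +                     # D
  rnorm(n, 20000, 5000) +                            # E
  risky_cost(0.90, 7000, 2000, 3500, 1000) +         # F given A
  risky_cost(0.95, 70000, 20000, 35000, 10000) +     # G given B
  risky_cost(0.80, 200000, 40000, 100000, 20000) +   # H given C
  rnorm(n, 25000, 5000) +                            # I
  rnorm(n, 6500, 1500) +                             # J
  rnorm(n, 2000, 500)                                # K

# Cost percentiles often used for contingency setting
quantile(total, probs = c(0.1, 0.5, 0.8, 0.9))

# Estimated probability of exceeding a (hypothetical) budget
budget <- 350000
mean(total > budget)
```

The same `quantile()` and `mean()` calls could be applied to the T column of the simulation results in place of the stand-in vector.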

Learning

The prob_net_learn() function can be used to update the network with observed values for one or more nodes, for example when new information shows whether a risk occurred. The updated results can help in refining the project risk analysis and in making better decisions.

For example, if Risk 3 (material price fluctuations) did not occur, the Bayesian network can be updated with that observation.

updated_results <- prob_net_learn(graph, observations = list(C = "No"),
                                  num_samples = 1000)

The updated results can be compared with the original results to see how the changes in the risk probabilities affect the project outcomes.

# Use names other than `hist` so the base function is not shadowed
hist_original <- hist(simulation_results$H, breaks = 50, plot = FALSE)
hist_updated <- hist(updated_results$H, breaks = 50, plot = FALSE)
plot(hist_original, main = "Pavement Cost", xlab = "Resource Cost", col = "skyblue",
     border = "white", ylim = c(0, max(hist_original$counts, hist_updated$counts)))
plot(hist_updated, col = "blue", border = "white", add = TRUE)
legend("topright", legend = c("Original", "Updated"), fill = c("skyblue", "blue"))
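
Conceptually, observing that a risk did not occur means conditioning on that event. A simple way to see the effect is to simulate the risk and its resource jointly and keep only the samples that match the observation. The base R sketch below does this for Risk C and Resource H; it illustrates conditioning in general and is not meant to describe how prob_net_learn() is implemented.

```r
set.seed(7)
n <- 10000

# Risk C: 1 = occurs (probability 0.8), 0 = does not occur
c_risk <- sample(c(1, 0), size = n, replace = TRUE, prob = c(0.8, 0.2))

# Resource H depends on whether Risk C occurs
h <- ifelse(c_risk == 1,
            rnorm(n, mean = 200000, sd = 40000),
            rnorm(n, mean = 100000, sd = 20000))

# Conditioning on the observation C = 0 keeps only the matching samples
h_given_no_risk <- h[c_risk == 0]

c(before = mean(h), after = mean(h_given_no_risk))
```

The mean pavement cost should drop from about 180000 to about 100000, the baseline mean, once the risk is known not to have occurred.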

Updating

Similarly, the prob_net_update() function can be used to change the structure of the Bayesian network by adding or removing edges between nodes. This can help in refining the project risk analysis and in making better decisions.

For example, if Risk 1 (delays in permitting and approvals) is no longer a concern, the edge from Risk 1 to Resource 3 (Regulatory Support) can be removed.

remove_links <- data.frame(
  source = c("A"),
  target = c("F"),
  stringsAsFactors = FALSE
)
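# With the A -> F edge removed, F no longer depends on Risk 1,
# so it reverts to its baseline (no-risk) distribution.
update_distributions <- list(
  F = list(
    type = "normal",
    mean = 3500,
    sd = 1000
  )
)
updated_graph <- prob_net_update(graph, remove_links = remove_links,
                                 update_distributions = update_distributions)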
updated_results <- prob_net_sim(updated_graph, num_samples = 1000)
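
At the level of the edge list, removing a link amounts to dropping the matching row. The base R sketch below shows that operation on the three risk-to-resource edges; `edge_key()` and `risk_links` are hypothetical names used only here, and the code is an illustration rather than the package's own implementation.

```r
# The three risk -> resource edges from the network
risk_links <- data.frame(
  source = c("A", "B", "C"),
  target = c("F", "G", "H"),
  stringsAsFactors = FALSE
)
to_remove <- data.frame(source = "A", target = "F", stringsAsFactors = FALSE)

# Build one key per edge and drop the rows whose key is marked for removal
edge_key <- function(df) paste(df$source, df$target, sep = "->")
pruned_links <- risk_links[!(edge_key(risk_links) %in% edge_key(to_remove)), ,
                           drop = FALSE]

pruned_links   # only B -> G and C -> H remain
```

Matching on both source and target avoids accidentally removing other edges that share only one endpoint with the edge being dropped.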

Just as before, the updated results can be compared with the original results to see how the change in the network structure affects the project outcomes.

# Use names other than `hist` so the base function is not shadowed
hist_original <- hist(simulation_results$F, breaks = 50, plot = FALSE)
hist_updated <- hist(updated_results$F, breaks = 50, plot = FALSE)
plot(hist_original, main = "Regulatory Support Cost", xlab = "Resource Cost",
     col = "skyblue", border = "white",
     ylim = c(0, max(hist_original$counts, hist_updated$counts)))
plot(hist_updated, col = "blue", border = "white", add = TRUE)
legend("topright", legend = c("Original", "Updated"), fill = c("skyblue", "blue"))

Conclusion

Bayesian networks are powerful tools for project risk analysis and decision making. By modeling the dependencies and uncertainties in a project, Bayesian networks can help project managers assess the impact of risks on project outcomes and make informed decisions. The Bayesian network created in this document represents the relationships between tasks, resources, and risks in a roadway project. The network can be used to analyze the impact of risks on the project outcomes and refine the risk analysis based on new information or expert judgment.