Introduction

This vignette demonstrates data management techniques commonly used in reliability analysis: importing data from external sources, arranging it in the format the package expects, converting censored data, handling missing values, and exporting the cleaned data for further analysis.

Importing Data

Most examples in this package use data that is included with the package. However, in practice, you will often need to import data from external sources. Here are some common methods for importing data into R:

Importing from CSV Files

You can use the read.csv() function to import data from CSV files. For example:

data <- read.csv("path/to/your/data.csv")
head(data)
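
Imported columns do not always arrive with the types you expect. As a minimal sketch, the example below writes a small CSV to a temporary file (hypothetical data, used only to keep the example self-contained) and reads it back with explicit column types through the colClasses argument of read.csv():

```r
# Hypothetical example data, written to a temporary file so the sketch
# runs without any external file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("times,failures", "100,2", "200,3", "300,5"), csv_path)

# Declare the column types up front; input that cannot be parsed as the
# declared type raises an error instead of silently becoming character data
imported <- read.csv(
  csv_path,
  colClasses = c(times = "numeric", failures = "integer")
)
str(imported)
```

Declaring types at import time is optional, but it surfaces malformed rows early rather than during the analysis.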

Importing from Excel Files

You can use the readxl package to import data from Excel files. For example:

library(readxl)
data <- read_excel("path/to/your/data.xlsx")
head(data)

There are many other data import methods available in R, including importing from databases, JSON files, and more. The choice of method will depend on the format of your data and your specific needs. To learn more about data import in R, you can refer to the R Data Import/Export documentation.

Arranging Data

Once you have imported your data, you may need to arrange it in a format suitable for reliability analysis. ReliaGrowR functions typically expect data in a “long” format, where each row represents a single observation or event. Specifically, for reliability data, you will need data for:

  • Times: The time points at which failures were observed.
  • Failures: The number of failures observed at each time point.

This data can be passed to functions as separate vectors or as a data frame with appropriate column names. For example, data vectors can be created as follows:

times <- c(100, 200, 300, 400, 500)
failures <- c(2, 3, 5, 7, 11)

Alternatively, you can create a data frame:

data <- data.frame(col1 = times, col2 = failures)
head(data)
#>   col1 col2
#> 1  100    2
#> 2  200    3
#> 3  300    5
#> 4  400    7
#> 5  500   11

If you import data with read.csv() or read_excel(), the column names may not match the expected format. You can use the colnames() function to rename the columns; the new names are assigned by position, so list them in the same order as the columns:

colnames(data) <- c("times", "failures")
head(data)
#>   times failures
#> 1   100        2
#> 2   200        3
#> 3   300        5
#> 4   400        7
#> 5   500       11
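
Assigning names by position mislabels the data if the column order in the source file ever changes. A minimal sketch of renaming by matching the old names instead, reusing the col1 and col2 names from the example above (the new_names lookup vector is introduced here only for illustration):

```r
data <- data.frame(col1 = c(100, 200, 300), col2 = c(2, 3, 5))

# Lookup from old column names to new ones; columns not listed keep their names
new_names <- c(col1 = "times", col2 = "failures")

matched <- names(data) %in% names(new_names)
names(data)[matched] <- unname(new_names[names(data)[matched]])
names(data)
```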

Censored Data

In reliability analysis, censored data refers to observations where the exact failure time is not known. This can occur in various scenarios, such as when a test is terminated before all units have failed or when some units are still functioning at the end of the observation period. Censored data can be classified into different types:

  • Right Censoring: This occurs when the failure time is known to be greater than a certain value. For example, if a unit is still functioning at the end of the test period, its failure time is right-censored.
  • Left Censoring: This occurs when the failure time is known to be less than a certain value. For example, if a unit fails before the start of the observation period, its failure time is left-censored.
  • Interval Censoring: This occurs when the failure time is known to lie within a certain interval. For example, if a unit is inspected at regular intervals, and it is known to have failed between two inspection times, its failure time is interval censored.

The weibull_to_rga utility function can be used to convert censored data into a format suitable for reliability growth analysis. This function converts data in the following ways:

  • Failure times: Failure times are converted to cumulative times and failure counts.
  • Right-censored times: Right-censored times are included in the cumulative time calculations but do not contribute to failure counts.
  • Interval-censored times: Interval-censored times are handled by taking the midpoint of the interval for the failure time.

For example, consider the following data that includes failure times, right-censored times, and interval-censored times:

failures <- c(100, 200, 200, 400)
right_censored <- c(250, 350, 450)
interval_starts <- c(150, 300)
interval_ends <- c(180, 320)

To convert this data for reliability growth analysis, you can use the weibull_to_rga function as follows:

library(ReliaGrowR)
result <- weibull_to_rga(failures, right_censored, interval_starts, interval_ends)
head(result)
#>   CumulativeTime Failures
#> 1            100        1
#> 2            265        1
#> 3            465        2
#> 6           1225        1
#> 8           1975        1

The resulting data frame contains cumulative times and failure counts, which can then be passed directly to the rga function for reliability growth analysis.
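
The cumulative times in the output above are consistent with pooling all observations, sorting them by time, and taking a running sum. The sketch below reproduces that arithmetic by hand in base R. It is an assumption about the calculation inferred from the printed output, not the package's implementation, and the pooled data frame and its column names are hypothetical:

```r
failures <- c(100, 200, 200, 400)
right_censored <- c(250, 350, 450)
interval_starts <- c(150, 300)
interval_ends <- c(180, 320)

# Interval-censored observations are represented by their midpoints
midpoints <- (interval_starts + interval_ends) / 2
midpoints
# 165 310

# Pool every observation and flag which ones count as failures
pooled <- data.frame(
  time = c(failures, midpoints, right_censored),
  is_failure = c(
    rep(TRUE, length(failures) + length(midpoints)),
    rep(FALSE, length(right_censored))
  )
)

# Sort by time, then accumulate; censored units add time but no failures
pooled <- pooled[order(pooled$time), ]
pooled$cum_time <- cumsum(pooled$time)

pooled$cum_time[pooled$is_failure]
# 100 265 465 665 1225 1975
```

The package output reports the two failures at time 200 as a single row with a count of 2, whereas this sketch keeps them as separate rows (465 and 665), so treat it as a check on the arithmetic rather than a replacement for weibull_to_rga.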

Missing Values, NaNs, and Infs

Real-world data often contains missing or non-finite values, which R represents as NA (missing), NaN (not a number), and Inf or -Inf (infinite). Handle these values before performing any analysis. Here are some common techniques:

Removing Missing Values

You can use the na.omit() function to remove rows that contain NA or NaN values from your data frame. Note that na.omit() does not remove Inf or -Inf. For example:

data <- data.frame(
  times = c(100, 200, NA, 400, 500),
  failures = c(2, NA, 5, 7, 11)
)
cleaned_data <- na.omit(data)
head(cleaned_data)
#>   times failures
#> 1   100        2
#> 4   400        7
#> 5   500       11
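
To drop infinite values as well, one option is to filter with is.finite(), which returns FALSE for NA, NaN, Inf, and -Inf. A minimal sketch with hypothetical data:

```r
# Hypothetical data containing each kind of problem value
data <- data.frame(
  times = c(100, 200, Inf, 400, NaN),
  failures = c(2, 3, 5, NA, 11)
)

# Keep only rows where both columns hold ordinary finite numbers
keep <- is.finite(data$times) & is.finite(data$failures)
cleaned_data <- data[keep, ]
cleaned_data
```

Only the first two rows survive here: row 3 has an infinite time, row 4 a missing failure count, and row 5 a NaN time.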

Replacing Missing Values

You can replace missing values with a specific value, such as the mean or median of the column. For example, the following replaces missing values in the times and failures columns with the column means. The mean failure count is rounded up with ceiling() so that the counts remain whole numbers:

data <- data.frame(
  times = c(100, 200, NA, 400, 500),
  failures = c(2, NA, 5, 7, 11)
)
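# Replace missing times with the mean of the observed times
data$times[is.na(data$times)] <- mean(data$times, na.rm = TRUE)
# Replace missing counts with the mean count, rounded up to a whole number
data$failures[is.na(data$failures)] <- ceiling(mean(data$failures, na.rm = TRUE))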
head(data)
#>   times failures
#> 1   100        2
#> 2   200        7
#> 3   300        5
#> 4   400        7
#> 5   500       11
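
The median is a common alternative because it is less sensitive to extreme values than the mean. As a sketch, the same replacement can be wrapped in a small helper; impute_with is a hypothetical function written for this example, not part of ReliaGrowR:

```r
# Hypothetical helper: fill NA entries of x with a summary of the observed values
impute_with <- function(x, fun = median) {
  x[is.na(x)] <- fun(x, na.rm = TRUE)
  x
}

data <- data.frame(
  times = c(100, 200, NA, 400, 500),
  failures = c(2, NA, 5, 7, 11)
)
data$times <- impute_with(data$times)
# Round up so that failure counts remain whole numbers
data$failures <- ceiling(impute_with(data$failures))
data
```

Passing fun = mean to the helper reproduces the mean-based replacement shown above.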

Exporting Data

Once you have cleaned and arranged your data, you may want to export it for further analysis or sharing. Here are some common methods for exporting data from R:

Exporting to CSV Files

You can use the write.csv() function to export data to CSV files. For example:

write.csv(data, "path/to/your/cleaned_data.csv", row.names = FALSE)
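
A quick way to confirm that an export worked is to read the file back and compare it with the original. The sketch below uses a temporary file and hypothetical data so that it runs without touching your own paths:

```r
data <- data.frame(times = c(100, 200, 300), failures = c(2, 3, 5))

# Write to a temporary location, then read the file back in
out_path <- tempfile(fileext = ".csv")
write.csv(data, out_path, row.names = FALSE)
round_trip <- read.csv(out_path)

# Compare structure and values; stopifnot() raises an error on any mismatch
stopifnot(
  identical(names(round_trip), names(data)),
  nrow(round_trip) == nrow(data),
  all(round_trip$times == data$times),
  all(round_trip$failures == data$failures)
)
```

Setting row.names = FALSE matters for this check: without it, write.csv() adds a column of row names that appears as an extra column when the file is read back.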

Exporting to Excel Files

You can use the writexl package to export data to Excel files. For example:

library(writexl)
write_xlsx(data, "path/to/your/cleaned_data.xlsx")

To learn more about data export in R, you can refer to the R Data Import/Export documentation.