1  Introduction

1.1 R Packages

The following is a list of packages that are used throughout this book that need to be loaded before any analysis. A complete list of all the packages used in the book, as well as code to download these packages, can be found in Chapter 5.

Code
# General Packages:
library(EnvirOmix)
library(tidyverse)
library(table1)
# Load plotting packages:
library(ggplot2)
library(cowplot)
library(ComplexHeatmap)
library(ggh4x)
# Packages for Quasi-mediation:
#LUCIDus Version 3.0.1 is required for this analysis:
library(LUCIDus)
library(networkD3)
library(plotly)
library(htmlwidgets)
library(jsonlite)

In order to replicate the style of the figures in this book, you will also have to set the ggplot theme:

Code
ggplot2::theme_set(cowplot::theme_cowplot())

1.2 The Data

The data used in this project is based off of simulated data from the Human Early Life Exposome (HELIX) cohort (Vrijheid et al. 2014). The data was simulated for one exposure, five omics layers, and one continuous outcome (after publication, this data will be available on github). The format of this data is a named list with 6 elements. It includes separate numeric matrices for each of the 5 omics layers, as well as the exposure and phenotype data. In all datasets in the list, the rows represent individuals and the columns represent omics features. In this analysis, the exposure and outcome are:

  • Exposure: hs_hg_m_resid, representing maternal mercury levels
  • Outcome: ck18_scaled, representing child liver enzyme levels, a major risk factor for non-alcoholic fatty liver disease (NAFLD).

This data is provided in the EnvirOmics package, which you can access in R using the following code:

Code
# Load R package
library(EnvirOmix)

# Load simulated data
data(simulated_data)

# Define exposure and outcome name
exposure <- simulated_data[["phenotype"]]$hs_hg_m_scaled
outcome  <- simulated_data[["phenotype"]]$ck18_scaled

# Get numeric matrix of covariates 
covs <- simulated_data[["phenotype"]][c("e3_sex_None", "hs_child_age_yrs_None")] 
covs$e3_sex_None <- ifelse(covs$e3_sex_None == "male", 1, 0)

# create list of omics data 
omics_lst <- simulated_data[-which(names(simulated_data) == "phenotype")]

# Create data frame of omics data
omics_df <- omics_lst |> 
  purrr::map(~as_tibble(.x, rownames = "name")) |>
  purrr::reduce(left_join, by = "name") |>
  column_to_rownames("name")

1.2.1 Descriptive Statistics

Table 1.1 shows the summary statistics for the exposure and phenotype data in this analysis.

Code
table1::table1(~., data = simulated_data[["phenotype"]][,-1])
Table 1.1: Descriptive Statistics for the Simulated Variables
Overall
(N=420)
hs_child_age_yrs_None
Mean (SD) 7.22 (1.04)
Median [Min, Max] 7.18 [3.93, 10.9]
hs_hg_m_scaled
Mean (SD) -0.0148 (1.02)
Median [Min, Max] -0.0433 [-2.73, 3.34]
ck18_scaled
Mean (SD) -0.0511 (1.03)
Median [Min, Max] 0.00883 [-3.56, 3.09]
e3_sex_None
female 192 (45.7%)
male 228 (54.3%)
h_fish_preg_Ter
1 233 (55.5%)
2 88 (21.0%)
3 99 (23.6%)

1.2.2 Mercury exposure and childhood MAFLD risk

Code
lm_res <- lm(ck18_scaled ~ hs_hg_m_scaled + 
     e3_sex_None +
     hs_child_age_yrs_None,
   data = simulated_data[["phenotype"]])

summary(lm_res)

In the simulated data, each 1 standard deviation increase in maternal mercury was associated with a 0.11 standard deviation increase in CK18 enzymes (Figure 1.1; p=0.02), after adjusting for child age and child sex.

Code
ggplot(data = simulated_data[["phenotype"]], 
       aes(x = hs_hg_m_scaled, y = ck18_scaled)) + 
  geom_point() +
  stat_smooth(method = "lm",
              formula = y ~ x ,
              geom = "smooth") +
  xlab("Maternal Mercury Exposure (Scaled)") +
  ylab("CK-18 Levels (Scaled)")

Figure 1.1: Association between maternal mercury and CK18 in the Simulated Data

1.2.3 Correlation of omics features

Figure 1.2 shows the correlation within and between the omics layers in the simulated data.

Code
# Change omics list elements to dataframes
omics_df <- purrr::map(omics_lst, ~as_tibble(.x, rownames = "name")) %>%
  purrr::reduce(left_join, by = "name") %>%
  column_to_rownames("name")

meta_df <- imap_dfr(purrr::map(omics_lst, ~as_tibble(.x)),
                    ~tibble(omic_layer = .y, ftr_name = names(.x)))

# Correlation Matrix
cormat <- cor(omics_df, method = "pearson")

# Annotations
annotation <- data.frame(
  ftr_name = colnames(cormat),
  index = 1:ncol(cormat)) %>%
  left_join(meta_df, by = "ftr_name") %>%
  mutate(omic_layer = str_to_title(omic_layer))

# Make Plot
Heatmap(cormat, 
        row_split = annotation$omic_layer,
        column_split = annotation$omic_layer,
        show_row_names = FALSE,
        show_column_names = FALSE, 
        column_title_gp = gpar(fontsize = 12),
        row_title_gp = gpar(fontsize = 12),
        heatmap_legend_param = list(title = "Correlation"))

Figure 1.2: Heatmap illustrating the correlation of molecular features within and between different omics layers.