2  High Dimensional Multiomic Mediation

2.1 Load data and packages

Before starting, load the data and packages using the following code.

Code
# Load R package
library(EnvirOmix)

# Load simulated data
data(simulated_data)

# Define exposure and outcome name
exposure <- simulated_data[["phenotype"]]$hs_hg_m_scaled
outcome  <- simulated_data[["phenotype"]]$ck18_scaled

# Get numeric matrix of covariates 
covs <- simulated_data[["phenotype"]][c("e3_sex_None", "hs_child_age_yrs_None")] 
covs$e3_sex_None <- ifelse(covs$e3_sex_None == "male", 1, 0)

# create list of omics data 
omics_lst <- simulated_data[-which(names(simulated_data) == "phenotype")]

# Create data frame of omics data
omics_df <- omics_lst |> 
  purrr::map(~as_tibble(.x, rownames = "name")) |>
  purrr::reduce(left_join, by = "name") |>
  column_to_rownames("name")

2.2 Early integration

High dimensional multiomic mediation with early multiomic integration (Figure 1, panel a) identified differentially methylated CpG sites, gene transcript clusters, metabolites, and proteins which mediated associations of prenatal mercury with MAFLD risk in adolescents (Figure 2.1). Combining all omics layers before analysis identifies the strongest mediating feature across all omics layers without accounting for the differences in underlying correlation structure. For this analysis, we used High Dimensional Mediation Analysis (HIMA), a penalization-based mediation method implemented in the R package HIMA (Zhang et al. 2016).

Code
# Run Analysis
result_hima_early <- hidimum(exposure = exposure,
                             outcome = outcome, 
                             omics_lst = omics_lst, 
                             covs = covs,
                             Y.family = "gaussian", 
                             M.family = "gaussian", 
                             integration = "early")
# Plot Result
plot_hidimum(result_hima_early)

Figure 2.1: High dimensional mediation analysis with early integration and multiple omic layers identifies individual molecular features linking maternal mercury with childhood liver injury. Alpha represents the coefficient estimates of the exposure to the mediator, Beta indicates the coefficient estimates of the mediators to the outcome, and TME (%) represents the percent total effect mediated calculated as alpha*beta/gamma.

2.3 Intermediate Integration

High dimensional mediation with intermediate multiomic integration (Figure 1, panel b) identified differentially methylated CpG sites, gene transcript clusters, and miRNA which mediated associations of prenatal mercury with MAFLD risk in adolescents (Figure 2.2). Combining all omics layers before analysis identifies the strongest mediating feature across all omics layers without accounting for the differences in underlying correlation structure.

For this analysis, we use a novel two-step approach that incorporates feature level metadata to inform on feature selection using xtune (Zeng, Thomas, and Lewinger 2021), an approach that allows for feature-specific penalty parameters and, in this example, performs a group-lasso-type shrinkage within each omic dataset. This analysis is similar in theory to HIMA, with the difference being that for this method the penalization can vary across each of the omic layers. This analysis was based on the product of coefficients method for mediation and was performed in three steps:

  1. Regression 1 (linear regression):

First, we performed independent linear regression models for all exposure-mediator associations to get the exposure mediator coefficient:

\[\begin{equation} m_i = a_0 + a_1 \times x \label{eq:reg1} \end{equation}\]

   Where \(x\) is the exposure and \(m_i\) is each mediator.

  1. Regression 2 (group lasso):

Second, we performed a single group lasso regression for the mediator outcome associations, adjusting for the exposure, using the R package xtune (He and Zeng 2023). This step provided coefficients for each of the mediator outcome associations. We used bootstrapping to obtain the standard error of the coefficients from the group lasso regression for each of the mediator coefficients.

\[\begin{equation} y = b_0 + b_1 \times x + b_{2_i} \times M \label{eq:reg2} \end{equation}\]

   Where \(y\) is the outcome, \(x\) is the exposure, \(M\) is the mediator matrix with corresponding estimate \(b_{2_i}\). The bootstrapped standard error (se) of \(b_{2_i}\) is used for calculating mediation confidence intervals.

  1. Calculate mediation confidence interval for mediator ( i ):

Finally, for each omic feature, we calculated the mediation effect and 95% confidence intervals using the R package RMediation, which is based on the distribution-of-the-product method (Tofighi and MacKinnon 2011).

\[\begin{align} \alpha &= a_1 \label{eq:alpha}\\ \text{se of } \alpha &= \text{se of } a_1 \label{eq:se_alpha}\\ \beta &= b_{2_i} \label{eq:beta}\\ \text{se of } \beta &= \text{se of } b_{2_i} \label{eq:se_beta} \end{align}\]

Note: For the actual analysis, we would normally set n_boot to a higher value (1000 or more). In the example code, it is set to 12 to improve the speed of the function.

Code
# Run Analysis
result_hima_intermediate <- hidimum(omics_lst = omics_lst,
                                    covs = covs,
                                    outcome = outcome,
                                    exposure = exposure,
                                    n_boot = 12,
                                    Y.family = "gaussian", 
                                    integration = "intermediate")
Code
# plot
plot_hidimum(result_hima_intermediate)

Figure 2.2: High dimensional mediation analysis with intermediate integration and multiple omic layers identifies individual molecular features linking maternal mercury with childhood liver injury. Alpha represents the coefficient estimates of the exposure to the mediator, Beta indicates the coefficient estimates of the mediators to the outcome, and TME (%) represents the percent total effect mediated calculated as alpha*beta/gamma.

2.4 Late integration

High dimensional mediation with late multiomic integration (Figure 1, panel c) identified differentially methylated CpG sites, gene transcript clusters, and miRNA which mediated associations of prenatal mercury with MAFLD risk in adolescents (Figure 2.3). High dimensional mediation with late multiomic integration differs from the early and intermediate integration in that each omics layer is analyzed individually. Thus, this approach does not condition on features within the other omics layers in the analysis.

Combining all omics layers before analysis identifies the strongest mediating feature across all omics layers without accounting for the differences in underlying correlation structure. For this analysis, we used High Dimensional Mediation Analysis (HIMA), a penalization-based mediation method implemented in the R package HIMA (Zhang et al. 2016).

Code
# Run Analysis
result_hima_late <- hidimum(exposure = exposure,
                            outcome = outcome, 
                            omics_lst = omics_lst, 
                            covs = covs, 
                            Y.family = "gaussian", 
                            M.family = "gaussian", 
                            integration = "late")

# plot
plot_hidimum(result_hima_late)

Figure 2.3: High dimensional mediation analysis with late integration and multiple omic layers identifies individual molecular features linking maternal mercury with childhood liver injury. Alpha represents the coefficient estimates of the exposure to the mediator, Beta indicates the coefficient estimates of the mediators to the outcome, and TME (%) represents the percent total effect mediated calculated as alpha*beta/gamma.