MASH v FLASH GTEx analysis: nonnegative loadings

Last updated: 2018-08-16

workflowr checks: (Click a bullet for more information)

✔ R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
✔ Environment: empty

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
✔ Seed: set.seed(20180609)

The command set.seed(20180609) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
✔ Session information: recorded

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
✔ Repository version: be0d4d7
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
```
Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    data/
    Ignored:    docs/.DS_Store
    Ignored:    docs/images/.DS_Store
    Ignored:    docs/images/.Rapp.history
    Ignored:    output/.DS_Store
    Ignored:    output/.Rapp.history
    Ignored:    output/MASHvFLASHgtex/.DS_Store
    Ignored:    output/MASHvFLASHsims/.DS_Store
    Ignored:    output/MASHvFLASHsims/backfit/.DS_Store
    Ignored:    output/MASHvFLASHsims/backfit/.Rapp.history
```
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

Expand here to see past versions:

File	Version	Author	Date	Message
Rmd	be0d4d7	Jason Willwerscheid	2018-08-16	wflow_publish(“analysis/MASHvFLASHgtex3.Rmd”)
html	48615e4	Jason Willwerscheid	2018-08-15	Build site.
Rmd	79641fe	Jason Willwerscheid	2018-08-15	wflow_publish(“analysis/MASHvFLASHgtex3.Rmd”)

Introduction

Here I fit MASH and FLASH objects to the “strong” GTEx dataset using the nonnegative loadings obtained in the previous analysis. See below for code and here for an introduction to the plots below.

Fitting methods

The workflows are similar to those described here.

The workflow for FLASH is identical, except that I use the nonnegative loadings from the previous analysis as my “data-driven” loadings. (To save time, I only use the loadings corresponding to “multi-tissue effects,” since the other loadings are well represented by “canonical” loadings. Further, to aid interpretation, I normalize the loadings so that each has \(\ell_\infty\)-norm equal to 1.)

The workflow for MASH is also identical, except that I use rank-one matrices derived from the nonnegative loadings rather than rank-one matrices obtained using Extreme Deconvolution. More precisely, for each multi-tissue effect \(\ell_i\), I add the matrix \(\ell_i \ell_i^T\) to the list of matrices \(U_i\) in the MASH mixture model.

Results

I pre-run the code and load the fits from file.

library(mashr)

Loading required package: ashr

devtools::load_all("/Users/willwerscheid/GitHub/flashr/")

Loading flashr

source("./code/gtexanalysis.R")

gtex <- readRDS(gzcon(url("https://github.com/stephenslab/gtexresults/blob/master/data/MatrixEQTLSumStats.Portable.Z.rds?raw=TRUE")))
strong <- t(gtex$strong.z)

fpath <- "./output/MASHvFLASHgtex3/"
m_final <- readRDS(paste0(fpath, "m.rds"))
fl_final <- readRDS(paste0(fpath, "fl.rds"))

m_lfsr <- t(get_lfsr(m_final))
m_pm <- t(get_pm(m_final))

fl_lfsr <- readRDS(paste0(fpath, "fllfsr.rds"))
fl_pm <- flash_get_fitted_values(fl_final)

There is significant disagreement between the two fits:

signif_by_mash = as.vector(m_lfsr < 0.05)
signif_by_flash = as.vector(fl_lfsr < 0.05)

round(table(signif_by_mash, signif_by_flash) / length(signif_by_mash),
      digits = 2)

              signif_by_flash
signif_by_mash FALSE TRUE
         FALSE  0.31 0.17
         TRUE   0.08 0.44

Significant for FLASH, not MASH

As discussed in previous analyses (see, for example, here), FLASH often finds a combination of a large unique effect and a small equal effect where MASH only finds the unique effect to be significant (or, more rarely, no effects).

# interesting.tests <- compare_methods(fl_lfsr, m_lfsr, fl_pm, m_pm)

par(mfrow=c(1, 2))

identical.plus.unique <- c(336, 7526, 2696, 611)
for (n in identical.plus.unique) {
  plot_test(n, fl_lfsr, fl_pm, "FLASH")
  plot_test(n, m_lfsr, m_pm, "MASH")
}

Expand here to see past versions of flash_not_mash-1.png:

Version	Author	Date
48615e4	Jason Willwerscheid	2018-08-15

Expand here to see past versions of flash_not_mash-2.png:

Version	Author	Date
48615e4	Jason Willwerscheid	2018-08-15

Expand here to see past versions of flash_not_mash-3.png:

Version	Author	Date
48615e4	Jason Willwerscheid	2018-08-15

Expand here to see past versions of flash_not_mash-4.png:

Version	Author	Date
48615e4	Jason Willwerscheid	2018-08-15

Significant for MASH, not FLASH

In many cases, MASH finds all effects to be significant where FLASH only finds a single unique effect. What happens in these cases, I think, is that MASH puts nearly all posterior weight on a single data-driven covariance matrix. Since these covariance matrices are each rank-one with no entries that are exactly equal to zero, even tiny effects will appear significant (since they are sampled away from zero nearly all of the time).

In contrast, since the FLASH model allows for covariance structures that are linear combinations of these rank-one matrices, the FLASH posteriors are far less likely to follow these patterns of perfect correlation.

par(mfrow=c(1, 2))

mash.not.flash <- c(228, 65, 2903, 1460)
for (n in mash.not.flash) {
  plot_test(n, fl_lfsr, fl_pm, "FLASH")
  plot_test(n, m_lfsr, m_pm, "MASH")
}

Expand here to see past versions of mash_not_flash-1.png:

Version	Author	Date
48615e4	Jason Willwerscheid	2018-08-15

Expand here to see past versions of mash_not_flash-2.png:

Version	Author	Date
48615e4	Jason Willwerscheid	2018-08-15

Expand here to see past versions of mash_not_flash-3.png:

Version	Author	Date
48615e4	Jason Willwerscheid	2018-08-15

Expand here to see past versions of mash_not_flash-4.png:

Version	Author	Date
48615e4	Jason Willwerscheid	2018-08-15

Different posterior means

As discussed here, MASH often does poorly when effect sizes are uniformly large.

par(mfrow=c(1, 2))

diff_pms <- c(9956, 5293, 13168, 3680)
for (n in diff_pms) {
  plot_test(n, fl_lfsr, fl_pm, "FLASH")
  plot_test(n, m_lfsr, m_pm, "MASH")
}

Expand here to see past versions of diff_pms-1.png:

Version	Author	Date
48615e4	Jason Willwerscheid	2018-08-15

Expand here to see past versions of diff_pms-2.png:

Version	Author	Date
48615e4	Jason Willwerscheid	2018-08-15

Expand here to see past versions of diff_pms-3.png:

Version	Author	Date
48615e4	Jason Willwerscheid	2018-08-15

Expand here to see past versions of diff_pms-4.png:

Version	Author	Date
48615e4	Jason Willwerscheid	2018-08-15

Code

Click “Code” to view the code used to obtain the fits used in this analysis.

# devtools::install_github("stephenslab/flashr")
devtools::load_all("/Users/willwerscheid/GitHub/flashr/")
# devtools::install_github("stephenslab/ebnm")
devtools::load_all("/Users/willwerscheid/GitHub/ebnm/")

library(mashr)
source("./code/utils.R")

gtex <- readRDS(gzcon(url("https://github.com/stephenslab/gtexresults/blob/master/data/MatrixEQTLSumStats.Portable.Z.rds?raw=TRUE")))

strong <- t(gtex$strong.z)
random <- t(gtex$random.z)

strong_data <- flash_set_data(strong, S = 1)
random_data <- flash_set_data(random, S = 1)

# The multi-tissue effects from the previous analysis will serve as
#   the data-driven loadings:
nn <- readRDS("./output/MASHvFLASHnn/fl.rds")
multi <- c(2, 5, 6, 8, 11:13, 17, 22:25, 31)
n <- nrow(strong)
dd <- nn$EL[, multi]
dd <- dd / rep(apply(dd, 2, max), each=n) # normalize


## FLASH fit ------------------------------------------------------------

canonical <- cbind(rep(1, n), diag(rep(1, n)))
LL <- cbind(canonical, dd)

fl_random <- flash_add_fixed_loadings(random_data, LL)
system.time(
  fl_random <- flash_backfit(random_data,
                             fl_random,
                             var_type = "zero",
                             ebnm_fn = "ebnm_pn",
                             nullcheck = FALSE)
)
# 26 minutes (62 iterations)
saveRDS(fl_random$gf, "./output/MASHvFLASHgtex3/flgf.rds")

gf <- readRDS("./output/MASHvFLASHgtex3/flgf.rds")
fl_final <- flash_add_fixed_loadings(strong_data, LL)
ebnm_param_f = lapply(gf, function(g) {list(g=g, fixg=TRUE)})
system.time(
  fl_final <- flash_backfit(strong_data,
                            fl_final,
                            var_type = "zero",
                            ebnm_fn = "ebnm_pn",
                            ebnm_param = list(f = ebnm_param_f, l = list()),
                            nullcheck = FALSE)
)
# 21 minutes (75 iterations)
saveRDS(fl_final, "./output/MASHvFLASHgtex3/fl.rds")

nsamp <- 200
system.time({
  sampler <- flash_sampler(strong_data, fl_final, fixed="loadings")
  samp <- sampler(200)
  fl_lfsr <- flash_lfsr(samp)
})
# 2 minutes
saveRDS(fl_lfsr, "./output/MASHvFLASHgtex3/fllfsr.rds")


# MASH fit --------------------------------------------------------------

strong_data <- mash_set_data(t(strong), Shat = 1)
random_data <- mash_set_data(t(random), Shat = 1)

U.flash <- list()
for (i in 1:ncol(dd)) {
  U.flash[[i]] <- outer(dd[, i], dd[, i])
}
U.c <- cov_canonical(random_data)
system.time(m_random <- mash(random_data, Ulist = c(U.c, U.flash)))
# 4 minutes

system.time(
  m_final <- mash(strong_data, g=get_fitted_g(m_random), fixg=TRUE)
)
saveRDS(m_final, "./output/MASHvFLASHgtex3/m.rds")
# 19 seconds

Session information

sessionInfo()

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] flashr_0.5-14 mashr_0.2-7   ashr_2.2-10  

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.17        pillar_1.2.1        compiler_3.4.3     
 [4] git2r_0.21.0        plyr_1.8.4          workflowr_1.0.1    
 [7] R.methodsS3_1.7.1   R.utils_2.6.0       iterators_1.0.9    
[10] tools_3.4.3         testthat_2.0.0      digest_0.6.15      
[13] tibble_1.4.2        gtable_0.2.0        evaluate_0.10.1    
[16] memoise_1.1.0       lattice_0.20-35     rlang_0.2.0        
[19] Matrix_1.2-12       foreach_1.4.4       commonmark_1.4     
[22] yaml_2.1.17         parallel_3.4.3      ebnm_0.1-12        
[25] mvtnorm_1.0-7       xml2_1.2.0          withr_2.1.1.9000   
[28] stringr_1.3.0       knitr_1.20          roxygen2_6.0.1.9000
[31] devtools_1.13.4     rprojroot_1.3-2     grid_3.4.3         
[34] R6_2.2.2            rmarkdown_1.8       rmeta_3.0          
[37] ggplot2_2.2.1       magrittr_1.5        whisker_0.3-2      
[40] scales_0.5.0        backports_1.1.2     codetools_0.2-15   
[43] htmltools_0.3.6     MASS_7.3-48         assertthat_0.2.0   
[46] softImpute_1.4      colorspace_1.3-2    stringi_1.1.6      
[49] lazyeval_0.2.1      munsell_0.4.3       doParallel_1.0.11  
[52] pscl_1.5.2          truncnorm_1.0-8     SQUAREM_2017.10-1  
[55] R.oo_1.21.0

This reproducible R Markdown analysis was created with workflowr 1.0.1