Last updated: 2018-09-05

workflowr checks: (Click a bullet for more information)
  • R Markdown file: up-to-date

    Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

  • Environment: empty

    Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

  • Seed: set.seed(20180609)

    The command set.seed(20180609) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

  • Session information: recorded

    Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

  • Repository version: d17ac04

    Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

    Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
    
    Ignored files:
        Ignored:    .DS_Store
        Ignored:    .Rhistory
        Ignored:    .Rproj.user/
        Ignored:    data/
        Ignored:    docs/.DS_Store
        Ignored:    docs/images/.DS_Store
        Ignored:    docs/images/.Rapp.history
        Ignored:    output/.DS_Store
        Ignored:    output/.Rapp.history
        Ignored:    output/MASHvFLASHgtex/.DS_Store
        Ignored:    output/MASHvFLASHsims/.DS_Store
        Ignored:    output/MASHvFLASHsims/backfit/.DS_Store
        Ignored:    output/MASHvFLASHsims/backfit/.Rapp.history
    
    Untracked files:
        Untracked:  code/MASHvFLASHnn2.R
        Untracked:  code/MASHvFLASHnnrefine.R
        Untracked:  output/MASHvFLASHnn2/
    
    
    Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
Expand here to see past versions:
    File Version Author Date Message
    Rmd d17ac04 Jason Willwerscheid 2018-09-05 wflow_publish(“analysis/MASHvFLASHnondiagonalV.Rmd”)
    html 603daf1 Jason Willwerscheid 2018-08-30 Build site.
    Rmd 4ca600b Jason Willwerscheid 2018-08-30 wflow_publish(c(“analysis/MASHvFLASHnondiagonalV.Rmd”,
    html fb53e92 Jason Willwerscheid 2018-08-30 Build site.
    Rmd 861ba07 Jason Willwerscheid 2018-08-30 wflow_publish(“analysis/MASHvFLASHnondiagonalV.Rmd”)
    html 273efe5 Jason Willwerscheid 2018-08-30 Build site.
    Rmd fad7859 Jason Willwerscheid 2018-08-30 wflow_publish(“analysis/MASHvFLASHnondiagonalV.Rmd”)


Introduction

Here I again fit nonnegative loadings to the “strong” GTEx dataset, but I now use the trick discussed here to fit a non-diagonal error covariance matrix \(V\).

I pre-run the code below and load the results from file.

Comparison with previous results

Whereas my earlier analysis (which implicitly assumed that \(V = I\)) found 39 data-driven loadings, here I was only able to obtain 26.

Similar loadings

I first plot the new loadings side by side with the previous loadings to which they most closely correspond. Note that, in general, the previous loadings tend to be sparser than these new loadings. Further, this new approach finds two unique effects (caudate basal ganglia and nucleus accumbens basal ganglia) where the previous approach very plausibly found correlations among three types of basal ganglia (loading 25). However, the adipose tissue effects (loading 11) do not get tangled up with the tibial nerve tissue effect (as they did in the previous approach), and there is a single loading describing correlations among skin tissues (loading 3) rather than two largely overlapping loadings.

nondiag <- readRDS("./output/MASHvFLASHnondiagonalV/2dRepeat3.rds")
diag <- readRDS("./output/MASHvFLASHnn/fl.rds")

nondiag_order <- c(44:46, 46:54, 54:nondiag$nfactors)
diag_order <- c(1, 3, 6, 31, 10,
                4, 9, 7, 15,
                5, 13, 8, 23, 20,
                21, 12, 14, 16,
                17, 18, 19, 2,
                24, 25, 34, 32,
                25, 28)
                
diag_not_nondiag <- setdiff(c(1:29, 32:40), diag_order)

tissue_names <- rownames(diag$EL)

missing.tissues <- c(7, 8, 19, 20, 24, 25, 31, 34, 37)
gtex.colors <- read.table("https://github.com/stephenslab/gtexresults/blob/master/data/GTExColors.txt?raw=TRUE", sep = '\t', comment.char = '')[-missing.tissues, 2]
gtex.colors <- as.character(gtex.colors)

par(mfrow=c(2,2))
display_names <- strtrim(tissue_names, 20)
for(i in 1:length(nondiag_order)) {
  plot_names <- display_names
  
  next_nondiag <- nondiag$fit$EL[, nondiag_order[i]]
  plot_names[next_nondiag < 0.25 * max(next_nondiag)] <- ""
  barplot(next_nondiag, 
          main=paste0('Nondiagonal V, loading ', nondiag_order[i] - 43), 
          las=2, cex.names=0.4, yaxt='n',
          col=gtex.colors, names=plot_names)
          
  barplot(diag$EL[, diag_order[i]], 
          main=paste0('Diagonal V, loading ', diag_order[i]), 
          las=2, cex.names=0.4, yaxt='n',
          col=gtex.colors, names=plot_names)
}

Expand here to see past versions of shared-1.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of shared-2.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of shared-3.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of shared-4.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of shared-5.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of shared-6.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of shared-7.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of shared-8.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of shared-9.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of shared-10.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of shared-11.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of shared-12.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of shared-13.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of shared-14.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Differences

Next, I plot previous loadings that do not correspond to any of the new loadings. One of these (loading 11) may indeed be an artefactual loading that corresponds to covariance in errors rather than biological reality. However, one of the others (loading 22) describes a plausible correlation among ovarian, uterine, and vaginal tissues. The remaining are unique effects.

par(mfrow=c(2,2))
for (i in 1:length(diag_not_nondiag)) {
  plot_names <- display_names
  
  next_diag <- diag$EL[, diag_not_nondiag[i]]
  plot_names[next_diag < 0.25 * max(next_diag)] <- ""
  barplot(next_diag, 
          main=paste0('Diagonal, loading ', diag_not_nondiag[i]), 
          las=2, cex.names=0.4, yaxt='n',
          col=gtex.colors, names=plot_names)
}

Expand here to see past versions of extra-1.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of extra-2.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Expand here to see past versions of extra-3.png:
Version Author Date
273efe5 Jason Willwerscheid 2018-08-30

Code

Click “Code” to view the code used to obtain the above results.

devtools::load_all("/Users/willwerscheid/GitHub/flashr/")
devtools::load_all("/Users/willwerscheid/GitHub/ebnm/")

library(mashr)

gtex <- readRDS(gzcon(url("https://github.com/stephenslab/gtexresults/blob/master/data/MatrixEQTLSumStats.Portable.Z.rds?raw=TRUE")))

strong <- gtex$strong.z
random <- gtex$random.z

# Step 1. Estimate correlation structure using MASH.

m_random <- mash_set_data(random, Shat = 1)
Vhat <- estimate_null_correlation(m_random)

# Step 2. Estimate data-driven loadings using FLASH.

# Step 2a. Fit Vhat.
n <- nrow(Vhat)
lambda.min <- min(eigen(Vhat, symmetric=TRUE, only.values=TRUE)$values)

data <- flash_set_data(Y, S = sqrt(lambda.min))

W.eigen <- eigen(Vhat - diag(rep(lambda.min, n)), symmetric=TRUE)
# The rank of W is at most n - 1, so we can drop the last eigenval/vec:
W.eigen$values <- W.eigen$values[-n]
W.eigen$vectors <- W.eigen$vectors[, -n, drop=FALSE]

fl <- flash_add_fixed_loadings(data,
                               LL=W.eigen$vectors,
                               init_fn="udv_svd",
                               backfit=FALSE)

ebnm_param_f <- lapply(as.list(W.eigen$values),
                       function(eigenval) {
                         list(g = list(a=1/eigenval, pi0=0), fixg = TRUE)
                       })
ebnm_param_l <- lapply(vector("list", n - 1),
                       function(k) {list()})
fl <- flash_backfit(data,
                    fl,
                    var_type="zero",
                    ebnm_fn="ebnm_pn",
                    ebnm_param=(list(f = ebnm_param_f, l = ebnm_param_l)),
                    nullcheck=FALSE)

# Step 2b. Add nonnegative factors.
ebnm_fn = list(f = "ebnm_pn", l = "ebnm_ash")
ebnm_param = list(f = list(warmstart = TRUE),
                  l = list(mixcompdist="+uniform"))
fl <- flash_add_greedy(data,
                       Kmax=50,
                       f_init=fl,
                       var_type="zero",
                       init_fn="udv_svd",
                       ebnm_fn=ebnm_fn,
                       ebnm_param=ebnm_param)
saveRDS(fl, "./output/MASHvFLASHVhat/2bGreedy.rds")

# Step 2c (optional). Backfit factors from step 2b.
fl <- flash_backfit(data,
                    fl,
                    kset=n:fl$nfactors,
                    var_type="zero",
                    ebnm_fn=ebnm_fn,
                    ebnm_param=ebnm_param,
                    nullcheck=FALSE)
saveRDS(fl, "./output/MASHvFLASHVhat/2cBackfit.rds")

# Step 2d (optional). Repeat steps 2b and 2c as desired.
fl <- flash_add_greedy(data,
                       Kmax=50,
                       f_init=fl,
                       var_type="zero",
                       init_fn="udv_svd",
                       ebnm_fn=ebnm_fn,
                       ebnm_param=ebnm_param)
fl <- flash_backfit(data,
                    fl,
                    kset=n:fl$nfactors,
                    var_type="zero",
                    ebnm_fn=ebnm_fn,
                    ebnm_param=ebnm_param,
                    nullcheck=FALSE)
saveRDS(fl, "./output/MASHvFLASHVhat/2dRepeat3.rds")

Session information

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] workflowr_1.0.1   Rcpp_0.12.17      digest_0.6.15    
 [4] rprojroot_1.3-2   R.methodsS3_1.7.1 backports_1.1.2  
 [7] git2r_0.21.0      magrittr_1.5      evaluate_0.10.1  
[10] stringi_1.1.6     whisker_0.3-2     R.oo_1.21.0      
[13] R.utils_2.6.0     rmarkdown_1.8     tools_3.4.3      
[16] stringr_1.3.0     yaml_2.1.17       compiler_3.4.3   
[19] htmltools_0.3.6   knitr_1.20       

This reproducible R Markdown analysis was created with workflowr 1.0.1