Fits an empirical Bayes matrix factorization (see Details for a description of the model). The resulting fit is referred to as a "flash" object (short for Factors and Loadings using Adaptive SHrinkage). Two interfaces are provided. The flash function provides a simple interface that allows a flash object to be fit in a single pass, while flash_xxx functions are pipeable functions that allow more complex flash objects to be fit incrementally (available functions are listed below under See Also). See the vignettes and Examples for usage.

Usage
flash(
data,
S = NULL,
ebnm_fn = ebnm_point_normal,
var_type = 0L,
greedy_Kmax = 50L,
backfit = FALSE,
nullcheck = TRUE,
verbose = 1L
)
Arguments

data: The observations. Usually a matrix, but can also be a sparse matrix of class Matrix or a low-rank matrix representation as returned by, for example, svd, irlba, rsvd, or softImpute (in general, any list that includes fields u, d, and v will be interpreted as a low-rank matrix representation).
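For instance, the output of svd can be passed directly as data. This is a sketch, not taken from the package examples; it assumes the gtex dataset that ships with the package, and note that svd returns the full vector of singular values even when nu and nv are truncated, so d must be truncated to match:

```r
library(flashier)
data(gtex)

# Compute a rank-5 truncated SVD; the resulting list has fields
# u, d, and v, so flash interprets it as a low-rank representation.
sv <- svd(gtex, nu = 5, nv = 5)
sv$d <- sv$d[1:5]  # truncate d to match the rank of u and v

fl <- flash(sv, greedy_Kmax = 3L)
```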
S: The standard errors. Can be NULL (in which case all residual variance will be estimated) or a matrix, vector, or scalar. S should be a scalar if standard errors are identical across observations. It should be a vector if standard errors either vary across columns but are constant within any given row, or vary across rows but are constant within any given column (flash will use the length of the vector to determine whether the supplied values correspond to rows or columns; if the data matrix is square, then the sense must be specified using parameter S_dim in function flash_init).
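For example (a sketch, with illustrative standard errors rather than values from any real analysis; assumes the gtex dataset from the package):

```r
library(flashier)
data(gtex)

# Scalar S: every observation shares the same standard error.
fl1 <- flash(gtex, S = 0.1, greedy_Kmax = 3L)

# Vector S: one standard error per column. flash matches the
# vector's length against ncol(gtex) to infer the orientation.
fl2 <- flash(gtex, S = rep(0.1, ncol(gtex)), greedy_Kmax = 3L)
```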
ebnm_fn: The function or functions used to solve the empirical Bayes normal means (EBNM) subproblems. Most importantly, these functions specify the families of distributions \(G_\ell^{(k)}\) and \(G_f^{(k)}\) to which the priors on loadings and factors \(g_\ell^{(k)}\) and \(g_f^{(k)}\) are assumed to belong. If the same function is to be used for both loadings \(L\) and factors \(F\), then ebnm_fn can be a single function. If one function is to be used for loadings and a second for factors, then ebnm_fn should be a list of length two, with the first element giving the function for loadings and the second the function for factors. If different functions are to be used for different values of \(k\), then factor/loadings pairs must be added successively using multiple calls to either flash_greedy or flash_factors_init.
Any EBNM function provided by package ebnm can be used as input. Non-default arguments to parameters can be supplied using the helper function flash_ebnm. Custom EBNM functions can also be used; for details, see flash_ebnm.
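To use different prior families for different values of \(k\), factors can be added one at a time via the pipeable interface. A sketch, assuming the gtex dataset from the package:

```r
library(flashier)
data(gtex)

# Point-normal priors for the first factor/loadings pair, then
# point-Laplace priors for the second, added in separate greedy steps:
fl <- flash_init(gtex) |>
  flash_greedy(Kmax = 1L, ebnm_fn = ebnm_point_normal) |>
  flash_greedy(Kmax = 1L, ebnm_fn = ebnm_point_laplace)
```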
var_type: Describes the structure of the estimated residual variance. Can be NULL, 0, 1, 2, or c(1, 2). If NULL, then S accounts for all residual variance. If var_type = 0, then the estimated residual variance (which is added to any variance given by S) is assumed to be constant across all observations. Setting var_type = 1 estimates a single variance parameter for each row; var_type = 2 estimates one parameter for each column; and var_type = c(1, 2) optimizes over all rank-one matrices (that is, it assumes that the residual variance parameter \(s_{ij}\) can be written \(s_{ij} = a_i b_j\), where the \(n\)-vector \(a\) and the \(p\)-vector \(b\) are to be estimated).

Note that if any portion of the residual variance is to be estimated, then it is usually faster to set S = NULL and to let flash estimate all of the residual variance. Further, var_type = c(1, 2) is typically much slower than the other options, so it should be used with care.
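For example, a by-column variance structure can be requested as follows (a sketch; assumes the gtex dataset from the package):

```r
library(flashier)
data(gtex)

# Estimate one residual variance parameter per column:
fl <- flash(gtex, var_type = 2, greedy_Kmax = 3L)
```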
greedy_Kmax: The maximum number of factors to be added. This will not necessarily be the total number of factors added by flash, since factors are only added as long as they increase the variational lower bound on the log likelihood for the model.
backfit: A "greedy" fit is performed by adding up to greedy_Kmax factors, optimizing each newly added factor in one go without returning to optimize previously added factors. When backfit = TRUE, flash will additionally perform a final "backfit" where all factors are cyclically updated until convergence. The backfitting procedure typically takes much longer than the greedy algorithm, but it also usually improves the final fit significantly.
nullcheck: If nullcheck = TRUE, then flash will check that each factor in the final flash object improves the overall fit. Any factor that fails the check will be removed.
verbose: When and how to display progress updates. Set to 0 for none, 1 for updates after a factor is added or a backfit is completed, 2 for additional notifications about the variational lower bound, and 3 for updates after every iteration. It is also possible to output a single tab-delimited table of values using function flash_set_verbose with verbose = -1.
Value

A flash object. Contains elements:

n_factors: The total number of factor/loadings pairs \(K\) in the fitted model.
pve: The proportion of variance explained by each factor/loadings pair. Since factors and loadings are not required to be orthogonal, this should be interpreted loosely: for example, the total proportion of variance explained could be larger than 1.
elbo: The variational lower bound achieved by the fitted model.
residuals_sd: Estimated residual standard deviations (these include any variance component given as an argument to S).
L_pm, L_psd, L_lfsr: Posterior means, standard deviations, and local false sign rates for loadings \(L\).
F_pm, F_psd, F_lfsr: Posterior means, standard deviations, and local false sign rates for factors \(F\).
L_ghat: The fitted priors on loadings \(\hat{g}_\ell^{(k)}\).
F_ghat: The fitted priors on factors \(\hat{g}_f^{(k)}\).
sampler: A function that takes a single argument nsamp and returns nsamp samples from the posterior distributions for factors \(F\) and loadings \(L\).
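For instance, posterior samples can be drawn to quantify uncertainty in the fitted loadings and factors. A sketch, assuming the gtex dataset from the package:

```r
library(flashier)
data(gtex)

fl <- flash(gtex, greedy_Kmax = 3L)

# Draw 100 samples from the posterior distributions on L and F:
samp <- fl$sampler(nsamp = 100)
```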
flash_fit: A flash_fit object. Used by flash when fitting is not performed all at once, but incrementally via calls to various flash_xxx functions.
The following methods are available:

fitted.flash: Returns the "fitted values" \(E(LF') = E(L) E(F)'\).
residuals.flash: Returns the expected residuals \(Y - E(LF') = Y - E(L) E(F)'\).
ldf.flash: Returns an \(LDF\) decomposition (see Details), with columns of \(L\) and \(F\) scaled as specified by the user.
Details

If \(Y\) is an \(n \times p\) data matrix, then the rank-one empirical Bayes matrix factorization model is: $$Y = \ell f' + E,$$ where \(\ell\) is an \(n\)-vector of loadings, \(f\) is a \(p\)-vector of factors, and \(E\) is an \(n \times p\) matrix of residuals (or "errors"). Additionally: $$e_{ij} \sim N(0, s_{ij}^2): i = 1, ..., n; j = 1, ..., p$$ $$\ell \sim g_\ell \in G_\ell$$ $$f \sim g_f \in G_f.$$ The residual variance parameters \(s_{ij}^2\) are constrained to have a simple structure and are fit via maximum likelihood. (For example, one might assume that all standard errors are identical: \(s_{ij}^2 = s^2\) for some \(s^2\) and for all \(i\), \(j\).) The functions \(g_\ell\) and \(g_f\) are assumed to belong to some families of priors \(G_\ell\) and \(G_f\) that are specified in advance, and are estimated via variational approximation.
The general rank-\(K\) empirical Bayes matrix factorization model is: $$Y = LF' + E$$ or $$y_{ij} = \sum_k \ell_{ik} f_{jk} + e_{ij}: i = 1, ..., n; j = 1, ..., p,$$ where \(L\) is now a matrix of loadings and \(F\) is a matrix of factors.
Separate priors \(g_\ell^{(k)}\) and \(g_f^{(k)}\) are estimated via empirical Bayes, and different prior families may be used for different values of \(k\). In general, then: $$e_{ij} \sim N(0, s_{ij}^2): i = 1, ..., n; j = 1, ..., p$$ $$\ell_{ik} \sim g_\ell^{(k)} \in G_\ell^{(k)}: i = 1, ..., n; k = 1, ..., K$$ $$f_{jk} \sim g_f^{(k)} \in G_f^{(k)}: j = 1, ..., p; k = 1, ..., K.$$
Typically, \(G_\ell^{(k)}\) and \(G_f^{(k)}\) will be closed under scaling, in which case \(\ell_k\) and \(f_k\) are only identifiable up to a scaling factor \(d_k\). In other words, we can write: $$Y = LDF' + E,$$ where \(D\) is a diagonal matrix with diagonal entries \(d_1, ..., d_K\). The model can then be made identifiable by constraining the scale of \(\ell_k\) and \(f_k\) for \(k = 1, ..., K\).
References

Wei Wang and Matthew Stephens (2021). "Empirical Bayes matrix factorization." Journal of Machine Learning Research 22, 1--40.
See Also

flash_init, flash_greedy, flash_backfit, and flash_nullcheck. For more advanced functionality, see flash_factors_init, flash_factors_fix, flash_factors_set_to_zero, flash_factors_remove, flash_set_verbose, and flash_set_conv_crit. For extracting useful data from flash objects, see fitted.flash, residuals.flash, and ldf.flash.
Examples

data(gtex)
# Fit up to 3 factors and backfit.
fl <- flash(gtex, greedy_Kmax = 3L, backfit = TRUE)
#> Adding factor 1 to flash object...
#> Adding factor 2 to flash object...
#> Adding factor 3 to flash object...
#> Wrapping up...
#> Done.
#> Backfitting 3 factors (tolerance: 6.56e-04)...
#> Difference between iterations is within 1.0e+01...
#> Difference between iterations is within 1.0e+00...
#> Difference between iterations is within 1.0e-01...
#> Difference between iterations is within 1.0e-02...
#> Difference between iterations is within 1.0e-03...
#> Wrapping up...
#> Done.
#> Nullchecking 3 factors...
#> Done.
# This is equivalent to the series of calls:
fl <- flash_init(gtex) |>
  flash_greedy(Kmax = 3L) |>
  flash_backfit() |>
  flash_nullcheck()
#> Adding factor 1 to flash object...
#> Adding factor 2 to flash object...
#> Adding factor 3 to flash object...
#> Wrapping up...
#> Done.
#> Backfitting 3 factors (tolerance: 6.56e-04)...
#> Difference between iterations is within 1.0e+01...
#> Difference between iterations is within 1.0e+00...
#> Difference between iterations is within 1.0e-01...
#> Difference between iterations is within 1.0e-02...
#> Difference between iterations is within 1.0e-03...
#> Wrapping up...
#> Done.
#> Nullchecking 3 factors...
#> Done.
# Fit a unimodal distribution with mean zero to each set of loadings
# and a scale mixture of normals with mean zero to each factor.
fl <- flash(gtex,
            ebnm_fn = c(ebnm_unimodal,
                        ebnm_normal_scale_mixture),
            greedy_Kmax = 3)
#> Adding factor 1 to flash object...
#> Adding factor 2 to flash object...
#> Adding factor 3 to flash object...
#> Wrapping up...
#> Done.
#> Nullchecking 3 factors...
#> Done.
# Fit point-Laplace priors using a non-default optimization method.
fl <- flash(gtex,
            ebnm_fn = flash_ebnm(prior_family = "point_laplace",
                                 optmethod = "trust"),
            greedy_Kmax = 3)
#> Adding factor 1 to flash object...
#> Adding factor 2 to flash object...
#> Adding factor 3 to flash object...
#> Wrapping up...
#> Done.
#> Nullchecking 3 factors...
#> Done.
# Fit a "Kronecker" (rank-one) variance structure (this can be slow).
fl <- flash(gtex, var_type = c(1, 2), greedy_Kmax = 3L)
#> Adding factor 1 to flash object...
#> Adding factor 2 to flash object...
#> Adding factor 3 to flash object...
#> Wrapping up...
#> Done.
#> Nullchecking 3 factors...
#> Done.