Fits an empirical Bayes matrix factorization (see Details for a description of the model). The resulting fit is referred to as a "flash" object (short for Factors and Loadings using Adaptive SHrinkage). Two interfaces are provided. The flash function provides a simple interface that allows a flash object to be fit in a single pass, while flash_xxx functions are pipeable functions that allow more complex flash objects to be fit incrementally (available functions are listed below under See Also). See the vignettes and Examples for usage.

Usage
flash(
data,
S = NULL,
ebnm_fn = ebnm_point_normal,
var_type = 0L,
greedy_Kmax = 50L,
backfit = FALSE,
nullcheck = TRUE,
verbose = 1L
)
Arguments

data: The observations. Usually a matrix, but can also be a sparse matrix of class Matrix or a low-rank matrix representation as returned by, for example, svd, irlba, rsvd, or softImpute (in general, any list that includes fields u, d, and v will be interpreted as a low-rank matrix representation).
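For instance, the output of svd can be passed directly as data. This is a sketch, not taken from the package examples; it assumes the gtex dataset that ships with the package, and note that svd returns the full vector of singular values even when nu and nv are truncated, so d must be truncated to match:

```r
library(flashier)
data(gtex)

# Compute a rank-5 truncated SVD; the resulting list has fields
# u, d, and v, so flash interprets it as a low-rank representation.
sv <- svd(gtex, nu = 5, nv = 5)
sv$d <- sv$d[1:5]  # truncate d to match the rank of u and v

fl <- flash(sv, greedy_Kmax = 3L)
```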
S: The standard errors. Can be NULL (in which case all residual variance will be estimated) or a matrix, vector, or scalar. S should be a scalar if standard errors are identical across observations. It should be a vector if standard errors either vary across columns but are constant within any given row, or vary across rows but are constant within any given column (flash will use the length of the vector to determine whether the supplied values correspond to rows or columns; if the data matrix is square, then the sense must be specified using parameter S_dim in function flash_init).
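For example (a sketch, with illustrative standard errors rather than values from any real analysis; assumes the gtex dataset from the package):

```r
library(flashier)
data(gtex)

# Scalar S: every observation shares the same standard error.
fl1 <- flash(gtex, S = 0.1, greedy_Kmax = 3L)

# Vector S: one standard error per column. flash matches the
# vector's length against ncol(gtex) to infer the orientation.
fl2 <- flash(gtex, S = rep(0.1, ncol(gtex)), greedy_Kmax = 3L)
```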
ebnm_fn: The function or functions used to solve the empirical Bayes normal means (EBNM) subproblems. Most importantly, these functions specify the families of distributions \(G_\ell^{(k)}\) and \(G_f^{(k)}\) to which the priors on loadings and factors \(g_\ell^{(k)}\) and \(g_f^{(k)}\) are assumed to belong. If the same function is to be used for both loadings \(L\) and factors \(F\), then ebnm_fn can be a single function. If one function is to be used for loadings and a second for factors, then ebnm_fn should be a list of length two, with the first element giving the function for loadings and the second the function for factors. If different functions are to be used for different values of \(k\), then factor/loadings pairs must be added successively using multiple calls to either flash_greedy or flash_factors_init.
Any EBNM function provided by package ebnm can be used as input. Non-default arguments to parameters can be supplied using the helper function flash_ebnm. Custom EBNM functions can also be used; for details, see flash_ebnm.
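To use different prior families for different values of \(k\), factors can be added one at a time via the pipeable interface. A sketch, assuming the gtex dataset from the package:

```r
library(flashier)
data(gtex)

# Point-normal priors for the first factor/loadings pair, then
# point-Laplace priors for the second, added in separate greedy steps:
fl <- flash_init(gtex) |>
  flash_greedy(Kmax = 1L, ebnm_fn = ebnm_point_normal) |>
  flash_greedy(Kmax = 1L, ebnm_fn = ebnm_point_laplace)
```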
var_type: Describes the structure of the estimated residual variance. Can be NULL, 0, 1, 2, or c(1, 2). If NULL, then S accounts for all residual variance. If var_type = 0, then the estimated residual variance (which is added to any variance given by S) is assumed to be constant across all observations. Setting var_type = 1 estimates a single variance parameter for each row; var_type = 2 estimates one parameter for each column; and var_type = c(1, 2) optimizes over all rank-one matrices (that is, it assumes that the residual variance parameter \(s_{ij}\) can be written \(s_{ij} = a_i b_j\), where the \(n\)-vector \(a\) and the \(p\)-vector \(b\) are to be estimated).

Note that if any portion of the residual variance is to be estimated, then it is usually faster to set S = NULL and to let flash estimate all of the residual variance. Further, var_type = c(1, 2) is typically much slower than the other options, so it should be used with care.
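For example, a by-column variance structure can be requested as follows (a sketch; assumes the gtex dataset from the package):

```r
library(flashier)
data(gtex)

# Estimate one residual variance parameter per column:
fl <- flash(gtex, var_type = 2, greedy_Kmax = 3L)
```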
greedy_Kmax: The maximum number of factors to be added. This will not necessarily be the total number of factors added by flash, since factors are only added as long as they increase the variational lower bound on the log likelihood for the model.
backfit: A "greedy" fit is performed by adding up to greedy_Kmax factors, optimizing each newly added factor in one go without returning to optimize previously added factors. When backfit = TRUE, flash will additionally perform a final "backfit" where all factors are cyclically updated until convergence. The backfitting procedure typically takes much longer than the greedy algorithm, but it also usually improves the final fit significantly.
nullcheck: If nullcheck = TRUE, then flash will check that each factor in the final flash object improves the overall fit. Any factor that fails the check will be removed.
verbose: When and how to display progress updates. Set to 0 for none, 1 for updates after a factor is added or a backfit is completed, 2 for additional notifications about the variational lower bound, and 3 for updates after every iteration. It is also possible to output a single tab-delimited table of values using function flash_set_verbose with verbose = -1.
Value

A flash object. Contains elements:

n_factors: The total number of factor/loadings pairs \(K\) in the fitted model.
pve: The proportion of variance explained by each factor/loadings pair. Since factors and loadings are not required to be orthogonal, this should be interpreted loosely: for example, the total proportion of variance explained could be larger than 1.
elbo: The variational lower bound achieved by the fitted model.
residuals_sd: Estimated residual standard deviations (these include any variance component given as an argument to S).
L_pm, L_psd, L_lfsr: Posterior means, standard deviations, and local false sign rates for loadings \(L\).
F_pm, F_psd, F_lfsr: Posterior means, standard deviations, and local false sign rates for factors \(F\).
L_ghat: The fitted priors on loadings \(\hat{g}_\ell^{(k)}\).
F_ghat: The fitted priors on factors \(\hat{g}_f^{(k)}\).
sampler: A function that takes a single argument nsamp and returns nsamp samples from the posterior distributions for factors \(F\) and loadings \(L\).
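For instance, posterior samples can be drawn to quantify uncertainty in the fitted loadings and factors. A sketch, assuming the gtex dataset from the package:

```r
library(flashier)
data(gtex)

fl <- flash(gtex, greedy_Kmax = 3L)

# Draw 100 samples from the posterior distributions on L and F:
samp <- fl$sampler(nsamp = 100)
```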
flash_fit: A flash_fit object. Used by flash when fitting is not performed all at once, but incrementally via calls to various flash_xxx functions.
The following methods are available:

fitted.flash: Returns the "fitted values" \(E(LF') = E(L) E(F)'\).
residuals.flash: Returns the expected residuals \(Y - E(LF') = Y - E(L) E(F)'\).
ldf.flash: Returns an \(LDF\) decomposition (see Details), with columns of \(L\) and \(F\) scaled as specified by the user.
Details

If \(Y\) is an \(n \times p\) data matrix, then the rank-one empirical Bayes matrix factorization model is: $$Y = \ell f' + E,$$ where \(\ell\) is an \(n\)-vector of loadings, \(f\) is a \(p\)-vector of factors, and \(E\) is an \(n \times p\) matrix of residuals (or "errors"). Additionally: $$e_{ij} \sim N(0, s_{ij}^2): i = 1, ..., n; j = 1, ..., p$$ $$\ell \sim g_\ell \in G_\ell$$ $$f \sim g_f \in G_f.$$ The residual variance parameters \(s_{ij}^2\) are constrained to have a simple structure and are fit via maximum likelihood. (For example, one might assume that all standard errors are identical: \(s_{ij}^2 = s^2\) for some \(s^2\) and for all \(i\), \(j\).) The functions \(g_\ell\) and \(g_f\) are assumed to belong to some families of priors \(G_\ell\) and \(G_f\) that are specified in advance, and are estimated via variational approximation.
The general rank-\(K\) empirical Bayes matrix factorization model is: $$Y = LF' + E$$ or $$y_{ij} = \sum_k \ell_{ik} f_{jk} + e_{ij}: i = 1, ..., n; j = 1, ..., p,$$ where \(L\) is now a matrix of loadings and \(F\) is a matrix of factors.
Separate priors \(g_\ell^{(k)}\) and \(g_f^{(k)}\) are estimated via empirical Bayes, and different prior families may be used for different values of \(k\). In general, then: $$e_{ij} \sim N(0, s_{ij}^2): i = 1, ..., n; j = 1, ..., p$$ $$\ell_{ik} \sim g_\ell^{(k)} \in G_\ell^{(k)}: i = 1, ..., n; k = 1, ..., K$$ $$f_{jk} \sim g_f^{(k)} \in G_f^{(k)}: j = 1, ..., p; k = 1, ..., K.$$
Typically, \(G_\ell^{(k)}\) and \(G_f^{(k)}\) will be closed under scaling, in which case \(\ell_k\) and \(f_k\) are only identifiable up to a scaling factor \(d_k\). In other words, we can write: $$Y = LDF' + E,$$ where \(D\) is a diagonal matrix with diagonal entries \(d_1, ..., d_K\). The model can then be made identifiable by constraining the scale of \(\ell_k\) and \(f_k\) for \(k = 1, ..., K\).
References

Wei Wang and Matthew Stephens (2021). "Empirical Bayes matrix factorization." Journal of Machine Learning Research 22, 1--40.
See Also

flash_init, flash_greedy, flash_backfit, and flash_nullcheck. For more advanced functionality, see flash_factors_init, flash_factors_fix, flash_factors_set_to_zero, flash_factors_remove, flash_set_verbose, and flash_set_conv_crit. For extracting useful data from flash objects, see fitted.flash, residuals.flash, and ldf.flash.
Examples

data(gtex)
# Fit up to 3 factors and backfit.
fl <- flash(gtex, greedy_Kmax = 3L, backfit = TRUE)
#> Adding factor 1 to flash object...
#> Adding factor 2 to flash object...
#> Adding factor 3 to flash object...
#> Wrapping up...
#> Done.
#> Backfitting 3 factors (tolerance: 6.56e-04)...
#> Difference between iterations is within 1.0e+01...
#> Difference between iterations is within 1.0e+00...
#> Difference between iterations is within 1.0e-01...
#> Difference between iterations is within 1.0e-02...
#> Difference between iterations is within 1.0e-03...
#> Wrapping up...
#> Done.
#> Nullchecking 3 factors...
#> Done.
# This is equivalent to the series of calls:
fl <- flash_init(gtex) |>
  flash_greedy(Kmax = 3L) |>
  flash_backfit() |>
  flash_nullcheck()
#> Adding factor 1 to flash object...
#> Adding factor 2 to flash object...
#> Adding factor 3 to flash object...
#> Wrapping up...
#> Done.
#> Backfitting 3 factors (tolerance: 6.56e-04)...
#> Difference between iterations is within 1.0e+01...
#> Difference between iterations is within 1.0e+00...
#> Difference between iterations is within 1.0e-01...
#> Difference between iterations is within 1.0e-02...
#> Difference between iterations is within 1.0e-03...
#> Wrapping up...
#> Done.
#> Nullchecking 3 factors...
#> Done.
# Fit a unimodal distribution with mean zero to each set of loadings
# and a scale mixture of normals with mean zero to each factor.
fl <- flash(gtex,
            ebnm_fn = c(ebnm_unimodal,
                        ebnm_normal_scale_mixture),
            greedy_Kmax = 3)
#> Adding factor 1 to flash object...
#> Adding factor 2 to flash object...
#> Adding factor 3 to flash object...
#> Wrapping up...
#> Done.
#> Nullchecking 3 factors...
#> Done.
# Fit point-Laplace priors using a non-default optimization method.
fl <- flash(gtex,
            ebnm_fn = flash_ebnm(prior_family = "point_laplace",
                                 optmethod = "trust"),
            greedy_Kmax = 3)
#> Adding factor 1 to flash object...
#> Adding factor 2 to flash object...
#> Adding factor 3 to flash object...
#> Wrapping up...
#> Done.
#> Nullchecking 3 factors...
#> Done.
# Fit a "Kronecker" (rank-one) variance structure (this can be slow).
fl <- flash(gtex, var_type = c(1, 2), greedy_Kmax = 3L)
#> Adding factor 1 to flash object...
#> Adding factor 2 to flash object...
#> Adding factor 3 to flash object...
#> Wrapping up...
#> Done.
#> Nullchecking 3 factors...
#> Done.