Adds factor/loadings pairs to a flash object in a "greedy" manner. Up to
Kmax
pairs are added one at a time. At each step, flash_greedy
attempts to find an optimal additional (rank-one) factor given all
previously added factors. The additional factor is retained if it
increases the variational lower bound (ELBO); otherwise, fitting terminates.
See flash
for examples of usage.
flash_greedy(
flash,
Kmax = 1,
ebnm_fn = ebnm_point_normal,
init_fn = NULL,
extrapolate = FALSE,
warmstart = FALSE,
maxiter = 500,
tol = NULL,
verbose = NULL
)
A flash
or flash_fit
object to which factors are
to be added.
The maximum number of factors to be added. This will not
necessarily be the total number of factors added by
flash_greedy
, since factors are only added as long as they
increase the ELBO.
The function or functions used to solve the empirical Bayes
normal means (EBNM) subproblems. Most importantly, these functions specify
the families of distributions \(G_\ell^{(k)}\) and \(G_f^{(k)}\) to which the
priors on loadings and factors \(g_\ell^{(k)}\) and \(g_f^{(k)}\) are
assumed to belong. If the same function is to be used for both loadings
\(L\) and factors \(F\), then ebnm_fn
can be a single function.
If one function is to be used for loadings and a second for factors,
then ebnm_fn
should be a list of length two, with the first
element giving the function for loadings and the second the function
for factors. If different functions are to be used for different values of
\(k\), then factor/loadings pairs must be added successively using
multiple calls to either flash_greedy
or
flash_factors_init
.
Any EBNM function provided by package ebnm
can be
used as input. Non-default arguments to parameters can be supplied using
the helper function flash_ebnm
. Custom EBNM functions can
also be used: for details, see flash_ebnm
.
The function used to initialize factor/loadings pairs. Functions
flash_greedy_init_default
, flash_greedy_init_softImpute
, and
flash_greedy_init_irlba
have been supplied; note, in particular, that
flash_greedy_init_softImpute
can yield better results than the
default initialization function when there is missing data. Custom
initialization functions may also be used. If init_fn = NULL
then
flash_greedy_init_default
will be used, with an attempt made to set
argument sign_constraints
appropriately via test calls to
the EBNM function(s) specified by parameter ebnm_fn
. If factors or loadings
are constrained in some other fashion (e.g., bounded support), then the
initialization function should be modified to account for the constraints
--- otherwise, the greedy algorithm can stop adding factor/loadings pairs
too early. Custom initialization functions should accept a single
parameter referring to a flash_fit
object and should output
a list consisting of two vectors, which will be used as initial values for
the new loadings \(\ell_{\cdot k}\) and the new factor \(f_{\cdot k}\). Typically,
a custom initialization function will extract the matrix of residuals from
the flash_fit
object using method residuals.flash_fit
and
then return a (possibly constrained) rank-one approximation to the matrix
of residuals. See Examples below.
Whether to use an extrapolation technique
inspired by Ang and Gillis (2019) to accelerate the fitting process.
Control parameters are handled via global options and can be set by
calling options("extrapolate.control") <- control.param
.
Whether to use "warmstarts" when solving the EBNM
subproblems by initializing solutions at the previous value of the fitted
prior \(\hat{g}\). An important side effect of warmstarts for
ashr
-like prior families is to fix the grid at its initial setting.
Fixing the grid can lead to poor fits if there
are large changes in the scale of the estimated prior over the
course of the fitting process. However, allowing the grid to
vary can occasionally result in decreases in ELBO.
The maximum number of iterations when optimizing a greedily added factor/loadings pair.
The convergence tolerance parameter. At each iteration, the fit
is compared to the fit from the previous iteration using a convergence
criterion function (by default, the difference in ELBO, but the criterion
can be changed via flash_set_conv_crit
). When
the value returned by this function is less than or equal to tol
,
the newly added factor/loadings pair is considered to have "converged,"
so that flash_greedy
moves on and attempts to add another new
factor (or, if the maximum number of factors Kmax
has been reached,
the process terminates). Note that
specifying tol
here will override any value set by
flash_set_conv_crit
; to use the "global" tolerance parameter,
tol
must be left unspecified (NULL
).
If tol = NULL
and a global tolerance
parameter has not been set, then the default
tolerance used is \(np\sqrt{\epsilon}\), where \(n\) is the
number of rows in the dataset, \(p\) is the number of columns, and
\(\epsilon\) is equal to .Machine$double.eps
.
When and how to display progress updates. Set to
0
for none, 1
for updates after a factor is added or a
backfit is completed, 2
for additional notifications about the
variational lower bound, and 3
for updates after every iteration.
It is also possible to output a single tab-delimited table of values
using function flash_set_verbose
with verbose = -1
.
The flash
object from argument flash
, with up
to Kmax
new factor/loadings pairs "greedily" added.
# The following are examples of advanced usage. See ?flash for basic usage.
# Increase the maximum number of iterations in the default initialization
# method.
my_init_fn <- function(f) flash_greedy_init_default(f, maxiter = 500)
fl <- flash_init(gtex) |>
flash_greedy(init_fn = my_init_fn)
#> Adding factor 1 to flash object...
#> Wrapping up...
#> Done.
# Use a custom initialization function that wraps function nmf from
# package RcppML.
nmf_init_fn <- function(f) {
nmf_res <- RcppML::nmf(resid(f), k = 1, verbose = FALSE)
return(list(as.vector(nmf_res$w), as.vector(nmf_res$h)))
}
fl.nmf <- flash_init(gtex) |>
flash_greedy(ebnm_fn = ebnm_unimodal_nonnegative,
init_fn = nmf_init_fn)
#> Adding factor 1 to flash object...
#> Wrapping up...
#> Done.