Fit a Bayesian DFA
fit_dfa( y = y, num_trends = 1, varIndx = NULL, scale = c("zscore", "center", "none"), iter = 2000, chains = 4, thin = 1, control = list(adapt_delta = 0.99, max_treedepth = 20), nu_fixed = 101, est_correlation = FALSE, estimate_nu = FALSE, estimate_trend_ar = FALSE, estimate_trend_ma = FALSE, estimate_process_sigma = FALSE, equal_process_sigma = TRUE, estimation = c("sampling", "optimizing", "vb", "none"), data_shape = c("wide", "long"), obs_covar = NULL, pro_covar = NULL, z_bound = NULL, z_model = c("dfa", "proportion"), trend_model = c("rw", "bs", "ps", "gp"), n_knots = NULL, knot_locs = NULL, par_list = NULL, family = "gaussian", verbose = FALSE, gp_theta_prior = c(3, 1), expansion_prior = FALSE, ... )
y  A matrix of data to fit. See 

num_trends  Number of trends to fit. 
varIndx  Indices indicating which timeseries should have shared variances. 
scale  Character string, used to standardized data. Can be "zscore" to center and standardize data, "center" to just standardize data, or "none". Defaults to "zscore" 
iter  Number of iterations in Stan sampling, defaults to 2000. Used for both

chains  Number of chains in Stan sampling, defaults to 4. 
thin  Thinning rate in Stan sampling, defaults to 1. 
control  A list of options to pass to Stan sampling. Defaults to

nu_fixed  Student t degrees of freedom parameter. If specified as
greater than 100, a normal random walk is used instead of a random walk
with a tdistribution. Defaults to 
est_correlation  Boolean, whether to estimate correlation of
observation error matrix 
estimate_nu  Logical. Estimate the student t degrees of freedom
parameter? Defaults to 
estimate_trend_ar  Logical. Estimate AR(1) parameters on DFA trends? Defaults to `FALSE``, in which case AR(1) parameters are set to 1 
estimate_trend_ma  Logical. Estimate MA(1) parameters on DFA trends? Defaults to `FALSE``, in which case MA(1) parameters are set to 0. 
estimate_process_sigma  Logical. Defaults FALSE, whether or not to estimate process error sigma. If not estimated, sigma is fixed at 1, like conventional DFAs. 
equal_process_sigma  Logical. If process sigma is estimated, whether or not to estimate a single shared value across trends (default) or estimate equal values for each trend 
estimation  Character string. Should the model be sampled using 
data_shape  If 
obs_covar  Optional dataframe of data with 4 named columns ("time","timeseries","covariate","value"), representing: (1) time, (2) the time series affected, (3) the covariate number for models with more than one covariate affecting each trend, and (4) the value of the covariate 
pro_covar  Optional dataframe of data with 4 named columns ("time","trend","covariate","value"), representing: (1) time, (2) the trend affected, (3) the covariate number for models with more than one covariate affecting each trend, and (4) the value of the covariate 
z_bound  Optional hard constraints for estimated factor loadings  really only applies to model with 1 trend. Passed in as a 2element vector representing the lower and upper bound, e.g. (0, 100) to constrain positive 
z_model  Optional argument allowing for elements of Z to be constrained to be proportions (each time series modeled as a mixture of trends). Arguments can be "dfa" (default) or "proportion" 
trend_model  Optional argument to change the model of the underlying latent trend. By default this is set to 'rw', where the trend is modeled as a random walk  as in conentional DFA. Alternative options are 'bs', where Bsplines are used to model the trends, "ps" where Psplines are used to model the trends, or 'gp', where gaussian predictive processes are used. If models other than 'rw' are used, there are some key points. First, the MA and AR parameters on these models will be turned off. Second, for Bsplines and Psplines, the process_sigma becomes an optional scalar on the spline coefficients, and is turned off by default. Third, the number of knots can be specified (more knots = more wiggliness, and n_knots < N). For models with > 2 trends, each trend has their own spline coefficients estimated though the knot locations are assumed shared. If knots aren't specified, the default is N/3. By default both the Bspline and Pspline models use 3rd degree functions for smoothing, and include an intercept term. The Pspline model uses a difference penalty of 2. 
n_knots  The number of knots for the Bspline, Pspline, or Gaussian predictive process models. Optional, defaults to round(N/3) 
knot_locs  Locations of knots (optional), defaults to uniform spacing between 1 and N 
par_list  A vector of parameter names of variables to be estimated by Stan. If NULL, this will default to c("x", "Z", "sigma", "log_lik", "psi","xstar") for most models  though if AR / MA, or Studentt models are used additional parameters will be monitored. If you want to use diagnostic tools in rstan, including moment_matching, you will need to pass in a larger list. Setting this argument to "all" will monitor all parameters, enabling the use of diagnostic functions  but making the models a lot larger for storage. Finally, this argument may be a custom string of parameters to monitor, e.g. c("x","sigma") 
family  String describing the observation model. Default is "gaussian", but included options are "gamma", "lognormal", negative binomial ("nbinom2"), "poisson", or "binomial". The binomial family is assumed to have logit link, gaussian family is assumed to be identity, and the rest are loglink. 
verbose  Whether to print iterations and information from Stan, defaults to FALSE. 
gp_theta_prior  A 2element vector controlling the prior on the Gaussian process parameter in cov_exp_quad. This prior is a halfStudent t prior, with the first argument of gp_theta_prior being the degrees of freedom (nu), and the second element being the standard deviation 
expansion_prior  Defaults to FALSE, if TRUE uses the parameter expansion prior of Ghosh & Dunson 2009 
...  Any other arguments to pass to 
Note that there is nothing restricting the loadings and trends from
being inverted (i.e. multiplied by 1
) for a given chain. Therefore, if
you fit multiple chains, the package will attempt to determine which chains
need to be inverted using the function find_inverted_chains()
.
plot_loadings plot_trends rotate_trends find_swans
set.seed(42) s < sim_dfa(num_trends = 1, num_years = 20, num_ts = 3) # only 1 chain and 250 iterations used so example runs quickly: m < fit_dfa(y = s$y_sim, iter = 50, chains = 1)#> #> SAMPLING FOR MODEL 'dfa' NOW (CHAIN 1). #> Chain 1: #> Chain 1: Gradient evaluation took 3.4e05 seconds #> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.34 seconds. #> Chain 1: Adjust your expectations accordingly! #> Chain 1: #> Chain 1: #> Chain 1: WARNING: There aren't enough warmup iterations to fit the #> Chain 1: three stages of adaptation as currently configured. #> Chain 1: Reducing each adaptation stage to 15%/75%/10% of #> Chain 1: the given number of warmup iterations: #> Chain 1: init_buffer = 3 #> Chain 1: adapt_window = 20 #> Chain 1: term_buffer = 2 #> Chain 1: #> Chain 1: Iteration: 1 / 50 [ 2%] (Warmup) #> Chain 1: Iteration: 5 / 50 [ 10%] (Warmup) #> Chain 1: Iteration: 10 / 50 [ 20%] (Warmup) #> Chain 1: Iteration: 15 / 50 [ 30%] (Warmup) #> Chain 1: Iteration: 20 / 50 [ 40%] (Warmup) #> Chain 1: Iteration: 25 / 50 [ 50%] (Warmup) #> Chain 1: Iteration: 26 / 50 [ 52%] (Sampling) #> Chain 1: Iteration: 30 / 50 [ 60%] (Sampling) #> Chain 1: Iteration: 35 / 50 [ 70%] (Sampling) #> Chain 1: Iteration: 40 / 50 [ 80%] (Sampling) #> Chain 1: Iteration: 45 / 50 [ 90%] (Sampling) #> Chain 1: Iteration: 50 / 50 [100%] (Sampling) #> Chain 1: #> Chain 1: Elapsed Time: 0.005395 seconds (Warmup) #> Chain 1: 0.085396 seconds (Sampling) #> Chain 1: 0.090791 seconds (Total) #> Chain 1:#> Warning: There were 16 divergent transitions after warmup. See #> http://mcstan.org/misc/warnings.html#divergenttransitionsafterwarmup #> to find out why this is a problem and how to eliminate them.#> Warning: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See #> http://mcstan.org/misc/warnings.html#bfmilow#> Warning: Examine the pairs() plot to diagnose sampling problems#> Warning: The largest Rhat is 2.11, indicating chains have not mixed. #> Running the chains for more iterations may help. See #> http://mcstan.org/misc/warnings.html#rhat#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. #> Running the chains for more iterations may help. See #> http://mcstan.org/misc/warnings.html#bulkess#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable. #> Running the chains for more iterations may help. See #> http://mcstan.org/misc/warnings.html#tailess#> Inference for the input samples (1 chains: each with iter = 25; warmup = 12): #> #> Q5 Q50 Q95 Mean SD Rhat Bulk_ESS Tail_ESS #> x[1,1] 0.3 0.3 0.7 0.4 0.2 2.19 13 13 #> x[1,2] 0.2 0.0 0.0 0.0 0.1 1.72 4 13 #> x[1,3] 0.2 0.2 0.9 0.5 0.3 2.19 4 13 #> x[1,4] 0.5 0.6 1.0 0.7 0.2 2.19 6 13 #> x[1,5] 0.2 0.5 1.0 0.4 0.4 2.19 13 13 #> x[1,6] 0.4 1.3 1.4 1.1 0.4 2.19 8 13 #> x[1,7] 0.3 1.0 1.0 0.8 0.3 2.10 6 13 #> x[1,8] 0.5 0.7 1.1 0.7 0.3 2.19 13 13 #> x[1,9] 0.4 0.5 1.4 0.7 0.6 1.62 12 13 #> x[1,10] 1.5 1.6 2.5 1.8 0.5 2.19 12 13 #> x[1,11] 2.0 2.7 2.9 2.6 0.4 2.08 6 13 #> x[1,12] 0.8 2.2 2.3 1.8 0.6 2.10 4 13 #> x[1,13] 0.0 1.2 1.3 0.8 0.6 2.10 7 13 #> x[1,14] 0.9 0.2 0.1 0.3 0.4 2.08 6 13 #> x[1,15] 1.0 1.0 0.3 0.7 0.3 2.08 4 13 #> x[1,16] 2.2 2.0 1.3 1.8 0.4 2.10 4 13 #> x[1,17] 2.6 2.4 2.0 2.3 0.2 2.08 9 13 #> x[1,18] 2.4 2.0 1.3 1.9 0.5 1.18 8 13 #> x[1,19] 1.9 1.7 0.7 1.4 0.5 2.08 7 13 #> x[1,20] 1.4 1.2 0.0 0.8 0.5 2.10 8 13 #> Z[1,1] 0.9 0.7 0.6 0.7 0.1 1.20 13 13 #> Z[2,1] 0.7 0.4 0.2 0.4 0.2 1.46 13 13 #> Z[3,1] 0.8 0.5 0.4 0.6 0.1 1.72 4 13 #> log_lik[1] 0.7 0.4 0.4 0.5 0.1 2.19 13 13 #> log_lik[2] 2.4 2.1 1.8 2.1 0.2 1.95 13 13 #> log_lik[3] 1.0 0.6 0.6 0.7 0.2 1.18 8 13 #> log_lik[4] 0.8 0.5 0.4 0.5 0.1 2.08 12 13 #> log_lik[5] 0.8 0.5 0.5 0.6 0.1 2.08 13 13 #> log_lik[6] 0.7 0.5 0.4 0.5 0.1 2.08 12 13 #> log_lik[7] 0.7 0.6 0.4 0.6 0.1 2.19 13 13 #> log_lik[8] 2.2 1.3 1.2 1.5 0.4 1.95 6 13 #> log_lik[9] 2.4 1.1 1.0 1.4 0.5 2.10 4 13 #> log_lik[10] 0.7 0.4 0.4 0.5 0.1 2.19 4 13 #> log_lik[11] 0.7 0.4 0.4 0.5 0.1 2.10 9 13 #> log_lik[12] 0.8 0.7 0.3 0.6 0.2 1.62 13 13 #> log_lik[13] 1.3 0.7 0.7 0.8 0.3 1.33 5 13 #> log_lik[14] 0.9 0.6 0.4 0.6 0.2 2.08 12 13 #> log_lik[15] 2.5 1.1 0.8 1.3 0.6 2.19 13 13 #> log_lik[16] 0.9 0.5 0.4 0.5 0.2 1.18 9 13 #> log_lik[17] 0.7 0.5 0.4 0.5 0.1 1.77 8 13 #> log_lik[18] 0.8 0.6 0.4 0.6 0.1 1.88 4 13 #> log_lik[19] 0.9 0.8 0.4 0.7 0.2 1.63 7 13 #> log_lik[20] 0.7 0.6 0.5 0.6 0.1 2.08 13 13 #> log_lik[21] 0.8 0.5 0.4 0.6 0.1 1.88 11 13 #> log_lik[22] 2.5 2.0 1.6 2.0 0.3 1.51 8 13 #> log_lik[23] 0.8 0.6 0.5 0.6 0.1 2.08 13 13 #> log_lik[24] 0.8 0.5 0.4 0.5 0.2 1.77 10 13 #> log_lik[25] 1.6 0.6 0.6 0.8 0.6 1.33 5 13 #> log_lik[26] 0.8 0.5 0.5 0.6 0.1 1.95 13 13 #> log_lik[27] 1.1 1.0 0.7 1.0 0.1 1.30 5 13 #> log_lik[28] 1.0 0.5 0.4 0.6 0.3 2.19 7 13 #> log_lik[29] 0.9 0.7 0.4 0.7 0.2 1.03 13 13 #> log_lik[30] 1.4 1.0 0.4 0.9 0.4 1.72 4 13 #> log_lik[31] 0.8 0.5 0.4 0.6 0.2 1.00 9 13 #> log_lik[32] 3.7 2.3 1.1 2.3 0.9 1.03 13 13 #> log_lik[33] 1.2 0.7 0.4 0.8 0.3 1.05 8 13 #> log_lik[34] 2.0 0.9 0.5 1.0 0.6 2.08 4 13 #> log_lik[35] 3.9 2.2 1.8 2.4 0.7 1.50 5 13 #> log_lik[36] 1.1 0.5 0.4 0.6 0.3 1.18 7 13 #> log_lik[37] 1.0 0.6 0.5 0.7 0.2 1.51 5 13 #> log_lik[38] 2.1 1.1 1.0 1.3 0.4 2.19 4 13 #> log_lik[39] 1.0 0.5 0.4 0.6 0.2 2.19 4 13 #> log_lik[40] 1.7 0.6 0.6 0.9 0.4 2.19 4 13 #> log_lik[41] 0.8 0.5 0.3 0.5 0.1 1.46 13 13 #> log_lik[42] 0.9 0.5 0.3 0.6 0.2 2.08 13 13 #> log_lik[43] 2.5 1.2 1.0 1.5 0.6 1.77 13 13 #> log_lik[44] 0.7 0.5 0.3 0.5 0.1 1.18 12 13 #> log_lik[45] 1.1 0.8 0.5 0.8 0.2 1.58 7 13 #> log_lik[46] 1.3 0.5 0.4 0.8 0.3 2.08 4 13 #> log_lik[47] 1.5 1.2 0.7 1.1 0.3 1.46 10 13 #> log_lik[48] 1.0 0.7 0.5 0.8 0.2 1.12 8 13 #> log_lik[49] 1.1 0.5 0.4 0.6 0.3 1.77 7 13 #> log_lik[50] 5.1 4.3 1.7 4.0 1.2 1.19 13 13 #> log_lik[51] 1.3 0.9 0.4 0.9 0.3 1.09 8 13 #> log_lik[52] 1.5 0.7 0.4 0.8 0.4 1.41 13 13 #> log_lik[53] 2.4 1.7 1.3 1.8 0.4 1.18 11 13 #> log_lik[54] 1.1 0.8 0.5 0.8 0.2 0.99 13 13 #> log_lik[55] 0.9 0.6 0.5 0.6 0.1 1.33 13 13 #> log_lik[56] 4.2 3.4 2.3 3.3 0.7 2.19 4 5 #> log_lik[57] 0.8 0.5 0.4 0.5 0.1 1.62 13 13 #> log_lik[58] 0.8 0.7 0.5 0.7 0.1 0.97 13 13 #> log_lik[59] 0.7 0.6 0.4 0.6 0.1 2.08 13 13 #> log_lik[60] 0.8 0.5 0.5 0.6 0.1 1.30 6 13 #> xstar[1,1] 2.7 0.7 1.1 0.7 1.2 1.09 13 13 #> sigma[1] 0.6 0.6 0.8 0.7 0.1 2.08 13 13 #> lp__ 59.5 49.2 46.5 51.7 5.4 2.10 4 13 #> #> For each parameter, Bulk_ESS and Tail_ESS are crude measures of #> effective sample size for bulk and tail quantities respectively (an ESS > 100 #> per chain is considered good), and Rhat is the potential scale reduction #> factor on rank normalized split chains (at convergence, Rhat <= 1.05).if (FALSE) { # example of observation error covariates set.seed(42) obs_covar < expand.grid("time" = 1:20, "timeseries" = 1:3, "covariate" = 1) obs_covar$value < rnorm(nrow(obs_covar), 0, 0.1) m < fit_dfa(y = s$y_sim, iter = 50, chains = 1, obs_covar = obs_covar) # example of process error covariates pro_covar < expand.grid("time" = 1:20, "trend" = 1:2, "covariate" = 1) pro_covar$value < rnorm(nrow(pro_covar), 0, 0.1) m < fit_dfa(y = s$y_sim, iter = 50, chains = 1, num_trends = 2, pro_covar = pro_covar) # example of long format data s < sim_dfa(num_trends = 1, num_years = 20, num_ts = 3) obs < c(s$y_sim[1, ], s$y_sim[2, ], s$y_sim[3, ]) long < data.frame("obs" = obs, "ts" = sort(rep(1:3, 20)), "time" = rep(1:20, 3)) m < fit_dfa(y = long, data_shape = "long", iter = 50, chains = 1) # example of long format data with obs covariates s < sim_dfa(num_trends = 1, num_years = 20, num_ts = 3) obs < c(s$y_sim[1, ], s$y_sim[2, ], s$y_sim[3, ]) long < data.frame("obs" = obs, "ts" = sort(rep(1:3, 20)), "time" = rep(1:20, 3)) obs_covar < expand.grid("time" = 1:20, "timeseries" = 1:3, "covariate" = 1:2) obs_covar$value < rnorm(nrow(obs_covar), 0, 0.1) m < fit_dfa(y = long, data_shape = "long", iter = 50, chains = 1, obs_covar = obs_covar) # example of model with Z constrained to be proportions and wide format data s < sim_dfa(num_trends = 1, num_years = 20, num_ts = 3) m < fit_dfa(y = s$y_sim, z_model = "proportion", iter = 50, chains = 1) # example of model with Z constrained to be proportions and long format data s < sim_dfa(num_trends = 1, num_years = 20, num_ts = 3) obs < c(s$y_sim[1, ], s$y_sim[2, ], s$y_sim[3, ]) long < data.frame("obs" = obs, "ts" = sort(rep(1:3, 20)), "time" = rep(1:20, 3)) m < fit_dfa(y = long, data_shape = "long", z_model = "proportion", iter = 50, chains = 1) #' # example of Bspline model with wide format data s < sim_dfa(num_trends = 1, num_years = 20, num_ts = 3) m < fit_dfa(y = s$y_sim, iter = 50, chains = 1, trend_model = "bs", n_knots = 10) #' #' # example of Pspline model with wide format data s < sim_dfa(num_trends = 1, num_years = 20, num_ts = 3) m < fit_dfa(y = s$y_sim, iter = 50, chains = 1, trend_model = "ps", n_knots = 10) # example of Gaussian process model with wide format data s < sim_dfa(num_trends = 1, num_years = 20, num_ts = 3) m < fit_dfa(y = s$y_sim, iter = 50, chains = 1, trend_model = "gp", n_knots = 5) }