Order-invariant prior specification in Bayesian factor analysis

https://doi.org/10.1016/j.spl.2016.01.006

Abstract

Using lower triangular loading matrices in Bayesian factor analysis ensures identifiability but may lead to inferences that depend on how the considered variables are ordered. We show how a standard approach to prior specification can be modified to avoid order-dependence.

Introduction

Let $y$ be an $m$-vector of observed random variables, which for simplicity we take to be centered. Let $f \sim N_k(0, I_k)$ be a standard normal $k$-vector of latent factors, with $k \le m$. The factor analysis model postulates that $$y = \beta f + \varepsilon, \qquad (1.1)$$ where $\beta = (\beta_{ij}) \in \mathbb{R}^{m \times k}$ is an unknown loading matrix, and $\varepsilon \sim N_m(0, \Omega)$ is an $m$-vector of normally distributed error terms that are independent of $f$. The error terms are assumed to be mutually independent, with $\Omega = \operatorname{diag}(\omega_1^2, \dots, \omega_m^2)$ comprising $m$ unknown positive variances that are also known as uniquenesses. This model with an unrestricted $m \times k$ loading matrix $\beta$ is sometimes referred to as exploratory factor analysis, in contrast to confirmatory factor analysis, which refers to situations in which some collection of entries of $\beta$ is modeled as zero.

Integrating out the latent factors $f$ in (1.1), the observed random vector $y$ is seen to follow a centered multivariate normal distribution with covariance matrix $$\Sigma = \Omega + \beta\beta^\top. \qquad (1.2)$$ As discussed in detail in Anderson and Rubin (1956), $\Sigma$ determines the unrestricted loading matrix $\beta$ only up to orthogonal rotation. Indeed, $\beta\beta^\top = \beta Q Q^\top \beta^\top$ for any $k \times k$ orthogonal matrix $Q$. More details on factor analysis can be found, for instance, in Bartholomew et al. (2011), Drton et al. (2007), and Mulaik (2010).
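As a quick numerical check of this rotation invariance, the following sketch (using a hypothetical loading matrix and uniquenesses of our own choosing) verifies that $\Sigma = \Omega + \beta\beta^\top$ is unchanged when $\beta$ is multiplied on the right by an orthogonal matrix $Q$:

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 5, 2

# Hypothetical loading matrix and uniquenesses, for illustration only.
beta = rng.normal(size=(m, k))
omega = rng.uniform(0.5, 1.5, size=m)

# Implied covariance: Sigma = Omega + beta beta^T (Eq. 1.2).
Sigma = np.diag(omega) + beta @ beta.T

# Any k x k orthogonal Q leaves beta beta^T, hence Sigma, unchanged.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Sigma_rotated = np.diag(omega) + (beta @ Q) @ (beta @ Q).T

assert np.allclose(Sigma, Sigma_rotated)
```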

In this paper, we are concerned with Bayesian inference in (exploratory) factor analysis. In Bayesian computation, it is convenient to impose an identifiability constraint on the loading matrix $\beta$. A common choice is to restrict $\beta$ to be lower triangular with nonnegative diagonal entries, that is, $\beta_{ij} = 0$ for $1 \le i < j \le k$ and $\beta_{ii} \ge 0$ for $1 \le i \le k$; see Geweke and Zhou (1996), Aguilar and West (2000), Lopes and West (2004) and Chapter 12 in Congdon (2001). Under these constraints, a full rank matrix $\beta$ is uniquely determined by $\beta\beta^\top$. In the papers just referenced, and also in the software implementation provided by Martin et al. (2011), a default prior on the lower triangular loading matrix has all its non-zero entries independent with $$\beta_{ij} \sim \begin{cases} TN(0, C_0) & \text{if } i = j,\\ N(0, C_0) & \text{if } i > j. \end{cases} \qquad (1.3)$$ Here, $TN(0, C_0)$ denotes a truncated normal distribution on $(0, \infty)$, i.e., the conditional distribution of $X$ given $X > 0$ for $X \sim N(0, C_0)$. The variance $C_0 > 0$ is a hyperparameter. The prior distribution for the uniquenesses has $\omega_1^2, \dots, \omega_m^2$ independent of $\beta$ and also mutually independent with Inverse Gamma distributions, $\omega_i^2 \sim IG(\nu/2, \nu s^2/2)$ for hyperparameters $\nu, s > 0$. Equivalently, $\nu s^2/\omega_i^2$ is chi-square distributed with $\nu$ degrees of freedom; compare Eqn. (26) in Geweke and Zhou (1996).
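The default prior just described can be sketched in NumPy as follows. The function name and default hyperparameters are our own choices; the truncated normal on $(0, \infty)$ is drawn as the absolute value of a centered normal, which is equivalent by symmetry, and the Inverse Gamma draw uses the stated chi-square equivalence:

```python
import numpy as np

def sample_default_prior(m, k, C0=1.0, nu=2.0, s2=1.0, rng=None):
    """Draw (beta, omega2) from the default lower-triangular prior.

    beta[i, j] = 0 for j > i; diagonal entries are TN(0, C0) on
    (0, inf); strictly lower entries (i > j) are N(0, C0).
    """
    rng = np.random.default_rng(rng)
    beta = np.zeros((m, k))
    for i in range(m):
        for j in range(min(i + 1, k)):
            if i == j:
                # TN(0, C0) on (0, inf) equals |N(0, C0)| by symmetry.
                beta[i, j] = abs(rng.normal(scale=np.sqrt(C0)))
            else:
                beta[i, j] = rng.normal(scale=np.sqrt(C0))
    # omega_i^2 ~ IG(nu/2, nu s^2/2), i.e. nu s^2 / omega_i^2 ~ chi^2_nu.
    omega2 = nu * s2 / rng.chisquare(nu, size=m)
    return beta, omega2

beta, omega2 = sample_default_prior(m=4, k=2, rng=0)
```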

In the approach just described, which will be the focus of this paper, the prior for the loadings is derived from centered normal distributions with a common variance. While this can be restrictive, it is a frequently used default, possibly due to a lack of prior information that warrants non-zero means or unequal variances; see, e.g., Section 6 in Ansari et al. (2002). That said, many generalizations have been discussed. For instance, Ghosh and Dunson (2009), Bhattacharya and Dunson (2011), and Conti et al. (2014) present methods to capture sparsity in loading matrices. Other extensions consider t-distributed latent factors (Ando, 2009), nonparametric Bayes techniques (Paisley and Carin, 2009), and problems with temporal dependence; see, e.g., Nakajima and West (2013) and Zhou et al. (2014).

As discussed in Lopes and West (2004, Sect. 6), the prior specification in (1.3) is such that the induced prior on $\beta\beta^\top$ depends on how the variables, and thus the associated rows of the loading matrix $\beta$, are ordered. Indeed, a priori, $$(\beta\beta^\top)_{ii}/C_0 = \sum_{j=1}^{k} \beta_{ij}^2/C_0 = \sum_{j=1}^{\min\{i,k\}} \beta_{ij}^2/C_0$$ follows a chi-square distribution with $\min\{i,k\}$ degrees of freedom. Consequently, the implied prior and also the posterior distribution for the covariance matrix $\Sigma$ from (1.2) are not invariant under permutations of the variables.
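A short Monte Carlo sketch makes this order dependence concrete: under the lower-triangular prior, the scaled squared row norms of $\beta$ have chi-square distributions whose means $\min\{i,k\}$ vary with the row index $i$. Since squaring removes the sign, plain normals can stand in for the truncated normals on the diagonal:

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, C0 = 5, 3, 1.0
n_draws = 200_000

# Lower-triangular draws; TN(0, C0) and N(0, C0) agree after squaring.
draws = rng.normal(scale=np.sqrt(C0), size=(n_draws, m, k))
tril_mask = np.tril(np.ones((m, k), dtype=bool))
draws *= tril_mask

# (beta beta^T)_{ii}/C0 = sum_j beta_{ij}^2/C0 ~ chi^2 with min(i, k) df,
# so the Monte Carlo mean of row i should be close to min(i, k).
row_norms = (draws ** 2).sum(axis=2) / C0
mc_means = row_norms.mean(axis=0)
expected_df = np.minimum(np.arange(1, m + 1), k)

assert np.allclose(mc_means, expected_df, atol=0.05)
```

The distinct means across rows show directly that relabeling the variables changes the implied prior on the diagonal of $\beta\beta^\top$.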

In this paper we propose a modification of the prior distribution for $\beta$ that maintains the convenience of computing with an identifiable lower triangular loading matrix while making the prior distributions of $\beta\beta^\top$ and $\Sigma$ invariant under reordering of the variables. Our proposal, described in Section 2, merely changes the prior distributions of the diagonal entries $\beta_{ii}$ in (1.3), which will be taken from a slightly more general family than the truncated normal. The details of a Gibbs sampler to draw from the resulting posterior are given in Section 3. Numerical examples are shown in Section 4. We conclude with a discussion in Section 5, where we emphasize in particular that the role of lower triangular loading matrices is a computational one; other ways of mapping the covariance matrix $\Sigma$ to a (unique) loading matrix $\beta$ can be considered when defining a target of inference.

Section snippets

Order-invariant prior distribution

Without any identifiability constraints, the loading matrix $\beta$ takes its values in all of $\mathbb{R}^{m \times k}$. A natural default prior would then be to take all entries $\beta_{ij}$, $i = 1, \dots, m$ and $j = 1, \dots, k$, to be independent $N(0, C_0)$ random variables; we write $\beta \sim N_{m \times k}(0, C_0\, I_m \otimes I_k)$. The spherical normal distribution $N_{m \times k}(0, C_0\, I_m \otimes I_k)$ is clearly invariant under permutation of the rows of the matrix. Hence, the induced prior distribution of $\beta\beta^\top$ and of the covariance matrix $\Sigma$ from (1.2) is invariant under simultaneous permutation
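This invariance can be checked numerically. Permuting the rows of an unconstrained draw $\beta$ commutes with forming $\beta\beta^\top$, and the row-permuted matrix has the same spherical normal law as $\beta$ itself; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
m, k, C0 = 6, 2, 1.0

# Unconstrained prior draw: all m*k entries i.i.d. N(0, C0).
beta = rng.normal(scale=np.sqrt(C0), size=(m, k))

# A relabeling of the variables acts on beta by permuting its rows.
perm = rng.permutation(m)
P = np.eye(m)[perm]

# (P beta)(P beta)^T = P (beta beta^T) P^T, and P beta has the same
# spherical normal distribution as beta, so the induced prior on
# beta beta^T (and on Sigma) is permutation invariant.
lhs = (P @ beta) @ (P @ beta).T
rhs = P @ (beta @ beta.T) @ P.T
assert np.allclose(lhs, rhs)
```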

Gibbs sampler

Consider an actual inferential setting in which we observe a sample $y_1, \dots, y_n$ that comprises $n$ independent random vectors drawn from a distribution in the $k$-factor model. Let $Y$ be the $n \times m$ matrix with the vectors $y_1, \dots, y_n$ as rows. Let $F$ be an associated $n \times k$ matrix whose rows $f_1, \dots, f_n$ are independent vectors of latent factors. The factor analysis model dictates that $Y = F\beta^\top + E$, where $E = (\varepsilon_1, \dots, \varepsilon_n)^\top$ is an $n \times m$ matrix of stochastic errors. The pairs $(f_t, \varepsilon_t)$ for $1 \le t \le n$ are independent, and in each pair $f_t \sim N_k(0,$
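The matrix form of the model can be sketched by simulation, with small hypothetical parameter values of our own choosing; the sample covariance of the rows of $Y$ then approximates $\Sigma = \Omega + \beta\beta^\top$ from (1.2):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, k = 100_000, 4, 2

# Hypothetical true parameters, for illustration only.
beta = np.array([[0.9, 0.0],
                 [0.5, 0.8],
                 [-0.3, 0.4],
                 [0.7, -0.6]])
omega = np.array([1.0, 0.8, 1.2, 0.9])  # uniquenesses omega_i^2

# Matrix form of the model: Y = F beta^T + E, with i.i.d. rows.
F = rng.normal(size=(n, k))
E = rng.normal(size=(n, m)) * np.sqrt(omega)
Y = F @ beta.T + E

# Sample covariance (data are centered) vs. implied Sigma.
Sigma = np.diag(omega) + beta @ beta.T
sample_cov = (Y.T @ Y) / n
assert np.allclose(sample_cov, Sigma, atol=0.05)
```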

Numerical experiments

We illustrate the use of the two different priors, obtained from (1.3) and (2.1), respectively, on a simulated dataset $Y$ that involves $m = 15$ variables and is of size $n = 30$. The data are drawn from the $k = 3$ factor distribution given by a randomly generated $15 \times 3$ loading matrix $\beta_0$ and vector of uniquenesses. [The displayed values of $\beta_0$ and the uniquenesses are not recoverable from this extraction.]

Conclusion

In Bayesian inference in exploratory factor analysis, priors are often specified via a lower triangular loading matrix $\beta$ whose entries are assumed to be independent normal or truncated normal. We propose a modification of this approach, replacing the truncated normal priors by other distributions such that the induced prior on the covariance matrix $\Sigma$ in (1.2) is invariant under reordering of the considered variables. Specifically, the prior distribution of $\beta\beta^\top$ is equal to that obtained when

Acknowledgments

This work was supported by the U.S. National Science Foundation (DMS-1305154) and by the University of Washington Royalty Research Fund.
