Parametrising correlation matrices

https://doi.org/10.1016/j.jmva.2020.104619

Abstract

Correlation matrices are the sub-class of positive definite real matrices with all entries on the diagonal equal to unity. Earlier work has exhibited a parametrisation of the corresponding Cholesky factorisation in terms of partial correlations, and also in terms of hyperspherical co-ordinates. We show how the two are related, starting from the definition of the partial correlations in terms of the Schur complement. We extend this to the generalisation of correlation matrices to the cases of complex and quaternion entries. As in the real case, we show how the hyperspherical parametrisation leads naturally to a distribution on the space of correlation matrices $\{R\}$ with probability density function proportional to $(\det R)^a$. For certain $a$, a construction of random correlation matrices realising this distribution is given in terms of rectangular standard Gaussian matrices.

Introduction

In applications of matrices, there are many settings in which the rows and columns have distinct meaning. For example, in a survey of $n$ people, giving a numerical score between 0 and 5 for their rating of $N$ different movies, there is an $n \times N$ matrix $X$ – the data matrix – such that the rows correspond to the people and the columns to the movies. The $k$th column $X^{(k)}$ is then the vector of scores given for movie $k$, and its $j$th entry is the score given by person $j$. Let $\mu_k$ denote the average of the scores in column $k$, and let $1_n$ denote the $n \times 1$ vector with all entries equal to 1. The recentred, zero mean score vectors are then specified as $Y^{(k)} = X^{(k)} - \mu_k 1_n$, and the recentred data matrix is $Y = [\,Y^{(1)} \; Y^{(2)} \; \cdots \; Y^{(N)}\,]$.
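As a concrete illustration of this recentring step, the short NumPy sketch below builds a small score matrix and subtracts the column means; the sizes $n=6$, $N=4$ and the scores themselves are invented for the example and are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 6, 4  # hypothetical sizes: 6 people rating 4 movies

# Data matrix X: rows = people, columns = movies, integer scores in [0, 5].
X = rng.integers(0, 6, size=(n, N)).astype(float)

# Recentre each column: Y^(k) = X^(k) - mu_k * 1_n.
mu = X.mean(axis=0)   # mu_k = mean score of movie k
Y = X - mu            # broadcasting subtracts mu_k from column k

# Every recentred column now has zero mean.
print(np.allclose(Y.mean(axis=0), 0.0))
```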

The sample covariance matrix $S$ is specified in terms of $Y$ as
$$S = \frac{1}{n-1} Y^T Y. \tag{1.2}$$
Note that $S$ is an $N \times N$ symmetric matrix, and its entry in row $j$ and column $k$ gives the sample covariance between the scores of movies $j$ and $k$. In the case that the rows and/or columns of $Y$ are drawn from a vector Gaussian distribution with given covariance, (1.2) is referred to as a Wishart matrix. For such random matrices, a vast number of theoretical results have been assembled, and applied settings identified, since the pioneering paper [23]; see, e.g., [16].
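A minimal sketch of the estimator (1.2), checked against NumPy's own `np.cov`; the data here is synthetic Gaussian rather than survey scores.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 50, 5
Y = rng.standard_normal((n, N))
Y -= Y.mean(axis=0)            # recentred data matrix

S = Y.T @ Y / (n - 1)          # sample covariance, N x N

# S is symmetric and agrees with NumPy's built-in estimator.
assert np.allclose(S, S.T)
assert np.allclose(S, np.cov(Y, rowvar=False))
```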

Natural from the viewpoint of data analysis is to further refine (1.2) by forming the sample correlation matrix
$$R = \left[ \frac{(Y^{(j)})^T Y^{(k)}}{\|Y^{(j)}\| \, \|Y^{(k)}\|} \right]_{j,k=1}^N =: [\rho_{jk}]_{j,k=1}^N. \tag{1.3}$$
Here, as well as the original score vectors being centred by subtracting their mean, each has been scaled to correspond to a unit vector. One sees immediately that the entries of $R$ are all equal to unity on the diagonal, while on the off diagonal, in accordance with the Cauchy–Schwarz inequality, they all have modulus less than or equal to 1. Moreover, the decomposition
$$R = D Y^T Y D, \quad \text{where } D = \mathrm{diag}\left( \frac{1}{\|Y^{(1)}\|}, \ldots, \frac{1}{\|Y^{(N)}\|} \right), \tag{1.4}$$
shows that $R$, like $S$, is positive definite but now with bounded entries $|\rho_{jk}| \leq 1$.
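The construction (1.4), together with the unit diagonal, the Cauchy–Schwarz bound, and positive definiteness, can be checked numerically; the sketch below uses synthetic Gaussian data with made-up sizes.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 40, 4
Y = rng.standard_normal((n, N))
Y -= Y.mean(axis=0)                          # recentred data matrix

D = np.diag(1.0 / np.linalg.norm(Y, axis=0)) # D = diag(1/||Y^(k)||)
R = D @ Y.T @ Y @ D                          # sample correlation matrix

assert np.allclose(np.diag(R), 1.0)          # unit diagonal
assert np.all(np.abs(R) <= 1.0 + 1e-12)      # |rho_jk| <= 1 (Cauchy-Schwarz)
assert np.all(np.linalg.eigvalsh(R) > 0)     # positive definite
```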

This latter feature, although making some aspects of theoretical analysis more difficult (e.g., studies of the eigenvalues), allows for a distinct set of questions to be posed. For example, in the case of correlation matrices the volume of the natural embedding in $\mathbb{R}^{N(N-1)/2}$ (referred to as an elliptope [13] and to be denoted $\mathcal{R}_N$) is well defined. Knowledge of this volume allows an answer to the question: if the strictly upper triangular entries of (1.3) are chosen uniformly at random in the range $(-1,1)$, what is the probability that $R$ is a valid correlation matrix (i.e., is positive definite) [6]?
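This probability can be estimated by Monte Carlo. The sketch below does so for $N = 3$, where the exact answer $\mathrm{vol}\,\mathcal{R}_3 / 2^3 = \pi^2/16 \approx 0.617$ follows from the volume formula (1.6) quoted below; the trial count is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
N, trials = 3, 20000
hits = 0
iu = np.triu_indices(N, k=1)   # strictly upper triangular positions
for _ in range(trials):
    R = np.eye(N)
    vals = rng.uniform(-1.0, 1.0, size=len(iu[0]))
    R[iu] = vals                      # fill upper triangle uniformly in (-1, 1)
    R[(iu[1], iu[0])] = vals          # symmetrise
    if np.all(np.linalg.eigvalsh(R) > 0):
        hits += 1                     # R is a valid correlation matrix

p = hits / trials
print(p)   # should be close to pi^2/16 ~ 0.6169
```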

A direct approach to this question requires a parametrisation of the space of correlation matrices. Two such parametrisations are available in the literature, both applying to the lower triangular matrix $L$ in the Cholesky factorisation
$$R = L L^T. \tag{1.5}$$
One of these uses hyperspherical co-ordinates in $\mathbb{R}^j$ to parametrise row $j$ ($j \in \{1, \ldots, N\}$) [6], [18], [19], [20], [21], and the other makes use of a sequence of partial correlations [10], [11], [14]. The latter method yielded the first direct computation of the volume [11]
$$\mathrm{vol}\, \mathcal{R}_N = \prod_{j=2}^{N} 2^{(j-1)^2} \left( B\!\left( \tfrac{j}{2}, \tfrac{j}{2} \right) \right)^{j-1}, \quad \text{where } B(a,b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}. \tag{1.6}$$
It is only in the last few years that this same formula (in equivalent forms) was derived using the hyperspherical parametrisation [6], [19].
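The formula (1.6) is straightforward to evaluate. The sketch below checks it against the two cases where the volume is elementary: $\mathrm{vol}\,\mathcal{R}_2 = 2$ (the interval $(-1,1)$) and the known $3 \times 3$ elliptope volume $\mathrm{vol}\,\mathcal{R}_3 = \pi^2/2$.

```python
from math import gamma, pi

def B(a, b):
    """Euler beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

def vol_RN(N):
    """Volume of the elliptope of N x N correlation matrices, eq. (1.6)."""
    v = 1.0
    for j in range(2, N + 1):
        v *= 2 ** ((j - 1) ** 2) * B(j / 2, j / 2) ** (j - 1)
    return v

print(vol_RN(2))   # 2.0: the interval (-1, 1)
print(vol_RN(3))   # pi^2 / 2 ~ 4.9348
```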

Indirect computations of $\mathrm{vol}\,\mathcal{R}_N$ are also possible. Such a method, giving a formula equivalent to (1.6), actually predates the work [11]; this is due to Wong et al. [24]. As implied by a comment in [19, 2nd paragraph of Introduction], the same result, again deduced indirectly, follows from the still earlier work of Muirhead [16, p. 148].

The circumstances just described suggest a number of follow-up problems. The most immediate is to relate the hyperspherical and partial correlation parametrisations. To give a satisfactory account on this point, a self-contained theory relating to the latter must be developed. Moreover, the hyperspherical parametrisation gives a different viewpoint on known results [16] for the marginal distribution of the elements of (1.3), when chosen uniformly at random, and similarly for the moments of $\det R$.

The literature cited above is restricted to the case of real entries. Complex valued covariance matrices, and thus complex valued correlation matrices, are well motivated from the viewpoint of their application in wireless communication; see for example [22]. Thus, in addition to addressing the above problems when the correlation matrices have real entries, we consider too the case of complex (and quaternion) entries.

Section snippets

Cholesky factorisation and parametrisations

Let $R$ be an $N \times N$ positive definite matrix with all diagonal entries equal to unity, as is consistent with (1.3). Introduce the Cholesky factorisation (1.5) with
$$L = \begin{pmatrix} \ell_{11} & 0 & 0 & \cdots & 0 \\ \ell_{21} & \ell_{22} & 0 & \cdots & 0 \\ \ell_{31} & \ell_{32} & \ell_{33} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \ell_{N1} & \ell_{N2} & \ell_{N3} & \cdots & \ell_{NN} \end{pmatrix}.$$
Well established theory (see, e.g., [9]) gives that this is unique for $R$ positive definite subject to the requirement that $\ell_{jj} > 0$, $j \in \{1, \ldots, N\}$.
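The factorisation and the uniqueness convention $\ell_{jj} > 0$ can be illustrated with NumPy, whose `linalg.cholesky` returns exactly this lower triangular factor; the $3 \times 3$ correlation matrix below is a made-up example.

```python
import numpy as np

# Hypothetical 3x3 correlation matrix: symmetric, unit diagonal, positive definite.
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])

L = np.linalg.cholesky(R)   # lower triangular factor with R = L L^T

assert np.allclose(L @ L.T, R)     # reproduces R
assert np.allclose(L, np.tril(L))  # L is lower triangular
assert np.all(np.diag(L) > 0)      # uniqueness convention l_jj > 0
```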

The fact that the diagonal entries in (1.3) are all equal to unity implies that the sum of the squares of the non-zero entries along each row $j$ of $L$ is equal to unity, $\sum_{k=1}^{j} \ell_{jk}^2 = 1$.
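This unit row norm property follows since the $(j,j)$ entry of $LL^T$ is exactly $\sum_k \ell_{jk}^2$, and is easy to confirm numerically; the data matrix below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)
Y = rng.standard_normal((30, 5))
Y /= np.linalg.norm(Y, axis=0)   # unit-norm columns

R = Y.T @ Y                      # correlation-type matrix with unit diagonal
L = np.linalg.cholesky(R)

# Unit diagonal of R forces each row of L to be a unit vector.
row_norms_sq = np.sum(L**2, axis=1)
assert np.allclose(row_norms_sq, 1.0)
```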

The Jacobian, hyperspherical parametrisation of determinant and some consequences

Let $|J(\{\rho_{jk}\} \to \{\theta_{jk}\})|$ denote the Jacobian (absolute value of the determinant of the Jacobian matrix) for the change of variables $\{\rho_{jk}\} \to \{\theta_{jk}\}$ as implied by (1.3), (1.5), (2.4). It is shown in [19] and [6] that upon ordering the entries $\{\rho_{21}, \rho_{31}, \rho_{32}, \ldots\}$, i.e., reading sequentially along rows of the strictly lower triangular portion of $R$, and similarly ordering the angles, the Jacobian matrix is lower triangular. Its determinant and thus the Jacobian can be read off as equal to $|J(\{\rho_{jk}\} \to \{\theta_{jk}\})| = \prod_{k=1}^{N-1} \prod_{j=\ldots}$
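Although the snippet truncates the Jacobian formula itself, the hyperspherical parametrisation behind it is simple to state: row $j$ of $L$ is a point on the unit sphere in $\mathbb{R}^j$, with co-ordinates built from cosines and running products of sines of angles $\theta_{jk}$, $k < j$. A sketch, with the angle range $\theta_{jk} \in (0, \pi)$ assumed so that $\ell_{jj} > 0$:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 4
# Angles theta_{jk} for 1 <= k < j <= N, each drawn from (0, pi).
theta = {(j, k): rng.uniform(0.0, np.pi)
         for j in range(2, N + 1) for k in range(1, j)}

L = np.zeros((N, N))
L[0, 0] = 1.0
for j in range(2, N + 1):
    s = 1.0                                   # running product of sines
    for k in range(1, j):
        L[j - 1, k - 1] = s * np.cos(theta[(j, k)])
        s *= np.sin(theta[(j, k)])
    L[j - 1, j - 1] = s                       # l_jj > 0 since each sine > 0

R = L @ L.T
assert np.allclose(np.sum(L**2, axis=1), 1.0)  # each row of L is a unit vector
assert np.allclose(np.diag(R), 1.0)            # so R has unit diagonal
assert np.all(np.linalg.eigvalsh(R) > 0)       # and is a valid correlation matrix
```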

Random correlation matrices with complex or quaternion entries

A correlation matrix can be constructed out of a data matrix $Y$ with complex entries by replacing $Y^T Y$ in (1.4) by $\bar{Y}^T Y$. As mentioned in the Introduction, this is of interest in studies in wireless communications engineering. Of less practical interest, but still of theoretical relevance within random matrix theory (see, e.g., [7]), is to form correlation matrices out of data matrices with entries having $2 \times 2$ block structure
$$\begin{pmatrix} z & w \\ -\bar{w} & \bar{z} \end{pmatrix}.$$
Such $2 \times 2$ matrices form a representation of quaternions,
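Both generalisations can be illustrated briefly: the complex construction with the conjugate transpose in place of the transpose, and the $2 \times 2$ representation of a quaternion, whose determinant equals the squared quaternion norm $|z|^2 + |w|^2$. The sizes and entries below are invented.

```python
import numpy as np

rng = np.random.default_rng(6)
n, N = 30, 3
Y = rng.standard_normal((n, N)) + 1j * rng.standard_normal((n, N))
Y /= np.linalg.norm(Y, axis=0)     # unit-norm complex columns

R = Y.conj().T @ Y                 # conjugate transpose replaces Y^T Y

assert np.allclose(np.diag(R).real, 1.0)  # unit diagonal
assert np.allclose(R, R.conj().T)         # Hermitian
assert np.all(np.linalg.eigvalsh(R) > 0)  # positive definite

# 2x2 representation of the quaternion z + w j, for sample values z, w.
z, w = 1 + 2j, 3 - 1j
Q = np.array([[z, w], [-np.conj(w), np.conj(z)]])
assert np.isclose(np.linalg.det(Q), abs(z)**2 + abs(w)**2)
```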

CRediT authorship contribution statement

Peter J. Forrester: Conceptualization.

Acknowledgments

This work is part of a research program supported by the Australian Research Council (ARC) through the ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS). PJF also acknowledges partial support from ARC grant DP170102028, and JZ acknowledges the support of a Melbourne postgraduate award and an ACEMS top-up scholarship.

References (24)

  • P. Diaconis et al., Hurwitz and the origins of random matrix theory in mathematics, Random Matrices Theory Appl. (2017)

  • S. Eastman et al., The volume of the spatial region corresponding to $n \times n$ correlation matrices, Amer. Math. Monthly (2016)