Generalized estimating equations with stabilized working correlation structure
Introduction
Generalized estimating equations (GEE), proposed by Liang and Zeger (1986), are a popular analytic tool for correlated data. A consistent estimator of the regression parameter can be obtained without correctly specifying the correlation structure of the repeatedly measured outcomes. However, the efficiency of the regression coefficient estimator increases if the working correlation matrix is close to the true one (Albert and McShane, 1995). Structured working correlations such as independent, autoregressive and exchangeable are available as built-in options in standard software. These choices keep the number of parameters in the correlation matrix manageable, and can be helpful when the sample size is small and the number of time points is large. To select a working correlation matrix from various choices, criteria such as the ‘quasi-likelihood under the independence model criterion’ (Pan, 2001) and the ‘correlation information criterion’ (Hin and Wang, 2009) have been proposed, among others (Carey and Wang, 2011, Gosho et al., 2011, Zhou et al., 2012, Westgate, 2013, Westgate, 2014). The unstructured working correlation matrix can correctly model the correlation structure and is also available as a built-in option, but its number of unknown parameters increases with the number of time points. When the sample size is small relative to the number of time points, the variability of the many nuisance parameters in the unstructured correlation matrix affects the variance of the regression parameter estimators, and Westgate (2013) proposed a method to address this problem. However, when the maximum number of time points is fixed, the asymptotic variance of the regression coefficient estimator is unaffected by the variance of the correlation estimator, and reducing the number of parameters does not lead to a gain in asymptotic efficiency of the regression coefficient estimator.
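For reference, the estimating equation behind these statements can be written in the standard form of Liang and Zeger (1986). The notation below ($K$ subjects, $D_i = \partial \mu_i / \partial \beta^{\top}$, $A_i$ the diagonal matrix of variance functions, $R(\alpha)$ the working correlation matrix and $\phi$ a dispersion parameter) is the conventional one and is assumed here; it is not taken from the notation of this article.

```latex
\sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl\{ Y_i - \mu_i(\beta) \bigr\} = 0,
\qquad
V_i = \phi \, A_i^{1/2} R(\alpha) A_i^{1/2}.
```

Because the equation remains unbiased for any fixed choice of $R(\alpha)$, consistency does not depend on the working correlation, whereas efficiency does, through the weight $V_i^{-1}$.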
Misspecification of the working correlation can lead not only to a loss of efficiency but, more seriously, to infeasibility of the GEE solutions (Qu et al., 2008, Wang and Carey, 2004). Despite these shortcomings, choosing one of the aforementioned structured working correlation matrices guarantees that the correlation matrix is positive definite. The estimated unstructured correlation matrix sometimes fails to be positive definite because of varying numbers of subunits, in which case the GEE estimates are not defined. Even when the estimated unstructured matrix is positive definite, if its minimum eigenvalue is small, the coefficient estimates can be unstable and the standard errors of the regression parameter estimates can be large (Vens and Ziegler, 2012). If the lack of positive definiteness can be resolved, the unstructured working correlation matrix becomes an attractive choice, since it improves the asymptotic variance of the regression coefficient estimator.
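To make the positive-definiteness issue concrete, the following minimal sketch checks the eigenvalues of a correlation matrix assembled from pairwise estimates. The matrix entries and the helper name `min_eigenvalue` are illustrative assumptions, not values from the article. With unbalanced data each pairwise correlation may be computed from a different subset of subjects, so nothing forces the assembled matrix to be positive definite.

```python
import numpy as np


def min_eigenvalue(matrix: np.ndarray) -> float:
    """Return the smallest eigenvalue of a symmetric matrix."""
    # eigvalsh is meant for symmetric input and returns eigenvalues
    # in ascending order, so the first entry is the minimum.
    return float(np.linalg.eigvalsh(matrix)[0])


# Hypothetical pairwise correlation estimates for three time points.
# Each off-diagonal entry is a valid correlation on its own, but the
# three values are mutually incompatible.
r_hat = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])

lam_min = min_eigenvalue(r_hat)
is_positive_definite = lam_min > 0.0
print(f"minimum eigenvalue: {lam_min:.3f}")
print(f"positive definite: {is_positive_definite}")
```

In this example the minimum eigenvalue is negative (about -0.8), so the assembled matrix is not a valid correlation matrix even though every entry lies between -1 and 1.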
Many researchers have addressed the lack of positive definiteness of the sample covariance matrix, mainly by replacing the eigenvalues of the sample covariance matrix with linear or nonlinear transforms of them (Stein, 1956, Haff, 1991, Daniels and Kass, 1999, Daniels and Kass, 2001, Ledoit and Wolf, 2004, Schäfer and Strimmer, 2005, Ledoit and Wolf, 2012, Won et al., 2013, Lam, 2016). In a regression setting with longitudinal data, Daniels and Kass (2001) obtained stabilized regression coefficient estimators by placing a normal prior on the logarithm of the sample eigenvalues. This method requires the eigenvalues of the sample covariance matrix to be positive.
Our goal is to make the unstructured correlation matrix a practical choice of working correlation structure by alleviating the problems caused by a lack of positive definiteness. To achieve this goal, we propose modifying the working correlation matrix with the linear shrinkage method proposed by Choi (2015). We show that the resulting GEE regression estimator is asymptotically equivalent to that of the original GEE. Simulation studies show that the proposed modification has advantages in cases where the minimum eigenvalue of the estimated working correlation matrix is small. Two real data examples are presented in which the standard error of the regression coefficient estimator is reduced by the proposed method.
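The idea of linear shrinkage can be illustrated with a minimal sketch. It shrinks a symmetric matrix toward a multiple of the identity just enough to lift the minimum eigenvalue to a chosen floor. This is a generic form written under stated assumptions (a shrinkage target equal to the mean eigenvalue, a user-chosen floor `eps`, and the hypothetical function name `shrink_to_floor`); it is not claimed to reproduce the exact transformation of Choi (2015).

```python
import numpy as np


def shrink_to_floor(matrix: np.ndarray, eps: float) -> np.ndarray:
    """Linearly shrink a symmetric matrix toward mu * I so that its
    minimum eigenvalue is at least eps (requires eps < mean eigenvalue)."""
    eigenvalues = np.linalg.eigvalsh(matrix)  # ascending order
    lam_min = eigenvalues[0]
    mu = eigenvalues.mean()  # equals 1 for a correlation matrix
    if eps >= mu:
        raise ValueError("eps must be smaller than the mean eigenvalue")
    if lam_min >= eps:
        return matrix.copy()  # minimum eigenvalue already above the floor
    # The eigenvalues of alpha * M + (1 - alpha) * mu * I are
    # alpha * lam + (1 - alpha) * mu. Choose alpha so that the smallest
    # one equals eps exactly.
    alpha = (mu - eps) / (mu - lam_min)
    return alpha * matrix + (1.0 - alpha) * mu * np.eye(matrix.shape[0])


# Hypothetical estimated correlation matrix with a negative eigenvalue.
r_hat = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])

r_shrunk = shrink_to_floor(r_hat, eps=0.1)
print("eigenvalues before:", np.linalg.eigvalsh(r_hat).round(3))
print("eigenvalues after: ", np.linalg.eigvalsh(r_shrunk).round(3))
```

Because the mean eigenvalue of a correlation matrix is 1, shrinking toward the identity in this way keeps the unit diagonal, so the result is still a correlation matrix. The floor `eps` is a tuning choice in this sketch.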
Section snippets
Basic notations
We denote the vector of the outcomes and the matrix of covariates for the th subject by and , respectively. We assume that the first two moments of are given by where is a regression parameter, and is a link function. The true covariance matrix of given , is denoted by . Let the maximum of be , and assume that is bounded. The
Motivation
To motivate the proposed method, we first quantify the loss in asymptotic relative efficiency (ARE) incurred by limiting the choice of working correlation structure to exchangeable and autoregressive of order 1 (AR-1). This quantification indicates the range of potential improvement in efficiency offered by the unstructured working correlation structure. Second, we quantify the prevalence of a negative minimum eigenvalue for the estimated unstructured correlation matrix.
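As a rough illustration of this kind of quantification, the sketch below computes the ARE of GEE coefficient estimators under a misspecified working correlation for a linear model with identity link and unit variances. The design (an intercept and a linear time trend over five time points, shared by all subjects), the true AR-1 correlation with parameter 0.7, the fixed (not estimated) working correlation parameters, and all function names are assumptions made for illustration; they are not the settings used in the article.

```python
import numpy as np


def ar1(n_times: int, rho: float) -> np.ndarray:
    """AR-1 correlation matrix with entries rho ** |j - k|."""
    idx = np.arange(n_times)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def exchangeable(n_times: int, rho: float) -> np.ndarray:
    """Exchangeable correlation matrix with common correlation rho."""
    return (1.0 - rho) * np.eye(n_times) + rho * np.ones((n_times, n_times))


def sandwich_cov(x: np.ndarray, r_work: np.ndarray, r_true: np.ndarray) -> np.ndarray:
    """Per-subject asymptotic (sandwich) covariance of the GEE estimator
    for an identity-link linear model when all subjects share design x."""
    w = np.linalg.inv(r_work)
    bread = np.linalg.inv(x.T @ w @ x)
    meat = x.T @ w @ r_true @ w @ x
    return bread @ meat @ bread


def are(x: np.ndarray, r_work: np.ndarray, r_true: np.ndarray) -> np.ndarray:
    """ARE per coefficient: variance under the true correlation divided
    by variance under the working correlation (values at most 1)."""
    optimal = np.diag(sandwich_cov(x, r_true, r_true))
    actual = np.diag(sandwich_cov(x, r_work, r_true))
    return optimal / actual


n_times = 5
x = np.column_stack([np.ones(n_times), np.arange(n_times)])  # intercept, time
r_true = ar1(n_times, 0.7)

results = {
    "independence": are(x, np.eye(n_times), r_true),
    "exchangeable": are(x, exchangeable(n_times, 0.5), r_true),
    "AR-1 (true)": are(x, r_true, r_true),
}
for name, values in results.items():
    print(f"{name:>13}: intercept ARE = {values[0]:.3f}, slope ARE = {values[1]:.3f}")
```

In this balanced linear setting with an intercept in the design, the independence and exchangeable working structures yield the same estimator, so their AREs coincide; the ARE equals 1 when the working correlation matches the true one.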
First, Fig. 1 shows how much
GEE using the working correlation matrix with linear shrinkage
Let be a positive number smaller than minimum of eigenvalues of true correlation matrix and and be minimum and maximum eigenvalues of the matrix , respectively. Choi (2015) proposed the linear shrinkage transformation of a symmetric matrix given by where and denote the sample mean and sample variance of the eigenvalues of , respectively. Choi
Simulation studies
We conducted simulation studies to evaluate finite sample performance of the proposed estimator in three settings. The first scenario mimics the correlation structure of the data from the randomized controlled trial presented in Section 6.1, the second scenario mimics the correlation structure of the data from the study presented in Section 6.2, and the third considers the case in which the true correlation structure is 1-dependent. In all scenarios the estimated unstructured correlation is
Real data analysis
In this section, we present two real data analyses with (i) SB-LOT data from a randomized clinical trial and (ii) Urinary data from an observational study. As in Section 5, we compared five GEE estimators, including the estimates from the proposed method.
Discussion and conclusion
We proposed a GEE that uses a working correlation matrix modified by a linear shrinkage method. The proposed method can extend the choice of working correlation structures to the unstructured one when a small or negative minimum eigenvalue of the estimated working correlation causes problems. We showed that, asymptotically, the proposed estimator has the same distributional properties as the estimator without any adjustment, but in finite samples the linear shrinkage can help stabilize the regression
Acknowledgements
The authors are grateful to Schaper and Brümmer (contact: Dr. Hans-Heinrich Henneicke-von Zepelin) for making the SB-LOT data available. The authors also thank Nicole Heßler for the SB-LOT data re-analysis.
References (27)
- et al. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. (2004)
- et al. Generalized estimating equations and regression diagnostics for longitudinal controlled clinical trials: A case study. Comput. Statist. Data Anal. (2012)
- et al. A Generalized estimating equations approach for spatially correlated binary data: Applications to the analysis of neuroimaging data. Biometrics (1995)
- et al. Working covariance model selection for generalized estimating equations. Stat. Med. (2011)
- Positive-definite correction of covariance matrix estimators via linear shrinkage (2015)
- On the use of a working correlation matrix in using generalized linear models for repeated measurements. Biometrika (1995)
- et al. Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models. J. Amer. Statist. Assoc. (1999)
- et al. Shrinkage estimators for covariance matrices. Biometrics (2001)
- et al. Criterion for the selection of a working correlation structure in the generalized estimating equation approach for longitudinal balanced data. Comm. Statist. - Theory Methods (2011)
- The variational form of certain Bayes estimators. Ann. Statist. (1991)
- Working-correlation-structure identification in generalized estimating equations. Stat. Med.
- Nonparametric eigenvalue-regularized precision or covariance matrix estimator. Ann. Statist.
- Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Statist.