Generalized estimating equations with stabilized working correlation structure

https://doi.org/10.1016/j.csda.2016.08.016

Abstract

Generalized estimating equations (GEE), proposed by Liang and Zeger (1986), yield a consistent estimator of the regression parameter without correctly specifying the correlation structure of the repeatedly measured outcomes. It is well known that the efficiency of the regression coefficient estimator increases when the working correlation is correctly specified, so the unstructured correlation could be a good candidate. However, because the estimated unstructured correlation matrix may fail to be positive definite in the unbalanced case, practitioners tend to choose independent, autoregressive or exchangeable matrices as the working correlation structure. Our goal is to broaden the practical choices of working correlation structure to the unstructured correlation matrix, or any other matrix, by proposing a GEE with a stabilized working correlation matrix obtained via a linear shrinkage method in which the minimum eigenvalue is forced to be bounded below by a small positive number. We show that the resulting regression estimator of GEE is asymptotically equivalent to that of the original GEE. Simulation studies show that the proposed modification can stabilize the variance of the GEE regression estimator with unstructured working correlation and improve efficiency over popular choices of working correlation. Two real data examples are presented in which the standard error of the regression coefficient estimator can be reduced using the proposed method.

Introduction

Generalized estimating equations (GEE), proposed by Liang and Zeger (1986), have been a popular analytic tool for correlated data. A consistent estimator of the regression parameter can be obtained without correctly specifying the correlation structure of the repeatedly measured outcomes. However, the efficiency of the regression coefficient estimator increases when the working correlation matrix is close to the true one (Albert and McShane, 1995). Structured working correlations such as independent, autoregressive and exchangeable are available as built-in options in statistical software. These choices keep the number of correlation parameters manageable and can be helpful when the sample size is small and the number of time points is large. To select a working correlation matrix among the various choices, criteria such as the ‘quasi-likelihood under the independence model criterion’ (Pan, 2001) and the ‘correlation information criterion’ (Hin and Wang, 2009) have been proposed, among others (Carey and Wang, 2011, Gosho et al., 2011, Zhou et al., 2012, Westgate, 2013, Westgate, 2014). The unstructured working correlation matrix can represent any correlation structure and is also available in software, but its number of unknown parameters, q(q−1)/2 for q time points, grows quadratically with the number of time points. When the sample size is small relative to the number of time points, the variability of these many nuisance parameters affects the variance of the regression parameter estimators, and Westgate (2013) proposed a method to address this problem. However, when the maximum number of time points is fixed, the asymptotic variance of the regression coefficient estimator is unaffected by the variance of the correlation estimator, so reducing the number of correlation parameters does not yield a gain in the asymptotic efficiency of the regression coefficient estimator.
Misspecification of the working correlation can lead not only to a loss of efficiency but, more seriously, to infeasibility of the GEE solutions (Qu et al., 2008, Wang and Carey, 2004). Despite these shortcomings, choosing one of the aforementioned structured working correlation matrices guarantees that the correlation matrix is positive definite. In contrast, the estimated unstructured correlation matrix sometimes fails to be positive definite when subjects have varying numbers of subunits, in which case the GEE estimates are not defined. Even when the estimated unstructured matrix is positive definite, a small minimum eigenvalue can make the coefficient estimate unstable and inflate the standard errors of the regression parameter estimates (Vens and Ziegler, 2012). If the lack of positive definiteness can be resolved, the unstructured working correlation matrix becomes an attractive choice, since it can reduce the asymptotic variance of the regression coefficient estimator.
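To see how an estimated unstructured correlation can lose positive definiteness in an unbalanced design, consider the toy sketch below. It assumes a simple pairwise moment estimator, in which entry (j, k) averages products of standardized residuals over the subjects observed at both time points; this may differ in detail from the estimator used in the paper, and the residual values are hypothetical, constructed only for illustration.

```python
import numpy as np

nan = np.nan
# Hypothetical standardized residuals: rows are subjects, columns are time points.
# NaN marks a visit the subject did not attend, so the design is unbalanced.
e = np.array([
    [ 1.0,  1.0,  nan],   # these subjects are observed at times 1 and 2 only
    [-1.0, -1.0,  nan],
    [ nan,  1.0,  1.0],   # times 2 and 3 only
    [ nan, -1.0, -1.0],
    [ 1.0,  nan, -1.0],   # times 1 and 3 only
    [-1.0,  nan,  1.0],
])


def pairwise_unstructured(e):
    """Pairwise moment estimate of an unstructured correlation matrix.

    Entry (j, k) is the mean of e_ij * e_ik over subjects observed at both
    times, so different entries are based on different subsets of subjects.
    """
    q = e.shape[1]
    R = np.eye(q)
    for j in range(q):
        for k in range(j + 1, q):
            prod = e[:, j] * e[:, k]
            R[j, k] = R[k, j] = prod[~np.isnan(prod)].mean()
    return R


R_hat = pairwise_unstructured(e)
eigenvalues = np.linalg.eigvalsh(R_hat)  # ascending order
print(R_hat)
print(eigenvalues)  # approximately [-1, 2, 2]: not positive definite
```

Every entry of `R_hat` lies in [−1, 1], but because no subject contributes to all three off-diagonal entries, nothing forces the assembled matrix to be positive semidefinite.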

Many researchers have addressed the lack of positive definiteness of the sample covariance matrix, mainly by replacing its eigenvalues with linear or nonlinear transforms of them (Stein, 1956, Haff, 1991, Daniels and Kass, 1999, Daniels and Kass, 2001, Ledoit and Wolf, 2004, Schäfer and Strimmer, 2005, Ledoit and Wolf, 2012, Won et al., 2013, Lam, 2016). In a regression setting with longitudinal data, Daniels and Kass (2001) obtained stabilized estimators of the regression coefficients by placing a normal prior on the logarithm of the sample eigenvalues. Because the logarithm is undefined otherwise, this method requires that the eigenvalues of the sample covariance matrix be positive.
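As a minimal illustration of replacing eigenvalues by a linear transform, the sketch below shrinks a hypothetical indefinite correlation-like matrix toward the identity with a fixed weight `w`. Because R and I share eigenvectors, each eigenvalue λ becomes (1 − w)λ + w, and a unit diagonal is preserved. The weight is an arbitrary assumption, not the data-driven weight of Ledoit and Wolf (2004), and `shrink_to_identity` is a hypothetical helper name.

```python
import numpy as np


def shrink_to_identity(R, w):
    """Return (1 - w) * R + w * I for a fixed weight 0 <= w <= 1."""
    return (1.0 - w) * R + w * np.eye(R.shape[0])


# Hypothetical "correlation" estimate with unit diagonal but one negative eigenvalue.
R = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.9],
              [0.2, 0.9, 1.0]])
w = 0.3  # arbitrary fixed weight, not a data-driven choice
before = np.linalg.eigvalsh(R)
after = np.linalg.eigvalsh(shrink_to_identity(R, w))
print(before)  # smallest is about -0.18
print(after)   # each eigenvalue mapped to (1 - w) * lambda + w; smallest about 0.18
```

A fixed weight is the simplest case; the methods cited above differ mainly in how the transform of the eigenvalues is chosen.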

Our goal is to broaden the practical choices of working correlation structure to the unstructured correlation matrix by alleviating the problems caused by lack of positive definiteness. To achieve this goal, we propose to modify the working correlation matrix using the linear shrinkage method proposed by Choi (2015). We show that the resulting regression estimator of GEE is asymptotically equivalent to that of the original GEE. Simulation studies show that the proposed modification has advantages when the minimum eigenvalue of the estimated working correlation matrix is small. Two real data examples are presented in which the standard error of the regression coefficient estimator is reduced using the proposed method.


Basic notations

We denote the $n_i\times 1$ vector of outcomes and the $n_i\times p$ matrix of covariates for the $i$th subject ($i=1,\dots,K$) by $y_i=(y_{i1},y_{i2},\dots,y_{in_i})^T$ and $X_i=(x_{i1},x_{i2},\dots,x_{in_i})^T$, respectively. We assume that the first two moments of $y_{ij}$ are given by $E(y_{ij}\mid x_{ij})=\mu_{ij}=g(\eta_{ij})=g(x_{ij}^T\beta)$ and $\mathrm{Var}(y_{ij}\mid x_{ij})=\phi\,a(\mu_{ij})$, where $\beta$ is a $p\times 1$ regression parameter and $g^{-1}(\cdot)$ is a link function. The true $n_i\times n_i$ covariance matrix of $y_i$ given $X_i$, $\mathrm{Var}(y_i\mid X_i)$, is denoted by $\Omega_i$. Let the maximum of the $n_i$ be $q$, and assume that $q$ is bounded. The
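Under this notation, the standard GEE of Liang and Zeger (1986) solve $\sum_{i=1}^K D_i^T V_i^{-1}(y_i-\mu_i)=0$, with $D_i=\partial\mu_i/\partial\beta^T$, working covariance $V_i=\phi A_i^{1/2}R_i(\alpha)A_i^{1/2}$ and $A_i=\mathrm{diag}\{a(\mu_{i1}),\dots,a(\mu_{in_i})\}$. The sketch below assumes the identity link with $a(\mu)=1$, a fixed working correlation (no re-estimation of $\alpha$), and that $R_i(\alpha)$ is the submatrix of a $q\times q$ matrix $R(\alpha)$ for the visits subject $i$ actually attended; the equation then has a closed-form generalized least squares solution. The function name `gee_identity` and the simulated data are hypothetical.

```python
import numpy as np


def gee_identity(X_list, y_list, t_list, R):
    """Solve sum_i X_i^T R_i^{-1} (y_i - X_i beta) = 0 for beta.

    With the identity link and a(mu) = 1, D_i = X_i and V_i = phi * R_i, so the
    dispersion phi cancels. t_list[i] holds the 0-based visit indices of
    subject i, and R_i is the matching submatrix of the q x q matrix R.
    """
    p = X_list[0].shape[1]
    H = np.zeros((p, p))
    s = np.zeros(p)
    for X, y, t in zip(X_list, y_list, t_list):
        R_i = R[np.ix_(t, t)]
        # solve() fails or becomes unstable if R_i is singular or ill-conditioned
        H += X.T @ np.linalg.solve(R_i, X)
        s += X.T @ np.linalg.solve(R_i, y)
    return np.linalg.solve(H, s)


# Hypothetical unbalanced data: K subjects with 2 to q visits each.
rng = np.random.default_rng(0)
q, K, beta_true = 4, 30, np.array([1.0, 0.5])
X_list, y_list, t_list = [], [], []
for _ in range(K):
    t = np.arange(rng.integers(2, q + 1))          # first n_i visits observed
    X = np.column_stack([np.ones(len(t)), t])      # intercept and visit index
    X_list.append(X)
    y_list.append(X @ beta_true + rng.normal(size=len(t)))
    t_list.append(t)

R_exch = 0.6 * np.eye(q) + 0.4 * np.ones((q, q))  # exchangeable, alpha = 0.4
print(gee_identity(X_list, y_list, t_list, np.eye(q)))  # independence working correlation
print(gee_identity(X_list, y_list, t_list, R_exch))
```

With the independence working correlation the result coincides with ordinary least squares on the stacked data; in a full GEE fit, $\alpha$ and $\phi$ would be re-estimated from residuals and the steps iterated.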

Motivation

To motivate the proposed method, we first quantify the loss of asymptotic relative efficiency (ARE) incurred by limiting the choice of working correlation structure to exchangeable and first-order autoregressive (AR-1). This quantification shows the range of potential efficiency gains from using the unstructured working correlation structure. Second, we quantify the prevalence of a negative minimum eigenvalue of the estimated unstructured correlation matrix.
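One way to compute such an ARE, sketched below under simplifying assumptions, is through the sandwich covariance of the GEE estimator. We assume the identity link, unit variances, a balanced design in which every subject shares the same covariate matrix X, and working correlation parameters fixed at hypothetical limiting values; the true correlation Ω is a hypothetical matrix that is neither exchangeable nor AR-1. With working correlation R, the asymptotic covariance is proportional to $B^{-1}MB^{-1}$ with $B=X^TR^{-1}X$ and $M=X^TR^{-1}\Omega R^{-1}X$, which reduces to $(X^T\Omega^{-1}X)^{-1}$ when $R=\Omega$. The ARE of a working structure for one coefficient is then its variance under $R=\Omega$ divided by its variance under R. The paper's exact ARE definition is not shown in this excerpt.

```python
import numpy as np


def sandwich_cov(X, Omega, R):
    """Asymptotic covariance of the GEE estimator per subject (common design X)."""
    W = np.linalg.inv(R)
    bread = np.linalg.inv(X.T @ W @ X)
    meat = X.T @ W @ Omega @ W @ X
    return bread @ meat @ bread


def exchangeable(q, rho):
    return (1 - rho) * np.eye(q) + rho * np.ones((q, q))


def ar1(q, rho):
    lag = np.abs(np.subtract.outer(np.arange(q), np.arange(q)))
    return rho ** lag


q = 5
X = np.column_stack([np.ones(q), np.arange(q)])           # intercept + time trend
Omega = 0.5 * exchangeable(q, 0.3) + 0.5 * ar1(q, 0.8)    # hypothetical truth
v_opt = sandwich_cov(X, Omega, Omega)[1, 1]               # working = truth

for name, R in [("exchangeable", exchangeable(q, 0.5)), ("AR-1", ar1(q, 0.7))]:
    are = v_opt / sandwich_cov(X, Omega, R)[1, 1]
    print(f"ARE of {name} working correlation for the slope: {are:.3f}")
```

Since the GEE estimator with a fixed working correlation is linear and unbiased in this setting, its variance cannot fall below the generalized least squares variance, so each ARE lies in (0, 1].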

First, Fig. 1 shows how much

GEE using the working correlation matrix with linear shrinkage

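Let $\epsilon$ be a positive number smaller than the minimum eigenvalue of the true correlation matrix, and let $\gamma_{\min}(A)$ and $\gamma_{\max}(A)$ be the minimum and maximum eigenvalues of a matrix $A$, respectively. Choi (2015) proposed the linear shrinkage transformation of a symmetric matrix given by

$$\tilde R(\alpha,\epsilon)=t\,R(\alpha)+(1-t)\,\nu I_q,$$

where

$$t=\begin{cases}1 & \text{if } \gamma_{\min}\{R(\alpha)\}\ge\epsilon,\\[4pt] \dfrac{\nu-\epsilon}{\nu-\gamma_{\min}\{R(\alpha)\}} & \text{if } \gamma_{\min}\{R(\alpha)\}<\epsilon,\end{cases}\qquad \nu=\max\left\{\frac{\gamma_{\max}\{R(\alpha)\}+\gamma_{\min}\{R(\alpha)\}}{2},\; M+\frac{V}{M-\gamma_{\min}\{R(\alpha)\}}\right\},$$

and $M$ and $V$ denote the sample mean and sample variance of the eigenvalues of $R(\alpha)$, respectively. Choi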
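A direct transcription of this transformation into NumPy might look as follows. Because $\tilde R$ has the same eigenvectors as $R(\alpha)$, each eigenvalue $\gamma$ becomes $t\gamma+(1-t)\nu$, and the choice $t=(\nu-\epsilon)/(\nu-\gamma_{\min})$ makes the smallest one exactly $\epsilon$. Two details are assumptions, since the excerpt does not pin them down: the sample variance $V$ uses the $n-1$ denominator (`ddof=1`), and the result is returned without rescaling its diagonal back to 1. The function name `linear_shrinkage` is hypothetical.

```python
import numpy as np


def linear_shrinkage(R, eps):
    """Linear shrinkage t*R + (1 - t)*nu*I whose minimum eigenvalue is eps.

    R is a symmetric q x q working correlation estimate; eps > 0 should be
    smaller than the minimum eigenvalue of the true correlation matrix.
    """
    gamma = np.linalg.eigvalsh(R)  # eigenvalues in ascending order
    g_min, g_max = gamma[0], gamma[-1]
    if g_min >= eps:
        return R.copy()  # t = 1: leave a well-conditioned R unchanged
    M = gamma.mean()
    V = gamma.var(ddof=1)  # assumption: sample variance with n - 1 denominator
    nu = max((g_max + g_min) / 2, M + V / (M - g_min))
    t = (nu - eps) / (nu - g_min)
    return t * R + (1 - t) * nu * np.eye(R.shape[0])


# Hypothetical indefinite estimate with eigenvalues -1, 2, 2.
R_hat = np.array([[1.0, 1.0, -1.0],
                  [1.0, 1.0, 1.0],
                  [-1.0, 1.0, 1.0]])
R_tilde = linear_shrinkage(R_hat, eps=0.05)
print(np.linalg.eigvalsh(R_tilde))  # approximately [0.05, 2.15, 2.15]
```

Here $M=1$, $V=3$, so $\nu=2.5$ and $t=0.7$. The diagonal of `R_tilde` equals 1.45 rather than 1; whether and how the paper rescales it is not shown in this excerpt.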

Simulation studies

We conducted simulation studies to evaluate the finite-sample performance of the proposed estimator in three scenarios. The first scenario mimics the correlation structure of the data from the randomized controlled trial presented in Section 6.1, the second mimics the correlation structure of the data from the study presented in Section 6.2, and the third considers a true correlation structure that is 1-dependent. In all scenarios the estimated unstructured correlation is
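A rough sketch of a 1-dependent setting follows, as one way to see how often an estimated unstructured correlation fails to be positive definite. Everything here is an assumption for illustration and not the paper's settings: Gaussian standardized residuals with known unit variance, monotone dropout in which subject i is observed at its first $n_i$ visits with $n_i-1\sim\mathrm{Binomial}(q-1,\,1-p_{\text{drop}})$, a simple pairwise moment estimator, and the values of K, q, ρ and the seed. A tridiagonal 1-dependent matrix has eigenvalues $1+2\rho\cos\{k\pi/(q+1)\}$, $k=1,\dots,q$, so it is itself positive definite only when $|\rho|<1/\{2\cos(\pi/(q+1))\}$.

```python
import numpy as np


def one_dependent(q, rho):
    """Tridiagonal correlation: rho at lag 1, zero beyond."""
    R = np.eye(q)
    i = np.arange(q - 1)
    R[i, i + 1] = R[i + 1, i] = rho
    return R


def simulate_residuals(K, R, rng, p_drop=0.3):
    """Gaussian residuals with correlation R and hypothetical monotone dropout."""
    q = R.shape[0]
    e = rng.multivariate_normal(np.zeros(q), R, size=K)
    n_obs = 1 + rng.binomial(q - 1, 1 - p_drop, size=K)  # visits per subject
    e[np.arange(q)[None, :] >= n_obs[:, None]] = np.nan   # drop later visits
    return e


def pairwise_unstructured(e):
    """Entry (j, k): mean of e_ij * e_ik over subjects seen at both times."""
    q = e.shape[1]
    R = np.eye(q)
    for j in range(q):
        for k in range(j + 1, q):
            prod = e[:, j] * e[:, k]
            ok = ~np.isnan(prod)
            R[j, k] = R[k, j] = prod[ok].mean() if ok.any() else np.nan
    return R


def share_not_pd(K, q, rho, n_rep, seed=0):
    """Fraction of replicates whose estimate has a non-positive eigenvalue."""
    rng = np.random.default_rng(seed)
    R_true = one_dependent(q, rho)
    n_bad = n_valid = 0
    for _ in range(n_rep):
        R_hat = pairwise_unstructured(simulate_residuals(K, R_true, rng))
        if np.isnan(R_hat).any():
            continue  # some pair of visits was never observed together
        n_valid += 1
        n_bad += int(np.linalg.eigvalsh(R_hat)[0] <= 0)
    return n_bad / n_valid if n_valid else float("nan")


print(share_not_pd(K=20, q=5, rho=0.4, n_rep=200))
```

Replicates in which some pair of visits is never jointly observed are skipped, since the unstructured estimate is then undefined rather than merely indefinite.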

Real data analysis

In this section, we present two real data analyses: (i) the SB–LOT data from a randomized clinical trial and (ii) the Urinary data from an observational study. As in Section 5, we compare five GEE estimators, including the estimate from the proposed method.

Discussion and conclusion

We proposed a GEE that uses a working correlation matrix modified by linear shrinkage. The proposed method can broaden the choice of working correlation structure to the unstructured one when a small or negative minimum eigenvalue of the estimated working correlation causes problems. We showed that, asymptotically, the proposed estimator has the same distributional properties as the estimator without any adjustment, but in finite samples the linear shrinkage can help stabilize the regression

Acknowledgements

The authors are grateful to Schaper and Brümmer (contact: Dr. Hans-Heinrich Henneicke-von Zepelin) for making the SB-LOT data available. The authors also thank Nicole Heßler for the SB-LOT data re-analysis.

References (27)

  • O. Ledoit et al. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. (2004)
  • M. Vens et al. Generalized estimating equations and regression diagnostics for longitudinal controlled clinical trials: A case study. Comput. Statist. Data Anal. (2012)
  • P.S. Albert et al. A generalized estimating equations approach for spatially correlated binary data: Applications to the analysis of neuroimaging data. Biometrics (1995)
  • V. Carey et al. Working covariance model selection for generalized estimating equations. Stat. Med. (2011)
  • Y.-G. Choi. Positive-definite correction of covariance matrix estimators via linear shrinkage (2015)
  • M. Crowder. On the use of a working correlation matrix in using generalized linear models for repeated measurements. Biometrika (1995)
  • M.J. Daniels et al. Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models. J. Amer. Statist. Assoc. (1999)
  • M.J. Daniels et al. Shrinkage estimators for covariance matrices. Biometrics (2001)
  • M. Gosho et al. Criterion for the selection of a working correlation structure in the generalized estimating equation approach for longitudinal balanced data. Comm. Statist. Theory Methods (2011)
  • L.R. Haff. The variational form of certain Bayes estimators. Ann. Statist. (1991)
  • L.-Y. Hin et al. Working-correlation-structure identification in generalized estimating equations. Stat. Med. (2009)
  • C. Lam. Nonparametric eigenvalue-regularized precision or covariance matrix estimator. Ann. Statist. (2016)
  • O. Ledoit et al. Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Statist. (2012)