Generalized estimating equations with stabilized working correlation structure

doi:10.1016/j.csda.2016.08.016

Computational Statistics & Data Analysis

Volume 106, February 2017, Pages 1-11

Abstract

Generalized estimating equations (GEE), proposed by Liang and Zeger (1986), yield a consistent estimator of the regression parameter without requiring correct specification of the correlation structure of the repeatedly measured outcomes. It is well known that the efficiency of the regression coefficient estimator increases when the working correlation is correctly specified, which makes the unstructured correlation a good candidate. However, the estimated correlation matrix may fail to be positive definite in the unbalanced case, which leads practitioners to choose independent, autoregressive or exchangeable matrices as the working correlation structure. Our goal is to broaden the practical choices of working correlation structure to the unstructured correlation matrix, or any other matrix, by proposing a GEE with a working correlation matrix stabilized by a linear shrinkage method in which the minimum eigenvalue is bounded below by a small positive number. We show that the resulting GEE regression estimator is asymptotically equivalent to that of the original GEE. Simulation studies show that the proposed modification can stabilize the variance of the GEE regression estimator with an unstructured working correlation, and can improve efficiency over popular choices of working correlation. Two real data examples are presented in which the standard error of the regression coefficient estimator can be reduced using the proposed method.

Introduction

Generalized estimating equations (GEE), proposed by Liang and Zeger (1986), have been a popular analytic tool for correlated data. A consistent estimator of the regression parameter can be obtained without correctly specifying the correlation structure of the repeatedly measured outcomes. However, the efficiency of the regression coefficient estimator increases when the working correlation matrix is close to the true one (Albert and McShane, 1995). Structured working correlations such as independent, autoregressive and exchangeable are available as built-in options in standard software. These choices keep the number of parameters in the correlation matrix manageable, and can be helpful when the sample size is small and the number of time points is large. To select a working correlation matrix from among the candidates, criteria such as the ‘quasi-likelihood under the independence model criterion’ (Pan, 2001) and the ‘correlation information criterion’ (Hin and Wang, 2009) have been proposed, among others (Carey and Wang, 2011, Gosho et al., 2011, Zhou et al., 2012, Westgate, 2013, Westgate, 2014). The unstructured working correlation matrix can correctly model the correlation structure and is also available in standard software, but its number of unknown parameters grows with the number of time points. When the sample size is small relative to the number of time points, the variability of the many nuisance parameters in the unstructured correlation matrix affects the variance of the regression parameter estimators; Westgate (2013) proposed a method to address this problem. However, when the maximum number of time points is fixed, the asymptotic variance of the regression coefficient estimator is unaffected by the variance of the correlation estimator, and reducing the number of parameters does not yield a gain in the asymptotic efficiency of the regression coefficient estimator.

Misspecification of the working correlation can not only lead to a loss of efficiency but, more seriously, can make the GEE solution infeasible (Qu et al., 2008, Wang and Carey, 2004). Despite these shortcomings, choosing one of the aforementioned structured working correlation matrices guarantees that the correlation matrix is positive definite. The estimated unstructured correlation matrix sometimes fails to be positive definite because of varying numbers of subunits, in which case the GEE estimates are not defined. Even when the estimated unstructured matrix is positive definite, a small minimum eigenvalue can make the coefficient estimates unstable and the standard errors of the regression parameter estimates large (Vens and Ziegler, 2012). If the lack of positive definiteness can be resolved, the unstructured working correlation matrix becomes an attractive choice because it improves the asymptotic variance of the regression coefficient estimator.
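
To make the failure of positive definiteness under unbalanced data concrete, the following is a minimal sketch rather than the estimator used in the paper. It assumes a simplified pairwise-complete Pearson correlation in place of the residual-based moment estimator used in GEE software, and the function name `pairwise_correlation` and the toy data are hypothetical. Each pair of time points is estimated from a different subset of subjects, so the three pairwise estimates need not be mutually compatible.

```python
import numpy as np

def pairwise_correlation(Y):
    """Pearson correlation for each pair of columns, using only the rows
    in which both columns are observed (NaN marks a missing visit)."""
    Y = np.asarray(Y, dtype=float)
    q = Y.shape[1]
    R = np.eye(q)
    for j in range(q):
        for k in range(j + 1, q):
            both = ~np.isnan(Y[:, j]) & ~np.isnan(Y[:, k])
            R[j, k] = R[k, j] = np.corrcoef(Y[both, j], Y[both, k])[0, 1]
    return R

nan = np.nan
# Hypothetical unbalanced data: every subject misses one of three time points.
Y = np.array([
    [0, 0, nan], [1, 1, nan], [2, 2, nan],   # time points 1 and 2 observed
    [nan, 0, 0], [nan, 1, 1], [nan, 2, 2],   # time points 2 and 3 observed
    [0, nan, 2], [1, nan, 1], [2, nan, 0],   # time points 1 and 3 observed
])

R_hat = pairwise_correlation(Y)
min_eig = np.linalg.eigvalsh(R_hat)[0]   # eigvalsh returns ascending eigenvalues
print(R_hat)
print(min_eig)   # -1 up to rounding: R_hat is not positive definite
```

In this toy data set the pairwise correlations are 1, 1 and -1, which no single trivariate distribution can produce, so the assembled matrix has eigenvalues 2, 2 and -1.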

Many researchers have addressed the lack of positive definiteness of the sample covariance matrix, mainly by replacing the eigenvalues of the sample covariance matrix with linear or nonlinear transforms of them (Stein, 1956, Haff, 1991, Daniels and Kass, 1999, Daniels and Kass, 2001, Ledoit and Wolf, 2004, Schäfer and Strimmer, 2005, Ledoit and Wolf, 2012, Won et al., 2013, Lam, 2016). In a regression setting with longitudinal data, Daniels and Kass (2001) obtained stabilized regression coefficient estimators by placing a normally distributed prior on the logarithm of the sample eigenvalues. This method requires the eigenvalues of the sample covariance matrix to be positive.

Our goal is to broaden the practical choices of working correlation structure to the unstructured correlation matrix by alleviating the problems caused by the lack of positive definiteness. To achieve this goal, we propose to modify the working correlation matrix using the linear shrinkage method proposed by Choi (2015). We show that the resulting GEE regression estimator is asymptotically equivalent to that of the original GEE. Simulation studies show that the proposed modification has advantages in cases where the minimum eigenvalue of the estimated working correlation structure is small. Two real data examples are presented in which the standard error of the regression coefficient estimator is reduced using the proposed method.

Section snippets

Basic notations

We denote the $n_{i} \times 1$ vector of outcomes and the $n_{i} \times p$ matrix of covariates for the $i$th subject $(i = 1, \dots, K)$ by $y_{i} = (y_{i1}, y_{i2}, \dots, y_{in_{i}})^{T}$ and $X_{i} = (x_{i1}, x_{i2}, \dots, x_{in_{i}})^{T}$, respectively. We assume that the first two moments of $y_{ij}$ are given by $E(y_{ij} \mid x_{ij}) = μ_{ij} = g(η_{ij}) = g(x_{ij}^{T} β)$ and $\operatorname{Var}(y_{ij} \mid x_{ij}) = ϕ\, a(μ_{ij})$, where $β$ is a $p \times 1$ regression parameter and $g^{-1}(\cdot)$ is a link function. The true $n_{i} \times n_{i}$ covariance matrix of $y_{i}$ given $X_{i}$, $\operatorname{Var}(y_{i} \mid X_{i})$, is denoted by $Ω_{i}$. Let the maximum of $n_{i}$ be $q$, and assume that $q$ is bounded. The

Motivation

To motivate the proposed method, we first quantify the loss in asymptotic relative efficiency (ARE) incurred by limiting the choice of working correlation structure to exchangeable and autoregressive of order 1 (AR-1). This quantification shows the range of potential efficiency gains from using the unstructured working correlation structure. Second, we quantify the prevalence of a negative minimum eigenvalue for the estimated unstructured correlation matrix.

First, Fig. 1 shows how much

GEE using the working correlation matrix with linear shrinkage

Let $ϵ$ be a positive number smaller than the minimum eigenvalue of the true correlation matrix, and let $γ_{\min}(A)$ and $γ_{\max}(A)$ be the minimum and maximum eigenvalues of a matrix $A$, respectively. Choi (2015) proposed the linear shrinkage transformation of a symmetric matrix given by $\tilde{R}(α, ϵ) = t R(α) + (1 - t) ν I_{q},$ where $t = \begin{cases} 1 & \text{if } γ_{\min}\{R(α)\} \geq ϵ, \\ (ν - ϵ) / (ν - γ_{\min}\{R(α)\}) & \text{if } γ_{\min}\{R(α)\} < ϵ, \end{cases}$ $ν = \max\left\{ \frac{γ_{\max}\{R(α)\} + γ_{\min}\{R(α)\}}{2},\; M + \frac{V}{M - γ_{\min}\{R(α)\}} \right\},$ and $M$ and $V$ denote the sample mean and sample variance of the eigenvalues of $R(α)$, respectively. Choi
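
The transformation above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `linear_shrinkage`, the default `eps=0.01`, and the use of the `n - 1` divisor (`ddof=1`) for the sample variance $V$ are assumptions, since the excerpt does not say which divisor is intended. The input is assumed to be a symmetric matrix with unit diagonal, so that $M = 1$ and the denominators are nonzero whenever the shrinkage branch is taken with `eps < 1`.

```python
import numpy as np

def linear_shrinkage(R, eps=0.01, ddof=1):
    """Return (R_tilde, t) with R_tilde = t * R + (1 - t) * nu * I.

    eps  : lower bound for the minimum eigenvalue (assumed default).
    ddof : divisor choice for the eigenvalue variance V (assumption).
    """
    R = np.asarray(R, dtype=float)
    q = R.shape[0]
    eig = np.linalg.eigvalsh(R)          # ascending eigenvalues of symmetric R
    g_min, g_max = eig[0], eig[-1]
    if g_min >= eps:
        return R.copy(), 1.0             # t = 1: no modification needed
    M = eig.mean()                       # sample mean of the eigenvalues
    V = eig.var(ddof=ddof)               # sample variance of the eigenvalues
    nu = max((g_max + g_min) / 2.0, M + V / (M - g_min))
    t = (nu - eps) / (nu - g_min)
    return t * R + (1.0 - t) * nu * np.eye(q), t

# Hypothetical indefinite "correlation" estimate with eigenvalues 1.9, 1.9, -0.8.
R_bad = np.array([[1.0, 0.9, -0.9],
                  [0.9, 1.0, 0.9],
                  [-0.9, 0.9, 1.0]])
R_tilde, t = linear_shrinkage(R_bad, eps=0.01)
print(t)
print(np.linalg.eigvalsh(R_tilde))       # smallest eigenvalue equals eps up to rounding
```

Because every eigenvalue $γ$ of $R(α)$ is mapped to $tγ + (1 - t)ν$, the smallest eigenvalue of the shrunken matrix is $ν - t(ν - γ_{\min}) = ϵ$ whenever the second branch applies, which is what the final line checks numerically.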

Simulation studies

We conducted simulation studies to evaluate finite sample performance of the proposed estimator in three settings. The first scenario mimics the correlation structure of the data from the randomized controlled trial presented in Section 6.1, the second scenario mimics the correlation structure of the data from the study presented in Section 6.2, and the third considers the case in which the true correlation structure is 1-dependent. In all scenarios the estimated unstructured correlation is

Real data analysis

In this section, we present two real data analyses with (i) SB-LOT data from a randomized clinical trial and (ii) Urinary data from an observational study. As in Section 5, we compared five GEE estimators, including the estimator from the proposed method.

Discussion and conclusion

We proposed a GEE that uses a working correlation matrix modified by a linear shrinkage method. The proposed method can broaden the choice of working correlation structures to include the unstructured one when a small or negative minimum eigenvalue of the estimated working correlation causes problems. We showed that, asymptotically, the proposed estimator has the same distribution as the estimator without any adjustment, but in finite samples the linear shrinkage can help stabilize the regression

Acknowledgements

The authors are grateful to Schaper and Brümmer (contact: Dr. Hans-Heinrich Henneicke-von Zepelin) for making the SB-LOT data available. The authors also thank Nicole Heßler for the SB-LOT data re-analysis.

References (27)

O. Ledoit et al. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. (2004)
M. Vens et al. Generalized estimating equations and regression diagnostics for longitudinal controlled clinical trials: A case study. Comput. Statist. Data Anal. (2012)
P.S. Albert et al. A Generalized estimating equations approach for spatially correlated binary data: Applications to the analysis of neuroimaging data. Biometrics (1995)
V. Carey et al. Working covariance model selection for generalized estimating equations. Stat. Med. (2011)
Y.-G. Choi. Positive-definite correction of covariance matrix estimators via linear shrinkage (2015)
M. Crowder. On the use of a working correlation matrix in using generalized linear models for repeated measurements. Biometrika (1995)
M.J. Daniels et al. Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models. J. Amer. Statist. Assoc. (1999)
M.J. Daniels et al. Shrinkage estimators for covariance matrices. Biometrics (2001)
M. Gosho et al. Criterion for the selection of a working correlation structure in the generalized estimating equation approach for longitudinal balanced data. Comm. Statist. - Theory Methods (2011)
L.R. Haff. The variational form of certain Bayes estimators. Ann. Statist. (1991)
L.-Y. Hin et al. Working-correlation-structure identification in generalized estimating equations. Stat. Med. (2009)
C. Lam. Nonparametric eigenvalue-regularized precision or covariance matrix estimator. Ann. Statist. (2016)
O. Ledoit et al. Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Statist. (2012)
