Joint Analysis of Survival Time and Longitudinal Categorical Outcomes


Abstract

In biomedical or public health research, it is common for both survival time and longitudinal categorical outcomes to be collected for a subject, along with the subject’s characteristics or risk factors. Investigators are often interested in finding important variables for predicting both survival time and longitudinal outcomes, which may be correlated within the same subject. Existing approaches for such joint analyses deal with continuous longitudinal outcomes; new statistical methods need to be developed for categorical longitudinal outcomes. We propose to simultaneously model the survival time with a stratified Cox proportional hazards model and the longitudinal categorical outcomes with a generalized linear mixed model. Random effects are introduced to account for the dependence between survival time and longitudinal outcomes due to unobserved factors. The Expectation–Maximization (EM) algorithm is used to derive point estimates for the model parameters, and the observed information matrix is adopted to estimate their asymptotic variances. Asymptotic properties of the proposed maximum likelihood estimators are established using the theory of empirical processes. The method is demonstrated to perform well in finite samples via simulation studies. We illustrate our approach with data from the Carolina Head and Neck Cancer Study (CHANCE) and compare the results of our simultaneous analysis with those of separate analyses using the generalized linear mixed model and the Cox proportional hazards model. Our proposed method identifies more predictors than the separate analyses do.


References

1. Albert PS, Follmann DA (2000) Modeling repeated count data subject to informative dropout. Biometrics 56:667–677

2. Albert PS, Follmann DA (2007) Random effects and latent processes approaches for analyzing binary longitudinal data with missingness: a comparison of approaches using opiate clinical trial data. Stat Methods Med Res 16:417–439

3. Albert PS, Follmann DA, Wang SA, Suh EB (2002) A latent autoregressive model for longitudinal binary data subject to informative missingness. Biometrics 58:631–642

4. Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1993) Efficient and adaptive estimation for semiparametric models. Johns Hopkins University Press, Baltimore

5. Brown ER, Ibrahim JG (2003) A Bayesian semiparametric joint hierarchical model for longitudinal and survival data. Biometrics 59:221–228

6. Chakraborty A, Das K (2010) Inferences for joint modelling of repeated ordinal scores and time to event data. Comput Math Methods Med 11:281–295

7. Chen W, Ghosh D, Raghunathan TE, Sargent DJ (2009) Bayesian variable selection with joint modeling of categorical and survival outcomes: an application to individualizing chemotherapy treatment in advanced colorectal cancer. Biometrics 65:1030–1040

8. Chen MH, Ibrahim JG, Lipsitz SR (2002) Bayesian methods for missing covariates in cure rate models. Lifetime Data Anal 8:117–146

9. Ding J, Wang JL (2008) Modeling longitudinal data with nonparametric multiplicative random effects jointly with survival data. Biometrics 64:546–556

10. Divaris K, Olshan AF, Smith J, Bell ME, Weissler MC, Funkhouser WK, Bradshaw PT (2010) Oral health and risk for head and neck squamous cell carcinoma: the Carolina head and neck cancer study. Cancer Causes Control 21:567–575

11. Dunson DB, Herring AH (2005) Bayesian latent variable models for mixed discrete outcomes. Biostatistics 6:11–25

12. Elashoff RM, Li G, Li N (2007) An approach to joint analysis of longitudinal measurements and competing risks failure time data. Stat Med 26:2813–2835

13. Elashoff RM, Li G, Li N (2008) A joint model for longitudinal measurements and survival data in the presence of multiple failure types. Biometrics 64:762–771

14. Faucett CL, Schenker N, Elashoff RM (1998) Analysis of censored survival data with intermittently observed time-dependent binary covariates. J Am Stat Assoc 93:427–437

15. Faucett CL, Thomas DC (1996) Simultaneously modeling censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Stat Med 15:1663–1685

16. Henderson R, Diggle P, Dobson A (2000) Joint modelling of longitudinal measurements and event time data. Biostatistics 1:465–480

17. Hogan J, Laird N (1997) Mixture models for the joint distribution of repeated measures and event times. Stat Med 16:239–257

18. Hu W, Li G, Li N (2009) A Bayesian approach to joint analysis of longitudinal measurements and competing risks failure time data. Stat Med 28:1601–1619

19. Huang X, Li G, Elashoff RM, Pan J (2011) A general joint model for longitudinal measurements and competing risks survival data with heterogeneous random effects. Lifetime Data Anal 17:80–100

20. Huang W, Zeger S, Anthony J, Garrett E (2001) Latent variable model for joint analysis of multiple repeated measures and bivariate event times. J Am Stat Assoc 96:906–914

21. Larsen K (2004) Joint analysis of time-to-event and multiple binary indicators of latent classes. Biometrics 60:85–92

22. Lin HQ, McCulloch CE, Mayne ST (2002) Maximum likelihood estimation in the joint analysis of time-to-event and multiple longitudinal variables. Stat Med 21:2369–2382

23. Liu L, Ma JZ, O’Quigley J (2008) Joint analysis of multi-level repeated measures data and survival: an application to the end stage renal disease (ESRD) data. Stat Med 27:5676–5691

24. Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc B 44:226–233

25. Parner E (1998) Asymptotic theory for the correlated gamma-frailty model. Ann Stat 26:183–214

26. Pulkstenis EP, Ten Have TR, Landis JR (1998) Model for the analysis of binary longitudinal pain data subject to informative dropout through remedication. J Am Stat Assoc 93:438–450

27. Ribaudo HJ, Thompson SG, Allen-Mersh TG (2000) A joint analysis of quality of life and survival using a random effect selection model. Stat Med 19:3237–3250

28. Rizopoulos D, Verbeke G, Lesaffre E, Vanrenterghem Y (2008) A two-part joint model for the analysis of survival and longitudinal binary data with excess zeros. Biometrics 64:611–619

29. Rizopoulos D, Verbeke G, Molenberghs G (2008) Shared parameter models under random effects misspecification. Biometrika 95:63–74

30. Rudin W (1973) Functional analysis. McGraw-Hill, New York

31. Satterthwaite FE (1946) An approximate distribution of estimates of variance components. Biometrics Bull 2:110–114

32. Schluchter MD (1992) Methods for the analysis of informatively censored longitudinal data. Stat Med 11:1861–1870

33. Song X, Davidian M, Tsiatis AA (2002) A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data. Biometrics 58:742–753

34. Song X, Wang CY (2008) Semiparametric approaches for joint modeling of longitudinal and survival data with time-varying coefficients. Biometrics 64:557–566

35. Tseng YK, Hsieh F, Wang JL (2005) Joint modelling of accelerated failure time and longitudinal data. Biometrika 92:587–603

36. Tsiatis AA, De Gruttola V, Wulfsohn M (1995) Modeling the relationship of survival to longitudinal data measured with error. Applications to survival and CD4 counts in patients with AIDS. J Am Stat Assoc 90:27–37

37. Tsiatis AA, Davidian M (2001) A semiparametric estimator for the proportional hazards model with longitudinal covariates measured with error. Biometrika 88:447–458

38. van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge

39. van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York

40. Wang Y, Taylor JMG (2001) Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. J Am Stat Assoc 96:895–905

41. Wu M, Bailey K (1989) Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics 45:939–955

42. Wu M, Carroll R (1988) Estimation and comparison of changes in the presence of informative right censoring by modelling the censoring process. Biometrics 44:175–188

43. Wulfsohn M, Tsiatis AA (1997) A joint model for survival and longitudinal data measured with error. Biometrics 53:330–339

44. Xu J, Zeger S (2001) The evaluation of multiple surrogate endpoints. Biometrics 57:81–87

45. Xu J, Zeger S (2001) Joint analysis of longitudinal data comprising repeated measures and times to events. Appl Stat 50:375–387

46. Yao F (2008) Functional approach of flexibly modelling generalized longitudinal data and survival time. J Stat Plan Inference 138:995–1009

47. Ye W, Lin XH, Taylor JMG (2008) Semiparametric modeling of longitudinal measurements and time-to-event data—a two-stage regression calibration approach. Biometrics 64:1238–1246

48. Zeng D, Cai J (2005) Simultaneous modelling of survival and longitudinal data with an application to repeated quality of life measures. Lifetime Data Anal 11:151–174


Acknowledgements

The authors thank the editor, the associate editor, and two referees of Statistics in Biosciences for their valuable suggestions, which considerably improved this article. This research was supported by the National Institutes of Health grant P01 CA142538 and the National Center for Research Resources grant UL1 RR025747.

Corresponding author

Correspondence to Jaeun Choi.

Appendices

Appendix A: EM Algorithms

A.1 EM Algorithm—Binary Longitudinal Data and Survival Time

(1) E-step: For binary longitudinal outcomes and survival time, we calculate the conditional expectation of q(b i ) for subject i with S i =s, given the observations and the current estimate \((\boldsymbol{\theta}^{(k)}, \varLambda_{s}^{(k)})\), for some known function q(⋅). This conditional expectation, denoted by \(E [ q(\boldsymbol{b}_{i}) | \boldsymbol{\theta}^{(k)}, \varLambda_{s}^{(k)} ]\), can be expressed as

(6)

where

(7)

\(( \boldsymbol{\Sigma}_{b}^{(k)} )^{\frac{1}{2}}\) is the unique non-negative definite square root of \(\boldsymbol{\Sigma}_{b}^{(k)}\) (i.e., \(( \boldsymbol{\Sigma}_{b}^{(k)} )^{\frac{1}{2}} \times ( \boldsymbol{\Sigma}_{b}^{(k)} )^{\frac{1}{2}} = \boldsymbol{\Sigma}_{b}^{(k)}\)), and z G follows a multivariate Gaussian distribution with mean zero.
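Displays (6) and (7) are not reproduced above, so the following is only a minimal Monte Carlo sketch of such an E-step. It assumes the conditional expectation is approximated by self-normalized importance sampling over standard Gaussian draws z G mapped through \(( \boldsymbol{\Sigma}_{b}^{(k)} )^{\frac{1}{2}}\); the helper loglik_obs_given_b (a hypothetical name) stands in for the logarithm of the integrand R(z G ) of (7), i.e., the conditional likelihood of subject i's observed data given b.

```python
import numpy as np

def e_step_expectation(q, Sigma_b, loglik_obs_given_b, n_draws=2000, seed=0):
    """Monte Carlo sketch of E[q(b_i) | data, theta^(k), Lambda_s^(k)].

    Draws z_G ~ N(0, I), maps each draw to b = Sigma_b^{1/2} z_G, and weights
    it by the conditional likelihood of subject i's observed data given b.
    """
    rng = np.random.default_rng(seed)
    # unique non-negative definite square root of Sigma_b via eigendecomposition
    evals, evecs = np.linalg.eigh(Sigma_b)
    sqrt_sigma = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
    z = rng.standard_normal((n_draws, Sigma_b.shape[0]))
    b = z @ sqrt_sigma                       # rows are draws of b_i (sqrt is symmetric)
    log_w = np.array([loglik_obs_given_b(bi) for bi in b])
    w = np.exp(log_w - log_w.max())          # stabilize before normalizing
    w /= w.sum()
    q_vals = np.array([q(bi) for bi in b])
    return np.tensordot(w, q_vals, axes=1)   # self-normalized estimate
```

When the random-effect dimension is small, Gauss–Hermite quadrature nodes could replace the Monte Carlo draws without changing the surrounding EM structure.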

(2) M-step: Since the dispersion parameter ϕ is set to 1 for the logistic (Bernoulli) model, we estimate only β in the longitudinal process. β (k+1) solves the conditional expectation of the complete-data log-likelihood score equation, using a one-step Newton–Raphson iteration.

\(\boldsymbol{\Sigma}_{b} ^{(k+1)}\), ψ (k+1),γ (k+1) and \(\varLambda_{s}^{(k+1)}\) have the same expressions as in Sect. 2.2.
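As an illustration only, and not the authors' code, the following sketches that one-step Newton–Raphson update for β in the logistic longitudinal submodel. It assumes a linear predictor \(\boldsymbol{X}_{ij}^{T}\boldsymbol{\beta}+ \widetilde{\boldsymbol{X}}_{ij}^{T}\boldsymbol{b}_{i}\) with logit link and that the E-step produced weighted draws of each b i ; the argument names (subjects, b_draws, b_weights) are hypothetical.

```python
import numpy as np
from scipy.special import expit  # logistic inverse link

def m_step_beta_binary(beta, subjects, b_draws, b_weights):
    """One Newton-Raphson step for beta, solving E[complete-data score | data] = 0.

    subjects: list of (X, Xt, Y) with X (n_i x p), Xt (n_i x d), Y (n_i,)
    b_draws[i]: draws of b_i from the E-step; b_weights[i]: matching weights.
    """
    score = np.zeros_like(beta)
    hess = np.zeros((beta.size, beta.size))
    for (X, Xt, Y), draws, wts in zip(subjects, b_draws, b_weights):
        for b, w in zip(draws, wts):
            mu = expit(X @ beta + Xt @ b)                        # fitted probabilities
            score += w * (X.T @ (Y - mu))                        # score contribution
            hess -= w * ((X * (mu * (1 - mu))[:, None]).T @ X)   # curvature in beta
    return beta - np.linalg.solve(hess, score)                   # one NR step
```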

A.2 EM Algorithm—Poisson Longitudinal Data and Survival Time

(1) E-step: For Poisson longitudinal outcomes and survival time, given the current estimate \((\boldsymbol{\theta}^{(k)}, \varLambda_{s}^{(k)})\), the conditional expectation denoted by \(E [ q(\boldsymbol{b}_{i}) | \boldsymbol{\theta}^{(k)}, \varLambda_{s}^{(k)} ]\) can be expressed as in (6) with R(z G ) defined in (7):

and z G follows a multivariate Gaussian distribution with mean zero.

(2) M-step: Since the dispersion parameter ϕ is set to 1 for the Poisson distribution, we estimate only β in the longitudinal process. β (k+1) solves the conditional expectation of the complete-data log-likelihood score equation, using a one-step Newton–Raphson iteration.

\(\boldsymbol{\Sigma}_{b} ^{(k+1)}\), ψ (k+1),γ (k+1) and \(\varLambda_{s}^{(k+1)}\) have the same expressions as in Sect. 2.2.
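Under the canonical log link, only the mean function and the curvature weights change relative to the binary sketch in A.1; this is again a hypothetical illustration under the same assumptions.

```python
import numpy as np

def m_step_beta_poisson(beta, subjects, b_draws, b_weights):
    """One Newton-Raphson step for beta in the Poisson longitudinal submodel."""
    score = np.zeros_like(beta)
    hess = np.zeros((beta.size, beta.size))
    for (X, Xt, Y), draws, wts in zip(subjects, b_draws, b_weights):
        for b, w in zip(draws, wts):
            mu = np.exp(X @ beta + Xt @ b)     # Poisson means under the log link
            score += w * (X.T @ (Y - mu))      # canonical-link score: X'(Y - mu)
            hess -= w * ((X * mu[:, None]).T @ X)
    return beta - np.linalg.solve(hess, score)
```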

Appendix B: Proofs of the Theorems

In Appendices B.1 and B.2, we sketch the proofs of Theorems 1 and 2; detailed technical proofs are available from the authors. From (3) in Sect. 2.2, we obtain the observed log-likelihood function, l_f(θ,Λ;Y,V)=log{L_f(θ,Λ;Y,V)}. We use the following modified objective function, obtained by replacing λ s (V i ) with Λ s {V i } in l_f(θ,Λ;Y,V), where Λ s {V i } is the jump size of Λ s (t) at the observed time V i with Δ i =1:

(8)

and \((\widehat{\boldsymbol{\theta}},\widehat{\boldsymbol{\Lambda}})\) maximizes l n (θ,Λ) over the space \(\{ (\boldsymbol{\theta}, \boldsymbol{\Lambda}) : \boldsymbol{\theta}\in\varTheta, \boldsymbol{\Lambda}\in \mathbb{W}\times\mathbb{W}\times \cdots\times\mathbb{W}\}\), where \(\mathbb{W}\) consists of all right-continuous step functions; that is, \(\boldsymbol{\Lambda}= (\varLambda_{1}, \ldots, \varLambda_{S})^{T}\) with \(\varLambda_{s} \in \mathbb{W}\) for s=1,…,S. In the proofs of both Theorems 1 and 2, this modified objective function is used in place of the observed log-likelihood function.

B.1 Proof of Consistency—Theorem 1

Consistency can be proved in three steps. First, we show that the maximum likelihood estimate \((\widehat{\boldsymbol{\theta}}, \widehat{\boldsymbol{\Lambda}})\) exists. Second, we show that, with probability one, \(\widehat{\varLambda}_{s} (\tau)\), s=1,…,S, are bounded as n→∞. Third, given the second step, Helly’s selection theorem [38] allows us to choose a subsequence of \(\widehat{\varLambda}_{s}(t)\) that converges weakly to some right-continuous monotone function \(\varLambda^{*}_{s}(t)\) with probability one; for any subsequence, we can find a further subsequence, still denoted \(\widehat{\boldsymbol{\theta}}\), such that \(\widehat{\boldsymbol{\theta}} \rightarrow\boldsymbol{\theta}^{*}\). In the third step we then show θ ∗=θ 0 and \(\varLambda_{s}^{*} = \varLambda_{s0}\), s=1,…,S. Once these three steps are completed, we conclude that, with probability one, \(\widehat{\boldsymbol{\theta}}\) converges to θ 0 and \(\widehat{\varLambda}_{s}(t)\) converges to Λ s0(t) on [0,τ], s=1,…,S. Moreover, since Λ s0(t) is right-continuous on [0,τ], the latter convergence can be strengthened to uniform convergence; that is, \(\sup_{t \in[0, \tau]} \| \widehat{\boldsymbol{\Lambda}} (t) - \boldsymbol{\Lambda}_{0} (t) \| \rightarrow0 \) a.s. This completes the proof of Theorem 1.

In the first step, since θ belongs to a compact set Θ by Assumption (A1), it suffices to show that Λ s {V i } with Δ i =1 is finite. For each subject i with Δ i =1, after simple algebra we have, from (8),

If Λ s {V i }→∞ for some i with Δ i =1, then l n (θ,Λ)→−∞, which contradicts the boundedness of l n (θ,Λ). Therefore, Λ s {⋅} must be finite. Combining this conclusion with Assumption (A1), the maximum likelihood estimate \((\widehat{\boldsymbol{\theta}}, \widehat{\boldsymbol{\Lambda}})\) exists.

In the second step, we define \(\widehat{\zeta}_{s} = \log\widehat{\varLambda}_{s} (\tau) \) and rescale \(\widehat{\varLambda}_{s}\) by the factor \(e^{\widehat{\zeta}_{s}}\). Let \(\widetilde{\varLambda}_{s}\) denote the rescaled function, \(\widetilde{\varLambda}_{s} (t) = \widehat{\varLambda}_{s} (t) / \widehat{\varLambda}_{s} (\tau) = \widehat{\varLambda}_{s} (t) e^{-\widehat{\zeta}_{s} }\); thus \(\widetilde{\varLambda}_{s} (\tau) =1\). To prove this second step, it suffices to show that \(\widehat{\zeta}_{s}\) is bounded. After some algebra in (8), we obtain that, for any \(\boldsymbol{\Lambda}\in\mathbb{W}\times\mathbb{W}\times\cdots\times\mathbb{W}\),

where

and

Thus, since \(0 \le n^{-1} l_{n} (\widehat{\boldsymbol{\theta}}, \widehat{\boldsymbol{\Lambda}}) - n^{-1} l_{n} (\widehat{\boldsymbol{\theta}}, \widetilde{\boldsymbol{\Lambda}}) \), where \(\widehat{\boldsymbol{\Lambda}} = e^{\widehat{\boldsymbol{\zeta}}} \circ\widetilde{\boldsymbol{\Lambda}}\), it follows that

(9)

According to Assumption (A2), there exist positive constants C 1, C 2 and C 3 such that \(| Q_{1i} ( t, \boldsymbol{b}_{0}, \widehat{\boldsymbol{\theta}})| \le C_{1} \|\boldsymbol{b}_{0}\|+C_{2}\|\boldsymbol{Y}_{i} \| + C_{3}\). Denoting by b 0 a vector following the standard multivariate normal distribution, the concavity of the logarithm applied to the third term of (9) gives

where C 4 and C 5 are positive constants. Since it is easily verified that

$$\operatorname{E}_{\boldsymbol{b}_0} \Biggl[ \sum_{ j= 1}^{n_i} \frac{B(\widehat{\boldsymbol{\beta}}; \boldsymbol{b}_0)}{A(D_{i} (t_{j} ; \widehat{\phi} ) )} + e^{ C_1 \|\boldsymbol{b}_0\|+C_2\|\boldsymbol{Y}_{ i} \| + C_3 } \Biggr] < \infty, $$

by the strong law of large numbers and Assumption (A4), the third term of (9) can be bounded by a constant C 6; that is, \(- \frac{1}{n} \sum_{i=1}^{n} H_{i} \le\frac{1}{n} \sum_{i=1}^{n} (e^{C_{2}\|\boldsymbol{Y}_{i} \| + C_{4}} + C_{5}) \triangleq C_{6} \). Then, (9) becomes

(10)

where C 7 is a constant. On the other hand, since, for any Γ>0 and x>0, Γlog(1+x/Γ)≤Γ⋅(x/Γ)=x, we have (1+x/Γ)^Γ≤e^x, or equivalently e^{−x}≤(1+x/Γ)^{−Γ}. Therefore, with \(Q_{1i} (t, \boldsymbol{b}_{0}, \widehat{\boldsymbol{\theta}} ) \ge-C_{1} \|\boldsymbol{b}_{0}\| - C_{2} \| \boldsymbol{Y}_{i} \| -C_{3} \), (10) gives that

(11)

where C 8(Γ) is a deterministic function of Γ. For the sth stratum, (11) is

By the strong law of large numbers, \(\sum_{i=1}^{n} I(V_{i}=\tau) I(S_{i}=s) / n \longrightarrow P (V_{i} = \tau, S_{i} = s) > 0\). We can then choose Γ large enough that \(\sum_{i=1}^{n} \Delta_{i} I(S_{i}=s)/ n \le(\varGamma/2n) \sum_{i=1}^{n} I(V_{i}=\tau) I(S_{i}=s)\). Thus, we obtain \(0 \le C_{7} + C_{8} (\varGamma) - \frac{\varGamma}{2n} \sum_{i=1}^{n} I(V_{i}=\tau) I(S_{i}=s) \widehat{\zeta}_{s} \). In other words,

If we denote B s0=exp{2(C 7+C 8(Γ))/(ΓP(V i =τ,S i =s))}, we conclude that \(\widehat{\varLambda}_{s} (\tau) \le B_{s0}, s=1, \ldots, S\). Note that the above arguments hold for every sample in the probability space except a set with zero probability. Therefore, we have shown that, with probability one, \(\widehat{\varLambda}_{s} (\tau) \) is bounded for any sample size n.
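For completeness, the elementary inequality invoked in this step follows in one line from log(1+u)≤u:

$$ \Gamma\log \biggl(1+\frac{x}{\Gamma} \biggr) \le\Gamma\cdot\frac{x}{\Gamma} = x \quad\Longrightarrow\quad \biggl(1+\frac{x}{\Gamma} \biggr)^{\Gamma} \le e^{x} \quad\Longleftrightarrow\quad e^{-x} \le \biggl(1+\frac{x}{\Gamma} \biggr)^{-\Gamma}, \qquad x>0,\ \Gamma>0. $$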

In the third step, for convenience, we omit the subject index i and write n N for the number of observed longitudinal measurements per subject (in place of n i ), since n already denotes the sample size. Let O abbreviate the observed statistics \((\boldsymbol{Y}, \boldsymbol{X}, \widetilde{\boldsymbol{X}}, V, \Delta, n_{N}, s )\) and \(\{ \boldsymbol{Z}(t), \widetilde{\boldsymbol{Z}} (t), 0 \le t \le V \}\) for a subject, and define

and the associated class of such functions indexed by \((\boldsymbol{\theta}, \varLambda_{s})\), where B s0 is the constant given in the second step and \(\mathbb{W}\) contains all nondecreasing functions in [0,τ]. Employing the empirical process formulation, this class can be shown to be P-Donsker by Theorems 2.5.6 and 2.7.5 in [39]. In stratum s, let m s denote the number of subjects, and let V sk and Δ sk denote the observed time and the censoring indicator for the kth subject, respectively. Differentiating (8) with respect to Λ s {V sk } yields

$$ \widehat{\varLambda}_s \{ V_{sk} \} = \frac{\Delta_{sk} }{ m_s \mathbf{P}_{m_s} \{ I(V_s \ge v) Q (v, \boldsymbol{O}; \widehat {\boldsymbol{\theta}}, \widehat{\varLambda}_s) \} |_{v=V_{sk}}}. $$
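A small numerical sketch of this self-consistency update is given below, assuming a user-supplied callable Q_hat(v, l) (a hypothetical name) that evaluates \(Q (v, \boldsymbol{O}_{l}; \widehat{\boldsymbol{\theta}}, \widehat{\varLambda}_{s})\) for subject l in the stratum; note that \(m_{s} \mathbf{P}_{m_{s}} \{ \cdot \}\) is simply a sum over the m s subjects.

```python
import numpy as np

def lambda_jumps(V, Delta, Q_hat):
    """Jump sizes of the estimated cumulative baseline hazard in one stratum.

    Implements Lambda_hat{V_sk} = Delta_sk / sum_l I(V_l >= V_sk) Q_hat(V_sk, l),
    i.e. the event indicator divided by a Q-weighted at-risk sum at V_sk.
    """
    m = len(V)
    jumps = np.zeros(m)
    for k in range(m):
        if Delta[k] == 1:
            at_risk = [l for l in range(m) if V[l] >= V[k]]
            jumps[k] = 1.0 / sum(Q_hat(V[k], l) for l in at_risk)
    return jumps
```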

We also construct \(\bar{\varLambda}_{s}(t)\), another step function with the jump size \(\bar{\varLambda}_{s} \{ V_{sk} \} \), given by

$$ \bar{\varLambda}_s \{ V_{sk} \} = \frac{\Delta_{sk} }{ m_s \mathbf{P}_{m_s} \{ I(V_s \ge v) Q (v, \boldsymbol{O}; \boldsymbol {\theta}_0, \varLambda_{s0} ) \} |_{v=V_{sk}} }. $$

Through empirical-process arguments and the relevant properties of P-Donsker and Glivenko–Cantelli classes, we can prove that \(\bar{\varLambda}_{s} (t)\) converges uniformly to Λ s0(t) on [0,τ]. Next, by the bounded convergence theorem, the facts that \(\widehat{\boldsymbol{\theta}}\) converges to θ ∗ and \(\widehat{\varLambda}_{s}\) converges weakly to \(\varLambda_{s}^{*}\), and the Arzelà–Ascoli theorem, we can prove that \(\widehat{\varLambda}_{s} (t)\) converges uniformly to \(\varLambda_{s}^{*} (t) \). Then, from \(n^{-1} l_{n} (\widehat{\boldsymbol{\theta}}, \widehat{\boldsymbol{\Lambda}} ) - n^{-1} l_{n} (\boldsymbol{\theta}_{0}, \bar{\boldsymbol{\Lambda}}) \ge0 \), using the properties of Glivenko–Cantelli classes and the Kullback–Leibler information, the following holds with probability one:

$$ \bigl(\lambda_s^* (V_s) \bigr)^{\Delta_s} \int_{\boldsymbol{b}} G \bigl(\boldsymbol{b}, \boldsymbol{O}, \boldsymbol{\theta}^*, \varLambda_s^* \bigr) \,d \boldsymbol{b} = \bigl(\lambda_{s0} (V_s) \bigr)^{\Delta_s} \int_{\boldsymbol{b}} G (\boldsymbol{b}, \boldsymbol{O}, \boldsymbol{\theta}_0, \varLambda _{s0} ) \,d\boldsymbol{b}. $$
(12)

The proof is completed by showing θ ∗=θ 0 and \(\varLambda_{s}^{*} = \varLambda_{s0}\) from (12). To show that β ∗=β 0, ϕ ∗=ϕ 0 and \(\boldsymbol{\Sigma}_{b}^{*} = \boldsymbol{\Sigma}_{b0}\), we set Δ s =0 and V s =0 in (12). Comparing the coefficients of Y T Y and Y in the exponent and the constant term outside the exponent, and using Assumption (A5), we obtain ϕ ∗=ϕ 0, β ∗=β 0 and \(\boldsymbol{\Sigma}_{b}^{*} = \boldsymbol{\Sigma}_{b0}\). To show that ψ ∗=ψ 0, γ ∗=γ 0 and \(\varLambda_{s}^{*} = \varLambda_{s0}\), we set Δ s =0 in (12). By arguments similar to those above, both sides of (12) with Δ s =0 can be expressed as expectations with respect to the random effects b, which follow a multivariate normal distribution with mean \(\boldsymbol{\Sigma}_{b0}^{1/2} ( \sum_{j=1}^{n_{N}} Y_{j} \widetilde{\boldsymbol{X}}_{j} / A(D (t_{j} ; \phi_{0}) ) )^{T}\) and covariance Σ b0. Since, for any fixed \(\widetilde{\boldsymbol{X}}\), treating \(\widetilde{\boldsymbol{X}}^{T} \boldsymbol{Y}\) as a parameter in the normal family, b is a complete statistic for \(\widetilde{\boldsymbol{X}}^{T} \boldsymbol{Y}\), Assumptions (A2) and (A5) yield ψ ∗=ψ 0, γ ∗=γ 0 and \(\varLambda_{s}^{*} = \varLambda_{s0}\). Therefore, the proof of Theorem 1 is completed.

B.2 Proof of Asymptotic Normality—Theorem 2

The asymptotic distribution of the proposed estimators can be derived by verifying the conditions of Theorem 3.3.1 in [39]; the limiting distribution is then shown to be Gaussian. For completeness, we use Theorem 4 in [25], which restates Theorem 3.3.1 of [39].

Theorem 4 ([25])

Let U n and U be a random map and a fixed map, respectively, from Ξ to a Banach space such that:

  (a) \(\sqrt{n} (U_{n} - U) ( \widehat{\xi}_{n} ) - \sqrt{n} (U_{n} - U) ( \xi_{0} ) = o_{P}^{*} (1+ \sqrt{n} \| \widehat{\xi}_{n} - \xi_{0} \| )\).

  (b) The sequence \(\sqrt{n} (U_{n} - U) ( \xi_{0} )\) converges in distribution to a tight random element Z.

  (c) The function ξ↦U(ξ) is Fréchet differentiable at ξ 0 with a continuously invertible derivative \(\nabla U_{\xi_{0}} \) (on its range).

  (d) \(U(\xi_{0}) = 0\), and \(\widehat{\xi}_{n}\) satisfies \(U_{n} ( \widehat{\xi}_{n} ) = o_{P}^{*} (n^{-1/2})\) and converges in outer probability to ξ 0.

Then \(\sqrt{n} ( \widehat{\xi}_{n} - \xi_{0} ) \Rightarrow\nabla U_{\xi_{0}}^{-1} Z \).

In our situation, the parameter ξ s =(θ,Λ s )∈Ξ={(θ,Λ s ):∥θ−θ 0∥+sup t∈[0,τ]|Λ s (t)−Λ s0(t)|≤δ, s=1,…,S} for a fixed small constant δ. We define the index class \(\mathcal{H}\) of pairs (h 1,h 2), with h 1 a vector of the same dimension as θ and h 2 a function of bounded total variation on [0,τ], where ∥h 2∥ V is the total variation of h 2 in [0,τ] defined as

$$ \sup_{0=t_0 \le t_2 \le\cdots\le t_k= \tau} \sum_{j=1}^{k} \big|h_2 (t_j) - h_2 (t_{j-1}) \big|, $$

and we also define, for stratum s,

where l θ (θ,Λ s ) is the first derivative with respect to θ of the log-likelihood function from a single subject belonging to stratum s, denoted by l(O;θ,Λ s ), and \(l_{\varLambda_{s}} (\boldsymbol{\theta}, \varLambda_{s} )\) is the derivative of l(O;θ,Λ sε ) with respect to ε at ε=0, where \(\varLambda_{s \varepsilon} (t) = \int_{0}^{t} (1+\varepsilon h_{2} (u) ) \,d \varLambda_{s0} (u)\). Therefore, both \(U_{m_{s}}\) and U s map from Ξ to \(\ell^{\infty}(\mathcal{H})\), and \(\sqrt{m_{s}} \{ U_{m_{s}} (\xi_{s}) - U_{s} (\xi_{s}) \} \) is an empirical process in the space \(\ell^{\infty}(\mathcal{H})\).
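As a small numerical check of the total-variation norm \(\|h_{2}\|_{V}\) defined above, here is a sketch assuming h 2 is recorded on a grid 0=t 0≤⋯≤t k =τ that includes all of its change points:

```python
import numpy as np

def total_variation(h2_values):
    """Total variation of h2 over a grid 0 = t_0 <= ... <= t_k = tau:
    sum_j |h2(t_j) - h2(t_{j-1})|, per the display above."""
    return float(np.abs(np.diff(np.asarray(h2_values, dtype=float))).sum())

# Example: a step function taking values 0, 1, 1, -1 has total variation 3.
assert total_variation([0.0, 1.0, 1.0, -1.0]) == 3.0
```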

Denoting \(\mathbf{h}_{1} = (\mathbf{h}_{1}^{\beta T}, \mathbf{h}_{1}^{\phi T}, \mathbf{h}_{1}^{b T}, \mathbf{h}_{1}^{\psi T}, \mathbf{h}_{1}^{\gamma T} )^{T}\) corresponding to \(\boldsymbol{\theta}= (\boldsymbol{\beta}^{T}, \boldsymbol{\phi}^{T}, \operatorname{Vech}(\boldsymbol{\Sigma}_{b})^{T},\boldsymbol{\psi}^{T},\boldsymbol{\gamma}^{T})^{T}\), for any \((\mathbf{h}_{1}, h_{2}) \in \mathcal{H}\), the corresponding class of score functions can be shown to be P-Donsker. The P-Donsker property also implies that

as ∥θθ 0∥+sup t∈[0,τ]|Λ s (t)−Λ s0(t)|→0. Then, for the conditions (a)–(d) in Theorem 4 of [25], we have that: (a) follows from Lemma 3.3.5 (p. 311) of [39]; (b) holds as a result of P-Donsker property and the convergence is defined in the metric space by the Donsker theorem in [39]; (d) is true because \((\widehat{\boldsymbol{\theta}} , \widehat {\varLambda}_{s}) \) maximizes \(\mathbf{P}_{m_{s}} l(\boldsymbol{O}; \boldsymbol{\theta}, \varLambda_{s})\), (θ 0,Λ s0) maximizes P l(O;θ,Λ s ), and \((\widehat{\boldsymbol {\theta}}, \widehat{\varLambda}_{s}) \) converges to (θ 0,Λ s0) from Theorem 1;

The first half of condition (c), that the function ξ↦U(ξ) is Fréchet differentiable at ξ 0, is proved by exhibiting a bounded linear operator as the derivative.

Thus, it only remains to prove the second half of condition (c), that the derivative \(\nabla U_{\xi_{0}} \) is continuously invertible on its range. From the proof of the Fréchet differentiability of U(ξ) at ξ 0, we have that, for any (θ 1,Λ s1) and (θ 2,Λ s2) in Ξ,

(13)

where both Ω 1 and Ω 2 are bounded linear operators on \(\mathcal{H}\), and Ω=(Ω 1,Ω 2) maps \(\mathcal{H}\) to R d×BV[0,τ], where BV[0,τ] contains all the functions with finite total variation in [0,τ]. The explicit expressions of Ω 1 and Ω 2 can be obtained from the P-Donsker property and the derivation of \(\nabla U_{\xi_{0}} \) by definition. Thus, \(\nabla U_{\xi_{0}}\) is a linear operator from \(\mathcal{H}\) to itself. We note that proving that \(\nabla U_{\xi_{0}}\) is continuously invertible is equivalent to showing that Ω is invertible. Then, by Theorem 4.25 of [30], for the proof of the invertibility of Ω, it is sufficient to verify that Ω is one-to-one: if Ω[h 1,h 2]=0, then, by choosing θ 1−θ 2=ε h 1 and \(\varLambda_{s1}-\varLambda_{s2}=\varepsilon\int h_{2} \,d\varLambda_{s0}\) in (13) for a small constant ε, we obtain

Since \(\nabla U_{\xi_{0}} ( \mathbf{h}_{1}, \int h_{2} \,d \varLambda_{s0}) [\mathbf{h}_{1}, h_{2}]\) is the negative information matrix in the submodel \((\boldsymbol{\theta}_{0}+\varepsilon \mathbf{h}_{1}, \varLambda_{s0}+\varepsilon\int h_{2} \,d\varLambda_{s0})\), the score function along this submodel satisfies \(l_{\theta} (\boldsymbol{\theta}_{0}, \varLambda_{s0} )^{T} \mathbf{h}_{1} + l_{\varLambda_{s}} (\boldsymbol{\theta}_{0}, \varLambda_{s0} ) [h_{2}]= 0 \); that is, with probability one, the numerator of the score function satisfies

(14)

where A′(D(t j ;ϕ 0)) and C′(Y j ;D(t j ;ϕ 0)) are the derivatives of A(D(t j ;ϕ)) and C(Y j ;D(t j ;ϕ)) with respect to ϕ, evaluated at ϕ 0, and B′(β 0;b) is the derivative of B(β;b) with respect to β, evaluated at β 0. The proof of the invertibility of Ω will be complete once we show h 1=0 and h 2(t)=0 from (14).

To show h 1=0, we first set Δ s =0 and V s =0 in (14). Examining the coefficient of Y and the constant terms without Y, and using Assumption (A5) together with arguments similar to those in Appendix B.1, gives \(\mathbf{h}_{1}^{\phi}=0\), \(\mathbf{h}_{1}^{\beta}=0\) and D b =0. On the other hand, setting Δ s =0 in (14), with Assumptions (A2) and (A5) and arguments similar to those in Appendix B.1, leads to \(\mathbf{h}_{1}^{\psi} = 0\), \(\mathbf{h}_{1}^{\gamma} = 0\) and h 2(t)=0. Hence, the proof of condition (c) is completed.

Since conditions (a)–(d) have been verified, Theorem 3.3.1 of [39] concludes that \(\sqrt{m_{s}} ( \widehat{\boldsymbol{\theta}} - \boldsymbol{\theta}_{0}, \widehat{\varLambda}_{s} - \varLambda_{s0} )\) converges weakly to a tight random element in \(\ell^{\infty}(\mathcal{H})\). Then we have

where o P (1) denotes a random element converging to zero in probability in \(\ell^{\infty}(\mathcal{H})\), and, from (13),

By denoting \((\mathbf{h}_{1}^{*}, h_{2}^{*}) = \varOmega^{-1} (\mathbf{h}_{1}, h_{2}) \), we have \((\mathbf{h}_{1}, h_{2}) = \varOmega(\mathbf{h}_{1}^{*}, h_{2}^{*}) \), and by replacing (h 1,h 2) with \((\mathbf{h}_{1}^{*}, h_{2}^{*}) \) in the above two equations, we obtain

(15)

We can see that the first term on the right-hand side of (15) is \(\sqrt{m_{s}} \{ U_{m_{s}} (\boldsymbol{\theta}_{0}, \varLambda_{s0}) - U_{s} (\boldsymbol{\theta}_{0}, \varLambda_{s0}) \} \), which is an empirical process in the space \(\ell^{\infty}(\mathcal{H})\); furthermore, the underlying class has already been shown to be P-Donsker. Therefore, \(\sqrt{m_{s}}( \widehat{\boldsymbol{\theta}} - \boldsymbol{\theta}_{0}, \widehat{\varLambda}_{s} - \varLambda_{s0} )\) converges weakly to a Gaussian process in \(\ell^{\infty}(\mathcal{H})\).

Choosing h 2=0 in (15) and applying Proposition 3.3.1 in [4] shows that \(\widehat{\boldsymbol{\theta}}\) is an efficient estimator of θ 0. Therefore, Theorem 2 is proved.

Cite this article

Choi, J., Cai, J., Zeng, D. et al. Joint Analysis of Survival Time and Longitudinal Categorical Outcomes. Stat Biosci 7, 19–47 (2015). https://doi.org/10.1007/s12561-013-9091-z