A Simple GMM Estimator for the Semiparametric Mixed Proportional Hazard Model

Govert E. Bijwaard; Geert Ridder; Tiemen Woutersen

doi:10.1515/jem-2012-0005

Published by De Gruyter May 28, 2013

From the journal Journal of Econometric Methods

Abstract

Ridder and Woutersen (Ridder, G., and T. Woutersen. 2003. “The Singularity of the Efficiency Bound of the Mixed Proportional Hazard Model.” Econometrica 71: 1579–1589) have shown that under a weak condition on the baseline hazard, there exist root-N consistent estimators of the parameters in a semiparametric Mixed Proportional Hazard model with a parametric baseline hazard and unspecified distribution of the unobserved heterogeneity. We extend the linear rank estimator (LRE) of Tsiatis (Tsiatis, A. A. 1990. “Estimating Regression Parameters using Linear Rank Tests for Censored Data.” Annals of Statistics 18: 354–372) and Robins and Tsiatis (Robins, J. M., and A. A. Tsiatis. 1992. “Semiparametric Estimation of an Accelerated Failure Time Model with Time-Dependent Covariates.” Biometrika 79: 311–319) to this class of models. The optimal LRE is a two-step estimator. We propose a simple one-step estimator that is close to optimal if there is no unobserved heterogeneity. The efficiency gain associated with the optimal LRE increases with the degree of unobserved heterogeneity.

Keywords: counting process; linear rank estimation; mixed proportional hazard

JEL Classification: C41; C14

Corresponding author: Govert E. Bijwaard, Netherlands Interdisciplinary Demographic Institute (NIDI), PO Box 11650, NL-2502 AR The Hague, The Netherlands

We thank Kei Hirano and Nicole Lott for very helpful comments. We also thank seminar participants at the University of Western Ontario, and the Netherlands Interdisciplinary Demographic Institute. This paper replaces the paper Method of Moments Estimation of Duration Models with Exogenous Regressors (2003). Financial support from NORFACE research programme on Migration in Europe – Social, Economic, Cultural and Policy Dynamics is gratefully acknowledged.

¹ Horowitz (2001, theorem 2.2) averages g_n(X_i); the Stata program on our website is sufficiently fast to apply the bootstrap to most survey datasets.
² Brent’s method combines the bisection method, the secant method and inverse quadratic interpolation. The idea is to use the secant method or inverse quadratic interpolation if possible, because they converge faster, but to fall back on the more robust bisection method if necessary. The secant method can be thought of as a finite difference approximation of the Newton-Raphson method. The Powell method extends the Brent method by searching in a specific direction, rather than changing one parameter at a time.
³ See http://publ.nidi.nl/output/other/LRE.zip for the program and http://publ.nidi.nl/output/other/LRE_help.pdf for the help file.
⁴ In the MLE for models with duration dependence, we do not need the standard identification restriction that the unobserved heterogeneity term has mean one because the baseline hazard is normalized to be equal to 1 in the first interval.
⁵ The Gâteaux derivative is a directional derivative: for a point x, a direction a, and η>0, df(x, a) = lim_{η↓0} [{f(x+aη) – f(x)}/η].
⁶ Our calculations were done in Gauss 6.0 on 3 parallel computers: a Pentium 2.1 PC, a Pentium 2.8 PC and a Pentium 2.0 laptop. The calculations took about 9 weeks of CPU time.
⁷ The LRE with a duration dependence on 10 intervals for a sample size of 500 did not converge in seven of the experiments. The average is therefore based on 93 experiments instead of 100.
⁸ The results for the parameters of the piecewise constant duration dependence, α₂ and α₃, are given in Tables A3 and A4 in Appendix A.
⁹ The Doob-Meyer decomposition theorem is a theorem in stochastic calculus stating the conditions under which a submartingale may be decomposed in a unique way as the sum of a martingale and a continuous increasing process; see Meyer (1963) and Protter (2005).
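The idea in footnote 2 (take a fast secant step when it helps, fall back on bisection when it does not) can be illustrated with a short sketch. This is a deliberately simplified hybrid, not Brent's full algorithm (it omits inverse quadratic interpolation) and not the authors' Stata or Gauss code; the function name `secant_with_bisection` and its arguments are hypothetical.

```python
def secant_with_bisection(f, lo, hi, tol=1e-12, max_iter=200):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs.

    Simplified illustration only: a secant step is tried first, and a
    bisection step is added whenever the bracket failed to halve.
    """
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("root is not bracketed by [lo, hi]")

    for _ in range(max_iter):
        width = hi - lo
        # Secant proposal through the two bracket end points.
        x = hi - f_hi * (hi - lo) / (f_hi - f_lo)
        f_x = f(x)
        if f_x == 0.0:
            return x
        if f_lo * f_x < 0:
            hi, f_hi = x, f_x
        else:
            lo, f_lo = x, f_x
        # Fall back on bisection if the secant step shrank the bracket too little.
        if hi - lo > 0.5 * width:
            mid = 0.5 * (lo + hi)
            f_mid = f(mid)
            if f_mid == 0.0:
                return mid
            if f_lo * f_mid < 0:
                hi, f_hi = mid, f_mid
            else:
                lo, f_lo = mid, f_mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


root = secant_with_bisection(lambda x: x * x - 2.0, 0.0, 2.0)
print(root)
```

Because the bracket at least halves in every pass, the sketch keeps the robustness of bisection while still benefiting from secant steps when they are accurate.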

Appendix A: Additional tables

Table A1

Average Bias of Estimates of the Log α’s Across the Experiments with a Piecewise Constant Duration Dependence on 4 Intervals.

Estimation method		Sample size
		500	1000	5000
MLE no hetero	α₂	–0.0480*	–0.0319*	–0.0095*
		(0.0150)	(0.0103)	(0.0042)
	α₃	–0.0082	–0.0127	–0.0094*
		(0.0132)	(0.0088)	(0.0041)
	α₄	–0.0149	–0.0102	–0.0079
		(0.0127)	(0.0089)	(0.0046)
MLE 2 points	α₂	0.0282	0.0257	0.0140*
		(0.0194)	(0.0158)	(0.0053)
	α₃	0.1131*	0.0713*	0.0257*
		(0.0237)	(0.0175)	(0.0064)
	α₄	0.1480*	0.1013*	0.0438*
		(0.0273)	(0.0213)	(0.0076)
NPMLE	α₂	0.0785*	0.0495*	0.0211*
		(0.0210)	(0.0152)	(0.0050)
	α₃	0.2011*	0.1027*	0.0389*
		(0.0275)	(0.0183)	(0.0059)
	α₄	0.2835*	0.1782*	0.0612*
		(0.0339)	(0.0228)	(0.0079)
LRE	α₂	–0.0333	–0.0234	–0.0074
		(0.0230)	(0.0184)	(0.0066)
	α₃	0.0391	0.0158	–0.0087
		(0.0306)	(0.0224)	(0.0093)
	α₄	0.0536	0.0264	–0.0109
		(0.0383)	(0.0287)	(0.0128)

*p<0.05

Table A2

Average Bias of Estimates of the Log α’s Across the Experiments with a Piecewise Constant Duration Dependence on 10 Intervals.

	Sample size			Sample size
	500	1000	5000	500	1000	5000
	MLE no hetero			MLE 2 points
α₂	–0.0240	–0.0098	0.0068	0.0704*	0.0498*	0.0464*
	(0.0216)	(0.0153)	(0.0063)	(0.0230)	(0.0176)	(0.0080)
α₃	–0.0162	–0.0089	–0.0090	0.1096*	0.0740*	0.0420*
	(0.0241)	(0.0157)	(0.0061)	(0.0283)	(0.0195)	(0.0086)
α₄	–0.0609*	–0.0378*	–0.0069	0.0958*	0.0627*	0.0590*
	(0.0207)	(0.0135)	(0.0054)	(0.0273)	(0.0204)	(0.0098)
α₅	0.0073	–0.0035	–0.0115	0.1991*	0.1229*	0.0690*
	(0.0206)	(0.0144)	(0.0069)	(0.0305)	(0.0231)	(0.0117)
α₆	–0.0097	–0.0024	–0.0059	0.1986*	0.1348*	0.0766*
	(0.0207)	(0.0127)	(0.0067)	(0.0340)	(0.0226)	(0.0123)
α₇	–0.0593*	–0.0464*	–0.0074	0.1617*	0.0971*	0.0823*
	(0.0226)	(0.0154)	(0.0072)	(0.0364)	(0.0269)	(0.0135)
α₈	–0.0144	–0.0130	–0.0023	0.2161*	0.1491*	0.0963*
	(0.0204)	(0.0151)	(0.0070)	(0.0360)	(0.0277)	(0.0141)
α₉	–0.0209	–0.0076	–0.0120	0.2309*	0.1616*	0.0964*
	(0.0243)	(0.0149)	(0.0075)	(0.0388)	(0.0284)	(0.0137)
α₁₀	–0.0383	–0.0217	–0.0078	0.2324*	0.1658*	0.1068*
	(0.0206)	(0.0153)	(0.0071)	(0.0379)	(0.0287)	(0.0154)
	NPMLE			LRE
α₂	0.1790*	0.1157*	0.0703*	–0.0648*	–0.0460*	0.0088
	(0.0267)	(0.0184)	(0.0088)	(0.0298)	(0.0221)	(0.0106)
α₃	0.3039*	0.1880*	0.0871*	–0.0784	–0.0664*	–0.0070
	(0.0397)	(0.0239)	(0.0099)	(0.0446)	(0.0315)	(0.0136)
α₄	0.3730*	0.2298*	0.1181*	–0.1236*	–0.0942*	–0.0041
	(0.0466)	(0.0298)	(0.0120)	(0.0514)	(0.0387)	(0.0166)
α₅	0.5390*	0.3248*	0.1372*	–0.0554	–0.0605	–0.0093
	(0.0554)	(0.0343)	(0.0146)	(0.0599)	(0.0443)	(0.0203)
α₆	0.5848*	0.3649*	0.1573*	–0.0716	–0.0617	–0.0050
	(0.0583)	(0.0383)	(0.0151)	(0.0646)	(0.0496)	(0.0220)
α₇	0.5910*	0.3554*	0.1692*	–0.1230	–0.1079*	–0.0078
	(0.0646)	(0.0413)	(0.0170)	(0.0698)	(0.0530)	(0.0245)
α₈	0.6916*	0.4232*	0.1884*	–0.0844	–0.0792	–0.0042
	(0.0678)	(0.0429)	(0.0179)	(0.0782)	(0.0570)	(0.0258)
α₉	0.7346*	0.4594*	0.1918*	–0.0921	–0.0819	–0.0157
	(0.0734)	(0.0441)	(0.0191)	(0.0782)	(0.0578)	(0.0278)
α₁₀	0.7758*	0.48169*	0.2123*	–0.1230	–0.1038	–0.0117
	(0.0736)	(0.0486)	(0.0209)	(0.0803)	(0.0637)	(0.0309)

For the sample size of 500, results are based on 93 experiments, because in seven experiments the estimation procedure did not converge. *p<0.05.

Table A3

Average Bias, Standard error and RMSE of Estimates of Parameters of Piecewise Constant Baseline Hazard Across the Experiments, Second set of Monte Carlo experiments.

Duration dependence	Estimation method		Bias	Std error	RMSE
Positive duration dependence	MLE gamma	α₂	0.0069	0.0096	0.0118
		α₃	–0.0149	0.0206	0.0255
	NPMLE	α₂	0.0205	0.0157	0.0258
		α₃	0.0091	0.0283	0.0298
	LRE	α₂	–0.0130	0.0200	0.0238
		α₃	–0.0645	0.0329	0.0724
	LRE-opt	α₂	–0.0134	0.0195	0.0236
		α₃	–0.0533	0.0327	0.0625
Negative duration dependence	MLE gamma	α₂	0.0211	0.0111	0.0239
		α₃	0.0553*	0.0229	0.0598
	NPMLE	α₂	0.0345*	0.0174	0.0386
		α₃	0.1079*	0.0310	0.1123
	LRE	α₂	0.0369*	0.0179	0.0410
		α₃	0.0643*	0.0315	0.0716
	LRE-opt	α₂	0.0358*	0.0178	0.0400
		α₃	0.0627*	0.0314	0.0701
U-shaped duration dependence	MLE gamma	α₂	–0.0009	0.0097	0.0097
		α₃	–0.0338*	0.0173	0.0379
	NPMLE	α₂	0.0385*	0.0155	0.0416
		α₃	0.0149	0.0251	0.0292
	LRE	α₂	0.0334	0.0186	0.0383
		α₃	–0.0215	0.0271	0.0346
	LRE-opt	α₂	0.0261	0.0183	0.0319
		α₃	–0.0247	0.0263	0.0361
Inverse U duration dependence	MLE gamma	α₂	0.0102	0.0104	0.0146
		α₃	–0.0047	0.0232	0.0237
	NPMLE	α₂	0.0232	0.0140	0.0271
		α₃	0.0327	0.0295	0.0440
	LRE	α₂	0.0335	0.0183	0.0381
		α₃	0.0400	0.0336	0.0522
	LRE-opt	α₂	0.0321	0.0182	0.0369
		α₃	0.0344	0.0336	0.0481

For each DGP (gamma mixture) 100 simulations with 1000 observations each. *p<0.05
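The three columns of Table A3 can be cross-checked with the usual decomposition RMSE² = bias² + (std error)². The table note does not state this relation, so it is an assumption here, but the rows checked below (taken from the positive duration dependence block) agree with it up to rounding. The helper name `rmse` is hypothetical.

```python
import math


def rmse(bias, std_error):
    """Root mean squared error implied by a bias and a standard error."""
    return math.sqrt(bias ** 2 + std_error ** 2)


# (bias, std error, reported RMSE) copied from Table A3,
# positive duration dependence.
rows = {
    "MLE gamma, alpha_2": (0.0069, 0.0096, 0.0118),
    "NPMLE, alpha_2": (0.0205, 0.0157, 0.0258),
    "LRE, alpha_3": (-0.0645, 0.0329, 0.0724),
}

for label, (bias, se, reported) in rows.items():
    implied = rmse(bias, se)
    # Inputs are rounded to four decimals, so small discrepancies are expected.
    print(f"{label}: implied {implied:.4f}, reported {reported:.4f}")
```

The same check is a quick way to spot transcription slips in a row, since a misplaced decimal in any one column breaks the identity.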

Table A4

Average Bias, Standard error and RMSE of Estimates of Parameters of Piecewise Constant Baseline Hazard Across the Experiments, Second set of Monte Carlo Experiments, Censored Sample.

Duration dependence	Estimation method		Bias	Std error	RMSE
Positive duration dependence	MLE gamma	α₂	0.0010	0.0135	0.0135
		α₃	–0.0267	0.0269	0.0379
	NPMLE	α₂	0.0120	0.0177	0.0213
		α₃	–0.0204	0.0310	0.0371
	LRE	α₂	–0.0148	0.0199	0.0248
		α₃	–0.0656*	0.0329	0.0734
	LRE-opt	α₂	–0.0138	0.0199	0.0242
		α₃	–0.0599	0.0328	0.0683
Negative duration dependence	MLE gamma	α₂	0.0347*	0.0131	0.0371
		α₃	0.0633*	0.0277	0.0691
	NPMLE	α₂	0.0417*	0.0184	0.0456
		α₃	0.0898*	0.0325	0.0956
	LRE	α₂	0.0378*	0.0182	0.0420
		α₃	0.0539	0.0329	0.0631
	LRE-opt	α₂	0.0375*	0.0181	0.0416
		α₃	0.0501	0.0327	0.0598
U-shaped duration dependence	MLE gamma	α₂	0.0052	0.0133	0.0143
		α₃	–0.0269	0.0225	0.0350
	NPMLE	α₂	0.0308	0.0173	0.0353
		α₃	–0.0159	0.0292	0.0333
	LRE	α₂	0.0266	0.0184	0.0323
		α₃	–0.0321	0.0254	0.0410
	LRE-opt	α₂	0.0263	0.0182	0.0320
		α₃	–0.0315	0.0253	0.0404
Inverse U duration dependence	MLE gamma	α₂	0.0137	0.0123	0.0184
		α₃	–0.0030	0.0263	0.0264
	NPMLE	α₂	0.0183	0.0149	0.0236
		α₃	0.0283	0.0305	0.0416
	LRE	α₂	0.0340	0.0185	0.0387
		α₃	0.0360	0.0335	0.0491
	LRE-opt	α₂	0.0313	0.0183	0.0363
		α₃	0.0290	0.0333	0.0441

For each DGP (gamma mixture) 100 simulations with 1000 observations each. *p<0.05

Appendix B: Proofs and Technical Details

Technical Details Section 2: A Counting Process Approach

The counting process approach is a very useful framework for analyzing duration data since an indicator can be used to denote whether a transition happened or not. Andersen et al. (1993) have provided an excellent survey of counting processes. Less technical surveys have been given by Klein and Moeschberger (1997), Therneau and Grambsch (2000), and Aalen et al. (2009). The main advantage of this framework is that it allows us to express the duration distribution as a regression model with an error term that is a martingale difference. Regression models with martingale difference errors are the basis for inference in time series models with dependent observations. Hence, it is not surprising that inference is much simplified by using a similar representation in duration models.

To start the discussion, we first introduce some notation. A counting process {N(t)|t≥0} is a stochastic process describing the number of events in the interval [0, t] as time proceeds. The process contains only jumps of size +1. For single duration data, the event can only occur once because the units are observed until the event occurs. Therefore we introduce the observation indicator Y(t)=I(T≥t) that is equal to one if the unit is under observation at time t and zero after the event has occurred. The counting process is governed by its random intensity process, Y(t)κ(t), where κ(t) is the hazard in (2). If we consider a small interval (t–dt, t] of length dt, then Y(t)κ(t)dt is the conditional probability that the increment dN(t)=N(t)–N(t–) equals one, i.e., that the process jumps in that interval, given all that has happened until just before t. By specifying the intensity as the product of this observation indicator and the hazard rate, we effectively limit the number of occurrences of the event to one. It is essential that the observation indicator only depends on events up to time t.
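A small simulation may help fix ideas. The sketch below assumes a constant hazard κ (so that T is exponential), which is a simplification made only for this illustration; the paper's κ(t) is general. It builds N(t)=I(T≤t) and the integrated intensity ∫₀ᵗ Y(s)κ ds = κ·min(T, t), and checks that their difference averages to roughly zero. All function names are hypothetical.

```python
import random


def compensated_count(T, t, kappa):
    """N(t) minus the integrated intensity for one unit with duration T."""
    n_t = 1.0 if T <= t else 0.0
    # Y(s) = I(T >= s), so the integral of Y(s) * kappa over [0, t]
    # equals kappa * min(T, t).
    integrated_intensity = kappa * min(T, t)
    return n_t - integrated_intensity


def mean_compensated_count(n_units, t, kappa, seed=0):
    """Average of N(t) - integrated intensity over simulated units."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_units):
        T = rng.expovariate(kappa)  # constant-hazard duration (assumption)
        total += compensated_count(T, t, kappa)
    return total / n_units


print(mean_compensated_count(n_units=20000, t=2.0, kappa=0.5))
```

The printed average should be close to zero, which is the sample counterpart of the statement that the counting process minus its integrated intensity has mean zero.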

Usually we do not observe T directly. Instead we observe g(T, C), with g a known function and C a random vector. The most common example is right censoring, where g(T, C)=min (T, C). By defining the observation indicator as the product of the indicator I(t≤T) and, if necessary, an indicator of the observation plan, we capture when a unit is at risk for the event. In the case of right censoring Y(t)=I(t≤T)I(t≤C), and in all cases of interest we have Y(t)=I(t≤T)I_A(t) with A a random set that may depend on random variables. We assume that C and T are conditionally independent given X. The history up to and including t, Y_h(t), is assumed to be a left continuous function of t. The history of the whole process also includes the history of the covariate process, X_h(t), and V. Thus, we have

The sample paths of the conditioning variables should be up to t–, but because these paths are left continuous we can take them up to t. A fundamental result in the theory of counting processes, the Doob-Meyer decomposition,⁹ allows us to write

where M(t), t≥0 is a martingale with conditional mean and variance given by

The (conditional) mean and variance of the counting process are equal, so the disturbances in (B.2) are heteroscedastic. The probability in (B.1) is zero if the unit is no longer under observation. A counting process can be considered as a sequence of Bernoulli experiments because if dt is small, (B.3) and (6) give the mean and variance of a Bernoulli random variable. The relation between the counting process and the sequence of Bernoulli experiments given in (B.2) can be considered as a regression model with an additive error that is a martingale difference. This equation resembles a time-series regression model. The Doob-Meyer decomposition is very helpful for deriving the distribution of the estimators because the asymptotic behavior of partial sums of martingales is well known.
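For orientation, the generic textbook form of these relations can be written as follows. This is a sketch in standard counting-process notation, not a reproduction of the paper's displays (B.1)–(B.3); the symbol F_{t–} for the history just before t is an assumption of this sketch, and the paper conditions explicitly on Y_h(t), X_h(t) and V.

```latex
\begin{align*}
\Pr\bigl(dN(t)=1 \mid \mathcal{F}_{t-}\bigr) &= Y(t)\,\kappa(t)\,dt, \\
dN(t) &= Y(t)\,\kappa(t)\,dt + dM(t), \\
\mathrm{E}\bigl[dM(t) \mid \mathcal{F}_{t-}\bigr] &= 0, \qquad
\mathrm{Var}\bigl[dM(t) \mid \mathcal{F}_{t-}\bigr] = Y(t)\,\kappa(t)\,dt .
\end{align*}
```

The variance line uses the Bernoulli variance p(1–p) with p=Y(t)κ(t)dt, which equals p up to terms of order dt², so the conditional mean and variance of dN(t) coincide as stated in the text.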

Technical Details Section 3: Assumptions 1–4

To simplify the expressions, we use the notation h_i(t, θ)=h_i(t, X_{h,i}(t), θ).

The conditional distribution of T given X(·) and V has hazard rate
with X(·) a K covariate bounded stochastic process that is independent of V and such that if the probability of the event
some set S with positive measure and for some constants c₁, c₂, then c₁=c₂=0. For the baseline hazard, 0 < lim_{t↓0} λ(t, α₀) < ∞.
For the covariate process X(t), t≥0, we assume that the sample paths are piecewise constant, i.e., its derivative with respect to t is 0 almost everywhere, and left continuous. The hazard that is not conditional on V is
The observation process is Y(t), t≥0 with Y(t)=I(t≤T)I(t≤C) and we assume
The support of C is bounded.
The parameter vector θ=(β′, α′)′ is an M vector with β a K vector and α an L vector. The parameter space Θ is convex. The baseline hazard satisfies λ(t, α)>0, is twice differentiable, and has a second derivative that is bounded in α (over the parameter space) and t.
The weight function
is an M vector of bounded and left continuous functions. If
then there are functions μ(u, θ) (an M vector), V_β (u, s, θ) (an M×K matrix), and V_α (u, s, θ) (an M×L matrix) such that
and
and
Define

We assume that the M×M matrix [B(θ₀) A(θ₀)] is nonsingular.

The restriction on the baseline hazard in Assumption A1 ensures identification (see Section 3) and guarantees that the semiparametric information bound is nonsingular (see below). Assumption A2 states that the covariates and the observation indicator are predetermined. Assumption A4 is about smoothness: if one censors all the data at u=τ+ψ, then the expressions in equations (30) and (31) do not change as the value of ψ varies. The derivation of the asymptotic distribution of the LR estimator follows the proof in Tsiatis (1990). Tsiatis requires that the density of U₀ is bounded. For the MPH model, this density is

If E(V)=∞, this density is not bounded at u₀=0. Inspection of Tsiatis’ proof shows that this does not change the result, and we do not need to impose the restriction that E(V) is finite. The transformed durations are observed up to τ with τ<∞ such that for some ψ,η>0

Pr[min (U₀, C) > τ+ψ]≥η.

In the MPH model, this is just an assumption on the distribution of C because for U₀ it is satisfied for all τ<∞.

Technical Details Section 4: Lemma 2–3

Lemma 2: If the derivative of κ is bounded on [0, τ] then for ε>0 with

and

we have

for u₁, u₂ with 0<u₁<u₂<τ.

If Y_{h,N}(t) is bounded away from zero on [0, τ] for large N, then (B.14) and (B.15) imply that if b_N=N^{–c} for

then

Note that the uniform convergence holds on a compact subset of [0, τ]. Although this can be generalized to uniform convergence on [0, τ], the variable kernels that are needed for this generalization complicate the asymptotic analysis. In practice, estimation of the hazard is inaccurate near the endpoints, and it may be preferable to exclude observations that are close to the endpoints. Note that the observations near the endpoints are used in the estimation of the hazard. Also, using a bandwidth proportional to N^{–1/5} and

satisfies all the assumptions of this paper.
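To make the role of the bandwidth concrete, the following sketch implements a kernel-smoothed hazard estimator of the type studied by Ramlau-Hansen (1983): kernel weights applied to the increments of the Nelson-Aalen estimator. The paper's own estimator is given in displays that are not reproduced here, so the exact form below, the Epanechnikov kernel, the fixed censoring point and all function names are assumptions made for illustration. The bandwidth is set to N^{–1/5}.

```python
import random


def epanechnikov(x):
    """Kernel supported on [-1, 1]; bounded and Lipschitz on that interval."""
    return 0.75 * (1.0 - x * x) if abs(x) <= 1.0 else 0.0


def kernel_hazard(u, times, events, bandwidth):
    """Kernel-smoothed Nelson-Aalen increments evaluated at u.

    times  : observed (possibly censored) durations
    events : True where the duration ended in the event
    Ties among event times are assumed away.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    total = 0.0
    for rank, i in enumerate(order):
        if events[i]:
            at_risk = n - rank  # units still under observation at times[i]
            total += epanechnikov((u - times[i]) / bandwidth) / at_risk
    return total / bandwidth


def demo(n=5000, seed=0):
    """Hazard estimate at u=0.5 for unit-exponential data censored at 3."""
    rng = random.Random(seed)
    durations = [rng.expovariate(1.0) for _ in range(n)]
    times = [min(t, 3.0) for t in durations]
    events = [t <= 3.0 for t in durations]
    bandwidth = n ** (-1.0 / 5.0)
    return kernel_hazard(0.5, times, events, bandwidth)


print(demo())  # the true hazard in this design is 1
```

The evaluation point 0.5 is at least one bandwidth away from both endpoints, in line with the remark that the estimate is unreliable near the boundary.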

We do not observe the transformed duration

but rather an estimate

of this transformed duration, and hence we consider the kernel estimator

Lemma 3: The kernel K is positive and bounded on [–1, 1] (and zero elsewhere) and satisfies a Lipschitz condition on this interval. The covariate process X(t) is bounded on [0, τ] and so is

for all α in an open neighborhood of α₀. Moreover

uniformly for 0≤u≤τ, θ∈N(θ₀) and H has derivatives that are bounded for 0≤u≤τ, θ∈N(θ₀). Then for ε>0 such that

we have

Proof: See below.

Note that the conditions on b_N are determined in Lemma 2 and that a bandwidth proportional to N^{–1/5} and

satisfies all the assumptions of this paper. The fact that we use estimated transformed durations does not change the restrictions on the bandwidth choice.

At this point we consider the condition in (B.18) more closely. With

if the duration T is (right) censored at C, Y(t)=I(T≥t)I(C≥t), so

Y^U(u, θ)=I(h(T, θ)≥u)·I(h(C, θ)≥u).
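The at-risk indicator on the transformed time scale can be sketched in a few lines. The transformation h is defined in the main text of the paper, which is not reproduced here; the sketch therefore assumes, purely for illustration, a Weibull baseline hazard λ(t, α)=αt^{α–1} and a single time-constant covariate x, so that h(t, θ)=t^α·exp(βx). The function names are hypothetical.

```python
import math


def h(t, x, beta, alpha):
    """Assumed transformation: integrated hazard t**alpha * exp(beta * x).

    Weibull baseline and a time-constant scalar covariate are assumptions
    of this sketch, not the paper's general specification.
    """
    return (t ** alpha) * math.exp(beta * x)


def at_risk_transformed(u, T, C, x, beta, alpha):
    """Y^U(u, theta) = I(h(T, theta) >= u) * I(h(C, theta) >= u)."""
    return int(h(T, x, beta, alpha) >= u) * int(h(C, x, beta, alpha) >= u)


# A unit with duration T=1 and censoring time C=2.
print(at_risk_transformed(0.5, T=1.0, C=2.0, x=0.0, beta=1.0, alpha=1.0))
print(at_risk_transformed(1.5, T=1.0, C=2.0, x=0.0, beta=1.0, alpha=1.0))
print(at_risk_transformed(1.5, T=1.0, C=2.0, x=1.0, beta=1.0, alpha=1.0))
```

Because h is increasing in t under these assumptions, a larger covariate effect stretches the transformed duration, which is why the same unit is still at risk at u=1.5 in the last call but not in the second.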

If the censoring time and the duration are conditionally independent given the history up to t, i.e.,

then

If N(θ₀) is an open neighborhood of θ₀, X_i and C_i are i.i.d., and

then

and by the uniform law of large numbers

uniformly for θ∈N(θ₀) and 0≤u≤τ. Because by (B.23) the limit is bounded away from zero, we have

uniformly for θ∈N(θ₀) and 0≤u≤τ with

Because h(T,θ₀)=U₀, (B.19) holds for θ=θ₀ if κ₀(u) is bounded for 0≤u≤τ. From the expression for κ_U (u, θ) in (9), a sufficient condition for κ_U(u, θ) to be bounded for all θ in a neighborhood of θ₀ and 0≤t≤τ is that λ(t, α)>0 for all t and on a neighborhood of α₀. In the same way, (B.20) holds if the hazard of C is bounded and λ(t, α) is bounded away from zero in a neighborhood around α₀.

Proof of Lemma 1

is a linearization of

Because S_N(θ) is not continuous in θ, it is not possible to linearize this function by a first order Taylor series expansion. Instead we linearize the hazard rate of the transformed durations U(θ). From (4) and (5) we obtain

This relates the hazard of the distribution of U(θ) to that of U₀

Because h(h^–1(u, θ), θ)=u, we have

The derivatives of κ_U(u, θ) with respect to θ are

where the last equality follows from a change of variables in the integral. In the same way, we obtain with a change of variable in the integral

The proof consists of checking the conditions for asymptotic linearity of S_N(θ) in Tsiatis (1990) and a computation of the coefficients in the linear approximation. In Tsiatis’ proof the covariate in the estimating equation is X_i. We have

and hence the requirement that this is a vector of bounded functions. The equations (9), (10) and (11) are stability conditions [see also Andersen et al. (1993)]. Instead of a mean and variance condition as in Tsiatis (1990), we have a mean and two covariance conditions. Note that by setting s=u, we obtain conditions for uniform convergence to V_α (u, u) and V_β (u, u). The final condition for linearization is that for u≤τ

The assumptions that λ(t,α) is bounded away from zero for all t≥0 and α in the parameter space, that

for all t≥0 and α in the parameter space, and that X(t) is bounded, imply that the second derivative of κ_U(u, θ) with respect to θ is bounded for all u≤τ and θ∈Θ. This is sufficient for (B.31) if the parameter space is convex.

Next we linearize S_N(θ). Because

we have if |θ–θ₀| is small

After substitution of (B.29) and (B.30), the second term is

The normalized vectors of coefficients converge to (B.12) and (B.13) if (B.10) and (11) hold. This proves the lemma.

Proof of Theorem 1

By van der Vaart (1998) Theorem 5.45, we have from Lemma 1

with M₀ the martingale associated with the counting process N₀ for U₀. By the central limit theorem for integrals of predetermined functions with respect to a martingale [see e.g., Andersen et al. (1993)], the sum on the right-hand side converges to a normal distribution with the variance matrix in (24).

Proof of Lemma 2 and 3

We have

We first consider the second term. Because K is Lipschitz this is bounded by

Moreover by the mean value theorem, we have that for some intermediate

Because X_i(t) is bounded on [0, τ] and so is

for all α in an open neighborhood of α₀, (B.36) is bounded by

and substitution in (B.35) gives the upper bound

Because the estimator

consistent, the upper bound converges to 0 in probability if

Next we consider the first term in (B.34). By subtraction and addition of expected values, this term is bounded by

The first and second terms converge to 0 in probability if

Because of (B.18) the final term converges in probability to

This expression is bounded (both H and K are bounded) by

The first term goes to 0 in probability if

and the second if

This completes the proof.

References

Aalen, O. O., O. Borgan, and H. K. Gjessing. 2009. Survival and Event History Analysis. New York: Springer Verlag. doi:10.1007/978-0-387-68560-1

Amemiya, T. 1974. “The Nonlinear Two-Stage Least-Squares Estimator.” Journal of Econometrics 2: 105–110. doi:10.1016/0304-4076(74)90033-5

Amemiya, T. 1985. “Instrumental Variable Estimation for the Nonlinear Errors-in-Variables Model.” Journal of Econometrics 28: 273–289. doi:10.1016/0304-4076(85)90001-6

Andersen, P. K., O. Borgan, R. D. Gill, and N. Keiding. 1993. Statistical Models Based on Counting Processes. New York: Springer Verlag. doi:10.1007/978-1-4612-4348-9

Baker, M., and A. Melino. 2000. “Duration Dependence and Nonparametric Heterogeneity: A Monte Carlo Study.” Journal of Econometrics 96: 357–393. doi:10.1016/S0304-4076(99)00064-0

Bearse, P., J. Canals-Cerda, and P. Rilstone. 2007. “Efficient Semiparametric Estimation of Duration Models with Unobserved Heterogeneity.” Econometric Theory 23: 281–308. doi:10.1017/S0266466607070120

Bijwaard, G. E. 2009. “Instrumental Variable Estimation for Duration Data.” In Causal Analysis in Population Studies: Concepts, Methods, Applications, edited by H. Engelhardt, H.-P. Kohler, and A. Fürnkranz-Prskawetz, 111–148. New York: Springer Verlag. doi:10.1007/978-1-4020-9967-0_6

Bijwaard, G. E. 2010. “Immigrant Migration Dynamics Model for The Netherlands.” Journal of Population Economics 23: 1213–1247. doi:10.1007/s00148-008-0228-1

Bijwaard, G. E., and G. Ridder. 2005. “Correcting for Selective Compliance in a Re–employment Bonus Experiment.” Journal of Econometrics 125: 77–111. doi:10.1016/j.jeconom.2004.04.004

Bijwaard, G. E., C. Schluter, and J. Wahba. 2013. “The Impact of Labour Market Dynamics on the Return–Migration of Immigrants.” Review of Economics & Statistics, forthcoming. doi:10.1162/REST_a_00389

Chen, S. 2002. “Rank Estimation of Transformation Models.” Econometrica 70: 1683–1697. doi:10.1111/1468-0262.00347

Chiappori, P. A., and B. Salanié. 2000. “Testing for Asymmetric Information in Insurance Markets.” Journal of Political Economy 108: 56–78. doi:10.1086/262111

Cox, D. R., and D. Oakes. 1984. Analysis of Survival Data. London: Chapman and Hall.

Elbers, C., and G. Ridder. 1982. “True and Spurious Duration Dependence: The Identifiability of the Proportional Hazard Model.” Review of Economic Studies 49: 403–410. doi:10.2307/2297364

Feller, W. 1971. An Introduction to Probability Theory and its Applications. 3rd ed. John Wiley and Sons.

Hahn, J. 1994. “The Efficiency Bound of the Mixed Proportional Hazard Model.” Review of Economic Studies 61: 607–629. doi:10.2307/2297911

Han, A. K. 1987. “Non–parametric Analysis of a Generalized Regression Model: The Maximum Rank Correlation Estimator.” Journal of Econometrics 35: 303–316. doi:10.1016/0304-4076(87)90030-3

Hausman, J. A., and T. Woutersen. 2005. “Estimating a Semi–Parametric Duration Model without Specifying Heterogeneity.” CeMMAP, working paper, CWP11/05.

Heckman, J. J. 1991. “Identifying the Hand of the Past: Distinguishing State Dependence from Heterogeneity.” American Economic Review 81: 75–79.

Heckman, J. J., and B. Singer. 1984a. “Econometric Duration Analysis.” Journal of Econometrics 24: 63–132. doi:10.1016/0304-4076(84)90075-7

Heckman, J. J., and B. Singer. 1984b. “A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data.” Econometrica 52: 271–320. doi:10.2307/1911491

Honoré, B. E. 1990. “Simple Estimation of a Duration Model with Unobserved Heterogeneity.” Econometrica 58: 453–473. doi:10.2307/2938211

Horowitz, J. L. 1996. “Semiparametric Estimation of a Regression Model with an Unknown Transformation of the Dependent Variable.” Econometrica 64: 103–137. doi:10.2307/2171926

Horowitz, J. L. 1999. “Semiparametric Estimation of a Proportional Hazard Model with Unobserved Heterogeneity.” Econometrica 67: 1001–1018. doi:10.1111/1468-0262.00068

Horowitz, J. L. 2001. “The Bootstrap.” In Handbook of Econometrics, Vol. 5, edited by J. J. Heckman and E. Leamer. North-Holland: Amsterdam.

Khan, S. 2001. “Two Stage Rank Estimation of Quantile Index Models.” Journal of Econometrics 100: 319–355. doi:10.1016/S0304-4076(00)00040-3

Khan, S., and E. Tamer. 2007. “Partial Rank Estimation of Duration Models with General forms of Censoring.” Journal of Econometrics 136: 251–280. doi:10.1016/j.jeconom.2006.03.003

Klein, J. P., and M. L. Moeschberger. 1997. Survival Analysis: Techniques for Censored and Truncated Data. New York: Springer Verlag.

Lai, T. L., and Z. Ying. 1991. “Rank Regression Methods for Left–Truncated and Right-Censored Data.” Annals of Statistics 19: 531–556. doi:10.1214/aos/1176348110

Lancaster, T. 1976. “Redundancy, Unemployment and Manpower Policy: A Comment.” Economic Journal 86: 335–338. doi:10.2307/2230754

Lancaster, T. 1979. “Econometric Methods for the Duration of Unemployment.” Econometrica 47: 939–956. doi:10.2307/1914140

Lin, D. Y., and Z. Ying. 1995. “Semiparametric Inference for the Accelerated Life Model with Time-Dependent Covariates.” Journal of Statistical Planning and Inference 44: 47–63. doi:10.1016/0378-3758(94)00039-X

Lindsay, B. G. 1983. “The Geometry of Mixture Likelihoods: A General Theory.” Annals of Statistics 11: 86–94. doi:10.1214/aos/1176346059

Manton, K. G., E. Stallard, and J. W. Vaupel. 1981. “Methods for the Mortality Experience of Heterogeneous Populations.” Demography 18: 389–410. doi:10.2307/2061005

Meyer, P. 1963. “Decomposition of Supermartingales: The Uniqueness Theorem.” Illinois Journal of Mathematics 7: 1–17. doi:10.1215/ijm/1255637477

Newey, W. K., and D. McFadden. 1994. “Large Sample Estimation and Hypothesis Testing.” In Handbook of Econometrics, Vol. 4, edited by R. F. Engle and D. McFadden. North-Holland: Amsterdam. doi:10.1016/S1573-4412(05)80005-4

Powell, M. J. D. 1964. “An Efficient Method for Finding the Minimum of a Function of Several Variables without Calculating Derivatives.” The Computer Journal 7: 155–162. doi:10.1093/comjnl/7.2.155

Prentice, R. L. 1978. “Linear Rank Tests with Right Censored Data.” Biometrika 65: 167–179. doi:10.1093/biomet/65.1.167

Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. 1986. Numerical Recipes: The Art of Scientific Computing. Cambridge: Cambridge University Press. doi:10.1016/S0003-2670(00)82860-3

Protter, P. 2005. Stochastic Integration and Differential Equations. New York: Springer Verlag, 107–113.

Ramlau-Hansen, H. 1983. “Smoothing Counting Process Intensities by Means of Kernel Functions.” Annals of Statistics 11: 453–466. doi:10.1214/aos/1176346152

Ridder, G., and T. Woutersen. 2003. “The Singularity of the Efficiency Bound of the Mixed Proportional Hazard Model.” Econometrica 71: 1579–1589. doi:10.1111/1468-0262.00460

Robins, J. M., and A. A. Tsiatis. 1992. “Semiparametric Estimation of an Accelerated Failure Time Model with Time-Dependent Covariates.” Biometrika 79: 311–319.

Sherman, R. P. 1993. “The Limiting Distribution of the Maximum Rank Correlation Estimator.” Econometrica 61: 123–137. doi:10.2307/2951780

Therneau, T., and P. Grambsch. 2000. Modeling Survival Data: Extending the Cox Model. New York: Springer Verlag. doi:10.1007/978-1-4757-3294-8

Tsiatis, A. A. 1990. “Estimating Regression Parameters using Linear Rank Tests for Censored Data.” Annals of Statistics 18: 354–372. doi:10.1214/aos/1176347504

van der Vaart, A. W. 1998. Asymptotic Statistics. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802256

Wooldridge, J. M. 2005. “Unobserved Heterogeneity and Estimation of Average Partial Effects.” In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, edited by D. W. K. Andrews and J. H. Stock, 27–55. Cambridge University Press. doi:10.1017/CBO9780511614491.004

Woutersen, T. 2000. Consistent Estimators for Panel Duration Data with Endogenous Censoring and Endogenous Regressors. Dissertation Brown University.

Published Online: 2013-05-28

Published in Print: 2013-07-01
