A semi-parametric quantile regression approach to zero-inflated and incomplete longitudinal outcomes

Biswas, Jayabrata; Ghosh, Pulak; Das, Kiranmoy

doi:10.1007/s10182-020-00362-9

Original Paper
Published: 04 March 2020

Volume 104, pages 261–283, (2020)
AStA Advances in Statistical Analysis

Abstract

Quantile regression models are typically used for modeling non-Gaussian outcomes, and such models allow quantile-specific inference. While there is a vast literature on conditional quantile regression (where the model parameters are estimated for a single prespecified quantile level), relatively little work has been reported on joint quantile regression. The challenge in joint quantile regression is to avoid quantile crossing while estimating multiple quantiles simultaneously. In this article, we propose a semi-parametric approach to handling non-Gaussian, zero-inflated and incomplete longitudinal outcomes. We use a two-part model to handle the excess zeros, and propose a dynamic joint quantile regression model for the nonzero outcomes. A multinomial probit model is used for modeling the missingness. We develop a Bayesian joint estimation method in which the model parameters are estimated through Markov chain Monte Carlo. The unknown distribution of the outcome can be constructed from the estimated quantiles. We analyze data from the Health and Retirement Study and model out-of-pocket medical expenditure with the proposed joint quantile regression method. Simulation studies are performed to assess the practical usefulness and efficiency of the proposed approach relative to existing methods.

Author information

Authors and Affiliations

Interdisciplinary Statistical Research Unit, Applied Statistics Division, Indian Statistical Institute, 203 B.T. Road, Kolkata, 700108, India
Jayabrata Biswas & Kiranmoy Das
Department of Decision Sciences and Center of Public Policy, Indian Institute of Management, Bangalore, India
Pulak Ghosh

Corresponding author

Correspondence to Kiranmoy Das.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Full conditional distributions and MCMC details

The prior distributions of the model parameters are as follows:

$$\begin{aligned} \varvec{\alpha }\sim N(\varvec{\mu _\alpha },\sigma ^2_\alpha I_{p^0\times p^0}) \end{aligned}$$

where $p^0=(m_0+1)J+J'$.

The quantile regression parameters $\varvec{\beta _{\tau }}$ have a truncated multivariate normal prior $N(\varvec{0},D)$ subject to the non-crossing constraint $\varvec{x}^T\varvec{\beta _{\tau _1}}<\varvec{x}^T\varvec{\beta _{\tau _2}}<\cdots <\varvec{x}^T\varvec{\beta _{\tau _k}}$; we denote the corresponding prior density by $\pi (\varvec{\beta _{\tau }})$.
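The ordering constraint in this prior can be checked directly for a given covariate vector. The following is a minimal sketch, assuming the coefficient vectors are stored in increasing order of quantile level; the function names and the numbers are illustrative and do not come from the paper.

```python
from typing import List, Sequence


def fitted_quantiles(x: Sequence[float], betas: Sequence[Sequence[float]]) -> List[float]:
    """Return x^T beta_tau for each quantile level, in the order given."""
    return [sum(xj * bj for xj, bj in zip(x, beta)) for beta in betas]


def is_non_crossing(x: Sequence[float], betas: Sequence[Sequence[float]]) -> bool:
    """True if x^T beta_tau1 < x^T beta_tau2 < ... < x^T beta_tauk."""
    q = fitted_quantiles(x, betas)
    return all(a < b for a, b in zip(q, q[1:]))


# Hypothetical covariate vector (intercept plus one covariate).
x = [1.0, 0.5]
# Hypothetical coefficients for three increasing quantile levels.
betas_ok = [[0.2, 1.0], [0.6, 1.1], [1.0, 1.3]]
betas_bad = [[0.2, 1.0], [1.9, 0.1], [1.0, 1.3]]

print(is_non_crossing(x, betas_ok))   # True
print(is_non_crossing(x, betas_bad))  # False
```

A check of this kind only verifies the constraint at the supplied covariate value; how the constraint is enforced over the covariate space is determined by the model, not by this sketch.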

For other parameters, we have the following prior structures:

$$\begin{aligned} \varvec{\eta }\sim N(\varvec{\mu _\eta },\sigma ^2_\eta I),\; \sigma ^2_\epsilon \sim \mathrm{{IG}}(\alpha _\epsilon ,\beta _\epsilon ),\; \sigma ^2_d\sim \mathrm{{IG}}(\alpha _d,\beta _d)\; \mathrm{{and}}\; \Gamma \sim \mathrm{{IW}}(\nu ,\Psi ). \end{aligned}$$

Here, IG denotes the inverse gamma distribution and IW the inverse Wishart distribution. Let $\Gamma ^{-1}_{2\times 2}=((\gamma _{ij}))_{2\times 2}$, where $i=1,2$ and $j=1,2$.

We use the notation $\varvec{X^m}$ for the design matrix corresponding to all the covariates in Eq. (8), where we model the missingness.

The full conditional distributions for the model parameters are given below:

$$\begin{aligned} \varvec{\alpha }\mid -\sim N(V_\alpha \varvec{v_\alpha },V_\alpha ) \end{aligned}$$

(20)

where $V_\alpha =\left( \sum \nolimits _{i=1}^N\sum \nolimits _{t\in \S _i}\varvec{X_i(t)}\varvec{X_i(t)}^T+\frac{1}{\sigma ^2_\alpha } I_{p^0\times p^0} \right) ^{-1}$ and $\varvec{v_\alpha }=\sum \nolimits _{i=1}^N\sum \nolimits _{t\in \S _i}\left( R_{\mathrm{{it}}}-b_i \right) \varvec{X_i(t)}+\frac{1}{\sigma ^2_\alpha }\varvec{\mu _\alpha }$.

$$\begin{aligned} \varvec{\beta _{\tau }}\mid - \propto \left( \prod \limits _{i=1}^N\prod \limits _{l=1}^{T_i}\hat{f_Y}(y_i(t_{\mathrm{{il}}})\mid \varvec{\beta _{\tau }}, -)\right) \pi (\varvec{\beta _{\tau }}) \end{aligned}$$

(21)

$$\begin{aligned} b_i\mid -\sim N(V_{b_i}v_{b_i},V_{b_i}) \end{aligned}$$

(22)

where $V_{b_i}=\left( \sum \nolimits _{t\in \S _i}1+\frac{3T}{\sigma ^2_{\epsilon }}+\gamma _{11} \right) ^{-1}$ and $v_{b_i}=\sum \nolimits _{t\in \S _i}\left( R_{\mathrm{{it}}}-\varvec{X_i(t)}^T\varvec{\alpha } \right) +\frac{1}{\sigma ^2_{\epsilon }}\sum \nolimits _{t=1}^T\sum \nolimits _{k=1}^3\left( Z_{{\mathrm{{it}}}k}-\varvec{X_i^m(t)}^T\varvec{\eta }-d_{\mathrm{{it}}} \right) -\gamma _{12}c_i$.

$$\begin{aligned} c_i\mid - \propto \left( \prod \limits _{l=1}^{T_i}\hat{f_Y}(y_i(t_{\mathrm{{il}}})\mid c_i, -)\right) f(b_i,c_i\mid \Gamma ) \end{aligned}$$

(23)

$$\begin{aligned} \Gamma \mid - \sim \mathrm{{IW}}\left( N+\nu ,\Psi +\sum \limits _{i=1}^N (b_i,c_i)^T(b_i,c_i) \right) \end{aligned}$$

(24)

$$\begin{aligned} \varvec{\eta }\mid -\sim N(V_{\eta }\varvec{v_{\eta }},V_{\eta }) \end{aligned}$$

(25)

where $V_{\eta }=\left( \frac{1}{\sigma ^2_{\epsilon }}\sum \nolimits _{i=1}^N\sum \nolimits _{t=1}^T\sum \nolimits _{k=1}^3\varvec{X_i^m(t)}^T\varvec{X_i^m(t)}+\frac{1}{\sigma ^2_{\eta }}I_{\overline{J+J'}\times \overline{J+J'}} \right) ^{-1}$ and $\varvec{v_{\eta }}=\frac{1}{\sigma ^2_{\epsilon }}\sum \nolimits _{i=1}^N\sum \nolimits _{t=1}^T\sum \nolimits _{k=1}^3\left( Z_{{\mathrm{{it}}}k}-b_i-d_{\mathrm{{it}}}\right) \varvec{X^m_i(t)}+\frac{1}{\sigma ^2_{\eta }}\varvec{\mu _{\eta }}$.

$$\begin{aligned} d_{\mathrm{{it}}}\mid - \sim N(V_{d_{\mathrm{{it}}}}v_{d_{\mathrm{{it}}}},V_{d_{\mathrm{{it}}}}) \end{aligned}$$

(26)

where $V_{d_{\mathrm{{it}}}}=\left( \frac{3}{\sigma ^2_\epsilon }+\frac{1}{\sigma ^2_d}\right) ^{-1}$ and $v_{d_{\mathrm{{it}}}}=\frac{1}{\sigma ^2_\epsilon }\sum \nolimits _{k=1}^3\left( Z_{{\mathrm{{it}}}k}-\varvec{X^m_i(t)}^T\varvec{\eta }-b_i \right) $.
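The scalar normal update in Eq. (26) can be sketched as follows. This is an illustration only: it assumes the three latent values $Z_{{\mathrm{{it}}}k}$, the linear predictor $\varvec{X^m_i(t)}^T\varvec{\eta }$ and $b_i$ are available as plain numbers, and all function and argument names are hypothetical.

```python
import random
from typing import Sequence, Tuple


def d_it_moments(z_itk: Sequence[float], xm_eta: float, b_i: float,
                 sigma2_eps: float, sigma2_d: float) -> Tuple[float, float]:
    """Mean V*v and variance V of the full conditional of d_it, as in Eq. (26)."""
    # With three latent values, len(z_itk) / sigma2_eps equals 3 / sigma2_eps.
    precision = len(z_itk) / sigma2_eps + 1.0 / sigma2_d
    var = 1.0 / precision
    v = sum(z - xm_eta - b_i for z in z_itk) / sigma2_eps
    return var * v, var


def draw_d_it(z_itk: Sequence[float], xm_eta: float, b_i: float,
              sigma2_eps: float, sigma2_d: float, rng: random.Random) -> float:
    """Draw d_it from N(V*v, V); random.gauss takes a standard deviation."""
    mean, var = d_it_moments(z_itk, xm_eta, b_i, sigma2_eps, sigma2_d)
    return rng.gauss(mean, var ** 0.5)


rng = random.Random(0)
# Hypothetical current values of the other quantities in the sampler.
print(draw_d_it([1.2, 0.7, 1.0], xm_eta=0.4, b_i=0.1,
                sigma2_eps=1.0, sigma2_d=0.5, rng=rng))
```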

$$\begin{aligned} \sigma ^2_\epsilon \mid -\sim \mathrm{{IG}}\left( \frac{3}{2}\mathrm{{NT}}+\alpha _\epsilon ,\frac{1}{2}\sum \limits _{i=1}^N\sum \limits _{t=1}^{T}\sum \limits _{k=1}^3\left( Z_{{\mathrm{{it}}}k}-\varvec{X^m_i(t)}^T\varvec{\eta }-b_i-d_{\mathrm{{it}}} \right) ^2+\beta _\epsilon \right) \end{aligned}$$

(27)

$$\begin{aligned} \sigma ^2_d\mid -\sim \mathrm{{IG}}\left( \frac{1}{2}\mathrm{{NT}}+\alpha _d,\frac{1}{2}\sum \limits _{i=1}^N\sum \limits _{t=1}^Td^2_{\mathrm{{it}}}+\beta _d \right) \end{aligned}$$

(28)
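The inverse gamma updates in Eqs. (27) and (28) can be drawn by inverting a gamma variate. The sketch below does this for $\sigma ^2_d$, assuming the random effects $d_{\mathrm{{it}}}$ are held in a nested list; the helper names and the hyperparameter values are illustrative assumptions, not taken from the paper.

```python
import random
from typing import Sequence


def draw_inverse_gamma(shape: float, scale: float, rng: random.Random) -> float:
    """Draw from IG(shape, scale) as the reciprocal of a Gamma(shape, rate=scale) draw.

    random.gammavariate is parameterised by shape and *scale*, so a rate of
    `scale` corresponds to a gamma scale of 1 / scale.
    """
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def draw_sigma2_d(d: Sequence[Sequence[float]], alpha_d: float, beta_d: float,
                  rng: random.Random) -> float:
    """One draw of sigma^2_d given d[i][t], following the form of Eq. (28)."""
    n_terms = sum(len(row) for row in d)            # N * T for a balanced design
    shape = 0.5 * n_terms + alpha_d
    scale = 0.5 * sum(v * v for row in d for v in row) + beta_d
    return draw_inverse_gamma(shape, scale, rng)


rng = random.Random(0)
d = [[0.3, -0.2, 0.1], [0.5, 0.0, -0.4]]            # hypothetical N = 2, T = 3
print(draw_sigma2_d(d, alpha_d=3.0, beta_d=2.0, rng=rng))
```

The same helper applies to Eq. (27) once the shape and scale are replaced by those of $\sigma ^2_\epsilon $.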

$$\begin{aligned} R_{\mathrm{{it}}}\mid -\sim N(\varvec{X_i(t)}^T\varvec{\alpha }+b_i,1) \end{aligned}$$

(29)

which is left-truncated at 0 if $E_{\mathrm{{it}}}=0$ and right-truncated at 0 if $E_{\mathrm{{it}}}=1$.
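A unit-variance normal truncated at 0 can be sampled by the inverse-CDF method. The following is a minimal sketch of that idea; the function name and the `keep_positive` flag are hypothetical, and which side applies for a given value of $E_{\mathrm{{it}}}$ is determined by the model above rather than by this code.

```python
import random
from statistics import NormalDist


def draw_truncated_latent(mean: float, keep_positive: bool, rng: random.Random) -> float:
    """Draw from N(mean, 1) restricted to (0, inf) if keep_positive, else to (-inf, 0]."""
    dist = NormalDist(mu=mean, sigma=1.0)
    p0 = dist.cdf(0.0)                       # probability mass below the truncation point
    lo, hi = (p0, 1.0) if keep_positive else (0.0, p0)
    u = lo + (hi - lo) * rng.random()        # uniform on the allowed CDF range
    # inv_cdf is undefined at 0 and 1, so keep u strictly inside (0, 1).
    # This is adequate for moderate |mean|; far in the tails a dedicated
    # truncated-normal sampler would be more accurate.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return dist.inv_cdf(u)


rng = random.Random(0)
print(draw_truncated_latent(0.3, keep_positive=True, rng=rng))
print(draw_truncated_latent(0.3, keep_positive=False, rng=rng))
```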

$$\begin{aligned} \varvec{Z_{\mathrm{{it}}}}\mid - \sim N\left( \left( \varvec{X^m_i(t)}^T\varvec{\eta }+b_i+d_{\mathrm{{it}}}\right) \varvec{1},\sigma ^2_\epsilon I_{3\times 3}\right) \end{aligned}$$

(30)

Quantile Smoothing: post-processing part

Note that we assume a multivariate normal likelihood for $\varvec{\hat{\beta }}$, i.e., $\varvec{\hat{\beta }}\sim N(\varvec{\beta },\hat{\Sigma })$, where $\hat{\Sigma }$ is defined in Sect. 3. Using Bernstein basis functions, we also have $\varvec{\beta }=\Omega \varvec{\delta }$. With independent uniform$(0,\infty )$ priors on the components of $\varvec{\delta }$, the full conditional distribution of $\varvec{\delta }$ is:

$$\begin{aligned} \varvec{\delta }\mid - \sim N\left( (\Omega ^{T}\hat{\Sigma }^{-1} \Omega )^{-1}\Omega ^{T}\hat{\Sigma }^{-1}\varvec{\hat{\beta }},(\Omega ^{T}\hat{\Sigma }^{-1} \Omega )^{-1}\right) \times I_{(\varvec{\delta } \ge 0)}. \end{aligned}$$

(31)

Note that the above distribution is a truncated multivariate normal distribution. To sample from it, we draw from $N\left( (\Omega ^{T}\hat{\Sigma }^{-1} \Omega )^{-1}\Omega ^{T}\hat{\Sigma }^{-1}\varvec{\hat{\beta }},(\Omega ^{T}\hat{\Sigma }^{-1} \Omega )^{-1}\right) $ and accept the draw if all of its components are nonnegative; otherwise we draw again. After generating 10,000 accepted samples, we discard the first 2,000 and compute the sample mean of the remaining 8,000, which is our estimate of $\varvec{\delta }$.
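The accept-reject scheme just described can be sketched as follows. The sketch assumes the mean vector and covariance matrix of the untruncated normal have already been computed from $\Omega $, $\hat{\Sigma }$ and $\varvec{\hat{\beta }}$; the two-dimensional numbers below are hypothetical, and the small Cholesky helper is included only to keep the example free of third-party dependencies.

```python
import random
from typing import List, Sequence


def cholesky(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = (a[i][i] - s) ** 0.5
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_mvn(mean: Sequence[float], chol: Sequence[Sequence[float]],
             rng: random.Random) -> List[float]:
    """One draw from N(mean, chol chol^T)."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(chol[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def sample_truncated(mean: Sequence[float], cov: Sequence[Sequence[float]],
                     n_keep: int, rng: random.Random,
                     max_tries: int = 1_000_000) -> List[List[float]]:
    """Collect n_keep draws whose components are all nonnegative."""
    chol = cholesky(cov)
    accepted: List[List[float]] = []
    for _ in range(max_tries):
        draw = draw_mvn(mean, chol, rng)
        if all(c >= 0.0 for c in draw):      # the indicator I(delta >= 0)
            accepted.append(draw)
            if len(accepted) == n_keep:
                return accepted
    raise RuntimeError("acceptance rate too low for plain rejection sampling")


rng = random.Random(42)
mean = [0.5, 1.0]                            # hypothetical untruncated mean
cov = [[0.25, 0.05], [0.05, 0.16]]           # hypothetical covariance
samples = sample_truncated(mean, cov, 10_000, rng)
kept = samples[2_000:]                       # mirror the 2,000-draw discard
delta_hat = [sum(s[j] for s in kept) / len(kept) for j in range(len(mean))]
print(delta_hat)
```

Accepted draws from a rejection sampler are independent, so the discard step here only mirrors the procedure in the text. Plain rejection can also become slow when the nonnegative region has small probability under the untruncated normal, which is why the sketch caps the number of attempts.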

1.1 Proof of the statement in Sect. 3.1

Let $\beta (\tau )=\sum \nolimits _{m=1}^MB_m(\tau )\alpha _m$ be a function of $\tau $, where $0<\tau <1$ and $B_m(\tau )=\left( \begin{array}{c} M \\ m \end{array} \right) \tau ^m(1-\tau )^{M-m}$. The function $\beta (\tau )$ is increasing if $\alpha _m \ge \alpha _{m-1}$, for all $m>1$ and $\alpha _1>0$.

Proof

It is easy to see that $B_m(\tau )$ is the pmf of a binomial random variable with $M$ trials and success probability $\tau $.

Let $0<\tau _1<\tau _2<1$. Suppose we have two coins, coin 1 and coin 2, with probabilities of heads $\tau _1$ and $\tau _2$, respectively. Consider the event that at least $m$ heads occur in $M$ trials, for $m=1,2,\ldots ,M$. This event is more probable for coin 2 than for coin 1. Hence, we can write:

$$\begin{aligned}&B_M(\tau _2)>B_M(\tau _1), \\&B_M(\tau _2)+B_{M-1}(\tau _2)>B_M(\tau _1)+B_{M-1}(\tau _1), \\&\vdots \\&B_M(\tau _2)+B_{M-1}(\tau _2)+\ldots +B_1(\tau _2)>B_M(\tau _1)+B_{M-1}(\tau _1)+\cdots +B_1(\tau _1). \end{aligned}$$
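These tail inequalities can be checked numerically for particular values. The following is a small sketch, with $M=5$, $\tau _1=0.3$ and $\tau _2=0.6$ chosen arbitrarily for illustration; the function names are not from the paper.

```python
from math import comb


def bernstein(m: int, M: int, tau: float) -> float:
    """B_m(tau): probability of exactly m heads in M tosses with P(head) = tau."""
    return comb(M, m) * tau ** m * (1.0 - tau) ** (M - m)


def upper_tail(m: int, M: int, tau: float) -> float:
    """B_M(tau) + B_{M-1}(tau) + ... + B_m(tau): probability of at least m heads."""
    return sum(bernstein(k, M, tau) for k in range(m, M + 1))


M = 5
tau1, tau2 = 0.3, 0.6
for m in range(M, 0, -1):
    t1, t2 = upper_tail(m, M, tau1), upper_tail(m, M, tau2)
    print(f"m={m}: tail at tau1={t1:.4f}, tail at tau2={t2:.4f}, larger at tau2: {t2 > t1}")
```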

We can also write:

$$\begin{aligned}&c_MB_M(\tau _2)>c_MB_M(\tau _1), \\&c_{M-1}B_M(\tau _2)+c_{M-1}B_{M-1}(\tau _2)>c_{M-1}B_M(\tau _1)+c_{M-1}B_{M-1}(\tau _1), \\&\vdots \\&c_1B_M(\tau _2)+c_1B_{M-1}(\tau _2)+\cdots +c_1B_1(\tau _2)>c_1B_M(\tau _1)+c_1B_{M-1}(\tau _1)+\ldots +c_1B_1(\tau _1), \end{aligned}$$

for $c_i>0$, $i=1,2,\ldots ,M$. From the second set of inequalities, we get:

$$\begin{aligned}&\left( \sum \limits _{m=1}^Mc_m\right) B_M(\tau _2)+\left( \sum \limits _{m=1}^{M-1}c_m\right) B_{M-1}(\tau _2)+\cdots +c_1B_1(\tau _2)>\left( \sum \limits _{m=1}^Mc_m\right) B_M(\tau _1) \\&\quad +\left( \sum \limits _{m=1}^{M-1}c_m\right) B_{M-1}(\tau _1)+\ldots +c_1B_1(\tau _1). \end{aligned}$$

Now define $\sum \nolimits _{i=1}^mc_i=\alpha _m$, for all $m=1,2,\ldots ,M$. Then, the above inequality is reduced to the following:

$$\begin{aligned} \sum \limits _{m=1}^MB_m(\tau _2)\alpha _m>\sum \limits _{m=1}^MB_m(\tau _1)\alpha _m, \end{aligned}$$

which is the same as $\beta (\tau _2)>\beta (\tau _1)$. Here, clearly $\alpha _m \ge \alpha _{m-1}$.
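The regrouping used in this step can be written out explicitly by interchanging the order of summation. In the display below, $X_\tau $ denotes a binomial random variable with $M$ trials and success probability $\tau $; this symbol is introduced here only for illustration.

$$\begin{aligned} \sum \limits _{m=1}^M\alpha _mB_m(\tau )=\sum \limits _{m=1}^M\left( \sum \limits _{i=1}^mc_i\right) B_m(\tau )=\sum \limits _{i=1}^Mc_i\sum \limits _{m=i}^MB_m(\tau )=\sum \limits _{i=1}^Mc_i\,P(X_\tau \ge i). \end{aligned}$$

Each tail probability $P(X_\tau \ge i)$ is increasing in $\tau $, so a combination of these tails with positive weights $c_i$ is increasing as well.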

Hence, for any $0<\tau _1<\tau _2<1$, we have $\beta (\tau _1)<\beta (\tau _2)$, which implies that $\beta (\tau )$ is increasing in $\tau $ if $\alpha _m \ge \alpha _{m-1}$, for all $m>1$ and $\alpha _1>0$ (i.e., $c_1>0$). This completes the proof.
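The result can be illustrated numerically by building nondecreasing coefficients $\alpha _m$ from increments $c_i$ and evaluating $\beta (\tau )$ on a grid. This is a sketch with arbitrary increments (one of them zero, so that $\alpha _m \ge \alpha _{m-1}$ holds with equality once); it is a check for one example, not a substitute for the proof.

```python
from itertools import accumulate
from math import comb
from typing import Sequence


def bernstein(m: int, M: int, tau: float) -> float:
    """B_m(tau) = C(M, m) tau^m (1 - tau)^(M - m)."""
    return comb(M, m) * tau ** m * (1.0 - tau) ** (M - m)


def beta(tau: float, alpha: Sequence[float]) -> float:
    """beta(tau) = sum over m = 1..M of B_m(tau) * alpha_m, with M = len(alpha)."""
    M = len(alpha)
    return sum(bernstein(m, M, tau) * a for m, a in enumerate(alpha, start=1))


# Hypothetical increments: c_1 > 0 and the remaining c_i nonnegative.
c = [0.4, 0.1, 0.0, 0.3]
alpha = list(accumulate(c))          # alpha_m = c_1 + ... + c_m, nondecreasing

grid = [k / 100 for k in range(1, 100)]
values = [beta(t, alpha) for t in grid]
is_increasing = all(a < b for a, b in zip(values, values[1:]))
print(is_increasing)
```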

About this article

Cite this article

Biswas, J., Ghosh, P. & Das, K. A semi-parametric quantile regression approach to zero-inflated and incomplete longitudinal outcomes. AStA Adv Stat Anal 104, 261–283 (2020). https://doi.org/10.1007/s10182-020-00362-9

Received: 06 March 2019
Accepted: 20 February 2020
Published: 04 March 2020
Issue Date: June 2020
DOI: https://doi.org/10.1007/s10182-020-00362-9
