A semi-parametric quantile regression approach to zero-inflated and incomplete longitudinal outcomes

  • Original Paper
AStA Advances in Statistical Analysis

Abstract

Quantile regression models are typically used for modeling non-Gaussian outcomes, and they allow quantile-specific inference. There is a vast literature on conditional quantile regression, where the model parameters are estimated specifically for a single prespecified quantile level. Relatively little work has been reported on joint quantile regression. The challenge in joint quantile regression is to avoid quantile crossing while estimating multiple quantiles simultaneously. In this article, we propose a semi-parametric approach to handling non-Gaussian, zero-inflated and incomplete longitudinal outcomes. We use a two-part model for the excess zeros and propose a dynamic joint quantile regression model for the nonzero outcomes. A multinomial probit model is used to model the missingness. We develop a Bayesian joint estimation method in which the model parameters are estimated through Markov chain Monte Carlo. The unknown distribution of the outcome can be constructed from the estimated quantiles. We analyze data from the Health and Retirement Study and model out-of-pocket medical expenditure with the proposed joint quantile regression method. Simulation studies assess the practical usefulness and efficiency of the proposed approach compared with existing methods.

Figs. 1-6 (figure images not available)

Author information

Correspondence to Kiranmoy Das.

Appendix: Full conditional distributions and MCMC details

The prior distributions of the model parameters are as follows:

$$\begin{aligned} \varvec{\alpha }\sim N(\varvec{\mu _\alpha },\sigma ^2_\alpha I_{p^0\times p^0}) \end{aligned}$$

where \(p^0=(m_0+1)J+J'\).

The quantile regression parameters \(\varvec{\beta _{\tau }}\) have a truncated multivariate normal prior \(N(\varvec{0},D)\) subject to \(\varvec{x}^T\varvec{\beta _{\tau _1}}<\varvec{x}^T\varvec{\beta _{\tau _2}}<\cdots <\varvec{x}^T\varvec{\beta _{\tau _k}}\). We denote the corresponding prior density by \(\pi (\varvec{\beta _{\tau }})\).

The remaining parameters have the following priors:

$$\begin{aligned} \varvec{\eta }\sim N(\varvec{\mu _\eta },\sigma ^2_\eta I),\; \sigma ^2_\epsilon \sim \mathrm{{IG}}(\alpha _\epsilon ,\beta _\epsilon ),\; \sigma ^2_d\sim \mathrm{{IG}}(\alpha _d,\beta _d)\; \mathrm{{and}}\; \Gamma \sim \mathrm{{IW}}(\nu ,\Psi ). \end{aligned}$$

Here, IG and IW denote the inverse gamma and inverse Wishart distributions, respectively. Let \(\Gamma ^{-1}_{2\times 2}=((\gamma _{ij}))_{2\times 2}\), where \(i=1,2\) and \(j=1,2\).

We use \(\varvec{X^m}\) to denote the design matrix of all the covariates in Eq. (8), the model for the missingness.

The full conditional distributions for the model parameters are given below:

$$\begin{aligned} \varvec{\alpha }\mid -\sim N(V_\alpha \varvec{v_\alpha },V_\alpha ) \end{aligned}$$
(20)

where \(V_\alpha =\left( \sum \nolimits _{i=1}^N\sum \nolimits _{t\in \S _i}\varvec{X_i(t)}\varvec{X_i(t)}^T+\frac{1}{\sigma ^2_\alpha } I_{p^0\times p^0} \right) ^{-1}\) and \(\varvec{v_\alpha }=\sum \nolimits _{i=1}^N\sum \nolimits _{t\in \S _i}\left( R_{\mathrm{{it}}}-b_i \right) \varvec{X_i(t)}+\frac{1}{\sigma ^2_\alpha }\varvec{\mu _\alpha }\).
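The update in Eq. (20) is a standard conjugate normal draw. The following minimal NumPy sketch illustrates it. The stacked-array layout (one row per observed pair \((i,t)\)) and the name `sample_alpha` are illustrative assumptions, not the authors' code.

```python
import numpy as np


def sample_alpha(X, R, subj, b, mu_alpha, sigma2_alpha, rng):
    """One Gibbs draw of alpha ~ N(V_alpha v_alpha, V_alpha), Eq. (20).

    Assumed layout: row r of X is X_i(t)^T for one observed pair (i, t),
    R[r] is the latent probit variable R_it, subj[r] is the subject index i.
    """
    p0 = X.shape[1]
    # V_alpha^{-1} = sum X_i(t) X_i(t)^T + I / sigma2_alpha
    precision = X.T @ X + np.eye(p0) / sigma2_alpha
    # v_alpha = sum (R_it - b_i) X_i(t) + mu_alpha / sigma2_alpha
    v = X.T @ (R - b[subj]) + mu_alpha / sigma2_alpha
    V = np.linalg.inv(precision)
    # mean + (Cholesky factor of V) @ z with z ~ N(0, I) is a N(mean, V) draw
    return V @ v + np.linalg.cholesky(V) @ rng.standard_normal(p0)
```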

$$\begin{aligned} \varvec{\beta _{\tau }}\mid - \propto \left( \prod \limits _{i=1}^N\prod \limits _{l=1}^{T_i}\hat{f_Y}(y_i(t_{\mathrm{{il}}})\mid \varvec{\beta _{\tau }}, -)\right) \pi (\varvec{\beta _{\tau }}) \end{aligned}$$
(21)
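Eq. (21) has no closed form, so a Metropolis-type step is needed; the section does not say which one. The sketch below assumes a random-walk Metropolis update. It enforces the non-crossing constraint of the truncated prior by giving crossing proposals zero prior density. `log_lik` is a hypothetical callable standing in for the log of the product of \(\hat{f_Y}\) terms. `X_check` is an assumed set of covariate vectors \(\varvec{x}\) at which the ordering is checked.

```python
import numpy as np


def log_prior_beta(B, D_inv, X_check):
    """Unnormalized log density of the truncated N(0, D) prior on B.

    B       : (k, p) array; row j is beta_{tau_j} (assumed layout)
    D_inv   : (k*p, k*p) prior precision D^{-1} for B flattened row by row
    X_check : (n, p) hypothetical covariate vectors where non-crossing is enforced
    """
    fitted = X_check @ B.T                       # (n, k): x^T beta_{tau_j}
    if not np.all(np.diff(fitted, axis=1) > 0):  # need x^T b_1 < ... < x^T b_k
        return -np.inf
    vec = B.ravel()
    return -0.5 * vec @ D_inv @ vec


def mh_step_beta(B, log_lik, D_inv, X_check, step, rng):
    """Random-walk Metropolis update targeting Eq. (21)."""
    proposal = B + step * rng.standard_normal(B.shape)
    lp_prop = log_prior_beta(proposal, D_inv, X_check)
    if np.isinf(lp_prop):
        return B, False  # crossing quantiles: zero prior density, reject
    # Symmetric proposal, so the acceptance ratio is the ratio of targets
    log_ratio = (log_lik(proposal) + lp_prop) - (
        log_lik(B) + log_prior_beta(B, D_inv, X_check)
    )
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True
    return B, False
```

Starting from a non-crossing value, every retained state stays non-crossing. The chain therefore never leaves the prior's support.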
$$\begin{aligned} b_i\mid -\sim N(V_{b_i}v_{b_i},V_{b_i}) \end{aligned}$$
(22)

where \(V_{b_i}=\left( \sum \nolimits _{t\in \S _i}1+\frac{3T}{\sigma ^2_{\epsilon }}+\gamma _{11} \right) ^{-1}\) and \(v_{b_i}=\sum \nolimits _{t\in \S _i}\left( R_{\mathrm{{it}}}-\varvec{X_i(t)}^T\varvec{\alpha } \right) +\frac{1}{\sigma ^2_{\epsilon }}\sum \nolimits _{t=1}^T\sum \nolimits _{k=1}^3\left( Z_{{\mathrm{{it}}}k}-\varvec{X_i^m(t)}^T\varvec{\eta }-d_{\mathrm{{it}}} \right) -\gamma _{12}c_i\).
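Eq. (22) combines three normal pieces for subject \(i\): the probit latents \(R_{it}\) (unit variance), the \(3T\) latents \(Z_{itk}\) (variance \(\sigma ^2_\epsilon \)), and the conditional of \(b_i\) given \(c_i\) from the bivariate random-effects density. Below is a minimal sketch for one subject. The argument names and array shapes are assumptions.

```python
import numpy as np


def sample_b_i(R_i, X_i, alpha, Z_i, Xm_i, eta, d_i,
               sigma2_eps, gamma11, gamma12, c_i, rng):
    """One draw of b_i ~ N(V v, V), Eq. (22), for a single subject.

    R_i, X_i : probit latents and covariate rows for the observed t in S_i
    Z_i      : (T, 3) latents Z_itk; Xm_i : (T, q) rows X_i^m(t)^T
    d_i      : (T,) random effects d_it; gamma11, gamma12 : entries of Gamma^{-1}
    """
    T = Z_i.shape[0]
    # V^{-1} = |S_i| + 3T / sigma2_eps + gamma11
    precision = len(R_i) + 3 * T / sigma2_eps + gamma11
    # Z_itk - X_i^m(t)^T eta - d_it, broadcast over k = 1, 2, 3
    resid_Z = Z_i - (Xm_i @ eta + d_i)[:, None]
    v = np.sum(R_i - X_i @ alpha) + resid_Z.sum() / sigma2_eps - gamma12 * c_i
    V = 1.0 / precision
    return rng.normal(V * v, np.sqrt(V))
```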

$$\begin{aligned} c_i\mid - \propto \left( \prod \limits _{l=1}^{T_i}\hat{f_Y}(y_i(t_{\mathrm{{il}}})\mid c_i, -)\right) f(b_i,c_i\mid \Gamma ) \end{aligned}$$
(23)
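Eq. (23) is also non-conjugate. The sketch below assumes a scalar random-walk Metropolis step. It also assumes \((b_i,c_i)\sim N(\varvec{0},\Gamma )\), which is consistent with the uncentered sum of outer products in the \(\Gamma \) update. `log_lik_c` is a hypothetical callable for the log of the product of \(\hat{f_Y}\) terms as a function of \(c_i\).

```python
import numpy as np


def log_prior_c(c_i, b_i, Gamma_inv):
    """log f(b_i, c_i | Gamma) up to a constant, assuming (b_i, c_i) ~ N(0, Gamma)."""
    u = np.array([b_i, c_i])
    return -0.5 * u @ Gamma_inv @ u


def mh_step_c(c_i, b_i, Gamma_inv, log_lik_c, step, rng):
    """Random-walk Metropolis update targeting Eq. (23)."""
    prop = c_i + step * rng.standard_normal()
    log_ratio = (log_lik_c(prop) + log_prior_c(prop, b_i, Gamma_inv)) - (
        log_lik_c(c_i) + log_prior_c(c_i, b_i, Gamma_inv)
    )
    return prop if np.log(rng.uniform()) < log_ratio else c_i
```

With a flat likelihood, the chain should recover the conditional normal of \(c_i\) given \(b_i\). This gives a quick sanity check of the prior term.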
$$\begin{aligned} \Gamma \mid - \sim \mathrm{{IW}}\left( N+\nu ,\Psi +\sum \limits _{i=1}^N (b_i,c_i)^T(b_i,c_i) \right) \end{aligned}$$
(24)
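Eq. (24) can be drawn directly with SciPy's `scipy.stats.invwishart`. The sketch assumes that the paper's \(\mathrm{IW}(\nu ,\Psi )\) uses the same degrees-of-freedom/scale convention as SciPy, with mean \(\Psi /(\nu -3)\) for a \(2\times 2\) matrix.

```python
import numpy as np
from scipy.stats import invwishart


def sample_Gamma(b, c, nu, Psi, rng):
    """Draw Gamma ~ IW(N + nu, Psi + sum_i (b_i, c_i)^T (b_i, c_i)), Eq. (24)."""
    U = np.column_stack([b, c])  # (N, 2); row i is (b_i, c_i)
    scale = Psi + U.T @ U        # Psi plus the sum of 2x2 outer products
    return invwishart.rvs(df=len(b) + nu, scale=scale, random_state=rng)
```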
$$\begin{aligned} \varvec{\eta }\mid -\sim N(V_{\eta }\varvec{v_{\eta }},V_{\eta }) \end{aligned}$$
(25)

where \(V_{\eta }=\left( \frac{1}{\sigma ^2_{\epsilon }}\sum \nolimits _{i=1}^N\sum \nolimits _{t=1}^T\sum \nolimits _{k=1}^3\varvec{X_i^m(t)}\varvec{X_i^m(t)}^T+\frac{1}{\sigma ^2_{\eta }}I_{\overline{J+J'}\times \overline{J+J'}} \right) ^{-1}\) and \(\varvec{v_{\eta }}=\frac{1}{\sigma ^2_{\epsilon }}\sum \nolimits _{i=1}^N\sum \nolimits _{t=1}^T\sum \nolimits _{k=1}^3\left( Z_{{\mathrm{{it}}}k}-b_i-d_{\mathrm{{it}}}\right) \varvec{X^m_i(t)}+\frac{1}{\sigma ^2_{\eta }}\varvec{\mu _{\eta }}\).
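Eq. (25) has the same conjugate structure as the \(\varvec{\alpha }\) update. Each pair \((i,t)\) contributes its outer product \(\varvec{X^m_i(t)}\varvec{X^m_i(t)}^T\) three times (once per \(k\)), as a matrix-valued precision requires. The sketch assumes one row per \((i,t)\) and uses illustrative names.

```python
import numpy as np


def sample_eta(Xm, Z, b_rows, d, mu_eta, sigma2_eta, sigma2_eps, rng):
    """One Gibbs draw of eta ~ N(V_eta v_eta, V_eta), Eq. (25).

    Assumed layout: row r of Xm is X_i^m(t)^T, Z[r] = (Z_it1, Z_it2, Z_it3),
    b_rows[r] = b_i and d[r] = d_it for the same pair (i, t).
    """
    q = Xm.shape[1]
    # Three identical outer products per (i, t), one for each k
    precision = 3.0 * Xm.T @ Xm / sigma2_eps + np.eye(q) / sigma2_eta
    resid = (Z - (b_rows + d)[:, None]).sum(axis=1)  # sum over k
    v = Xm.T @ resid / sigma2_eps + mu_eta / sigma2_eta
    V = np.linalg.inv(precision)
    return V @ v + np.linalg.cholesky(V) @ rng.standard_normal(q)
```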

$$\begin{aligned} d_{\mathrm{{it}}}\mid - \sim N(V_{d_{\mathrm{{it}}}}v_{d_{\mathrm{{it}}}},V_{d_{\mathrm{{it}}}}) \end{aligned}$$
(26)

where \(V_{d_{\mathrm{{it}}}}=\left( \frac{3}{\sigma ^2_\epsilon }+\frac{1}{\sigma ^2_d}\right) ^{-1}\) and \(v_{d_{\mathrm{{it}}}}=\frac{1}{\sigma ^2_\epsilon }\sum \nolimits _{k=1}^3\left( Z_{{\mathrm{{it}}}k}-\varvec{X^m_i(t)}^T\varvec{\eta }-b_i \right) \).
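Given the other parameters, the \(d_{it}\) in Eq. (26) are conditionally independent. All of them can therefore be drawn in one vectorized step. The sketch assumes the same one-row-per-\((i,t)\) layout as above and a zero-mean normal prior with variance \(\sigma ^2_d\).

```python
import numpy as np


def sample_d(Z, Xm, eta, b_rows, sigma2_eps, sigma2_d, rng):
    """Draw every d_it ~ N(V v_it, V), Eq. (26); V is the same for all (i, t)."""
    V = 1.0 / (3.0 / sigma2_eps + 1.0 / sigma2_d)
    # v_it = (1 / sigma2_eps) * sum_k (Z_itk - X_i^m(t)^T eta - b_i)
    v = (Z - (Xm @ eta + b_rows)[:, None]).sum(axis=1) / sigma2_eps
    return rng.normal(V * v, np.sqrt(V))
```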

$$\begin{aligned} \sigma ^2_\epsilon \mid -\sim \mathrm{{IG}}\left( \frac{3}{2}\mathrm{{NT}}+\alpha _\epsilon ,\frac{1}{2}\sum \limits _{i=1}^N\sum \limits _{t=1}^{T}\sum \limits _{k=1}^3\left( Z_{{\mathrm{{it}}}k}-\varvec{X^m_i(t)}^T\varvec{\eta }-b_i-d_{\mathrm{{it}}} \right) ^2+\beta _\epsilon \right) \end{aligned}$$
(27)
$$\begin{aligned} \sigma ^2_d\mid -\sim \mathrm{{IG}}\left( \frac{1}{2}\mathrm{{NT}}+\alpha _d,\frac{1}{2}\sum \limits _{i=1}^N\sum \limits _{t=1}^Td^2_{\mathrm{{it}}}+\beta _d \right) \end{aligned}$$
(28)
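Eqs. (27) and (28) are inverse gamma draws. The sketch assumes the shape/scale convention in which \(\mathrm{IG}(a,b)\) has density proportional to \(x^{-a-1}e^{-b/x}\). Under that convention, a draw is the reciprocal of a Gamma variate with shape \(a\) and rate \(b\). It also assumes every subject has all \(T\) time points in the arrays, matching the sums over \(t=1,\ldots ,T\).

```python
import numpy as np


def sample_inv_gamma(shape, rate, rng):
    """If G ~ Gamma(shape, rate) then 1 / G ~ IG(shape, rate)."""
    return 1.0 / rng.gamma(shape, 1.0 / rate)  # NumPy takes scale = 1 / rate


def sample_sigma2_eps(Z, Xm, eta, b_rows, d, a_eps, b_eps, rng):
    """Eq. (27): resid has 3NT entries, so the shape is 3NT/2 + a_eps."""
    resid = Z - (Xm @ eta + b_rows + d)[:, None]
    return sample_inv_gamma(0.5 * resid.size + a_eps,
                            0.5 * np.sum(resid**2) + b_eps, rng)


def sample_sigma2_d(d, a_d, b_d, rng):
    """Eq. (28): d holds the NT random effects d_it."""
    return sample_inv_gamma(0.5 * d.size + a_d, 0.5 * np.sum(d**2) + b_d, rng)
```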
$$\begin{aligned} R_{\mathrm{{it}}}\mid -\sim N(\varvec{X_i(t)}^T\varvec{\alpha }+b_i,1) \end{aligned}$$
(29)

This normal distribution is truncated at 0 from the left if \(E_{\mathrm{{it}}}=0\) and truncated at 0 from the right if \(E_{\mathrm{{it}}}=1\).
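A vectorized sketch of Eq. (29) uses SciPy's `scipy.stats.truncnorm`, whose bounds `a` and `b` are given in standardized units. The reading "left-truncated at 0" (support \((0,\infty )\)) when \(E_{it}=0\) and "right-truncated at 0" (support \((-\infty ,0)\)) when \(E_{it}=1\) is our interpretation of the text. It should be checked against the paper's coding of \(E_{it}\).

```python
import numpy as np
from scipy.stats import truncnorm


def sample_R(mu, E, rng):
    """Draw R_it ~ N(mu_it, 1) truncated at 0; mu_it = X_i(t)^T alpha + b_i.

    Assumed coding: E_it = 0 -> R_it > 0, E_it = 1 -> R_it < 0.
    """
    # With unit variance, the standardized bound for 0 is simply -mu
    lower = np.where(E == 0, -mu, -np.inf)
    upper = np.where(E == 0, np.inf, -mu)
    return truncnorm.rvs(lower, upper, loc=mu, scale=1.0, random_state=rng)
```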

$$\begin{aligned} \varvec{Z_{\mathrm{{it}}}}\mid - \sim N\left( \left( \varvec{X^m_i(t)}^T\varvec{\eta }+b_i+d_{\mathrm{{it}}}\right) \varvec{1}_3,\sigma ^2_\epsilon I_{3\times 3}\right) \end{aligned}$$
(30)

Quantile Smoothing: post-processing part

We assume a multivariate normal likelihood for \(\varvec{\hat{\beta }}\), i.e., \(\varvec{\hat{\beta }}\sim N(\varvec{\beta },\hat{\Sigma })\), where \(\hat{\Sigma }\) is defined in Sect. 3. Using Bernstein basis functions, we also write \(\varvec{\beta }=\Omega \varvec{\delta }\). With independent uniform\((0,\infty )\) priors on the components of \(\varvec{\delta }\), the full conditional distribution of \(\varvec{\delta }\) is:

$$\begin{aligned} \varvec{\delta }\mid - \sim N\left( (\Omega ^{T}\hat{\Sigma }^{-1} \Omega )^{-1}\Omega ^{T}\hat{\Sigma }^{-1}\varvec{\hat{\beta }},(\Omega ^{T}\hat{\Sigma }^{-1} \Omega )^{-1}\right) \times I_{(\varvec{\delta } \ge 0)}. \end{aligned}$$
(31)

This is a truncated multivariate normal distribution. To sample from it, we draw from the untruncated distribution \(N\left( (\Omega ^{T}\hat{\Sigma }^{-1} \Omega )^{-1}\Omega ^{T}\hat{\Sigma }^{-1}\varvec{\hat{\beta }},(\Omega ^{T}\hat{\Sigma }^{-1} \Omega )^{-1}\right) \). We accept the draw if all its components are nonnegative; otherwise, we draw again. After obtaining 10,000 accepted draws, we discard the first 2,000 and estimate \(\varvec{\delta }\) by the sample mean of the remaining 8,000.
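The accept/reject procedure for Eq. (31) can be sketched in NumPy as follows. The function name, the `max_tries` safeguard and the argument layout are illustrative assumptions. Because rejection sampling yields independent draws, discarding the first 2,000 is not needed for validity; the sketch keeps that step only to mirror the text.

```python
import numpy as np


def sample_delta(Omega, Sigma_hat, beta_hat, n_keep=10_000, burn=2_000,
                 rng=None, max_tries=10**7):
    """Estimate delta from draws of N(m, C) restricted to delta >= 0, Eq. (31)."""
    rng = np.random.default_rng() if rng is None else rng
    Si = np.linalg.inv(Sigma_hat)
    C = np.linalg.inv(Omega.T @ Si @ Omega)   # (Omega^T Sigma^-1 Omega)^-1
    m = C @ Omega.T @ Si @ beta_hat           # GLS-type mean
    L = np.linalg.cholesky(C)
    draws, tries = [], 0
    while len(draws) < n_keep:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("acceptance rate too low for plain rejection")
        z = m + L @ rng.standard_normal(len(m))
        if np.all(z >= 0):                    # keep only nonnegative draws
            draws.append(z)
    return np.mean(draws[burn:], axis=0)      # mean of the last n_keep - burn
```

When much of the untruncated mass lies outside the nonnegative orthant, the acceptance rate drops sharply. In that case a Gibbs sampler over univariate truncated normals would be a natural alternative.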

1.1 Proof of the statement in Sect. 3.1

Let \(\beta (\tau )=\sum \nolimits _{m=1}^MB_m(\tau )\alpha _m\) be a function of \(\tau \), where \(0<\tau <1\) and \(B_m(\tau )=\left( \begin{array}{c} M \\ m \end{array} \right) \tau ^m(1-\tau )^{M-m}\). Then \(\beta (\tau )\) is strictly increasing in \(\tau \) if \(\alpha _m \ge \alpha _{m-1}\) for all \(m>1\) and \(\alpha _1>0\).

Proof

Note that \(B_m(\tau )\) is the probability mass function, evaluated at \(m\), of a binomial random variable with \(M\) trials and success probability \(\tau \).

Let \(0<\tau _1<\tau _2<1\), and consider two coins, coin 1 and coin 2, with probabilities of heads \(\tau _1\) and \(\tau _2\), respectively. For each \(m=1,2,\ldots ,M\), consider the event that at least \(m\) heads occur in \(M\) tosses; its probability is \(\sum \nolimits _{j=m}^MB_j(\tau )\). The binomial distribution is stochastically increasing in its success probability, so this event is strictly more likely for coin 2 than for coin 1. Hence, we can write:

$$\begin{aligned}&B_M(\tau _2)>B_M(\tau _1), \\&B_M(\tau _2)+B_{M-1}(\tau _2)>B_M(\tau _1)+B_{M-1}(\tau _1), \\&\vdots \\&B_M(\tau _2)+B_{M-1}(\tau _2)+\ldots +B_1(\tau _2)>B_M(\tau _1)+B_{M-1}(\tau _1)+\cdots +B_1(\tau _1). \end{aligned}$$
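The inequalities above can be checked numerically by computing each upper-tail probability \(\sum \nolimits _{j=m}^MB_j(\tau )\) at two values of \(\tau \). The values \(M=6\), \(\tau _1=0.3\), \(\tau _2=0.45\) are arbitrary illustrative choices. The function name `upper_tail` is hypothetical.

```python
from math import comb


def upper_tail(m, M, tau):
    """P(X >= m) for X ~ Binomial(M, tau), i.e. B_m(tau) + ... + B_M(tau)."""
    return sum(comb(M, j) * tau**j * (1 - tau) ** (M - j) for j in range(m, M + 1))


M, tau1, tau2 = 6, 0.3, 0.45
# One gap per inequality in the first set, for m = 1, ..., M
gaps = [upper_tail(m, M, tau2) - upper_tail(m, M, tau1) for m in range(1, M + 1)]
print(gaps)
```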

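Multiplying the inequality for the event of at least \(m\) heads by a positive constant \(c_m\) preserves it, so we also have: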

$$\begin{aligned}&c_MB_M(\tau _2)>c_MB_M(\tau _1), \\&c_{M-1}B_M(\tau _2)+c_{M-1}B_{M-1}(\tau _2)>c_{M-1}B_M(\tau _1)+c_{M-1}B_{M-1}(\tau _1), \\&\vdots \\&c_1B_M(\tau _2)+c_1B_{M-1}(\tau _2)+\cdots +c_1B_1(\tau _2)>c_1B_M(\tau _1)+c_1B_{M-1}(\tau _1)+\ldots +c_1B_1(\tau _1), \end{aligned}$$

for \(c_i>0\), \(i=1,2,\ldots ,M\). Adding the second set of inequalities, we get:

$$\begin{aligned}&\left( \sum \limits _{m=1}^Mc_m\right) B_M(\tau _2)+\left( \sum \limits _{m=1}^{M-1}c_m\right) B_{M-1}(\tau _2)+\cdots +c_1B_1(\tau _2)>\left( \sum \limits _{m=1}^Mc_m\right) B_M(\tau _1) \\&\quad +\left( \sum \limits _{m=1}^{M-1}c_m\right) B_{M-1}(\tau _1)+\ldots +c_1B_1(\tau _1). \end{aligned}$$

Now define \(\alpha _m=\sum \nolimits _{i=1}^mc_i\) for \(m=1,2,\ldots ,M\). Then the above inequality reduces to:

$$\begin{aligned} \sum \limits _{m=1}^MB_m(\tau _2)\alpha _m>\sum \limits _{m=1}^MB_m(\tau _1)\alpha _m, \end{aligned}$$

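which is exactly \(\beta (\tau _2)>\beta (\tau _1)\). By construction, \(\alpha _1=c_1>0\) and \(\alpha _m-\alpha _{m-1}=c_m>0\) for \(m>1\). Conversely, any coefficients with \(\alpha _1>0\) and \(\alpha _m\ge \alpha _{m-1}\) can be written in this form with \(c_1=\alpha _1>0\) and \(c_m=\alpha _m-\alpha _{m-1}\ge 0\). If some \(c_m=0\), the corresponding inequality in the second set becomes an equality (both sides vanish). The sum is still strict because the last inequality, which carries the factor \(c_1>0\), is strict.

Hence, for any \(0<\tau _1<\tau _2<1\), we have \(\beta (\tau _1)<\beta (\tau _2)\). Therefore \(\beta (\tau )\) is strictly increasing in \(\tau \) if \(\alpha _m \ge \alpha _{m-1}\) for all \(m>1\) and \(\alpha _1>0\) (i.e., \(c_1>0\)). This completes the proof.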
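The result can be illustrated numerically. Build \(\alpha _m\) as cumulative sums of \(c_1>0\) and \(c_m\ge 0\) (including some zeros), then evaluate \(\beta (\tau )\) on a grid. The coefficient values and the name `bernstein_beta` are hypothetical choices for illustration. Note that the sum starts at \(m=1\), so \(B_0(\tau )=(1-\tau )^M\) is excluded.

```python
from math import comb

import numpy as np


def bernstein_beta(tau, alpha):
    """beta(tau) = sum_{m=1}^M B_m(tau) alpha_m, B_m(tau) = C(M, m) tau^m (1 - tau)^(M - m)."""
    M = len(alpha)
    return sum(comb(M, m) * tau**m * (1 - tau) ** (M - m) * alpha[m - 1]
               for m in range(1, M + 1))


c = np.array([0.4, 0.0, 1.2, 0.0, 0.3])  # illustrative: c_1 > 0, other c_m >= 0
alpha = np.cumsum(c)                     # alpha_m = c_1 + ... + c_m, nondecreasing
taus = np.linspace(0.01, 0.99, 99)
values = np.array([bernstein_beta(t, alpha) for t in taus])
```

In the Bayesian smoothing step, this is why constraining the coefficient increments to be nonnegative is enough to rule out quantile crossing along \(\tau \).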

About this article

Cite this article

Biswas, J., Ghosh, P. & Das, K. A semi-parametric quantile regression approach to zero-inflated and incomplete longitudinal outcomes. AStA Adv Stat Anal 104, 261–283 (2020). https://doi.org/10.1007/s10182-020-00362-9

  • DOI: https://doi.org/10.1007/s10182-020-00362-9