Abstract
Quantile regression models are typically used for non-Gaussian outcomes, and they allow quantile-specific inference. While there is a vast literature on conditional quantile regression (where the model parameters are estimated for a single prespecified quantile level), relatively little work has been reported on joint quantile regression. The challenge in joint quantile regression is to avoid quantile crossing while estimating multiple quantiles simultaneously. In this article, we propose a semi-parametric approach to handling non-Gaussian, zero-inflated and incomplete longitudinal outcomes. We use a two-part model to handle the excess zeros and propose a dynamic joint quantile regression model for the nonzero outcomes. A multinomial probit model is used for the missingness. We develop a Bayesian joint estimation method in which the model parameters are estimated through Markov chain Monte Carlo. The unknown distribution of the outcome can be constructed from the estimated quantiles. We analyze data from the Health and Retirement Study and model out-of-pocket medical expenditure with the proposed joint quantile regression method. Simulation studies are performed to assess the practical usefulness and efficiency of the proposed approach compared to existing methods.
Appendix: Full conditional distributions and MCMC details
The prior distribution of the respective parameters is given as the following:
where \(p^0=(m_0+1)J+J'\).
The quantile regression parameters \(\varvec{\beta _{\tau }}\) have a truncated multivariate normal prior \(N(\varvec{0},D)\) subject to \(\varvec{x}^T\varvec{\beta _{\tau _1}}<\varvec{x}^T\varvec{\beta _{\tau _2}}<\cdots <\varvec{x}^T\varvec{\beta _{\tau _k}}\), and we denote the corresponding prior density by \(\pi (\varvec{\beta _{\tau }})\).
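The ordering constraint in this prior can be checked numerically for a given covariate vector. The following is a minimal sketch, not the authors' implementation: the function name `is_non_crossing` and the example numbers are hypothetical, and the coefficient vectors are assumed to be supplied in increasing order of \(\tau \).

```python
def is_non_crossing(x, betas):
    """Return True if x^T beta_1 < x^T beta_2 < ... < x^T beta_k.

    x     : covariate vector (list of floats)
    betas : list of coefficient vectors, ordered by increasing quantile level
    """
    fitted = [sum(xj * bj for xj, bj in zip(x, beta)) for beta in betas]
    return all(lo < hi for lo, hi in zip(fitted, fitted[1:]))


# Hypothetical example with two covariates and three quantile levels.
x = [1.0, 2.0]
ordered = [[0.0, 0.5], [0.5, 0.6], [1.0, 0.7]]   # fitted quantiles 1.0, 1.7, 2.4
crossing = [[1.0, 0.0], [0.0, 0.2]]              # fitted quantiles 1.0, 0.4
ok_ordered = is_non_crossing(x, ordered)
ok_crossing = is_non_crossing(x, crossing)
```

A check of this kind could be used, for instance, to reject proposed values of \(\varvec{\beta _{\tau }}\) that fall outside the truncation region; whether the original sampler works this way is not stated in the text.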
For other parameters, we have the following prior structures:
Here, IG stands for inverse gamma and IW stands for inverse Wishart, respectively. Let \(\Gamma ^{-1}_{2\times 2}=((\gamma _{ij}))_{2\times 2}\) where \(i=1,2\) and \(j=1,2\).
We use the notation \(\varvec{X^m}\) for the design matrix corresponding to all the covariates in Eq. (8), where we model the missingness.
The full conditional distributions for the model parameters are given below:
where \(V_\alpha =\left( \sum \nolimits _{i=1}^N\sum \nolimits _{t\in \S _i}\varvec{X_i(t)}\varvec{X_i(t)}^T+\frac{1}{\sigma ^2_\alpha } I_{p^0\times p^0} \right) ^{-1}\) and \(\varvec{v_\alpha }=\sum \nolimits _{i=1}^N\sum \nolimits _{t\in \S _i}\left( R_{\mathrm{{it}}}-b_i \right) \varvec{X_i(t)}+\frac{1}{\sigma ^2_\alpha }\varvec{\mu _\alpha }\).
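The displays giving the prior and the full conditional of \(\varvec{\alpha }\) are not reproduced above. As a hedged reconstruction (an assumption inferred from the form of \(V_\alpha \) and \(\varvec{v_\alpha }\), not a quotation of the original displays), a normal prior combined with unit-variance latent variables \(R_{\mathrm{{it}}}\) having mean \(\varvec{X_i(t)}^T\varvec{\alpha }+b_i\) would give the standard conjugate update:

```latex
\[
\varvec{\alpha } \sim N_{p^0}\left( \varvec{\mu _\alpha },\ \sigma ^2_\alpha I_{p^0\times p^0}\right) ,
\qquad
\varvec{\alpha } \mid \cdot \ \sim \ N_{p^0}\left( V_\alpha \varvec{v_\alpha },\ V_\alpha \right) .
\]
```

Under this assumption, the posterior precision \(V_\alpha ^{-1}\) is the sum of the likelihood precision \(\sum _i\sum _t \varvec{X_i(t)}\varvec{X_i(t)}^T\) and the prior precision \(\sigma _\alpha ^{-2}I\), which agrees term by term with the expression for \(V_\alpha \) given in the text.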
where \(V_{b_i}=\left( \sum \nolimits _{t\in \S _i}1+\frac{3T}{\sigma ^2_{\epsilon }}+\gamma _{11} \right) ^{-1}\) and \(v_{b_i}=\sum \nolimits _{t\in \S _i}\left( R_{\mathrm{{it}}}-\varvec{X_i(t)}^T\varvec{\alpha } \right) +\frac{1}{\sigma ^2_{\epsilon }}\sum \nolimits _{t=1}^T\sum \nolimits _{k=1}^3\left( Z_{{\mathrm{{it}}}k}-\varvec{X_i^m(t)}^T\varvec{\eta }-d_{\mathrm{{it}}} \right) -\gamma _{12}c_i\).
where \(V_{\eta }=\left( \frac{1}{\sigma ^2_{\epsilon }}\sum \nolimits _{i=1}^N\sum \nolimits _{t=1}^T\sum \nolimits _{k=1}^3\varvec{X_i^m(t)}^T\varvec{X_i^m(t)}+\frac{1}{\sigma ^2_{\eta }}I_{\overline{J+J'}\times \overline{J+J'}} \right) ^{-1}\) and \(\varvec{v_{\eta }}=\frac{1}{\sigma ^2_{\epsilon }}\sum \nolimits _{i=1}^N\sum \nolimits _{t=1}^T\sum \nolimits _{k=1}^3\left( Z_{{\mathrm{{it}}}k}-b_i-d_{\mathrm{{it}}}\right) \varvec{X^m_i(t)}+\frac{1}{\sigma ^2_{\eta }}\varvec{\mu _{\eta }}\).
where \(V_{d_{\mathrm{{it}}}}=\left( \frac{3}{\sigma ^2_\epsilon }+\frac{1}{\sigma ^2_d}\right) ^{-1}\) and \(v_{d_{\mathrm{{it}}}}=\frac{1}{\sigma ^2_\epsilon }\sum \nolimits _{k=1}^3\left( Z_{{\mathrm{{it}}}k}-\varvec{X^m_i(t)}^T\varvec{\eta }-b_i \right) \).
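The update for \(d_{\mathrm{{it}}}\) is a scalar normal draw and can be sketched in a few lines. This is a minimal illustration under the assumption that the full conditional has the form \(N(V_{d_{\mathrm{{it}}}}v_{d_{\mathrm{{it}}}},V_{d_{\mathrm{{it}}}})\) (the display itself is not reproduced above); the function name `draw_d_it` and its argument names are hypothetical.

```python
import random


def draw_d_it(z_itk, xm_eta, b_i, sigma2_eps, sigma2_d, rng=random):
    """Draw d_it from N(V * v, V), with V and v as defined in the text.

    z_itk      : latent values Z_{it1}, Z_{it2}, Z_{it3} (three categories)
    xm_eta     : the scalar X_i^m(t)^T eta
    b_i        : subject-level random effect
    sigma2_eps : error variance sigma^2_epsilon
    sigma2_d   : prior variance sigma^2_d
    Returns (draw, conditional mean, conditional variance).
    """
    # len(z_itk) plays the role of the constant 3 in V_{d_it}.
    V = 1.0 / (len(z_itk) / sigma2_eps + 1.0 / sigma2_d)
    v = sum(z - xm_eta - b_i for z in z_itk) / sigma2_eps
    mean = V * v
    return rng.gauss(mean, V ** 0.5), mean, V


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
draw, cond_mean, cond_var = draw_d_it(
    [1.0, 2.0, 3.0], xm_eta=0.5, b_i=0.5, sigma2_eps=1.0, sigma2_d=1.0,
    rng=random.Random(0),
)
```

With these inputs, \(V_{d_{\mathrm{{it}}}}=1/(3+1)=0.25\) and \(v_{d_{\mathrm{{it}}}}=0+1+2=3\), so the conditional mean is 0.75.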
which is left-truncated at 0 if \(E_{\mathrm{{it}}}=0\) and right-truncated at 0 if \(E_{\mathrm{{it}}}=1\).
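A one-sided truncated normal draw of this kind can be generated by inverse-CDF sampling. The sketch below is an illustration only: the mean and standard deviation of the latent variable are not shown in the surviving text, so they are passed in as arguments, and the function name `draw_truncated_normal` is hypothetical.

```python
import random
from statistics import NormalDist


def draw_truncated_normal(mean, sd, side, rng=random):
    """Draw from N(mean, sd^2) truncated at 0 by inverse-CDF sampling.

    side="left"  keeps the region (0, inf)  (left-truncated at 0);
    side="right" keeps the region (-inf, 0) (right-truncated at 0).
    """
    dist = NormalDist(mean, sd)
    p0 = dist.cdf(0.0)                       # probability mass below 0
    lo, hi = (p0, 1.0) if side == "left" else (0.0, p0)
    u = lo + (hi - lo) * rng.random()        # uniform on the retained CDF range
    u = min(max(u, 1e-12), 1.0 - 1e-12)      # inv_cdf requires 0 < u < 1
    return dist.inv_cdf(u)


# Following the text: left truncation when E_it = 0, right truncation when E_it = 1.
rng = random.Random(0)
left_draws = [draw_truncated_normal(-1.0, 1.0, "left", rng) for _ in range(1000)]
right_draws = [draw_truncated_normal(1.0, 1.0, "right", rng) for _ in range(1000)]
```

Inverse-CDF sampling never rejects a draw, but it loses accuracy when the retained region lies far in the tail; a dedicated truncated-normal sampler may be preferable in that case.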
Quantile Smoothing: post-processing part
Note that we assume a multivariate normal likelihood for \(\varvec{\hat{\beta }}\), i.e., we assume \(\varvec{\hat{\beta }}\sim N(\varvec{\beta },\hat{\Sigma })\), where \(\hat{\Sigma }\) is defined in Sect. 3. Also using Bernstein basis functions, we have \(\varvec{\beta }=\Omega \varvec{\delta }\). By considering independent uniform\((0,\infty )\) priors for the components of \(\varvec{\delta }\), the full conditional distribution of \(\varvec{\delta }\) is given as the following:
Note that the above distribution is a truncated multivariate normal distribution. To generate from it, we sample from \(N\left( (\Omega ^{T}\hat{\Sigma }^{-1} \Omega )^{-1}\Omega ^{T}\hat{\Sigma }^{-1}\varvec{\hat{\beta }},(\Omega ^{T}\hat{\Sigma }^{-1} \Omega )^{-1}\right) \); if all components of the draw are nonnegative, we accept it, otherwise we draw another sample. After generating 10,000 accepted samples, we discard the first 2,000 and compute the sample mean of the remaining 8,000. This sample mean is the estimate of \(\varvec{\delta }\).
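The accept/reject scheme described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the mean vector and covariance matrix of the untruncated normal are assumed to have been computed already from \(\Omega \), \(\hat{\Sigma }\) and \(\varvec{\hat{\beta }}\), and the function names and example numbers are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def sample_nonnegative_mvn(mean, cov, n_keep, rng=random, max_tries=1_000_000):
    """Draw from N(mean, cov) and keep only draws whose components are all >= 0."""
    L = cholesky(cov)
    kept = []
    for _ in range(max_tries):
        z = [rng.gauss(0.0, 1.0) for _ in mean]
        draw = [m + sum(L[i][k] * z[k] for k in range(i + 1))
                for i, m in enumerate(mean)]
        if all(d >= 0.0 for d in draw):
            kept.append(draw)
            if len(kept) == n_keep:
                return kept
    raise RuntimeError("acceptance rate too low for plain rejection sampling")


def mean_after_discard(samples, n_discard):
    """Componentwise mean after dropping the first n_discard accepted draws."""
    used = samples[n_discard:]
    return [sum(col) / len(used) for col in zip(*used)]


# Hypothetical two-dimensional example (the text uses 10,000 draws, 2,000 discarded).
rng = random.Random(1)
samples = sample_nonnegative_mvn([0.5, 1.0], [[1.0, 0.3], [0.3, 1.0]], 500, rng)
delta_hat = mean_after_discard(samples, 100)
```

Accepted draws from a rejection sampler are independent draws from the truncated distribution, so the discarded initial block serves here only to mirror the procedure in the text. The acceptance rate can become very small when \(\varvec{\delta }\) has many components, which is why the sketch stops with an error after `max_tries` proposals.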
1.1 Proof of the statement in Sect. 3.1
Let \(\beta (\tau )=\sum \nolimits _{m=1}^MB_m(\tau )\alpha _m\) be a function of \(\tau \), where \(0<\tau <1\) and \(B_m(\tau )=\left( \begin{array}{c} M \\ m \end{array} \right) \tau ^m(1-\tau )^{M-m}\). The function \(\beta (\tau )\) is increasing if \(\alpha _m \ge \alpha _{m-1}\), for all \(m>1\) and \(\alpha _1>0\).
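The statement can be checked numerically before reading the proof. The sketch below evaluates \(\beta (\tau )\) on a grid for one set of coefficients that satisfies the conditions and one that violates them; the coefficient values and the function name `bernstein_beta` are hypothetical choices for illustration, and a grid check is of course not a proof.

```python
from math import comb


def bernstein_beta(tau, alpha):
    """Evaluate beta(tau) = sum_{m=1}^M B_m(tau) * alpha_m for alpha = [alpha_1, ..., alpha_M]."""
    M = len(alpha)
    return sum(
        comb(M, m) * tau**m * (1 - tau) ** (M - m) * alpha[m - 1]
        for m in range(1, M + 1)
    )


def increasing_on_grid(alpha, n_grid=99):
    """Check strict increase of beta(tau) on tau = 1/(n_grid+1), ..., n_grid/(n_grid+1)."""
    grid = [i / (n_grid + 1) for i in range(1, n_grid + 1)]
    values = [bernstein_beta(t, alpha) for t in grid]
    return all(a < b for a, b in zip(values, values[1:]))


satisfies = increasing_on_grid([0.2, 0.2, 0.7, 1.5])   # alpha_1 > 0, nondecreasing
violates = increasing_on_grid([1.0, 0.5, 0.2, 0.1])    # decreasing coefficients
```

With all coefficients equal to 1, \(\beta (\tau )\) reduces to the probability of at least one success in M trials, \(1-(1-\tau )^M\), which gives a simple sanity check on the implementation.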
Proof
It is easy to see that \(B_m(\tau )\) is the pmf, evaluated at m, of a binomial random variable with M trials and success probability \(\tau \).
Let \(0<\tau _1<\tau _2<1\). Suppose we have two coins, coin 1 and coin 2, with probabilities of heads \(\tau _1\) and \(\tau _2,\) respectively. Consider the event that at least m heads occur in M tosses, for \(m=1,2,\ldots ,M\). This event is more likely for coin 2 than for coin 1. Hence, we can write:
We can also write:
for \(c_i>0\), \(i=1,2,\ldots ,M\). From the second set of inequalities, we get:
Now define \(\sum \nolimits _{i=1}^mc_i=\alpha _m\), for all \(m=1,2,\ldots ,M\). Then, the above inequality is reduced to the following:
which is the same as \(\beta (\tau _2)>\beta (\tau _1)\). Here, clearly \(\alpha _m \ge \alpha _{m-1}\).
Hence, for any \(0<\tau _1<\tau _2<1\), we have \(\beta (\tau _1)<\beta (\tau _2)\), which implies that \(\beta (\tau )\uparrow \tau \) if \(\alpha _m \ge \alpha _{m-1}\), for all \(m>1\) and \(\alpha _1>0\) (i.e., \(c_1>0\)). This completes the proof.
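The displayed inequalities of the proof are not reproduced above. A plausible reconstruction, offered as an assumption that follows the verbal steps of the argument rather than as a quotation of the original displays, is:

```latex
\[
\sum_{j=m}^{M} B_j(\tau _2) \;>\; \sum_{j=m}^{M} B_j(\tau _1), \qquad m=1,2,\ldots ,M,
\]
\[
c_m \sum_{j=m}^{M} B_j(\tau _2) \;>\; c_m \sum_{j=m}^{M} B_j(\tau _1), \qquad m=1,2,\ldots ,M,
\]
\[
\sum_{m=1}^{M} c_m \sum_{j=m}^{M} B_j(\tau _2) \;>\; \sum_{m=1}^{M} c_m \sum_{j=m}^{M} B_j(\tau _1),
\]
\[
\sum_{m=1}^{M} B_m(\tau _2)\,\alpha _m \;>\; \sum_{m=1}^{M} B_m(\tau _1)\,\alpha _m .
\]
```

The last line follows from the third by interchanging the order of summation, \(\sum _{m=1}^{M}c_m\sum _{j=m}^{M}B_j(\tau )=\sum _{j=1}^{M}B_j(\tau )\sum _{m=1}^{j}c_m=\sum _{j=1}^{M}B_j(\tau )\alpha _j\). If some \(c_m\) with \(m>1\) equal zero, so that \(\alpha _m=\alpha _{m-1}\), the summed inequality remains strict because \(c_1>0\).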
Cite this article
Biswas, J., Ghosh, P. & Das, K. A semi-parametric quantile regression approach to zero-inflated and incomplete longitudinal outcomes. AStA Adv Stat Anal 104, 261–283 (2020). https://doi.org/10.1007/s10182-020-00362-9
DOI: https://doi.org/10.1007/s10182-020-00362-9