Abstract
Count data are often modeled using Poisson regression, although this probability model restricts the conditional variance to be equal to the conditional mean (the equidispersion property). While overdispersion has been intensively studied, there are few alternative models in the statistical literature for analyzing count data with underdispersion. The primary goal of this paper is to introduce a novel model based on Bernoulli-Poisson convolution for modeling count data that are underdispersed relative to the Poisson distribution. We study the statistical properties of the proposed model, and we provide a useful interpretation of the parameters. We consider a regression structure for both components based on a new parameterization indexed by mean and dispersion parameters. An expectation-maximization (EM) algorithm is proposed for parameter estimation, and some diagnostic measures based on the EM algorithm are considered. Simulation studies are conducted to evaluate the finite-sample performance of the estimation procedure. Finally, we illustrate the usefulness of the new regression model through an application.
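To make the underdispersion claim concrete, the following minimal sketch computes the probability mass function, mean, and variance of the sum of an independent Bernoulli and Poisson variable. The names `p` and `lam` (Bernoulli success probability and Poisson rate) and all function names are illustrative assumptions; the mean and dispersion parameterization used in the paper is not reproduced here.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability mass at k (zero for negative k); requires lam > 0."""
    if k < 0:
        return 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def bern_pois_pmf(y, p, lam):
    """P(Y = y) for Y = B + N with B ~ Bernoulli(p), N ~ Poisson(lam), independent.

    Conditioning on B gives a two-term mixture of shifted Poisson masses.
    """
    return (1.0 - p) * poisson_pmf(y, lam) + p * poisson_pmf(y - 1, lam)


def moments(p, lam, upper=200):
    """Mean and variance obtained by summing the pmf over 0..upper (truncated)."""
    probs = [bern_pois_pmf(y, p, lam) for y in range(upper + 1)]
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return mean, var


# Hypothetical parameter values chosen only for illustration.
mean, var = moments(p=0.6, lam=1.5)
print(f"mean={mean:.4f} variance={var:.4f} ratio={var / mean:.4f}")
```

By independence, the mean is `p + lam` and the variance is `p * (1 - p) + lam`, so the variance falls below the mean whenever `0 < p <= 1`; the numerical sums above should agree with these closed forms up to truncation error.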
Appendix
In this section, we provide explicit expressions for the terms involved in Eq. (8) when \(g_1\) and \(g_2\) are the logarithmic and logit functions, respectively. The first derivative of the complete log-likelihood with respect to \({{\varvec{\theta }}}\) is given by
where
On the other hand, as \(\text {E}(T_i^2\mid {{\varvec{D}}}_{obs}; {{\varvec{\theta }}}=\widehat{{\varvec{\theta }}})=\text {E}(T_i\mid {{\varvec{D}}}_{obs}; {{\varvec{\theta }}}=\widehat{{\varvec{\theta }}})={\widetilde{t}}_i\), we have that
Finally, the expected value of the second derivative of the complete log-likelihood with respect to \({{\varvec{\theta }}}\) is given by
where
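As a rough illustration of how the conditional expectation \({\widetilde{t}}_i\) enters an EM iteration, the sketch below treats the simplest case without covariates. It assumes that \(T_i\) is the latent Bernoulli component of \(Y_i = T_i + N_i\), with \(N_i\) Poisson and independent of \(T_i\); this is consistent with the identity \(\text{E}(T_i^2\mid \cdot)=\text{E}(T_i\mid \cdot)\) used in this appendix, since a binary variable satisfies \(T_i^2=T_i\). The names `p`, `lam`, `e_step`, `m_step`, and `em` are hypothetical, and the code is not the regression algorithm of the paper, which uses log and logit links and a mean and dispersion parameterization.

```python
import math


def e_step(y, p, lam):
    """Conditional expectations E(T_i | y_i) under the assumed latent structure.

    P(T=1 | y) is proportional to p * Pois(y-1; lam) and P(T=0 | y) to
    (1-p) * Pois(y; lam). Since Pois(y-1; lam) / Pois(y; lam) = y / lam,
    the posterior probability reduces to the closed form below.
    """
    return [p * yi / (p * yi + (1.0 - p) * lam) for yi in y]


def m_step(y, t):
    """Maximize the expected complete log-likelihood (intercept-only case)."""
    n = len(y)
    p = sum(t) / n                                    # Bernoulli part
    lam = sum(yi - ti for yi, ti in zip(y, t)) / n    # Poisson part
    return p, lam


def log_lik(y, p, lam):
    """Observed-data log-likelihood of the Bernoulli-Poisson convolution."""
    total = 0.0
    for yi in y:
        pois_y = math.exp(-lam + yi * math.log(lam) - math.lgamma(yi + 1))
        pois_ym1 = (
            math.exp(-lam + (yi - 1) * math.log(lam) - math.lgamma(yi))
            if yi >= 1 else 0.0
        )
        total += math.log((1.0 - p) * pois_y + p * pois_ym1)
    return total


def em(y, p=0.5, max_iter=500, tol=1e-10):
    """Iterate E and M steps until the parameter change falls below tol.

    Boundary cases (for example, data that push p to 1 or lam to 0) are
    not handled in this sketch.
    """
    lam = max(sum(y) / len(y) - p, 1e-6)  # crude starting value (assumption)
    for _ in range(max_iter):
        t = e_step(y, p, lam)
        p_new, lam_new = m_step(y, t)
        converged = abs(p_new - p) + abs(lam_new - lam) < tol
        p, lam = p_new, lam_new
        if converged:
            break
    return p, lam


# Made-up underdispersed counts, for illustration only.
y = [0, 1, 1, 2, 1, 2, 3, 1, 2, 2, 1, 0, 2, 1, 3]
p_hat, lam_hat = em(y)
print(p_hat, lam_hat, log_lik(y, p_hat, lam_hat))
```

In this intercept-only setting each M-step forces `p + lam` to equal the sample mean, so the iteration only redistributes the mean between the two components. With covariates, the closed-form updates would be replaced by numerical maximization of the corresponding expected complete log-likelihood.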
Bourguignon, M., Gallardo, D.I. & Medeiros, R.M.R. A simple and useful regression model for underdispersed count data based on Bernoulli–Poisson convolution. Stat Papers 63, 821–848 (2022). https://doi.org/10.1007/s00362-021-01253-0
DOI: https://doi.org/10.1007/s00362-021-01253-0