A simple and useful regression model for underdispersed count data based on Bernoulli–Poisson convolution

Bourguignon, Marcelo; Gallardo, Diego I.; Medeiros, Rodrigo M. R.

doi:10.1007/s00362-021-01253-0

Regular Article
Published: 17 August 2021

Statistical Papers, Volume 63, pages 821–848 (2022)

Abstract

Count data are often modeled using Poisson regression, although this probability model restricts the conditional variance to equal the conditional mean (the equidispersion property). While overdispersion has been studied intensively, the statistical literature offers few alternative models for analyzing underdispersed count data. The primary goal of this paper is to introduce a novel model, based on a Bernoulli–Poisson convolution, for count data that are underdispersed relative to the Poisson distribution. We study the statistical properties of the proposed model and provide a useful interpretation of its parameters. We consider a regression structure for both components based on a new parameterization indexed by mean and dispersion parameters. An expectation-maximization (EM) algorithm is proposed for parameter estimation, and some diagnostic measures based on the EM algorithm are considered. Simulation studies are conducted to evaluate its finite-sample performance. Finally, we illustrate the usefulness of the new regression model with an application.
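
The mean and dispersion parameterization can be made concrete with a small numerical sketch. It assumes, as the terms in the Appendix suggest but this excerpt does not state outright, that $Y=T+W$ with $T$ Bernoulli with success probability $p=\sqrt{\mu(1-\phi)}$ and $W$ Poisson with rate $\lambda=\mu-\sqrt{\mu(1-\phi)}$, independent of each other. The function names and the admissibility check are illustrative choices, not taken from the paper.

```python
import math

def berpoi_params(mu, phi):
    """Map (mean mu, dispersion index phi) to (p, lam).

    Assumed mapping: p = sqrt(mu * (1 - phi)), lam = mu - p.
    """
    p = math.sqrt(mu * (1.0 - phi))
    lam = mu - p
    if not (0.0 < p < 1.0 and lam > 0.0):
        raise ValueError("(mu, phi) lies outside the admissible region")
    return p, lam

def poisson_pmf(k, lam):
    """Poisson probability mass at k (zero for negative k)."""
    if k < 0:
        return 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def berpoi_pmf(y, mu, phi):
    """Mass function of the Bernoulli + Poisson convolution at y."""
    p, lam = berpoi_params(mu, phi)
    return (1.0 - p) * poisson_pmf(y, lam) + p * poisson_pmf(y - 1, lam)

def moments(mu, phi, upper=200):
    """Mean and variance obtained by summing the mass function up to `upper`."""
    mean = sum(y * berpoi_pmf(y, mu, phi) for y in range(upper))
    second = sum(y * y * berpoi_pmf(y, mu, phi) for y in range(upper))
    return mean, second - mean ** 2

mean, var = moments(mu=2.0, phi=0.7)
print(mean, var / mean)  # the ratio var / mean is the dispersion index
```

Under these assumptions the variance is $p(1-p)+\lambda=\mu-p^2=\mu\phi$, so the variance-to-mean ratio equals $\phi<1$, which is the underdispersion the model targets.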

Author information

Authors and Affiliations

Departamento de Estatística, Universidade Federal do Rio Grande do Norte, Natal, Brazil
Marcelo Bourguignon
Departamento de Matemática, Facultad de Ingeniería, Universidad de Atacama, Copiapó, Chile
Diego I. Gallardo
Departamento de Estatística, Universidade de São Paulo, São Paulo, Brazil
Rodrigo M. R. Medeiros

Corresponding author

Correspondence to Marcelo Bourguignon.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

In this appendix, we provide explicit expressions for the terms involved in Eq. (8) when $g_1$ and $g_2$ are the logarithmic and logit link functions, respectively. The first derivative of the complete log-likelihood with respect to ${{\varvec{\theta }}}$ is given by

$$\begin{aligned}&S_i({{\varvec{\theta }}})=\frac{\partial \ell _c({{\varvec{\theta }}})}{\partial {{\varvec{\theta }}}}=\left( \begin{array}{c} \displaystyle \sum _{i=1}^n {\mathbf {x}}_i^\top (a_i+b_i t_i) \\ \displaystyle \sum _{i=1}^n {\mathbf {z}}_i^\top (c_i+d_i t_i) \end{array} \right) \Rightarrow \text {E}\left( S_i({{\varvec{\theta }}})\mid {{\varvec{D}}}_{obs}; {{\varvec{\theta }}}=\widehat{{\varvec{\theta }}}\right) \\&\quad \quad =\left( \begin{array}{c} \displaystyle \sum _{i=1}^n {\mathbf {x}}_i^\top (a_i+b_i {\widetilde{t}}_i) \\ \displaystyle \sum _{i=1}^n {\mathbf {z}}_i^\top (c_i+d_i {\widetilde{t}}_i) \end{array} \right) , \end{aligned}$$

where

$$\begin{aligned} a_i&= \mu _i \left( -1+\frac{y_i}{\mu _i-\sqrt{\mu _i(1-\phi _i)}}\right. \\&\quad \left. -\frac{1}{2}\sqrt{\frac{1-\phi _i}{\mu _i}}\left\{ -1+\frac{y_i}{\mu _i-\sqrt{\mu _i(1-\phi _i)}}+\frac{1}{1-\sqrt{\mu _i(1-\phi _i)}}\right\} \right) ,\\ b_i&=\mu _i \left( \frac{1}{2\mu _i}-\frac{1}{\mu _i-\sqrt{\mu _i(1-\phi _i)}}\right. \\&\quad \left. +\frac{1}{2}\sqrt{\frac{1-\phi _i}{\mu _i}}\left\{ \frac{1}{\mu _i-\sqrt{\mu _i(1-\phi _i)}}+\frac{1}{1-\sqrt{\mu _i(1-\phi _i)}}\right\} \right) ,\\ c_i&=\frac{1}{2}\phi _i\sqrt{\mu _i(1-\phi _i)}\left( -1+\frac{y_i}{\mu _i-\sqrt{\mu _i(1-\phi _i)}}+\frac{1}{1-\sqrt{\mu _i(1-\phi _i)}}\right) , \quad \hbox {and}\\ d_i&=-\frac{1}{2}\phi _i(1-\phi _i)\\&\quad \left( \frac{1}{1-\phi _i}+\sqrt{\frac{\mu _i}{1-\phi _i}}\left\{ \frac{1}{\mu _i-\sqrt{\mu _i(1-\phi _i)}}+\frac{1}{1-\sqrt{\mu _i(1-\phi _i)}}\right\} \right) . \end{aligned}$$
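
As a sanity check on the mean part of the score, the sketch below compares $a_i+b_i t_i$ with a central finite difference of the complete-data log-likelihood with respect to the linear predictor of $\mu_i$. The form of the log-likelihood is an assumption inferred from the terms above: a Bernoulli indicator $t_i$ with probability $p_i=\sqrt{\mu_i(1-\phi_i)}$ plus a Poisson count $y_i-t_i$ with rate $\mu_i-p_i$. The names `complete_loglik` and `a_b` are hypothetical, and the evaluation point is arbitrary.

```python
import math

def complete_loglik(eta1, eta2, y, t):
    """Assumed complete-data log-likelihood for one observation.

    eta1 = log(mu) (log link) and eta2 = logit(phi) (logit link).
    """
    mu = math.exp(eta1)
    phi = 1.0 / (1.0 + math.exp(-eta2))
    p = math.sqrt(mu * (1.0 - phi))
    lam = mu - p
    return (t * math.log(p) + (1 - t) * math.log(1.0 - p)
            - lam + (y - t) * math.log(lam) - math.lgamma(y - t + 1))

def a_b(mu, phi, y):
    """Coefficients a_i and b_i as given in the Appendix."""
    p = math.sqrt(mu * (1.0 - phi))
    lam = mu - p
    s = math.sqrt((1.0 - phi) / mu)
    a = mu * (-1.0 + y / lam
              - 0.5 * s * (-1.0 + y / lam + 1.0 / (1.0 - p)))
    b = mu * (1.0 / (2.0 * mu) - 1.0 / lam
              + 0.5 * s * (1.0 / lam + 1.0 / (1.0 - p)))
    return a, b

# Arbitrary evaluation point: mu = 2, phi = 0.7, y = 3, t = 1.
mu, phi, y, t = 2.0, 0.7, 3, 1
eta1, eta2 = math.log(mu), math.log(phi / (1.0 - phi))

h = 1e-6  # step for the central difference
numeric = (complete_loglik(eta1 + h, eta2, y, t)
           - complete_loglik(eta1 - h, eta2, y, t)) / (2.0 * h)
a, b = a_b(mu, phi, y)
print(numeric, a + b * t)
```

The two printed values should agree up to finite-difference error, which supports reading $a_i+b_i t_i$ as the derivative of the per-observation complete log-likelihood with respect to $\log \mu_i$.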

On the other hand, as $\text {E}(T_i^2\mid {{\varvec{D}}}_{obs}; {{\varvec{\theta }}}=\widehat{{\varvec{\theta }}})=\text {E}(T_i\mid {{\varvec{D}}}_{obs}; {{\varvec{\theta }}}=\widehat{{\varvec{\theta }}})={\widetilde{t}}_i$, we have that

$$\begin{aligned}&\text {E}\left( S_i({{\varvec{\theta }}})S_i^\top ({{\varvec{\theta }}})\mid {{\varvec{D}}}_{obs}; {{\varvec{\theta }}}=\widehat{{\varvec{\theta }}}\right) \\&\quad =\left( \begin{array}{cc} a_i^\top a_i+{\widetilde{t}}_i(2a_ib_i^\top +b_ib_i^\top ) &{} a_i c_i^\top +{\widetilde{t}}_i (a_id_i+b_ic_i+b_id_i) \\ \cdot &{} c_i^\top c_i+{\widetilde{t}}_i(2c_id_i^\top +d_id_i^\top ) \\ \end{array} \right) . \end{aligned}$$
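
The identity $\text{E}(T_i^2\mid \cdot)=\text{E}(T_i\mid \cdot)$ holds because $T_i$ takes only the values 0 and 1, and it is what collapses each entry of the matrix above to a term linear in ${\widetilde{t}}_i$. The sketch below illustrates this for the scalar analogue of the first diagonal entry. The expression used for ${\widetilde{t}}_i$ is an assumption: it is the posterior probability $P(T_i=1\mid Y_i=y_i)$ obtained from Bayes' rule under a Bernoulli plus Poisson convolution with $p_i=\sqrt{\mu_i(1-\phi_i)}$ and rate $\mu_i-p_i$. The paper's own E-step formula is not part of this excerpt, and the values of `a` and `b` are arbitrary illustrative scalars.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability mass at k (zero for negative k)."""
    if k < 0:
        return 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def posterior_t(y, mu, phi):
    """Assumed E-step weight: P(T = 1 | Y = y) by Bayes' rule."""
    p = math.sqrt(mu * (1.0 - phi))
    lam = mu - p
    w1 = p * poisson_pmf(y - 1, lam)      # T = 1, Poisson part equals y - 1
    w0 = (1.0 - p) * poisson_pmf(y, lam)  # T = 0, Poisson part equals y
    return w1 / (w1 + w0)

def expected_square(a, b, t_tilde):
    """E[(a + b T)^2 | y], enumerating the two values of T."""
    return (1.0 - t_tilde) * a ** 2 + t_tilde * (a + b) ** 2

t_tilde = posterior_t(y=3, mu=2.0, phi=0.7)
a, b = -0.3, 1.7  # arbitrary scalars standing in for a_i and b_i

lhs = expected_square(a, b, t_tilde)
rhs = a ** 2 + t_tilde * (2.0 * a * b + b ** 2)
print(t_tilde, lhs, rhs)
```

Both sides agree because $(a+bT)^2=a^2+(2ab+b^2)T$ whenever $T^2=T$. Under the same assumption the weight simplifies to ${\widetilde{t}}_i=p_i y_i/\{p_i y_i+(1-p_i)\lambda_i\}$, which is zero when $y_i=0$.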

Finally, the conditional expectation of the second derivative of the complete log-likelihood with respect to ${{\varvec{\theta }}}$ is given by

$$\begin{aligned} \text {E}\left( B_i({{\varvec{\theta }}})\mid {{\varvec{D}}}_{obs}, {{\varvec{\theta }}}=\widehat{{\varvec{\theta }}}\right)&=\text {E}\left( \frac{\partial ^2 \ell _c({{\varvec{\theta }}})}{\partial {{\varvec{\theta }}}\partial {{\varvec{\theta }}}^\top }\mid {{\varvec{D}}}_{obs}, {{\varvec{\theta }}}=\widehat{{\varvec{\theta }}}\right) \\&=\left( \begin{array}{cc} \displaystyle \sum _{i=1}^n {\mathbf {x}}_i^\top {\mathbf {x}}_iB_{11i} &{} \displaystyle \sum _{i=1}^n {\mathbf {x}}_i^\top {\mathbf {z}}_i B_{12i}\\ \displaystyle \sum _{i=1}^n {\mathbf {z}}_i^\top {\mathbf {x}}_i B_{12i} &{} \displaystyle \sum _{i=1}^n {\mathbf {z}}_i^\top {\mathbf {z}}_i B_{22i} \end{array} \right) , \end{aligned}$$

where

$$\begin{aligned} B_{11i}&=a_i+b_i {\widetilde{t}}_i+\mu _i^2 \frac{\partial ^2 \ell _c({{\varvec{\theta }}})}{\partial \mu _i^2}, \qquad B_{12i}=\mu _i \phi _i(1-\phi _i)\frac{\partial ^2 \ell _c({{\varvec{\theta }}})}{\partial \mu _i \partial \phi _i},\\ B_{22i}&=(1-2\phi _i)(c_i+d_i {\widetilde{t}}_i)+\phi _i^2(1-\phi _i)^2\frac{\partial ^2 \ell _c({{\varvec{\theta }}})}{\partial \phi _i^2},\\ \frac{\partial ^2 \ell _c({{\varvec{\theta }}})}{\partial \mu _i^2}&=-\frac{{\widetilde{t}}_i}{2\mu _i^2}-\frac{(y_i-{\widetilde{t}}_i) \left( 1-\frac{1}{2}\sqrt{\frac{1-\phi _i}{\mu _i}}\right) }{ (\mu _i-\sqrt{\mu _i(1-\phi _i)})^2}\\&\quad +\frac{1}{2}\sqrt{\frac{1-\phi _i}{\mu _i}}\Bigg (\frac{1}{2\mu _i}\bigg \{-1+\frac{y_i-{\widetilde{t}}_i}{\mu _i-\sqrt{\mu _i(1-\phi _i)}}\\&~~~~~+\frac{1-{\widetilde{t}}_i}{1-\sqrt{\mu _i(1-\phi _i)}}\bigg \} +\frac{(y_i-{\widetilde{t}}_i)\left( 1-\frac{1}{2}\sqrt{\frac{1-\phi _i}{\mu _i}}\right) }{(\mu _i-\sqrt{\mu _i(1-\phi _i)})^2}-\frac{(1-{\widetilde{t}}_i)\sqrt{\frac{1-\phi _i}{\mu _i}}}{(1-\sqrt{\mu _i(1-\phi _i)})^2}\Bigg ),\\ \frac{\partial ^2 \ell _c({{\varvec{\theta }}})}{\partial \phi _i^2}&=- \frac{{\widetilde{t}}_i}{2(1-\phi _i)^2}\\&\quad +\frac{1}{4}\sqrt{\frac{\mu _i}{1-\phi _i}}\Bigg (\frac{1}{(1-\phi _i)}\left\{ -1+\frac{y_i-{\widetilde{t}}_i}{\mu _i-\sqrt{\mu _i(1-\phi _i)}}+\frac{1-{\widetilde{t}}_i}{1-\sqrt{\mu _i(1-\phi _i)}}\right\} \\&~~~~~-\frac{(y_i-{\widetilde{t}}_i)}{(\mu _i-\sqrt{\mu _i(1-\phi _i)})^2}-\frac{(1-{\widetilde{t}}_i)}{(1-\sqrt{\mu _i(1-\phi _i)})^2}\Bigg ), \quad \hbox {and}\\ \frac{\partial ^2 \ell _c({{\varvec{\theta }}})}{\partial \mu _i \partial \phi _i}&=\frac{1}{4\sqrt{\mu _i(1-\phi _i)}}\left\{ -1+\frac{y_i-{\widetilde{t}}_i}{\mu _i-\sqrt{\mu _i(1-\phi _i)}}+\frac{1-{\widetilde{t}}_i}{1-\sqrt{\mu _i(1-\phi _i)}}\right\} \\&~~~~~-\frac{1}{4}\frac{\mu _i}{(1-\phi _i)}\left\{ \frac{(y_i-{\widetilde{t}}_i)}{(\mu _i-\sqrt{\mu _i(1-\phi _i)})^2}+\frac{(1-{\widetilde{t}}_i)}{(1-\sqrt{\mu _i(1-\phi _i)})^2}\right\} . \end{aligned}$$

About this article

Cite this article

Bourguignon, M., Gallardo, D.I. & Medeiros, R.M.R. A simple and useful regression model for underdispersed count data based on Bernoulli–Poisson convolution. Stat Papers 63, 821–848 (2022). https://doi.org/10.1007/s00362-021-01253-0

Received: 23 October 2020
Revised: 07 July 2021
Published: 17 August 2021
Issue Date: June 2022
DOI: https://doi.org/10.1007/s00362-021-01253-0
