Abstract
Count variables are often positively skewed and may include many zero observations, so they require specific statistical approaches. Under these conditions, interpreting how abiotic factors affect populations of insect crop pests can be difficult. The analysis becomes even more complicated in the presence of temporal or spatial correlation, irregularly spaced data, heterogeneity over time, and zero inflation. Generalized additive models (GAMs) are important tools for evaluating the effects of abiotic factors. Moreover, Markov chain Monte Carlo (MCMC) techniques can be used to fit a Bayesian GAM (BGAM) that includes a temporal correlation structure. We compared methods of modeling the effects of temperature, precipitation, and time on a Brevicoryne brassicae (L.) population in Uberlândia, Brazil. Using the statistical programming language R, we applied the proposed BGAM to the data and compared it with GAMs fitted with and without temporal autocorrelation. Analysis of deviance identified significant effects of the smoothers for precipitation and time in the frequentist models. The BGAM resolved the variance-estimation problems for precipitation and temperature encountered in the frequentist models. Furthermore, trace and density plots of the population-level effects indicated good convergence for all parameters. The estimated smoothing curves showed a linear effect of increasing precipitation, with lower precipitation associated with absence of the aphid. Average temperature did not affect aphid incidence. Autocorrelation was handled with ARMA structures, and the excess of zeros was handled with zero-inflated models. The example of B. brassicae incidence showed how well abiotic (and biotic) factors can be modeled and analyzed using BGAM.
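To make the idea of zero inflation concrete, the following is a minimal Python sketch of a zero-inflated Poisson distribution. It is an illustration only, not the authors' model: the abstract does not state which count distribution was used, the study itself was carried out in R, and the function names `zip_pmf` and `zip_mean_var` and the parameters `lam` (Poisson mean) and `pi` (probability of a structural zero) are hypothetical.

```python
import math


def zip_pmf(y, lam, pi):
    """P(Y = y) for a zero-inflated Poisson (illustrative assumption).

    With probability `pi` the observation is a structural zero;
    otherwise it is drawn from a Poisson distribution with mean `lam`.
    """
    if y < 0:
        raise ValueError("counts must be non-negative")
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # Zeros arise from both the structural-zero and the Poisson part.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_mean_var(lam, pi):
    """Mean and variance of the zero-inflated Poisson."""
    mean = (1.0 - pi) * lam
    var = (1.0 - pi) * lam * (1.0 + pi * lam)
    return mean, var


if __name__ == "__main__":
    lam, pi = 2.0, 0.3  # hypothetical values, not estimates from the study
    print("P(Y = 0), plain Poisson :", math.exp(-lam))
    print("P(Y = 0), zero-inflated :", zip_pmf(0, lam, pi))
    print("mean, variance          :", zip_mean_var(lam, pi))
```

Whenever `pi` is greater than zero, the variance exceeds the mean, which is one way an excess of zeros shows up as overdispersion relative to a plain Poisson model.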
Author information
Contributions
FJC, DGS, and MVS planned the experimental work and wrote the manuscript. FJC and DGS designed and conducted data analyses.
Additional information
Edited by Edison Ryoiti Sujii – Embrapa/CENARGEN
About this article
Cite this article
Carvalho, F.J., de Santana, D.G. & Sampaio, M.V. Modeling Overdispersion, Autocorrelation, and Zero-Inflated Count Data Via Generalized Additive Models and Bayesian Statistics in an Aphid Population Study. Neotrop Entomol 49, 40–51 (2020). https://doi.org/10.1007/s13744-019-00729-x
DOI: https://doi.org/10.1007/s13744-019-00729-x