Aggregating the response in time series regression models, applied to weather-related cardiovascular mortality
Introduction
In environmental epidemiology, studies of the health effects of various environmental exposures often rely on regression models applied to time series data (Gasparrini and Armstrong, 2013). Environmental exposure variables include atmospheric pollutant levels and temperature, while health issues include various diseases (Barreca and Shimshack, 2012; Blangiardo et al., 2011; Braga et al., 2002; Knowlton et al., 2009; Martins et al., 2006; Nitschke et al., 2011; Szpiro et al., 2014; Yang et al., 2015). In this context, the exposure-response relationship is complex since, among other reasons, the effect of an exposure on health can last several days. This is why models have often used exposure windows in the form of moving averages (MA, e.g. Armstrong, 2006) or distributed lags (DL, Schwartz, 2000a). In particular, the latter have been extended to deal with nonlinear relationships (distributed lag nonlinear models, DLNM, Gasparrini et al., 2010), which are now widely used in weather-related population health studies (e.g. Phung et al., 2016; Vanos et al., 2015; Wu et al., 2013).
The health response, however, is almost always used directly as a daily time series. This could lead to several drawbacks in the regression models of environmental exposure on a health issue. First, the response to an exposure can also be spread across several days (Lipfert, 1993), which means that it would seem more realistic to consider a health time window in response to an associated exposure window. Second, health time series data used in epidemiologic studies are often noisy. The noise can conceal the true signal of the response to an exposure, especially in areas with small populations where the number of cases (mortality or morbidity) is low. Sources of noise include diverse organizational factors such as weekends and holidays (Suissa et al., 2014; Wong et al., 2009), slight changes in the definition of diseases (e.g. Antman et al., 2000) as well as behavioral and technological changes. In the end, the noise in the response can reduce the accuracy of the model and the conclusions (e.g. Todeschini et al., 2004).
In order to assess a more realistic relationship between an exposure and a health issue, and to reduce the impact of noise in the health response, it is proposed to apply an aggregation window over time to the health response as well as to the exposure. More precisely, moving aggregation is considered here: the time step of the resulting series remains the same, as opposed to aggregation that coarsens the time step and reduces the number of data points (e.g. from daily values to monthly values).
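The distinction between moving aggregation and time-step-coarsening aggregation can be illustrated with a minimal sketch. The function names `moving_average` and `block_average`, the equal-weight window and the toy series are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def moving_average(y, window):
    """Moving aggregation: one value per day, each averaging `window` consecutive days."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only windows fully inside the series (length n - window + 1)
    return np.convolve(y, kernel, mode="valid")

def block_average(y, block):
    """Coarsening aggregation: one value per non-overlapping block of `block` days."""
    n_blocks = len(y) // block
    return y[: n_blocks * block].reshape(n_blocks, block).mean(axis=1)

y = np.arange(12, dtype=float)   # toy daily series (hypothetical data)
ma = moving_average(y, 3)        # still a daily series, minus the incomplete windows
ba = block_average(y, 3)         # one value every 3 days
print(len(y), len(ma), len(ba))
```

With a window of 3 days, the moving average keeps 10 of the 12 daily points, whereas the block average keeps only 4 points at a 3-day time step.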
Aggregating the response series is expected to have two advantages: (1) better representing the spread of the health response to an exposure and (2) reducing the noise in the health series. Indeed, aggregated series are less sensitive to random perturbation in the data. An aggregated response should make regression models more robust to variations induced by noise, leading to more reliable relationship estimates. This idea is consistent with the results of Cristobal et al. (1987) in a non-time series context, which showed that pre-smoothing a response variable to remove noise leads to consistent estimates with low variance in linear regression. In a similar study, Sarmento et al. (2011) concluded that regression models are more robust to noise when both the response and the exposure are aggregated.
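The noise-reduction argument can be made concrete with a short worked equation. It assumes, purely for illustration, an equal-weight moving average over m days and a response that splits into a signal s_t plus independent noise ε_t with constant variance σ²; the symbols s_t, ε_t, m and σ² are not from the paper.

```latex
\tilde{y}_t = \frac{1}{m}\sum_{j=0}^{m-1} y_{t-j},
\qquad y_t = s_t + \varepsilon_t,
\qquad \operatorname{Var}(\varepsilon_t) = \sigma^2

\operatorname{Var}\!\left(\frac{1}{m}\sum_{j=0}^{m-1}\varepsilon_{t-j}\right)
  = \frac{1}{m^2}\sum_{j=0}^{m-1}\operatorname{Var}(\varepsilon_{t-j})
  = \frac{\sigma^2}{m}
```

Under these assumptions, a 7-day window divides the noise variance by 7, i.e. the noise standard deviation by about 2.6. When the noise is itself positively correlated from day to day, the reduction is smaller.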
There have been few preceding cases of aggregated responses (Roberts, 2015; Sarmento et al., 2011; Schwartz, 2000b), but the regression models applied did not account for the specificities of an aggregated response. These specificities include the presence of extra autocorrelation in the residuals and a modification of their distribution. Therefore, the objective of the present paper is to introduce a general methodology dealing with an aggregated response. The methodology allows the use of a DLNM with an aggregated response and deals with the autocorrelation created by the aggregation. The exposure-response surface of a DLNM with aggregated response is then compared to the surface of a classical DLNM in order to assess the impact on the estimated relationship. In past studies, only the moving average (Roberts, 2015; Sarmento et al., 2011) and Loess (Schwartz, 2000b) have been considered to aggregate the response. In the present paper, other aggregations are considered, in particular Nadaraya-Watson kernel smoothing (Nadaraya, 1964; Watson, 1964) with different kernels including the Epanechnikov kernel (Epanechnikov, 1969) and an asymmetric kernel proposed in Michels (1992).
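As a rough illustration of kernel-based aggregation, the sketch below applies a Nadaraya-Watson smoother with the Epanechnikov kernel to a series indexed by time. The function names, the `bandwidth` parameter and the synthetic noise series are assumptions made for this example; the asymmetric kernel of Michels (1992) is not reproduced here.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def nadaraya_watson(y, bandwidth, kernel=epanechnikov):
    """Smooth y over its time index: weighted mean of y with kernel weights in time."""
    t = np.arange(len(y), dtype=float)
    # u[i, j] is the scaled time distance between target day i and observed day j
    u = (t[:, None] - t[None, :]) / bandwidth
    w = kernel(u)
    # Each row sum is positive because the weight at distance 0 is 0.75
    return (w @ y) / w.sum(axis=1)

rng = np.random.default_rng(0)
noise = rng.normal(size=500)              # hypothetical noisy daily series
smooth = nadaraya_watson(noise, bandwidth=7)
print(noise.var(), smooth.var())
```

The output has the same length and time step as the input, so this is a moving aggregation. The full weight matrix costs O(n²) memory, which is acceptable for a few thousand daily values but would need a windowed implementation for much longer series.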
The paper is organized as follows. Section 2 introduces the proposed methodology for an aggregated response. Section 3 illustrates the methodology and its benefits by applying it to a weather-related cardiovascular mortality case. The methodology is first compared to models with a non-aggregated response, and then different aggregation strategies are compared. The results are discussed in Section 4 and the conclusions are presented in Section 5.
Section snippets
Methods
This section introduces the statistical methodology, which consists of 1) performing a temporal aggregation on the response time series y_t; and 2) modelling the aggregated response according to an exposure x_t through a regression model.
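The two steps can be sketched on synthetic data. This is only an illustration under strong assumptions: ordinary least squares stands in for the DLNM, the exposure is aggregated with the same window as the response, and all data, coefficients and helper names are hypothetical. The sketch also shows why the residual dependence created by aggregation calls for specific modelling.

```python
import numpy as np

def moving_average(series, window):
    """Equal-weight moving average over `window` consecutive days."""
    return np.convolve(series, np.ones(window) / window, mode="valid")

def ols_residuals(x, y):
    """Fit y = a + b * x by least squares; return residuals and (a, b)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef, coef

def lag1_autocorr(r):
    """Sample lag-1 autocorrelation of a residual series."""
    r = r - r.mean()
    return float(np.dot(r[:-1], r[1:]) / np.dot(r, r))

rng = np.random.default_rng(42)
n, window = 5000, 7
x = rng.normal(size=n)                    # synthetic exposure x_t
y = 2.0 + 0.5 * x + rng.normal(size=n)    # synthetic response y_t with white noise

# Step 1: temporal aggregation of the response (and, here, of the exposure)
y_agg = moving_average(y, window)
x_agg = moving_average(x, window)

# Step 2: regression of the (aggregated) response on the exposure
res_raw, coef_raw = ols_residuals(x, y)
res_agg, coef_agg = ols_residuals(x_agg, y_agg)

rho_raw = lag1_autocorr(res_raw)          # close to 0 for white noise
rho_agg = lag1_autocorr(res_agg)          # close to (window - 1) / window
print(coef_raw, coef_agg, rho_raw, rho_agg)
```

In this toy setting the slope estimate is similar with and without aggregation, but the residuals of the aggregated model are strongly autocorrelated, so standard errors from a plain least-squares fit would be too optimistic unless the dependence is modelled.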
Application and comparison
In Canada, cardiovascular diseases remain the main cause of mortality and put an increasing burden on the public health system (Wielgosz et al., 2009). It has already been shown that temperature affects cardiovascular mortality and morbidity (e.g. Bayentin et al., 2010; Bustinza et al., 2013; Masselot et al., 2018). Therefore, in order to efficiently organize private and public health service and mitigate the effect of temperature on cardiovascular diseases, it is important to understand every
Discussion
The CVD mortality and temperature data were used to compare DLNM without aggregated response to DLNM with aggregated response, with and without modelling the created temporal dependence. Results show that when the temporal dependence is not modelled, results are quite similar between aggregated response (model MA) and non-aggregated response (model C), although the former smooths the relationship. For the latter, it is important to note that the results and interpretation are very similar to
Conclusions
The present paper proposes to aggregate the health response in environmental epidemiology studies, in order to reduce the importance of noise in the health data. The proposed methodology consists of aggregating the response and then applying a time series regression model to account for the temporal dependence created by the aggregation. This model is general and therefore not limited to linear regression and allows the use of DLNMs. The proposed methodology is then applied to the practical
Acknowledgements
The authors are thankful to the Fonds Vert du Québec for funding this study and to the Institut national de santé publique du Québec for data access. The authors also wish to thank Jean-Xavier Giroux (INRS-ETE) for his help with database building, Yohann Chiu (INRS-ETE) for all his relevant comments during the project as well as two anonymous reviewers for their helpful comments in improving the quality of the paper. All the analyses were performed using the R software (R Core Team, 2015) with
References (65)
- et al.
Myocardial infarction redefined—a consensus document of The Joint European Society of Cardiology/American College of Cardiology committee for the redefinition of myocardial infarction
J. Am. Coll. Cardiol.
(2000)
- et al.
On the use of cross-validation for time series predictor evaluation
Inf. Sci.
(2012)
- et al.
Mortality risk attributable to high and low ambient temperature: a multicountry observational study
Lancet
(2015)
- et al.
The short-term influence of temperature on daily mortality in the temperate climate of Montreal, Canada
Environ. Res.
(2011)
- et al.
EMD-regression for modelling multi-scale relationships, and application to weather-related cardiovascular mortality
Sci. Total Environ.
(2018)
A simple message for autocorrelation correctors: don't
J. Econ.
(1995)
- et al.
The effects of high temperature on cardiovascular admissions in the most populous tropical city in Vietnam
Environ. Pollut.
(2016)
Consistent cross-validatory model-selection for dependent data: hv-block cross-validation
J. Econ.
(2000)
- et al.
Detecting “bad” regression models: multicriteria fitness functions in regression analysis
Anal. Chim. Acta
(2004)
- et al.
Temperature–mortality relationship in four subtropical Chinese cities: a time-series study using a distributed lag non-linear model
Sci. Total Environ.
(2013)
Long-term variations in the association between ambient temperature and daily cardiovascular mortality in Shanghai, China
Sci. Total Environ.
On least squares and linear combination of observations
Proc. R. Soc. Edinb.
A new look at the statistical model identification
IEEE Trans. Autom. Control
Models for the relationship between ambient temperature and daily mortality
Epidemiology
Absolute humidity, temperature, and influenza mortality: 30 years of county-level evidence from the United States
Am. J. Epidemiol.
Spatial variability of climate effects on ischemic heart disease hospitalization rates for the period 1989–2006 in Quebec, Canada
Int. J. Health Geogr.
Probability and measure
A Bayesian analysis of the impact of air pollution episodes on cardio-respiratory hospital admissions in the Greater London area
Stat. Methods Med. Res.
Time series analysis: forecasting and control
The effect of weather on respiratory and cardiovascular deaths in 12 U.S. cities
Environ. Health Perspect.
The relative performance of AIC, AICC and BIC in the presence of unobserved heterogeneity
Methods Ecol. Evol.
Multimodel inference: understanding AIC and BIC in model selection
Sociol. Methods Res.
Health impacts of the July 2010 heat wave in Quebec, Canada
BMC Public Health
A general and flexible methodology to define thresholds for heat health watch and warning systems, applied to the province of Québec (Canada)
Int. J. Biometeorol.
Mortality and morbidity peaks modeling: an extreme value theory approach
Stat. Methods Med. Res.
Understanding time-series regression estimators
Am. Stat.
Locally weighted regression: an approach to regression analysis by local fitting
J. Am. Stat. Assoc.
Application of least squares regression to relationships containing auto-correlated error terms
J. Am. Stat. Assoc.
A class of linear regression parameter estimators constructed by nonparametric estimation
Ann. Stat.
Ten Lectures on Wavelets
The potential impact of climate change on annual and seasonal mortality for three cities in Québec, Canada
Int. J. Health Geogr.
Non-parametric estimation of a multivariate probability density
Theory Probab. Appl.