Elsevier

Science of The Total Environment

Volume 612, 15 January 2018, Pages 1018-1029

EMD-regression for modelling multi-scale relationships, and application to weather-related cardiovascular mortality

https://doi.org/10.1016/j.scitotenv.2017.08.276

Highlights

  • EMD-regression aims at studying multi-scale relationships.

  • EMD-regression performs better than models commonly used in epidemiology.

  • EMD-regression gives a deeper understanding of traditional nonlinear relations.

  • An increase in mortality is associated with winter cold.

  • Humidity variations can cause excess cardiovascular mortality, especially during spring and autumn.

Abstract

In many environmental studies, relationships between natural processes are assessed through regression analyses of time series data. Such data are often multi-scale and non-stationary, which degrades the accuracy of the resulting regression models and limits the reliability of their results. To deal with this issue, the present paper introduces the EMD-regression methodology, which consists of applying the empirical mode decomposition (EMD) algorithm to the data series and then using the resulting components in regression models. The proposed methodology presents a number of advantages. First, it accounts for the non-stationarity of the data series. Second, it acts as a scan of the relationship between a response variable and the predictors at different time scales, providing new insights into this relationship. To illustrate the proposed methodology, it is applied to the relationship between weather and cardiovascular mortality in Montreal, Canada. The results shed new light on this relationship. For instance, they show that humidity can cause excess mortality at the monthly time scale, a scale that is not visible in classical models. A comparison is also conducted with state-of-the-art methods, namely generalized additive models and distributed lag models, both widely used in weather-related health studies. The comparison shows that EMD-regression achieves better prediction performance and provides more detail about the relationship than classical models.

Introduction

In a number of scientific fields (e.g. hydrology, environmental health, ecology), it is of interest to understand the effect of one or several predictor variables on a response variable. The classical class of models for this purpose is regression analysis (e.g. Nelder and Wedderburn, 1972). However, the variables of interest are often represented by time series processes, which potentially leads to modelling and accuracy issues. The multi-scale nature of some time series processes found in applications such as climatology and public health is of special interest. Indeed, such time series are often non-stationary (i.e. their moments vary with time), and some dominant patterns in the time series (e.g. annual cycles) create a large amount of multicollinearity in the exposure time series when several covariates are considered. In a regression analysis, if the model does not take these issues into account, the variability of the parameter estimates can increase, making the final result less reliable (e.g. Ventosa-Santaulària, 2009). This also increases the risk of drawing wrong conclusions about whether or not a predictor influences the response (i.e. the so-called “spurious regression” issue; see Granger and Newbold, 1974; Phillips, 1986; Hoover, 2003).
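The spurious regression issue mentioned above can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes Gaussian random walks as a simple example of non-stationary series, and the function names, series length, number of replicates and seed are arbitrary illustrative choices. It regresses pairs of mutually independent series on each other and compares the average coefficient of determination obtained with random walks to the one obtained with white noise.

```python
import random


def r_squared(x, y):
    """Coefficient of determination of the simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def random_walk(n, rng):
    """Cumulative sum of independent N(0, 1) steps: a non-stationary series."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def white_noise(n, rng):
    """Independent N(0, 1) draws: a stationary series."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def mean_r2(generator, n_pairs=200, length=200, seed=1):
    """Average R^2 over pairs of series that are independent by construction."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        total += r_squared(generator(length, rng), generator(length, rng))
    return total / n_pairs


mean_r2_walks = mean_r2(random_walk)
mean_r2_noise = mean_r2(white_noise)
print(f"mean R^2, independent random walks: {mean_r2_walks:.3f}")
print(f"mean R^2, independent white noise:  {mean_r2_noise:.3f}")
```

Although no pair of series is related, the random walks typically yield a much larger average R² than the white noise, which is the kind of misleading association that non-stationarity can produce.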

The present paper proposes to address the issue of multi-scale time series data in regression by decomposing the series into intrinsic mode functions (IMFs) through the empirical mode decomposition algorithm (EMD, Huang et al., 1998). The obtained IMFs are the basic oscillation modes of the time series data and can be used as variables in a regression analysis. Therefore, the proposed method combines EMD and regression as illustrated in Table 1 and is hereafter called “EMD-regression” (EMD-R). The proposed approach differs significantly from other methods commonly used to address the issue of non-stationarity, such as removing the trend and the seasonality (detrending and deseasonalisation), applying a difference operator, or adding a smooth time variable. The main difference lies in the fact that no information is removed from the data. Instead, EMD-R acts as a scan of the relationship over all time scales that are present in the data. This allows isolating the most important time scales for a better understanding of the relationship, and even unveiling signals that may be hidden by the dominant frequencies.
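To give an idea of how a series can be decomposed into IMFs, the following is a deliberately simplified sketch of the sifting procedure, not the implementation used in the paper. It makes several assumptions that depart from the reference algorithm of Huang et al. (1998): envelopes are piecewise linear rather than cubic splines, a fixed number of sifting passes replaces a proper stopping criterion, and boundaries receive no special treatment. All function and parameter names (`local_extrema`, `sift_once`, `emd`, `max_imfs`, `n_sift`) are illustrative.

```python
import numpy as np


def local_extrema(x):
    """Indices of the strict interior local maxima and minima of a 1-D array."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima


def sift_once(x):
    """Subtract the mean of the upper and lower envelopes from x.

    Assumption: envelopes are piecewise linear (np.interp); the reference
    algorithm interpolates the extrema with cubic splines.
    """
    t = np.arange(len(x))
    maxima, minima = local_extrema(x)
    upper = np.interp(t, maxima, x[maxima])
    lower = np.interp(t, minima, x[minima])
    return x - (upper + lower) / 2.0


def emd(x, max_imfs=8, n_sift=10):
    """Return (imfs, residue) such that imfs.sum(axis=0) + residue equals x."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few oscillations left: the remainder is the trend
        h = residue.copy()
        for _ in range(n_sift):  # fixed number of passes (assumption)
            maxima, minima = local_extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            h = sift_once(h)
        imfs.append(h)
        residue = residue - h
    return np.array(imfs), residue


# Synthetic example: a fast cycle, a slow cycle and a linear trend.
t = np.arange(1000)
signal = np.sin(2 * np.pi * t / 20) + np.sin(2 * np.pi * t / 250) + 0.002 * t
imfs, residue = emd(signal)
print("number of IMFs:", imfs.shape[0])
print("exact reconstruction:", np.allclose(imfs.sum(axis=0) + residue, signal))
```

Because each IMF is subtracted from the running residue, the components always sum back to the original series, which reflects the point made above that no information is removed from the data.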

Transforming the data as a prior step to regression analysis is common in the literature, for instance through the use of principal components (e.g. Jolliffe, 1982). Better suited to time series data, several spectral decomposition approaches have also been suggested, such as the STL (Seasonal-Trend decomposition using Loess) algorithm (e.g. Schwartz, 2000b), Fourier decomposition (e.g. Dominici et al., 2003) or the wavelet transform (e.g. Kucuk and Agiralioglu, 2006; Kişi, 2009). The main advantage of EMD for the decomposition is that it is entirely data-adaptive (Huang et al., 1998). The algorithm therefore automatically determines the time scales that are present in the data, thus avoiding the a priori choice that is necessary in the STL algorithm (Cleveland et al., 1990), for instance. In addition, no predetermined function is used to perform the decomposition, unlike Fourier- and wavelet-based decompositions. This allows EMD to decompose non-stationary and non-linear time series into a small number of components (Huang and Wu, 2008).

In addition to being widely applied directly in several fields such as geosciences (Huang and Wu, 2008) and mechanical engineering (Lei et al., 2013), the EMD algorithm has been successfully combined with other established statistical methods. For instance, Lee and Ouarda (2010, 2011) combined EMD and k-nearest neighbour simulations to predict climatic oscillations. Chen et al. (2012) applied an artificial neural network to forecast the IMFs of a tourism demand series. Lee and Ouarda (2012a) also combined EMD and principal component analysis to separate meaningful signals from noise in climatic applications. EMD has also been used to study the relationship between two variables. For instance, Durocher et al. (2016) used a combination of EMD and cross-wavelet analysis to study the relationship between two time series. For the same purpose, Biswas and Si (2011) and then Hu and Si (2013) used EMD before computing correlation coefficients on the IMFs. A more general method is developed in Chen et al. (2010) to study the correlation between two time series through the use of EMD.

Combining EMD and linear regression has been performed by Yang et al. (2011a, 2011b). In a recent article, Qin et al. (2016) proposed using the Lasso approach to select the most relevant IMFs for predicting the response series. The present work goes a step further by broadening the scope of the procedure and proposing a number of generalisations of the approach. In particular, the previous studies decomposed only one predictor series, while the present work is not limited to a single predictor. In addition, two models are proposed here, one of which also decomposes the response series, allowing its prediction in the frequency space to gain insights at hidden variation scales. A sensitivity score for the predictors' IMFs is also described as an interpretation tool for practitioners. Finally, unlike the cited studies, a comparison with state-of-the-art regression methods is provided.

The EMD-regression method basically consists of two steps: i) decomposing the time series into their IMFs through EMD, and ii) using the IMFs as variables in a regression analysis. More specifically, two different designs are introduced: a) only the predictors are decomposed and all their IMFs are used as alternative predictors (EMD-R1), as in Qin et al. (2016), and b) both the response and the predictors are decomposed and each IMF of the response is modelled according to the predictors' IMFs of the same order (EMD-R2). The new EMD-R2 procedure hence provides more detail about the relationship between the predictors and the response variable than the EMD-R1 procedure.
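The difference between the two designs can be sketched as follows. This is only an illustration under strong assumptions: the IMFs are not obtained by EMD but are replaced by synthetic sinusoids at three hypothetical time scales, ordinary least squares is used instead of a sparse regression, and the names (`x1_imfs`, `x2_imfs`, `y_imfs`, `ols`, `true_coefs`) and coefficient values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 730
t = np.arange(n)
periods = [7, 30, 365]  # hypothetical time scales, fastest first

# Stand-ins for the IMFs of two predictors (one row per IMF order).
# In practice these rows would come from an EMD of each observed series.
x1_imfs = np.array([np.sin(2 * np.pi * t / p) for p in periods])
x2_imfs = np.array([np.cos(2 * np.pi * t / p + 0.3) for p in periods])

# Invented scale-specific effects: row k holds the effects of X1 and X2
# on the response at IMF order k.
true_coefs = np.array([[0.5, 0.0], [0.0, 1.0], [-2.0, 0.0]])
y_imfs = (true_coefs[:, [0]] * x1_imfs + true_coefs[:, [1]] * x2_imfs
          + 0.1 * rng.standard_normal((len(periods), n)))
y = y_imfs.sum(axis=0)  # the response series is the sum of its components


def ols(design, response):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(response)), design])
    coef, *_ = np.linalg.lstsq(X, response, rcond=None)
    return coef


# EMD-R1: the undecomposed response is regressed on all predictor IMFs.
design_r1 = np.vstack([x1_imfs, x2_imfs]).T  # shape (n, 6)
coef_r1 = ols(design_r1, y)[1:]              # intercept dropped

# EMD-R2: the response IMF of order k is regressed on the predictor IMFs
# of the same order k, giving one small regression per time scale.
coef_r2 = np.array([
    ols(np.column_stack([x1_imfs[k], x2_imfs[k]]), y_imfs[k])[1:]
    for k in range(len(periods))
])

print("EMD-R1 coefficients (X1 IMFs, then X2 IMFs):", np.round(coef_r1, 2))
print("EMD-R2 coefficients (one row per IMF order):")
print(np.round(coef_r2, 2))
```

In this sketch EMD-R1 yields a single model with one coefficient per predictor IMF, whereas EMD-R2 yields one model per IMF order, which is what allows each component of the response to be examined separately.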

The present study is motivated by an application in weather-related health, which contains typical examples of multi-scale processes. Such studies often control for seasonality and trend by using a time variable in order to focus on the day-to-day variations in the health issue of interest (Bhaskaran et al., 2013). EMD-regression provides a tool for assessing the long-term effects of climatic variables through the low-frequency IMFs. This represents a major challenge for planning for future public health conditions (Xun et al., 2010) and for setting more appropriate public health alerts, especially under climate change conditions. It is hoped that the use of EMD-regression may also unveil hidden features of the weather-health relationship, such as the influence of weather factors at non-dominant time scales.

The present paper is organized as follows. The background material associated with the EMD-R methodology and the details of the EMD-R approach are introduced in Section 2. In Section 3, both the EMD-R1 and EMD-R2 methods are applied to the weather-related cardiovascular issue in the census metropolitan area (CMA) of Montréal (Canada). Since the motivating context for the present study is weather-related health, the EMD-R methods are then compared with models commonly used in this type of study. The results of the application are then discussed in Section 4, and the conclusions are presented in Section 5.

Section snippets

EMD-regression (EMD-R)

The EMD-regression methodology aims at explaining the effects of covariates Xj on a response variable Y by: 1) decomposing the time series using EMD and 2) using the IMFs as new variables in a sparse regression model, namely the Lasso (least absolute shrinkage and selection operator, Tibshirani, 1996). The methodology is summarized in Fig. 1.
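The role of the Lasso in the second step can be illustrated with a minimal sketch. It assumes cyclic coordinate descent as the fitting algorithm, which is one common way of computing Lasso estimates and not necessarily the one used in the paper. The columns of `X` are random stand-ins for predictor IMFs, and the penalty `lam`, the sample size and the true coefficients are arbitrary illustrative values.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma, returning 0 when |z| <= gamma."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X and y are centred internally, so no intercept is estimated; the
    returned slopes apply to the original (uncentred) columns.
    """
    X = X - X.mean(axis=0)
    y = y - y.mean()
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # current residual y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual that ignores column j.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)  # keep the residual in sync
            beta[j] = new
    return beta


# Illustration: six candidate columns, only columns 0 and 3 affect y.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * rng.standard_normal(500)

beta = lasso_cd(X, y, lam=0.2)
selected = np.flatnonzero(beta).tolist()
print("selected columns:", selected)
print("coefficients:", np.round(beta, 2))
```

The soft-thresholding step sets small coefficients exactly to zero, which is what makes the Lasso suitable for retaining only the most relevant IMFs; the retained coefficients are shrunk towards zero by the penalty.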

Application to weather-related cardiovascular mortality

The literature abounds with studies documenting the potentially harmful impacts of climate change. Among these impacts, an increase in weather-related mortality is expected. Cardiovascular diseases (CVD) are among the diseases that are most affected by climate change since they are impacted by extreme weather (e.g. Braga et al., 2002; Bustinza et al., 2013). CVD are already the main cause of mortality in Canada and could represent an increasing burden on the Canadian public health

Discussion

The results of the weather-related cardiovascular mortality application presented in Section 3 already show one advantage of EMD-R: its ability to display some hidden aspects of the relationship. In this case, the effect of humidity found during the spring season and at very large time scales (i.e. periodicities of several years) is quite new in the field of environmental epidemiology. Indeed, no significant association between relative humidity and mortality has been found when studying as a variable of

Conclusion

The present paper introduces a general methodology for EMD-regression when dealing with time series data (and more generally all data with autocorrelation), which are often found in environmental sciences. The purpose of the EMD-R approach is to understand a relationship between variables from a different point of view, i.e. from a time scale point of view. This point of view acknowledges the complexity of many real-world time series, which contain a significant amount of information in their variations.

Acknowledgements

The authors are thankful to the Fonds Vert du Québec for funding this study and to the Institut national de santé publique du Québec for data access. The authors also thank Jean-Xavier Giroux (INRS-ETE) for his help in establishing the database as well as Yohann Chiu (INRS-ETE) for all his relevant comments during the project. The authors are grateful to Scott Sheridan, the associate editor of Science of the Total Environment, as well as three anonymous reviewers for their judicious comments and

References (77)

  • A.C. Yang et al.

    Decomposing the association of completed suicide with air pollution, weather, and unemployment data at different time scales

    J. Affect. Disord.

    (2011)
  • C. Yang et al.

    Long-term variations in the association between ambient temperature and daily cardiovascular mortality in Shanghai, China

    Sci. Total Environ.

    (2015)
  • L. Bayentin et al.

    Spatial variability of climate effects on ischemic heart disease hospitalization rates for the period 1989–2006 in Quebec, Canada

    Int. J. Health Geogr.

    (2010)
  • K. Bhaskaran et al.

    Time series regression studies in environmental epidemiology

    Int. J. Epidemiol.

    (2013)
  • A. Biswas et al.

    Revealing the controls of soil water storage at different scales in a hummocky landscape

    Soil Sci. Soc. Am. J.

    (2011)
  • A.L.F. Braga et al.

    The time course of weather-related deaths

    Epidemiology

    (2001)
  • A.L.F. Braga et al.

    The effect of weather on respiratory and cardiovascular deaths in 12 U.S. cities

    Environ. Health Perspect.

    (2002)
  • R.D. Brook et al.

    Particulate matter air pollution and cardiovascular disease

    Updat. Sci. Stat. Am. Heart Assoc.

    (2010)
  • R. Bustinza et al.

    Health impacts of the July 2010 heat wave in Quebec, Canada

    BMC Public Health

    (2013)
  • A. Chatterjee et al.

    Bootstrapping Lasso Estimators

    J. Am. Stat. Assoc.

    (2011)
  • F. Chebana et al.

    A general and flexible methodology to define thresholds for heat health watch and warning systems, applied to the province of Québec (Canada)

    Int. J. Biometeorol.

    (2012)
  • X. Chen et al.

    The time-dependent intrinsic correlation based on the empirical mode decomposition

    Adv. Adapt. Data Anal.

    (2010)
  • R.B. Cleveland et al.

    STL: a seasonal-trend decomposition procedure based on loess

    J. Off. Stat.

    (1990)
  • P. Craven et al.

    Smoothing noisy data with spline functions

    Numer. Math.

    (1978)
  • F. Dominici et al.

    Airborne particulate matter and mortality: timescale effects in four US cities

    Am. J. Epidemiol.

    (2003)
  • M. Durocher et al.

    Hybrid signal detection approach for hydro-meteorological variables combining EMD and cross-wavelet analysis

    Int. J. Climatol.

    (2016)
  • J. Friedman et al.

    The Elements of Statistical Learning

    (2009)
  • J. Friedman et al.

    Regularization paths for generalized linear models via coordinate descent

    J. Stat. Softw.

    (2010)
  • W.A. Fuller

    Measurement Error Models

    (2009)
  • A. Gasparrini et al.

    Distributed lag non-linear models

    Stat. Med.

    (2010)
  • B. Ghouse et al.

    Long-term projections of temperature, precipitation and soil moisture using non-stationary oscillation processes over the UAE region

    Int. J. Climatol.

    (2015)
  • J.-X. Giroux et al.

    Projet M1: comparaison de l'utilisation des moyennes spatiales à celle du krigeage, appliquée à la relation mortalité par MCV - météorologie, au Québec, de 1996 à 2007

  • D. Hammami et al.

    Predictor selection for downscaling GCM data with LASSO

    J. Geophys. Res.-Atmos.

    (2012)
  • T. Hastie et al.

    Generalized additive models

    Stat. Sci.

    (1986)
  • K.D. Hoover

    Nonstationary time series, cointegration, and the principle of the common cause

    Br. J. Philos. Sci.

    (2003)
  • N.E. Huang et al.

    A review on Hilbert-Huang transform: method and its applications to geophysical studies

    Rev. Geophys.

    (2008)
  • N.E. Huang et al.

    The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis

    Proc. R. Soc. London, Ser. A

    (1998)
  • N.E. Huang et al.

    A new view of nonlinear water waves: the Hilbert spectrum

    Annu. Rev. Fluid Mech.

    (1999)