Comparison of land use regression and random forests models on estimating noise levels in five Canadian cities☆
Introduction
Chronic exposure to environmental noise has been increasingly linked to adverse human health effects (Brown and van Kamp, 2017). Long-term exposure to environmental noise is likely to lead to sleep disturbance (Clark and Paunovic, 2018a; Guski et al., 2017; Shepherd et al., 2010), cardiovascular and metabolic diseases (van Kempen et al., 2018; Clark et al., 2017), adverse birth outcomes (Gehring et al., 2014), and cognitive and mental health issues (Clark and Paunovic, 2018b). An accurate assessment of the spatial variation in noise levels is of critical importance for assessing the human health impacts of noise exposure, for managing environmental noise sources, and for controlling and mitigating the adverse health outcomes associated with noise exposure. For example, the Environmental Noise Directive by the European Commission requires strategic noise mapping every five years in agglomerations with more than 100,000 residents (European Commission, 2002). However, at present no country in the world has established a population-wide noise monitoring network, even in North America and Western Europe, where noise has been more extensively studied as an environmental issue (Héroux et al., 2015).
For the purpose of noise mapping, modelling has been a common approach to fill gaps in noise measurement data. Propagation models and geostatistical models (such as kriging) were the two common approaches in early noise mapping studies. Propagation models, e.g., FHWA TNM (Barry and Reagan, 1978; Shu et al., 2007), SoundPlan (Oiamo et al., 2018), CoRTN (Gulliver et al., 2015), and ASJ RTN (Koyasu, 1978), use meteorological, land surface, and transportation variables to simulate acoustic reflection, diffraction, absorption, and transmission according to the physical mechanisms of sound propagation and attenuation (Xie et al., 2011). Although these models are relatively accurate for small areas with well-defined noise sources, simulated noise levels deviate from actual measurements because it is not possible to fully characterize the types and distributions of noise sources, or to fully define the features of the built environment that influence sound propagation (Cvetković et al., 2011). Geostatistical models start from noise levels measured at individual locations with portable noise sensors. Based on the principle of spatial autocorrelation, these in situ measurements can then be interpolated into continuous surfaces (Aguilera et al., 2015). Using ordinary kriging, Tsai et al. (2009) and Harman et al. (2016) mapped noise levels for Taiwan and for Isparta, Turkey, respectively. Kriging is limited not only by the density of the in situ measurement points but also because it ignores the many geographical, environmental, and social factors that strongly influence the spatial variation of noise. Geostatistical models are therefore rarely used alone for noise mapping; they are usually combined with information on emission sources derived from the geographic and socioeconomic environment.
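To make the interpolation step concrete, a minimal ordinary-kriging sketch in pure Python follows. The exponential variogram and its nugget, sill, and range values, the sensor coordinates, and the measured levels are illustrative assumptions, not parameters from Tsai et al. (2009) or Harman et al. (2016); in practice the variogram would first be fitted to the empirical semivariogram of the measurements.

```python
import math

def exponential_variogram(h, nugget=0.0, sill=1.0, range_m=500.0):
    """Exponential semivariance gamma(h); zero at h = 0 by definition (assumed model)."""
    if h == 0.0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * h / range_m))

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(sites, values, target, **variogram_params):
    """Predict the value at `target` as a weighted sum of the measured `values`.

    The weights minimise the estimation variance subject to summing to one
    (the unbiasedness constraint, enforced with a Lagrange multiplier).
    """
    def gamma(p, q):
        return exponential_variogram(math.dist(p, q), **variogram_params)

    n = len(sites)
    a = [[gamma(sites[i], sites[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])                   # row for sum(weights) == 1
    b = [gamma(s, target) for s in sites] + [1.0]
    weights = solve_linear(a, b)[:n]              # drop the Lagrange multiplier
    return sum(w * v for w, v in zip(weights, values))

# Four hypothetical sensors (projected metres) and their measured levels in dB(A).
sites = [(0, 0), (100, 0), (0, 100), (100, 100)]
levels = [60.0, 65.0, 58.0, 70.0]
print(ordinary_kriging(sites, levels, (50, 50)))  # equal weights by symmetry, so about 63.25
```

Because the weights depend only on the site geometry and the variogram, no information about roads, land use, or other noise sources enters the prediction, which is the limitation of kriging noted above.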
The application of land use regression (LUR) modelling is a relatively new approach to mapping the spatial distribution of noise levels. LUR modelling, initially developed for estimating traffic-related air pollution, is based on the principle that pollutant concentrations at a given location depend on the environmental features of the surrounding area (Hoek et al., 2008). Several studies have used LUR modelling to estimate noise levels in Asian, North American, African, and European cities, and the modelled values tend to agree better with noise measurements than those from conventional kriging models (Xie et al., 2011; Chang et al., 2019; Harouvi et al., 2018; Ragettli et al., 2016; Sieber et al., 2017; Aguilera et al., 2015). Hybrid approaches that combine LUR with geostatistical models (Ryu et al., 2017; Zuo et al., 2014) or with noise propagation models (Oiamo et al., 2018) have also been developed in recent years.
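In its simplest form, an LUR model is a multiple linear regression of measured noise levels on GIS predictors summarised around each monitoring site. The sketch below fits such a model by ordinary least squares. The two predictors (major-road length within 100 m and NDVI within 500 m) and their values are hypothetical, and the response is constructed to be exactly linear so the recovered coefficients are easy to check; the variable and buffer selection that real LUR studies perform is omitted.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_lur(X, y):
    """Least-squares coefficients [intercept, b1, ..., bp] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]           # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear(xtx, xty)

def predict_lur(coefs, x):
    """Linear prediction: intercept plus the weighted sum of predictors."""
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))

# Hypothetical sites: [major-road length within 100 m (km), NDVI within 500 m]
X = [[1, 0.2], [2, 0.5], [3, 0.1], [4, 0.4], [5, 0.3]]
y = [52.0, 53.0, 61.0, 62.0, 67.0]  # constructed as 50 + 4*road - 10*ndvi, in dB(A)
coefs = fit_lur(X, y)
print(coefs)  # close to [50, 4, -10]
```

The additive, linear form is what makes LUR coefficients easy to interpret, and it is also the source of the limitation discussed in the next paragraph.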
A major limitation of the LUR modelling approach is its inability to capture the complex nonlinear relationships between noise levels and the characteristics of the built and social environment (the predictor variables). Machine learning methods have proven useful for handling such nonlinearity when relating characteristics of the built and social environment to noise levels in urban areas. For example, artificial neural networks (ANNs), a classic machine learning method, were used in early studies to assess traffic or construction noise (Cammarata et al., 1995; Hamoda, 2008; Givargis and Karimi, 2010). Parbat and Nagarnaik (2008) and Genaro et al. (2009) improved ANN models with a multilayer perceptron approach to estimate traffic noise levels for Yavatmal, India and Granada, Spain, obtaining higher accuracy than linear regression models.
Random forests (RF) is a nonparametric, decision tree-based machine learning algorithm (Breiman, 2001) that helps overcome the over-fitting common to single decision trees, artificial neural networks, and other machine learning methods. RF is robust, can handle many heterogeneous covariates, and has been successfully used to map population density (Gaughan et al., 2016; Stevens et al., 2015), soil properties (Guo et al., 2015; Hengl et al., 2015), and concentrations of air pollutants (Liu et al., 2018; Brokamp et al., 2017). Compared with LUR modelling, RF regression has several advantages. First, the RF approach can capture complex nonlinear interactions between the dependent and independent variables and thus achieve high model accuracy (Liu et al., 2018). Second, the RF model is non-additive, which allows it to select, at each node, the predictor that gives the best split, improving model accuracy. Third, the RF model is robust to outliers and constrains its predictions to the range of the training data (Craig and Huettmann, 2009). Mennitt et al. (2014) first used the RF model to estimate noise levels in national parks across the United States. To our knowledge, RF models have not yet been applied to estimate the spatial distribution of noise levels in urban environments.
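The ingredients named above (bootstrap resampling, regression trees that consider a random subset of predictors at each split, and averaging over trees) can be sketched from scratch as follows. This is a simplified illustration of Breiman's algorithm, not the implementation used in this study; the hyperparameters (`n_trees`, `min_leaf`, `max_depth`) are assumptions, and in practice one would use a library implementation such as R's randomForest package or scikit-learn's `RandomForestRegressor`.

```python
import random
from statistics import mean

def sse(ys):
    """Sum of squared deviations from the mean (node impurity for regression)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)

def build_tree(X, y, mtry, min_leaf, max_depth, rng, depth=0):
    """Grow a regression tree; each split considers only `mtry` random predictors."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return mean(y)  # leaf: mean of the training responses that reach it
    best = None  # (impurity, feature, threshold)
    for f in rng.sample(range(len(X[0])), mtry):
        for t in sorted(set(row[f] for row in X)):
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return mean(y)
    _, f, t = best
    go_left = [row[f] <= t for row in X]
    children = []
    for side in (True, False):
        Xs = [row for row, g in zip(X, go_left) if g == side]
        ys = [v for v, g in zip(y, go_left) if g == side]
        children.append(build_tree(Xs, ys, mtry, min_leaf, max_depth, rng, depth + 1))
    return (f, t, children[0], children[1])

def predict_tree(node, x):
    """Follow the splits from the root down to a leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=100, mtry=None, min_leaf=2, max_depth=8, seed=0):
    """Bagged trees with random predictor subsets (a random forest)."""
    rng = random.Random(seed)
    if mtry is None:
        mtry = max(1, len(X[0]) // 3)  # floor(p/3), R randomForest's regression default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 mtry, min_leaf, max_depth, rng))
    return forest

def predict_forest(forest, x):
    """Average the trees' predictions."""
    return mean(predict_tree(tree, x) for tree in forest)

# Toy, hypothetical response with a sharp jump in noise level at x = 10.
X = [[float(i)] for i in range(20)]
y = [50.0 if i < 10 else 70.0 for i in range(20)]
forest = fit_forest(X, y)
print(predict_forest(forest, [2.0]), predict_forest(forest, [1000.0]))
```

Because every leaf stores a mean of training responses, the prediction for the far-out input `[1000.0]` stays inside the observed 50 to 70 dB range, which illustrates the third advantage listed above; a linear model would instead extrapolate.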
The major objectives of this study were to 1) develop RF models to estimate noise levels in five Canadian cities and 2) compare the RF estimates with those derived from LUR models. To achieve these objectives, we identified the best predictor combinations from thirty-three candidate variables using leave-one-out cross-validation (CVone) for the LUR model and a variable importance index (the percentage increase in mean squared error, defined in section 3.3) for the RF model. Next, we developed RF and LUR models at the global scale (all five cities combined) and the local scale (individual cities) and compared the accuracy of their estimates. We then examined the importance of each predictor in the RF and LUR models for noise exposure estimation. Finally, both global RF and LUR models were applied to estimate noise levels at residential postal codes in the five cities, and the two sets of estimates were compared.
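The two selection criteria can be illustrated with a short sketch. `loocv_rmse` refits a model n times, each time predicting the one held-out site, and `percent_increase_mse` measures how much the prediction error grows when a single predictor column is shuffled. Both take hypothetical `fit`/`predict` callables, and the one-predictor regression is only a stand-in for a full LUR model. The paper's exact definition (its section 3.3) is not reproduced in this excerpt; R's randomForest computes its %IncMSE on out-of-bag samples for each tree and by default divides the averaged increase by its standard error, so the single in-sample percentage below is a simplification.

```python
import random

def loocv_rmse(X, y, fit, predict):
    """Leave-one-out CV: refit on n-1 sites, predict the held-out site, repeat for all n."""
    sq_errors = []
    for i in range(len(y)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        sq_errors.append((predict(model, X[i]) - y[i]) ** 2)
    return (sum(sq_errors) / len(sq_errors)) ** 0.5

def percent_increase_mse(model, predict, X, y, feature, seed=0):
    """Percentage increase in (in-sample) MSE after one random permutation of a predictor."""
    def mse(rows):
        return sum((predict(model, r) - v) ** 2 for r, v in zip(rows, y)) / len(y)
    base = mse(X)
    column = [row[feature] for row in X]
    random.Random(seed).shuffle(column)
    permuted = [row[:feature] + [val] + row[feature + 1:] for row, val in zip(X, column)]
    return 100.0 * (mse(permuted) - base) / base

def fit_simple_lr(X, y):
    """Hypothetical one-predictor least-squares fit (uses column 0 only)."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, y))
             / sum((a - mx) ** 2 for a in xs))
    return my - slope * mx, slope

def predict_simple_lr(model, x):
    intercept, slope = model
    return intercept + slope * x[0]

# Hypothetical sites: [traffic index, an unrelated predictor]; noise in dB(A).
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 9.0], [4.0, 1.0], [5.0, 7.0]]
y = [52.5, 54.0, 56.5, 58.0, 60.5]
print(loocv_rmse(X, y, fit_simple_lr, predict_simple_lr))
model = fit_simple_lr(X, y)
for f in range(2):
    print(f, percent_increase_mse(model, predict_simple_lr, X, y, feature=f))
```

Permuting the unused second column leaves the predictions unchanged, so its importance is exactly zero; this is the intuition behind ranking RF predictors by how much shuffling them degrades accuracy.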
Study area
The study cities were selected for several reasons. First, noise measurements at a reasonably fine spatial scale are available for each city. Second, the cities are representative of regional variations in Canadian urban development approaches: Vancouver is located on the west coast, Toronto, Montreal, and Longueuil are situated in central Canada, and Halifax sits on the east coast. Third, these cities have the largest populations in their respective regions. According to Statistics Canada (2017),
Data
The noise measurements for these five municipalities were obtained from Ragettli et al. (2016) for Montreal (87 measurements in summer 2010 and 117 in spring 2014, with 29 sites sampled in both years), Oiamo et al. (2018) for Toronto (217 measurements in summer 2016 and 54 in winter 2018, with 49 sites sampled in both years), Rainham and Dummer (2011) for Halifax (48 two-week average measurements in fall 2010), and Davies et al.
Data preprocessing
Buffers of multiple radii (i.e., 50 m, 100 m, 150 m, 200 m, 300 m, 400 m, 500 m, 600 m, 700 m, 800 m, 900 m, 1000 m, 1500 m, 2000 m, 2500 m, 3000 m, 3500 m, 4000 m, 4500 m, and 5000 m) were created around each noise monitoring site. The total lengths of traffic lines (i.e., local roads, major roads, highways, railways, and bus routes), the total numbers of bus stops, transitions, and intersections, POIs (i.e., cinemas, health centers, fire stations, education centers, and police stations), and
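A minimal sketch of the buffer step, assuming point and line features are already in a projected coordinate system in metres and that roads are stored as straight segments: points are counted within each radius, and each segment is clipped analytically to the circle before its length is summed. The function names and sample features are hypothetical; in practice a GIS overlay (buffer, then intersect) would do this work. The radii are those listed above.

```python
import math

BUFFERS_M = [50, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000,
             1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]

def count_points_in_buffers(site, points, radii=BUFFERS_M):
    """Number of point features (e.g. bus stops, intersections) within each radius."""
    dists = [math.dist(site, p) for p in points]
    return {r: sum(d <= r for d in dists) for r in radii}

def clipped_length(site, a, b, r):
    """Length of the straight segment a-b lying inside a circle of radius r around site."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    fx, fy = a[0] - site[0], a[1] - site[1]
    qa = dx * dx + dy * dy
    if qa == 0:
        return 0.0
    # Solve |a + t(b - a) - site|^2 = r^2 for t; the segment inside is t in [t0, t1] ∩ [0, 1].
    qb = 2 * (fx * dx + fy * dy)
    qc = fx * fx + fy * fy - r * r
    disc = qb * qb - 4 * qa * qc
    if disc <= 0:
        return 0.0
    s = math.sqrt(disc)
    t0 = max(0.0, (-qb - s) / (2 * qa))
    t1 = min(1.0, (-qb + s) / (2 * qa))
    return max(0.0, t1 - t0) * math.sqrt(qa)

def line_length_in_buffers(site, segments, radii=BUFFERS_M):
    """Total length of line features (e.g. roads, railways) within each radius."""
    return {r: sum(clipped_length(site, a, b, r) for a, b in segments) for r in radii}

site = (0.0, 0.0)
bus_stops = [(30.0, 40.0), (0.0, 60.0), (800.0, 0.0)]
roads = [((-100.0, 0.0), (100.0, 0.0))]
print(count_points_in_buffers(site, bus_stops)[100])  # 2 stops lie within 100 m
print(line_length_in_buffers(site, roads)[50])        # 100.0
```

Computing every predictor at every radius yields one candidate variable per feature-buffer pair, from which the best buffers are then chosen.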
The best buffer and predictors
Table S3 lists all candidate predictors, the predictors used, and their best buffers in the LUR and RF models at the global and local scales. At the global scale, the RF model used more predictors than the LUR model (30 vs. 25), but it used fewer predictors in the city models. Compared with the global models, the local models adopted fewer predictors. Similar predictors, e.g., NDVI, population density, traffic flow, distance to highway, number of intersections, sum length
Discussion
This study is the first attempt to apply a machine learning-based RF modelling approach to estimating the spatial distribution of noise in urban areas at such a large geographical scale. To assess whether the RF approach outperforms LUR, we developed predictive models at the local level (each of the five cities) and the global level (all five cities combined) and compared the outcomes with those of the traditional LUR model. The cross-validated RMSEs, MAEs, and fitting R²s indicated the RF modelling
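For reference, the three accuracy measures can be computed from paired observed and cross-validated predicted noise levels as in the sketch below. The R² here is the coefficient of determination, 1 - SSE/SST; some LUR studies instead report the squared Pearson correlation between observed and predicted values, and this excerpt does not show which definition the paper used. The sample values are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root mean squared error: penalises large errors more heavily."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error, in the same units as the noise levels (dB)."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r_squared(obs, pred):
    """Coefficient of determination, 1 - SSE/SST."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

obs = [60.0, 65.0, 70.0]    # hypothetical measured levels, dB(A)
pred = [61.0, 64.0, 70.0]   # hypothetical cross-validated predictions
print(rmse(obs, pred), mae(obs, pred), r_squared(obs, pred))
```

Because RMSE squares each error, comparing it with MAE indicates whether one model's advantage comes from avoiding a few large misses or from being uniformly closer.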
Limitations and conclusion
This study has several limitations. First, we did not consider temporal variation when estimating the spatial variation of noise levels across the five cities. Second, the noise measurements used in this study came from sampling campaigns with different measurement periods. For example, one-week measurement durations were used in Toronto, Montreal, and Longueuil, while noise in Halifax was measured over a two-week period. Noise levels in Vancouver were based on short-term
Acknowledgements
This work was supported by the Canadian Urban Environmental Health Research Consortium grant funded by the Canadian Institutes of Health Research. The authors acknowledge Dr. Jeffrey Brook for his leadership in obtaining this grant that allowed this work to be performed.
References (72)
The spatial relationship between traffic-generated air pollution and noise in 2 US cities. Environ. Res. (2009).
Exposure assessment models for elemental components of particulate matter in an urban environment: a comparison of regression and random forest approaches. Atmos. Environ. (2017).
A neural network architecture for noise prediction. Neural Netw. (1995).
On the relation between NDVI, fractional vegetation cover, and leaf area index. Remote Sens. Environ. (1997).
Application of land-use regression models to estimate sound pressure levels and frequency components of road traffic noise in Taichung, Taiwan. Environ. Int. (2019).
Spatiotemporal patterns of PM10 concentrations over China during 2005–2016: a satellite-based estimation using the random forests approach. Environ. Pollut. (2018).
Investigation of the noise reduction provided by tree belts. Landsc. Urban Plan. (2003).
A basic neural traffic noise prediction model for Tehran’s roads. J. Environ. Manag. (2010).
Development of an open-source road traffic noise model for exposure assessment. Environ. Model. Softw. (2015).
Digital mapping of soil organic matter for rubber plantation at regional scale: an application of random forest plus residuals kriging approach. Geoderma (2015).
Effectiveness of existing noise barriers: comparison between vegetation, concrete hollow block, and panel concrete. Procedia Environ. Sci.
Performance evaluation of IDW, Kriging and multiquadric interpolation methods in producing noise mapping: a case study at the city of Isparta, Turkey. Appl. Acoust.
A review of land-use regression models to assess spatial variation of outdoor air pollution. Atmos. Environ.
Satellite data regarding the eutrophication response to human activities in the plateau lake Dianchi in China from 1974 to 2009. Sci. Total Environ.
Predicting monthly high-resolution PM2.5 concentrations with random forest model in the North China Plain. Environ. Pollut.
Monitoring economic development from space: using nighttime light and land cover data to measure economic growth. World Dev.
Road traffic noise attenuation by belts of trees. J. Sound Vib.
Improve ground-level PM2.5 concentration mapping using a random forests-based geostatistical approach. Environ. Pollut.
A combined emission and receptor-based approach to modelling environmental noise in urban environments. Environ. Pollut.
Spatial statistical analysis of the effects of urban form indicators on road-traffic noise exposure of a city in South Korea. Appl. Acoust.
Comparative evaluation of the ground reflection algorithm in FHWA Traffic Noise Model (TNM 2.5). Appl. Acoust.
A critical review of some traffic noise prediction models. Appl. Acoust.
Noise mapping in urban environments: a Taiwan study. Appl. Acoust.
Automated, electric, or both? Investigating the effects of transportation and technology scenarios on metropolitan greenhouse gas emissions. Sustain. Cities Soc.
Evaluation of machine learning techniques with multiple remote sensing datasets in estimating monthly concentrations of ground-level PM2.5. Environ. Pollut.
Temporal and spatial variability of traffic-related noise in the City of Toronto, Canada. Sci. Total Environ.
Application of land use regression modelling to assess the spatial distribution of road traffic noise in three European cities. J. Expo. Sci. Environ. Epidemiol.
GIS based spatial noise impact analysis (SNIA) of the broadening of national highway in Sikkim Himalayas: a case study. AIMS Environ. Sci.
FHWA Highway Traffic Noise Prediction Model.
Random forests. Mach. Learn.
WHO environmental noise guidelines for the European region: a systematic review of transport noise interventions and their impacts on health. Int. J. Environ. Res. Public Health.
Sound and noise in urban parks.
Using luminosity data as a proxy for economic statistics. Proc. Natl. Acad. Sci.
WHO Environmental noise guidelines for the European Region: a systematic review on environmental noise and quality of life, wellbeing and mental health. Int. J. Environ. Res. Public Health.
WHO environmental noise guidelines for the European Region: a systematic review on environmental noise and cognition. Int. J. Environ. Res. Public Health.
Association of long-term exposure to transportation noise and traffic-related air pollution with the incidence of diabetes: a prospective cohort study. Environ. Health Perspect.
☆ This paper has been recommended for acceptance by Eddy Y. Zeng.