BY-NC-ND 3.0 license Open Access Published by De Gruyter November 18, 2017

Forecasting Air Quality Index Using an Ensemble of Artificial Neural Networks and Regression Models

  • S. Sankar Ganesh, Pachaiyappan Arulmozhivarman and Rao Tatavarti

Abstract

Air is the most essential constituent for the sustenance of life on earth. The air we inhale has a tremendous impact on our health and well-being; hence, it is advisable to monitor the quality of air in our environment. To forecast the air quality index (AQI), artificial neural networks (ANNs) trained with conjugate gradient descent (CGD) are implemented: the multilayer perceptron (MLP), cascade forward neural network, Elman neural network, radial basis function (RBF) neural network, and nonlinear autoregressive model with exogenous input (NARX). Regression models are implemented as well: multiple linear regression (MLR) optimized with batch gradient descent (BGD), stochastic gradient descent (SGD), mini-BGD (MBGD), and CGD, together with support vector regression (SVR). In these models, the AQI is the dependent variable and the concentrations of NO2, CO, O3, PM2.5, SO2, and PM10 for the years 2010–2016 in Houston and Los Angeles are the independent variables. For the final forecast, several ensemble models of the individual neural network predictors and the individual regression predictors are presented. The proposed ensemble approach achieves the highest forecasting accuracy for the AQI among the models considered.

1 Introduction

The quality of air significantly affects the health of the inhabitants of a particular area. Inhaling polluted air can cause serious diseases such as lung cancer, stroke, and respiratory infections, especially in children. The long-term consequences of polluted air include global warming and the greenhouse effect [15]. Air pollution is a major concern in most densely populated areas: new diseases are being diagnosed every day and more deaths are reported every year due to air pollution. There are several reasons for the increase in air pollution levels, such as industrialization and globalization [8]. A number of monitoring sites have been set up globally to monitor the quality of air [4, 18], and air quality monitoring using soft computing techniques has attracted substantial research attention in recent years.

All over the world, poor air quality causes more deaths than many other sources [7]. The World Health Organization (WHO) estimates that about 2.4 million deaths are directly attributable to air pollution [19]. The concept of the air quality index (AQI) has been proposed to quantify the air quality in an area. Several methods in the artificial intelligence (AI) domain have been proposed for forecasting. In particular, computational intelligence methods, such as artificial neural networks (ANNs), achieve significant accuracy in the prediction of air quality and have proven successful in various scenarios. The prescribed AQI standards in the United States are given by the U.S. Environmental Protection Agency (EPA) (https://www3.epa.gov/airquality/cleanair.html) and are shown in Table 1.

Table 1:

Proposed Categories of AQI (0–500) by the US EPA.

Range      AQI category
0–50       Good
51–100     Moderate
101–150    Unhealthy for sensitive groups
151–200    Unhealthy
201–300    Very unhealthy
301–500    Hazardous
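As an illustration, the categories in Table 1 can be encoded as a simple breakpoint lookup. The helper below is not part of the original study; its bands follow the standard EPA AQI breakpoints.

```python
def aqi_category(aqi: float) -> str:
    """Map an AQI value in [0, 500] to its EPA category (Table 1)."""
    breakpoints = [
        (50, "Good"),
        (100, "Moderate"),
        (150, "Unhealthy for sensitive groups"),
        (200, "Unhealthy"),
        (300, "Very unhealthy"),
        (500, "Hazardous"),
    ]
    for upper, label in breakpoints:
        if aqi <= upper:
            return label
    raise ValueError("AQI outside the 0-500 range")

print(aqi_category(42))   # Good
print(aqi_category(175))  # Unhealthy
```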

2 Materials and Methods

2.1 Data Preparation

Houston is the most populous city in the state of Texas, with a population of about 2 million (https://en.wikipedia.org/wiki/Houston), and Los Angeles is the second most populous city in the United States (https://en.wikipedia.org/wiki/Los_Angeles), with an estimated population of 4.04 million [11, 16]. The data related to air quality in Houston (NO2, CO, O3, PM2.5, and SO2) and Los Angeles (NO2, CO, O3, PM2.5, and PM10) for the years 2010–2016 were obtained from the US EPA. The Houston and Los Angeles data sets, each containing 2000 samples, were divided into 1500 samples (75%) for training and 500 samples (25%) for testing each predictive model. The proposed approach uses a cascade forward neural network ensemble of individual neural networks and regression models, as well as a support vector regression (SVR) ensemble of the same. The base learners include neural networks, namely the multilayer perceptron (MLP), cascade forward neural network, Elman neural network, radial basis function (RBF) neural network, and nonlinear autoregressive model with exogenous input (NARX), and regression models, namely multiple linear regression (MLR) with gradient descent variants as optimization algorithms and SVR. In the case of Houston, for each individual neural network predictor and regression model, the AQI is the dependent variable and the concentrations of NO2, CO, O3, SO2, and PM2.5 are the independent variables; the statistical measures of the Houston data samples used in the current study are shown in Table 2. In the case of Los Angeles, the AQI is the dependent variable and the concentrations of NO2, CO, O3, PM2.5, and PM10 are the independent variables; the statistical measures of the Los Angeles data samples are shown in Table 3.
The performance of each individual base learner and each ensemble is evaluated using the following error indices: mean absolute error (MAE), mean absolute percent error (MAPE), correlation coefficient (R), root mean square error (RMSE), and index of agreement (IA). Performance on the testing data is reported, as it reflects the accuracy of each predictor.
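The five error indices above are standard; illustrative numpy implementations are sketched below (the arrays shown are made-up examples, not the study's data). The IA follows Willmott's definition.

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error.
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    # Mean absolute percent error (undefined when y_true contains zeros).
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def correlation(y_true, y_pred):
    # Pearson correlation coefficient R.
    return np.corrcoef(y_true, y_pred)[0, 1]

def rmse(y_true, y_pred):
    # Root mean square error.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def index_of_agreement(y_true, y_pred):
    # Willmott's index of agreement, bounded by (0, 1].
    num = np.sum((y_true - y_pred) ** 2)
    den = np.sum((np.abs(y_pred - y_true.mean())
                  + np.abs(y_true - y_true.mean())) ** 2)
    return 1.0 - num / den

obs = np.array([60.0, 55.0, 72.0, 48.0])   # illustrative observed AQI
pred = np.array([58.0, 57.0, 70.0, 50.0])  # illustrative forecasts
print(rmse(obs, pred))  # 2.0
```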

Table 2:

Statistical Measures of Houston Data Samples.

Variable   Unit    Range          Mean    SD
NO2        ppb     6.36–132.71    32.70   13.95
CO         ppm     0.088–2.464    0.53    0.277
O3         ppm     0.009–0.121    0.04    0.01
SO2        ppb     0.009–0.121    6.86    8.98
PM2.5      μg/m³   3.84–45.65     13.00   5.00
AQI        NA      18–217         60.33   27
Table 3:

Statistical Measures of Los Angeles Data Samples.

Variable   Unit    Range          Mean    SD
NO2        ppb     12.72–132.71   44.51   14.75
CO         ppm     0.264–4.7      1.14    0.65
O3         ppm     0.019–0.122    0.05    0.01
PM2.5      μg/m³   4.56–94.23     21.27   9.10
PM10       μg/m³   6.48–131.77    37.17   13.58
AQI        NA      35–218         89.57   35.55

2.2 Gradient Descent Variants: Conjugate Gradient Descent (CGD)

The CGD optimization algorithm [12] uses a line search along conjugate directions to achieve faster convergence. In this method, the step size or learning rate is recalculated after each iteration, i.e. the learning rate is adaptive. The steps involved in CGD optimization are as follows:

Consider a quadratic test function ϕ(x) = (1/2)xᵀAx − xᵀb, where A is assumed to be a symmetric positive definite (SPD) matrix. Minimizing this quadratic test function is equivalent to solving Ax = b.

Compute r₀ = Ax₀ − b and set p₀ = −r₀.

For k=0, 1, 2, … until convergence

(1) ηₖ = (rₖᵀrₖ) / (pₖᵀApₖ)
(2) xₖ₊₁ = xₖ + ηₖpₖ
(3) rₖ₊₁ = rₖ + ηₖApₖ
(4) βₖ = (rₖ₊₁ᵀrₖ₊₁) / (rₖᵀrₖ)
(5) pₖ₊₁ = −rₖ₊₁ + βₖpₖ

End

The first step in this method computes the initial residual r₀; the initial conjugate direction is set equal to the steepest descent direction, which is the negative gradient of the function. Formula (1) gives the optimal step length, i.e. the distance to travel along the conjugate direction until the function no longer decreases. Using formula (2), the solution is updated by adding this step, which minimizes the quadratic function ϕ along xₖ + ηₖpₖ. The residual is then updated according to formula (3). The conjugate direction method has the property that the new search vector pₖ₊₁ can be computed from the previous vector pₖ using formula (5): the new direction is a linear combination of the negative residual −rₖ₊₁ and the previous search vector pₖ. The ratio of the squared norm of the current residual to the squared norm of the previous residual gives the constant βₖ in formula (4). The resulting search directions are mutually conjugate (A-orthogonal) rather than merely orthogonal.
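The iteration (1)–(5) can be sketched directly in numpy for the quadratic test function; the matrix, right-hand side, and tolerance below are illustrative values, not taken from the study.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=1000):
    """CGD iteration for phi(x) = 0.5 x^T A x - x^T b, A symmetric PD."""
    x = x0.astype(float)
    r = A @ x - b          # initial residual r0 = Ax0 - b
    p = -r                 # initial direction p0 = -r0 (steepest descent)
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p
        eta = (r @ r) / (p @ Ap)           # optimal step length, (1)
        x = x + eta * p                    # update the iterate, (2)
        r_new = r + eta * Ap               # update the residual, (3)
        beta = (r_new @ r_new) / (r @ r)   # ratio of squared norms, (4)
        p = -r_new + beta * p              # new conjugate direction, (5)
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # SPD test matrix
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b, np.zeros(2))
print(np.allclose(A @ x, b))  # True
```

For an n-dimensional SPD system, the method converges in at most n iterations in exact arithmetic, which is why it reaches the solution of this 2x2 system after two steps.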

2.3 Neural Networks

ANNs are among the most widely used machine learning algorithms for time series prediction [22]. Neurons are the building blocks of these predictors. The most basic ANN is the MLP. It belongs to the class of feed-forward networks, which map an input to an appropriate output, and it is well known for its robust universal approximation ability [13]. A basic MLP consists of an input layer, a hidden layer, and an output layer, in which the input and output layers are called visible layers. The cascade forward neural network also belongs to the class of feed-forward networks, with a weight connection from the input layer to every subsequent layer in the network. It is based on the principle that feed-forward networks with more layers can learn complex input-output relationships more quickly. The Elman neural network is a recurrent neural network with an extra layer, called the context layer, between the input and hidden layers, which feeds the states of the hidden units back into the hidden units at the next input step. It is mostly used to learn time-varying patterns. RBF neural networks [3] are networks in which a radial basis kernel, such as the Gaussian kernel, takes the role of the activation function. A clustering algorithm such as K-means clustering [14] is used to partition the input vectors into K clusters, and the optimal value of K is determined by the elbow method. NARX relates the current value of a time series to past values of the same series. The model employed in the current study has two tap delays, meaning that the two most recent input and output values are fed back to the input layer. In all the neural network predictors implemented here, except for the RBF neural network, the activation function at each neuron is the logistic function given by Eq. (6):

(6) f(x) = 1 / (1 + exp(−x)).

The error cost function used in each of the neural network predictors is given by Eq. (7):

(7) J = (1/(2m)) Σᵢ₌₁ᵐ (y(x⁽ⁱ⁾) − t⁽ⁱ⁾)²

where t⁽ⁱ⁾ and y(x⁽ⁱ⁾) represent the observed and predicted values of the AQI, respectively, for the ith input sample x, and m is the number of input data samples. The neural networks implemented here relate the input and output through independent approaches. All neural networks presented in the current study were trained using back-propagation with CGD optimization [12] to update the model parameters, i.e. the weights of the network. The MLP, cascade forward, and NARX predictors each have five input layer neurons, five hidden layer neurons, and one output layer neuron. The Elman recurrent neural network implemented in the current study has a 5-5-5-1 structure, with five neurons in the context layer. In the RBF neural network, the input data set was divided into 15 clusters; equivalently, the network has 15 RBF neurons. These individual neural network predictors are then combined using an ensemble network consisting of a cascade forward neural network, with the same structure as implemented before, and SVR as the integrators to improve the accuracy of AQI forecasting. All implemented neural network predictors were trained for 1000 epochs. The results of both ensemble approaches are evaluated and compared.
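A minimal sketch of a 5-5-1 MLP with the logistic activation of Eq. (6) and the cost of Eq. (7) is shown below. For brevity it is trained with plain gradient descent rather than CGD, and the data, learning rate, and seed are purely illustrative, not the study's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(x):
    # Logistic activation of Eq. (6).
    return 1.0 / (1.0 + np.exp(-x))

# Synthetic stand-in data: 100 samples, 5 inputs, targets in (0, 1).
X = rng.random((100, 5))
t = logistic(X @ rng.random(5) - 1.0)

W1 = rng.standard_normal((5, 5)) * 0.1   # input -> hidden weights
b1 = np.zeros(5)
W2 = rng.standard_normal(5) * 0.1        # hidden -> output weights
b2 = 0.0

def cost():
    # Cost J of Eq. (7) with the current weights.
    h = logistic(X @ W1 + b1)
    y = logistic(h @ W2 + b2)
    return np.sum((y - t) ** 2) / (2 * len(X))

J_start = cost()
lr, m = 0.5, len(X)
for _ in range(1000):
    h = logistic(X @ W1 + b1)                  # forward pass
    y = logistic(h @ W2 + b2)
    dz2 = (y - t) / m * y * (1 - y)            # backward pass through Eq. (7)
    dz1 = np.outer(dz2, W2) * h * (1 - h)
    W2 -= lr * h.T @ dz2
    b2 -= lr * dz2.sum()
    W1 -= lr * X.T @ dz1
    b1 -= lr * dz1.sum(axis=0)

print(cost() < J_start)  # training reduces the cost of Eq. (7)
```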

2.4 Regression Models: MLR with Gradient Descent Variants and SVR

In MLR, the dependent and independent variables are related by determining the model parameters [21], and the relationship is governed by Eq. (8):

(8) y = w₁x₁ + w₂x₂ + … + wₙxₙ

where y is the dependent variable, x₁, x₂, …, xₙ are the independent variables, and w₁, w₂, …, wₙ are the model parameters. This basic regression model is widely used for prediction [6, 10]. In the current study, it has been implemented with batch gradient descent (BGD) using a learning rate (step size) of 0.01, stochastic gradient descent (SGD) using a learning rate of 0.0001, mini-BGD (MBGD) using a learning rate of 0.005 and batches of 50 samples, and the CGD optimization algorithm, as these are the most common ways of optimizing a regression model. All MLR models employed here were trained for 1000 epochs. SVR [9], based on support vector machines (SVM) [5], has also been implemented as a regression model for predicting the AQI in Houston and Los Angeles.
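The MLR model of Eq. (8) fitted with the three gradient descent variants named above can be sketched as follows. The learning rates and batch size match the values in the text; the synthetic data and everything else are illustrative assumptions, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((200, 5))                      # stand-in pollutant features
w_true = np.array([2.0, -1.0, 0.5, 3.0, 1.5])
y = X @ w_true                                # stand-in AQI values

def fit(X, y, lr, batch_size, epochs=1000):
    """Fit y = Xw by (mini-)batch gradient descent on squared error."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)            # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # Gradient of the mean squared error on this batch.
            grad = X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)
            w -= lr * grad
    return w

w_bgd  = fit(X, y, lr=0.01,   batch_size=len(X))  # BGD: one batch per epoch
w_sgd  = fit(X, y, lr=0.0001, batch_size=1)       # SGD: one sample at a time
w_mbgd = fit(X, y, lr=0.005,  batch_size=50)      # MBGD: batches of 50

mse = lambda w: np.mean((X @ w - y) ** 2)
print(mse(w_bgd), mse(w_sgd), mse(w_mbgd))
```

The three variants differ only in how many samples contribute to each gradient step, which is why a single `fit` routine parameterized by `batch_size` covers all of them.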

2.5 Modeling Technique: Stacking Ensemble

In the current study, the stacking ensemble method is used for the final forecast of the AQI. In stacking, the predictions of the individual base learners are given as input to a combiner algorithm for a higher level of training. Usually, this combiner outperforms the individual base learners [20]. Stacking has proven successful for supervised learning tasks such as regression analysis [2] and also for unsupervised tasks such as density estimation [17]. In general, a simple logistic regression is used as the combiner; the combiner algorithms used here are the cascade forward neural network and SVR. All the other predictors, as well as these two, are considered base learners. The performance of both combiner algorithms has been evaluated on two different sets of base learners. The first set includes all the neural network predictors (MLP, cascade forward neural network, Elman recurrent neural network, RBF neural network, and NARX) with CGD as the optimization algorithm for updating the weights. The second set includes MLR with the gradient descent variants BGD, SGD, MBGD, and CGD, along with SVR. The predictions of these individual neural network predictors and regression models are given as input to the combiner (cascade forward neural network or SVR), whose final forecast predicts the AQI in the area of interest. This stacked model often improves accuracy owing to its smoothing nature and its ability to credit each base learner where it performs best and discredit it where it performs poorly. The overall performance of the stacked model is best when the base learners differ significantly in how they predict.
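The two-level stacking scheme just described can be sketched with simple linear least-squares fits standing in for both the base learners and the combiner; all data and models below are illustrative, not the study's actual predictors.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((300, 5))
y = X @ np.array([1.0, 2.0, -1.0, 0.5, 1.5]) + 0.1 * rng.standard_normal(300)

def lstsq_fit(X, y):
    # Ordinary least-squares fit, used here for both levels.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Level 0: two base learners trained on different feature subsets,
# so that their predictions differ.
w_a = lstsq_fit(X[:, :3], y)
w_b = lstsq_fit(X[:, 2:], y)
preds = np.column_stack([X[:, :3] @ w_a, X[:, 2:] @ w_b])

# Level 1: the combiner is trained on the base learners' predictions.
w_meta = lstsq_fit(preds, y)
final = preds @ w_meta
print(np.mean((final - y) ** 2))
```

In practice the combiner should be trained on held-out (out-of-fold) base-learner predictions to avoid overfitting; the in-sample fit above is kept deliberately simple to show the data flow.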
The block diagrams of the ensembles are given in Figures 1 and 2.

Figure 1: Cascade Forward and SVR Ensemble of Neural Network Predictors.

Figure 2: Cascade Forward and SVR Ensemble of MLR with Gradient Descent Variants and SVR.

3 Results and Discussion

3.1 Cascade Forward and SVR Ensembles of Neural Network Predictors

In these two ensembles, the base learners are the individual neural network predictors: MLP, cascade forward neural network, Elman neural network, RBF neural network, and NARX. The predictions of these predictors are given as input to the cascade forward neural network and to SVR for the final forecast of the AQI. The performance of these two combiner algorithms with the same base learners is compared.

The testing results of the cascade forward neural network as the combiner algorithm with neural network base learners for forecasting the AQI in Houston and Los Angeles are given in Figure 3. The testing results of SVR as the combiner algorithm with neural network base learners for forecasting the AQI in Houston and Los Angeles are given in Figure 4. The performance measures of both ensembles with neural network predictors as the base learners are given in Table 4.

Figure 3: Cascade Forward Ensemble of Neural Network Predictors: (A) Testing Plot for Houston, (B) Regression Plot for Houston, (C) Testing Plot for Los Angeles, and (D) Regression Plot for Los Angeles.

Figure 4: SVR Ensemble of Neural Network Predictors: (A) Testing Plot for Houston, (B) Regression Plot for Houston, (C) Testing Plot for Los Angeles, and (D) Regression Plot for Los Angeles.

Table 4:

Performance of Cascade Forward and SVR Ensembles with Neural Networks as the Base Learners.

Algorithm                     MAE            MAPE           R              RMSE           IA
                              H      LA      H      LA     H      LA      H      LA      H      LA
MLP                           5.70   8.62    6.76   10.25  0.978  0.947   7.31   11.12   0.989  0.972
Cascade forward               3.04   2.91    3.86   3.69   0.993  0.993   4.06   4.05    0.996  0.996
Elman                         9.39   9.04    10.89  10.81  0.939  0.948   12.31  11.23   0.967  0.970
RBF                           4.18   4.47    5.38   5.89   0.981  0.981   6.83   6.65    0.990  0.990
NARX                          14.35  11.08   17.24  12.71  0.880  0.919   18.06  13.95   0.907  0.951
Ensemble (cascade forward)    2.68   2.70    3.34   3.39   0.994  0.994   3.75   3.70    0.997  0.997
Ensemble (SVR)                2.95   3.01    3.82   3.96   0.993  0.993   4.06   3.96    0.996  0.996

H, Houston; LA, Los Angeles.

It can be seen that the cascade forward ensemble outperformed the SVR ensemble of individual neural network predictors. The regression plots in Figure 3B and D for Houston and Los Angeles, respectively, indicate that the predicted values of the AQI from the cascade forward ensemble fit the observed values well.

3.2 Cascade Forward and SVR Ensembles of Regression Models

In these two ensembles, the base learners are individual regression models: MLR with BGD, SGD, MBGD, and CGD optimization algorithms and SVR. The predictions of these regression models are given as input to the combiner algorithms for the final forecast of the AQI.

The testing results of the cascade forward neural network as the combiner algorithm with regression models as the base learners for forecasting the AQI in Houston and Los Angeles are given in Figure 5. The testing results of SVR as the combiner algorithm with regression models as the base learners for forecasting the AQI in Houston and Los Angeles are given in Figure 6. The performance measures of both ensembles with regression models as the base learners are given in Table 5.

Figure 5: Cascade Forward Ensemble of Regression Models: (A) Testing Plot for Houston, (B) Regression Plot for Houston, (C) Testing Plot for Los Angeles, and (D) Regression Plot for Los Angeles.

Figure 6: SVR Ensemble of Regression Models: (A) Testing Plot for Houston, (B) Regression Plot for Houston, (C) Testing Plot for Los Angeles, and (D) Regression Plot for Los Angeles.

Table 5:

Performance of Cascade Forward and SVR Ensembles with Regression Models as the Base Learners.

Algorithm                     MAE            MAPE           R              RMSE           IA
                              H      LA      H      LA     H      LA      H      LA      H      LA
MLR (BGD)                     10.46  10.96   12.34  13.06  0.929  0.922   13.06  13.46   0.961  0.957
MLR (SGD)                     10.28  10.46   12.11  12.58  0.931  0.929   12.92  12.85   0.962  0.962
MLR (MBGD)                    10.21  10.19   12.03  12.37  0.931  0.931   12.90  12.64   0.963  0.963
MLR (CGD)                     10.22  10.17   12.05  12.34  0.931  0.931   12.88  12.61   0.963  0.963
SVR                           5.91   5.89    7.16   6.99   0.979  0.979   7.25   7.13    0.988  0.988
Ensemble (cascade forward)    4.42   4.22    5.52   5.10   0.987  0.987   5.49   5.45    0.993  0.993
Ensemble (SVR)                5.03   5.00    6.62   6.22   0.984  0.983   6.26   6.30    0.991  0.991

H, Houston; LA, Los Angeles.

It can be seen that the cascade forward ensemble outperformed the SVR ensemble of individual regression models. Also, most of the neural network predictors exhibited higher accuracy than the MLR models with gradient descent variants. Among the neural network predictors, the cascade forward neural network showed the highest performance, and among the regression models, SVR showed the highest performance. The cascade forward neural network performed better owing to its structure, which gives it a greater ability to map nonlinear inputs to the output; neural networks usually outperform regression models such as MLR when nonlinearities are involved, and generalization is often better with neural network predictors. Both ensembles showed better accuracy than the individual predictors of the AQI.

4 Conclusion

The proposed approach investigates the concentrations of NO2, CO, O3, PM2.5, SO2, and PM10 to build a forecasting model of the AQI. The study has been carried out for Houston and Los Angeles. The accuracy of prediction has been improved by the stacked ensemble of individual predictors. The performance of the implemented ensemble methods for two different sets of base learners has been evaluated and compared, and it can be concluded that the cascade forward ensemble outperformed the SVR ensemble for both sets of base learners. A limitation of the current study is the difficulty of predicting the AQI accurately from highly nonlinear data. Neural network methods often converge to locally optimal solutions; the CGD optimization method helps to mitigate this problem. Future work can explore deep learning methods such as LSTM networks and deep belief networks built from restricted Boltzmann machines, as they extract features layer by layer, combining low-level features into high-level features, and have the ability to model complex mappings [1]. Fuzzy logic can also be applied when the behavior of the mathematical model is complicated and linguistic rules are needed to define the behavior of the system. Future work also includes the use of ensemble methods such as bagging and boosting. In conclusion, the present study indicates that the ensemble approach provides promising results for forecasting the AQI compared to individual neural network predictors and regression models.

Acknowledgments

The authors would like to thank VIT University (Vellore, India) for supporting the current research work. The authors also acknowledge the US EPA for providing air quality data.

Bibliography

[1] Y. Bengio and O. Delalleau, On the expressive power of deep architectures, in: Proceedings of the 14th International Conference on Discovery Science, pp. 18–36, Springer-Verlag, Berlin, 2011.

[2] L. Breiman, Stacked regressions, Mach. Learn. 24 (1996), 49–64. doi:10.1007/BF00117832.

[3] D. S. Broomhead and D. Lowe, Radial basis functions, multi-variable functional interpolation and adaptive networks, Technical Report 4148, RSRE, 1988.

[4] A. Chaloulakou, M. Saisana and N. Spyrellis, Comparative assessment of neural networks and regression models for forecasting summertime ozone in Athens, Sci. Total Environ. 313 (2003), 1–13. doi:10.1016/S0048-9697(03)00335-8.

[5] C. Cortes and V. Vapnik, Support-vector networks, Mach. Learn. 20 (1995), 273–297. doi:10.1007/BF00994018.

[6] J. W. Davidson, D. A. Savic and G. A. Walters, Symbolic and numerical regression: experiments and applications, Inf. Sci. 150 (2002), 95–117. doi:10.1007/978-3-7908-1829-1_21.

[7] S. Deleawe, J. Kusznir, B. Lamb and D. J. Cook, Predicting air quality in smart environments, J. Ambient Intell. Smart Environ. 2 (2010), 145–154. doi:10.3233/AIS-2010-0061.

[8] E. G. Dragomir, Air quality index prediction using K-nearest neighbor technique, Bull. PG Univ. Ploiesti Ser. Math. Inform. Phys. LXII (2010), 103–108.

[9] H. Drucker, C. J. C. Burges, L. Kaufman, A. J. Smola and V. N. Vapnik, Support vector regression machines, in: M. C. Mozer, M. I. Jordan and T. Petsche, eds., Advances in Neural Information Processing Systems 9 (NIPS 1996), pp. 155–161, MIT Press, Cambridge, Massachusetts, 1997.

[10] P. Goyal, A. T. Chan and N. Jaiswal, Statistical models for the prediction of respirable suspended particulate matter in urban cities, Atmos. Environ. 40 (2006), 2068–2077. doi:10.1016/j.atmosenv.2005.11.041.

[11] N. Gutierrez-Jaime, Population of the City of Los Angeles surpasses 4 million, KTLA 5, Tribune Broadcasting, retrieved 17 May 2017.

[12] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Natl. Bureau Stand. 49 (1952), 409–436. doi:10.6028/jres.049.044.

[13] K. Hornik, M. Stinchcombe and H. White, Multilayer feedforward networks are universal approximators, Neural Netw. 2 (1989), 359–366. doi:10.1016/0893-6080(89)90020-8.

[14] T. Kanungo, D. M. Mount, N. S. Netanyahu, C. D. Piatko, R. Silverman and A. Y. Wu, An efficient k-means clustering algorithm: analysis and implementation, IEEE Trans. Pattern Anal. Mach. Intell. 24 (2002), 881–892. doi:10.1109/TPAMI.2002.1017616.

[15] A. Kurt and A. B. Oktay, Forecasting air pollutant indicator levels with geographic models 3 days in advance using neural networks, Expert Syst. Appl. 37 (2010), 7986–7992. doi:10.1016/j.eswa.2010.05.093.

[16] New state population report: California grew by over 335,000 residents in 2016, California Department of Finance, retrieved 17 May 2017.

[17] P. Smyth and D. H. Wolpert, Linearly combining density estimators via stacking, Mach. Learn. 36 (1999), 59–83. doi:10.1023/A:1007511322260.

[18] A. Vlachogianni, P. Kassomenos, A. Karppinen, S. Karakitsios and J. Kukkonen, Evaluation of a multiple regression model for the forecasting of the concentrations of NOx and PM10 in Athens and Helsinki, Sci. Total Environ. 409 (2011), 1559–1571. doi:10.1016/j.scitotenv.2010.12.040.

[19] World Health Organization (WHO), Estimated deaths and DALYs attributable to selected environmental risk factors, WHO, Geneva, Switzerland, 2015.

[20] D. Wolpert, Stacked generalization, Neural Netw. 5 (1992), 241–259. doi:10.1016/S0893-6080(05)80023-1.

[21] X. Yan, Linear regression analysis: theory and computing, pp. 1–2, World Scientific, Singapore, 2009, ISBN 9789812834119. doi:10.1142/6986.

[22] G. Zhang, B. Eddy Patuwo and M. Y. Hu, Forecasting with artificial neural networks: the state of the art, Int. J. Forecast. 14 (1998), 35–62. doi:10.1016/S0169-2070(97)00044-7.

Received: 2017-06-08
Published Online: 2017-11-18

©2019 Walter de Gruyter GmbH, Berlin/Boston

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
