ABSTRACT
Air pollution is one of the major environmental problems in industrial and densely populated cities. Predictive mapping of urban air pollution, and sharing the generated maps with the public and city officials, has positive impacts on society and the environment. This article presents a solution based on distributed processing concepts to generate a predictive map of air pollution for the next 24 hours. Apache Hadoop is used as the underlying framework to form a cluster of processing machines. To improve processing speed and provide the required machine learning functionality, Apache Spark is deployed on the Hadoop cluster. The solution efficiently predicts air quality classes at the monitoring stations of Tehran, the capital of Iran, for the next 24 hours. Using the inverse distance weighting (IDW) method, a predictive map of air quality classes is then generated for the whole city. The results show that the proposed approach achieves a reasonable speed in processing big spatial data and scales horizontally.
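The IDW step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `idw` and `idw_grid`, the `power` parameter with a default of 2, the planar (x, y) coordinates, and the use of numeric values in place of air quality classes are all assumptions made for illustration.

```python
import math


def idw(stations, x, y, power=2.0):
    """Estimate a value at (x, y) by inverse distance weighting.

    stations: list of (sx, sy, value) tuples, one per monitoring station.
    power:    distance exponent (assumed default of 2; not from the paper).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The query point coincides with a station: return its value.
            return float(value)
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one station is required")
    return weighted_sum / weight_total


def idw_grid(stations, xs, ys, power=2.0):
    """Interpolate over a grid; returns one row of estimates per y value."""
    return [[idw(stations, x, y, power) for x in xs] for y in ys]


# Hypothetical stations: (x, y, predicted value), in arbitrary planar units.
example_stations = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
example_row = idw_grid(example_stations, [0.0, 1.0, 2.0], [0.0])[0]
```

Each grid cell receives a weighted average of the station values, with nearer stations weighted more heavily. Mapping the interpolated numbers back to discrete air quality classes (for example by rounding) is a further assumption that the abstract does not describe.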
Index Terms
- Predictive mapping of urban air pollution using Apache Spark on a Hadoop cluster