DOI: 10.1145/3141128.3141131
research-article

Predictive mapping of urban air pollution using Apache Spark on a Hadoop cluster

Published: 17 September 2017

ABSTRACT

Air pollution is one of the major environmental problems in industrial and densely populated cities. Predictive mapping of urban air pollution, and sharing the generated maps with the public and city officials, has positive impacts on society and the environment. This article presents a solution based on distributed processing concepts that generates a predictive map of air pollution for the next 24 hours. Apache Hadoop is used as the underlying framework to form a cluster of processing machines. To improve processing speed and provide the required machine learning functionality, Apache Spark is deployed on the Hadoop cluster. The solution efficiently predicts air quality classes at the monitoring stations of Tehran, the capital of Iran, for the next 24 hours. The predictive map of air quality classes for the whole city is then generated using the inverse distance weighting (IDW) method. The results show that the proposed approach achieves reasonable speed in processing big spatial data, along with horizontal scalability.
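
The interpolation step mentioned in the abstract can be illustrated with a minimal sketch of inverse distance weighting. This is not the authors' implementation: the function names `idw` and `idw_grid`, the planar coordinates, the power parameter of 2, and the sample station values are all assumptions made for illustration, and the abstract does not say how class labels are handled during interpolation.

```python
import math


def idw(stations, x, y, power=2.0):
    """Estimate a value at (x, y) by inverse distance weighting.

    stations: list of (sx, sy, value) tuples in planar coordinates
    (hypothetical layout; the paper's data format is not given).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The query point coincides with a station: return its value.
            return float(value)
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def idw_grid(stations, xs, ys, power=2.0):
    """Evaluate idw() on a regular grid; one row per y coordinate."""
    return [[idw(stations, x, y, power) for x in xs] for y in ys]


# Two hypothetical stations with predicted values 1 and 3.
sample_stations = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
midpoint_estimate = idw(sample_stations, 1.0, 0.0)
```

A point midway between the two sample stations receives equal weights, so its estimate is the mean of the two station values. Stations closer to the query point dominate as the power parameter grows.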

      • Published in

        ICCBDC '17: Proceedings of the 2017 International Conference on Cloud and Big Data Computing
        September 2017
        135 pages
        ISBN:9781450353434
        DOI:10.1145/3141128

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited
