Predictive mapping of urban air pollution using Apache Spark on a Hadoop cluster

Authors:
Marjan Asgari, Dept. of Geodesy and Geomatics, K.N. Toosi University of Technology, Tehran, Iran
Mahdi Farnaghi, Dept. of Geodesy and Geomatics, K.N. Toosi University of Technology, Tehran, Iran
Zeinab Ghaemi, Dept. of Geodesy and Geomatics, K.N. Toosi University of Technology, Tehran, Iran

ICCBDC '17: Proceedings of the 2017 International Conference on Cloud and Big Data Computing, September 2017, Pages 89–93
https://doi.org/10.1145/3141128.3141131

Published: 17 September 2017

ABSTRACT

Air pollution is one of the major environmental problems in industrial and populated cities. Predictive mapping of urban air pollution, and sharing the generated maps with the public and city officials, has a positive impact on society and the environment. This article presents a solution based on distributed processing concepts to generate a predictive map of air pollution for the next 24 hours. Apache Hadoop is used as the underlying framework to form a cluster of processing machines. To improve processing speed and provide the required machine learning functionality, Apache Spark is deployed on the Hadoop cluster. The solution efficiently predicts air quality classes at the monitoring stations of Tehran, the capital of Iran, for the next 24 hours. The inverse distance weighting (IDW) method is then used to generate a predictive map of air quality classes for the whole city. The results show that the proposed approach achieves reasonable speed in processing big spatial data, along with horizontal scalability.
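The abstract names inverse distance weighting (IDW) as the step that turns per-station predictions into a city-wide map. The following is a minimal sketch of generic IDW, not the authors' implementation. The function names, the power parameter of 2, the planar coordinates, and the encoding of air quality classes as ordinal numbers are all assumptions made for illustration.

```python
import math


def idw_interpolate(stations, x, y, power=2.0):
    """Estimate a value at (x, y) from (sx, sy, value) station tuples.

    Hypothetical helper: each station contributes with weight 1 / distance**power.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            # The query point coincides with a station: return its value directly.
            return float(value)
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one station is required")
    return weighted_sum / weight_total


def idw_grid(stations, xs, ys, power=2.0):
    """Interpolate over a regular grid; returns one row of values per y coordinate."""
    return [[idw_interpolate(stations, x, y, power) for x in xs] for y in ys]


# Illustrative stations only (made-up coordinates and ordinal class values).
example_stations = [(0.0, 0.0, 1), (2.0, 0.0, 3)]
example_map = idw_grid(example_stations, xs=[0.0, 1.0, 2.0], ys=[0.0])
```

Because IDW yields continuous values, a map of discrete air quality classes would need a final step such as rounding each cell to the nearest class. That step is also an assumption here, since the abstract does not say how classes are assigned to interpolated cells.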

Index Terms

Predictive mapping of urban air pollution using Apache Spark on a Hadoop cluster
1. Computing methodologies
  1. Distributed computing methodologies
    1. Distributed algorithms
      1. MapReduce algorithms
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning

Published in

ICCBDC '17: Proceedings of the 2017 International Conference on Cloud and Big Data Computing
September 2017
135 pages
ISBN:9781450353434
DOI:10.1145/3141128

Copyright © 2017 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Hadoop
Spark
air pollution
big spatial data
distributed processing
predictive mapping
Qualifiers
- research-article
- Research
- Refereed limited