A cloud-enabled automatic disaster analysis system of multi-sourced data streams: An example synthesizing social media, remote sensing and Wikipedia data
Introduction
Every year, extreme weather and climate events, such as cyclones, floods and tornadoes, and geological events, such as volcanic eruptions, earthquakes or landslides, claim thousands of lives, cause billions of dollars of damage to property and severely impact the environment (Velev & Zlateva, 2012). Disasters and their effects have been increasing in both frequency and severity in the 21st century because of climate change, growing populations and their reliance on aging infrastructure. In fact, the first decade of the 21st century witnessed 3496 natural disasters, including floods, storms, droughts and heat waves, nearly five times as many as the 743 catastrophes reported during the 1970s. Therefore, an urgent need exists to understand spatiotemporal patterns and the general dynamics that contribute to the occurrence of disasters. Such combined studies are necessary to develop effective strategies to mitigate their destructive effects and to respond and coordinate efficiently to protect people, property and the environment.
Social media have been used primarily as an intelligent “geo-sensor” network to detect extreme events and disasters such as hurricanes and earthquakes, and to give emergency responders and relief coordinators situational awareness during crises by monitoring and tracking citizens' feedback (Sutton, Palen, & Shklovski, 2008). They are also widely used by scientists to study public risk perception and people's reactions during disasters (Mandel et al., 2012). Remote sensing data, meanwhile, are paramount during disasters and have become the de facto standard for providing high-resolution imagery for damage assessment and the coordination of disaster relief operations (Cervone et al., 2016; Cutter, 2003; Joyce et al., 2009). Using high-resolution imagery from commercial and research air- and space-borne instruments, it is possible to obtain data within hours of major events, frequently including ‘before’ and ‘after’ scenes of the affected areas (Cervone & Manca, 2011). These ‘before’ and ‘after’ images are quickly disseminated through scientific portals and news channels to assess damage and inform the public. In addition, first responders rely heavily on remotely sensed imagery to coordinate relief and response efforts and to prioritize resource allocation.
Despite the wide availability of large remote sensing datasets from numerous sensors, specific data might not be collected at the time and place where they are most urgently required. Geo-temporal gaps result from satellite revisit time limitations, atmospheric opacity, or other obstructions. Recently, data streams from social media and remote sensing have been fused for disaster analysis and assessment; specifically, social media are used to fill the gaps when remote sensing data are lacking or incomplete (Schnebele & Cervone, 2013; Schnebele et al., 2014; Schnebele et al., 2015).
However, current studies using social media and remote sensing data for disaster analysis are performed on a case-by-case basis. The typical approach starts by identifying a specific disaster event and then designing filters (e.g., keywords, spatiotemporal constraints) to select and retrieve relevant stream data. These efforts are time-consuming. For example, identifying the hashtags associated with a specific event may require hours to days of manually examining hundreds of tweets before the resulting hashtags can be used to filter out irrelevant messages during a disaster. Furthermore, these efforts must be repeated for each new event. As stated earlier, a comprehensive database of historical events with relevant metadata (e.g., event type, severity category, damages, locations, and temporal spans) is therefore needed to direct analysis resources. From this basic metadata, relevant information (e.g., hashtags) should also be derived automatically, so that it can be used to retrieve relevant messages from long-term accumulated social media archives.
With multi-sourced data streams arriving through a multitude of channels, identifying authoritative sources and extracting critical, validated information can be quite challenging, especially during a crisis. The volume, velocity, and variety of accumulated stream data place compelling demands on computing technologies, from big data management to technology infrastructure (Huang & Xu, 2014). To address these big data challenges, various types of computational infrastructure have been designed, from traditional cluster and grid computing to the more recent cloud computing and CPU/GPU heterogeneous computing (Schadt, Linderman, Sorenson, Lee, & Nolan, 2010). In particular, cloud computing has increasingly been viewed as a viable solution that harnesses multiple low-profile computing resources to split the analysis of massive data into smaller parallel processes (Huang & Cervone, 2016).
This paper addresses these problems by proposing a novel system to support both the analysis of historical disaster events and the monitoring of upcoming events. Wikipedia is exploited as a source to build a disaster event database, which is then used to retrieve information relevant to a specific disaster from the massive social media data accumulated daily. Cloud computing serves as the underlying infrastructure, offering on-demand and flexible computing resources to meet the dynamic computing requirements of real-time disaster analysis. This research makes the following contributions:
1. An integrated system framework is proposed for historical disaster analysis based on multi-sourced data with limited, if any, human interaction. To analyze and understand the public behaviors and reactions captured by social media data, our system does not rely on human identification of filtering criteria to retrieve relevant messages. Instead, an automatic system based on text mining and geocoding technologies is developed to derive this information.
2. An event database is built based on Wikipedia. Such a database allows scientists to easily select a relevant event for analysis, or to select disasters of a specific type to identify their patterns and to link them to other GIS data (e.g., socioeconomic data), climate data, and environmental data to understand the driving factors that contribute to the occurrence of these disasters.
3. Within the proposed system, cloud computing is used as the underlying infrastructure to provide flexible computing power to address the computing challenges posed by massive data processing and by a real-time operational system for emergency response and disaster coordination. Such a system is suitable for online services where large volumes of text and remote sensing imagery are dynamically streamed.
4. A prototype is implemented, and recent flooding events are used as a case study to demonstrate the feasibility of the proposed system.
5. This paper provides a general methodology that is not event-specific and can be used both for retrospective analysis and for real-time monitoring and decision making. The proposed framework sheds light on integrating various emerging data sources to support scientific applications of significant interest that go beyond disaster management.
Social media for disaster management
As social media applications are widely deployed on platforms ranging from personal computers to mobile devices, they are becoming a natural extension of the human sensory system. The synthesis of social media with human intelligence has the potential to form an intelligent sensor network that can be used to detect, monitor and gain situational awareness during a hazard at unprecedented scale and capacity. By monitoring tweets, for example, an earthquake can be detected shortly after it occurs by developing a classifier that treats each user as a sensor.
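Purely as an illustration of this idea (not part of the original paper), a minimal keyword-burst detector over a tweet stream might look like the following Python sketch; the keyword list, window length, and alert threshold are all hypothetical choices.

```python
from collections import deque
from datetime import datetime, timedelta

# Hypothetical keyword set; a real system would derive these automatically
KEYWORDS = {"earthquake", "shaking", "tremor"}
WINDOW = timedelta(minutes=5)   # sliding time window
THRESHOLD = 50                  # alert when matches in the window exceed this

window = deque()  # timestamps of keyword-matching tweets

def on_tweet(text: str, ts: datetime) -> bool:
    """Return True if this tweet pushes the window over the alert threshold."""
    if KEYWORDS & set(text.lower().split()):
        window.append(ts)
    # Evict tweets that have fallen out of the time window
    while window and ts - window[0] > WINDOW:
        window.popleft()
    return len(window) > THRESHOLD
```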
A cloud-based disaster analysis system
Fig. 1 shows a general architecture for disaster analysis leveraging multiple sources. The system is designed around six integrated components: 1) a data repository, responsible for archiving and retrieving datasets. An automatic subsystem is developed to crawl and integrate unstructured, heterogeneous data from various sources, such as Wikipedia, remote sensing, social media, and the Web. Disaster-relevant messages posted on different social media platforms, such as Twitter and Flickr, are collected and archived.
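The paper does not detail how its Wikipedia crawler is implemented. As a hedged sketch, disaster event pages could be enumerated through the public MediaWiki API as below; the category name is purely illustrative.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def list_category_members(category: str):
    """Yield page titles in a Wikipedia category (e.g., 'Category:2013 floods')."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        data = requests.get(API, params=params, timeout=30).json()
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        params.update(data["continue"])  # follow API pagination

# Illustrative usage: enumerate flood event pages for the event database
for title in list_category_members("Category:2013 floods"):
    print(title)
```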
Hashtag detection: Latent Dirichlet Allocation
Latent Dirichlet Allocation (LDA; Blei et al., 2003) is a topic model for analyzing large collections of unlabeled documents. LDA clusters words into “topics” and documents into mixtures of topics by uncovering the hidden thematic structure of a document collection. In LDA, each document is represented as a probability distribution over topics, which are in turn distributions over words. Each word may belong to one or more topics.
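The prototype performs its text mining with Apache Mahout; purely as an illustration of the LDA technique itself, the following Python sketch uses scikit-learn (an assumption, not the authors' toolchain) to cluster a toy set of tweets into topics whose top words could seed candidate hashtags.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for event-filtered tweets
tweets = [
    "flood waters rising downtown evacuation underway",
    "heavy rain causes flash flood near the river",
    "volunteers needed for flood relief and sandbags",
    "road closures due to flooding and heavy rain",
]

# Bag-of-words representation of the tweets
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(tweets)

# Fit a small LDA model: each tweet becomes a mixture of topics,
# and each topic a distribution over words
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top words per topic as candidate hashtag seeds
words = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [words[i] for i in topic.argsort()[::-1][:5]]
    print(f"topic {k}: {top}")
```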
Demonstration
A system prototype is implemented in Java, Java Server Pages (JSP) and Python to automatically harvest and analyze various types of data. Several open-source packages are used in the prototype: the Apache Mahout package for data and text mining tasks, Apache Lucene for text processing and indexing, and the Google Maps and Geocoding APIs for mapping and geocoding the tweets. Various geovisual tools are developed and made accessible through the developed web portal.
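As a sketch of the geocoding step, a call to the Google Maps Geocoding API could look like the following; the helper name, error handling policy, and API key are assumptions, not details given in the paper.

```python
import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def geocode(place: str, api_key: str):
    """Resolve a place name mentioned in a tweet to (lat, lng), or None."""
    resp = requests.get(GEOCODE_URL,
                        params={"address": place, "key": api_key},
                        timeout=30)
    data = resp.json()
    if data["status"] != "OK":
        return None  # e.g., ZERO_RESULTS or quota exceeded
    loc = data["results"][0]["geometry"]["location"]
    return loc["lat"], loc["lng"]

# Usage (requires a valid API key):
# print(geocode("Boulder, Colorado", "YOUR_API_KEY"))
```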
Conclusion
This paper presents a novel framework to support the analysis of historical disaster events as well as the real-time detection and tracking of new events. Massive spatiotemporal data from social media streams and remote sensing are generated continuously and dynamically, posing new challenges and opportunities for studying disasters. To meet the dynamic computing requirements of real-time disaster analysis, cloud computing is proposed as the infrastructure to provide on-demand and flexible computing resources.
Acknowledgement
Work performed under this project has been supported by grants from the Wisconsin Alumni Research Foundation, University of Wisconsin-Madison (Project No. PRJ93X), the Department of Energy (Project No. DE-AR0000717), and the Office of Naval Research (Project No. N00014-16-1-2543).
References

- Efficient generation of simple polygons for characterizing the shape of a set of points in the plane. Pattern Recognition (2008).
- Harnessing the crowdsourcing power of social media for disaster relief. IEEE Intelligent Systems (2011).
- Constructing gazetteers from volunteered big geo-data based on Hadoop. Computers, Environment and Urban Systems (2017).
- Usage of social media and cloud computing during natural hazards.
- Evaluating open-source cloud computing solutions for geosciences. Computers & Geosciences (2013).
- Building model as a service to support geosciences. Computers, Environment and Urban Systems (2017).
- Parallel map projection of vector-based big spatial data: Coupling cloud computing with graphics processing units. Computers, Environment and Urban Systems (2017).
- Using GPS to learn significant locations and predict movement across multiple users. Personal and Ubiquitous Computing (2003).
- Tweedr: Mining Twitter to inform disaster response (2014).
- Latent Dirichlet allocation. The Journal of Machine Learning Research (2003).
- Emergency situation awareness from Twitter for crisis management.
- Damage assessment of the 2011 Japanese tsunami using high-resolution satellite data. Cartographica: The International Journal for Geographic Information and Geovisualization.
- Using Twitter for tasking remote-sensing data collection and damage assessment: 2013 Boulder flood case study. International Journal of Remote Sensing.
- GI science, disasters, and emergency management. Transactions in GIS.
- A geographic approach for combining social media and authoritative data towards identifying useful information for disaster management. International Journal of Geographical Information Science.
- OMG, from here, I can see the flames!: A use case of mining location-based social networks to acquire spatio-temporal data on forest fires.
- MapReduce: Simplified data processing on large clusters. Communications of the ACM.
- A density-based algorithm for discovering clusters in large spatial databases with noise.
- Cloud computing for parallel scientific HPC applications: Feasibility of running coupled atmosphere-ocean climate models on Amazon's EC2.
- Tracing the German centennial flood in the stream of tweets: First lessons learned.
- pRPL 2.0: Improving the parallel raster processing library. Transactions in GIS.
- Geographic situational awareness: Mining tweets for disaster preparedness, emergency response, impact, and recovery. International Journal of Geo-Information.
- A data-driven framework for archiving and exploring social media data. Annals of GIS.
- Utilize cloud computing to support dust storm forecasting. International Journal of Digital Earth.
- Practical extraction of disaster-relevant information from social media.
- Real-time social network data mining for predicting the path for a disaster.