Abstract
Online search engines are widely regarded as the most convenient means of acquiring information. The intensive information-seeking behavior of search engine users makes it possible to exploit search engine queries as effective “crowd sensors” for event monitoring. While some researchers have investigated the feasibility of using search engine queries for coarse-grained event analysis, their capability for real-time event detection has been largely neglected. To this end, in this article, we present a large-scale, systematic study on exploiting real-time search engine queries for outbreak event detection, with a focus on rapid earthquake reporting. In particular, we propose a realistic system for real-time earthquake detection that monitors millions of earthquake-related queries from a dominant online search engine in China. Specifically, we first examine a large set of queries to select representative queries that are highly correlated with the outbreak of earthquakes. Then, based on the real-time streams of the selected queries, we design a novel machine learning-enhanced two-stage burst detection approach for detecting earthquake events. Meanwhile, the location of an earthquake epicenter can be accurately estimated from the spatial-temporal distribution of search engine queries. Finally, in an extensive comparison with earthquake catalogs from the China Earthquake Networks Center, 2015, the detection precision of our system reaches 87.9%, and the accuracy of province-level location estimation is 95.7%. In particular, 50% of the successfully detected events are found within 62 s after the earthquake, and 50% of the successful location estimates fall within 25.5 km of the seismic epicenter.
Our system also found more than 23.3% extra earthquakes that were felt by people but not publicly released, as well as 12.1% earthquake-like special outbreaks. It further revealed many interesting findings, such as the typical query patterns of earthquake rumors and regular memorial events. Based on these results, our system can feed information back to search engine users in a timely manner, tailored to each case, and accelerate the release of information on felt earthquakes.
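The abstract does not spell out the two-stage burst detector, so the following is only a minimal sketch of what a first, candidate-generating stage could look like: it flags intervals whose query count rises far above a rolling baseline. The function name `detect_bursts`, the parameters `window`, `k`, and `min_count`, and the mean-plus-k-standard-deviations rule are all assumptions made for illustration, not the authors' method. The machine-learning stage that would confirm or reject each candidate is not shown.

```python
from statistics import mean, pstdev


def detect_bursts(counts, window=10, k=3.0, min_count=5):
    """Return indices of intervals whose count exceeds a rolling baseline.

    counts:    per-interval counts of earthquake-related queries
               (hypothetical input, e.g. one value per few seconds).
    window:    number of preceding intervals used as the baseline.
    k:         standard deviations above the baseline mean that
               count as a burst (assumed rule, not from the paper).
    min_count: absolute floor so that small fluctuations on a quiet
               stream are not flagged.
    """
    bursts = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        # Threshold is the larger of the statistical bound and the floor.
        threshold = max(mean(baseline) + k * pstdev(baseline), min_count)
        if counts[i] > threshold:
            bursts.append(i)
    return bursts


# Toy stream: a quiet background followed by one sudden spike.
stream = [2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 40, 3]
candidates = detect_bursts(stream)
```

In a pipeline of the kind the abstract describes, each candidate index would then be passed to a second-stage classifier to separate real earthquakes from rumors, memorial events, and other earthquake-like outbreaks.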