research-article

PeaCE-Ful Web Event Extraction and Processing as Bitemporal Mutable Events

Published: 16 August 2016

Abstract

The web is the largest bulletin board of the world. Events of all types, from flight arrivals to business meetings, are announced on this board. Tracking and reacting to such event announcements, however, is a tedious manual task, only slightly alleviated by email or similar notifications. Announcements are published with human readers in mind, and updates or delayed announcements are frequent. These characteristics have hampered attempts at automatic tracking.

PeaCE provides the first integrated framework for event processing on top of web event ads, consisting of event extraction, complex event processing, and action execution in response to these events. Given a schema of the events to be tracked, the framework populates this schema by extracting events from announcement sources. This extraction is performed by small programs called wrappers, which produce the events, including updates and retractions. PeaCE then queries these events to detect complex events, often combining announcements from multiple sources. To deal with updates and delayed announcements, PeaCE’s schemas are bitemporal, distinguishing between occurrence time and detection time. This allows complex event specifications to track updates and to react to differences between occurrence and detection time. When events are new, changed, or deleted, PeaCE can execute actions, such as tweeting or sending email notifications. Actions are typically specified as web interactions, for example, filling in and submitting a form with attributes of the triggering event.
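The distinction between occurrence time and detection time can be illustrated with a small sketch. It is not PeaCE's implementation or API: the names `Version`, `BitemporalStore`, `record`, `retract`, `as_of`, and `announced_late` are hypothetical, times are plain integers, and a retraction is modelled (as an assumption) by a version without an occurrence time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Version:
    """One announced state of an event, as seen at a given detection time."""
    detected_at: int           # when a wrapper saw this state (detection time)
    occurs_at: Optional[int]   # announced occurrence time; None marks a retraction
    attrs: tuple = ()          # remaining attributes as sorted (name, value) pairs


class BitemporalStore:
    """Hypothetical append-only store that keeps every version of every event."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Version]] = {}

    def record(self, event_id: str, detected_at: int,
               occurs_at: Optional[int], **attrs) -> None:
        """Append a new or updated announcement for event_id."""
        versions = self._history.setdefault(event_id, [])
        if versions and detected_at < versions[-1].detected_at:
            raise ValueError("detection times must be non-decreasing per event")
        versions.append(Version(detected_at, occurs_at, tuple(sorted(attrs.items()))))

    def retract(self, event_id: str, detected_at: int) -> None:
        """Record that the announcement has disappeared from its source."""
        self.record(event_id, detected_at, None)

    def as_of(self, event_id: str, detection_time: int) -> Optional[Version]:
        """Latest version known at detection_time, or None if unknown or retracted."""
        known = [v for v in self._history.get(event_id, [])
                 if v.detected_at <= detection_time]
        if not known or known[-1].occurs_at is None:
            return None
        return known[-1]

    def announced_late(self, event_id: str) -> bool:
        """True if the first announcement was detected after the event occurred."""
        versions = self._history.get(event_id, [])
        if not versions or versions[0].occurs_at is None:
            return False
        return versions[0].detected_at > versions[0].occurs_at
```

Because old versions are never overwritten, a query can ask what was believed at any earlier detection time, which is what lets a rule notice that an announced occurrence time has changed or that an announcement arrived only after the event took place.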

Our evaluation shows that PeaCE’s processing is dominated by the time needed for accessing the web to extract events and perform actions, which accounts for 97.4% of the processing time. Thus, PeaCE requires only 2.6% overhead, and therefore, the complex event processor scales well even with moderate resources. We further show that simple and reasonable restrictions on complex event specifications and the timing of constituent events suffice to guarantee that PeaCE only requires a constant buffer to process arbitrarily many event announcements.
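The abstract does not spell out the restrictions behind the constant-buffer guarantee, so the following is only a sketch of how such a guarantee can arise under assumed restrictions, not the paper's construction. It assumes that complex events only combine constituents whose occurrence times lie within `window` of each other, that detection and occurrence time differ by at most `max_skew`, that announcements are added in non-decreasing detection-time order, and that the announcement rate is bounded. The class `BoundedEventBuffer` and its parameters are hypothetical.

```python
from typing import List, Tuple


class BoundedEventBuffer:
    """Keeps only events that can still take part in a future match."""

    def __init__(self, window: int, max_skew: int) -> None:
        self.window = window        # assumed max occurrence-time gap within a match
        self.max_skew = max_skew    # assumed max |detection time - occurrence time|
        self._events: List[Tuple[int, str]] = []  # (occurrence time, event id)

    def add(self, event_id: str, occurs_at: int, detected_at: int) -> None:
        """Add an announcement; calls must use non-decreasing detected_at."""
        if abs(detected_at - occurs_at) > self.max_skew:
            raise ValueError("announcement violates the assumed timing restriction")
        # Every later announcement occurs at or after detected_at - max_skew,
        # so events more than `window` older than that can never match again.
        cutoff = detected_at - self.max_skew - self.window
        self._events = [(t, e) for (t, e) in self._events if t >= cutoff]
        self._events.append((occurs_at, event_id))

    def __len__(self) -> int:
        return len(self._events)
```

Under these assumptions the buffer only ever holds events whose occurrence times fall in an interval of fixed length (`window` plus twice `max_skew`), so with a bounded announcement rate its size does not grow with the total number of announcements processed.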

    • Published in

      ACM Transactions on the Web, Volume 10, Issue 3
      August 2016
      201 pages
      ISSN:1559-1131
      EISSN:1559-114X
      DOI:10.1145/2988335
      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 16 August 2016
      • Accepted: 1 April 2016
      • Revised: 1 March 2016
      • Received: 1 September 2014
      Qualifiers

      • research-article
      • Research
      • Refereed
