DOI: 10.1145/2808797.2809390
short-paper

Finding Non-Redundant Multi-Word Events on Twitter

Published: 25 August 2015

ABSTRACT

Twitter is a pervasive technology, with hundreds of millions of users serving as sensors that provide eyewitness accounts of events on the ground. When a popular event occurs, these sensors start to broadcast news by tweeting to their followers and to the world. Within minutes, these tweets can attract attention and serve as a primary information source for traditional media. Given a huge set of tweets, the key questions are: (1) How can we detect informative events in general? (2) How can we distinguish relevant events from others? In this paper, we tackle these challenges with a statistical model that detects events by spotting significant deviations in word frequencies over time. Besides single-word events, our model also accounts for events composed of multiple co-occurring words, thus providing much richer information. Our statistical process is complemented by an optimization algorithm that extracts only non-redundant events, providing the user with a succinct summary of current events. We used our model to analyze 24 million geotagged tweets sent in the US from April 9 to April 22, 2013, the time period of the Boston marathon bombing, and we show that our approach can create multi-word events that efficiently summarize real-world events.
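
To make the idea of spotting frequency deviations concrete, the following is a minimal sketch and not the paper's statistical model, which the abstract does not specify. It assumes a plain z-score of each term's count in the current time window against its counts in past windows, and it treats a "multi-word event" simply as a pair of words co-occurring in the same tweet. The function names, the threshold, and the `min_std` floor are all hypothetical choices made for illustration, and the non-redundancy optimization is not attempted here.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def count_terms(tweets, max_words=2):
    """Count how many tweets contain each word set of size 1..max_words."""
    counts = Counter()
    for text in tweets:
        words = sorted(set(text.lower().split()))  # distinct words per tweet
        for size in range(1, max_words + 1):
            counts.update(combinations(words, size))
    return counts


def detect_events(history, current, z_threshold=3.0, min_std=1.0):
    """Return (term, z) pairs whose current count deviates from history.

    history: list of tweet lists, one list per past time window
    current: list of tweets in the window under test
    Assumption: a simple z-score stands in for the paper's model.
    """
    past = [count_terms(window) for window in history]
    now = count_terms(current)
    events = []
    for term, observed in now.items():
        series = [c[term] for c in past]  # Counter yields 0 for unseen terms
        mu = mean(series)
        sigma = max(pstdev(series), min_std)  # floor avoids division by zero
        z = (observed - mu) / sigma
        if z >= z_threshold:
            events.append((term, z))
    return sorted(events, key=lambda e: -e[1])


# Toy data, invented for illustration only.
history = [
    ["good morning boston", "coffee time"],
    ["good morning", "coffee and rain"],
    ["morning run", "good coffee"],
]
current = [
    "explosion at boston marathon",
    "boston marathon explosion",
    "marathon explosion near finish",
    "good morning",
]

for term, z in detect_events(history, current, z_threshold=2.5):
    print(" ".join(term), round(z, 2))
```

On this toy input, everyday terms such as "good" stay below the threshold because their counts match the past windows, while the pair "explosion marathon" is flagged alongside its single words. That overlap between a pair and its constituent words is the kind of redundancy the paper's optimization step is meant to remove.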

          Published in

          ASONAM '15: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015
          August 2015
          835 pages
          ISBN: 9781450338547
          DOI: 10.1145/2808797

          Copyright © 2015 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States


          Qualifiers

          • short-paper
          • Research
          • Refereed limited

          Acceptance Rates

          Overall Acceptance Rate: 116 of 549 submissions, 21%
