
Fast, Scalable, and Context-Sensitive Detection of Trending Topics in Microblog Post Streams

Published: 01 January 2013

Abstract

Social networks such as Twitter can quickly and broadly disseminate news and memes spanning both real-world events and cultural trends. Such networks are often the best sources of up-to-the-minute information and are therefore of considerable commercial and consumer interest. The trending topics that appear first on these networks answer the age-old question “what are people talking about?” Given the volume of posts (on the order of 45,000 or more per minute) and the vast number of stories about which users are posting at any given time, extracting trending stories in real time is a formidable problem. In this article, we describe a method and implementation for extracting trending topics from a high-velocity, real-time stream of microblog posts, together with a set of experimental results showing that our system can accurately find “hot” stories in high-rate, Twitter-scale text streams.

      • Published in

        ACM Transactions on Management Information Systems, Volume 3, Issue 4
        January 2013
        77 pages
        ISSN: 2158-656X
        EISSN: 2158-6578
        DOI: 10.1145/2407740
        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 January 2013
        • Revised: 1 November 2012
        • Accepted: 1 November 2012
        • Received: 1 April 2012

        Qualifiers

        • research-article
        • Research
        • Refereed
