research-article

Analytics for the real-time web

Published: 01 August 2011

Abstract

With the emergence of mobile devices constantly connected to the Internet, the nature of user-generated data has changed on most Web 2.0 sites. Today, people produce and share data more often and the lifespan of the data is shorter. Analyzing this data leads to new requirements for analytical systems: real-time processing and database-intensive workloads. Driven by these requirements, we have developed a new system for real-time analytics. Our system extends a key-value store, Cassandra, with push-based processing, transactional task execution, and synchronization. To demonstrate our system, we have built a service to reorganize news sites using real-time feedback from social media.
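The three extensions named in the abstract (push-based processing, transactional task execution, and synchronization) can be illustrated with a minimal in-memory sketch. This is not the authors' implementation and does not use any Cassandra API: the `PushStore` class, its methods, the `mentions` and `counts` families, and the `count_mention` handler are all hypothetical names chosen for illustration. The sketch assumes that a write to a family queues a task for each registered handler, that a task's output is staged and committed only if the handler finishes without error, and that tasks on the same key are serialized with a lock.

```python
import threading
from collections import defaultdict, deque


class PushStore:
    """Toy in-memory key-value store with write-triggered tasks (hypothetical)."""

    def __init__(self):
        self.data = {}                             # (family, key) -> value
        self.triggers = defaultdict(list)          # family -> registered handlers
        self.queue = deque()                       # tasks waiting to run
        self.locks = defaultdict(threading.Lock)   # one lock per input key

    def on_write(self, family, handler):
        """Register a handler that is pushed every write to `family`."""
        self.triggers[family].append(handler)

    def put(self, family, key, value):
        """Store a value and queue one task per handler on that family."""
        self.data[(family, key)] = value
        for handler in self.triggers[family]:
            self.queue.append((handler, key, value))

    def run_pending(self):
        """Run queued tasks; a task's staged writes commit only on success."""
        while self.queue:
            handler, key, value = self.queue.popleft()
            with self.locks[key]:          # serialize tasks on the same key
                staged = {}
                try:
                    handler(self, key, value, staged)
                except Exception:
                    continue               # failed task: staged writes are dropped
                for (family, out_key), out_value in staged.items():
                    self.put(family, out_key, out_value)


def count_mention(store, url, _text, staged):
    """Hypothetical task: keep a running count of mentions per article URL."""
    current = store.data.get(("counts", url), 0)
    staged[("counts", url)] = current + 1


store = PushStore()
store.on_write("mentions", count_mention)
store.put("mentions", "news/a", "great story")
store.put("mentions", "news/a", "agreed")
store.put("mentions", "news/b", "interesting")
store.run_pending()
```

In this sketch the counts are updated as a side effect of the writes themselves, rather than by a separate batch job that rescans the data, which is the general idea behind push-based processing. A real distributed system would also have to handle replication, task recovery, and cross-node coordination, none of which is modeled here.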

Published in: Proceedings of the VLDB Endowment, Volume 4, Issue 12, August 2011, 303 pages

Publisher: VLDB Endowment
