Abstract
With the emergence of mobile devices that are constantly connected to the Internet, the nature of user-generated data has changed on most Web 2.0 sites. Today, people produce and share data more often, and that data has a shorter lifespan. Analyzing this data imposes new requirements on analytical systems: real-time processing and database-intensive workloads. Driven by these requirements, we have developed a new system for real-time analytics. Our system extends the key-value store Cassandra with push-based processing, transactional task execution, and synchronization. To demonstrate our system, we have built a service that reorganizes news sites based on real-time feedback from social media.
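The abstract does not describe the system's interfaces. To illustrate what push-based processing over a key-value store can mean, the following Python sketch fires registered handlers on every write. Incoming social-media posts then incrementally update per-article mention counts, and a news page could be ordered by those counts. `PushKVStore`, `register_trigger`, `count_mentions`, and `rank_articles` are hypothetical names. The store is an in-memory dict, not Cassandra, and the sketch does not model transactional task execution or synchronization.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical handler signature: (store, key, value) -> None.
Handler = Callable[..., None]


class PushKVStore:
    """In-memory stand-in for a key-value store (NOT Cassandra's API)."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}
        # Key prefix -> handlers fired on every write under that prefix.
        self._triggers: Dict[str, List[Handler]] = defaultdict(list)

    def register_trigger(self, prefix: str, handler: Handler) -> None:
        self._triggers[prefix].append(handler)

    def get(self, key: str, default=None):
        return self._data.get(key, default)

    def scan(self, prefix: str) -> List[Tuple[str, dict]]:
        return [(k, v) for k, v in self._data.items() if k.startswith(prefix)]

    def put(self, key: str, value: dict) -> None:
        self._data[key] = value
        # Push-based processing: the write itself drives the computation,
        # instead of a periodic batch job scanning the store later.
        for prefix, handlers in list(self._triggers.items()):
            if key.startswith(prefix):
                for handler in handlers:
                    handler(self, key, value)


def count_mentions(store: PushKVStore, key: str, post: dict) -> None:
    # Increment a per-article counter for every URL the post mentions.
    # Single-threaded read-modify-write; concurrent writers would need
    # transactional execution, which this sketch does not model.
    for url in post.get("urls", []):
        ckey = "count:" + url
        current = store.get(ckey, {"n": 0})["n"]
        store.put(ckey, {"n": current + 1})


def rank_articles(store: PushKVStore) -> List[str]:
    # Order articles by mention count (descending), ties broken by URL.
    counts = {k[len("count:"):]: v["n"] for k, v in store.scan("count:")}
    return sorted(counts, key=lambda url: (-counts[url], url))


store = PushKVStore()
store.register_trigger("post:", count_mentions)
store.put("post:1", {"urls": ["news/a", "news/b"]})
store.put("post:2", {"urls": ["news/b"]})
print(rank_articles(store))  # ['news/b', 'news/a']
```

Because the counters are updated as each post arrives, the ranking is current at every read without recomputation. The read-modify-write in `count_mentions` is exactly the kind of step that needs transactional execution once many writers run concurrently.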
Index Terms
- Analytics for the real-time web