Streaming multiple aggregations using phantoms

  • Regular Paper
The VLDB Journal

Abstract

Data streams characterize the high-speed, large-volume input of a new class of applications such as network monitoring, web content analysis, and sensor networks. Among these applications, network monitoring may be the most compelling: the backbone of a large internet service provider can generate 1 petabyte of data per day. For many network monitoring tasks, such as traffic analysis and statistics collection, aggregation is a primitive operation, and various analytical and statistical needs naturally lead to related aggregate queries. In this article, we address the problem of efficiently computing multiple aggregations over high-speed data streams, based on the two-level query processing architecture of GS, a real data stream management system deployed at AT&T. We observe that additionally computing and maintaining fine-granularity aggregations (called phantoms) enables shared computation across the queries. Based on a thorough analysis, we propose algorithms to identify the best set of phantoms to maintain and to determine the allocation of resources (particularly space) for computing the aggregations. Experiments show that our algorithm achieves near-optimal computation costs and outperforms the best adapted algorithm by more than an order of magnitude.
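To make the idea of a phantom concrete, the following minimal Python sketch shows how a single fine-granularity aggregate can serve several coarser aggregate queries. It is an illustration under stated assumptions, not the algorithm proposed in the article: the record fields (`src`, `dst`), the sample packets, and the helper names `aggregate` and `roll_up` are all hypothetical, and the sketch ignores the space allocation and two-level architecture that the article addresses.

```python
from collections import Counter

def aggregate(records, attrs):
    """Count records grouped by the given attribute names."""
    counts = Counter()
    for rec in records:
        counts[tuple(rec[a] for a in attrs)] += 1
    return counts

def roll_up(fine_counts, fine_attrs, coarse_attrs):
    """Derive a coarser aggregate from a finer one by summing over
    the attributes that are dropped."""
    idx = [fine_attrs.index(a) for a in coarse_attrs]
    coarse = Counter()
    for key, n in fine_counts.items():
        coarse[tuple(key[i] for i in idx)] += n
    return coarse

# Hypothetical packet records (illustrative data only).
packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.9"},
    {"src": "10.0.0.1", "dst": "10.0.0.9"},
    {"src": "10.0.0.1", "dst": "10.0.0.7"},
    {"src": "10.0.0.2", "dst": "10.0.0.9"},
]

# The phantom: a (src, dst) aggregate that no user query asked for.
phantom_attrs = ("src", "dst")
phantom = aggregate(packets, phantom_attrs)  # one pass over the stream

# The two user queries are answered from the phantom, not from the stream.
by_src = roll_up(phantom, phantom_attrs, ("src",))
by_dst = roll_up(phantom, phantom_attrs, ("dst",))
```

In this sketch each incoming record updates one table (the phantom) rather than one table per query, and the coarser results are derived from the phantom's groups. Whether that trade pays off depends on how many groups the phantom holds and how much space it is given, which is the selection and allocation problem the abstract describes.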

Author information

Corresponding author

Correspondence to Rui Zhang.

About this article

Cite this article

Zhang, R., Koudas, N., Ooi, B.C. et al. Streaming multiple aggregations using phantoms. The VLDB Journal 19, 557–583 (2010). https://doi.org/10.1007/s00778-010-0180-z

  • DOI: https://doi.org/10.1007/s00778-010-0180-z
