ABSTRACT
Data cube pre-computation is an important technique for supporting OLAP (Online Analytical Processing) and has been studied extensively. Computing a complete data cube is often infeasible because of its huge storage requirement. The recently proposed quotient cube addresses this issue with a partitioning method that groups cube cells into equivalence partitions. This approach is useful not only for distributive aggregate functions such as SUM but also for holistic aggregate functions such as MEDIAN. Maintaining a data cube for holistic aggregation is hard because historical tuple values must be kept in order to compute the new aggregate when tuples are inserted or deleted. The quotient cube makes the problem harder, since the equivalence classes must be maintained as well. In this paper, we introduce two techniques, the addset data structure and the sliding window, to deal with this problem. We develop efficient algorithms for maintaining a quotient cube with holistic aggregate functions while using only a reasonably small amount of storage space. A performance study shows that our algorithms are effective, efficient, and scalable over large databases.
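The contrast between distributive and holistic maintenance can be illustrated with a small sketch. It is not the paper's addset structure or sliding-window technique, which the abstract does not describe; the names `SumCell`, `MedianCell`, and `demo` are hypothetical and chosen only for illustration. The sketch assumes each cube cell is maintained independently and that a deleted value was previously inserted.

```python
from bisect import bisect_left, insort


class SumCell:
    """Distributive aggregate: the old result plus the delta is enough."""

    def __init__(self):
        self.total = 0

    def insert(self, value):
        self.total += value

    def delete(self, value):
        self.total -= value


class MedianCell:
    """Holistic aggregate: the cell retains every measure value, sorted.

    Hypothetical illustration only; it stores the full history, which is
    exactly the storage cost the abstract says must be kept small.
    """

    def __init__(self):
        self.values = []

    def insert(self, value):
        insort(self.values, value)  # keep the list sorted on insert

    def delete(self, value):
        i = bisect_left(self.values, value)
        if i == len(self.values) or self.values[i] != value:
            raise ValueError(f"{value!r} is not stored in this cell")
        self.values.pop(i)

    def median(self):
        if not self.values:
            raise ValueError("an empty cell has no median")
        n = len(self.values)
        mid = n // 2
        if n % 2:
            return self.values[mid]
        return (self.values[mid - 1] + self.values[mid]) / 2


def demo():
    """Insert three values, delete one, and report both aggregates."""
    s, m = SumCell(), MedianCell()
    for v in (10, 40, 20):
        s.insert(v)
        m.insert(v)
    before = (s.total, m.median())
    s.delete(20)
    m.delete(20)
    after = (s.total, m.median())
    return before, after
```

The SUM cell needs one number per cell, whereas the MEDIAN cell cannot derive its new value from the old median and the deleted tuple alone: after deleting 20 the answer depends on the remaining values 10 and 40.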
Index Terms
- Incremental maintenance of quotient cube for median