Building Quotient Cube with MapReduce in Hadoop

Abstract:

To improve the efficiency of querying and computing over massive data, this paper proposes a method for building quotient cubes on the Hadoop platform that combines the advantages of the quotient cube and the MapReduce model. First, in the Map stage, all cubes are established and their aggregate values are calculated. All key/value pairs formed in the Map stage are then passed to the Reduce stage. Equivalence partitioning is carried out in this stage, and the minimum aggregation cube of each equivalence partition becomes the key, with its aggregate value as the value. The quotient cube is obtained from these minimum aggregation cubes. To speed up parallel computation and reduce network traffic, equivalence class division is also executed locally after the Map stage; this step is called the combiner stage. The MapReduce model is used to improve the efficiency of building the quotient cube because of its capacity for parallel computation over large amounts of data. In addition, experiments show that, under certain circumstances, increasing the number of Mapper/Reducer tasks effectively reduces the building time and improves construction efficiency.
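The Map, combiner, and Reduce stages described in the abstract can be illustrated with a small in-memory simulation. This is only a sketch under several assumptions that the abstract does not confirm: two cells are treated as equivalent when they cover the same set of base rows (cover equivalence), the aggregate is SUM, and the "minimum aggregation cube" of a class is read as the cell with the fewest `*` wildcards. It does not use the Hadoop API, and all function names and the sample rows are hypothetical.

```python
from collections import defaultdict
from itertools import product

STAR = "*"  # wildcard meaning "all values" of a dimension

def map_cells(row):
    """Map: emit every cell that covers one base row, tagged with the row id."""
    tid, dims, measure = row
    for mask in product((False, True), repeat=len(dims)):
        cell = tuple(STAR if hide else d for d, hide in zip(dims, mask))
        yield cell, (frozenset([tid]), measure)

def combine(pairs):
    """Combiner: merge (row ids, partial SUM) locally, one record per cell."""
    local = {}
    for cell, (tids, total) in pairs:
        old_tids, old_total = local.get(cell, (frozenset(), 0))
        local[cell] = (old_tids | tids, old_total + total)
    return local.items()

def reduce_classes(combined_outputs):
    """Reduce: merge per cell, then group cells that cover the same rows."""
    merged = combine(pair for out in combined_outputs for pair in out)
    classes = defaultdict(list)
    totals = {}
    for cell, (tids, total) in merged:
        classes[tids].append(cell)
        totals[tids] = total
    quotient = {}
    for tids, cells in classes.items():
        # Assumed representative: the cell with the fewest wildcards.
        rep = min(cells, key=lambda c: c.count(STAR))
        quotient[rep] = (totals[tids], sorted(cells))
    return quotient

def build_quotient_cube(splits):
    """Run map + combine per input split, then a single reduce."""
    outputs = [
        list(combine(p for row in split for p in map_cells(row)))
        for split in splits
    ]
    return reduce_classes(outputs)

# Hypothetical input: (row id, (store, product, season), sales), in two splits.
splits = [
    [(0, ("S1", "P1", "spring"), 6), (1, ("S1", "P2", "spring"), 12)],
    [(2, ("S2", "P1", "fall"), 9)],
]
quotient = build_quotient_cube(splits)
for rep, (total, cells) in sorted(quotient.items()):
    print(rep, total, len(cells))
```

Each printed line shows one equivalence class: its representative cell, the aggregate, and how many cells the class contains. Because the combiner emits one record per cell per split, fewer pairs cross the simulated shuffle than the raw Map output would produce, which is the traffic reduction the abstract attributes to the combiner stage.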

Info:

Periodical:

Advanced Materials Research (Volumes 765-767)

Pages:

1031-1035

Online since:

September 2013
