Research On Query Processing Of Online Analysis Server Using Mapreduce

Posted on: 2013-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: G T Yu
Full Text: PDF
GTID: 2248330392456212
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet and the wide application of information technology, networks produce increasingly large amounts of data. Online analytical processing (OLAP), as the primary technology for storing and analyzing data, must store correspondingly multiplied volumes of data, and processing such volumes requires a huge amount of computation. MapReduce, proposed by Google Inc., is a programming model for processing huge amounts of data in parallel on large computer clusters, but it has inherent deficiencies in handling structured data. Research on hybrid systems that combine MapReduce with databases is therefore of significant importance.

In this paper, we first analyze the design requirements of a hybrid system based on MapReduce and a database in terms of system demands, design principles, and targets. Second, we give the overall architecture of the system and analyze it in terms of its layered structure and its system modules. The layered structure comprises a presentation layer, a conversion layer, a computation/scheduling layer, and a storage layer; the system modules include a distributed database storage optimization model, a query optimization model, and others. We then describe the main work processes of the system. Finally, we extend a new multidimensional query language and describe its syntax in detail.

For the optimization of the hybrid system, we give implementations of storage and query optimization techniques. For storage optimization, we define how fact tables and dimension tables are partitioned and stored, and give the optimization formula for joins between dimension tables and fact tables. For query optimization, we improve the QCTree construction and query algorithms and implement them on MapReduce. We also analyze the effect of storage optimization on query efficiency.

The experiments show that, for joins between fact tables and dimension tables, the system performs better than the MapReduce model based on HDFS, and that, for query optimization, the system performs better than HDW.
Keywords/Search Tags: data warehousing, multidimensional query, storage, data cubes, map, reduce
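
The abstract refers to joining fact tables with dimension tables under MapReduce. As an illustration only, the following is a minimal sketch of a generic reduce-side join, simulated in plain Python. It is not the storage-optimized join proposed in the thesis. The column names (`product_id`, `amount`, `category`) and the helper functions are hypothetical and were chosen for this example.

```python
from collections import defaultdict


def map_phase(fact_rows, dim_rows, key):
    """Emit (join_key, tagged_record) pairs from both tables."""
    for row in dim_rows:
        yield row[key], ("dim", row)
    for row in fact_rows:
        yield row[key], ("fact", row)


def shuffle(pairs):
    """Group emitted values by key, as the MapReduce shuffle step would."""
    groups = defaultdict(list)
    for k, value in pairs:
        groups[k].append(value)
    return groups


def reduce_phase(groups):
    """For each key, combine every fact row with every matching dimension row."""
    for values in groups.values():
        dims = [row for tag, row in values if tag == "dim"]
        facts = [row for tag, row in values if tag == "fact"]
        for d in dims:
            for f in facts:
                # Inner join: keys with no dimension row produce no output.
                yield {**f, **d}


def join(fact_rows, dim_rows, key="product_id"):
    """Run the three simulated phases and return the joined rows."""
    return list(reduce_phase(shuffle(map_phase(fact_rows, dim_rows, key))))
```

In this sketch every row of both tables passes through the shuffle step. That cost is the kind of overhead that a partitioning and storage scheme for fact and dimension tables, such as the one the abstract describes, would aim to reduce.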