Key Issues On Hadoop Online Analytical Processing System

Posted on: 2017-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Zhao
GTID: 2348330512488019
Subject: Computer application technology
Abstract/Summary:
In recent years, Online Analytical Processing (OLAP) has become increasingly important for multidimensional query analysis. Enterprise decision makers rely on the analysis results that OLAP presents when making major internal decisions. Current OLAP research concentrates on storage processing and query performance optimization for a single data model. ROLAP (Relational OLAP), built on relational databases, and MOLAP (Multidimensional OLAP), built on multidimensional databases, each rest on a single data organization model, so neither can satisfy the demand for low-latency multidimensional queries over heterogeneous data models and data sets of different scales. To address this problem, we first study the query planning process for the different data organization models and then design and implement a scalable, efficient distributed Hybrid Online Analytical Processing (Hybrid-OLAP, HOLAP) system, with a focus on query interpretation, cache-based query optimization, and related mechanisms. The system is designed to handle multidimensional queries over data sets of different scales and to process each query efficiently according to the execution mode of the underlying multidimensional organization. The research covers four aspects.

First, for multidimensional analysis of large-scale data sets that traditional ROLAP systems cannot handle effectively, we propose a system architecture named HOLAP that performs fast multidimensional query analysis on data sets of different scales in a Hadoop environment. It supports interpretation and aggregation of MDX (Multidimensional Expressions) queries on top of Hive, together with a multidimensional query optimization method based on a precomputed cache in HBase. We study the key techniques, including the query planning method, the query interpretation mechanism, the formalized multidimensional cube construction method, and the aggregation cache mechanism, as well as the implementation of parsing and aggregation for Hive-based MDX queries.

Second, to optimize Hive multidimensional query performance on large-scale data sets, we propose a formalized model (Hsql-To-Nosql Formalized Model, Hs-Nos-FM) that maps the Hive data model, which is similar to that of a relational database, to the HBase data model. The HBase cube cache is constructed by an algorithm of Segments, Dimensionality Reduction and Aggregation layer by layer (S-Redu-D-A). We also propose a data storage model, the Formalized Multidimensional Cube (F-M-Cube), which proves efficient for multidimensional queries on large-scale data sets.

Third, for the two query plans, query planning is calculated and analyzed using planning indices that include real-time requirement, data size, dimension cardinality, storage space, multi-table joins, and query frequency. We propose a query planning workflow comprising privilege control, a query listener, query analysis, and query allocation. The query planning method based on the HOLAP architecture is validated by comparing execution times across data sets of different scales and across different multidimensional queries, and it shows better analytical performance on common OLAP multidimensional queries.

Finally, this thesis designs and implements the HOLAP system, combining the query planning method, the query interpretation mechanism, the formalized multidimensional cube construction method, the aggregation caching mechanism, and Hive MDX queries, and embedding the formal cube construction algorithm. Testing shows that the system performs well and achieves the desired goals.
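
The abstract does not give the details of S-Redu-D-A, so the following is only a minimal sketch of the general idea of reducing dimensions and aggregating layer by layer to precompute a cube. It is not the thesis's algorithm: the function name `build_cube`, the in-memory dictionaries standing in for an HBase cache, the choice of sum as the aggregate, and the omission of the segmentation step are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dims, measure):
    """Precompute the sum of `measure` for every subset of `dims`.

    Hypothetical illustration: each cuboid is a dict keyed by a tuple of
    dimension values, playing the role a cached HBase row might play.
    """
    dims = tuple(dims)

    # Top layer: aggregate the fact rows over all dimensions at once.
    full = defaultdict(float)
    for row in rows:
        full[tuple(row[d] for d in dims)] += row[measure]
    cube = {dims: dict(full)}

    # Lower layers: each cuboid keeps one dimension fewer and is derived
    # from an already computed parent cuboid, never from the raw rows.
    for size in range(len(dims) - 1, -1, -1):
        for kept in combinations(dims, size):
            parent_dims = next(
                p for p in cube
                if len(p) == size + 1 and set(kept) <= set(p)
            )
            positions = [parent_dims.index(d) for d in kept]
            child = defaultdict(float)
            for key, value in cube[parent_dims].items():
                child[tuple(key[i] for i in positions)] += value
            cube[kept] = dict(child)
    return cube
```

With this sketch, a query that groups by a subset of the dimensions becomes a dictionary lookup such as `cube[("region",)][("east",)]` instead of a scan over the fact rows, which is the kind of saving a precomputed cache is meant to provide.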
Keywords/Search Tags: OLAP, HOLAP, Query planning method, S-Redu-D-A algorithm, HBase