Research And Implementation Of Big Data Oriented Distributed OLAP Engine

Posted on: 2016-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: J L Wei
Full Text: PDF
GTID: 2348330512470874
Subject: Software engineering
Abstract/Summary:
More and more data is becoming available on Hadoop in the big data era. Existing Business Intelligence (BI) tools have limitations in this setting, such as limited support for Hadoop, exponentially growing data size, and high latency for interactive queries. Adopting Hadoop as an interactive analysis system also raises challenges: most analyst groups are SQL-savvy, yet there is no mature SQL interface on Hadoop, and full OLAP capability is not yet available in the Hadoop ecosystem. This paper therefore puts forward a big data oriented distributed OLAP engine.

We first dissect and analyse Mondrian, an open-source traditional OLAP engine framework, in order to understand how a traditional OLAP engine is implemented, especially optimization mechanisms such as materialized views and rewrite techniques. The paper then identifies the disadvantages of the traditional OLAP engine in the big data setting and proposes corresponding strategies for handling big data, together with the distributed features to exploit. The main idea of the big data based distributed OLAP engine is to trade "space" for "time": it makes full use of a scale-out distributed Hadoop cluster to pre-compute and pre-build, as far as possible, data cubes from star-schema relational data into key-value data stored in HBase. When a query arrives, it simply hits the pre-computed entry and returns the result. In addition, this paper studies and analyses HyperLogLog Counting, a cardinality estimation algorithm for massive datasets that plays an important role in the "distinct count" function; compared with the HyperLogLog++ algorithm, it is validated to be unbiased and consistent in terms of mean and variance.

Afterwards, the overall system architecture and component design are presented. On this basis, the paper describes the logical data cube design, the cube building process, the ETL procedure, and the construction of the query engine. Following the component design, it then details the implementation of the query engine, the frontend RESTful server, the storage engine, the coding subsystem, and the job engine, including a summary of the advantages and features of the REST style and of the operations on the Trie tree structure used in the coding subsystem, along with their algorithmic complexity.

Finally, the paper shows a practical application of the big data oriented OLAP engine. A prototype application is built with AngularJS on the frontend and Node.js on the backend. A performance experiment based on TPC-H then compares a traditional OLAP engine with the engine proposed in this paper, and the proposed engine is verified to meet the requirements.
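
The "space for time" idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the sample rows, the dimension names, the functions `build_cube` and `query`, and the use of an in-memory dictionary in place of an HBase table are all assumptions made for illustration. The sketch pre-aggregates a SUM measure for every combination of dimensions (every cuboid) and then answers a query with a single key lookup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows, already joined with their dimension tables.
DIMENSIONS = ("region", "product", "year")
ROWS = [
    {"region": "east", "product": "tv", "year": 2015, "sales": 100},
    {"region": "east", "product": "phone", "year": 2015, "sales": 50},
    {"region": "west", "product": "tv", "year": 2016, "sales": 70},
    {"region": "east", "product": "tv", "year": 2016, "sales": 30},
]


def build_cube(rows, dimensions, measure):
    """Pre-aggregate SUM(measure) for every subset of dimensions (every cuboid)."""
    store = defaultdict(int)  # stands in for a key-value table (assumption)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                # Key = (cuboid identifier, dimension values within that cuboid).
                key = (dims, tuple(row[d] for d in dims))
                store[key] += row[measure]
    return dict(store)


def query(store, **filters):
    """Answer a point query on any dimension combination with one lookup."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    key = (dims, tuple(filters[d] for d in dims))
    return store.get(key, 0)


CUBE = build_cube(ROWS, DIMENSIONS, "sales")
```

With n dimensions the sketch materializes 2^n cuboids, which is the storage cost paid in exchange for constant-time lookups; a call such as `query(CUBE, region="east")` reads one pre-computed entry instead of scanning the rows.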
Keywords/Search Tags: big data, HyperLogLog algorithm, distributed, Hadoop, online analytical processing