Research And Implementation Of Big Data Oriented Distributed OLAP Engine

Posted on:2016-05-17

Degree:Master

Type:Thesis

Country:China

Candidate:J L Wei

Full Text:PDF

GTID:2348330512470874

Subject:Software engineering

Abstract/Summary:

In the big data era, more and more data is becoming available on Hadoop. Existing Business Intelligence (BI) tools have limitations in this setting, such as limited support for Hadoop, exponentially growing data sizes, and high latency for interactive queries. Adopting Hadoop as an interactive analysis system also poses challenges: most analyst groups are SQL-savvy, yet there is no mature SQL interface on Hadoop, and full OLAP capability in the Hadoop ecosystem is not yet ready. This paper therefore proposes a big data oriented distributed OLAP engine. We first dissect and analyze Mondrian, an open-source traditional OLAP engine framework, to understand how a traditional OLAP engine is implemented, in particular optimization mechanisms such as materialized views and query rewriting. The paper then identifies the disadvantages of traditional OLAP engines in the big data setting and proposes corresponding strategies for handling big data, together with the distributed features to exploit. The main idea of the big data based distributed OLAP engine is to trade "space" for "time": it makes full use of a scale-out distributed Hadoop cluster to precompute and pre-build data cubes as far as possible, converting star-schema relational data into key-value data stored in HBase. When a query arrives, it simply looks up the matching precomputed entry and returns the result. In addition, the paper studies and analyzes a cardinality estimation algorithm for massive datasets, HyperLogLog Counting, which plays an important role in the "distinct count" function; in comparison with the HyperLogLog++ algorithm, it is validated to be unbiased and consistent in terms of mean and variance. Next, the overall system architecture and component design are presented. On this basis, the paper describes the logical data cube design, the cube building process, the ETL procedure, and the construction of the query engine. Following the component design, it then details the implementation of the query engine, the frontend RESTful server, the storage engine, the coding subsystem, and the job engine in turn, including a summary of the advantages and features of the REST style and of the operations on the Trie tree structure used in the coding subsystem, together with their algorithmic complexity. Finally, the paper shows a practical application of the big data oriented OLAP engine: a prototype application is built with AngularJS on the frontend and Node.js on the backend. A performance experiment based on TPC-H then compares a traditional OLAP engine with the engine proposed in this paper and verifies that the proposed engine meets the requirements.
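
The abstract does not include the thesis's own HyperLogLog code, so the following is only a minimal sketch of the general technique behind an approximate "distinct count", written in Python for illustration. The class name `HyperLogLog`, the precision parameter `p`, and the choice of SHA-1 as the hash function are all assumptions made for this example, not details taken from the thesis. The sketch uses the standard raw estimate with a linear-counting correction for small cardinalities; it does not implement the additional bias correction of HyperLogLog++.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog estimator (hypothetical, not the thesis code)."""

    def __init__(self, p: int = 12) -> None:
        if not 4 <= p <= 16:
            raise ValueError("p must be between 4 and 16")
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Standard bias-correction constant alpha_m.
        if self.m == 16:
            self.alpha = 0.673
        elif self.m == 32:
            self.alpha = 0.697
        elif self.m == 64:
            self.alpha = 0.709
        else:
            self.alpha = 0.7213 / (1 + 1.079 / self.m)

    @staticmethod
    def _hash64(item: str) -> int:
        # Assumption: the first 8 bytes of SHA-1 serve as a 64-bit hash.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        width = 64 - self.p
        idx = x >> width                     # top p bits pick the register
        rest = x & ((1 << width) - 1)        # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`
        # (width + 1 if `rest` is zero).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other: "HyperLogLog") -> None:
        # Sketches of the same precision merge by register-wise maximum.
        if other.p != self.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


if __name__ == "__main__":
    hll = HyperLogLog(p=12)
    for i in range(100_000):
        hll.add(f"user-{i}")
    print(round(hll.count()))  # close to 100000, typically within a few percent
```

Because two sketches merge by taking the register-wise maximum, partial sketches built independently on different nodes can be combined into a sketch of the union. This mergeability is what makes the structure a natural fit for precomputed, distributed aggregates; how the thesis actually stores and merges its sketches is not stated in the abstract.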

Keywords/Search Tags:

big data, HyperLogLog algorithm, distributed, Hadoop, online analytical processing

Related items

1	Design And Implementation Of Online Data Processing System Based On Hadoop
2	Research And Implementation Of Distributed ETL Based On Hadoop Platform
3	The Research And Analysis Of Hadoop Small File Processing Method
4	Design And Implementation Of Distributed Query Algorithm Processing Communication Data Based On Hadoop
5	Design And Implementation Of A Distributed Data Processing Platform For The Online Music Service
6	Research And Implementation On Incremental Data Processing Algorithm Based On Hadoop
7	Massive Data Processing Application Based On Hadoop
8	Analysis And Implementation Of Shopping Online System Based On Hadoop
9	Research On Distributed Processing Of Massive Video Data Based On Hadoop
10	The Performances Of Distributed Big Data Processing Modes In High-speed Traffic Network