Research On Fast Data Cube Computation Method Based On Spark Platform

Posted on:2017-10-07

Degree:Master

Type:Thesis

Country:China

Candidate:C R L Sa

GTID:2348330488489481

Subject:Computer application technology

Abstract/Summary:

With the advent of big data, organizing and analyzing data has increasingly become a bottleneck for many industries, and the demand for high-performance real-time analysis keeps growing. Traditional online analytical processing (OLAP) responds poorly in real time when handling large data volumes, so OLAP systems face higher application requirements. Spark is a distributed in-memory computing framework that offers lightweight, fast processing, compatibility with the Hadoop ecosystem, a low learning cost, an active community, and support for multiple programming languages. Since its release, Spark has been widely adopted, and it offers a new approach to data analysis in real-time OLAP systems. This thesis first summarizes the state of research on OLAP in the big data setting, then surveys the most popular big data processing platforms and frameworks, and, given the fast-response requirement of real-time OLAP systems, selects Spark as the basis for the research. The study builds on a combined analysis of the Spark in-memory parallel computing framework, the Bloom Filter algorithm, and the BUC algorithm, and focuses on fast cube computation algorithms. The BUC algorithm is parallelized on the Spark computing framework and adapted to distributed computing, which effectively improves the efficiency of cube computation. For the star join, a multi-dimensional Bloom Filter star join algorithm is implemented to handle join operations between tables with large data volumes.
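The idea behind a Bloom Filter star join can be illustrated with a minimal single-process sketch. It is not the thesis's implementation: the class `BloomFilter`, the function `bloom_star_join`, the filter sizes, and the dictionary-based table layout are all assumptions made for illustration. One filter is built per dimension table, fact rows are pre-filtered against all filters at once, and an exact lookup then removes any Bloom false positives. On Spark, the filters would presumably be shipped to the workers (for example as broadcast variables), but that part is not shown here.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Forcing h2 to be odd keeps the stride non-zero.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def bloom_star_join(fact_rows, dimensions):
    """Join fact rows with several dimension tables (hypothetical layout).

    fact_rows:  list of dicts, each holding foreign keys and measures.
    dimensions: maps a foreign-key column name to {key: attribute dict},
                containing only the dimension rows that passed their predicates.
    """
    # Build one Bloom filter per dimension over its surviving keys.
    filters = {}
    for fk, dim in dimensions.items():
        bf = BloomFilter()
        for key in dim:
            bf.add(key)
        filters[fk] = bf

    joined = []
    for row in fact_rows:
        # Cheap pre-filter: drop rows that fail any dimension's filter.
        if not all(row[fk] in bf for fk, bf in filters.items()):
            continue
        # Exact probe removes Bloom filter false positives.
        if all(row[fk] in dim for fk, dim in dimensions.items()):
            out = dict(row)
            for fk, dim in dimensions.items():
                out.update(dim[row[fk]])
            joined.append(out)
    return joined
```

Because a Bloom filter never produces false negatives, the pre-filter can only discard rows that would not join anyway; the exact probe afterwards keeps the result correct even when the filter lets some extra rows through.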
Performance comparison and analysis verify that the proposed scheme is suitable for fast cube computation on big data. The results improve the speed of cube computation and the real-time analysis performance of online analytical processing, making this a useful exploration of online analytical processing with Spark.
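For readers unfamiliar with BUC, the following is a minimal sequential sketch of bottom-up cube computation with a COUNT measure. It does not reproduce the parallel Spark version described in the abstract: the function name `buc`, the `min_support` parameter, and the use of `"*"` for an aggregated dimension are assumptions chosen for illustration. The sketch partitions the data one dimension at a time, recurses into each partition, and prunes partitions that fall below the support threshold.

```python
def buc(rows, dims, min_support=1):
    """Return {cell: count} for a list of dict rows.

    A cell is a tuple with one entry per dimension in `dims`;
    "*" marks a dimension that has been aggregated away.
    """
    cube = {}

    def recurse(partition, start, cell):
        # Record the aggregate for the current cell.
        cube[tuple(cell)] = len(partition)
        for d in range(start, len(dims)):
            # Partition the current rows on dimension d.
            groups = {}
            for row in partition:
                groups.setdefault(row[dims[d]], []).append(row)
            for value, group in groups.items():
                # Pruning: a partition below min_support cannot contain
                # any finer cell that meets the threshold.
                if len(group) >= min_support:
                    cell[d] = value
                    recurse(group, d + 1, cell)
            cell[d] = "*"  # restore before moving to the next dimension

    if len(rows) >= min_support:
        recurse(rows, 0, ["*"] * len(dims))
    return cube
```

Calling `buc(rows, ["city", "year"], min_support=2)` on a list of row dictionaries returns only the cells whose count is at least 2. The partitions produced at the first level are independent of each other, which is one plausible place to distribute the work across workers; whether the thesis parallelizes BUC this way is not stated in the abstract.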

Keywords/Search Tags:

online analytical processing, big data, distributed computing, Spark, data cube

Related items

1	Novel techniques for data warehousing and online analytical processing in emerging applications
2	Research On The Efficient Materialization And Fast Query Of Condensed Data Cube
3	OLAP Algorithm Research Based On Dimension Hierarchy For Data Cube
4	Design And Implementation Of Online Marketing Data Analysis Platform Based On The Materialized Data Cube
5	Online Analytical Processing And Applications
6	Data Warehouse In The Erp Application
7	Research On Distributed Query Of Quotient Cube Based On Spark
8	Data Stream Online Analytical Processing Technology
9	Multidimensional Data Model For Mining And Analysis Based On Multiple Structure Data Cube
10	Research On The Technology Of Label Cube