Research On Fast Data Cube Computation Method Based On Spark Platform

Posted on: 2017-10-07    Degree: Master    Type: Thesis
Country: China    Candidate: C R L Sa    Full Text: PDF
GTID: 2348330488489481    Subject: Computer application technology
Abstract/Summary:
With the advent of big data, organizing and analyzing data has increasingly become a bottleneck for development in many industries, and the demand for high performance in real-time data analysis keeps rising. Traditional online analytical processing (OLAP) responds poorly in real time when processing large data volumes, so OLAP systems face higher application requirements. Spark is a distributed in-memory computing framework that offers lightweight, fast processing, compatibility with the Hadoop ecosystem, a low learning cost, an active community, and support for multiple programming languages. Since its release, Spark has been widely used, and it offers a new approach to building real-time OLAP data analysis systems. This thesis first summarizes the state of research on OLAP in the context of big data, then surveys the most popular big data processing platforms and frameworks currently in use. Given the fast-response requirement of real-time OLAP systems, Spark is chosen as the basis for the research. The study draws on the Spark in-memory parallel computing framework, the Bloom Filter algorithm, and the BUC algorithm, and focuses on fast cube computation algorithms. The BUC algorithm is parallelized on the Spark computing framework and adapted to distributed computing, which effectively improves the efficiency of cube computation. For the star join, a multi-dimensional Bloom Filter star join algorithm is implemented to handle join operations between tables with large data volumes.
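To illustrate the general idea behind a Bloom-filter-assisted star join, the following is a minimal single-machine sketch in plain Python. It is not the thesis's Spark implementation, whose details the abstract does not give. The names `BloomFilter` and `star_join_prefilter`, the filter size, and the hash scheme are all assumptions made for illustration. One filter is built per dimension table from its join keys, and a fact row is kept only if every foreign key may be present in the corresponding filter.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter (sizes chosen for illustration only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key!r}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(key))


def star_join_prefilter(fact_rows, dim_keys):
    """Keep fact rows whose foreign keys pass every dimension's filter.

    dim_keys is a list of (fact_column_index, iterable_of_dimension_keys).
    """
    filters = []
    for column, keys in dim_keys:
        bloom = BloomFilter()
        for key in keys:
            bloom.add(key)
        filters.append((column, bloom))
    return [
        row
        for row in fact_rows
        if all(bloom.might_contain(row[column]) for column, bloom in filters)
    ]
```

Because a Bloom filter can report false positives but never false negatives, the surviving rows are a superset of the true join result, so an exact join is still needed afterwards. The filter only reduces how many fact rows reach that join.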
Performance comparison and analysis verify that the proposed scheme is suitable for fast cube computation tasks in the context of big data. The results of this study improve the speed of cube computation and the real-time analysis performance of online analytical processing, and represent a useful attempt at applying Spark in the field of online analytical processing.
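For readers unfamiliar with BUC (Bottom-Up Computation), the sketch below shows the basic sequential recursion in plain Python. It is an illustration under stated assumptions, not the parallel Spark version developed in the thesis. The function name `buc`, the `"*"` marker for an aggregated dimension, and the use of a row count as the only measure are hypothetical choices. Each call emits the aggregate for the current cell, then partitions the rows on each remaining dimension and recurses only into partitions that meet the minimum support.

```python
from collections import defaultdict

ALL = "*"  # assumed marker for "this dimension is aggregated away"


def buc(rows, num_dims, min_support=1):
    """Return {cell: row count} for every cube cell meeting min_support.

    rows is a list of tuples whose first num_dims fields are dimension values.
    """
    cube = {}

    def recurse(partition, start_dim, cell):
        # Emit the aggregate (here a simple count) for the current cell.
        cube[tuple(cell)] = len(partition)
        # Refine the cell on each remaining dimension in turn.
        for d in range(start_dim, num_dims):
            groups = defaultdict(list)
            for row in partition:
                groups[row[d]].append(row)
            for value, group in groups.items():
                # Prune: a partition below min_support cannot yield
                # any qualifying finer-grained cell.
                if len(group) >= min_support:
                    cell[d] = value
                    recurse(group, d + 1, cell)
            cell[d] = ALL  # restore before moving to the next dimension

    if len(rows) >= min_support:
        recurse(rows, 0, [ALL] * num_dims)
    return cube
```

For example, `buc([("a", "x"), ("a", "y"), ("b", "x")], 2, min_support=2)` keeps only the cells `("*", "*")`, `("a", "*")`, and `("*", "x")`, because every finer partition has fewer than two rows and is pruned.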
Keywords/Search Tags:online analytical processing, big data, distributed computing, Spark, data cube