
Research On High Performance MOLAP Technologies For Massive Data

Posted on: 2018-05-17
Degree: Master
Type: Thesis
Country: China
Candidate: X J Wang
GTID: 2348330512988945
Subject: Computer application technology
Abstract/Summary:
With the maturing of enterprise information systems and the ever-growing accumulation of data, data analysis plays an increasingly important role in modern enterprises. OLAP (Online Analytical Processing), which mainly refers to real-time multidimensional query and analysis of data in support of decision making, is currently the most commonly used and most effective technology in the field of data analysis. After years of development, the industry has produced many mature OLAP systems, and numerous companies benefit from OLAP technologies. In the era of big data, however, the volume of data that must be processed and analyzed has exploded, and traditional OLAP technologies have hit a serious bottleneck in massive data processing: their responses become very slow, or they cannot process large-scale data at all. Investigating new OLAP technologies and designing OLAP systems for massive data is therefore an urgent problem, and increasingly sophisticated distributed computing frameworks and distributed storage systems provide an effective way to solve it.

Building on current theoretical and technical foundations, this thesis investigates OLAP technologies for massive data and, based on the results of this research, designs a precomputation-based MOLAP (Multidimensional OLAP) prototype framework. The idea of MOLAP is to generate a data cube by precomputing the results of possible queries, so that queries are answered faster. In view of the characteristics of massive data, this thesis focuses on the following OLAP-related issues: 1) properly handling the curse of dimensionality in precomputation over massive data; 2) designing distributed cube precomputation algorithms for massive data; 3) effectively handling the data growth problem in precomputation; 4) reasonably responding to changes in the multidimensional model caused by changing business needs.

The MOLAP prototype framework built on these results uses Spark as the computation framework and HBase as the main storage component, uses Calcite to implement the SQL query engine, and provides Web-based visual query and analysis components. Thanks to its powerful distributed architecture, efficient cube precomputation algorithms, and cube optimization strategies, the framework can perform precomputation and generate data cubes for massive data, providing low-latency SQL queries and efficient OLAP analysis services. Finally, this thesis tested the prototype framework with SSB (Star Schema Benchmark), the benchmark most commonly used in the industry, focusing on cube precomputation speed, cube storage footprint, and query response time, and compared the framework with other typical big data OLAP systems in the industry. The experiments show that, in most application scenarios, the prototype framework has a large query performance advantage over popular industry OLAP systems based on real-time computing, while the time taken by precomputation and the storage footprint of the cubes are also acceptable.
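The MOLAP idea described in the abstract, precomputing query results into a data cube, can be illustrated with a small in-memory sketch. It is not the thesis's Spark-based distributed algorithm: the function names `build_cube` and `lookup`, the toy rows, and the SSB-style column names are all hypothetical, and a single-machine loop stands in for a distributed job. Every subset of the dimensions (a cuboid) gets its own precomputed SUM, so an equality-filtered query becomes a dictionary lookup instead of a scan over the fact rows.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy fact table; the column names only loosely mimic SSB.
ROWS = [
    {"year": 1997, "region": "ASIA", "brand": "B1", "revenue": 10},
    {"year": 1997, "region": "EUROPE", "brand": "B1", "revenue": 20},
    {"year": 1998, "region": "ASIA", "brand": "B2", "revenue": 5},
]
DIMS = ("year", "region", "brand")


def build_cube(rows, dims, measure):
    """Precompute SUM(measure) for every cuboid, i.e. every subset of dims."""
    cube = {}
    for k in range(len(dims) + 1):
        for cuboid in combinations(dims, k):  # preserves the order of dims
            agg = defaultdict(int)
            for row in rows:
                agg[tuple(row[d] for d in cuboid)] += row[measure]
            cube[cuboid] = dict(agg)
    return cube


def lookup(cube, dims, filters):
    """Answer an equality-filtered SUM query from the cube without scanning rows."""
    cuboid = tuple(d for d in dims if d in filters)  # same order as build_cube
    key = tuple(filters[d] for d in cuboid)
    return cube[cuboid].get(key, 0)


CUBE = build_cube(ROWS, DIMS, "revenue")
print(len(CUBE))                                              # 8 cuboids = 2 ** 3
print(lookup(CUBE, DIMS, {"year": 1997}))                     # 30
print(lookup(CUBE, DIMS, {"region": "ASIA", "brand": "B2"}))  # 5
print(lookup(CUBE, DIMS, {}))                                 # 35 (grand total)
```

With n dimensions the sketch materializes 2^n cuboids (8 for three dimensions), and that exponential growth is the curse of dimensionality the thesis lists as its first issue. Practical systems usually materialize only a selected subset of cuboids; the abstract does not say which strategy the prototype uses.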
Keywords/Search Tags: Online Analytical Processing, MOLAP, Distributed Processing