Research on Online Analytical Processing Using Spark

Posted on: 2016-07-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Liang
GTID: 2348330479953405
Subject: Computer software and theory
Abstract/Summary:
Online analytical processing (OLAP) systems help decision makers perform multi-dimensional analysis of historical enterprise data so that they can make better decisions for enterprise development. In a relational OLAP system built on a distributed computing framework, multi-table joins are the key factor in system performance. It is therefore significant to study how to reuse the results of multi-table joins according to users' habits in order to improve the overall performance of relational OLAP systems.

Based on an analysis of Spark, an in-memory distributed computing framework, and a comparison with other OLAP system architectures, an OLAP system based on Spark is introduced. The architecture is divided into the application layer, the driver layer, the server layer, the compute layer and the storage layer. The main functional modules are the dimension table processing module, the fact table processing module and the multi-dimensional analysis execution module.

For multi-dimensional analysis processing, when Spark processes the dimension and fact tables it reads only the columns involved in the query, which reduces the amount of computation Spark performs. The dimension tables are hierarchically encoded, and the resulting hierarchy codes replace the keys in the fact table to produce an encoded fact table that carries the hierarchy information. The reusability of Spark's working data sets is then used to cache this encoded fact table, so that subsequent multi-dimensional analyses can reuse it, effectively reducing the multi-table join operations in online analytical processing.

Comparative experiments show that the Spark-based OLAP system can reuse the result of joining the dimension tables with the fact table when processing MDX statements with the same dimensions. In terms of overall system efficiency, the Spark-based OLAP system is considerably more efficient than a Hive-based OLAP system.
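
The idea of hierarchy-encoding a dimension table, substituting the codes for the fact table keys, and reusing the cached result can be illustrated with a small sketch. This is not the thesis implementation: it is plain Python with no Spark dependency, and the table contents, the code layout (one integer per hierarchy level) and the names `encode_dimension`, `EncodedFactCache` and `rollup` are all assumptions made for illustration. In Spark, the cached structure would presumably be a persisted in-memory data set rather than a Python list.

```python
from collections import defaultdict

# Hypothetical dimension table: key -> (country, province, city).
REGION_DIM = {
    1: ("China", "Hubei", "Wuhan"),
    2: ("China", "Hubei", "Yichang"),
    3: ("China", "Hunan", "Changsha"),
}

# Hypothetical fact table rows: (region_key, sales).
FACTS = [(1, 100), (2, 50), (3, 70), (1, 30)]


def encode_dimension(dim):
    """Map each dimension key to a tuple holding one integer code per level."""
    codes = {}
    next_child = {}  # parent code -> last child number handed out
    seen = {}        # member path (names) -> code already assigned
    for key in sorted(dim):
        path = dim[key]
        code = ()
        for depth in range(len(path)):
            names = path[: depth + 1]
            if names not in seen:
                n = next_child.get(code, 0) + 1
                next_child[code] = n
                seen[names] = code + (n,)
            code = seen[names]
        codes[key] = code
    return codes


class EncodedFactCache:
    """Builds the encoded fact table once and hands back the same copy after."""

    def __init__(self, dim, facts):
        self.dim, self.facts = dim, facts
        self.encoded = None
        self.builds = 0  # how many times the dimension/fact join was done

    def get(self):
        if self.encoded is None:
            codes = encode_dimension(self.dim)
            # The only join: swap each fact key for its hierarchy code.
            self.encoded = [(codes[k], v) for k, v in self.facts]
            self.builds += 1
        return self.encoded


def rollup(encoded_facts, level):
    """Sum the measure grouped by the first `level` parts of the code."""
    totals = defaultdict(int)
    for code, value in encoded_facts:
        totals[code[:level]] += value
    return dict(totals)


cache = EncodedFactCache(REGION_DIM, FACTS)
by_province = rollup(cache.get(), 2)  # first query performs the join
by_country = rollup(cache.get(), 1)   # second query reuses the cached table
print(by_province, by_country, cache.builds)
```

Because every level of the hierarchy is a prefix of the code, queries at different levels of the same dimension only need a prefix grouping over the cached table, so the join counter stays at one however many rollups are run.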
Keywords/Search Tags:online analytical processing, distributed, coding fact table, hierarchy encoding