
Design and Implementation of a Cloud Product Data Multidimensional Analysis System

Posted on: 2018-02-05  Degree: Master  Type: Thesis
Country: China  Candidate: Z Z Wang  Full Text: PDF
GTID: 2348330512982142  Subject: Software engineering
Abstract/Summary:
In recent years, with the rapid development of the cloud computing industry, Internet companies whose main business is cloud services have also grown rapidly. The company where the author interned currently has nearly 13 cloud product lines. These product lines generate large volumes of scattered business data every day, and the log data has reached the PB level. The open problems are how to conduct a unified business analysis across the cloud products, discover operational problems in time, and forecast trends in market demand. To address them, this thesis ingests the business data of the product lines and builds a unified, fast-responding cloud product data reporting platform that provides multidimensional, in-depth report queries, data statistics, data analysis, data prediction and other functions.

In developing the cloud product data multidimensional analysis system, the author first completed the statistical analysis of data subjects and data dimensions and participated in the interface prototype design of the multidimensional analysis system. Second, the author participated in the research on OLAP (On-line Analytical Processing) engines and in the design of the log processing program. Then, in the system design and implementation stage, the author independently designed and implemented the multidimensional analysis subsystem, the Trainer subsystem and the Log to ORC (Optimized Row Columnar) subsystem, and created the data Model and Cube. In the testing phase, the author independently completed functional testing of the relevant subsystems.

The multidimensional analysis system is based on the Apache Kylin OLAP engine, and the Cube is precomputed to achieve fast response for multidimensional queries. The multidimensional analysis subsystem dynamically builds SQL according to complex business logic and then further processes the query results to support more effective decision-making. The Trainer subsystem is responsible for the regular synchronization of data and the automatic construction of the Cube, providing data support for the multidimensional analysis subsystem and ensuring data consistency. The Log to ORC subsystem is responsible for ETL (Extract-Transform-Load) processing for each product line. It uses Spark SQL to process log data in parallel, which greatly improves processing speed, and it stores the processed data in ORC, a more efficient compressed file format, which improves the performance of subsequent data processing. At present, development and testing of the cloud product data multidimensional analysis system are complete and the system is in trial operation; its functions are stable and it has achieved the desired goals.
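To illustrate why a precomputed Cube gives fast multidimensional queries, here is a minimal in-memory sketch of the general idea: aggregate a measure once for every combination of dimensions (every cuboid), then answer each GROUP BY query with a lookup. This is not how Apache Kylin is implemented (its cube build is distributed and far more elaborate), and the names `build_cube`, `query`, and the sample records and field names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Precompute SUM(measure) for every subset of dimensions (every cuboid)."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


def query(cube, group_by):
    """Answer a GROUP BY query by lookup instead of scanning the raw rows.

    group_by must list dimensions in the same order used in build_cube.
    """
    return cube[tuple(group_by)]


# Hypothetical business records, not taken from the thesis.
rows = [
    {"product": "storage", "region": "north", "day": "2017-06-01", "revenue": 120.0},
    {"product": "storage", "region": "south", "day": "2017-06-01", "revenue": 80.0},
    {"product": "compute", "region": "north", "day": "2017-06-02", "revenue": 200.0},
    {"product": "compute", "region": "north", "day": "2017-06-01", "revenue": 50.0},
]

cube = build_cube(rows, ("product", "region", "day"), "revenue")
by_product = query(cube, ("product",))
by_product_region = query(cube, ("product", "region"))
grand_total = query(cube, ())
```

With three dimensions the sketch materializes 2^3 = 8 cuboids. The trade-off is the usual one for precomputation: build time and storage grow with the number of dimension combinations, in exchange for query cost that no longer depends on the size of the raw data.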
Keywords/Search Tags: Multidimensional analysis, OLAP, Cloud Products, Apache Kylin, Spark SQL