Design And Implementation Of A Commercial Advertising Data OLAP System Based On Spark And Kylin

Posted on: 2020-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: H C Yu
Full Text: PDF
GTID: 2428330578952552
Subject: Software engineering
Abstract/Summary:
An OLAP system addresses the need to query massive data sets and is widely used in sales, marketing, statistics, and other fields to support enterprise decision-making. Data warehousing and multidimensional analysis have become the mainstream approaches to OLAP in industry. For data computing, Spark is a relatively mature and widely used big data computing engine. Kylin is an integrated OLAP solution that generates data cubes through pre-computation to provide ultra-high-speed query services.

Based on the definition, analysis, and collation of the data, this thesis cleans, transforms, and models the data, builds the data warehouse, and designs the data cube. It then analyzes the system requirements, outlines the design, and describes the implementation in detail. With this system, users can create tasks either by selecting query dimensions or by writing SQL statements, and can observe PV, CTR, revenue, and other advertising metrics from different perspectives. The system also provides user permission configuration and task queue management.

The system is characterized by task scheduling across multiple computing engines and by an architecture designed for concurrency, scalability, and efficiency. It uses Hive as the data warehouse and Spark and Kylin as the computing engines, supports automatic engine switching and expansion, and is developed mainly in Golang and Scala. The overall architecture separates three tiers: the front end handles user interaction and information display; the back end handles permission management and task management; and the data end handles task scheduling and interaction with cluster resources. The tiers communicate through APIs and a message queue. In addition, the system encodes task messages with Google Protocol Buffers for efficiency and cross-language consistency, and uses the Actor model to build a task scheduling system with high concurrency and fault tolerance. In actual usage tests, the system returned results for common query tasks within seconds, with considerable availability.
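
The abstract describes tasks created from selected query dimensions and routed to either Kylin or Spark, but it does not give the routing rule or the schema. The Go sketch below is only an illustration under stated assumptions: the table name `ad_facts`, the column names, the metric expressions, and the rule "use Kylin when every selected dimension is covered by the pre-computed cube, otherwise fall back to Spark" are all hypothetical and are not taken from the thesis.

```go
package main

import (
	"fmt"
	"strings"
)

// Engine names a compute backend. The two values mirror the engines named in
// the abstract; everything else in this file is a hypothetical illustration.
type Engine string

const (
	EngineKylin Engine = "kylin"
	EngineSpark Engine = "spark"
)

// Task is a hypothetical query task built from the user's selections.
type Task struct {
	Dimensions []string // columns to group by
	Metrics    []string // metric names, looked up in metricSQL
}

// warehouseDims lists dimensions assumed to exist in the Hive warehouse.
var warehouseDims = map[string]bool{"ad_date": true, "region": true, "advertiser_id": true}

// cubeDims lists the subset of dimensions assumed to be pre-computed in the cube.
var cubeDims = map[string]bool{"ad_date": true, "region": true}

// metricSQL maps assumed metric names to aggregate expressions.
var metricSQL = map[string]string{
	"pv":      "SUM(pv) AS pv",
	"ctr":     "SUM(clicks) / SUM(pv) AS ctr",
	"revenue": "SUM(revenue) AS revenue",
}

// BuildSQL turns a task into a GROUP BY query. Dimensions and metrics are
// checked against the allowlists above, so no raw user text reaches the SQL.
func BuildSQL(table string, t Task) (string, error) {
	if len(t.Metrics) == 0 {
		return "", fmt.Errorf("task needs at least one metric")
	}
	cols := make([]string, 0, len(t.Dimensions)+len(t.Metrics))
	for _, d := range t.Dimensions {
		if !warehouseDims[d] {
			return "", fmt.Errorf("unknown dimension %q", d)
		}
		cols = append(cols, d)
	}
	for _, m := range t.Metrics {
		expr, ok := metricSQL[m]
		if !ok {
			return "", fmt.Errorf("unknown metric %q", m)
		}
		cols = append(cols, expr)
	}
	sql := "SELECT " + strings.Join(cols, ", ") + " FROM " + table
	if len(t.Dimensions) > 0 {
		sql += " GROUP BY " + strings.Join(t.Dimensions, ", ")
	}
	return sql, nil
}

// ChooseEngine applies the assumed routing rule: Kylin if the cube covers
// every selected dimension, Spark otherwise.
func ChooseEngine(t Task) Engine {
	for _, d := range t.Dimensions {
		if !cubeDims[d] {
			return EngineSpark
		}
	}
	return EngineKylin
}

func main() {
	task := Task{Dimensions: []string{"ad_date", "region"}, Metrics: []string{"pv", "ctr"}}
	sql, err := BuildSQL("ad_facts", task)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ChooseEngine(task), "<-", sql)
}
```

In this sketch a task grouped by `advertiser_id` would be routed to Spark, because that dimension is assumed to be absent from the cube. A real deployment would derive the covered dimensions from the cube metadata rather than from a hard-coded map.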
Keywords/Search Tags: OLAP, Spark, Kylin, Commercial Advertising Data