Research And Application Of OLAP Key Technologies For Massive Data

Posted on: 2020-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: S M Guo
GTID: 2428330572476384
Subject: Electronic and communication engineering
Abstract/Summary:
In the field of big data, data volumes are growing explosively, and quickly mining useful information from massive data has become an important challenge for the database field. On-line Analytical Processing (OLAP) emerged in response. OLAP addresses the performance bottleneck that traditional databases face when storing massive data, and it simplifies the complicated computation that distributed systems perform when processing such data. It is now a research hotspot in data mining, with high theoretical value and research significance. In practical business applications, however, OLAP depends on existing data warehouses and other platforms, and it still suffers from limited capacity for concurrent requests, uneven resource allocation, and poor user experience.

To address these problems, this thesis designs and implements an analysis and query system based on Apache Kylin and the Elasticsearch engine. The system carries out OLAP tasks, separates data analysis from data query, makes full use of the distributed computing system, and reduces the load on distributed computing, which ultimately speeds up OLAP operations and improves the user experience. The specific work of this thesis is as follows:

(1) Survey and summarize the current state of OLAP technology, identify the existing problems in light of actual business needs, study technical models that address those problems, and design the system solution.

(2) On a big data platform consisting of Hadoop, Hive and HBase, design and implement the analysis subsystem based on Apache Kylin. Under high concurrency (more than 50 concurrent requests) and terabyte-scale data, the average analysis response time is kept within 8s. The analysis subsystem also optimizes several stages of the data integration computation, which shortens the average analysis response time and reduces the storage space required for the results.

(3) Design and implement the query subsystem based on Elasticsearch. Under terabyte-scale data, the average data query time is kept within 1s. Based on the characteristics of the system's query process, the accuracy of query results is improved by more than 20% by incorporating word segmentation methods. In addition, a word segmentation model based on deep learning is studied and implemented, and its segmentation quality is compared with that of the traditional word segmentation system to provide a reference for further optimization of the system.
Keywords/Search Tags: online analytical processing, distributed computing, Kylin, Elasticsearch