Research And Application Of OLAP Key Technologies For Massive Data

Posted on:2020-04-18

Degree:Master

Type:Thesis

Country:China

Candidate:S M Guo

Full Text:PDF

GTID:2428330572476384

Subject:Electronic and communication engineering

Abstract/Summary:


In the field of big data, data volumes are exploding. Quickly mining potentially useful information from massive data has become an important challenge for the database field. On-Line Analytical Processing (OLAP) emerged in response. OLAP addresses the performance bottleneck that traditional databases face when storing massive data, and it simplifies the complicated computation that distributed systems perform when processing such data. It is currently a research hotspot in data mining, with high theoretical value and research significance. In practical business applications, however, OLAP relies on existing data warehouses and other platforms, and still faces problems such as poor handling of concurrent load, uneven resource allocation, and poor user experience. To address these problems, this thesis designs and implements an analysis and query system based on Apache Kylin and the Elasticsearch engine. The system completes OLAP tasks, separates data analysis from data query, makes full use of the distributed computing system, and reduces the pressure on distributed computing, which ultimately speeds up OLAP operations and improves the user experience. The specific work of this thesis is as follows:

(1) Study and summarize the current state of OLAP technology, identify its existing problems in light of actual business needs, research technical models for those problems, and design system solutions.

(2) On a big data platform consisting of Hadoop, Hive, and HBase, design and implement an analysis subsystem based on Apache Kylin. Under high concurrency (concurrency greater than 50) and terabyte-scale data, the average analysis response time is kept within 8 s. The analysis subsystem also optimizes the different stages of data integration and computation, which shortens the average analysis response time and saves storage space for the results.

(3) Design and implement a query subsystem based on Elasticsearch. Under terabyte-scale data, the average query time is kept within 1 s. Based on the characteristics of the system's query process, the accuracy of query results is improved by more than 20% by incorporating word segmentation. In addition, a word segmentation model based on deep learning is studied and implemented, and its segmentation results are compared with those of a traditional word segmentation system to provide a reference for further optimization of the system.
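The separation of analysis and query described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the request shape, the `route` function, and the helper names are hypothetical, and the sketch only builds request payloads without contacting any server. It assumes Kylin's REST query endpoint (`/kylin/api/query`) for aggregate SQL and Elasticsearch's `_search` endpoint with a `match` query, whose optional `analyzer` parameter stands in for the word segmentation step.

```python
# Hypothetical sketch: route analytic requests to Kylin and search requests
# to Elasticsearch. Only request payloads are built; nothing is sent.

KYLIN_QUERY_PATH = "/kylin/api/query"  # Kylin REST endpoint for SQL queries


def build_kylin_request(sql, project, limit=100):
    """Build the JSON body for an aggregate SQL query against a Kylin project."""
    return {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }


def build_es_request(field, text, analyzer=None, size=10):
    """Build an Elasticsearch match query; `analyzer` selects the tokenizer
    (word segmentation) applied to the query text, if given."""
    match = {"query": text}
    if analyzer is not None:
        match["analyzer"] = analyzer
    return {"size": size, "query": {"match": {field: match}}}


def route(request):
    """Return (backend, path, body) for a request dict (hypothetical shape)."""
    kind = request["kind"]
    if kind == "analysis":
        body = build_kylin_request(request["sql"], request["project"])
        return "kylin", KYLIN_QUERY_PATH, body
    if kind == "search":
        path = "/" + request["index"] + "/_search"
        body = build_es_request(
            request["field"], request["text"], request.get("analyzer")
        )
        return "elasticsearch", path, body
    raise ValueError("unknown request kind: " + repr(kind))
```

For example, a request with `kind` set to `"analysis"` yields a body for the Kylin endpoint, while one with `kind` set to `"search"` yields a body for `/<index>/_search`. Keeping the two paths separate means heavy aggregations do not compete with low-latency lookups.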

Keywords/Search Tags:

online analytical processing, distributed computing, Kylin, Elasticsearch


Related items

1	Massive Distributed In-memory Columnar Database Query Engine For On-line Analytical Processing
2	Research On Online Analysis Processing Using Spark
3	Design And Implementation Of Query Optimizer For Massive Distributed Columnar Database
4	Data Stream Online Analytical Processing Technology
5	Distributed Computing Based On The Hunan Post And Its Financial Agent Industry
6	Research On Fast Data Cube Computation Method Based On Spark Platform
7	Research On High Performance MOLAP Technologies For Massive Data
8	Research On Key Technologies Of Distributed Rank-aware Query Processing
9	Research And Application On Link-Based Data Warehouse And Online Analytical Processing
10	Design And Implementation Of Distributed Music Vertical Search Engine Based On Elasticsearch