
Research On Reverse Skyline Query Algorithm Based On SR Tree Under MapReduce Model

Posted on: 2019-08-18 | Degree: Master | Type: Thesis
Country: China | Candidate: L P Han
GTID: 2428330548959207 | Subject: Engineering
Abstract/Summary:
In recent years, with the advancement of science and technology, the amount of data has grown explosively, and traditional databases and computing methods face tremendous challenges. Traditional databases cannot conveniently store massive data, and traditional computing methods cannot process it in real time. Moreover, as data volumes grow rapidly, the workflows of traditional algorithms can no longer meet requirements for real-time response, efficiency, and stability. How to make these algorithms retrieve data efficiently, with low latency and high precision, in a massive-data environment is therefore an urgent problem.

Distributed architectures make it possible to process large amounts of data in parallel. Hadoop is one of the most effective distributed architectures for handling massive data. HDFS stores large amounts of data in a fault-tolerant, distributed manner and manages it with a typical master-slave architecture. The MapReduce programming model expresses computation in terms of key-value pairs and reverses the traditional practice of moving data to the computation: the computation is instead sent to where the data resides, which reduces I/O operations and enables large-scale parallel processing.

Although a distributed architecture can process massive data in parallel, the traditional MapReduce-based Reverse Skyline algorithm still has deficiencies. Although it runs in parallel, it traverses all of the data for every query, which incurs heavy overhead, lowers query efficiency, and wastes a great deal of computing resources; building an index is therefore necessary. However, because query conditions and query positions change, the traditional Reverse Skyline query
algorithm cannot reuse its index structure across different queries, and its pruning efficiency is limited. The key to improving query efficiency is therefore to combine the Reverse Skyline algorithm effectively with an index and to maximize pruning.

Based on the characteristics of the traditional Reverse Skyline query algorithm and on the need to accelerate massive data processing, this thesis proposes the MRRSL algorithm (MapReduce-based Reverse Skyline query algorithm). Building on the indexing mechanism, and in order to speed up queries, improve query efficiency, and refine the pruning granularity, we define a filter set consisting of a candidate filter set and a judgment filter set. The filtered data are used to prune the index, which greatly increases computation speed and efficiency and yields finer-grained pruning. For the overall structure of the algorithm, we define a MapReduce chain that integrates closely with the indexing mechanism to achieve parallel operation in a distributed environment. The index needs to be built only once and can then be reused for different query conditions and query positions, which greatly improves query efficiency. During query processing, we define a candidate set and a judgment set, generated from the candidate filter set and the judgment filter set, respectively. The judgment set is used to filter the candidate set directly, without constructing intermediate judgment data points, which greatly reduces storage space.
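The abstract does not spell out what a reverse skyline query returns or how a judgment set filters a candidate set. The Python sketch below illustrates the standard definition under stated assumptions: a point p belongs to the reverse skyline of a query q if no other point dominates q in the dynamic space centred at p (every coordinate no farther from p, at least one strictly closer). The MapReduce job is simulated in memory, no SR-tree is built, and the names `dynamically_dominates`, `map_phase`, `shuffle`, `reduce_phase`, and `reverse_skyline` are hypothetical illustrations, not the thesis's MRRSL implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]


def dynamically_dominates(a: Point, b: Point, ref: Point) -> bool:
    """True if `a` dominates `b` in the dynamic space centred at `ref`:
    `a` is no farther from `ref` than `b` in every dimension and strictly
    closer in at least one."""
    da = [abs(x - r) for x, r in zip(a, ref)]
    db = [abs(y - r) for y, r in zip(b, ref)]
    return all(x <= y for x, y in zip(da, db)) and any(x < y for x, y in zip(da, db))


def map_phase(candidates: Iterable[Point], judgment: List[Point],
              q: Point) -> Iterator[Tuple[Point, bool]]:
    """Hypothetical mapper: emit (c, False) so every candidate reaches the
    reducer, plus (c, True) for each judgment point that dominates q
    with respect to c."""
    for c in candidates:
        yield c, False
        for j in judgment:
            if j != c and dynamically_dominates(j, q, c):
                yield c, True


def shuffle(pairs: Iterable[Tuple[Point, bool]]) -> Dict[Point, List[bool]]:
    """In-memory stand-in for the MapReduce shuffle: group values by key."""
    groups: Dict[Point, List[bool]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[Point, List[bool]]) -> List[Point]:
    """Hypothetical reducer: keep candidates that no judgment point pruned."""
    return sorted(c for c, flags in groups.items() if not any(flags))


def reverse_skyline(points: List[Point], q: Point) -> List[Point]:
    """Unindexed baseline: every point is both a candidate and a judgment
    point, so each query scans the whole dataset (assumes distinct points)."""
    return reduce_phase(shuffle(map_phase(points, points, q)))


if __name__ == "__main__":
    data = [(1, 1), (2, 2), (5, 5)]
    print(reverse_skyline(data, (3, 3)))  # [(2, 2), (5, 5)]
```

In this baseline the full dataset is passed as both the candidate set and the judgment set, which is the full-traversal cost the abstract attributes to the traditional algorithm. In an index-based approach such as MRRSL, the index and filter sets would presumably shrink both inputs before the map phase; how the thesis derives them is not specified here.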
Keywords/Search Tags: MapReduce model, Reverse Skyline query, Distributed architecture, Massive data, Index, SR-tree