Skyline queries are widely used to solve multi-objective decision-making problems and have become a research hotspot in spatial databases and spatio-temporal data retrieval. With the arrival of the big data era, the scale of the spatial data sets that must be collected and processed is growing rapidly, and how to perform efficient Skyline queries on massive data has received unprecedented attention. Because a Skyline query is both I/O-intensive and CPU-intensive, running it over massive data on a single machine is time-consuming and gives poor real-time performance; executing it on a distributed architecture is an effective way to address this problem. The MapReduce programming model makes it possible to use clusters of commodity computers to handle large-scale computing tasks. Hadoop, released by the Apache Foundation, is an open-source implementation of the MapReduce model. It encapsulates the complexities of resource scheduling, error handling, and other processing details, offers good fault tolerance and scalability, and is widely used for large-scale data processing. A MapReduce-based Skyline query can run on a distributed cluster to query massive spatial data and return the correct result set. However, as a general-purpose distributed computing framework, MapReduce does not provide a built-in spatial data indexing mechanism. A spatial query therefore has to traverse all of the input data, so query efficiency is very low. Consequently, when performing Skyline queries on the MapReduce model, we need to consider how to partition the original data set reasonably according to position information and how to use an efficient index structure to prune the data. In this paper, a hierarchical spatial data indexing mechanism based on the R-tree is designed to optimize parallel Skyline queries and dynamic Skyline queries over massive data. Pruning with the index at both the global and local levels effectively reduces the amount of data to be scanned and the number of invalid comparisons. For the dynamic Skyline query algorithm, after local pruning is completed, a set of data points with strong dominance ability is extracted to filter out irrelevant data further and improve query speed.
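
The local-then-global pruning idea and the notion of a dynamic Skyline can be illustrated with a minimal Python sketch. It is not the R-tree-based index designed in this paper: partitions are formed simply by sorting on the first coordinate, which stands in for spatial partitioning, and the map and reduce phases are simulated inside one process rather than on Hadoop. The names `dominates`, `skyline`, `partitioned_skyline`, and `dynamic_skyline` are hypothetical, and smaller values are assumed to be preferable in every dimension.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Block-nested-loop Skyline: keep a window of currently undominated points."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated, so it cannot be a Skyline point
        # p survives; drop any window points that p dominates
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def partitioned_skyline(points: Sequence[Point], num_parts: int = 4) -> List[Point]:
    """Simulated two-phase Skyline: local pruning per partition, then a global merge."""
    ordered = sorted(points)  # assumption: sort on first coordinate as a stand-in partitioner
    size = max(1, -(-len(ordered) // num_parts))  # ceiling division
    parts = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    local = [skyline(part) for part in parts]        # "map" phase: local Skylines
    candidates = [p for sky in local for p in sky]   # only local survivors are shuffled
    return skyline(candidates)                       # "reduce" phase: global Skyline


def dynamic_skyline(points: Sequence[Point], query: Point) -> List[Point]:
    """Dynamic Skyline: Skyline of the points after mapping each to |p_i - q_i|."""
    pairs = [(tuple(abs(a - b) for a, b in zip(p, query)), p) for p in points]
    return [p for t, p in pairs if not any(dominates(t2, t) for t2, _ in pairs)]


if __name__ == "__main__":
    data = [(1, 9), (2, 7), (3, 8), (4, 4), (6, 5), (7, 2), (8, 3), (9, 1), (5, 6)]
    print(sorted(partitioned_skyline(data, num_parts=3)))
    print(sorted(dynamic_skyline(data, query=(5, 5))))
```

Because a point dominated within its own partition is also dominated in the full data set, the global merge only needs to compare the local survivors, which is where the reduction in scanned data and invalid comparisons comes from.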