With the widely application of LBS (Location Based Services) and mobile internet technology, the information data of spatial location is explosively growing. Because of the growing large-scale data, traditional spatial data index and kNN query processing are facing with new challenges. The big data makes the disk access time increase greatly in the traditional in-memory spatial index. A new scalable and distributed index structure is needed for the high efficient spatial query. There are still some shortages for the few existing distributed spatial index, such as distributed R-tree index and distributed Voronoi-based index. R-tree index does not scale due to its traditional top down search that overloads the nodes near the tree root, and fail to provide full decentralization. Voronoi-based index is only fit for processing the static dataset and query point. When dynamic data objects are added, it should rebuild the Voronoi-based index. It is difficult to process kNN query for the large-scale data, owing to the lack of effective index structure.MapReduce, which is a high efficient distributed computing model, comes into being under the background of cloud computing and massive data. And it quickly has been widely used in the industry. We first proposed a new distributed inverted grid index structure by combing spatial grid partition and inverted index. The flat structure and loose coupling architecture of inverted grid index are believed to process the large-scale spatial data in the distributed framework. At the same time, we design the constructing inverted grid index algorithm based on the MapReduce model. We improve the traditional kNN query algorithm based on inverted grid index and design a new parallel query algorithm, which is ParallelCircleTrip. What’s more, we do extensive experiments to evaluate the efficiency and scalability of our index structure and kNN query algorithm. The results demonstrate the efficiency of our index structure is significantly higher than the distributed R-tree index and Voronoi-based index. The parallel query based on the ParallelCircleTrip algorithm also reached the liner expansion. |