Font Size: a A A

A Novel Spatial Data RKNN Algorithm Based On Mapreduce

Posted on:2014-02-25Degree:MasterType:Thesis
Country:ChinaCandidate:Y LiFull Text:PDF
GTID:2248330398950535Subject:Computer system architecture
Abstract/Summary:PDF Full Text Request
In recent years, with the development of mobile communication technology and geographic information technology, location-based spatial data present its exponential growth. Among the location-based spatial data queries, RkNN (Reverse k Nearest Neighbor) query is one of the most popular data mining applications in different fields. The result of RkNN query returns the objects around the query point, which contains q as their kNN (k Nearest Neighbor). However, the existing RkNN methods have been difficult to meet the rapid growth of data speed and real-time query requirements from users.In this thesis, we combine the traditional RkNN algorithm and the MapReduce distributed framework to analyze how to solve the large-scale spatial data RkNN query. MapReduce is a model proposed by Google in2004, which is used for parallel processing and generating large data sets. While MapRduce is one of the off-line computing models in distributed computing, it is hard to obtain real-time calculation results. Hence, in this thesis, we adopt offline and online combination system model. We complete the inverted grid index creation and updating works offline using MapReduce, at the same time, return RkNN query results online. Firstly, we investigate the Basic-MRkNN query method based on inverted grid index over large scale spatial dataset. Secondly, we propose two optimization methods:quick fix Lazy-MRkNN query algorithm and incremental Eager-MRkNN query algorithm. In order to reduce network and disk I/O costs, in the filter stage, we use some prune rules to improve our distributed algorithm performance. In addition, we do a lot of experiments using real dataset and simulated data in32nodes cluster. We indicate that our algorithm is more efficient than Voronoi-based distributed method by more than50%. Moreover, two optimization algorithms are both superior to the basic method. While Eager-MRkNN method is suitable for processing intensive data, Lazy-MRkNN method is more appropriate for sparse data.
Keywords/Search Tags:RkNN query, MapReduce, Distributed computing, Spatial Data
PDF Full Text Request
Related items