Font Size: a A A

Research On Key Techniques Of High Performance Spatial Query Processing For Large Scale Spatial Data

Posted on:2014-08-31Degree:DoctorType:Dissertation
Country:ChinaCandidate:Y LiuFull Text:PDF
GTID:1108330479479527Subject:Information and Communication Engineering
Abstract/Summary:PDF Full Text Request
Revolutionary progress of spatial data collecting techniques like earth observing, GIS, sensor networks, and so forth, dramatic decrease of the price of storage devices, as well as the desirability of people to extract information from the spatial data give birth to the so-called big spatial data, spatial data management technologies ushered in the era of big data. Traditional parallel computing and spatial database techniques encounter some difficulties such as limited scalability, limited data types support and analyzing and querying large volumes of data impossibly when facing the deluge of spatial data. In recent years, Map Reduce technologies rise suddenly as a new force; the technologies achieve high scalability and high performance by the distributed and parallel computing on clusters to meet the need of analyzing increasing large volume of data, and have been an alternative of the main technology for analytical big data management. Therefore, how to design efficient parallel spatial query and analysis algorithms in Map Reduce has become a research trend. Recently, the research of high performance spatial query processing for large scale spatial data has caught the attention of researchers, a growing number of spatial query and analysis algorithms have been being implemented in Map Reduce. However, those implemented algorithms require more improvements and some unimplemented algorithms need to be redesigned in Map Reduce. Aiming at practical application requirement, this dissertation studies the problem of high performance spatial query processing for large scale spatial data with the mainstream analytical operation technologies by taking the idea of traditional parallel spatial query algorithms together with considering the domain specialty of spatial data management. The main contributions of this dissertation include:1. Parallel computing model abstraction and spatial data modelingTo guide the designment of spatial query algorithm in Map Reduce, we abstract the Map Reduce parallel program model and design the key-value storage model of spatial data. Firstly, we use the formalization of independent parallelism and sequential synchronization(IPSS) computation to abstract Map Reduce parallel program model, and then we analyze the cost model of different phases in Map Reduce. Secondly, considering that the traditional extended object-relationship model for spatial database is not compatible to Map Reduce framework, we first analyzed the spatial information model, and then designed the key-value storage model for spatial data.2. The problem of parallel constructing spatial indexFirstly, focusing on the constructing requirement of R-tree index for large scale spatial data, Map Reduce combines with Hilbert curve and random sampling method to generate spatial partition function, which make the process of building R-tree conform to the Map Reduce IPSS model, then the algorithms in Map and Reduce phase are designed respectively, finally experiments from the aspects of load balancing, time performance and quality of generated R-tree have demonstrated the efficiency and effectiveness of the proposed methods. Secondly, a batch-parallel tile pyramid building algorithm based on Map Reduce is proposed. The formal descriptions of the pyramid-building tasks as well as the decomposition algorithm is given in the first place, then the details of the processing steps in the map and reduce phases are depicted. A filter optimization method is proposed. The experiment results show the feasibility, efficiency and scalability of the proposed approach.3. The problem of parallel spatial joins aggregate queryThe users are more interested in the summarization information from the results of spatial join for large scale spatial data. Therefore, two different parallel spatial join aggregate algorithms are proposed. Firstly, to explore the non- indexed spatial join aggregate in Map Reduce, a Map-Reduce-Filter-Merge(MRFM) is proposed. Map step divides the total spatial join aggregate into disjoint sets, then Reduce step join and aggregate each set individually, the Filter step will filter those aggregate results for single assignment spatial objects, finally Merge step merge the aggregate results for multiple assignment spatial objects using an efficient merge algorithm. Secondly, we introduce R-tree to accelerate spatial join aggregate in Map Reduce. A distributed R-tree index structure is presented to index the large scale spatial data. The algorithm first makes use of the distributed R-tree index to generate the parallel tasks. Those tasks meet the IPSS model. The details of processing in Map and Reduce phases are provided. Compared with the non- indexed algorithm, the algorithm is straightforward and the performance is also improved.4. The problem of parallel k nearest neighbor joins queryThe key of processing k nearest neighbor joins(knn J) in Map Reduce is how to partition data, the existing approach adopt the block partition methodology usually, which does not scale well for large spatial data because of the quadratic number of the partitions produced. In light of its limitation, we introduce a novel approach; R-tree based knn J with Hadoop(H-rknn J). In the process of H-rknn J, the k nearest neighbor(knn) expanded bounding box is introduced to limit the knn query range and partition data, and then the generated R-tree is used to execute knn J query in parallel fashion, achieving high performance. We analyze the communication and computation cost in theory. The experiment results and analysis in large real spatial data have demonstrated that H-rknn J could efficiently resolve the large scale knn J spatial query in Map Reduce environment, and has a good practical application.
Keywords/Search Tags:Big data, Spatial data, MapReduce, High performance, Spatial query, R-tree index, Tile pyramid, Spatial join aggregate, knn join
PDF Full Text Request
Related items