Research On Distributed Spatial Join Algorithms For Large Scale Data

Posted on:2022-06-09

Degree:Master

Type:Thesis

Country:China

Candidate:R B Wang

GTID:2518306740462514

Subject:Computer Science and Technology

Abstract/Summary:

With the popularity of mobile devices and the development of satellite positioning systems, massive amounts of spatial data are being produced. Large-scale spatial data contains rich value, which makes spatial data analysis and mining important work. Spatial join is a basic operator of spatial data analysis and has a wide range of application scenarios. However, existing distributed implementations of this operator have shortcomings. Distributed spatial join follows the divide-and-conquer idea: the whole spatial extent is first divided into several small spatial partitions, and the data in each partition is then joined in parallel on the distributed cluster using a single-machine spatial join algorithm. In existing techniques, the chosen spatial extent is too large, which results in many invalid calculations. The spatial partitioning does not take the spatial distributions of both datasets into account, which leads to load imbalance. Many implementation details of the parallel computation can still be optimized. In addition, support for the different types of spatial join and of spatial data is incomplete. Based on these observations, this thesis makes a comprehensive and detailed study of distributed spatial join:

(1) A distributed spatial distance join algorithm is proposed. First, the extent of the whole spatial area is narrowed, and invalid data that cannot contribute to the final result is filtered out efficiently. Second, each of the two datasets is used to partition the global domain, and the two resulting partitionings are combined into a single set of spatial partitions that reflects the spatial distribution of both datasets, so as to achieve load balancing in distributed computing. In addition, a special optimization is made for the spatial distance self-join. Finally, a comparative experiment is carried out on global spatial data, and the experimental results show that the proposed spatial distance join performs better than existing techniques.

(2) A distributed spatial k-nearest neighbor join algorithm is presented. First, a two-round computing scheme for the k-nearest neighbor join is given: the minimum expansion distance of each spatial object is obtained in the first round, and the exact join result is obtained in the second round. Then the shortcomings of the two-round scheme are analyzed, and an optimization strategy is given that greatly reduces both the data transmitted over the network and unnecessary calculation. Finally, a comparative experiment is carried out on global spatial data. The experimental results show that the proposed spatial k-nearest neighbor join algorithm performs better than existing techniques and that the proposed optimization strategy has a clear effect. The proposed k-nearest neighbor join supports all types of spatial data and is therefore broadly applicable.

(3) The proposed algorithms are implemented on the Spark distributed computing framework and packaged as an API. First, the proposed distributed spatial join algorithms are implemented using the API provided by Spark. Then the implementation is encapsulated as an API for third-party use, including an RDD-based encapsulation built on Spark Core and an SQL statement encapsulation built on Spark SQL.
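The divide-and-conquer scheme behind the distance join in (1) can be illustrated with a minimal single-process sketch. It is not the thesis's implementation: it assumes 2D point data, a uniform grid instead of the distribution-aware partitioning described above, and hypothetical names (`narrowed_extent`, `distance_join`, `cell_size`). Each grid cell stands in for one partition that a cluster would process in parallel.

```python
from collections import defaultdict
from itertools import product
from math import floor, hypot


def mbr(points):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a point list."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def narrowed_extent(a, b, d):
    """Intersect both MBRs, each grown by d; None if nothing can match."""
    ax0, ay0, ax1, ay1 = mbr(a)
    bx0, by0, bx1, by1 = mbr(b)
    x0, y0 = max(ax0, bx0) - d, max(ay0, by0) - d
    x1, y1 = min(ax1, bx1) + d, min(ay1, by1) + d
    return (x0, y0, x1, y1) if x0 <= x1 and y0 <= y1 else None


def cell_of(x, y, x0, y0, size):
    """Grid cell index of a coordinate relative to the extent origin."""
    return floor((x - x0) / size), floor((y - y0) / size)


def distance_join(a, b, d, cell_size):
    """Return all pairs (p, q), p in a, q in b, with distance <= d."""
    if not a or not b:
        return []
    extent = narrowed_extent(a, b, d)
    if extent is None:
        return []
    x0, y0, x1, y1 = extent

    def inside(p):
        return x0 <= p[0] <= x1 and y0 <= p[1] <= y1

    # One (left, right) bucket per grid cell; assumption: uniform grid.
    parts = defaultdict(lambda: ([], []))
    # Points of `a` go to exactly one home cell, so no pair is reported twice.
    for p in filter(inside, a):
        parts[cell_of(p[0], p[1], x0, y0, cell_size)][0].append(p)
    # Points of `b` are replicated to every cell their d-square overlaps.
    for q in filter(inside, b):
        cx0, cy0 = cell_of(q[0] - d, q[1] - d, x0, y0, cell_size)
        cx1, cy1 = cell_of(q[0] + d, q[1] + d, x0, y0, cell_size)
        for cell in product(range(cx0, cx1 + 1), range(cy0, cy1 + 1)):
            parts[cell][1].append(q)

    # Local join per cell; on a cluster each cell would be one parallel task.
    result = []
    for left, right in parts.values():
        result.extend((p, q) for p in left for q in right
                      if hypot(p[0] - q[0], p[1] - q[1]) <= d)
    return result
```

Narrowing the extent first discards points that cannot have a partner within distance `d`, which mirrors the filtering step of the abstract. Replicating only one side keeps the output free of duplicates without a separate deduplication pass. A uniform grid does not balance load on skewed data, which is the problem the thesis's combined partitioning is meant to address.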

Keywords/Search Tags:

Distributed Computing, Spatial Join, Spatial Partition, Spatial Data, k Nearest Neighbors

Related items

1	Research On Detecting And Resuming Incomplete Spatial Data
2	The Research On Nearest Neighbors Query Technologies In Spatial Network Databases
3	Research On Key Techniques Of High Performance Spatial Query Processing For Large Scale Spatial Data
4	Optimization Of Complex Spatial Join Query Based On WFS Services
5	The Query Method Of Reverse K Nearest Neighbor Query In Spatial Database
6	Research On Spatial Data Index Technology And Its Application
7	Research On Spatial Pattern Matching For Spatial Data
8	The Query Techniques Of Spatial Database On R-tree
9	Spatial Data Analysis Based On Distributed In-memory Computing
10	Research On Spatial Join And Variants Of Nearest Neighbor Query