Research On KNN Query And Join Method In Location Service Based On Entity Density

Posted on:2019-09-13

Degree:Master

Type:Thesis

Country:China

Candidate:K K Yang

Full Text:PDF

GTID:2428330572955474

Subject:Computer technology

Abstract/Summary:

PDF Full Text Request

With the rapid development of mobile Internet and smartphones,Location Based Service(LBS)has gradually become a basic Service.In LBSs,users often need to consider the spatial density of entities.How to organize spatial data and label the density of entities to improve the quality of service are valuable research topics.On the other hand,LBS business is highly dependent on the rapidly growing spatial data.Also users have generated new spatial data with location information in the process of using LBSs.Massive spatial data has a high demand on efficient organization method and efficient query algorithm.The main work of this paper is to design a reasonable algorithm to meet the real-time and validity requirements of spatial query in the face of massive spatial data.K-nearest Neighbor(kNN)query is an important operation in LBS.In the existing research of kNN query for spatial data management,little attention is paid to the location relationship between query results,and solid density factors are not considered,thus resulting in the situation that the query results are correct but not necessarily reasonable.In this paper,starting from the density attributes of spatial data,the node splitting operation in the process of R-tree establishment and the density attribute values are added.Based on the splitting methods of R-tree and R*-tree,DR tree(Density R-tree)and its splitting method are proposed,and the range query algorithm based on DR tree is designed.The density attribute value of DR tree is used to reduce the traversal times and the number of entities contained in the last query of kNN query,further optimize the kNN query algorithm,and improve the effectiveness and efficiency of kNN query.It achieves the validity requirement of massive spatial data query.For massive spatial data,the traditional centralized processing method is difficult to meet the requirement of real-time.It is inevitable to divide the big data set into small data sets,and then send each data into the distributed system for processing.The partitioning technique is the first and important step in distributed kNN join query.The existing block partitioning methods have the disadvantages such as inconsistency of geographic location between blocks,uneven size of blocks,large extra cost of blocks,and incompatibility with existing systems.In this paper,by combining the Minimal Bounding Rectangle(MBR)characteristics of R tree and quadtree,the height of the tree can be fixed and the geographical position correlation characteristics can be divided into blocks.In this paper,a new block method,QR tree block algorithm,is proposed,which achieves the geographic correlation on the basis of a single scan data set.On the basis of QR tree block filtering algorithm,QR tree block filtering algorithm is proposed,which can largely filter out irrelevant blocks.In the first MapReduce phase of distributed kNN connections,the number of task startup is greatly reduced through reasonable partitioning and filtering techniques,and in the best case,it is reduced to about 1/20 of the original number of tasks.Mass data is divided into blocks,and parallel tasks are minimized by filtering technology,finally meeting the real-time requirements of massive spatial data queries.Finally,a demonstrable spatial query system is implemented based on Java EE platform.The system supports DR tree range query,DR tree kNN query and QR tree kNN join.Experiments on two real datasets of Nanjing Restaurant and Beijing Taxi show that DR tree range,DR tree kNN query and QR tree kNN join query have obvious advantages in effectiveness and efficiency.

Keywords/Search Tags:

Spatial data, LBS, Entity density, kNN, Distributed kNN

PDF Full Text Request

Related items

1	Research On Spatial Clustering Algorithms
2	Research On Distributed Spatial Join Algorithms For Large Scale Data
3	Data Mining Method Based On Spatial Database Research
4	Large-scale spatial data management on modern parallel and distributed platforms
5	Spatial Data Analysis Based On Distributed In-memory Computing
6	A Research On Spatial Data Mining
7	Research On The Techniques Of Entity Identity On XML Data
8	Clustering Algorithm In The Spatial Data Mining
9	Study Of Distributed Real-time Data Flow Density Clustering Algorithm Based On Storm
10	Distributed Location Estimation of a Moving Target Characterized by a Spatial Poisson Field