Font Size: a A A

Distributed Processing Of K Nearest Neighbor Queries Over Large--scale Streaming Data

Posted on:2022-03-03Degree:DoctorType:Dissertation
Country:ChinaCandidate:M YangFull Text:PDF
GTID:1488306608479924Subject:Computer Software and Application of Computer
Abstract/Summary:PDF Full Text Request
In recent years,with the wide application of wireless networks and intelligent mobile terminals,online service sources continue to generate streaming data,data size presents explosive growth,and the real-time processing capability of massive data is urgently demand.In many online services,K nearest neighbor,knn inquiry is one of the core issues of many basic services,has always had important research significance.In the online service,the user's response speed is extremely high,and how to improve query efficiency when faced with large-scale stream data and high-concurrent k neighbor query,reducing the response time of queries is a key issue that needs to be resolved in this thesis.Traditional single-machine-based centralized stream data K nearest neighbor query algorithm,multi-use linear processing mode processes the query one by one,which will result in a large latency of subsequent queries.In addition,the scalability of single-machine centralized query algorithms is poorly scalable,and it is difficult to deal with sharp growth data scale and query quantity.The distributed processing architecture has good scalability and high parallelism,which provides an effective way to solve the above problems.Most of the existing K-neighbor query algorithms determine the final query results by multiple iterations,unseasive iterative times will lead to high communication costs in distributed environments,so there is poor efficiency in distributed environments.Design brings new challenges.This thesis will focus on the problem of KO neaser Query of large-scale stream data below the distributed computing environment,specifically,the mobile object K nearest neighbor query problem based on European distance,mobile object conflict k nearest neighbors,and K-neighbor to high-dimensional text data.Query problem.The specific research content is as follows:K nearest neighbor query problem of moving objects based on European distance,is a given query object and a set of moving objects,and continuously returns to the nearest K object location of the Queen's distance from the query object when the object is constantly moving.Obviously minimizing query time is the most important purpose for solving this problem.This thesis provides an efficient distributed solution for Kneighbor queries for a large number of mobile data.First,a distributed block grid index(BGI)structure is designed herein.BGI is a two-layer index based on the grid.The top layer is a grid structure and divides the query area to a size equal,and there is no overlapping cell.Based on BGI,this paper further proposes a distributed mobile object K nest query algorithm DBGKNN.DBGKNN ensures that the query results can be obtained in both iterations,which can effectively control communication costs between computing nodes in a distributed environment.At the same time,the results of the query algorithm returned in the problem one will only have the results of the query algorithm in the problem one when it is only queried at the same time.Specifically,at time t,give multiple query objects,and a set of moving objects,the results returned by each query should be different from each other and the average Eue's distance from concurrent query points is minimized.In this question,after conflicting the results of concurrent queries,re-query,will result in an inefficient number of iterations,increasing communication costs.This thesis is based on grid index,which is a distributed solution that meets the Master-Worker mode.The program will lead to a collision query point in the primary node screening in the query,to make a collaborative query to ensure that the query results can still be returned within the controlled iteration.This is the second question studied herein.The third problem studied in this paper is the K-nearest neighbor query problem for high-dimensional text data,which has important value in content-based real-time recommendation.This article uses a popular method in the recommendation systemcontent-based filtering,which is mainly based on the characteristics of the data to describe/represent the user's interest[1][2].These high-dimensional points can be converted into points in the same space by many methods,such as tf-idf calculation or item embedding[3][4].Therefore,the problem of recommending content to the user in real time can be transformed into a K-nearest neighbor query of the point representing the user.This paper finds a scalable solution in a distributed environment:proposes a new distributed index called distributed ring index(DAI)and DAI-based distributed high-dimensional K-nearest neighbor query processing algorithm(DHDS-KNN).This paper implements a variety of index structures on Storm,and conducts extensive performance evaluation of the proposed K-nearest neighbor query algorithm through real data.The experimental results strongly support the efficiency of the Knearest neighbor query algorithm proposed in this paper in a distributed environment.The algorithm proposed in this paper can efficiently realize real-time query of massive moving objects and high-dimensional text stream objects.
Keywords/Search Tags:K nearest neighbors, Streaming data, Distributed Computing
PDF Full Text Request
Related items