Scalable Collective Spatial Query Technology Research

Posted on: 2016-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: P J He
Full Text: PDF
GTID: 2348330536967374
Subject: Management Science and Engineering
Abstract/Summary:
With the accumulation of geo-tagged data, the development of the social Web 2.0, the spread of geo-query systems, and the rapid growth of mobile location-based products, spatial keyword queries and the algorithms that answer them have become a hot research topic. In production environments, users expect each query to be answered in real time. Traditional algorithms all rely on one specialized index or another, and they share several problems: they cannot use the full power of the servers, they consume a large amount of time, and their indexes are hard to build and maintain. To meet the real-time requirement on very large data sets, this work focuses on improving query speed when the data set is very large. In this thesis, I propose two spatial keyword query algorithms based on the parallel Spark RDD model, together with a “Grid” index that aims to reduce the I/O cost during computation. I also describe the procedure and structure of the algorithms, the construction of the index, and how the index is persisted. Finally, through several groups of experiments, I analyze the effectiveness, speed, and scalability of the algorithms.

The main contributions of this thesis are as follows.

(1) This thesis proposes a semi-parallel algorithm for the Min_Sum problem in the field of spatial keyword query. Based on the features of the Spark RDD model, the basic theory of spatial query, and the characteristics of spatial data, I give the definition of the algorithm, its structure, and a worked example of the problem, built on a semi-parallel “array-coding-system”.

(2) This thesis proposes an index called “Grid” to reduce the I/O cost. By comparing the speed of the centralized algorithm with that of the semi-parallel one, I derive a set of methods for building, persisting, and using the index.

(3) This thesis proposes a two-step Min_Sum algorithm based on the “Grid” index. Because the I/O cost is reduced, the speed of this algorithm improves considerably when the number of keywords is limited. I describe the algorithm and its structure.

(4) This thesis proposes a parallel algorithm for the MaxSum_CoSKQ problem in the field of spatial keyword query. Based on an in-depth analysis of the problem and on the “Sharing parameter”, “Data reuse”, and “Cartesian product” techniques of the Spark RDD model, I propose a parallel MaxSum_CoSKQ algorithm that has a keyword-free feature, and I describe the algorithm, its structure, and a worked example. Finally, I analyze how the “Grid” index introduced earlier can be used with this algorithm.

(5) I design experiments to analyze the performance of the two algorithms, the performance of the index, and their weaknesses. Running the experiments on Amazon AWS, I mainly analyze the speed of the algorithms, the influence of the I/O cost, and the influence of the number of servers. Finally, I summarize and assess the results objectively.
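
The abstract does not give the layout of the “Grid” index, so the following is only a minimal sketch of the general idea behind a uniform grid index, written in plain Python rather than Spark. The tuple layout `(id, x, y, keywords)`, the `cell_size` parameter, and the function names `cell_of`, `build_grid`, and `candidates` are all assumptions made for illustration and are not taken from the thesis. The point it shows is that a query only touches the cells that overlap its search area, which is one common way such an index reduces the amount of data read.

```python
from collections import defaultdict
from math import floor, hypot


def cell_of(x, y, cell_size):
    """Map a point to the integer coordinates of the grid cell containing it."""
    return (floor(x / cell_size), floor(y / cell_size))


def build_grid(objects, cell_size):
    """Bucket objects by grid cell.

    Each object is a hypothetical tuple (id, x, y, keywords), where
    keywords is a set of strings.
    """
    grid = defaultdict(list)
    for obj in objects:
        _oid, x, y, _kws = obj
        grid[cell_of(x, y, cell_size)].append(obj)
    return grid


def candidates(grid, cell_size, qx, qy, radius, keywords):
    """Return sorted ids of objects within `radius` of (qx, qy) that carry
    at least one query keyword, scanning only the overlapping cells."""
    lo_cx, lo_cy = cell_of(qx - radius, qy - radius, cell_size)
    hi_cx, hi_cy = cell_of(qx + radius, qy + radius, cell_size)
    hits = []
    for cx in range(lo_cx, hi_cx + 1):
        for cy in range(lo_cy, hi_cy + 1):
            # .get avoids creating empty buckets in the defaultdict
            for oid, x, y, kws in grid.get((cx, cy), []):
                if kws & keywords and hypot(x - qx, y - qy) <= radius:
                    hits.append(oid)
    return sorted(hits)
```

In a Spark setting, the cell coordinates could plausibly serve as the key by which objects are partitioned, but that mapping is an assumption here and not something the abstract states.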
Keywords/Search Tags: spatial keyword, scalability, Spark, Grid index, parallelism