
Efficient Spatial-textual Analysis Based On Distributed Environments

Posted on: 2020-04-27   Degree: Master   Type: Thesis
Country: China   Candidate: Y Xu
GTID: 2428330620959989   Subject: Computer Science and Technology
Abstract/Summary:
In recent years, location-based services have become increasingly popular with the rapid development of the mobile Internet. Typical location-based applications include map, food delivery and social applications. These applications generate huge amounts of spatial-textual data every day (e.g., each tweet has a location and textual content, and each restaurant has a location and textual tags). They have also given rise to several kinds of spatial-textual queries. Common examples are: (1) a restaurant recommendation application can help users find restaurants that are within 100 m of their position and carry the textual tag "barbecue"; (2) a social application can help users find 10 new friends based on the textual similarity of their interest tags and their spatial distance. Spatial-textual data analysis is therefore very useful in real-life applications. However, as data volumes surge, current spatial-textual techniques, which are implemented in centralized environments, cannot meet the requirements of high throughput and low response time.

Recently, the Spark platform, a distributed in-memory computing platform, has become increasingly popular, and many works have proposed Spark-based distributed solutions for big data scenarios. Compared with Hadoop-based solutions, Spark-based solutions can achieve lower latency and higher throughput. Therefore, to tackle the performance issues of traditional centralized solutions, we explore distributed spatial-textual analysis based on the Spark platform.

Our Spark-based distributed solution is developed as follows. First, we introduce an analysis framework for spatial-textual data on the Spark platform. The framework extends the SQL programming interfaces, the SQL parser and the underlying execution engine of the Spark SQL module to support multiple kinds of spatial-textual queries. It further optimizes the analysis workflow with the proposed two-level index framework, which includes: (1) an efficient and scalable global index, stored on the master node, that prunes local partitions; and (2) local indexes, stored on each slave node, that further optimize query processing. Then, we introduce several index structures for the four kinds of spatial-textual queries studied in this thesis: the Boolean Range Query, the Boolean k Nearest Neighbors Query, the Approximate Range Query and the Spatio-textual Similarity Join Query. We also design distributed algorithms for these queries on top of the two-level index framework and the proposed index structures; these algorithms achieve good performance through two-phase filtering. Finally, experiments on large-scale real data sets demonstrate the promising performance of the proposed indexes and of our distributed solution.
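To make the two-level index and the two-phase filtering concrete, here is a minimal single-process sketch of a Boolean Range Query in the spirit of example (1): restaurants within 100 m that carry the tag "barbecue". It rests on several illustrative assumptions that are not the index structures proposed in the thesis. A uniform grid acts as the partitioner. The "global index" keeps only each partition's bounding rectangle and keyword union. Each partition holds a local inverted index. Plain Python lists stand in for Spark partitions, and all names (Obj, Partition, build_partitions, boolean_range_query) are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Obj:
    oid: int
    x: float          # planar coordinates in metres (assumption)
    y: float
    tags: frozenset


@dataclass
class Partition:
    objs: list           # objects held by one (simulated) worker
    inverted: dict       # local index: keyword -> set of object ids
    mbr: tuple           # global entry: (min_x, min_y, max_x, max_y)
    keywords: frozenset  # global entry: union of keywords in the partition


def build_partitions(objs, cell):
    """Grid-partition the objects and build both index levels."""
    cells = {}
    for o in objs:
        cells.setdefault((int(o.x // cell), int(o.y // cell)), []).append(o)
    parts = []
    for members in cells.values():
        inverted = {}
        for o in members:
            for t in o.tags:
                inverted.setdefault(t, set()).add(o.oid)
        xs = [o.x for o in members]
        ys = [o.y for o in members]
        mbr = (min(xs), min(ys), max(xs), max(ys))
        parts.append(Partition(members, inverted, mbr, frozenset(inverted)))
    return parts


def min_dist(px, py, mbr):
    """Smallest Euclidean distance from a point to a rectangle (0 if inside)."""
    dx = max(mbr[0] - px, 0.0, px - mbr[2])
    dy = max(mbr[1] - py, 0.0, py - mbr[3])
    return hypot(dx, dy)


def boolean_range_query(parts, qx, qy, radius, keywords):
    """Ids of objects within `radius` of (qx, qy) that carry every keyword."""
    kw = frozenset(keywords)
    result = []
    for p in parts:
        # Phase 1 (global index): skip partitions that are too far away
        # or do not contain every query keyword.
        if min_dist(qx, qy, p.mbr) > radius or not kw <= p.keywords:
            continue
        # Phase 2 (local index): intersect posting lists, then verify distance.
        if kw:
            cand = set.intersection(*(p.inverted[k] for k in kw))
        else:
            cand = {o.oid for o in p.objs}
        for o in p.objs:
            if o.oid in cand and hypot(o.x - qx, o.y - qy) <= radius:
                result.append(o.oid)
    return sorted(result)


restaurants = [
    Obj(1, 10.0, 10.0, frozenset({"barbecue", "noodles"})),
    Obj(2, 80.0, 40.0, frozenset({"barbecue"})),
    Obj(3, 50.0, 50.0, frozenset({"sushi"})),
    Obj(4, 900.0, 900.0, frozenset({"barbecue"})),
]
parts = build_partitions(restaurants, cell=200.0)
print(boolean_range_query(parts, 0.0, 0.0, 100.0, {"barbecue"}))  # [1, 2]
```

In a Spark deployment, phase 1 would presumably run on the master against the global index, and phase 2 would run only inside the partitions that survive. Here both phases run in one loop for brevity.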
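For the Boolean k Nearest Neighbors Query, a minimal single-process sketch can show how partition-level pruning combines with exact checks. Partitions that lack a query keyword are dropped. The remaining partitions are visited in order of their minimum distance to the query point, and the search stops once the next partition cannot beat the current k-th best distance. The grid partitioning, the (bounding rectangle, keyword union) summary used as a stand-in global index, the tuple record layout and the function names are assumptions for illustration only. The thesis's own index structures may differ.

```python
import heapq
from math import hypot

# Objects are hypothetical tuples: (oid, x, y, frozenset_of_tags).


def min_dist(px, py, mbr):
    """Smallest Euclidean distance from a point to a rectangle (0 if inside)."""
    dx = max(mbr[0] - px, 0.0, px - mbr[2])
    dy = max(mbr[1] - py, 0.0, py - mbr[3])
    return hypot(dx, dy)


def grid_partitions(objs, cell):
    """Global index entries: (bounding rectangle, keyword union, members)."""
    cells = {}
    for o in objs:
        cells.setdefault((int(o[1] // cell), int(o[2] // cell)), []).append(o)
    parts = []
    for members in cells.values():
        xs = [o[1] for o in members]
        ys = [o[2] for o in members]
        kws = frozenset().union(*(o[3] for o in members))
        parts.append(((min(xs), min(ys), max(xs), max(ys)), kws, members))
    return parts


def boolean_knn(parts, qx, qy, k, keywords):
    """Ids of the k objects nearest to (qx, qy) that carry every keyword."""
    if k <= 0:
        return []
    kw = frozenset(keywords)
    # Phase 1: drop partitions lacking a keyword; visit the rest nearest-first.
    order = sorted(
        ((min_dist(qx, qy, mbr), members) for mbr, kws, members in parts if kw <= kws),
        key=lambda t: t[0],
    )
    heap = []  # max-heap of (-distance, oid) holding the current best k
    for bound, members in order:
        if len(heap) == k and bound > -heap[0][0]:
            break  # no remaining partition can contain a closer match
        # Phase 2: exact keyword and distance check inside the partition.
        for oid, x, y, tags in members:
            if kw <= tags:
                d = hypot(x - qx, y - qy)
                if len(heap) < k:
                    heapq.heappush(heap, (-d, oid))
                elif d < -heap[0][0]:
                    heapq.heapreplace(heap, (-d, oid))
    return [oid for _, oid in sorted(heap, key=lambda t: (-t[0], t[1]))]


users = [
    (1, 0.0, 5.0, frozenset({"hiking", "jazz"})),
    (2, 30.0, 0.0, frozenset({"jazz"})),
    (3, 150.0, 0.0, frozenset({"jazz", "chess"})),
    (4, 10.0, 10.0, frozenset({"chess"})),
    (5, 500.0, 500.0, frozenset({"jazz"})),
]
parts = grid_partitions(users, cell=100.0)
print(boolean_knn(parts, 0.0, 0.0, 2, {"jazz"}))  # [1, 2]
```

The early stop is safe because a partition's minimum distance is a lower bound for every object inside it. The keyword union can only rule partitions out, not guarantee a match: a query for {"jazz", "chess"} still has to inspect the partition near the origin even though no single object there carries both tags.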
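The Spatio-textual Similarity Join Query can also be sketched with two-phase filtering. This sketch assumes a common form of the join predicate: Euclidean distance of at most eps and Jaccard similarity of the tag sets of at least tau. The thesis's exact definition may differ. A grid with cell side eps ensures that any matching pair lies in the same or a neighbouring cell. A standard size filter then discards pairs early, since Jaccard similarity can never exceed the smaller set size divided by the larger. Surviving pairs are verified exactly. The function names and record layout are hypothetical, and everything runs in one process rather than on Spark.

```python
from math import floor, hypot

# Records are hypothetical tuples: (oid, x, y, frozenset_of_tags).


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def st_similarity_join(R, S, eps, tau):
    """Pairs (r, s) with distance <= eps and tag Jaccard >= tau; needs eps > 0."""
    # Spatial filter: hash S into square cells of side eps, so any s within
    # eps of r lies in r's cell or one of its 8 neighbours.
    grid = {}
    for rec in S:
        grid.setdefault((floor(rec[1] / eps), floor(rec[2] / eps)), []).append(rec)
    out = []
    for rid, rx, ry, rtags in R:
        cx, cy = floor(rx / eps), floor(ry / eps)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for sid, sx, sy, stags in grid.get((cx + dx, cy + dy), ()):
                    # Textual filter: Jaccard <= min size / max size.
                    if min(len(rtags), len(stags)) < tau * max(len(rtags), len(stags)):
                        continue
                    # Verification of both exact predicates.
                    if hypot(rx - sx, ry - sy) <= eps and jaccard(rtags, stags) >= tau:
                        out.append((rid, sid))
    return sorted(out)


R = [
    (1, 0.0, 0.0, frozenset({"jazz", "chess"})),
    (2, 100.0, 100.0, frozenset({"sushi"})),
]
S = [
    (10, 3.0, 3.0, frozenset({"jazz", "chess", "go"})),
    (11, 2.0, 0.0, frozenset({"tennis"})),
    (12, 104.0, 100.0, frozenset({"sushi"})),
    (13, 50.0, 50.0, frozenset({"jazz", "chess"})),
]
print(st_similarity_join(R, S, eps=5.0, tau=0.5))  # [(1, 10), (2, 12)]
```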
Keywords/Search Tags: Spatial-Textual Analysis, Distributed Computing, Spark Platform, Indexing Technique