Spatio-Textual Data Publish/Subscribe Study

Posted on: 2019-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Tang
Full Text: PDF
GTID: 2428330548967494
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet and the spread of smartphones, social applications now generate large amounts of spatio-textual data, that is, data with both text attributes and spatial attributes. How to analyze large-scale spatio-textual data and extract the greatest economic benefit from it has become a focus of attention. As one of the effective ways to process spatio-textual data, Publish/Subscribe has attracted interest from both academia and industry. However, the Publish/Subscribe approaches in existing papers cannot match subscriptions and messages efficiently, and their expressiveness is limited. This thesis studies two aspects of spatio-textual data Publish/Subscribe so that it can operate in a large-scale subscription environment.

First, this thesis studies Boolean Expression Publish/Subscribe. An efficient local index structure, the TR-tree, and a matching algorithm are designed for it. The TR-tree has two parts: a text index and a spatial index. It is stored in the match nodes, which perform the matching of subscriptions and messages. The text index groups subscriptions by their key attributes and their number of predicates. It also uses a list of operators to store predicate-value pairs, and each distinct predicate-value pair is stored only once to reduce memory consumption. The spatial index constructs an R-tree based on the key attributes and the number of predicates to reduce the search space. Experiments demonstrate the efficiency of the TR-tree structure.

Second, this thesis studies methods for distributed Publish/Subscribe. A spatio-textual data partitioning method is proposed, and a global index structure, Gindex, and a framework, DSTSP, are designed. Because skewed query results lead to unbalanced load, a load balancing strategy is also proposed. Gindex is stored in the dispatch nodes of the distributed system, and the spatio-textual data are partitioned according to their spatial attributes and text attributes. DSTSP consists of dispatch nodes, match nodes, and result integration nodes. The dispatch nodes are mainly responsible for distributing subscriptions or messages to the corresponding match nodes. The match nodes are responsible for matching subscriptions against messages. The result integration nodes are responsible for integrating the matching results to obtain the final result and for sending the messages to the subscribers. In addition, the dispatch nodes can determine the load of the system from the information they collect. If the match nodes are overloaded, a cost model is used to calculate the cost of partitioning, and the partitioning is then performed.
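The grouping idea behind the text index can be illustrated with a small Python sketch. It is not the TR-tree from the thesis. It only assumes that a subscription is a conjunction of (attribute, operator, value) predicates plus a rectangular region, that the key attribute is the alphabetically smallest attribute, and that each subscription has at most one predicate per attribute. The names `Subscription` and `TextGroupIndex` are hypothetical, and a plain rectangle test stands in for the R-tree.

```python
import operator
from collections import defaultdict
from dataclasses import dataclass

# Comparison operators allowed in a predicate (an illustrative subset).
OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}


@dataclass(frozen=True)
class Subscription:
    sub_id: str
    predicates: tuple  # ((attribute, op, value), ...), all must hold (AND)
    region: tuple      # (min_x, min_y, max_x, max_y)


class TextGroupIndex:
    """Groups subscriptions by (key attribute, number of predicates)."""

    def __init__(self):
        self.groups = defaultdict(list)

    def insert(self, sub):
        # Assumption: the key attribute is the alphabetically smallest one.
        key_attr = min(attr for attr, _, _ in sub.predicates)
        self.groups[(key_attr, len(sub.predicates))].append(sub)

    def match(self, message, location):
        x, y = location
        cache = {}  # each distinct predicate is evaluated once per message
        matched = []
        for (key_attr, n_preds), subs in self.groups.items():
            # Skip the whole group if the message lacks the key attribute or
            # has fewer attributes than the group's subscriptions require
            # (valid because of the one-predicate-per-attribute assumption).
            if key_attr not in message or len(message) < n_preds:
                continue
            for sub in subs:
                min_x, min_y, max_x, max_y = sub.region
                if not (min_x <= x <= max_x and min_y <= y <= max_y):
                    continue  # a linear test standing in for the R-tree
                if all(self._holds(p, message, cache) for p in sub.predicates):
                    matched.append(sub.sub_id)
        return sorted(matched)

    @staticmethod
    def _holds(pred, message, cache):
        if pred not in cache:
            attr, op, value = pred
            cache[pred] = attr in message and OPS[op](message[attr], value)
        return cache[pred]


index = TextGroupIndex()
index.insert(Subscription(
    "s1", (("category", "=", "coffee"), ("price", "<", 5)), (0, 0, 10, 10)))
index.insert(Subscription(
    "s2", (("category", "=", "coffee"),), (20, 20, 30, 30)))
index.insert(Subscription(
    "s3", (("category", "=", "coffee"), ("price", "<", 5), ("rating", ">=", 4)),
    (0, 0, 10, 10)))

print(index.match({"category": "coffee", "price": 3}, (5, 5)))  # ['s1']
```

In this example `s3` is skipped at the group level because the message carries only two attributes, `s2` fails the spatial test, and the predicate `category = coffee` is evaluated once even though all three subscriptions contain it.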
Keywords/Search Tags: Spatio-Textual Data, Publish/Subscribe, Index Structure