Study On Similarity Search For Textual And Spatial Data

Posted on:2019-09-21

Degree:Doctor

Type:Dissertation

Country:China

Candidate:M H Yu

Full Text:PDF

GTID:1368330590451417

Subject:Computer Science and Technology

Abstract/Summary:

PDF Full Text Request

In the information age,applications such as search engine,location-aware ser-vice(LBS)and office automation system become more helpful in daily life.In these systems,the most important part is data processing,especially data searching.As there are many problems including typo errors and multiple attributes data existing in the real datasets,similarity search becomes an efficient strategy to process data.To address this issue,this thesis studies on similarity search for textual and spatial data and provides efficient algorithms,including:1.A Disk-based Textual Similarity Search Algorithm for Large Data:To address the textual similarity search problem on large scale of dataset,this thesis provides a binary tree index based on the length of texts.With the help of this index structure,this thesis designs a filter-verification framework,and based on this framework this thesis presents algorithms for threshold-based and top-k textual similarity searches.In addition,this thesis extends these algorithms and provide a disk-based algorithm to support large dataset.In the experiments,this thesis evaluated the efficient of our index and framework and compared with existing methods.The results showed our algorithms achieved highest performance than state-of-art algorithms.2.A Rt-Tree-based Spatio-Textual Similarity Search Algorithm:This thesis presents an Rt-tree-based algorithm for location-aware publish/subscribe problem which constructs index by integrating tokens from subscriptions into Rt-tree nodes.To improve the power of textual pruning and reduce the size of index,this thesis selects represent token from each subscription and stores them into Rt-tree nodes.In addition,this thesis extends our algorithm for ranking semantics.The experi-mental results showed our algorithm can efficient deliver messages to corresponding subscriptions by pruning a large number of unnecessary subscriptions.3.An Adaptive Textual Spatial Search Algorithm:To efficient support multiple types messages delivery in a location-aware publish/subscribe system,this thesis provides a cost-based algorithm which can adaptive select better strategies for different type of data.To be specific,this thesis first designs a trie-based index with spatial information and a quadtree index which integrates token on each node.And this thesis also provides corresponding algorithms based on these two indices for location-aware publish/subscribe problem.To process multiple types of data,this thesis presents a cost-based algorithm which always adaptive select lower cost algorithms between trie-based and quadtree-based algorithms to filter and verify data.In the experiments,this thesis evaluated the performance of our algorithms and compared them with existing algorithms.The results showed the cost-based algorithm can achieve highest performance for multiple types data as it always selected more adaptive methods in filtering and verification.

Keywords/Search Tags:

PDF Full Text Request

Related items

1	Research On Low Complexity Signal Detection Algorithms For SM-MIMO Systems
2	Research On The Complexities,Theoretical Values Of Computer Games And Some Search Algorithms
3	Similarity Graph-based Scientific Literature Search Key Technology Research
4	Research On The Evolutionary Search Algorithms In The WEB Based On Learning
5	NAQ-tree: Effective index structure for similarity search in high dimensional space
6	Tree-based Semantic Similarity Of XML Documents
7	Research On The Key Techniques Of Graph Similarity Measure
8	Research On Self-Healing Disk Array Technology
9	Representing And Indexing Trajectories For Efficient Similarity Search With TRI Framework
10	Low-complexity Detection Algotithms In Massive MIMO System