Query Processing And Optimization Of Unstructured Data

Posted on: 2016-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: X Jin
Full Text: PDF
GTID: 2308330470967697
Subject: Computer Science and Technology
Abstract/Summary:
With the development of the Internet, the volume of unstructured data such as text, images, video and audio keeps growing, and unstructured data management systems that can store and manage massive amounts of such data have emerged. Query processing and optimization is an important problem in unstructured data management. Compared with query processing of structured data, query processing of unstructured data involves two additional operations: similarity search and similarity join. The thesis surveys NoSQL databases that can manage unstructured data, and Hive, which supports distributed query processing. It also summarizes the concepts, classifications and algorithms of similarity search and similarity join. It then proposes D-Search, a query processing framework for unstructured data built on the D-Ocean unstructured data management system. D-Search supports similarity search and similarity join queries over unstructured data as well as simple queries over structured data, and it supports both rule-based and cost-based optimization of unstructured data queries. The thesis also proposes query processing algorithms for similarity search and similarity join, together with methods for estimating their cost. Experiments verify the feasibility of the query processing algorithms and the reasonableness of the cost estimation methods. Finally, to address topic similarity join over text, the thesis formulates the problem of KL-distance-based similarity join, proposes a series of algorithms for it, and compares them experimentally.
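To make the two operations named in the abstract concrete, the following is a minimal sketch, not the thesis's algorithms. It assumes each text is represented by a topic distribution (a list of probabilities), uses the plain asymmetric KL divergence as the "KL distance", and runs a naive scan and nested loop; the function names, the `eps` smoothing and the threshold semantics are all illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of equal length.

    Assumption: q is floored at eps to avoid division by zero.
    Note that KL divergence is asymmetric, so KL(p || q) != KL(q || p) in general.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def similarity_search(
    query: Sequence[float],
    items: Dict[str, Sequence[float]],
    threshold: float,
) -> List[str]:
    """Range similarity search: ids of items with KL(query || item) <= threshold."""
    return [key for key, dist in items.items() if kl_divergence(query, dist) <= threshold]


def similarity_join(
    left: Dict[str, Sequence[float]],
    right: Dict[str, Sequence[float]],
    threshold: float,
) -> List[Tuple[str, str]]:
    """Naive nested-loop similarity join: all (l, r) pairs with KL(l || r) <= threshold."""
    return [
        (lkey, rkey)
        for lkey, ldist in left.items()
        for rkey, rdist in right.items()
        if kl_divergence(ldist, rdist) <= threshold
    ]


if __name__ == "__main__":
    # Hypothetical two-topic distributions for a handful of documents.
    left_docs = {"a": [0.9, 0.1], "b": [0.2, 0.8]}
    right_docs = {"x": [0.9, 0.1], "y": [0.1, 0.9]}
    print(similarity_search([0.9, 0.1], right_docs, threshold=0.1))
    print(similarity_join(left_docs, right_docs, threshold=0.1))
```

The nested loop compares every pair, so its cost grows with the product of the two input sizes; a cost that large is the usual motivation for the kind of pruning and cost-based optimization the abstract describes.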
Keywords/Search Tags: Unstructured Data, Query Processing, Query Optimization, Similarity Search, Similarity Join