Research On Chinese XML Information Retrieval System

Posted on:2005-05-29

Degree:Doctor

Type:Dissertation

Country:China

Candidate:W M Qu

Full Text:PDF

GTID:1118360122493287

Subject:Computer software and theory

Abstract/Summary:

XML information retrieval systems differ greatly from traditional information retrieval systems in three respects: the construction of both the inverted text index and the structural index, the evaluation of both keyword-based and structure-based queries, and the effect of structural information on result relevance ranking. To manage large-scale XML documents with complex structure, this dissertation focuses on efficient structural indexing algorithms for XML data, result size estimation for optimizing XML structure-based queries, result relevance ranking, and an XML query processing infrastructure that handles both text-rich and data-rich XML documents. To address these issues, the dissertation makes the following contributions. First, it examines the drawbacks of existing indexing algorithms for XML data and proposes DifX, a dynamic indexing algorithm for XML data based on D-bisimilarity. DifX dynamically determines which structural information to index based on the actual query workload, and supports optimization of the index. Second, to account for the effect of structural information on result relevance ranking, the dissertation proposes a ranking algorithm that considers both the frequency distribution and the structural distribution of keywords in the result, together with a dynamic, element-oriented method for computing keyword weights. Experimental results demonstrate the effectiveness of this approach. Third, the dissertation analyzes why result size estimation for XML structure-based query optimization is more complex than its counterpart in traditional relational databases, and proposes SXM, a full-featured result size estimation algorithm for XML queries. For simple path expression queries, it proposes XMap, a dynamic synopsis model for XML data based on the F-stable and B-stable concepts. For complex path expression queries, it adopts an improved Bifocal sampling method for result size estimation.
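The abstract does not define D-bisimilarity or give DifX's construction. As a rough illustration of the underlying idea of a workload-driven structural index, the sketch below groups elements into index nodes by backward k-bisimilarity (as in the A(k)-index), letting k vary per label according to a query workload. The names `k_from_workload`, `build_index`, and `evaluate`, and the rule that derives k from workload path lengths, are hypothetical; this is not DifX itself. For tree-shaped data, backward k-bisimilarity reduces to comparing the last k+1 labels on each element's root path.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def k_from_workload(paths):
    """Hypothetical rule: a label needs local similarity k equal to the
    length of the longest workload path ending at that label, minus one."""
    need = {}
    for p in paths:
        steps = p.strip("/").split("/")
        need[steps[-1]] = max(need.get(steps[-1], 0), len(steps) - 1)
    return need


def build_index(xml_text, k_for_label, default_k=0):
    """Partition elements into index nodes by backward k-bisimilarity with a
    per-label k. For tree data, two same-label elements are k-bisimilar iff
    their last k+1 root-path labels agree and both or neither reach the
    root in fewer than k steps."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)  # index-node key -> preorder ids in its extent
    next_id = 0

    def visit(elem, path):
        nonlocal next_id
        path = path + [elem.tag]        # labels from the root down to elem
        k = k_for_label.get(elem.tag, default_k)
        key = tuple(path[-(k + 1):])    # last k+1 labels (or the whole path)
        if len(path) <= k:              # root reached in fewer than k steps
            key = ("^",) + key
        index[key].append(next_id)
        next_id += 1
        for child in elem:
            visit(child, path)

    visit(root, [])
    return dict(index)


def evaluate(index, query):
    """Answer a simple path query such as 'book/title' from the index alone.
    Exact only if the query's last label has k >= number of steps - 1."""
    steps = tuple(query.strip("/").split("/"))
    hits = []
    for key, extent in index.items():
        labels = key[1:] if key[0] == "^" else key
        if labels[-len(steps):] == steps:
            hits.extend(extent)
    return sorted(hits)
```

For example, with the document `<lib><book><title/><author/></book><journal><title/></journal></lib>` and the workload `["book/title"]`, the two `title` elements fall into separate index nodes, so `evaluate(index, "book/title")` returns only the book title. Raising k only for labels the workload actually queries keeps the index coarse elsewhere; a real workload-driven index would also need to refine itself incrementally as the workload changes.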
For value predicates in XML queries, the dissertation proposes a wavelet-based multi-dimensional histogram for result size estimation. Finally, SXM integrates the three estimation algorithms above through the XMap scheme to provide estimates for complete XPath queries. Fourth, the dissertation presents W2X (Way to XML), a prototype Chinese XML document retrieval system developed in this work. W2X has several merits: it can retrieve Chinese XML documents; it can process both text-rich and data-rich XML data; and it adopts the efficient indexing and query processing algorithms introduced in this dissertation, which enable it to manage large-scale XML data. Overall, this work makes XML document retrieval systems more efficient, accurate, and practical.
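The abstract does not specify the wavelet, the decomposition, or how coefficients are selected for the multi-dimensional histogram. The sketch below assumes one common choice: an orthonormal Haar transform of a two-dimensional joint frequency grid (for example, hypothetical buckets of two numeric values tested in the same predicate), keeping only the `budget` largest-magnitude coefficients as the synopsis and answering range predicates from the reconstructed grid. All function names are hypothetical, and this is not the dissertation's algorithm.

```python
import math


def haar_1d(vec):
    """Orthonormal Haar transform; len(vec) must be a power of two."""
    out = [float(v) for v in vec]
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        dif = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + dif  # coarse part first, detail coefficients after it
        n = half
    return out


def inv_haar_1d(coeffs):
    """Inverse of haar_1d."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [(a + d) / math.sqrt(2), (a - d) / math.sqrt(2)]
        out[:2 * n] = merged
        n *= 2
    return out


def transform_2d(grid, f):
    """Standard decomposition: apply f to every row, then to every column."""
    rows = [f(r) for r in grid]
    cols = [f(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def build_grid(pairs, size):
    """Joint frequency grid for (x, y) pairs already mapped to buckets 0..size-1."""
    grid = [[0] * size for _ in range(size)]
    for x, y in pairs:
        grid[x][y] += 1
    return grid


def compress(grid, budget):
    """Keep the `budget` largest-magnitude wavelet coefficients as the synopsis."""
    coeffs = transform_2d(grid, haar_1d)
    ranked = sorted(((abs(v), i, j) for i, row in enumerate(coeffs)
                     for j, v in enumerate(row)), reverse=True)
    return {(i, j): coeffs[i][j] for _, i, j in ranked[:budget]}


def estimate(synopsis, shape, x_range, y_range):
    """Estimated count of pairs with x in x_range and y in y_range (inclusive buckets)."""
    rows, cols = shape
    coeffs = [[synopsis.get((i, j), 0.0) for j in range(cols)] for i in range(rows)]
    approx = transform_2d(coeffs, inv_haar_1d)
    (x0, x1), (y0, y1) = x_range, y_range
    return sum(approx[i][j] for i in range(x0, x1 + 1) for j in range(y0, y1 + 1))
```

Because the transform is orthonormal, the squared reconstruction error equals the sum of squares of the dropped coefficients, so keeping the largest-magnitude ones minimizes that error for a given budget. With the full budget the estimate is exact; with fewer coefficients the synopsis trades accuracy for space.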

Keywords/Search Tags:

XML information retrieval system, indexing algorithm, relevance ranking algorithm, result size estimation

Related items

1	Based On Xml Chinese Web-retrieval Model
2	A Result Size Estimation Algorithm For Value Predication In XML Query
3	Ranking Algorithm Based On The Semantic Retrieval Of Lexical Semantic Tree
4	Research And Application Of Diverse Ranking In Information Retrieval
5	A relevance feedback-based system for quickly narrowing biomedical literature search result
6	Research Of Retrieval Results Ranking Model Based On The Relevance Feedback Technology
7	Research On Information Retrieval Ranking Optimization Methods
8	Research On Relevance Ranking
9	Research Of Result Optimization Of Information Retrieval
10	The Research On Retrieval Of XML Document Educational Resources Based On Celts-3 Standard