Research On Rich-text XML Document Retrieval

Posted on:2007-03-13

Degree:Master

Type:Thesis

Country:China

Candidate:T J Jiang

Full Text:PDF

GTID:2178360212958663

Subject:Computer application technology

Abstract/Summary:

XML is a self-describing, extensible language that specifies both content and structural information. The number of XML documents in Web pages, commercial text repositories, digital libraries and similar collections has increased exponentially, so efficient information retrieval from these large collections is becoming extremely important.

Depending on their content, XML documents can be viewed in two ways: the document-centric view and the data-centric view. Querying data-centric XML data is a well-explored topic, with powerful database-style query languages such as XPath and XQuery set to become W3C standards. An equally compelling paradigm for querying document-centric XML documents is IR-style search over textual content.

Information retrieval (IR-style) differs from database search (DB-style) in that it is a process of inexact, vague and partial matching. An XML document is semi-structured data with a hierarchical structure and text content. Traditional IR cannot be extended directly to XML documents, for four reasons: (1) traditional keyword search does not leverage the structural information of XML documents, whereas XML information can be retrieved by structure (path) conditions as well as content conditions, which requires integrating full-text search with structure queries; (2) XML retrieval with structural information returns XML elements (or fragments) within documents, whereas traditional information retrieval returns entire documents; (3) a unified ranking mechanism is needed to handle vague content and structure (VCAS) retrieval; (4) in XML retrieval, the weight of a node is influenced by several different factors.

In this paper, we analyze the features of XML documents from the perspective of information retrieval, and discuss the vagueness of users' natural-language queries and the factors that influence the ranking of VCAS retrieval results. Then, using the logical integrity of answer nodes, we analyze the factors involved in relaxing structure and content conditions in vague XML retrieval, and propose a method for finding the best query granularity from the query expression extracted from a vague natural-language query and from search paths in the XML tree. On this basis, a ranking model is designed to handle these new features, and a search engine is implemented in VC. Furthermore, to distinguish users by how familiar they are with the document structure, we propose the new idea of confident structure queries and vague structure queries, and present a new configurable ranking model, which is designed to handle these...
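To make the contrast with traditional document-level retrieval concrete, the following minimal Python sketch ranks individual XML elements by a combination of a content score and a relaxed structure (path) score. It is an illustration under stated assumptions, not the ranking model proposed in the thesis: the sample document, the function names, the use of a longest-common-subsequence ratio as the path similarity, and the linear weighting parameter `alpha` are all hypothetical choices made for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = """
<article>
  <title>XML retrieval</title>
  <section>
    <title>Ranking</title>
    <para>Ranking XML elements by content and structure.</para>
  </section>
  <section>
    <title>Indexing</title>
    <para>Inverted index over element text.</para>
  </section>
</article>
"""

def walk(elem, path=()):
    """Yield (path, element) pairs; path is the tuple of tags from the root."""
    path = path + (elem.tag,)
    yield path, elem
    for child in elem:
        yield from walk(child, path)

def content_score(elem, terms):
    """Fraction of query terms found in the element's text (partial match)."""
    words = set(re.findall(r"\w+", "".join(elem.itertext()).lower()))
    return sum(t.lower() in words for t in terms) / len(terms)

def lcs_len(a, b):
    """Length of the longest common subsequence of two tag sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def structure_score(query_path, elem_path):
    """Relaxed path match: share of query path steps found, in order."""
    return lcs_len(query_path, elem_path) / len(query_path)

def search(xml_text, query_path, terms, alpha=0.5):
    """Rank elements by alpha * content + (1 - alpha) * structure.

    The linear combination and alpha are assumptions for this sketch.
    Elements that match no query term are dropped.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for path, elem in walk(root):
        c = content_score(elem, terms)
        if c == 0:
            continue
        s = structure_score(query_path, path)
        hits.append((alpha * c + (1 - alpha) * s, "/".join(path)))
    return sorted(hits, key=lambda h: h[0], reverse=True)

results = search(DOC, ("section", "para"), ["ranking", "structure"])
for score, path in results:
    print(f"{score:.2f}  {path}")
```

In this sketch the query asks for `para` elements under a `section` that mention "ranking" and "structure". The result list contains elements rather than whole documents, and an element whose path only partly matches the query path (for example the enclosing `section`) is still returned, with a lower structure score, which is one simple way to picture relaxation on structure.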

Keywords/Search Tags:

XML Retrieval, Answer Node, Weight, SCAS, Ranking

Related items

1	Research And Application Of Answer Ranking And Question Retrieval In Community Question Answering System
2	Mutual Promotion Of Question Retrieval And Answer Ranking In Community Question Answering
3	Research Of Answer Ranking Method Based On Weighted Keywords
4	Research On The Sorting Method Of Cross-modal Candidate Answers In Community Question And Answer
5	Research On Answer Summarization Method In Question-answering Community By Integrating Answerer Ranking Score
6	Research On Candidate Answers Ranking For Temporal Question
7	Study On The Answer Ranking Based On Deep Learning Methods
8	Inferring answer quality, answerer expertise, and ranking in question answer social networks
9	Research On Answer Selection Ranking Based On Attention Mechanism
10	Research On The Technology Of Content-based Re-ranking Video Retrieval