XML Documents Retrieval Based On Bayesian Network

Posted on: 2007-12-20    Degree: Master    Type: Thesis
Country: China    Candidate: B F Chai    Full Text: PDF
GTID: 2178360182485685    Subject: Computer applications
Abstract/Summary:
Information on the Web is highly diverse. To make simple Web queries efficient and convenient, more and more information is described uniformly with the XML data model. XML is quickly becoming the standard for data representation and data exchange on the Internet, and it was designed specifically for Web applications as an alternative to HTML. XML strongly supports handling both the concepts (content) and the structure of information, so researchers are paying increasing attention to XML document retrieval.

An XML information retrieval system differs greatly from a traditional one. It must build both an inverted text index and a structural index, and it must consider both the query keywords and the document structure when processing a user query. To manage large collections of structurally complex XML documents, this thesis designs the following retrieval strategy: the user enters an information need through a simple, unified interface; the system then optimizes the user's query; and finally the results are ranked by their similarity to the query.

The main contributions are as follows.

(1) The indexing mechanism: an index structure is designed around the characteristics of XML documents, together with an indexing algorithm that takes both the structure and the content of each document into account.

(2) The query selection model for XML documents based on a Bayesian network: after the user enters an unstructured (natural) query, the system constructs several structured queries according to the structure of the document collection. A Bayesian network model is then built over all of these queries, and from it a formula is derived for the probability of each structured query given the document collection. The system computes each structured query's probability with this formula, and the user chooses, from the most probable queries, those that match his or her needs.

(3) The XML document ranking model based on a Bayesian network: a number of XML documents are returned for the queries selected in (2).
Each document's elements, together with the corresponding query, are built into a Bayesian network model. According to this model, the probability of each document on the...
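The three stages described above (structure-aware indexing, query selection, document ranking) can be illustrated with a small Python sketch. It is not the thesis's method: the abstract does not give the index layout or the probability formulas, and the ranking description is cut off. Every function name and the parameter `p` are hypothetical. The sketch assumes that the index maps each term to (document, element path) postings; that a query term node is activated noisy-OR style, with per-occurrence probability `p`, by occurrences of the term inside the requested element; that the query node combines term beliefs by an unweighted average, similar to the `#sum` operator of inference-network retrieval models; and that P(q | collection) is obtained by marginalizing over documents with a uniform prior P(d) = 1/N.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import product


def build_index(docs):
    """Build a structure-aware inverted index (hypothetical layout).

    docs maps a doc_id to an XML string. The result maps each term to
    {(doc_id, element_path): term_frequency}, where element_path is a tag
    path such as "article/title". Only each element's own leading text is
    indexed; tail text is ignored to keep the sketch short.
    """
    index = defaultdict(lambda: defaultdict(int))

    def walk(elem, parent_path, doc_id):
        path = f"{parent_path}/{elem.tag}" if parent_path else elem.tag
        for term in (elem.text or "").lower().split():
            index[term][(doc_id, path)] += 1
        for child in elem:
            walk(child, path, doc_id)

    for doc_id, xml_text in docs.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    # Convert to plain dicts so lookups of unknown keys do not insert entries.
    return {term: dict(postings) for term, postings in index.items()}


def term_belief(index, doc_id, tag, term, p=0.5):
    """P(term node true | doc): noisy-OR over occurrences of term inside <tag>."""
    tf = sum(n for (d, path), n in index.get(term, {}).items()
             if d == doc_id and path.split("/")[-1] == tag)
    return 1.0 - (1.0 - p) ** tf


def query_given_doc(index, doc_id, query, p=0.5):
    """P(query | doc): unweighted average of the term beliefs (#sum-style)."""
    beliefs = [term_belief(index, doc_id, tag, term, p) for tag, term in query]
    return sum(beliefs) / len(beliefs)


def candidate_queries(index, keywords):
    """Expand free keywords into structured queries: one (tag, keyword) choice
    per keyword, for every element tag in which that keyword occurs."""
    options = []
    for kw in keywords:
        tags = sorted({path.split("/")[-1] for (_, path) in index.get(kw, {})})
        options.append([(tag, kw) for tag in tags])
    return [list(combo) for combo in product(*options)]


def query_given_collection(index, doc_ids, query, p=0.5):
    """P(query | collection) = sum over d of P(query | d) * P(d), with P(d) = 1/N."""
    return sum(query_given_doc(index, d, query, p) for d in doc_ids) / len(doc_ids)


def rank_documents(index, doc_ids, query, p=0.5):
    """Return (doc_id, P(query | doc)) pairs, highest belief first."""
    scores = [(d, query_given_doc(index, d, query, p)) for d in doc_ids]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = {
        "d1": "<article><title>Bayesian network retrieval</title>"
              "<body>xml xml ranking</body></article>",
        "d2": "<article><title>XML indexing</title><body>bayesian</body></article>",
    }
    index = build_index(docs)
    doc_ids = list(docs)
    # Order candidate structured queries by P(q | collection); a user would pick from the top.
    ranked = sorted(candidate_queries(index, ["bayesian", "xml"]),
                    key=lambda q: query_given_collection(index, doc_ids, q),
                    reverse=True)
    for q in ranked:
        print(q, query_given_collection(index, doc_ids, q))
    print(rank_documents(index, doc_ids, ranked[0]))
```

With the two sample documents, each keyword is paired with every element tag it occurs in, giving four candidate structured queries; ranking the documents against a chosen candidate then returns (doc_id, belief) pairs in descending order. Averaging term beliefs rather than multiplying them lets a document that satisfies only part of the structured query still receive a nonzero score.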
Keywords/Search Tags: Bayesian Network, XML Document, IR, Ranking, Indexing