
A Chinese Web Retrieval Model Based on XML

Posted on: 2007-09-19
Degree: Master
Type: Thesis
Country: China
Candidate: A G Gong
GTID: 2208360185482269
Subject: Computer software and theory
Abstract/Summary:
XML is a major standard for data representation and exchange on the Internet, and improving search efficiency and accuracy is the central problem in searching XML data. An XML information retrieval system differs from a traditional information retrieval system in two main respects: it returns document nodes rather than whole documents, and its index must cover both the content and the structure of the documents. To search large collections of structurally complex XML documents, this dissertation focuses on efficient structural indexing, on index-based search algorithms for XML data, and on ranking results by relevance.

The dissertation makes the following contributions. First, it examines the drawbacks of existing XML indexing structures and index search algorithms, and proposes a two-level "document-keyword-node" indexing model for XML data based on inverted lists. The index captures both structural and content information with only a modest increase in size, and it narrows the search scope, which improves efficiency. Second, building on this indexing model, it proposes an index search algorithm that works together with the index structure to optimize the order of search operations, further improving efficiency. Third, it proposes a TF-IDF-based ranking algorithm for XML query results that takes the structure of XML documents into account. Finally, a prototype Chinese XML document retrieval system, XSK (XML Search system based on Keyword), was developed that integrates the indexing structure, the index search algorithm, and the ranking algorithm. Using these techniques together with the query processing algorithm introduced in the dissertation, XSK can retrieve Chinese XML documents.
Experiments demonstrate that XSK searches XML data efficiently and effectively.
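The abstract describes three cooperating pieces: a two-level "document-keyword-node" inverted index, a search algorithm that orders its work to shrink the search scope, and a structure-aware TF-IDF ranking. The Python sketch below illustrates one way these could fit together. It is not the XSK implementation, and every detail is an assumption: the class name `TwoLevelIndex`, the keyword → document → node posting layout, Dewey-style node IDs (tuples of child positions), whitespace-separated text (real Chinese text would first need a word segmenter), rarest-keyword-first ordering as the search-order optimization, smallest-lowest-common-ancestor (SLCA) result semantics, and a depth-decayed TF-IDF score with a hypothetical `decay` parameter.

```python
import math
import xml.etree.ElementTree as ET
from collections import defaultdict


class TwoLevelIndex:
    """Hypothetical two-level inverted index: keyword -> document -> node counts."""

    def __init__(self):
        # Level 1: keyword -> documents; level 2: document -> {Dewey node ID: term count}.
        self.postings = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        self.num_docs = 0

    def add_document(self, doc_id, xml_text):
        self.num_docs += 1
        self._index_element(doc_id, ET.fromstring(xml_text), (0,))

    def _index_element(self, doc_id, elem, dewey):
        # Assumption: text is already word-segmented and whitespace-separated.
        # Only an element's leading text is indexed; tail text is ignored for brevity.
        for word in (elem.text or "").lower().split():
            self.postings[word][doc_id][dewey] += 1
        for i, child in enumerate(elem):
            self._index_element(doc_id, child, dewey + (i,))

    def search(self, query):
        keywords = query.lower().split()
        if not keywords or any(k not in self.postings for k in keywords):
            return []
        # Assumed ordering heuristic: start from the keyword in the fewest documents,
        # so the level-1 document filter shrinks the candidate set as early as possible.
        keywords.sort(key=lambda k: len(self.postings[k]))
        docs = set(self.postings[keywords[0]])
        for k in keywords[1:]:
            docs.intersection_update(self.postings[k])
            if not docs:
                return []
        # Only the surviving documents' node lists (level 2) are examined.
        results = [(self._score(doc, node, keywords), doc, node)
                   for doc in docs for node in self._slca(doc, keywords)]
        results.sort(key=lambda r: (-r[0], r[1], r[2]))
        return results

    def _slca(self, doc, keywords):
        # Nodes whose subtree holds every keyword: intersect the ancestor-or-self
        # sets of each keyword's matching nodes (the root is always shared).
        common = None
        for k in keywords:
            ancestors = {node[:i] for node in self.postings[k][doc]
                         for i in range(1, len(node) + 1)}
            common = ancestors if common is None else common & ancestors
        # Keep only the deepest such nodes: drop any node with a descendant in the set.
        return [n for n in common
                if not any(len(m) > len(n) and m[:len(n)] == n for m in common)]

    def _score(self, doc, node, keywords, decay=0.8):
        # Hypothetical structural TF-IDF: an occurrence d levels below `node`
        # counts decay**d, so deeply nested matches contribute less.
        score = 0.0
        for k in keywords:
            idf = math.log(1 + self.num_docs / len(self.postings[k]))  # smoothed idf
            tf = sum(count * decay ** (len(m) - len(node))
                     for m, count in self.postings[k][doc].items()
                     if m[:len(node)] == node)
            score += tf * idf
        return score


if __name__ == "__main__":
    idx = TwoLevelIndex()
    idx.add_document("d1", "<book><title>xml search</title><author>gong</author></book>")
    idx.add_document("d2", "<book><title>xml index</title>"
                           "<chapter><p>keyword search</p></chapter></book>")
    for score, doc, node in idx.search("xml search"):
        print(doc, node, round(score, 3))
    # d1 (0, 0) 1.386
    # d2 (0,) 0.998
```

In this toy run both keywords share the `<title>` element in `d1`, so that element is returned and ranked first, while in `d2` the keywords only meet at the root `<book>` and the decay factor lowers its score. Filtering on documents before reading node lists is one plausible reading of how a document-level first layer could narrow the search scope.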
Keywords/Search Tags: XML Information Retrieval, Keyword Query, Inverted-List-Based Indexing Structure, Relevance Ranking Algorithm