
A Research On Searching Technique And System Implementation For XML Document Set

Posted on: 2010-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: J N Hu
Full Text: PDF
GTID: 2178360302459807
Subject: Computer applications
Abstract/Summary:
With the rapid growth of information on the Internet, and especially the wide and deep application of XML technology in more and more fields, traditional information retrieval systems based on HTML and plain text can no longer satisfy the need to retrieve the varied information held in XML documents. Because of its simplicity, openness, extensibility and interoperability, XML is replacing HTML in certain fields and is becoming the main form of information representation on the Internet and in other applications. XML search technology is improving continuously, but no search model is in general use today, and the standard query languages provided by the W3C, XPath and XQuery, are both based on precise tag matching. Since most XML documents have a complex and irregular structure, ordinary users can hardly understand that structure well enough to write efficient queries, which makes for a poor search experience. Thus, one of the research hotspots in the international IR community is how to make full use of the new features of XML, draw on traditional information retrieval technology, and build an efficient native XML information retrieval system.

This thesis proposes a path index method and a twig path searching method for XML document sets, and on this basis builds an information retrieval system for XML document sets. The main contributions are as follows:
1) An indexing method for XML document sets based on path division, which combines the traditional inverted list index for keyword search with the path information of the nodes.
2) An efficient twig path searching mechanism that filters nodes for complex twig queries in time linear in the size of the input data, without generating too many intermediate results.
3) The design and implementation of an XML search system based on the techniques above. Besides its search capability, the system provides an API for information processing and data mining algorithms. Using this API, researchers can run their own programs on the data of interest within the massive data source, without tedious data preparation.
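
To make the idea of combining an inverted list with node path information more concrete, the following is a minimal Python sketch. It is not the thesis's algorithm: the function names build_index and twig_search, the Dewey-style position tuples, and the sample documents are all assumptions introduced for illustration. Each keyword posting stores the document, the id of the node's root-to-node label path, and the node's position. A simple twig query (one branching element with several keyword branches) is then answered by one pass over each branch's posting list followed by a set intersection.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(docs):
    """Build a path table and a keyword inverted list (illustrative only).

    docs maps a document id to an XML string.
    Returns (paths, postings):
      paths    maps a label path such as "/lib/book/title" to a path id
      postings maps a lowercased keyword to (doc_id, path_id, position)
               entries, where position is a Dewey-style tuple of child
               indexes from the root.
    """
    paths = {}
    postings = defaultdict(list)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        stack = [(root, "/" + root.tag, (0,))]
        while stack:
            node, path, pos = stack.pop()
            path_id = paths.setdefault(path, len(paths))
            # Only the text before the first child is indexed here;
            # tail text is ignored to keep the sketch short.
            for word in (node.text or "").lower().split():
                postings[word].append((doc_id, path_id, pos))
            for i, child in enumerate(node):
                stack.append((child, path + "/" + child.tag, pos + (i,)))
    return paths, postings


def twig_search(paths, postings, root_path, branches):
    """Return the (doc_id, position) pairs of root_path elements for which
    every branch (child_label, keyword) is satisfied by a direct child.

    Each branch costs one scan of its keyword's posting list.
    """
    depth = root_path.count("/")  # number of labels in root_path
    result = None
    for child_label, word in branches:
        path_id = paths.get(root_path + "/" + child_label)
        hits = {
            (doc_id, pos[:depth])  # truncate to the branching element
            for doc_id, pid, pos in postings.get(word.lower(), [])
            if pid == path_id
        }
        result = hits if result is None else result & hits
    return result or set()


# Hypothetical sample data, not taken from the thesis.
docs = {
    "d1": "<lib><book><title>XML search</title><author>Hu</author></book>"
          "<book><title>XML index</title><author>Li</author></book></lib>",
    "d2": "<lib><book><title>Text retrieval</title><author>Hu</author></book></lib>",
}
paths, postings = build_index(docs)
matches = twig_search(paths, postings, "/lib/book",
                      [("title", "xml"), ("author", "hu")])
print(matches)
```

In this sketch, comparing path ids replaces tag-by-tag navigation, and truncating the position tuple identifies the shared branching element, so no intermediate path tuples are materialized. A real system would need to handle deeper branches, wildcards and ancestor-descendant edges, which this example leaves out.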
Keywords/Search Tags: XML, Twig Query, Path Index, IR System