The Research Of XML Documents Retrieval

Posted on: 2004-07-10 | Degree: Master | Type: Thesis
Country: China | Candidate: Y M Guo
GTID: 2168360092997026 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technology, HTML, because of its inherent deficiencies, can no longer keep pace with the widening range of web applications. XML, a subset of SGML designed specifically for web applications, is expected to become the principal markup language: it overcomes the shortcomings of HTML while dropping the SGML features that today's web users do not need.

In this thesis, we introduce the basic theory of XML and the features of XML documents, and review several XML query languages as well as the application of traditional information retrieval techniques to XML data. We then present the general architecture of a retrieval system for XML documents, together with its indexing and retrieval algorithms and their implementation.

First, we hold that a good retrieval algorithm should take different types of data into account. Following this idea, we propose a new retrieval method that combines XPath with the vector space model, which we call the XPath-based vector retrieval model. Second, we make full use of the hierarchical structure of XML data: the structure of every document is analyzed to build a structure thesaurus, which guides the user's query and eliminates structural conflicts. Finally, we adopt a bottom-up scheme for path matching; this method not only locates what the user needs precisely but also reduces processing time and improves retrieval speed.

Although the system is only a prototype, we believe it can be developed into a fully functional, practical system in the future.
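The abstract names three mechanisms (XPath combined with the vector space model, a structure thesaurus, and bottom-up path matching) without giving their details. The Python sketch below shows one way they could fit together; it is an illustration under stated assumptions, not the thesis's actual algorithm. Assumed here: query paths use only child steps, an optional leading `//`, and `*` wildcards; the thesaurus is simply the set of distinct root-to-element paths; ranking uses smoothed TF-IDF cosine similarity. The names `build_thesaurus`, `path_matches`, and `search` are hypothetical.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sketch: XPath-style structural filter + vector space ranking.
# Assumptions (not from the thesis): child steps only, optional leading "//",
# "*" wildcards, smoothed TF-IDF cosine scoring.


def iter_paths(elem, prefix=()):
    """Yield (path, element) in document order; path is a tuple of tags from the root."""
    path = prefix + (elem.tag,)
    yield path, elem
    for child in elem:
        yield from iter_paths(child, path)


def build_thesaurus(docs):
    """Structure thesaurus (assumed form): every distinct root-to-element path."""
    return {path for root in docs.values() for path, _ in iter_paths(root)}


def parse_query_path(query):
    """'/a/b' must match from the root; '//a/b' may match anywhere."""
    anchored = not query.startswith("//")
    body = query.lstrip("/")
    if "//" in body:
        raise ValueError("only a leading '//' is supported in this sketch")
    return anchored, body.split("/")


def path_matches(path, steps, anchored):
    """Bottom-up match: compare the last query step with the element's own tag,
    then walk toward the root, so a path whose leaf tag differs is rejected
    after a single tag comparison."""
    if len(steps) > len(path) or (anchored and len(steps) != len(path)):
        return False
    for step, tag in zip(reversed(steps), reversed(path)):
        if step != "*" and step != tag:
            return False
    return True


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def search(docs, thesaurus, query_path, keywords):
    """Rank elements whose path satisfies query_path by TF-IDF cosine to keywords."""
    anchored, steps = parse_query_path(query_path)
    # 1. Consult the thesaurus before scanning any document.
    valid = {p for p in thesaurus if path_matches(p, steps, anchored)}
    if not valid:
        hints = sorted("/" + "/".join(p) for p in thesaurus if p[-1] == steps[-1])
        raise LookupError(f"no structure matches {query_path!r}; try {hints}")
    # 2. Retrieval units: elements on a valid path, as term-frequency vectors.
    units = [
        (doc_id, "/" + "/".join(path), Counter(tokenize(" ".join(elem.itertext()))))
        for doc_id, root in docs.items()
        for path, elem in iter_paths(root)
        if path in valid
    ]
    # 3. Smoothed IDF computed over the retrieval units.
    n = len(units)
    df = Counter(term for _, _, tf in units for term in tf)
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

    def weigh(tf):
        return {t: c * idf.get(t, 0.0) for t, c in tf.items()}

    def norm(vec):
        return math.sqrt(sum(w * w for w in vec.values()))

    q_vec = weigh(Counter(tokenize(" ".join(keywords))))
    results = []
    for doc_id, path, tf in units:
        d_vec = weigh(tf)
        dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
        score = dot / (norm(q_vec) * norm(d_vec)) if dot else 0.0
        results.append((score, doc_id, path))
    # Stable sort: equal scores keep document order.
    results.sort(key=lambda r: -r[0])
    return results
```

For example, given a dict of parsed `<book>` documents, `search(docs, build_thesaurus(docs), "//chapter/title", ["vector", "model"])` scores every chapter title by cosine similarity to the keywords, while a path that fits no known structure, such as `/title`, raises `LookupError` and lists known paths ending in `title`. Checking the thesaurus first lets a structurally impossible query be rejected before any document is scanned.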
Keywords/Search Tags: XML, XPath, VSM, index, retrieval