
Research On Index Technology In XML Search Engine

Posted on: 2007-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: J S Chen
Full Text: PDF
GTID: 2178360212495423
Subject: Computer system architecture
Abstract/Summary:
Most current search engines are based on static HTML. HTML, however, is only a simple presentation language and cannot pinpoint the information being retrieved, which greatly limits search precision. A growing share of the information on the web is expected to be described, stored, and expressed in XML. Because XML tags describe the meaning of the content they enclose, a search engine can locate information through the relationship between tags and content. This greatly narrows the search scope and improves retrieval accuracy. This thesis studies an XML-oriented search engine.

First, a model of an XML search engine is proposed and the ideas behind its design are introduced. The model comprises a robot (crawler) module, a conversion module, a parsing module, an index module, a query module, and others. The structure of each module and the approach to implementing it are described in detail.

Second, XML indexing techniques are studied. The thesis improves a region-based numbering scheme so that node codes can be updated, providing an encoding maintenance plan for the XML index model. Building on this, a DTD-based path index technique is proposed. It integrates seamlessly with a text-based inverted index to support retrieval on both content and structure. Its main characteristics are the combination of an encoding scheme, an inverted index, and a path index, and the simultaneous indexing of XML documents and their DTDs. In addition, the design of the index structure is elaborated in detail, and index storage and optimization are discussed.

Finally, an XML index prototype system is developed to test the performance of the indexing method.
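The abstract does not say how the improved region scheme or the DTD-based path index are built. The Python sketch below only illustrates the general ideas, under several assumptions that are not taken from the thesis:

- Each element gets a (start, end, level) region code, so an ancestor's region strictly contains its descendants' regions.
- Codes are spaced by a hypothetical constant `GAP`, so a new node can be coded without renumbering. This is one common update-friendly variant, not necessarily the thesis's improvement.
- The path index is collected from the document itself rather than from a DTD.
- A term inverted index maps words to the elements that directly hold them.
- A content-and-structure query is answered by a naive containment join.

All names (`Region`, `build_indexes`, `search`, `code_for_insert`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from dataclasses import dataclass

GAP = 10  # hypothetical spacing between consecutive codes, reserved for later inserts


@dataclass(frozen=True)
class Region:
    start: int
    end: int
    level: int

    def contains(self, other: "Region") -> bool:
        """True if self is a proper ancestor of other (region containment)."""
        return self.start < other.start and other.end < self.end


def build_indexes(xml_text: str):
    """Assign region codes in one depth-first pass and fill two indexes:
    a path index (label path -> regions) and a term inverted index
    (term -> regions of elements whose own text contains the term)."""
    root = ET.fromstring(xml_text)
    path_index = defaultdict(list)
    inverted = defaultdict(list)
    counter = 0

    def visit(elem, path, level):
        nonlocal counter
        counter += GAP
        start = counter
        path = f"{path}/{elem.tag}"
        # Text owned directly by this element: leading text plus each child's tail.
        texts = [elem.text or ""]
        for child in elem:
            visit(child, path, level + 1)
            texts.append(child.tail or "")
        counter += GAP
        region = Region(start, counter, level)
        path_index[path].append(region)
        for term in " ".join(texts).lower().split():
            inverted[term].append(region)

    visit(root, "", 0)
    return path_index, inverted


def search(path_index, inverted, path, term):
    """Elements reached by `path` whose subtree contains `term`
    (nested-loop containment join over region codes)."""
    hits = inverted.get(term.lower(), [])
    return [a for a in path_index.get(path, [])
            if any(a == h or a.contains(h) for h in hits)]


def code_for_insert(prev_code: int, next_code: int):
    """Pick (start, end) for a new leaf placed between two existing codes
    in document order, without renumbering any other node."""
    if next_code - prev_code < 3:
        raise ValueError("gap exhausted; renumbering would be needed")
    mid = (prev_code + next_code) // 2
    return mid, mid + 1


if __name__ == "__main__":
    doc = ("<library><book><title>XML search engines</title><year>2007</year></book>"
           "<book><title>Databases</title></book></library>")
    paths, terms = build_indexes(doc)
    print(search(paths, terms, "/library/book", "xml"))  # [Region(start=20, end=70, level=1)]
    print(code_for_insert(40, 50))                       # (45, 46)
```

In the usage example, `/library/book` restricts the structure and `xml` restricts the content. Only the first `book` qualifies, because its region (20, 70) contains the `title` region (30, 40) that holds the term. Inserting a node between the first title (end 40) and the year (start 50) reuses the reserved gap. A real system would need a renumbering fallback once gaps run out.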
Keywords/Search Tags: XML, Index, Search Engine, Query, Information Retrieval