Research and Design of an XML Semantic Retrieval System Based on Cache

Posted on: 2008-10-13
Degree: Master
Type: Thesis
Country: China
Candidate: C F Song
GTID: 2178360212493951
Subject: Computer software and theory
Abstract/Summary:
With the development of network technology, network resources have grown exponentially. How to find needed information on the web quickly and accurately has become an increasingly important problem in IT research. Traditional IR technology is text-based: to search for something, the user simply submits a keyword. XML documents, however, carry implicit structural information, and XML tags express clear meaning. A search engine can therefore locate content precisely according to the dependence relation between the keyword and the content, and return correct results for the keyword the user provides. The structured information in XML thus makes semantic search easier in an XML search engine than in a traditional one.
To address the deficiencies of current XML retrieval systems and the needs of users, this thesis studies and designs a cache-based XML semantic retrieval system. We add a cache structure called the Frequent-Retrieval-Module to the overall framework. This module mainly stores users' frequent query patterns. After a user submits a keyword, the system first scans this module; if the needed information is there, it is returned, and otherwise the query goes on to the retrieval module. This can greatly improve retrieval efficiency. To better support semantic retrieval, we give a method for determining the retrieval unit automatically. Using the semantic and structural information of the XML document itself, it determines, for a specific keyword, a retrieval unit that is suitably sized and meaningful to users. We call this unit the Minimum Retrieval Unit (MRU). An MRU is neither the whole document nor a single XML element, but a set of elements or subtrees that satisfies the user's demand and the keyword containment relation. Once the retrieval units have been determined, an inverted index must be built.
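The cache-first lookup described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the class name `FrequentRetrievalCache`, the `min_count` threshold, and the rule "cache a keyword once it has been queried `min_count` times" are assumptions made for illustration, standing in for the frequent-query-pattern algorithm, whose details the abstract does not give.

```python
from collections import Counter
from typing import Callable


class FrequentRetrievalCache:
    """Hypothetical stand-in for the Frequent-Retrieval-Module."""

    def __init__(self, retrieve: Callable[[str], list[str]], min_count: int = 2):
        self._retrieve = retrieve      # fallback: the full retrieval module
        self._min_count = min_count    # assumed threshold for "frequent"
        self._counts: Counter = Counter()
        self._cache: dict[str, list[str]] = {}
        self.hits = 0                  # number of queries answered from cache

    def search(self, keyword: str) -> list[str]:
        key = keyword.strip().lower()
        # Step 1: scan the cache module first.
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Step 2: on a miss, fall through to the retrieval module.
        results = self._retrieve(key)
        # Step 3: remember the query once it has become frequent.
        self._counts[key] += 1
        if self._counts[key] >= self._min_count:
            self._cache[key] = results
        return results
```

With `min_count=2`, the first two queries for a keyword reach the retrieval module and the third is served from the cache. A real system would also need an eviction and invalidation policy, which this sketch leaves out.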
The index is an important mechanism in information retrieval. This thesis adopts clustering, a technique often used in IR. Based on the structural information of the XML documents, MRUs with the same or similar structures are grouped into a cluster, so that every cluster can be described by a characteristic word. We first index every MRU within a cluster, and then index all the clusters. This two-stage index structure can be built quickly and also improves retrieval efficiency.
The contributions of the thesis are: (1) an algorithm for finding frequent query patterns for XML, which identifies users' frequent query patterns and stores them in the Frequent-Retrieval-Module, so that the cache structure speeds up retrieval; (2) a clustering algorithm for XML documents based on route information, applied to index building, whose validity and feasibility are shown by experiment; (3) a method for determining the retrieval unit automatically, which reduces computational cost and improves accuracy.
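The two-stage index can be sketched roughly as follows, under several assumptions that do not come from the thesis: each MRU is represented by a set of tag paths plus its text, MRUs are clustered only when their path sets are identical (the thesis also groups similar structures), and the cluster-level index maps each term to the clusters containing it. The function names `build_two_stage_index` and `lookup` are hypothetical.

```python
from collections import defaultdict


def build_two_stage_index(mrus):
    """mrus maps mru_id -> (tag_paths, text). Returns (cluster_index, mru_index)."""
    # Stage 0: cluster MRUs by structural signature (assumed: identical path sets).
    clusters = defaultdict(list)
    for mru_id, (paths, _text) in mrus.items():
        clusters[frozenset(paths)].append(mru_id)

    cluster_index = defaultdict(set)  # term -> ids of clusters containing it
    mru_index = {}                    # cluster id -> (term -> MRU ids)
    for cid, members in enumerate(clusters.values()):
        inner = defaultdict(set)
        for mru_id in members:
            for term in mrus[mru_id][1].lower().split():
                inner[term].add(mru_id)       # stage 1: index MRUs in the cluster
                cluster_index[term].add(cid)  # stage 2: index the clusters
        mru_index[cid] = inner
    return cluster_index, mru_index


def lookup(term, cluster_index, mru_index):
    """Find candidate clusters first, then the matching MRUs inside them."""
    term = term.lower()
    hits = set()
    for cid in cluster_index.get(term, ()):
        hits |= mru_index[cid][term]
    return hits
```

A query consults the cluster-level index first, so only the per-cluster inverted indexes that can contain the term are searched.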
Keywords/Search Tags: XML, semantic retrieval, clustering, retrieval unit, frequent-query patterns