Research And Design Of An XML Semantic Retrieval System Based On Cache

Posted on:2008-10-13

Degree:Master

Type:Thesis

Country:China

Candidate:C F Song

Full Text:PDF

GTID:2178360212493951

Subject:Computer software and theory

Abstract/Summary:

With the development of network technology, network resources have grown exponentially. How to find needed information on the web quickly and accurately has become an increasingly prominent problem in IT research. Traditional information retrieval (IR) technology is text-based: to search, the user simply submits a keyword. XML documents, however, carry implicit structural information, and XML tags express clear meaning. An XML search engine can therefore locate content precisely according to the dependency between the keyword and the content, and return correct results for the keyword the user provides. The structured information of XML thus makes semantic search easier in an XML search engine than in a traditional one.

In view of the deficiencies of current XML retrieval systems and the needs of users, this thesis studies and designs a cache-based XML semantic retrieval system. The overall framework adds a cache structure called the Frequent-Retrieval-Module, which mainly stores users' frequent query patterns. After a user submits a keyword, the system first scans this module; if the needed information is there, it is returned, otherwise the system goes on to the retrieval module to search further. This can greatly improve retrieval efficiency.

To better realize semantic retrieval, we give a method for determining the retrieval unit automatically. Using the semantic and structural information of the XML document itself, it determines, for a specific keyword, a retrieval unit that is of suitable size and meaningful to users. We call this unit the Minimum Retrieval Unit (MRU). It is neither the whole document nor a single XML element, but a set of elements or subtrees that satisfies the user's demand and the keyword containment relation. Once the retrieval units have been determined, an inverted index must be built.
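
The cache-first lookup described above (scan the Frequent-Retrieval-Module first, fall back to the retrieval module on a miss) might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's algorithm: the names `FrequentRetrievalCache`, `search`, `retrieve`, and `min_support` are hypothetical, a query pattern is simplified to a plain keyword string, and "frequent" is taken to mean "seen at least `min_support` times".

```python
from collections import Counter


class FrequentRetrievalCache:
    """Hypothetical stand-in for the Frequent-Retrieval-Module."""

    def __init__(self, min_support=2):
        self.min_support = min_support  # assumed threshold for "frequent"
        self.counts = Counter()         # how often each keyword missed the cache
        self.results = {}               # cached results for frequent keywords

    def lookup(self, keyword):
        # Returns the cached result list, or None on a cache miss.
        return self.results.get(keyword)

    def record(self, keyword, hits):
        # Count the query; cache its results once it becomes frequent.
        self.counts[keyword] += 1
        if self.counts[keyword] >= self.min_support:
            self.results[keyword] = hits


def search(keyword, cache, retrieve):
    """Scan the cache first; on a miss, fall back to the retrieval module.

    `retrieve` is any callable mapping a keyword to a list of results
    (an assumption standing in for the full retrieval module).
    Returns (results, served_from_cache).
    """
    cached = cache.lookup(keyword)
    if cached is not None:
        return cached, True
    hits = retrieve(keyword)
    cache.record(keyword, hits)
    return hits, False
```

In this sketch a keyword is served from the cache only after it has reached the assumed support threshold, so rarely used queries never occupy cache space.
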
The index is an important mechanism in information retrieval. This thesis adopts clustering, a technique often used in IR. Based on the structural information of the XML documents, MRUs that have the same or similar structures are grouped into a cluster, so that every cluster can be described by a characteristic word. We first index every MRU within a cluster, and then index all the clusters. This two-stage index structure can be built quickly and also improves retrieval efficiency.

The contributions of the thesis are: (1) An algorithm for mining frequent query patterns for XML is proposed. With this algorithm we can find users' frequent query patterns and store them in the Frequent-Retrieval-Module; this cache structure improves retrieval speed. (2) A clustering algorithm for XML documents based on path information is proposed and applied to index building; experiments showed the validity and feasibility of the algorithm. (3) A method for determining the retrieval unit automatically is proposed, which reduces computational cost and also improves accuracy.
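
The two-stage index described above (index the MRUs inside each cluster, then index the clusters) might look roughly like the sketch below. It rests on several assumptions that are not from the thesis: MRUs are given as already-extracted XML subtrees, "same or similar structure" is simplified to "identical set of tag paths", the characteristic word of a cluster is taken to be the root tag of its MRUs, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def path_signature(elem, prefix=""):
    """Return the set of tag paths in a subtree, used as its structure signature."""
    path = f"{prefix}/{elem.tag}"
    paths = {path}
    for child in elem:
        paths |= path_signature(child, path)
    return frozenset(paths)


def build_two_stage_index(mrus):
    """Build (word_to_clusters, cluster_postings) from {mru_id: Element}."""
    # Cluster MRUs whose subtrees have identical path sets
    # (an assumed simplification of structural similarity).
    groups = defaultdict(list)
    for mru_id, elem in mrus.items():
        groups[path_signature(elem)].append(mru_id)

    cluster_postings = []                 # stage 1: inverted index per cluster
    word_to_clusters = defaultdict(list)  # stage 2: characteristic word -> cluster ids
    for cluster_id, ids in enumerate(groups.values()):
        postings = defaultdict(set)
        for mru_id in ids:
            for text in mrus[mru_id].itertext():
                for token in text.lower().split():
                    postings[token].add(mru_id)
        cluster_postings.append(postings)
        # Assumption: the root tag serves as the cluster's characteristic word.
        word_to_clusters[mrus[ids[0]].tag].append(cluster_id)
    return word_to_clusters, cluster_postings


def lookup(word, keyword, word_to_clusters, cluster_postings):
    """Pick clusters by characteristic word, then search only their postings."""
    hits = set()
    for cluster_id in word_to_clusters.get(word, []):
        hits |= cluster_postings[cluster_id].get(keyword.lower(), set())
    return hits
```

A query then touches only the postings of the clusters selected in the second stage, which is one plausible reading of how such a structure could reduce retrieval work.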

Keywords/Search Tags:

XML, semantic retrieval, clustering, retrieval unit, frequent-query patterns

Related items

1	Research On Ontology-Based Semantic Information Retrieval
2	Research On Semantic Processing Technology Based Information Retrieval Model
3	Study On The Methods In The Selection Of Retrieval Unit In Mongolian Information Retrieval System
4	Research On Techniques Of Mining Frequent XML Patterns
5	Design Of Ontology-based Animation Material Retrieval System And Research On Retrieval Model
6	Research On Chinese Concept Retrieval Based On Latent Semantic Analysis
7	Research On Ontology-Based Transportation Network Information Retrieval Techniques
8	Research Of XML Information Retrieval Based On Pseudo-relevance Feedback
9	Research On Ontology-based Semantic Information Retrieval Model
10	Research On Music Information Retrieval Technology Based On Content And Semantic