
Research Of XML Information Retrieval System Based On Element Links

Posted on: 2011-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: J B Yu
GTID: 2178330338976297
Subject: Computer software and theory
Abstract/Summary:
XML information retrieval grew out of traditional information retrieval and integrates the database and information retrieval fields. Research shows that element links in an XML document affect not only the content of elements but also the document structure, and therefore affect the results of XML information retrieval. Building on element links, this thesis studies XML indexing, an XML information retrieval model, and a technique for pruning redundant information.

First, we propose a new XML index based on element links. It consists of two parts: an external link index and an inner element index built on Pseudo Dewey coding. Pseudo Dewey coding is schema-based: the code assigned to an element depends on the position of its element type in the schema, on element order, and on related factors. The inner element index organizes its structure by criteria such as keyword type and the logical size of the codes. Experimental results show that this index supports element links, retrieves efficiently, and has a low update cost.

Second, we introduce a new XML information retrieval model based on a graph model that takes the influence of element links into account. We compute the relatedness between contexts from the size, position, and proportion of their common descendant sequences, and derive the model's context relatedness matrix. We then extend the traditional vector space model to compute the relevance between elements and user queries, which improves the precision and recall of the retrieval results.

Finally, we build a Markov chain user navigation model based on user queries, and derive its transition probability matrix from the user's browsing history and the context of elements.
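The abstract states only that the transition probabilities come from browsing history and element context; it does not give the formula. As a minimal sketch under our own assumptions, each row of the matrix below is a weighted mix of (i) transition frequencies counted from browse sessions and (ii) a row-normalized context relatedness matrix. The helper names, the element ids, and the mixing weight `alpha` are hypothetical, not taken from the thesis.

```python
def row_normalize(weights):
    """Turn non-negative weights into a probability row (uniform if all zero)."""
    n = len(weights)
    total = sum(weights)
    if total == 0:
        return [1.0 / n] * n
    return [w / total for w in weights]


def transition_matrix(sessions, elements, context_rel, alpha=0.5):
    """Estimate P[i][j] = Pr(next element j | current element i).

    sessions    -- browse histories, each a list of element ids in visit order
    elements    -- all element ids; fixes the row/column order of the matrix
    context_rel -- square matrix of non-negative context relatedness scores
    alpha       -- hypothetical weight on history vs. context (0..1)
    """
    index = {e: i for i, e in enumerate(elements)}
    n = len(elements)
    counts = [[0.0] * n for _ in range(n)]
    # Count observed i -> j moves between consecutive visits in each session.
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[index[src]][index[dst]] += 1
    matrix = []
    for i in range(n):
        history = row_normalize(counts[i])
        context = row_normalize(context_rel[i])
        # A convex mix of two probability rows is still a probability row,
        # so every row of the result sums to 1.
        matrix.append([alpha * h + (1 - alpha) * c
                       for h, c in zip(history, context)])
    return matrix


def next_distribution(p, matrix):
    """One navigation step: distribution over elements after one more click."""
    n = len(matrix)
    return [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Usage with three hypothetical elements a, b, c.
elements = ["a", "b", "c"]
sessions = [["a", "b", "c"], ["a", "c"]]
context_rel = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
P = transition_matrix(sessions, elements, context_rel)
print(P[1])  # [0.25, 0.0, 0.75]
```

Mixing in the context term keeps the chain usable for elements that never appear in any browse session: their history row falls back to uniform, and the context relatedness still steers the walk toward related elements.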
We then introduce a technique for pruning redundant information, based on the ideal relevance of the result set, together with a greedy optimization algorithm for it. Experimental results show that the greedy approach has a low time cost and executes efficiently, which gives it greater practical value.
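The abstract does not define "ideal relevance of the result set", so the sketch below rests on an assumption of ours: a result is redundant when it overlaps (is an ancestor or descendant of) a result already kept, and the greedy step keeps the highest-scoring non-overlapping elements. Elements are identified by plain Dewey codes written as tuples, standing in for the thesis's schema-based Pseudo Dewey codes; `overlaps`, `greedy_prune`, and the sample scores are hypothetical.

```python
def overlaps(a, b):
    """Dewey codes (tuples) overlap when one is a prefix of the other,
    i.e. one element is an ancestor-or-self of the other."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    return longer[:len(shorter)] == shorter


def greedy_prune(results, k=None):
    """Greedily keep the highest-scoring results with no ancestor/descendant
    overlap; stop after k results if k is given.

    results -- list of (dewey_code_tuple, relevance_score)
    """
    chosen = []
    for code, score in sorted(results, key=lambda r: r[1], reverse=True):
        if not any(overlaps(code, kept) for kept, _ in chosen):
            chosen.append((code, score))
            if k is not None and len(chosen) == k:
                break
    return chosen


results = [((1,), 0.6), ((1, 2), 0.9), ((1, 2, 3), 0.8),
           ((1, 4), 0.7), ((2,), 0.5)]
print(greedy_prune(results))
# [((1, 2), 0.9), ((1, 4), 0.7), ((2,), 0.5)]
```

Greedy selection trades optimality for speed: it needs one sort plus pairwise overlap checks, but it does not always maximize total relevance. For example, a parent scoring 0.9 is kept over two children scoring 0.6 each, although the children together score 1.2.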
Keywords/Search Tags: XML Information Retrieval, Element Links, XML Index, Pseudo Dewey Coding, Graphic Model, Markov Chain, User Navigation Model