Font Size: a A A

Research On SLCA Problem In XML Keyword Retrieval

Posted on:2011-06-23Degree:MasterType:Thesis
Country:ChinaCandidate:S Y ZangFull Text:PDF
GTID:2178360305451074Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
XML (eXtensible Markup Language) Extensible markup language[1] (XML) is a set of norms which are used to define semantic markup and is created by w3c according to standard generalized markup language (SGML). It has became the fact standard for describing data in the field of Web Service, Electronic Commerce etc. In order to help users to retrieve information they need from overlarge amount of XML documents, many algorithms have been raised. Research on XML query makes sense from that point of view.According to retrieval basis of those methods, this paper divides XML retrieval algorithms into two categories, XML Structural Retrieval and XML Keyword Retrieval. The former one takes advantage of regular expressions as traditional queries and could expose users's need easily. The latter one is more like Information Retrieval and the only inputs needed from the users are keywords.XML Structural Retrieval returns accurate results just according to expressions input. However, it also qualifies the users not only on the query language but also on the structure of XML documents. Those qualifications are impossible for most of the users, so from the aspect of freeing users, it is the XML Keyword Retrieval that will have a bright future.XML Keyword Retrieval has a basic problem which is how to compute Smallest Lowest Common Ancestors (SLCA). Many researchers have been contributing on that problem and raised several algorithms, like Stack, ILE, SE, LISA and LISA II. ILE and SE perform better than Stack analytically and experimentally. They deal with XML data by reading sequently. In light weight scale, LISA and LISA II have their own predominance.However, LISA not only scans all nodes more than once, but also introduces many join operations between sets. They are LISA's disadvantages. LISA II improves by using First Order Dewey Code which is different to Dewey Code that is widely used. First Order Dewey, as a private code for LISAⅡ, needs a mapping schema kept in memory.This paper presents a new SLCA solving algorithm, Binary Comparetive Search Algorithm (BCS). BCS is a Dewey-based method without join operations and extra data structures. BCS has another advantage that it only scans all the Dewey codes at most once in its worst situation.BCS performs effectively analytically and experimentally. It is a viable SLCA solving algorithm. As a new XML Keyword Retrieval method, BCS could be used easily as it deals with XML data with no extra qualifications to users. However, like other XML Keyword Retrieval methods, it performs worse than most XML Structural Retrieval methods in precision ratio as it has a birth defect that nearly all XML Keyword Retrieval methods never consider structural information.
Keywords/Search Tags:XML Keyword Retrieval, Dewey, SLCA, BCS
PDF Full Text Request
Related items