Research On SLCA Problem In XML Keyword Retrieval

Posted on:2011-06-23

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Zang

Full Text:PDF

GTID:2178360305451074

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

XML (eXtensible Markup Language) Extensible markup language[1] (XML) is a set of norms which are used to define semantic markup and is created by w3c according to standard generalized markup language (SGML). It has became the fact standard for describing data in the field of Web Service, Electronic Commerce etc. In order to help users to retrieve information they need from overlarge amount of XML documents, many algorithms have been raised. Research on XML query makes sense from that point of view.According to retrieval basis of those methods, this paper divides XML retrieval algorithms into two categories, XML Structural Retrieval and XML Keyword Retrieval. The former one takes advantage of regular expressions as traditional queries and could expose users's need easily. The latter one is more like Information Retrieval and the only inputs needed from the users are keywords.XML Structural Retrieval returns accurate results just according to expressions input. However, it also qualifies the users not only on the query language but also on the structure of XML documents. Those qualifications are impossible for most of the users, so from the aspect of freeing users, it is the XML Keyword Retrieval that will have a bright future.XML Keyword Retrieval has a basic problem which is how to compute Smallest Lowest Common Ancestors (SLCA). Many researchers have been contributing on that problem and raised several algorithms, like Stack, ILE, SE, LISA and LISA II. ILE and SE perform better than Stack analytically and experimentally. They deal with XML data by reading sequently. In light weight scale, LISA and LISA II have their own predominance.However, LISA not only scans all nodes more than once, but also introduces many join operations between sets. They are LISA's disadvantages. LISA II improves by using First Order Dewey Code which is different to Dewey Code that is widely used. First Order Dewey, as a private code for LISAⅡ, needs a mapping schema kept in memory.This paper presents a new SLCA solving algorithm, Binary Comparetive Search Algorithm (BCS). BCS is a Dewey-based method without join operations and extra data structures. BCS has another advantage that it only scans all the Dewey codes at most once in its worst situation.BCS performs effectively analytically and experimentally. It is a viable SLCA solving algorithm. As a new XML Keyword Retrieval method, BCS could be used easily as it deals with XML data with no extra qualifications to users. However, like other XML Keyword Retrieval methods, it performs worse than most XML Structural Retrieval methods in precision ratio as it has a birth defect that nearly all XML Keyword Retrieval methods never consider structural information.

Keywords/Search Tags:

XML Keyword Retrieval, Dewey, SLCA, BCS

PDF Full Text Request

Related items

1	Research On SLCA In XML Keyword Retrieval
2	Research On Slca-Based Keyword Search Over XML Documents
3	Research On XML Keyword Search Processing Method Based On SLCA Sematic
4	Study And Implementation On An Improved Approach Based On Dewey Coding For XML Meaningful Fragment
5	Study And Improvement Of XML Keyword Query Based On SLCA
6	The Research About Slca Based Keyword Search Over XML Data
7	Research On Query Processing For XML Keyword Queries Based On The ID List And Hash Index
8	XSemantic: The Research Of Keyword Search On XML Documents Based On Keyword Expansion
9	Research On Uncertain XML Keyword Search Based On The Semantic Of SLCA
10	Research On Keyword Search Based On XML Data