The Research About Slca Based Keyword Search Over XML Data

Posted on:2014-03-22

Degree:Master

Type:Thesis

Country:China

Candidate:J Q Zhang

Full Text:PDF

GTID:2298330434971014

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

As the internet booming, XML becomes the de facto standard in representing and exchanging data. A lot of structrual query languages have been invinted, such as XPah and XQuery. But such structural query languages require users not only have to master a complete language, but also know the target XML file structural very well. On the other hand, keyword search over XML data has become a research hotspot for its user friendly.The main work of this paper includes:1. Propose two skip rules and three match concepts, and an effiective keyword search algorithm;2. Propose an algorithm to classify the return results of the keyword search algorithm to improve user experience.To return meaningful results, SLCA (smallest lowest common ancestor) has been proposed to identify the data nodes containing all input keywords but have no subtrees which also contain all these keywords. Several researches have been conducted by database community on methods and techniques for SLCA-based keyword search, with great success. But they offen incurs many unnecessary intermediate SLCA computation even when the size of final results is small enough. In this paper, we propose an iterative-skip approach to address this challenge. We first study the relation between SLCA candidates and propose a series of properties, two skip rules and three new concepts. Then based on these properties, we design an effietive skip strategy MMPS to skip more SLCA computations. Lastly, we re-implemenatd the IMS and MMPS strategies via C++language and did experiments on real data DBLP and artificial data XMark. The experimatal results show our approach over perfrom the IMS approach.On the other hand, as the keyword search approach returns a lot of results and some of the results differ a lot; in this paper we propose the idea to classify the returned results to improve user experience.First we propose a naive algorithm which use external index to get tag name and classify the results by each subtree root’s tag name via an red-black tree.However, by using external index, this algorithm introduces a lot of10cost. To avoid external index, we proposed an improved algorithm which using extended dewey code and an in memory finite state machine to compute the tag name. Finally, the experimental results show the improved algorithm performs much better that the naive one.

Keywords/Search Tags:

XML, SLCA, Keyword search, result classify

PDF Full Text Request

Related items

1	Study And Improvement Of XML Keyword Query Based On SLCA
2	Research On Slca-Based Keyword Search Over XML Documents
3	Research On XML Keyword Search Processing Method Based On SLCA Sematic
4	XSemantic: The Research Of Keyword Search On XML Documents Based On Keyword Expansion
5	Research On Uncertain XML Keyword Search Based On The Semantic Of SLCA
6	Research On Query Processing For XML Keyword Queries Based On The ID List And Hash Index
7	Research On Subtree Results Generation Of XML Keyword Search
8	Research On Probabilistic XML Keyword Search Processing Method
9	Research On XML Keyword Search Results Sorting Problem
10	Research On SLCA Problem In XML Keyword Retrieval