Font Size: a A A

The Research About Slca Based Keyword Search Over XML Data

Posted on:2014-03-22Degree:MasterType:Thesis
Country:ChinaCandidate:J Q ZhangFull Text:PDF
GTID:2298330434971014Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
As the internet booming, XML becomes the de facto standard in representing and exchanging data. A lot of structrual query languages have been invinted, such as XPah and XQuery. But such structural query languages require users not only have to master a complete language, but also know the target XML file structural very well. On the other hand, keyword search over XML data has become a research hotspot for its user friendly.The main work of this paper includes:1. Propose two skip rules and three match concepts, and an effiective keyword search algorithm;2. Propose an algorithm to classify the return results of the keyword search algorithm to improve user experience.To return meaningful results, SLCA (smallest lowest common ancestor) has been proposed to identify the data nodes containing all input keywords but have no subtrees which also contain all these keywords. Several researches have been conducted by database community on methods and techniques for SLCA-based keyword search, with great success. But they offen incurs many unnecessary intermediate SLCA computation even when the size of final results is small enough. In this paper, we propose an iterative-skip approach to address this challenge. We first study the relation between SLCA candidates and propose a series of properties, two skip rules and three new concepts. Then based on these properties, we design an effietive skip strategy MMPS to skip more SLCA computations. Lastly, we re-implemenatd the IMS and MMPS strategies via C++language and did experiments on real data DBLP and artificial data XMark. The experimatal results show our approach over perfrom the IMS approach.On the other hand, as the keyword search approach returns a lot of results and some of the results differ a lot; in this paper we propose the idea to classify the returned results to improve user experience.First we propose a naive algorithm which use external index to get tag name and classify the results by each subtree root’s tag name via an red-black tree.However, by using external index, this algorithm introduces a lot of10cost. To avoid external index, we proposed an improved algorithm which using extended dewey code and an in memory finite state machine to compute the tag name. Finally, the experimental results show the improved algorithm performs much better that the naive one.
Keywords/Search Tags:XML, SLCA, Keyword search, result classify
PDF Full Text Request
Related items