Research On Subtree Results Generation Of XML Keyword Search

Posted on:2014-02-26

Degree:Master

Type:Thesis

Country:China

Candidate:B Wang

Full Text:PDF

GTID:2268330422466783

Subject:Computer application technology

Abstract/Summary:

PDF Full Text Request

Keyword search over XML data has attracted a lot of research efforts in the lastdecade, where a core problem is how to efficiently answer a given keyword query.However, most existing methods addressing efficiency focus on computing qualifiedroot nodes, such as SLCA or ELCA nodes, as efficient as possible. In fact,constructing subtrees result is not a trivial task. It is also the core factor that affect thekeyword search system performance. This paper is going to solve these problemsbelow.Firstly, most existing methods adopt three-step strategy to get the subtree result:(Step1) scan all nodes in the set of inverted lists to get SLCA or ELCA results.(Step2)get a set of relevant keyword nodes(RKNs) for each SLCA or ELCA reslut v byrescanning all these nodes again.(Step3) construct a PathSubtree for each RKN setand the prune it.By studying the XML keyword subtree results and their generationalgorithm, we found the three problems:(1) existing methods cannot correctlyidentify RKNs for ELCA semantics;(2) existing methods need to buffer lots ofuseless nodes when construct the subtree;(3) there is no method which satisfy withSLCA and ELCA can adapt the various subtree genneration constraint.Secondly, for the first problem, we formally propose the definition of RKN andthe RKN-Base algorithm, which can correctly tell whether a given node is an RKN ofsome ELCA node by sequentially scanning the set of inverted lists once. AsRKN-Base cannot avoid processing all useless nodes, we then propose an optimizedalgorithm, namely RKN-Optimized, which computes RKN sets based on the set ofELCA nodes, rather than the set of inverted lists as RKN-Base does. As a result,RKN-Optimized avoids processing useless nodes, and reduces the time complexity.Thirdly, for the second problem, we propose a top-down algorithm named TDMthat constructs each MSubtree by visiting nodes of the subtree constructed based onan RKN set level-by-level from left to right, such that to avoid visiting as manyuseless nodes as possible.Fourthly, for the third problem, we propose a generic top-down subtreeconstruction strategy named "TDx", where"generic" means that our method can beused to get either one of MSubtree, RTF and TMSubtree results and our method isindependent of the query semantics, i.e., SLCA orELCA;"top-down" means that ourmethod constructs each restrict subtree by visiting nodes of the corresponding pathsubtree level-by-level from left to right, such that to avoid visiting as many useless nodes as possible. Then we take parallel processing strategy for our method toimprove the overall performance.At last, the experimental results show that our method is much more efficientthan existing ones according to various metrics.

Keywords/Search Tags:

xml, keyword search, ELCA, SLCA, Subtree Reslut, parallel

PDF Full Text Request

Related items

1	Research On XML Subtree Results Generation
2	The Research About Slca Based Keyword Search Over XML Data
3	Study And Improvement Of XML Keyword Query Based On SLCA
4	Research On Slca-Based Keyword Search Over XML Documents
5	Research On XML Keyword Search Processing Method Based On SLCA Sematic
6	XSemantic: The Research Of Keyword Search On XML Documents Based On Keyword Expansion
7	Research On Uncertain XML Keyword Search Based On The Semantic Of SLCA
8	Studies On Keyword Search Over Probabilistic XML Data
9	Research On Query Processing For XML Keyword Queries Based On The ID List And Hash Index
10	Research On Probabilistic XML Keyword Search Processing Method