Font Size: a A A

Research Of Keyword Query Processing On XML Data

Posted on:2014-07-20Degree:MasterType:Thesis
Country:ChinaCandidate:X M ZhaoFull Text:PDF
GTID:2268330422466852Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the rapid expansion of XML applications, keyword search on XML data hasattracted wide attention from researchers. For a given keyword query, this paper focuseson how to efficiently return results satisfying different query semantics, the main researchcontents are as follow.Firstly, through in-depth comparison and analysis of existing keyword queryprocessing algorithms in characteristics and applicable conditions, we find that the majorreasons of inefficiency for existing algorithms are the common ancestor repeatition (CAR)and visiting useless nodes (VUN) problems.Secondly, in response to above problems, we propose a generic top-down keywordquery processing strategy and the corresponding algorithm TDxLCA, where xLCA meansour method can handle LCA, SLCA and ELCA semantics. Based on the traditionalinverted list, TDxLCA detects all the common ancestor nodes in a top-down way, thusavoids the CAR problem. For each detected common ancestor node, TDxLCA checks itssatisfiability through its child node rather than descendant nodes, thus avoids the VUNproblem. The generic of TDxLCA algorithm lies in the label independence and semanticsindependence. Label independence refers to that for any one of the existing path-basedlabeling scheme, TDxLCA can process the given query in the same time and spacecomplexity. Semantics independence means that for a given query, TDxLCA algorithmcan be used to compute qualified results for any semantics.Thirdly, we design a hierarchical index, namely LList, which is independent oflabeling schemes. Based on LList, we propose a TDxLCA-L algorithm. Compared withTDxLCA, TDxLCA-L reduces the time complexity and space complexity.Fourthly, based on the LList and hash indexes, we further propose the TDxLCA-Hand TDxLCA-HO algorithms to reduce the time complexity.Finally, from the running time, the number of atomic operations in our experimentalresults, we verify the efficiency and scalability of our methods in different data sets.
Keywords/Search Tags:XML, keyword search, generic, independence, TDxLCA
PDF Full Text Request
Related items