XSemantic: The Research Of Keyword Search On XML Documents Based On Keyword Expansion

Posted on:2011-05-28

Degree:Master

Type:Thesis

Country:China

Candidate:X S Wang

Full Text:PDF

GTID:2178360305497951

Subject:Computer software and theory

Abstract/Summary:

PDF Full Text Request

Inspired by the great success of IR-style keyword search on the web, keyword search on XML has received extensive interests. To improve effectiveness and efficiency, IR system has to solve the following challenges:understand the semantic of keywords to identify the user search intention, formally define the returned results, design efficient algorithms and reasonable ranking strategy. SLCA as the returned results has been broadly recognized. IMS proposed the concept of anchor nodes and design efficient algorithms for computing SLCAs in multiway style. Although these works have tackled several obstacles in XML keyword search, almost no work focuses on understanding the true semantics of keywords. Additional, current methods almost based on Dewey encoding technique, the efficiency of encode comparison and LCA computation are very low, even more, there exists some redundant computation in IMS.In this paper, we first analysis the keyword semantic expansion problem, and then propose the hybrid keyword expansion technique based on WikiKeywords and XMLKeywords semantic network. WikiKeywords network is the keyword semantic network constructed from Wikipedia data set. We expand the users' keywords in use of WikiKeywords and compute the credibility of each expanded keyword. To overcome the redundant expansion of WikiKeywords, we additionally build another semantic network using the statistics of XML documents. Afterwards, we expand each user keyword hybrid from both networks.Against the low efficiency of Dewey encode in SLCA computing and the redundant computing of IMS, we propose a new region encode based SLCA solutionâ€”MMPS. MMPS seeks for SMatch, the match in which each node has no descendant node in the corresponding candidate list and is the closest node preceding to the rightmost node in the match, compute its LCA and decide whether it is an SLCA by comparing it to its next match NMatch. MMPS effectively reduces the number of redundant LCA computations by ignoring those not making to last results, and decreases the I/O cost when candidate node lists are very large, which take place most often in semantic-expansion-based keyword search, by finding SMatch in an iterate style. Additionally, MMPS reduces the cost of encode comparison and LCA computation to O(1) by making use of region encoding schema. As a blocked algorithm, MMPS has the drawback of bad user experience. We improve MMPS by a credibility-based pipeline algorithmâ€”PipelineMMPS. PipelineMMPS develop the computing plan according to the credibility of each keyword and compute the SLCA with high credibility first which can be outputted directly to improve user experience.

Keywords/Search Tags:

XML, Keyword Search, Semantic Expansion, SLCA, Region Encoding, MMPS, Pipeline

PDF Full Text Request

Related items

1	Research On Slca-Based Keyword Search Over XML Documents
2	The Research About Slca Based Keyword Search Over XML Data
3	Study And Improvement Of XML Keyword Query Based On SLCA
4	Research On Uncertain XML Keyword Search Based On The Semantic Of SLCA
5	Research On XML Keyword Search Processing Method Based On SLCA Sematic
6	Research On Query Processing For XML Keyword Queries Based On The ID List And Hash Index
7	Research On Subtree Results Generation Of XML Keyword Search
8	Research On Probabilistic XML Keyword Search Processing Method
9	Research On SLCA Problem In XML Keyword Retrieval
10	Research On XML Subtree Results Generation