Font Size: a A A

XSemantic: The Research Of Keyword Search On XML Documents Based On Keyword Expansion

Posted on:2011-05-28Degree:MasterType:Thesis
Country:ChinaCandidate:X S WangFull Text:PDF
GTID:2178360305497951Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Inspired by the great success of IR-style keyword search on the web, keyword search on XML has received extensive interests. To improve effectiveness and efficiency, IR system has to solve the following challenges:understand the semantic of keywords to identify the user search intention, formally define the returned results, design efficient algorithms and reasonable ranking strategy. SLCA as the returned results has been broadly recognized. IMS proposed the concept of anchor nodes and design efficient algorithms for computing SLCAs in multiway style. Although these works have tackled several obstacles in XML keyword search, almost no work focuses on understanding the true semantics of keywords. Additional, current methods almost based on Dewey encoding technique, the efficiency of encode comparison and LCA computation are very low, even more, there exists some redundant computation in IMS.In this paper, we first analysis the keyword semantic expansion problem, and then propose the hybrid keyword expansion technique based on WikiKeywords and XMLKeywords semantic network. WikiKeywords network is the keyword semantic network constructed from Wikipedia data set. We expand the users' keywords in use of WikiKeywords and compute the credibility of each expanded keyword. To overcome the redundant expansion of WikiKeywords, we additionally build another semantic network using the statistics of XML documents. Afterwards, we expand each user keyword hybrid from both networks.Against the low efficiency of Dewey encode in SLCA computing and the redundant computing of IMS, we propose a new region encode based SLCA solution—MMPS. MMPS seeks for SMatch, the match in which each node has no descendant node in the corresponding candidate list and is the closest node preceding to the rightmost node in the match, compute its LCA and decide whether it is an SLCA by comparing it to its next match NMatch. MMPS effectively reduces the number of redundant LCA computations by ignoring those not making to last results, and decreases the I/O cost when candidate node lists are very large, which take place most often in semantic-expansion-based keyword search, by finding SMatch in an iterate style. Additionally, MMPS reduces the cost of encode comparison and LCA computation to O(1) by making use of region encoding schema. As a blocked algorithm, MMPS has the drawback of bad user experience. We improve MMPS by a credibility-based pipeline algorithm—PipelineMMPS. PipelineMMPS develop the computing plan according to the credibility of each keyword and compute the SLCA with high credibility first which can be outputted directly to improve user experience.
Keywords/Search Tags:XML, Keyword Search, Semantic Expansion, SLCA, Region Encoding, MMPS, Pipeline
PDF Full Text Request
Related items