Study And Implementation On An Improved Approach Based On Dewey Coding For XML Meaningful Fragment

Posted on: 2011-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: S Lin
Full Text: PDF
GTID: 2248330395958409
Subject: Computer software and theory
Abstract/Summary:
Extensible Markup Language (XML) is widely used in Web development and has simplified data storage and sharing. XML also makes it easier to define data structures, and it has gradually become a standard semantic markup language. With the rapid growth of Web applications, networks have become indispensable to daily work, and XML is widely used in Web services, content management, Web integration, configuration data, and electronic commerce. More and more sites are built on XML technology. How to extract meaningful information from large XML data stores has therefore become a core research topic.
In XML keyword search, meaningful fragments not only satisfy users' expectations but also carry semantic information. How to compute meaningful fragments, and how efficiently an algorithm does so, have therefore become some of the most important issues.
The Smallest Lowest Common Ancestor (SLCA) is widely recognized as the meaningful fragment for users: it is the smallest subtree of the XML tree that contains all the keywords. Building on XML concepts, this thesis improves the method of computing SLCA. The XML document is parsed into an XML tree, and the nodes of the tree are labeled with Dewey codes. The thesis proposes a new algorithm, PHTKSA, which computes the SLCA meaningful fragments through a preorder traversal of the XML tree nodes.
This thesis also raises two problems that complement SLCA computation: the dropping problem and the repetition problem. For the dropping problem, an impact factor is used to judge whether a document is likely to exhibit the problem. The problem is then solved by finding all fragments that contain the keywords, a method called EKS, which maintains the various keyword sets. The thesis then improves a pruning algorithm to pick out the right fragments, those that conform to users' expectations.
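To make the Dewey coding and SLCA ideas concrete, the following is a minimal Python sketch, not the PHTKSA algorithm itself (the abstract does not give its details). It assumes a brute-force approach: label nodes in preorder with Dewey codes, take the longest common prefix of matching codes as their lowest common ancestor, and keep only the LCAs that have no descendant LCA. The function names, the sample document, and the rule that a keyword matches a node when it appears in the node's tag or text are all illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from itertools import product


def dewey_label(root):
    """Return (dewey_code, element) pairs in preorder; a code is a tuple of child positions."""
    labels = []

    def visit(node, code):
        labels.append((code, node))
        for i, child in enumerate(node):
            visit(child, code + (i,))

    visit(root, (0,))  # assumption: the root is labeled (0,)
    return labels


def common_prefix(a, b):
    """The Dewey code of the lowest common ancestor is the longest common prefix."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def slca(labels, keywords):
    """Naive SLCA: LCAs of all keyword-match combinations, minus LCAs with a descendant LCA."""
    match_lists = []
    for kw in keywords:
        # Assumption: a node matches when the keyword occurs in its tag or text.
        hits = [code for code, el in labels
                if kw.lower() in (el.tag + " " + (el.text or "")).lower()]
        if not hits:
            return []  # some keyword never occurs, so no fragment contains all of them
        match_lists.append(hits)

    lcas = set()
    for combo in product(*match_lists):
        lca = combo[0]
        for code in combo[1:]:
            lca = common_prefix(lca, code)
        lcas.add(lca)

    # Keep only the "smallest" LCAs: those that are not a proper prefix of another LCA.
    return sorted(c for c in lcas
                  if not any(o != c and o[:len(c)] == c for o in lcas))


# Hypothetical sample document, used only for illustration.
doc = """
<bib>
  <book><title>XML Search</title><author>Lin</author></book>
  <book><title>Databases</title><author>Wang</author></book>
  <paper><title>XML Trees</title><author>Wang</author></paper>
</bib>
"""
labels = dewey_label(ET.fromstring(doc))
result_lin = slca(labels, ["XML", "Lin"])    # the first <book>
result_wang = slca(labels, ["XML", "Wang"])  # the <paper>
```

The brute-force enumeration of match combinations is exponential in the number of keywords and is used here only because it follows the SLCA definition directly; an efficient algorithm would avoid it.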
The repetition problem is solved by pruning repeated nodes, that is, nodes whose label and content are both the same. The pruning algorithm deletes the redundant nodes and keeps the right ones, so users get the keyword information in a simplified structure.
Experimental evidence shows that PHTKSA is more efficient than the traditional SLCA algorithm, and that it achieves high accuracy when dealing with the two problems.
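The pruning of repeated nodes can be sketched in Python as follows. This is an illustrative reading of the abstract, not the thesis's pruning algorithm: it assumes that "repeated" means sibling nodes under the same parent whose tag, text, and child subtrees are identical, and it ignores attributes. The names `signature` and `prune_repeats` and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET


def signature(node):
    """Summarize a subtree by tag, stripped text, and the signatures of its children."""
    return (node.tag, (node.text or "").strip(),
            tuple(signature(child) for child in node))


def prune_repeats(node):
    """Remove sibling subtrees that duplicate an earlier sibling; keep the first occurrence."""
    seen = set()
    for child in list(node):      # iterate over a copy because children are removed
        prune_repeats(child)      # prune inside first so equal subtrees compare equal
        sig = signature(child)
        if sig in seen:
            node.remove(child)
        else:
            seen.add(sig)
    return node


# Hypothetical result fragment with one repeated <author> node.
fragment = ET.fromstring(
    "<book><author>Lin</author><author>Lin</author>"
    "<author>Wang</author><title>XML</title></book>"
)
pruned = prune_repeats(fragment)
remaining = [(child.tag, child.text) for child in pruned]
```

Keeping the first occurrence preserves document order, so the simplified fragment still reads in the same sequence as the original.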
Keywords/Search Tags: XML Keyword Search, Dewey, SLCA, XML Tree, Preorder Traversal