Study And Implementation On An Improved Approach Based On Dewey Coding For XML Meaningful Fragment

Posted on:2011-06-01

Degree:Master

Type:Thesis

Country:China

Candidate:S Lin

Full Text:PDF

GTID:2248330395958409

Subject:Computer software and theory

Abstract/Summary:


Extensible Markup Language (XML) is widely used in Web development and for simplified data storage and sharing. XML also makes it easier to define data structures, and it has gradually become a standard semantic markup language. With the rapid growth of Web applications, networks have become indispensable to daily work, and XML is widely used in Web Services, content management, Web integration, configuration data, and electronic commerce. More and more sites are built on XML technology. How to extract meaningful information from large XML data stores has therefore become a core research topic.

In XML keyword search, a meaningful fragment not only satisfies the user's expectations but also carries semantic information. How to compute meaningful fragments, and how efficiently this can be done, has therefore become one of the most important issues.

The Smallest Lowest Common Ancestor (SLCA) is widely accepted as the meaningful fragment for users. An SLCA is the smallest subtree of the XML tree that contains all the keywords. Based on XML concepts, this thesis improves the method for computing SLCAs: the XML document is parsed into an XML tree, and the nodes of the tree are labeled with Dewey codes. The thesis proposes a new algorithm, PHTKSA, which computes the SLCA meaningful fragments through a preorder traversal of the XML tree nodes.

The thesis also raises two problems that complement SLCA computation: the dropping problem and the repetition problem. For the dropping problem, an impact factor is used to judge whether a document is liable to the problem. The problem is then solved by finding all fragments that contain the keywords, a step called EKS, which maintains the various sets of keywords. The thesis then improves a pruning algorithm to pick out the right fragment, the one that conforms to the user's expectations.
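The Dewey labeling and the SLCA definition described above can be illustrated with a small Python sketch. This is not the thesis's PHTKSA algorithm, whose details the abstract does not give. It is a brute-force illustration under stated assumptions: the root is labeled `(1,)`, each child appends its 1-based position among its siblings, and a node matches a keyword when the keyword appears in its tag or its direct text. The function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def dewey_labels(root):
    """Return (label, element) pairs in preorder; labels are tuples such as (1, 3, 2)."""
    labels = []

    def visit(node, label):
        labels.append((label, node))
        for position, child in enumerate(node, start=1):
            visit(child, label + (position,))

    visit(root, (1,))  # assumption: the root carries the label 1
    return labels


def keyword_nodes(labels, keyword):
    """Dewey labels of elements whose tag or direct text contains the keyword."""
    kw = keyword.lower()
    return [label for label, node in labels
            if kw in node.tag.lower() or kw in (node.text or "").lower()]


def is_ancestor_or_self(a, b):
    """With Dewey labels, a is an ancestor of (or equal to) b iff a is a prefix of b."""
    return b[:len(a)] == a


def slca(labels, keywords):
    """Brute force: keep nodes covering every keyword that have no covering descendant."""
    matches = [keyword_nodes(labels, k) for k in keywords]
    if any(not m for m in matches):
        return []
    covering = [label for label, _ in labels
                if all(any(is_ancestor_or_self(label, hit) for hit in m)
                       for m in matches)]
    return [c for c in covering
            if not any(d != c and is_ancestor_or_self(c, d) for d in covering)]


doc = ET.fromstring(
    "<bib>"
    "<book><title>XML</title><author>Lin</author></book>"
    "<book><title>Databases</title><author>Chen</author></book>"
    "<article><title>XML</title><author>Chen</author></article>"
    "</bib>"
)
labels = dewey_labels(doc)
result = slca(labels, ["xml", "chen"])
print([".".join(map(str, label)) for label in result])
```

In this sample, the root and the `article` element both contain the two keywords, but only `article` (Dewey code 1.3) is reported, because the root has a descendant that already covers them. The prefix test on Dewey codes is what makes the ancestor check cheap; a practical algorithm would avoid the quadratic scans used here.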
The repetition problem is solved by pruning repeated nodes, that is, nodes whose label and content are both the same. The pruning algorithm deletes the redundant nodes and keeps the right ones, so users get the keyword information in a simplified structure.

Experimental evidence shows that PHTKSA is more efficient than the traditional SLCA algorithm, and that it achieves high accuracy when dealing with the two problems.
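The idea of pruning repeated nodes can be sketched as follows. The abstract does not specify the thesis's pruning algorithm, so this is only one possible reading, under these assumptions: two nodes count as repeats when they are siblings with the same tag, the same stripped text, and identical subtrees; attributes are ignored; and the first occurrence is the one kept. The names `signature` and `prune_repeats` are hypothetical.

```python
import xml.etree.ElementTree as ET


def signature(node):
    """Hashable summary of a subtree: tag, stripped text, and child signatures."""
    return (node.tag,
            (node.text or "").strip(),
            tuple(signature(child) for child in node))


def prune_repeats(node):
    """Remove, in place, later siblings that duplicate an earlier sibling."""
    seen = set()
    for child in list(node):  # copy the child list because we remove while iterating
        prune_repeats(child)  # prune inside the child first so signatures are normalized
        sig = signature(child)
        if sig in seen:
            node.remove(child)
        else:
            seen.add(sig)
    return node


fragment = ET.fromstring(
    "<book><author>Lin</author><author>Lin</author><title>XML</title></book>"
)
pruned = ET.tostring(prune_repeats(fragment), encoding="unicode")
print(pruned)
```

Here the second `author` element is dropped because it repeats the first in both label and content, while `title` is kept.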

Keywords/Search Tags:

XML Keyword Search, Dewey, SLCA, XML Tree, Preorder Traversal


Related items

1	Research On Slca-Based Keyword Search Over XML Documents
2	Research On XML Keyword Search Processing Method Based On SLCA Sematic
3	Research On SLCA Problem In XML Keyword Retrieval
4	The Research About Slca Based Keyword Search Over XML Data
5	Study And Improvement Of XML Keyword Query Based On SLCA
6	Research On SLCA In XML Keyword Retrieval
7	XSemantic: The Research Of Keyword Search On XML Documents Based On Keyword Expansion
8	Research On Uncertain XML Keyword Search Based On The Semantic Of SLCA
9	Research On Keyword Search Based On XML Data
10	Research On Query Processing For XML Keyword Queries Based On The ID List And Hash Index