Research On Semantic-based Extracting Schema In XML Documents

Posted on:2014-05-14

Degree:Master

Type:Thesis

Country:China

Candidate:Y Yan

Full Text:PDF

GTID:2348330482452800

Subject:Computer application technology

Abstract/Summary:


XML (eXtensible Markup Language) is used in a wide range of applications and has become the standard for data presentation and data exchange. Because XML schema is the foundation of XML data exchange and efficient data querying, XML data management has become very important. Most XML documents, however, lack schema information, so extracting schema information from XML data automatically is an important task in XML data management. Traditional XML schema extraction techniques focus on the structure of XML documents and do not consider the semantic information carried by the labels. Different authors of XML documents use different labels to present the same information, which causes redundancy and errors in the extracted XML schema. It is therefore important to produce a compact and concise XML schema by making full use of the semantic information in XML data.

This thesis proposes a semantic-based XML schema extraction technique. First, XML documents that are highly similar to one another are grouped into clusters. Second, the XML data in each cluster is analyzed, and elements are divided into element types according to the semantic information of their labels and their context. Third, the XML schema extraction algorithm is applied to each element type to obtain its corresponding schema and, finally, the XML schema. The technique consists of three parts.

First, XML documents are divided into clusters. Different types of XML documents are described by different XML schemas, so clustering is used to group XML documents that have similar schemas. Clusters are formed according to the label names and structural features of the XML documents.

Second, XML elements are divided by element type. The elements within each cluster are analyzed, and those that describe the same data or share the same element type are grouped together according to the semantic information and context of their labels. The label names of elements belonging to the same element type are then recorded in an OWL ontology as equivalence relations.

Finally, using the element type information, an automaton is built for the child element sequences of each element type. The automata are then simplified, and the XML schema is extracted from them.

This thesis implements a prototype system for semantic-based XML schema extraction and designs experiments for it. Analysis of the experimental results shows that the XML schema extracted with the semantic-based technique is accurate and concise.
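
The last two parts can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the EQUIVALENT label mapping is a hypothetical stand-in for the equivalence relations stored in the OWL ontology, the sample documents are invented, and the automaton is a plain prefix-tree automaton chosen for illustration. The clustering and automaton simplification steps are not shown.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical label equivalences; the thesis keeps these in an OWL ontology.
EQUIVALENT = {"writer": "author", "name": "title"}

def canonical(label):
    """Map a label to the representative of its equivalence class."""
    return EQUIVALENT.get(label, label)

def child_sequences(docs):
    """Collect, per element type, the set of observed child label sequences."""
    seqs = defaultdict(set)
    for doc in docs:
        for elem in ET.fromstring(doc).iter():
            children = tuple(canonical(child.tag) for child in elem)
            seqs[canonical(elem.tag)].add(children)
    return seqs

def prefix_tree_automaton(sequences):
    """Build a prefix-tree automaton; state 0 is the start state."""
    transitions = {}   # (state, label) -> next state
    accepting = set()  # states in which an observed sequence ends
    next_state = 1
    for seq in sorted(sequences):
        state = 0
        for label in seq:
            key = (state, label)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting

# Invented sample documents that use different labels for the same data.
DOCS = [
    "<book><title>A</title><author>X</author></book>",
    "<book><name>B</name><writer>Y</writer><writer>Z</writer></book>",
]

seqs = child_sequences(DOCS)
transitions, accepting = prefix_tree_automaton(seqs["book"])
```

Because "name" and "writer" are mapped to "title" and "author" before the automaton is built, both documents follow one shared path through the automaton for "book" rather than two unrelated ones. This is the kind of redundancy the semantic step is meant to remove.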

Keywords/Search Tags:

XML, semantic, schema extraction


Related items

1	Semantics-based Relational Schema To Xml Schema Conversion Methods Research
2	The Research And Implementation Of Translating Relational Schema To XML Schema Preserving Semantic Constraints
3	A semantic analysis of XML schema matching for B2B systems integration
4	Research On Technology Of Schema Matching Between Global Schema And Local Schema
5	Research On Semantic Understanding Model Based On Image Schema
6	Schema Free Querying of Semantic Data
7	Ontology-based Semantic XML Description And Application Of Heterogeneous Data In Enterprise
8	Research On Key Technologies Of Deep Web Data Integration
9	Research On Ontology-based Schema Matching
10	Research On Heterogeneous DNA Data Based On XML