
Research On Semantic-based Extracting Schema In XML Documents

Posted on: 2014-05-14
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yan
GTID: 2348330482452800
Subject: Computer application technology
Abstract/Summary:
XML (eXtensible Markup Language) is used in a wide range of applications and has become the standard for data representation and data exchange. Because an XML schema is the foundation of XML data exchange and efficient querying, schema information is central to XML data management. Most XML documents, however, lack schema information, so extracting a schema from XML data automatically is an important task in XML data management. Traditional XML schema extraction techniques focus on the structure of XML documents and ignore the semantic information carried by their labels. Different authors use different labels to present the same information, which introduces redundancy and errors into the extracted schema. Producing a compact and concise XML schema therefore requires making full use of the semantic information in the XML data.

This thesis proposes a semantic-based XML schema extraction technique. First, XML documents that are highly similar to one another are grouped into clusters. Second, the XML data in each cluster is analyzed separately, and elements are divided into element types according to the semantic information of their labels and their context. Third, the schema extraction algorithm is applied to each element type to obtain its schema, which together yield the final XML schema. The technique has three parts:

First, XML documents are divided into clusters. Different kinds of XML documents are described by different XML schemas, so clustering is used to bring together the documents that have similar schemas. Clusters are formed according to the label names and structural features of the documents.

Second, XML elements are divided by element type. The elements inside each cluster are analyzed, and those that describe the same data or share the same element type are grouped together according to the semantic information and context of their labels. The label names of elements belonging to the same element type are then recorded in an OWL ontology as equivalence relations.

Finally, using the element type information, an automaton is built for the child element sequences of each element type. The automata are then simplified, and the XML schema is extracted from them.

This thesis implements a prototype system for semantic-based XML schema extraction and designs experiments to evaluate it. The experimental results show that the XML schema extracted with the semantic-based technique is accurate and concise.
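
The three parts described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the Jaccard similarity on label sets, the 0.5 threshold, the EQUIVALENT table (a stand-in for the equivalence relations kept in the OWL ontology), and all function names are assumptions made for illustration. The automaton only records which child type may follow which, and the simplification step is omitted.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical label equivalences; the thesis records these in an OWL ontology.
EQUIVALENT = {"writer": "author"}


def canonical(label):
    """Map a label to the representative label of its element type."""
    return EQUIVALENT.get(label, label)


def label_set(root):
    """Collect the raw element labels that occur in one document."""
    return {elem.tag for elem in root.iter()}


def jaccard(a, b):
    """Similarity of two label sets (assumed measure, not from the thesis)."""
    return len(a & b) / len(a | b)


def cluster(roots, threshold=0.5):
    """Part 1: greedy single-pass clustering of documents by label similarity."""
    clusters = []
    for root in roots:
        labels = label_set(root)
        for c in clusters:
            if jaccard(labels, c["labels"]) >= threshold:
                c["docs"].append(root)
                c["labels"] |= labels
                break
        else:
            clusters.append({"labels": set(labels), "docs": [root]})
    return clusters


def child_automaton(roots):
    """Parts 2 and 3: per element type, record transitions between child types.

    Keys are (element type, current state); values are the possible next states.
    """
    transitions = defaultdict(set)
    for root in roots:
        for elem in root.iter():
            parent = canonical(elem.tag)
            state = "START"
            for child in elem:
                nxt = canonical(child.tag)
                transitions[(parent, state)].add(nxt)
                state = nxt
            transitions[(parent, state)].add("END")
    return transitions
```

Under these assumptions, two documents that differ only in using `author` versus `writer` fall into one cluster and produce a single child sequence for `book`, rather than two redundant alternatives.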
Keywords/Search Tags:XML, semantic, schema extraction