The Research Of XML Schema Matching Algorithms

Posted on: 2013-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: P Gao
GTID: 2248330395455660
Subject: Computer software and theory
Abstract/Summary:
XML has emerged as a standard for data representation, data analysis, and data exchange on the Web. However, because of the flexibility of XML data description and the growth in both the number and the size of XML documents, efficiently managing large XML data sets and integrating many XML data sources have become vital. Automatic XML schema matching, which identifies semantic correspondences among XML schemas, is therefore an urgent problem in many domains.

After analyzing existing schema matching methods and taking into account the information carried by elements in an XML schema, this paper proposes a new XML schema matching approach that considers both matching quality and matching efficiency. It consists mainly of two parts: schema pre-processing and the matching algorithm. First, each simplified XML schema is represented as a sequence called a CPS (Consolidated Prüfer Sequence). Then the matching algorithm is applied to the two schemas. It is a hybrid matcher that combines a linguistic matcher and a structural matcher. To exploit the feature information of each element comprehensively, the linguistic matcher consists of a name matcher, a data type matcher, and a constraint matcher. In addition, the name matcher combines multiple string matching algorithms using the idea of a decision tree. The structural matcher first computes the structural similarity of all pairs of complex elements, considering their child, leaf, ancestor, and sibling elements, to discover many matched complex element pairs. Second, the structural similarity need not be computed for all simple elements: the structural matcher is applied only to the simple elements inside each matched complex element pair. With this mechanism, complex matchings can be discovered easily.

Finally, we design several parallel approaches to improve performance: a parallel linguistic matcher and a parallel structural matcher for non-complex nodes. Our experiments demonstrate that the algorithm is highly effective and that the improved parallel methods are feasible.
Keywords/Search Tags: XML schema, Schema matching, Similarity, CPS, Decision tree
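
The abstract does not spell out how a schema tree is turned into a CPS, so the following is only a minimal sketch of one common Prüfer-style encoding, not the thesis's own construction. It assumes the nodes are numbered in post-order and that the encoding keeps two parallel sequences: the parent number of each deleted node and the label of each deleted node. The names `Node`, `postorder` and `prufer_like_sequence`, and the `book` example schema, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of a (simplified) schema tree. Hypothetical structure."""
    label: str
    children: list = field(default_factory=list)


def postorder(root):
    """Return (node, parent) pairs in post-order; the root comes last."""
    order = []

    def visit(node, parent):
        for child in node.children:
            visit(child, node)
        order.append((node, parent))

    visit(root, None)
    return order


def prufer_like_sequence(root):
    """Encode a tree as two parallel sequences (an assumed formulation).

    With post-order numbering, repeatedly deleting the smallest-numbered
    leaf removes the nodes in the order 1, 2, ..., n-1. For each deleted
    node we record its parent's number and its own label.
    """
    order = postorder(root)
    number = {id(node): i + 1 for i, (node, _) in enumerate(order)}
    parent_numbers, labels = [], []
    for node, parent in order:
        if parent is None:  # the root is never deleted
            continue
        parent_numbers.append(number[id(parent)])
        labels.append(node.label)
    return parent_numbers, labels


# Hypothetical schema: book(title, author(first, last))
schema = Node("book", [
    Node("title"),
    Node("author", [Node("first"), Node("last")]),
])
print(prufer_like_sequence(schema))
# → ([5, 4, 4, 5], ['title', 'first', 'last', 'author'])
```

In this encoding the parent numbers capture the structure and the labels capture the element names, which is one way a single linear representation can serve both a structural and a linguistic matcher. Unlike the classical Prüfer code, which stops when two nodes remain, this variant records n-1 entries for a tree of n nodes.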