Research And Application For The Semantic Matching Of XML Tags

Posted on: 2006-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Zheng
Full Text: PDF
GTID: 2168360155967309
Subject: Computer application technology
Abstract/Summary:
In today's data-centric world, more and more applications need to access multiple heterogeneous data sources. For enterprise applications in particular, this is a pressing necessity, both for the enterprise's internal development and for its adaptation to the external environment. XML has become the main standard for data representation and information exchange because it is self-describing, extensible, and open, so there is a growing need to store structured data in XML documents. How to integrate heterogeneous data sources on the basis of XML representation, so as to support efficient information queries, has therefore become a key problem that urgently needs to be solved.

Because little research addresses the integration of standalone XML documents, this thesis proposes constructing a user-defined mediated schema, built from the results of direct semantic matching of XML tags, that provides a global view for integrated applications. Following a comprehensive analysis of existing schema matching methods, the thesis first studies an algorithm for the semantic matching of XML tags and then embeds a data integration subsystem in the e-business web system of Changjiang Electric Group. With the user's dynamic interaction, this subsystem combines the pairs of semantically related XML tags produced by the 1:1 matching algorithm into a mediated schema, which is then used to run visual data queries and to achieve transparent data access and plug-and-play data sources.

The main contributions of this thesis are as follows:

1. An independently developed algorithm for 1:1 matching of tags between two standalone XML documents. It describes each XML tag with a 22-dimensional feature vector, uses the distance between vectors to quantify the association between tags, and takes the pairs of tags with the minimum distance as the semantically related tags. It also uses machine learning to improve matching accuracy. The algorithm was tested on domain data from Changjiang Electric Group described in XML documents; the results show that the machine learning module improves accuracy by more than 8%.

2. A bottom-up method of complex matching for XML tags, based on COMAP, a complex matching method for relational schemas. For each leaf tag, different searchers are designed according to the type of information in its data, and they look for every possible complex match in parallel. A best-matching index defined on the set of candidate mappings is then used to select the best complex match for XML tags.
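
The 1:1 matching idea in contribution 1 can be illustrated with a minimal sketch. The abstract does not list the 22 features, the distance measure, or the learning step, so everything concrete here is an assumption: the four features, the use of Euclidean distance, the greedy pairing strategy, and the names `tag_features`, `match_tags`, `DOC_A`, and `DOC_B` are hypothetical and are not taken from the thesis. The machine learning module is left out entirely.

```python
import math
import xml.etree.ElementTree as ET
from collections import defaultdict


def tag_features(xml_text):
    """Describe each leaf tag by a small feature vector.

    Assumption: these 4 features stand in for the thesis's 22-dimensional
    vector, whose actual components are not given in the abstract.
    """
    root = ET.fromstring(xml_text)
    values = defaultdict(list)   # tag name -> text values of its leaf elements
    depths = {}                  # tag name -> depth at which it first appears

    def walk(elem, depth):
        depths.setdefault(elem.tag, depth)
        if len(elem) == 0:       # no child elements, so this is a leaf
            values[elem.tag].append((elem.text or "").strip())
        for child in elem:
            walk(child, depth + 1)

    walk(root, 0)

    features = {}
    for tag, texts in values.items():
        joined = "".join(texts)
        n = max(len(joined), 1)  # avoid division by zero for empty values
        features[tag] = [
            sum(len(t) for t in texts) / len(texts),  # mean value length
            sum(c.isdigit() for c in joined) / n,     # share of digits
            sum(c.isalpha() for c in joined) / n,     # share of letters
            float(depths[tag]),                       # nesting depth
        ]
    return features


def match_tags(features_a, features_b):
    """Greedy 1:1 matching: repeatedly take the closest unused pair of tags.

    Assumption: Euclidean distance and greedy selection are illustrative
    choices, not the thesis's documented procedure.
    """
    candidates = sorted(
        (math.dist(va, vb), a, b)
        for a, va in features_a.items()
        for b, vb in features_b.items()
    )
    used_a, used_b, pairs = set(), set(), []
    for distance, a, b in candidates:
        if a not in used_a and b not in used_b:
            pairs.append((a, b, distance))
            used_a.add(a)
            used_b.add(b)
    return pairs


# Two tiny hypothetical documents that name the same concepts differently.
DOC_A = (
    "<orders>"
    "<order><id>1001</id><customer>Li Wei</customer></order>"
    "<order><id>1002</id><customer>Chen Jing</customer></order>"
    "</orders>"
)
DOC_B = "<list><item><number>2001</number><client>Wang Fang</client></item></list>"

pairs = match_tags(tag_features(DOC_A), tag_features(DOC_B))
for tag_a, tag_b, distance in pairs:
    print(f"{tag_a} <-> {tag_b} (distance {distance:.3f})")
```

In this sketch the features are left unscaled, so the mean value length can dominate the distance; a fuller implementation would presumably normalize each dimension before comparing vectors.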
Keywords/Search Tags: heterogeneous data source, data integration, XML tag, schema matching, semantic matching, machine learning