Research On Schema Matching Algorithm Of Database

Posted on:2011-02-10

Degree:Doctor

Type:Dissertation

Country:China

Candidate:X K Du

Full Text:PDF

GTID:1118360305992263

Subject:Computer software and theory

Abstract/Summary:

Schema matching is a binary operation that outputs the mapping between the elements of a source schema and a target schema. As databases are applied more widely, schema matching plays an important role in a growing number of application domains, e.g. schema integration, data warehousing, e-business, the Semantic Web and P2P databases. Currently, schema matching is largely performed manually by domain experts and is therefore a time-consuming, tedious and error-prone operation. It has become a research hotspot in recent years, and many results have been published.

Existing work on schema matching is mainly concerned with how to use an element's own information (its name, data type, etc.), data instances (the data stored under the schema) and structure information (the relations between schema elements) to mine the semantics of schemas, and then to use those semantics to obtain the mapping between elements. However, most of this work uses only the elements' own information to calculate the similarity between elements and then selects the mapping according to that similarity; structure information and data instances are rarely used.

Existing schema matching methods have several shortcomings. First, their accuracy is low because structure information and data instances are not used. Second, for every element of the target schema they search the whole source schema for the matched element; the search space contains many distracting candidates, so accuracy is not high. Furthermore, the correctness of any element mapping generated by an automatic matching method cannot be confirmed with certainty, because these methods are based on heuristic algorithms.

To address these shortcomings, this dissertation builds on the results of existing methods and makes the following contributions.

First, a new method that uses the structure information between elements to support schema matching is proposed. In this method, the similarity between two elements is divided into linguistic similarity and structural similarity. The structural similarity is obtained by a new statistical method, and the matching probability is obtained by integrating the linguistic similarity and the structural similarity. Finally, the mapping between schema elements is derived from the matching probability.

Second, a new algorithm that integrates data instance information and structure information to support matching is introduced. The degree of partial functional dependency is first calculated from the data instance information, and a graph of partial functional dependencies is then constructed from these degrees. The degree of structural similarity is calculated from this graph. Finally, the mapping is generated according to the degree of structural similarity and the semantic similarity. Because more structure information is used, this algorithm performs better than an algorithm that uses only complete functional dependency information.

Third, a new idea is proposed to improve the performance of matching algorithms: dividing the schema into small element blocks before matching. First, all elements in the source and target schemas are divided into small blocks according to the object they describe. Next, the TF/IDF algorithm is used to find, for each block in the target schema, its matching block. Finally, an existing schema matching method is applied to the matched blocks to obtain the mapping. Because existing methods perform well when matching small schemas, the block-based schema matching method improves the performance of matching tasks on large schemas.

Fourth, the concept of dependency conflict in a mapping is proposed, based on an analysis of the results of automatic schema matching methods from the perspective of data transformation. An algorithm for classifying and detecting dependency conflicts is then provided. Finally, this algorithm is combined with existing schema matching methods and the results are compared with those of the original methods. The comparison shows that the performance of the existing algorithms improves when they are combined with the algorithm for classifying and detecting dependency conflicts.
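
As an illustration of how a degree of partial functional dependency might be computed from data instances, the following is a minimal sketch only. The abstract does not define the measure, so the definition used here is an assumption: the degree of X -> Y is taken to be the fraction of rows that remain consistent with the dependency when each X value keeps only its most frequent Y value. The function name `partial_fd_degree` and the sample columns are hypothetical and are not taken from the dissertation.

```python
from collections import Counter, defaultdict


def partial_fd_degree(rows, lhs, rhs):
    """Assumed measure: share of rows consistent with lhs -> rhs when each
    lhs value keeps only its most frequent rhs value (1.0 = exact dependency)."""
    if not rows:
        return 1.0  # an empty instance violates no dependency
    # For every lhs value, count how often each rhs value occurs with it.
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    # Rows that agree with the majority rhs value of their lhs group.
    kept = sum(max(counter.values()) for counter in groups.values())
    return kept / len(rows)


# Hypothetical data instance: "zip" almost determines "city".
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "NYC"},
    {"zip": "94105", "city": "San Francisco"},
]

zip_to_city = partial_fd_degree(rows, "zip", "city")
city_to_zip = partial_fd_degree(rows, "city", "zip")
print(zip_to_city, city_to_zip)
```

Under this assumed definition, degrees computed for every ordered pair of columns could serve as edge weights in a dependency graph, so that dependencies which hold only approximately still contribute structure information.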

Keywords/Search Tags:

schema matching, partial functional dependency, Term Frequency/Inverse Document Frequency (TF/IDF), structure match, matching probability, dependency conflict, element block

Related items

1	Research On Schema Matching Technology Supporting Massive Heterogeneous Data Integration
2	The Research Of XML Functional Dependency Based On XML Schema
3	Research On Technology Of Schema Matching Between Global Schema And Local Schema
4	Mining Entity Columns Of Web Tables Based On Functional Dependency
5	The Standardization Of The Xml Document
6	Research On Data Source Selection Algorithm For Inconsistency Detection
7	Research And Implementation Of Key Technologies Of User Behavior Recognition Based On Depth Packet Detection
8	Studies On Schema Matching Algorithms In Database
9	Document-level Relation Extraction Based On Dependency Syntax Analysis
10	Research On Text Matching Algorithm Based On GNN