Research On Schema Matching Algorithm Of Database

Posted on: 2011-02-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X K Du
Full Text: PDF
GTID: 1118360305992263
Subject: Computer software and theory
Abstract/Summary:
Schema matching is a binary operation that outputs the mapping between the elements of a source schema and those of a target schema. As database applications spread, schema matching plays an important role in a growing number of application domains, such as schema integration, data warehousing, e-business, the semantic web, and P2P databases. Currently, schema matching is largely performed manually by domain experts and is therefore a time-consuming, tedious, and error-prone operation. It has become a research hotspot in recent years, and many results have been reported. Existing work focuses mainly on how to use the elements' own information (such as an element's name and data type), data instances (the data stored in the schema), and structure information (the relations between schema elements) to mine the semantics of schemas and then derive the mapping between elements. However, most of this work uses only the elements' own information to calculate the similarity between elements and then selects the mapping according to that similarity; structure information and data instances are rarely used.

The existing schema matching methods have several shortcomings. First, their accuracy is low because structure information and data instances are not used. Second, they search for the match of every target-schema element across the whole source schema; this search space contains many distracting candidates, so accuracy suffers. Furthermore, because automatic matching methods are based on heuristic algorithms, the correctness of any individual element mapping they generate cannot be fully confirmed.

To address these shortcomings, we have done the following work, building on the results of existing methods. A new method that uses the structure information between elements to support schema matching is proposed.
In this method, we divide the similarity between two elements into linguistic similarity and structural similarity, obtain the structural similarity with a new statistical method, and then obtain the matching probability by integrating the two. Finally, we derive the mapping between schema elements according to the matching probability.

A new algorithm that integrates data instance information and structure information to support matching is introduced. First, we calculate the degree of partial functional dependency from the data instances; we then construct a partial functional dependency graph based on these degrees, and calculate the degree of structural similarity from that graph. Finally, the mapping is generated according to the degrees of structural similarity and semantic similarity. Because it uses more structure information, this algorithm performs better than algorithms that use only complete functional dependency information.

A new idea, dividing the schemas into small element blocks before matching, is proposed to improve the performance of matching algorithms. First, all elements in the source and target schemas are divided into small blocks according to the objects they describe. Next, the TF/IDF algorithm is used to find, for each block in the target schema, its matching block. Finally, an existing schema matching method is applied to obtain the element mappings between matched blocks. Because existing methods perform well on small schemas, the block-based method improves matching performance for large schemas.

The concept of dependency conflict in a mapping is proposed, derived by analyzing the results of automatic schema matching methods from the perspective of data transformation.
Then, an algorithm for classifying and detecting dependency conflicts is provided. Finally, we combine this algorithm with existing schema matching methods and compare the results with those of the original methods. The comparison shows that the performance of the existing algorithms improves when they are combined with the algorithm for classifying and detecting dependency conflicts.
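
The block-matching step mentioned in the abstract (using TF/IDF to find, for each block of the target schema, its matching block) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the dissertation's actual algorithm: the abstract gives no formulas, so the tokenization of element names, the smoothed IDF weighting, the use of cosine similarity, and all function names and example schemas below are hypothetical.

```python
import math
from collections import Counter


def tokenize(name):
    """Split an element name such as 'cust_name' into lowercase tokens (assumed convention)."""
    return [t for t in name.lower().replace("-", "_").split("_") if t]


def block_tokens(block):
    """Treat a block (a list of element names) as one bag of tokens."""
    return [tok for name in block for tok in tokenize(name)]


def tfidf_vectors(docs):
    """Return one {token: weight} TF/IDF vector per token list in docs."""
    n = len(docs)
    # Document frequency: in how many blocks each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Relative term frequency times a smoothed IDF (an assumed variant).
            tok: (count / len(doc)) * (math.log((1 + n) / (1 + df[tok])) + 1)
            for tok, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def match_blocks(source_blocks, target_blocks):
    """Map each target block name to the most similar source block name."""
    src_names = list(source_blocks)
    tgt_names = list(target_blocks)
    docs = ([block_tokens(source_blocks[n]) for n in src_names]
            + [block_tokens(target_blocks[n]) for n in tgt_names])
    vecs = tfidf_vectors(docs)
    src_vecs = vecs[:len(src_names)]
    tgt_vecs = vecs[len(src_names):]
    result = {}
    for t_name, t_vec in zip(tgt_names, tgt_vecs):
        best = max(range(len(src_names)),
                   key=lambda i: cosine(src_vecs[i], t_vec))
        result[t_name] = src_names[best]
    return result


# Hypothetical example schemas, already divided into blocks by described object.
source = {
    "customer": ["cust_id", "cust_name", "cust_address"],
    "order": ["order_id", "order_date", "order_total"],
}
target = {
    "client": ["client_id", "client_name", "address"],
    "purchase": ["purchase_id", "order_date", "total"],
}
block_mapping = match_blocks(source, target)
```

With these example schemas, `block_mapping` pairs `client` with `customer` and `purchase` with `order`; an element-level matcher would then only need to compare elements inside each matched pair of blocks, which is the reduction in search space the abstract describes.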
Keywords/Search Tags: schema matching, partial functional dependency, Term Frequency/Inverse Document Frequency (TF/IDF), structure match, matching probability, dependency conflict, element block