
Research On Schema Matching Method Of Relational Database

Posted on: 2014-05-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: G F Liu
GTID: 1318330518971253
Subject: Computer application technology
Abstract/Summary:
With the rapid development of information technology, enterprises generate large amounts of data in day-to-day applications. These data are usually stored in relational databases and maintained within each enterprise's own information systems. As the demand for information sharing grows, the collected data must be exchanged within and between enterprises and mined for information useful to business intelligence. During data integration, however, database schemas are often designed quite differently even within the same application domain, and this heterogeneity seriously hinders data interoperability. At present, the problem is usually solved by having a system designer or DBA manually establish the correspondences between the elements of two schemas so that the heterogeneous data can be integrated. This consumes considerable manpower and material resources and is error-prone. Furthermore, as database applications continue to expand, the number of heterogeneous data sources is growing exponentially, and a single database may contain hundreds of tables and thousands of attributes. Purely manual matching clearly cannot meet application requirements.

In recent years, a number of semi-automatic and automatic schema matching methods have been proposed. These methods infer matching relations from schema information, data instances, and the structure among elements, and thereby discover the correspondences between elements automatically. Methods based on schema information are comparatively simple, and the information they need is easier to obtain, so early schema matching methods focused mainly on such information; their applicability is limited, however, by the small amount of information available. Researchers subsequently turned to data instances and structural information, hoping to mine more valuable information to improve the discovery of matching relations.

Overall, although schema matching methods based on the above information relieve the pressure of heterogeneous data integration to some extent, several deficiencies remain. First, matching operations pursue automation excessively, and their inherent uncertainty means that considerable manual effort is still needed to verify the matching results. Second, for ease of memorization, more and more enterprises name schemas or schema elements in Chinese, which makes existing traditional matching methods poorly suited and increases the difficulty of matching. Third, most previous matching methods concentrate on schema information and give less consideration to data instances, or to the information reflected by data instances, which is also a useful reference for matching. Finally, different matching methods differ in applicability, and users who lack expertise cannot make a reasonable choice; an improper choice of matching method leads to unusable matching results.

To this end, building on existing schema matching algorithms, this dissertation carries out the following work on schema matching in relational databases:

1) A study of the effective use of expert knowledge in the matching process. Before the overall matching is performed, the initial correspondences between the schema elements to be matched are first determined from element names, and a small number of these correspondences are selected for users to test and verify. From this feedback, the method infers part of the known matching and non-matching relations and the applicability of different matchers to the current task. Second, matchers are selected based on the collected information, which also guides the merging, adjustment, and optimization of the results produced by the individual matchers. Finally, the selectivity of the optimized results is evaluated, and the most suitable candidate match generation scheme is recommended for the current matching task.

2) Research on the problem of schema conflict in the Chinese environment. For schemas to be matched that lack data instance information, or from which only the Chinese descriptions of elements can be extracted, the Chinese description of each relevant element is first extracted from the data dictionary and transformed into a term vector using Chinese information processing techniques. Cluster analysis then assigns relation schemas with similar characteristics to the same cluster, which narrows the scope of matching and improves matching efficiency. For different relations in the same cluster, the Chinese semantic similarity between elements is computed with the help of the organization of words in an auxiliary dictionary, and the matching results are filtered by combining multiple selection strategies to improve matching accuracy.

3) Research on a schema matching method based on data instances. When schema information is unavailable or insufficient, a similar-data detection algorithm is used to identify similar tuples between the two schemas and to generate the initial similarity between elements. For each schema element, the method uses the inherent relationships among elements in the data instances to extract the set of elements strongly related to it; the similarity between these sets reflects the correlation similarity between elements. Finally, the data instance similarity and the correlation similarity together determine the overall similarity between elements.

4) Research on a method for building an adaptive schema matching process. For a given schema matching task, information about the input schemas is obtained through a combination of user interaction and automatic extraction, and the available auxiliary matching information and the applicable schema matching algorithms are identified. The schema matching process is then built and adjusted adaptively, so that the matching methods change with the application scenario. This further enhances the applicability of schema matching methods and gives full play to the advantages of the different matching algorithms.
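
The instance-based idea in item 3) can be illustrated with a small Python sketch. It is not the dissertation's algorithm: the abstract does not specify the similar-data detection step, so plain Jaccard overlap of column values is used here as an assumed stand-in, and the correlation-similarity step is omitted. All names (`Table`, `jaccard`, `instance_similarity`, `best_matches`), the threshold of 0.5, and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A table is modelled as: column name -> list of values seen in that column.
Table = Dict[str, List[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two value sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def instance_similarity(src: Table, tgt: Table) -> Dict[Tuple[str, str], float]:
    """Score every (source column, target column) pair by value overlap.

    Assumption: value overlap stands in for the similar-tuple detection
    described in the abstract, which is not specified there.
    """
    return {
        (s, t): jaccard(set(s_vals), set(t_vals))
        for s, s_vals in src.items()
        for t, t_vals in tgt.items()
    }


def best_matches(
    sim: Dict[Tuple[str, str], float], threshold: float = 0.5
) -> Dict[str, Tuple[str, float]]:
    """Keep, for each source column, the best target at or above the threshold."""
    best: Dict[str, Tuple[str, float]] = {}
    for (s, t), score in sim.items():
        if score >= threshold and score > best.get(s, ("", 0.0))[1]:
            best[s] = (t, score)
    return best


# Hypothetical sample data: two schemas with different column names.
src = {"cust_name": ["Ann", "Bob", "Cy"], "city": ["Oslo", "Rome", "Lima"]}
tgt = {"name": ["Bob", "Ann", "Dee"], "town": ["Rome", "Oslo", "Lima"]}

sim = instance_similarity(src, tgt)
matches = best_matches(sim)
print(matches)
```

In a fuller version, each score in `sim` would be combined with a correlation similarity computed from the sets of strongly related elements before candidates are selected; how the two are weighted is not stated in the abstract.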
Keywords/Search Tags:relational database, schema matching, expert knowledge, cluster analysis, auxiliary dictionary, correlation