Research On Generating Matching Rules In Entity Matching

Posted on: 2018-01-16 | Degree: Master | Type: Thesis
Country: China | Candidate: J Li | Full Text: PDF
GTID: 2348330533963344 | Subject: Software engineering
Abstract/Summary:
Entity matching aims to identify records that refer to the same entity across multiple data sources. It is a prerequisite for using data effectively and is widely applied in data cleaning, data redundancy detection, data fusion, and related fields. Entity matching depends on the matching rules defined between records, and the quality of those rules determines the quality of the matching result. This thesis studies algorithms for generating record matching rules in rule-based entity matching. First, an analysis of existing algorithms shows that they suffer from both long training times and low accuracy. To address these problems, the thesis proposes GR-Greedy, a greedy algorithm for generating record matching rules. GR-Greedy builds a base record matching rule from all attributes, sorts the attribute matching rules within that base rule, and then greedily deletes the attributes that may reduce matching accuracy, thereby improving the accuracy of the resulting record matching rule. Second, to improve accuracy further, the thesis proposes GR-Traverse, a traversal-based algorithm. Like GR-Greedy, GR-Traverse starts from the base record matching rule with sorted attributes, but it uses traversal to enlarge the enumeration space and deletes all attributes that may reduce matching accuracy. This avoids the local-optimum problem of GR-Greedy and further improves the accuracy of the generated rules. Finally, experiments on real data sets verify the effectiveness and efficiency of the proposed algorithms.
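The abstract gives no pseudocode, so the following is only a minimal sketch of the two ideas it describes, not the thesis's actual GR-Greedy or GR-Traverse. Several details are assumptions made for illustration: attributes are compared by exact equality, a rule is a list of attributes that must all agree, accuracy is the fraction of labeled record pairs classified correctly, attributes are sorted by their single-attribute accuracy, and the traversal is shown as an exhaustive search over attribute subsets. The function names and the sample data are hypothetical.

```python
from itertools import combinations


def rule_matches(rule, a, b):
    """Assumed rule semantics: two records match if they agree on every attribute in the rule."""
    return all(a[attr] == b[attr] for attr in rule)


def accuracy(rule, labeled_pairs):
    """Fraction of labeled pairs (record_a, record_b, is_match) the rule classifies correctly."""
    correct = sum(rule_matches(rule, a, b) == label for a, b, label in labeled_pairs)
    return correct / len(labeled_pairs)


def gr_greedy_sketch(attributes, labeled_pairs):
    """Start from the all-attribute rule, then greedily drop attributes that hurt accuracy."""
    # Assumed ordering: weakest single-attribute accuracy first.
    order = sorted(attributes, key=lambda attr: accuracy([attr], labeled_pairs))
    rule = list(order)
    best = accuracy(rule, labeled_pairs)
    for attr in order:
        if len(rule) == 1:
            break  # keep at least one attribute in the rule
        candidate = [x for x in rule if x != attr]
        score = accuracy(candidate, labeled_pairs)
        if score > best:  # the deletion is kept only if accuracy improves
            rule, best = candidate, score
    return rule, best


def gr_traverse_sketch(attributes, labeled_pairs):
    """Enumerate every non-empty attribute subset (a stand-in for the thesis's traversal)."""
    best_rule = list(attributes)
    best = accuracy(best_rule, labeled_pairs)
    for size in range(len(attributes) - 1, 0, -1):
        for subset in combinations(attributes, size):
            score = accuracy(subset, labeled_pairs)
            if score > best:
                best_rule, best = list(subset), score
    return best_rule, best


# Hypothetical labeled record pairs: (record_a, record_b, same_entity).
ATTRS = ["name", "city", "phone"]
PAIRS = [
    ({"name": "Ann Lee", "city": "Boston", "phone": "111"},
     {"name": "Ann Lee", "city": "Boston", "phone": "999"}, True),
    ({"name": "Bob Ray", "city": "Austin", "phone": "222"},
     {"name": "Bob Ray", "city": "Dallas", "phone": "222"}, True),
    ({"name": "Cy Fox", "city": "Boston", "phone": "333"},
     {"name": "Di Orr", "city": "Boston", "phone": "444"}, False),
    ({"name": "Ed Kim", "city": "Denver", "phone": "555"},
     {"name": "Ed Kim", "city": "Denver", "phone": "555"}, True),
]

if __name__ == "__main__":
    print(gr_greedy_sketch(ATTRS, PAIRS))
    print(gr_traverse_sketch(ATTRS, PAIRS))
```

On this toy data both sketches end with the single-attribute rule on `name`. The greedy version evaluates only one deletion per attribute, while the exhaustive version examines every subset, which is the trade-off between search cost and the risk of a local optimum that the abstract alludes to.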
Keywords/Search Tags:Entity Matching, Record Matching, Data Cleaning, Data Integration