An Entity Resolution Framework Based On Attribute Patterns

Posted on: 2014-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: F Q He
GTID: 2268330422450606
Subject: Computer Science and Technology
Abstract/Summary:
As society develops, the volume of data held by companies and governments is growing rapidly. At the same time, the amount of dirty data is also increasing, which reduces the usability of the data. Entity resolution (ER) has long been a problem for the data quality management community. It is important for detecting duplicate records and correcting errors, and it is a key problem in entity queries and high-quality data analysis. The most frequently used methods today are based either on attribute features or on an entity model. Attribute-feature-based methods are limited by the information available to them and cannot provide high-confidence decisions. Entity-model-based methods can exploit the inner relationships among attributes and are therefore more effective, but they lack versatility. New ER problems arise every day, and without a general framework they have to be solved in an ad hoc, inefficient way. The main difficulties of current ER methods include: 1) general methods for computing similarity are rare; 2) the threshold is hard to set, and decisions made near the threshold have a high error rate; 3) the weights of different attributes are hard to assign.

To address these challenges, this paper proposes a novel entity resolution framework based on attribute patterns. By analyzing the characteristics of the relations between attributes and entities, we classify them into four types. We propose a general approach for computing the relevance of attributes. We combine the similarities of the individual attributes into a vector, which preserves their inner relationships and avoids the need to assign weights. The decider is a division of the similarity space and obtains its information through learning. We propose a new type of entity diagram, on which entity partitioning proceeds. We compute the similarity of entity pairs during the entity partition process, which allows more entity pairs to be found. We also add a feedback function. Finally, we verify the efficiency of our framework.
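
The idea of keeping per-attribute similarities as a vector and letting a learned decider divide the similarity space can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the string similarity (`difflib.SequenceMatcher`), the nearest-centroid decider, the function names, and the toy records are all assumptions chosen only to make the idea concrete.

```python
from difflib import SequenceMatcher

def similarity_vector(rec_a, rec_b, attributes):
    """Return one similarity score per attribute, kept as a vector.

    No weighted sum is taken, so no attribute weights have to be assigned.
    SequenceMatcher is an assumed stand-in for any per-attribute measure.
    """
    return [
        SequenceMatcher(None, str(rec_a[a]).lower(), str(rec_b[a]).lower()).ratio()
        for a in attributes
    ]

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def train_decider(labeled_vectors):
    """Learn a two-region division of the similarity space.

    labeled_vectors: list of (vector, is_match) pairs.
    Assumption: a nearest-centroid rule stands in for the learned decider.
    """
    match_c = centroid([v for v, is_match in labeled_vectors if is_match])
    other_c = centroid([v for v, is_match in labeled_vectors if not is_match])

    def sq_dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    def decide(vector):
        # True when the vector falls in the region closer to the match centroid.
        return sq_dist(vector, match_c) < sq_dist(vector, other_c)

    return decide

# Hypothetical toy data, for illustration only.
ATTRS = ["name", "city"]
TRAINING_PAIRS = [
    ({"name": "John Smith", "city": "Boston"},
     {"name": "Jon Smith", "city": "Boston"}, True),
    ({"name": "Mary Jones", "city": "Austin"},
     {"name": "Mary Jones", "city": "Austin TX"}, True),
    ({"name": "John Smith", "city": "Boston"},
     {"name": "Alice Wong", "city": "Denver"}, False),
    ({"name": "Mary Jones", "city": "Austin"},
     {"name": "Peter Brown", "city": "Seattle"}, False),
]

decide = train_decider(
    [(similarity_vector(a, b, ATTRS), label) for a, b, label in TRAINING_PAIRS]
)
```

Calling `decide(similarity_vector(r1, r2, ATTRS))` on a new pair of records then returns whether the pair falls in the "same entity" region. Because the decision is made on the whole vector, no single global threshold on a combined score is needed in this sketch.
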
Keywords/Search Tags: entity resolution, attribute patterns, framework, learning