
Research On Entity Matching Algorithm Based On Graph Model

Posted on: 2024-06-29
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Liang
Full Text: PDF
GTID: 2530307079961499
Subject: Statistics
Abstract/Summary:
With the advent of the digital information age, making efficient use of information from different sources has become a major challenge. Because of problems such as "information islands", different data sources do not share a common storage framework, which makes data from different sources harder to integrate and use. The entity matching task arose in response: it aims to identify data records that represent the same real-world entity. Current mainstream solutions rely on similarity measures to complete pattern matching and then use classification algorithms to make the matching decision. Such methods match well when records have highly similar attribute descriptions and high data quality. However, data collected in the real world is often of low quality, and records of the same entity rarely have similar attribute descriptions. In addition, because the number of possible record pairs across two data sources is massive, the proportion of matching pairs is very low, so "matching" labels are hard to obtain and entity matching data suffers from class imbalance. Most existing solutions also assume that each entity record exists independently, yet in the real world records are often intricately connected, so this assumption rarely holds. This thesis therefore studies graph-model-based algorithms for entity label expansion and entity classification. The main research contents and results are as follows:

1. The thesis proposes a method for expanding entity labels based on the odd-order reachability relation in heterogeneous graphs. First, for the specific task scenario of entity matching, it explores the "potential matching" relationships that are implied by, but not explicitly expressed in, the existing "matching" relationships. Then, the "matching" relationships among entity records are modeled as a heterogeneous bipartite graph, and the connection between "potential matching" information and odd-order reachability between nodes in the graph is discussed. Finally, the odd-order reachability between any two nodes in the graph is computed in the simplest way, which extracts "potential matching" information from the existing "matching" labels and thereby expands the "matching" labels and balances the data.

2. The thesis proposes an entity matching classification method based on a heterogeneous graph model, called the heterogeneous entity enhancement network. The design rests on three observations: BiGRU can learn contextual information, R-GCNs can learn features across heterogeneous relations, and the attention mechanism lets the model focus on key information. First, BiGRU is used to learn the combined contextual information of each attribute description. Then, an R-GCN layer learns features from associated records, so that each entity record's features carry comprehensive descriptive information. Next, in the cross-attribute label realignment module, the attention mechanism adaptively finds the best matching pattern for each entity. Finally, in the attribute-weighted fusion module, the attention mechanism is applied again to filter out redundant information and retain key information, and the matching decision is made.

3. The thesis validates the proposed label expansion method and entity matching classification method on a real dataset. The comparative (horizontal) experiments show that the proposed classification method outperforms existing mainstream entity matching solutions. The ablation (vertical) experiments demonstrate the effectiveness of the important modules in the proposed method.
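
The label expansion idea in item 1 can be illustrated with a minimal sketch. It is not the thesis's implementation; the abstract does not say how reachability is actually computed. The sketch assumes that "matching" is transitive and relies on a standard property of bipartite graphs: every path between a record in one source and a record in the other has odd length, so odd-order reachability across the two sides reduces to membership in the same connected component. The function name `expand_matches`, the side tags "A" and "B", and the record identifiers are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def expand_matches(matches):
    """Return every (a, b) pair joined by an odd-length path of known matches.

    `matches` is an iterable of (a, b) pairs, where `a` is a record from
    source A and `b` is a record from source B (hypothetical identifiers).
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[rx] = ry

    # Tag each record with its source so equal IDs on both sides stay distinct.
    for a, b in matches:
        union(("A", a), ("B", b))

    # Group the records of each connected component by side.
    left, right = defaultdict(set), defaultdict(set)
    for node in list(parent):
        side, record = node
        (left if side == "A" else right)[find(node)].add(record)

    # In a bipartite graph, any A-B pair in one component is odd-path reachable.
    expanded = set()
    for root, a_records in left.items():
        expanded.update(product(a_records, right[root]))
    return expanded


# Assumed toy labels: a1-b1, a2-b1 and a2-b2 imply the unlabeled pair a1-b2
# through the length-3 path a1 - b1 - a2 - b2.
known = [("a1", "b1"), ("a2", "b1"), ("a2", "b2"), ("a3", "b3")]
new_pairs = sorted(expand_matches(known) - set(known))
```

Here `new_pairs` holds the "potential matching" pairs that the known labels imply but do not state, which is the kind of extra positive label the abstract describes using to reduce class imbalance.
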
Keywords/Search Tags: Heterogeneous Bipartite Graph, Entity Matching, Odd-Path Reachability, Relation Feature Learning, Label Expansion