Research On Entity Disambiguation Technology Based On Information Network Representation Learning

Posted on: 2024-01-03
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Huang
Full Text: PDF
GTID: 2568307127460954
Subject: Computer technology
Abstract/Summary:
Entity disambiguation (ED) attracts attention from many research communities, such as data integration and data mining. In the real world, every entity is unique, but some entities may share the same name. Instances of multiple entities may then be incorrectly aggregated into one group, which degrades the performance of data integration and web retrieval. In this work, the ED task is to partition the entity instances under a name so that each partition corresponds to one unique entity.

Existing ED methods have the following problems: (1) Relation-based ED methods require complicated feature engineering to quantify the similarity among records; they construct a homogeneous information network and ignore the diversity of interactive relations. Few works use only the structural information of heterogeneous information networks (HINs) to obtain high-quality node representations for ED. (2) Few works address the ED task on knowledge graphs (KGs), and which kind of KG representation learning method can effectively solve the ED task remains an open problem. In addition, the ED task is usually regarded as a clustering task. Many clustering algorithms exist, and different ED studies use different ones, yet the choice of clustering algorithm for the ED task has not been comprehensively studied.

In view of the above problems, the main contributions are as follows:

(1) An ED framework based on HIN embedding with meta-path fusion is proposed. The framework first uses the topology of HINs to obtain high-quality node embeddings, and then applies a clustering algorithm to complete disambiguation. First, we propose two HIN embedding methods based on metapath2vec that fuse multiple meta-paths: one is meta-path fusion based on an attention mechanism, and the other is meta-path fusion with multitype random walks. Then, two clustering algorithms are applied to generate the final disambiguation results, and their applicability is analyzed. Finally, extensive experiments are conducted on two commonly used datasets. The experiments show that the proposed approach achieves better disambiguation performance than previous approaches, and that our meta-path fusion embedding methods outperform metapath2vec by 1.1-21.7% in F1 score.

(2) A general ED framework based on KG representation learning and clustering is proposed. The framework first uses the semantic information contained in triples to embed entities and relations into a low-dimensional continuous vector space through a KG representation learning method, and then applies a clustering algorithm to achieve ED. Within the framework, three kinds of KG representation learning methods and five kinds of clustering algorithms are applied, and experiments are conducted on two large KG datasets. We analyze in depth the characteristics and effectiveness of the different KG representation learning methods and clustering algorithms, and recommend several of each for the ED task, which is of significance for future ED research on KGs.
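
As a rough illustration of the random-walk step that metapath2vec-style embeddings build on, the sketch below generates meta-path-guided walks on a toy bibliographic HIN and pools the walks from several meta-paths into one corpus. It is not the thesis's method: the toy network, the node names, the meta-paths "PAP" and "PVP", and the helper names `build_adjacency`, `metapath_walk`, and `multi_metapath_corpus` are all assumptions made for this example, and simply pooling walks is only one plausible reading of "multitype random walks".

```python
import random

# Hypothetical toy HIN: papers (P), authors (A), and one venue (V).
NODE_TYPE = {
    "p1": "P", "p2": "P", "p3": "P",
    "a1": "A", "a2": "A",
    "v1": "V",
}
EDGES = [
    ("p1", "a1"), ("p2", "a1"), ("p2", "a2"), ("p3", "a2"),
    ("p1", "v1"), ("p2", "v1"), ("p3", "v1"),
]


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk that only steps to neighbors whose type matches the
    next symbol of a symmetric meta-path such as "PAP"."""
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the meta-path head")
    cycle = metapath[:-1]  # "PAP" -> "PA", repeated for longer walks
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in adj.get(walk[-1], []) if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk


def multi_metapath_corpus(adj, node_type, metapaths, walks_per_node,
                          walk_length, seed=0):
    """Pool walks guided by several meta-paths into one corpus
    (an assumed, simplified way of combining meta-paths)."""
    rng = random.Random(seed)
    corpus = []
    for metapath in metapaths:
        starts = [n for n in sorted(adj) if node_type[n] == metapath[0]]
        for start in starts:
            for _ in range(walks_per_node):
                corpus.append(
                    metapath_walk(adj, node_type, start, metapath,
                                  walk_length, rng)
                )
    return corpus


adj = build_adjacency(EDGES)
corpus = multi_metapath_corpus(adj, NODE_TYPE, ["PAP", "PVP"],
                               walks_per_node=2, walk_length=5)
```

In a full pipeline, a corpus like this would typically be fed to a skip-gram model to learn node embeddings, and the embeddings of the records under one ambiguous name would then be clustered so that each cluster stands for one entity.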
Keywords/Search Tags:Entity disambiguation, Information network, Graph representation learning, Meta-path fusion, Clustering