Study on Key Technologies of Entity Alignment Between Knowledge Graphs in Open Environment

Posted on: 2023-08-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W X Zeng
Full Text: PDF
GTID: 1528307169477054
Subject: Management Science and Engineering
Abstract/Summary:
Knowledge fusion is an important stage of knowledge management that connects, combines and updates knowledge from different sources. With the advent of the big data era, the knowledge graph (KG), as an effective means of extracting structured knowledge from massive unstructured data and existing structured data, has become an essential part of knowledge management. KGs constructed by data-driven techniques usually come from different sources and have low coverage. Knowledge fusion techniques are therefore needed to establish connections among these individually constructed KGs, so that the KGs can be augmented and updated. Entity alignment (EA) plays a crucial role in this process. It aims to detect equivalent entities in different KGs and to connect heterogeneous KGs using these entities as anchors, which lays the foundation for the subsequent knowledge unification and update process. With the advancement of deep learning, representation learning-based EA methods have become the mainstream approach.

As world knowledge continues to grow and evolve, the knowledge fusion process in essence operates in an open environment, which poses great challenges to existing representation learning-based EA methods. From the perspective of input data, knowledge in an open environment increases sharply and is usually large in scale. Knowledge also updates and evolves rapidly, resulting in a large volume of long-tail knowledge. From the perspective of model training, labeled data is difficult to obtain in an open environment, so the model is trained under scarce supervision signals. These challenges render current EA methods less effective. To fill these gaps, this thesis targets the typical challenges of EA in the open environment and investigates the issues of large-scale data, long-tail knowledge and scarce supervision signals. The main contents and contributions are as follows:

(1) Targeting the challenges brought by large-scale data, this thesis puts forward a large-scale EA approach based on bidirectional graph partition. It first designs a bidirectional graph partition strategy for EA that leverages seed entity pairs. The aim is to divide the large-scale KG pair into multiple subgraph pairs such that the original KG structure is maintained and the subgraph pairs can be matched. Because the partition process inevitably hurts the accuracy of the alignment results, this thesis further proposes to model the alignment inference stage as a reciprocal recommendation process, which can sufficiently characterize and aggregate entity preferences and thus improve EA performance. Extensive experiments on existing public datasets and a newly constructed large-scale EA dataset validate that the proposed method can effectively handle EA at scale and achieve competitive performance. The proposed model is also general and can be applied to existing EA methods to improve their capability of dealing with large-scale data.

(2) Targeting the long-tail knowledge in the open environment, this thesis puts forward two degree-aware EA approaches to address the long-tail issue. Since the main difference between long-tail knowledge and other knowledge is the amount of neighboring structure information, this thesis proposes to mine entity degree information to model long-tail entities accurately and hence enhance their alignment performance. First, an EA model with degree-aware feature fusion is put forward, which dynamically combines multiple features using a degree-aware co-attention network and provides more accurate signals for EA. This model also contains an iterative training algorithm embedded with a KG completion module, which makes long-tail entities easier to align by iteratively replenishing the KG structure. Second, an EA model with degree-aware curriculum learning is put forward, which regards entity degree as the difficulty indicator and iteratively trains the model based on curriculum learning, thereby optimizing the training process. This model also contains a re-ranking module based on word mover's distance, which can improve the alignment performance of long-tail entities. Extensive experiments on existing public datasets validate the effectiveness of the proposed models and their capability of handling long-tail knowledge.

(3) Targeting the scarce supervision issue in the open environment, this thesis puts forward two EA approaches to tackle limited labeled data. First, given a limited labeling budget, a reinforced active EA model is proposed. It uses the multi-armed bandit model to combine multiple query strategies, so as to select the most valuable unlabeled data for labeling. It also adopts contrastive learning to mine useful information from the vast unlabeled data, which in turn improves alignment performance under scarce supervision. Second, given no labeled data, this thesis proposes an unsupervised EA model based on progressive learning. The proposed model harnesses the semantic information in the KGs to generate initial pseudo-labeled data, and devises a progressive learning algorithm embedded with an unmatchable entity prediction module to achieve both better structural representations and more accurate EA results. Extensive experiments on public datasets validate that the proposed models can effectively handle EA with scarce or even no supervision. The proposed strategies are also general and can be applied to existing methods to improve their performance under limited supervision.

In summary, against the background of knowledge management in the open environment, this thesis focuses on the entity alignment task in the knowledge fusion process. Targeting the typical challenges in data inputs and model training, namely large-scale data, long-tail knowledge and scarce labeled data, it puts forward a series of novel and effective solutions that have both theoretical and practical value. Several limitations remain. From an internal perspective, the proposed solutions are separate and each addresses only its corresponding challenge; there is as yet no general framework that can tackle all challenges at once. From a broader view, this thesis has not covered the other challenges in the open environment or the remaining stages of knowledge fusion. These issues are left as future work.
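
As a rough illustration of the reciprocal inference idea in contribution (1), the sketch below scores each candidate pair by combining the source entity's preference for the target with the target entity's preference for the source. It is only a sketch under stated assumptions: the abstract does not give the exact formulation, so the softmax preference model, the harmonic-mean aggregation, and the names `reciprocal_align` and `sim` are all hypothetical choices made for this example.

```python
import math


def softmax(values):
    """Turn raw similarity scores into a preference distribution."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def reciprocal_align(sim):
    """Pick one target per source entity using two-sided preferences.

    sim[i][j] is an assumed embedding similarity between source entity i
    and target entity j (hypothetical input, not the thesis's data format).
    """
    n_src, n_tgt = len(sim), len(sim[0])
    # Preference of each source entity over all targets (row-wise).
    src_pref = [softmax(row) for row in sim]
    # Preference of each target entity over all sources (column-wise).
    tgt_pref = [softmax([sim[i][j] for i in range(n_src)]) for j in range(n_tgt)]

    matches = []
    for i in range(n_src):
        scores = []
        for j in range(n_tgt):
            p, q = src_pref[i][j], tgt_pref[j][i]
            # Harmonic mean (an illustrative choice): a pair scores high
            # only when both sides prefer each other.
            scores.append(2 * p * q / (p + q))
        matches.append(max(range(n_tgt), key=scores.__getitem__))
    return matches


sim = [[0.90, 0.80],
       [0.85, 0.10]]
print(reciprocal_align(sim))
```

With this toy matrix, taking the row-wise maximum alone would map both source entities to target 0. The two-sided score instead assigns source 0 to target 1 and source 1 to target 0, because target 1 strongly prefers source 0 while target 0 is the only plausible match for source 1.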
Keywords/Search Tags:entity alignment, knowledge fusion, knowledge graph, graph partition, long-tail phenomenon, scarce supervision