
Research On Methods Of Relation Extraction Based On Relation Correlations

Posted on: 2024-11-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R D Han
Full Text: PDF
GTID: 1528307340977429
Subject: Computer software and theory
Abstract/Summary:
In the context of the exponential growth of Internet data, how to obtain and utilize the knowledge in massive unstructured text has become an urgent problem. Entity relation extraction, as an effective means of acquiring knowledge, aims to automatically identify semantic relationships between entities in text. It is the cornerstone of application services such as knowledge graphs and search engines, and has garnered considerable attention from both academia and industry. Although the rapid development of deep learning and pre-trained language models has greatly improved the performance of entity relation extraction, the task is still severely affected by the long-tail problem and the multi-label problem, and its performance on long-tailed relation categories and multi-label entity pairs remains disappointing.

This thesis addresses the long-tail problem and the multi-label problem in entity relation extraction, and proposes to construct models based on relation correlations. The motivation is twofold: (1) For the long-tail problem, although numerous categories lie in the tail of the data-volume curve with insufficient samples, they may be highly correlated with certain relations at the head of the curve. Through these correlations, knowledge can be transferred from head relations to tail relations during training, helping samples of tail relations yield more effective features and improving training results. (2) For the multi-label problem, the same entity pair may simultaneously express multiple semantic relationships in a specific context, reflecting partial semantic overlap among the co-expressed categories. Relation correlations can measure the semantic distance between relation categories, which helps the classifier identify semantically closer relations for multi-label entity pairs more accurately and benefits the delineation of decision boundaries during classification.

Specifically, this thesis focuses on “how to capture relation correlations” and proposes four methods for situations with or without relation hierarchies:

1. A relation extraction method based on a Hierarchy-Interactive Attention mechanism. For the situation where relation hierarchies exist, this method models the inter-hierarchical correlations, i.e., parent-child correlations, by proposing a recursive hierarchy-interactive attention network. The core idea is to guide lower-level relation classification with higher-level classification results when classifying along the relation hierarchies from top to bottom. When classifying at each level, the network simultaneously uses historical knowledge from the previous level and input information from the current level, forming a recursive hierarchy-interactive architecture. Experiments show that this architecture effectively captures the correlations between hierarchical levels: it improves overall performance and significantly outperforms baselines on long-tail relation categories.

2. A relation extraction method based on Global Hierarchy Embeddings and Local Probability Constraints. This method, also for the situation where relation hierarchies exist, models both inter- and intra-hierarchical correlations, i.e., parent-child correlations and sibling correlations. It constructs the model from both global and local perspectives. Globally, the relation hierarchies are treated as an undirected connected graph, which is encoded using graph neural networks to obtain relation embeddings that contain correlation information, i.e., Global Hierarchy Embeddings. Locally, when identifying the relation category along the relation hierarchies from top to bottom, the classification results of adjacent levels should depend on each other to reflect inter-hierarchical correlations. For this reason, a KL divergence-based loss called Local Probability Constraints is designed to maintain consistency and association between the probabilities of adjacent levels. Experimental results confirm that Global Hierarchy Embeddings and Local Probability Constraints capture extensive correlation information, significantly improving both overall performance and performance on long-tail relations.

3. A relation extraction method based on Relation Co-occurrence Correlations. For the situation where relation hierarchies do not exist, this method captures co-occurrence correlations between relation categories from the co-occurrence phenomena of relations. Under a multi-task learning framework, it introduces coarse- and fine-grained relation co-occurrence prediction tasks that determine whether a specific relation co-occurs with a group of other relations within the same context. These two tasks learn embeddings for all relation categories, which are used to construct additional features that guide the classifier to use co-occurrence correlations during classification. Experiments demonstrate that this method significantly outperforms baselines on long-tail relation categories and multi-label entity pairs.

4. A relation extraction method based on Entity Type Constraints. This method, also for the situation where relation hierarchies do not exist, models correlations through entity type constraints. Entity type constraints mean that the allowed subject/object entity types for a particular relation category are pre-defined and fixed, i.e., they are the constraint relationship between entity types and relation categories. The method uses entity type constraints from both global and local views. Globally, a Type-Constrained Graph is constructed by statistically analyzing the training set, and graph neural networks are applied to obtain all relation embeddings. This graph records all possible subject/object entity types for each relation, revealing the constrained correlations between relations that share entity types. Locally, for each entity pair to be classified, relation categories matching its entity types should receive more attention from the classifier. In other words, classification probabilities for categories that match the entity types should be higher than those for categories that do not. To this end, a ranking-based Type-Constrained Loss is designed. Experimental results confirm that type-constrained correlations significantly improve performance on long-tail relation categories and multi-label entity pairs.

In summary, this thesis addresses the long-tail problem and the multi-label problem in entity relation extraction. It investigates how to capture relation correlations in situations with or without relation hierarchies, demonstrates the effectiveness and potential of relation correlations in addressing both problems, and provides a new perspective for the entity relation extraction task.
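
The abstract states only that the Type-Constrained Loss is ranking-based and that categories matching an entity pair's types should score higher than mismatching ones; it does not give the formula. The following is a minimal sketch of one way such a constraint could be expressed, assuming a hinge-style pairwise ranking penalty. The function name, the `margin` hyperparameter, the toy constraint table, and the probabilities are all hypothetical and are not taken from the thesis.

```python
from itertools import product


def type_constrained_ranking_loss(probs, allowed, margin=0.1):
    """Hinge-style ranking penalty over (matching, mismatching) relation pairs.

    probs   -- dict: relation name -> predicted probability for one entity pair
    allowed -- set of relation names whose type constraints match the pair
    margin  -- hypothetical hyperparameter: required probability gap
    """
    matching = [r for r in probs if r in allowed]
    mismatching = [r for r in probs if r not in allowed]
    if not matching or not mismatching:
        return 0.0  # nothing to rank against
    # Penalize every pair where a matching relation does not beat a
    # mismatching relation by at least `margin`.
    penalties = [
        max(0.0, margin - (probs[m] - probs[n]))
        for m, n in product(matching, mismatching)
    ]
    return sum(penalties) / len(penalties)


# Toy constraint table (hypothetical): relation -> allowed (subject, object) types.
constraints = {
    "born_in": {("PERSON", "LOCATION")},
    "lives_in": {("PERSON", "LOCATION")},
    "founded_by": {("ORGANIZATION", "PERSON")},
}

# Relations whose constraints admit a (PERSON, LOCATION) entity pair.
pair_types = ("PERSON", "LOCATION")
allowed = {r for r, types in constraints.items() if pair_types in types}

# Predictions that respect the constraints versus predictions that violate them.
good = {"born_in": 0.7, "lives_in": 0.6, "founded_by": 0.1}
bad = {"born_in": 0.2, "lives_in": 0.3, "founded_by": 0.8}

loss_good = type_constrained_ranking_loss(good, allowed)
loss_bad = type_constrained_ranking_loss(bad, allowed)
print(loss_good, loss_bad)
```

In this sketch the penalty is zero when every type-matching relation already outranks every mismatching one by the margin, and grows as mismatching relations receive higher probability. In practice such a term would presumably be added to the ordinary classification loss rather than used alone.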
Keywords/Search Tags: Entity Relation Extraction, Long-tail Problem, Multi-label Problem, Relation Correlations, Relation Hierarchies