
Research on Scientific Entity and Relation Extraction for Semantic Mining of Academic Literature

Posted on: 2024-09-23 | Degree: Master | Type: Thesis
Country: China | Candidate: J F Ge
GTID: 2568307061985799 | Subject: Computer Science and Technology
Abstract:
Making full use of academic literature depends on mining its content thoroughly. This requires a flexible knowledge organization framework, and ontologies provide a reliable semantic architecture for knowledge retrieval and use. This thesis focuses on a crucial step of Ontology Learning (OL): entity and relation extraction. Current methods for extracting scientific entities and relations from academic literature are largely migrated directly from the general domain, and they perform poorly because scientific entities are more abstract and their relations are harder to identify. As a result, these methods reach only about half of their general-domain performance. To address this, the thesis improves existing entity and relation extraction methods in two directions: multi-feature incorporation on the input side and multi-label constraints on the output side.

On the input side, we introduce a semantically enhanced, multi-feature deep learning model for extracting entities and relations from scientific abstracts. It follows the structure of existing methods but adds features to improve accuracy. We investigate how linguistic information affects entity and relation extraction by augmenting the pre-trained word vectors with part-of-speech (POS) tag information, and find that adding POS information works better than using the pre-trained word vectors alone. For entity recognition, the length of an entity's token sequence is used as a feature. For relation extraction, max-pooling over the context between two entity candidates proves more effective than using the full context; in addition, the distance between the candidates is embedded as a feature, and their entity types are supplied as further inputs. Enriching the token representation sequence with this semantic information improves performance over the original span-based model, which suggests that joint entity and relation learning benefits from a broader range of contextual information.

On the output side, we propose a multi-label scientific entity-relation extraction model and explore the advantages of a Seq2Seq model for relation extraction. We also incorporate scientific entity-relation triplet information as a feature, so that the model learns both the contextual and the semantic information of the head-tail triplets in the corpus. Experiments analyze the gains that triplet information brings to scientific relation extraction, and show that the Seq2Seq model performs well by exploiting the semantic information carried by the relation types, particularly for scientific entity-relation extraction from academic literature.

Experimental results on the SciERC dataset show that both proposed improvements outperform the original method, with F1 improvements of 1.8% and 1.3%, and of 1.6% and 1.3%, for strict and boundaries relation extraction, respectively.
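The input-side features described in the abstract can be illustrated with a small, framework-free sketch. It is not the thesis implementation: the function names (`token_features`, `span_features`, `relation_features`) are hypothetical, the embedding tables are hand-written toy values rather than learned parameters, and max-pooling inside each span plus clipping of long widths and distances into a last bucket are assumptions made for illustration. It shows how one relation candidate's vector could combine POS-augmented token vectors, a span-length embedding, the max-pooled context between the two candidates, a distance embedding, and entity-type embeddings.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def concat(*parts: Sequence[float]) -> Vector:
    """Concatenate feature vectors end to end."""
    out: Vector = []
    for part in parts:
        out.extend(part)
    return out


def max_pool(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Element-wise max over vectors; a zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [max(column) for column in zip(*vectors)]


def bucket(table: List[Vector], n: int) -> Vector:
    """Look up an embedding by integer, clipping large values to the last row (assumed)."""
    return table[min(n, len(table) - 1)]


def token_features(word_vecs: Sequence[Vector], pos_tags: Sequence[str],
                   pos_emb: Dict[str, Vector]) -> List[Vector]:
    """Append a POS-tag embedding to each pre-trained word vector."""
    return [concat(w, pos_emb[tag]) for w, tag in zip(word_vecs, pos_tags)]


def span_features(tokens: List[Vector], span: Span,
                  width_emb: List[Vector]) -> Vector:
    """Max-pool the span's tokens and append an embedding of its token length."""
    start, end = span
    return concat(max_pool(tokens[start:end], len(tokens[0])),
                  bucket(width_emb, end - start))


def relation_features(tokens: List[Vector], head: Span, tail: Span,
                      head_type: str, tail_type: str,
                      width_emb: List[Vector], dist_emb: List[Vector],
                      type_emb: Dict[str, Vector]) -> Vector:
    """Head span + tail span + max-pooled context between them
    + distance embedding + both entity-type embeddings."""
    left, right = sorted([head, tail])
    between = tokens[left[1]:right[0]]  # empty if the spans touch or overlap
    distance = max(0, right[0] - left[1])
    return concat(span_features(tokens, head, width_emb),
                  span_features(tokens, tail, width_emb),
                  max_pool(between, len(tokens[0])),
                  bucket(dist_emb, distance),
                  type_emb[head_type], type_emb[tail_type])


# Toy example: "We use CRF for NER" with 2-d word vectors and 1-d POS embeddings.
words = [[0.1, 0.2], [0.5, 0.1], [0.9, 0.3], [0.2, 0.4], [0.8, 0.7]]
pos = ["PRP", "VBP", "NNP", "IN", "NNP"]
pos_emb = {"PRP": [0.0], "VBP": [0.1], "NNP": [1.0], "IN": [0.2]}
width_emb = [[0.0], [0.1], [0.2], [0.3]]
dist_emb = [[0.0], [0.5], [1.0]]
type_emb = {"Method": [1.0, 0.0], "Task": [0.0, 1.0]}

tokens = token_features(words, pos, pos_emb)
# Candidate pair: "CRF" (Method, tokens 2..3) and "NER" (Task, tokens 4..5).
vec = relation_features(tokens, (2, 3), (4, 5), "Method", "Task",
                        width_emb, dist_emb, type_emb)
print(len(vec), vec)
```

In a trained model these tables would be learned and the concatenated vector would feed a classifier over relation types. Pooling only the tokens between the two candidates, rather than the whole sentence, keeps the context feature focused on the words that usually signal the relation, which is the intuition the abstract reports as more effective than using the full context.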
Keywords/Search Tags: entity extraction, relation extraction, ontology, part of speech, multi-feature, label enhancement