
Research And Implementation Of Scientific Entity Relation Extraction

Posted on: 2020-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y T Zheng
Full Text: PDF
GTID: 2428330623459890
Subject: Computer technology
Abstract/Summary:
Using artificial intelligence technologies such as natural language processing to extract scientific entities and the relations between them from massive academic literature yields structured knowledge. This improves the efficiency with which researchers analyze literature and grasp scientific research trends, and also provides a basis for government departments to make scientific research plans and for social organizations to build core technology groups. This thesis focuses on integrating the technical characteristics of different neural networks and attention mechanisms in deep learning to extract scientific entities, such as tasks and methods, as well as their hyponymy and synonymy relations. According to the traits of language description in academic literature, we integrate BiLSTM and self-attention mechanisms to address context dependency and remote correlation in scientific entity recognition, and integrate a convolutional neural network with BiLSTM to solve local dependency and long-range dependency in relation extraction. The specific work is summarized as follows:

(1) Scientific entity recognition: we propose the SAFE-BiLSTM-CRF model, which combines BiLSTM-CRF with a self-attention mechanism and treats scientific entity recognition as a sentence-level sequence labeling problem. First, word embeddings are learned from massive scientific texts, and case-sensitive character-level embeddings are proposed. Then, BiLSTM is used to extract the contextual features of words, and a self-attention-based feature extractor is proposed to obtain the global correlation features of words, addressing the difficulty mainstream methods have in capturing remote correlation information. Finally, CRF is used to obtain the optimal tag sequence from these features.

(2) Scientific relation extraction: we propose the ATT-CNN-BiLSTM model, which integrates CNN, BiLSTM, and an attention mechanism and treats scientific relation extraction as a sentence-level classification problem. First, we take word embeddings and a small number of handcrafted features (part of speech, relative position, etc.) as input, and propose enhanced contextual features of entities according to the characteristics of relation description in scientific texts. Then, by combining BiLSTM and CNN, the model uses multiple convolution kernels to extract richer local features while learning word order and long-range dependency information. In addition, an attention mechanism is used to emphasize key features, thereby improving the classification performance of the model.

(3) Experimental verification and system implementation: we implement the SAFE-BiLSTM-CRF and ATT-CNN-BiLSTM models on the TensorFlow platform and verify their performance through comparative experiments. The experimental results on the public ScienceIE dataset show that both models achieve better results than existing advanced methods, with improvements of 1.1% and 1.6% in F1 score, respectively. Finally, a knowledge extraction system based on the two models is implemented.
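To illustrate the idea behind the self-attention feature extractor in (1), the sketch below shows scaled dot-product self-attention applied to per-token contextual features (e.g. BiLSTM outputs): every token attends to every other token in one step, which is how remote correlations are captured. This is a minimal numpy illustration of the general technique, not the thesis's TensorFlow implementation; the matrix sizes and random inputs are arbitrary assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_features(H):
    """Scaled dot-product self-attention over contextual features.

    H: (seq_len, d) matrix, e.g. the per-token outputs of a BiLSTM.
    Returns (seq_len, d) global-correlation features: each token's new
    representation is a weighted sum over ALL tokens, so long-range
    dependencies are captured in a single step rather than propagated
    through many recurrent steps.
    """
    d = H.shape[-1]
    scores = H @ H.T / np.sqrt(d)        # (seq_len, seq_len) similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ H                   # attend over the whole sentence

H = np.random.randn(6, 8)                # 6 tokens, 8-dim contextual features
A = self_attention_features(H)
print(A.shape)                           # (6, 8)
```

In a full sequence labeler these attention features would be concatenated with the BiLSTM features and fed to a CRF layer, which decodes the optimal tag sequence jointly rather than per token.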
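The relation-extraction design in (2) combines multi-width convolution kernels (local n-gram features) with attention pooling over positions. The following numpy sketch shows those two pieces under assumed toy dimensions; the kernel widths, feature sizes, and random query vector `q` are illustrative assumptions, not the thesis's actual hyperparameters.

```python
import numpy as np

def conv1d(X, W):
    """Valid 1D convolution with ReLU.

    X: (seq_len, d) token embeddings; W: (k, d, m) -- one width-k kernel
    bank producing m feature maps. Each output row summarizes a k-gram.
    """
    k, _, m = W.shape
    n = X.shape[0] - k + 1
    out = np.zeros((n, m))
    for i in range(n):
        out[i] = np.tensordot(X[i:i + k], W, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)          # ReLU non-linearity

def attention_pool(F, q):
    """Collapse per-position features F (n, m) into one sentence vector,
    weighting each position by its relevance to a learned query q (m,)."""
    scores = F @ q
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()                  # attention weights sum to 1
    return alpha @ F

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 16))        # 10 tokens, 16-dim embeddings
# Kernels of widths 2 and 3 capture different local n-gram patterns.
feats = [conv1d(X, rng.standard_normal((k, 16, 8))) for k in (2, 3)]
F = np.concatenate(feats, axis=0)        # stack all per-position features
v = attention_pool(F, rng.standard_normal(8))
print(v.shape)                           # (8,) -- fed to a classifier
```

In the full model the convolution would run over BiLSTM outputs rather than raw embeddings, so local features also carry word-order and long-range dependency information, and the pooled vector `v` would go through a softmax layer to predict the relation type.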
Keywords: academic literature, entity recognition, relation extraction, deep learning