
Research On Character Relation Extraction In News Texts

Posted on: 2021-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: L Qin
GTID: 2428330629951035
Subject: Communication and Information System
Abstract/Summary:
With the popularity of the Internet, news is increasingly disseminated online. News text is unstructured, yet it contains rich information about the relationships between people (characters). Extracting such relationships from unstructured text is a research hotspot in natural language processing. Current relation extraction methods fall mainly into two categories: pipeline methods and joint learning methods. A pipeline method first identifies the pair of person entities in a sentence and then classifies the relationship between them, whereas a joint learning method extracts the entities and classifies the relationship of the entity pair jointly.

This paper adopts the pipeline approach to design a model for extracting character relationships from news text. The model consists of two parts: a person name recognition model and a relation extraction model. For person name recognition, this paper introduces an attention mechanism into the BiLSTM-CRF entity recognition model, constructing the BiLSTM-Att-CRF person name recognition model to overcome the traditional model's limited ability to capture the key features of a sentence. For relation extraction, this paper uses distant supervision to construct the dataset, addressing the lack of high-quality corpora in the Chinese domain. However, building a dataset by distant supervision inevitably introduces false positive noise. To overcome this problem, this paper introduces a generative adversarial network (GAN) to denoise the dataset at the sentence level, filtering false positive noise directly out of the dataset, and then trains the BiLSTM-PCNN relation extraction model on the denoised dataset. In theory, however, a GAN cannot filter out all of the noise. Building on this, the paper therefore groups the sentences that contain the same entity pair into a bag and introduces a TF-IDF relational indicator discovery model, which assigns greater weight to the sentences in the bag that contain relational indicators and thereby suppresses the remaining noise.

In the experiments, person name recognition was carried out on the MSRA news corpus to compare the BiLSTM-Att-CRF and BiLSTM-CRF models. The BiLSTM-Att-CRF model outperformed the BiLSTM-CRF model, improving recall by 1.18%. A relation extraction experiment was then conducted on the dataset constructed by distant supervision, and the denoising effect of the GAN was verified manually. The results showed that the average accuracy of the denoised model was 5.1% higher than that of the model trained on the noisy data. Finally, an experiment on the relational indicator weights showed that this weighting is effective in suppressing noise.
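The abstract does not specify how attention is combined with the BiLSTM outputs in BiLSTM-Att-CRF. As a rough illustration only, the sketch below assumes plain scaled dot-product self-attention over the per-token BiLSTM states (without learned query/key/value projections, to keep it short), with each state concatenated to its attention context before the CRF layer. The function names are hypothetical, and vectors are plain Python lists so the example has no dependencies.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(hidden):
    """Scaled dot-product self-attention over per-token BiLSTM states.

    hidden: list of T vectors, each a list of d floats (assumed BiLSTM outputs).
    Returns T context vectors; context t is a weighted sum of all states,
    weighted by how strongly state t matches each of them.
    """
    d = len(hidden[0])
    scale = math.sqrt(d)
    contexts = []
    for query in hidden:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in hidden]
        weights = softmax(scores)
        contexts.append([sum(w * h[j] for w, h in zip(weights, hidden)) for j in range(d)])
    return contexts

def attend_and_concat(hidden):
    # Assumption: fuse by concatenating each state with its context vector;
    # the 2d-dimensional result would feed the CRF emission scores.
    return [h + c for h, c in zip(hidden, self_attention(hidden))]

features = attend_and_concat([[0.2, -0.1], [0.9, 0.4], [0.1, 0.3]])
```

Concatenation keeps the original BiLSTM features intact while adding sentence-wide context, which is one common way to let a tagger weigh key words elsewhere in the sentence.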
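In a BiLSTM-CRF tagger, the CRF layer scores whole tag sequences as per-token emission scores plus tag-to-tag transition scores, and decoding selects the best sequence with the Viterbi algorithm. The generic sketch below assumes a three-tag BIO scheme (O, B-PER, I-PER) and hand-set toy scores, not values from the thesis; a large negative transition score forbids O followed by I-PER.

```python
TAGS = ["O", "B-PER", "I-PER"]
NEG = -1e4  # effectively forbids a transition

# TRANSITIONS[i][j]: score of moving from TAGS[i] to TAGS[j] (toy values)
TRANSITIONS = [
    [0.0, 0.0, NEG],   # O -> I-PER is invalid in the BIO scheme
    [0.0, 0.0, 0.5],   # B-PER -> I-PER is encouraged
    [0.0, 0.0, 0.5],   # I-PER -> I-PER is encouraged
]

def viterbi(emissions, transitions, tags):
    """Return the tag sequence maximizing total emission + transition score.

    emissions: T x K per-token tag scores (e.g. produced by the BiLSTM layer).
    """
    num_tags = len(tags)
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag i for a path ending in tag j at step t.
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best = max(range(num_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[i] for i in path]

# Two tokens (e.g. a surname and a given name) with toy emission scores.
EMISSIONS = [
    [1.0, 0.9, 0.0],
    [0.0, 0.0, 1.0],
]
print(viterbi(EMISSIONS, TRANSITIONS, TAGS))  # ['B-PER', 'I-PER']
```

With these toy scores a per-token argmax would output O, I-PER, an invalid sequence; Viterbi decoding returns B-PER, I-PER because the transition scores are taken into account.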
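Distant supervision labels a sentence with a knowledge-base relation whenever the sentence mentions both entities of that pair. The sketch below uses a hypothetical two-pair knowledge base and naive substring matching; it shows where false positive noise comes from, since the second sentence is labeled spouse even though it says nothing about marriage. The `make_bags` helper (also hypothetical) groups sentences by entity pair, matching the bag construction described above.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (head person, tail person) -> relation
KB = {
    ("Zhang San", "Li Si"): "spouse",
    ("Wang Wu", "Zhao Liu"): "colleague",
}

def distant_label(sentences, kb):
    """Return (sentence, head, tail, relation) for every sentence that mentions
    both persons of a KB pair; each such sentence inherits the KB relation."""
    labeled = []
    for sent in sentences:
        for (head, tail), rel in kb.items():
            if head in sent and tail in sent:
                labeled.append((sent, head, tail, rel))
    return labeled

def make_bags(labeled):
    # Group labeled sentences that share the same entity pair and relation.
    bags = defaultdict(list)
    for sent, head, tail, rel in labeled:
        bags[(head, tail, rel)].append(sent)
    return dict(bags)

SENTENCES = [
    "Zhang San and his wife Li Si attended the ceremony.",
    "Zhang San met Li Si at a business forum.",  # labeled spouse anyway: a false positive
    "Wang Wu spoke to reporters alone.",          # only one entity, so not labeled
]

labeled = distant_label(SENTENCES, KB)
bags = make_bags(labeled)
```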
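The "PCNN" part of BiLSTM-PCNN refers to piecewise max pooling: instead of one max-pool over the whole sentence, the convolution output is split into three segments by the positions of the two entities, and each segment is pooled separately. The sketch below covers only that pooling step on a toy feature matrix; the segment boundary convention (each entity closes the segment before it) and zero-filling of empty segments are assumptions, and the BiLSTM encoder and convolution are omitted.

```python
def piecewise_max_pool(features, head_pos, tail_pos):
    """Piecewise max pooling as used in PCNN.

    features: T x C convolution outputs (one row per token position).
    head_pos, tail_pos: token indices of the two person entities.
    Returns 3 * C values: per-channel maxima of the three segments.
    """
    p1, p2 = sorted((head_pos, tail_pos))
    # Assumed convention: each entity closes the segment that precedes it.
    segments = [features[:p1 + 1], features[p1 + 1:p2 + 1], features[p2 + 1:]]
    num_channels = len(features[0])
    pooled = []
    for segment in segments:
        if not segment:
            # An entity at the sentence end leaves an empty segment; zero-fill it.
            pooled.extend([0.0] * num_channels)
        else:
            pooled.extend(max(row[c] for row in segment) for c in range(num_channels))
    return pooled

conv_out = [[1, 5], [2, 0], [7, 1], [0, 3], [4, 4]]  # toy T=5, C=2
print(piecewise_max_pool(conv_out, head_pos=1, tail_pos=3))  # [2, 5, 7, 3, 4, 4]
```

Pooling per segment preserves coarse information about where a feature fired relative to the two names, which a single sentence-wide max would discard.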
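The abstract does not give the exact formulation of the TF-IDF relational indicator discovery model. One plausible reading, sketched here under stated assumptions: pool all (already word-segmented) sentences of each relation into one document, keep the highest-TF-IDF words per relation as indicators, and weight each sentence in an entity-pair bag by 1 plus the TF-IDF scores of the indicators it contains, normalized within the bag. Both function names, the toy corpus, and the weighting formula are hypothetical.

```python
import math
from collections import Counter

def relation_indicators(corpus, top_k=3):
    """corpus: dict mapping relation -> list of tokenized sentences.

    Each relation's sentences are pooled into one document; words are scored
    by TF-IDF across relations, and the top_k positive-scoring words are kept
    as that relation's candidate indicators.
    """
    docs = {rel: Counter(tok for sent in sents for tok in sent)
            for rel, sents in corpus.items()}
    n_docs = len(docs)
    # Document frequency: iterating a Counter yields each distinct word once.
    df = Counter(tok for counts in docs.values() for tok in counts)
    indicators = {}
    for rel, counts in docs.items():
        total = sum(counts.values())
        tfidf = {tok: (c / total) * math.log(n_docs / df[tok]) for tok, c in counts.items()}
        ranked = sorted(tfidf.items(), key=lambda kv: (-kv[1], kv[0]))
        indicators[rel] = {tok: s for tok, s in ranked[:top_k] if s > 0}
    return indicators

def bag_weights(bag, indicator_scores):
    """Hypothetical weighting: 1 + summed TF-IDF of indicators present, normalized."""
    raw = [1.0 + sum(indicator_scores.get(tok, 0.0) for tok in set(sent)) for sent in bag]
    total = sum(raw)
    return [r / total for r in raw]

corpus = {
    "spouse": [["Zhang", "and", "his", "wife", "Li"], ["Wang", "married", "Zhao"]],
    "colleague": [["Zhang", "and", "his", "coworker", "Li"], ["Wang", "coworker", "Zhao"]],
}
indicators = relation_indicators(corpus)
bag = [["Zhang", "and", "his", "wife", "Li"], ["Zhang", "met", "Li"]]
weights = bag_weights(bag, indicators["spouse"])  # first sentence gets the larger weight
```

Words shared by every relation (names, function words) get an IDF of zero and drop out, so only relation-specific words such as "wife" can raise a sentence's weight in its bag.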
Keywords/Search Tags: character relationship extraction, attention mechanism, generative adversarial network, relationship indicator