Distant Supervised Entity Relation Extraction Method And Application Based On Internal And External Semantic Features And Preferential Attention Mechanism

Posted on: 2021-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: J X Le
GTID: 2428330614956841
Subject: Software engineering
Abstract/Summary:
Relation extraction is a core task in information extraction. It extracts the relations between entities from massive unstructured text and turns them into structured triples, providing key technical support for knowledge graph construction, recommendation and search systems, automatic question answering, and text summarization. Distant supervision makes it easy to build a large-scale open-domain corpus by aligning a knowledge base with text, but it also generates a large amount of noisy data, which makes the relation extraction model hard to converge and degrades extraction quality. In addition, existing deep learning models have limited ability to learn features automatically, so the extracted sentence features are incomplete, which in turn lowers the accuracy of the relation extraction model. Designing a relation extraction method that can accurately mine sentence features and filter noisy data is therefore an important challenge in information extraction. This thesis studies how to improve the accuracy with which a Piecewise Convolutional Neural Network (PCNN) extracts sentence features and how to mitigate the influence of noisy data. The main contributions are as follows.

1. To extract sentence features accurately and completely, this thesis proposes a distantly supervised relation extraction method based on internal and external semantic features. The external semantic feature is obtained by querying WordNet for the set of hypernyms (superordinate words) of each entity and using that set as a background feature. The internal semantic feature is obtained by computing the IDF value of each word over the corpus, normalizing it, and adding it to the word vector as a word-importance feature, which highlights the contribution of non-entity words to the sentence vector. Experiments on the standard dataset built from the New York Times (NYT) corpus and Freebase (NYT-FB) show that PCNN with external semantic features improves on plain PCNN by 2.3% to 5.9% on P@100, P@200, and P@300 and performs better on the Precision/Recall (PR) curve. PCNN with internal semantic features improves the average P@N by 3.3% over plain PCNN and also performs better on the PR curve. PCNN with both internal and external semantic features reaches an average P@N of 74.3%, which exceeds both single-feature variants, and obtains the best PR curve.

2. To filter the large amount of noisy data in the dataset, this thesis uses multi-instance learning: all sentences that mention the same entity pair are treated as one bag, and the model outputs a feature vector for the bag rather than for each sentence. The thesis modifies the weight assignment of Selective Attention (SATT) and proposes a Preferential Attention (PATT) mechanism, in which sentences whose confidence is below the average over all sentences in the bag receive zero weight. This indirectly raises the weight of higher-confidence sentences, reduces the impact of noisy sentences, and improves the accuracy of the bag's feature vector. Experiments show that, with PCNN, internal and external semantic features, and multi-instance learning held fixed, PATT improves on SATT by 2% to 5% on P@100, P@200, and P@300, improves the average P@N by 3.6%, and yields a better PR curve. PATT is also combined with the internal and external semantic features separately, and several groups of experiments compare the contribution of each kind of feature to relation extraction.

3. To demonstrate the practical value of the model, this thesis applies the relation extraction model based on internal and external semantic features and the Preferential Attention mechanism to a project that builds a knowledge graph from financial big data. The model is one of the core modules of the project's system: sentences that have already undergone entity recognition are fed into the system, triples of entities and relations are extracted, and the triples are used to construct the knowledge graph. A front-end page is then designed to visualize the knowledge graph.
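The weighting rule described for PATT can be illustrated with a minimal sketch. The abstract only states that below-average sentences receive zero weight, so everything else here is an assumption made for illustration: the function names, the use of a softmax over per-sentence confidence scores as the SATT-style baseline, and the renormalization of the surviving weights so they sum to 1.

```python
import math
from typing import List, Sequence


def selective_attention(scores: Sequence[float]) -> List[float]:
    """SATT-style baseline (assumed): softmax over per-sentence scores."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def preferential_attention(scores: Sequence[float]) -> List[float]:
    """PATT-style weights (sketch): drop below-average sentences.

    Sentences whose softmax weight is below the bag average get weight 0;
    the remaining weights are renormalized (an assumption) to sum to 1.
    """
    weights = selective_attention(scores)
    mean_w = sum(weights) / len(weights)
    # Guard against rounding: the top-weighted sentence is always kept.
    threshold = min(mean_w, max(weights))
    kept = [w if w >= threshold else 0.0 for w in weights]
    total = sum(kept)
    return [w / total for w in kept]


def bag_vector(sentence_vecs: Sequence[Sequence[float]],
               weights: Sequence[float]) -> List[float]:
    """Weighted sum of sentence vectors, giving one vector per bag."""
    dim = len(sentence_vecs[0])
    return [sum(w * vec[d] for w, vec in zip(weights, sentence_vecs))
            for d in range(dim)]


# Hypothetical bag of four sentences for one entity pair.
scores = [2.0, 1.0, -1.0, -3.0]          # made-up confidence scores
vectors = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [9.0, 9.0]]

satt = selective_attention(scores)
patt = preferential_attention(scores)
bag = bag_vector(vectors, patt)
```

In this toy bag the two low-scoring sentences end up with zero weight under the sketch, so their vectors no longer contribute to the bag vector, while the top sentence receives a larger weight than it would under the plain softmax.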
Keywords/Search Tags:information extraction, relation extraction, distant supervision, PCNN, attention