
Sample Denoising In Distant Supervision For Relation Extraction

Posted on: 2020-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: X T Liu
GTID: 2428330575469949
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, new problems have emerged, and how to process the massive unstructured data on the network quickly and accurately has attracted growing attention from researchers. Text processing is a core function of natural language processing, and information extraction is one of its most important tasks. Relation extraction, the subject of this thesis, is a sub-task of information extraction. Existing approaches fall into several categories according to the source of their training data: fully supervised, semi-supervised, weakly supervised, and unsupervised relation extraction. On massive unstructured data, however, these approaches suffer from either low accuracy or high cost.

In 2009, Mintz proposed a method for extraction at large scale: distant supervision for relation extraction. It works by aligning the relation instances in a knowledge base with the sentences in a text corpus. The alignment rests on the assumption that if a sentence in the corpus contains a pair of entities from the knowledge base, the sentence expresses the relation that the knowledge base records for that pair. The sentence is labeled with that relation, and the aligned data is then used to train a classifier at scale.

This thesis makes two improvements to the conventional training procedure. First, a word-level attention mechanism is adopted in the pooling step to address the heterogeneous-statement problem within a sentence. Second, after the initial training, an SVD-based label completion method is applied to the incomplete labels. Both improvements are highly portable and can be plugged into other neural relation extraction models.

In the model, convolutional neural networks are used to embed the semantics of sentences. This thesis argues that different words in the same sentence contribute differently to the structure and meaning of the sentence. In the sentence "Donald Trump is the president of US", for example, "president" clearly has a greater influence on the sentence than "of" and should receive a higher weight. This thesis calls this the heterogeneous-statement problem. To solve it, the thesis proposes a self-attention pooling method that applies word-level attention to the convolutional features of a sentence and assigns higher weights to the words that influence the sentence most.

Distant supervision for relation extraction has been widely used to find new relational facts in text, but it inevitably suffers from mislabeling, which seriously degrades extraction performance. To address this, the thesis applies singular value decomposition (SVD), a matrix denoising technique. It assumes that bags with similar feature vectors tend to have similar bag labels, that is, that the representations of two such bags in the bag feature matrix are linearly related. After the initial training, the bag features are therefore paired with the bag labels obtained from distant supervision, a matrix completion step is performed, and the noise is reduced by a low-rank treatment of the matrix.

The number of nonzero singular values of a matrix equals its rank. The singular values produced by the decomposition are ordered from largest to smallest and decay rapidly. The thesis computes the n singular values of the matrix and, by seeking the optimal solution under the Frobenius norm, finds the k singular values that best describe the matrix. Because the singular values decay quickly, the smaller ones are treated as noise, and the matrix is denoised by keeping only the k largest. Processing the convolutional-layer matrix in this way yields a better approximation.

Experiments show that the model makes full use of sentences with high information content, which effectively reduces the impact of mislabeled instances. Compared with the baseline method, the proposed model improves both accuracy and recall in relation extraction.
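The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the thesis implementation. The function names `attention_pool` and `truncated_svd_denoise`, the `query` vector standing in for learned attention parameters, and the choice of `k` are all assumptions made for illustration. The first function replaces max pooling with a softmax-weighted sum over per-word features. The second keeps the k largest singular values, which gives the best rank-k approximation under the Frobenius norm.

```python
import numpy as np


def attention_pool(features: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Pool per-word features of shape (n_words, dim) into one vector (dim,).

    Each word is scored by its dot product with `query` (a hypothetical
    stand-in for learned attention parameters). The scores are turned into
    softmax weights, and the output is the weighted sum of the word features.
    """
    scores = features @ query                # one score per word
    scores = scores - scores.max()           # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ features                # weighted sum over words


def truncated_svd_denoise(matrix: np.ndarray, k: int) -> np.ndarray:
    """Return the rank-k approximation of `matrix` built from its k largest
    singular values (the Frobenius-optimal rank-k approximation)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # np.linalg.svd returns singular values in descending order,
    # so the first k entries are the largest ones.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    word_features = rng.normal(size=(7, 4))  # 7 words, 4 feature channels
    print(attention_pool(word_features, rng.normal(size=4)))

    bag_matrix = rng.normal(size=(6, 5))     # toy bag feature/label matrix
    print(truncated_svd_denoise(bag_matrix, k=2))
```

In practice `k` would be chosen from the decay of the singular values. The sketch takes it as an argument and does not guess how the thesis selects it.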
Keywords/Search Tags: Distant Supervision for Relation Extraction, Neural Networks, Low-rank approximation (LRA) in the least-squares sense, Singular value decomposition