Distant Supervised Relation Extraction By Matrix Completion Via Truncated Nuclear Norm Regularization

Posted on: 2019-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2428330542983170
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, processing huge amounts of heterogeneous data and understanding them quickly and accurately has become a major research concern. Relation extraction is an important subtask of information extraction and an indispensable step in extracting structured information from unstructured natural language text. According to the source of training data, relation extraction can be divided into supervised, unsupervised, weakly supervised, and semi-supervised approaches. These approaches, however, have limitations when applied to massive heterogeneous data. Distantly supervised relation extraction is better suited to relation extraction at big-data scale: it heuristically aligns relation instances in a knowledge base with a corpus to provide training samples for the extractor. It rests on the assumption that if two entities participate in a relation in an existing knowledge base, any sentence containing those two entities might express that relation. This assumption introduces noisy data: for example, a sentence may contain an entity pair that corresponds to a relation in the knowledge base without expressing that relation. To address the noise, matrix completion has been applied to distantly supervised relation extraction; it seeks to recover the underlying low-rank matrix and separate out the error matrix.

We propose a method for distantly supervised relation extraction by matrix completion via truncated nuclear norm regularization. Building on low-rank matrix completion, it replaces nuclear norm minimization with truncated nuclear norm minimization. This gives a better approximation to the rank function and retains the informative components of the matrix, while achieving high precision and good noise immunity. The nuclear norm of a matrix is the sum of its singular values. The rank, however, is determined by the number of nonzero singular values rather than by their sum. Singular values, arranged in descending order, tend to decay rapidly. The nuclear norm is not equivalent to the rank, and reducing the norm does not necessarily reduce the rank, which affects the accuracy of distantly supervised relation extraction. The truncated nuclear norm is what remains of the nuclear norm after the largest singular values are removed. We select the truncation position by examining the distribution of singular values, and use the TNNR-ADMM and TNNR-APGL algorithms to solve the truncated nuclear norm minimization problem. Experiments on the NYT'13 dataset show that our method achieves better relation extraction performance than the earlier method based on the nuclear norm.
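The distinction the abstract draws between rank, nuclear norm, and truncated nuclear norm can be illustrated with a small NumPy sketch. This is not the thesis's implementation: the function names, the toy matrices, and the truncation position `r = 2` are assumptions chosen for illustration, and the TNNR-ADMM and TNNR-APGL solvers are not shown.

```python
import numpy as np


def nuclear_norm(X):
    """Sum of all singular values of X."""
    return np.linalg.svd(X, compute_uv=False).sum()


def truncated_nuclear_norm(X, r):
    """Sum of the singular values of X that remain after dropping the r largest.

    np.linalg.svd returns singular values in descending order,
    so s[r:] holds everything except the r largest.
    """
    s = np.linalg.svd(X, compute_uv=False)
    return s[r:].sum()


# Two hypothetical rank-2 matrices with very different singular value magnitudes.
X = np.diag([5.0, 3.0, 0.0, 0.0])
Y = np.diag([1.0, 0.5, 0.0, 0.0])

for name, M in (("X", X), ("Y", Y)):
    print(
        name,
        "rank:", np.linalg.matrix_rank(M),
        "nuclear norm:", nuclear_norm(M),
        "truncated nuclear norm (r=2):", truncated_nuclear_norm(M, 2),
    )
```

In this toy case both matrices have rank 2, yet their nuclear norms differ, so a smaller nuclear norm does not imply a smaller rank. With the truncation position set to the true rank, the truncated nuclear norm is zero for both, which is why it tracks the rank more closely.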
Keywords/Search Tags: Distant supervision learning, Relation extraction, Low-rank matrix completion, Truncated nuclear norm