
Distant Supervision For Relation Extraction Based On Robust Principal Component Analysis

Posted on: 2019-04-27
Degree: Master
Type: Thesis
Country: China
Candidate: H B Dou
GTID: 2428330548461245
Subject: Engineering
Abstract/Summary:
With the rapid development of the information age, the problem of explosively growing big data has come to the fore. The purpose of information extraction is to extract useful structured information from vast quantities of unstructured text. As one of its most important subtasks, relation extraction detects and extracts the semantic relations between the entities in a text. Relation extraction methods fall into four types: unsupervised, semi-supervised, distantly supervised, and supervised learning. Distant supervision is the most promising of these: it generates training data by heuristically matching an existing knowledge base against an unstructured corpus. It does not need large amounts of manually labeled data as supervised learning does, it does not suffer from the semantic drift of semi-supervised learning, which improves its adaptability across domains, and it achieves better precision and recall than unsupervised learning.

However, distant supervision for relation extraction also has shortcomings. (1) Noise problem. The heuristic match relies on the alignment assumption: if a pair of entities participates in a relation, every sentence that mentions both entities is labeled with that relation. This assumption does not always hold. For example, a sentence may mention an entity pair without expressing any relation between them, yet it is still labeled with the relation recorded in the knowledge base, which produces noisy relation labels. (2) Sparsity problem. Because it is not known in advance which features have the greatest influence on relation extraction, feature extraction produces many kinds of text features, including lexical features, syntactic features, and named entity tag features. However, many of these features may appear only once in the training set. For example, aligning the Freebase knowledge base with the NYT'13 corpus yields thousands of features, but each entity pair has only a few of them.

To tackle these problems, the main work of this thesis is as follows. (1) Distant supervision for relation extraction is transformed into a Low-Rank Matrix Recovery (LRMR) problem, and Robust Principal Component Analysis (RPCA) is selected to solve it. First, the data are organized as a matrix, and the relation instances to be extracted are stored as the unknown entries in the testing tuples of the matrix. To address the sparsity problem, the matrix is assumed to have low rank. RPCA decomposes the original matrix into a low-rank matrix containing the main information and a noise matrix. By exploiting the correlations in the data through this decomposition, RPCA can predict the unknown entries, that is, the new relation instances. (2) An RPCA model with a weighted nuclear norm is proposed. The traditional RPCA model with the nuclear norm shrinks all singular values with the same threshold and ignores that the magnitude of a singular value is proportional to the importance of the information it carries, which weakens the denoising effect. This thesis replaces the nuclear norm in RPCA with a weighted nuclear norm. During matrix decomposition, each singular value is shrunk by a threshold that is inversely proportional to its magnitude, so that large singular values shrink more slowly and small singular values shrink faster. The model therefore both guarantees the low rank of the matrix and preserves the important information. The experimental results show that the RPCA model with the weighted nuclear norm improves the precision and denoising effect of distant supervision for relation extraction.
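
The difference between the two shrinkage rules described above can be illustrated on a list of singular values. The following is only a minimal sketch, not the thesis's implementation: the function names, the constant `c`, the stabilizer `eps`, and the sample values are all assumptions, and the abstract does not state the exact weighting formula.

```python
from typing import List


def uniform_shrink(singular_values: List[float], tau: float) -> List[float]:
    """Nuclear-norm style shrinkage: the same threshold tau for every singular value."""
    return [max(s - tau, 0.0) for s in singular_values]


def weighted_shrink(singular_values: List[float], c: float, eps: float = 1e-8) -> List[float]:
    """Weighted shrinkage (assumed form): threshold c / (s + eps), inversely
    proportional to the singular value s, so large values shrink less."""
    return [max(s - c / (s + eps), 0.0) for s in singular_values]


# Hypothetical singular values: two large (signal) and two small (likely noise).
sigma = [10.0, 4.0, 0.8, 0.5]

uniform = uniform_shrink(sigma, tau=0.5)
weighted = weighted_shrink(sigma, c=1.0)

for s, u, w in zip(sigma, uniform, weighted):
    print(f"sigma={s:5.2f}  uniform={u:6.3f}  weighted={w:6.3f}")
```

With these sample values the weighted rule leaves the large singular values closer to their original size than the uniform rule does, while it drives the small value 0.8 to zero where the uniform rule keeps part of it. In a full RPCA solver the singular values would come from an SVD of the current low-rank estimate at each iteration (for example via `numpy.linalg.svd`), and the shrunken values would be used to rebuild that estimate.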
Keywords/Search Tags: Relation Extraction, LRMR, Distant Supervision, RPCA, Weighted Nuclear Norm