
Unsupervised Relation Extraction Based On Matrix Factorization

Posted on: 2019-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: J M Huang
Full Text: PDF
GTID: 2428330545486958
Subject: Computer software and theory
Abstract/Summary:
Relation extraction is one of the fundamental tasks of information extraction. Traditional supervised and semi-supervised methods require labelled training data or an existing knowledge base, which limits their use in new domains without any prior information. Unsupervised methods, by contrast, treat the task as a clustering problem and can extract new relation instances from a corpus based solely on context. Yet previous unsupervised models are structurally complex and perform poorly, because the co-occurrence matrix of entity pairs and relation mentions is high-dimensional and sparse. In addition, they represent the semantics of relation mentions with discrete feature vectors built from hand-crafted feature sets, which are likewise high-dimensional and sparse; this increases both the complexity of the model and the sparsity of the co-occurrence matrix. We therefore propose a new unsupervised relation extraction model from the perspective of matrix factorization. It aims to reduce the complexity of the process and to incorporate new semantic information, yielding better training efficiency, flexibility and performance. The method consists of three parts. First, we propose a co-occurrence matrix factorization method with negative sampling: entity pairs are embedded in a relation space, and negative sampling both lowers the computational cost and makes full use of the limited observed co-occurrences. Second, we propose a multi-layer matrix factorization method that introduces deeper semantic information. The multi-layer decomposition avoids the extra complexity that the added representations would otherwise bring, and the relation-mention embeddings, built from word embeddings, are low-dimensional dense vectors free of noise from external NLP tools. Finally, we propose a neural relation extraction model, NURE-DSE, which combines the two methods and is trained with back-propagation. It inherits the benefits of both, computes parameter gradients automatically, and is simple, efficient and flexible enough to fit web-scale corpora. Experimental results on the NYT10 dataset demonstrate the effectiveness of our method: it outperforms existing methods in F1 score and yields expressive embeddings of entity pairs in the relation space.
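To make the first component concrete, the following is a minimal sketch (not the thesis's actual implementation) of factorizing an entity-pair/relation-mention co-occurrence matrix with negative sampling; the matrix sizes, learning rate, and number of negatives are illustrative assumptions.

```python
import numpy as np

# Toy, self-contained sketch: indices, dimensions and hyper-parameters are
# illustrative assumptions, not values from the thesis.
rng = np.random.default_rng(0)
n_pairs, n_mentions, dim = 100, 80, 16
# observed (entity-pair, relation-mention) co-occurrences from a corpus
observed = [(int(rng.integers(n_pairs)), int(rng.integers(n_mentions)))
            for _ in range(500)]

P = 0.1 * rng.standard_normal((n_pairs, dim))     # entity-pair embeddings
R = 0.1 * rng.standard_normal((n_mentions, dim))  # relation-mention embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, k_neg = 0.05, 5  # learning rate, negatives drawn per observed cell
for epoch in range(10):
    for i, j in observed:
        # positive step: raise the score of an observed co-occurrence
        g = sigmoid(P[i] @ R[j]) - 1.0
        P[i], R[j] = P[i] - lr * g * R[j], R[j] - lr * g * P[i]
        # negative sampling: lower the score of randomly drawn mentions,
        # avoiding a full pass over the sparse co-occurrence matrix
        for j_neg in rng.integers(n_mentions, size=k_neg):
            g = sigmoid(P[i] @ R[j_neg])
            P[i], R[j_neg] = P[i] - lr * g * R[j_neg], R[j_neg] - lr * g * P[i]
```

After such training, entity pairs that co-occur with similar relation mentions lie close together in the learned relation space, so a standard clustering step over the rows of P yields the unsupervised relation clusters.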
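Likewise, the combined idea behind the neural model can be sketched as follows: entity-pair embeddings on one side, a multi-layer projection of relation-mention word embeddings on the other, trained jointly by back-propagation against sampled negatives. This PyTorch sketch is an assumption-laden illustration; the module names, layer sizes, and two-layer mention network are placeholders, not the thesis's exact NURE-DSE design.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: names and layer sizes are assumptions,
# not the thesis's exact NURE-DSE architecture.
class NeuralFactorization(nn.Module):
    def __init__(self, n_pairs, word_dim, dim):
        super().__init__()
        self.pair_emb = nn.Embedding(n_pairs, dim)     # entity-pair factor
        self.mention_net = nn.Sequential(              # multi-layer mention factor
            nn.Linear(word_dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, pair_idx, mention_vec):
        # score = inner product of the two factor representations
        return (self.pair_emb(pair_idx) * self.mention_net(mention_vec)).sum(-1)

n_pairs, word_dim, dim = 100, 50, 16
model = NeuralFactorization(n_pairs, word_dim, dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()

# Toy batch: averaged word embeddings of relation mentions (random stand-ins)
pair_idx = torch.randint(n_pairs, (32,))   # entity pairs seen with the mentions
neg_idx = torch.randint(n_pairs, (32,))    # negatively sampled entity pairs
mention_vec = torch.randn(32, word_dim)

for step in range(200):
    opt.zero_grad()
    loss = (loss_fn(model(pair_idx, mention_vec), torch.ones(32)) +
            loss_fn(model(neg_idx, mention_vec), torch.zeros(32)))
    loss.backward()                        # gradients via back-propagation
    opt.step()
```

Framing both factors as differentiable modules is what lets the whole model be optimized end-to-end with automatic gradients, which is the flexibility and efficiency benefit the abstract attributes to the combined approach.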
Keywords/Search Tags:unsupervised relation extraction, representation learning, negative sampling, word embedding, matrix factorization