
Research On Text Entity Relation Extraction Based On Semi-supervised Learning

Posted on: 2018-10-31  Degree: Master  Type: Thesis
Country: China  Candidate: J H Zhang
GTID: 2348330542487333  Subject: Software engineering
Abstract/Summary:
Information extraction is an active research direction in natural language processing. Researchers are no longer satisfied with identifying named entities alone. They also focus on mining the relations among large numbers of entities. The results of this research can be applied to knowledge-base construction, information retrieval, question-answering systems, and other fields. Semi-supervised learning is a widely used and effective machine-learning approach: it trains iteratively on a small number of labeled seed samples together with a large number of unlabeled samples. Building on co-training, a semi-supervised learning method, this thesis proposes a text entity relation extraction method that improves relation classification.

First, lexical analysis and syntactic analysis are used to extract key features and construct feature vectors. The goal is to produce a sparse matrix over the corpus features, which lays the foundation for training the subsequent classification models.

Second, this thesis adds a sample optimization module to the relation extraction task. Because semi-supervised classification is sensitive to its samples, a sample optimization method combining denoising and subsampling is presented. After the samples are labeled and pre-classified, the proposed denoising method, which takes each sample's environment factor into account, removes outliers and isolated points of different classes. Safe samples and boundary samples are retained, and the majority class is subsampled, so that the sample set is both denoised and balanced. This method effectively improves entity relation extraction.

Finally, this thesis designs and builds an entity relation classification model using an enhanced Tri-training co-training method. A sample measurement method based on information entropy and representativeness is proposed: the unlabeled samples with the highest metric values are selected and used in the co-training process. Three initial classifiers are trained on a small number of labeled seed samples. Each classifier in turn is set as the target classifier, and in each iteration the other two classifiers label the selected unlabeled samples; their results are then fed to the target classifier for training. When the iteration reaches its termination condition, the final entity relation classification model is obtained by voting. Experiments on the optimized samples show that the enhanced Tri-training method outperforms both the Co-training method and the traditional Tri-training method.
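The denoising-and-subsampling step can be illustrated with a minimal sketch. The abstract does not give the thesis's environment-factor formula, so this sketch assumes a simple stand-in: a sample's "environment" is the labels of its k nearest labeled neighbours (Euclidean distance). A sample whose neighbours all carry a different label is dropped as noise or an isolated point; safe and boundary samples are kept. Every class is then randomly subsampled down to the size of the smallest class. The function names `knn_labels`, `denoise`, and `undersample` are hypothetical.

```python
import math
import random


def knn_labels(points, labels, i, k):
    """Labels of the k nearest neighbours of points[i] (assumed Euclidean)."""
    dists = sorted(
        (math.dist(points[i], points[j]), labels[j])
        for j in range(len(points)) if j != i
    )
    return [lab for _, lab in dists[:k]]


def denoise(points, labels, k=3):
    """Keep safe samples (neighbours agree) and boundary samples (neighbours
    mixed); drop samples whose k neighbours all belong to other classes.
    This kNN rule is an assumed stand-in for the thesis's environment factor."""
    kept = []
    for i, lab in enumerate(labels):
        if lab in knn_labels(points, labels, i, k):
            kept.append(i)
    return kept


def undersample(indices, labels, seed=0):
    """Randomly subsample every class down to the smallest class size."""
    rng = random.Random(seed)
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    target = min(len(v) for v in by_class.values())
    balanced = []
    for cls in sorted(by_class):
        balanced.extend(rng.sample(by_class[cls], target))
    return sorted(balanced)
```

For example, a "B" point placed in the middle of an "A" cluster has only "A" neighbours and is removed by `denoise`, after which `undersample` trims the larger class to match the smaller one. Undersampling to the minority size is only one possible balancing target; the thesis may use a different ratio.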
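The enhanced Tri-training loop described in the abstract can be sketched as follows, under several stated assumptions. A nearest-centroid learner stands in for the real base classifier. The three initial classifiers are made diverse with a stratified bootstrap of the seed samples. The entropy-plus-representativeness metric is assumed to be `beta * normalized entropy + (1 - beta) * 1 / (1 + mean distance to the unlabeled pool)`, since the abstract does not give the formula. A pseudo-label is accepted only when the two helper classifiers agree, a rule borrowed from the original Tri-training formulation. `NearestCentroid`, `sample_score`, `tri_train`, and `vote` are hypothetical names.

```python
import math
import random
from collections import Counter


class NearestCentroid:
    """Stand-in base learner (assumption): one centroid per class,
    probabilities from a softmax over negative distances."""

    def fit(self, X, y):
        groups = {}
        for x, lab in zip(X, y):
            groups.setdefault(lab, []).append(x)
        self.centroids = {lab: tuple(sum(col) / len(pts) for col in zip(*pts))
                          for lab, pts in groups.items()}
        return self

    def proba(self, x):
        d = {lab: math.dist(x, c) for lab, c in self.centroids.items()}
        m = min(d.values())  # shift distances for numerical stability
        s = {lab: math.exp(-(v - m)) for lab, v in d.items()}
        z = sum(s.values())
        return {lab: v / z for lab, v in s.items()}

    def predict(self, x):
        p = self.proba(x)
        return max(p, key=p.get)


def entropy(p):
    return -sum(v * math.log(v) for v in p.values() if v > 0)


def sample_score(x, clfs, pool, classes, beta=0.5):
    """Hypothetical metric: normalized entropy of the averaged prediction
    blended with representativeness inside the unlabeled pool."""
    avg = {c: sum(clf.proba(x).get(c, 0.0) for clf in clfs) / len(clfs)
           for c in classes}
    h = entropy(avg) / math.log(len(classes))  # assumes at least 2 classes
    others = [u for u in pool if u != x]
    mean_d = sum(math.dist(x, u) for u in others) / len(others) if others else 0.0
    return beta * h + (1 - beta) * (1.0 / (1.0 + mean_d))


def tri_train(X_lab, y_lab, X_unlab, rounds=5, per_round=2, seed=0):
    rng = random.Random(seed)
    classes = sorted(set(y_lab))
    by_cls = [[i for i, lab in enumerate(y_lab) if lab == c] for c in classes]
    train = []
    for _ in range(3):  # stratified bootstrap so the three learners differ
        idx = [rng.choice(ids) for ids in by_cls for _ in ids]
        train.append(([X_lab[i] for i in idx], [y_lab[i] for i in idx]))
    clfs = [NearestCentroid().fit(X, y) for X, y in train]
    pool = list(X_unlab)
    for _ in range(rounds):
        added = 0
        for t in range(3):  # each classifier takes its turn as the target
            if not pool:
                break
            j, k = [i for i in range(3) if i != t]
            ranked = sorted(pool, reverse=True,
                            key=lambda x: sample_score(x, clfs, pool, classes))
            for x in ranked[:per_round]:
                lj, lk = clfs[j].predict(x), clfs[k].predict(x)
                if lj == lk:  # helpers agree: pseudo-label x for the target
                    train[t][0].append(x)
                    train[t][1].append(lj)
                    pool.remove(x)
                    added += 1
            clfs[t] = NearestCentroid().fit(*train[t])
        if added == 0 or not pool:  # assumed termination condition
            break
    return clfs


def vote(clfs, x):
    """Final label by majority vote of the three classifiers."""
    return Counter(clf.predict(x) for clf in clfs).most_common(1)[0][0]
```

In use, `tri_train` is called with the (optimized) seed samples and the unlabeled pool, and `vote` classifies new samples. Swapping in a real relation classifier only requires an object with the same `fit`, `proba`, and `predict` methods.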
Keywords/Search Tags: Entity Relation, Semi-supervised Learning, Cooperative Training, Classification Model