Research On Distantly Supervised Fine-grained Entity Typing Technology Based On Semi-supervised Learning

Posted on: 2021-01-26 | Degree: Master | Type: Thesis
Country: China | Candidate: B Chen
GTID: 2428330623969220 | Subject: Computer Science and Technology
Abstract/Summary:
Fine-grained entity typing is a key, fundamental task in information extraction. It provides important technical support for downstream applications such as other information extraction tasks, knowledge graph construction, and question answering systems. Because manually labeled data for fine-grained entity typing is hard to obtain, industry and academia usually rely on distant supervision to build the training corpus for a fine-grained entity typing system. Distant supervision, however, introduces the problem of noisy labels. Building a fine-grained entity typing system from such noisily labeled data is known as distantly supervised fine-grained entity typing, and it is the core problem of current fine-grained entity typing technology. Existing methods for exploiting noisy data suffer from problems such as discarding part of the training corpus and vulnerability to confirmation bias.

Semi-supervised learning combines supervised and unsupervised learning: it uses a large amount of unlabeled data to improve model performance, usually by relying on the global and local consistency of the data. Taking a semi-supervised learning perspective, this thesis likewise relies on the global and local consistency of the data to exploit noisy data, and proposes two methods:

(1) A distantly supervised fine-grained entity typing method based on compact latent space clustering. To sidestep confirmation bias and use noisily labeled data efficiently, this method builds a regularizer based on label propagation (LP) that constrains the model's feature extractor, thereby modeling the local and global consistency assumptions: samples with high label similarity are pulled close together in feature space. To compute label similarity between noisy samples, the method performs partial-label-based label propagation over the sample representation space, obtains a label distribution for each sample, and computes label similarity from these distributions. By making the latent space clusters compact, the method exploits noisy data effectively while sidestepping confirmation bias. At the time of publication (NAACL 2019), it exceeded the results reported by comparable domestic and international methods on two benchmark datasets. Under high noise it significantly outperforms a method based on partial label loss under the same conditions, and using only 27.9% of the public BBN dataset it matches the previous best result.

(2) A distantly supervised fine-grained entity typing method based on virtual adversarial training. From the perspective of local distributional smoothness, this method uses virtual adversarial training to build a regularizer that increases local distributional smoothness, thereby modeling the local consistency assumption. For each sample, the method first constructs the perturbation in representation space to which the prediction is least smooth (the perturbation that changes the predicted distribution the most), and then regularizes the classifier under the assumption that its prediction should remain smooth under that perturbation. Because the method relies on self-supervision, it is susceptible to confirmation bias; for high-noise settings, this thesis proposes a heuristic sample selection function to alleviate it, which brings a large performance improvement under high noise. The thesis also explores combining an auxiliary part-of-speech tagging task with virtual adversarial training, so that part-of-speech information learned by the model can help classification. With these techniques, the method significantly improves the performance of the base model on two benchmark datasets and exceeds the current state of the art.

The two proposed methods offer a new approach to distantly supervised fine-grained entity typing. Compared with previous methods, they make better use of noisy data for training and achieve the current best results on two public datasets. In addition, the proposed method ranked first overall among domestic teams in the Entity Recognition and Entity Discovery track of the 2019 TAC KBP competition, and it has been applied in the knowledge computing engine of the China Engineering Science and Technology Knowledge Center construction project.
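The abstract does not give the exact formulation of method (1), so the following is only a minimal pure-Python sketch of its two named ingredients, under our own assumptions: Gaussian affinities between feature vectors, a row-normalised variant of the F <- alpha*P*F + (1-alpha)*Y propagation update in which each sample's label mass is restricted to its distantly supervised candidate (partial) label set, label similarity as the dot product of propagated distributions, and a compact-clustering-style cross-entropy between those similarities and a random walk over the features. All function names and hyperparameters (`sigma`, `alpha`, `iters`) are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def log_transition(feats, sigma=1.0):
    """Log-probabilities of a random walk over samples with Gaussian affinities.

    Row i is a log-softmax over -||f_i - f_j||^2 / (2 sigma^2) for j != i;
    self-transitions get probability 0 (log = -inf). Needs at least 2 samples.
    """
    n = len(feats)
    logp = []
    for i in range(n):
        scores = [
            -sq_dist(feats[i], feats[j]) / (2 * sigma ** 2) if j != i else -math.inf
            for j in range(n)
        ]
        m = max(scores)
        lse = m + math.log(sum(math.exp(s - m) for s in scores))
        logp.append([s - lse for s in scores])
    return logp


def propagate_partial_labels(feats, candidates, num_labels, alpha=0.5, iters=20, sigma=1.0):
    """Label propagation restricted to each sample's candidate (partial) label set.

    candidates[i] is the set of distantly supervised labels of sample i; row i of
    the result is a distribution over labels with zero mass outside candidates[i].
    """
    n = len(feats)
    trans = [[math.exp(v) for v in row] for row in log_transition(feats, sigma)]
    # Assumed prior: uniform over the candidate set.
    y0 = [[1.0 / len(c) if k in c else 0.0 for k in range(num_labels)] for c in candidates]
    f = [row[:] for row in y0]
    for _ in range(iters):
        new_f = []
        for i in range(n):
            # Mix neighbours' current label distributions with the prior.
            row = [
                alpha * sum(trans[i][j] * f[j][k] for j in range(n)) + (1 - alpha) * y0[i][k]
                for k in range(num_labels)
            ]
            # Partial-label constraint: drop labels outside the candidate set, renormalise.
            row = [v if k in candidates[i] else 0.0 for k, v in enumerate(row)]
            z = sum(row)
            new_f.append([v / z for v in row])
        f = new_f
    return f


def compact_clustering_loss(feats, label_dist, sigma=1.0):
    """Cross-entropy between label-similarity targets and the feature random walk.

    Target H[i][j] is proportional to the dot product of the label distributions
    of samples i and j, so the loss is small when samples with similar labels
    are close in feature space.
    """
    n = len(feats)
    logp = log_transition(feats, sigma)
    total = 0.0
    for i in range(n):
        sims = [
            sum(a * b for a, b in zip(label_dist[i], label_dist[j])) if j != i else 0.0
            for j in range(n)
        ]
        z = sum(sims)
        if z == 0.0:
            continue  # no other sample shares a label with sample i
        total -= sum((s / z) * logp[i][j] for j, s in enumerate(sims) if s > 0.0)
    return total / n


# Toy data: two tight clusters; samples 1 and 3 have ambiguous candidate sets.
feats = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
candidates = [{0}, {0, 1}, {1}, {0, 1}]
dist = propagate_partial_labels(feats, candidates, num_labels=2)
compact = compact_clustering_loss(feats, dist)
scrambled = compact_clustering_loss([[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0]], dist)
print([[round(v, 3) for v in row] for row in dist], compact, scrambled)
```

In the toy run, each ambiguous sample inherits most of its mass from its unambiguous neighbour, and the layout in which same-label samples sit together receives the lower loss. In an actual model the features would come from the encoder and the loss would be minimised by gradient descent; that part is not shown here.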
Keywords/Search Tags: fine-grained entity typing, distant supervision, compact latent space clustering, virtual adversarial training, semi-supervised learning
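Method (2) is described only at a high level. To illustrate standard virtual adversarial training with one power-iteration step, here is a pure-Python sketch that assumes a hypothetical linear softmax classifier on top of a fixed sample representation. The thesis's actual encoder, output layer, perturbation size, and heuristic sample selection function are not reproduced. The names `predict`, `vat_perturbation`, `vat_loss`, `eps` and `xi` are our own.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(weights, bias, x):
    """Hypothetical linear softmax head: p(y | x) = softmax(W x + b)."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)])


def kl(p, q):
    """KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def unit(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0.0 else list(v)


def vat_perturbation(weights, bias, x, eps=1.0, xi=1e-6, rng=random):
    """One power-iteration step of virtual adversarial training.

    Approximates the perturbation r with ||r|| = eps that most increases
    KL(p(.|x) || p(.|x + r)), i.e. the direction in which the prediction is
    least smooth.
    """
    p = predict(weights, bias, x)
    d = unit([rng.gauss(0.0, 1.0) for _ in x])  # random starting direction
    q = predict(weights, bias, [a + xi * b for a, b in zip(x, d)])
    # For a linear softmax head, grad_r KL(p || q(x + r)) = W^T (q - p) exactly.
    diff = [qi - pi for qi, pi in zip(q, p)]
    grad = [sum(weights[k][j] * diff[k] for k in range(len(diff))) for j in range(len(x))]
    return [eps * g for g in unit(grad)]


def vat_loss(weights, bias, x, eps=1.0, rng=random):
    """Smoothness regulariser: KL between clean and adversarially perturbed predictions."""
    r_adv = vat_perturbation(weights, bias, x, eps, rng=rng)
    x_adv = [a + r for a, r in zip(x, r_adv)]
    return kl(predict(weights, bias, x), predict(weights, bias, x_adv))


# Toy example: only the first input dimension affects the prediction.
W = [[1.0, 0.0], [-1.0, 0.0]]
b = [0.0, 0.0]
x = [0.0, 0.0]
r_adv = vat_perturbation(W, b, x, eps=1.0, rng=random.Random(0))
loss = vat_loss(W, b, x, eps=1.0, rng=random.Random(0))
print(r_adv, loss)
```

In the toy example, the adversarial direction points along the only input dimension the classifier uses, which is exactly where its prediction is least smooth. The linear head is a simplification: it allows the gradient to be written in closed form. With a neural encoder, one would instead backpropagate the KL term through the network.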