
Research On Adversarial Generating Approaches For Data-missing Text Classification

Posted on: 2022-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: Q M Xue
Full Text: PDF
GTID: 2518306464966259
Subject: Software engineering
Abstract/Summary:
Text classification has long been an important problem in Natural Language Processing. Driven by growth in computing capability and the boom of data on the Internet, text modeling approaches based on deep neural networks (DNNs) have developed rapidly in recent years. However, training supervised DNN-based text models requires a large amount of data for the specific problem setting. In practice, two major problems arise: 1) Sometimes only the training data are available, without their labels. 2) Sometimes label information is known, but no training data are available for some labels; for those labels, only their metadata can be obtained. This thesis investigates these two problems from the perspective of data generation. The main contributions are as follows.

For the first problem, this work concentrates on domain-adaptation sentiment classification of text. Existing domain-adaptation models already adapt a model trained on a labeled source domain to an unlabeled target domain. However, they mainly work on making the feature extractor output domain-invariant feature vectors, while ignoring the underlying information in the unlabeled data. In view of this, we propose a novel domain-adaptation framework, DAML, powered by adversarial learning and mutual learning. On the target domain, where only unlabeled data are available, DAML assigns pseudo-labels to the unlabeled data through mutual learning. In this way, two parallel models can exchange the information they have learned, leveraging the underlying information of the unlabeled data. Experiments on several public datasets show that DAML outperforms all of the state-of-the-art models. This work has been published in AAAI-2020.

For the second problem, this work mainly considers zero-shot learning. In this setting, only some of the labels come with training data, and the model needs to predict the other labels without any relevant training data. Because labels themselves carry little information to leverage, most existing models are rule-based, and their performance depends on rules devised by experienced experts. To solve this problem, we propose a data-generating approach, ADG4ZS, based on adversarial learning and an attention mechanism. During adversarial learning, ADG4ZS learns to generate pseudo-data for any given label from a data sample carrying any other label. Consequently, ADG4ZS can generate pseudo-data for all the unseen labels. These pseudo-data are then used to train a DNN to predict the unseen labels. Experiments on several public datasets show that the performance of ADG4ZS exceeds all the baselines.

In summary, this thesis investigates the application of data generation approaches to two major problems on which deep learning struggles to perform well. By leveraging techniques such as model adaptation, pseudo-label generation and data generation, we exploit the available data more fully and make deep learning feasible on these problems. Experiments on several public datasets show the effectiveness of the proposed approaches.
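To make the pseudo-labeling idea behind mutual learning more concrete, the following is a minimal sketch in which two models exchange their confident predictions on unlabeled target-domain data. It is not the DAML implementation: the abstract does not specify how pseudo-labels are selected, so the confidence threshold, the function names `confident_labels` and `exchange_pseudo_labels`, and the example probabilities are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# One row of class probabilities per unlabeled target-domain example.
Probs = List[List[float]]


def confident_labels(probs: Probs, threshold: float) -> List[Optional[int]]:
    """Return the argmax class per row, or None if the top probability is below threshold."""
    labels: List[Optional[int]] = []
    for row in probs:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(best if row[best] >= threshold else None)
    return labels


def exchange_pseudo_labels(
    probs_a: Probs, probs_b: Probs, threshold: float = 0.9
) -> Tuple[List[Optional[int]], List[Optional[int]]]:
    """Each model receives the other model's confident predictions as pseudo-labels.

    The thresholding rule is an assumption for this sketch, not taken from the thesis.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same unlabeled examples")
    targets_for_a = confident_labels(probs_b, threshold)  # model B teaches model A
    targets_for_b = confident_labels(probs_a, threshold)  # model A teaches model B
    return targets_for_a, targets_for_b


# Hypothetical predictions of two sentiment models on three unlabeled reviews.
probs_a = [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]]
probs_b = [[0.55, 0.45], [0.03, 0.97], [0.10, 0.90]]
targets_for_a, targets_for_b = exchange_pseudo_labels(probs_a, probs_b)
```

Examples marked `None` would simply be skipped when each model is updated, so a model only learns from predictions its peer is confident about.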
Keywords/Search Tags: Text Classification, Domain Adaptation, Adversarial Learning, Mutual Learning, Data Generation, Deep Learning