Research On Auxiliary Annotation Method Based On Attribute Controllable Text Representation Generation

Posted on:2023-08-28

Degree:Master

Type:Thesis

Country:China

Candidate:Y S Deng

Full Text:PDF

GTID:2558307172958299

Subject:Computer technology

Abstract/Summary:

Natural language processing tasks in the deep learning era often rely on large-scale labeled data for supervised training, but manual annotation is expensive and time-consuming because it demands annotator expertise and domain experts are scarce. Some studies annotate text automatically using domain-specific corpora and expert rules, but many niche research fields have no large-scale structured corpus and accurate expert rules are hard to formulate. Other studies expand the training data with a text generation model to improve the automatic annotation model under weak supervision, but the generation model itself performs poorly when training data is limited.

To address these problems, this thesis proposes ACTRAnno, an auxiliary annotation model based on attribute-controlled text representation generation, aimed mainly at annotation for text classification. ACTRAnno labels data in batch-incremental iterations: in each round, all unlabeled data are pre-labeled, and a fixed-size batch of pre-labeled samples is then selected for manual labeling. ACTRAnno expands the training data with an attribute-controlled text representation generation model. Instead of generating text directly, the model generates text representation vectors that serve as input to the downstream task. The downstream classification model shares some network parameters with the representation generation model, which reduces error propagation and makes text generation more effective as a data augmentation tool for downstream text classification. Active learning is used to build the most effective training set for the annotation model incrementally, further improving its accuracy. Because uncertainty-based selection strategies in active learning are unreliable in the early rounds, when little labeled data is available, a two-stage active learning model is proposed, providing a scheme for building a more accurate text annotation model under weak supervision.

Experiments on the IMDB, SST-2, YELP-2, and AG News text classification datasets show that, with fewer than 1000 training samples, model accuracy improves by an average of 2.00% to 3.35% compared with no data augmentation, and by a further 1.4% when the two-stage active learning strategy is used.
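
The batch-incremental labeling loop described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the toy `CentroidClassifier`, the `annotate` function, the `oracle` callback (standing in for the human annotator), and the entropy-based ranking are all assumptions chosen for illustration. The representation generation model and the two-stage selection strategy are omitted because the abstract does not specify them.

```python
import math


def entropy(probs):
    """Shannon entropy of a distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs if p > 0)


class CentroidClassifier:
    """Hypothetical stand-in for the annotation model (1-D nearest centroid)."""

    def fit(self, xs, ys):
        self.centroids = {}
        for label in set(ys):
            members = [x for x, y in zip(xs, ys) if y == label]
            self.centroids[label] = sum(members) / len(members)
        return self

    def predict_proba(self, x):
        # Normalised exp(-distance) to each class centroid.
        labels = sorted(self.centroids)
        scores = [math.exp(-abs(x - self.centroids[l])) for l in labels]
        total = sum(scores)
        return [s / total for s in scores]


def annotate(pool, oracle, seed_idx, batch_size, rounds):
    """Batch-incremental labeling: pre-label everything, hand-label one batch."""
    labeled = {i: oracle(pool[i]) for i in seed_idx}
    for _ in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        idx = list(labeled)
        model = CentroidClassifier().fit(
            [pool[i] for i in idx], [labeled[i] for i in idx]
        )
        # Pre-label all unlabeled samples and rank them by model uncertainty
        # (an assumed selection criterion, most uncertain first).
        ranked = sorted(
            unlabeled,
            key=lambda i: entropy(model.predict_proba(pool[i])),
            reverse=True,
        )
        # A fixed-size batch goes to the (simulated) human annotator.
        for i in ranked[:batch_size]:
            labeled[i] = oracle(pool[i])
    return labeled


if __name__ == "__main__":
    pool = [i / 10 for i in range(10)]
    result = annotate(pool, lambda x: int(x >= 0.5), [0, 9], 2, 1)
    print(sorted(result))
```

In this sketch, each round retrains the model on everything labeled so far, so later batches are chosen by a model that has seen more data. With only two seed labels the uncertainty estimates are crude, which is the early-round weakness the abstract says the two-stage strategy is meant to address.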

Keywords/Search Tags:

Automatic labeling, Text classification, Data augmentation, Weakly supervised learning, Active learning

Related items

1	Research On The Scheme Of Optimizing The Performance Of Classification Model In Weakly Supervised Learning
2	Research On Weakly-supervised Classification Methods Based On Samples And Labels Modeling
3	Weakly Supervised Named Entity Recognition Based On Online Encyclopedia
4	A Fine-grained Classification Algorithm Based On Deep Learning
5	Large-scale-Query-text Clustering Via Weakly-supervised Deep Learning
6	Research On Weakly-supervised Learning Based On Sample Selection Strategy And Contrastive Learning
7	Image Data Annotation And Recognition Based On Weakly Supervised Deep Learning
8	Robust Image Classification Algorithms With Weakly Supervised Learning
9	Research On Unbalanced Text Classification Based On Text Augmentation And Semi-Supervised Learning
10	Deep Learning Based Weakly Supervised Classification Method And Application