Text Augmentation Method Based On Label Relevance Weight Filtering Mechanism In Sentiment Classification

Posted on: 2021-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: L M Shi
Full Text: PDF
GTID: 2518306113461884
Subject: Economic big data analysis
Abstract/Summary:
In supervised learning, training deep models requires a large amount of labeled data; without it, the models overfit easily. Data is labeled by hand, which is time-consuming. To reduce the annotation workload, rule-based transformations are often applied to existing text to generate similar text, a practice known as data augmentation. Data augmentation also introduces problems, however. Because traditional text augmentation randomly selects words and transforms them randomly, the augmented data may be inconsistent with the original data in sentiment polarity and semantics, and so may introduce noise into the training of supervised learning models. To reduce this randomness, the paper uses the relevance between labels and words to select the words least relevant to the label as the targets of augmentation. In sentiment classification tasks, sentiment labels are highly correlated with words that carry an obvious sentiment polarity. If such highly correlated words are selected for transformation during random augmentation, they may be replaced by words whose polarity is opposite to the label, leaving the label inconsistent with the newly generated text. The main work is as follows: (1) Two methods are used to compute the relevance weight: the coefficients of Logistic Regression and the attention scores of Label Embedding. (2) In each of four replacement augmentation methods, the N words least relevant to the label (N is a hyperparameter) are selected for replacement according to the relevance weight between the sentiment label and the words of the text; this is called weighted replacement augmentation. The paper tests the weighted replacement augmentation methods on the NLPCC2014 dataset. The experimental results show that replacement augmentation based on the relevance between labels and words can effectively improve accuracy and F1-score in sentiment classification tasks.
Keywords/Search Tags:Sentiment Classification, Text Augmentation, Logistic Regression, Label Embedding
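
The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes the relevance weights are already available as a word-to-score mapping (for example, taken from Logistic Regression coefficients), that the magnitude of a score measures relevance to the label, that unseen words count as neutral, and that replacement is a plain synonym lookup. The function names `least_relevant_indices` and `weighted_replace`, the `relevance` scores, and the `synonyms` table are all hypothetical.

```python
from typing import Dict, List


def least_relevant_indices(tokens: List[str],
                           relevance: Dict[str, float],
                           n: int) -> List[int]:
    """Return the positions of the n tokens least relevant to the label."""
    # Assumption: relevance is the absolute value of the score, and words
    # missing from the mapping are treated as neutral (relevance 0.0).
    scored = [(abs(relevance.get(tok, 0.0)), i) for i, tok in enumerate(tokens)]
    scored.sort()  # ascending relevance; ties are broken by position
    return sorted(i for _, i in scored[:n])


def weighted_replace(tokens: List[str],
                     relevance: Dict[str, float],
                     synonyms: Dict[str, List[str]],
                     n: int) -> List[str]:
    """Replace only the n least label-relevant tokens, when a synonym exists."""
    augmented = list(tokens)
    for i in least_relevant_indices(tokens, relevance, n):
        candidates = synonyms.get(tokens[i])
        if candidates:
            # Assumption: take the first candidate to keep the sketch
            # deterministic; a real augmenter would likely sample one.
            augmented[i] = candidates[0]
    return augmented


# Hypothetical inputs for illustration only.
tokens = ["the", "movie", "was", "wonderful"]
relevance = {"the": 0.01, "movie": 0.10, "was": 0.02, "wonderful": 2.30}
synonyms = {"movie": ["film"], "wonderful": ["awful"]}

print(weighted_replace(tokens, relevance, synonyms, n=3))
```

With N = 3, the strongly sentiment-bearing word "wonderful" is excluded from the replacement candidates, so the noisy entry for it in the synonym table cannot flip the polarity of the sentence. Purely random selection could have picked it.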