Research And Application Of Automatic Augmentation Of Text Data Based On Neural Network Architecture Search Ideology

Posted on: 2021-04-05  Degree: Master  Type: Thesis
Country: China  Candidate: X L Lu  Full Text: PDF
GTID: 2428330614970999  Subject: Software engineering
Abstract/Summary:
Natural language processing is an important branch of artificial intelligence. In recent years, the combination of deep learning and natural language processing has brought new breakthroughs to the field. However, neural network models in deep learning depend heavily on the quality of the training data, and insufficient data is a frequent problem. Data augmentation was developed in response: it expands the existing training data by some method in order to improve the accuracy and robustness of the classifier. Existing research on data augmentation falls mainly into image augmentation and text data augmentation. Image augmentation has been studied extensively, with methods including flipping, rotating, scaling, cropping, shifting, and affine transformation. Methods of text data augmentation include back translation, easy data augmentation, non-core word replacement, contextual augmentation, and text augmentation based on language generation models. Recently, neural network architecture search has been applied to image augmentation to perform it automatically, but few studies apply it to text augmentation. This thesis addresses that gap. It takes easy data augmentation as its augmentation method and uses the idea of neural network architecture search to automatically find augmentation strategies specific to the target data set. The experiments show that this method improves classifier performance to a certain extent.

The thesis proposes a new text augmentation method that augments text data automatically. The method is based on reinforcement learning and draws on the idea of neural network architecture search. First, the controller, driven by reinforcement learning, searches the search space for an augmentation strategy for the target data set and applies that strategy to the training of the child model. Then the child model classifies the validation set. Finally, the controller receives the classification feedback and updates the strategy.

The thesis adopts a comparative approach. The three comparison settings are: (1) the training set is the original corpus without data augmentation; (2) the training set is obtained from the original corpus through easy data augmentation; (3) the training set is obtained by applying the automatically searched text augmentation strategies to the original corpus. Experiments use three text classification data sets, TREC, SST-2, and CR, and each is tested at three magnitudes of balanced data: "500", "2000", and "full set". The classifier is compared under the three settings at each data magnitude. In addition, to compare the classification performance of the RNN (Recurrent Neural Network) and the CNN (Convolutional Neural Network), the type of neural network used in the child model of the search architecture and in the classifier is changed at the same time. Finally, the automatic text augmentation method is applied to the user evaluation function of a commodity risk assessment system to verify that it has practical value in a real setting.

The thesis draws three conclusions. 1) In text classification, automatically searching for augmentation strategies improves classification performance to a certain degree, and the benefit declines as the data set grows. On the "500" subsets the three data sets improve by an average of 1.8 points, but on large data sets the improvement is not obvious. 2) The automatically searched strategies perform slightly better than easy data augmentation, and the gap between the two methods narrows as the data set grows. 3) On all three data sets, at the same data magnitude and with the same training data, test accuracy with the CNN is higher than with the RNN. However, automatic augmentation strategy search helps the RNN noticeably more: on the "500" subsets RNN accuracy rises by an average of 2 percentage points, while CNN accuracy rises by about one point.
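The controller loop summarized in the abstract (sample an augmentation strategy, train and validate a child model, feed the result back) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the search space of operation and strength pairs, the names `reinforce_update`, `search`, and `toy_reward`, the learning rate, and the moving-average baseline. In particular, `toy_reward` stands in for child-model validation accuracy with made-up values, and the controller is a plain softmax policy trained with REINFORCE rather than whatever controller network the thesis uses.

```python
import math
import random

# Hypothetical search space (an assumption, not taken from the thesis):
# each strategy pairs one EDA operation with a strength.
OPERATIONS = ["synonym_replace", "random_insert", "random_swap", "random_delete"]
STRENGTHS = [0.05, 0.1, 0.2, 0.3]
SEARCH_SPACE = [(op, s) for op in OPERATIONS for s in STRENGTHS]


def softmax(logits):
    """Turn controller logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits, chosen, advantage, lr):
    """One REINFORCE step: d log p(chosen) / d logit_j = 1[j == chosen] - p_j."""
    probs = softmax(logits)
    return [
        logit + lr * advantage * ((1.0 if j == chosen else 0.0) - probs[j])
        for j, logit in enumerate(logits)
    ]


def search(reward_fn, steps=500, lr=0.5, seed=0):
    """Sample strategies, score them with reward_fn, and update the controller."""
    rng = random.Random(seed)
    logits = [0.0] * len(SEARCH_SPACE)
    baseline = 0.0  # moving average of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        chosen = rng.choices(range(len(SEARCH_SPACE)), weights=probs)[0]
        reward = reward_fn(SEARCH_SPACE[chosen])
        logits = reinforce_update(logits, chosen, reward - baseline, lr)
        baseline = 0.9 * baseline + 0.1 * reward
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SEARCH_SPACE[best], probs


def toy_reward(strategy):
    """Placeholder for child-model validation accuracy (values are made up)."""
    return 0.9 if strategy == ("random_swap", 0.1) else 0.5


if __name__ == "__main__":
    strategy, probs = search(toy_reward)
    print("highest-probability strategy:", strategy)
```

In a real experiment, `reward_fn` would augment the training set with the sampled strategy, train the child model (CNN or RNN), and return its accuracy on the validation set, which is far more expensive than the placeholder used here.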
Keywords/Search Tags: automatic augmentation of text data, easy data augmentation, neural network architecture search, reinforcement learning, convolutional neural network, recurrent neural network