
Research On Text Classification Based On Self-Attention Mechanism

Posted on: 2021-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
Full Text: PDF
GTID: 2428330602972240
Subject: Engineering
Abstract/Summary:
Natural language processing is a subfield of computer science and artificial intelligence, and one of the core problems of artificial intelligence. It studies how to enable computers to analyze and process large amounts of human natural language data. Text classification is one of the basic problems in natural language processing: it assigns a category label to a given text unit. Text classification has a wide range of applications, including recommendation, question answering, sentiment analysis, spam detection, news classification, user intent classification, and so on.

The BERT pre-trained model released by Google in 2018 broke records on many natural language processing tasks as soon as it was launched, and in recent years research on transfer learning based on this pre-trained model has been in full swing. BERT, however, is not simple to understand. Grasping its principles requires familiarity with a substantial body of earlier work, including traditional recurrent neural networks (RNN), long short-term memory (LSTM) networks, and sequence-to-sequence (Seq2Seq) networks with attention mechanisms. Only with this background can the essence of BERT be grasped.

This paper reviews traditional and deep learning methods for text classification, outlines the basic text classification pipeline, focuses on the self-attention mechanism and the operating principles of the Transformer encoder, as well as the similarities and differences between BERT and the Transformer, and puts forward several approaches to text classification with BERT. The main research work of this paper is as follows:

(1) The self-attention mechanism and the operating principles of the Transformer encoder are studied. The self-attention mechanism and the Transformer encoder module are the cornerstone for understanding BERT. Only by understanding self-attention can one see why BERT is superior to other natural language processing models as a text feature extractor, and how BERT is pre-trained. Since BERT is a stack of Transformer encoders, understanding the overall Transformer architecture and the operating principles of its encoders is essential (the standard self-attention formulation is recalled after this abstract).

(2) Data preprocessing is carried out according to the characteristics of the training set. Data quality is crucial and limits the performance of any machine learning model, so preprocessing must be done carefully; only then can the advantages of the BERT pre-trained model be fully exploited.

(3) The BERT pre-trained model is deployed for transfer learning. Two methods are proposed. The first fine-tunes BERT directly on the competition data set. The second takes the sequence output of BERT's last layer and applies mean-max-pooling to it, extracting the mean and maximum features of each sequence; the resulting feature vector is then linearly transformed by a feedforward neural network for classification (an illustrative sketch of this pooling head is given after this abstract). A comparison of the two methods shows that the second outperforms the first by nearly 1.6%, so for the data set in question the second method is the more effective choice.
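Note on self-attention (the standard scaled dot-product formulation from the Transformer literature, not quoted from the thesis): given query, key and value matrices Q, K and V obtained by linear projections of the input token representations, each attention head in a Transformer encoder computes

    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

where d_k is the dimension of the key vectors. Each encoder layer combines multi-head self-attention of this form with a position-wise feedforward network, residual connections and layer normalization; BERT stacks such encoder layers.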
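The following is a minimal sketch of the mean-max-pooling classification head described in item (3), assuming PyTorch and the Hugging Face transformers library; the model name, number of classes and exact layer choices are illustrative assumptions, not the thesis's reported configuration.

    # Illustrative sketch of a mean-max-pooling head over BERT's last layer.
    # Assumptions: Hugging Face transformers + PyTorch; model name, class
    # count and head design are hypothetical, not the thesis's exact setup.
    import torch
    import torch.nn as nn
    from transformers import BertModel

    class BertMeanMaxClassifier(nn.Module):
        def __init__(self, num_classes=2, pretrained="bert-base-chinese"):
            super().__init__()
            self.bert = BertModel.from_pretrained(pretrained)
            hidden = self.bert.config.hidden_size
            # Feedforward layer maps concatenated mean+max features to logits.
            self.classifier = nn.Linear(2 * hidden, num_classes)

        def forward(self, input_ids, attention_mask):
            # Last-layer sequence output: (batch, seq_len, hidden)
            last_hidden = self.bert(input_ids=input_ids,
                                    attention_mask=attention_mask).last_hidden_state
            mask = attention_mask.unsqueeze(-1).float()
            # Mean pooling over non-padding tokens only.
            mean_pool = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
            # Max pooling; padding positions are set to -inf so they are ignored.
            max_pool = last_hidden.masked_fill(mask == 0, float("-inf")).max(dim=1).values
            features = torch.cat([mean_pool, max_pool], dim=-1)
            return self.classifier(features)

In this sketch the concatenated mean and max features play the role of the "new sequence feature vector" mentioned in item (3), and the single linear layer stands in for the feedforward network used for classification.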
Keywords/Search Tags: BERT, self-attention mechanism, Transformer, fine-tuning