
Research on Text Classification with an Explainable Bidirectional Transformer Language Model

Posted on: 2021-03-21    Degree: Master    Type: Thesis
Country: China    Candidate: Y J Yu    Full Text: PDF
GTID: 2428330626458915    Subject: Computer technology
Abstract/Summary:
As an important method of data analysis in the era of big data, deep learning has attracted widespread attention from the research community both in China and abroad in recent years. Text classification assigns a text to one or several categories according to certain rules within a given classification scheme; typical applications include news classification, sentiment classification, and classification of comments on social websites. To process the massive text data on the Internet, researchers in artificial intelligence have therefore proposed deep learning algorithms built on deep networks, and progress in this direction is important. The rapid development of the information era has produced explosive growth of text data. Faced with massive, unstructured text, the problem confronting researchers is no longer how to obtain the required text data, but how to accurately and efficiently extract the information they need from massive text data in the context of big data.

This thesis reviews the development of machine learning and deep learning for text classification tasks, and then turns to the BERT model, which has been of great significance to natural language processing in recent years. It introduces the internal mechanism and training procedure of BERT in detail, and examines the various techniques used in the pre-training and fine-tuning of the language model. On this basis, the thesis proposes several innovations and improvements to the BERT model so that it can effectively address the shortcomings of the BERT pre-training method, increase the interpretability of the model through a number of techniques, fine-tune it for the downstream text classification task, and obtain an explainable bidirectional Transformer language model. The work in this thesis is summarized as follows (an illustrative code sketch of each technique is given after the list):

(1) Factorized embedding parameterization. The large word embedding matrix is decomposed into two smaller matrices, separating the dimension of the output layer from the dimension of the word embeddings. This separation makes it easier to increase the dimension of the output layer without significantly increasing the number of word embedding parameters.

(2) Average-max pooling. Average pooling averages neighborhood features and max pooling takes their maximum; combining the two better retains the features extracted from the text. Different layers of the model capture different information, and average-max pooling preserves not only the overall features of the original data but also its finer texture features.

(3) Dense residual connections between layers. The thesis adds dense residual connections between the different multi-head self-attention layers, which effectively prevents degradation of the neural network and, by reducing the number of parameters and computational operations, remains feasible without sacrificing accuracy.

(4) A new pre-training task: sentence continuity prediction. After an in-depth study of BERT, the thesis finds that the two pre-training tasks, Masked LM and Next Sentence Prediction, form the basic idea of the BERT pre-trained language model, and that the language model needs to be supplemented and improved on top of these existing training methods. Since Next Sentence Prediction can only capture whether two sentences are related, a method for predicting sentence coherence is proposed, which predicts the relationship between sentences in multiple situations.
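The following is a minimal PyTorch sketch of the factorized embedding parameterization described in (1). The concrete dimensions (vocab_size, embedding_dim, hidden_dim) and the class name FactorizedEmbedding are illustrative assumptions rather than the thesis's actual configuration.

```python
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Decompose the V x H embedding matrix into V x E and E x H with E << H."""
    def __init__(self, vocab_size=30000, embedding_dim=128, hidden_dim=768):
        super().__init__()
        # Small lookup table: V x E parameters instead of V x H.
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        # Projection up to the encoder hidden size: E x H parameters.
        self.projection = nn.Linear(embedding_dim, hidden_dim)

    def forward(self, token_ids):
        return self.projection(self.word_embeddings(token_ids))
```

With the example sizes above, a 30,000-word vocabulary needs 30,000 x 128 + 128 x 768 parameters instead of 30,000 x 768, so the output dimension can grow without the embedding table growing with it.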
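The average-max pooling of (2) can be sketched as follows. Whether the thesis concatenates or otherwise fuses the two pooled vectors is not stated in the abstract, so the concatenation here is an assumption.

```python
import torch

def avg_max_pool(hidden_states, attention_mask):
    """Pool token representations (batch, seq_len, hidden) into one vector per text."""
    mask = attention_mask.unsqueeze(-1).float()                  # (batch, seq_len, 1)
    # Average pooling over the non-padding tokens.
    avg_pooled = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    # Max pooling, with padding positions excluded.
    max_pooled = hidden_states.masked_fill(mask == 0, float("-inf")).max(dim=1).values
    # Combine both summaries of the sequence.
    return torch.cat([avg_pooled, max_pooled], dim=-1)           # (batch, 2 * hidden)
```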
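A sketch of the dense residual connections in (3), where each self-attention layer receives an aggregate of all earlier representations. Summation (rather than concatenation) and the use of PyTorch's nn.TransformerEncoderLayer are simplifying assumptions; the thesis's exact wiring may differ.

```python
import torch
import torch.nn as nn

class DenselyConnectedEncoder(nn.Module):
    """Each layer takes the sum of the input embeddings and all previous layer outputs."""
    def __init__(self, hidden_dim=768, num_heads=12, num_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=num_heads,
                                       batch_first=True)
            for _ in range(num_layers)
        )

    def forward(self, x):
        outputs = [x]
        for layer in self.layers:
            # Dense residual connection: aggregate every earlier representation.
            outputs.append(layer(torch.stack(outputs, dim=0).sum(dim=0)))
        return outputs[-1]
```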
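For the sentence continuity prediction task in (4), one way to construct training pairs covering "multiple situations" is sketched below. The three-way label scheme (consecutive, swapped, drawn from another document) is a hypothetical illustration, not necessarily the exact cases used in the thesis.

```python
import random

def make_coherence_example(doc_sentences, other_doc_sentences):
    """Return (sentence_a, sentence_b, label) for sentence continuity prediction.

    Hypothetical label scheme:
      0 = consecutive sentences in the original order (coherent)
      1 = consecutive sentences with the order swapped
      2 = second sentence drawn from a different document
    """
    i = random.randrange(len(doc_sentences) - 1)
    first, second = doc_sentences[i], doc_sentences[i + 1]
    case = random.choice([0, 1, 2])
    if case == 0:
        return first, second, 0
    if case == 1:
        return second, first, 1
    return first, random.choice(other_doc_sentences), 2
```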
The experimental results show that these four improvements, on the one hand, reduce memory consumption and improve the training speed of the model, and, on the other hand, improve its interpretability, allowing it to be applied more effectively to natural language processing tasks. Compared with the original BERT, the model in this thesis has fewer parameters. In the pre-training stage it improves the experimental results on masked word prediction and sentence continuity prediction, and in the fine-tuning stage it improves the classification accuracy on the text classification task. This thesis also uses the L2 loss, which focuses on modeling the coherence between sentences, and shows that it consistently helps downstream tasks with multi-sentence input. The experimental results show that the model in this thesis achieves the best results on the text classification task: the classification accuracy on the IMDB dataset reaches 97.90%, a 0.48% improvement over the BERT model, and the accuracy on the 20 Newsgroups dataset reaches 98.20%, a 0.40% improvement over BERT. These results show that the explainable bidirectional encoder language model proposed in this thesis can effectively improve the performance of pre-trained language models, improve the interpretability of text representations, reduce the model size, and effectively increase both the pre-training speed and the text classification accuracy.
Keywords/Search Tags: BERT, explainability, attention mechanism, text classification, pre-training