Application Of Improved Deep Learning Algorithm In Chinese Text Classification

Posted on: 2021-05-23 | Degree: Master | Type: Thesis
Country: China | Candidate: M J Wang | Full Text: PDF
GTID: 2428330620465752 | Subject: Electronic and communication engineering
Abstract/Summary:
Classification is an important and pervasive problem; many everyday problems ultimately reduce to classification. As the core of Internet text processing and information retrieval, text classification occupies an important position in natural language processing. The number of Chinese news texts on the Internet has grown explosively, and accurately and efficiently categorizing massive news data so that useful information can be extracted from it is an urgent problem. Although traditional text classification methods can achieve reasonable results, they still suffer from problems such as dimensionality explosion and sparse features. Deep learning, now widely applied to text classification, can effectively avoid these problems and has achieved significant results. This thesis focuses on deep learning and uses more efficient methods for news text classification in order to improve the efficiency of information retrieval. It mainly applies model fusion techniques from deep learning to the Sogou news text classification task. The specific research contents are as follows:

(1) Taking the Sogou News text data as the target data set, easy data augmentation (EDA) was first introduced to address the severe class imbalance in the sample data by expanding the categories with few samples. Classification results after data augmentation were better than before, which indicates that EDA can effectively improve the generalization ability of the model. Then a convolutional neural network (CNN), a bidirectional gated recurrent unit (BiGRU) and an attention mechanism were combined into the proposed CBA (CNN-BiGRU-Attention) model. The CBA model was compared experimentally with the standalone CNN and BiGRU models, the CNN-Attention model, and the pairwise combinations of components. The results show that the CBA model has the highest accuracy, recall and F1 values, which were 0.8993, 0.8995 and 0.9007, respectively. The model performs well on the news text classification task, and the results indicate that its component models are complementary.

(2) Building on the CBA model, ensemble learning was introduced to further improve performance. The proposed ECBA (Ensemble-CNN-BiGRU-Attention) model combines two CBA models with different convolution kernel sizes and numbers into a single, higher-performing classifier whose output is obtained by averaging the predicted probabilities. This can effectively improve the noise robustness of the model and helps avoid over-fitting. The experimental comparison between the two models shows that the accuracy, recall and F1 values of the ECBA model were 0.9058, 0.9045 and 0.9067, all higher than those of the CBA model, indicating that the ECBA model has better classification performance in news text classification.
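
The probability-averaging step described for the ECBA model can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which the abstract does not give. The function names `average_probabilities` and `predict_labels`, the variable names, and all numbers below are hypothetical. It assumes each of the two CBA models has already produced one probability vector per sample over the same set of classes.

```python
from typing import List, Sequence


def average_probabilities(
    probs_a: Sequence[Sequence[float]],
    probs_b: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return the element-wise mean of two models' per-class probabilities."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same number of samples")
    averaged = []
    for row_a, row_b in zip(probs_a, probs_b):
        if len(row_a) != len(row_b):
            raise ValueError("both models must score the same set of classes")
        averaged.append([(a + b) / 2.0 for a, b in zip(row_a, row_b)])
    return averaged


def predict_labels(probs: Sequence[Sequence[float]]) -> List[int]:
    """Pick the index of the highest averaged probability for each sample."""
    return [max(range(len(row)), key=lambda i: row[i]) for row in probs]


# Hypothetical outputs of two CBA models (3 samples, 3 classes); made-up numbers.
cba_small_kernels = [[0.60, 0.30, 0.10], [0.20, 0.50, 0.30], [0.40, 0.45, 0.15]]
cba_large_kernels = [[0.50, 0.40, 0.10], [0.10, 0.30, 0.60], [0.70, 0.20, 0.10]]

ensemble_probs = average_probabilities(cba_small_kernels, cba_large_kernels)
ensemble_labels = predict_labels(ensemble_probs)
```

In this toy input the two models disagree on the second and third samples, and the averaged probabilities decide between them. Averaging in this way is one common reason an ensemble can be less sensitive to the errors of any single member.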
Keywords/Search Tags: Text classification, Convolutional neural networks, Bidirectional gated recurrent unit, Attention mechanism, Ensemble learning