With the rapid development of technology and the spread of the Internet, digital information resources have permeated all walks of life and are distributed across many media platforms, resulting in explosive growth in text data of all kinds. To filter out false or negative information and to retrieve useful information quickly and efficiently, text classification has become an important natural language processing task and a focus of research. The rapid progress of deep learning in recent years has further promoted the application of text classification theory and technology in many fields. The study of text classification therefore has both theoretical significance and practical value.

Early research on text classification relied mostly on traditional machine learning algorithms, in which the categories must be predefined before the texts are classified. These methods depended on complex feature engineering, so their performance was susceptible to subjective human intervention, and the labor cost of extracting features from large-scale data was enormous. Deep learning models, which learn features automatically, have therefore attracted increasing attention and have outperformed traditional machine learning methods. However, existing deep learning models for text classification still have two problems. First, most of them adopt shallow neural networks and a one-versus-all classification scheme, which leads to imbalanced training data for each binary classifier, makes decision boundaries difficult to learn, and fails to capture semantic relationships between distant words in a text. Second, hybrid models that extract several kinds of features must deal with the choice of word vector representation, the way the sub-models are fused, and the fact that different features carry different weights. To address these two problems, two types of
text classification models based on deep learning are proposed in this thesis.

First, a text classification model based on a one-versus-one strategy and a deep pyramid convolutional neural network (DPCNN) is proposed. Word2vec is used to obtain word embedding vectors, and, to implement the one-versus-one scheme, an encoding matrix is constructed to convert the target category labels of the texts. Next, the convolution operation is improved to better extract local features of the text, and the improved DPCNN is used to learn semantic relationships between distant words and to extract further text features. Finally, the output layer adopts the one-versus-one scheme for classification, which gives each binary classifier a balanced training set and makes decision boundaries easier to learn.

Second, building on this model, a multi-channel hybrid classification model based on ngram2vec and a gating mechanism is proposed. The text is first represented with word embedding vectors pre-trained by ngram2vec. In one channel, a bidirectional long short-term memory (BiLSTM) network extracts contextual features from the text representation, and an attention mechanism is added to emphasize the most important of them. In the other channel, the DPCNN extracts local features of the text and, on that basis, captures semantic relationships between distant words. Because features from different channels affect the classification result differently, a gating mechanism is introduced to fuse the features from the two channels. Finally, the one-versus-one scheme is applied to the fused text representation to perform classification.

Both models were evaluated experimentally. Results on several datasets show that, compared with existing models, the two
models proposed here achieve higher classification accuracy and also perform better in terms of precision, recall, and F1 score, which verifies the effectiveness of the proposed models and the corresponding theory.
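The abstract mentions an encoding matrix that converts category labels for the one-versus-one scheme but does not give its form. The sketch below assumes the common convention for K classes: a K x K(K-1)/2 matrix whose column for the class pair (a, b) holds +1 in row a, -1 in row b and 0 elsewhere. A label's row then gives its target for every binary classifier, and a 0 means the sample is left out of that classifier's training set, which is why each classifier only sees two classes. Prediction uses simple majority voting. All function names are hypothetical, and the thesis's actual matrix and decoding rule may differ.

```python
from itertools import combinations


def ovo_encoding_matrix(num_classes):
    """Build a K x K(K-1)/2 one-versus-one code matrix (hypothetical helper).

    Column j belongs to the class pair (a, b): row a holds +1, row b holds -1,
    and every other row holds 0, meaning that class is not used by classifier j.
    """
    pairs = list(combinations(range(num_classes), 2))
    matrix = [[0] * len(pairs) for _ in range(num_classes)]
    for j, (a, b) in enumerate(pairs):
        matrix[a][j] = 1
        matrix[b][j] = -1
    return matrix, pairs


def binary_targets(labels, matrix, column):
    """Convert multi-class labels into (sample_index, +1/-1) targets for one
    binary classifier, dropping samples whose class is not in its pair."""
    return [(i, matrix[y][column]) for i, y in enumerate(labels) if matrix[y][column] != 0]


def ovo_decode(scores, pairs, num_classes):
    """Majority vote over the binary classifiers' real-valued scores.

    A positive score votes for the first class of the pair, otherwise for the
    second; ties go to the lowest class index because max() keeps the first.
    """
    votes = [0] * num_classes
    for score, (a, b) in zip(scores, pairs):
        votes[a if score > 0 else b] += 1
    return max(range(num_classes), key=lambda c: votes[c])


matrix, pairs = ovo_encoding_matrix(3)
print(pairs)                                    # [(0, 1), (0, 2), (1, 2)]
print(binary_targets([0, 1, 2, 1], matrix, 0))  # [(0, 1), (1, -1), (3, -1)]
print(ovo_decode([0.9, 0.4, -0.2], pairs, 3))   # 0
```

With three classes the rows of the matrix are [1, 1, 0], [-1, 0, 1] and [0, -1, -1]; classifier 0 (classes 0 and 1) trains only on the samples of those two classes.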
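The abstract does not say how the convolution is improved, so the following sketch shows only the baseline DPCNN downsampling block of Johnson and Zhang (2017), reduced to a single channel with hand-set kernels. Max pooling with window 3 and stride 2 halves the sequence, and two pre-activation convolutions are then added back through a shortcut. Repeating the block until one position remains illustrates why the network can relate distant words: each halving roughly doubles the span of input a position covers. The function names, the truncated windows at the right edge, and the scalar (one-channel) features are simplifying assumptions.

```python
def max_pool_half(seq, size=3, stride=2):
    """Max pooling with window 3 and stride 2; windows at the right edge are
    truncated, so the output length is ceil(len(seq) / 2)."""
    return [max(seq[start:start + size]) for start in range(0, len(seq), stride)]


def conv1d_same(seq, kernel):
    """1-D convolution (cross-correlation, odd kernel) with zero padding so
    the output has the same length as the input."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(w * padded[i + j] for j, w in enumerate(kernel)) for i in range(len(seq))]


def relu(seq):
    return [max(0.0, x) for x in seq]


def dpcnn_block(seq, kernel1, kernel2):
    """One downsampling block: pool, then two pre-activation convolutions
    (ReLU before each conv), plus a shortcut from the pooled input."""
    pooled = max_pool_half(seq)
    out = conv1d_same(relu(pooled), kernel1)
    out = conv1d_same(relu(out), kernel2)
    return [p + o for p, o in zip(pooled, out)]


def dpcnn_pyramid(seq, kernel1, kernel2):
    """Repeat blocks until one position is left; return it and the lengths."""
    lengths = [len(seq)]
    while len(seq) > 1:
        seq = dpcnn_block(seq, kernel1, kernel2)
        lengths.append(len(seq))
    return seq, lengths


tokens = [float(i % 7) for i in range(64)]  # stand-in for 64 token features
final, lengths = dpcnn_pyramid(tokens, [0.2, 0.5, 0.2], [0.1, 0.3, 0.1])
print(lengths)  # [64, 32, 16, 8, 4, 2, 1]
```

With a 64-token input the length shrinks 64, 32, 16, 8, 4, 2, 1, so the single remaining value can depend on every token in the text while the cost per block keeps halving.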
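The attention step in the BiLSTM channel is not specified in the abstract. A minimal sketch, assuming a simple dot-product score between each time step's BiLSTM output and a learned query vector (here a fixed, hypothetical `query`): the scores are normalized with a softmax and used to take a weighted sum of the hidden states, so time steps that align better with the query contribute more to the context feature. The BiLSTM itself is not implemented; its concatenated forward and backward outputs are taken as given.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Dot-product attention pooling over BiLSTM outputs.

    hidden_states: one vector per time step (forward and backward states
    concatenated). query: a learned context vector (hypothetical here).
    Returns the weighted sum of the states and the attention weights.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention_pool(states, query=[1.0, 0.0])
print([round(w, 3) for w in weights])  # [0.422, 0.155, 0.422]
```

Other scoring functions (for example an additive score with a tanh layer) fit the same pooling structure; only the line computing `scores` would change.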
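The gating mechanism is described only as fusing the two channels' features according to their different impacts on the result. One common form, assumed here rather than taken from the thesis, computes an element-wise gate g = sigmoid(W[h_a; h_b] + b) from the concatenated channel outputs and mixes them as g * h_a + (1 - g) * h_b, so the model can learn, per dimension, how much to rely on each channel. `weight` and `bias` stand in for learned parameters and are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(h_a, h_b, weight, bias):
    """Element-wise gated fusion of two same-size channel features.

    gate  = sigmoid(weight @ [h_a; h_b] + bias)
    fused = gate * h_a + (1 - gate) * h_b
    weight has len(h_a) rows and 2 * len(h_a) columns (hypothetical learned values).
    """
    concat = list(h_a) + list(h_b)
    gate = [sigmoid(sum(w * x for w, x in zip(row, concat)) + b) for row, b in zip(weight, bias)]
    fused = [g * a + (1.0 - g) * c for g, a, c in zip(gate, h_a, h_b)]
    return fused, gate


bilstm_features = [2.0, 4.0]  # e.g. attention-pooled BiLSTM channel
dpcnn_features = [0.0, 0.0]   # e.g. DPCNN channel
zero_weight = [[0.0] * 4 for _ in range(2)]
fused, gate = gated_fusion(bilstm_features, dpcnn_features, zero_weight, [0.0, 0.0])
print(fused)  # [1.0, 2.0]: a zero gate input gives g = 0.5, an even mix
```

Compared with plain concatenation or averaging, the gate lets the weighting between channels vary with the input, which matches the motivation stated in the abstract.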
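The evaluation reports accuracy together with precision, recall and F1. The abstract does not state how per-class scores are averaged. The sketch below assumes macro averaging (an unweighted mean over classes), which is one common choice for multi-class text classification, and defines each undefined ratio (for example, precision for a class that is never predicted) as 0. The function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged precision, recall and F1 (assumed averaging)."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in pairs)  # true positives for class c
        fp = sum(t != c and p == c for t, p in pairs)  # predicted c, actually other
        fn = sum(t == c and p != c for t, p in pairs)  # actually c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }


metrics = classification_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
print({k: round(v, 3) for k, v in metrics.items()})
# {'accuracy': 0.667, 'precision': 0.722, 'recall': 0.667, 'f1': 0.656}
```

Micro averaging or support-weighted averaging would give different numbers on imbalanced datasets, so the averaging scheme matters when comparing against reported results.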