Research And Application Of Text Classification Method Based On Deep Learning

Posted on: 2022-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Li
GTID: 2507306557464324
Subject: Applied Statistics
Abstract/Summary:
Automatic text classification is now widely used in machine translation, human-machine dialogue and recommendation systems. Current mainstream text classification algorithms fall into two categories: machine learning algorithms and deep learning algorithms. Both are extensions of statistics: processing and mining Internet text data with either is, in essence, a process of statistically estimating and classifying samples. In this process, classification performance often suffers because the vectorized text representation is too simple and too high-dimensional, and because the vectors carry too little semantic information. To address these problems, this thesis completes the following tasks:

1. When the Support Vector Machine (SVM) model is used for text classification, high-dimensional and sparse text feature vectors lead to poor classification performance. To address this, the thesis borrows the word embedding idea from deep learning and proposes a Word2vec-SVM classification model. The model is compared with two mainstream machine learning text classifiers, the Naive Bayesian Model (NBM) and the K-Nearest Neighbor (KNN) model, and with the unimproved SVM classification model. The accuracy of the Word2vec-SVM model is 1.44% higher than that of the naive Bayes model, which performed better in the comparison experiments, and its F1 value is 0.0166 higher, which shows the effectiveness of the Word2vec-SVM classification model.

2. When the Long Short-Term Memory (LSTM) model is used for text classification, its performance is often limited by the loss of text semantics. To address this, the thesis embeds a Self-Attention layer before the LSTM network, yielding the Attention-Based LSTM classification model. This model is compared with the Convolutional Neural Networks (CNN), Fasttext and LSTM classification models on the same data set. The classification accuracy of the Attention-Based LSTM model, which incorporates text semantic information, is 1.22% higher than that of the Fasttext model, which performed better in the comparison experiments, and its weighted average F1 value is 0.72% higher. This shows that, among the deep learning algorithms considered, the Attention-Based LSTM classification model proposed in this thesis classifies better and improves text classification performance.

3. Building on the results above and on the ensemble idea in machine learning, the CNN classification model, the LSTM classification model and the Attention-Based LSTM model, all operating on word-embedded input, are selected as base classifiers, and the Gradient Boosting Decision Tree (GBDT) is selected as the meta-classifier. The stacking method is then used to design an ensemble classification model. Comparison experiments show that the ensemble model classifies better than any single classification model, which shows the effectiveness of the ensemble classification model.
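
The abstract does not say which form of self-attention is placed before the LSTM in the second task. As a minimal sketch only, the following assumes scaled dot-product self-attention with identity query, key and value projections (no learned weights) over a toy sequence of word embeddings. The function names and the toy vectors are illustrative and are not taken from the thesis. In the setup the abstract describes, the re-weighted token vectors would then be passed to the LSTM.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention with identity projections.

    Assumption: queries, keys and values are the embeddings themselves;
    a trained model would apply learned projection matrices first.

    embeddings: list of token vectors (lists of floats) of equal length.
    Returns (outputs, weights), where outputs[i] is a weighted mix of
    all token vectors and weights[i] is the attention row for token i.
    """
    dim = len(embeddings[0])
    outputs, weights = [], []
    for query in embeddings:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        row = softmax(scores)
        # Context vector: attention-weighted sum of all token vectors.
        mixed = [
            sum(row[j] * embeddings[j][c] for j in range(len(embeddings)))
            for c in range(dim)
        ]
        weights.append(row)
        outputs.append(mixed)
    return outputs, weights


# Toy 2-dimensional "word embeddings" for a three-token sentence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual, attn = self_attention(tokens)
```

Each row of `attn` sums to 1, so every output vector is a convex combination of the input embeddings. This is one way to let each token carry context from the whole sentence before the recurrent layer reads it.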
Keywords/Search Tags:text classification, word embedding, self-attention mechanism, deep learning model, ensemble model