Research On Chinese Text Classification Based On Hybrid Neural Network Model

Posted on: 2021-07-21  Degree: Master  Type: Thesis
Country: China  Candidate: P B Shi  Full Text: PDF
GTID: 2518306113467184  Subject: Applied Statistics
Abstract/Summary:
In recent years, deep learning has been widely applied in fields such as machine translation, speech recognition, and computer vision. The Internet age is saturated with text, and the volume of text data produced every day is growing exponentially. This data is large in volume, highly diverse, and low in value density, so extracting valuable information from it is an important research direction in artificial intelligence. After reviewing the current state of text classification and the related theory, this thesis takes text classification as its research problem and applies deep learning models to it. To further improve classification accuracy and address the problem of sparse text representation, it proposes a hybrid neural network model based on a convolutional neural network (CNN) and a long short-term memory network (LSTM). It also improves the word embedding layer to increase the weight given to subject words, and extracts both local features and contextual semantic information from the text. The main contents and innovations of the thesis are as follows.

First, the thesis briefly introduces the research status of natural language processing (NLP) and deep learning theory, and analyzes the importance of NLP research. The general process of a text classification task and the related algorithms are introduced in detail.

Second, to increase the weight given to subject words, TF-IDF values are introduced into the generation of the word embedding layer. A TF-IDF value measures how specific a word is to a class of documents, emphasizing the high-frequency feature words of a category. The word vectors generated by word2vec are weighted with the TF-IDF values to form a secondary embedding layer.

To improve classification accuracy, the CNN and the LSTM are combined to extract text features. Combining the feature extraction capabilities of the two models makes it possible both to extract local features of the text and to capture contextual semantic information. In addition, a dropout (random deactivation) strategy is added to improve the model's resistance to overfitting.

Finally, these improvements are combined into a complete text classification system and evaluated on a Chinese text data set. The improved hybrid neural network model is compared with the traditional machine learning methods support vector machine (SVM) and naive Bayes, and with a convolutional neural network (CNN) model. The improved hybrid model achieves better classification performance in terms of accuracy, recall, and F1 score.
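The abstract does not spell out how the word2vec vectors and the TF-IDF values are combined. As a minimal sketch, assuming each token's vector is simply scaled by that token's TF-IDF value in its document, the weighting step might look like the following. The function names, the toy corpus, and the 2-dimensional vectors (stand-ins for real word2vec output) are all hypothetical and not taken from the thesis.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: tf-idf} dict per tokenized document.

    Uses the plain definition tf * log(N / df); smoothed variants also exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            tok: (count / len(doc)) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        })
    return weights


def weighted_embedding(doc, doc_weights, vectors):
    """Scale each token's vector by its TF-IDF weight (assumed combination rule)."""
    return [[doc_weights[tok] * x for x in vectors[tok]] for tok in doc]


# Toy pre-segmented corpus; the vectors are made up and only stand in for word2vec.
docs = [["股票", "市场", "上涨"], ["足球", "比赛", "市场"]]
vectors = {
    "股票": [1.0, 0.0],
    "市场": [0.5, 0.5],
    "上涨": [0.0, 1.0],
    "足球": [1.0, 1.0],
    "比赛": [0.2, 0.8],
}

weights = tfidf_weights(docs)
embedded = weighted_embedding(docs[0], weights[0], vectors)
```

Under this unsmoothed definition, a token that occurs in every document (here "市场") gets a weight of zero, so its vector is suppressed entirely, while category-specific tokens keep a nonzero vector. Whether the thesis uses this exact rule, a smoothed IDF, or a per-category TF-IDF is not stated in the abstract.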
Keywords/Search Tags:Text categorization, CNN, LSTM, word2vec, TF-IDF