Research On Chinese Text Classification Based On Hybrid Neural Network Model

Posted on:2021-07-21

Degree:Master

Type:Thesis

Country:China

Candidate:P B Shi

Full Text:PDF

GTID:2518306113467184

Subject:Applied Statistics

Abstract/Summary:

In recent years, deep learning has been widely applied in fields such as machine translation, speech recognition, and computer vision. The Internet age is full of text, and the volume of text data produced every day is growing exponentially. These data are large in volume, highly diverse, and low in value density, so extracting valuable information from them is an important research direction in artificial intelligence. After reviewing the current state of text classification and the related theory, this thesis takes text classification as its research problem and applies deep learning models to it. To further improve classification accuracy and address the problem of sparse text representation, it proposes a hybrid neural network model based on a convolutional neural network (CNN) and a long short-term memory network (LSTM). It also improves the word embedding layer to increase the weight given to topic words, and it extracts both local features and contextual semantic information from the text.

The main contents and innovations of the thesis are as follows. First, it briefly introduces the research status of natural language processing (NLP) and deep learning theory, analyzes the importance of NLP research, and describes the general process of a text classification task and the related algorithms. Second, to increase the weight given to topic words, TF-IDF values are introduced when generating the word embedding layer. A TF-IDF value measures how specific a word is to a class of documents, emphasizing the feature words that occur frequently within a category. The word vectors generated by word2vec are weighted by their TF-IDF values to form a secondary embedding layer. In addition, to improve classification accuracy, a CNN and an LSTM are combined to extract text features. Combining the feature extraction strengths of the two models captures both the local features of the text and its contextual semantic information. A dropout (random deactivation) strategy is then added to make the model more resistant to overfitting.

Finally, these improvements are combined into a complete text classification system architecture, which is evaluated on a Chinese text data set. The improved hybrid neural network model is compared with the traditional machine learning methods support vector machine (SVM) and naive Bayes and with a convolutional neural network (CNN) model, and it achieves better classification results in terms of accuracy, recall, and F1 score.
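The TF-IDF weighting of word2vec vectors described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so everything below is an assumption: the function names `tfidf_weights` and `weighted_embeddings` are hypothetical, the TF-IDF is computed per document with a common smoothed IDF variant, the weight is applied by scalar multiplication, and the two-dimensional vectors are toy stand-ins for real word2vec output. English tokens are used for readability; a real pipeline would first segment the Chinese text into words.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: tf-idf} dict per tokenized document.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, so a token that
    appears in every document still gets a non-zero weight.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            tok: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[tok])) + 1)
            for tok, count in tf.items()
        })
    return weights


def weighted_embeddings(doc, weights, vectors):
    """Scale each token's vector by its TF-IDF weight.

    Tokens without a vector (out of vocabulary) are skipped.
    """
    return [
        [weights[tok] * x for x in vectors[tok]]
        for tok in doc
        if tok in vectors
    ]


# Toy corpus: each document is already tokenized.
docs = [
    ["stock", "market", "rises"],
    ["football", "match", "market"],
]

# Toy 2-dimensional vectors standing in for trained word2vec embeddings.
vectors = {
    "stock": [1.0, 0.0],
    "market": [0.5, 0.5],
    "rises": [0.0, 1.0],
    "football": [0.2, 0.8],
    "match": [0.9, 0.1],
}

weights = tfidf_weights(docs)
emb = weighted_embeddings(docs[0], weights[0], vectors)

# "market" occurs in both documents, so it is weighted lower than "stock".
print(weights[0]["stock"] > weights[0]["market"])
```

In this sketch the weighted vectors in `emb` would take the place of the plain word2vec vectors as input to the CNN and LSTM layers. Whether the thesis computes TF-IDF per document or per category, and how it normalizes the weights, is not stated in the abstract.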

Keywords/Search Tags:

Text categorization, CNN, LSTM, word2vec, TF-IDF

Related items

1	Research On Chinese Text Classification Based On Hybrid Neural Network Model
2	Text Categorization Algorithm Based On Machine Learning
3	Research On Text Classification Method Based On Bidirectional LSTM
4	Research On Text Similarity Recognition Based On LSTM
5	Research On Affecting Factors Of Word2vec Training Optimization
6	Application Research Of Text Content Monitoring And Analysis Based On Word2vec And SVM
7	Generation Technology Of Review In Specific Domain Of Social Network Based On LSTM
8	A Study On Text Categorization Based On Machine Learning
9	Research On Semantic-based Text Similarity Calculation Method
10	Research Of Hierarchical Text Categorization System Based On VSM And Rule Matching