Research On Text Classification Based On Deep Learning

Posted on:2022-03-29

Degree:Master

Type:Thesis

Country:China

Candidate:T T He

Full Text:PDF

GTID:2518306512497054

Subject:Computer application technology

Abstract/Summary:

Text classification is an important part of natural language processing and is widely used in tasks such as news topic classification, spam filtering, information retrieval, public opinion monitoring, and sentiment analysis. With the development of the Internet, users browse tiktok, micro-blog, WeChat, jitter, news clients, and e-commerce platforms and post comments on them, which generates a large amount of data. These data are massive, complex, and disordered, so it is difficult to categorize text by manual effort alone. Automatic text classification by computer has therefore become a focus of current research. Current text classification models suffer from weak text representation and incomplete text feature extraction. This paper therefore studies text representation and feature extraction and proposes a high-precision text classification model.

Firstly, this paper describes the principle of the word2vec model and analyzes its advantages and disadvantages. To address the fact that word2vec does not consider the importance of words, this paper introduces a method that fuses the TF-IDF algorithm with word2vec to form word vectors. This method takes into account both the semantic information between words and the weight of each word, so it represents text semantics more accurately. To make TF-IDF better suited to text classification, this paper also summarizes the problems of the traditional TF-IDF algorithm and improves it by using the distribution of feature items within and between classes, together with distance information, to form the tf-idf-icp algorithm, which improves the ability of feature words to discriminate between classes. The improved TF-IDF algorithm is then combined with the word2vec model to form a word embedding layer, which produces the input word vectors.

Secondly, this paper studies the classic neural networks in deep learning and summarizes their advantages and disadvantages. It finds that a single neural network can extract only one aspect of the features, so it designs two combined networks from several neural networks with high text classification accuracy. The two text classification models are ACNN (attention-based convolutional neural network) and ABLCNN (attention-based BiLSTM and CNN), both of which use an attention mechanism.

Finally, to address text representation and text feature extraction together, the improved TF-IDF algorithm is combined with each of the two classifiers to form a text classification model, which improves the accuracy of text classification. The models are tested on the THUCNews and online_shopping_10_cats data sets. The experimental results show that the improved TF-IDF algorithm combined with the word2vec model improves text classification: the accuracy is 97.38% on THUCNews and 91.33% on online_shopping_10_cats. The results also show that the difference between the ABLCNN and ACNN classifiers is not significant, while the ACNN classifier takes less time to train, so combining more deep neural networks is not necessarily the best choice.
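The fusion of TF-IDF weights with word vectors described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact fusion rule or the tf-idf-icp formula, so the code below makes several assumptions: it uses a standard smoothed TF-IDF rather than the improved algorithm, it fuses by scaling each token's vector by its weight, and it uses a hypothetical table of 3-dimensional vectors (`toy_embeddings`) in place of a trained word2vec model. The function names are illustrative and do not come from the thesis.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: weight} dict per tokenized document (smoothed TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    all_weights = []
    for doc in docs:
        term_counts = Counter(doc)
        weights = {}
        for word, count in term_counts.items():
            tf = count / len(doc)
            # Smoothed IDF stays positive even for words found in every document.
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
            weights[word] = tf * idf
        all_weights.append(weights)
    return all_weights


def weighted_word_vectors(doc, weights, embeddings):
    """Scale each token's embedding by its TF-IDF weight; skip unknown tokens."""
    return [
        [weights[word] * x for x in embeddings[word]]
        for word in doc
        if word in embeddings
    ]


# Hypothetical 3-dimensional vectors standing in for trained word2vec output.
toy_embeddings = {
    "good": [0.9, 0.1, 0.0],
    "bad": [-0.8, 0.2, 0.1],
    "phone": [0.0, 0.5, 0.7],
}
toy_docs = [["good", "phone", "good"], ["bad", "phone"]]

toy_weights = tfidf_weights(toy_docs)
for doc, weights in zip(toy_docs, toy_weights):
    # Each document becomes a sequence of weighted vectors for a downstream classifier.
    print(weighted_word_vectors(doc, weights, toy_embeddings))
```

In this sketch a word that is frequent in one document but rare across the corpus (here "good") keeps more of its vector magnitude than a word that appears everywhere ("phone"), which is the intuition behind adding word importance to word2vec. A class-aware weighting such as tf-idf-icp would replace only the `tfidf_weights` step.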

Keywords/Search Tags:

text classification, term frequency-inverse document frequency algorithm, convolutional neural network, bidirectional long short-term memory, attention

Related items

1	Text Classification Research Based On Deep Neural Network And Attention Mechanism
2	Text Sentiment Classification Based On Attention Mechanism
3	Research Of Online Comment Text Sentiment Classification Based On Long-short Term Memory Network
4	Research On Network Intrusion Detection Method Based On Bi-LSTM
5	Research On Relation Classification Via Bidirectional Long Short-Term Memory Networks With Attention Mechanism
6	Short Text Sentiment Classification Based On Deep Learning
7	Research On Chinese Text Classification Method Based On Long And Short Term Memory Network
8	Research On Short Text Classification Method Based On Contextual Feature Expression
9	Research On Text Sentiment Classification Algorithm Based On Bidirectional Long Short-term Memory Network
10	Research On Text Emotion Classification Algorithm Based On Deep Learning Technology