Research On Text Classification Based On Deep Learning

Posted on: 2022-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: T T He
GTID: 2518306512497054
Subject: Computer application technology
Abstract/Summary:
Text classification is an important part of natural language processing and is widely used in news topic classification, spam filtering, information retrieval, public opinion monitoring, sentiment analysis and so on. With the development of the Internet, users browse TikTok (Douyin), Weibo, WeChat, news clients and e-commerce platforms and post comments there, which generates a large amount of data. These data are massive, complex and disordered, so it is difficult to classify text by manual effort alone. Automatic text classification using computer technology has therefore become a focus of current research. Current text classification models suffer from weak text representation and incomplete feature extraction. This paper therefore studies both text representation and feature extraction and proposes a high-accuracy text classification model.

First, this paper describes the principle of the word2vec model and analyzes its advantages and disadvantages. Because word2vec does not consider the importance of words, this paper introduces a method that fuses the TF-IDF algorithm with word2vec to produce word vectors. This method takes into account both the semantic information between words and the weight of each word, so it represents text semantics more accurately. To make TF-IDF more suitable for text classification, this paper also summarizes the problems of the traditional TF-IDF algorithm and improves it by using the distribution of feature items within and between classes, together with distance information, to form the TF-IDF-ICP algorithm, which improves the class discrimination of feature words. The improved TF-IDF algorithm is then combined with the word2vec model to form a word embedding layer, which produces the input word vectors.

Second, this paper studies the classic neural networks in deep learning and summarizes their advantages and disadvantages. Finding that a single neural network can extract only one aspect of the features, it designs two text classification models by combining several neural networks with high text classification accuracy. Both are based on the attention mechanism: ACNN (attention-based convolutional neural network) and ABLCNN (attention based on BiLSTM and CNN).

Finally, to address both text representation and feature extraction, the improved TF-IDF algorithm is combined with the two classifiers to form a text classification model, which improves classification accuracy. Experiments are conducted on the THUCNews and online_shopping_10_cats data sets. The results show that the improved TF-IDF algorithm combined with the word2vec model improves text classification: the accuracy is 97.38% on THUCNews and 91.33% on online_shopping_10_cats. The results also show that the difference between the ABLCNN and ACNN classifiers is not significant, but the ACNN classifier takes less time to train. Combining more deep neural networks is therefore not necessarily the best choice.
Keywords/Search Tags: text classification, term frequency-inverse document frequency algorithm, convolutional neural network, bidirectional long short-term memory, attention
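The fusion of TF-IDF weights with word vectors described in the abstract can be illustrated with a minimal sketch. It uses the classic TF-IDF with a smoothed IDF term, not the thesis's TF-IDF-ICP variant, whose formula is not given here. It also assumes that "fusion" means scaling each token's vector by that token's TF-IDF weight, which is one plausible reading only. The function names `tf_idf` and `weighted_embeddings` and the toy vectors are hypothetical, and the hand-written vectors stand in for a trained word2vec model.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Classic TF-IDF with a smoothed IDF, log((1 + N) / (1 + df)) + 1,
    chosen here (an assumption) so that no weight is ever zero.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


def weighted_embeddings(doc, doc_weights, vectors):
    """Scale each in-vocabulary token's vector by its TF-IDF weight.

    Returns one weighted vector per token, in token order, which could
    serve as the input sequence of a downstream classifier.
    """
    return [
        [doc_weights[term] * x for x in vectors[term]]
        for term in doc
        if term in vectors  # skip out-of-vocabulary tokens
    ]


# Toy 2-dimensional vectors standing in for trained word2vec embeddings.
toy_vectors = {"good": [1.0, 0.0], "bad": [-1.0, 0.0], "phone": [0.0, 1.0]}
toy_docs = [["good", "phone", "good"], ["bad", "phone"]]

toy_weights = tf_idf(toy_docs)
embedded = weighted_embeddings(toy_docs[0], toy_weights[0], toy_vectors)
print(embedded)
```

In this toy corpus "phone" occurs in both documents, so it receives a lower weight than "good" or "bad", and its vector contributes less to the input that a classifier would see.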