
Chinese News Classification Based On Multi-scale CNN And LSTM Hybrid Model

Posted on: 2022-08-28  Degree: Master  Type: Thesis
Country: China  Candidate: X Zhang  Full Text: PDF
GTID: 2518306566960989  Subject: Computer technology
Abstract/Summary:
With the arrival of the big-data era and the rapid development of information technology, new media of all kinds have become an important means of disseminating information, and news, as a carrier of that information, is growing explosively. Text classification is a popular research direction in deep learning and has attracted wide attention from researchers. Its main task is to classify massive amounts of text automatically and accurately, and to extract the information that is useful to people. Unlike classification methods based on traditional machine learning, which have many limitations, text classification models based on deep learning extract features from the data with neural networks and thereby improve classification accuracy. In deep learning, convolutional neural networks (CNN) and recurrent neural networks (RNN) are two mature frameworks commonly used for text classification. This thesis proposes a hybrid model based on a convolutional neural network and a long short-term memory network (LSTM), and improves the data cleaning, feature extraction, and deep learning classification model used in text classification. The main work and contributions of this thesis are as follows:

(1) The thesis introduces the research background and significance of text classification, and reviews the state of research on recurrent and convolutional neural networks together with the theory and techniques related to text classification. It focuses on the feature selection methods currently used in text classification (DF, CHI, MI, IG) and on the weight calculation methods (Boolean weight, TF-IDF). This lays the foundation for the later work on optimizing text feature extraction.

(2) Feature extraction plays an important role in text classification, and this thesis presents a new process for extracting text features. After the usual stop-word removal and word segmentation, TF-IDF is used to compute the weight of each word. The weight is multiplied by the word vector of that word, and the result is fed into a Skip-gram model to reduce the dimensionality of the word vector and give it semantic information. Finally, TF-IDF-CHI is used to judge the relevance between a word and its category. The resulting word vectors are low-dimensional and carry different degrees of importance, which has a crucial impact on the accuracy of text classification.

(3) A hybrid model combining a multi-scale CNN and an LSTM is proposed for the text classification task. The advantage of a CNN is that it obtains local features of the data through convolution, and CNNs of different scales capture different text features. Through feature reuse, the higher convolutional layers alleviate the problem that important features may be lost during convolution. As a variant of the RNN, the LSTM mitigates the exploding and vanishing gradient problems of the RNN and handles the context dependence of text data better. The feature vectors obtained from the multi-scale CNN and the feature vectors produced by the LSTM are fused in a Merge layer. The fused feature vectors combine the advantages of both models: they contain high-level features as well as the context information in the text data. Finally, they are classified by a softmax function to achieve a better classification result.
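The TF-IDF weighting step described in item (2) can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `tf_idf` and `weight_vectors`, the toy documents, and the two-dimensional embeddings are all hypothetical, the documents are assumed to be already segmented and stripped of stop words, and the unsmoothed formula tf × log(N / df) is one common TF-IDF variant chosen here as an assumption. The Skip-gram and TF-IDF-CHI steps are not shown.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: each document is a non-empty list of tokens that has
    already been segmented and had stop words removed.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def weight_vectors(doc, doc_weights, embeddings):
    """Scale each token's embedding by that token's TF-IDF weight."""
    return [
        [doc_weights[tok] * x for x in embeddings[tok]]
        for tok in doc
        if tok in embeddings
    ]


# Hypothetical toy corpus (English glosses stand in for segmented Chinese).
docs = [
    ["stock", "market", "rises"],
    ["team", "wins", "match"],
    ["stock", "match", "report"],
]
# Hypothetical 2-dimensional word vectors for the first document's tokens.
embeddings = {
    "stock": [1.0, 0.0],
    "market": [0.5, 0.5],
    "rises": [0.0, 1.0],
}

weights = tf_idf(docs)
weighted = weight_vectors(docs[0], weights[0], embeddings)
```

In this toy corpus "market" appears in one document and "stock" in two, so "market" receives the larger weight and its vector is scaled up relative to that of "stock". A term that occurs in every document gets weight zero under this variant.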
Keywords/Search Tags: natural language processing, text classification, feature representation, convolution, long short-term memory networks