
Research On The News Text Classification Based On Convolutional Neural Network

Posted on: 2020-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: W J Tao
Full Text: PDF
GTID: 2428330575995195
Subject: Information management
Abstract/Summary:
With the rapid development of the Internet and information technology, new media networks have become an effective platform for information exchange. Unstructured news text, an important form of information, is growing explosively. How to classify large volumes of news text efficiently and accurately, and extract the valuable information in it, is a widely studied research topic. News text is short, contains rare words, and varies widely in expression and grammatical structure, which makes it harder to classify. An effective text classification algorithm is therefore urgently needed to better mine text semantics and extract valuable information from massive news corpora.

Since the concept of deep learning was proposed in 2006, it has made significant breakthroughs in image recognition, speech recognition, machine translation and natural language processing. Compared with traditional machine learning algorithms, deep learning extracts features automatically and has strong learning ability, which provides a good foundation for improving the accuracy and generality of news text classification algorithms. The Convolutional Neural Network (CNN) has become a mainstream text classification model. This thesis proposes a classification framework based on a convolutional neural network that improves three key stages of news text classification: text feature representation, feature extraction and classifier construction. The main work is as follows.

First, the feature representation at the input layer of the convolutional neural network has a great influence on the final classification result. word2vec, which is based on distributed representation, has achieved good results in text processing in recent years. It maps each word to a continuous dense vector in a fixed-dimensional space, and the distance between word vectors can be used to measure semantic similarity. In this thesis, the input layer of the convolutional neural network replaces the traditional one-hot vector with word2vec vectors trained with the CBOW model. This captures only the semantics of a word's context and lacks the overall semantic information of the text. The LDA topic model can compensate for this shortcoming to some extent, so the word2vec vector and the topic vector are concatenated to obtain a more effective feature representation that better captures the shallow semantic information of a document.

Second, building on this word-level optimization of the input-layer feature representation, the model tries to incorporate an attention mechanism that gives higher attention weights to the key features affecting the classification result. Convolution kernels of different sizes are used in the convolutional layer to extract the deep semantic features of the text. The pooling layer compresses and reduces these features by max pooling. Finally, the fully connected layer joins the resulting high-quality feature maps, and softmax yields the category of the text.

The experimental results show that the precision, recall and F1 value of the model are 96.4%, 95.9% and 96.2% respectively. This indicates that the improved CNN model can extract deep semantic features of text through its hierarchical structure, which provides strong support for building an efficient and accurate news text classification model.
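The pipeline described in the abstract can be sketched in a few lines of plain Python. This is only an illustrative forward pass under stated assumptions, not the thesis implementation: all function and parameter names (`build_inputs`, `attend`, `conv_max_pool`, `classify`, `random_params`) are hypothetical, the attention step is assumed to be a simple dot-product score against a query vector (the abstract does not specify its form), the word and topic vectors are random stand-ins for trained word2vec and LDA outputs, and the weights are random rather than trained.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build_inputs(word_vecs, topic_vec):
    """Concatenate the document-level topic vector onto every word vector."""
    return [w + topic_vec for w in word_vecs]


def attend(rows, query):
    """Assumed attention form: one softmax weight per token, used to rescale its row."""
    weights = softmax([dot(r, query) for r in rows])
    return [[x * w for x in r] for r, w in zip(rows, weights)], weights


def conv_max_pool(rows, kernel):
    """Slide a k-row kernel over token windows, apply ReLU, keep the maximum."""
    k = len(kernel)
    acts = []
    for i in range(len(rows) - k + 1):
        s = sum(dot(rows[i + j], kernel[j]) for j in range(k))
        acts.append(max(0.0, s))
    return max(acts)


def classify(word_vecs, topic_vec, params):
    """Input features -> attention -> multi-size convolution -> max pooling -> softmax."""
    rows = build_inputs(word_vecs, topic_vec)
    rows, _ = attend(rows, params["query"])
    features = [conv_max_pool(rows, kern) for kern in params["kernels"]]
    logits = [dot(features, w) + b for w, b in zip(params["fc_w"], params["fc_b"])]
    return softmax(logits)


def random_params(dim, kernel_sizes, n_filters, n_classes, rng):
    """Untrained placeholder weights; a real model would learn these."""
    def rand_vec(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    kernels = [[rand_vec(dim) for _ in range(k)]
               for k in kernel_sizes for _ in range(n_filters)]
    return {
        "query": rand_vec(dim),
        "kernels": kernels,
        "fc_w": [rand_vec(len(kernels)) for _ in range(n_classes)],
        "fc_b": [0.0] * n_classes,
    }


# Toy document: 10 tokens, 8-dim "word2vec" vectors, 4-dim "LDA" topic distribution.
rng = random.Random(0)
word_vecs = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(10)]
topic_vec = softmax([rng.uniform(-1, 1) for _ in range(4)])
params = random_params(dim=12, kernel_sizes=(2, 3, 4), n_filters=2, n_classes=5, rng=rng)
probs = classify(word_vecs, topic_vec, params)
print(probs)  # one probability per news category
```

Using several kernel sizes lets each filter respond to word windows of a different length, and max pooling reduces each filter's output to a single value, so the fully connected layer receives a fixed-length feature vector regardless of document length.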
Keywords/Search Tags:News text classification, word2vec, feature representation, CNN