Research On The Classification Method Of News Text Based On Deep Learning

Posted on: 2021-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: C C Zhang
GTID: 2518306032467894
Subject: Computer technology
Abstract/Summary:
With the rapid growth of data volume, text accounts for a large proportion of all data. As one of the most widely used text mining techniques, text classification can extract valuable information from large amounts of messy text. The primary goal in this field is to maintain classification accuracy while reducing the time spent on classification. This thesis therefore studies a news text classification model based on a deep learning convolutional neural network. The main research work is as follows:

(1) To address the sparsity and context dependence of news text, a preprocessing and feature extraction method for the news text dataset is proposed. Word segmentation is applied to the news text so that each article is converted from its original form into a sequence of phrases, and stop words are then removed from the dataset to reduce the impact of noisy data on the classification model. The word2vec tool is then used to train word vectors on the preprocessed text, so that the word vectors can be fed into the Embedding layer as multi-dimensional data. When learning word features, the model can relate each word to its context, so that words that differ in frequency but are related to each other can still contribute to the classification task.

(2) To address the weak generalization of news text classification, the text classification model is improved on the basis of the deep learning convolutional neural network classification algorithm and the word vector training method. By controlling the model parameters, the model supports three different forms of text vectors for the Embedding layer. Different classification models are obtained by training the text word vectors in different ways, their classification results are compared and analyzed, and the optimal model is determined.

(3) To address the unbalanced frequency distribution of news text categories, a hierarchical softmax structure based on a Huffman tree is proposed. The hierarchical softmax is built from category statistics and replaces the previous flat computation. This improves the training speed of the multi-class model and reduces the time complexity of computing the class probabilities.

The optimal model is obtained by comparing the classification results of each model on the test set. In this model, the word vectors trained with the word2vec tool are fed into the Embedding layer and, combined with the convolutional neural network, continue to be updated as dynamic (trainable) parameters during training. The proposed classification model reaches a classification accuracy of 93.87% on the test set, outperforming traditional models by nearly 3%.
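As an illustration of point (3), the following minimal sketch shows how a Huffman tree built from category frequencies assigns shorter paths to frequent categories, which is what lets a hierarchical softmax evaluate fewer binary decisions on average than a flat softmax. It is not the thesis's implementation: the function names `huffman_codes` and `expected_depth`, the category names, and the counts are all hypothetical and chosen only for the example.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Map each category label to a Huffman bit string.

    freqs: dict of label (str) -> positive frequency.
    Each bit is one binary decision on the path from the root to the label.
    """
    if not freqs:
        return {}
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}

    tiebreak = count()  # unique counter so heap entries never compare nodes
    heap = [(f, next(tiebreak), label) for label, f in freqs.items()]
    heapq.heapify(heap)

    # Repeatedly merge the two least frequent nodes into one internal node.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))

    _, _, root = heap[0]
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left child, right child)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:  # leaf: a category label
            codes[node] = prefix

    walk(root, "")
    return codes


def expected_depth(freqs, codes):
    """Frequency-weighted average number of binary decisions per sample."""
    total = sum(freqs.values())
    return sum(freqs[label] * len(codes[label]) for label in freqs) / total


# Hypothetical category counts for an unbalanced news dataset.
category_counts = {"sports": 50, "finance": 30, "tech": 15, "health": 5}
category_codes = huffman_codes(category_counts)
average_decisions = expected_depth(category_counts, category_codes)
```

In a hierarchical softmax, each internal node of this tree would hold a binary classifier, and the probability of a category is the product of the decisions along its path. The most frequent category here sits one decision from the root, while the rarest ones sit three decisions away.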
Keywords/Search Tags: Text classification, Term vectors, Noise data, Word characteristics, Huffman tree