
Research On Chinese Text Classification Based On Convolution Neural Network

Posted on: 2018-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: G P Nie
Full Text: PDF
GTID: 2417330569985096
Subject: Applied Statistics
Abstract/Summary:
With the rapid development of the Internet, the volume of online data is growing exponentially, and human society is entering the era of big data. How to store this data, manage it effectively, and mine its value in the service of society has become an extremely important task for both academia and industry. Text is one of the main carriers of information: compared with images and video, it consumes the fewest network resources to transmit roughly the same amount of information, and so it has become the main carrier of information dissemination on the Internet. Text classification is the most important part of managing text data and mining its value. Because traditional classification models have weak expressive power, they can no longer cope with the multi-class problems and massive data volumes found in text classification, so the search for new text representations and classification methods has become a research hotspot.

We adopt a convolutional neural network (CNN), a deep learning model, as the classifier and use the Word2vec model to extract text features. First, each word produced by word segmentation is mapped to a fixed-length vector by Word2vec. We then traverse the text and transform it into a matrix whose rows are word vectors. The number of rows is determined by the maximum word count of the texts; if a text has fewer words than this maximum, the remaining rows of its matrix are padded with zeros. The resulting text matrix is fed to the CNN for training. Because the original texts contain a very large number of words, which hampers the subsequent classification, we use LDA to extract the words associated with each topic in every text. This greatly reduces the number of words per text and therefore the number of rows in the matrix.

The experimental results show that the combination of Word2vec and a CNN proposed in this thesis performs better. With traditional TF-IDF features, the best result among support vector machine, naive Bayes, and random forest classifiers is 88%, 88%, and 88% in accuracy, recall, and F1 score, whereas the Word2vec-based CNN reaches 91%, 91%, and 91% on the same three metrics. On all three indicators the CNN performs better.
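The text-matrix construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function names `text_to_matrix` and `corpus_to_matrices`, the toy three-dimensional embeddings, and the choice to map out-of-vocabulary words to a zero vector are all assumptions made for the example. In practice the word vectors would come from a trained Word2vec model.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def text_to_matrix(tokens: List[str], embeddings: Dict[str, Vector],
                   max_len: int, dim: int) -> Matrix:
    """Stack word vectors row by row, then zero-pad up to max_len rows."""
    zero = [0.0] * dim
    # Assumption: words missing from the embedding table become zero rows.
    rows = [list(embeddings.get(tok, zero)) for tok in tokens[:max_len]]
    # Pad short texts with all-zero rows so every matrix has max_len rows.
    rows.extend([0.0] * dim for _ in range(max_len - len(rows)))
    return rows


def corpus_to_matrices(corpus: List[List[str]],
                       embeddings: Dict[str, Vector], dim: int) -> List[Matrix]:
    """Use the longest text in the corpus to fix the row count for all texts."""
    max_len = max(len(doc) for doc in corpus)
    return [text_to_matrix(doc, embeddings, max_len, dim) for doc in corpus]


# Toy stand-in for Word2vec output (hypothetical values).
embeddings = {
    "text": [0.1, 0.2, 0.3],
    "classification": [0.4, 0.5, 0.6],
    "network": [0.7, 0.8, 0.9],
}
# Already-segmented texts; "model" is deliberately out of vocabulary.
corpus = [
    ["text", "classification"],
    ["text", "classification", "network", "model"],
]
matrices = corpus_to_matrices(corpus, embeddings, dim=3)
```

Every matrix in `matrices` has the same shape (here 4 rows by 3 columns), which is what allows a batch of texts to be fed to a convolutional network. Reducing each text to its topic-related words beforehand, as the abstract does with LDA, lowers `max_len` and so shrinks every matrix.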
Keywords/Search Tags:Text Classification, Convolution Neural Network, Word2vec, LDA