
Research On Short Text Classification Based On Granular Computing Model And Convolution Neural Network Model

Posted on: 2019-02-17  Degree: Master  Type: Thesis
Country: China  Candidate: B Wang  Full Text: PDF
GTID: 2348330545498856  Subject: Computer technology
Abstract/Summary:
The premise of text classification is text representation. Traditional text representation methods are mostly count-based: they treat words as independent of one another and ignore the semantic information of the text. Feature selection also introduces too many artificial factors, so the resulting features are high-dimensional, sparse, and unable to represent the text effectively. At the same time, a large amount of text with complex and varied topics is generated on the Internet, which poses many challenges to text classification. Models trained with traditional machine learning classification algorithms generalize poorly, especially when the data sets are imbalanced. How to represent text better and how to design new classification algorithms have therefore become key research questions. Deep learning began to rise in 2006 and has since achieved major breakthroughs in speech and image processing; many deep learning methods classify better than traditional machine learning algorithms, and the trained models generalize better. In this thesis, we adopt a representation suited to short text, extend short-text features with a granular computing model, and combine this with a convolutional neural network model to classify short texts. The main work of this thesis is as follows:

1. This thesis describes in detail the flow of a typical short text classification task, including the key steps of data preprocessing, word segmentation, stop-word removal, and feature representation, and analyzes the characteristics of short text data. This lays the foundation for text feature extraction, text feature expansion, and the design of the convolutional neural network model (see the preprocessing sketch after this abstract).

2. Instead of building the feature representation of short texts by hand, this thesis trains a word embedding for each word with the Skip-Gram neural network language model. Besides our own corpus, Wikipedia data is added during training, so the embeddings better express the semantic relations between words and improve the expressive power of word features (see the Skip-Gram training sketch after this abstract).

3. A method for expanding text feature words based on a granular computing model is proposed. First, the word vector space is constructed from the word embeddings of all feature words trained on the corpus. Then a granulation relation is constructed to granulate the word vector space: each feature word in the space obtains a feature word class, also called a feature word grain, and the feature words within each grain are highly similar to one another. Finally, feature words within each grain are selected to expand the short text, which effectively alleviates the data sparsity problem of short texts and further enriches the semantic information of the feature words (see the expansion sketch after this abstract).

4. Convolutional neural network models with four convolution kernels are designed. On the basis of the expanded features of each text, they further extract the most important information in the text to complete the short text classification task (see the CNN sketch after this abstract). Three sets of comparative experiments show that, compared with traditional machine learning classification algorithms, the method proposed in this thesis achieves better classification results; that initializing the text features with word embeddings classifies better than random initialization; and that the classification accuracy of this method is higher than that of the traditional convolutional neural network model. The experimental parameters and results are analyzed in detail.
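The following is a minimal preprocessing sketch for point 1, assuming a Chinese short-text corpus segmented with jieba; the stop-word file path and its one-word-per-line format are illustrative assumptions, not details taken from the thesis.

```python
# Minimal short-text preprocessing sketch: segmentation plus stop-word removal.
import jieba


def load_stopwords(path="stopwords.txt"):
    # Assumed format: one stop word per line, UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def preprocess(text, stopwords):
    # Segment the raw short text into words, then drop stop words and whitespace tokens.
    tokens = jieba.lcut(text)
    return [t for t in tokens if t.strip() and t not in stopwords]
```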
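For point 2, a sketch of Skip-Gram embedding training with gensim (version 4 or later) is shown below. It is not the thesis code: the hyperparameters are illustrative, and `corpus_sentences` and `wiki_sentences` are assumed to be lists of token lists produced by the preprocessing step above (the task corpus and the added Wikipedia text, respectively).

```python
# Skip-Gram word-embedding training sketch with gensim.
from gensim.models import Word2Vec


def train_skipgram(corpus_sentences, wiki_sentences, dim=300):
    # Train on the task corpus together with Wikipedia sentences, as the abstract describes.
    sentences = corpus_sentences + wiki_sentences
    model = Word2Vec(
        sentences=sentences,
        vector_size=dim,   # dimensionality of the word embeddings
        window=5,
        min_count=2,
        sg=1,              # sg=1 selects the Skip-Gram architecture
        workers=4,
    )
    return model.wv       # KeyedVectors holding the trained word vectors
```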
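For point 3, the sketch below only approximates the granular-computing expansion: each feature word is granulated into a "feature word grain" of its most similar neighbours in the word vector space, and the grain members are appended to the short text. The thesis defines the granulation relation formally; the cosine-similarity threshold and grain size used here are assumptions. `wv` is assumed to be the gensim KeyedVectors returned by the Skip-Gram sketch above.

```python
# Feature-word expansion sketch based on similarity grains in the word vector space.
def expand_short_text(tokens, wv, topn=3, threshold=0.6):
    expanded = list(tokens)
    for word in tokens:
        if word not in wv:
            continue
        # The grain: nearest neighbours whose cosine similarity exceeds the threshold.
        grain = [w for w, sim in wv.most_similar(word, topn=topn) if sim >= threshold]
        expanded.extend(grain)
    return expanded
```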
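For point 4, a minimal Keras sketch of a text CNN with four convolution kernel widths is given below, in the spirit of the model described in the abstract. The kernel widths, filter counts, dropout rate, and the use of pretrained embeddings to initialize the embedding layer are assumptions, not the thesis architecture verbatim.

```python
# Multi-kernel text-CNN sketch (TensorFlow / Keras).
import tensorflow as tf
from tensorflow.keras import layers


def build_text_cnn(vocab_size, embed_dim, max_len, num_classes, embedding_matrix=None):
    inputs = layers.Input(shape=(max_len,), dtype="int32")
    # Initialize the embedding layer with pretrained word vectors when available.
    init = (tf.keras.initializers.Constant(embedding_matrix)
            if embedding_matrix is not None else "uniform")
    embed = layers.Embedding(vocab_size, embed_dim, embeddings_initializer=init)(inputs)

    # Four parallel convolution branches with different kernel widths capture n-gram
    # features of different lengths; max pooling keeps the strongest response per filter.
    pooled = []
    for k in (2, 3, 4, 5):
        conv = layers.Conv1D(128, k, activation="relu")(embed)
        pooled.append(layers.GlobalMaxPooling1D()(conv))

    merged = layers.Concatenate()(pooled)
    merged = layers.Dropout(0.5)(merged)
    outputs = layers.Dense(num_classes, activation="softmax")(merged)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```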
Keywords/Search Tags:short text classification, granular computing model, natural language processing, convolution neural network, word embedding