Research On Multi-label Short Text Classification Based On Deep Learning

Posted on: 2020-08-09
Degree: Master
Type: Thesis
Country: China
Candidate: X L Xu
Full Text: PDF
GTID: 2428330599459751
Subject: Computer Science and Technology
Abstract/Summary:
With the vigorous development of network platforms, a large volume of short text data has emerged. Because such data is multi-label and multi-faceted, users cannot quickly find target information when browsing short texts. Effective multi-label classification of short text is therefore an active research topic. Short text data is characterized by brief content, large volume, and irregular expression. These characteristics cause several problems for classification, such as noise, sparse features, and context independence. As the number of labels attached to short texts grows, traditional classification methods can no longer meet existing needs. To address the uneven data distribution and sparse feature matrices that affect current short text classification methods, this thesis makes the following contributions:
(1) To address the inability of traditional feature extraction algorithms to extract sparse short text features effectively, this thesis proposes a short text feature extraction method based on the Word2vec model. First, the short text is vectorized and processed in two ways. In the first, an optimized Word2vec model reduces the dimensionality of the vector, which is then weighted using the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm. In the second, the vector is processed directly by TF-IDF. The vectors produced by the two methods are then merged and features are extracted from the result. Finally, a support vector machine (SVM) classifies them. Experiments show that this method extracts short text features effectively and that its classification performance is clearly better than that of other algorithms.
(2) To address the poor handling of unevenly distributed samples by traditional multi-label text classification methods, and the vanishing and exploding gradients that arise in traditional neural network methods, a multi-label short text classification method based on the LGMC model is proposed. It uses a Long Short-Term Memory (LSTM) model to extract features from text vectors, then a Gated Recurrent Unit (GRU) to further extract feature vectors, and finally classifies the feature vectors with a constructed label tree. Experiments show that the model outperforms traditional multi-label classification algorithms and traditional neural network algorithms, and that it classifies multi-label short text effectively.
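The two-branch feature construction described in contribution (1) might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not say how the TF-IDF weighting or the merge is done, so the sketch assumes a TF-IDF-weighted average of word vectors for the first branch and plain concatenation for the merge. The tiny hand-written `embeddings` table stands in for a trained Word2vec model, all function names are hypothetical, and the SVM step is omitted.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: tf-idf} dict per tokenized document, plus the sorted vocabulary."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Smoothed inverse document frequency (one common variant; an assumption here).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return weights, sorted(doc_freq)


def weighted_embedding(weights, embeddings, dim):
    """Branch 1 (assumed form): TF-IDF-weighted average of the word vectors."""
    acc = [0.0] * dim
    total = 0.0
    for term, w in weights.items():
        vec = embeddings.get(term)
        if vec is None:  # skip words without a vector
            continue
        total += w
        for i in range(dim):
            acc[i] += w * vec[i]
    return [x / total for x in acc] if total else acc


def merged_features(docs, embeddings, dim):
    """Concatenate the dense branch with the plain TF-IDF branch for each document."""
    weights, vocab = tfidf_weights(docs)
    features = []
    for w in weights:
        dense = weighted_embedding(w, embeddings, dim)   # branch 1
        sparse = [w.get(t, 0.0) for t in vocab]          # branch 2: TF-IDF only
        features.append(dense + sparse)
    return features, vocab


# Toy data: two tokenized short texts and made-up 2-dimensional word vectors.
docs = [["cheap", "flight", "deal"], ["flight", "delay", "news"]]
embeddings = {
    "cheap": [1.0, 0.0],
    "flight": [0.5, 0.5],
    "deal": [0.9, 0.1],
    "delay": [0.1, 0.9],
    "news": [0.0, 1.0],
}
features, vocab = merged_features(docs, embeddings, dim=2)
```

Each row of `features` holds the dense embedding dimensions followed by one TF-IDF entry per vocabulary term, and could be passed to any SVM implementation for the final classification step.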
Keywords/Search Tags: Short Text, Multi-Label Classification, Feature Extraction, Word2vec, LGMC