Research On Multi-label Short Text Classification Based On Deep Learning

Posted on:2020-08-09

Degree:Master

Type:Thesis

Country:China

Candidate:X L Xu

GTID:2428330599459751

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid growth of online platforms, large volumes of short text data have emerged. Because such data is multi-label and multi-faceted, users cannot quickly find the information they want when browsing short texts. Effective multi-label classification of short text is therefore an active research topic. Short text is characterized by brief content, large data volume, and irregular expression. These characteristics cause problems in classification, such as noise, sparse features, and a lack of context. As the number of labels attached to short texts grows, traditional classification methods can no longer meet current needs. To address the uneven data distribution and sparse feature matrices that affect current short text classification methods, this thesis makes the following contributions:

(1) To address the inability of traditional feature extraction algorithms to extract sparse short text features effectively, this thesis proposes a short text feature extraction method based on the Word2vec model. The short text is first vectorized and then processed in two ways. In the first, an optimized Word2vec model reduces the dimensionality of the vector, which is then weighted using the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm. In the second, the vector is processed directly by TF-IDF. The vectors produced by the two approaches are then merged and features are extracted from the result. Finally, a support vector machine (SVM) performs the classification. Experiments show that this method extracts short text features effectively and that its classification performance is clearly better than that of the other algorithms compared.

(2) To address the poor handling of unevenly distributed samples by traditional multi-label text classification methods, and the vanishing and exploding gradients that arise in traditional neural network methods, a multi-label short text classification method based on the LGMC model is proposed. It uses a Long Short-Term Memory (LSTM) network to extract features from the text vectors, then a Gated Recurrent Unit (GRU) to further extract feature vectors, and finally classifies the feature vectors using a constructed label tree. Experiments show that the model outperforms traditional multi-label classification algorithms and traditional neural network algorithms, and that it can effectively classify multi-label short text.
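
The two-branch feature construction in contribution (1) can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy two-dimensional `embeddings` table stands in for a trained Word2vec model, the smoothed IDF formula is one common variant chosen for illustration, and the names `tfidf`, `weighted_embedding`, and `fused_features` are hypothetical. The SVM step is left out so that the example depends only on the Python standard library.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: TF-IDF weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            # Smoothed IDF (an assumption; the thesis does not give its formula).
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return weights


def weighted_embedding(doc_weights, embeddings, dim):
    """Branch 1: TF-IDF-weighted average of the word vectors of one document."""
    total = sum(w for t, w in doc_weights.items() if t in embeddings)
    vec = [0.0] * dim
    if total == 0.0:
        return vec  # no known terms: fall back to a zero vector
    for t, w in doc_weights.items():
        if t in embeddings:
            for i, x in enumerate(embeddings[t]):
                vec[i] += w * x / total
    return vec


def fused_features(docs, embeddings, dim):
    """Concatenate the dense branch with the plain TF-IDF branch (branch 2)."""
    vocab = sorted({t for doc in docs for t in doc})
    rows = []
    for w in tfidf(docs):
        dense = weighted_embedding(w, embeddings, dim)
        sparse = [w.get(t, 0.0) for t in vocab]
        rows.append(dense + sparse)
    return vocab, rows


# Toy data: hypothetical tokens and hand-written 2-d vectors, not real Word2vec output.
docs = [
    ["cheap", "phone", "deal"],
    ["phone", "battery", "review"],
    ["football", "match", "review"],
]
embeddings = {
    "cheap": [0.9, 0.1], "phone": [0.8, 0.2], "deal": [0.7, 0.1],
    "battery": [0.6, 0.3], "review": [0.5, 0.5],
    "football": [0.1, 0.9], "match": [0.2, 0.8],
}
vocab, rows = fused_features(docs, embeddings, dim=2)
```

Each row of `rows` is a fixed-length vector (embedding dimensions followed by one TF-IDF entry per vocabulary term) that could then be passed to a classifier, for example a linear SVM trained in a one-vs-rest arrangement to handle multiple labels.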

Keywords/Search Tags:

Short Text, Multi-Label Classification, Feature Extraction, Word2vec, LGMC

Related items

1	Parallel Multi-Label Text Classification Based On Word2vec
2	Research On Text Multi-label Classification Algorithm Based On Label Correlation
3	Research On Multi-label Classification Method Of Chinese Short Text Based On Multi-dimensional Feature Fusion
4	Research Of Text Classification Based On Word2vec And Self-attention
5	Multi-label Text Classification Based On Long Short-Term Memory
6	Chinese Short Text Analysis Based On Word2vec
7	Research On Short Text Automatic Summarization Algorithm Based On TextRank And Word2Vec
8	Research And Design Of Classification Algorithm Based On Massive Multi-label Text
9	Research On Short Text Clustering Of Social Networks Based On Word2vec
10	Research On Improved TF-IDF Feature Selection And Short Text Classification Algorithm