On Multi-label Text Classification Algorithms Based On Deep Learning

Posted on: 2018-11-24

Degree: Master

Type: Thesis

Country: China

Candidate: W L Yu

Full Text: PDF

GTID: 2428330566998326

Subject: Computer Science and Technology

Abstract/Summary:

With the rapid development of the Internet industry, information is generated and disseminated at unprecedented speed, and the amount of data has exploded. The Internet is flooded with text, audio, video, and other types of data, and text is undoubtedly the fastest-growing and largest of these sources. To manage this massive volume of text effectively, so that users can search it and find what they need quickly, text categorization is the most basic and crucial technology. In traditional classification tasks, a sample usually corresponds to only one class, and off-the-shelf classification algorithms handle such single-label problems well. In real life, however, text data is complex and changeable, and one sample is often associated with more than one category and belongs to more than one topic. Traditional classification algorithms have difficulty dealing with such multi-label text classification problems. Designing efficient and accurate multi-label text categorization algorithms is therefore of great practical significance and has attracted increasing attention.

Generally, multi-label text classification faces two difficulties. First, text data is high-dimensional, has few effective features, and is sparse and redundant. Second, the labels of a sample depend on one another and exhibit high-order correlations. The main research content of this thesis is to address the bottlenecks encountered by traditional multi-label text categorization algorithms: to extract effective features of the text corpus using an autoencoder model, to model the interdependencies among labels effectively, and then to design and implement the ML-LSTM multi-label classification algorithm.

To address the sparsity and redundancy of text features, we use AE_P, a model that combines an autoencoder with max pooling, to extract the semantic features of texts effectively. Text data is generally represented in a vector space model, so the original data dimension is generally the total number of terms in the corpus. The terms that appear in any one sample are only a small part of that total, so few dimensions carry effective features and the representation is highly sparse. The autoencoder is a non-linear feature extraction model that can be trained without supervised information. It expresses the original sparse features in a low-dimensional space, which can significantly reduce feature sparsity. Moreover, the max pooling operation can effectively reduce feature redundancy. Experiments show that the features extracted by the AE_P algorithm can improve the accuracy of the final classification results.

To address label correlations, this thesis proposes the ML-LSTM model, which combines the data features and labels into a data-label embedding and further employs four serialization methods, namely sample clustering, association rules, the frequency method, and random initialization, to determine the order of the embeddings. At each time step, a long short-term memory (LSTM) network combined with a classical classification method models the embedding. The dependencies among labels can thus be well captured during classification, and we demonstrate the effectiveness of ML-LSTM.
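Two ideas from the abstract can be illustrated with a small, self-contained sketch: the sparsity of a vector space (bag-of-words) representation, and a frequency-based ordering of labels. This is not the thesis's implementation. The toy documents, the label names, and all function names are hypothetical, and reading "the frequency method" as "sort labels from most to least frequent" is an assumption made only for illustration.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map every distinct whitespace-separated token to a column index."""
    vocab = {}
    for doc in docs:
        for token in doc.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow_vector(doc, vocab):
    """Return a term-count vector whose length equals the vocabulary size."""
    vec = [0] * len(vocab)
    for token in doc.split():
        vec[vocab[token]] += 1
    return vec


def sparsity(vec):
    """Fraction of entries in the vector that are zero."""
    return sum(1 for v in vec if v == 0) / len(vec)


def order_labels_by_frequency(label_sets):
    """Rank labels from most to least frequent.

    Assumption: this is only one plausible reading of a frequency-based
    serialization; ties are broken alphabetically to keep it deterministic.
    """
    counts = Counter(label for labels in label_sets for label in labels)
    return sorted(counts, key=lambda label: (-counts[label], label))


# Hypothetical toy corpus: each document carries one or more labels.
docs = [
    "stock market falls as oil prices rise",
    "new vaccine trial shows promise",
    "oil company funds vaccine research",
]
label_sets = [["finance", "energy"], ["health"], ["energy", "health"]]

vocab = build_vocabulary(docs)
vectors = [to_bow_vector(doc, vocab) for doc in docs]

if __name__ == "__main__":
    for doc, vec in zip(docs, vectors):
        print(f"{sparsity(vec):.2f} of entries are zero for: {doc!r}")
    print("label order:", order_labels_by_frequency(label_sets))
```

Even with three short documents, each vector is as long as the whole vocabulary while only a handful of entries are non-zero. On a real corpus the vocabulary is far larger, which is the sparsity that a learned low-dimensional representation is meant to reduce.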

Keywords/Search Tags:

multi-label, text classification, autoencoder, label correlations

Related items

1	Research On Multi-label Classification Algorithm With Label Correlations
2	Text Categorization Of High Dimensional Imbalanced Data Based On Depth Label Correlation Mining
3	Research On Multi-label Learning Algorithms Based On Samples And Label Correlations
4	Research On Multi-Label Learning Based On Label-Specific Features And Label Correlations
5	Research On The Multi-label Feature Selection And Classification Methods With The Label Correlations
6	Learning Label Correlations For Multi-label Classification
7	Research On Label Coding Algorithms For Multi-label Classification
8	Multi-label Learning Algorithms Based On Local Pairwise Label Correlations And Its Application In Zhihu
9	Multi-instance And Multi-label Web Page Classification Research Based On Support Vector Machine
10	Research On Multi-label Learning And Algorithms Based On Data And Label Correlations