Text Categorization Of High Dimensional Imbalanced Data Based On Depth Label Correlation Mining

Posted on:2018-11-13

Degree:Master

Type:Thesis

Country:China

Candidate:X F Jie

GTID:2348330569486408

Subject:Computer Science and Technology

Abstract/Summary:

Traditional text classification assumes that each document is associated with only one category. However, in real-world text categorization tasks each document usually has multiple semantic meanings, so multiple labels are required to describe it accurately. Multi-label text categorization is an important method for solving the multi-semantic text classification problem because it can precisely and effectively represent the complicated semantic meanings of documents, and multi-label learning has become a prominent topic in text categorization. However, as a more general approach, multi-label text categorization usually requires more complex classification models and is more challenging to solve. Its difficulties can be summarized in three aspects: 1) how to improve the efficiency and accuracy of multi-label learning algorithms on high-dimensional datasets; 2) how to effectively explore and utilize label correlations; 3) how to deal with the imbalance problem in multi-label text categorization.

Aiming to provide effective and efficient solutions to multi-label text categorization tasks, the research in this thesis can be summarized in the following two aspects.

1. Multi-label text data is usually characterized by high dimensionality, a sparse feature space, and low similarity among documents of the same class. To solve the multi-label text categorization problem effectively, the dimensionality of the text data needs to be reduced so that classification accuracy improves and classification complexity decreases. To this end, this thesis introduces a feature transformation method based on fuzzy similarity. The fuzzy similarities between features and labels are computed and used to transform high-dimensional text documents into lower-dimensional relevance vectors.

2. For the imbalance problem in multi-label classification, a two-stage multi-label learning algorithm is proposed. This algorithm divides all labels into two groups, imbalanced labels and common labels, based on the imbalance ratios of the labels. In the first stage, a multi-label hypernetwork model is trained to produce basic predictions for all labels. The second stage aims to improve classification performance on the imbalanced labels using the extra information provided by the correlations between common labels and imbalanced labels.

Experiments are conducted on eight multi-label text datasets to verify the effectiveness of the proposed methods. First, to verify the effectiveness of the proposed dimensionality reduction method, the classification results of BR-SVM, CLR and ECC on the original datasets are compared with their results on the datasets after dimensionality reduction. Second, the classification results of the proposed methods are compared with those of BR-SVM, MLKNN, CLR, ECC, RAkEL and COCOA to verify their effectiveness in dealing with the class-imbalance problem. The experimental results demonstrate that, on high-dimensional, class-imbalanced text categorization problems, the proposed method achieves classification performance comparable to that of many state-of-the-art multi-label learning methods.
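
The two preprocessing ideas in the abstract, mapping documents to label-relevance vectors and splitting labels by imbalance ratio, can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything below is an assumption made for illustration and not the thesis's actual method: the fuzzy membership of a term in a label is taken to be the share of the term's total weight that occurs in documents carrying that label, the imbalance ratio is taken to be majority count divided by minority count, and the function names and the threshold value of 3.0 are hypothetical.

```python
from typing import List, Tuple


def fuzzy_memberships(X: List[List[float]], Y: List[List[int]]) -> List[List[float]]:
    """Assumed membership mu[t][c]: fraction of term t's total weight
    that falls in documents carrying label c."""
    n_terms, n_labels = len(X[0]), len(Y[0])
    mu = [[0.0] * n_labels for _ in range(n_terms)]
    for t in range(n_terms):
        total = sum(row[t] for row in X)
        if total == 0:
            continue  # term never occurs; its memberships stay at zero
        for c in range(n_labels):
            in_label = sum(x[t] for x, y in zip(X, Y) if y[c] == 1)
            mu[t][c] = in_label / total
    return mu


def to_relevance_vectors(X: List[List[float]], mu: List[List[float]]) -> List[List[float]]:
    """Map each document from term space (n_terms values) to a
    relevance vector with one value per label (n_labels values)."""
    n_labels = len(mu[0])
    vectors = []
    for x in X:
        weight = sum(x)
        if weight == 0:
            vectors.append([0.0] * n_labels)  # empty document
            continue
        vectors.append([
            sum(x[t] * mu[t][c] for t in range(len(x))) / weight
            for c in range(n_labels)
        ])
    return vectors


def split_labels(Y: List[List[int]], threshold: float = 3.0) -> Tuple[List[int], List[int]]:
    """Return (imbalanced, common) label indices. The ratio definition
    and the threshold are illustrative assumptions."""
    n_docs, n_labels = len(Y), len(Y[0])
    imbalanced, common = [], []
    for c in range(n_labels):
        pos = sum(y[c] for y in Y)
        neg = n_docs - pos
        minority = min(pos, neg)
        ratio = float("inf") if minority == 0 else max(pos, neg) / minority
        (imbalanced if ratio > threshold else common).append(c)
    return imbalanced, common
```

With a term-weight matrix `X` (documents by terms) and a binary label matrix `Y` (documents by labels), `to_relevance_vectors(X, fuzzy_memberships(X, Y))` yields one value per label for each document, so the feature dimension drops from the vocabulary size to the number of labels. The two-stage hypernetwork learner itself is not sketched here because the abstract does not describe it in enough detail.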

Keywords/Search Tags:

multi-label classification, evolutionary hypernetwork, multi-label hyper-network, label correlations, imbalanced data

Related items

1	Parallel Multi-label Evolutionary Hyper-network On Spark
2	Research On Multi-label Classification Algorithm With Label Correlations
3	Hierarchical Multi-label Integrated Chain Evolutionary Hypernetwork
4	Research On Multi-label Learning Algorithms Based On Samples And Label Correlations
5	Research On Multi-label Learning And Algorithms Based On Data And Label Correlations
6	Research On The Multi-label Feature Selection And Classification Methods With The Label Correlations
7	Learning Label Correlations For Multi-label Classification
8	Research On Multi-Label Learning Based On Label-Specific Features And Label Correlations
9	Research On Multi-label Classification Algorithms Based On Samples And Property Analysis
10	Multi-label Learning Algorithms Based On Local Pairwise Label Correlations And Its Application In Zhihu