
Semi-Supervised Dimensionality Reduction And Ensemble Learning For Multi-label Classification

Posted on: 2011-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: P Li
GTID: 2178360305994208
Subject: Information and Communication Engineering
Abstract/Summary:
Multi-label classification and its many applications are currently active research topics in machine learning and data mining. Two problems in this area that merit particular study are multi-label dimensionality reduction and multi-label ensemble classification. Traditional machine learning research has focused mainly on single-label problems, in which each instance is assigned exactly one label. This thesis is instead concerned with the multi-label problem, in which a single sample can carry several labels. It investigates methods for multi-label classification, semi-supervised learning, dimensionality reduction, and ensemble learning, and evaluates them on a range of benchmark and real-world datasets. From the perspectives of data preprocessing and classifier ensembles, the thesis concentrates on two questions: how to reduce the dimensionality of high-dimensional multi-label data effectively with semi-supervised learning, and how to improve multi-label classification performance with ensemble learning.

In real applications, high-dimensional multi-label data often come with only a few labeled samples and a large number of unlabeled ones. To eliminate redundant features and exploit the latent information carried by the unlabeled samples, this thesis incorporates semi-supervised learning into multi-label dimensionality reduction and proposes a novel algorithm based on semi-supervised discriminant analysis, called MSDA (Multi-label Semi-supervised Discriminant Analysis). MSDA seeks to maximize the separability among classes using a graph weight matrix built from the sample attributes and a label similarity (correlation) matrix computed from the labeled subset of samples.
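The MSDA objective can be illustrated with a minimal sketch in the style of standard semi-supervised discriminant analysis. Everything concrete here is an assumption, not the thesis's exact formulation: label similarity is taken as the cosine similarity of the 0/1 label vectors of the labeled samples, the graph over all (labeled and unlabeled) samples is a symmetric k-nearest-neighbour graph, and `alpha` (manifold weight), `beta` (ridge), and the function names are hypothetical. The projection comes from the generalized eigenproblem A p = λ B p, where A is the label-driven scatter to maximize and B combines the labeled total scatter with a graph-Laplacian smoothness term that carries the unlabeled data.

```python
import numpy as np


def knn_graph(X, k=5):
    """Symmetric 0/1 k-nearest-neighbour adjacency over all samples (rows of X)."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    return np.maximum(W, W.T)  # symmetrize


def msda_sketch(X, Y_l, dim, k=5, alpha=0.1, beta=1e-3):
    """Hypothetical SDA-style sketch, not the thesis's exact MSDA.
    X: (n, d) samples whose first n_l rows are labeled; Y_l: (n_l, q) 0/1 labels.
    Returns a (d, dim) projection and the matching generalized eigenvalues."""
    n_l, d = Y_l.shape[0], X.shape[1]
    Xc = X - X.mean(axis=0)  # center using labeled and unlabeled samples
    Xl = Xc[:n_l]
    # Assumed label similarity: cosine similarity between label vectors.
    Yn = Y_l / np.maximum(np.linalg.norm(Y_l, axis=1, keepdims=True), 1e-12)
    S = Yn @ Yn.T
    A = Xl.T @ S @ Xl  # label-driven scatter to maximize
    W = knn_graph(X, k)
    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian over all samples
    # Labeled total scatter + manifold smoothness term + small ridge for invertibility.
    B = Xl.T @ Xl + alpha * Xc.T @ L @ Xc + beta * np.eye(d)
    # Solve A p = lam B p by whitening with the Cholesky factor B = C C^T.
    Ci = np.linalg.inv(np.linalg.cholesky(B))
    vals, vecs = np.linalg.eigh(Ci @ A @ Ci.T)
    order = np.argsort(vals)[::-1][:dim]  # largest eigenvalues first
    return Ci.T @ vecs[:, order], vals[order]
```

The reduced features are then `X @ P`. Placing the labeled rows first is only a convention of this sketch; the unlabeled rows influence the result solely through the Laplacian term.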
Simultaneously, MSDA uses the unlabeled data to estimate the intrinsic geometric structure of the low-dimensional data manifold. Extensive experiments on standard multi-label datasets show that MSDA outperforms the compared methods on several evaluation metrics, which demonstrates the effectiveness of the proposed algorithm.

To improve multi-label classification performance, this thesis explores ways to increase the diversity of multi-label base learners and proposes a multi-label ensemble algorithm based on soft pairwise constraint projection, called SPACME (Soft PAirwise Constraint projection for Multi-label Ensemble). In this method, the soft pairwise constraint information provided by the data is resampled to build an initial base learner. The resulting must-link and cannot-link constraint sets are both used to construct a projection matrix that maps the original data into a new representation. A number of base learners are then trained iteratively on the newly produced data using a weight update function, which increases the diversity of the base learners. The final output of the ensemble is obtained by majority voting over the multiple base classifiers. Empirical results indicate that SPACME substantially improves classification accuracy and shows considerable robustness across different settings.
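A compact sketch of an ensemble loop of this kind follows, under several assumptions that are not taken from the thesis: each pair's soft constraint weight is the Jaccard overlap of the two label sets (overlap pulls the pair together as a must-link, disjointness pushes it apart as a cannot-link), the projection keeps the top eigenvectors of the resulting weighted scatter matrix (an SSDR-style construction), the base learner is a label-wise k-nearest-neighbour classifier, and the weight update multiplies each sample's weight by exp(per-sample Hamming error). All function names and parameters are hypothetical.

```python
import numpy as np


def constraint_projection(X, Y, pairs, dim, beta=1.0):
    """Projection from soft must-link/cannot-link pairs (assumed SSDR-style form)."""
    d = X.shape[1]
    M = np.zeros((d, d))
    for i, j in pairs:
        union = np.sum(np.maximum(Y[i], Y[j]))
        s = np.sum(Y[i] * Y[j]) / union if union > 0 else 1.0  # soft similarity in [0, 1]
        diff = X[i] - X[j]
        # (1 - s) rewards spreading cannot-link pairs; beta * s penalizes spreading must-link pairs.
        M += ((1.0 - s) - beta * s) * np.outer(diff, diff)
    vals, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    return vecs[:, ::-1][:, :dim]  # keep directions with the largest net separation


def knn_predict(Z_train, Y_train, Z_query, k=3, exclude_self=False):
    """Label-wise kNN: a label is predicted if at least half the k neighbours carry it."""
    d2 = ((Z_query[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=-1)
    if exclude_self:  # leave-one-out when querying the training set itself
        np.fill_diagonal(d2, np.inf)
    nn = np.argsort(d2, axis=1)[:, :k]
    return (Y_train[nn].mean(axis=1) >= 0.5).astype(int)


def spacme_sketch(X, Y, X_test, n_learners=5, n_pairs=200, dim=3, k=3, seed=0):
    """Hypothetical ensemble sketch; X: (n, d), Y: (n, q) 0/1 labels, X_test: (m, d)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    w = np.full(n, 1.0 / n)  # sample weights driving constraint resampling
    votes = np.zeros((X_test.shape[0], Y.shape[1]))
    for _ in range(n_learners):
        # Resample constraint pairs according to the current sample weights.
        a = rng.choice(n, size=n_pairs, p=w)
        b = rng.choice(n, size=n_pairs, p=w)
        pairs = [(i, j) for i, j in zip(a, b) if i != j]
        P = constraint_projection(X, Y, pairs, dim)
        Z = X @ P
        # Assumed weight update: grow weights by exp(per-sample Hamming error).
        err = (knn_predict(Z, Y, Z, k, exclude_self=True) != Y).mean(axis=1)
        w = w * np.exp(err)
        w /= w.sum()
        votes += knn_predict(Z, Y, X_test @ P, k)
    # Strict majority vote per label across the base learners.
    return (2 * votes > n_learners).astype(int)
```

Because each round resamples constraint pairs from a different weight distribution, each base learner works in a different projected space, which is one way to obtain the base-learner diversity the method relies on.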
Keywords: multi-label classification, multi-label dimensionality reduction, semi-supervised discriminant analysis, multi-label ensemble, soft pairwise constraint projection