
Research On The Dimensionality Reduction And Classification Algorithms In Multi-label Learning

Posted on: 2015-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: K Yan
Full Text: PDF
GTID: 2298330422472098
Subject: Communication and Information System
Abstract/Summary:
Multi-label learning originated in text classification, and many real-world machine learning problems fall into this category. Unlike traditional supervised learning methods, which assume that each instance is associated with only one class label, multi-label learning allows an instance to belong to multiple labels simultaneously. Improving the accuracy of multi-label learning typically requires sampling a large number of original features, which leads to the ‘curse of dimensionality’ problem and can severely degrade the accuracy of learning algorithms. Obtaining effective low-dimensional data from the high-dimensional space therefore plays a significant role in improving classification accuracy. For the multi-label classification problem, the main contributions of this thesis are summarized as follows:
(1) Multi-label learning, dimensionality reduction methods and manifold learning algorithms are introduced. Manifold learning algorithms can preserve the geometric structure of local patches when high-dimensional data is mapped to a low-dimensional space. However, the number of neighbors in the locally linear embedding (LLE) algorithm is fixed, so the algorithm cannot avoid smoothing out or eliminating small-scale structure, nor falsely dividing a continuous manifold into unrelated sub-manifolds. How to select the number of neighbors is therefore significant.
(2) Multi-label learning that cannot use the latent information of unlabeled data may suffer degraded accuracy. In real scenarios, only a small portion of high-dimensional multi-label data is labeled. To eliminate redundant features effectively, use the latent information of unlabeled data, and obtain the low-dimensional manifold structure, semi-supervised learning methods should be adopted. To make full use of the supervised information of labeled instances and the statistical information of unlabeled instances, and to calculate a proper number of neighbors, an effective dimensionality reduction algorithm named Variable K-Nearest Semi-Supervised Locally Linear Embedding (VKSSLLE) is proposed.
(3) To enhance the accuracy of multi-label learning, an effective multi-label classification algorithm named Variable K-Nearest Semi-Supervised Locally Linear Embedding-Naive Bayes Classifier (VKSSLLE-NBC) is proposed. It adopts the VKSSLLE algorithm to obtain the low-dimensional manifold structure embedded in the high-dimensional space, and a naive Bayes classifier to perform multi-label classification. Different dimensionality reduction algorithms are each combined with the multi-label naive Bayes classifier to solve the multi-label learning problem. Experimental results on an artificial dataset and two real-world datasets show that the VKSSLLE-NBC algorithm can effectively enhance the accuracy of multi-label learning.
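The classification stage described in (3) can be illustrated with a minimal sketch. The abstract does not specify the exact form of the multi-label naive Bayes classifier, so the code below assumes a binary-relevance scheme (one independent two-class model per label) with Gaussian likelihoods; this may differ from the thesis's formulation. The function names `fit_binary_relevance_gnb` and `predict_labels` are hypothetical, and the toy data is invented for illustration: it only stands in for low-dimensional coordinates such as those a dimensionality reduction step like VKSSLLE would produce.

```python
import math

def fit_binary_relevance_gnb(X, Y, var_floor=1e-6):
    """Fit one Gaussian naive Bayes model per label (binary relevance).

    X: list of feature vectors (assumed to be low-dimensional embeddings).
    Y: list of 0/1 label vectors, one entry per label.
    """
    n_labels, n_feats = len(Y[0]), len(X[0])
    models = []
    for j in range(n_labels):
        stats = {}
        for c in (0, 1):
            rows = [x for x, y in zip(X, Y) if y[j] == c]
            # Laplace-smoothed prior so an unseen class keeps non-zero mass.
            prior = (len(rows) + 1) / (len(X) + 2)
            if rows:
                means = [sum(r[d] for r in rows) / len(rows)
                         for d in range(n_feats)]
                # var_floor keeps variances strictly positive.
                varis = [sum((r[d] - means[d]) ** 2 for r in rows) / len(rows)
                         + var_floor for d in range(n_feats)]
            else:
                means, varis = [0.0] * n_feats, [1.0] * n_feats
            stats[c] = (prior, means, varis)
        models.append(stats)
    return models

def _log_posterior(x, prior, means, varis):
    """Unnormalized log posterior under the feature-independence assumption."""
    lp = math.log(prior)
    for xd, m, v in zip(x, means, varis):
        lp += -0.5 * math.log(2 * math.pi * v) - (xd - m) ** 2 / (2 * v)
    return lp

def predict_labels(models, x):
    """Return a 0/1 vector: label j is predicted when class 1 is more probable."""
    return [int(_log_posterior(x, *s[1]) > _log_posterior(x, *s[0]))
            for s in models]

# Invented toy data: label 0 tracks a high first coordinate,
# label 1 tracks a high second coordinate.
X_low = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0], [0.1, 1.0], [0.0, 0.9]]
Y = [[0, 0], [0, 0], [1, 1], [1, 1], [0, 1], [0, 1]]
models = fit_binary_relevance_gnb(X_low, Y)
```

With this toy data, `predict_labels(models, [0.05, 0.95])` assigns only the second label, which shows how a single instance can receive any subset of the labels rather than exactly one class.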
Keywords/Search Tags: Multi-label classification, Multi-label dimensionality reduction, Naive Bayes classifier, Manifold learning, Semi-supervised learning