
The Sparse Methods For Multi-Label Classification

Posted on: 2015-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Ma
GTID: 2298330431493442
Subject: Computer software and theory
Abstract/Summary:
Classification is one of the most active research topics in data mining. Traditional classification assumes that each instance belongs to exactly one class label. In real applications, however, an instance can be associated with several labels at once. For example, a news report on the World Cup in Brazil can be labeled "sports event", "football", and "Brazil"; likewise, a computer can be labeled by its intended uses, such as "video and audio", "scientific research", and "online shopping". Such problems are called multi-label problems. Multi-label classification has been widely applied in fields such as text classification, information retrieval, and bioinformatics.

Multi-label classification poses more challenges than traditional classification. First, the labels are not independent of one another but are correlated, and how to measure and capture these correlations in the label space to improve prediction remains an open issue. Second, like single-label classification, multi-label classification suffers from high-dimensional data, and the high dimensionality arises not only in the instance space but also in the label space. In particular, as the number of labels grows, the label space often becomes sparse. This brings both challenges and opportunities to multi-label learning.

To address these challenges, this thesis proposes three algorithms, each built on an improved variant of partial least squares regression (PLSR). Theoretical analysis and simulation experiments show that all three algorithms achieve effective classification results.

Because singular value decomposition (SVD) can extract the important information in a matrix space, we propose a multi-label classification algorithm called SPMD, which performs dimension reduction and regression analysis on multi-label data simultaneously. First, the labels are treated as a whole to exploit label correlations; then the score vectors of the instance space and the label space are computed by SVD; finally, the multi-label classification model is constructed on the basis of PLSR.

Because ridge regression can handle multicollinearity, we present a multi-label classification algorithm named RPLS-DA, where DA stands for discriminant analysis. An l2-norm penalty is imposed on PLS-DA to tackle the "large p, small n" problem caused by high-dimensional data.

We improve the Nonlinear Iterative Partial Least Squares (NIPALS) algorithm with the sparse LASSO model and propose a multi-label classification algorithm called LNMD. LNMD performs dimension reduction and feature selection at the same time, and then takes label correlations into account when designing the classification model for multi-label data. In addition, LNMD provides a new sparse method for dimension reduction.
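The abstract names three PLS-based ingredients (SVD-derived score vectors in SPMD, an l2 penalty in RPLS-DA, and LASSO-style sparse weights in LNMD) without giving their formulas. The following is a minimal NumPy sketch of a generic PLS2 multi-label classifier that combines analogous ingredients; it is not the thesis's implementation of any of the three algorithms. Its assumptions are all hypothetical: labels are encoded as a 0/1 indicator matrix; each weight vector is the dominant left singular vector of the cross-covariance between the deflated features and the centred labels, so all labels are treated as a whole; the ridge penalty is applied only to the final regression on the latent scores rather than to PLS-DA itself; sparsity comes from a single soft-thresholding step instead of a full LASSO solve; and labels are predicted by thresholding at 0.5. The names `fit_pls_multilabel`, `predict_labels`, `soft_threshold`, `ridge`, and `l1` are illustrative only.

```python
# Hypothetical sketch of a PLS-based multi-label classifier;
# not the thesis's SPMD, RPLS-DA, or LNMD code.
import numpy as np


def soft_threshold(v, lam):
    """Elementwise soft-thresholding, the proximal operator of lam * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def fit_pls_multilabel(X, Y, n_components=2, ridge=0.0, l1=0.0):
    """Fit a PLS-style model from features X (n, p) to a 0/1 label matrix Y (n, q).

    ridge: l2 penalty on the regression of Y on the latent scores (hypothetical knob).
    l1:    fraction of max|w| removed by soft-thresholding, 0 <= l1 < 1 (hypothetical knob).
    """
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xk = X - x_mean  # centred features, deflated after each component
    Yc = Y - y_mean  # centred labels; all q columns are modelled jointly
    W, P, R, T = [], [], [], []
    for _ in range(n_components):
        # Dominant left singular vector of Xk^T Yc: the feature direction whose
        # scores have maximal covariance with the label block as a whole.
        U, _, _ = np.linalg.svd(Xk.T @ Yc, full_matrices=False)
        w = U[:, 0]
        if l1 > 0:
            # One soft-thresholding step zeroes weakly weighted features
            # (a simplified stand-in for a LASSO-penalised weight update).
            w = soft_threshold(w, l1 * np.abs(w).max())
        w = w / np.linalg.norm(w)
        t = Xk @ w                # latent score vector
        p = Xk.T @ t / (t @ t)    # X loading vector
        Xk = Xk - np.outer(t, p)  # deflate X so later scores are orthogonal to t
        # r maps the centred *original* X to the same score: t = (X - x_mean) @ r
        r = w - sum((p_i @ w) * r_i for p_i, r_i in zip(P, R))
        W.append(w)
        P.append(p)
        R.append(r)
        T.append(t)
    W, R, T = (np.column_stack(m) for m in (W, R, T))
    # (Ridge-)regression of the centred labels on the orthogonal scores.
    C = np.linalg.solve(T.T @ T + ridge * np.eye(T.shape[1]), T.T @ Yc)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "B": R @ C}


def predict_labels(model, X, threshold=0.5):
    """Predict a 0/1 label matrix by thresholding the regressed label scores."""
    scores = (X - model["x_mean"]) @ model["B"] + model["y_mean"]
    return (scores >= threshold).astype(int)


# Usage on synthetic data: label 2 is the conjunction of labels 0 and 1,
# so the three labels are correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = np.column_stack(
    [X[:, 0] > 0, X[:, 1] > 0, (X[:, 0] > 0) & (X[:, 1] > 0)]
).astype(float)
model = fit_pls_multilabel(X, Y, n_components=3, ridge=1.0)
print((predict_labels(model, X) == Y).mean(axis=0))  # per-label training accuracy
```

Because each weight vector comes from the SVD of the cross-covariance with all label columns at once, correlated labels share latent components. Raising `l1` toward 1 zeroes more feature weights, trading goodness of fit for feature selection, while `ridge` shrinks the score regression when the scores are nearly collinear.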
Keywords/Search Tags: multi-label classification, SVD, ridge regression, sparse learning