
Feature Selection Method Research For Multi-label Classification

Posted on: 2016-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: Q Ma
Full Text: PDF
GTID: 2308330464964468
Subject: Computer application technology

Abstract/Summary:
Classification means predicting the labels of unknown samples with a model trained on known samples. According to the number of labels per sample, classification is divided into single-label and multi-label classification. In multi-label classification, one sample may be associated with multiple labels, and different labels sometimes overlap. In theory, redundant and irrelevant features increase computational complexity and degrade classification performance. Feature selection therefore plays an important role in multi-label classification.

Two filter-based multi-label feature selection methods are proposed in this thesis: (1) a multi-label feature selection method based on a quadratic programming formulation solved with the Frank-Wolfe optimization algorithm (QPFS-FW); (2) a multi-label feature selection method based on the Hilbert-Schmidt independence criterion and a genetic algorithm with a control strategy (CGA-HSIC).

For QPFS-FW, the quadratic programming formulation minimizes feature-feature redundancy and maximizes feature-label relevance simultaneously, and the Frank-Wolfe algorithm and two of its special cases solve the quadratic program efficiently. In the experiments, QPFS-FW is compared with three feature selection methods on ten datasets. The results show that the method runs on average about four times faster than the original method, selects fewer features, and performs better than the other methods.

For CGA-HSIC, the Hilbert-Schmidt independence criterion evaluates the dependence between features and labels, and the genetic algorithm with a control strategy adjusts the number of selected features in each generation while searching for the optimal feature subset. In the experiments, the kernel parameter is tuned on the Emotions dataset, and the algorithm's convergence is demonstrated on the Emotions and Plant datasets. CGA-HSIC is then compared against three feature selection methods on four datasets.
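To illustrate the kind of quadratic programming feature selection the abstract describes, the sketch below solves a standard QPFS-style objective with the Frank-Wolfe method. This is not the thesis's actual formulation: the trade-off weight alpha, the simplex constraint, and the diminishing step size 2/(k+2) are common conventions assumed here for illustration. Q holds feature-feature redundancy and f feature-label relevance.

```python
import numpy as np

def qpfs_frank_wolfe(Q, f, alpha=0.5, n_iters=200):
    """Illustrative QP-based feature selection solved with Frank-Wolfe.

    Minimizes (1 - alpha) * x^T Q x - alpha * f^T x over the probability
    simplex; the resulting weights x rank the features.
    """
    n = len(f)
    x = np.full(n, 1.0 / n)            # start at the centre of the simplex
    for k in range(n_iters):
        grad = 2 * (1 - alpha) * Q @ x - alpha * f
        i = int(np.argmin(grad))       # linear minimization over the simplex
        s = np.zeros(n)                #   is attained at a vertex e_i
        s[i] = 1.0
        gamma = 2.0 / (k + 2)          # standard diminishing step size
        x = x + gamma * (s - x)        # convex combination stays feasible
    return x

# Toy example: feature 0 is more relevant to the labels and the two
# features are only weakly redundant, so feature 0 should get more weight.
Q = np.array([[1.0, 0.1],
              [0.1, 1.0]])
f = np.array([0.9, 0.2])
w = qpfs_frank_wolfe(Q, f)
```

Because every iterate is a convex combination of simplex points, the weights always sum to one, and features can then be ranked by their weight.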
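The Hilbert-Schmidt independence criterion used by CGA-HSIC can be computed from kernel matrices on the features and on the labels. A minimal sketch follows, using the common biased empirical estimator HSIC = tr(K H L H) / (n-1)^2 with an RBF kernel; the kernel choice and bandwidth are illustrative assumptions, not the thesis's settings.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian RBF kernel matrix for rows of X (assumed bandwidth sigma)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(K, L):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# A dependent feature/label pair should score higher than an independent one.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))
h_dep = hsic(rbf_kernel(x), rbf_kernel(x))                        # label == feature
h_indep = hsic(rbf_kernel(x), rbf_kernel(rng.normal(size=(50, 1))))
```

In a feature selection setting, a genetic algorithm (as in CGA-HSIC) would use such a score as the fitness of a candidate feature subset, with higher HSIC indicating stronger feature-label dependence.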
Keywords/Search Tags: multi-label classification, multi-label feature selection, quadratic programming, Frank-Wolfe method, genetic algorithm, Hilbert-Schmidt independence criterion