Research On Multi-label Feature Selection Algorithms And Their Applications

Posted on: 2017-04-25 | Degree: Master | Type: Thesis
Country: China | Candidate: J Yin | Full Text: PDF
GTID: 2308330488496677 | Subject: Computer application technology
Abstract/Summary:
In multi-label classification, a single instance can be associated with several classes simultaneously, so the class labels are no longer mutually exclusive. Multi-label classification is of growing importance because of its applications in numerous domains, such as text categorization, scene classification, gene function prediction and music emotion annotation. Multi-label data sets generally contain a large number of features, and many of them are irrelevant, redundant or noisy, which degrades the performance of multi-label classification algorithms. Feature selection aims to reduce feature dimensionality and improve the effectiveness of learning methods, which makes it an essential pre-processing step for multi-label classification. Current multi-label feature selection methods fall broadly into three groups: filter methods, which are classifier-independent, and wrapper and embedded methods, which are classifier-dependent. Focusing on filter and wrapper methods and their applications, our work is divided into three parts:

1. Constructing a filter-based multi-label feature selection algorithm (QPFS-RCDM). We extend single-label quadratic programming feature selection (QPFS), which uses a unit simplex constraint, to build a strictly convex QP problem with non-negativity constraints only. We then solve this model with the random coordinate descent method (RCDM), which has a linear convergence rate, to improve computational efficiency. In experiments, we compare it with four commonly used multi-label feature selection methods on twelve data sets, according to five popular and indicative performance measures. The experimental results demonstrate that QPFS-RCDM can select a high-quality feature subset for a fixed number of input features.

2. Designing a multi-label wrapper feature selection method based on an evolutionary multi-objective optimization algorithm (NSGA-II). A control strategy for the size of subsets is applied so that the number of optimal feature subsets can be pre-determined. In experiments, three existing techniques are used as comparative methods on four data sets. The experimental results show that the proposed method finds good feature subsets and improves classification performance.

3. Applying these methods to protein subcellular localization under multi-label settings. On protein data sets for six different species, constructed from Gene Ontology (GO) information, we compare different feature selection algorithms and further validate the performance of the QPFS-RCDM algorithm experimentally. From a biological perspective, we identify species-specific GO features and subcellular-location-specific GO features, and thus establish a biologically meaningful classification model.
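
As an illustration of the optimization step described in the first part, the following is a minimal sketch of random coordinate descent applied to a strictly convex QP with non-negativity constraints only. It is not the thesis implementation: the objective form 0.5 * x^T Q x - c^T x, the function name rcd_nonneg_qp, and the toy matrices Q (standing in for feature redundancy) and c (standing in for feature-label relevance) are all assumptions made for this example.

```python
import numpy as np

def rcd_nonneg_qp(Q, c, n_iters=20000, seed=0):
    """Minimize 0.5 * x^T Q x - c^T x subject to x >= 0.

    Hypothetical helper: Q is assumed symmetric positive definite,
    so the problem is strictly convex and Q[i, i] > 0.
    """
    rng = np.random.default_rng(seed)
    n = len(c)
    x = np.zeros(n)
    grad = -np.asarray(c, dtype=float)  # gradient Qx - c at x = 0
    for _ in range(n_iters):
        i = rng.integers(n)  # pick one coordinate uniformly at random
        # Exact minimization along coordinate i, clipped at zero.
        new_xi = max(0.0, x[i] - grad[i] / Q[i, i])
        delta = new_xi - x[i]
        if delta != 0.0:
            grad += delta * Q[:, i]  # O(n) gradient update (Q symmetric)
            x[i] = new_xi
    return x

# Toy data (assumed, not from the thesis).
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 6))
Q = A.T @ A / 50 + 0.1 * np.eye(6)  # symmetric positive definite
c = rng.random(6)

weights = rcd_nonneg_qp(Q, c)
ranking = np.argsort(-weights)  # features ordered by decreasing weight
```

In this sketch each update touches a single coordinate and costs O(n), and the resulting weights can be used to rank features, with the top-k taken as the selected subset.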
Keywords/Search Tags: Multi-label classification, feature selection, coordinate descent method, quadratic programming, linear support vector machine, multi-objective optimization, protein subcellular localization