The Research Of Multi-Label Learning Problem About Feature Selection And Classification

Posted on: 2014-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: L L Huang
Full Text: PDF
GTID: 2248330398979204
Subject: Computer application technology
Abstract/Summary:
With the development of Internet and multimedia technology, real-world data are increasingly multi-label: each instance often belongs to several categories at the same time, and more and more such information needs to be processed. Exploiting the co-occurrence and relevance of labels in multi-label data, this thesis addresses two aspects of processing that information efficiently: multi-label feature selection and multi-label classification.

Feature selection, a common approach to dimensionality reduction, selects a subset of the most representative or discriminative features from the input feature set. The central requirement is that a good feature set contains features that are highly correlated with the class labels but uncorrelated with each other. Various feature selection methods have been developed to tackle high dimensionality. ReliefF and the F-statistic are both classical filter algorithms: they evaluate the selected feature subset using the intrinsic characteristics of the data, so the selection is independent of the classifier and serves as a preprocessing step. Because of label relevance and co-occurrence, traditional single-label feature selection algorithms cannot be applied directly to multi-label data. Few feature selection algorithms exist for multi-label data, and the fundamental problems of multi-label feature selection remain unsolved. Multi-label feature selection has therefore become an active research topic.

To overcome the limitation that traditional single-label feature selection algorithms cannot be applied directly to multi-label data, we first improve the ReliefF algorithm. For multi-label data, this algorithm, based on label co-occurrence, assumed that every label contributes equally.
By combining three novel methods for calculating the label contribution, we improve the update formula for the feature weights, and a discriminative feature subset is then selected from the original features. Classification experiments show that the proposed algorithm performs clearly better than the traditional approaches. Following the same idea used to improve ReliefF, a multi-label F-statistic is proposed for multi-label data; it uses the same label contribution as the improved ReliefF.

Furthermore, current feature selection algorithms for multi-label data do not take feature correlations into account, although in most real-life data the feature dimensions are often correlated. We therefore establish a new robust feature selection method for multi-label learning. Feature correlation is incorporated into the sparse learning of feature selection, so that the feature correlation is learned and the features are selected simultaneously. An efficient algorithm with rapid convergence is introduced. Experiments on benchmark data sets show that the proposed method outperforms many state-of-the-art feature selection methods.

In traditional supervised learning, each instance belongs to one category. For multi-label data, however, a single label cannot express all the information in an instance, and traditional classification algorithms are not suitable. To handle the co-occurrence and relevance of multi-label data, we propose a novel multi-label classification approach using adaptive linear regression. The approach extends classical linear regression theory to multi-label linear regression and, combined with a number of classification evaluation criteria, predicts the final labels adaptively.
The new method determines a different threshold for each class depending on the label distribution of the original data, combines a fixed threshold corresponding to the averages with adaptive thresholds corresponding to the comprehensive evaluation criteria, and reduces the influence of the distribution and noise of the original data. Experimental results demonstrate the effectiveness of the proposed algorithm for the multi-label classification problem.
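
The idea of per-class thresholds on top of multi-label linear regression can be illustrated with a minimal sketch. It is not the algorithm of the thesis, whose formulas are not given in this abstract. It assumes a regularized least-squares fit to the 0/1 label matrix, takes the "fixed" threshold of each label to be the mean regression score for that label, takes the "adaptive" threshold to be the score value that maximizes training F1 for that label, and blends the two with a weight `mix`. The function names, `lam`, `mix`, and the choice of F1 as the evaluation criterion are all hypothetical.

```python
import numpy as np

def add_bias(X):
    # Append a constant column so the model has an intercept per label.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_linear(X, Y, lam=1.0):
    """Regularized least squares from features X (n, d) to 0/1 labels Y (n, q)."""
    Xb = add_bias(X)
    reg = lam * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # assumption: the intercept is left unpenalized
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ Y)

def scores(X, W):
    # Real-valued regression output, one column per label.
    return add_bias(X) @ W

def f1(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def adaptive_thresholds(S, Y, mix=0.5):
    """One threshold per label: blend of a fixed and an F1-optimal threshold.

    Assumption: 'fixed' is the mean score of the label, 'adaptive' is the
    observed score that maximizes training F1 for that label.
    """
    q = Y.shape[1]
    t = np.empty(q)
    for j in range(q):
        fixed = S[:, j].mean()
        candidates = np.unique(S[:, j])
        f1_values = [f1(Y[:, j], (S[:, j] >= c).astype(int)) for c in candidates]
        adaptive = candidates[int(np.argmax(f1_values))]
        t[j] = mix * fixed + (1.0 - mix) * adaptive
    return t

def predict_labels(S, t):
    # A label is assigned when its score reaches that label's threshold.
    return (S >= t).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 6))
    Y = (X @ rng.normal(size=(6, 4)) > 0.5).astype(int)  # synthetic labels
    W = fit_linear(X, Y)
    t = adaptive_thresholds(scores(X, W), Y)
    print("per-label thresholds:", t)
    print("first predictions:", predict_labels(scores(X, W), t)[:3])
```

In this sketch, setting `mix=1.0` reduces to the fixed thresholds alone and `mix=0.0` to the purely adaptive ones; in practice the adaptive part would be tuned on held-out data rather than on the training set.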
Keywords/Search Tags: multi-label, feature selection, classification, ReliefF, F-statistic, linear regression