
Research On Semi-Supervised Multi-Label Feature Selection Algorithm

Posted on: 2020-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Wang
Full Text: PDF
GTID: 2428330590986881
Subject: Computer application technology
Abstract/Summary:
In the field of machine learning, traditional supervised learning assumes that each learning object corresponds to a single concept label. In real life, however, an object may be associated with multiple concept labels at the same time: a movie can be tagged as sci-fi, action, and American simultaneously, and a picture may be tagged with a log house, a tree, a lawn, a path, and so on. Multi-label learning is the learning framework for studying such tasks and has attracted a great deal of research interest. However, existing multi-label learning algorithms face two problems. On the one hand, the number of labels is large and their semantic information is complex, so annotating multi-label data costs substantial manpower and time, and it is difficult to obtain a large amount of labeled data. On the other hand, the feature set of multi-label data is high-dimensional, and irrelevant and redundant features damage the generalization performance of the classification model. It is therefore necessary to reduce the dimensionality of high-dimensional multi-label data.

To address these two problems, this thesis proposes a semi-supervised multi-label feature selection (SSMLFS) algorithm. The basic idea is to evaluate features, within a semi-supervised learning framework, by both the dependency between the original feature description and the related labels and the ability to preserve the local structure of the data. The main contents are as follows. First, based on HSIC (the Hilbert-Schmidt Independence Criterion), we compute the norm of the Hilbert-Schmidt cross-covariance operator on an RKHS (Reproducing Kernel Hilbert Space) to obtain the independence criterion, namely the empirical HSIC estimate, and we take its maximization as an optimization objective. Second, we consider labeled and unlabeled data simultaneously: we first construct a sample-based adjacency graph and then maximize the preservation of the samples' local structure. Finally, we compute the importance of each feature. Experimental results under six different evaluation criteria on six datasets verify the effectiveness of the SSMLFS algorithm.
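To make the first ingredient concrete, the empirical HSIC estimate mentioned above is commonly computed as tr(KHLH)/(n-1)^2, where K and L are kernel matrices over the features and the labels and H is the centering matrix. The sketch below is an illustrative NumPy implementation under that standard formulation, not the thesis's exact code; the Gaussian kernel and its bandwidth `sigma` are assumptions chosen only for the example.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    # Gaussian (RBF) kernel matrix from pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic_empirical(K, L):
    # Empirical HSIC estimate: tr(K H L H) / (n - 1)^2,
    # where H = I - (1/n) 1 1^T centers both kernel matrices.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```

A feature-selection score in this spirit evaluates each candidate feature subset by the HSIC between its kernel matrix and the label kernel matrix; a constant (uninformative) feature yields an all-ones kernel, which the centering matrix annihilates, so its HSIC score is zero.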
Keywords/Search Tags: multi-label learning, multi-label feature selection, semi-supervised learning, Hilbert-Schmidt independence criterion, locality preserving projection