
Research Of Feature Selection Algorithm For Weak Label Learning

Posted on: 2020-10-03    Degree: Master    Type: Thesis
Country: China    Candidate: L Jiang    Full Text: PDF
GTID: 2428330599456796    Subject: Software engineering
Abstract/Summary:
As a hot topic in current machine learning research, multi-label learning has been widely applied in fields such as automatic content annotation, bioinformatics, and information retrieval. However, with the advent of the Internet information age, large volumes of high-dimensional data are generated, and multi-label learning suffers from the "curse of dimensionality": the features of the samples are highly redundant, which forces the model to learn too many parameters, raises the risk of over-fitting, and thus reduces the accuracy of multi-label classification. Feature selection is an important method for mitigating this problem. Feature selection methods try to select a relevant subset of the original features through certain strategies, and the number of selected features is generally much smaller than the number of original features. Feature selection can reduce the difficulty of the learning task and improve the performance of the learner.

Supervised multi-label feature selection algorithms generally assume that labeled samples are abundant and that their label information is complete, and then select features by considering the correlation between features and labels. In real scenarios, however, we typically have a large amount of data of which only a small fraction carries complete labels. As a result, supervised multi-label feature selection often relies on the limited fully labeled data while ignoring the large amount of unlabeled data. Unsupervised feature selection algorithms, in contrast, select relevant feature subsets directly but ignore the relationship between labels and features. In order to leverage both labeled and unlabeled data for effective feature selection, we propose two multi-label feature selection algorithms based on weakly-supervised information, described as follows.

(1) Semi-supervised feature selection based on sparsity regularization and dependence maximization (FSSRDM). FSSRDM is suitable for situations where only a small number of samples have complete labels and a large number of samples have no labels. FSSRDM combines missing-label prediction and feature selection in a unified framework. A least-squares model is first used to evaluate the relationship between labels and features, and the same model yields predicted labels for the unlabeled data. Secondly, the Hilbert-Schmidt independence criterion is used to characterize the dependence between the feature and label spaces, and this dependence is maximized. Next, an L2,1 sparse regularization term is imposed to obtain the regression coefficient matrix. Finally, the relevant feature subset is selected from the sparse coefficient matrix. Compared with state-of-the-art supervised and semi-supervised feature selection algorithms, FSSRDM selects relevant feature subsets more effectively and improves the performance of multi-label classification.
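For context, the dependence-maximization term in FSSRDM builds on the empirical Hilbert-Schmidt independence criterion. The following is a minimal sketch of that estimator with linear kernels; the kernel choice and variable names are illustrative assumptions, not the exact formulation used in the thesis.

```python
import numpy as np

def empirical_hsic(X, Y):
    """Empirical HSIC between a feature matrix X (n x d) and a label
    matrix Y (n x q); larger values indicate stronger dependence.
    Linear kernels are used here purely for illustration."""
    n = X.shape[0]
    K = X @ X.T                           # kernel matrix over samples (feature side)
    L = Y @ Y.T                           # kernel matrix over samples (label side)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# toy usage
X = np.random.rand(50, 10)
Y = (np.random.rand(50, 4) > 0.5).astype(float)
print(empirical_hsic(X, Y))
```

In the semi-supervised setting, maximizing a term of this form encourages the selected feature representation to stay strongly dependent on the (partially predicted) label space.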
(2) Weakly supervised feature selection based on label space dimension reduction (WSFSLR). WSFSLR is suitable for situations where each instance of the original data has a different degree of missing labels. In order to reduce the impact of missing labels on the mapping from samples to the label space, WSFSLR decomposes the original label space by non-negative matrix factorization to obtain a low-dimensional label space, and then adopts a least-squares model to measure the relation between the features and the labels in this low-dimensional space. In addition, a graph Laplacian matrix is used to ensure that similar samples give similar outputs. Finally, the sparse coefficient matrix obtained through the L2,1 sparse regularization term is used to select the features. We compare WSFSLR with traditional supervised and semi-supervised feature selection methods and with a feature selection method based on missing labels on data sets with larger label spaces; WSFSLR effectively reduces the dimensionality of the data and improves the accuracy of multi-label classification.
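The sketch below illustrates the overall shape of this pipeline under simplifying assumptions: scikit-learn's NMF compresses the label matrix, a closed-form ridge regression stands in for the thesis's L2,1-regularized, graph-regularized least-squares model, and all function and variable names are illustrative rather than taken from the thesis.

```python
import numpy as np
from sklearn.decomposition import NMF

def wsfslr_style_ranking(X, Y, n_components=5, reg=1.0):
    """Rank features by first compressing the (possibly incomplete) label
    matrix Y with NMF, then regressing the low-dimensional labels on X.
    Ridge regression replaces the L2,1 / Laplacian-regularized objective;
    feature scores are the row norms of the coefficient matrix."""
    V = NMF(n_components=n_components, init="nndsvda", max_iter=500).fit_transform(Y)
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ V)  # (d x k) coefficients
    scores = np.linalg.norm(W, axis=1)                       # one score per feature
    return np.argsort(scores)[::-1]                          # most relevant first

# toy usage: keep the 10 highest-ranked features
X = np.random.rand(200, 30)
Y = (np.random.rand(200, 12) > 0.8).astype(float)
selected = wsfslr_style_ranking(X, Y)[:10]
```

The key design point carried over from the abstract is that feature relevance is judged against the compressed label space rather than the raw, partially missing labels.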
Keywords/Search Tags: Multi-label learning, Feature selection, Weakly-supervised learning, Sparsity regularization, Label space reduction