
Research On Feature Selection For Weakly Supervised Multi-Label Data

Posted on: 2021-02-11    Degree: Master    Type: Thesis
Country: China    Candidate: Y Y Xu    Full Text: PDF
GTID: 2428330611994931    Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of big data, sampled data sets keep growing explosively in dimensionality and, from the perspective of labels, present multiple concepts at once. As a result, data sets exhibit two typical characteristics: high-dimensional features and incomplete labels. In machine learning tasks, directly processing high-dimensional, incompletely labeled data tends to cause the "Curse of Dimensionality" and "Algorithm Failure", so feature selection for weakly supervised multi-label data has been introduced as an effective means of tackling these issues. Feature selection in weakly supervised multi-label learning scenarios has attracted growing attention in recent years owing to its extensive real-world applications, yet two difficult problems remain unsolved: (1) when processing semi-supervised multi-label data, existing feature selection approaches attend to only one of two issues, either alleviating the negative effects of imperfectly predicted labels or quantitatively evaluating label correlations, addressing exclusively the semi-supervised or the multi-label scenario; none of them extract intrinsic label correlations to guide feature selection; (2) when processing weakly labelled data, existing approaches focus either on recognizing missing labels or on eliminating the negative effects of missing labels on feature selection; they either simply treat missing labels as negative or indiscreetly impute them with predicted values, which may overestimate unobserved labels or introduce new noise into the selection of discriminative features.

To address these two problems of existing weakly supervised multi-label feature selection approaches, two effective feature selection models are designed and implemented, based on feature-label space consistency and on a generative probabilistic framework, respectively.

First, a new space consistency-based feature selection model is designed to simultaneously tackle the negative effects of imperfectly predicted labels and the quantitative evaluation of label correlations, which existing feature selection approaches handle only separately. Specifically, correlation information in the feature space is learned from probabilistic neighborhood similarities, and correlation information in the label space is optimized by preserving feature-label space consistency. This mechanism appropriately extracts label information in the semi-supervised multi-label learning scenario and effectively employs it to select discriminative features. An extensive experimental evaluation on real-world data shows the superiority of the proposed approach under various evaluation metrics.

Second, a new feature selection model is designed from a generative point of view to solve the recognition of missing labels and the pitfalls that missing labels pose for existing feature selection approaches. Concretely, the new model relaxes the Smoothness Assumption to infer label observability, which reveals the positions of unobserved labels, and employs a spike-and-slab prior to perform feature selection while excluding unobserved labels. A data-augmentation strategy yields full local conjugacy in the model, enabling a simple and efficient Expectation-Maximization (EM) algorithm for inference. Quantitative and qualitative experimental results demonstrate the superiority of the proposed approach under various evaluation metrics.
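The abstract gives no formulas, so the following NumPy sketch is only an illustration of the general idea behind the first model: pairwise affinities in the feature space are computed as probabilistic neighborhood similarities, the partially observed labels re-weight those affinities, and each feature is scored by how consistently it varies across label-agreeing neighborhoods. The function names, the Gaussian bandwidth sigma, and the (1 + S) weighting are hypothetical choices for illustration, not the thesis's actual formulation.

```python
import numpy as np

def neighborhood_similarities(X, sigma=1.0):
    """Probabilistic neighborhood similarities in the feature space:
    Gaussian affinities normalized per row (stochastic-neighbor style)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    P = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)
    return P / P.sum(axis=1, keepdims=True)

def label_similarities(Y, labeled_mask):
    """Cosine similarities in the (partially observed) label space;
    unlabeled samples contribute nothing."""
    Yl = Y * labeled_mask[:, None].astype(float)
    norms = np.linalg.norm(Yl, axis=1, keepdims=True) + 1e-12
    S = (Yl / norms) @ (Yl / norms).T
    np.fill_diagonal(S, 0.0)
    return S

def consistency_feature_scores(X, Y, labeled_mask, sigma=1.0):
    """Score each feature by how smoothly it varies over feature-space
    neighborhoods, with extra weight on neighbors that also agree on labels
    (a feature-label space consistency heuristic)."""
    P = neighborhood_similarities(X, sigma)
    S = label_similarities(Y, labeled_mask)
    W = P * (1.0 + S)  # emphasize label-agreeing neighbors
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        diff = (X[:, None, j] - X[None, :, j]) ** 2
        scores[j] = np.sum(W * diff)  # smaller = more consistent
    return scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 20))                  # 60 samples, 20 features
    Y = (rng.random((60, 5)) < 0.3).astype(float)  # 5 binary labels
    labeled_mask = rng.random(60) < 0.5            # only half the samples are labeled
    scores = consistency_feature_scores(X, Y, labeled_mask)
    print("selected feature indices:", np.argsort(scores)[:5])
```

In the same spirit, a minimal stand-in for the second model's two ingredients, inferring which zero entries of the label matrix are likely unobserved (a smoothness-style argument) and excluding them when scoring features, might look as follows. The neighbour-voting rule and the masked correlation score are assumptions for illustration, not the thesis's spike-and-slab prior or its EM inference.

```python
import numpy as np

def observability_mask(X, Y, k=5):
    """Flag zero entries Y[i, l] whose k nearest neighbours mostly carry
    label l as suspected unobserved positives; trust everything else."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    neighbour_vote = Y[neighbours].mean(axis=1)       # fraction of neighbours with each label
    return (Y == 1) | (neighbour_vote < 0.5)          # True = trust this entry

def masked_feature_scores(X, Y, observed):
    """Score features by correlation with each label, using only trusted
    entries (suspected missing labels are excluded from the sums)."""
    Xc = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        s = 0.0
        for l in range(Y.shape[1]):
            m = observed[:, l]
            if m.sum() < 2:
                continue
            yl = Y[m, l] - Y[m, l].mean()
            s += abs(np.dot(Xc[m, j], yl)) / m.sum()
        scores[j] = s
    return scores  # larger = more discriminative

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 20))
    Y = (rng.random((60, 5)) < 0.3).astype(float)
    observed = observability_mask(X, Y)
    scores = masked_feature_scores(X, Y, observed)
    print("selected feature indices:", np.argsort(scores)[::-1][:5])
```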
Keywords/Search Tags:Feature selection, Multi-label learning, Weakly supervised learning, Label correlation, Sparsity learning