The Research On Incomplete Label Distribution Feature Selection Based On Granular Computing

Posted on: 2023-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: P Dong
Full Text: PDF
GTID: 2568306803962739
Subject: Computer Science and Technology
Abstract/Summary:
In the era of big data, collected data is increasingly rich in semantics and form. Label ambiguity, a widely discussed problem in machine learning, has received growing attention in recent years. Multi-label learning is currently the mainstream paradigm for resolving label ambiguity. In multi-label learning, each instance can be associated with multiple labels at the same time, and the labels associated with an instance are treated as equally important. Multi-label learning therefore cannot answer the question "How well do different labels describe the same instance?", which limits its scope in dealing with label ambiguity. Label distribution learning, an extension of multi-label learning, consequently plays an increasingly important role in handling label ambiguity. In the real world, however, the annotation information of label distribution data may be incomplete, and existing methods designed for complete data cannot be applied directly to such data. In addition, with the development of data collection and storage technology, data in all walks of life tends to be high-dimensional, and excessive dimensionality may make classification more challenging. Motivated by this, and from the perspective of granular computing, this thesis focuses on two problems in label distribution learning: "incomplete labels" and the "curse of dimensionality". The main research work is as follows:

1. For the situation in which the labels of some samples in the label distribution data are completely missing, a new label distribution local rough set model is proposed based on the idea of local rough sets. The model uses a neighborhood relation to granulate the feature space and the label space separately, considers the correlation between features and the correlation between labels, and accounts for the correlation between features and labels by constructing a new approximation set. Based on the label distribution local rough set model, a new heuristic feature selection algorithm is designed to select a relatively optimal feature subset. The effectiveness of the algorithm is verified through a series of comparative experiments and statistical tests.

2. For the case in which the labels of some samples in the label distribution data are partially missing, a neighborhood tolerance relation is used to handle the incomplete data directly, without recovering the missing labels. This avoids interference from noisy information while still considering the correlation between features and the correlation between labels. Inspired by the neighborhood discrimination index, a new measure called the neighborhood-tolerance discrimination index is constructed to evaluate the discriminating ability of a feature subset, and a new incomplete label distribution feature selection algorithm based on this index is proposed. The algorithm can process label distribution data directly, without discretization, which avoids the information loss that discretization introduces. Experimental results on multiple label distribution datasets indicate that the proposed algorithm is feasible and effective.
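The abstract does not give the formal definitions, so the following is only a minimal sketch of how the second contribution might look in code, not the thesis's actual algorithm. It assumes that missing label degrees are stored as NaN, that two samples are label-tolerant when they agree within `eps` on every label observed for both, and that a feature subset is scored with a conditional index of the form log(|R_B| / |R_B ∩ R_D|), in the spirit of the neighborhood discrimination index. The names `feature_neighborhood`, `label_tolerance`, `conditional_index`, and `greedy_select`, and the thresholds `delta`, `eps`, and `tol`, are all hypothetical.

```python
import numpy as np

def feature_neighborhood(X, delta):
    """n x n boolean matrix: True where two samples lie within Euclidean
    distance delta on the given feature columns."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)) <= delta

def label_tolerance(D, eps):
    """n x n boolean matrix: True where two samples differ by at most eps on
    every label observed for both. A missing (NaN) entry is treated as
    compatible with anything, so no label is imputed (assumed definition)."""
    diff = np.abs(D[:, None, :] - D[None, :, :])  # NaN if either entry is missing
    return np.all(np.nan_to_num(diff, nan=0.0) <= eps, axis=2)

def conditional_index(R_feat, R_label):
    """Assumed score: log(|R_B| / |R_B and R_D|). It is 0 when every feature
    neighborhood pair is also label-tolerant; smaller is better."""
    both = np.logical_and(R_feat, R_label).sum()
    return float(np.log(R_feat.sum() / both))

def greedy_select(X, D, delta=0.2, eps=0.1, tol=1e-6):
    """Forward greedy search: repeatedly add the feature that lowers the
    index most, and stop when the improvement is at most tol."""
    n_features = X.shape[1]
    R_label = label_tolerance(D, eps)
    selected, best = [], np.inf
    while len(selected) < n_features:
        scores = {}
        for f in range(n_features):
            if f not in selected:
                R = feature_neighborhood(X[:, selected + [f]], delta)
                scores[f] = conditional_index(R, R_label)
        f_best = min(scores, key=scores.get)
        if best - scores[f_best] <= tol:
            break
        selected.append(f_best)
        best = scores[f_best]
    return selected

# Toy data: feature 0 separates the two label patterns, feature 1 does not.
X_demo = np.array([[0.0, 0.5], [0.0, 0.9], [1.0, 0.5], [1.0, 0.9]])
D_demo = np.array([[0.9, 0.1], [0.9, np.nan], [0.1, 0.9], [0.1, np.nan]])
print(greedy_select(X_demo, D_demo))  # prints [0]
```

Because the label degrees are compared as real values and missing entries are simply skipped, this sketch needs neither discretization nor label recovery, which mirrors the two properties the abstract emphasizes. The local rough set model of the first contribution is not covered here.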
Keywords/Search Tags: Granular computing, Feature selection, Label distribution learning, Local rough set, Neighborhood discrimination index