Expert-based Feature Selection And Missing Multi-label Learning Strategy

Posted on: 2021-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: F Song
GTID: 2428330626460973
Subject: Statistical information technology
Abstract/Summary:
Multi-label learning has attracted considerable attention in many domains, such as personalized recommendation, document categorization, and bioinformatics. In multi-label learning, each instance may belong to multiple labels simultaneously, and the goal is to obtain a high-performance classification model that can predict the relevant labels for any new instance. In the feature space, the curse of dimensionality often arises because massive data sets have very high dimension, which reduces classification accuracy. Feature selection can reduce the dimension of the feature space and thereby improve classification accuracy and generalization performance. In the label space, data are often lost because of cost or technical limitations during data acquisition, which leaves some labels missing. This thesis proposes one algorithm for each of these two problems. The main contents are as follows:

(1) For the problem of excessively high feature-space dimension: existing multi-label feature selection algorithms mainly apply the minimum-redundancy maximum-relevance criterion to the full feature set without considering expert features, so they have long running times and high complexity. In practice, however, experts can often determine the overall prediction direction directly from a few key features. Extracting and using this information can reduce the computation time of feature selection and may even improve classifier performance. Accordingly, this thesis proposes a multi-label feature selection algorithm based on the conditional mutual information of expert features. The algorithm first combines the expert features with the remaining features, then uses conditional mutual information to rank the features from strongly to weakly related to the label set. Finally, the subspace is divided to remove redundant features.

(2) For the problem of missing labels in the label space: most multi-label learning algorithms assume that the label set is complete. In the real world, however, the information for each instance is not always complete. Only a few label completion algorithms currently exist for multi-label learning with missing labels, and they ignore noise in the feature space. Moreover, when unknown instances are labeled, the threshold of the discriminant function often affects the result, most noticeably for labels near the threshold. All these factors make it harder to exploit label correlations when labels are missing. To address these problems, this thesis proposes a non-equilibrium missing multi-label learning algorithm based on a two-level autoencoder. First, label density is used to enlarge the classification margin of the label space. Then, a new supplementary label matrix is augmented from the missing-label matrix using the non-equilibrium label completion method. Finally, to account for noise in the feature space, a two-level kernel extreme learning machine autoencoder is constructed to fuse feature and label information.
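The ranking step in (1) can be illustrated with a minimal sketch. It is not the thesis's algorithm, whose details the abstract does not give. It assumes discrete features, a plug-in (count-based) estimate of conditional mutual information, and a score equal to the sum over labels of I(feature; label | expert features). The function names `conditional_mutual_information` and `rank_features` are hypothetical, and the final subspace-division step is not sketched.

```python
from collections import Counter
from math import log

import numpy as np


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(x; y | z) in nats for discrete sequences."""
    n = len(x)
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    cmi = 0.0
    for (xi, yi, zi), n_xyz in c_xyz.items():
        # p(x,y,z) * log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ], written with counts
        ratio = n_xyz * c_z[zi] / (c_xz[(xi, zi)] * c_yz[(yi, zi)])
        cmi += (n_xyz / n) * log(ratio)
    return cmi


def rank_features(X, Y, expert_idx):
    """Return (order, scores): expert features first, then the rest, strong to weak.

    Assumption: a non-expert feature is scored by summing, over all labels,
    its conditional mutual information with the label given the joint state
    of the expert features.
    """
    X = np.asarray(X)
    Y = np.asarray(Y)
    expert_idx = list(expert_idx)
    # Joint state of the expert features, one hashable tuple per instance.
    expert_state = [tuple(row) for row in X[:, expert_idx].tolist()]
    rest = [f for f in range(X.shape[1]) if f not in expert_idx]
    scores = {
        f: sum(
            conditional_mutual_information(
                X[:, f].tolist(), Y[:, j].tolist(), expert_state
            )
            for j in range(Y.shape[1])
        )
        for f in rest
    }
    order = expert_idx + sorted(rest, key=lambda f: scores[f], reverse=True)
    return order, scores
```

Conditioning on the expert features means a candidate feature is rewarded only for label information the experts' features do not already carry, which is one plausible reading of how expert knowledge could shorten the search.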
Keywords/Search Tags: multi-label learning, feature selection, missing label, expert feature, extreme learning machine, autoencoder