
Research On Feature Selection For Imbalanced Label Density Learning

Posted on: 2021-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: C Wu
Full Text: PDF
GTID: 2428330626460968
Subject: Statistical information technology
Abstract/Summary:
Multi-label learning has gradually become a research hotspot in intelligent fields such as machine learning, big data, and data mining. In multi-label learning, the more features a sample has, the more accurately the sample instance is usually described. As the number of features grows, however, so does the number of redundant features. Because these redundant features can seriously degrade classifier accuracy and even cause misclassification, dimensionality reduction is necessary. Feature selection is an effective dimensionality-reduction method: it first selects features with high relevance and low redundancy as a feature subset, and then performs classification training and prediction on that subset.

In a sample instance, whether a label is present is closely related to the instance's feature attributes. At the same time, because label imbalance is widespread in practice, different labels describe sample instances to different degrees. Research on imbalanced data is still lacking. The traditional approach first converts imbalanced data into balanced data by sampling or resampling and then studies the balanced version. However, this often changes the attributes of the original data set and loses part of its information, reducing classifier accuracy. Moreover, most existing research addresses imbalance under a single label; imbalance under multiple labels has received little attention. Targeting the multi-label imbalance problem, this thesis proposes two improved algorithms. The main research work is as follows:

(1) Most feature selection algorithms do not consider that different labels may describe samples to different degrees. To address this, a multi-label feature selection algorithm with imbalanced label otherness (MSIO) is proposed. The frequency distribution of positive and negative instances under each label is used as that label's weight and incorporated into the feature selection process; the traditional information-entropy calculation is then modified accordingly, yielding a more effective feature ranking. Extensive experiments on multiple multi-label benchmark data sets, together with statistical hypothesis tests, show that the algorithm is effective.

(2) Most existing feature selection algorithms assume that the label distribution is roughly balanced, and few consider imbalanced label distributions. To address this, a multi-label feature selection algorithm that weakens marginal labels (WML) is proposed. The frequency ratio of positive to negative instances under each label is first computed as that label's weight, and marginal labels are then weakened through this weighting. Adding this label-space information to the feature selection process produces a more effective feature ranking and improves how accurately the labels describe the samples. Analysis of the experimental results shows that the proposed algorithm has certain advantages, and stability analysis and statistical hypothesis testing further confirm its effectiveness.

The proposed MSIO and WML algorithms incorporate the information carried by different labels into the classification process, which both preserves the original attributes of the feature space and improves classifier accuracy. Experimental results on multiple benchmark data sets show that the proposed algorithms have certain advantages over the comparison multi-label learning algorithms, and stability analysis and statistical hypothesis tests further confirm their effectiveness and rationality.
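Both algorithms rest on the same underlying idea: turn each label's positive/negative frequency into a per-label weight, then fold those weights into an information-entropy-based feature score so that marginal (highly imbalanced) labels contribute less. The abstract does not give the exact formulas, so the following is a minimal illustrative sketch, not the thesis's actual MSIO/WML definitions; the weight formula, the mutual-information scorer, and all function names here are assumptions.

```python
import numpy as np

def label_weights(Y):
    # Per-label weight from the positive/negative frequency ratio.
    # Balanced labels get weight near 1; highly imbalanced (marginal)
    # labels get weight near 0. Illustrative choice, not the thesis's formula.
    pos = Y.mean(axis=0)
    return np.minimum(pos, 1 - pos) / np.maximum(np.maximum(pos, 1 - pos), 1e-12)

def entropy(x):
    # Shannon entropy (in bits) of a discrete 1-D array.
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_info(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y) for discrete 1-D arrays.
    joint = np.array([f"{a},{b}" for a, b in zip(x, y)])
    return entropy(x) + entropy(y) - entropy(joint)

def weighted_feature_scores(X, Y):
    # Score each discrete feature by label-weighted mutual information:
    # relevance to balanced labels dominates; marginal labels are weakened.
    w = label_weights(Y)
    return np.array([
        sum(w[j] * mutual_info(X[:, i], Y[:, j]) for j in range(Y.shape[1]))
        for i in range(X.shape[1])
    ])
```

Ranking features by descending score and keeping the top k would then give the "more effective feature sequence" the abstract refers to, with the label weights playing the role of the label-otherness / marginal-label correction.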
Keywords/Search Tags: multi-label learning, feature selection, information entropy, label otherness, imbalanced data, marginal label