
The Research Of Multi-label Feature Selection Based On Mutual Information And Feature Label Relationship

Posted on: 2022-10-02 | Degree: Master | Type: Thesis
Country: China | Candidate: W Meng
GTID: 2518306485950219 | Subject: Computer application technology
Abstract/Summary:
In the era of big data, feature selection, as a data preprocessing technique, plays an increasingly important role in machine learning. At present, most feature selection methods are designed for single-label data. However, as data dimensionality and the number of labels keep growing, feature selection has been widely applied to multi-label data, where it has improved classification performance. Most traditional multi-label feature selection algorithms measure the correlation between features and the label set as a whole and select a subset of relevant features. However, the labels in multi-label data have complex structural relationships. If only the correlation between features and labels is measured, without considering the structure among labels, some important features may be missed. Moreover, if label structure is considered from only a single perspective, the label set may not be explored sufficiently, which affects the selection of the optimal feature subset. In addition, multi-label data contain complex structural relationships not only among labels but also among features. Therefore, how to fully exploit and combine the structural relationships among labels and among features to select the optimal feature subset is an important research problem. To address these problems, this thesis studies multi-label feature selection in three parts.

First, we propose LG-MLFS, a multi-label feature selection algorithm based on label groups, which considers the correlation structure among the labels in the label set. The algorithm exploits the group structure of the label set by placing related labels in the same group, and assigns each label an importance weight within its group. It then selects the features relevant to each label group separately and takes the union of these per-group feature subsets as the final selected subset. Experimental results on multiple data sets and evaluation metrics show that LG-MLFS achieves better classification performance than the compared algorithms.

Second, considering label structure from multiple perspectives, we propose MLSFF, a multi-label feature selection method that integrates multi-view label structure with features. Based on three kinds of label structure relationships, the algorithm extracts three different feature subsets. By fusing these three subsets, the feature space is divided into three subspaces of different importance. A different selection ratio is set for each subspace, and features with low redundancy are chosen from each one. Experimental results show that MLSFF selects better feature subsets and achieves better classification results.

Finally, we propose CFGFS, a multi-label feature selection algorithm based on correlated feature groups, which considers the correlation structure among features together with the importance structure of labels. The algorithm places related features in the same feature group. According to label importance, the label set is divided into an important-label group and a non-important-label group, and the two groups are assigned different weights. Using these two weighted label groups, representative features are selected from each feature group, and redundant features are then removed. Experimental results on multiple data sets and evaluation metrics show that CFGFS selects an optimal feature subset and yields better classification performance.
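The label-group idea behind LG-MLFS (group related labels, weight labels within each group, select features per group, take the union) can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the greedy grouping rule with a mutual-information threshold, the within-group weighting by summed mutual information, the fixed top-`k` per group, and all function names (`group_labels`, `label_weights`, `select_features`) are hypothetical. Features and labels are assumed to be discrete columns of equal length.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def group_labels(labels, threshold):
    """Hypothetical rule: put each label into the first group whose seed label
    shares more than `threshold` bits of mutual information with it."""
    groups = []
    for j, y in enumerate(labels):
        for g in groups:
            if mutual_information(labels[g[0]], y) > threshold:
                g.append(j)
                break
        else:  # no sufficiently related group found: start a new one
            groups.append([j])
    return groups


def label_weights(labels, group):
    """Hypothetical weighting: each label's summed MI with the other labels in
    its group, normalized to 1 (uniform if the total is zero)."""
    raw = [sum(mutual_information(labels[a], labels[b]) for b in group if b != a)
           for a in group]
    total = sum(raw)
    if total == 0:
        return [1 / len(group)] * len(group)
    return [r / total for r in raw]


def select_features(features, labels, threshold=0.5, k=1):
    """Score features per label group by weighted MI, keep the top k of each
    group, and return the union of the per-group selections."""
    selected = set()
    for group in group_labels(labels, threshold):
        weights = label_weights(labels, group)
        scores = [
            sum(w * mutual_information(f, labels[j]) for w, j in zip(weights, group))
            for f in features
        ]
        ranked = sorted(range(len(features)), key=lambda i: -scores[i])
        selected.update(ranked[:k])
    return sorted(selected)


if __name__ == "__main__":
    # Toy data: labels 0 and 1 are identical; label 2 is nearly independent of them.
    labels = [[1, 1, 0, 0, 1, 0], [1, 1, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0]]
    # Feature 0 copies label 0, feature 1 copies label 2, feature 2 is constant.
    features = [[1, 1, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0], [0, 0, 0, 0, 0, 0]]
    print(group_labels(labels, 0.5))                         # [[0, 1], [2]]
    print(select_features(features, labels, threshold=0.5, k=1))  # [0, 1]
```

On the toy data, labels 0 and 1 form one group and label 2 forms its own, so each group contributes its best-matching feature and the union is `[0, 1]`. Taking the union keeps features that matter to only one label group; the sketch omits the redundancy removal that the abstract describes for MLSFF and CFGFS.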
Keywords/Search Tags: Machine learning, Multi-label, Feature, Feature selection, Mutual information