
Multi-Label Feature Selection And Its Label-Specific Acquisition Algorithm

Posted on: 2021-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: D D Zhao
Full Text: PDF
GTID: 2428330626960970
Subject: Statistical information technology
Abstract/Summary:
In recent years, big data and artificial intelligence technology have developed rapidly, which has in turn promoted the development of multi-label learning. Multi-label learning has gradually become a key research topic for many scholars and experts, and good results have been achieved in this area. In particular, multi-label feature selection has received increasing attention in data mining and machine learning: a large number of algorithms have been proposed to reduce the dimensionality of the feature space, and they have been applied successfully in a variety of fields. The purpose of multi-label feature selection is to achieve dimensionality reduction by selecting highly discriminative features that maximize relevance to the labels while minimizing redundancy among themselves. Unlike multi-label feature extraction, feature selection chooses features from the original feature space without any transformation, so the physical meaning of the original features is well preserved. Because of this readability and interpretability, multi-label feature selection has become a focus for many researchers.

Most existing feature selection algorithms use information entropy and related measures to determine relevance, and conditional probability to determine redundancy. These methods not only require prior knowledge but also involve complicated calculations. Moreover, in multi-label feature selection, each label has its own unique attributes, and these label-specific attributes have strong discriminative power for that label. Strengthening research on label-specific attributes therefore makes multi-label learning more efficient. Motivated by these problems, this thesis proposes two feature selection algorithms. The main contents are as follows:

(1) The first algorithm uses the rough-set membership degree and the Kendall correlation coefficient for feature selection. A characteristic of rough-set computation is that it analyzes data without requiring prior knowledge, uncovering hidden knowledge and revealing underlying laws. In this thesis, the rough-set membership degree is used to calculate the relevance between each feature and the label space, and the feature with the highest relevance is placed in the selected subset. The Kendall correlation coefficient is then used to calculate the redundancy between the selected features and each unselected feature, and the feature with the largest difference between relevance and redundancy is added to the selected subset. After each feature is added, the redundancy is recalculated, until all features have been ranked. Finally, the first k features of the ranking are taken as the reduced feature subset for training and classification testing. Experimental results on multiple data sets illustrate the effectiveness of the algorithm.

(2) The second algorithm first computes a sparse representation of the features and then calculates the mutual information of the sparse features. Labels have their own unique attributes, namely label-specific attributes, which can be expressed sparsely when selecting features. However, features processed through the label-specific transformation may still contain redundancy. This thesis therefore combines information entropy to calculate the mutual information of all features in the new feature space, ranks the features by the size of their mutual information, and takes the top 90% as the final feature subset. Tests on multi-label data sets show that the algorithm is highly feasible.
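The greedy relevance-minus-redundancy loop of algorithm (1) can be sketched as follows. This is a minimal sketch, not the thesis's implementation: the rough-set membership degree is replaced here by a placeholder relevance score (mean absolute Kendall correlation between the feature and each label column), and the names `kendall_tau`, `relevance`, and `greedy_select` are illustrative, not from the thesis.

```python
import numpy as np

def kendall_tau(x, y):
    """Naive O(n^2) Kendall rank correlation between two 1-D arrays."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def relevance(feature, labels):
    """Placeholder relevance score between a feature and the label space.
    (The thesis uses the rough-set membership degree here; that computation
    is not reproduced in this sketch.)"""
    return float(np.mean([abs(kendall_tau(feature, labels[:, j]))
                          for j in range(labels.shape[1])]))

def greedy_select(X, Y, k):
    """Greedy max-relevance / min-redundancy selection.
    X: (n_samples, n_features) feature matrix, Y: (n_samples, n_labels)."""
    n_features = X.shape[1]
    rel = np.array([relevance(X[:, f], Y) for f in range(n_features)])
    selected = [int(np.argmax(rel))]          # most relevant feature first
    candidates = set(range(n_features)) - set(selected)
    while candidates and len(selected) < k:
        best, best_score = None, -np.inf
        for f in candidates:
            # Redundancy: mean |tau| between candidate and selected features.
            red = np.mean([abs(kendall_tau(X[:, f], X[:, s]))
                           for s in selected])
            score = rel[f] - red              # relevance minus redundancy
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

After the ranking terminates, the first k features would be used to train and test a multi-label classifier, as described above.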
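The ranking step of algorithm (2) can likewise be sketched. The label-specific sparse-representation step is omitted here; the sketch only shows the second stage, assuming discrete-valued features: compute the empirical mutual information between each feature and a label vector, rank, and keep the top fraction (90% in the thesis). The names `mutual_information` and `select_top_by_mi` are hypothetical.

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete
    1-D arrays of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * np.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

def select_top_by_mi(X, y, frac=0.9):
    """Rank features by mutual information with the label vector y
    and keep the top `frac` of them (indices of the kept features)."""
    scores = np.array([mutual_information(X[:, j], y)
                       for j in range(X.shape[1])])
    k = int(np.ceil(frac * X.shape[1]))
    return np.argsort(-scores)[:k].tolist()
```

A constant feature has zero mutual information with every label and is therefore the first to be discarded by the cut, which is the kind of redundancy the final ranking step is meant to remove.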
Keywords/Search Tags: multi-label learning, feature selection, label-specific, membership degree, Kendall correlation coefficient, mutual information