
Research On Feature Selection And Multi-label Classification Algorithm

Posted on: 2022-02-09  Degree: Master  Type: Thesis
Country: China  Candidate: J L Lu  Full Text: PDF
GTID: 2518306575968649  Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of information technology, multi-label classification has been studied in depth and widely applied as an important means of handling massive amounts of information. Because multi-label data are so varied, multi-label classification models are more complex than traditional single-label ones. Open problems remain, including high data dimensionality, class imbalance, label order, and label correlation. To improve the performance of multi-label classification algorithms, this thesis conducts research on the following two aspects:

1. To address the high dimensionality and class imbalance of multi-label data, a minimum redundant mutual information (MRMI) feature selection algorithm is proposed. The algorithm uses the amount of information shared between two features to judge whether that information is redundant. It then uses the mutual information between the feature set and the label set to judge the importance of the feature set to the label set, and selects the features of high importance. The shared information between features is used to reduce redundancy in the feature set, so that the selected features maximize importance to the label set while minimizing redundancy within the feature set. Experiments on 8 UCI data sets, comparing against different feature selection algorithms with ML-kNN and SVM classifiers, demonstrate the effectiveness and feasibility of the method.

2. To address the problem that multi-label classification algorithms do not make full use of label order and label correlation, a label selection ordered classifier chain (LS-OCC) multi-label classification algorithm is proposed. The algorithm builds on the idea of the classifier chain algorithm. First, the labels are sorted to form an ordered classifier chain, which reduces the propagation of erroneous information along the chain. Labels are then selected by calculating the correlation between labels, which both accounts for label correlation and reduces information redundancy in the attribute space of each classifier. In experiments on 8 multi-label benchmark data sets from Mulan, comparing against different classification algorithms, LS-OCC performs well on three evaluation metrics (accuracy, Hamming loss, and Macro-F1), which demonstrates the classification performance of the method.
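
The abstract does not give the exact MRMI scoring formula, so the following is only a minimal sketch of the general idea behind the first contribution: prefer features with high mutual information with the labels and low mutual information with features already chosen. It assumes discrete feature values, sums per-label mutual information as a stand-in for relevance to the whole label set, and uses a "relevance minus mean redundancy" score. The function names `mutual_information` and `select_features`, and the toy data, are hypothetical and not taken from the thesis.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_features(features, labels, k):
    """Greedily pick k feature names (assumed scoring, not the thesis's formula).

    features: dict mapping feature name -> list of discrete values
    labels:   list of label columns, each a list of 0/1 values
    """
    # Relevance: sum of mutual information with every label column (assumption).
    relevance = {
        name: sum(mutual_information(col, lab) for lab in labels)
        for name, col in features.items()
    }
    selected, remaining = [], list(features)

    def score(name):
        if not selected:
            return relevance[name]
        # Redundancy: mean mutual information with already selected features.
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: f2 duplicates f1, so it carries no new information once f1 is chosen.
f1 = [0, 0, 1, 1, 0, 0, 1, 1]
f3 = [0, 1, 0, 1, 0, 1, 0, 1]
features = {"f1": f1, "f2": list(f1), "f3": f3}
labels = [f1, f3]  # two label columns, each fully determined by one feature

selected = select_features(features, labels, k=2)
print(selected)
```

On this toy data the duplicated feature is penalized by the redundancy term, so the two selected features should not be `f1` and `f2` together. A real implementation would also need a discretization step for continuous features and some handling of class imbalance, neither of which is specified in the abstract.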
Keywords/Search Tags: multi-label learning, feature selection, maximum correlation, minimum redundancy, label correlation