
Research On Multi-label Feature Selection And Classifier Chains Algorithms

Posted on: 2019-03-17    Degree: Master    Type: Thesis
Country: China    Candidate: M Q Tian
GTID: 2428330578972068    Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid development of machine learning, classification, an important part of machine learning, has been widely studied and applied. Classification usually means single-label classification, which assigns each unknown instance to exactly one category. However, many real-world applications are multi-label: an instance belongs to several categories at the same time. Multi-label classification was first applied to text classification and has since been studied and applied in many other scenarios, such as image labeling, music emotion classification, bioinformatics, and information retrieval. Multi-label feature selection and multi-label classification are two important parts of multi-label research. Because of the nature of multi-label data, feature selection is more complex in the multi-label setting than in the single-label setting, and existing multi-label classification algorithms still leave considerable room for improvement. This thesis studies both aspects, multi-label feature selection and multi-label classification. The main contributions are as follows:

(1) A multi-label feature selection algorithm based on a genetic algorithm and Maximum Correlation Minimum Redundancy (MLFS-GM) is proposed. Building on the genetic algorithm and the maximum-correlation minimum-redundancy strategy, MLFS-GM takes into account the correlation among labels, the redundancy among features, and the correlation between features and the label set. It models these correlations and redundancies with mutual information from information theory and uses them to define the fitness function of the genetic algorithm. Experiments on several public multi-label datasets show that, on most evaluation metrics, MLFS-GM outperforms GA-ML-CFS, which also employs a genetic algorithm, and MLFSIE, which uses information gain to model the correlation between features and the label set.

(2) To address error propagation and the randomly generated chain order of the traditional classifier chain method, Classifier Chains for Multi-Label Classification based on Label Set Partition and Greedy Strategy (CC-LPGS) is proposed. CC-LPGS consists of two steps. First, the correlation between labels is modeled with mutual information: the correlation graph and correlation matrix of the label set are constructed using symmetric uncertainty, and the NJW spectral clustering algorithm is then used to partition the label set into subsets. Second, for each label subset, a greedy search strategy iteratively builds the complete classifier chain for that subset. The training set is randomly divided into two parts, a construction set and an evaluation set: the construction set is used to train the current candidate classifiers, and the evaluation set is used to measure their performance. At each step, the candidate classifier with the highest score is appended to the sub-chain, so that the whole classifier chain is generated iteratively. Experiments on several public multi-label datasets show that CC-LPGS outperforms the traditional classifier chain algorithm on all evaluation metrics and outperforms other multi-label classification algorithms on some of them.
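The fitness idea in contribution (1) can be illustrated with a small Python sketch. It assumes discrete feature and label columns, estimates mutual information from empirical counts, and scores a binary chromosome (1 = feature kept) as the average feature-label mutual information minus the average pairwise feature-feature mutual information, in the spirit of max-relevance/min-redundancy. The names `mutual_information`, `mrmr_fitness`, and `genetic_feature_selection`, the averaging scheme, and the GA operators (elitism, size-2 tournaments, one-point crossover, bit-flip mutation) are illustrative assumptions. The abstract does not give MLFS-GM's exact fitness function, and this sketch leaves out the label-label correlation term that MLFS-GM also models.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # sum over observed (a, b): p(a,b) * log2(p(a,b) / (p(a) * p(b)))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_fitness(chromosome, X, Y):
    """Hypothetical max-relevance/min-redundancy fitness (not the thesis's exact formula).

    chromosome: list of 0/1, one bit per feature (1 = selected)
    X: list of discrete feature columns; Y: list of discrete label columns
    The label-label correlation term used by MLFS-GM is deliberately omitted here.
    """
    selected = [j for j, bit in enumerate(chromosome) if bit]
    if not selected:
        return float("-inf")  # an empty feature subset is never acceptable
    # relevance: average MI between each selected feature and each label
    relevance = sum(mutual_information(X[j], y) for j in selected for y in Y) / (len(selected) * len(Y))
    # redundancy: average MI over all pairs of selected features
    pairs = list(combinations(selected, 2))
    redundancy = (sum(mutual_information(X[i], X[j]) for i, j in pairs) / len(pairs)) if pairs else 0.0
    return relevance - redundancy


def genetic_feature_selection(X, Y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Simple GA over feature subsets that maximises mrmr_fitness (assumed operators)."""
    rng = random.Random(seed)
    d = len(X)
    cache = {}

    def fit(c):
        # memoise fitness, since chromosomes recur across generations
        key = tuple(c)
        if key not in cache:
            cache[key] = mrmr_fitness(c, X, Y)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fit, reverse=True)
        next_pop = [list(c) for c in ranked[:2]]  # elitism: keep the two best unchanged
        while len(next_pop) < pop_size:
            # size-2 tournament selection for each parent
            p1 = max(rng.sample(ranked, 2), key=fit)
            p2 = max(rng.sample(ranked, 2), key=fit)
            cut = rng.randint(1, d - 1) if d > 1 else 0  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation
            child = [1 - b if rng.random() < p_mut else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fit)
```

With `X` holding feature columns and `Y` holding label columns, `genetic_feature_selection(X, Y)` returns the best chromosome found. A duplicated feature raises the redundancy term, so the fitness prefers keeping only one copy of it.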
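The label-partition step of contribution (2) can be sketched in Python with NumPy. The sketch uses the standard symmetric uncertainty, SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), estimated from empirical counts, as the edge weight of the label correlation graph, and then follows the usual NJW recipe: normalise the affinity matrix as D^(-1/2) A D^(-1/2), take the k leading eigenvectors, rescale each row to unit length, and cluster the rows with k-means. The function names, the zero diagonal of the affinity matrix, the requirement that the caller supply k, and the deterministic farthest-first k-means initialisation (used instead of random restarts) are assumptions for illustration, not details taken from the thesis.

```python
import math
from collections import Counter

import numpy as np


def entropy(x):
    """Empirical Shannon entropy H(X) in bits."""
    n = len(x)
    return -sum((c / n) * math.log2(c / n) for c in Counter(x).values())


def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in bits."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def symmetric_uncertainty(x, y):
    """SU(X,Y) = 2 I(X;Y) / (H(X) + H(Y)), in [0, 1]; 0 if both variables are constant."""
    denom = entropy(x) + entropy(y)
    return 2.0 * mutual_information(x, y) / denom if denom > 0 else 0.0


def su_matrix(label_columns):
    """Weighted adjacency of the label correlation graph (zero diagonal assumed)."""
    q = len(label_columns)
    A = np.zeros((q, q))
    for i in range(q):
        for j in range(i + 1, q):
            A[i, j] = A[j, i] = symmetric_uncertainty(label_columns[i], label_columns[j])
    return A


def njw_spectral_clustering(A, k, max_iter=100):
    """NJW spectral clustering of the q labels described by affinity matrix A."""
    d = A.sum(axis=1)
    # D^(-1/2), with isolated labels (degree 0) mapped to 0 instead of inf
    inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.where(d > 0, d, 1.0)), 0.0)
    L = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    U = vecs[:, -k:]                     # k leading eigenvectors as columns
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    U = U / np.where(norms > 0, norms, 1.0)  # unit-length rows
    # farthest-first initialisation of the k-means centres
    centers = [U[0]]
    while len(centers) < k:
        dist = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[int(np.argmax(dist))])
    centers = np.array(centers)
    assign = None
    for _ in range(max_iter):
        dists = np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break  # assignments stable: converged
        assign = new_assign
        for c in range(k):
            if np.any(assign == c):  # keep the old centre if a cluster empties
                centers[c] = U[assign == c].mean(axis=0)
    return assign.tolist()
```

Each returned cluster id marks one label subset; in CC-LPGS a separate chain would then be built for every subset, so strongly dependent labels end up in the same chain.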
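The greedy chain construction in contribution (2) can also be sketched in Python. In this sketch the training rows are shuffled and split into a construction set and an evaluation set (30% by default). At each step, a candidate classifier is trained on the construction set for every label not yet in the chain, using the features plus the true values of the labels already chained, and is scored on the evaluation set using the chain's own predictions for those earlier labels; the highest-scoring label is appended. The decision stump on binary features is only a stand-in base classifier, per-label accuracy is an assumed score, and `fit_stump`, `greedy_chain`, and `predict_chain` are hypothetical names. The abstract specifies neither the thesis's base learner nor its scoring metric.

```python
import random


def fit_stump(rows, y):
    """Stand-in base classifier: a one-feature decision stump on binary inputs.

    Returns (j, flip): predict row[j] XOR flip, or, when j is None,
    the constant class `flip` (the majority class).
    """
    n = len(y)
    majority = 1 if 2 * sum(y) >= n else 0
    best_err, best_j, best_flip = sum(v != majority for v in y), None, majority
    for j in range(len(rows[0])):
        for flip in (0, 1):
            err = sum((r[j] ^ flip) != v for r, v in zip(rows, y))
            if err < best_err:  # strict <: earlier columns win ties
                best_err, best_j, best_flip = err, j, flip
    return best_j, best_flip


def predict_stump(model, row):
    j, flip = model
    return flip if j is None else row[j] ^ flip


def greedy_chain(X, Y, labels, eval_fraction=0.3, seed=0):
    """Greedily order `labels` into a classifier chain.

    X: list of binary feature rows; Y: list of dicts mapping label -> 0/1.
    """
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    n_eval = max(1, int(len(idx) * eval_fraction))
    evaluate, build = idx[:n_eval], idx[n_eval:]  # evaluation set / construction set
    chain, models = [], []
    eval_preds = {i: [] for i in evaluate}  # chained predictions on the evaluation set
    remaining = list(labels)
    while remaining:
        # construction inputs: features + TRUE values of labels already in the chain
        rows = [X[i] + [Y[i][c] for c in chain] for i in build]
        best = None
        for lab in remaining:
            model = fit_stump(rows, [Y[i][lab] for i in build])
            # evaluation inputs: features + PREDICTED values of earlier chain labels
            hits = sum(predict_stump(model, X[i] + eval_preds[i]) == Y[i][lab] for i in evaluate)
            score = hits / len(evaluate)  # assumed score: per-label accuracy
            if best is None or score > best[0]:  # ties keep the earlier candidate
                best = (score, lab, model)
        _, lab, model = best
        chain.append(lab)
        models.append(model)
        remaining.remove(lab)
        for i in evaluate:
            eval_preds[i].append(predict_stump(model, X[i] + eval_preds[i]))
    return chain, models


def predict_chain(chain, models, x):
    """Run the chain on one feature row, feeding each prediction forward."""
    preds = []
    for model in models:
        preds.append(predict_stump(model, x + preds))
    return dict(zip(chain, preds))
```

Applied to one label subset at a time (for example, each cluster produced by spectral clustering), `greedy_chain(X, Y, subset)` returns the chain order and its models, and `predict_chain` runs them in that order on a new instance. Because the evaluation set sees predicted rather than true values for earlier labels, the score of each candidate reflects the errors it would inherit along the chain.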
Keywords: Multi-label classification, Multi-label feature selection, Classifier chains, Spectral clustering, Genetic algorithm