Research On Multi-label Feature Selection And Classifier Chains Algorithms

Posted on:2019-03-17

Degree:Master

Type:Thesis

Country:China

Candidate:M Q Tian

Full Text:PDF

GTID:2428330578972068

Subject:Computer technology

Abstract/Summary:

In recent years, with the rapid development of machine learning, classification, an important part of the field, has been widely studied and applied. Classification usually refers to single-label classification, which assigns each unknown instance to a single category. However, many real-world application scenarios are multi-label: an instance belongs to multiple categories at the same time. Multi-label classification was first applied to text classification and has since been studied and applied in many scenarios, such as image labeling, music emotion classification, bioinformatics, and information retrieval. Multi-label feature selection and multi-label classification are two important parts of the multi-label research field. Because of the characteristics of multi-label data, feature selection is more complex for multi-label problems than for single-label ones, and existing multi-label classification algorithms still leave considerable room for improvement. This thesis studies both aspects: the multi-label feature selection algorithm and the multi-label classification algorithm. The main work of this thesis is as follows:

(1) A multi-label feature selection algorithm based on the genetic algorithm and Maximum Correlation Minimum Redundancy (MLFS-GM) is proposed. Building on the genetic algorithm and the maximum-correlation, minimum-redundancy strategy, MLFS-GM considers the correlation among labels, the redundancy among features, and the correlation between features and the label set. It uses mutual information from information theory to model correlation and redundancy, which yields the fitness function of the genetic algorithm. Experiments on multiple public multi-label datasets show that, on most evaluation metrics, this algorithm outperforms the GA-ML-CFS algorithm, which also employs a genetic algorithm, and the MLFSIE algorithm, which uses information gain to model the correlation between features and the label set.

(2) To address error propagation and the randomly generated chain order in the traditional classifier chain method, Classifier Chains for Multi-Label Classification based on Label Set Partition and Greedy Strategy (CC-LPGS) is proposed. CC-LPGS consists of two steps. First, the correlation between labels is modeled with mutual information, the correlation graph and correlation matrix of the label set are constructed using symmetric uncertainty, and the NJW spectral clustering algorithm is then used to cluster the label set. Second, for each label subset, a greedy search strategy iteratively generates the complete classifier chain of that subset. The training set is randomly divided into two parts, a construction set and an evaluation set: the construction set is used to train the current candidate classifiers, and the evaluation set is used to evaluate their performance. At each step, the classifier with the highest score is added to the sub-chain, and the whole classifier chain is generated iteratively. Experiments on multiple public multi-label datasets show that this algorithm outperforms the traditional classifier chain algorithm on all evaluation metrics and outperforms the other multi-label classification algorithms on some metrics.
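The abstract does not give the exact MLFS-GM fitness function, so the following is only a minimal sketch of a generic maximum-correlation, minimum-redundancy score built from mutual information. It assumes discrete feature values and binary labels, and it omits the correlation among labels that MLFS-GM also takes into account. The names `mrmr_fitness`, `features`, `labels`, and `selected` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mrmr_fitness(features, labels, selected):
    """Mean feature-label relevance minus mean feature-feature redundancy.

    features: dict mapping feature name -> list of discrete values
    labels:   dict mapping label name -> list of 0/1 values
    selected: list of feature names (a decoded GA chromosome, by assumption)
    """
    if not selected:
        return 0.0
    # Relevance: average mutual information between selected features and labels.
    relevance = sum(
        mutual_information(features[f], labels[l])
        for f in selected
        for l in labels
    ) / (len(selected) * len(labels))
    # Redundancy: average mutual information over pairs of selected features.
    pairs = [(a, b) for i, a in enumerate(selected) for b in selected[i + 1:]]
    redundancy = (
        sum(mutual_information(features[a], features[b]) for a, b in pairs)
        / len(pairs)
        if pairs
        else 0.0
    )
    return relevance - redundancy


# Toy data: f2 duplicates f1, while f3 carries information about the other label.
features = {"f1": [0, 0, 1, 1], "f2": [0, 0, 1, 1], "f3": [0, 1, 0, 1]}
labels = {"y1": [0, 0, 1, 1], "y2": [0, 1, 0, 1]}
complementary = mrmr_fitness(features, labels, ["f1", "f3"])
duplicated = mrmr_fitness(features, labels, ["f1", "f2"])
```

On this toy data the complementary pair `["f1", "f3"]` scores higher than the duplicated pair `["f1", "f2"]`, which is the behavior a genetic algorithm would reward when such a score is used as its fitness.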
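The label correlation matrix used in the first step of CC-LPGS can be illustrated with the standard definition of symmetric uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). The sketch below assumes each label is a column of 0/1 values; the function names and the handling of constant labels are illustrative choices, not details taken from the thesis.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(xs, ys):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        # Both variables are constant; treating them as uncorrelated
        # is an assumption made for this sketch.
        return 0.0
    mi = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * mi / (hx + hy)


def label_correlation_matrix(label_columns):
    """Pairwise symmetric uncertainty between label columns."""
    q = len(label_columns)
    return [
        [symmetric_uncertainty(label_columns[i], label_columns[j]) for j in range(q)]
        for i in range(q)
    ]


# Three labels over four instances: the first two are identical,
# the third is independent of both.
label_columns = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
W = label_correlation_matrix(label_columns)
```

A matrix like `W` could then serve as the affinity input to a spectral clustering routine that accepts a precomputed affinity matrix, so that strongly correlated labels end up in the same label subset.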

Keywords/Search Tags:

Multi-label classification, Multi-label feature selection, Classifier chains, Spectral clustering, Genetic algorithm

Related items

1	An Improved Multi-Label Classifier Chain Algorithm Via Label Space Correlation
2	Research On The Multi-label Feature Selection And Classification Methods With The Label Correlations
3	Feature Selection Method Research For Multi-label Classification
4	Research On Multi-label Classification Algorithm Based On Label Relationship
5	Research On Several Issues Of Multi-Label Feature Representation
6	Research On Multi-label Classification Based On Classifier Chains
7	Research On Multi-label Learning And Algorithms Based On Data And Label Correlations
8	A Study Of Feature Selection For Multi-Label Classification
9	Based On Decision Relevance Multi-label Classification And Feature Selection Algorithm
10	Research On Multi-label Feature Selection Algorithms Based On Random Search Strategy