
An Improved Multi-Label Classifier Chain Algorithm Via Label Space Correlation

Posted on: 2020-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Yang
Full Text: PDF
GTID: 2428330590471693
Subject: Computer Science and Technology
Abstract/Summary:
Traditional multi-label classifier chain algorithms have several limitations: the initial label chain order is chosen at random, classification performance is unstable, and they cannot deal effectively with large-scale multi-label data sets. This thesis proposes an improved multi-label classifier chain algorithm based on label space correlation (LSCC). Combining the advantages of label space dimensionality reduction with LSCC, it further proposes a label space dimension reduction algorithm via LSCC (LSDRCC). The main contents are as follows:

1. Multi-label classifier chain algorithms assume that the label at position k depends only on the first k-1 labels. In practice, a randomly initialized label chain does not satisfy this assumption. LSCC is proposed for feature selection and label chain order optimization on large-scale multi-label data sets. First, a distance formula is defined and the label space is partitioned by clustering. Predictions are then obtained by constructing several optimized local classifier chains in parallel, each ordered by an approximately optimal solution. Experiments on 12 multi-label data sets from 5 different domains, using 3 different types of base classifier, show that LSCC outperforms existing algorithms in classification accuracy and running time.

2. A feature selection method based on the mutual information of local label clusters is proposed to improve the adaptability of chain-based multi-label algorithms to large-scale multi-label data sets. The relative mutual information between local labels and features is used to select a local feature subset for each label cluster.

3. LSDRCC is proposed. It optimizes label space dimension reduction in three stages: label coding, model training, and hidden label decoding. It reduces the running time of classification tasks and improves the adaptability of the improved classifier chain algorithm to large-scale data sets. In addition, this thesis implements the algorithm on the Spark parallel computing framework, which makes full use of the advantages of in-memory computing.
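The two ideas described in the abstract, partitioning the label space with a label distance and then choosing a local feature subset per label cluster by mutual information, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the thesis's implementation: the abstract does not give the distance formula or the clustering procedure, so Jaccard distance and a greedy threshold clustering are used as stand-ins, and the function names, the `threshold` parameter, and the toy data are all hypothetical.

```python
from math import log

def jaccard_distance(a, b):
    """Distance between two binary label columns (assumed stand-in metric)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 if either == 0 else 1.0 - both / either

def cluster_labels(Y, threshold=0.5):
    """Greedy clustering: a label joins the first cluster whose seed label
    is within `threshold`; otherwise it starts a new cluster."""
    n_labels = len(Y[0])
    columns = [[row[j] for row in Y] for j in range(n_labels)]
    clusters = []
    for j in range(n_labels):
        for cluster in clusters:
            if jaccard_distance(columns[cluster[0]], columns[j]) <= threshold:
                cluster.append(j)
                break
        else:
            clusters.append([j])
    return clusters

def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    return sum((c / n) * log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())

def select_features(X, Y, cluster, k):
    """Keep the k features with the highest summed mutual information
    against the labels in `cluster`; returns sorted feature indices."""
    scores = []
    for f in range(len(X[0])):
        xf = [row[f] for row in X]
        score = sum(mutual_information(xf, [row[j] for row in Y])
                    for j in cluster)
        scores.append((score, f))
    scores.sort(key=lambda t: (-t[0], t[1]))
    return sorted(f for _, f in scores[:k])

# Toy data (hypothetical): 6 samples, 3 binary features, 3 binary labels.
# Labels 0 and 1 tend to co-occur; feature 0 tracks label 0, feature 1 tracks label 2.
X = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [0, 1, 0], [0, 1, 1], [0, 0, 0]]
Y = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [0, 0, 1], [0, 0, 1], [0, 0, 0]]

clusters = cluster_labels(Y)
local_features = {tuple(c): select_features(X, Y, c, k=1) for c in clusters}
```

Each label cluster could then be given its own classifier chain, trained only on its local feature subset; since the clusters are independent of one another, those chains could be built in parallel, which is the property the abstract relies on for large-scale data sets.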
Keywords/Search Tags:multi-label classification, classifier chain, label clustering, feature selection, Spark