Font Size: a A A

Ensemble Learning Based Partial Label Learning Algorithm

Posted on:2020-07-27Degree:MasterType:Thesis
Country:ChinaCandidate:Y Q LuFull Text:PDF
GTID:2428330599459805Subject:Engineering
Abstract/Summary:PDF Full Text Request
With the rapid development of big data era,it is generally accepted that use of machine learning to process and analyze big data has been the most effective method.In various fields,however,the massive characteristics of data and the specificity of data tags are presented,which makes traditional machine learning methods unable to process data.Partial label learning is the product of a weak supervised learning framework dealing with the specificity of data markers.The difference between this framework and traditional weakly-supervised learning framework lies in instance tag submerged in many candidate tag sets which is unclear and single.Such an instance is quite common in reality,and partial label learning framework is essentially an extension of traditional weakly-supervised learning.Partial label learning is a special kind of multi-class classification problem and aims to obtain a multi-class classifier by training set.The multi-class classification problem can be converted to construct multiple binary classification problems,However,it seldom considers the poor classification performance and robustness caused by the imbalance of the number of classes in the data set;There are few researches on partial label learning algorithms.Many existing machine learning algorithms can be applied to deal with partial label learning problems.Based on this problem,this paper mainly makes a few aspects.1.In this paper,an integrated partial label learning method for KD tree equilibrium training set is proposed.Use of fast retrieval of KD tree,the number of positive and negative samples in the partition tends to be balanced,and then the stacking method in integrated learning is used for training.The best way to predict for unknown samples is to summate by voting.Experiments on public UCI data sets and real data sets.Experiments show that the proposed ensemble algorithm of KD tree balanced training set has better expressiveness.2.The ECOC algorithm based on the maximum difference of the sample features is proposed in this paper.From the ECOC framework characteristics,the sample data set with the largest feature difference is found,and the base classifier with larger difference is trained.The ECCD binary coding matrix is the largest through the feature difference.The sample candidate label or operation is constructed as a column code,and finally the prediction of the sample is compared by the data of each classifier and the coding matrix to achieve prediction.Experiments on open UCI data sets and real data sets show that the proposed ECOC algorithm based on the maximum difference of sample features has better performance.
Keywords/Search Tags:partial label learning, candidate label, K-dimensional tree, ensemble learning, balanced training set, output error correction coding, feature difference, multi-class classification
PDF Full Text Request
Related items