Research On Adaptive Ensemble Learning Based Clustering

Posted on: 2018-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhang
Full Text: PDF
GTID: 2348330569986481
Subject: Computer technology
Abstract/Summary:
Ensemble learning is one of the most popular directions in machine learning. It combines a number of learners to solve the same problem and can significantly improve the generalization performance of the overall learning system. Thanks to the sustained efforts of many researchers over the past decade, ensemble learning has matured from its early stages and produced many results, but it still falls short of practical requirements, so more in-depth analysis and research is needed. In traditional ensemble learning, existing studies focus on how to build and combine base learners to reduce training loss, while little attention has been paid to the fitness between testing instances and base learners. Drawing on the relationship between traditional ensemble learning and lazy learning, we therefore propose two algorithms: adaptive AdaBoost based on K-means, and adaptive random forest based on LDA topic clustering.

1. Adaptive AdaBoost based on K-means. The algorithm builds on the base learners generated by AdaBoost. During training, AdaBoost gradually increases the weights of misclassified samples, so the relative weights of correctly classified samples can become too low. As a result, each trained learner tends to perform well in some regions of the sample space while neglecting others. To tackle this problem, the training samples are grouped into several subsets, and the fitness between each group C and each base learner H is calculated from the error rate of H on C. In the testing phase, the similarity between the testing sample and the training subsets is used to determine the fitness between the testing sample and the base learners. Experimental results show that the discriminative capability of the proposed algorithm is better than that of traditional AdaBoost on 10 standard UCI datasets.

2. Adaptive random forest based on LDA topic clustering. First, the base learners generated by the random forest algorithm are used to construct a residual error space for the training set. Second, the residual error space is converted into a document set according to certain rules, and the LDA topic model is applied to this document set to generate meta-features related to the base learners. These meta-features and the base learner models are then fed to the FWLS (Feature-Weighted Linear Stacking) algorithm, which learns a series of coefficients. Combining the coefficients with the corresponding meta-features determines the weight of each base learner. In the testing phase, the meta-features describing how well the testing sample fits each base learner are obtained through the connection between the testing sample and the training set. Combined with the previously learned coefficients, these give each base learner a weight specific to the testing sample, so that appropriate weights for the base classifiers are assigned dynamically for different testing samples. Experimental results show that the discriminative capability of the proposed algorithm is better than that of the traditional random forest algorithm on 10 standard UCI datasets.
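The test-time weighting idea of the first algorithm can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: fitness is taken as one minus the error rate of H on C, similarity as normalized inverse Euclidean distance to each cluster centroid, and the final vote as the AdaBoost weight multiplied by the sample-specific fitness. The function names, the toy learners, and the toy clusters are all hypothetical, and the K-means step itself is assumed to have already produced the clusters and centroids.

```python
import math

def cluster_fitness(learners, clusters):
    """fitness[c][h] = 1 - error rate of learner h on training cluster c (assumed form)."""
    table = []
    for samples in clusters:  # samples: list of (x, y) pairs with y in {-1, +1}
        row = []
        for h in learners:
            errors = sum(1 for x, y in samples if h(x) != y)
            row.append(1.0 - errors / len(samples))
        table.append(row)
    return table

def similarities(x, centroids):
    """Normalized inverse-distance similarity of x to each centroid (assumed form)."""
    raw = [1.0 / (1e-9 + math.dist(x, c)) for c in centroids]
    total = sum(raw)
    return [r / total for r in raw]

def adaptive_predict(x, learners, alphas, centroids, fitness):
    """Weighted vote where each learner's AdaBoost weight is scaled by its fitness for x."""
    sims = similarities(x, centroids)
    score = 0.0
    for j, (h, alpha) in enumerate(zip(learners, alphas)):
        # Fitness of learner j for x: similarity-weighted average over clusters.
        fit = sum(s * fitness[c][j] for c, s in enumerate(sims))
        score += alpha * fit * h(x)
    return 1 if score >= 0 else -1

# Hypothetical toy setup: two threshold learners and two pre-computed clusters.
learners = [
    lambda x: 1 if x[0] > 0 else -1,  # looks only at the first coordinate
    lambda x: 1 if x[1] > 0 else -1,  # looks only at the second coordinate
]
alphas = [1.0, 1.0]  # stand-ins for the AdaBoost learner weights
clusters = [
    [((-2.0, 1.0), -1), ((-2.0, -1.0), -1)],  # first learner is reliable here
    [((2.0, 1.0), 1), ((2.0, -1.0), -1)],     # second learner is reliable here
]
centroids = [(-2.0, 0.0), (2.0, 0.0)]
fitness = cluster_fitness(learners, clusters)

print(adaptive_predict((2.0, -1.0), learners, alphas, centroids, fitness))
```

In this toy setup the two learners disagree on the point (2.0, -1.0) and carry equal AdaBoost weights, so an unweighted vote would be a tie; the fitness term lets the learner that is more accurate on the nearby cluster decide the vote.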
Keywords/Search Tags:ensemble learning, clustering, adaptive, meta-features, FWLS, LDA topic model