
Hierarchical Ensemble Learning For Resource-Constrained Computing

Posted on: 2021-12-27  Degree: Doctor  Type: Dissertation
Country: China  Candidate: H F Wang  Full Text: PDF
GTID: 1488306107958249  Subject: Computer software and theory
Abstract/Summary:
Ensemble learning is an important class of machine learning strategies. An ensemble of machine learning classifiers usually improves generalization performance and is useful for many applications. However, training and using machine learning models require resources, including but not limited to memory storage, execution time, energy consumption, and other material and human resources. The extra memory storage and computational cost incurred by combining models often limit their potential applications, especially when large ensembles are employed; the computational burden has become a bottleneck for both training and deploying ensemble models. Ensemble reduction, also called ensemble selection or ensemble pruning, is an effective approach to these resource-constrained issues and an active research area in ensemble learning. Ensemble reduction selects a subset of base learners from an initial pool of models and discards the rest. The reduced ensemble is expected to attain the same level of prediction accuracy as the original ensemble while saving considerable storage and computation. Research on ensemble reduction therefore has profound practical and theoretical significance.

This thesis investigates several issues in ensemble reduction and learning for resource-constrained computing. First, an ensemble-reduction algorithm is designed to cut down ensemble sizes, so that the computational cost incurred by a large population of base learners can be reduced. Next, by embedding this ensemble-reduction algorithm, a hierarchical ensemble learning framework is proposed to boost ensemble accuracy and training speed while keeping ensembles at moderate sizes. Based on this hierarchical framework, a method is developed to enable direct multiclass classification, without executing a set of binary classifiers multiple times at the cost of additional computational resources. Finally, the thesis studies hardware implementation of ensemble learning systems, which also tests the performance of the proposed ensemble-reduction algorithm and learning framework, especially under resource-constrained conditions.

The main contributions of this thesis are summarized as follows.

First, a new ensemble-reduction method is proposed that significantly reduces memory storage and computation. The method uses a logic-minimization technique from digital circuit design to select and combine particular classification models from an initial pool into a Boolean function, through which the reduced ensemble performs classification. To the best of our knowledge, we are the first to recognize that ensemble reduction can be formulated as a logic-minimization problem. Experiments demonstrate that the method either outperforms or is very competitive with the initial Bagging ensemble in terms of generalization error. On average it retains only 9.43% of the base learners from the initial pool, and it attains a reduction ratio above 97% in the best case. Among all surveyed reduction methods, it identifies the smallest numbers of models in the reduced ensembles.

Next, a novel hierarchical ensemble learning framework is proposed. The framework first constructs an initial pool of decision-tree models with the random forest algorithm. The forest is then divided into groups by several partition strategies we devised, and ensemble reduction is applied within each group to select and combine particular models. In this way, the initial problem of reducing a large pool is transformed into a number of small per-group reduction tasks, so that logic minimization generalizes better and runs more efficiently. The predictions of the selected trees are then aggregated across groups by voting to produce the final output. The hierarchical architecture also provides a tuning mechanism for resource-aware computing, as the ensemble can be dynamically adjusted to fit the available computing resources for the best possible performance. The framework significantly reduces initial ensemble sizes while still enjoying the accuracy advantage of a model ensemble: it outperforms random forests on all datasets in our experiments.

Third, a method is proposed to tackle multiclass classification problems directly. Previously, when classifiers could not handle more than two classes, a multiclass classification problem was transformed into a set of binary ones. Here a direct solution is given, without any transformation: class labels are encoded as binary logic values, and multi-output logic synthesis derives one Boolean function per bit of the class-label encoding. Experiments verified the reliability and feasibility of the method; it achieves smaller ensemble sizes while its accuracy remains higher than that of random forests.

Last, an automatic software-to-hardware conversion paradigm is proposed that maps decision-tree ensembles to basic gates and flip-flops, which can be embedded into hardware. The hardware exploration serves as an actual application of the ensemble methods in this work, whose area utilization and energy consumption are compared. Note that these ensemble methods, including the random forest algorithm, were originally developed as purely software-based methods; this physical level has not been tackled by previous ensemble-reduction work, and the hardware comparison is made possible by the conversion paradigm developed here. The paradigm supports many tree-based ensemble methods and can be useful in hardware/software codesigns that integrate machine learning models.

To sum up, the hierarchical ensemble learning framework is the core of this study. The partition strategies and the corresponding training methods can effectively control the ensemble's behavior, including prediction accuracy, power consumption, memory occupancy, and the speed of data analytics. The multiclass classification functionality adds to the comprehensiveness of the framework, and logic synthesis is the key technique underlying it. Extensive experiments from both hardware and software perspectives verified our proposed ensemble algorithms and methods.
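To make the logic-minimization formulation concrete, the sketch below expresses a small ensemble's vote as a Boolean function of base-learner outputs. Once the combiner is a Boolean function, a standard logic minimizer can simplify it and drop learners whose variables vanish from the minimized expression. The majority-of-three combiner and its minimal sum-of-products form are textbook illustrations, not the thesis's actual models or minimizer.

```python
from itertools import product

def majority(a, b, c):
    """Original combiner: simple majority vote over three learners."""
    return int(a + b + c >= 2)

def minimized(a, b, c):
    """Minimal two-level (sum-of-products) form of majority-of-three:
       ab + ac + bc -- the expression a logic minimizer would emit."""
    return int((a and b) or (a and c) or (b and c))

# The two forms agree on the full truth table of learner outputs.
table = [(bits, majority(*bits), minimized(*bits))
         for bits in product((0, 1), repeat=3)]
assert all(m == n for _, m, n in table)
```

In the thesis's setting the truth table is built from base-learner predictions on training data, so minimization can also merge or eliminate learners that are redundant on that data.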
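The hierarchical framework's control flow can be sketched as: partition a model pool into groups, reduce each group independently, then vote across groups. Everything below is an illustrative stand-in: the pool of threshold "stumps", the fixed two-way partition, and the per-group greedy selector `reduce_group` (the thesis selects models by logic minimization, not by individual accuracy).

```python
def accuracy(models, data):
    """Majority-vote accuracy of a model subset on labeled data."""
    hits = 0
    for x, y in data:
        votes = sum(m(x) for m in models)
        hits += int((votes * 2 > len(models)) == bool(y))
    return hits / len(data)

def reduce_group(group, data, keep=1):
    """Stand-in reducer: keep the `keep` individually best models."""
    ranked = sorted(group, key=lambda m: accuracy([m], data), reverse=True)
    return ranked[:keep]

def hierarchical_predict(groups, x):
    """Each group votes over its selected models; groups then vote."""
    group_votes = []
    for selected in groups:
        votes = sum(m(x) for m in selected)
        group_votes.append(int(votes * 2 > len(selected)))
    return int(sum(group_votes) * 2 > len(group_votes))

# Toy task: label is 1 when x > 0.5; pool of six threshold stumps.
data = [(i / 10, int(i / 10 > 0.5)) for i in range(11)]
pool = [lambda x, t=t: int(x > t) for t in (0.1, 0.3, 0.5, 0.6, 0.8, 0.9)]
groups = [pool[:3], pool[3:]]                  # partition into two groups
reduced = [reduce_group(g, data) for g in groups]
```

The `keep` parameter plays the role of the tuning knob mentioned above: enlarging or shrinking the per-group selections trades accuracy against memory and computation at run time.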
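The direct multiclass scheme rests on encoding each class label in binary and learning one two-valued predictor per bit. A minimal sketch of the encode/decode half follows; the class names are hypothetical placeholders, and the per-bit predictors (omitted here) would in the thesis be Boolean functions derived by multi-output logic synthesis.

```python
from math import ceil, log2

CLASSES = ["setosa", "versicolor", "virginica"]   # hypothetical K = 3 classes
BITS = ceil(log2(len(CLASSES)))                   # 2 bits suffice for 3 classes

def encode(label):
    """Class label -> tuple of bits (most significant bit first)."""
    idx = CLASSES.index(label)
    return tuple((idx >> b) & 1 for b in reversed(range(BITS)))

def decode(bits):
    """Tuple of predicted bits -> class label (clamped to a valid index,
       since 2**BITS may exceed the number of classes)."""
    idx = 0
    for b in bits:
        idx = (idx << 1) | b
    return CLASSES[min(idx, len(CLASSES) - 1)]
```

Because only ceil(log2 K) bit-functions are needed instead of K (or K·(K-1)/2) binary classifiers, the ensemble evaluated at prediction time stays small, consistent with the size reductions reported above.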
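The software-to-hardware conversion idea can be illustrated on a single tree: each internal split becomes a comparator bit, and each class output becomes the OR of AND-terms over the comparator bits along root-to-leaf paths, i.e. pure combinational logic realizable with basic gates. The two-level tree and thresholds below are illustrative assumptions, not the thesis's conversion tool.

```python
def tree_predict(x):
    """Reference tree: if x0 <= 2 then class 0
                       elif x1 <= 5 then class 1 else class 0."""
    if x[0] <= 2:
        return 0
    return 1 if x[1] <= 5 else 0

def logic_predict(x):
    """Same tree flattened to two-level logic over comparator outputs:
       class1 = c0 AND c1, with c0 = (x0 > 2) and c1 = (x1 <= 5)."""
    c0 = x[0] > 2    # comparator for the root split
    c1 = x[1] <= 5   # comparator for the right-child split
    return int(c0 and c1)
```

In an actual hardware mapping the comparators become digital magnitude comparators and the path terms become gate networks; flip-flops enter only when inputs or outputs are registered.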
Keywords/Search Tags: Ensemble Reduction, Ensemble Methods, Logic Synthesis, Hardware/Software Codesign