
Algorithm Research And System Implementation Of Automatic Machine Learning For Typical Scenarios

Posted on: 2021-05-04   Degree: Master   Type: Thesis
Country: China   Candidate: X Fang   Full Text: PDF
GTID: 2428330647450735   Subject: Computer technology
Abstract/Summary:
With the explosive growth of data volume across all industries, information technology has entered the era of big data and artificial intelligence. However, the wide adoption of AI modeling runs into major bottlenecks and constraints: a high technical threshold, a severe shortage of skilled practitioners, heavy reliance on expert experience, and long modeling cycles. To accelerate the application of AI and make AI modeling more efficient, a newer machine learning technology, Automatic Machine Learning (AutoML), has recently been adopted internationally to select algorithms and tune hyperparameters automatically in place of human labor. AutoML now receives serious attention from both academia and industry, in China and abroad, and researchers have made significant progress on its basic techniques. However, existing AutoML methods cannot yet handle automatic modeling in two scenarios: full-process data analysis and Lifelong Learning.

On the one hand, most practical machine learning models are end-to-end machine learning pipelines. A typical data analysis process involves multiple stages, such as data preprocessing, feature engineering, algorithm selection, model evaluation, and hyperparameter optimization. Data analysts need to understand the methods available at each stage and, through repeated iteration and trial and error, eventually select a well-performing pipeline. Building an efficient full-process data analysis model is therefore difficult and time-consuming. On the other hand, existing AutoML methods mostly address automatic modeling on datasets with stationary distributions. In many practical applications, however, the data distribution changes over time; this dynamic change in data characteristics and distribution is called concept drift. Because of concept drift, a model trained on data from
one time period may not adapt well to data from the next period, which lowers prediction accuracy. The purpose of Lifelong Learning is to capture concept drift so that the machine learning model can be updated dynamically as the data changes.

To address these problems, this thesis first proposes Auto-PLD (AutoML for PipeLine Design), a framework that combines Reinforcement Learning and Bayesian Optimization to automate machine learning pipeline design. Second, it proposes Auto-LLE (AutoML for Lifelong Learning based on weighted Ensemble), an automated Lifelong Learning framework based on an adaptive weighted ensemble of models. Finally, it designs and implements an easy-to-use, functionally rich AutoML system that supports both automated pipeline design and automated Lifelong Learning. The research in this thesis won a gold medal in the "Internet+" College Students Innovation and Entrepreneurship Competition and third place in the NeurIPS 2018 AutoML Challenge.

The major work and contributions are as follows:

(1) For the end-to-end automated pipeline design scenario, this thesis proposes Auto-PLD, an algorithm framework for automated pipeline design. We first define a machine learning pipeline consisting of five stages; this pipeline can handle both continuous and discrete features. We then divide the automated pipeline design problem into two subproblems, pipeline structure search and pipeline hyperparameter tuning, and propose an algorithm that combines Reinforcement Learning and Bayesian Optimization to optimize the two alternately. Finally, we propose two parallelized Auto-PLD approaches to further improve the efficiency of automated pipeline design. Experimental results show that Auto-PLD outperforms auto-sklearn on most datasets. Moreover, as the number of computing nodes increases, the parallel Auto-PLD methods can further
improve the performance of pipeline construction.

(2) For the Lifelong Learning scenario, this thesis proposes Auto-LLE, an automated machine learning framework. To address concept drift in imbalanced classification tasks, Auto-LLE is built on an adaptive weighted ensemble of models. First, we divide concepts into long-term and short-term concepts. Auto-LLE uses an incremental learner to learn the long-term concept and, at the same time, weights the historical models according to an adaptive weight-update formula. Finally, predictions on the latest data are made by combining the predictions of the historical models and the incremental learner. Experimental results show that Auto-LLE captures concept drift efficiently and automatically and improves prediction performance.

(3) Based on the Auto-PLD and Auto-LLE frameworks, we designed and implemented a system that supports both automated Lifelong Learning and automated pipeline design. In terms of system design, easy-to-use programming interfaces and a pluggable, modular design give the system good usability and scalability. In terms of task types, the system supports classification, regression, and clustering.
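The alternating optimization in contribution (1) can be illustrated with a minimal, dependency-free Python sketch. Everything in it is hypothetical: the five stage names and their options, the `toy_score` function standing in for a cross-validated pipeline score, and the single hyperparameter `hp`. Structure search uses a REINFORCE-style policy gradient with one softmax distribution per stage. Where Auto-PLD uses Bayesian Optimization, this sketch swaps in plain random search to stay short, so it demonstrates only the alternation between the two subproblems, not the thesis's actual optimizers.

```python
import math
import random

# Hypothetical five-stage search space (stage names and options are illustrative).
STAGES = {
    "imputer": ["mean", "median"],
    "encoder": ["onehot", "ordinal"],
    "scaler": ["none", "standard"],
    "selector": ["none", "variance"],
    "model": ["tree", "linear"],
}


def toy_score(structure, hp):
    """Hypothetical stand-in for the cross-validated score of a pipeline."""
    bonus = {"median": 0.05, "onehot": 0.05, "standard": 0.05,
             "variance": 0.02, "tree": 0.1}
    base = sum(bonus.get(choice, 0.0) for choice in structure.values())
    return base - (hp - 0.3) ** 2  # in this toy, hp = 0.3 is optimal


class StructurePolicy:
    """REINFORCE policy with an independent softmax over options per stage."""

    def __init__(self, stages, lr=0.5):
        self.stages = stages
        self.logits = {s: [0.0] * len(opts) for s, opts in stages.items()}
        self.lr = lr
        self.baseline = None  # moving-average reward, reduces gradient variance

    def probs(self, stage):
        exps = [math.exp(v) for v in self.logits[stage]]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self, rng):
        return {s: rng.choices(range(len(opts)), weights=self.probs(s))[0]
                for s, opts in self.stages.items()}

    def update(self, choice_idx, reward):
        if self.baseline is None:
            self.baseline = reward
        advantage = reward - self.baseline
        self.baseline = 0.9 * self.baseline + 0.1 * reward
        for s, chosen in choice_idx.items():
            p = self.probs(s)
            for k in range(len(p)):
                # Gradient of log-softmax w.r.t. logit k: 1[k == chosen] - p_k
                self.logits[s][k] += self.lr * advantage * (float(k == chosen) - p[k])


def tune_hyperparameters(structure, rng, n_trials=10):
    """Random-search stand-in for the Bayesian Optimization step."""
    best_hp, best_score = None, -math.inf
    for _ in range(n_trials):
        hp = rng.random()
        score = toy_score(structure, hp)
        if score > best_score:
            best_hp, best_score = hp, score
    return best_hp, best_score


def auto_pld_sketch(n_rounds=200, seed=0):
    rng = random.Random(seed)
    policy = StructurePolicy(STAGES)
    best = (None, None, -math.inf)
    for _ in range(n_rounds):
        idx = policy.sample(rng)                             # subproblem 1: structure
        structure = {s: STAGES[s][i] for s, i in idx.items()}
        hp, score = tune_hyperparameters(structure, rng)     # subproblem 2: hyperparameters
        policy.update(idx, score)                            # tuned score is the RL reward
        if score > best[2]:
            best = (structure, hp, score)
    return best


structure, hp, score = auto_pld_sketch()
print(structure, round(hp, 3), round(score, 3))
```

Using the score obtained after tuning as the reward means a structure is judged on its tuned performance rather than on an arbitrary default hyperparameter setting. A real implementation would replace `tune_hyperparameters` with a Bayesian optimizer and `toy_score` with cross-validation on actual data.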
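Contribution (2) can likewise be sketched in plain Python, under clearly stated assumptions. This abstract does not give Auto-LLE's adaptive weight-update formula, so the sketch substitutes a Hedge-style multiplicative rule, w_i <- w_i * exp(-eta * loss_i), where loss_i is member i's log loss on the newest labelled batch. Each historical model is a tiny logistic regression fit on a single batch (standing in for a short-term concept), one continually trained logistic regression plays the incremental learner for the long-term concept, and the ensemble prediction is the weighted average of all members' probabilities. The class names, `eta`, `max_models`, the synthetic drifting data, and the batch sizes are all hypothetical, and the sketch does not handle the class imbalance the thesis targets.

```python
import math
import random


class LogisticSGD:
    """Tiny logistic regression trained with plain SGD (illustrative learner)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clip to keep exp() finite
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, X, y, epochs=5):
        for _ in range(epochs):
            for x, t in zip(X, y):
                err = self.predict_proba(x) - t  # gradient of log loss w.r.t. z
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err
        return self


def log_loss(model, X, y, eps=1e-12):
    total = 0.0
    for x, t in zip(X, y):
        p = min(max(model.predict_proba(x), eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y)


class WeightedEnsembleSketch:
    """Hypothetical Auto-LLE-style ensemble: members[0] is the incremental
    learner (long-term concept); members[1:] are per-batch historical models."""

    def __init__(self, n_features, eta=2.0, max_models=5):
        self.n_features = n_features
        self.eta = eta
        self.max_models = max_models
        self.members = [LogisticSGD(n_features)]
        self.weights = [1.0]

    def predict_proba(self, x):
        total = sum(self.weights)
        return sum(w * m.predict_proba(x)
                   for w, m in zip(self.weights, self.members)) / total

    def update(self, X, y):
        # 1) Assumed rule: shrink each weight by exp(-eta * loss on newest batch).
        self.weights = [w * math.exp(-self.eta * log_loss(m, X, y))
                        for w, m in zip(self.weights, self.members)]
        # 2) Keep training the incremental learner on every batch.
        self.members[0].partial_fit(X, y)
        # 3) Fit a fresh historical model on this batch; it starts at the best weight.
        self.members.append(LogisticSGD(self.n_features).partial_fit(X, y))
        self.weights.append(max(self.weights))
        # 4) Bound the ensemble size by dropping the oldest historical model.
        if len(self.members) > self.max_models + 1:
            del self.members[1]
            del self.weights[1]
        # 5) Normalise so the weights sum to one.
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]


def make_batch(rng, n, flipped):
    """Synthetic 1-D data; the label rule flips when `flipped` is True (drift)."""
    X = [[rng.uniform(-1.0, 1.0)] for _ in range(n)]
    y = [int(x[0] < 0) if flipped else int(x[0] > 0) for x in X]
    return X, y


def accuracy(model, X, y):
    return sum((model.predict_proba(x) >= 0.5) == bool(t)
               for x, t in zip(X, y)) / len(y)


rng = random.Random(0)
ens = WeightedEnsembleSketch(n_features=1)
for step in range(8):  # five batches of concept A, then an abrupt drift to concept B
    X, y = make_batch(rng, 50, flipped=step >= 5)
    ens.update(X, y)
X_test, y_test = make_batch(rng, 200, flipped=True)
print("accuracy after drift:", round(accuracy(ens, X_test, y_test), 2))
```

In the demo, five batches follow one labelling rule before it flips abruptly. Models fit before the flip incur a high loss on each post-drift batch, so their weights shrink geometrically while the models fit on new batches take over the ensemble.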
Keywords/Search Tags: AutoML, CASH problem, Lifelong Learning, Ray, parallelization