Research On Some Key Technologies In Ensemble Regression Problem

Posted on: 2016-07-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y M Wang
Full Text: PDF
GTID: 1108330482958438
Subject: Computer application technology
Abstract/Summary:
As one of the four major research directions in machine learning, ensemble learning trains multiple base learners to solve a learning problem and then combines their results to make the final decision, which can significantly improve the generalization performance and stability of a learning system. Research on the theory, algorithms and applications of ensemble learning has therefore attracted a great deal of attention in the machine learning community over the past decade. Regression is one of the most important tasks in machine learning and is widely used in meteorology, hydrology, medicine, finance, electric power, transportation and other fields. Many regression learning algorithms have been proposed, such as artificial neural networks, classification and regression trees, and support vector machines.

Ensemble learning has achieved a great deal in both theory and application for classification problems. Research on ensemble learning for regression, however, started relatively late and has produced fewer results; the relevant theory and technology are not yet mature, and many problems remain unsolved. This thesis therefore focuses on several key techniques for ensemble regression: the ensemble regression learning algorithm framework, individual predictor generation, ensemble pruning and result combination, and parallel ensemble learning with MapReduce. Finally, an ensemble prediction system for infectious diseases is designed and implemented based on public health big data. The main work and contributions of this thesis are summarized as follows:

(1) An ensemble regression learning algorithm framework based on a learning process model (ERLAF-LPM) is proposed to guide the design of effective ensemble regression learning algorithms.
Research on ensemble learning for classification and for regression has proceeded independently, and the various theoretical frameworks and explanations lack standardization. To address this problem, we propose an ensemble regression learning algorithm framework from the point of view of the learning process model and analyze how to design an effective ensemble regression learning algorithm under its guidance.

(2) A heterogeneous ensemble regression learning algorithm based on multiple disturbances (HERL-MD) is proposed to improve the individual diversity and generalization performance of the learners. Most previous learning algorithms are based on a single disturbance and a homogeneous ensemble. To address this problem, we propose an ensemble regression learning algorithm based on multiple disturbances and a heterogeneous ensemble. Individual diversity is improved by simultaneously disturbing the training datasets, the base learning algorithm and the algorithm parameters. Generalization performance is improved by using cross-validation to avoid over-fitting.

(3) An adaptive ensemble pruning and dynamic weighted combination algorithm based on post-pruning (AEPDWC-2P) is proposed to improve the generalization performance and learning speed of the algorithm. Many real problems are non-stationary, and static pruning and result combination methods have difficulty learning time-varying data effectively. To handle this problem, we propose an adaptive ensemble pruning and dynamic weighted combination learning algorithm based on post-pruning. The algorithm uses post-pruning to adaptively select a subset of individual learners based on new prediction data and then dynamically computes the combination weights. In addition, motivated by the strong generalization performance and learning speed of the extreme learning machine (ELM), ELM is used as the base learner in the proposed algorithm.
Finally, the algorithm is applied to time series prediction, where it achieves better performance and speed than conventional learning approaches.

(4) A parallel ensemble regression learning algorithm framework and its MapReduce implementation are proposed to address scalability and parallelization. The AdaBoost.RT family is among the state-of-the-art ensemble regression learning algorithms. However, the greedy optimization embedded in AdaBoost.RT algorithms makes it hard for them to exploit the MapReduce parallel computation architecture. To handle this problem, and motivated by the strong generalization performance and learning speed of HERL-MD based on AdaBoost.RT, we propose the SP-HERL-MD framework and its MapReduce implementation, HERL-MD-MR, which not only keeps the advantages of the HERL-MD algorithm but also exploits the MapReduce architecture to accelerate computation thanks to its inherently scalable and parallel structure. Furthermore, HERL-MD-MR is applied to regression learning on big datasets, and its performance in terms of prediction accuracy, speedup and scaleup is demonstrated on synthetic and real-world data sets.

(5) An ensemble prediction system for infectious diseases based on public health big data is designed and implemented to handle big data storage and management as well as the building and application of infectious disease prediction models. Because of the massive volume and complex structure of public health data, it is very difficult to predict infectious diseases efficiently with a single learner. To address this problem, we design and implement an ensemble prediction system for infectious diseases based on public health big data. Building on the work described above, a base of learning algorithms and a base of prediction models are constructed. Nearly 10 years of diarrhea case data from Shanghai are used to verify the effectiveness of the system.
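
The multiple-disturbance idea in contribution (2) can be illustrated with a minimal sketch. This is not the thesis's HERL-MD algorithm, whose details are not given in the abstract; it only shows the three kinds of disturbance named there (training data, base learning algorithm, algorithm parameters) applied together. The two base learners (a ridge-penalized line and a k-nearest-neighbour regressor), the parameter grids, the plain averaging step and all function names are assumptions chosen for illustration.

```python
import random
from statistics import fmean


def fit_linear(xs, ys, ridge):
    """Fit a 1-D least-squares line with a ridge penalty on the slope."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / (sxx + ridge)  # ridge > 0 keeps the denominator positive
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k):
    """Fit a k-nearest-neighbour regressor (mean target of the k closest points)."""
    pairs = list(zip(xs, ys))

    def predict_one(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return fmean(y for _, y in nearest)

    return predict_one


def build_ensemble(xs, ys, n_members, rng):
    """Build a heterogeneous ensemble using three simultaneous disturbances."""
    n = len(xs)
    members = []
    for _ in range(n_members):
        # Disturbance 1 (assumed form): bootstrap resample of the training data.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        # Disturbance 2: choose the base learning algorithm at random.
        # Disturbance 3: choose that algorithm's parameter at random.
        if rng.random() < 0.5:
            members.append(fit_linear(bx, by, ridge=rng.choice([0.01, 0.1, 1.0])))
        else:
            members.append(fit_knn(bx, by, k=rng.choice([1, 3, 5])))
    return members


def predict(members, x):
    """Combine member predictions by simple averaging (an assumption)."""
    return fmean(m(x) for m in members)


# Synthetic data for illustration only: y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [i * 0.2 for i in range(50)]
ys = [2 * x + 1 + rng.gauss(0, 0.2) for x in xs]
ensemble = build_ensemble(xs, ys, n_members=20, rng=rng)
print(predict(ensemble, 5.0))
```

The sketch averages all members equally; the pruning and dynamic weighting described in contribution (3) would replace that final step with a selected subset and data-dependent weights.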
Keywords/Search Tags: Ensemble learning, Regression problem, Heterogeneous ensemble, Ensemble pruning, Infectious diseases prediction