
Research On Double Weighted Random Forests Forecasting Algorithm And Its Parallelization

Posted on: 2018-11-02
Degree: Master
Type: Thesis
Country: China
Candidate: H Guo
GTID: 2348330533469230
Subject: Computer Science and Technology
Abstract/Summary:
With the development of science and technology, the era of big data has arrived and data volumes have grown explosively. Big data poses a great challenge to traditional machine learning methods. Because big data is huge, complex, and diverse, it causes two main problems. First, machine learning algorithms run too long to deliver results in an acceptable time. Second, the data are high-dimensional and redundant, so the traditional random forest regression algorithm cannot achieve the desired effect. To address these problems, this dissertation studies an improvement of traditional random forest regression and its parallelization.

To handle high-dimensional, redundant data, several studies improve the traditional random forest algorithm by replacing random feature sampling with weighted feature sampling. However, most of this research targets classification, and regression has received little attention. Moreover, almost all weighted feature sampling methods assume that the features are independent, which rarely holds in practice. This dissertation therefore adopts a feature weighting algorithm that takes the relationships between features into account, and uses two methods to sample features.

Weighted feature sampling improves the accuracy of the individual classification and regression tree (CART) models, but it also increases the correlation between the trees, which may degrade the performance of the random forest regression algorithm. To solve this, the dissertation proposes a double weighted random forest regression algorithm: in addition to weighting the features to improve the accuracy of each regression tree, it also assigns weights to the generated regression trees. The double weighting aims to balance the accuracy and the diversity of the trees and thereby improve the final predictive performance of random forest regression. For weighting the trees, two new methods are proposed to balance tree accuracy against tree diversity: a forward search method and a method based on diversity calculation.

Finally, to address the long running time of machine learning algorithms, the dissertation parallelizes and implements the double weighted random forest regression algorithm and analyzes the effect of the parallelization experimentally.
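The double weighting idea can be illustrated with a minimal Python sketch. The abstract does not specify the feature-weighting algorithm, the two feature-sampling methods, the tree depth, or how the forward search works, so everything below is an assumption rather than the thesis's method: the feature weights are a hypothetical input, features are drawn without replacement with probability proportional to their weights, each "tree" is a depth-1 regression stump to keep the code short, and the forward search is modeled as greedy forward ensemble selection with replacement, where a tree's weight is the fraction of rounds in which it was selected on a validation set. All function names are hypothetical. Because the trees are trained independently, training is mapped over a thread pool; in CPython this only shows where parallelism fits and does not by itself give a CPU speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def weighted_feature_sample(weights, k, rng):
    """Draw k distinct feature indices; each draw is proportional to the remaining weights."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(min(k, len(remaining))):
        pick = rng.choices(remaining, weights=[weights[i] for i in remaining], k=1)[0]
        remaining.remove(pick)
        chosen.append(pick)
    return chosen


def fit_stump(X, y, features):
    """Fit a depth-1 regression tree minimizing squared error, searching only the given features."""
    mean_all = sum(y) / len(y)
    best = (float("inf"), None, None, mean_all, mean_all)  # (sse, feature, threshold, left, right)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse < best[0]:
                best = (sse, f, t, ml, mr)
    _, f, t, ml, mr = best
    return (f, t, ml, mr)


def predict_stump(stump, row):
    f, t, ml, mr = stump
    if f is None:  # no valid split was found: predict the mean
        return ml
    return ml if row[f] <= t else mr


def train_tree(X, y, feature_weights, k, seed):
    """First weighting: bootstrap the rows, then sample k features by their weights."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    features = weighted_feature_sample(feature_weights, k, rng)
    return fit_stump([X[i] for i in idx], [y[i] for i in idx], features)


def forward_search_weights(trees, X_val, y_val, rounds):
    """Second weighting (assumed form): greedily add the tree that most lowers validation
    squared error of the averaged ensemble; weights are normalized selection counts."""
    preds = [[predict_stump(t, row) for row in X_val] for t in trees]
    counts = [0] * len(trees)
    ens_sum = [0.0] * len(y_val)
    for r in range(1, rounds + 1):
        best_j, best_err = None, float("inf")
        for j, p in enumerate(preds):
            err = sum(((s + pj) / r - yv) ** 2 for s, pj, yv in zip(ens_sum, p, y_val))
            if err < best_err:
                best_j, best_err = j, err
        counts[best_j] += 1
        ens_sum = [s + pj for s, pj in zip(ens_sum, preds[best_j])]
    return [c / rounds for c in counts]


def predict_forest(trees, weights, row):
    """Weighted average of the tree predictions."""
    return sum(w * predict_stump(t, row) for t, w in zip(trees, weights)) / sum(weights)


# Toy data: only feature 0 is informative.
rng_data = random.Random(0)
X = [[rng_data.uniform(0, 10), rng_data.uniform(0, 10)] for _ in range(60)]
y = [2.0 * row[0] for row in X]
X_tr, y_tr, X_val, y_val = X[:40], y[:40], X[40:], y[40:]
feature_weights = [0.9, 0.1]  # hypothetical output of some feature-weighting step

# Trees are independent, so training can be distributed across workers.
with ThreadPoolExecutor(max_workers=4) as ex:
    trees = list(ex.map(lambda s: train_tree(X_tr, y_tr, feature_weights, 1, s), range(20)))

tree_weights = forward_search_weights(trees, X_val, y_val, rounds=10)
print(predict_forest(trees, tree_weights, [5.0, 5.0]))
```

In this sketch a tree is rewarded only for how much it lowers the error of the current ensemble, not for its accuracy alone, which is one way the accuracy/diversity balance described above can be expressed. The thesis's diversity-calculation method is not sketched here.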
Keywords/Search Tags: double-weighted, random forest, regression, parallelization