
Improved Random Forest Processing of Imbalanced Data and Its Parallelization

Posted on: 2017-03-07 Degree: Master Type: Thesis
Country: China Candidate: L S Zhong
GTID: 2308330485478434 Subject: Mathematics
Abstract/Summary:
A random forest builds a forest of many decision trees in a randomized way, and the individual trees are independent of one another. Each tree is built from a random sample of the data, and the forest classifies or predicts by voting across the trees. The algorithm overcomes the performance bottleneck of a single classifier, so it is widely used. It still has a number of problems, however, and this thesis optimizes it in two respects.

First, a new data preprocessing method is proposed. A weakness of the random forest algorithm is its handling of imbalanced data sets. Building on the k-means algorithm, this thesis proposes a K-SMOTE algorithm. Its main idea is to use k-means to find the center points of the original negative class, then to generate a "new negative class" from them following SMOTE, to replace all negative-class samples in the original data set with the "new negative class", and finally to apply SMOTE to obtain a new data set. The experimental results show that this method improves the performance of the random forest algorithm.

Second, a MapReduce-based parallelization of the random forest algorithm is studied. On big data, the algorithm not only takes a long time but also classifies poorly. Against this background, this thesis uses MapReduce, the distributed framework of the Hadoop platform, to parallelize random forest. The main idea of MapReduce is "divide and conquer": a complex problem is split into several sub-problems of the same kind, each of which is much easier to solve. For random forest, "divide and conquer" mainly means building the individual decision trees in parallel and then combining the trained trees to vote. The experimental results show that the parallelized random forest improves running time and efficiency.
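The K-SMOTE idea described in the abstract can be illustrated with a minimal sketch. This is only one possible reading of the summary, not the thesis's actual implementation: the function names (`kmeans`, `smote`, `k_smote`), the parameters (`k`, `n_new`, `k_neighbors`, `seed`), and the choice to interpolate between cluster centers are all assumptions made for illustration. It uses only the Python standard library and represents samples as tuples of floats.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: return k cluster centers for a list of point tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize centers from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # New center = mean of its cluster; an empty cluster keeps its old center.
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers


def smote(points, n_new, k_neighbors=2, seed=0):
    """SMOTE-style oversampling: interpolate between a point and a near neighbor."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(points)
        others = [q for q in points if q is not p]
        neighbors = sorted(others, key=lambda q: dist2(p, q))[:k_neighbors]
        q = rng.choice(neighbors)
        t = rng.random()  # position along the segment from p to q
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return synthetic


def k_smote(negative, k, n_new, seed=0):
    """Assumed reading of K-SMOTE: replace the negative class by its k-means
    centers, then add n_new SMOTE samples interpolated among those centers.
    Requires k >= 2 so that every center has at least one neighbor."""
    centers = kmeans(negative, k, seed=seed)
    return centers + smote(centers, n_new, seed=seed)


if __name__ == "__main__":
    # Hypothetical toy data: two small groups of negative-class samples.
    negative = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    for point in k_smote(negative, k=2, n_new=4):
        print(point)
```

The returned list would stand in for the "new negative class"; in practice it would be merged with the untouched samples of the other class before training the random forest.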
Keywords/Search Tags:delay differential equations, asymptotic stability, boundedness, Lyapunov functional