Random Forest Improved Processing Of Unbalanced Data And Its Parallelization

Posted on:2017-03-07

Degree:Master

Type:Thesis

Country:China

Candidate:L S Zhong

Full Text:PDF

GTID:2308330485478434

Subject:Mathematics

Abstract/Summary:

A random forest builds a forest in a randomized way: the forest consists of many decision trees that are independent of one another. Each tree is built from a random sample of the data, and the trees then vote to produce a classification or prediction. The algorithm overcomes the performance bottleneck of a single classifier, so it is widely used. It still has a number of problems, however, and this thesis optimizes it in two respects.

First, a new data preprocessing method is proposed. A weakness of the random forest algorithm is its handling of imbalanced data sets, so this thesis proposes K-SMOTE, an algorithm built on the K-means algorithm. Its main idea is to use K-means to find the center points of the original negative class, use SMOTE to generate a "new negative class" from them, replace all of the negative class in the original data set with this "new negative class", and finally use SMOTE to produce a new data set. The experimental results show that the method improves the performance of the random forest algorithm.

Second, the parallelization of the random forest algorithm on MapReduce is studied. On big data, the algorithm not only takes a long time but also classifies poorly. Against this background, this thesis uses MapReduce, the distributed framework of the Hadoop platform, to study a parallel random forest. The main idea of MapReduce is "divide and conquer": a complex problem is split into several subproblems of the same form, each of which is much easier to solve. For the random forest algorithm, "divide and conquer" is mainly embodied in building the individual decision trees in parallel and then combining the constructed trees to vote. The experimental results show that the parallelized random forest improves both running time and efficiency.
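The "divide and conquer" pattern described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: a local thread pool stands in for Hadoop workers, one-split decision stumps stand in for full decision trees, and the names `map_train_tree`, `reduce_vote`, and `train_forest`, as well as the toy data, are hypothetical. Each map task draws its own bootstrap sample and fits one stump; the reduce step combines the stumps by majority vote.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def map_train_tree(task):
    """Map step: draw a bootstrap sample and fit one stump on it."""
    rows, labels, seed = task
    rng = random.Random(seed)  # per-task generator, so tasks are independent
    picks = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
    sample = [(rows[i], labels[i]) for i in picks]
    feature = rng.randrange(len(rows[0]))  # random feature for this stump
    best = None
    for threshold in sorted({row[feature] for row, _ in sample}):
        left = [y for row, y in sample if row[feature] <= threshold]
        right = [y for row, y in sample if row[feature] > threshold]
        if not right:
            continue  # this threshold does not split the sample
        left_label, right_label = majority(left), majority(right)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # all sampled values identical: predict the majority
        label = majority([y for _, y in sample])
        return (feature, sample[0][0][feature], label, label)
    return (feature,) + best[1:]


def predict_one(stump, row):
    """Classify one row with a single stump."""
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def reduce_vote(stumps, row):
    """Reduce step: combine the stumps' predictions by majority vote."""
    return majority([predict_one(stump, row) for stump in stumps])


def train_forest(rows, labels, n_trees=25, workers=4):
    """Run the independent map tasks in parallel (threads stand in for a cluster)."""
    tasks = [(rows, labels, seed) for seed in range(n_trees)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(map_train_tree, tasks))


# Toy data (assumed for illustration): two well-separated classes.
rows = [(i, 2 * i) for i in range(10)] + [(20 + i, 40 + 2 * i) for i in range(10)]
labels = [0] * 10 + [1] * 10
forest = train_forest(rows, labels)
print(reduce_vote(forest, (-5, -5)), reduce_vote(forest, (35, 70)))
```

Because each map task depends only on its own seed and sample, the tasks share no state, which is what makes tree construction easy to distribute. On a real Hadoop cluster the mappers would typically each see a partition of the data rather than the whole set.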

Keywords/Search Tags:

delay differential equations, asymptotic stability, boundedness, Lyapunov functional

Related items

1	Stability Analysis And Synthesis For Neutral Delay Systems
2	Stability Analysis Of A Class Of Neural Networks With Delay
3	The Stability Analysis For Several Types Of Delayed Cellular Neural Network Models
4	The Stability Analysis For Several Types Of Delayed Cellular Neural Network Models
5	Stability Analysis Of Singular Time-delay Systems
6	Stability Analysis Of A Class Of Time-delay Discrete BAM Neural Networks With Time-varying Inputs
7	Stability Analysis For Nonlinear Cellular Neural Networks With Delays
8	Study On Asymptotic And Robust Stability Of Delayed Neural Networks
9	Stability Analysis And Control Synthesis Of Positive Differential-difference Equations
10	Stability Of Cellular Neural Networks