Research On Talent Turnover Prediction Model Based On Optimized Random Forest Algorithm

Posted on: 2021-12-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Dong
GTID: 2518306560953089
Subject: Master of Engineering
Abstract/Summary:
Talent is the primary resource for China's economic and social development. The loss of key talent can seriously affect an enterprise, so it is important to build a talent turnover prediction model suited to the enterprise itself. The random forest algorithm is already widely used in this field, but it faces two problems. First, in the talent information databases of most enterprises, the amount of data on departed and current employees is imbalanced, which makes the prediction performance of the random forest algorithm unstable. Second, the factors behind talent turnover differ between enterprises, yet the random forest algorithm uses a single node splitting method that cannot be adjusted to the characteristics of each enterprise's talent information. To address these problems, this thesis first proposes an improved resampling algorithm to balance the data set, then introduces a linear programming model to adjust the existing node splitting rules, and finally establishes a talent turnover prediction model based on the optimized random forest algorithm to help enterprises predict talent turnover risk more completely. The specific work is as follows:

(1) To address the poor classification performance of the random forest algorithm on imbalanced data sets, this thesis proposes BSMOTE, an improved data-balancing algorithm based on SMOTE. The algorithm first uses K-means to cluster the minority-class samples, and then takes three samples from each cluster to construct a triangle. A new minority sample is generated between the triangle's center of gravity and its vertices, and is then pulled toward the center of gravity to avoid marginalization of the sample distribution. The generated samples are added to the original data set, which is then classified with the random forest algorithm. A comparison of BSMOTE with five resampling algorithms on seven imbalanced data sets shows that BSMOTE effectively improves the performance of the random forest algorithm on imbalanced data.

(2) To address the single node splitting rule used when generating decision trees in the random forest algorithm, this thesis proposes LPRF, an improved algorithm that optimizes the node splitting function of random forest. LPRF linearly combines the node splitting functions of the C4.5 and CART decision tree algorithms, and introduces a linear programming model to solve for the optimal combination coefficients for a given data set, thereby updating the node splitting rules adaptively. BSMOTE is then introduced into LPRF, yielding a further improved random forest algorithm, BSMOTE-LPRF. Comparative experiments show that BSMOTE-LPRF handles the classification of imbalanced data more effectively, and also resolves the insufficient prediction accuracy caused by the random forest algorithm's inability to update its splitting rules adaptively.

(3) A talent turnover prediction model based on the BSMOTE-LPRF algorithm is established, and the parameters of the algorithm are determined through multiple sets of experiments. The model is compared with prediction models based on decision trees, KNN, SVM and the traditional random forest. The experimental results show that the BSMOTE-LPRF prediction model outperforms the other models on performance evaluation indicators such as G-mean, F-measure and AUC. The results demonstrate the effectiveness of BSMOTE-LPRF as an optimization of the random forest algorithm, and the feasibility of applying it to predicting the risk of enterprise talent turnover.
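The triangle-based generation step described in item (1) can be illustrated with a minimal sketch. It assumes the minority samples have already been clustered (the K-means step is omitted), and the function names, the uniform choice of vertex and position, and the `pull` factor are all hypothetical choices made for illustration; the abstract does not give the exact formulas used in the thesis.

```python
import random

def centroid(points):
    """Arithmetic mean (center of gravity) of equal-length points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]

def triangle_samples(cluster, n_new, pull=0.5, rng=None):
    """Generate n_new synthetic minority points from one cluster.

    cluster: list of minority-class points (lists of floats), at least 3.
    pull:    hypothetical factor in [0, 1]; 0 leaves the interpolated
             point where it is, 1 moves it onto the center of gravity.
    """
    rng = rng or random.Random()
    new_points = []
    for _ in range(n_new):
        tri = rng.sample(cluster, 3)   # three samples form a triangle
        g = centroid(tri)              # center of gravity of the triangle
        v = rng.choice(tri)            # one vertex (assumed uniform choice)
        t = rng.random()               # position on the segment g -> v
        p = [gi + t * (vi - gi) for gi, vi in zip(g, v)]
        # Pull the new point back toward the center of gravity.
        p = [pi + pull * (gi - pi) for pi, gi in zip(p, g)]
        new_points.append(p)
    return new_points
```

Because every generated point lies on a segment between the center of gravity and a vertex, it stays inside the triangle, which is one way to read the abstract's goal of avoiding samples at the margin of the minority distribution.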
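As a rough illustration of the linear combination in item (2), the sketch below scores a candidate split with a weighted sum of the C4.5 gain ratio and the CART Gini impurity decrease. The weights `alpha` and `beta` are passed in by hand; in the thesis they are obtained from a linear programming model whose formulation the abstract does not state, so that step is not reproduced here. Using the Gini decrease (rather than the raw Gini index) so that both terms are maximized is an assumption, as are all function names.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_scores(parent, children):
    """Return (gain_ratio, gini_gain) for splitting parent into children."""
    n = len(parent)
    weights = [len(ch) / n for ch in children]
    info_gain = entropy(parent) - sum(
        w * entropy(ch) for w, ch in zip(weights, children))
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    gini_gain = gini(parent) - sum(
        w * gini(ch) for w, ch in zip(weights, children))
    return gain_ratio, gini_gain

def combined_score(parent, children, alpha, beta):
    """Assumed combined criterion: alpha * gain ratio + beta * Gini decrease."""
    gain_ratio, gini_gain = split_scores(parent, children)
    return alpha * gain_ratio + beta * gini_gain
```

A tree builder would evaluate `combined_score` for each candidate split at a node and keep the split with the highest value; setting `alpha=1, beta=0` or `alpha=0, beta=1` falls back to the pure C4.5-style or CART-style criterion.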
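Two of the evaluation indicators named in item (3) can be computed directly from confusion-matrix counts. The sketch below uses the standard definitions, G-mean as the geometric mean of sensitivity and specificity and F-measure as the weighted harmonic mean of precision and recall, and assumes the thesis uses them in this usual form with the departed (minority) class as the positive class. The function names are illustrative.

```python
import math

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)

def f_measure(tp, fn, fp, beta=1.0):
    """F-measure of the positive class; beta=1 gives the usual F1 score."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Unlike plain accuracy, both indicators drop sharply when the minority class is poorly recognized, which is why they are commonly reported for imbalanced data.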
Keywords/Search Tags: Random Forest, Imbalanced Data Set, Node Splitting, Classification Algorithm, Turnover Prediction