
Research On Employee Turnover Prediction Based On SMOTE-SVM Under Unbalanced Data

Posted on: 2022-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Wan
Full Text: PDF
GTID: 2518306779972039
Subject: Enterprise Economy
Abstract/Summary:
With rapid economic development and growing employment opportunities, many enterprises face the problem of employee turnover, which harms business operations, costs, team stability and the retention of core technology. To reduce turnover, most enterprises rely on statistical analysis, using support measurement and structural equation models to analyze employees' turnover trends. These methods usually handle only low-dimensional information such as personal ability, salary, working environment and job satisfaction, and ignore many relevant high-dimensional features such as age, gender, overtime, travel and company satisfaction. However, when existing SVM-based methods classify and predict on high-dimensional data sets of resigned employees, they are affected by the imbalance between the numbers of positive and negative samples. The imbalance shifts the classification hyperplane toward the minority samples with resignation intention, which lowers prediction accuracy on those minority samples and leads to fuzzy boundaries and noise pollution. To address the limited classification performance on unbalanced data sets, the uniform misclassification cost and the weak generalization ability of current SVM classification algorithms, this paper proposes adaptive fuzzy c-means clustering (AFCM) and improved SMOTE algorithms based on kernel space (AFCM-SMOTE and K-SMOTE-SVM). Combined with ensemble learning, these yield a comprehensive turnover prediction model for multiple types of enterprise data. The main contents of this paper are as follows.

(1) To address the low classification accuracy of the SVM algorithm on unbalanced enterprise employee data sets, the SMOTE oversampling method is introduced to improve the balance of the data set, and an improved cost-sensitive weighting algorithm is proposed. It remedies the defect that the SVM algorithm assigns no misclassification cost to the newly generated samples during classification.

(2) Because data on resigned employees tend to converge around multiple centers, this paper uses the fuzzy c-means (FCM) clustering algorithm to find the center points of the resigned-employee samples before sampling the data set. To determine the number of clustering categories for resigned employees more accurately, an improved FCM clustering algorithm is proposed and then combined with the SMOTE algorithm to generate new samples, so that the newly generated resigned-employee samples are closer to the real data and the probability of generating noise data is greatly reduced. On this basis, the kernel function technique of SVM is used to map the data into a high-dimensional feature space, where clustering and sampling are then carried out. The result is K-AFCM-SMOTE-SVM, a clustering and oversampling support vector machine classification algorithm based on kernel space. The originally separate AFCM-SMOTE algorithm is combined with the SVM classification process, which solves the problem that the original way of sampling resigned-employee data has little impact on the SVM classification results. Experiments show that this method can greatly improve the accuracy of SVM classification.

(3) To further improve the classification accuracy of the K-AFCM-SMOTE-SVM algorithm on different types of enterprise data, this paper introduces ensemble learning to increase the generalization ability of SVM. Based on the original AdaBoost algorithm, an ensemble learning algorithm, PIBoost, built on a newly constructed evaluation criterion is proposed. The algorithm changes the way the F-measure treats the losses from positive-class and negative-class misclassification equally when calculating accuracy, because the two differ in importance when classifying unbalanced data. Combined with the previously proposed full-sample cost-sensitive weighting algorithm, it greatly improves the classification accuracy of the SVM model on different data sets.

Experiments show that the proposed employee turnover prediction algorithm based on SMOTE-SVM significantly improves the F-measure and G-mean of the classification results on highly unbalanced enterprise employee data. It also effectively addresses two problems: that the SMOTE sampling algorithm has only a weak impact on SVM classification results, and that the generated turnover samples are too random and prone to noise. In addition, the improved ensemble learning algorithm proposed in this paper effectively mitigates the overfitting risk and weak generalization ability on different types of enterprise employee data sets. The research results have good practical value and application prospects for enterprise turnover prediction and employee management.
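
The SMOTE oversampling step and the F-measure and G-mean metrics named in the abstract can be sketched roughly as follows. This is a minimal illustration of plain SMOTE only, not the thesis's AFCM-SMOTE, kernel-space or PIBoost variants. The function names, the neighbour count `k`, the seed and the toy data are assumptions made for the example and do not come from the thesis.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Plain SMOTE (assumed baseline, not the thesis variant): each synthetic
    point lies on the segment between a minority sample and one of its
    k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # relative position along base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def f_measure_and_g_mean(y_true, y_pred, positive=1):
    """F-measure (F1) and G-mean, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    return f1, g_mean


# Hypothetical toy data: four "resigned" samples with two features each.
leavers = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote(leavers, n_new=6)
f1, g_mean = f_measure_and_g_mean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

Because every synthetic point is an interpolation between two real minority samples, plain SMOTE can place points in regions that overlap the majority class. That is the noise problem the abstract says clustering before sampling is meant to reduce. G-mean is the geometric mean of recall on the positive and negative classes, so it drops sharply when the minority class is mostly misclassified, even if overall accuracy stays high.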
Keywords/Search Tags: Support vector machine, Unbalanced enterprise employee data classification, AFCM clustering algorithm, Kernel space, Ensemble learning