
Random K-Nearest Neighbor Algorithm With Application To Bankruptcy Prediction

Posted on: 2021-06-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Shen
GTID: 2518306113953479
Subject: Statistics
Abstract/Summary:
In the digital economy, more and more enterprises recognize the value of data. A growing volume of operational data is collected and used to help decision makers evaluate an enterprise's operating status and give early warning of future risks. In particular, using a set of current financial indicators to predict bankruptcy some period ahead is of great practical significance. Against this background, this thesis proposes a method for classifying data with a high imbalance ratio, high dimensionality and highly correlated features: the random ensemble rank k-nearest neighbor algorithm (REKRNN). REKRNN uses Bagging as its framework and, for the first time, takes the rank k-nearest neighbor (k-RNN) classifier as the base learner of an ensemble. Compared with other common base learners, the k-RNN classifier has low complexity and low computational cost, which makes it less prone to over-fitting on complex data. REKRNN combines this property with Bagging's strong generalization performance to improve classification accuracy on imbalanced data sets. Because the rank k-nearest neighbor rule is better suited to balanced data, a hybrid resampling technique preprocesses the training data before each base learner is fitted: Bootstrap sampling is applied to the minority-class training samples and random down-sampling to the majority-class samples. This balances the data set and, at the same time, increases the diversity among the base learners. To handle the high-dimensional, highly correlated feature set, each base learner is trained on a feature subspace chosen at random by the random subspace method. This reduces dimensionality, improves classification efficiency and further increases diversity, implementing the "good but different" principle of ensemble algorithms and thereby improving generalization performance.

The REKRNN algorithm is applied to bankruptcy prediction for Polish manufacturing enterprises. The experiments show that each of these techniques in turn improves the algorithm's ability to identify minority-class samples, and that the final random rank k-nearest neighbor algorithm achieves high classification accuracy on the imbalanced classification task.
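
The pipeline described in the abstract (hybrid resampling, random feature subspaces, Bagging with a majority vote) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis implementation: the abstract does not define the rank k-nearest neighbor rule, so a plain Euclidean k-NN vote stands in for the base learner. The function names, the toy data, the parameter values (`n_learners`, `n_features`, `k`) and the choice to down-sample the majority class to exactly the minority-class size are all hypothetical.

```python
import random
from collections import Counter


def hybrid_resample(y, rng, minority_label=1):
    """Return row indices: bootstrap the minority class, down-sample the majority.

    Assumption: the majority class is reduced to the minority-class size.
    """
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    boot = [rng.choice(minority) for _ in minority]  # sampling with replacement
    down = rng.sample(majority, min(len(minority), len(majority)))  # without replacement
    return boot + down


def knn_predict(X_train, y_train, x, k):
    """Plain Euclidean k-NN vote (a stand-in, NOT the rank k-NN rule)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def fit_ensemble(X, y, n_learners=25, n_features=2, seed=0):
    """Build base learners, each on a resampled row set and a random feature subset."""
    rng = random.Random(seed)
    n_total = len(X[0])
    learners = []
    for _ in range(n_learners):
        rows = hybrid_resample(y, rng)
        cols = sorted(rng.sample(range(n_total), n_features))  # random subspace
        X_sub = [[X[i][j] for j in cols] for i in rows]
        y_sub = [y[i] for i in rows]
        learners.append((cols, X_sub, y_sub))
    return learners


def predict_ensemble(learners, x, k=3):
    """Majority vote over the base learners' predictions."""
    votes = Counter(
        knn_predict(X_sub, y_sub, [x[j] for j in cols], k)
        for cols, X_sub, y_sub in learners
    )
    return votes.most_common(1)[0][0]


# Toy imbalanced data (hypothetical): 40 majority rows near 0, 5 minority rows near 10.
X_toy = [[0.1 * (i % 5), 0.1 * (i % 3), 0.1 * (i % 4), 0.1 * (i % 2)] for i in range(40)]
X_toy += [[10 + 0.1 * i, 10 - 0.1 * i, 10.0, 10 + 0.2 * i] for i in range(5)]
y_toy = [0] * 40 + [1] * 5

toy_learners = fit_ensemble(X_toy, y_toy)
pred_minority = predict_ensemble(toy_learners, [10.0, 10.0, 10.0, 10.0])
pred_majority = predict_ensemble(toy_learners, [0.0, 0.0, 0.0, 0.0])
```

Because every base learner sees a different bootstrap of the minority class, a different down-sample of the majority class and a different feature subset, the learners differ from one another, which is the diversity the abstract's "good but different" principle relies on.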
Keywords/Search Tags: Bankruptcy Prediction, Imbalanced Data Classification, Ensemble Learning Algorithm, K-rank Nearest Neighbor Rule