| In recent years, the classification of imbalanced data sets has attracted extensive research attention. Cost-sensitive support vector machines (CS-SVM) generalise well on this problem, but their penalty parameters and kernel parameters have a great impact on classification performance. In the classification of imbalanced data in particular, model selection usually involves parameter tuning and optimisation. However, the selection of these parameters is often treated as a "black box", without an understanding of the details. This paper analyses the behaviour of CS-SVM on imbalanced data sets when the penalty factor and the kernel function parameters take different values. Through theoretical derivation and data analysis, the parameter search space is divided into three regions. In the good regions, effective parameters can be found quickly and the model performs well. The range of the kernel function parameters can be estimated by calculating the distances between samples. Based on this, a new CS-SVM parameter optimisation method is proposed. The method reduces the parameter search space, its computational effort decreases exponentially with increasing complexity of the dataset, and its running time is significantly lower than that of a global search. In addition, this paper analyses and discusses the performance of several commonly used classification models on imbalanced data. To bias the separation towards improving the recall of the minority class, a new evaluation metric based on G-mean is proposed and used to evaluate the method. The results show that the proposed method can increase the recall of the minority class to 1 while maintaining sample accuracy. Compared with other classification models, the proposed method has higher search efficiency and better average performance. |
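
Two quantities mentioned above can be illustrated with a minimal Python sketch. The standard G-mean is the geometric mean of minority-class recall and majority-class specificity; the modified metric proposed in the paper is not specified in this abstract and is not reproduced here. The helper `gamma_range` is a hypothetical heuristic, assuming an RBF kernel of the form exp(-gamma * d^2), that brackets gamma by the reciprocals of the largest and smallest nonzero squared pairwise distances. It is an assumption made for illustration, not the derivation used in the paper.

```python
import math
from itertools import combinations


def g_mean(y_true, y_pred, positive=1):
    """Standard G-mean: sqrt(recall of the positive class * specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall * specificity)


def gamma_range(samples):
    """Hypothetical heuristic (an assumption, not the paper's rule):
    bracket an RBF gamma by 1/d_max^2 and 1/d_min^2, where d ranges
    over the nonzero pairwise Euclidean distances between samples."""
    squared = [
        sum((a - b) ** 2 for a, b in zip(u, v))
        for u, v in combinations(samples, 2)
    ]
    squared = [d for d in squared if d > 0]
    if not squared:
        raise ValueError("need at least two distinct samples")
    return 1.0 / max(squared), 1.0 / min(squared)


if __name__ == "__main__":
    # Toy data: label 1 is the minority class.
    y_true = [1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 1]
    print(g_mean(y_true, y_pred))
    print(gamma_range([(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]))
```

Restricting a grid search to an interval of this kind is one way a distance-based bound could shrink the kernel parameter search space; the actual bounds would have to come from the paper's own analysis.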