| Support Vector Machine (SVM) is an important machine learning algorithm. It is grounded in statistical learning theory, in particular the VC dimension and structural risk minimization, which gives it a solid theoretical foundation and broad practical applicability. It usually classifies well when the classes are relatively balanced. However, many classification problems in practical applications are highly unbalanced, and on such problems support vector machines often perform poorly. Against the research background of the unbalanced classification problem, this paper analyzes the principle of the support vector machine algorithm and its defects in classifying unbalanced data. It experimentally evaluates existing improved SVM algorithms for unbalanced classification, analyzes their advantages and disadvantages in detail, and on this basis proposes an improved algorithm: probability-optimized cost-sensitive SVM (PCS-SVM). The main research contents of the paper are: (1) The causes of the unsatisfactory performance of the support vector machine algorithm on unbalanced data are analyzed in detail and verified through experiments; the effectiveness of current unbalanced-data sampling methods and improved support vector machine algorithms for unbalanced data classification is verified by experiments; and the shortcomings of the traditional improved algorithms are pointed out. (2) Building on the existing cost-sensitive support vector machine algorithm, the probability density function (PDF) of the sample distribution is introduced to adjust the penalty parameters in the optimization problem, and an improved support vector machine algorithm, PCS-SVM, is proposed. The algorithm uses a similarity matrix and predefined hyperparameters to estimate the class of samples near the hyperplane, which allows it to adjust the separating hyperplane. By combining the advantages of different improved algorithms, it improves adaptability and robustness to different data characteristics. (3) The PCS-SVM algorithm is validated on 16 real-world unbalanced data sets: its classification performance is compared with that of current typical improved SVM algorithms under several evaluation metrics, and a t-test is used to further compare PCS-SVM with the other improved SVM algorithms to verify its effectiveness. |
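
For orientation, the cost-sensitive SVM that item (2) builds on is commonly written as the soft-margin primal below, with a separate penalty for each class. This is the standard textbook form, not the paper's PCS-SVM objective: the symbols $C^{+}$, $C^{-}$, $\phi$, and $\xi_i$ are notation assumed here, and the abstract does not specify how the PDF-based adjustment enters the penalties.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w\rVert^{2}
  + C^{+}\sum_{i:\,y_i=+1}\xi_i
  + C^{-}\sum_{i:\,y_i=-1}\xi_i \\
\text{s.t.}\quad & y_i\bigl(w^{\top}\phi(x_i)+b\bigr)\ \ge\ 1-\xi_i,
  \qquad \xi_i \ge 0, \quad i=1,\dots,n .
\end{aligned}
```

Choosing $C^{+} > C^{-}$ when the positive class is the minority makes margin violations on minority samples costlier, which pushes the separating hyperplane away from the minority class.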