Classification, the task of assigning data to classes according to their characteristics, has attracted much attention in data mining research. A classification problem involves two processes: learning and classification. In the learning process, an effective learning method is used to train a classifier on a training data set whose class labels are known; in the classification process, the learned classifier assigns labels to new, unlabeled data. The accuracy of classification therefore depends on the accuracy of the classifier, so the focus of classification lies in learning the classifier.

Support Vector Machine (SVM) and Distance-Weighted Discrimination (DWD) are two widely used classification algorithms, and both search for the best separating hyperplane by maximizing a margin. In SVM, only the support vectors affect the learning of the separating hyperplane, so SVM is not sensitive to class-imbalanced data. However, while minimizing the loss on high-dimension, low-sample-size (HDLSS) data, SVM is prone to "data piling": many sample points (i.e., support vectors) pile up on the margin boundaries on both sides of the separating hyperplane, resulting in overfitting. In DWD, all sample points influence the learning of the separating hyperplane to varying degrees, so overfitting on HDLSS data is effectively avoided. However, on class-imbalanced data, DWD tends to favor the majority class in order to reduce the overall misclassification, and the separating hyperplane is pushed toward the minority class.

The FLexible Assortment MachinE (FLAME) and the Distance-Weighted Support Vector Machine (DWSVM) were both proposed to address these two problems, namely overfitting on HDLSS data and sensitivity to class imbalance. Both inherit the advantages of SVM and DWD and thereby alleviate overfitting and class-imbalance sensitivity: FLAME does so by modifying the loss function, whereas DWSVM modifies the objective function of the optimization problem. Fuzzy Support Vector Machine (FSVM) is developed from SVM: by introducing fuzzy memberships (weighting coefficients), different sample points contribute differently to the learning of the separating hyperplane, while FSVM still inherits the advantages of SVM.

After analyzing the problems of overfitting (data piling) on HDLSS data and of sensitivity to class-imbalanced data, and drawing lessons from FLAME and DWSVM, we combine the complementary strengths of these methods: FSVM can handle class-imbalanced data, and DWD can avoid overfitting HDLSS data. We therefore apply the inverse distance used in DWD as the fuzzy membership in FSVM, and propose a new two-class linear classification algorithm, called the Inverse Distance Weighted Support Vector Machine (IDWSVM), which can efficiently solve both problems. Finally, theoretical analysis and worked examples demonstrate that IDWSVM does indeed solve these two classification problems. At the end of this paper, IDWSVM is developed further, and ideas for extending this two-class linear classification algorithm to nonlinear classification and to multi-class classification are proposed. On the one hand, the kernel trick can be introduced to map a nonlinearly separable dataset into a higher-dimensional space in which it becomes linearly separable. On the other hand, multi-class classification can be handled with the "one-versus-one" and "one-versus-rest" strategies.
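The data-piling behavior of SVM on HDLSS data can be observed numerically. The following minimal sketch (assuming scikit-learn is available; the dataset, its dimensions, and the small mean shift are arbitrary illustrative choices, not taken from this paper) trains a linear SVM on far more features than samples and counts how many training points end up as support vectors, i.e., on the margin boundaries:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 40, 500  # low sample size, high dimension (HDLSS regime)

# Two nearly overlapping Gaussian classes: noise dominates the class signal.
X = rng.normal(size=(n, d))
y = np.repeat([-1, 1], n // 2)
X[y == 1] += 0.1  # small mean shift for the positive class

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# In the HDLSS regime the Gram matrix is nearly diagonal, so almost every
# point receives a positive dual coefficient and becomes a support vector.
frac_sv = clf.n_support_.sum() / n
print(f"fraction of points that are support vectors: {frac_sv:.2f}")
```

With many more features than samples, the fraction printed is close to 1: nearly all points lie on the margin boundaries, which is exactly the data-piling phenomenon that motivates DWD-style weighting.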
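The weighting idea behind IDWSVM can be sketched with off-the-shelf tools. The snippet below (a rough illustration, not the paper's exact formulation: the membership function `inverse_distance_memberships`, the choice of distance to the class centroid, the constant `eps`, and the toy imbalanced dataset are all assumptions made here for demonstration) assigns each point an inverse-distance fuzzy membership and passes it to scikit-learn's SVC as a per-sample weight:

```python
import numpy as np
from sklearn.svm import SVC

def inverse_distance_memberships(X, y, eps=1e-3):
    """Illustrative FSVM-style memberships: each point is weighted by the
    inverse of its distance to its own class centroid, so points deep inside
    a class count more than points far from it. (A stand-in for the paper's
    inverse-distance weighting, which is defined relative to DWD.)"""
    s = np.empty(len(y))
    for label in np.unique(y):
        mask = y == label
        center = X[mask].mean(axis=0)
        dist = np.linalg.norm(X[mask] - center, axis=1)
        s[mask] = 1.0 / (dist + eps)  # eps avoids division by zero
    return s

rng = np.random.default_rng(1)
# Class-imbalanced toy data: 90 majority points vs 10 minority points.
X = np.vstack([rng.normal(0.0, 1.0, size=(90, 2)),
               rng.normal(3.0, 1.0, size=(10, 2))])
y = np.array([0] * 90 + [1] * 10)

s = inverse_distance_memberships(X, y)
clf = SVC(kernel="linear").fit(X, y, sample_weight=s)
acc = clf.score(X, y)
print(f"training accuracy: {acc:.2f}")
```

Because every point carries an individual weight rather than the uniform penalty of standard SVM, the learned hyperplane is less dominated by whichever class happens to have more points; replacing `kernel="linear"` with, e.g., `kernel="rbf"` gives the kernel-based nonlinear extension mentioned above, and SVC handles more than two classes via the same one-versus-one strategy.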