Research On Distance Weighted Discriminant Model Under Semi-supervised Framework

Posted on: 2022-12-14
Degree: Master
Type: Thesis
Country: China
Candidate: D B Wu
Full Text: PDF
GTID: 2558307088450994
Subject: Statistics
Abstract/Summary:
In classification tasks, a classifier is learned from a training dataset and then used to classify data whose class is unknown. Among classifiers that discriminate based on the distance from the sample points to the separating hyperplane, the best known are the Support Vector Machine (SVM) and the Distance Weighted Discriminant (DWD). In SVM, the separating hyperplane depends only on a small number of support vectors and not on the size of each class, so it is not sensitive to class-imbalanced data. However, on high-dimensional, small-sample data, SVM is prone to the "data piling" phenomenon, i.e., many sample points pile up on both sides of the separating hyperplane, which easily causes overfitting. DWD uses the distances from all sample points to the separating hyperplane, which effectively addresses the overfitting problem of SVM in high-dimensional, small-sample scenarios. However, when traditional DWD is applied to class-imbalanced data, it tends to push the separating hyperplane toward the minority class, so the minority class is not identified accurately.

When sample labels are scarce, semi-supervised methods can fit the data using a limited number of labeled samples together with a large number of unlabeled samples. This thesis therefore proposes a semi-supervised distance-weighted discrimination method for the scenario in which the labeled samples are high-dimensional, few in number, and class-imbalanced, and solves it with a transductive method (TDWD) and a quasi-Newton method (QN-SDWD), respectively. The most widely used method in this scenario is the semi-supervised SVM (S3VM), but it inherits the disadvantages of SVM, such as overfitting and difficulty of solution. The DWD method proposed by Marron et al. overcomes these disadvantages of SVM, while generalized DWD handles class-imbalanced data well.

In the empirical section, DWD and SVM are compared in detail on high-dimensional samples with class imbalance. Generalized DWD is found to reduce the impact of class imbalance in this scenario by adjusting the distance exponent and using a sample-based weighting method. QN-SDWD and TDWD are then compared with QN-S3VM and TSVM, and with the supervised SVM and DWD methods, in terms of F1 score and computational efficiency. The results show that QN-SDWD solved with the L-BFGS algorithm is the fastest to solve, while TDWD has more stable performance, indicating that the proposed methods can be applied efficiently to semi-supervised tasks whose labeled samples form a high-dimensional, small-sample dataset, and can handle class imbalance well. Since DWD was proposed, many scholars have extended it, in particular by introducing kernel functions and optimizing the solution method, so that the method not only comprehensively surpasses the SVM algorithm in classification accuracy but is also faster to solve. However, DWD research has lacked a counterpart to S3VM, so this thesis offers a practical idea for extending distance-weighted discrimination and for improving other semi-supervised algorithms.
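
The contrast between SVM and DWD described in the abstract can be illustrated through their loss functions. The sketch below is not taken from the thesis: it assumes the generalized DWD loss in the form commonly stated in the DWD literature, with an exponent `q`, and it uses an inverse-class-frequency weighting that is only a stand-in for the sample-based weighting the abstract mentions. All function names are hypothetical.

```python
def hinge_loss(u):
    """SVM hinge loss for the functional margin u = y * f(x).

    It is exactly zero once u >= 1, so points far from the
    hyperplane do not influence the fit.
    """
    return max(0.0, 1.0 - u)


def dwd_loss(u, q=1.0):
    """Generalized DWD loss with exponent q > 0 (assumed form).

    Linear for small margins, then decays like 1 / u**q, so it stays
    positive for every point: all samples contribute to the fit.
    """
    threshold = q / (q + 1.0)
    if u <= threshold:
        return 1.0 - u
    return (q ** q) / ((q + 1.0) ** (q + 1.0)) / (u ** q)


def inverse_frequency_weights(labels):
    """Hypothetical class weights: n / (n_classes * class_count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n = len(labels)
    return {y: n / (len(counts) * c) for y, c in counts.items()}


def weighted_risk(margins, labels, loss, class_weight):
    """Average loss over samples, each scaled by its class weight."""
    total = sum(class_weight[y] * loss(u) for u, y in zip(margins, labels))
    return total / len(margins)


if __name__ == "__main__":
    # Compare the two losses at a few margins.
    for u in [-0.5, 0.2, 1.0, 3.0]:
        print(u, hinge_loss(u), dwd_loss(u, q=1.0))
```

With `q = 1` the assumed loss reduces to `1 - u` for `u <= 0.5` and `1 / (4u)` otherwise. A larger `q` makes the loss decay faster for well-separated points, which is one way to read "adjusting the distance exponent" in the abstract.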
Keywords/Search Tags: distance weighted discriminant, support vector machine, semi-supervision, category imbalance, high-dimensional small sample data