
Research On Efficient And Robust SVM Based On CRF

Posted on: 2022-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: Q Y He
GTID: 2518306575967089
Subject: Computer technology
Abstract/Summary:
In machine learning, noise in the original training set is generally divided into attribute noise and label noise. In most cases label noise is more harmful than attribute noise and can seriously reduce a classifier's validation accuracy. To eliminate the negative effects of label noise, existing approaches mainly rely either on filters that remove mislabeled samples or on robust algorithms that tolerate them. Along these lines, a label noise filtering learning framework based on completely random forest (CRF-NFL) has been proposed, which uses a completely random forest (CRF) as the filter. The CRF-NFL framework not only filters label noise effectively but can also be combined with various classifiers trained on the filtered training set; that is, it can be combined with other robust algorithms to handle label noise and further improve filtering performance. However, the framework has two shortcomings. First, the completely random forest itself is not optimized, which limits how high the classifier's validation accuracy can go. Second, the framework focuses only on combining the filter with different classifiers and does not consider the robustness of the classifier itself. For example, when the classical support vector machine (SVM) is chosen as the combined classifier, the resulting CRF-NFL-SVM model ignores the robustness of the SVM in binary classification, and its performance is not ideal on training sets with high noise levels. To address these two shortcomings, this thesis carries out research based on the CRF-NFL framework and support vector machine theory.

First, this thesis optimizes the label noise filtering method based on completely random forest. By optimizing the voting threshold, label noise in the original training set is filtered more effectively and the classifier reaches higher validation accuracy. At the same time, because the completely random forest, like a random forest, requires no pruning, and the support vector machine does not require cross-validation, efficiency is also improved.

Secondly, this thesis proposes a
method to improve the robustness of the SVM algorithm for binary classification. Because label noise is an important factor that makes the original data set linearly inseparable, the key to theoretically turning the linearly non-separable problem into a linearly separable one is to maximize the penalty coefficient; the maximum-margin hyperplane can then be solved as in the linearly separable case, which increases the algorithm's resistance to noise and improves the robustness of the SVM.

Finally, building on the above optimizations, this thesis proposes an efficient and robust support vector machine model based on completely random forest (CRF-ERSVM). On UCI data sets with 20% noise, the validation accuracy of this model is 5.18% and 4.18% higher than that of the classical support vector machine model and the CRF-NFL-SVM model, respectively.
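The abstract does not spell out how CRF-NFL counts votes or how the voting threshold is applied, so the following is only a minimal sketch under stated assumptions. It assumes the usual completely random tree (split feature and split value both drawn at random, grown until leaves are pure, no pruning), bootstrap sampling, and out-of-bag voting: a sample is treated as label noise when the fraction of out-of-bag trees that disagree with its given label reaches `threshold`. The names `grow_crt`, `crf_noise_scores` and `crf_filter`, the out-of-bag scheme, and the default `threshold=0.5` are all hypothetical, not the thesis's definitions.

```python
import random
from collections import Counter

# Hypothetical sketch: out-of-bag voting and the threshold rule are assumptions.


def grow_crt(X, y, rng):
    """Grow one completely random tree: random feature, random threshold in
    that feature's range, split until nodes are pure, no pruning.
    Leaves are labels; internal nodes are (feature, threshold, left, right)."""
    if len(set(y)) == 1:
        return y[0]
    features = list(range(len(X[0])))
    rng.shuffle(features)
    for f in features:
        lo = min(x[f] for x in X)
        hi = max(x[f] for x in X)
        if lo == hi:
            continue  # feature is constant in this node, cannot split on it
        t = rng.uniform(lo, hi)
        left = [i for i, x in enumerate(X) if x[f] <= t]
        right = [i for i, x in enumerate(X) if x[f] > t]
        if left and right:
            return (f, t,
                    grow_crt([X[i] for i in left], [y[i] for i in left], rng),
                    grow_crt([X[i] for i in right], [y[i] for i in right], rng))
    return Counter(y).most_common(1)[0][0]  # rare fallback: majority label


def predict_crt(node, x):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def crf_noise_scores(X, y, n_trees=100, seed=0):
    """Fraction of out-of-bag trees whose vote disagrees with each label."""
    rng = random.Random(seed)
    n = len(X)
    disagree, votes = [0] * n, [0] * n
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(bag)
        tree = grow_crt([X[i] for i in bag], [y[i] for i in bag], rng)
        for i in range(n):
            if i not in in_bag:  # only trees that never saw sample i vote on it
                votes[i] += 1
                disagree[i] += predict_crt(tree, X[i]) != y[i]
    return [d / v if v else 0.0 for d, v in zip(disagree, votes)]


def crf_filter(X, y, threshold=0.5, **kw):
    """Drop samples whose disagreement score reaches the voting threshold."""
    scores = crf_noise_scores(X, y, **kw)
    keep = [i for i, s in enumerate(scores) if s < threshold]
    return [X[i] for i in keep], [y[i] for i in keep], scores
```

Raising `threshold` keeps more samples (fewer clean points removed by mistake); lowering it removes more suspected noise. One hypothetical way to "optimize the voting threshold" is to grid-search it against the validation accuracy of the downstream classifier trained on the filtered set.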
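Why a very large penalty coefficient leads to the maximum-margin hyperplane can be seen by comparing the standard soft-margin and hard-margin SVM problems and their duals. This is the textbook formulation, not one reproduced from the thesis.

```latex
\begin{aligned}
&\text{Soft-margin primal:} && \min_{w,b,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
  \quad \text{s.t. } y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\\
&\text{Soft-margin dual:} && \max_{\alpha}\ \sum_{i=1}^{n}\alpha_i - \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\, x_i^{\top}x_j
  \quad \text{s.t. } \sum_{i=1}^{n}\alpha_i y_i = 0,\ \ 0 \le \alpha_i \le C,\\
&\text{Hard-margin primal:} && \min_{w,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
  \quad \text{s.t. } y_i\,(w^{\top}x_i + b) \ge 1,\quad i = 1,\dots,n,\\
&\text{Hard-margin dual:} && \text{same as the soft-margin dual with } 0 \le \alpha_i \le C \text{ replaced by } \alpha_i \ge 0.
\end{aligned}
```

If the training data are linearly separable, the hard-margin dual has a finite optimum \(\alpha^{*}\). Once \(C \ge \max_i \alpha_i^{*}\), the upper bound \(\alpha_i \le C\) is inactive, so both problems share the same optimal \(w = \sum_i \alpha_i^{*} y_i x_i\) and the same margin width \(2/\lVert w\rVert\). If the data are not separable, the hard-margin problem is infeasible, which is why removing label noise before maximizing \(C\) matters.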
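The idea of fixing the penalty coefficient at a very large value (instead of tuning it by cross-validation) can be tried on toy data with a linear SVM. The sketch below uses dual coordinate descent for the L1-loss linear SVM, the method popularized by LIBLINEAR; this is a swapped-in solver, not necessarily the one used in the thesis. The value `C=1e6`, the trick of learning the bias as an extra constant feature, and the names `train_linear_svm` and `svm_predict` are assumptions for illustration. Labels must be -1 or +1.

```python
import random


def train_linear_svm(X, y, C=1e6, epochs=200, seed=0):
    """Linear soft-margin SVM via dual coordinate descent (hinge loss).

    A constant feature 1.0 is appended so the bias is learned as an extra
    weight (this also regularizes the bias, a common simplification).
    A very large C approximates the hard-margin SVM on separable data.
    """
    Z = [list(x) + [1.0] for x in X]        # augment each sample with a bias feature
    n, d = len(Z), len(Z[0])
    alpha = [0.0] * n                        # dual variables, 0 <= alpha_i <= C
    w = [0.0] * d                            # maintained as sum_i alpha_i * y_i * z_i
    q = [sum(v * v for v in z) for z in Z]   # diagonal entries Q_ii = z_i . z_i
    order = list(range(n))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            # Partial derivative of the dual objective with respect to alpha_i.
            g = y[i] * sum(wj * zj for wj, zj in zip(w, Z[i])) - 1.0
            # Exact one-variable minimization, clipped to the box [0, C].
            new = min(max(alpha[i] - g / q[i], 0.0), C)
            delta = new - alpha[i]
            if delta:
                alpha[i] = new
                w = [wj + delta * y[i] * zj for wj, zj in zip(w, Z[i])]
    return w[:-1], w[-1]                     # (weights, bias)


def svm_predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane w.x + b = 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

On clean, separable data a huge `C` simply recovers the maximum-margin hyperplane; on noisy, non-separable data the same setting makes the hyperplane chase mislabeled points, so in a pipeline like the one the abstract describes it would be applied only after label-noise filtering.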
Keywords/Search Tags: completely random forest, label noise, voting optimization, support vector machines, robustness