
Research And Application Of Active Learning Method For Unbalanced Data Set Based On One Class SVM

Posted on: 2022-08-23
Degree: Master
Type: Thesis
Country: China
Candidate: H C Liu
GTID: 2518306575465914
Subject: Computer Science and Technology
Abstract/Summary:
Active learning is an effective way to reduce labeling cost, but reducing that cost is more challenging when the data set is unbalanced. Current active learning algorithms for unbalanced data sets have two main problems. First, when the degree of imbalance is large, the cost of initializing the learner increases, and combining traditional active learning with sampling methods becomes ineffective, which wastes labels. Moreover, these methods make little use of the distribution information of the samples and pay little attention to the majority class, so the selected samples are unrepresentative. Second, traditional active learning has difficulty extracting features from high-dimensional data, which makes manual feature extraction too costly. To address these two problems, this thesis applies the theory of the one-class support vector machine (one-class SVM) to active learning on unbalanced data sets. The research consists of the following three parts:

1. An active learning method for unbalanced data sets based on the one-class support vector machine (OCSVM-AL) is proposed. First, to avoid wasting labels when the imbalance ratio is high, a one-class support vector machine is chosen as the learner. Second, a density-based query function is constructed to increase the representativeness of the selected samples. On this basis, both single-sample and batch sampling scenarios are explored. Finally, the method is verified experimentally on commonly used UCI data sets. The results show that OCSVM-AL is an effective active learning algorithm that reduces the number of labeled samples needed when the data set has a large imbalance ratio.

2. An active learning method for the one-class support vector machine based on SSGAN is proposed. To reduce the cost of manually extracting features from high-dimensional data and to improve the robustness of the active learning algorithm on unbalanced data sets, this thesis studies one-class SVM active learning for high-dimensional data. First, a convolutional neural network automatically extracts data features on the basis of a pre-trained semi-supervised generative adversarial network, and the extracted features are then used for active learning. During this process, active learning and the discriminator are optimized collaboratively. Experimental results indicate that the proposed method is effective and requires fewer labeled samples than the comparison algorithm.

3. The one-class SVM based active learning method for unbalanced data sets is applied to a practical problem. First, data preprocessing and feature analysis are carried out. Second, the active learning algorithm is used to label the samples. Finally, compared with the comparison algorithm, the proposed algorithm significantly reduces the labeling cost, which indicates that it has clear advantages in processing real coal mine safety situation data.
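
The abstract does not give the exact form of the density-based query function, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each unlabeled sample already has a signed decision score from some one-class learner (for example, the output of a one-class SVM decision function), and it ranks samples by the product of a k-nearest-neighbour density estimate and closeness to the decision boundary. The function names `knn_density` and `select_queries`, the utility formula, and the parameter `k` are all hypothetical choices for illustration, not the thesis's method.

```python
import math

def knn_density(points, k=3):
    """Density estimate: inverse of the mean distance to the k nearest neighbours.

    Assumes at least two points; this estimator is an illustrative choice.
    """
    densities = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        densities.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))
    return densities

def select_queries(points, scores, batch_size=1, k=3):
    """Return indices of the samples to send to the annotator.

    points: unlabeled feature vectors.
    scores: signed decision scores from a one-class learner (assumed given);
            values near zero are treated as most uncertain.
    batch_size: 1 for single-sample querying, >1 for batch querying.
    """
    dens = knn_density(points, k)
    max_dens = max(dens)
    # Hypothetical utility: representativeness (normalized density)
    # times informativeness (closeness to the decision boundary).
    utility = [(d / max_dens) / (1.0 + abs(s)) for d, s in zip(dens, scores)]
    ranked = sorted(range(len(points)), key=lambda i: utility[i], reverse=True)
    return ranked[:batch_size]

# Usage: three clustered samples and one isolated sample, all equally uncertain.
pool = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
picked = select_queries(pool, scores=[0.0, 0.0, 0.0, 0.0], batch_size=1, k=2)
print(picked)
```

Weighting by density steers queries toward samples that sit in populated regions, which is one common way to make selected samples more representative; setting `batch_size` above 1 gives a simple form of batch sampling, although it does not enforce diversity within the batch.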
Keywords/Search Tags: active learning, unbalanced dataset, one-class support vector machine, clustering, GAN