The Classification Of Less Labeled Imbalanced Data Based On Active Learning

Posted on:2023-07-11

Degree:Master

Type:Thesis

Country:China

Candidate:Z D Zhao


GTID:2568306836973849

Subject:Computer technology

Abstract/Summary:

Classification algorithms are a fundamental part of data mining. Traditional classification methods focus on overall classification accuracy, and when they are applied to imbalanced data, overall accuracy is usually improved at the expense of accuracy on the minority class. In many practical applications, the minority-class samples of an imbalanced dataset are the ones that matter most, so classification algorithms for imbalanced data have received extensive attention from researchers.

For the binary classification of imbalanced data, this thesis first proposes an initial sample selection strategy that focuses on the minority class and reduces the cost of subsequent active learning iterations, and extends the strategy to linearly inseparable datasets. A traditional support vector machine (SVM) based active learning strategy is then used to compute the cosine similarity of the majority-class samples awaiting labeling; the sample with the smallest value is selected, and the selected majority-class samples, together with the minority-class samples, form a balanced set of samples to be labeled. Experiments on the Mushroom and Reuters-21578 datasets verify the feasibility of the strategy, and the results show that it effectively reduces active learning iteration time.

Because manual annotation in active learning requires additional expert involvement and is costly, this thesis then combines semi-supervised learning with active learning and proposes an active learning method that reduces sample redundancy. The method selects the most informative samples while keeping the selected samples highly representative, which avoids ineffective sampling during active learning. In addition, the semi-supervised learning strategy based on the transductive support vector machine (TSVM) is improved: eliminating majority-class samples in batches substantially reduces the time spent in the semi-supervised learning stage. Finally, the two strategies are integrated and evaluated on the Digits and MNIST datasets; the results show that the combined strategy effectively reduces algorithm execution time while maintaining classification accuracy.
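The abstract does not say what the cosine similarity is measured against or how the SVM is trained, so the following is only a minimal sketch of the balanced-selection step under stated assumptions: a linear SVM has already been trained with the minority class as the positive class (weights `w`, bias `b`); candidates with a non-negative decision value are treated as predicted minority; the first majority-class pick is the candidate closest to the hyperplane; and each later pick is the majority-class candidate whose largest cosine similarity to the already-selected ones is smallest. The names `balanced_query_set`, `decision_value` and `cosine_similarity` are hypothetical, and pure Python is used so the example has no dependencies.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def decision_value(x: Vector, w: Vector, b: float) -> float:
    """Linear SVM decision function f(x) = w . x + b (hypothetical pre-trained model)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def balanced_query_set(pool: List[Vector], w: Vector, b: float) -> Tuple[List[int], List[int]]:
    """Pick equally many predicted-minority and predicted-majority samples from pool.

    Assumptions (not stated in the thesis abstract): the minority class is the
    positive class, f(x) >= 0 means "predicted minority", and majority-class
    redundancy is measured by cosine similarity to already-selected samples.
    """
    scores = [decision_value(x, w, b) for x in pool]
    minority = [i for i, s in enumerate(scores) if s >= 0]
    majority = [i for i, s in enumerate(scores) if s < 0]
    k = min(len(minority), len(majority))
    if k == 0:
        return [], []

    # Minority side: the k candidates closest to the hyperplane (most uncertain).
    minority_sel = sorted(minority, key=lambda i: abs(scores[i]))[:k]

    # Majority side: seed with the candidate closest to the hyperplane ...
    remaining = list(majority)
    first = min(remaining, key=lambda i: abs(scores[i]))
    majority_sel = [first]
    remaining.remove(first)
    # ... then repeatedly take the candidate whose highest cosine similarity
    # to the already-selected majority samples is smallest (least redundant).
    while len(majority_sel) < k:
        nxt = min(
            remaining,
            key=lambda i: max(cosine_similarity(pool[i], pool[j]) for j in majority_sel),
        )
        majority_sel.append(nxt)
        remaining.remove(nxt)
    return minority_sel, majority_sel


if __name__ == "__main__":
    pool = [[1.0, 0.2], [2.0, 1.0], [-0.5, 1.0], [-1.0, 2.0], [-2.0, -1.0]]
    print(balanced_query_set(pool, w=[1.0, 0.0], b=0.0))  # ([0, 1], [2, 4])
```

In the toy pool, sample 3 points in the same direction as the seed sample 2 (cosine similarity 1), so it is skipped as redundant, while sample 4 (orthogonal to sample 2) is chosen, giving two minority and two majority candidates for labeling.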

Keywords/Search Tags:

Active learning, Unbalanced data, Semi-supervised learning, Support vector machines
