Research On The Key Technologies And The Applications For The Class Of Imbalance Problem

Posted on: 2022-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Jian
GTID: 2518306524989249
Subject: Master of Engineering
Abstract/Summary:
In recent years, the class imbalance problem, which aims to classify imbalanced data sets correctly and to improve classification performance on minority instances, has received growing attention. The problem can be roughly described as follows: one or more classes, termed the minority class(es), contain far fewer instances than the others, referred to as the majority class(es). Current algorithms mainly focus on either the sampling stage or the classifier-construction stage, and almost none combine the two. In addition, data imbalance also arises in many practical applications, affecting industrial production and daily life. To address this problem, this paper proposes a novel classification method called focused online random forest based on the synthetic minority oversampling technique (FORF-SMOTE), abbreviated as FORF-S. The simple and efficient synthetic minority oversampling technique (SMOTE) has also been successfully applied to a judicial project. To improve classification performance on imbalanced data and to solve the imbalance problem in that practical project, this paper mainly does the following work:

(1) Research on the imbalance problem that focuses only on the sampling stage or only on the training stage is limited. The proposed algorithm is motivated by integrating sampling strategies into the algorithm level when building classifiers: it constructs two online random forests, one trained on the original training dataset and one on a newly generated dataset, which then jointly constitute the model. The new dataset is generated by oversampling the minority class and filtering the majority class. The experimental results demonstrate that the proposed algorithm draws on the advantages of state-of-the-art methods and performs well on many test datasets.

(2) In judicial projects, all case types share the problem that the ratio between petition cases and non-petition cases is always imbalanced. We apply SMOTE to oversample the original dataset and use the new dataset to train the classifier. The comparative experiment demonstrates the successful application of the oversampling technique in judicial projects...
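
The oversampling step that the abstract relies on can be illustrated with a minimal sketch of the standard SMOTE interpolation idea: each synthetic minority point is placed on the line segment between a minority sample and one of its k nearest minority neighbours. This is not the FORF-S method itself; the majority filtering and the online random forests are not shown. The function name `smote`, its parameters, and the toy data are assumptions made only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority neighbours.

    Hypothetical helper written for illustration; not taken from the thesis.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy imbalanced data (made-up values): 3 minority and 10 majority points.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
majority = [[5.0 + 0.1 * i, 5.0] for i in range(10)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority), len(majority))  # prints: 10 10
```

Because every synthetic point is an interpolation between two existing minority samples, the new data stays inside the region already occupied by the minority class. A classifier would then be trained on the balanced set, as the abstract describes for the judicial data.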
Keywords/Search Tags: imbalanced data, online random forest, synthetic minority oversampling technique, nearest neighbor