Research On Approach For Classification Of Intra-class Imbalanced Data Sets

Posted on: 2017-03-01 | Degree: Master | Type: Thesis
Country: China | Candidate: F X Shi | Full Text: PDF
GTID: 2348330503486907 | Subject: Computer Science and Technology
Abstract/Summary:
In recent years, imbalanced data classification has arisen in many fields, and it has attracted growing attention because of its importance and difficulty. Many effective methods have been proposed and applied in different fields, but methods aimed specifically at intra-class imbalanced datasets still merit study. Most existing methods for imbalanced data classification consider only the imbalance between classes and ignore imbalance within a class, so when within-class imbalance occurs, the final classification result suffers. A large number of studies have shown that imbalance between classes is not the only factor affecting classification learning; imbalance within a class is also a key factor influencing classification performance.

This thesis first introduces the between-class and intra-class imbalance problems and analyses why imbalanced data are difficult to classify. It then reviews some classic methods for classifying imbalanced data samples and the classic evaluation criteria, and analyses the advantages and disadvantages of these methods. Next, it proposes an ensemble learning approach that takes this additional kind of imbalance into account, based on an improved DBSCAN algorithm combined with optimization methods, together with a mixed sampling scheme based on boundary samples that combines the merits of oversampling and undersampling. Finally, the particle swarm optimization (PSO) algorithm, from the family of evolutionary algorithms (EAs), is used to optimize the sampling rate, the feature vector, and the weight coefficients of the base classifiers, yielding an improved PSO-based DBSCAN algorithm and a new DBPSBoost algorithm.

A series of experiments shows improvements in recall, precision, F-Measure and AUC, supporting the effectiveness of the proposed method.
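
The four evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not code from the thesis: the function names `precision_recall_f` and `auc`, the label convention (1 for the minority class, 0 for the majority class), and the toy data are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def precision_recall_f(
    y_true: Sequence[int], y_pred: Sequence[int], beta: float = 1.0
) -> Tuple[float, float, float]:
    """Precision, recall and F-measure for the class labeled 1.

    Assumption: 1 marks the minority (positive) class, 0 the majority class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    f_measure = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: 3 minority samples, 7 majority samples.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05]

    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision, recall, f1 = precision_recall_f(y_true, y_pred)
    print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
          f"recall={recall:.2f} F1={f1:.2f} AUC={auc(y_true, scores):.3f}")
```

On this toy data the classifier misses two of the three minority samples yet still reaches 80% accuracy, which is why studies of imbalanced data usually report recall, F-Measure and AUC rather than accuracy alone.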
Keywords/Search Tags:intra-class imbalance, re-sampling, particle swarm optimization, ensemble classification