
Research On Approach For Classification Of Imbalanced Data Sets With High Density

Posted on: 2015-11-13
Degree: Master
Type: Thesis
Country: China
Candidate: P F Jia
GTID: 2308330479989901
Subject: Computer technology
Abstract/Summary:
In recent years, the classification of imbalanced data has arisen in many different fields, and it can have a considerable negative impact on real-world production when it is not handled properly. Researchers have therefore worked continuously to improve classification performance on imbalanced data. With the rise of big data, data sets are becoming ever larger and denser, so the classification of imbalanced data with high density is attracting more attention than before.

Most traditional methods for classifying imbalanced data sets do not take into account the very different characteristics of individual data sets. As a result, no general approach achieves satisfactory performance on data sets whose distribution is characterized by high density.

This thesis begins with basic research: it analyzes the underlying reasons why classifying imbalanced data is a common difficulty in most fields, and it summarizes, investigates, and analyzes the mainstream methods for addressing the problem together with the evaluation criteria. At the data level, the thesis then presents a new hybrid sampling method based on the distribution information of the instances, and two experiments are carried out to validate its effectiveness. This hybrid sampling approach is then combined with an improved ensemble learning algorithm. The result is a new ensemble learning method, DBBoost, designed to improve classification performance on imbalanced data with a high-density distribution. Two independent experiments show that it achieves better performance in terms of area under the ROC curve (AUC), recall, precision, and F-measure. On this basis, the thesis concludes that DBBoost has marked advantages for classifying imbalanced data with high density.
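
The four evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the thesis's evaluation code: the function names, the toy labels, and the toy scores below are hypothetical and are chosen only to show why these measures are preferred over plain accuracy when the positive (minority) class is rare.

```python
# Illustrative only: standard definitions of precision, recall, F-measure
# and ROC AUC, written with the Python standard library.
# All names and data here are hypothetical, not taken from the thesis.

def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F-measure) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outranks a random
    negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced sample: 2 minority (1) and 8 majority (0) instances.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3, 0.05]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(accuracy, precision, recall, f1, auc)
```

In this toy sample the classifier is right on 8 of 10 instances, yet it finds only one of the two minority instances, which the recall and F-measure make visible while accuracy does not.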
Keywords/Search Tags: Classification, Imbalanced-data, High-density, Ensemble Learning, Sampling