Research Of Multi-class Imbalanced Data Classification Method

Posted on: 2018-10-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Y N Peng
GTID: 2348330569486426 | Subject: Computer Science and Technology
Abstract/Summary:
Classification is one of the most important research problems in data mining. Traditional classification methods are designed for binary data. In most practical applications, however, the data are multi-class; examples include network anomaly detection, weather forecasting and oil pollution detection. Multi-class problems are more complicated than binary ones and pose new challenges to traditional classification research. Practical data are also spread across many classes and imbalanced in how samples are distributed among those classes. When a minority class has too few samples, traditional classifiers can extract little useful information from it, so their final decisions are biased toward the majority class. Dedicated methods for multi-class imbalanced classification are therefore needed.

A widely used approach to multi-class imbalanced classification is a classification framework that decomposes the original problem into a series of imbalanced binary classification problems and combines the binary results through an ensemble rule to obtain the final classification. This framework consists of a decomposition strategy, an imbalanced binary classification method and an ensemble rule, and it has attracted considerable research attention. It still has shortcomings, however. As the number of classes grows, the proportion of effective binary classifiers decreases, which seriously degrades the performance of the whole multi-class imbalanced classification method.

To address this problem, this thesis introduces the concepts of core cluster and inter-class difference and maps the multi-class imbalanced classification problem onto a weighted complete graph. On this basis, it proposes a decomposition strategy based on the maximum spanning tree and an ensemble rule based on node degree. The decomposition strategy trains only on the samples of the optimal binary classifiers, which reduces the proportion of poor binary classifiers that are constructed. The ensemble rule then integrates the binary classifiers' results, weighted by node degree, to balance the uneven participation of different classes' samples in classifier construction. To verify the feasibility of the proposed method, the proposed decomposition strategy and ensemble rule are compared with existing ones using four imbalanced ensemble learning methods and two classification algorithms. The experimental results show that the proposed method effectively improves classification accuracy.

Building on the optimized decomposition strategy and ensemble rule, this thesis further improves an existing imbalanced classification method. The neighborhood size k of the traditional k-Nearest Neighbor (kNN) algorithm is adjusted dynamically for imbalanced data to increase the likelihood that minority-class samples take part in the decision. The experimental results show that the improved kNN algorithm raises the classification accuracy of the minority class while maintaining overall classification accuracy.
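The count below is one way to see why the share of effective binary classifiers shrinks as classes are added. It assumes the common one-vs-one (OVO) decomposition; the abstract does not name the baseline strategies. For K classes, a test sample belongs to exactly one class, so only the K - 1 OVO classifiers trained on that class are competent for it:

```latex
N_{\mathrm{OVO}} = \binom{K}{2} = \frac{K(K-1)}{2}, \qquad
\frac{K-1}{N_{\mathrm{OVO}}} = \frac{2}{K}, \qquad
N_{\mathrm{tree}} = K - 1
```

For K = 10, only 9 of the 45 OVO classifiers (20%) involve the sample's true class. A decomposition along a spanning tree of the class graph builds only K - 1 binary classifiers.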
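The following is a minimal sketch of a maximum-spanning-tree decomposition. The abstract does not define core clusters or inter-class difference, so this sketch makes two hypothetical assumptions. First, each class is summarized by its centroid. Second, the inter-class difference is the Euclidean distance between centroids. Every pair of classes is an edge of the complete class graph, and Kruskal's algorithm keeps the K - 1 heaviest edges that form a tree. Each kept pair would then train one imbalanced binary classifier. The function names are illustrative and do not come from the thesis.

```python
import math
from itertools import combinations


def class_centroids(X, y):
    """Mean feature vector of each class (HYPOTHETICAL stand-in for a 'core cluster')."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi] = [0.0] * len(xi)
            counts[yi] = 0
        sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
        counts[yi] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def max_spanning_tree_pairs(X, y):
    """Class pairs kept by a maximum spanning tree over the weighted complete
    class graph, found with Kruskal's algorithm and union-find."""
    cent = class_centroids(X, y)
    classes = sorted(cent)
    # Complete graph: every class pair is an edge weighted by the ASSUMED
    # inter-class difference (Euclidean distance between centroids).
    edges = [(math.dist(cent[a], cent[b]), a, b) for a, b in combinations(classes, 2)]
    edges.sort(reverse=True)  # heaviest first -> maximum (not minimum) spanning tree

    parent = {c: c for c in classes}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    tree = []
    for _, a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:  # edge joins two components, so keep it
            parent[ra] = rb
            tree.append((a, b))
    return tree
```

Consider three classes whose centroids lie at (0, 0.5), (10, 0.5) and (0.5, 10). For these, max_spanning_tree_pairs keeps the pairs ('b', 'c') and ('a', 'b'). It drops ('a', 'c'), the closest pair and so, under this assumption, the hardest to separate.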
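A node-degree ensemble rule might look like the sketch below, which rests on two hypothetical assumptions. First, each binary classifier on a tree edge (a, b) outputs a probability for class a. Second, "weighted with node degree" means dividing each class's accumulated support by the number of binary classifiers it takes part in. The thesis's exact weighting is not given in the abstract.

```python
from collections import defaultdict


def node_degrees(tree_edges):
    """Degree of each class node in the tree, i.e. how many binary classifiers include it."""
    deg = defaultdict(int)
    for a, b in tree_edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def degree_weighted_vote(tree_edges, pair_probs):
    """Combine binary outputs into one multi-class label.

    pair_probs[(a, b)] is the (HYPOTHETICAL) probability that the classifier
    trained on classes a and b assigns to class a; 1 - p goes to class b.
    Dividing support by node degree keeps classes that appear in many binary
    classifiers from winning merely because they collect more votes.
    """
    deg = node_degrees(tree_edges)
    support = defaultdict(float)
    for a, b in tree_edges:
        p = pair_probs[(a, b)]
        support[a] += p
        support[b] += 1.0 - p
    scores = {c: support[c] / deg[c] for c in deg}
    return max(scores, key=scores.get), scores
```

Take the tree [('b', 'c'), ('a', 'b')] with probabilities 0.3 and 0.4. Class b appears in two classifiers and collects a raw support of 0.9, more than c's 0.7, so an unweighted sum would pick b. Dividing by degree gives b a score of 0.45 and c a score of 0.7, so this rule picks c.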
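The abstract does not say how the kNN neighborhood is adjusted. The sketch below shows one hypothetical way to widen minority-class participation; it is not the thesis's algorithm. Each class c gets its own neighborhood size k_c = max(1, round(k * n_c / n_max)), so a minority class needs fewer nearby samples to compete. The query is assigned to the class whose k_c-th nearest training sample is closest.

```python
import math
from collections import Counter


def class_scaled_knn_predict(X_train, y_train, x, k=5):
    """Predict the label of x with a per-class neighborhood size.

    ASSUMED rule (illustrative only): class c uses
    k_c = max(1, round(k * n_c / n_max)) neighbors, so the largest class uses k
    and smaller classes use proportionally fewer. The winner is the class whose
    k_c-th nearest training sample lies closest to x.
    """
    counts = Counter(y_train)
    n_max = max(counts.values())
    # Distances from x to every training sample, grouped by class.
    by_class = {c: [] for c in counts}
    for xi, yi in zip(X_train, y_train):
        by_class[yi].append(math.dist(x, xi))
    best_class, best_radius = None, math.inf
    for c, dists in by_class.items():
        k_c = max(1, round(k * counts[c] / n_max))
        k_c = min(k_c, len(dists))       # never ask for more neighbors than exist
        radius = sorted(dists)[k_c - 1]  # distance to the k_c-th nearest sample of c
        if radius < best_radius:
            best_class, best_radius = c, radius
    return best_class
```

Take ten majority samples 'M' at (0, 0) through (9, 0) and a single minority sample 'm' at (5, 3). At the query (5, 2), a plain 5-NN vote is 4 to 1 for 'M'. This rule predicts 'm' because the minority sample (distance 1) is nearer than the fifth-nearest majority sample (distance about 2.83).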
Keywords/Search Tags: imbalanced data, multi-class problem, decomposition strategy, ensemble rule, dynamic adjustment