Research On Classification Algorithms For Imbalanced Dataset

Posted on: 2017-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: L L Xu
Full Text: PDF
GTID: 2348330488472145
Subject: Computational Mathematics
Abstract/Summary:
As the ability to mine and discover information improves, data has accumulated in many fields, and it includes both balanced and imbalanced datasets. Making full use of the information and regularities hidden in this data is an ongoing task. Data classification is an important part of data processing: by analyzing an existing dataset, it uncovers the hidden information and rules in the data and then uses them to predict the category of unseen data. For balanced datasets, classical classification algorithms such as the Extreme Learning Machine (ELM) and the Support Vector Machine (SVM) have achieved good results. In practical applications, however, these traditional algorithms are inadequate for imbalanced data. They ignore the fact that the class distribution is imbalanced, which leads to unsatisfactory classification results. To reduce the influence of imbalanced data, existing algorithms must be improved or new ones designed. Because traditional classification algorithms achieve low accuracy on the minority class of an imbalanced dataset, this thesis builds on ELM and SVM and studies the problem from the following two aspects.

At the data level, an improved ELM algorithm based on clustering and under-sampling is proposed (FCM-ELM). The algorithm uses clustering to partition the negative class of the training set into several clusters. According to a defined sampling rate, samples are drawn from each cluster to form a new negative dataset, which balances the number of positive and negative samples in the training set. Comparison and analysis of the experimental results show that the new algorithm effectively reduces the influence of the uneven data distribution on classification accuracy and therefore obtains better performance.

At the algorithm level, a weighted ensemble learning algorithm that combines SVM with clustering is proposed (FCM-ENWSVM). First, a weighted support vector machine (WSVM) is designed, in which each class is assigned a weight according to its proportion in the data. WSVM and clustering are then combined into a new ensemble learning algorithm. Experimental results on artificial datasets and UCI datasets show that the new algorithm can classify imbalanced data and achieves the desired effect.
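The two data-handling ideas described above, drawing samples from each cluster of the negative class at a fixed sampling rate and weighting classes according to their proportions, can be illustrated with a minimal sketch. This is not the thesis implementation. It assumes the negative class is the larger one, it substitutes a plain k-means for the fuzzy c-means clustering suggested by the name FCM, and the weight formula `n / (n_classes * count)` is one common inverse-proportion choice rather than the formula used in the thesis. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def kmeans_labels(points, k, iters=20, seed=0):
    """Plain k-means, used here only as a stand-in for fuzzy c-means.

    points: list of equal-length tuples. Returns one cluster id per point.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def undersample_by_cluster(negatives, labels, rate, seed=0):
    """Keep roughly `rate` of each cluster (at least one sample per cluster)."""
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for p, lab in zip(negatives, labels):
        clusters[lab].append(p)
    sampled = []
    for members in clusters.values():
        n_keep = max(1, round(rate * len(members)))
        sampled.extend(rng.sample(members, n_keep))
    return sampled


def proportion_class_weights(y):
    """Assumed weighting: inversely proportional to each class's frequency."""
    counts = Counter(y)
    n = len(y)
    return {c: n / (len(counts) * cnt) for c, cnt in counts.items()}


if __name__ == "__main__":
    # Toy data: 90 negative points in two groups, 10 positive points.
    negatives = [(i * 0.01, 0.0) for i in range(45)] + \
                [(10 + i * 0.01, 0.0) for i in range(45)]
    positives = [(5 + i * 0.01, 1.0) for i in range(10)]
    labels = kmeans_labels(negatives, k=2)
    rate = len(positives) / len(negatives)  # aim for a roughly 1:1 training set
    reduced = undersample_by_cluster(negatives, labels, rate)
    print(len(reduced), "negatives kept for", len(positives), "positives")
    print(proportion_class_weights([1] * len(positives) + [0] * len(negatives)))
```

Sampling inside each cluster, rather than from the negative class as a whole, keeps every region of the negative class represented in the reduced training set. The per-class weights could then be passed to any classifier that accepts them, such as a weighted SVM.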
Keywords/Search Tags: Imbalanced Data, Extreme Learning Machine, Support Vector Machine, Cluster, Under-Sampling