
The SVM Algorithm And Its Application Based On Data Preprocessing In The Kernel Space For Unbalanced Data

Posted on: 2014-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Hao
GTID: 2268330425966782
Subject: Communication and Information System
Abstract/Summary:
The problem of unbalanced data arises in many fields, such as medicine, fault diagnosis and fraud detection, so developing an effective algorithm for unbalanced classification has great scientific significance and practical value. However, when classical classification algorithms are applied to unbalanced data, their classification performance is far from ideal. In fault diagnosis in particular, the support vector machine (SVM) has widely replaced neural network algorithms because of its faster convergence rate, strong stability and good generalization ability. This thesis builds on the SVM classification algorithm and focuses on how to bias the SVM classification interface appropriately toward the majority class.
First, we review existing algorithms for unbalanced classification and the background of machine fault diagnosis. To select majority instances that carry representative information about the spatial structure of the majority class, we present a novel under-sampling algorithm based on kernel clustering. The majority instances are clustered in kernel space with the kernel fuzzy C-means algorithm (KFCM), and representative samples are then drawn at random from each cluster, so the selection retains cluster information. The selected majority instances, together with the minority instances, are used to train the classifier. AdaBoost is then used to combine these kernel cluster-based under-sampling components into an ensemble, which improves SVM classification performance on unbalanced datasets.
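The kernel cluster-based under-sampling step can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the thesis's implementation: it assumes a Gaussian kernel, the KFCM variant that keeps prototypes in input space (so the kernel-induced squared distance is 2(1 - K(x, v))), hard assignment of each majority sample to its highest-membership cluster, and per-cluster sampling quotas proportional to cluster size. The function names `kfcm` and `kernel_cluster_undersample` and all default parameter values are hypothetical.

```python
import numpy as np


def gaussian_kernel(X, V, sigma):
    """K[k, i] = exp(-||x_k - v_i||^2 / (2 * sigma^2))."""
    d2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def kfcm(X, c, m=2.0, sigma=1.0, n_iter=100, tol=1e-6, seed=0):
    """Kernel fuzzy C-means (assumed variant: Gaussian kernel, input-space prototypes)."""
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), size=c, replace=False)].copy()
    for _ in range(n_iter):
        K = gaussian_kernel(X, V, sigma)                 # shape (n, c)
        # Kernel-induced squared distance for a Gaussian kernel: 2 * (1 - K(x, v))
        d2 = np.maximum(2.0 * (1.0 - K), 1e-12)
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)         # fuzzy memberships, rows sum to 1
        # Prototype update weighted by u^m * K(x, v)
        W = (U ** m) * K
        V_new = (W.T @ X) / np.maximum(W.sum(axis=0), 1e-300)[:, None]
        shift = np.abs(V_new - V).max()
        V = V_new
        if shift < tol:
            break
    return U, V


def kernel_cluster_undersample(X_maj, n_select, c=3, sigma=1.0, seed=0):
    """Return indices of majority samples drawn at random from each KFCM cluster."""
    U, _ = kfcm(X_maj, c, sigma=sigma, seed=seed)
    labels = U.argmax(axis=1)                            # hard assignment (assumption)
    rng = np.random.default_rng(seed)
    chosen = []
    for i in range(c):
        idx = np.flatnonzero(labels == i)
        if idx.size == 0:
            continue
        # Assumption: quota proportional to cluster size, at least one sample per cluster
        k = min(idx.size, max(1, round(n_select * idx.size / len(X_maj))))
        chosen.extend(rng.choice(idx, size=k, replace=False).tolist())
    return np.array(sorted(chosen), dtype=int)
```

The returned indices pick the majority instances that, together with the minority instances, would train one SVM component; the SVM training and the AdaBoost ensemble over such components are not shown. Because quotas are rounded per cluster, the number of selected samples can differ slightly from `n_select`.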
In the experiments, the kernel cluster-based approach is compared with other data-preprocessing methods for unbalanced dataset classification; the results show that it not only improves the classification performance of the SVM but also reduces the algorithm's complexity.
Second, we present another SVM algorithm for unbalanced data, based on under-sampling by sample properties. Euclidean distance in kernel space is used to screen the majority instances, and then, according to sample density, representative majority instances located near the classification interface are selected. In experiments comparing it with other data-preprocessing methods for unbalanced dataset classification, the proposed method improves SVM classification performance on the minority class, as well as overall classification performance and robustness.
Finally, the SVM classifier for unbalanced data based on the kernel cluster-based under-sampling ensemble is applied to fault diagnosis, and experiments show that it achieves good results.
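The sample-properties under-sampling relies on the Euclidean distance in kernel space, ||phi(x) - phi(z)||^2 = K(x,x) - 2K(x,z) + K(z,z), which can be computed from kernel values alone. The sketch below is a hypothetical NumPy illustration, not the thesis's exact selection rule. It assumes a Gaussian kernel, approximates closeness to the classification interface by the kernel distance from each majority sample to its nearest minority sample, measures density as the number of other majority samples within a kernel-space radius, and discards isolated (low-density) majority samples as likely noise before keeping the closest ones. The function name `boundary_density_undersample` and its parameters `radius` and `min_density` are assumptions.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix K[a, b] = exp(-||A[a] - B[b]||^2 / (2 * sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def kernel_sq_dist(A, B, sigma):
    """Squared Euclidean distance in kernel space: K(a,a) - 2K(a,b) + K(b,b).

    For a Gaussian kernel K(x, x) = 1, so this reduces to 2 - 2K(a, b).
    """
    return 2.0 - 2.0 * rbf_kernel(A, B, sigma)


def boundary_density_undersample(X_maj, X_min, n_select, sigma=1.0,
                                 radius=0.5, min_density=1):
    """Keep dense majority samples closest to the minority class (hypothetical rule)."""
    # Proximity to the classification interface, approximated by the kernel
    # distance from each majority sample to its nearest minority sample.
    boundary = kernel_sq_dist(X_maj, X_min, sigma).min(axis=1)
    # Density: number of other majority samples within `radius` in kernel space.
    D_maj = kernel_sq_dist(X_maj, X_maj, sigma)
    density = (D_maj < radius ** 2).sum(axis=1) - 1      # subtract the sample itself
    # Drop isolated (low-density) samples, which are likely noise.
    candidates = np.flatnonzero(density >= min_density)
    # Among the remaining samples, keep the n_select closest to the minority class.
    order = candidates[np.argsort(boundary[candidates], kind="stable")]
    return order[:n_select]
```

Filtering by density before ranking by distance is one way to keep boundary samples representative: a lone majority point sitting inside the minority region would otherwise rank first, yet it is more likely noise than structure worth preserving.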
Keywords/Search Tags: The problem of unbalanced classification, Kernel cluster under-sampling, AdaBoost, Sample properties, Fault diagnosis