The SVM Algorithm And Its Application Based Data Preprocessing In The Kernel Space For Unbalanced Data

Posted on:2014-11-09

Degree:Master

Type:Thesis

Country:China

Candidate:S Y Hao

GTID:2268330425966782

Subject:Communication and Information System

Abstract/Summary:

Unbalanced data arises in many fields, such as medicine, fault diagnosis and fraud detection, so an effective algorithm for unbalanced classification has considerable scientific significance and practical value. When classical classification algorithms are applied to unbalanced data, however, their classification performance is far from ideal. In the field of fault diagnosis, the support vector machine (SVM) has widely replaced neural network algorithms because of its faster convergence, strong stability and good generalization ability. This thesis builds on the SVM classification algorithm and focuses on how to shift the SVM classification boundary appropriately toward the majority instances.

First, the thesis surveys algorithms for unbalanced data and for machine fault diagnosis, and analyzes the relevant background and prior research. To select informative majority instances that represent the spatial structure of the majority class, we present a novel kernel cluster-based under-sampling algorithm. The majority instances are clustered in kernel space with the kernel fuzzy C-means (KFCM) algorithm, and representative samples are then drawn at random using the cluster information. The selected majority instances, together with the minority instances, are used to train the classifier. Subsequently, an AdaBoost ensemble integrates these kernel cluster-based under-sampling classification components, which improves SVM classification performance on unbalanced datasets.
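The kernel cluster-based under-sampling step can be sketched as follows. This is a minimal sketch, not the thesis's implementation: it assumes a Gaussian kernel, the common KFCM variant whose prototypes stay in input space (so the kernel-space distance reduces to 2(1 - K(x, v))), deterministic farthest-point seeding, and a random per-cluster quota proportional to cluster size. The names `kfcm` and `kernel_cluster_undersample` are hypothetical, and SVM training and the AdaBoost integration are not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def gaussian_kernel(a: Sequence[float], b: Sequence[float], sigma: float) -> float:
    """K(a, b) = exp(-||a - b||^2 / (2 * sigma^2)); note K(x, x) = 1."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def _farthest_point_init(data: List[Point], c: int) -> List[List[float]]:
    """Assumed seeding: each new prototype is the point farthest from those chosen so far."""
    centers = [list(data[0])]
    while len(centers) < c:
        far = max(data, key=lambda p: min(
            sum((a - b) ** 2 for a, b in zip(p, v)) for v in centers))
        centers.append(list(far))
    return centers


def kfcm(data: List[Point], c: int, m: float = 2.0, sigma: float = 1.0,
         iters: int = 50) -> List[List[float]]:
    """Gaussian-kernel fuzzy C-means; returns memberships u[i][k] of point k in cluster i."""
    eps = 1e-12
    centers = _farthest_point_init(data, c)
    u = [[0.0] * len(data) for _ in range(c)]
    for _ in range(iters):
        # Membership update; kernel-space distance ||phi(x) - phi(v)||^2 = 2 * (1 - K(x, v)).
        for k, x in enumerate(data):
            inv = [(1.0 / (1.0 - gaussian_kernel(x, v, sigma) + eps)) ** (1.0 / (m - 1.0))
                   for v in centers]
            total = sum(inv)
            for i in range(c):
                u[i][k] = inv[i] / total
        # Prototype update: mean of the points weighted by u^m * K(x, v).
        for i in range(c):
            v = centers[i]
            w = [u[i][k] ** m * gaussian_kernel(x, v, sigma) for k, x in enumerate(data)]
            s = sum(w)
            if s > eps:
                centers[i] = [sum(w[k] * data[k][d] for k in range(len(data))) / s
                              for d in range(len(v))]
    return u


def kernel_cluster_undersample(majority: List[Point], n_keep: int, c: int = 3,
                               sigma: float = 1.0, seed: int = 0) -> List[Point]:
    """Cluster the majority class with KFCM, then randomly draw from every cluster
    in proportion to its size so each region of the majority class stays represented."""
    u = kfcm(majority, c, sigma=sigma)
    clusters: List[List[Point]] = [[] for _ in range(c)]
    for k, x in enumerate(majority):
        # Hard assignment to the cluster with the highest membership.
        clusters[max(range(c), key=lambda i: u[i][k])].append(x)
    rng = random.Random(seed)
    chosen: List[Point] = []
    for members in clusters:
        quota = round(n_keep * len(members) / len(majority))
        chosen.extend(rng.sample(members, min(quota, len(members))))
    return chosen
```

With three well-separated groups of ten majority points and `n_keep=6`, each cluster contributes two points, so the retained subset still covers every region of the majority class. Because the quotas are rounded, the total can differ slightly from `n_keep` when cluster sizes do not divide evenly.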
In the experiments, the proposed approach is compared with other data-preprocessing methods for unbalanced dataset classification. The results show that the proposed method not only improves the classification performance of the SVM but also reduces algorithm complexity.

Secondly, we present another SVM algorithm for unbalanced data, based on sample-properties under-sampling. Euclidean distance in kernel space is used to select majority instances, and then, according to sample density, representative majority instances located near the classification boundary are selected. In experiments against other data-preprocessing methods for unbalanced dataset classification, the proposed method improves the SVM's classification performance on the minority class, as well as its overall classification performance and robustness.

Finally, the SVM classifier for unbalanced data based on the kernel cluster-based under-sampling ensemble approach is applied to fault diagnosis, and experiments show that it achieves good results.
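The sample-properties under-sampling idea can be sketched in the same hedged way. The abstract does not give the exact selection rule, so this sketch assumes one plausible reading: majority points are ranked by kernel-space distance to the nearest minority point, the closest `n_candidates` are treated as lying near the classification boundary, and among those the points with the most majority neighbours within a kernel-space `radius` are kept, so isolated (possibly noisy) points are dropped. The kernel-induced distance uses the identity ||phi(a) - phi(b)||^2 = K(a, a) - 2K(a, b) + K(b, b). All function and parameter names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Kernel = Callable[[Sequence[float], Sequence[float]], float]


def gaussian_kernel(a: Sequence[float], b: Sequence[float], sigma: float = 1.0) -> float:
    """K(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_sq_dist(a: Sequence[float], b: Sequence[float], k: Kernel) -> float:
    """Squared Euclidean distance between phi(a) and phi(b), via the kernel trick."""
    return k(a, a) - 2.0 * k(a, b) + k(b, b)


def boundary_density_undersample(majority: List[Point], minority: List[Point],
                                 n_keep: int, n_candidates: int, radius: float,
                                 k: Kernel = gaussian_kernel) -> List[Point]:
    # Step 1 (assumed rule): rank majority points by kernel-space distance to the
    # nearest minority point; the closest ones are taken as near the boundary.
    nearest = sorted(majority, key=lambda x: min(kernel_sq_dist(x, z, k) for z in minority))
    candidates = nearest[:n_candidates]

    # Step 2: local density = number of other majority points within `radius` in kernel space
    # (the point itself is at distance 0, hence the -1).
    def density(x: Point) -> int:
        return sum(1 for y in majority if kernel_sq_dist(x, y, k) <= radius ** 2) - 1

    # Step 3: keep the densest candidates; sorted() is stable even with reverse=True,
    # so ties keep the nearest-to-minority order.
    return sorted(candidates, key=density, reverse=True)[:n_keep]
```

For example, a majority point that sits close to the minority class but has no nearby majority neighbours ranks below a tight group of majority points slightly farther away, which is the kind of representative selection the method aims for.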

Keywords/Search Tags:

The problem of unbalanced classification, Kernel cluster under-sample, AdaBoost, Sample properties, Fault diagnosis

Related items

1	Research On Adaboost Improved Algorithm For Unbalanced Data
2	Research On Rolling Bearing Fault Diagnosis Method Under Few Sample Condition
3	Research On Small Sample And Small Fault Diagnosis Method Based On PCA
4	Research On Virtual Sample Generation Technology Based Of KDE And Copula Function And Its Application To Imbalanced Dataset Classification
5	Research And Application Of Fault Diagnosis Method Using Extension Neural Network
6	Multiclass Classification Method Research With SVM Arithmetic
7	Research On Analog Circuit Fault Diagnosis Based On Support Vector Machine
8	Face Image Feature Extraction And Recognition In The Case Of Small Sample Size Problem
9	Research On Equipment Fault Diagnosis Algorithm Based On Support Vector Machine
10	Research Based On Improved Discriminant Locality Preserving Projection Method And It's Application In Industrial Fault Diagnosis