
Research And Application Of Imbalance Data Classification Based On Support Vector Machine

Posted on: 2012-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: W J Zhao
GTID: 2218330344950974
Subject: Computer application technology
Abstract/Summary:
In the information age, large volumes of data must be mined for regular patterns. Classification is a common task in data processing and has therefore become an important research direction in machine learning. The kernel function is one of the central concepts of the support vector machine (SVM): it maps the training set into a high-dimensional space, so that SVM can handle both linear and nonlinear classification problems with limited samples. Research shows that SVM classifies balanced data sets well, but on imbalanced data sets it rarely achieves the expected results. The support vectors determine the position of the separating hyperplane, and because the majority class contributes more support vectors than the minority class, the hyperplane is shifted. When the two classes are severely imbalanced, there may be no classification rule for the minority class at all. The main work of this thesis is to study how to solve imbalanced data classification problems with SVM methods.

The main work and innovations are as follows. First, the theory of SVM is studied: the limitations of the empirical risk minimization principle are analyzed, the advantages of the structural risk minimization principle are introduced, and SVM theory and its main algorithms are summarized in detail. Second, the difficulties of imbalanced data classification are analyzed, existing methods for imbalanced data classification are summarized, and the advantages and disadvantages of each are analyzed. Third, a clustering-based support vector machine method (DISVM) is proposed as an improvement on an earlier SVM partition algorithm. The main idea is to divide the majority class into several subsets, train an SVM on the combination of each subset with the minority class, and finally integrate the resulting sub-classifiers.
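The partition-and-integrate idea behind DISVM can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not name the clustering algorithm, the SVM solver, or the integration rule, so k-means, a linear SVM trained by batch subgradient descent, and averaging of decision values are all assumed here, and every function name is hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with farthest-first initialisation (assumed clustering step)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Batch subgradient descent on 0.5*||w||^2 + C*sum(hinge loss); labels are +1/-1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # margin violated: hinge term is active
                grad_w = [g - C * yi * xj for g, xj in zip(grad_w, xi)]
                grad_b -= C * yi
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def train_partition_ensemble(majority, minority, k):
    """Train one SVM per majority-class cluster, each against the full minority class."""
    labels = kmeans(majority, k)
    models = []
    for j in range(k):
        subset = [p for p, lab in zip(majority, labels) if lab == j]
        if subset:
            X = subset + minority
            y = [-1] * len(subset) + [1] * len(minority)
            models.append(train_linear_svm(X, y))
    return models

def predict(models, x):
    """Average the sub-classifiers' decision values; +1 = minority, -1 = majority."""
    score = sum(dot(w, x) + b for w, b in models) / len(models)
    return 1 if score >= 0 else -1

# Toy data: eight majority samples in two groups, three minority samples.
majority = [(-2, 0), (-2.5, 0), (-2, -0.5), (-1.5, 0.5),
            (-2, 4), (-2.5, 4), (-2, 4.5), (-1.5, 3.5)]
minority = [(2, 2), (2.5, 2), (2, 2.5)]

models = train_partition_ensemble(majority, minority, k=2)
print(predict(models, (3, 2)), predict(models, (-3, 4)))
```

On this toy data the majority class falls into two clusters, so each sub-classifier is trained on four majority samples against three minority samples rather than eight against three.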
DISVM mainly improves on the earlier algorithm in the rule for partitioning the majority class, and experiments show that it performs well. Fourth, an SVM method for imbalanced classification based on the convex hull (GSVM) is presented. The samples of each of the two classes are first compressed toward the centroid of that class's convex hull; the nearest points of the two convex hulls are then found; finally, a classifier is generated with the SVM method. Experiments show that this method has good classification performance. Fifth, feature imbalance is another kind of data imbalance, and acute leukemia gene expression profiles are a typical example of feature-imbalanced data. This thesis searches for the informative genes in the gene expression profiles. Traditional methods mainly consider the influence of a single gene on the classification decision; our method instead considers pairs of related genes and uses the distance between the two genes as the measure for selecting informative genes. Experiments show that this method has good classification performance.
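The compression step of the convex-hull method (GSVM) can be illustrated with a short sketch. It assumes that the hull centroid is approximated by the sample mean of each class and that a single shrink ratio is used; the abstract specifies neither, and the function names are hypothetical. Because the map x' = c + ratio * (x - c) is affine, shrinking every sample also shrinks the class's convex hull by the same ratio about c.

```python
def centroid(points):
    """Sample mean of equal-length tuples (assumed stand-in for the hull centroid)."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def compress_toward_centroid(points, ratio):
    """Move each point toward the class centroid: x' = c + ratio * (x - c), 0 < ratio <= 1."""
    c = centroid(points)
    return [tuple(ci + ratio * (xi - ci) for xi, ci in zip(p, c)) for p in points]

# Two toy classes whose x-ranges overlap on [2, 3].
class_a = [(0, 0), (3, 0), (0, 1), (3, 1)]
class_b = [(2, 0), (5, 0), (2, 1), (5, 1)]

small_a = compress_toward_centroid(class_a, 0.5)
small_b = compress_toward_centroid(class_b, 0.5)

# After compression the largest x in class A lies left of the smallest x in class B.
print(max(p[0] for p in small_a), min(p[0] for p in small_b))  # prints 2.25 2.75
```

In this example the two hulls overlap before compression and are disjoint afterwards, which is the situation in which nearest points of the two hulls can be located and a separating classifier trained.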
Keywords/Search Tags: support vector machine, imbalanced data, re-sample training set, compressed convex hull, informative gene