Studies And Application Of Fuzzy And Double Regular Support Vector Machines

Posted on: 2013-01-10
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C D Qin
Full Text: PDF
GTID: 1228330395457120
Subject: Computer application technology
Abstract/Summary:
In recent years, the support vector machine (SVM) has been developed and applied extensively as a new data mining method. It is grounded in optimization theory and seeks classification and regression rules from observed data (samples) when those rules cannot be derived from first principles. The learned rules are then used to analyze phenomena in large data sets that cannot be observed directly. Supported by linear and nonlinear optimization theory, SVMs offer high fitting precision, few parameters, strong generalization, and globally optimal solutions. They are well suited to classification and regression problems in data mining that involve small samples, heavy noise, many outliers, and high dimensionality. SVMs have become an active research area in machine learning and have been widely applied in areas such as pattern recognition, function fitting, and density estimation. This dissertation focuses on extracting important disease-causing genes from tumor features, the use of fuzzy membership in SVMs, the classification of imbalanced data, and the properties and applications of the doubly regularized support vector machine (DRSVM). The main research work is as follows:

1. Colon cancer gene expression profiles are high-dimensional, small-sample, and very noisy. For such data, a method is proposed that scores each gene by Bhattacharyya distance and removes the genes irrelevant to the classification task. A second selection step then uses the sensitivity of the model to each remaining gene. At the same time, the important genes are weighted according to their normalized sensitivity, and a new sample dataset is built. Finally, an SVM is used to analyze and test the feature genes on the new dataset. Experimental results show that this method improves the accuracy of tumor diagnosis.

2. For the classification of data sets with a large class imbalance ratio, a balanced fuzzy support vector machine (BFSVM) is proposed. It combines an imbalance adjustment factor with a fuzzy membership based on the features of the sample points. The method first computes the sample covariance matrix to obtain the imbalance adjustment factor, then computes the fuzzy membership of every sample to obtain each sample's contribution rate. Fuzzy membership and imbalance adjustment act on the sample error of the classifier at the same time. Experimental results show that the algorithm works well on data with a large imbalance ratio.

3. To address the over-fitting of SVMs caused by outliers or noise, the characteristics of the fuzzy support vector machine (FSVM) and the proximal support vector machine (PSVM) are analyzed. Drawing on their respective advantages, namely fuzzy membership and proximal hyperplanes, a method based on support vector domain description (SVDD) is proposed. The method takes full account of the relationship between the distance from each sample point to the center of each class and the contribution rate of each sample. The improved algorithm gives clearer and more precise classifications. The analysis shows that the algorithm with fuzzy membership degrees achieves a higher recognition rate, at the cost of more training time.

4. The smooth support vector machine (SSVM) of O. L. Mangasarian uses the integral of the sigmoid function as a smoothing function to convert the constrained standard SVM problem into an unconstrained optimization problem, but it does not account for the influence of outliers or noise on the hyperplane. On the other hand, some polynomial smooth loss functions are more accurate than the sigmoid integral smooth function near the inflection point. A fuzzy smooth support vector machine (FSSVM) is therefore proposed that combines a smooth function with fuzzy membership. The fuzzy membership function reflects the contribution of every sample to the hyperplane, so that the contribution of outliers and noise becomes very small. The resulting unconstrained, differentiable optimization problem can be solved with the BFGS algorithm or the Newton-Armijo (NA) algorithm. The results show that these changes have a positive effect in trials.

5. Based on the strengths and weaknesses of the L2-norm SVM and the L1-norm SVM in cancer gene classification, a doubly regularized support vector machine (DRSVM) is applied to DNA microarray classification. The Bhattacharyya distance is first used to eliminate most of the unimportant genes and retain a small number of important genes that are highly relevant to classification. A quadratic polynomial loss function converts the constrained optimization problem into an unconstrained, differentiable one, which can be solved by the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm. Experimental results on two tumor gene data sets show that the method is effective and feasible.

In summary, after ten years of development, SVM theory has built a solid foundation for further study. As optimization theory has matured, SVM research has gradually turned toward applications. In this dissertation, the author mainly explores how to improve the classification performance of SVMs by using membership functions, smooth functions, and double regularization on the basis of existing optimization theory. The author also hopes that these methods can be applied to other intelligent optimization algorithms to improve their performance.
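
As an illustration of the Bhattacharyya-distance gene filter mentioned in items 1 and 5, the following is a minimal Python sketch, not the dissertation's implementation. It assumes two classes labeled +1 and -1 and treats each gene's expression as univariate Gaussian within each class, which gives the standard closed-form distance. The abstract does not state the exact formula used, and the names `bhattacharyya_scores`, `select_top_genes`, and the cutoff `k` are hypothetical.

```python
import numpy as np


def bhattacharyya_scores(X, y, eps=1e-12):
    """Per-gene Bhattacharyya distance between the +1 and -1 classes.

    Assumption: each gene (column of X) is modeled as a univariate
    Gaussian within each class. eps guards against zero variance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos, neg = X[y == 1], X[y == -1]

    mu_p, mu_n = pos.mean(axis=0), neg.mean(axis=0)
    var_p = pos.var(axis=0) + eps
    var_n = neg.var(axis=0) + eps

    # Term driven by the separation of the class means.
    mean_term = 0.25 * (mu_p - mu_n) ** 2 / (var_p + var_n)
    # Term driven by the mismatch of the class variances.
    var_term = 0.5 * np.log((var_p + var_n) / (2.0 * np.sqrt(var_p * var_n)))
    return mean_term + var_term


def select_top_genes(X, y, k):
    """Indices of the k genes with the largest distance (hypothetical cutoff)."""
    scores = bhattacharyya_scores(X, y)
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    # Toy data: gene 0 separates the classes, gene 1 does not.
    X = np.array([[0.0, 5.0], [1.0, 5.1], [10.0, 5.0], [11.0, 5.1]])
    y = np.array([1, 1, -1, -1])
    print(bhattacharyya_scores(X, y))
    print(select_top_genes(X, y, k=1))
```

A larger score means the two class distributions of that gene overlap less, so keeping the top `k` genes discards those that carry little class information. The retained columns could then be passed to any SVM trainer.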
Keywords/Search Tags: Statistical learning theory, Support vector machines, Fuzzy membership, Data domain descriptions, Doubly regularized, Smooth factor, Bhattacharyya distance, Feature gene, Least squares support vector machines, Large scale sample sets