Research on Fuzzy Support Vector Machines and Their Application to Gene Classification

Posted on: 2014-05-25
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Xu
Full Text: PDF
GTID: 2268330392473028
Subject: Computer application technology
Abstract/Summary:
As a machine learning method, the Support Vector Machine (SVM) addresses the problems of local minima, overfitting, and the curse of dimensionality. In practical applications, however, it still suffers from long training times and from sensitivity to noise and isolated points. The Fuzzy Support Vector Machine (FSVM) was proposed to better address this problem. FSVM assigns each sample point a membership value according to its role in the classification process, which reduces the influence of noisy points and improves classification accuracy.

Gene classification is a pressing problem in bioinformatics, and research on it has high application value for the diagnosis and treatment of certain diseases. With the development of gene data processing and data mining, SVM, as a promising data mining technique, is becoming an important research area.

Against this background, this thesis focuses on the design of fuzzy membership functions for FSVM and on their application to gene classification. The main work is as follows:

1. Existing fuzzy membership functions are mainly designed from the distance between each sample point and its class center. These methods depend heavily on the geometric distribution of the samples and ignore both the spatial relationships among sample points and each sample's own classification property. To address these issues, two improved FSVMs are proposed: PHFSVM, based on an inner-class hyperplane, and CCD-FSVM, based on class centripetal degree. PHFSVM replaces the class center with an inner-class hyperplane and defines the fuzzy membership function through the distance between each sample and that hyperplane. It increases the penalty on samples that are prone to misclassification and assigns a low membership degree to samples that lie far from the classification hyperplane and cannot become support vectors, which greatly reduces the computation needed to solve for the fuzzy membership function. CCD-FSVM combines the relationship between a sample and its class center with the relationship between the sample and the other points in its class, expressed as the class centripetal degree. CCD-FSVM also classifies heavily mixed samples by the value of their centripetal degree to increase classification accuracy.

2. Traditional SVM assumes that the classes contain similar numbers of samples. When it is applied to unbalanced data, a bias arises: the classifier fails to classify minority-class samples correctly and performs poorly. In real applications involving unbalanced data, the minority class often plays the more important role, and users expect an algorithm that improves the recognition rate on minority samples. The thesis analyzes the number of samples and its role in the design of the membership function, and proposes BFSVM for unbalanced data classification. BFSVM incorporates the influence of sample count into the fuzzy membership function after accounting for the relationships among sample points, and thereby improves recognition accuracy on minority-class samples.

3. Gene data are characterized by few samples and high dimensionality, so their dimensionality must be reduced before they are analyzed from a biological perspective. This thesis selects informative genes with the sequential forward floating selection algorithm and applies the three FSVMs described above to the classification of colon cancer data. A series of experiments verifies the effectiveness of FSVM in gene classification.
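
For orientation, the conventional class-center membership that item 1 takes as its starting point can be sketched as follows. This is a minimal illustration, not the thesis's PHFSVM, CCD-FSVM, or BFSVM: the function name `class_center_memberships`, the smoothing constant `delta`, and the linear decay formula are assumptions chosen for the example.

```python
import math

def class_center_memberships(samples, labels, delta=1e-6):
    """Return one fuzzy membership in (0, 1] per sample.

    Assumed formula (illustrative only): s_i = 1 - d_i / (r + delta),
    where d_i is the distance from sample i to its class center and
    r is the largest such distance within the class.
    """
    # Group sample indices by class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    memberships = [0.0] * len(samples)
    for idxs in by_class.values():
        dim = len(samples[idxs[0]])
        # Class center = coordinate-wise mean of the class's samples.
        center = [sum(samples[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        dists = {i: math.dist(samples[i], center) for i in idxs}
        radius = max(dists.values())
        for i in idxs:
            # Samples far from the center (likely noise) get small memberships.
            memberships[i] = 1.0 - dists[i] / (radius + delta)
    return memberships

# Hypothetical toy data: the last point of class +1 is an outlier.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0), (10.0, 9.0)]
y = [1, 1, 1, 1, 1, -1, -1]
weights = class_center_memberships(X, y)
```

The resulting memberships can serve as per-sample weights on the SVM penalty term, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`, so that the outlier contributes less to the decision boundary.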
Keywords/Search Tags:FSVM, Fuzzy Membership Function, Unbalanced Data, Gene Classification