
The Application Research Of Support Vector Machine In Non-spherical Distribution Data Set And Tumor Gene

Posted on: 2013-10-22  Degree: Master  Type: Thesis
Country: China  Candidate: Y Zhang  Full Text: PDF
GTID: 2268330395979885  Subject: Computer software and theory
Abstract/Summary:
The support vector machine (SVM), proposed by Vapnik in the mid-1990s, is a tool that solves machine learning problems by means of optimization. In little more than a decade it has made breakthrough progress in both theory and practice. As a classification tool, the SVM shows a marked advantage on high-dimensional, small-sample data sets, and as research has developed it has found more and more applications in specific engineering fields. However, the SVM is very sensitive to noise points, so effectively reducing the interference of noise can improve its classification performance.

In biology, gene chip technology yields gene expression profiles that are small-sample and high-dimensional. Identifying tumor samples accurately and efficiently from gene expression profiles is of great significance to clinical medicine. Because the SVM has an obvious advantage on small-sample, high-dimensional data, constructing a classifier better suited to identifying tumor gene expression profiles has become a research hotspot.

The main work of this paper is as follows:

1. The traditional SVM is too sensitive to interference from noise points, and the fuzzy support vector machine (FSVM) relies too heavily on the distribution shape of the data set. To address this, a noise filtering system (NFS) is first constructed to filter out the data points most likely to be noise; the equivalence class coefficient proposed in reference [3] is then introduced into the traditional SVM model as a penalty factor, to further reduce the influence of noisy data on classification. The method shows better noise immunity and classification ability on data sets that contain more noise and have a non-spherical distribution.

2. The key to distinguishing normal from tumor samples in tumor gene expression data is to find the smallest set of genes that can predict the class, and then to classify with a well-performing classifier. To this end, first, the Revised Feature Score Criterion (RFSC) is used to remove genes irrelevant to the classification task. Second, the pair-wise redundancy method is improved, and a strong correlative tree is proposed to filter out redundant genes. Third, the Rough Support Vector Machine (RSVM) is improved into the proposed Approximate Equivalence Rough Support Vector Machine (AE-RSVM), which is then validated on data sets. Experiments on tumor data sets show that the method proposed in this paper is feasible and effective.
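
The abstract does not define the NFS, the RFSC, or the strong correlative tree, so the following Python sketch only illustrates the general shape of the two preprocessing ideas with common stand-ins. It assumes a k-nearest-neighbour label-agreement filter in place of the NFS, a Fisher-like score in place of the RFSC, and a greedy Pearson-correlation filter in place of the strong correlative tree. All function names and parameters are hypothetical, and the SVM training step itself is omitted.

```python
import math
from statistics import mean, pstdev


def knn_noise_filter(X, y, k=3):
    """Return indices of points whose label agrees with the majority
    of their k nearest neighbours (assumed stand-in for the NFS)."""
    keep = []
    for i, xi in enumerate(X):
        # Sort all other points by Euclidean distance and keep the k closest.
        neighbours = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        agree = sum(1 for _, j in neighbours if y[j] == y[i])
        if agree * 2 > k:  # strict majority shares this point's label
            keep.append(i)
    return keep


def relevance_score(values, y):
    """Fisher-like score |mean1 - mean0| / (sd1 + sd0) for one gene,
    with binary labels 0/1 (assumed stand-in for the RFSC)."""
    pos = [v for v, label in zip(values, y) if label == 1]
    neg = [v for v, label in zip(values, y) if label == 0]
    spread = pstdev(pos) + pstdev(neg)
    return abs(mean(pos) - mean(neg)) / (spread + 1e-12)


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    denom = math.sqrt(var_a * var_b)
    return cov / denom if denom else 0.0


def select_genes(X, y, n_relevant=3, max_corr=0.9):
    """Keep the n_relevant highest-scoring genes, then greedily drop any gene
    whose absolute correlation with an already selected gene exceeds max_corr."""
    n_genes = len(X[0])
    columns = [[row[g] for row in X] for g in range(n_genes)]
    ranked = sorted(
        range(n_genes),
        key=lambda g: relevance_score(columns[g], y),
        reverse=True,
    )[:n_relevant]
    selected = []
    for g in ranked:
        if all(abs(pearson(columns[g], columns[s])) <= max_corr for s in selected):
            selected.append(g)
    return selected
```

In use, `knn_noise_filter` would be applied to the training samples first, and `select_genes` would reduce the expression matrix (rows as samples, columns as genes) before a classifier is trained on the remaining columns. The thresholds `k`, `n_relevant`, and `max_corr` are illustrative values, not ones taken from the thesis.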
Keywords/Search Tags: Support Vector Machines, noise, equivalence class, gene expression profile, tumor classification, gene selection