Support Vector Machine With Input Uncertainty And Its Application To Bioinformatics

Posted on: 2010-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: L C Bo
Full Text: PDF
GTID: 2178360278475477
Subject: Computer software and theory
Abstract/Summary:
Disruptions of certain genes, known as disease genes, are responsible for many diseases. Hunting disease genes, also called identifying disease genes, is a problem of primary importance in biomedical research, with practical significance for gene diagnosis, gene therapy, and gene-based drug design. Disease gene hunting requires ranking a large number of candidate genes from most to least promising. This ranking process, called candidate gene prioritization, is an important step in disease gene hunting.

With the development of genomics, many distinct data sources can be used for candidate gene prioritization. These sources differ in data type, are usually high-dimensional, and are contaminated with noise. In addition, only a small number of known disease genes are available as training samples, compared with the huge number of candidate genes. The one-class SVM method is therefore applied to this problem.

Support vector machines (SVM) are a recently developed machine learning method built on statistical learning theory, which gives them a solid mathematical foundation. They have advantages on small-sample, nonlinear, and high-dimensional pattern recognition problems and are widely applied in text categorization, handwriting recognition, image classification, bioinformatics, and other fields. Because they perform well and handle many types of biological data conveniently, SVMs are widely used in bioinformatics. One-class SVM extends the basic SVM algorithm to one-class problems and has already been applied successfully to anomaly detection, target identification, and other tasks. Candidate gene prioritization based on one-class SVM is a new application of it in bioinformatics.

The biological experimental data used for candidate gene prioritization with one-class SVM is usually contaminated with errors and noise. For de-noising, this thesis introduces an uncertain input variable into the one-class SVM formulation and extends it, through derivation, to a one-class SVM with input uncertainty. The extended one-class SVM exploits uncertainty information in the input data for de-noising, so that noisy data can be used more effectively for candidate gene prioritization. To integrate a variety of data sources for candidate gene prioritization, this thesis also proposes a data fusion method based on one-class SVM, which achieved good experimental results.
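For orientation, the baseline that the thesis extends can be written in the widely used ν-parameterized one-class SVM form. This is a sketch of the textbook formulation only, not the thesis's uncertainty-aware derivation, which the abstract does not reproduce. The symbols are assumptions introduced here: $x_1,\dots,x_n$ are feature vectors of the known disease genes, $\phi$ is a feature map with kernel $k(x,x')=\langle\phi(x),\phi(x')\rangle$, and $\nu\in(0,1]$ bounds the fraction of training points treated as outliers.

```latex
% Standard one-class SVM primal (assumed baseline, not the thesis's extended model)
\begin{aligned}
\min_{w,\;\xi,\;\rho}\quad & \frac{1}{2}\lVert w\rVert^{2}
      + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho \\
\text{s.t.}\quad & \langle w,\phi(x_i)\rangle \ge \rho - \xi_i,
      \qquad \xi_i \ge 0,\qquad i=1,\dots,n .
\end{aligned}

% Corresponding dual problem
\begin{aligned}
\min_{\alpha}\quad & \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}
      \alpha_i\alpha_j\,k(x_i,x_j) \\
\text{s.t.}\quad & 0 \le \alpha_i \le \frac{1}{\nu n},
      \qquad \sum_{i=1}^{n}\alpha_i = 1 .
\end{aligned}

% Score of a candidate gene x; larger values lie closer to the training class
f(x) = \sum_{i=1}^{n}\alpha_i\,k(x_i,x) - \rho
```

Under this reading, candidate genes would be prioritized by sorting them in decreasing order of $f(x)$, so that candidates most similar to the known disease genes come first.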
Keywords/Search Tags:Support vector machines, Bioinformatics, Candidate gene prioritization, Uncertainty, De-noising, Data fusion, One-class support vector machine