The Study Of Biological Data Analysis Method Based On Support Vector Machines

Posted on: 2013-01-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Q Yu
GTID: 1110330374477707
Subject: Computational Mathematics
Abstract/Summary:
In recent decades, biomolecular data in public databases have grown explosively. These data must be processed and analyzed for biological research, which gave rise to bioinformatics. Bioinformatics uses mathematics and the theory, methods, and technology of information science to analyze biological macromolecules and their sequences, structures, and functions. In bioinformatics, machine learning has become an important approach to solving these biological problems. This thesis addresses the classification and prediction of biological sequences. Its major contributions are as follows:

In chapter 2, we propose a computational method to predict regulatory interactions in Arabidopsis. We construct positive and negative samples from existing transcription data. Based on gene expression profile data and sequence information, we represent each regulatory interaction with a new feature extraction method. The support vector machine (SVM) and the jackknife test are then employed to evaluate our method. The results show that our method achieves an overall accuracy of 98.39%, with a sensitivity of 94.88% and a specificity of 93.82%.

In chapter 3, we propose a new pseudo amino acid model to predict the subcellular location of apoptosis proteins. We use an amino acid substitution matrix and the auto covariance transformation to extract sequence features and construct the feature vector, which not only quantitatively describes the differences between amino acids but also partially incorporates sequence-order information. Compared with other methods, ours achieves better prediction accuracy.

In chapter 4, we propose a mathematical model to predict the success of polymerase chain reactions (PCR). To date, the experimental procedure of PCR, including primer design, has been the focus of attention, while little attention has been paid to the analysis of the PCR template. In this study, we focus on the DNA template, the subject of the PCR experiment, and use the k-mer method to characterize the DNA sequence. We then employ SVM to predict PCR success using 189 exons in human chromosomes as target amplicons. The results show that our method is feasible.
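The k-mer characterization of a DNA template mentioned for chapter 4 can be illustrated with a minimal sketch. The abstract does not state the value of k, the normalization, or how the resulting vector is passed to the SVM, so the function name `kmer_features`, the default `k=3`, and the frequency normalization below are all assumptions made for illustration, not the thesis's actual procedure.

```python
from collections import Counter
from itertools import product


def kmer_features(seq, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies of seq in a fixed lexicographic order.

    Hypothetical helper: k and the normalization are illustrative assumptions.
    """
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all len(alphabet)**k possible k-mers so every sequence
    # maps to a vector with the same layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Windows containing characters outside the alphabet (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]
```

Each template sequence then becomes a fixed-length numeric vector (64 entries for k = 3), which is the kind of input a classifier such as an SVM expects.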
Keywords/Search Tags: Bioinformatics, Support vector machines, Transcriptional regulation, Protein subcellular location, Polymerase chain reaction