
Recognition Of Translation Initiation Site And Splicing Site In Eukaryote Genome

Posted on: 2008-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: L Liu
Full Text: PDF
GTID: 2120360215491357
Subject: Biophysics
Abstract/Summary:
Using mathematical methods, we identify functional sites of genes with a machine-learning system. Statistical analysis of sequences shows that, although many complex protein interactions and higher-order structures influence translation initiation and intron cleavage, some rules still exist: the primary sequences around these sites are relatively conserved.

We first study vertebrate translation initiation sites. Translation in vertebrates does not always start at the first AUG in an mRNA, which implies that translation also depends on the sequence information flanking the AUG. It has been reported that almost 40% of vertebrate mRNAs contain upstream AUGs, which makes predicting translation initiation sites a non-trivial task. Based on the position propensity matrix (PPM) and the length distribution of open reading frames (ORFs), we develop a linear classifier to predict start codons. This classifier distinguishes well between translation initiation sites and upstream AUGs located in the 5' untranslated region, and it is also used to predict translation initiation sites in full-length mRNAs. Combining our classifier with the ribosome scanning model, we predict translation initiation sites in vertebrate mRNA sequences with an overall accuracy of 97.8%. Applying the method to human full-length mRNA sequences also gives satisfactory results.

In addition, to find an effective algorithm for identifying human splice sites, each sequence is represented by a six-dimensional vector composed of increment of diversity and position propensity matrix features. These vectors serve as the input to a support vector machine (SVM), which finds the optimal hyperplane in the vector space separating true splice sites from false ones. The results indicate that this method achieves higher prediction accuracy for human splice sites than other methods while using fewer parameters. On the N269 dataset, the true positive rate is 96.7% and the true negative rate is 93.4% for donor sites; for acceptor sites, the true positive rate is 94.3% and the true negative rate is 92.9%.
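
The combination of a linear classifier with the ribosome scanning model described in the abstract can be illustrated roughly as follows. This is only a minimal sketch, not the thesis's implementation: it assumes the PPM can be approximated by a log-odds position matrix, works on the cDNA alphabet (ATG instead of AUG), and uses hypothetical names (`build_ppm`, `scan_for_start`), weights, and threshold that do not come from the thesis.

```python
import math

BASES = "ACGT"
STOPS = {"TAA", "TAG", "TGA"}

def build_ppm(windows, pseudocount=1.0):
    """Per-position log-odds scores log2(p(base) / 0.25) from aligned windows.

    Assumption: a uniform background and a pseudocount stand in for
    whatever normalisation the thesis actually uses.
    """
    ppm = []
    for pos in range(len(windows[0])):
        counts = {b: pseudocount for b in BASES}
        for w in windows:
            counts[w[pos]] += 1
        total = sum(counts.values())
        ppm.append({b: math.log2(counts[b] / total / 0.25) for b in BASES})
    return ppm

def ppm_score(ppm, window):
    """Sum the per-position scores of one window."""
    return sum(col[base] for col, base in zip(ppm, window))

def orf_length(seq, start):
    """Codons from the ATG at `start` up to the first in-frame stop (or sequence end)."""
    n = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            break
        n += 1
    return n

def scan_for_start(seq, ppm, flank, w_ppm, w_orf, threshold):
    """Scan 5' to 3' and return the index of the first ATG whose linear score
    exceeds `threshold`, or None. Weights and threshold are hypothetical."""
    for i in range(flank, len(seq) - flank - 2):
        if seq[i:i + 3] != "ATG":
            continue
        window = seq[i - flank:i + 3 + flank]
        score = w_ppm * ppm_score(ppm, window) + w_orf * orf_length(seq, i)
        if score > threshold:
            return i
    return None

# Toy data: three aligned training windows (3 flanking bases on each side of ATG).
training = ["ACCATGGCG", "GCCATGGCT", "ACCATGGAG"]
ppm = build_ppm(training)

# An upstream ATG with a one-codon ORF precedes the intended start codon.
mrna = "TTTATGTAATTGCCACCATGGCGGCTGCTGCTGCTTAA"
start = scan_for_start(mrna, ppm, flank=3, w_ppm=1.0, w_orf=0.1, threshold=5.0)
```

In this toy sequence the upstream ATG sits in a poor context and opens a very short ORF, so the scan passes over it and stops at the second ATG. In practice the weights and threshold would be fitted on annotated sequences rather than set by hand.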
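
The increment of diversity used as a splice-site feature can be sketched as follows. This assumes the commonly used definition, with diversity D(X) = N log N - Σ n_i log n_i over the counts n_i of a source X, and ID(X, Y) = D(X + Y) - D(X) - D(Y). The use of trinucleotide counts and the helper names are illustrative assumptions; the abstract does not specify how the six feature components are built.

```python
import math
from collections import Counter

def diversity(counts):
    """D(X) = N log N - sum_i n_i log n_i, where N is the total count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(
        n * math.log(n) for n in counts.values() if n > 0
    )

def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y).

    The value is 0 when the two sources have the same composition and
    grows as their compositions diverge.
    """
    return diversity(x + y) - diversity(x) - diversity(y)

def kmer_counts(seq, k=3):
    """Overlapping k-mer counts of one sequence (k=3 is an assumption)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

# Compare a candidate sequence with a reference source built from known sites.
reference = kmer_counts("CAGGTAAGTCAGGTGAGT")
candidate = kmer_counts("CAGGTAAGT")
id_value = increment_of_diversity(candidate, reference)
```

A small ID against the true-site source and a large ID against the false-site source suggests a true site. Values of this kind, together with PPM scores, would form the feature vector passed to the SVM.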
Keywords/Search Tags: translation initiation site, position propensity matrix, ribosome scanning model, splice junction site, increment of diversity, support vector machine