
Prediction Of Alternative Splice Sites And Skipped Exons Based On Sequence Features

Posted on: 2009-12-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W R T Yang
GTID: 1100360245487012
Subject: Biophysics
Abstract/Summary:
Eukaryotic gene sequences contain coding exons and noncoding introns. During pre-mRNA processing, the intron sequences are spliced out and the exon sequences are joined together. Alternative splicing is a mechanism that produces different mRNAs and proteins from a single gene, and it is an important means of increasing transcript diversity. Alternative splicing events occur frequently in the human genome and in other eukaryotic genomes. Genome-wide analyses of alternative splicing have indicated that approximately half of human genes have alternative splice forms. Alternative splicing occurs in different tissues, different cells and different developmental stages. It is inherent in the entire life process and is closely related to many diseases. Previous studies based on ESTs and microarray analysis have identified many alternative splicing events. Because of experimental limitations, the identification of alternative splicing events also needs non-EST-based computational methods.

In this dissertation, alternative splicing events are predicted from information parameters of genome sequences by using the position weight matrix, increment of diversity, support vector machine and Mahalanobis discriminant algorithm. The parameters are selected by statistical analysis with WebLogo, the informational parameter Mni and the t-test. The mechanism of splice site competition and palindromic sequences are also discussed. The main contributions of this dissertation are summarized as follows:

1. A support vector machine combined with the position weight matrix and increment of diversity is proposed as a classifier for alternative 5'/3' splice sites and pseudo splice sites. The method achieves a specificity of 85.62% (81.19%) and a sensitivity of 88.74% (90.86%) for the prediction of alternative 5' (3') splice sites.
2. Based on the mechanism of splice site competition and on sequence parameters, alternative 5'/3' splice sites and constitutive splice sites of the human and mouse genomes are predicted by the support vector machine combined with the position weight matrix and increment of diversity. In the human genome, the method correctly classifies 67.88% (71.63%) of donor (acceptor) sites as alternative or constitutive; the prediction ability for acceptor sites is 45% higher than that of a recent method. In the mouse genome, the method correctly classifies more than 72% of splice sites as alternative or constitutive. The results indicate that the method performs well and is widely applicable.
3. The position weight matrix scoring function is used to represent splice site strength, and the mechanism of splice site competition is described by a single parameter: the subtraction of scoring functions. When applied to alternative splice site prediction, this parameter gives a prediction ability approximately equal to that of a recent method based on the mechanism of splice site competition. The results reveal that the scoring function subtraction is one of the best parameters for describing the mechanism of splice site competition.
4. Skipped exons and constitutive exons are analyzed for their length, divisibility by 3 and splice site conservation. The 3-mer frequencies of the left intron, right intron and exon sequences are analyzed by t-test, and CCT and other 3-mers are found to differ significantly between skipped exons and constitutive exons. The skipped exons are then predicted by two methods: one combines the position weight matrix and increment of diversity with the support vector machine, and the other combines them with the Mahalanobis discriminant. Both methods correctly predict almost 60% of skipped exons based on local sequence features.
5. Statistical analysis of palindromic sequences shows that the palindrome frequency of constitutive exons is higher than that of skipped exons and 23 times higher than that of random sequences. This result provides new evidence for the theory that the alternative state is a derivative of an ancestral constitutive exon.
6. The mononucleotide conservation of cancer-specific splice sites is analyzed, and the cancer-specific splice sites are predicted by the support vector machine combined with the position weight matrix and increment of diversity. The prediction accuracy is 62%, which is higher than that of other methods.
7. All alternative donor (acceptor) sites of the different alternative splicing types are clustered into one class; that is, all splice sites are divided into four types: alternative donor sites, constitutive donor sites, alternative acceptor sites and constitutive acceptor sites. These classes are then predicted. The prediction results on C. elegans alternative splicing data show that it is feasible to divide the splice sites into these four classes. The result provides new insight for work on alternative splicing prediction.
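
The position weight matrix scoring function and the score subtraction described in contribution 3 can be illustrated with a minimal sketch. It is not the dissertation's implementation: the log-odds form with a uniform background, the pseudocount, the toy donor-site strings and the function names build_pwm, pwm_score and score_difference are all assumptions made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"

# Hypothetical aligned donor-site windows, invented for this example only.
TOY_DONOR_SITES = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "TAGGTGAGT"]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from aligned, equal-length sites.

    Assumes a uniform background frequency and an additive pseudocount;
    both are illustrative choices, not values taken from the source.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all training sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def pwm_score(pwm, window):
    """Sum the per-position log-odds weights for one candidate site.

    The window must contain only A, C, G or T and match the PWM length.
    """
    if len(window) != len(pwm):
        raise ValueError("window length must match the PWM length")
    return sum(column[base] for column, base in zip(pwm, window))


def score_difference(pwm, site_a, site_b):
    """Single competition feature: strength of site_a minus strength of site_b."""
    return pwm_score(pwm, site_a) - pwm_score(pwm, site_b)


toy_pwm = build_pwm(TOY_DONOR_SITES)
```

Under these assumptions, a positive score_difference(toy_pwm, site_a, site_b) means that site_a resembles the training sites more closely than its competitor site_b, so the single number could serve as a classifier feature for a pair of competing sites.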
Keywords/Search Tags: alternative splicing, skipped exon, conservation site, 3-mer frequency, palindromic sequence, position weight matrix, increment of diversity, support vector machine