Research On Exon Prediction Based On Theory And Methods Of Digital Signal Processing

Posted on: 2015-12-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Zhang
Full Text: PDF
GTID: 1220330467483190
Subject: Control theory and control engineering
Abstract/Summary:
Predicting the actual locations of exons in eukaryotic DNA sequences quickly, reliably, and accurately is an important problem in bioinformatics. Accurately locating short exons is a particularly hard part of predicting the number and locations of exons, and suppressing the background noise of intron regions plays an important role in improving the prediction accuracy for short exons. In eukaryotic genes, the relatively small protein-coding regions, known as exons, are usually interrupted by non-coding regions, known as introns, and a large fraction of exons can be considered short. Because discriminative characteristics tend to appear less prominently in short exons, predicting them is particularly difficult. The coding information of some exons plays an important role in tumor invasion and metastasis. This thesis proposes two efficient methods for improving the prediction accuracy of short exons from two perspectives: capturing the characteristics of short exons and suppressing the background noise of intron regions.

Methods for exon prediction can be divided into two categories: digital signal processing-based methods and database-based methods. Using the singularity detection algorithm with wavelet transform modulus maxima and empirical mode decomposition, this thesis develops two methods for predicting eukaryotic exons. The main work of this thesis is as follows:

(1) Exon prediction via the singularity detection algorithm with wavelet transform modulus maxima. This method first constructs the sequence of nucleotide distribution. The noise represented by introns is then removed from the exon signals by analyzing how the wavelet transform modulus maxima of that sequence evolve across scales.
The sharp variation points of short exons are reconstructed, to a good approximation, from the local maxima of their wavelet transform modulus, so that short exons can be predicted accurately. The HMR195 and BG570 data sets, which are widely used to evaluate the performance of exon prediction methods, are used to evaluate the singularity detection method. Compared with the main existing methods, the results on the HMR195 and BG570 data sets show that: 1) the detection rate improves by at least 12% for exons no longer than 50 base pairs and by at least 8% for exons no longer than 200 base pairs; 2) the accuracy over all exons improves by at least 6.8%; 3) the suppression of background noise in intron regions, measured by the signal-to-noise performance metric, improves by at least 74.5%.

(2) To extend the application range of the singularity detection method, 200 test sequences, each containing two contiguous short exons and the short intron flanked by them, are randomly selected from NCBI GenBank. Compared with the main existing methods, this method improves accuracy on these 200 test sequences by at least 20.7%.

(3) Exon prediction using empirical mode decomposition and a modified Gabor-wavelet transform. In this method, the numerical DNA sequence, represented by the DNA-bending stiffness scheme, is first decomposed by empirical mode decomposition into a collection of intrinsic mode functions (IMFs). The first IMF is then used to compute the local spectrum with the modified Gabor-wavelet transform. Because empirical mode decomposition is a self-adaptive technique for spectrum analysis of non-stationary signals, this technique can detect significant components of short exons that are rarely observed with traditional methods.
In addition, because only the first IMF is used to compute the local spectrum, this method has the advantage of suppressing noise in exon prediction. Compared with the main existing methods, the results of this method on the HMR195 data set show that: 1) the signal-to-noise performance metric improves by at least 20.8%; 2) this method improves by at least 5.3% on exons no longer than 50 base pairs with respect to detection...
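
The cross-scale behavior that method (1) relies on can be illustrated with a short sketch. It is not the thesis's algorithm: it assumes a derivative-of-Gaussian wavelet, the scales 1, 2, 4, and 8, a simple coarse-to-fine matching rule, and a synthetic step signal standing in for the sequence of nucleotide distribution. All function names, the threshold, and the matching tolerance are hypothetical choices made for illustration.

```python
import math
import random


def gaussian_kernel(scale):
    """Unit-sum Gaussian with standard deviation `scale`, truncated at 4 sigma."""
    radius = int(4 * scale)
    weights = [math.exp(-(k * k) / (2.0 * scale * scale))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def wavelet_transform(signal, scale):
    """Discrete derivative-of-Gaussian transform: scale * backward difference
    of the Gaussian-smoothed signal. The factor `scale` keeps the response to
    a step edge roughly constant across scales (an assumed normalization)."""
    kernel = gaussian_kernel(scale)
    radius = len(kernel) // 2
    n = len(signal)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + radius - j, 0), n - 1)  # replicate border samples
            acc += w * signal[idx]
        smoothed.append(acc)
    return [0.0] + [scale * (smoothed[i] - smoothed[i - 1]) for i in range(1, n)]


def modulus_maxima(coeffs, threshold):
    """Indices where |W| exceeds `threshold`, is strictly larger than its left
    neighbor, and is at least as large as its right neighbor."""
    mod = [abs(c) for c in coeffs]
    return [i for i in range(1, len(mod) - 1)
            if mod[i] > threshold and mod[i] > mod[i - 1] and mod[i] >= mod[i + 1]]


def persistent_maxima(signal, scales=(1, 2, 4, 8), threshold=0.2):
    """Coarse-to-fine tracking (illustrative heuristic): start from the maxima
    at the coarsest scale and keep one only if every finer scale has a maximum
    within 2 * scale samples of it. Returns the finest-scale positions."""
    ordered = sorted(scales, reverse=True)
    tracked = modulus_maxima(wavelet_transform(signal, ordered[0]), threshold)
    for scale in ordered[1:]:
        finer = modulus_maxima(wavelet_transform(signal, scale), threshold)
        refined = []
        for pos in tracked:
            near = [p for p in finer if abs(p - pos) <= 2 * scale]
            if near:
                refined.append(min(near, key=lambda p: abs(p - pos)))
        tracked = refined
    return sorted(set(tracked))


if __name__ == "__main__":
    # Synthetic stand-in for a short exon: a 40-sample plateau in a flat background.
    clean = [1.0 if 80 <= i < 120 else 0.0 for i in range(200)]
    random.seed(0)
    noisy = [x + random.uniform(-0.05, 0.05) for x in clean]
    for s in (1, 2, 4, 8):
        count = len(modulus_maxima(wavelet_transform(noisy, s), 0.0))
        print("scale", s, "number of modulus maxima:", count)
    print("edges on clean signal:", persistent_maxima(clean))
    print("edges on noisy signal:", persistent_maxima(noisy))
```

On the noise-free step with the threshold of 0.2, every scale yields modulus maxima at exactly the two edges (indices 80 and 120). On the noisy copy, maxima caused by noise tend to weaken as the scale grows, whereas the two edges keep a roughly constant modulus, which is the property the tracking step exploits.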
Keywords/Search Tags: Bioinformatics, Exon Prediction, Digital Signal Processing, Singularity Detection, Empirical Mode Decomposition