
Study On Fractal Characteristics Of DNA Sequence

Posted on: 2006-10-06  Degree: Master  Type: Thesis
Country: China  Candidate: Y Q Wang  Full Text: PDF
GTID: 2168360155972738  Subject: Signal and Information Processing
Abstract/Summary:
Fractal structure is a universal phenomenon in nature. It refers to the correlation between system states during a system's evolution, between the whole and its parts, and between one part and another within a system. Research on the fractal characteristics of DNA sequences may reveal the traces left in DNA sequences during biological evolution. The analysis of DNA sequences helps us not only to find the rules of biological evolution but also to understand the character of the genetic language. Building on results from biological research and on nonlinear system theory, this thesis studies DNA sequences from a systematic and integrated viewpoint. The fractal characteristics of DNA sequences are illustrated by computing the Hurst exponent of digitized DNA sequences. Two internationally controversial problems are addressed in this thesis: 1. There are many methods for estimating the Hurst exponent, most of which are derived for stationary stochastic processes with finite variance. Digitized DNA sequences are unpredictable, however, so there is no guarantee that they satisfy these conditions. We therefore try to find out whether there is an estimator that remains robust for other types of random sequence (such as sequences with short range dependence, infinite variance, or non-stationarity). 2. Does long range dependence (LRD) exist in all types of DNA sequence, and in particular in the coding regions of DNA sequences? In this thesis, 12 common estimators were first used to estimate the Hurst exponent of 5 different kinds of sequence (fractional Gaussian noise (FGN), Gaussian FARIMA (p, d, q), FARIMA with other finite variance innovations, FARIMA with infinite variance innovations, and a non-stationary process) in order to determine their robustness. The Variance of Residuals Method (median) was found to be the most robust and accurate of these estimators. Second, the Hurst exponents of the digitized DNA sequences were estimated with the Variance of Residuals Method (median).
Specifically, 3 types of DNA sequence (gene sequences, coding sequences, and noncoding sequences) were taken from 9 categories of species, such as mammals, invertebrates, plants, fungi, bacteria, viruses, and protozoa, and 8 types of mapping rule were used to digitize the DNA sequences. The computed results demonstrate that LRD exists in all three kinds of DNA sequence, and they lay a foundation for further exploration of the mystery of DNA sequences. Because a large amount of data had to be processed, we used S-PLUS 6.2, which offers strong statistical capabilities and supports object-oriented programming. All experiments are based on S-PLUS 6.2.
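
The procedure described above (digitize a DNA sequence, then estimate its Hurst exponent with the Variance of Residuals Method using the median) can be illustrated with a minimal Python sketch. This is not the thesis's S-PLUS implementation. It follows the usual description of the method: split the series into blocks of size m, form partial sums within each block, fit a least-squares line, take the median of the residual variances over blocks, and read 2H off the slope of a log-log regression against m. The purine/pyrimidine mapping and all function names are assumptions made for illustration, since the abstract does not list the 8 mapping rules.

```python
import math
import statistics

# Hypothetical mapping rule (purine = +1, pyrimidine = -1).
# The thesis uses 8 mapping rules that the abstract does not specify.
PURINE_PYRIMIDINE = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def digitize(sequence, mapping=PURINE_PYRIMIDINE):
    """Map a DNA string to numbers, skipping symbols not in the mapping."""
    return [mapping[base] for base in sequence.upper() if base in mapping]


def _least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_variance(block):
    """Variance of residuals of a line fitted to the block's partial sums."""
    partial_sums = []
    running = 0.0
    for value in block:
        running += value
        partial_sums.append(running)
    times = list(range(1, len(block) + 1))
    slope, intercept = _least_squares(times, partial_sums)
    squared = [(y - (intercept + slope * t)) ** 2
               for t, y in zip(times, partial_sums)]
    return sum(squared) / len(block)


def hurst_variance_of_residuals(series, block_sizes):
    """Estimate H from the scaling: median residual variance ~ m ** (2 * H)."""
    log_sizes, log_medians = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if m < 3 or n_blocks < 2:
            continue  # too little data for this block size
        variances = [residual_variance(series[i * m:(i + 1) * m])
                     for i in range(n_blocks)]
        median = statistics.median(variances)
        if median > 0:
            log_sizes.append(math.log(m))
            log_medians.append(math.log(median))
    if len(log_sizes) < 2:
        raise ValueError("need at least two usable block sizes")
    slope, _ = _least_squares(log_sizes, log_medians)
    return slope / 2.0
```

For example, `hurst_variance_of_residuals(digitize(seq), [16, 32, 64, 128, 256])` returns an estimate of H for a sequence string `seq`. Values near 0.5 indicate no long range dependence, while values between 0.5 and 1 are consistent with LRD. The choice of block sizes is an assumption and in practice affects the estimate.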
Keywords/Search Tags:long range dependence, DNA, Hurst exponent, fractal