
The Internal Nature Of Biological Sequences

Posted on: 2009-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: L T Zhang
Full Text: PDF
GTID: 2120360272956854
Subject: Computer application technology
Abstract/Summary:
Bioinformatics is an interdisciplinary field that uses computers to store, retrieve, and analyze biological information. This thesis studies the visualization of biological sequences, sequence alignment, and protein networks.

Inspired by chaos walks, and building on the Chaos Game Representation (CGR) of gene sequences, we further study the multifractal nature of protein sequences and their Rényi entropy rate. Fractal theory from mathematics and the entropy rate from information theory are applied to the analysis of protein sequences. The 20 amino acids are treated as symbols, so that, once the two-dimensional CGR visualization is extended to n-dimensional space, the fractal dimension can be used to characterize a protein sequence; in addition, a protein sequence can be regarded as a signal that can be described by its information entropy. A correspondence is established between the multifractal dimensions of the chaos game representation of proteins and the Rényi entropy rate of symbolic sequences via a probability measure μ.

Amino acids can be classified according to their properties, and because different classifications rest on and emphasize different properties, several schemes exist. This thesis uses two: one divides the amino acids into 4 classes according to the HP model; the other divides them into 7 classes according to their physicochemical properties. A new alignment algorithm then uses the PCGR distance between two classified protein sequences, together with a fixed threshold k, to compare the two sequences and to find the lengths and locations of their similar fragments. The algorithm reduces computational complexity while preserving alignment efficiency, and its alignment results can also be used to estimate the similarity of the sequences.

Using hierarchical clustering, we construct phylogenetic trees of 26 species with the quadratic divergence distance and with the FCGR distance, respectively.
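As a rough illustration of the quantities named above, the following Python sketch computes chaos game representation points for a DNA sequence, bins them into a frequency CGR (FCGR) grid, and evaluates a Rényi entropy and a distance on the result. The abstract does not give the thesis's exact definitions, so several choices here are assumptions: the corner assignment for A, C, G, T, the Euclidean form of `fcgr_distance`, and all function names are hypothetical and only meant to show the general idea.

```python
import math
from collections import Counter

# Assumed corner assignment for the four bases (a common convention,
# not taken from the thesis).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory: each point is halfway to the base's corner."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def fcgr(seq, k):
    """Relative frequency of CGR points in each cell of a 2^k x 2^k grid."""
    n = 2 ** k
    counts = Counter()
    # The first k-1 points still depend on the arbitrary start, so skip them.
    for x, y in cgr_points(seq)[k - 1:]:
        cell = (min(int(x * n), n - 1), min(int(y * n), n - 1))
        counts[cell] += 1
    total = sum(counts.values())
    return {cell: c / total for cell, c in counts.items()}


def renyi_entropy(probs, q):
    """Rényi entropy of order q (natural log); q = 1 gives Shannon entropy."""
    probs = [p for p in probs if p > 0]
    if q == 1:
        return -sum(p * math.log(p) for p in probs)
    return math.log(sum(p ** q for p in probs)) / (1.0 - q)


def fcgr_distance(f1, f2):
    """Euclidean distance between two FCGR grids (an assumed distance form)."""
    cells = set(f1) | set(f2)
    return math.sqrt(sum((f1.get(c, 0.0) - f2.get(c, 0.0)) ** 2 for c in cells))


if __name__ == "__main__":
    a = fcgr("ACGTACGTTGCA", 2)
    b = fcgr("AAAACCCCGGGG", 2)
    print(renyi_entropy(a.values(), 2), fcgr_distance(a, b))
```

A pairwise matrix of such distances is the kind of input a hierarchical clustering method would take to build a tree; which linkage rule the thesis uses is not stated in the abstract.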
The phylogenetic trees are built from the same data source but with different definitions of distance, so comparing them allows the evolutionary relationships between species to be explored.

In the complex-network analysis, each node represents a DNA fragment, and each connection between fragments carries information about the relationship between the two nodes. The topological structure of an organism's sequence can thus be studied as a complex network, whose topology is characterized mainly by measuring the clustering coefficient. The results show that the degree distribution of the network follows a power law, but with a small exponent, which indicates that DNA sequences have a stable structure while also exhibiting considerable randomness and instability in the genetic process.
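The two network measures mentioned above, the degree distribution and the clustering coefficient, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the abstract does not say how edges between DNA fragments are defined, so the edge list and the fragment labels (`f1` to `f4`) below are purely hypothetical, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_distribution(adj):
    """Fraction of nodes having each degree k."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def local_clustering(adj, node):
    """Share of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


if __name__ == "__main__":
    # Hypothetical fragment network: a triangle f1-f2-f3 plus a pendant node f4.
    edges = [("f1", "f2"), ("f2", "f3"), ("f1", "f3"), ("f3", "f4")]
    adj = build_adjacency(edges)
    print(degree_distribution(adj))
    print(average_clustering(adj))
```

To check for power-law behaviour one would typically plot the degree distribution on log-log axes and estimate the slope; that fitting step is omitted here because the abstract does not describe the procedure used.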
Keywords/Search Tags: Chaos game representation of protein (PCGR), Multifractal dimensions, Rényi entropy rate, Sequence alignment, Quadratic divergence distance, Phylogenetic trees, Hierarchical clustering, Complex network