
Graphical Representations Of Biological Sequences And Their Applications

Posted on: 2007-03-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: F L Bai
GTID: 1100360182982400
Subject: Computational Mathematics
Abstract/Summary:
This dissertation studies new graphical representations of biological sequences, grounded in the biological background and structure of those sequences. These representations provide new methods for classifying, analyzing, comparing, and storing biological sequences, and the dissertation discusses their concrete applications to problems such as similarity analysis and the construction of evolutionary trees. The main results are summarized as follows:
1. DNA sequences and amino acid sequences are translated into 2-D graphical representations. Because these representations resemble molecular structure graphs, we use chemometric methods to compute graph invariants (the Balaban index, the distribution index, and the average bandwidth of the corresponding distance matrix) and take them as a set of invariants for DNA primary sequences and amino acid sequences. Based on these invariants, we analyze the similarity and dissimilarity of the first-exon sequences of the β-globin genes of nine species (human, goat, gallus, opossum, lemur, mouse, rabbit, rat, and gorilla) and of six yar029w sequences, among others.
2. We describe a DNA primary sequence as a random walk. Under this description, each DNA sequence corresponds to two random sequences {X_n} and {Y_n}, which also yield a graphical representation of the sequence. We prove that {X_n} and {Y_n} are Markov chains. From the graphical representation we derive the transition probability distributions, correlations, and numerical characterizations of these random sequences, and we introduce several new invariants for DNA primary sequences. Using these invariants, we compare the exon-1 sequences of the β-globin genes of nine species to analyze their similarity and dissimilarity.
3. Constructing phylogenetic trees is a key tool in molecular evolutionary studies. We propose a new method of phylogenetic analysis based on graphical representations of DNA sequences. Using graph invariants, we define a distance measure between DNA sequences and, from it, a distance between species. Applying the method to the mitochondrial DNA sequences of 30 species, we successfully constructed their phylogenetic trees. The method requires no sequence alignment and is fully automatic.
4. We describe RNA secondary structure sequences as 2-D random walks on the complex plane, obtaining a random-walk curve and a random complex numerical sequence. We also define a function from the nucleotide set to a point set in 6-D space, which yields a 6-D representation of RNA secondary structure. We then transform these representations into matrices and characteristic vectors. Using the numerical characterization of the random complex sequence (modulus and phase) and the matrix invariants (the leading eigenvalues of the matrices and the distances between the characteristic vectors that describe the sequences), we analyze the similarity between the RNA secondary structure of AIMV-3 and those of eight other viruses.
5. We translate RNA secondary structure sequences into "Spectrum-like" and "Zigzag Curve" representations, from which we derive three recursive formulas that give 1-D, 2-D, and 3-D graphical representations of RNA secondary structure sequences. Using the 1-D representation, we further propose a frequency-domain analysis method for RNA secondary structure sequences.
6. We give a new 2-D graphical representation of protein sequences in the half complex plane, based on nucleotide triplet codons; the representation has no degeneracy. Using the main characteristics of a complex vector, its modulus and phase, we give a numerical description of protein sequences.
   In 3-D space, we also assign the 20 amino acids to the 20 vertices of a dodecahedron. The symmetry of the dodecahedron yields a 3-D representation of the 20 amino acids, and from it a 3-D graphical representation and a corresponding numerical sequence for protein sequences. Similarity and dissimilarity analyses based on graph invariants and the characteristics of the numerical sequences are given for nine secondary structures of viral RNA-3, and we construct a phylogenetic tree for a group of cytochrome c proteins.
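Item 1 does not state the walk rules or the exact invariant definitions, so the following is only a sketch under loudly stated assumptions. It uses a Nandy-style step assignment (A left, G right, C up, T down), chosen purely for illustration, builds the Euclidean distance matrix of the walk points, and computes a Balaban-type index J = m / (mu + 1) * sum over edges of (s_i * s_j)^(-1/2). Here the vertices are the walk points, the edges join consecutive points (a path, so mu = 0), and s_i is the Euclidean distance-matrix row sum rather than the usual topological distance sum. The names `STEPS`, `dna_walk`, `distance_matrix`, and `balaban_like_index` are hypothetical.

```python
import math

# Hypothetical Nandy-style step assignment (an assumption, not the dissertation's rule).
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}


def dna_walk(seq):
    """Return the points (x_0, y_0), ..., (x_n, y_n) of the 2-D walk for seq."""
    x, y = 0, 0
    points = [(x, y)]
    for base in seq.upper():
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def distance_matrix(points):
    """Euclidean distance between every pair of walk points."""
    return [[math.dist(p, q) for q in points] for p in points]


def balaban_like_index(points):
    """Balaban-type index on the walk, treated as a path graph.

    Assumptions: edges join consecutive points, so the cyclomatic number
    mu is 0, and s_i is the row sum of the Euclidean distance matrix.
    """
    sums = [sum(row) for row in distance_matrix(points)]
    m = len(points) - 1  # number of edges in the path
    mu = 0
    total = sum((sums[i] * sums[i + 1]) ** -0.5 for i in range(m))
    return m / (mu + 1) * total
```

For example, `balaban_like_index(dna_walk("ATGGTGCAC"))` returns one scalar that could be compared across sequences; a real analysis would use the dissertation's own representation and invariants.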
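The random-walk description in item 2 can be illustrated with a minimal sketch. It assumes a hypothetical step rule (A: (-1, 0), G: (1, 0), C: (0, 1), T: (0, -1)) and the simplifying assumption that bases are drawn independently. Under that assumption, each coordinate sequence is a random walk and hence a Markov chain, and the empirical increment frequencies estimate its transition probabilities P(X_{n+1} = x + d | X_n = x). The dissertation's own proof may rest on different assumptions, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical step assignment (an assumption, not the dissertation's rule).
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}


def coordinate_sequences(seq):
    """Return the coordinate sequences X_0..X_n and Y_0..Y_n of the walk."""
    xs, ys = [0], [0]
    for base in seq.upper():
        dx, dy = STEPS[base]
        xs.append(xs[-1] + dx)
        ys.append(ys[-1] + dy)
    return xs, ys


def step_distribution(values):
    """Empirical probabilities of the increments values[k+1] - values[k].

    If bases are treated as independent draws, these estimate the
    transition probabilities of the coordinate chain.
    """
    steps = [b - a for a, b in zip(values, values[1:])]
    counts = Counter(steps)
    total = len(steps)
    return {d: counts[d] / total for d in sorted(counts)}
```

With this step rule, the X increments depend only on A and G and the Y increments only on C and T, so the two chains share no step, which is one reason to look at their correlation separately.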
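Item 3 builds an alignment-free distance from graph invariants and then a tree, but the abstract names neither the invariants nor the clustering algorithm. The sketch below therefore swaps in two stand-ins, both assumptions: the geometric center (mean position) of a 2-D DNA walk under a hypothetical step rule serves as the invariant vector, and UPGMA builds the tree from Euclidean distances between those vectors. The function names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical step assignment (an assumption, not the dissertation's rule).
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}


def walk_center(seq):
    """Stand-in invariant vector: mean (x, y) position of the 2-D walk."""
    x = y = 0
    sx = sy = 0.0
    for base in seq.upper():
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        sx, sy = sx + x, sy + y
    n = max(len(seq), 1)
    return (sx / n, sy / n)


def upgma(names, dist):
    """UPGMA clustering; dist maps frozenset({a, b}) to a distance.

    Returns a Newick string without branch lengths.
    """
    clusters = {name: (name, 1) for name in names}  # label -> (newick, size)
    d = dict(dist)
    while len(clusters) > 1:
        a, b = sorted(min(d, key=d.get))  # closest pair, in a stable order
        (na, sa), (nb, sb) = clusters.pop(a), clusters.pop(b)
        new = f"({na},{nb})"
        for c in clusters:
            # Size-weighted average distance from the merged cluster to c.
            d[frozenset({new, c})] = (
                sa * d[frozenset({a, c})] + sb * d[frozenset({b, c})]
            ) / (sa + sb)
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[new] = (new, sa + sb)
    return next(iter(clusters))


def phylogeny(seqs):
    """Tree for a dict name -> DNA sequence, using walk centers as features."""
    centers = {name: walk_center(s) for name, s in seqs.items()}
    dist = {frozenset({a, b}): math.dist(centers[a], centers[b])
            for a, b in combinations(centers, 2)}
    return upgma(list(seqs), dist)
```

UPGMA is used only because it is short and assumes nothing beyond a distance matrix; neighbor joining would be a common alternative when evolutionary rates differ across lineages.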
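For item 4, a minimal sketch of a complex-plane random walk for an RNA secondary structure, its modulus and phase, and a leading eigenvalue could look as follows. Several choices are assumptions, not taken from the dissertation. The input is a base string plus a dot-bracket structure. Unpaired bases step along the axes (A = 1, U = -1, G = i, C = -i), and paired bases use the same steps rotated by 45 degrees. The matrix is an E/D-style matrix (Euclidean distance between walk points divided by their sequence distance). The 6-D representation and characteristic vectors are not reproduced. The leading eigenvalue comes from shifted power iteration, which converges for symmetric nonnegative matrices.

```python
import cmath
import math

# Hypothetical unit steps (an assumption): unpaired bases point along the axes,
# paired bases use the same directions rotated by 45 degrees.
BASE_STEP = {"A": 1, "U": -1, "G": 1j, "C": -1j}
PAIRED_ROTATION = cmath.exp(1j * math.pi / 4)


def complex_walk(bases, structure):
    """Cumulative complex sequence z_1..z_n for bases with a dot-bracket structure."""
    z = 0j
    walk = []
    for base, mark in zip(bases.upper(), structure):
        step = BASE_STEP[base]
        if mark in "()":  # paired position
            step *= PAIRED_ROTATION
        z += step
        walk.append(z)
    return walk


def modulus_phase(walk):
    """Modulus |z_k| and phase arg(z_k) of each point of the walk."""
    return [(abs(z), cmath.phase(z)) for z in walk]


def ed_matrix(walk):
    """E/D-style matrix: Euclidean distance divided by sequence distance |i - j|."""
    n = len(walk)
    return [[0.0 if i == j else abs(walk[i] - walk[j]) / abs(i - j)
             for j in range(n)] for i in range(n)]


def leading_eigenvalue(matrix, iterations=500, shift=1.0):
    """Largest eigenvalue of a symmetric nonnegative matrix (shifted power iteration)."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        # Multiply by (M + shift * I); the shift keeps the Perron eigenvalue dominant.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient with the unshifted matrix.
    return sum(v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n))
```

Dividing the leading eigenvalue by the sequence length is one common way to compare structures of different sizes, but that normalization is also an assumption here.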
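The frequency-domain idea in item 5 can be sketched as mapping an RNA secondary structure to a 1-D numeric sequence and taking its discrete Fourier power spectrum. The abstract gives neither the recursive formulas nor the symbol values, so the eight levels below (four bases, each paired or unpaired, read from a dot-bracket string) are hypothetical. The DFT is computed directly in O(n^2) to keep the example dependency-free.

```python
import cmath
import math

# Hypothetical numeric levels for the eight symbols (4 bases x paired/unpaired);
# the dissertation's actual assignment is not given in the abstract.
LEVELS = {("A", False): 1, ("U", False): 2, ("G", False): 3, ("C", False): 4,
          ("A", True): -1, ("U", True): -2, ("G", True): -3, ("C", True): -4}


def one_d_sequence(bases, structure):
    """Map each position to a level; '(' or ')' marks a paired base."""
    return [LEVELS[(b, m in "()")] for b, m in zip(bases.upper(), structure)]


def power_spectrum(values):
    """|X_k|^2 of the discrete Fourier transform, computed directly."""
    n = len(values)
    spectrum = []
    for k in range(n):
        xk = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                 for t, v in enumerate(values))
        spectrum.append(abs(xk) ** 2)
    return spectrum
```

Peaks away from k = 0 indicate periodic patterns in the level sequence; for long sequences an FFT library would replace the direct sum.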
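The dodecahedron construction in item 6 relies on the fact that a regular dodecahedron has exactly 20 vertices, one per amino acid. Centered at the origin, those vertices are (±1, ±1, ±1), (0, ±1/φ, ±φ), (±1/φ, ±φ, 0), and (±φ, 0, ±1/φ), where φ is the golden ratio. The sketch below generates them and builds a 3-D protein walk. The assignment of amino acids to vertices is hypothetical (one-letter codes in alphabetical order, matched to vertices in generation order), not the assignment used in the dissertation.

```python
import math
from itertools import product

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def dodecahedron_vertices():
    """The 20 vertices of a regular dodecahedron centered at the origin."""
    verts = [tuple(p) for p in product((-1, 1), repeat=3)]  # 8 cube vertices
    for s1, s2 in product((-1, 1), repeat=2):
        verts.append((0, s1 / PHI, s2 * PHI))
        verts.append((s1 / PHI, s2 * PHI, 0))
        verts.append((s2 * PHI, 0, s1 / PHI))
    return verts


# Hypothetical assignment: alphabetical one-letter codes matched to vertices
# in the order generated above (not the dissertation's assignment).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
VERTEX_OF = dict(zip(AMINO_ACIDS, dodecahedron_vertices()))


def protein_walk(seq):
    """3-D walk: cumulative sum of the vertex vectors of the residues."""
    x = y = z = 0.0
    points = []
    for aa in seq.upper():
        dx, dy, dz = VERTEX_OF[aa]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points
```

Every vertex lies at distance √3 from the center, so each residue contributes a step of equal length and only its direction distinguishes amino acids.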
Keywords/Search Tags: DNA sequences, protein, RNA secondary structure, numerical characterization, graphical representation, distance matrix, leading eigenvalue, sequence invariant, phylogenetic tree