
Graphical Representations And Invariant Methods Of The Similarity Analysis For Biological Sequences

Posted on: 2007-01-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Yao
GTID: 1100360182482448
Subject: Computational Mathematics
Abstract/Summary:
With the completion of the genome sequencing projects for human, Arabidopsis thaliana and rice, and with continuing research on protein sequences, ever more molecular sequence data are being generated. The need to analyze, process and store these data is bringing mathematics and computer science into molecular biology. It has created a new interdisciplinary field, computational molecular biology, which draws on information science, computer science, the life sciences, mathematics, statistics, physics, chemistry and other disciplines. Computational molecular biology deals mainly with complex computations on gene and protein sequences using methods from mathematics and computer science. This dissertation studies chiefly the similarity analysis of biological sequences and the construction of phylogenetic trees. Its main results can be summarized as follows:
1. In Chapter 2, we propose the concepts of a cell and of a system of graphical representations of DNA sequences, and introduce a class of 2-D graphical representations based on differently designed cells. We also give graphical representations of distribution curves based on classifications of the nucleic acid bases, and select several new kinds of invariants as DNA sequence descriptors according to the characteristics of the different graphical representations. Using these invariants of the DNA primary sequences, we carry out similarity and dissimilarity analyses of the first exon of the β-globin gene from eleven species. The computational complexity of our methods is only O(N), far below that of methods based on matrix invariants (at least O(N^2)).
2. In Chapter 3, based on the concepts of a cell and of a system of graphical representations, we give a class of 2-D graphical representations of RNA secondary structures in terms of classifications of the bases. As an application, we use these representations to compare quantitatively a set of RNA secondary structures at the 3'-terminus of different viruses.
3. In Chapter 4, we give several graphical and matrix representations of protein sequences based on classifications of the amino acids by their physical and chemical properties. Using algebraic invariants such as the geometric center, the leading eigenvalue of the matrix and the average band width, we analyze the similarity of protein sequences; the invariants used in this chapter carry considerable biological meaning.
4. In Chapter 5, we introduce the methods and main steps for constructing phylogenetic trees. We propose a new sequence distance measure based on the fully overlapping triplets of nucleotide bases in DNA primary sequences, and use it to study the phylogeny of the 34 eutherian orders from complete, unaligned mitochondrial genomes.
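The Chapter 2 summary does not say how the cells are designed, so the following is only a minimal sketch of the general idea under an assumed mapping: each base moves a 2-D walk one unit in a fixed direction (the `STEP` table is hypothetical, in the style of classical 2-D DNA walks, and is not the dissertation's cell design), the geometric center of the walk serves as a numerical invariant, and the Euclidean distance between invariants measures dissimilarity. The helpers `walk`, `descriptor` and `dissimilarity` are illustrative names. Everything is computed in one pass over the sequence, so the cost is O(N) in the sequence length, in line with the complexity claimed for Chapter 2.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical base-to-step mapping (an assumption, not the dissertation's cells).
STEP: Dict[str, Tuple[int, int]] = {
    "A": (-1, 0),
    "G": (1, 0),
    "C": (0, 1),
    "T": (0, -1),
}


def walk(seq: str) -> List[Tuple[int, int]]:
    """Return the walk points P_1..P_N in a single O(N) pass.

    Raises KeyError for symbols other than A, C, G, T.
    """
    x = y = 0
    points = []
    for base in seq.upper():
        dx, dy = STEP[base]
        x += dx
        y += dy
        points.append((x, y))
    return points


def descriptor(seq: str) -> Tuple[float, float]:
    """Geometric center of the walk, used here as a two-component invariant."""
    pts = walk(seq)
    n = len(pts)
    if n == 0:
        raise ValueError("empty sequence")
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def dissimilarity(s1: str, s2: str) -> float:
    """Euclidean distance between descriptors: smaller means more similar."""
    return math.dist(descriptor(s1), descriptor(s2))
```

In practice more than one invariant per sequence would be needed, because quite different sequences can share the same geometric center.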
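For Chapter 4, the abstract names the leading eigenvalue of a matrix as an invariant but not which matrix is used. Assuming, purely for illustration, the Euclidean distance matrix between the points of a 2-D curve, a minimal dependency-free sketch computes that eigenvalue by power iteration. The helpers `distance_matrix` and `leading_eigenvalue` are hypothetical, not taken from the dissertation. Note that merely building the N-by-N matrix already costs O(N^2), the kind of cost Chapter 2 contrasts with its O(N) invariants.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def distance_matrix(points: Sequence[Point]) -> List[List[float]]:
    """Euclidean distance matrix D[i][j] = |P_i - P_j| (O(N^2) entries)."""
    return [[math.dist(p, q) for q in points] for p in points]


def leading_eigenvalue(m: List[List[float]], iters: int = 500,
                       tol: float = 1e-12) -> float:
    """Largest eigenvalue of a nonnegative symmetric matrix by power iteration.

    By Perron-Frobenius the leading eigenvector is nonnegative, so the
    all-ones start vector has a nonzero component along it. For three or
    more distinct points the distance matrix is primitive, so the leading
    eigenvalue strictly dominates and the iteration converges.
    """
    n = len(m)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0  # zero matrix, e.g. a curve with a single point
        v = [x / norm for x in w]
        # Rayleigh quotient v^T M v; v has unit length.
        new_lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n))
                      for i in range(n))
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam
```

Power iteration is used here only to keep the sketch free of dependencies; for larger matrices a library routine for symmetric eigenproblems, such as NumPy's `numpy.linalg.eigvalsh`, would be the usual choice.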
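Chapter 5's distance is built on fully overlapping triplets, that is, every window of three consecutive bases taken with step 1, which gives N - 2 triplets for a sequence of length N. The abstract does not give the formula of the new measure, so the sketch below substitutes a plain Euclidean distance between relative triplet-frequency vectors; `triplet_frequencies` and `triplet_distance` are hypothetical names. Like the approach described in the abstract, it needs no sequence alignment.

```python
import math
from collections import Counter
from itertools import product
from typing import Dict

# All 64 possible nucleotide triplets, in a fixed order.
TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]


def triplet_frequencies(seq: str) -> Dict[str, float]:
    """Relative frequencies of fully overlapping triplets (window 3, step 1)."""
    s = seq.upper()
    counts = Counter(s[i:i + 3] for i in range(len(s) - 2))
    # Windows containing symbols other than A, C, G, T are ignored.
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        raise ValueError("sequence has no valid triplets")
    return {t: counts[t] / total for t in TRIPLETS}


def triplet_distance(s1: str, s2: str) -> float:
    """Euclidean distance between triplet-frequency vectors (alignment-free)."""
    f1, f2 = triplet_frequencies(s1), triplet_frequencies(s2)
    return math.sqrt(sum((f1[t] - f2[t]) ** 2 for t in TRIPLETS))
```

Computing this distance for every pair of sequences yields a distance matrix that a distance-based tree method, such as UPGMA or neighbor joining, can turn into a phylogenetic tree.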
Keywords/Search Tags: graphical representation, invariant method, similarity analysis, DNA sequences, RNA secondary structures, protein sequences, phylogenetic tree