Geometrical Characterization Of Biological Sequences And Applications

Posted on: 2009-05-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Guo
Full Text: PDF
GTID: 1100360272970196
Subject: Applied Mathematics
Abstract/Summary:
With the sequencing of the genomes of several model organisms, and especially the completion of the Human Genome Project, the volume of biological data has grown at an unprecedented rate. Driven by this growth, bioinformatics has emerged as a new interdisciplinary field and has developed rapidly. It is now becoming one of the core domains of the natural sciences in this century. It uses mathematics, statistics, and computer science as its tools, and takes nucleic acids, proteins, and other biological macromolecules as its objects of study. The subject focuses on how to collect, store, transfer, search, and analyze biological data, and then on exploring the origin of life, biological evolution, the essence of life, and other major theoretical problems.

The research area of bioinformatics is very wide. It includes sequence comparison, phylogenetic analysis, gene prediction, protein structure prediction, drug design, biochemical simulation, whole-genome analysis, RNA structure prediction, sequence assembly, public databases, database formats, and so on. This dissertation mainly studies sequence comparison and phylogenetic analysis. The main results can be summarized as follows:

In Chapter 2, based on the idea of the chaos game representation (CGR), a 2-D graphical representation of RNA secondary structure sequences and protein sequences is given, which avoids some limitations of earlier graphical representation models of biological sequences. These representations are used to analyze the similarity and dissimilarity of different species, and a phylogenetic tree of protein sequences is constructed.

In Chapter 3, we use the curvatures of curves smoothed by the β-spline function to analyze the similarity of DNA sequences, and we propose the curvature as a new invariant. The proposed method is tested on two real data sets: the coding sequences of the β-globin gene and all of their exons. We also find that, for the β-globin genes of 11 species, the second exon carries more information than the other two exons. The method is simple and highly accurate.

In Chapter 4, to avoid imprecise approximations, we propose a difference form of torsion. The torsion is then used as a new descriptor to numerically characterize TOPS strings. Our analysis of 34 TOPS strings indicates that TOPS strings can be used successfully in evolutionary analysis. This method is also simple and highly accurate.

In Chapter 5, instead of considering only one characteristic of a curve, we combine the curvature and torsion of curves into a single descriptor to numerically characterize DNA sequences. The new method was tested on three data sets, including the coding sequences of the β-globin gene and all of their exons; using the method, we also analyzed coronavirus genomes and constructed their phylogenetic tree. For comparison, we applied the matrix invariant method to the same data. The comparison shows that our method runs faster and gives better results.
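The abstract does not state the formulas behind the curvature and torsion descriptors. As a rough illustration only, the sketch below estimates both quantities for a sampled 3-D curve with standard central finite differences, using κ = |r′ × r″| / |r′|³ and τ = (r′ × r″) · r‴ / |r′ × r″|². It is not the dissertation's "difference form of torsion"; the function name `curvature_torsion`, the uniform step `h`, and the helix test curve are all assumptions made for this example.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


def curvature_torsion(points, h=1.0):
    """Estimate (curvature, torsion) at each interior sample of a 3-D curve.

    `points` is a list of (x, y, z) tuples sampled at a uniform parameter
    step `h` (an assumption of this sketch). The first two and last two
    samples are skipped because the third derivative needs a 5-point stencil.
    """
    result = []
    for i in range(2, len(points) - 2):
        pm2, pm1, p0, pp1, pp2 = points[i - 2:i + 3]
        # Central differences for the first, second, and third derivatives.
        d1 = tuple((b - a) / (2 * h) for a, b in zip(pm1, pp1))
        d2 = tuple((a - 2 * c + b) / h ** 2 for a, c, b in zip(pm1, p0, pp1))
        d3 = tuple((b2 - 2 * b1 + 2 * a1 - a2) / (2 * h ** 3)
                   for a2, a1, b1, b2 in zip(pm2, pm1, pp1, pp2))
        c = cross(d1, d2)
        c_sq = dot(c, c)
        speed = math.sqrt(dot(d1, d1))
        # Degenerate samples (zero speed or locally straight) get 0.0.
        kappa = math.sqrt(c_sq) / speed ** 3 if speed > 0 else 0.0
        tau = dot(c, d3) / c_sq if c_sq > 0 else 0.0
        result.append((kappa, tau))
    return result


if __name__ == "__main__":
    # Hypothetical test curve: the helix (cos t, sin t, t), whose exact
    # curvature and torsion are both 1/2.
    step = 0.01
    helix = [(math.cos(step * i), math.sin(step * i), step * i)
             for i in range(101)]
    kappa, tau = curvature_torsion(helix, h=step)[0]
    print(kappa, tau)
```

On the helix the estimates should come out close to the exact value 1/2 for both quantities. For a curve derived from a sequence, the per-point values could then be collected into a vector and compared between sequences, although the abstract does not say which comparison the dissertation uses.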
Keywords/Search Tags: Graphical representation, DNA sequence, RNA secondary structure, Protein sequence, Phylogenetic tree, Curvature, Torsion, Difference