
Research On Several Problems Of Computational Molecular Biology

Posted on: 2005-10-23  Degree: Doctor  Type: Dissertation
Country: China  Candidate: B Liao  Full Text: PDF
GTID: 1100360122496904  Subject: Computational Mathematics
Abstract/Summary:
DNA (deoxyribonucleic acid), RNA (ribonucleic acid) and proteins are macromolecules whose primary structures are unbranched polymers built up from smaller units. In the case of DNA, these units are the four nucleotide residues A (adenine), C (cytosine), G (guanine) and T (thymine), while for RNA the units are the four nucleotide residues A, C, G and U (uracil). For proteins, the units are the twenty amino acid residues A (alanine), C (cysteine), D (aspartic acid), E (glutamic acid), F (phenylalanine), G (glycine), H (histidine), I (isoleucine), K (lysine), L (leucine), M (methionine), N (asparagine), P (proline), Q (glutamine), R (arginine), S (serine), T (threonine), V (valine), W (tryptophan) and Y (tyrosine). Thus, a DNA (RNA) sequence can be identified with a word over the alphabet N = {A, C, G, T(U)}, and a protein sequence can be taken as a string over twenty letters. The secondary structure of RNA (or DNA), in turn, is a set of free bases and base pairs, where bonds form between A-U (or A-T) and C-G. To a considerable extent, the secondary structures of RNA (or DNA) can be reduced to linear sequences, so the tools and methods of combinatorics and statistics play important roles in studying linear sequences of biomolecular units. Biological sequences and structures can also be given geometric representations, so geometric topology and group theory are important as well.
The main contents are as follows.
It is difficult to predict the RNA secondary structures of all sequences by prediction algorithms.
In Chapter 2, we consider the enumeration problem of RNA secondary structures and substructures as a generalization of the results in papers [5-7, 9].
Free energy is usually regarded as the standard measure of the optimal structures. In Chapter 3, we present algorithms to compute the minimum free energy of RNA secondary structures with or without pseudoknots.
In Chapter 4, we present two algorithms for searching for local and global alignments of mRNA sequences and protein sequences, and solve the LCS problems between a biological sequence and a biological structure.
In Chapter 5, according to the classification of the chemical structures of the four nucleotide residues A, C, G and T, we introduce a characteristic representation, two 3D graphical representations, a 2D graphical representation and a 4D representation of DNA sequences. We construct the distance matrix and the L/L matrix associated with the coordinates of the corresponding plot. Furthermore, the normalized leading eigenvalues of the L/L matrices and the average bandwidths of the distance matrices are computed and treated as invariants of the DNA primary sequences. Similarity and dissimilarity analyses based on these invariants are given for the exon-1 sequences of the β-globin gene of eleven species: human, goat, gallus, opossum, lemur, mouse, rabbit, rat, bovine, gorilla and chimpanzee. We also present a characteristic representation of amino acids based on the classification of the chemical properties of the 20 amino acids, and propose definitions of the -independent component and the characteristic information entropy. Furthermore, we compare several neurocan genes by constructing vectors consisting of the characteristic information entropy and the independent components.
In Chapter 6, according to the classification of the chemical structures of the free bases and base pairs of RNA secondary structures, we present a 3D graphical representation, a 4D representation and a 7D representation, and construct the distance matrix and the L/L matrix.
Similarity and dissimilarity analyses based on the normalized leading eigenvalues of the L/L matrices and on structure invariants are given for nine secondary structures of RNA-3 of viruses.
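As a rough illustration of the kind of invariant mentioned for Chapters 5 and 6, the sketch below builds an L/L matrix from a curve and computes its normalized leading eigenvalue. It assumes the definition commonly used in the graphical-representation literature (each entry is the Euclidean distance between two points divided by the length of the curve between them) and assumes normalization by the number of points. The 2D step vectors in `STEPS` and all function names are hypothetical; they are not the representations proposed in the dissertation.

```python
import math

# Hypothetical step vectors (an assumption for illustration only, NOT the
# dissertation's representation). x always advances by 1, so no two points
# of the curve coincide.
STEPS = {"A": (1.0, 2.0), "G": (1.0, 1.0), "C": (1.0, -1.0), "T": (1.0, -2.0)}


def curve_points(seq):
    """Map a DNA sequence to a 2D zigzag curve by accumulating step vectors."""
    x, y = 0.0, 0.0
    points = []
    for base in seq.upper():
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def ll_matrix(points):
    """Assumed L/L matrix: Euclidean distance over path length along the curve."""
    n = len(points)
    # cum[i] is the length of the curve from point 0 to point i.
    cum = [0.0] * n
    for i in range(1, n):
        cum[i] = cum[i - 1] + math.dist(points[i - 1], points[i])
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = math.dist(points[i], points[j]) / (cum[j] - cum[i])
    return m


def leading_eigenvalue(m, iterations=500):
    """Largest eigenvalue of a symmetric non-negative matrix (power iteration)."""
    n = len(m)
    v = [1.0 / math.sqrt(n)] * n  # unit start vector
    lam = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in m]
        lam = sum(a * b for a, b in zip(v, w))  # Rayleigh quotient, v is unit
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return lam


def normalized_leading_eigenvalue(seq):
    """Leading eigenvalue of the L/L matrix divided by the sequence length."""
    points = curve_points(seq)
    if not points:
        raise ValueError("sequence must not be empty")
    return leading_eigenvalue(ll_matrix(points)) / len(points)
```

Under these assumptions, a sequence that maps to a straight line (for example `AAAA`) has every off-diagonal entry equal to 1, so its normalized leading eigenvalue is (n-1)/n. Any bend in the curve lowers the value, which is what would make such a number usable as a coarse descriptor when comparing sequences.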
Keywords/Search Tags: DNA sequences, characteristic sequences, protein, RNA secondary structure, distance matrix, L/L matrix, normalized leading eigenvalue, sequence invariant, structure invariant, minimum free energy.