Mathematical Description Of The Protein Sequences And Its Application

Posted on: 2011-05-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y P Zhang
Full Text: PDF
GTID: 2190330332457528
Subject: Basic mathematics
Abstract/Summary:
Twenty amino acids make up the standard chemical alphabet used to build proteins. A protein sequence is therefore a string over the alphabet Ω of the 20 amino acids, where Ω = {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y}. In molecular biology, the 3D structure of a protein is determined by the invariant sequence of amino acids that makes up the protein, so the analysis of protein sequences is an important and interesting task in bioinformatics.
In recent years, some researchers have generalized the graphical representation of DNA to the analysis of protein sequences. The difficulty in representing proteins graphically stems from the combinatorial complexity of the 20! (20 factorial) ways in which the 20 amino acids (AAs) can be ordered. Existing graphical representations of proteins take one of two approaches. In the first, the 20 amino acids are divided into four or five classes, so that the protein sequence becomes a sequence over a 4- or 5-letter alphabet and the number of possible permutations is reduced. In the second, the similarities and dissimilarities among the 20 amino acids are ignored and the amino acids are simply placed in alphabetical order.
We propose the concept of a cyclic order of amino acids, in which the amino acids are arranged in a loop. In this thesis, we give several different cyclic orders of amino acids, based respectively on the physicochemical property classes of amino acids, the PAM250 substitution matrix, and the 6-bit binary reflected Gray code. Using the Chaos Game Representation (CGR) method together with these cyclic orders, we obtain several graphical representations of proteins. Several numerical characterizations are then suggested to describe these graphical representations: matrix invariants, a first-order central-moment-like quantity, and graphical alignment. An examination of the similarity among the NADH dehydrogenase subunit 5 (ND5) protein sequences of nine species shows the utility of our approach.
We then use our methods to analyze the similarities among 34 coronavirus spike proteins and among the PB1 RNA polymerase proteins of 45 influenza viruses, and we deduce their evolutionary relationships and classification from these similarities. In addition, we obtain a method for determining matching fragments of two sequences based on the graphical alignment.
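As an illustration of how a CGR over a cyclic order of amino acids might be computed, the following is a minimal sketch and not the construction used in the thesis. It assumes that the 20 amino acids are placed at the vertices of a regular 20-gon on the unit circle, that the walk starts at the origin, and that each step moves a fixed fraction (`ratio`, here 0.5 as in the classical DNA CGR) toward the vertex of the next residue. The alphabetical order used as the default is only a placeholder; the cyclic orders derived in the thesis from physicochemical classes, PAM250, and the Gray code are not given in the abstract. The names `vertices` and `cgr_points` are hypothetical.

```python
import math

# Placeholder cyclic order (alphabetical); the thesis derives its own orders,
# which are not reproduced here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def vertices(order=AMINO_ACIDS):
    """Map each amino acid to a vertex of a regular polygon on the unit circle."""
    n = len(order)
    return {
        aa: (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k, aa in enumerate(order)
    }


def cgr_points(sequence, order=AMINO_ACIDS, ratio=0.5):
    """Return the list of CGR points for a protein sequence.

    Starting at the origin (an assumption), each residue moves the current
    point a fraction `ratio` of the way toward that residue's vertex.
    Residues not in `order` raise KeyError.
    """
    verts = vertices(order)
    x, y = 0.0, 0.0
    points = []
    for aa in sequence:
        vx, vy = verts[aa]
        x += ratio * (vx - x)
        y += ratio * (vy - y)
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("MTMHTT"):
        print(f"{point[0]: .4f} {point[1]: .4f}")
```

Passing a different string as `order` changes only where each amino acid sits on the loop, so the same function can be used to compare the curves produced by different cyclic orders.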
Keywords/Search Tags: Graphical representation, numerical characterization, similarities analysis, protein sequences, physicochemical properties of amino acids, PAM250 matrix, Gray code, evolutionary tree, coronavirus, H1N1, ND5