
The Graphical Representation Of Biological Sequences Based On Some Iterated Functions And Its Application

Posted on: 2015-03-12
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Liu
GTID: 2250330428964959
Subject: Basic mathematics
Abstract/Summary:
In recent years, as the number of biological sequences in databases has grown, simple and convenient methods for analyzing biological sequences have become particularly important in bioinformatics. Graphical representations of biological sequences in particular have received much attention because they allow visualization and efficient numerical characterization. In this thesis, the effect of the two parameters of an iterated function system (IFS) on the graphical representation of biological sequences was studied. Based on two equal parameters (both 1/2) and on two different parameters (3/4 and 1/2), several novel graphical representations of protein sequences are proposed, and some new mathematical descriptors are introduced to compare the similarities of protein sequences. Using these methods, similarities were compared among the ND6 protein sequences of eight species, the HA protein sequences of 60 strains of different viral subtypes, and nine ND5 protein sequences. The cluster analysis results for these similarities are consistent with the known facts of evolution. Through correlation and significance analysis, the Clustal W results were compared with our similarity/dissimilarity results and with the results of other graphical representations to demonstrate the effectiveness of our approaches.

Compared with a protein sequence, a DNA sequence comprises only four bases, which makes it convenient for analyzing the influence of the two parameters on the graphical representation. For the four bases of DNA (A, C, G and T), we selected three different mappings to obtain different graphical representations of a DNA sequence. To examine the effect of the two parameters, each was assigned the values 1/4, 1/2, 3/4, 1 and 5/4 in turn. Thus there were 25 parameter combinations for each mapping of the four bases, which means that 25 IFSs were used.
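
The abstract does not give the IFS itself, so the following is only a minimal sketch of how a two-parameter IFS could turn a DNA sequence into a planar curve. The contraction form (scale the sum of the current point and the base's corner), the corner assignment in `CORNERS`, the starting point, and the names `ifs_curve`, `a` and `b` are all assumptions made for illustration; with both parameters set to 1/2 the map reduces to the familiar chaos game representation.

```python
from itertools import product

# Hypothetical assignment of the four bases to the corners of the unit square.
# The thesis uses three different mappings; this is just one plausible choice.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def ifs_curve(seq, a, b, corners=CORNERS, start=(0.5, 0.5)):
    """Return the list of (x, y) points visited while reading `seq`.

    Assumed update rule (not taken from the thesis):
        x_n = a * (x_{n-1} + c_x),  y_n = b * (y_{n-1} + c_y)
    where (c_x, c_y) is the corner assigned to the n-th base.
    """
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = corners[base]
        x = a * (x + cx)
        y = b * (y + cy)
        points.append((x, y))
    return points


# The five values named in the abstract, taken for each of the two parameters.
PARAM_VALUES = (0.25, 0.5, 0.75, 1.0, 1.25)
PARAM_GRID = list(product(PARAM_VALUES, repeat=2))  # 5 x 5 combinations

if __name__ == "__main__":
    # Illustrative sequence only, not data from the thesis.
    for a, b in PARAM_GRID[:3]:
        print(a, b, ifs_curve("ATGGTG", a, b)[-1])
```

Enumerating the grid with `itertools.product` mirrors the count in the text: five values for each of two parameters yield 25 parameter pairs per mapping.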

Taking the first exon of the human β-globin gene as an example, we found that the leading eigenvalue of the L/L matrix of the graphical representation of DNA is constant when the value of one of the two parameters is fixed. Finally, we took the leading eigenvalues of the L/L matrices of 12 graphical representations of DNA as mathematical descriptors of DNA sequences. As an example, we compared the similarities/dissimilarities of the first exons of the β-globin genes of 9 species to illustrate the method.
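
As a rough illustration of the descriptor mentioned above, the sketch below builds an L/L matrix from a list of curve points and estimates its leading eigenvalue. It assumes the commonly used definition of the L/L matrix (Euclidean distance between two points divided by the length of the curve between them, with a zero diagonal); the thesis may define or normalize it differently. The function names and the shifted power iteration are choices made here for a dependency-free example, not the author's procedure.

```python
import math


def ll_matrix(points):
    """L/L matrix under the assumed definition: entry (i, j) is the Euclidean
    distance between points i and j divided by the path length along the curve
    from i to j. Diagonal entries are zero."""
    n = len(points)
    # Cumulative path length from the first point to each point.
    cum = [0.0]
    for p, q in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(p, q))
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            path = cum[j] - cum[i]
            if path > 0:  # coincident consecutive points leave the entry at 0
                m[i][j] = m[j][i] = math.dist(points[i], points[j]) / path
    return m


def leading_eigenvalue(matrix, iters=1000, tol=1e-12):
    """Largest eigenvalue of a symmetric non-negative, non-empty matrix.

    Power iteration is run on (M + I) so that the iteration cannot oscillate
    between +lambda and -lambda; the Rayleigh quotient is taken with M itself.
    """
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        new_lam = sum(
            w[i] * sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)
        )
        converged = abs(new_lam - lam) < tol
        v, lam = w, new_lam
        if converged:
            break
    return lam


if __name__ == "__main__":
    # Toy curve, not data from the thesis.
    curve = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
    print(leading_eigenvalue(ll_matrix(curve)))
```

Two sequences could then be compared through the difference between their leading eigenvalues, although the abstract does not state which distance the thesis actually uses.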
Keywords/Search Tags:iterated function system, graphical representation, protein, DNA, similarity, L/L matrix