
Classification Of Gene And Protein Sequences

Posted on: 2018-01-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: K Tian
Full Text: PDF
GTID: 1360330566487998
Subject: Statistics
Abstract/Summary:
Sequence analysis has become one of the most active and important areas of bioinformatics as the tools for obtaining biological sequences multiply. Important problems include analyzing the morphological structures of large numbers of sequences, extracting information such as homology, evolutionary relationships and the evolutionary history of different species, and inferring their ancestors. Deducing and predicting the properties of new sequences from the information in known sequences provides an important reference for further study. However, because the data are huge and highly complex, the computation takes a great deal of time without an effective algorithm and can even become intractable. This dissertation proposes two new methods to address this problem.

The first part introduces the natural vector method, a whole-genome, alignment-free and non-parametric method for rapidly representing sequences. It is a powerful new tool for analyzing evolutionary relationships. The natural vector reflects the distribution of nucleotides or amino acids in a gene or protein sequence: for each nucleotide or amino acid it contains the total count, the average position and the higher-order central moments of the positions. There is a one-to-one correspondence between a sequence and its natural vector. Compared with existing methods, the natural vector method has low computational complexity and short computation time, and does not depend on any evolutionary model. The method has been applied to build a variety of genomic databases for predicting and classifying new sequences quickly and precisely, which provides a more accurate description of the evolutionary relationships between species.

The second part introduces the Yau-Hausdorff method, which considers all possible translations and rotations to find the best match between the graphical curves of two DNA or protein sequences. The complexity of this method is lower than that of any other two-dimensional minimum Hausdorff algorithm. The method measures the similarity of sequences using two tools: the Yau-Hausdorff distance and the graphical representation of sequences. The graphical representation preserves all sequence information, and the Yau-Hausdorff distance is mathematically proved to be a true metric. The proposed distance can therefore precisely measure the difference between sequences. Phylogenetic analyses of gene and protein sequences using the Yau-Hausdorff distance show the accuracy and stability of the approach in comparing sequence similarity.
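
The natural vector described in the first part can be illustrated with a short Python sketch. It is not the dissertation's implementation. The function names `natural_vector` and `euclidean`, the 1-based positions, the choice of zero for letters that do not occur, and the normalization of the j-th central moment by (n_k * n)^(j-1) are assumptions made for illustration; the abstract only states that the vector holds counts, average positions and higher-order central moments.

```python
from collections import defaultdict


def natural_vector(seq, alphabet="ACGT", max_order=2):
    """Sketch of a natural vector: counts, mean positions, central moments.

    Layout of the result: [n_k for k in alphabet] + [mu_k ...] + [D_2^k ...]
    + ... + [D_max_order^k ...]. Normalization is an assumption (see lead-in).
    """
    n = len(seq)
    positions = defaultdict(list)
    for i, ch in enumerate(seq.upper(), start=1):  # 1-based positions (assumed)
        positions[ch].append(i)

    counts, means = [], []
    moments = [[] for _ in range(2, max_order + 1)]
    for k in alphabet:
        pos = positions.get(k, [])
        n_k = len(pos)
        # Letters that never occur get mean 0 and moments 0 (assumed convention).
        mu_k = sum(pos) / n_k if n_k else 0.0
        counts.append(float(n_k))
        means.append(mu_k)
        for idx, j in enumerate(range(2, max_order + 1)):
            if n_k:
                scale = (n_k ** (j - 1)) * (n ** (j - 1))
                d = sum((p - mu_k) ** j for p in pos) / scale
            else:
                d = 0.0
            moments[idx].append(d)

    vec = counts + means
    for m in moments:
        vec.extend(m)
    return vec


def euclidean(u, v):
    """Euclidean distance between two natural vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


if __name__ == "__main__":
    v1 = natural_vector("AATT")
    v2 = natural_vector("ACGT")
    print(v1)
    print(euclidean(v1, v2))
```

Each sequence is scanned once, so the cost grows linearly with sequence length, and no alignment is needed to compare two sequences. For protein sequences, the same sketch would apply with the 20 amino-acid letters passed as `alphabet`.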
Keywords/Search Tags:Sequence Analysis, Natural Vector, Graphical Representation, Hausdorff Distance, Phylogenetic Tree