Space Curve Construction And Similarity Analysis For Protein Sequences

Posted on: 2015-03-22
Degree: Master
Type: Thesis
Country: China
Candidate: C C Geng
Full Text: PDF
GTID: 2250330428963242
Subject: Basic mathematics
Abstract/Summary:
With the rapid development of sequencing techniques, the number of biological sequences in the various biological databases is growing exponentially. Faced with this massive volume of biological data, information extraction, comparative analysis and relational mining from DNA, RNA and protein sequences are among the most important tasks in molecular biology and bioinformatics.

It has become increasingly difficult to extract biological information directly from the seemingly chaotic sequences themselves and to find the rules hidden in them. Visualization is a new way to handle such vast amounts of biological data. How to use a graphical representation of biological sequences effectively for sequence classification and for the analysis of evolutionary relationships is an important research topic in bioinformatics. In this thesis, we focus on the graphical representation and similarity analysis of protein sequences and on an algorithm for constructing phylogenetic trees. The main contributions are as follows:

(1) We propose a novel graphical representation for protein sequences. First, we construct a discrete set of points in 3D space for the amino acids of a protein sequence, based on three physicochemical properties of amino acids. Then we use a cubic Bézier spline curve to interpolate these discrete points and so represent the protein sequence. The resulting curve is smooth, continuous and parametric, which makes the graphical representation easier to visualize.

(2) We introduce a new method for similarity analysis based on the curvature of protein curves. First, we compute the curvature at all the interpolating points and use the curvature values to define a frequency vector. Then we perform the similarity analysis by computing the L1 distance between vectors. Finally, as an example, we take ND5 proteins from nine different species and carry out a numerical description and similarity analysis. To show the advantage of our method, we then calculate correlation coefficients and perform a significance analysis to compare the ClustalW approach with our method and other current methods. The experimental results show that our proposed method is effective.

(3) On the basis of the graphical representation for protein sequences, we propose a novel algorithm for constructing phylogenetic trees. First, we obtain a frequency vector matrix using the above methods for graphical representation and similarity analysis. Then we introduce a novel adaptive clustering algorithm based on the K-means algorithm and construct the phylogenetic tree by loop iteration. Finally, we take globin from 15 different species as an example and compare our phylogenetic tree with those constructed by the Clustalx and DNAstar software. The experimental results show that our proposed method is reasonable and feasible.
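The pipeline described in contributions (1) and (2) can be illustrated with a minimal sketch. The abstract does not say which physicochemical properties are used, how the Bézier control points are chosen, or how the frequency vector is defined, so everything below is an assumption: the 3D points are taken as given input, the spline uses Catmull-Rom tangents converted to cubic Bézier control points (one common way to interpolate with cubic Bézier segments, not necessarily the thesis's construction), the frequency vector is a normalized histogram of curvature values, and all function names and parameters are hypothetical.

```python
import numpy as np

def bezier_control_points(points):
    """Cubic Bezier control points, one row of four per segment.

    Assumption: tangents are chosen Catmull-Rom style, so the curve passes
    through every input point and is C1 at the joints.
    """
    p = np.asarray(points, dtype=float)
    # Repeat the end points so that every segment has two neighbours.
    padded = np.vstack([p[0], p, p[-1]])
    prev, start, end, nxt = padded[:-3], padded[1:-2], padded[2:-1], padded[3:]
    b1 = start + (end - prev) / 6.0
    b2 = end - (nxt - start) / 6.0
    return np.stack([start, b1, b2, end], axis=1)  # shape (n - 1, 4, 3)

def curvature_at_nodes(points):
    """Curvature |r' x r''| / |r'|^3 at each interpolated point."""
    ctrl = bezier_control_points(points)
    b0, b1, b2, b3 = (ctrl[:, k] for k in range(4))
    # Derivatives at t = 0 of every segment (the segment's first node).
    d1 = 3.0 * (b1 - b0)
    d2 = 6.0 * (b0 - 2.0 * b1 + b2)
    # The final node is evaluated at t = 1 of the last segment.
    d1 = np.vstack([d1, 3.0 * (b3[-1] - b2[-1])])
    d2 = np.vstack([d2, 6.0 * (b1[-1] - 2.0 * b2[-1] + b3[-1])])
    speed = np.linalg.norm(d1, axis=1)
    cross = np.linalg.norm(np.cross(d1, d2), axis=1)
    kappa = np.zeros_like(speed)
    ok = speed > 1e-12  # leave curvature at 0 where the tangent vanishes
    kappa[ok] = cross[ok] / speed[ok] ** 3
    return kappa

def frequency_vector(kappa, bins=10, max_kappa=5.0):
    """Assumed definition: normalized histogram of clipped curvature values."""
    clipped = np.clip(kappa, 0.0, max_kappa)
    counts, _ = np.histogram(clipped, bins=bins, range=(0.0, max_kappa))
    return counts / counts.sum()

def l1_distance(u, v):
    """L1 (Manhattan) distance between two frequency vectors."""
    return float(np.abs(np.asarray(u) - np.asarray(v)).sum())

# Synthetic stand-ins for two protein curves (not real sequence data).
t = np.linspace(0.0, 2.0 * np.pi, 101)
curve_a = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
curve_b = np.column_stack([np.cos(t), np.sin(t), t])
dist = l1_distance(frequency_vector(curvature_at_nodes(curve_a)),
                   frequency_vector(curvature_at_nodes(curve_b)))
print("L1 distance between the two curves:", dist)
```

Because a Catmull-Rom style spline is only C1, the curvature at a joint can differ between the two adjoining segments; this sketch evaluates it at the start of the outgoing segment. Computing the pairwise distances for a set of sequences would give the kind of distance matrix that a clustering step such as the one in contribution (3) could consume.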
Keywords/Search Tags: graphical representation for protein sequences, similarity analysis, phylogenetic tree, clustering algorithm, Bézier spline curve