Analysis Of Protein Sequence Similarity

Posted on: 2012-07-12
Degree: Master
Type: Thesis
Country: China
Candidate: X T Yu
Full Text: PDF
GTID: 2210330338963922
Subject: Operational Research and Cybernetics
Abstract/Summary:
Proteins, the subject of this study, are among the most abundant and important biological macromolecules. In recent years, with the completion of the draft genome, protein research has entered a new era. Amino acids are the basic units of protein structure; the 20 kinds of amino acids that occur in proteins are linked to one another by peptide bonds to form a peptide chain, so in biology a protein is often described as a material formed from polypeptides. Treating amino acids as the basic unit makes it possible to analyze protein sequences. Building on existing methods of protein analysis, this thesis looks for an approach different from earlier ones. Similarity analysis of different types of proteins further reveals the similarity relationships among them and thereby confirms that the new method is effective and feasible. The main results of this thesis are as follows. First, taking the hydropathy properties of amino acids into account, a new statistic, K-blocks (K=1,2,3), and an accompanying statistical method are proposed to convert a sequence into numerical values, from which a new 56-dimensional vector is constructed. Second, an easily computed sequence distance is defined for protein similarity analysis; it greatly reduces the computational complexity and thus provides a fast analysis method for unknown proteins. Third, representative proteins (9 kinds of ND5 proteins, β-globin proteins of 13 species, 43 kinds of cytochrome C and 40 kinds of virus proteins) are selected and analyzed with the new method; the results are compared with previously published results and with the cluster trees constructed by the software Clustal-X and MEGA4.1, and the feasibility of the method is discussed.
Fourth, based on the protein analysis results in this thesis, the effective scope and limitations of the new method are summarized. This study finds that, because the hydropathy properties of the amino acids in membrane proteins are well conserved during evolution, the new method based on this property performs better on small numbers of proteins than methods that analyze amino acid sequences alone; its results are better than those in the existing literature and closer to those obtained with the MEGA4.1 and Clustal-X software. The new approach is also more convenient and efficient because of its lower computational complexity. For large numbers of proteins, the method needs further improvement.
Keywords/Search Tags: protein sequence, similarity, hydropathy, clustering tree, 56-dimensional vector
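The general idea in the abstract (reduce each residue to a hydropathy class, count short blocks, compare the resulting vectors with a simple distance) can be illustrated with a minimal sketch. The abstract does not define the K-blocks statistic, say how the 56 dimensions arise, or give the distance formula, so everything below is an assumption made for illustration and not the thesis's method. The sketch uses the published Kyte-Doolittle hydropathy index, a three-class split with arbitrarily chosen thresholds, overlapping blocks of length 1 to 3 (giving 3 + 9 + 27 = 39 dimensions, not 56), and Euclidean distance. The function names `hydropathy_class`, `block_vector` and `distance` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD_INDEX = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Class labels: H = hydrophobic, N = neutral, P = hydrophilic.
CLASSES = "HNP"


def hydropathy_class(residue):
    """Map one residue to a class. The thresholds (0 and -2.0) are
    illustrative assumptions, not values taken from the thesis."""
    score = KD_INDEX[residue]
    if score > 0:
        return "H"
    if score >= -2.0:
        return "N"
    return "P"


def block_vector(sequence, max_k=3):
    """Relative frequencies of all overlapping class blocks of length
    1..max_k, in a fixed order. Symbols outside the 20 standard amino
    acids are skipped."""
    classes = "".join(
        hydropathy_class(r) for r in sequence.upper() if r in KD_INDEX
    )
    vector = []
    for k in range(1, max_k + 1):
        counts = Counter(classes[i:i + k] for i in range(len(classes) - k + 1))
        total = sum(counts.values())
        for block in product(CLASSES, repeat=k):
            word = "".join(block)
            vector.append(counts[word] / total if total else 0.0)
    return vector


def distance(u, v):
    """Euclidean distance between two block vectors (an assumed choice)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not real protein data.
    seq_a = "MKVLAAGIVGLLLA"
    seq_b = "MKTLSDGKRNEEQA"
    vec_a, vec_b = block_vector(seq_a), block_vector(seq_b)
    print(len(vec_a))               # 39 components: 3 + 9 + 27
    print(distance(vec_a, vec_b))
```

A pairwise distance matrix built this way could be passed to any standard clustering routine to draw a tree for comparison with alignment-based trees; which clustering procedure the thesis used is not stated in the abstract.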