Protein Evolution Analysis Based On Graphical Representation And Amino Acid Sequence

Posted on: 2017-05-22
Degree: Master
Type: Thesis
Country: China
Candidate: W H Li
Full Text: PDF
GTID: 2180330488953530
Subject: Operational Research and Cybernetics
Abstract/Summary:
Nucleic acid and protein sequences are relatively stable during evolution and carry a large amount of information. This suggests that studying an organism is consistent with studying its biological molecules, such as nucleic acids and proteins. Analyzing biological sequences helps to uncover relationships and genetic information, and it opens up the possibility of curing or preventing some diseases. Proteins determine the structure and function of the organism. With the large volume of protein sequence data now available and the continuous improvement of computer software, research on protein sequences has entered the fast lane. This thesis analyzes the evolution of protein sequences. The main work and achievements of the thesis are as follows:
(1) Coordinate representation. Classification and the use of the original data are two ways to reflect the properties of amino acids comprehensively, and several important physicochemical properties are considered. The abscissa uses a qualitative description: polarity and hydrophilicity, each assigned a weight, serve as the basis for classification. The ordinate uses a quantitative description based on the pI value. The graphical representation thus combines qualitative and quantitative methods to define the amino acid coordinates, taking into account both the commonality of similar amino acids and the individuality of each amino acid.
(2) The position information of the amino acids is fully considered when a protein is represented as a graph. After the horizontal and vertical coordinates of the sequences are obtained, feature vectors are extracted using the adjacent distance vector, and the feature vectors are then given equal treatment. To account for the interactions among adjacent amino acids, the relative frequencies of the 125 triplets are computed on the basis of the preliminary classification and used as the third feature vector.
(3) To make the method suitable for short sequences, where the counts of some triplets or diads could otherwise be zero, the relative probabilities of the 25 diads are used as the feature vector, which achieved very good classification results.
In summary, the thesis jointly considers several important physicochemical properties, position information, and the interactions of adjacent amino acids. To describe the features of the graph, it introduces the concepts of the adjacent distance vector and equal treatment, so that the feature vectors can accurately represent the information in the protein sequences. Evolution analysis is carried out using the corresponding distance formula, and three data sets are used for validation. The results show that the method can analyze protein sequences accurately, is intuitive and fast, and gives good classification results.
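The triplet and diad feature vectors described above can be illustrated with a minimal Python sketch. Several things here are assumptions that the abstract does not confirm: the five-class grouping of amino acids is hypothetical (the abstract only implies five classes, since 5^3 = 125 and 5^2 = 25), the windows are taken as overlapping, the diad "relative probability" is treated as a plain relative frequency, and Euclidean distance stands in for the unspecified distance formula.

```python
from collections import Counter
from itertools import product
import math

# HYPOTHETICAL five-class grouping of the 20 amino acids.
# The abstract does not list the actual classes used in the thesis.
GROUPS = {
    "a": "AVLIMGP",  # assumed: nonpolar
    "b": "FWY",      # assumed: aromatic
    "c": "STNQC",    # assumed: polar uncharged
    "d": "DE",       # assumed: acidic
    "e": "KRH",      # assumed: basic
}
LABELS = "".join(sorted(GROUPS))
CLASS_OF = {aa: label for label, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its class label; unknown symbols are skipped."""
    return "".join(CLASS_OF[aa] for aa in seq.upper() if aa in CLASS_OF)


def kmer_relative_frequencies(seq, k):
    """Relative frequency of every class k-mer, using overlapping windows.

    Returns a vector of length 5**k (125 for k=3, 25 for k=2).
    """
    reduced = reduce_sequence(seq)
    windows = [reduced[i:i + k] for i in range(len(reduced) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    keys = ["".join(p) for p in product(LABELS, repeat=k)]
    return [counts[key] / total if total else 0.0 for key in keys]


def euclidean_distance(u, v):
    """Assumed stand-in for the distance formula, which the abstract does not specify."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Example with two made-up short peptides (not from the thesis data sets).
seq_1 = "MKVLAAGIDE"
seq_2 = "MKVLSAGHDE"
triplets_1 = kmer_relative_frequencies(seq_1, 3)
diads_1 = kmer_relative_frequencies(seq_1, 2)
diads_2 = kmer_relative_frequencies(seq_2, 2)
print(len(triplets_1), len(diads_1))
print(euclidean_distance(diads_1, diads_2))
```

For a short sequence most of the 125 triplet entries are zero, which is the sparsity problem that motivates the switch to the 25 diads in point (3).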
Keywords/Search Tags: graphical representation, feature vector, equal treatment, adjacent distance vector, relative frequency of triplets