Analysis Of Protein Sequences Based On Granularity

Posted on: 2012-01-23
Degree: Master
Type: Thesis
Country: China
Candidate: K Zhang
Full Text: PDF
GTID: 2120330338954722
Subject: Applied Mathematics
Abstract/Summary:
All the information about the structure of a protein is hidden in its linear structure, more precisely in its amino acid sequence, so the study of protein sequences is a key problem in bioinformatics. Twenty kinds of amino acids have been found, and it is difficult to study protein folding directly, so we attempted to study protein structure through classification. We divided the amino acids into classes based on their physical and chemical characteristics; a different basis for the classification gives a different result. This is in fact a problem of lumping states. The division of granularity is dynamic, while a classification of amino acids is static. From the viewpoint of granularity, we analyzed the connection preference of amino acids and the classification of proteins, based on a Markov model and on our classification of amino acids.

Firstly, the concept of a lumping map was presented based on the Markov model, and a method for computing the transition probabilities of the lumped process was given. On the basis of the classification of amino acids, the connection bias of amino acids was studied using the transition probability matrix of the Markov model. The results showed that the connection of amino acids has a particular preference that is related to the classification of amino acids, which further verified the rationality of the classification. This preference may also help in predicting amino acid sequences.

Secondly, the results on the connection bias of amino acids in the xylanase family showed that it is better to study the amino acids of the xylanase family by dividing them into four groups. A new method for aligning two protein sequences, the matrix graphical representation of protein (MGR), was given based on the detailed HP model.
By computing the Euclidean distance d between two sequences, the similarity of protein sequences in two xylanase families was analyzed. The results indicated that protein sequences belonging to the same xylanase family are highly similar, and that the degree of similarity is related to molecular size, structure and molecular evolution.

Finally, we analyzed the protein sequences of the F/10 and G/11 xylanase families by cluster analysis based on a normalized metric, and determined the optimal clustering: it is best to divide the protein sequences into three categories for the F/10 xylanase family and into five categories for the G/11 xylanase family. This provides a quantitative basis for the further classification of proteins within the same family.

The creative features of this thesis include:
(1) The concept of a lumping map was presented based on the Markov model, and a method for computing the transition probabilities of the lumped process was given. We analyzed the connection preference of amino acids, which may help in predicting amino acid sequences.
(2) A new method for aligning two protein sequences, the matrix graphical representation of protein, was given, together with a method for computing the Euclidean distance d between two sequences. We also analyzed the protein sequences of the F/10 and G/11 xylanase families by cluster analysis.
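
The abstract does not spell out how the transition probabilities of the lumped process are computed, so the following is only a minimal sketch of one common estimate: map each residue to its class, count class-to-class transitions between neighbouring residues, and normalize each row. The four-class grouping `CLASSES` and the function name `lumped_transition_matrix` are illustrative assumptions, not the classes or the method used in the thesis.

```python
from collections import Counter

# Hypothetical four-class grouping by side-chain chemistry (an assumption;
# the abstract does not list the classes used in the thesis).
CLASSES = {
    "nonpolar": "AVLIPFWMG",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}

# The lumping map: each of the 20 amino acid letters -> its class name.
LUMP = {aa: name for name, members in CLASSES.items() for aa in members}


def lumped_transition_matrix(sequence, lump=LUMP):
    """Estimate class-to-class transition probabilities from one sequence.

    Symbols missing from `lump` are dropped before counting, so their
    neighbours are treated as adjacent.
    """
    labels = [lump[aa] for aa in sequence if aa in lump]
    # Count transitions between consecutive class labels.
    counts = Counter(zip(labels, labels[1:]))
    names = sorted(set(lump.values()))
    matrix = {}
    for a in names:
        row_total = sum(counts[(a, b)] for b in names)
        # Rows for classes that are never left stay all-zero.
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in names
        }
    return matrix


if __name__ == "__main__":
    m = lumped_transition_matrix("AKDEAK")
    for source, row in m.items():
        print(source, row)
```

Under these assumptions, a row whose entries differ markedly from the overall class frequencies of the sequence would be one way to read a connection preference between classes; whether the thesis measures the bias this way is not stated in the abstract.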
Keywords/Search Tags: Markov model, classification of amino acids, lumping map, bias, xylanase, cluster analysis