The Research On Graphical Representation Of Protein Sequence And Application

Posted on: 2012-06-20
Degree: Master
Type: Thesis
Country: China
Candidate: B Y Liao
Full Text: PDF
GTID: 2230330395985746
Subject: Information and Communication Engineering
Abstract/Summary:
Sequence analysis is the basis of biological research, so most research on proteins starts from the protein sequence. Analysis of sequence similarity is an important aspect of this work. Studying similarity through a graphical representation of the sequence is an effective approach, because it provides a visual graph that makes the similarities and differences between sequences easy and intuitive to analyze. This thesis mainly searches for better graphical representations of protein sequences and applies them to protein subcellular localization and protein structure prediction.
Firstly, a new graphical representation of proteins was proposed based on the genetic code distribution. This method was applied to phylogenetic tree construction, and the result is consistent with the result obtained from a multiple sequence alignment program, ClustalW. Then a new distance computation method was proposed based on the graphical representation and applied to protein subcellular localization prediction using similarity comparisons. Results show that the new graphical representation and the new distance are more effective than some existing methods, although they are not as good as some machine learning classifiers. The approach is simpler, because it only needs to compute distances, without machine learning or training.
Secondly, a new 2D graphical representation of protein sequences was proposed. In this representation, two-dimensional data were used to carry three dimensions of information. The first dimension is the evolution index of the amino acids. The second dimension contains the class of each amino acid, based on physicochemical characteristics, and the order in which the amino acids appear in the protein sequence. Then the discrete Fourier transform (DFT) was used to transform the sequence signal to the frequency domain, which yields new numerical sequences of the same length for every protein sequence. After that, the distances between these new numerical sequences were computed to analyze the similarity of the protein sequences. Lastly, this method was used in protein structure class prediction. These two experiments indicate that the method is effective and significant.
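The DFT step and the training-free, distance-based prediction described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation. The abstract does not give the evolution index, the physicochemical classes, or the way equal-length spectra are obtained, so the sketch uses a placeholder numeric code for each amino acid and zero-pads or truncates every signal to a fixed number of points. The names `CODE`, `dft_features`, `euclidean`, and `nearest_label`, and the toy sequences and labels, are hypothetical.

```python
import cmath
import math

# Placeholder encoding (an assumption, not the thesis's scheme): each of the
# 20 standard amino acids maps to its 1-based position in this string.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CODE = {aa: float(i + 1) for i, aa in enumerate(AMINO_ACIDS)}


def encode(sequence):
    """Turn a protein sequence into a numerical signal, skipping unknown letters."""
    return [CODE[aa] for aa in sequence.upper() if aa in CODE]


def dft_features(sequence, n_points):
    """Return the magnitudes of an n_points DFT of the encoded sequence.

    The signal is zero-padded or truncated to n_points (an assumption), so
    every sequence yields a feature vector of the same length.
    """
    signal = encode(sequence)[:n_points]
    signal += [0.0] * (n_points - len(signal))
    features = []
    for k in range(n_points):
        # X[k] = sum_t x[t] * exp(-2*pi*i*k*t / N)
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n_points)
            for t, x in enumerate(signal)
        )
        features.append(abs(coeff))
    return features


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_label(query, references, n_points):
    """Label a query with the label of its closest reference sequence.

    No training is involved: only distances between DFT features are compared.
    """
    q = dft_features(query, n_points)
    best = min(
        references,
        key=lambda ref: euclidean(q, dft_features(ref[0], n_points)),
    )
    return best[1]


if __name__ == "__main__":
    # Toy reference set; sequences and labels are made up for illustration.
    refs = [("MKVLAAGIVG", "membrane"), ("DDEEKKRRDE", "nucleus")]
    print(nearest_label("MKVLAAGIVA", refs, 8))
```

Truncating to a fixed number of points discards the tail of longer sequences. That keeps the sketch short, but it is only one way to obtain equal-length spectra and may differ from what the thesis does.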
Keywords/Search Tags: Similarity analysis, Graphical representation of sequence, Protein subcellular localization, Protein structure prediction