
The Research On Expression Of Protein Sequence And Its Application For Protein Sub-cellular Localization Prediction

Posted on: 2013-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: X M He
GTID: 2230330395485089
Subject: Software engineering
Abstract/Summary:
Molecular drug design, which draws on knowledge of protein structure and function, is a powerful tool for the benefit of mankind. Explaining the interrelationships among protein sequence, structure, and function has therefore become an important task of the post-genomic era. Sequence data, which exist extensively in many fields, have recently been receiving more attention. Mining relevant information from massive, complex sequence data has both theoretical significance and practical value, and it has become a challenging new research direction in data mining. Detailed studies show that protein structural class and sub-cellular location are related to amino acid composition, and in particular to the composition of the 20 common amino acids. In predicting protein structural class or sub-cellular location, however, the way the amino acid sequence is characterized directly affects prediction quality.

Focusing on the characterization of protein sequences, this thesis studies the graphical representation of protein sequences and sub-cellular localization, based on background knowledge of proteins. The main research achievements are as follows:

The paper proposes a novel 3D graphical representation of protein sequences. We define new mapping coordinates for each amino acid and then construct the Cartesian coordinates of a protein sequence from them. A BLOSUM62 scoring matrix is used to extract a score for each amino acid as the first mapping coordinate; the matrix reflects the pairwise statistical scores of each of the 20 amino acids against the others and supports similarity analysis of distantly related protein sequences. Then, based on a large number of sequences, the paper computes the frequency of each amino acid and uses it as the second mapping coordinate. Finally, the serial number of each amino acid in the sequence is taken as the third mapping coordinate. We plot two short protein segments of the yeast Saccharomyces cerevisiae in these coordinates and can identify the four sites at which they differ, which shows that the proposed method is feasible.

The paper presents an improved two-dimensional Euclidean distance formula. To study sequence similarity/dissimilarity further, we consider the ND5 proteins of 9 species. The experimental results show that the proposed computing method readily reflects the biological relationships among them. It is also well suited to similarity analysis between two protein sequences.

The paper applies the above two research results to protein sub-cellular localization prediction. We use the jackknife test to evaluate prediction performance on two apoptosis protein data sets; the prediction accuracies are 90.82% and 85.17%. The proposed method achieves better predictive performance without using any machine-learning classifier. The experimental results thus show the effectiveness of the proposed methods.
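
The three mapping coordinates and the jackknife test described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and several details are assumptions: the abstract does not say which BLOSUM62 entry is used per amino acid, so the diagonal (self-substitution) score is used here; the frequency table is computed from whatever corpus the caller supplies; and because the improved distance formula is not given, the jackknife helper falls back on the plain Euclidean distance with a nearest-neighbour rule. All function names are hypothetical.

```python
import math
from collections import Counter

# BLOSUM62 diagonal (self-substitution) scores for the 20 common amino acids.
# Assumption: the abstract does not state which BLOSUM62 entry is extracted.
BLOSUM62_SELF = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6,
    "H": 8, "I": 4, "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4,
    "T": 5, "W": 11, "Y": 7, "V": 4,
}

def residue_frequencies(corpus):
    """Relative frequency of each common amino acid over a corpus of sequences."""
    counts = Counter(aa for seq in corpus for aa in seq if aa in BLOSUM62_SELF)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("corpus contains no common amino acids")
    return {aa: counts[aa] / total for aa in BLOSUM62_SELF}

def to_3d_curve(sequence, freqs):
    """Map residues to (BLOSUM62 score, frequency, serial number) points.

    Raises KeyError for residues outside the 20 common amino acids.
    """
    return [(BLOSUM62_SELF[aa], freqs[aa], i)
            for i, aa in enumerate(sequence, start=1)]

def differing_sites(seq_a, seq_b, freqs):
    """Serial numbers at which the first two coordinates of the curves differ."""
    pairs = zip(to_3d_curve(seq_a, freqs), to_3d_curve(seq_b, freqs))
    return [pa[2] for pa, pb in pairs if pa[:2] != pb[:2]]

def jackknife_accuracy(features, labels):
    """Leave-one-out (jackknife) accuracy of a nearest-neighbour rule.

    Each sample is held out in turn and assigned the label of its closest
    remaining sample under the plain Euclidean distance (a stand-in for the
    thesis's improved formula, which the abstract does not specify).
    """
    correct = 0
    for i, held_out in enumerate(features):
        nearest = min((j for j in range(len(features)) if j != i),
                      key=lambda j: math.dist(held_out, features[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(features)

if __name__ == "__main__":
    freqs = residue_frequencies(["ACDA", "WWA"])  # toy corpus, not real data
    print(differing_sites("ACDA", "ACWA", freqs))
```

In this sketch, `features` stands for any fixed-length numerical characterization derived from the curves; how the thesis reduces a curve to such a vector is not stated in the abstract.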
Keywords/Search Tags: molecular drug design, numerical characterization of protein sequences, graphical representation of protein sequences, prediction of protein sub-cellular localization, similarity analysis, jackknife test