The Study Of Proteins’ Graphical Representation And Its Applications Based On Graph Energy

Posted on: 2016-07-26
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Wu
Full Text: PDF
GTID: 2180330461992563
Subject: Operational Research and Cybernetics
Abstract/Summary:
The number of biological sequences in public databases is growing fast owing to the rapid development of sequencing technologies. At the same time, bioinformatics has developed rapidly in recent years with the development and popularity of the Internet. DNA is the carrier of genetic information, and proteins, the most important material basis of life activities, are basic components of organisms. All life activities of organisms are reflected in the structures and functions of proteins. The function of a protein is determined by its internal structure, and the higher-order structure of a protein is determined by its primary sequence. Therefore, the analysis of protein sequences is one of the most important foundational problems in bioinformatics. In this thesis, we calculated the graph energy and Laplacian energy of the 20 amino acids from the codons that encode them, and used these quantities to put forward a novel 2-D graphical representation of proteins. We built models on the numerical characterization of protein sequences and applied them to similarity analysis and subcellular localization prediction of protein sequences. The main work and contributions of this thesis are as follows:

(1) We drew the graphs corresponding to the 20 amino acids from the codons that encode them, using a novel 2-D graphical representation of DNA sequences, which could provide an invariant set of coefficients completely characterizing an amino acid. Applying graph theory, we proposed the graph energy and Laplacian energy of the 20 amino acids. We introduced the graph energy and Laplacian energy of a graph into the study of protein sequences, which, to our knowledge, had not been done before, and put forward a novel 2-D graphical representation of proteins. The representation loses no sequence information, has no circuits or degeneracy, represents each protein uniquely, and is easy to inspect visually.

(2) On the basis of the graph energy of amino acids, we proposed the concepts of the graph energy of a protein sequence and the increment of graph energy between two protein sequences, and gave an approach for calculating them. We then outlined a similarity/dissimilarity model that depends on the protein sequences, developed while analyzing ND5 proteins and 36 protein domains, and applied it to 24 vertebrates and 27 antifreeze proteins. The results are consistent with those of ClustalW and in some cases better.

(3) Based on the proposed 2-D graphical representation of protein sequences, we proposed a novel model (DWT_SVM) that couples the discrete wavelet transform (DWT) with a support vector machine (SVM) to predict the subcellular localization of proteins. To investigate the effect of the wavelet on prediction, three DWTs were tested on the CL317 and ZD98 datasets. We then applied DWT(Ⅲ), a 2-D wavelet transform with two decomposition levels whose prediction accuracy was no lower than that of the other two DWTs or of previous studies, to predict the subcellular locations of the ZW225 dataset and of the benchmark dataset iLoc8897, which contains proteins with both singleplex and multiplex sites, with satisfying results. Under the jackknife test, the overall prediction accuracy exceeds 99% for both CL317 and ZD98, much higher than that of other existing algorithms, and reaches 98.7% for ZW225, which existing methods cannot match. For iLoc8897 the overall accuracy reaches 87.16%, an improvement of about 5% over the state-of-the-art method.

These promising results indicate that the graph energy and Laplacian energy of the 20 amino acids can uniquely identify the amino acids and may be an important property of them, and that the novel 2-D graphical representation of proteins built on them represents proteins well.
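For readers unfamiliar with the two quantities named in item (1), the sketch below computes them for a toy graph. It assumes the standard graph-theoretic definitions, which the abstract itself does not restate: the graph energy is the sum of the absolute eigenvalues of the adjacency matrix, and the Laplacian energy is the sum of |mu_i - 2m/n| over the eigenvalues mu_i of the Laplacian L = D - A, for a graph with n vertices and m edges. The three-vertex path used here is purely illustrative and is not one of the thesis's codon-derived amino acid graphs; the function names and the Jacobi eigenvalue helper are hypothetical choices made to keep the example dependency-free.

```python
import math

def symmetric_eigenvalues(matrix, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    for _ in range(max_sweeps):
        # Stop once the squared off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation (c, s) that zeroes a[p][q], using the smaller angle.
                tau = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = t * c
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def graph_energy(adj):
    """Sum of absolute adjacency eigenvalues (standard definition, assumed)."""
    return sum(abs(lam) for lam in symmetric_eigenvalues(adj))

def laplacian_energy(adj):
    """Sum of |mu_i - 2m/n| over Laplacian eigenvalues (standard definition, assumed)."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    m = sum(degrees) / 2.0  # number of edges in a simple undirected graph
    lap = [[(degrees[i] if i == j else 0) - adj[i][j] for j in range(n)]
           for i in range(n)]
    return sum(abs(mu - 2.0 * m / n) for mu in symmetric_eigenvalues(lap))

# Toy example (not from the thesis): the path on three vertices.
path3 = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
print(round(graph_energy(path3), 4))      # about 2.8284, i.e. 2*sqrt(2)
print(round(laplacian_energy(path3), 4))  # about 3.3333, i.e. 10/3
```

Both energies are computed from the eigenvalues alone, so they do not depend on how the vertices are numbered, which is why they can act as numerical descriptors of a graph.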
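The evaluation protocol in item (3) can be illustrated with a small, heavily simplified sketch: wavelet features are extracted from a numerical signal, and accuracy is estimated with the jackknife (leave-one-out) test. Everything concrete here is an assumption made for illustration. The abstract does not say which wavelets were used, so a 1-D Haar transform with two decomposition levels stands in for the thesis's DWTs (DWT(Ⅲ) is described as a 2-D transform). A nearest-centroid rule stands in for the SVM so that the example needs no external libraries. The signals and labels are invented toy data, not CL317, ZD98, ZW225 or iLoc8897.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar DWT (input length must be even)."""
    r = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / r for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / r for i in range(0, len(signal), 2)]
    return approx, detail

def haar_features(signal, levels=2):
    """Feature vector [cA_levels, cD_levels, ..., cD_1] from a multi-level Haar DWT."""
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.insert(0, d)  # keep the coarsest detail band first
    return approx + [c for band in details for c in band]

def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (NOT an SVM): label of the closest class mean."""
    groups = {}
    for vec, label in zip(train_x, train_y):
        groups.setdefault(label, []).append(vec)
    best_label, best_dist = None, float("inf")
    for label, vecs in groups.items():
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def jackknife_accuracy(features, labels):
    """Leave-one-out test: hold out each sample once, train on all the others."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)

# Invented toy signals: class "A" is nearly flat, class "B" alternates in sign.
signals = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 2],
    [2, 1, 1, 1, 1, 1, 1, 1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, -1, 1, -1, 1, -1, 1, -2],
    [2, -1, 1, -1, 1, -1, 1, -1],
]
labels = ["A", "A", "A", "B", "B", "B"]
features = [haar_features(s, levels=2) for s in signals]
print(jackknife_accuracy(features, labels))  # 1.0 on this well-separated toy data
```

In the jackknife test each protein is predicted by a model that never saw it during training, so every sample is used for testing exactly once and the reported accuracy does not depend on a random split.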
Keywords/Search Tags: graph energy, Laplacian energy, graphical representation of proteins, similarity/dissimilarity analysis, subcellular localization