
Analysis And Prediction Of Beta-sheet Structures In Proteins And Development Of Bioinformatics Software Tools

Posted on: 2011-05-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: N Zhang
GTID: 1100330332472713
Subject: Bioinformatics

Abstract/Summary:
The β-sheet is one of the most important protein secondary structures and has remained one of the main stumbling blocks in protein structure prediction. An in-depth study and accurate prediction of β-sheets may lead to noticeable improvements in de novo protein structure prediction and in the study of protein folding and design. In this study, we mainly explored the β-sheet structure.

The dataset was taken from the PISCES server. Building on our previously constructed SheetsPair database, we prepared all proteins in the PISCES dataset and integrated them into the database, which was then used to manage all protein data for our subsequent studies.

We pursued a research strategy that proceeded from interstrand amino acid pairs to β-strand (peptide segment) arrangement. First, a statistical analysis of the amino acid pairs revealed non-random pairing propensities. Based on these statistics, three relative frequency (RF) matrices were obtained, for parallel, antiparallel, and all β-strands, respectively; these matrices were used extensively in our subsequent studies. Hydrophobic strength and disulphide bonding were shown to be the two main factors influencing interstrand amino acid pairing, and other aspects (such as the surroundings) also appeared to contribute. Furthermore, the analysis revealed notable differences in amino acid pairing preferences between parallel and antiparallel β-strands.

We then analyzed the amino acid pairing preferences using metric multi-dimensional scaling (MMDS), which gives a visual representation of the RF matrices describing the interactions between amino acids. Each of the MMDS maps, for parallel, antiparallel and all β-strands, showed a distinct "core" formed mainly by strongly hydrophobic amino acids.
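The abstract does not state the exact definition of the RF matrices or of the dissimilarity fed to MMDS. The following is a minimal sketch under two loudly labeled assumptions: RF is taken as the observed/expected ratio of interstrand pair counts, with expected counts derived from residue marginals, and the dissimilarity between two amino acids is taken as the Euclidean distance between their RF rows (their pairing profiles). The embedding itself is classical (Torgerson) metric MDS in NumPy. The function names `relative_frequency`, `profile_distances` and `classical_mds` are hypothetical, and the toy counts are made up.

```python
import numpy as np


def relative_frequency(counts):
    """Observed/expected ratio for a symmetric matrix of interstrand pair counts.

    Assumption (illustrative, not the dissertation's definition): each observed
    pair (a, b) adds 1 to counts[a][b] and 1 to counts[b][a], so a same-type
    pair adds 2 to the diagonal and row sums count pair ends of each type.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    marginal = counts.sum(axis=1) / total          # share of pair ends of type i
    expected = np.outer(marginal, marginal) * total
    return counts / expected


def profile_distances(rf):
    """Euclidean distance between the RF rows (pairing profiles) of residue types."""
    rf = np.asarray(rf, dtype=float)
    diff = rf[:, None, :] - rf[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dissim, k=2):
    """Classical (Torgerson) metric MDS: embed n items in k dimensions."""
    d2 = np.asarray(dissim, dtype=float) ** 2
    n = d2.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n        # double-centring matrix
    gram = -0.5 * centre @ d2 @ centre
    vals, vecs = np.linalg.eigh(gram)               # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]                # indices of the k largest
    scale = np.sqrt(np.clip(vals[top], 0.0, None))  # drop tiny negative noise
    return vecs[:, top] * scale


if __name__ == "__main__":
    # Toy 3-type example with made-up counts (not data from the dissertation).
    toy_counts = [[10, 4, 2], [4, 8, 6], [2, 6, 12]]
    rf = relative_frequency(toy_counts)
    coords = classical_mds(profile_distances(rf), k=2)
    print(np.round(rf, 3))
    print(coords.shape)
```

With real data the count matrix would be 20 × 20; a residue type with no observed pairs would produce a division by zero, so such rows would need to be dropped first.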
The hydrophobic core on each MMDS map again indicated the importance of hydrophobic strength in amino acid pairing. Another finding was that the MMDS maps for parallel and antiparallel β-strands differed, which could be exploited in developing methods to predict parallel and antiparallel orientation. We also applied hierarchical clustering to the MMDS results to group the 20 amino acids, arriving at an optimal number of 5 groups for all β-strands, 6 for parallel and 4 for antiparallel.

Building on these analyses of amino acid pairs, we then investigated β-strand (peptide segment) arrangement. At the most straightforward level, full β-strand arrangement consists of (i) finding the interacting partner β-strand(s), (ii) predicting the relative orientation (i.e. parallel or antiparallel), and (iii) determining the relative positional shift of the two interacting β-strands. Our further studies addressed these three aspects.

We focused first on the second aspect, the parallel or antiparallel orientation. Features extracted from the RF matrices showed that interstrand amino acid pairs play a significant role in determining the orientation of β-strands, whereas the influence of the surroundings and other uncertain factors is small in this respect. On this basis, we proposed a new encoding scheme and developed a support vector machine (SVM)-based approach for predicting the parallel/antiparallel orientation of β-strands, which achieved a prediction accuracy of 86.89% and a Matthews correlation coefficient of 0.7126.

For the first aspect, we performed a preliminary study of the strand partner distribution. The results showed that most β-strands tend to pair with their nearest neighbour strands (the "First Come First Pair" rule).
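The orientation predictor is evaluated by accuracy and the Matthews correlation coefficient (MCC). For a two-class problem, both follow directly from confusion-matrix counts. The sketch below assumes, for illustration only, that parallel pairs are the positive class. It uses a hypothetical helper name and does not reproduce the dissertation's encoding scheme or SVM.

```python
import math


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, Matthews correlation coefficient) for a binary confusion matrix.

    tp/fn: positive-class examples (e.g. parallel pairs) predicted correctly/incorrectly;
    tn/fp: negative-class examples (e.g. antiparallel pairs) predicted correctly/incorrectly.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is taken as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


if __name__ == "__main__":
    # Made-up counts for illustration only; not results from the dissertation.
    acc, mcc = accuracy_and_mcc(tp=40, tn=45, fp=5, fn=10)
    print(f"accuracy={acc:.4f}, MCC={mcc:.4f}")
```

Unlike accuracy, MCC stays informative when the parallel and antiparallel classes are unbalanced, which is why both figures are commonly reported together.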
β-strands paired with their nearest neighbours also showed strong preferences in amino acid distances in antiparallel sheets, but these preferences were weaker in parallel sheets.

For the third aspect, we found that when two β-strands pair to form a β-sheet, the ends of one strand generally do not align with the ends of the other; instead, the strands extend beyond each other. Statistically, the ratio of the paired length to the extended length (the paired length plus the lengths of the two extending parts) was generally more than 25%, and the ratio of the paired length to the β-strand length was generally more than 40%.

This work also involved a considerable amount of broader bioinformatics research. Drawing on this experience and these techniques, we developed several software tools and utilities to facilitate future studies of β-strands and of other areas of bioinformatics: StrandPairsViewer, for visualizing interstrand amino acid pairs; SRD, for visualizing DNA/protein sequence relationships based on undirected graphs; NRChart, an ActiveX control for reading and visualizing time-series data; LTPConverter, for converting data from long-term potentiation (LTP) experiments; and Super Notepad, for ASCII text processing in daily bioinformatics research, among others. Considerable effort went into making these tools run faster and use less memory. Their features, applications, programming methods and techniques are presented in the dissertation.
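The paired-length ratios described for the third aspect of strand arrangement can be made concrete with a small sketch. It assumes, hypothetically, that each strand is given as an inclusive interval of positions along a shared pairing axis, so that position p on one strand faces position p on the other. The abstract does not state the dissertation's actual coordinate convention, and `Strand` and `pairing_ratios` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strand:
    """Inclusive interval [start, end] on an assumed shared pairing axis."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def pairing_ratios(a: Strand, b: Strand):
    """Return (paired/extended, paired/len(a), paired/len(b)) for two facing strands.

    paired   = overlap of the two intervals (0 if they do not overlap);
    extended = paired part plus both extending parts, i.e. the span of the union.
    """
    paired = max(0, min(a.end, b.end) - max(a.start, b.start) + 1)
    extended = max(a.end, b.end) - min(a.start, b.start) + 1
    return paired / extended, paired / a.length, paired / b.length


if __name__ == "__main__":
    # Hypothetical strands: A covers 0..9 (10 residues), B covers 4..11 (8 residues).
    print(pairing_ratios(Strand(0, 9), Strand(4, 11)))  # → (0.5, 0.6, 0.75)
```

In this illustrative case the paired part covers 6 positions of a 12-position extended span. Both ratios exceed the typical lower bounds of 25% (paired/extended) and 40% (paired/strand length) reported above.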
Keywords/Search Tags: protein, beta-sheet structure, amino acid pairs, beta-strand arrangement, multi-dimensional scaling, support vector machine, database, software development