The Research On Predicting Transmembrane Helix Of Membrane Protein

Posted on: 2011-02-21
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
Full Text: PDF
GTID: 2120360308452347
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Membrane proteins make up roughly 30% of all known proteins, which indicates their significance in every living organism. Their functions include communication between cells, regulating the solution on both sides of the membrane, coordinating ion transport, and acting as cell sensors. These functions are crucial for an organism's survival. Meanwhile, because membrane proteins are the targets of many modern pharmaceuticals, a more thorough understanding of their structure is needed in order to develop effective medicines. The function and structure of a protein are closely bound together. Nevertheless, experimentally determined high-resolution 3D structures of membrane proteins are rare, because membrane proteins are difficult to crystallize. Fewer than 500 high-resolution 3D structures of membrane proteins are available. Compared with the 10,000 helical membrane proteins in the whole genome, this leaves a huge gap. Given the inadequate structural data, computational approaches based on the primary structure of proteins become a desirable alternative.
Early prediction methods make use of a hydrophobicity scale for the amino acids. A profile is obtained by applying the scale to each amino acid in the protein sequence, and the regions where the profile exceeds a given threshold are predicted as transmembrane segments. Methods of this sort have low prediction accuracy because they overlook the interactions between adjacent residues. Machine learning approaches followed, taking advantage of the great progress in artificial intelligence. These techniques include neural networks, support vector machines (SVM), k-nearest neighbors, and so on. Many predictors have been developed based on these techniques, such as TOP-PRED, PHDhtm, and MemBrain.
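As an illustration of the hydrophobicity-threshold approach described above, the following is a minimal sketch rather than any specific published predictor. The abstract does not name a scale or parameters, so the Kyte-Doolittle scale, the 19-residue window, the 1.6 threshold, and the minimum segment length are all assumptions chosen for the example.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Average hydropathy in a sliding window centred on each residue."""
    half = window // 2
    scores = [KD[aa] for aa in seq]
    profile = []
    for i in range(len(seq)):
        lo = max(0, i - half)
        hi = min(len(seq), i + half + 1)  # window is truncated at the termini
        profile.append(sum(scores[lo:hi]) / (hi - lo))
    return profile


def predict_segments(seq, window=19, threshold=1.6, min_len=15):
    """Return (start, end) index pairs (0-based, inclusive) where the
    profile stays above the threshold for at least min_len residues."""
    profile = hydropathy_profile(seq, window)
    segments = []
    start = None
    # A sentinel below any threshold closes a segment that reaches the end.
    for i, value in enumerate(profile + [float("-inf")]):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    return segments
```

Because each residue is scored by a window average, the predicted boundaries tend to fall inside the true hydrophobic stretch, which is one way to see why the ends of a helix are hard to locate with this kind of method.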
These machine learning methods can learn complex relations between residues directly from the data, especially when evolutionary information is provided. They have been shown to achieve relatively high prediction accuracy compared with the earlier methods. However, the existing methods all make their predictions residue by residue and then assemble the results, rather than treating the transmembrane helix as a whole from the outset. This has created a bottleneck for further improving prediction accuracy, especially the accuracy of the N- and C-terminal ends.
In this dissertation, I propose a new method that treats the transmembrane helix as a whole and makes use of the information at the ends of the helix. Two binary classifiers, built with SVM, predict the end preferences of each residue in the membrane protein under prediction. MemBrain is used to generate the initial prediction results. The preferences of each residue are then used to update the ends predicted by MemBrain.
As mentioned above, the whole prediction is divided into two phases. The first phase obtains a preliminary prediction from an existing method, specifically MemBrain. The second phase introduces the ends information, which is implemented with two SVM-trained binary classifiers that produce the end preferences of each residue.
The dataset has a great impact on the quality of the experiment, so I paid particular attention to data selection. All the data used in my experiments are experimentally determined high-resolution membrane protein structures. As in many other methods, cross-validation is used to obtain an average prediction result. The results are assessed with measures widely used by other researchers, specifically the per-residue rate and the per-segment rate.
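The second phase described above, in which per-residue end preferences update the ends of an initial prediction, might be sketched as follows. The abstract does not give the update rule, so everything here is an assumption for illustration: the function names, the search window `max_shift`, the `min_len` safeguard, and the rule of moving an end to the nearby residue with the highest preference. The preference lists stand in for the outputs of the two SVM classifiers, and the input segments stand in for MemBrain's output.

```python
def _best_position(current, scores, max_shift):
    """Return the index within max_shift of current that has the highest
    score; stay at current unless another position is strictly better."""
    lo = max(0, current - max_shift)
    hi = min(len(scores) - 1, current + max_shift)
    best = current
    for i in range(lo, hi + 1):
        if scores[i] > scores[best]:
            best = i
    return best


def refine_ends(segments, n_pref, c_pref, max_shift=3, min_len=10):
    """Adjust the ends of each initial segment (hypothetical update rule).

    segments: list of (start, end) pairs, 0-based and inclusive,
              from the first-phase predictor.
    n_pref:   per-residue preference for being an N-terminal end.
    c_pref:   per-residue preference for being a C-terminal end.
    """
    refined = []
    for start, end in segments:
        new_start = _best_position(start, n_pref, max_shift)
        new_end = _best_position(end, c_pref, max_shift)
        # Keep the original segment if the update would make it too short.
        if new_end - new_start + 1 < min_len:
            new_start, new_end = start, end
        refined.append((new_start, new_end))
    return refined
```

Restricting the search to a small window keeps the first-phase prediction as the anchor, so the end classifiers only correct the boundaries and cannot create or delete helices.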
To decide whether a segment has been predicted well, the number of overlapping residues between the predicted segment and the observed segment is used as the indicator. Different methods tend to use different numbers as this indicator; for instance, MemBrain uses 9. After analyzing this factor, I found that the choice of indicator has a great influence on the final result, which is generally the result that a method reports. I therefore include a plot in my results to demonstrate how this factor affects the final result. The plot compares MemBrain with my improved version and shows that my method is more robust to changes in this indicator. The aim of this research is to improve the prediction accuracy of transmembrane helices. The experimental results show that, by integrating the ends information, prediction accuracy does improve and is more robust to different indicators.
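The overlap criterion above can be made concrete with a small sketch. The value 9 comes from the abstract; the function names, the one-to-one greedy matching, and the reporting of both recall and precision are assumptions for illustration, not necessarily the exact per-segment rate used in the thesis.

```python
def overlap(a, b):
    """Number of residues shared by two (start, end) inclusive segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def count_correct_segments(predicted, observed, min_overlap=9):
    """Count predicted segments that overlap a not-yet-matched observed
    segment by at least min_overlap residues (greedy one-to-one matching)."""
    matched = set()
    correct = 0
    for p in predicted:
        for j, o in enumerate(observed):
            if j not in matched and overlap(p, o) >= min_overlap:
                matched.add(j)
                correct += 1
                break
    return correct


def segment_scores(predicted, observed, min_overlap=9):
    """Return (recall, precision) at the segment level."""
    correct = count_correct_segments(predicted, observed, min_overlap)
    recall = correct / len(observed) if observed else 0.0
    precision = correct / len(predicted) if predicted else 0.0
    return recall, precision


def sweep_overlap(predicted, observed, thresholds):
    """Segment-level recall for each overlap threshold, for plotting
    how the reported result depends on the chosen indicator."""
    return {t: segment_scores(predicted, observed, t)[0] for t in thresholds}
```

Calling `sweep_overlap` with a range of thresholds for two predictors gives the data for a robustness comparison of the kind the plot describes: a predictor with more accurate ends should lose fewer correct segments as the required overlap grows.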
Keywords/Search Tags: membrane protein, transmembrane helix, secondary structure prediction, machine learning