The Application Of Integrated Neural Networks In Predicting Protein Interaction Sites

Posted on: 2012-04-30
Degree: Master
Type: Thesis
Country: China
Candidate: X L Shen
GTID: 2120330335979741
Subject: Computer software and theory
Abstract/Summary:
Interactions among proteins underlie many life processes, and protein interaction sites matter for drug design and for the construction of protein interaction networks. Understanding these sites is therefore significant in both theory and practice. Locating protein interaction sites through biological experiments is slow and arduous, and researchers often run into unexpected situations along the way, so theoretical methods for analyzing and predicting interaction sites are valuable. In recent years, with the development of bioinformatics and computational intelligence, the application of computational intelligence methods to predicting protein interaction sites has progressed quickly. Against this background, this thesis applies neural networks and their integrated (ensemble) methods to the prediction of protein interaction sites.
Two data sets were used. The first contains 35 protein molecules. The second is a control set that contains 149 protein molecules (S149). A series of features that can represent protein interaction sites was then extracted, including sequence profiles, entropy, accessible surface area, relative accessible surface area, depth index, protrusion index, hydrophobicity and so on. Several sample sets were created by combining these features. Single back propagation neural networks or radial basis function (RBF) neural networks, together with their integrated approaches, were then trained and tested on these sample sets. Three integrated methods were used: a voting-based fusion algorithm that incorporates prior knowledge, Genetic Algorithm based Selective Ensemble (GASEN), and a novel method for constructing ensemble classifiers based on Principal Component Analysis.
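As an illustration of voting-based fusion, the sketch below combines binary per-residue predictions from several classifiers using one weight per classifier. The abstract does not say what form the prior knowledge takes, so the weights here are an assumption made for illustration (for example, each classifier's accuracy on held-out data), and the function name `weighted_vote` is hypothetical rather than taken from the thesis.

```python
from typing import List, Sequence


def weighted_vote(predictions: Sequence[Sequence[int]],
                  weights: Sequence[float]) -> List[int]:
    """Fuse binary predictions from several classifiers by weighted voting.

    predictions[k][i] is classifier k's label for residue i
    (1 = interaction site, 0 = non-interaction site).
    weights[k] is a prior weight for classifier k. This weighting scheme
    is an assumption; the thesis abstract does not define it.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    if not predictions:
        return []
    n_residues = len(predictions[0])
    if any(len(p) != n_residues for p in predictions):
        raise ValueError("all classifiers must label the same residues")

    threshold = sum(weights) / 2  # more than half of the total weight wins
    fused = []
    for i in range(n_residues):
        score = sum(w * p[i] for p, w in zip(predictions, weights))
        fused.append(1 if score > threshold else 0)
    return fused


# Three hypothetical classifiers, four residues.
votes = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
print(weighted_vote(votes, [1, 1, 1]))  # plain majority vote
print(weighted_vote(votes, [4, 2, 1]))  # first classifier trusted most
```

With equal weights this reduces to ordinary majority voting; unequal weights let a more trusted classifier override a weaker majority, as happens for the second residue in the example.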
In the experiments, one-time and ten-time cross-validation were applied to the first and second data sets, respectively. For the first data set, 4 sample sets were created by combining sequence profiles, entropy and accessible surface area. Back propagation neural networks and one integrated method, the voting-based fusion algorithm that incorporates prior knowledge, were then used to predict the protein interaction sites. For the second data set, two different experiments were carried out. In the first experiment, 10 features were extracted and 4 sample sets that contained 9 sliding windows were built from these features. Each of the 4 sample sets was processed by a radial basis function neural network optimized by Particle Swarm Optimization, which produced 4 groups of results. These 4 groups of results were then integrated by Genetic Algorithm based Selective Ensemble (GASEN). In the second experiment, 24 different features were extracted and a single sample set was built from them. A radial basis function neural network optimized by Particle Swarm Optimization was again used as the single classifier. Finally, the novel Principal Component Analysis based method for constructing ensemble classifiers was chosen to process the training data set, and it was compared with Bagging and AdaBoost.
Experimental results showed that all three integrated methods improved prediction accuracy to some extent, and that they were also slightly better than the traditional Bagging and AdaBoost methods. This indicates that integrated classifiers predict protein interaction sites more effectively than single classifiers.
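The sliding-window sample construction mentioned above can be sketched as follows. This is a minimal illustration, not the thesis's implementation. It assumes that "9 sliding windows" refers to a window of 9 consecutive residues centered on the target residue, and that positions beyond the ends of the chain are zero-padded. Neither detail is stated in the abstract, and the function name `sliding_window_samples` is hypothetical.

```python
from typing import List, Sequence


def sliding_window_samples(features: Sequence[Sequence[float]],
                           window: int = 9) -> List[List[float]]:
    """Build one input vector per residue from a window of neighbours.

    features[i] holds the per-residue feature values for residue i
    (for example entropy or accessible surface area).
    Each sample concatenates the features of the `window` residues
    centered on the target residue. Positions outside the chain are
    filled with zeros, which is an assumed padding scheme.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    if not features:
        return []

    half = window // 2
    padding = [0.0] * len(features[0])
    samples = []
    for center in range(len(features)):
        row: List[float] = []
        for pos in range(center - half, center + half + 1):
            inside = 0 <= pos < len(features)
            row.extend(features[pos] if inside else padding)
        samples.append(row)
    return samples


# Toy chain of 3 residues with 1 feature each, window of 3.
print(sliding_window_samples([[1.0], [2.0], [3.0]], window=3))
```

With a window of 9 and 10 features per residue, each sample would be a vector of 90 values under these assumptions.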
Keywords/Search Tags: protein interaction sites, features, neural network, integrated