Prediction Of Hot Regions In Protein-protein Interaction By Combining Density-based Clustering With Feature-based Classification

Posted on: 2016-12-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Hu
Full Text: PDF
GTID: 1220330482969788
Subject: Control theory and control engineering

Abstract/Summary:
Proteins carry out their functions through protein-protein interactions. Not all residues contribute equally to protein binding: only a few are important, and these are called hot spot residues. Numerous experiments show that hot spots in protein-protein interactions form specific conformations, called hot regions. Hot regions stabilize and coordinate protein-protein interactions, so identifying and characterizing them is important for understanding protein activity in areas such as disease origin, pharmaceutical development, and drug targeting.

Although researchers have recorded some protein interaction data experimentally in recent years, such data remain limited because the experiments are complex and costly in both money and time. Computational methods for predicting hot regions are therefore becoming increasingly important. This dissertation proposes a prediction model based on density clustering and feature classification, and the experimental results show that it predicts hot regions better than other methods. Building on this hot region research, we also propose a sequence conservation-based method for testing hot regions. The major innovations of this dissertation are as follows:

(1) A predictive method for protein interaction hot regions that combines density-based incremental clustering with feature-based classification. First, density-based incremental clustering groups the residues in the data set into multiple initial clusters. Feature-based classification then removes the non-hot-spot residues, leaving the final predicted hot regions. The experimental results show that the method predicts most of the hot regions and that its coverage of hot spots within the predicted hot regions is higher than that of other methods.

(2) An effective feature selection method. Selecting effective biological features is the most important step in feature classification. Starting from protein structure, we collect a series of features of protein interactions. Using SVM-based recursive feature elimination and normalized mutual information feature selection with a backward feature elimination strategy, we retain the important features and remove the insignificant ones, producing a feature list sorted by importance. Finally, the F-score is used to find the best feature combination.

(3) A sequence conservation-based method for testing hot regions. Testing hot regions by biological methods is very difficult because the experiments are complex and take a long time, so we propose a test based on sequence conservation. For every hot spot in a hot region, the two proteins that form the complex are located in the protein database, and the complete gene sequences are obtained through their isoforms. With the complete protein sequences, we retrieve all orthologs in different species and then, by multiple sequence alignment, record the sites of the hot spot residue in each species. We apply the Blocks Substitution Matrix (BLOSUM) to build a conservation scoring function for hot regions for the first time. With this scoring function we obtain the conservation scores of the hot regions in different species and, finally, calculate the conservation of each hot region relative to the other regions on the interaction interface.
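As a rough illustration of the two-stage idea in contribution (1), clustering first and filtering by classification second, the following sketch may help. It is not the dissertation's implementation: plain DBSCAN stands in for the density-based incremental clustering, and a precomputed per-residue score with a fixed threshold stands in for the trained feature-based classifier. The names `dbscan` and `predict_hot_regions`, the parameters `eps`, `min_pts`, and `threshold`, and the toy coordinates and scores are all assumptions made for this example.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Plain DBSCAN (a stand-in, not the incremental variant in the text).

    Returns one label per point: -1 for noise, 0, 1, ... for cluster ids.
    """
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nb)
    return labels


def predict_hot_regions(residues, eps, min_pts, threshold):
    """residues: list of (residue_id, (x, y, z), score).

    `score` is a hypothetical classifier output in [0, 1]; residues that
    score below `threshold` are treated as non-hot-spots and removed.
    """
    labels = dbscan([coords for _, coords, _ in residues], eps, min_pts)
    regions = {}
    for (rid, _, score), label in zip(residues, labels):
        if label != -1 and score >= threshold:
            regions.setdefault(label, []).append(rid)
    return [sorted(members) for members in regions.values()]


# Toy interface: two dense patches and one isolated residue (made-up data).
toy = [
    ("A1", (0.0, 0.0, 0.0), 0.9),
    ("A2", (1.0, 0.0, 0.0), 0.8),
    ("A3", (0.0, 1.0, 0.0), 0.2),   # in a dense patch, but low score
    ("A4", (1.0, 1.0, 0.0), 0.7),
    ("B1", (20.0, 0.0, 0.0), 0.6),
    ("B2", (21.0, 0.0, 0.0), 0.9),
    ("B3", (20.0, 1.0, 0.0), 0.95),
    ("C1", (50.0, 50.0, 50.0), 0.99),  # high score, but spatially isolated
]
regions = predict_hot_regions(toy, eps=2.0, min_pts=3, threshold=0.5)
print(regions)
```

On this toy input, the clustering step discards the isolated residue `C1` regardless of its score, and the filtering step drops `A3` from the first patch. This mirrors the division of labour described above: density decides which residues can belong to a region, and classification decides which of them are kept.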
Keywords/Search Tags: protein interaction, hot region, conservation, density clustering, feature classification