
Prediction Methods of Hub Protein-Protein Interaction Interfaces

Posted on: 2020-10-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X L Lin
GTID: 1360330575469020
Subject: Control Science and Engineering
Abstract/Summary:
In protein-protein interaction networks, hub proteins are the key factor in maintaining the stability and coordination of protein-protein interactions and in carrying out protein biological functions. Studying hub proteins helps explain the molecular mechanisms by which biological functions are exerted, deepens understanding of the microscopic processes of life activities, and provides theoretical guidance for structure-based drug design. Certain crucial residues, known as hot spots, contribute the majority of the binding free energy. Hot spots are tightly packed together into special areas called hot regions. Hot regions are important areas where receptors bind ligands with high affinity, and they are also functional areas that promote the stability of protein-protein interactions. Studying the hot spots and hot regions of hub protein interfaces is therefore very important for understanding protein function. Although more and more protein structures and attributes are being discovered, much of the available information is missing or redundant, which makes identifying hub protein interfaces with traditional methods and techniques extremely difficult. Developing high-quality predictive models and analysis algorithms is therefore an essential task. In this thesis, hub protein interfaces are studied using ensemble learning and clustering methods. The main work of this thesis is as follows:

(1) Feature Selection Based on Correlation Coefficient
First, the Pearson correlation coefficient is used as the criterion for evaluating features, in order to find highly associated features and remove redundant ones. To group variables with similar correlation patterns, the rows and columns of the correlation matrix are reordered using principal component analysis (PCA), which makes the relationship between any two features easier to see. In addition, recursive feature elimination based on a support vector machine (SVM-RFE) is used to obtain the optimal feature subset through a backward elimination strategy, so that irrelevant features can be removed without much loss of information.

(2) Prediction of Hot Spots on Hub Protein Interfaces Based on Ensemble Learning
To predict the hot spots of hub protein interfaces effectively and to classify different types of hub protein interfaces, three ensemble learning methods (Boosting, Gradient Boosting and Random Forest) are first used to build classification models on different datasets, and ten-fold cross-validation is used to evaluate the models. The three methods are then used to predict the hot spots of hub protein interfaces. In addition, an optimization strategy based on protein interaction tendentiousness (PIT) is used to calculate the PIT of the hub proteins. The hub protein interfaces with higher PIT, namely DD interfaces (DateHub-DateHub) and PP interfaces (PartyHub-PartyHub), are classified using the three ensemble learning methods. To evaluate the classification models, the importance of the feature variables is analyzed with mean-decrease-in-accuracy and mean-decrease-in-Gini curves, and the margin distribution curve is used to measure the reliability of the models. The experimental results show that the PIT-based Random Forest has a lower out-of-bag (OOB) error and that its classification results are more reliable.

(3) Prediction of Hot Regions on Hub Protein Interfaces Based on a Local Community Structure Detection Method
A clustering method based on local community structure detection (LCSD) is used to predict the hot region structures of hub protein interfaces. First, communities are partitioned by identifying boundary nodes during clustering. Then, the predicted hot regions are optimized using pair potentials and relative accessible surface area (PPRA). Finally, missing residues are reallocated through an optimization strategy. The experimental results show that LCSD is feasible and effective for predicting hot regions and that it effectively improves prediction accuracy.

(4) Prediction of Hot Regions on Hub Protein Interfaces Based on RCNO and the K-means Method
K-means clustering is used to predict hot regions on hub protein interfaces. First, to improve the efficiency of the K-means algorithm, the best value of k is selected by calculating the sum of squared distances and the average silhouette coefficient. Then, the residue coordination number optimization (RCNO) strategy is used to calculate the average coordination number. In addition, pair potentials and relative ASA (PPRA) are used to optimize the hot regions. The experimental results show that RCNO does not affect the number of predicted hot regions, but it increases the number of hot spots and decreases the number of non-hot spots within them, so the predicted hot regions are much closer to the standard hot regions.

In summary, building on a new feature selection method, three ensemble learning methods and two clustering methods are proposed to predict the hot spot residues and hot regions of hub protein interfaces, and these methods are refined with optimization strategies. The experimental results demonstrate that the models are highly reliable and that the methods are effective for predicting hub protein interfaces.
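As a rough illustration of the correlation-based filtering in (1), the sketch below computes pairwise Pearson coefficients and greedily keeps a feature only if its absolute correlation with every feature already kept stays below a threshold. The 0.9 threshold, the keep-first ordering and the feature names are hypothetical; the abstract does not state the thesis's actual criterion, and the PCA-based reordering of the correlation matrix is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature has no defined correlation; treat as 0
    return cov / (sx * sy)


def drop_redundant(features, threshold=0.9):
    """Keep features in input order, skipping any whose |r| with an
    already-kept feature reaches `threshold` (hypothetical cut-off)."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical per-residue features (one value per interface residue).
features = {
    "asa": [10.0, 20.0, 30.0, 40.0, 50.0],
    "asa_scaled": [1.0, 2.0, 3.0, 4.0, 5.0],  # perfectly correlated with "asa"
    "conservation": [0.3, 0.9, 0.1, 0.8, 0.2],
}
print(drop_redundant(features))  # ['asa', 'conservation']
```

The greedy order means the first of two correlated features survives; a real pipeline might instead keep the one more correlated with the hot-spot label.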
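The SVM-RFE step in (1) can be sketched as follows, assuming a linear soft-margin SVM: train on the remaining features, eliminate the feature with the smallest squared weight, and repeat until none remain; reversing the elimination order gives the ranking. To stay dependency-free, the SVM here is fitted by plain subgradient descent on the primal hinge-loss objective, and `lam`, `epochs` and `lr` are hypothetical settings, not values from the thesis. In practice a library implementation (for example scikit-learn's `RFE` wrapped around `LinearSVC`) would normally be used.

```python
from math import sqrt


def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear soft-margin SVM fitted by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))). Labels are +1/-1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for epoch in range(1, epochs + 1):
        step = lr / sqrt(epoch)  # decaying step size
        gw = [lam * wj for wj in w]  # gradient of the L2 term
        gb = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # hinge term is active for this sample
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - step * g for wj, g in zip(w, gw)]
        b -= step * gb
    return w, b


def svm_rfe(X, y, names):
    """Rank features best-first by recursively eliminating the smallest w_j ** 2."""
    remaining = list(range(len(names)))
    eliminated = []
    while remaining:
        sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(worst))
    return [names[j] for j in reversed(eliminated)]


# Toy data: the first feature separates the classes, the second is noise.
X = [[2.0, 0.3], [1.5, -0.2], [1.8, 0.1], [-2.0, 0.2], [-1.6, -0.3], [-1.9, 0.0]]
y = [1, 1, 1, -1, -1, -1]
print(svm_rfe(X, y, ["signal", "noise"]))
```

The ranking alone does not pick the "optimal subset"; that would additionally require scoring each top-m prefix, for example by cross-validation.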
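For the margin distribution curve in (2), one common definition for voting ensembles such as Random Forest is the fraction of base learners voting for a sample's true class minus the largest fraction voting for any other class; negative values mark samples the majority vote gets wrong. The sketch assumes each tree's hard vote per sample is available; the class labels `"hot"`/`"non"` and the five-tree votes are made up for illustration.

```python
from collections import Counter


def ensemble_margin(votes, true_label):
    """Fraction of votes for the true class minus the largest fraction
    for any other class; ranges from -1 to 1."""
    counts = Counter(votes)
    n = len(votes)
    correct = counts[true_label] / n
    wrong = max((c / n for label, c in counts.items() if label != true_label),
                default=0.0)
    return correct - wrong


def margin_distribution(all_votes, labels):
    """Sorted per-sample margins; plotted against rank they form a margin curve."""
    return sorted(ensemble_margin(v, t) for v, t in zip(all_votes, labels))


# Hypothetical votes from a five-tree forest for three interface residues.
all_votes = [
    ["hot", "hot", "hot", "non", "hot"],
    ["non", "hot", "non", "non", "hot"],
    ["non", "non", "non", "non", "non"],
]
labels = ["hot", "hot", "non"]
margins = margin_distribution(all_votes, labels)
misclassified = sum(1 for m in margins if m < 0)
print(margins, misclassified)
```

A curve that rises steeply and stays mostly above zero indicates confident, mostly correct votes, which is the sense in which margins measure reliability.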
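The abstract does not describe the LCSD method in (3) in enough detail to reproduce, so the sketch below swaps in a well-known generic technique: greedy expansion from a seed residue that maximizes Clauset's local modularity R, which is defined through the community's boundary nodes (members with at least one neighbour outside). It assumes a residue contact graph given as an adjacency dictionary; the node names are placeholders, the stopping rule (stop when no neighbour raises R) is one common variant, and the PPRA refinement and residue reallocation are not modelled.

```python
def local_modularity(community, graph):
    """Clauset's local modularity R = I / T for a set of nodes.

    T counts edges with at least one endpoint in the boundary set B (members
    with a neighbour outside the community); I counts those edges that lie
    entirely inside the community.
    """
    boundary = {v for v in community if any(u not in community for u in graph[v])}
    if not boundary:
        return 1.0  # the community has no outside neighbours at all
    edges = {frozenset((v, u)) for v in boundary for u in graph[v]}
    inside = sum(1 for e in edges if e <= community)
    return inside / len(edges)


def expand_from_seed(seed, graph):
    """Greedily add the neighbouring node that most increases R; stop when
    no candidate improves it."""
    community = {seed}
    score = local_modularity(community, graph)
    while True:
        shell = {u for v in community for u in graph[v]} - community
        best, best_score = None, score
        for u in sorted(shell):  # sorted for deterministic tie-breaking
            s = local_modularity(community | {u}, graph)
            if s > best_score:
                best, best_score = u, s
        if best is None:
            return community
        community.add(best)
        score = best_score


# Hypothetical residue contact graph: two tightly packed triangles joined
# by a single contact (C-D). Node names are placeholders.
graph = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
print(sorted(expand_from_seed("A", graph)))  # ['A', 'B', 'C']
```

Because R only looks at the boundary, each expansion step costs time proportional to the community's neighbourhood rather than the whole interface graph.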
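The k-selection step in (4) can be illustrated with a dependency-free K-means that reports, for each candidate k, the within-cluster sum of squared distances (for an elbow plot) and the mean silhouette coefficient. How the thesis combines the two criteria is not stated; here k is simply the candidate with the highest mean silhouette, which is an assumption. The farthest-point initialization, the candidate range and the toy 3-D coordinates (standing in for residue positions) are also illustrative choices. `math.dist` requires Python 3.8 or later.

```python
from math import dist


def nearest(p, centers):
    """Index of the centre closest to point p."""
    return min(range(len(centers)), key=lambda j: dist(p, centers[j]))


def kmeans(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # next centre: the point farthest from all centres chosen so far
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # mean of the members; an empty cluster keeps its old centre
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members))
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return labels, centers


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)  # dist(p, p) is 0
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, candidates):
    """Return the k with the highest mean silhouette plus, for every
    candidate, (within-cluster sum of squared distances, mean silhouette)."""
    report = {}
    for k in candidates:
        labels, centers = kmeans(points, k)
        sse = sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
        report[k] = (sse, silhouette(points, labels))
    best = max(report, key=lambda k: report[k][1])
    return best, report


# Hypothetical residue coordinates forming three well-separated groups.
points = [
    (0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0),
    (10.0, 10.0, 10.0), (10.5, 10.0, 10.0), (10.0, 10.5, 10.0),
    (20.0, 0.0, 0.0), (20.5, 0.0, 0.0), (20.0, 0.5, 0.0),
]
best_k, report = choose_k(points, range(2, 5))
for k, (sse, sil) in sorted(report.items()):
    print(k, round(sse, 2), round(sil, 3))
print("chosen k:", best_k)
```

The sum of squared distances keeps shrinking as k grows, so on its own it only suggests an "elbow"; the silhouette penalizes both merged and split clusters and usually gives a clearer single maximum.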
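The RCNO strategy in (4) relies on residue coordination numbers. A minimal sketch, assuming each residue is reduced to one representative point (such as its C-alpha atom) and that a residue's coordination number is the count of other residues within a distance cutoff. The 6.5 Å cutoff and the residue IDs are hypothetical, and how the thesis uses the average to refine hot regions is not described in the abstract.

```python
from math import dist


def coordination_numbers(coords, cutoff=6.5):
    """For each residue, count the other residues whose representative
    points lie within `cutoff` angstroms (hypothetical default)."""
    return {
        i: sum(1 for j in coords if j != i and dist(coords[i], coords[j]) <= cutoff)
        for i in coords
    }


def average_coordination(coords, cutoff=6.5):
    """Mean coordination number over all residues in `coords`."""
    cn = coordination_numbers(coords, cutoff)
    return sum(cn.values()) / len(cn)


# Hypothetical C-alpha coordinates (angstroms) for four interface residues.
coords = {
    "A:TRP48": (0.0, 0.0, 0.0),
    "A:ARG52": (3.8, 0.0, 0.0),
    "B:ASP12": (0.0, 3.8, 0.0),
    "B:LYS30": (20.0, 0.0, 0.0),
}
print(coordination_numbers(coords))
print(average_coordination(coords))  # 1.5
```

Residues packed into a dense cluster (here the first three) score well above an isolated residue, which is consistent with the abstract's description of hot spots as tightly packed.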
Keywords/Search Tags: protein-protein interactions, hub protein interfaces, hot region structure, ensemble learning, clustering