Font Size: a A A

Researches On The Protein-Protein Interaction Prediction Method Based On The Machine Learning

Posted on:2018-02-28Degree:MasterType:Thesis
Country:ChinaCandidate:J C ZengFull Text:PDF
GTID:2370330515953773Subject:Computer technology
Abstract/Summary:PDF Full Text Request
Proteins play an important role in life activities,and most of the proteins interact with other proteins to achieve specific biological functions.Therefore,it is of great significance to study the relationship between protein interactions.With the continuous development of bio informatics,the method of protein interaction prediction has two main directions:experimental method and calculation method.Due to the rapid development of machine learning technology in recent years,the use of computational methods for protein interaction has become a new research hotspot.The key to the prediction of protein interaction with machine learning is the two steps of data processing and classification.In this paper,the protein interaction is considered as a classification problem,and the method of ensemble learning is used to predict.The research contents of this paper include:(1)Collection of data sets.Obtain high reliability data from the public authority of the DIP database,including 8 different species of a total of 28868 protein composition of 62280 protein interaction pairs.There are three examples of counter-data sets,including an existing unique countdown database and two artificial constructs.Due to the unbalanced data of positive and negative cases,we propose a sampling algorithm based on K-Means clustering to down sample the positive examples.The experimental results show that the Negatome data set and the Amino AcidsReorder data set are valid.(2)Feature extraction and feature selection.Based on the primary structure,secondary structure and physicochemical properties of the protein,we have proposed five kinds of feature extraction methods.Then,five kinds of features are merged,and the Maximum-Relation and Maximum Distance and the feature selection method based on Z test are used to select the effective feature and reconstruct the data set.The experimental results show that the two feature selection methods can effectively improve the performance of the classifier.(3)Ensemble learning.The research shows that the integrated learning is a learning strategy that can effectively improve the weak classifier into a strong classifier.In this paper,we predict the protein interaction by constructing a dynamic cycle integrated classifier LibD3C,and the experimental results show that we can achieve the desired effect.Experimental results show that LibD3C and Random Forest two classifiers the best results.The best results in the F1 measured value indicators can reach 0.99 or more.
Keywords/Search Tags:Protein-Protein Interaction, Features Selection, Ensemble learning
PDF Full Text Request
Related items