
A Study On Protein-Protein Interaction Prediction Based On CGR And Random Forests

Posted on: 2022-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Li
Full Text: PDF
GTID: 2480306317968799
Subject: Computer Science and Technology
Abstract/Summary:
Protein-protein interactions (PPIs) play a key role in life activities, and abnormal PPIs can lead to various diseases. Accurately identifying protein interactions not only helps to explain the nature of life phenomena at the molecular level, but is also very useful for exploring disease mechanisms and designing treatments. Over the past few decades, rapidly evolving high-throughput technologies have validated large amounts of PPI data. However, these biological experiments are costly and time-consuming, and they suffer from limited coverage and high false-positive rates. Computational methods have therefore been developed to predict PPIs effectively. This thesis focuses on sequence-based prediction of protein interactions, concentrating on the design and improvement of protein sequence encoding methods and of the PPI prediction model. A new protein interaction prediction model, iPPI-PseAAC(CGR), is proposed by incorporating chaos game representation (CGR) into pseudo amino acid composition (PseAAC), with CGR used to extract feature information. Random Forest, an ensemble classifier based on a voting mechanism, is used as the prediction tool to predict and analyze sequences effectively. The main characteristic of the CGR encoding is that it maintains the balance of base composition while maximizing the difference between the codes of different amino acids, which allows it to retain a considerable amount of sequence-order information and key pattern features. A vector defined in a discrete model may lose all sequence pattern information; the pseudo amino acid composition used in this predictor, which has been widely used in computational biology, better avoids this problem. In this study, a 72-dimensional PseAAC vector is used to represent each protein pair sample. 5-fold and 10-fold cross-validation tests were performed on two benchmark data sets, Saccharomyces cerevisiae and Helicobacter pylori. The results show that the prediction performance of iPPI-PseAAC(CGR) is significantly better than that of existing prediction methods, providing a novel and effective scheme for predicting protein interactions. Furthermore, a user-friendly web server has been established for the predictor so that it can be accessed by the public; the server can generate the feature vectors required for biological sequences according to the user's needs or the user's own definition.
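To make the chaos game representation mentioned in the abstract more concrete, the following is a minimal Python sketch of the classic CGR for a nucleotide sequence, followed by a simple grid-frequency summary. It is an illustration under stated assumptions, not the thesis's method: the corner layout `CORNERS`, the function names `cgr_points` and `grid_frequencies`, and the grid size are all hypothetical choices. The abstract does not specify how amino acids are mapped to bases or how the 72-dimensional PseAAC vector is built, so neither is reproduced here.

```python
from typing import Dict, List, Tuple

# Assumed corner layout of the unit square (a common convention for
# nucleotide CGR); the thesis's actual layout is not stated in the abstract.
CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(seq: str) -> List[Tuple[float, float]]:
    """Return the CGR trajectory of a nucleotide sequence.

    Starting from the centre of the unit square, each symbol moves the
    current point halfway towards that symbol's corner.
    """
    x, y = 0.5, 0.5
    points: List[Tuple[float, float]] = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip symbols outside the assumed alphabet
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def grid_frequencies(points: List[Tuple[float, float]], n: int = 2) -> List[float]:
    """Summarise CGR points as relative frequencies on an n-by-n grid.

    Cells are flattened row by row (row index from y, column index from x).
    """
    counts = [0] * (n * n)
    for x, y in points:
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        counts[row * n + col] += 1
    total = len(points)
    if total == 0:
        return [0.0] * (n * n)
    return [c / total for c in counts]


if __name__ == "__main__":
    pts = cgr_points("AG")
    print(pts)
    print(grid_frequencies(pts, n=2))
```

Because each point depends on all preceding symbols, the grid frequencies carry some sequence-order information, which is the property the abstract attributes to CGR-based encoding. A fixed-length vector of this kind could in principle be passed to a classifier such as Random Forest.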
Keywords/Search Tags: protein interaction, feature extraction, amino acid sequence, chaos game