Research On Essential Protein Recognition Based On Random Forest Algorithm

Posted on:2020-12-22

Degree:Master

Type:Thesis

Country:China

Candidate:J Y Zhang

Full Text:PDF

GTID:2370330599462961

Subject:Agricultural informatization

Abstract/Summary:

Identifying the proteins that are essential to living organisms is extremely important for the study of evolution and for medicine. There are currently two ways to determine the importance of a protein. The first is based on biochemical methods, but biological experiments have drawbacks: they take a long time, cost more, and cannot handle large volumes of data. The second uses computers as tools to analyze organisms and interprets the results with biologically relevant knowledge. Most computational methods identify important proteins by extracting topological metrics from the protein-protein interaction (PPI) network. However, because some of the underlying experimental data are incomplete and the protein network itself is complex, no single centrality metric has been found that accurately distinguishes key proteins from non-key proteins. Current research also indicates that the difference between key and non-key proteins cannot be determined by a single feature and depends on a combination of factors. It is therefore necessary to integrate multiple topological centrality metrics, move beyond the traditional approach of selecting proteins from a ranked list, and establish a machine learning model for protein classification and recognition.

The random forest algorithm is an ensemble method: it combines multiple individual classifiers, that is, it aggregates the classifications of multiple decision trees into a single global classifier. Previous research classified proteins using a single feature, whereas a random forest, as an aggregate of many classifiers, has a clear advantage in classification performance. This paper therefore chooses the random forest machine learning method to identify protein importance. It analyzes the structure of the protein network, integrates multiple topological centrality measures, and builds a random forest model to study the identification of key proteins.

Budding yeast proteins were selected as the research object. The specific research contents include cleaning the collected data, constructing a protein interaction network (PPI), selecting six centrality metrics for feature extraction, constructing a random forest model for identifying key proteins, and evaluating the experimental results with statistical indicators. The results show that the algorithm identifies key proteins accurately and quickly, eliminates interference such as false positives and redundancy, and has higher recognition ability than other algorithms. In summary, fusing multiple centrality metrics and using the random forest algorithm to build a protein importance prediction model identifies key proteins more effectively.
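
The workflow described in the abstract (clean the interaction data, build the PPI network, extract centrality features, classify with a random forest) can be sketched roughly as follows. This is a minimal standard-library illustration under stated assumptions, not the thesis's implementation. The abstract does not name its six centrality metrics, so degree centrality, closeness centrality, and the clustering coefficient are assumed here as stand-ins. All function names, the tree depth, the number of trees, and the toy network with its labels are hypothetical.

```python
import random
from collections import Counter, deque

def build_graph(edges):
    """Undirected PPI network as an adjacency dict: protein -> set of partners."""
    graph = {}
    for a, b in edges:
        if a == b:
            continue  # data cleaning: drop self-interactions
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def degree_centrality(graph, node):
    """Fraction of the other proteins that this protein interacts with."""
    return len(graph[node]) / (len(graph) - 1)

def closeness_centrality(graph, node):
    """Reachable-node count divided by the sum of BFS distances to them."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nb in graph[cur]:
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours that interact with each other."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return 2 * links / (k * (k - 1))

def features(graph, node):
    """Feature vector for one protein (three assumed metrics, not the thesis's six)."""
    return [degree_centrality(graph, node),
            closeness_centrality(graph, node),
            clustering_coefficient(graph, node)]

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(rows, labels, rng, n_try, depth=0, max_depth=3):
    """Decision tree that tries a random subset of features at every split."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: a class label
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        # Skip the largest value so that both sides of the split are non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            score = (len(left) * gini([labels[i] for i in left]) +
                     len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority  # sampled features were constant: no split possible
    _, f, t, left, right = best
    return (f, t,
            grow_tree([rows[i] for i in left], [labels[i] for i in left],
                      rng, n_try, depth + 1, max_depth),
            grow_tree([rows[i] for i in right], [labels[i] for i in right],
                      rng, n_try, depth + 1, max_depth))

def predict_tree(tree, row):
    while isinstance(tree, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the labelled proteins."""
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], rng, n_try))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Toy, made-up network and labels, purely for illustration (not thesis data).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E"), ("E", "F")]
graph = build_graph(toy_edges)
proteins = sorted(graph)
X = [features(graph, p) for p in proteins]
y = [1 if p in {"A", "D"} else 0 for p in proteins]  # hypothetical "key" labels
forest = fit_forest(X, y)
predictions = {p: predict_forest(forest, features(graph, p)) for p in proteins}
```

In practice one would more likely rely on established libraries, for example networkx for the centrality measures and scikit-learn's RandomForestClassifier for the model, and would evaluate on held-out proteins instead of the training set used in this toy run.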

Keywords/Search Tags:

Protein interaction network, Key protein, Machine learning, Random forest

Related items

1	Prediction Research Of Protein-Protein Interaction Based On Ensemble Of Support Vector Machine And Random Forest
2	Predicting Non-coding RNA-protein Interactions By Machine Learning
3	Research On Protein Complex Accurate Recognition Based On Machine Learning
4	Prediction Of Protein Structure Classes And Topology Analysis Of Protein Interaction Network Based On Support Vector Machine
5	Modeling And Analysis Of Arabidopsis Protein-protein Interaction Network
6	Research On Prediction Of Protein-Protein Interactions In Plants Based On Ensemble Learning
7	Research On Prediction Of Protein-protein Interactions Based On Deep Neural Network And Ensemble Learning
8	Research On Predicting Protein-protein Interactions Based On Machine Learning
9	Research On Machine Learning-based Protein-Protein Interaction Extraction
10	Relationship Between Prediction Results Of Machine Learning-based Protein-protein Interaction And Sample Repeatability