Prediction Of Neuropeptide Cleavage Sites Based On Random Forest

Posted on:2016-01-19

Degree:Master

Type:Thesis

Country:China

Candidate:S Chen

Full Text:PDF

GTID:2310330479454321

Subject:Software engineering

Abstract/Summary:

Neuropeptides are biologically active polypeptides whose biosynthesis is a complex process. An inactive peptide precursor becomes physiologically active only after cleavage and modification by specific enzymes. Accurate identification of the cleavage sites of the peptide precursor is therefore both a difficult problem and an active topic in research on neuropeptide biosynthesis, and it is of great importance to the pathological study and pharmaceutical research of various diseases and to the development of neuroscience and brain science. However, traditional experimental methods such as site-directed mutagenesis and mass spectrometry are usually time-consuming and labor-intensive, and their accuracy is rather low. In recent years, the explosive growth of biological data has brought both opportunities and challenges to biological pattern recognition problems. Bioinformatics methods based on machine learning can discover hidden knowledge or patterns in massive data quickly and accurately, and can make decisions by building appropriate models and applying learning algorithms; they have been widely applied in areas such as gene identification, site prediction, and protein structure prediction. Random forest is a classic ensemble learning algorithm, valued for its high accuracy, good noise tolerance, and resistance to overfitting, so it is chosen here to build a machine learning model that predicts neuropeptide cleavage sites.

The main work of this thesis is as follows: (1) Screening and sorting neuropeptide entries from the Swiss-Prot protein database and building a local neuropeptide database. (2) Classifying the neuropeptide data in the database by species, and obtaining the sample sets (training set and test set) through data preprocessing and feature extraction. (3) Implementing the random forest classification algorithm and using the dataset from a published article to train and test it; optimizing the parameters, analyzing and evaluating the experimental results, and comparing them with those of the original article to demonstrate the feasibility and validity of the approach. (4) Predicting on the test set and performing statistical analysis and evaluation of the algorithm's output.

The results indicate that the random forest model achieves higher accuracy and a higher Matthews correlation coefficient (MCC) than the related published article on the same datasets. It also performs well on the neuropeptide datasets established in this thesis. The ideas and methods described for the model are therefore practicable, and they offer practical reference value for improving the efficiency and accuracy of neuropeptide cleavage site prediction.
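The abstract does not specify how features are extracted or how MCC is computed, so the following is only a minimal sketch of one common approach, not the thesis's actual method. It assumes that candidate cleavage positions are basic residues (K or R), that each candidate is described by a fixed-length sequence window, and that windows are one-hot encoded before being passed to a classifier such as a random forest. The names `extract_window`, `one_hot`, `candidate_sites`, `mcc`, the padding symbol `X`, and the flank size are all hypothetical choices made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # hypothetical padding symbol for positions beyond the sequence ends


def candidate_sites(sequence):
    """Indices of K/R residues, assumed here to be candidate cleavage positions."""
    return [i for i, residue in enumerate(sequence) if residue in "KR"]


def extract_window(sequence, site, flank=4):
    """Return the residues from site-flank to site+flank, padded at the ends."""
    padded = PAD * flank + sequence + PAD * flank
    center = site + flank
    return padded[center - flank:center + flank + 1]


def one_hot(window):
    """Flatten a window into 20 indicator values per residue (padding is all zeros)."""
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when any marginal count is zero
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    precursor = "MKRSA"  # toy sequence, not real data
    for site in candidate_sites(precursor):
        window = extract_window(precursor, site, flank=2)
        print(site, window, sum(one_hot(window)))
    print(mcc(tp=8, tn=9, fp=1, fn=2))
```

Each encoded window would form one row of the training or test matrix, with a label indicating whether the site is cleaved. MCC is often preferred over plain accuracy for this kind of task because non-cleaved candidate sites typically outnumber cleaved ones.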

Keywords/Search Tags:

Neuropeptide, Cleavage site, Machine learning, Bioinformatics, Random forest

Related items

1	Prediction Of Neuropeptide Precursor And Its Cleavage Site Based On Machine Learning
2	Research On Cross-species M6A Modification Site Prediction Based On Deep Learning
3	Research On Geochemical Abnormity Identification Of Metric Learning And Random Forest
4	Diversity Analysis Of Viral Protease And Prediction Of Cleavage Site
5	Application Of Turbulence Modeling Based On Machine Learning
6	Research On Prediction Of Phosphorylation Modification Sites Based On Machine Learning
7	Comparative Study On Temperature Simulation In Loess Plateau Based On Different Machine Learning Methods
8	Studies On Prediction Of Selective Cleavage Sites And Cleavage Profile Of Proteasome Using VHSE Amino Acid Descriptor
9	A Machine Learning-based Investigation Into Important Determinants And Predictive Modelling Of Protease-specific Substrate Cleavage Targets
10	Research On The Random Forest Algorithms And Their Applications In Geophysical Exploration Interpretation