
Prediction Of Neuropeptide Cleavage Sites Based On Random Forest

Posted on: 2016-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: S Chen
Full Text: PDF
GTID: 2310330479454321
Subject: Software engineering
Abstract/Summary:
Neuropeptides are biologically active polypeptides whose biosynthesis is a complex process. An inactive peptide precursor becomes physiologically active only after it has been cleaved and modified by specific enzymes. Accurate identification of the cleavage sites of peptide precursors is therefore both a difficult problem and an active topic in research on neuropeptide biosynthesis, and it is important for pathological and pharmaceutical research on various diseases and for the development of neuroscience and brain science. However, traditional experimental methods such as site-directed mutagenesis and mass spectrometry are usually time-consuming and labor-intensive, and their accuracy is rather low.

In recent years, the explosive growth of biological data has brought both opportunities and challenges for biological pattern recognition problems. Bioinformatics methods based on machine learning can discover hidden knowledge or patterns in massive data quickly and accurately, and can make decisions by building an appropriate model and applying learning algorithms. Such methods have been widely applied to gene identification, site prediction, protein structure prediction, and other tasks. Random forest is a classic ensemble learning algorithm, valued for its high accuracy, good tolerance of noise, and resistance to overfitting. It is therefore chosen here to build a machine learning model for predicting neuropeptide cleavage sites.

The main work of this thesis is as follows:
(1) Screening and sorting neuropeptide entries from the Swiss-Prot protein database and building a local neuropeptide database.
(2) Classifying the neuropeptide data in the database by species, and obtaining the sample sets (training set and test set) through data preprocessing and feature extraction.
(3) Implementing the random forest classification algorithm, then training and testing it on the dataset used in a published article; optimizing the parameters, analyzing and evaluating the experimental results, and comparing them with those of the original article, which demonstrates the feasibility and validity of the approach.
(4) Predicting on the test set and performing statistical analysis and evaluation of the algorithm's output.

The results indicate that, on the same datasets used in the related published article, the random forest model achieves better accuracy and MCC (Matthews correlation coefficient) than reported there. It also performs well on the neuropeptide datasets built in this thesis. The ideas and methods behind the model are therefore practicable, and they offer a useful reference for improving the efficiency and accuracy of neuropeptide cleavage site prediction.
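The abstract does not give the details of feature extraction or evaluation, so the following is only a minimal sketch of what such a pipeline might look like, not the thesis's actual method. It assumes that candidate sites are the basic residues K and R, that each candidate is described by a fixed-length window of flanking residues, and that windows are one-hot encoded over the 20 standard amino acids. The names `extract_windows`, `one_hot`, `mcc`, the padding symbol `X`, and the `flank` parameter are all hypothetical. The MCC function follows the standard definition of the Matthews correlation coefficient for binary labels.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
PAD = "X"  # hypothetical padding symbol for positions beyond the sequence ends


def extract_windows(sequence, cleavage_positions, flank=4, candidates="KR"):
    """Return (window, label) pairs, one per candidate residue.

    cleavage_positions holds 0-based indices of residues after which
    cleavage occurs (an assumed convention). The label is 1 for those
    residues and 0 for every other candidate.
    """
    padded = PAD * flank + sequence + PAD * flank
    samples = []
    for i, residue in enumerate(sequence):
        if residue not in candidates:
            continue
        # Residue i sits at index i + flank in the padded string, so this
        # slice is centered on it with `flank` residues on each side.
        window = padded[i:i + 2 * flank + 1]
        label = 1 if i in cleavage_positions else 0
        samples.append((window, label))
    return samples


def one_hot(window):
    """Encode a window as a flat 0/1 vector, 20 entries per position.

    Padding symbols match no amino acid and so encode as all zeros.
    """
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy precursor sequence, invented purely for illustration.
    for window, label in extract_windows("AAKRGGKA", {3}, flank=2):
        print(window, label, sum(one_hot(window)))
```

The encoded vectors and labels could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`, and the predictions scored with `mcc`. MCC is a reasonable choice for this kind of task because non-cleaved candidate sites typically outnumber cleaved ones, which makes plain accuracy less informative.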
Keywords/Search Tags: Neuropeptide, Cleavage site, Machine learning, Bioinformatics, Random forest