Recognition of Cold-Stress Proteins Based on Machine Learning

Posted on: 2018-10-15
Degree: Master
Type: Thesis
Country: China
Candidate: Q Fu
GTID: 2310330533969817
Subject: Computer technology
Abstract/Summary:
Cold stress strongly affects living organisms, especially plants. Research on plant cold-stress proteins is useful for related biotechnology and for increasing crop yield. At present, plant stress proteins are identified mainly by manual means, which costs considerable time and money. To date, analysis of the existing data has found only 594 such proteins in the whole Arabidopsis protein database. Training on the existing data and predicting with machine learning methods can therefore provide data support for biological experiments, which makes the effort worthwhile.

In this thesis, the available data consist only of positive samples and the complete set of Arabidopsis protein sequences, and the latter contains many unknown cold-stress proteins. PU Learning (learning from positive and unlabeled data) is therefore the first choice for the task and, in theory, the most suitable approach. All proteins other than the known positives were treated as unlabeled data. Two popular PU Learning methods, PUCPI and LibD3C, were tried first, but the results were poor, with an accuracy of only about 50%. Conventional classification methods were then considered: the unlabeled data were treated as negative data, and the results improved with LibSVM. In addition, several feature extraction methods were used in the experiments, such as Pse-One, K-Skip-N-Gram, and Information Theory, as well as combinations of them, which improved the result to 80%. Finally, a new negative dataset was constructed that effectively reduces the number of positive samples hidden in the unlabeled data, and with it the accuracy improved to about 85%.

Once a way to predict cold-stress proteins had been found, the existing data were collated and a database website was developed. The website is written mainly in Java and uses popular Java web technologies such as Maven, Spring Boot, MyBatis, MySQL, and Vue. It provides browsing, full-text search (Lucene), sequence alignment (BLAST), and prediction of Arabidopsis cold-stress proteins.
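The feature-extraction and "unlabeled as negative" steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes one common reading of K-Skip-N-Gram with n = 2 (frequencies of ordered residue pairs separated by 0 to k skipped residues), and the names k_skip_bigram and build_training_set, as well as the toy sequences, are hypothetical. The resulting vectors would then be passed to a classifier such as LibSVM.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def k_skip_bigram(sequence, k=1):
    """Return a 400-dim frequency vector of residue pairs with 0..k skipped residues.

    Assumed definition (not taken from the thesis): for each skip in 0..k,
    count every pair (seq[i], seq[i + skip + 1]) and normalise by the total.
    """
    counts = [0] * len(PAIRS)
    total = 0
    seq = sequence.upper()
    for skip in range(k + 1):
        gap = skip + 1
        for i in range(len(seq) - gap):
            idx = PAIR_INDEX.get(seq[i] + seq[i + gap])
            if idx is not None:  # ignore pairs with non-standard residues (e.g. X)
                counts[idx] += 1
                total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [c / total for c in counts]


def build_training_set(positives, unlabeled, k=1):
    """Label known positives 1 and treat every unlabeled sequence as negative (0)."""
    features, labels = [], []
    for seq in positives:
        features.append(k_skip_bigram(seq, k))
        labels.append(1)
    for seq in unlabeled:
        features.append(k_skip_bigram(seq, k))
        labels.append(0)
    return features, labels


if __name__ == "__main__":
    # Toy sequences for illustration only, not real Arabidopsis proteins.
    X, y = build_training_set(["MKTAYIAK"], ["GAVLLG", "MSTNPKPQ"], k=1)
    print(len(X), len(X[0]), y)
```

Because some unlabeled sequences are in fact undiscovered positives, the 0 labels produced this way are noisy, which is the problem the new negative dataset mentioned in the abstract is meant to reduce.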
Keywords/Search Tags: Cold Stress Protein, Machine Learning, Prediction, Feature Extraction, Database