
Protein Solubility Prediction Based on SVM & Pattern Analysis of Hepatitis B Virus Mutations

Posted on: 2017-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: K Shen
GTID: 2310330512957480
Subject: Computer Science and Technology
Abstract/Summary:
Proteins are important biomacromolecules associated with human life activities. This thesis analyzes proteins from the perspective of computer science, in two parts. The first part extracts protein features through analysis and then uses an SVM to predict protein solubility from the primary structure of proteins. The second part explores the variation patterns of hepatitis B virus (HBV) amino acids using statistical pattern analysis and computer software.

A) In machine learning, features are the representation of the learning data. The result of feature selection has a direct impact on the classification accuracy and generalization performance of the classifier, so effective feature selection methods are very important. As one of the most widely used machine learning models, the SVM has important applications in text categorization, image recognition, bioinformatics, and other fields. As an indispensable component of all cells and tissues in living organisms, protein plays a decisive role in life activities. Whether a protein is soluble or insoluble determines whether it can function. Furthermore, a number of diseases are caused by changes in protein solubility, so understanding and studying protein solubility is clearly important. In this thesis, an SVM is trained, from the point of view of computational science, on the physical and chemical properties of protein amino acids and on the sequence features of proteins, and the trained model is then used to predict the solubility of new proteins. Compared with previous work, we obtained protein solubility features with better classification performance, as well as a better prediction model.

B) Hepatitis B is a widespread and seriously harmful infectious inflammatory disease that cannot be completely cured; only vaccination can prevent it. This thesis applies statistical analysis methods to the variation trends of amino acids in four HBV proteins in order to find their mutation patterns. We then use pattern analysis software, combined with amino acid sequence information, to analyze the antigen epitopes of these four proteins. The results identify antigen epitopes with relatively active mutation, which should be of practical help in the design of HBV drugs and vaccines. Based on the analysis of HBV protein mutations, we find that: a) among the four HBV proteins, DNA polymerase and surface protein have more mutations and have received more research attention; b) serine, threonine, and alanine mutate frequently, while tryptophan and methionine have low mutability; c) our analysis yields some active antigen epitopes in which mutations are much more frequent.
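The abstract does not describe how the amino acid mutation statistics were computed. As a rough illustration only, the kind of per-residue mutability comparison reported in finding b) could be sketched as below, assuming the variant sequences are already aligned to a reference of equal length. The function name `count_mutations` and the toy sequences are hypothetical and are not taken from the thesis or from real HBV data.

```python
from collections import Counter


def count_mutations(reference, variants):
    """Return, for each reference amino acid, the fraction of aligned
    positions at which a variant differs from the reference."""
    mutated = Counter()   # substitutions seen, keyed by reference residue
    observed = Counter()  # comparisons made, keyed by reference residue
    for seq in variants:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the same length")
        for ref_aa, var_aa in zip(reference, seq):
            if ref_aa == "-" or var_aa == "-":
                continue  # skip alignment gaps
            observed[ref_aa] += 1
            if var_aa != ref_aa:
                mutated[ref_aa] += 1
    return {aa: mutated[aa] / observed[aa] for aa in observed}


# Toy, made-up sequences for illustration only (not HBV data).
reference = "MSTAW"
variants = ["MSTAW", "MTTAW", "MSAGW", "MATAW"]

rates = count_mutations(reference, variants)
# Print residues from most to least frequently substituted.
for aa, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(aa, round(rate, 2))
```

A real analysis would also need a proper multiple sequence alignment and some normalization for how often each residue occurs. This sketch only shows the counting step.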
Keywords/Search Tags: Protein solubility, Feature selection, Support vector machine (SVM), Hepatitis B virus, Pattern analysis