
Research on Types of Protein Function Predictors Based on Sequence Features

Posted on: 2017-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: M J Hui
Full Text: PDF
GTID: 2310330512961294
Subject: Control theory and control engineering
Abstract/Summary:
Proteins carry out most of the work in cells and living organisms and are the physical expression of biological traits. In the post-genomic era, advances in biotechnology have produced explosive growth in the number of known protein sequences, and experimental methods alone can no longer meet the demand for studies of protein structure and function. To shorten the research cycle and save research funding, bioinformatics researchers increasingly rely on computational methods to predict biological properties of proteins. Protein research uses a number of specialized classification schemes, each with practical value in particular fields. As a branch of proteomics, protein classification has attracted growing attention in recent years. It is the premise and foundation for understanding protein structure and function, and it plays an important role in molecular biology, cell biology, pharmacology and medicine.

Building on previous research, this thesis focuses on three active problems: the prediction of enzyme catalytic sites, the study of DNA-binding proteins and the identification of antifreeze proteins. Structural information is valuable for predicting protein function, but because of the limits of current biological technology the structures of many proteins are unknown. Since the primary structure determines the tertiary structure and function, this study uses sequence-based methods to predict protein function. To build a numerical model of each sequence, we adopt feature extraction algorithms that fuse several kinds of amino acid sequence information, such as amino acid composition, the position-specific scoring matrix (PSSM), physicochemical properties, dipeptide composition, grey dynamic factors and so on. The resulting discrete model is simple, yet it contains rich physicochemical and genetic information.

The training sets are built from non-redundant data that have been strictly verified by biological experiments; only such data make the training set a sound basis for algorithm design. Because intelligent algorithms are robust, we use the fuzzy k-nearest neighbor method, the random forest method and a fusion algorithm to build predictors for enzyme catalytic sites, DNA-binding proteins and antifreeze proteins. The predictors are compared with existing methods in terms of accuracy, sensitivity, specificity, Matthews correlation coefficient and ROC index. We also establish an online predictor for each task and describe its use step by step. The predictors let biologists obtain results without dealing with the complexity of the algorithms: a user who submits a sequence in the required format receives the predicted value, which increases the practical value of the predictors. The algorithms designed in this thesis can also be applied to other related areas of protein research.
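
As a rough illustration of two of the sequence features named above (amino acid composition and dipeptide composition) and of the evaluation measures used for comparison, a minimal Python sketch follows. It is not the thesis implementation: the function names, the restriction to the 20 standard residues, and the choice to return 0.0 when a denominator is zero are assumptions made only for this example.

```python
from itertools import product
from math import sqrt

# Assumption: only the 20 standard residues are counted; other symbols are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 residue frequencies in the order of AMINO_ACIDS."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def dipeptide_composition(seq):
    """Return 400 frequencies of adjacent residue pairs in the order of DIPEPTIDES."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[p] / total for p in DIPEPTIDES]


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient
    computed from the four cells of a binary confusion matrix."""
    n = tp + tn + fp + fn
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / n if n else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    example = "MKVLAAGIC"  # hypothetical sequence, for illustration only
    features = amino_acid_composition(example) + dipeptide_composition(example)
    print(len(features))  # 20 + 400 features per sequence
    print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
```

Concatenating the two vectors gives a fixed-length representation of a sequence of any length, which is what classifiers such as k-nearest neighbor or random forest require as input.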
Keywords/Search Tags: proteins, digital model, feature extraction, intelligent algorithm, predictor