Research On Sequence Information Extraction Methods And Subcellular Location Prediction Of Proteins

Posted on: 2019-06-22
Degree: Master
Type: Thesis
Country: China
Candidate: Q H Liu
GTID: 2370330548974398
Subject: Electronics and Communications Engineering
Abstract/Summary:
Since the human genome was sequenced, the number of protein sequences of unknown function and structure has been increasing rapidly. Identifying these proteins with traditional biological methods is time-consuming and costly in materials, and the processing speed lags far behind the growth in the number of proteins. To address this problem, this thesis improves two components of protein subcellular localization prediction: feature extraction and the classifier.

(1) A new, improved sequence feature extraction method is proposed. It is based on feature fusion: amino acid composition, entropy density, and autocorrelation coefficients are combined to form the feature representation. Feature vectors built this way contain the amino acid composition of the protein sequence, the physicochemical properties of the amino acids, and correlation information between amino acid residues, and therefore express the protein sequence better. The widely recognised support vector machine (SVM) is selected as the classifier. Jackknife cross-validation is carried out on two data sets, and the prediction results are compared with those of traditional methods to demonstrate the effectiveness of the new method.

(2) An improved pseudo amino acid composition method is proposed, and on that basis the feature-fusion sequence feature extraction method and the classifier are further improved. The new pseudo amino acid composition contains richer protein physicochemical properties, and the improved feature fusion method combines it with the information extracted from the PSSM matrix to form the feature representation. This method not only combines amino acid composition, the physicochemical properties of amino acids, and the interactions between amino acid residues, but also introduces protein evolutionary information, so the protein information it contains is more diverse. Using the features extracted by the new method, multiple support vector machines are connected in parallel to build an integrated classifier, which improves prediction performance and achieves the desired prediction results.
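The feature fusion described in (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the exact definitions, so the entropy term (the per-residue Shannon contribution -f·log2 f), the use of the Kyte-Doolittle hydrophobicity scale for the autocorrelation, the maximum lag of 5, and all function names are assumptions made only for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydrophobicity scale. Choosing this particular
# physicochemical property is an assumption, not taken from the thesis.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(seq):
    """Frequency of each of the 20 standard amino acids in seq."""
    n = len(seq)
    return [seq.count(a) / n for a in AMINO_ACIDS]


def entropy_terms(freqs):
    """Per-residue Shannon contribution -f*log2(f); 0 when f == 0.

    Assumed stand-in for the 'entropy density' named in the abstract.
    """
    return [-f * math.log2(f) if f > 0 else 0.0 for f in freqs]


def autocorrelation(seq, max_lag):
    """Mean product of property values of residues `lag` positions apart."""
    values = [KYTE_DOOLITTLE[a] for a in seq]
    n = len(values)
    return [
        sum(values[i] * values[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def fused_features(seq, max_lag=5):
    """Concatenate the three feature groups into one vector (20 + 20 + max_lag)."""
    # Drop characters that are not standard residues (e.g. 'X').
    seq = "".join(a for a in seq.upper() if a in KYTE_DOOLITTLE)
    if len(seq) <= max_lag:
        raise ValueError("sequence is too short for the requested max_lag")
    aac = amino_acid_composition(seq)
    return aac + entropy_terms(aac) + autocorrelation(seq, max_lag)
```

Calling `fused_features` on a sequence returns a 45-dimensional vector under these assumptions. Such vectors could then be passed to an SVM and evaluated with leave-one-out (jackknife) testing, as the abstract describes.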
Keywords/Search Tags: Protein subcellular localization, Support vector machine, Pseudo amino acid composition, Position-specific scoring matrix, Integrated classifier