
Research On The Proteins Subcellular Localization Based On Repetitive Information Measurement And Convolutional Neural Networks

Posted on: 2019-11-09
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
GTID: 2370330545969229
Subject: Computer Science and Technology
Abstract/Summary:
A cell is composed of the cell membrane, nucleus, endoplasmic reticulum and other structures, which are called subcellular components or subcellular locations. The functions of each component are carried out by the proteins located in it. A protein must be transported to its correct subcellular location to function normally; otherwise, functional disorders and diseases can result. An accurate understanding of subcellular localization is therefore important for revealing the nature of protein function and of cellular life activity. At the same time, protein data are massive, multimodal, correlated and incomplete, which makes protein subcellular localization a challenging and active research topic in bioinformatics.

Research on protein subcellular localization involves three steps: feature extraction, classification prediction and algorithm evaluation. Feature extraction, the most critical step, analyzes protein sequences to extract their main features and construct feature vectors. Classification predicts the localization of each protein, taking the extracted feature vectors as the input to the classification algorithm. Algorithm evaluation determines which feature extraction methods and classification algorithms perform better. This thesis focuses on the feature extraction methods and classification algorithms used in protein subcellular localization. The work is as follows.

First, this thesis analyzes the shortcomings of traditional feature extraction methods and proposes three new feature extraction methods based on repetitive information: R-Dipeptide, I-PseAAC and PseAAC2. R-Dipeptide adds repetitive information via moving windows when extracting dipeptide features. I-PseAAC, which is based on R-Dipeptide, computes the differences in physicochemical properties between each residue and the residues that follow it; compared with PseAAC, I-PseAAC makes several adjustments to the order information. PseAAC2, also based on R-Dipeptide, computes the overall physicochemical properties of each residue, together with the products of each residue with the other residues, to reflect the differences among residues. The experiments show that the proposed methods are superior to traditional feature extraction methods in adding key repetitive information, extracting different order information and comparing the overall physicochemical properties among residues.

Second, a convolutional neural network (CNN) is introduced to predict subcellular localization. Because a CNN can automatically extract and generalize features, it can re-extract and refine the features produced by the feature extraction methods and so improve the prediction of subcellular localization. Compared with other classifiers, such as the MLKNN and SVM algorithms, the CNN gives better predictions.

Third, the first-order gradient descent training algorithm of the CNN is improved. Experiments comparing how quickly the mean square error decreases show that the CNN trained with second derivatives reduces the error faster than the CNN trained with first derivatives. Together, the research on feature extraction methods and classification algorithms improves prediction.
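As an illustration of the kind of dipeptide feature the abstract refers to, the following is a minimal Python sketch of the traditional 400-dimensional dipeptide composition. The abstract does not define R-Dipeptide, so the optional `max_gap` parameter is only a hypothetical windowed variant (it also counts residue pairs separated by a few positions) and should not be read as the thesis's method. The names `dipeptide_composition`, `max_gap`, `AMINO_ACIDS` and `INDEX` are assumptions introduced here.

```python
from itertools import product

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs and their position in the feature vector.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence, max_gap=0):
    """Return a 400-dimensional vector of normalized residue-pair frequencies.

    max_gap=0 counts adjacent pairs only (classic dipeptide composition).
    max_gap>0 also counts pairs separated by up to max_gap residues; this is
    a hypothetical windowed variant, NOT the thesis's R-Dipeptide definition.
    """
    counts = [0.0] * len(DIPEPTIDES)
    total = 0
    n = len(sequence)
    for i in range(n):
        for gap in range(max_gap + 1):
            j = i + gap + 1
            if j >= n:
                break
            pair = sequence[i] + sequence[j]
            # Skip pairs containing non-standard residue codes (e.g. 'X').
            if pair in INDEX:
                counts[INDEX[pair]] += 1
                total += 1
    if total == 0:
        return counts
    return [c / total for c in counts]


if __name__ == "__main__":
    features = dipeptide_composition("MKTAYIAKQR")
    print(len(features), sum(features))
```

A vector of this kind could serve as the input to a classifier such as an SVM or a CNN; the specific features and network architecture used in the thesis are not given in the abstract.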
Keywords/Search Tags:research on the subcellular localization, R-Dipeptide, I-PseAAC, PseAAC2, convolutional neural networks