A Multimodal Deep Architecture For Large-scale Protein Ubiquitylation Site Prediction

Posted on: 2019-05-31 | Degree: Master | Type: Thesis
Country: China | Candidate: L L Bao | Full Text: PDF
GTID: 2370330563453721 | Subject: Computer system architecture
Abstract/Summary:
Protein post-translational modification (PTM) is an important chemical modification rather than mere decoration: it changes the properties of a protein by hydrolyzing the protein or by adding a modifying group to one or more amino acids. In 1975, Goldstein et al. discovered ubiquitin, a small protein of 8.5 kDa consisting of 76 amino acids. The attachment of a single ubiquitin or a poly-ubiquitin chain to a specific lysine of a protein is an important post-translational modification known as ubiquitylation (also called ubiquitination). Studying protein ubiquitylation is of great significance for understanding the expression of genetic information and various diseases.

Existing methods fall into two main categories: biological experiments and computational methods. Experimental approaches such as mass spectrometry and ChIP-chip take a lot of time and effort, and the instruments are expensive to purchase. Computational methods have emerged to address these problems and are now widely used. Many existing computational methods rely on feature engineering, where missing or redundant features bias the results and weaken the discriminative ability of the model, so they place high demands on feature extraction and selection. With a large number of biological features available, it is hard to determine which features or combinations of features help the model. For this reason, attention has turned to deep learning, which uses multi-layer networks and non-linear mappings to detect potentially complex patterns in a data-driven way, especially on large-scale data, and which can learn appropriate feature representations from the raw data.

In this dissertation, we propose a multimodal deep architecture for predicting protein ubiquitylation sites. First, the Protein Lysine Modifications Database (PLMD) is used as the underlying database, since it covers the most complete ubiquitylation data available so far. We select all protein ubiquitylation sites from PLMD. To ensure the accuracy of the results, we randomly divide the data into training, test and validation sets, with the validation set amounting to 30% of the training set. Next, in view of the data and the characteristics of ubiquitylation, different data modalities are extracted with sliding windows from three aspects: protein sequence information (one-of-K encoding), physicochemical properties, and evolutionary conservation information (position-specific scoring matrix, PSSM). A bootstrap strategy is then used to deal with the extreme imbalance between positive and negative samples. A different network structure is designed for each data modality, and each network is trained separately. Finally, the three networks are integrated into one model, which is fine-tuned and then used for prediction.

The experimental results show that the deep learning algorithm learns features on its own and can extract more effective feature information. Both the sensitivity and the specificity of our model improve, and the model performs best among the methods compared: accuracy is 66.43%, sensitivity is 66.7%, specificity is 66.4% and MCC is 0.221. Comparison with other algorithms further demonstrates the effectiveness and robustness of our method.
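
The data-preparation and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The window half-width, the padding symbol `X`, the treatment of padding as an all-zero vector, and the reading of "bootstrap" as resampling negatives with replacement until they match the number of positives are all assumptions made for illustration, and every function name is hypothetical. Only the one-of-K sequence modality is shown; the physicochemical and PSSM modalities would be built from the same windows.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumed padding symbol for positions beyond the sequence ends


def lysine_windows(sequence, half_width):
    """Return (position, window) pairs centred on every lysine (K)."""
    padded = PAD * half_width + sequence + PAD * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i sits at index i + half_width in the padded string,
            # so the window of width 2 * half_width + 1 starts at index i.
            windows.append((i, padded[i:i + 2 * half_width + 1]))
    return windows


def one_hot(window):
    """One-of-K encoding: one 20-dimensional indicator vector per residue.

    Padding and non-standard residues become all-zero vectors (assumption).
    """
    return [[1 if residue == aa else 0 for aa in AMINO_ACIDS]
            for residue in window]


def balanced_bootstrap(positives, negatives, rng):
    """Keep all positives and draw, with replacement, equally many negatives."""
    sampled = [rng.choice(negatives) for _ in positives]
    return list(positives) + sampled


def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # MCC is conventionally set to 0 when the denominator vanishes.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    for position, window in lysine_windows("MKTAK", half_width=2):
        print(position, window, len(one_hot(window)))
    print(balanced_bootstrap(["p1", "p2"], ["n1", "n2", "n3"], rng))
    print(confusion_metrics(tp=1, fp=1, tn=1, fn=1))
```

MCC is reported alongside accuracy because it stays informative when the classes are unbalanced, which is the situation the abstract describes for ubiquitylation sites.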
Keywords/Search Tags: PTM, Ubiquitylation site, Deep learning, Multiple Modalities