A Study On Predicting Lysine Acetylation Sites Based On Modular Dense Convolutional Blocks

Posted on: 2022-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z L Yan
GTID: 2480306542481134
Subject: Computer technology
Abstract/Summary:
Lysine acetylation (Kace) of proteins is one of the most important types of post-translational modification (PTM). It is involved in various physiological activities of cells and is closely related to biological processes such as DNA repair and cell signal transduction. The dynamic regulation of Kace in the organism is an important condition for the normal progress of various biological functions, whereas abnormal modification leads to various diseases, such as diabetes, cancer and neurodegenerative disease. Identifying Kace sites is therefore of great significance for studying the mechanism of acetylation and the pathological processes of related diseases.
Existing Kace site identification methods mainly comprise traditional experimental techniques and computational prediction methods. Traditional experimental techniques are usually the gold standard for Kace site identification, but they are costly and experimentally complex, and they cannot readily yield a large amount of site modification information in a short time. Computational prediction methods model the potential characteristics of Kace sites from protein sequence characteristics, amino acid characteristics and other information, and generate many high-confidence candidate Kace sites, thereby guiding experimental verification and reducing the cost of large-scale site identification. However, most existing computational prediction methods use protein sequence-level information as input and do not consider protein structural properties comprehensively. Furthermore, they focus only on high-level features. These limitations result in serious information loss and weaken the prediction of Kace sites. To address these problems, we use deep learning to study the prediction of Kace sites. The main innovations and contents are as follows:
(1) Protein structural properties contain highly useful local and global structural information, which provides a powerful basis for identifying PTMs. We therefore use the output of SPIDER3 to encode the protein structural properties of the peptides containing the sites, and combine them with the original protein sequence and the amino acid physicochemical properties to construct the feature space of potential sites.
(2) Existing Kace site prediction methods do not consider the richer protein structural properties, and during feature extraction, information crosstalk occurs among the three types of information: the protein structural properties, the original protein sequence and the amino acid physicochemical properties. To address these shortcomings, we propose a Kace site prediction model based on a modular convolutional neural network (CNN), called MC-Kace. MC-Kace uses three CNN modules to extract high-level representations of Kace sites from these three types of information, so that it takes protein structural properties into account while effectively avoiding crosstalk between different types of information. The experimental results show that MC-Kace predicts potential Kace sites well.
(3) MC-Kace learns features strictly layer by layer, from low to high, which ignores both the reusability of low-level features and the fact that different features differ in importance. We therefore propose a deep learning model based on modular dense convolutional blocks (MDC) for Kace site prediction, called MDC-Kace. MDC-Kace uses dense convolutional blocks to attend to low-level and high-level features simultaneously, achieving feature reuse and reducing information loss. A squeeze-excitation (SE) layer is introduced to weight the feature maps and enhance the flow of acetylation information through the network. The experimental results show that MDC-Kace effectively improves the prediction accuracy of Kace sites and is quite competitive with existing methods.
(4) For the convenience of users, we use Django, a Python web framework, to develop the MDC-Kace web service platform. Through this platform, users can input protein sequences to predict Kace sites.
In summary, our work makes full use of protein structural properties, the original protein sequence and amino acid physicochemical properties, and uses MDC to extract features. While automatically learning high-level representations of Kace sites, MDC-Kace achieves feature reuse and effectively predicts potential Kace sites. These results will provide useful site modification information for research on the disease processes of related metabolic diseases and for the development of therapeutic drugs.
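Contribution (3) rests on two mechanisms: dense connections, where each layer receives the concatenation of all earlier feature maps, and squeeze-excitation, where each channel is rescaled by a learned gate. The following is a minimal pure-Python sketch of both ideas, not the MDC-Kace implementation. The function names (`conv_layer`, `dense_block`, `squeeze_excite`), the toy one-kernel convolution, the growth rate of one channel per layer, and all weights are illustrative assumptions; the abstract does not specify layer sizes or hyperparameters.

```python
import math


def conv_layer(channels, kernel):
    """Toy 1D convolution with zero padding and ReLU (hypothetical helper).

    All input channels are summed and filtered with a single kernel,
    so each layer contributes exactly one new channel (growth rate 1).
    """
    length = len(channels[0])
    summed = [sum(ch[i] for ch in channels) for i in range(length)]
    half = len(kernel) // 2
    out = []
    for i in range(length):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < length:  # positions outside the sequence count as zero
                acc += w * summed[j]
        out.append(max(0.0, acc))  # ReLU
    return [out]


def dense_block(channels, kernels):
    """Dense connectivity: every layer sees all earlier feature maps."""
    features = list(channels)
    for kernel in kernels:
        # Concatenate the new channel instead of replacing the old ones,
        # so low-level features stay available to later layers.
        features = features + conv_layer(features, kernel)
    return features


def squeeze_excite(channels, w1, w2):
    """Squeeze-excitation: rescale each channel by a gate in (0, 1).

    w1 has shape (hidden, C) and w2 has shape (C, hidden); both are
    assumed weight matrices given as nested lists.
    """
    # Squeeze: global average pooling, one scalar per channel.
    squeezed = [sum(ch) / len(ch) for ch in channels]
    # Excitation: fully connected + ReLU, then fully connected + sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every position of a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, channels)]


# Usage: one input channel of length 4, two dense layers, then SE weighting.
x = [[1.0, 2.0, 3.0, 4.0]]
features = dense_block(x, kernels=[[0.0, 1.0, 0.0], [0.5, 0.0, 0.5]])
reweighted = squeeze_excite(
    features,
    w1=[[1.0, 1.0, 1.0]],        # 1 hidden unit, 3 input channels
    w2=[[0.0], [0.0], [0.0]],    # zero weights give a gate of 0.5 per channel
)
```

In this sketch the input channel survives unchanged as the first channel of the block output, which is the feature reuse the abstract refers to. The SE gates then decide how strongly each channel is passed on. A real model would learn the kernels and gate weights during training, typically in a deep learning framework.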
Keywords/Search Tags: lysine acetylation, protein structural properties, modular network, dense convolutional blocks, feature learning