Research On Splice Site And Protein Interaction Prediction Based On Deep Learning Network

Posted on: 2020-11-19, Degree: Master, Type: Thesis
Country: China, Candidate: Y Yao
GTID: 2370330575954473, Subject: Computer Science and Technology
Abstract/Summary:
The central dogma of molecular biology includes DNA transcription, RNA translation and protein expression. Alternative splicing is a key step during transcription, and its correct execution depends largely on the precise identification of splice sites. Incorrect splice site identification often leads to disease. Alternative splicing allows a gene to produce a variety of transcript variants, which in turn produce proteins with different biological functions. Protein mutations or abnormal protein interactions may also lead to disease, including cancer. Given the importance of splice site and protein interaction prediction in biology, this thesis focuses on deep learning methods for both problems. The main contents are as follows.

1. Splice site prediction and splicing pattern analysis are crucial to understanding the transcription process in genes. Although existing computational approaches have achieved great success in classifying true and false splice sites, many of them rely on hand-crafted features, and their models are difficult to interpret. To address these challenges, we report a deep learning-based framework, DeepSS, which consists of a DeepSS-C module that classifies splice sites and a DeepSS-M module that detects splice site sequence patterns. Experimental results show that the DeepSS-C module is more accurate than state-of-the-art algorithms on six public donor/acceptor splice site datasets. In addition, to explore how the deep learning model reaches its predictions, the model interpretation module visualizes the convolutional features and shows how abstract features are extracted from the bottom layers to the top. Model interpretation and downstream analysis include: 1) motif detection; 2) convolution kernel analysis; 3) splicing pattern exploration.

2. After alternative splicing, genes are transcribed into mRNAs, which direct the production of the corresponding proteins. The various life activities of living organisms depend mainly on proteins and their correct interactions. The study of protein interactions therefore helps to understand regulatory mechanisms in living organisms and facilitates drug discovery and disease control. At present, most computational methods for protein interaction prediction use a two-stage process: they first extract features from protein sequence and structure information and then classify with traditional machine learning methods. However, manually extracted features have many disadvantages, such as the need for highly specialized domain knowledge. Worse, the extracted features cannot fully represent the biological properties of the protein sequence, which limits the gains of state-of-the-art models. In this work, a new method is designed to generate protein sequence representations. Word2vec is a word embedding technique that has been successful in many NLP applications and can describe the relationships between neighboring words. Considering that a Word2vec model can obtain high-quality feature representations in a data-driven manner, and that a deep belief network can automatically extract features from high-dimensional, large-scale data, we report a protein interaction prediction method based on Word2vec and deep belief networks. The proposed method is evaluated on S. cerevisiae and human datasets and on five independent datasets. The experimental results show that the Word2vec method generates more precise protein sequence representations than the other feature extraction approaches, and that using a deep learning classifier helps in protein interaction prediction.

In summary, for splice site classification, this thesis exploits the ability of convolutional neural networks to extract deep features automatically, which eliminates the drawbacks of a manual feature extraction step. Using deep convolutional networks, it also explores the splice site patterns learned by the models, which addresses the shortcomings in model interpretation. For protein interaction prediction, a new sequence representation method is designed to capture residue interactions from the Swiss-Prot database and to transform each residue into a feature vector of fixed dimensionality. Finally, the method combines this feature representation with deep learning for classification.
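
The two input representations mentioned in the abstract can be illustrated with a small sketch. It is not the DeepSS or Word2vec/deep belief network implementation from the thesis, which this abstract does not describe in detail. The function names, the hand-written `gt_kernel`, the window sizes, and the example sequences are all hypothetical, and treating each residue as one Word2vec "word" is an assumption. The first part one-hot encodes a DNA window and scans it with a single convolution kernel, which is the basic operation behind convolution kernel analysis and motif detection. The second part lists the (center, context) residue pairs that a skip-gram Word2vec model would be trained on.

```python
# Illustrative sketch only; none of these names come from the thesis.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows ordered A, C, G, T."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_scan(encoded, kernel):
    """Slide a (width x 4) kernel along the sequence; one score per start position."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        scores.append(score)
    return scores


def skipgram_pairs(protein, window=2):
    """List (center, context) residue pairs, as consumed by a skip-gram model."""
    pairs = []
    for i, center in enumerate(protein):
        lo = max(0, i - window)
        hi = min(len(protein), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, protein[j]))
    return pairs


# Hypothetical kernel that rewards the canonical donor dinucleotide "GT".
# A trained network would learn its kernels from data instead.
gt_kernel = [
    [0.0, 0.0, 1.0, 0.0],  # first position prefers G
    [0.0, 0.0, 0.0, 1.0],  # second position prefers T
]

dna = "CAGGTAAG"  # made-up example window
scores = conv1d_scan(one_hot(dna), gt_kernel)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores[best])  # 3 2.0

print(skipgram_pairs("MKV", window=1))
# [('M', 'K'), ('K', 'M'), ('K', 'V'), ('V', 'K')]
```

The highest score marks the position where the kernel's preferred pattern occurs, which is why inspecting learned kernels can reveal sequence motifs. The residue pairs show how neighboring context enters the embedding, so that each residue can later be mapped to a fixed-length vector.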
Keywords/Search Tags: Splice Site, Protein-protein Interaction (PPI), Deep Learning, Word2vec