Deep-learning Based Identification And Functional Annotation Of LncRNA And Research Of Associations Of LncRNAs And Cancers

Posted on: 2020-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: G L An
GTID: 2404330599452362
Subject: Bioinformatics
Abstract/Summary:
In eukaryotic transcriptome data, only about 2% of sequences are translated into proteins, while up to 70% of the human genome is transcribed into non-coding RNA; the higher the complexity of the organism, the higher the proportion of non-coding transcripts. The role of non-coding transcripts has therefore attracted growing attention. In particular, long non-coding RNA (lncRNA) has become a research hotspot in recent years. LncRNAs are a class of non-coding transcripts longer than 200 nucleotides. They are involved in many important cellular processes, and changes in the expression of certain transcripts can greatly alter cellular activity and lead to disease. There is increasing evidence that lncRNAs play a role in the development and progression of cancer. The lncRNA expression profile of cancer cells differs significantly from that of normal cells, and it also differs between cancer cells at different stages. Understanding how lncRNAs regulate their targets and what role they play in diseases such as cancer is therefore an important research direction. At present, identifying lncRNAs from transcriptome data, understanding the functions of specific lncRNAs, and demonstrating their role in cancer remain major challenges.

Meanwhile, machine learning has developed rapidly and has become an effective approach to learning and inference. Deep learning is a branch of machine learning and one of the most active research trends. It can analyze large amounts of complex data, find latent relationships within the data, and extract features layer by layer. Deep learning is computationally expensive: processing complex data takes a long time and requires a large amount of memory. Improvements in hardware such as CPUs and GPUs have made its widespread use practical. Deep learning has been applied to image processing, speech recognition, and other fields, where it plays an important role. It has also been applied to biomedicine. For example, many highly accurate deep-learning-based methods have been proposed for identifying functional elements and sites and for extracting features from medical images. The application of deep learning is therefore of great significance to biomedical research and represents a breakthrough over traditional methods. With the rapid development of high-throughput sequencing technology, transcriptome data has accumulated quickly, providing an important data foundation for building an lncRNA identification system based on deep learning.

In this thesis, we identify lncRNAs from transcripts using a deep learning method, develop the Lnc2Catlas database to quantify the associations between lncRNAs and cancers, and develop the LIVE database to explore the binding and regulation networks of experimentally validated associations between lncRNAs and cancers. The work consists of the following parts.

First, we review and compare existing lncRNA identification methods based on deep learning and machine learning. We find that these methods generally require prior knowledge such as sequence conservation, and that computing the selected features is time-consuming. In addition, the input sequence is divided into short segments, which easily introduces noise or loses information, affects the features the model learns, and results in low accuracy. We examine the two most commonly used deep learning models, convolutional neural networks and recurrent neural networks, and explore the mathematical operations of both. Convolutional neural networks are used because they suit the characteristics of sequence data. We use the DeepSEA model to extract epigenetic features of the sequence and recurrent neural networks to extract sequence features. Based on the sequence features and epigenetic features, we distinguish lncRNAs from protein-coding RNAs in the transcriptome data of the test set. Our models achieve high accuracy and generalize well.

Second, building on lncRNA identification, we explore how lncRNAs contribute to the development and progression of cancer. Current research on the relationship between lncRNAs and cancer relies on experimental verification and computational prediction. Only a few interactions between lncRNAs and cancer have been experimentally validated; computational methods infer the relationships mainly through machine learning models and by integrating lncRNA-miRNA and miRNA-cancer interactions. We quantify the associations between lncRNAs and the corresponding cancers through SNPs, proteins, and genes. We use RNAsnp, Global Score, and WGCNA to evaluate, respectively, SNP-induced changes in lncRNA secondary structure, lncRNA-protein interactions, and co-expression networks. Based on these data, we developed the Lnc2Catlas database, which lets users query the cancers that may be associated with an lncRNA and provides candidate lncRNAs for further experimental validation.

Finally, we developed the LIVE database to help researchers explore the relationship between lncRNAs and cancer. Earlier databases extracted candidate lncRNAs from experimental validation and computational prediction studies, and current databases focus on the specific functional roles of lncRNAs, but the lncRNA-cancer interaction network contained in the literature has not been fully revealed. To build LIVE, we first searched the PubMed database and developed a word segmentation system to preprocess the abstracts, extract keywords such as species, experiment type, and lncRNA, and classify the literature according to these keywords. We then curated the experimentally validated interactions between lncRNAs and cancers. Based on these manually labeled interactions, we constructed the LIVE (LncRNA Interaction Validated Encyclopedia) database. LIVE divides the validated relationships into three types of networks: a binding interaction network, a regulation network, and a disease association network. Combining these three networks gives further insight into the different types of functional regulatory elements and interactions contained in the lncRNA interaction network.

In summary, this thesis focuses on "deep-learning based identification and functional annotation of lncRNA and research of associations of lncRNAs and cancers". A hybrid-architecture lncRNA recognition algorithm is developed to identify lncRNAs from transcripts using only their sequences, and the Lnc2Catlas database is developed to quantify the associations between lncRNAs and cancer. The LIVE database provides manually labeled associations between lncRNAs and cancers and a complete lncRNA-cancer interaction network, which helps to further reveal the potential relationship between lncRNAs and cancer and to explore the role of lncRNAs in cancer therapy.
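
As a rough illustration of what feeding raw sequences to a convolutional or recurrent network can involve, the sketch below one-hot encodes a transcript into a fixed-length matrix. It is a minimal sketch and not the thesis's implementation: the function name `one_hot_encode`, the four-letter alphabet, the mapping of T to U, and the zero-padding and truncation to a fixed length are all assumptions made for the example.

```python
from typing import List

# Assumed alphabet for the example; DNA-style "T" is mapped to "U".
ALPHABET = "ACGU"


def one_hot_encode(seq: str, length: int) -> List[List[int]]:
    """Encode a nucleotide sequence as a `length` x 4 one-hot matrix.

    Sequences longer than `length` are truncated; shorter ones are
    padded with all-zero rows. Unknown bases (for example "N") also
    become all-zero rows.
    """
    seq = seq.upper().replace("T", "U")[:length]
    rows: List[List[int]] = []
    for base in seq:
        row = [0] * len(ALPHABET)
        idx = ALPHABET.find(base)
        if idx >= 0:  # leave the row all-zero for unknown bases
            row[idx] = 1
        rows.append(row)
    # Pad with all-zero rows up to the fixed length.
    rows.extend([0] * len(ALPHABET) for _ in range(length - len(rows)))
    return rows


if __name__ == "__main__":
    for row in one_hot_encode("ACGTN", 6):
        print(row)
```

Encoding the whole transcript in one matrix, rather than cutting it into short segments, is one way to avoid the segment-boundary information loss mentioned above, at the cost of choosing a fixed maximum length.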
Keywords/Search Tags: deep learning, lncRNA, cancer, interaction, database