Deep Learning-Based Method for Identification of Mutation Loci and Association with Disease

Posted on: 2022-08-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Liu
GTID: 2504306731978029
Subject: Computer technology
Abstract/Summary:
With the rapid development of modern sequencing technology, the time and difficulty of biological sequencing are steadily decreasing, and bioinformatics has entered the era of omics and big data. However, current studies based on genomic data face the following challenges. First, according to the central dogma, the regions that can be translated into protein account for less than 3% of the total genomic sequence, and the remaining sequences, which are not translated directly, also contain rich functional regions that can affect biological traits. At present, research on non-coding regions is not deep enough at home or abroad, and there is no comprehensive and effective epigenetic feature space for explaining the influence of non-coding regions. Second, traditional manual methods for processing short reads, although accurate and effective, cannot cope with the hundreds of millions of reads produced by high-throughput sequencing in the era of biological big data; traditional deep learning models achieve good results in a specific feature space, but their performance declines once the feature space is expanded. Therefore, designing and implementing an algorithmic model that adapts well to a large-scale epigenetic feature space has important scientific reference value and clinical analysis value for biological research and disease association analysis. In view of these difficulties, this paper constructed a large-scale epigenetic feature space and proposed a deep learning-based mutation site recognition method. Finally, gene sequences were mapped into the epigenetic space using the constructed feature space model, so as to link functional elements with diseases. The specific research work is as follows:

(1) Construction of a large-scale epigenetic feature space: Based on data from projects such as ENCODE (Encyclopedia of DNA Elements), modENCODE, and Roadmap, the functional DNA annotation data were successively built into a large-scale epigenetic feature space through format normalization, file merging, interval de-duplication, fragment sorting, sequence mapping, and other operations.

(2) Functional element prediction model with multi-dimensional feature extraction: Within a convolutional neural network model, a frequency-division feature extraction mechanism was proposed to replace the original convolutional filter in the feature extraction operation, and a new deep learning model adapted to sequence input, DeepMSA, was designed by integrating a multi-head attention mechanism into the information update step. The human reference genome hg19 provided by NCBI (National Center for Biotechnology Information) was used to train and validate the model. The results show that the AUROC of the DeepMSA model is higher than that of the other models by 0.03~0.05 and 0.02~0.04.

(3) Design and implementation of the epigenetic feature prediction system: To support researchers in the field and to enable use in clinical studies, we completed the design and implementation of an epigenetic feature prediction system. The implementation is as follows: a user-friendly interactive interface was developed with CSS and HTML; a persistence layer storing the sequence-to-feature-space mapping was built on MySQL; caching middleware was built on an in-memory database, which effectively improves the system's response time and its handling of concurrent requests; a distributed architecture and a guard mechanism are used to improve the availability of the system; a master-slave mechanism is used to ensure the disaster tolerance, durability, and recoverability of the system; a B+ tree index is used to improve the search efficiency of the database; and a visual management platform was written in Vue so that users can monitor and manage the system.

The deep learning-based mutation locus recognition method proposed in this paper balances the accuracy and completeness of read prediction in a high-dimensional feature space. It offers a useful reference for research on sequence processing and provides a new idea for research on other omics data.
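The construction steps listed in item (1) (format normalization, file merging, interval de-duplication, fragment sorting) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the BED-like three-column input, the `Interval` record, the feature labels, the function names, and the choice to merge overlapping intervals of the same feature are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple


class Interval(NamedTuple):
    """Hypothetical record for one annotated region (BED-style coordinates assumed)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    feature: str


def parse_bed_line(line: str, feature: str) -> Interval:
    """Format normalization: keep the first three columns, unify chromosome names."""
    chrom, start, end = line.split()[:3]
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    return Interval(chrom, int(start), int(end), feature)


def merge_sources(sources: Dict[str, Iterable[str]]) -> List[Interval]:
    """File merging: pool the records of every annotation source into one list."""
    return [
        parse_bed_line(line, feature)
        for feature, lines in sources.items()
        for line in lines
        if line.strip() and not line.startswith(("#", "track", "browser"))
    ]


def dedupe_and_sort(intervals: Iterable[Interval]) -> List[Interval]:
    """Drop exact duplicates, then sort by feature, chromosome, start, end.

    Chromosome names sort lexicographically here (chr10 before chr2).
    """
    return sorted(set(intervals),
                  key=lambda iv: (iv.feature, iv.chrom, iv.start, iv.end))


def merge_overlaps(sorted_intervals: Iterable[Interval]) -> List[Interval]:
    """Assumed extra step: fuse overlapping intervals that share feature and chromosome."""
    merged: List[Interval] = []
    for iv in sorted_intervals:
        last = merged[-1] if merged else None
        if (last is not None and last.feature == iv.feature
                and last.chrom == iv.chrom and iv.start <= last.end):
            merged[-1] = last._replace(end=max(last.end, iv.end))
        else:
            merged.append(iv)
    return merged


# Toy input: two made-up annotation tracks with a duplicate and an overlap.
sources = {
    "DNase": ["chr1\t100\t200", "1\t150\t250", "chr1\t100\t200"],
    "H3K4me3": ["chr2 10 20", "chr1 300 400"],
}
space = merge_overlaps(dedupe_and_sort(merge_sources(sources)))
for iv in space:
    print(iv)
```

On the toy input, the duplicated DNase record is dropped and the two overlapping DNase regions on chr1 are fused into one, leaving three intervals. Real annotation files would additionally need the sequence mapping step the abstract mentions, which this sketch does not attempt.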
Keywords/Search Tags: Deep learning, Non-coding region DNA, Genomics, Disease association analysis