
Research On CRISPR/Cas9 Off-target Prediction Method Based On Deep Learning

Posted on: 2022-10-23
Degree: Master
Type: Thesis
Country: China
Candidate: D Chen
GTID: 2480306731977829
Subject: Computer technology
Abstract:
CRISPR/Cas9, the third generation of gene-editing technology, is currently the most promising and most versatile tool for gene manipulation. However, recent studies have found that the Cas9 nuclease sometimes cleaves sequences that are similar to, but not identical with, the target sequence. This phenomenon is known as an off-target effect, and its unpredictability can have unforeseen consequences for the organism being edited. If computational tools can predict off-target sites before biological experiments are carried out, researchers can both prepare for adverse effects in advance and screen reagents against the predictions to avoid dangerous outcomes. To address this problem, this thesis proposes two CRISPR/Cas9 off-target prediction methods: DNA-BERT, based on self-supervised learning, and H-VAE, based on VAE data augmentation. The specific research work is as follows.

(1) DNA-BERT, a CRISPR/Cas9 off-target prediction method based on self-supervised learning. This method starts from large amounts of unlabeled genomic data. Pre-training tasks let the model learn from unlabeled genomic data in a self-supervised manner, and biological information is incorporated during self-supervised learning to alleviate the shortage of labeled data. Building on the text pre-training framework BERT, this thesis proposes DNA-BERT, a pre-trained model for DNA sequences. Experimental results on a public dataset show that DNA-BERT, a self-supervised framework for extracting genetic information, improves every evaluation metric over DeepCRISPR, the best existing model: ROC-AUC and the Spearman correlation coefficient rise by 13.07% and 21.47%, respectively. The gains are largest on AUC-PR and the weighted Spearman correlation coefficient, the metrics that best reflect performance on imbalanced data, which rise by 52.8% and 162%, respectively, over DeepCRISPR. These results show that the model can effectively predict CRISPR/Cas9 off-target sites and that additional pre-training also helps alleviate class imbalance.

(2) H-VAE, a few-shot data augmentation method based on variational autoencoders. First, because existing models are weak at extracting base-pair matching information, a deep learning framework based on pair encoding is proposed, which lets the model make full use of sgRNA-DNA base-pair matching information. The same encoding can also handle unexpected mismatch types. Second, because the class distribution is extremely imbalanced, model training is very unstable; to counter this, the thesis proposes a few-shot data augmentation method based on a variational autoencoder. Experimental results show that H-VAE achieves the best results across the various types of datasets tested. On the Mismatch dataset, averaged over three test scenarios, it raises ROC-AUC and PR-AUC by 13.87% and 51.45%, respectively, over DeepCRISPR. In the new Indels dataset test scenario, it raises ROC-AUC and PR-AUC by 1.53% and 31.89%, respectively, over CRISPR-Net. These results show that H-VAE improves off-target prediction in a range of scenarios, and the PR-AUC gains in particular show that it substantially improves classification on imbalanced data.
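The abstract does not specify how DNA-BERT turns DNA into tokens or which pre-training task it uses. As a rough illustration only, the sketch below assumes overlapping k-mer tokens (the scheme used by the separately published DNABERT model) and BERT's standard masked-language-model corruption. `kmer_tokenize` and `mask_tokens` are hypothetical helpers, not code from the thesis.

```python
import random

BASES = "ACGT"


def kmer_tokenize(seq, k=3):
    """Split a DNA sequence into overlapping k-mers with stride 1 (assumed tokenization)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style masked-LM corruption (hypothetical helper).

    Each position is chosen as a prediction target with probability mask_prob.
    A chosen token becomes "[MASK]" 80% of the time, a random k-mer 10% of the
    time, and is left unchanged 10% of the time. labels[i] holds the original
    token at chosen positions and None elsewhere (None = not scored by the loss).
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                masked[i] = "[MASK]"
            elif r < 0.9:
                masked[i] = "".join(rng.choice(BASES) for _ in range(len(tok)))
            # else: keep the original token in place
    return masked, labels


# A 20-nt sequence yields 18 overlapping 3-mers.
tokens = kmer_tokenize("GAGTCCGAGCAGAAGAAGAA", k=3)
masked, labels = mask_tokens(tokens)
```

Because neighbouring k-mers overlap, a single masked 3-mer can often be reconstructed from the tokens on either side, which is why DNABERT masks contiguous spans of k-mers instead; the abstract does not say whether DNA-BERT does the same.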
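The pair encoding in part (2) is described only as exposing sgRNA-DNA base-pair matching to the model and coping with unusual mismatch types. One plausible reading, sketched here purely as an assumption, treats each aligned position as a single (sgRNA base, DNA base) symbol over {A, C, G, T, -}, where `-` marks a gap so that bulges (insertions and deletions) fit the same scheme. It also assumes the sgRNA is written in the same 5'->3' sense as the target DNA, so a match shows up as identical letters. `PAIR_INDEX`, `pair_encode` and `mismatch_positions` are hypothetical names.

```python
ALPHABET = "ACGT-"  # '-' marks a gap (bulge) in the sgRNA-DNA alignment

# Every (sgRNA, DNA) symbol pair except gap-gap gets its own one-hot slot: 24 in total.
PAIR_INDEX = {
    pair: i
    for i, pair in enumerate(
        (a, b) for a in ALPHABET for b in ALPHABET if not (a == "-" and b == "-")
    )
}


def pair_encode(sgrna, dna):
    """One-hot encode an aligned sgRNA-DNA pair, one row per position (hypothetical scheme).

    Uracil in the sgRNA is rewritten as T so that matching bases share a letter.
    Returns len(sgrna) rows, each a 24-element one-hot vector.
    """
    sgrna = sgrna.upper().replace("U", "T")
    dna = dna.upper()
    if len(sgrna) != len(dna):
        raise ValueError("sgRNA and DNA must be aligned to the same length")
    rows = []
    for a, b in zip(sgrna, dna):
        row = [0] * len(PAIR_INDEX)
        row[PAIR_INDEX[(a, b)]] = 1  # KeyError for gap-gap or non-ACGT symbols
        rows.append(row)
    return rows


def mismatch_positions(sgrna, dna):
    """Positions where the aligned symbols differ (base mismatch or bulge)."""
    sgrna = sgrna.upper().replace("U", "T")
    return [i for i, (a, b) in enumerate(zip(sgrna, dna.upper())) if a != b]


# A G/C mismatch at position 2 and a DNA bulge (gap in the sgRNA) at position 4.
rows = pair_encode("GAGU-CCG", "GACTACCG")
```

Giving every pair its own slot, rather than encoding the two sequences separately, lets the first layer of a network see "which base faces which" directly at each position, which is the kind of matching information the abstract says existing models under-use.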
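The abstract states that H-VAE enriches the scarce off-target (positive) class with a variational autoencoder but gives no architecture. The framework-free sketch below only illustrates the generic VAE machinery such an approach relies on: the reparameterised latent sample, the KL term of the loss, a Bernoulli reconstruction term suited to one-hot inputs, and an augmentation loop that decodes latent samples drawn around real positives. `encode` and `decode` are hypothetical stand-ins for a trained encoder and decoder, and none of this is the thesis's actual H-VAE.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1) per dimension and sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a single example."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def bernoulli_nll(x, x_hat, eps=1e-7):
    """Reconstruction loss for binary (e.g. one-hot) inputs; x_hat holds probabilities."""
    total = 0.0
    for xi, p in zip(x, x_hat):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(p) + (1.0 - xi) * math.log(1.0 - p)
    return total


def negative_elbo(x, x_hat, mu, logvar, beta=1.0):
    """Per-example VAE training loss: reconstruction + beta * KL (beta=1 is the plain VAE)."""
    return bernoulli_nll(x, x_hat) + beta * kl_to_standard_normal(mu, logvar)


def augment_minority(positives, encode, decode, n_new, seed=0):
    """Generate n_new synthetic minority-class examples (hypothetical helper).

    For each new example: pick a real positive, encode it to (mu, logvar),
    draw a latent sample around it and decode. Continuous decoder outputs
    would still need discretising (e.g. argmax per position) before use.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        mu, logvar = encode(rng.choice(positives))
        synthetic.append(decode(reparameterize(mu, logvar, rng)))
    return synthetic
```

In a real model these pieces would be written in an autodiff framework so that gradients flow through `reparameterize` during training; the plain-Python version only shows the arithmetic.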
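Both parts of the abstract rely on PR-AUC (AUC-PR) rather than ROC-AUC to show gains on imbalanced data. The toy example below, with invented scores rather than any result from the thesis, shows why: with one true off-target site among 100 candidates, ranked fifth, ROC-AUC still looks excellent while PR-AUC is low. `roc_auc` uses the rank-comparison definition and `average_precision` the common step-wise estimate of PR-AUC; both are illustrative helpers, and in practice a library implementation such as scikit-learn's `roc_auc_score` and `average_precision_score` would normally be used.

```python
def roc_auc(labels, scores):
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise PR-AUC: mean precision at the rank of each positive.

    Assumes no score ties between positives and negatives.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Toy imbalanced set: 1 true off-target site among 100 candidates, ranked 5th.
labels = [0, 0, 0, 0, 1] + [0] * 95
scores = [0.99, 0.98, 0.97, 0.96, 0.95] + [0.5] * 95
print(round(roc_auc(labels, scores), 3))            # 0.96 (95 of 99 negatives ranked below)
print(round(average_precision(labels, scores), 3))  # 0.2  (precision 1/5 at the only positive)
```

ROC-AUC is dominated by the 95 easy negatives, whereas PR-AUC only looks at how the positives are ranked against everything above them, so it moves sharply when a model pushes true off-target sites up the list.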
Keywords: Deep Learning, Self-supervised Learning, CRISPR/Cas9, Off-target Prediction