Deep Learning-based Algorithm Research And Tool Development For Cas9 And Variant sgRNA Activity Prediction

Posted on: 2023-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: X W Nie
GTID: 2530306842965089
Subject: Animal breeding and genetics and breeding
Abstract/Summary:
CRISPR/Cas9-mediated targeted genome modification is widely used in porcine functional gene screening and gene-editing breeding, and mutated and modified Cas9 variants offer lower off-target activity and broader PAM recognition than the wild type. However, algorithms that accurately predict sgRNA activity for Cas9 and its variants are lacking, so a new algorithm for this task is urgently needed. To address this, this study collected sgRNA activity datasets for SpCas9 and eight Cas9 variants (eSpCas9(1.1), HypaCas9, evoCas9, SpCas9-VRQR, Sniper-Cas9, SpCas9-HF1, SpCas9-NG, and xCas9) and used both LightGBM and attention-mechanism algorithms to resolve how sgRNA sequence features and base composition affect the activity of Cas9 and its variants. This led to a new algorithm for sgRNA activity prediction (sgRscore) and new software for sgRNA activity prediction and off-target assessment (sgRNAcas9-AI) for Cas9 and its variants, which can be widely used for designing sgRNAs in mammals. The results are as follows:

(1) Using the sgRNA activity dataset and a multilayer perceptron (MLP), we compared sgRNA activity prediction accuracy for six input sequence lengths (20, 22, 24, 26, 28 and 30 nt, obtained by adding target flanking sequences) under three sequence encoding methods: label encoding, one-hot encoding and two-nucleotide encoding. Adding the target flanking sequences improved prediction accuracy by 20% on average for SpCas9, xCas9 and SpCas9-NG.

(2) Five neural networks were compared on the sgRNA activity dataset of Cas9 and its variants: fully connected neural network (FNN), convolutional neural network (CNN), recurrent neural network (RNN), gated recurrent unit network (GRU), and long short-term memory network (LSTM). The LSTM algorithm had the highest sgRNA activity prediction accuracy.

(3) Feature importances extracted with the LightGBM algorithm were used to evaluate how the base composition at different sgRNA positions affects the activity of Cas9 and its variants. For four variants (evoCas9, HypaCas9, Sniper-Cas9 and xCas9), the influence of base composition on sgRNA activity was distributed evenly across all positions. For SpCas9, eSpCas9(1.1) and SpCas9-HF1, the position-specific base composition patterns were highly similar to one another, whereas those of the other variants differed substantially.

(4) Using the sgRNA activity dataset, a deep learning algorithm based on the attention mechanism was constructed, and the attention weight coefficients of the trained model were extracted to evaluate the position-dependent nucleotide preference of sgRNA sequences. Base T in the sgRNA sequence was found to reduce the sgRNA activity of Sniper-Cas9, while base C was found to reduce the sgRNA activity of SpCas9-VRQR. Further, after stacking the LSTM algorithm on the attention-based model and extracting the second-order preference matrix, sgRNA activity was found to be affected by interactions between adjacent bases of the sgRNA sequence.

(5) Based on the position-dependent nucleotide preference rules of sgRNA activity for Cas9 and its variants, a new sgRNA activity prediction algorithm, sgRscore, was developed and compared with six representative sgRNA activity prediction algorithms: DeepCas9, DeepSpCas9, DeepSpCas9variants, CNN-SVR, C_RNNCrispr and DeepHF. sgRscore had higher prediction accuracy on the sgRNA activity dataset as well as on the Chuai2018 independent dataset.

(6) The new software sgRNAcas9-AI, which serves the sgRNA design needs of Cas9 and its variants, was then developed and compared with four representative software packages: CRISPRpick, CHOPCHOP, E-CRISP and CRISPOR. sgRNAcas9-AI had higher prediction accuracy on the Kim2020-NBE-lenti293T dataset. Compared with three sgRNA off-target evaluation tools (CRISPRseek, Cas-OFFinder and Off-Spotter), sgRNAcas9-AI calculated off-targets faster. Experimental validation showed that the sgRNA activity predicted by sgRNAcas9-AI was positively correlated with experimentally measured sgRNA activity (0.77), indicating that the software predicts sgRNA activity with high accuracy.

In conclusion, based on the sgRNA activity datasets of Cas9 and its variants, this study used deep learning methods to elucidate the position-dependent nucleotide preference patterns that govern sgRNA activity for Cas9 variants, developed an algorithm for sgRNA activity prediction of Cas9 and its variants, and then developed new software for sgRNA design with Cas9 and its variants, providing a new tool for functional gene research and gene-editing breeding in pigs.
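
As a rough illustration of result (1), the three sequence encodings and the flank-extended input lengths can be sketched in Python. This is a minimal sketch and not the thesis code: the function names, the A/C/G/T ordering, the overlapping-pair form of the two-nucleotide encoding, and the symmetric placement of flanking bases are all assumptions made for the example.

```python
from itertools import product

BASES = "ACGT"  # assumed base ordering for this sketch
LABEL = {b: i for i, b in enumerate(BASES)}
# 16 dinucleotides indexed in lexicographic order: AA=0, AC=1, ..., TT=15
DINUC = {"".join(p): i for i, p in enumerate(product(BASES, repeat=2))}


def label_encode(seq):
    """Label encoding: one integer (0-3) per base."""
    return [LABEL[b] for b in seq.upper()]


def one_hot_encode(seq):
    """One-hot encoding: one 4-element indicator vector per base."""
    return [[1 if b == base else 0 for base in BASES] for b in seq.upper()]


def dinucleotide_encode(seq):
    """Two-nucleotide encoding: one integer (0-15) per overlapping base pair."""
    s = seq.upper()
    return [DINUC[s[i:i + 2]] for i in range(len(s) - 1)]


def with_flanks(context, start, target_len=20, flank=0):
    """Return the target plus `flank` bases on each side.

    The symmetric window is a hypothetical choice for illustration only.
    """
    lo, hi = start - flank, start + target_len + flank
    if lo < 0 or hi > len(context):
        raise ValueError("not enough flanking sequence in context")
    return context[lo:hi]


# Hypothetical 40-nt genomic context with a 20-nt target starting at index 10.
context = "ACGT" * 10
inputs = [with_flanks(context, 10, flank=f) for f in range(6)]
lengths = [len(s) for s in inputs]  # 20, 22, 24, 26, 28, 30 nt
```

Each encoded sequence would then be fed to the same model (an MLP in the study) so that accuracy can be compared across encodings and input lengths.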
Keywords/Search Tags: Gene editing, Deep learning, Activity prediction, Algorithm, Software