
Research and Implementation of a Spark-Based Off-Target Prediction Algorithm for the CRISPR System

Posted on: 2020-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Liu
GTID: 2370330599459593
Subject: Information and Communication Engineering
Abstract/Summary:
Genome editing technology plays an important role in gene function research, the improvement of species traits, and disease research, and it has become an active research topic. The CRISPR system is currently the most promising genome editing tool. However, its off-target effect may destroy DNA fragments at unknown locations. Predicting off-target sites across the whole genome in advance, so that these risks can be avoided, is therefore an important guide for designing and applying a safe and effective CRISPR system. Existing off-target prediction algorithms for the CRISPR system are not very efficient, and genome-wide prediction of off-target sites is time-consuming. This thesis proposes a new off-target prediction algorithm called Spark-OFFinder. The algorithm applies the FM-index to off-target prediction and, by using the Spark distributed computing framework, can run concurrently on a Spark cluster. An FM-index file is first generated for the reference genome sequence, and its contents are compressed so that the index can be loaded completely into memory, which improves read efficiency. On top of the FM-index, Spark-OFFinder designs a partial fuzzy matching algorithm that searches the reference genome sequence for off-target sites of the CRISPR system, and it applies several optimizations that reduce the search space and improve efficiency. The algorithm is then parallelized with the MapReduce programming model and implemented on the Spark distributed computing framework, so that it can be distributed across a Spark cluster to further improve running efficiency. Finally, Spark-OFFinder is compared with the widely used off-target prediction tool Cas-OFFinder, and the results of Spark-OFFinder are completely correct. In a standalone environment, Spark-OFFinder is much faster than Cas-OFFinder. In a cluster environment, the controlled-variable method (changing one factor at a time) is used to test the influence of the length of the reference genome, the number of sgRNA sequences, and the maximum number of allowed mismatches. The test results show that, in the cluster environment used in this thesis, Spark-OFFinder runs much faster than Cas-OFFinder under different input conditions and achieves speedups of hundreds or even thousands of times under some input conditions. In addition, the speed advantage of Spark-OFFinder is most pronounced when the reference genome sequence is long, the number of sgRNA sequences is large, and the maximum number of allowed mismatches is small. The algorithm also scales well: its running speed increases steadily as the cluster grows.
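To make the idea of mismatch-tolerant search over an FM-index concrete, here is a minimal Python sketch. It is not the thesis's implementation: the function names `build_fm_index` and `search_with_mismatches` are hypothetical, the suffix array is built naively (suitable only for short demo strings), PAM handling and index compression are omitted, and the pruning rule (abandon a branch once the mismatch budget is exceeded) is only one simple way to reduce the search space.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index: suffix array, C table, and Occ table."""
    text += "$"  # sentinel; sorts before A, C, G, T
    # Naive suffix array construction (quadratic memory; demo only).
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: character preceding each sorted suffix.
    bwt = [text[i - 1] for i in sa]
    alphabet = sorted(set(text))
    # C[c] = number of characters in the text strictly smaller than c.
    counts = Counter(text)
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, C, occ


def search_with_mismatches(index, pattern, max_mismatches):
    """Return sorted (position, mismatches) pairs for all Hamming-distance hits."""
    sa, C, occ = index
    hits = set()

    def extend(pos, lo, hi, mismatches):
        # [lo, hi) is the suffix-array interval matching pattern[pos+1:]
        # (with substitutions); pos is the next pattern index to consume.
        if pos < 0:
            for k in range(lo, hi):
                hits.add((sa[k], mismatches))
            return
        for c in "ACGT":
            if c not in C:
                continue
            cost = mismatches + (c != pattern[pos])
            if cost > max_mismatches:
                continue  # prune: mismatch budget exceeded
            new_lo = C[c] + occ[c][lo]
            new_hi = C[c] + occ[c][hi]
            if new_lo < new_hi:  # interval non-empty: keep extending
                extend(pos - 1, new_lo, new_hi, cost)

    extend(len(pattern) - 1, 0, len(sa), 0)
    return sorted(hits)


if __name__ == "__main__":
    index = build_fm_index("ACGTACGTTACGAACGT")
    for position, mismatches in search_with_mismatches(index, "ACGT", 1):
        print(position, mismatches)
```

Because each query only reads the index, many sgRNA sequences can in principle be searched independently against a shared index, which is the kind of workload that a map step over the query set can distribute.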
Keywords/Search Tags:CRISPR system, Off-target prediction, FM-index algorithm, Spark