Research On Integrative Analysis Technologies For CRISPR Screening Data

Posted on: 2019-03-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y B Cui
GTID: 1360330611492941
Subject: Software engineering
Abstract/Summary:
With the completion of the Human Genome Project, the focus of life science research has shifted from structural genomics to functional genomics. Functional genomics describes gene (and protein) functions and interactions from the viewpoint of molecular biology. The CRISPR/Cas9 system targets DNA through complementary base pairing, which makes it stable, easy to use, and cost-efficient. Because of these advantages, CRISPR/Cas9 has been widely used in genome editing. CRISPR screening applies the CRISPR/Cas9 system to genome-wide genetic screening, allowing systematic analysis of the relationship between genes and phenotypes. It is an important tool in functional genomics and has promoted the discovery of essential genes, cancer gene vulnerabilities, and potential drug targets. However, both the experiments and the analysis of CRISPR screening can still be improved: the off-target rate of some CRISPR screening libraries is high; the sequencing process affects the identification of essential genes; and there is no good resource for gene regulatory network analysis in the functional analysis of essential genes identified by CRISPR screening. We propose a series of technologies for the integrative analysis of CRISPR screens. We design and implement a highly efficient parallel algorithm for detecting CRISPR off-target sites, a series of normalization methods, a database of CRISPR screens, and a database of single-gene perturbation transcriptomes. The main contents of the dissertation are as follows:

BWT-based highly efficient and parallel CRISPR off-target detection algorithm. In the CRISPR/Cas9 gene editing system, a single guide RNA (sgRNA) directs the Cas9 nuclease to the target DNA region, and the target gene is edited by Cas9. The editing efficiency of the CRISPR/Cas9 system relies heavily on well-designed sgRNAs. However, both the sgRNA and the Cas9 protein tolerate mismatches of several bases when binding DNA, so the CRISPR/Cas9 system can damage DNA outside the target region, causing non-specific off-target variation that seriously degrades gene editing performance. In response to this problem, the second chapter of this dissertation designs and implements OffScan, an efficient parallel off-target search algorithm based on the BWT. OffScan is not limited by the number of sgRNA mismatches or by the PAM. The backward search algorithm based on the FM index reduces space complexity from O(n^2) to O(n) while maintaining the exact search time complexity of O(n). OffScan also implements a fuzzy search algorithm based on restricted traversal, reducing the time complexity of fuzzy search from O(|Q||X|) to O(|Q|^2), where |Q| is the length of the query string and |X| is the length of the original string. Moreover, OffScan parallelizes the search algorithm on multi-core processors and implements parallel I/O with a three-stage pipeline to improve data throughput. In addition, we designed a high-specificity sgRNA screening method based on OffScan. In our tests, it finds more potential off-target sites and improves sgRNA specificity.

Normalization methods for CRISPR screen data. The primary goal of CRISPR screening data analysis is to identify the genes whose perturbation leads to a phenotype change under certain screening conditions, relative to a predefined control condition. However, because library size and sequencing depth vary between samples, read counts from different samples cannot be compared directly. Cells exposed to different conditions (for example, with or without a drug) may have different proliferation rates. Comparing cells with a shorter doubling time to more slowly proliferating cells may bias essential gene identification. In addition, the screening outcome is biased in regions with high copy number variation (CNV) levels, because the Cas9 nuclease can induce multiple double-strand breaks and strong DNA damage responses in these regions, leading to G2 arrest. In Chapter 3, we propose a series of data normalization methods, including read count normalization with negative control genes or non-essential genes, beta score normalization with essential genes, and copy number bias correction based on piecewise linear regression, which systematically correct CRISPR screening data from three aspects. Validation on screening data shows that our normalization methods correct these biases efficiently. In addition, we have incorporated these methods into MAGeCK and MAGeCK-VISPR to improve essential gene identification.

Gene regulatory network analysis methods based on single-gene perturbation data. After the key genes have been identified by CRISPR high-throughput screening, their functions need to be analyzed to determine the cellular pathways they belong to and the roles they play. Existing analytical methods rely mainly on gene ontology and gene set enrichment analysis to study the roles and pathways of key genes, and they lack data resources and methods for analyzing gene regulatory relationships. To solve this problem, the fourth chapter of this dissertation proposes a gene regulatory network analysis method based on single-gene perturbation data. We integrated 15260 sets of single-gene perturbation expression profiles and 5864 sets of corresponding ChIP-seq data, and from these data we constructed a gene co-expression association network and a gene transcriptional regulation network. To facilitate the analysis of gene regulation, we also designed and implemented a public database, SIGMA (http://www.sigmagene.cn/), which integrates the gene regulatory network analysis method. SIGMA supports online interactive gene regulatory network analysis, including gene differential expression analysis, transcription factor target gene analysis, gene upstream regulatory element analysis, and gene regulation relationship analysis.

Essential gene analysis techniques for cancer cells based on massive high-throughput screening data. Identifying and studying cancer-specific essential genes can facilitate the understanding of cancer cell survival pathways as well as the discovery of potential therapeutic targets. Although many studies have used high-throughput screening techniques to study particular cancer-specific essential genes, no work has integrated these data into a systematic study of essential genes across a variety of cancers. The fifth chapter of this dissertation proposes an essential gene analysis technique for cancer cells based on massive high-throughput screening data. We integrated nearly 7000 sets of high-throughput screening data, such as CRISPR and RNAi screens, covering human cell lines, mouse cell lines, and in vivo experiments. These data were carefully proofread and unified in order to systematically analyze the essential genes specific to various cancers. To facilitate data query and analysis, we also designed and implemented a public database, CRISP-view (http://crisp-view.cistrome.org/), which integrates the essential gene analysis technique. CRISP-view supports online analysis of key genes such as proto-oncogenes, tumor suppressor genes, and essential genes for cancer cells, and it provides information on potential drug targets that can guide drug design and cancer treatment.
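As an illustration of the FM-index backward search mentioned in the OffScan paragraph above, the following is a minimal Python sketch of exact matching over a BWT. It is not the OffScan implementation: the function names `build_fm_index` and `backward_search` are hypothetical, the suffix array is built naively by sorting suffixes, and the occurrence table is stored in full rather than sampled, so it is only suitable for short demonstration strings. Mismatch-tolerant (fuzzy) search and PAM handling are not covered.

```python
def build_fm_index(text):
    """Build a toy FM index for `text` (hypothetical helper, demo only)."""
    text += "$"  # sentinel; sorts before A, C, G, T in ASCII order
    # Naive suffix array: sort suffix start positions by the suffix itself.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # c_table[ch] = number of characters in text strictly smaller than ch.
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total
        total += text.count(ch)
    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact matches of `pattern`."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    # Process the pattern right to left, narrowing the interval each step.
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))
```

For example, after `sa, c_table, occ = build_fm_index("ACGTACGTTACG")`, the call `backward_search("ACG", sa, c_table, occ)` returns `[0, 4, 9]`. Each pattern character costs two table lookups, so the number of search steps grows with the query length rather than with the length of the indexed text.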
Keywords/Search Tags: CRISPR/Cas9, CRISPR Screening, Off-target Effect, Single-gene Perturbation, Gene Regulatory Network, Essential Genes