Research And Implementation Of CRISPR Sequence Prediction And Visualization Analysis Methods

Posted on: 2022-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: F Yan
Full Text: PDF
GTID: 2480306317457914
Subject: Master of Engineering
Abstract/Summary:
The clustered regularly interspaced short palindromic repeats and CRISPR-associated protein (CRISPR-Cas) system is an adaptive immune system found in prokaryotes (including most archaea and about half of bacteria). Its main components include CRISPR direct repeat sequences (direct repeats, DRs), spacers, promoters, and a set of CRISPR-associated protein genes. Through the CRISPR-Cas system, prokaryotes can remember and recognize invading foreign genetic elements (such as phages and plasmids) and mount an adaptive immune response that blocks their invasion and attack. Gene editing technology derived from the CRISPR/Cas9 system has been widely used in genetic engineering and has played an important role in cancer and virus research. As an important part of the CRISPR-Cas system, the CRISPR sequence (comprising the repeat sequences and their spacer sequences), which is generally 24-47 bp in length, determines which gene fragments can be transcribed to activate immunity when prokaryotes are invaded by foreign genetic elements. However, a large number of CRISPR-Cas systems in prokaryotes have not yet been discovered or studied in depth. The effective identification and analysis of CRISPR sequences is therefore one of the important goals of current prokaryotic genome research.

Bioinformatics methods currently provide efficient technical means for identifying and predicting CRISPR sequences. Typical methods include PILER-CR, CRT, CRISPRFinder, CRISPRCasFinder, MinCED, CRISPRDetect, and CRISPRdisco. Other CRISPR databases (such as CRISPRDetect, CRISPRI, and CRISPRCasdb) also integrate some of these methods for CRISPR sequence identification. However, these prediction tools generally rely on traditional sequence identification methods and rarely give users convenient ways to observe, manipulate, and analyze the identified CRISPR sequences, which hinders further research and analysis.

In response to this problem, researchers generally use visualization methods to display CRISPR sequence identification results graphically, which helps in discovering and comparing the CRISPR-Cas systems of different strains. However, few visual analysis tools for CRISPR sequences currently exist, and their functionality is incomplete. In the past, researchers mainly used Excel macro programming to visualize CRISPR sequences, but this approach involves complex steps and lacks interactivity. Other tools (such as CRISPRcompar) can display only local alignment information for CRISPR at the sequence level, and only as binary images, which makes it difficult for users to observe and analyze CRISPR sequences at the global level. More recently, two software tools (CRISPRviz and CRISPRStudio) have provided visualization of CRISPR sequences. However, when they present complex CRISPR sequences, the combination of shapes and colors becomes overly complicated, and their inflexible operation and limited functionality do not support observation and in-depth analysis of CRISPR sequences by researchers.

To address these problems in the identification and visual analysis of CRISPR sequences, this thesis develops a visual analysis system, CrisprVi, which mainly provides the following functions:
(1) A CRISPR repeat sequence prediction method based on a convolutional neural network, which effectively improves the accuracy of DR sequence prediction;
(2) Graphical display, labeling, modification, and comparison of CRISPR repeat sequences and spacer sequences, so that users can more intuitively analyze and compare the spatial distribution of CRISPR sequences on the genome;
(3) Statistical analysis of CRISPR repeat sequences and spacer sequences, with automatically generated charts;
(4) A BLAST-based search for sequences similar to a given DR or spacer, with the results presented as a clustered heat map. Based on this, users can judge whether strains share a consensus sequence, which helps in detecting specific sequences.

In summary, the system provides a rich set of functions for predicting and analyzing CRISPR sequences, helping researchers to better discover and analyze the CRISPR-Cas systems of prokaryotes.
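
The repeat and spacer statistics mentioned in function (3) can be illustrated with a minimal sketch. It is not the CrisprVi implementation: it assumes the DR is already known and occurs as exact copies, and it uses plain string matching rather than the convolutional neural network described in the abstract. The function names `extract_spacers` and `spacer_length_stats` and the toy sequences are hypothetical and chosen only for illustration.

```python
def extract_spacers(sequence, repeat):
    """Locate exact, non-overlapping copies of `repeat` in `sequence`.

    Returns (starts, spacers): the 0-based start of each repeat copy and
    the sequences lying between consecutive copies.
    Assumption: repeats are exact; real arrays often contain mismatches.
    """
    if not repeat:
        raise ValueError("repeat must be a non-empty string")
    starts = []
    pos = sequence.find(repeat)
    while pos != -1:
        starts.append(pos)
        # Resume the search after the copy that was just found.
        pos = sequence.find(repeat, pos + len(repeat))
    spacers = [
        sequence[a + len(repeat):b]
        for a, b in zip(starts, starts[1:])
    ]
    return starts, spacers


def spacer_length_stats(spacers):
    """Summarize spacer lengths as count, min, max, and mean."""
    lengths = [len(s) for s in spacers]
    if not lengths:
        return {"count": 0, "min": None, "max": None, "mean": None}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }


if __name__ == "__main__":
    # Toy data, made up for this example (not taken from any genome).
    toy_repeat = "GTCGCACTCTTCATGGGTGCGTGG"
    toy_spacers = [
        "ATTACCGGATTAGCCATAGGACTTAACCGA",
        "TTGACATTGCAAGCTTACGATCAGGTA",
    ]
    toy_sequence = (
        "AAAA" + toy_repeat + toy_spacers[0]
        + toy_repeat + toy_spacers[1] + toy_repeat + "CCCC"
    )
    starts, spacers = extract_spacers(toy_sequence, toy_repeat)
    print(starts, spacers)
    print(spacer_length_stats(spacers))
```

A production tool would need to tolerate mismatches in repeat copies and to discover the repeat itself, which is the step the thesis assigns to its prediction method.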
Keywords/Search Tags: CRISPR, DR, Spacer, Prediction, Visualization