Bioinformatics Analysis and Identification of Anti-CRISPR Proteins

Posted on: 2021-01-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Dong
Full Text: PDF
GTID: 1360330647960773
Subject: Biomedical engineering
Abstract/Summary:
In 2013, Bondy-Denomy et al. discovered a kind of protein that helps bacteriophages escape damage from CRISPR-Cas systems. Such proteins exert their inhibitory function through binding or modification. Because they suppress the CRISPR-Cas system, Bondy-Denomy et al. termed them anti-CRISPR proteins (Acrs). This dissertation focuses on Acrs.

The work first collected Acrs from the literature and other Acr-associated information from public databases: species source and coding gene sequences from NCBI, protein-protein interactions from the STRING and DIP databases, and sequence similarity from the VFDB database. After organizing these data, the work constructed a comprehensive online database, Anti-CRISPRdb (http://cefg.uestc.cn/anti-CRISPRdb2), whose first release contained more than 400 records. As more Acr entries and Acr families were discovered, the work updated the initial release and added new features; the updated version can be accessed at http://cefg.uestc.cn/anti-CRISPRdb. The updated version includes more families and family members, for example 6 new Acr types that inhibit other CRISPR-Cas systems, and integrates the NCBI genome browser so that users can view the proteins surrounding Acrs. More structures were also added by comparing Acrs with all protein chains in the PDB database using sequence similarity search. The updated database now contains more than 320 structure records for Acrs and their interaction partners.

Based on Anti-CRISPRdb, this work further analyzed the characteristics of Acrs. The analysis shows that Acrs and non-Acrs have distinguishable features: Acrs are much shorter than non-Acrs and show more significant codon usage deviation. A large number of Acrs in prokaryotes are annotated as "hypothetical proteins" in NCBI. This work also found several evolutionary characteristics of Acrs: 1) most Acrs are located on genomic islands and prophage fragments, indicating horizontal gene transfer events; 2) the comparison of codon usage between Acrs and the whole genome indicates that Acrs were transferred into their host bacteria recently rather than at an early stage; 3) the continuous distribution of Acrs among evolutionarily related species indicates that some Acrs have recently transferred and expanded among those species. To quantify the possibility that a gene is located on a genomic island or prophage, this work defined an alignment-free parameter named dev, based on measuring the codon usage bias between Acrs and the whole genome.

Based on features derived from the genomic background and the evolutionary characteristics of Acrs, this work proposed a recognition algorithm built on a random forest model. In cross-validation, the method achieved an average accuracy of 99.75%, an average recall of 75.1%, and an average precision of 86.1%. In cross-species cross-validation, the method placed 71.4% of true Acrs in the top 10 ranks of the predicted entries, and the algorithm also accurately identified 4 bona fide Acrs from the newly identified Acr species. Based on this algorithm, the work designed a web service (http://cefg.uestc.cn/acrDetector) to screen potential Acrs and released a local version on GitHub; the tool is named AcrDetector. AcrDetector does not rely on sequence similarity search, so it can find novel Acrs. Because it relies on features derived from the genomic background, it can also complement tools that depend on sequence composition features.

AcrDetector can identify potential Acrs, but it cannot tell users which CRISPR-Cas systems a candidate may inhibit. Accurately identifying the CRISPR-Cas subtype carried by a candidate species is a key step in identifying the Acr type. For this reason, this work introduced the Markov graph clustering algorithm and a method of extending MCCS into the annotation of Cas proteins, cas loci, and locus types, which allows the method to identify fused Cas proteins and to delimit cas loci more accurately. This work also developed the CasLocusAnno tool, which uses a parallel algorithm when annotating Cas proteins, cas loci, and locus types. Tests of its execution efficiency and recognition ability show that CasLocusAnno completes annotation within 29 seconds, and within 27.5 seconds for most of the test dataset. Compared with CRISPRCasFinder, CasLocusAnno is 5% more accurate and has an additional prediction rate that is 1.4% lower.
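
The abstract describes dev only as an alignment-free measure of codon usage bias between a gene and the whole genome and does not give its formula. As a rough illustration of that idea, the sketch below compares a gene's codon frequencies with the pooled codon frequencies of a genome's coding sequences. The Euclidean distance and the function names (codon_counts, to_frequencies, codon_deviation) are assumptions made for this example, not the definition used in the dissertation.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 64 codons in a fixed order so frequency vectors line up.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_counts(seq):
    """Count in-frame codons, skipping any that contain non-ACGT letters."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    triplets = (seq[i:i + 3] for i in range(0, usable, 3))
    return Counter(t for t in triplets if t in CODON_SET)


def to_frequencies(counts):
    """Turn codon counts into a 64-element relative frequency vector."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid codons found")
    return [counts[c] / total for c in CODONS]


def codon_deviation(gene_seq, genome_cds_seqs):
    """Hypothetical stand-in for dev (ASSUMPTION: Euclidean distance).

    Larger values mean the gene's codon usage differs more from the
    genome background, as might be expected for a recently acquired gene.
    """
    gene_freq = to_frequencies(codon_counts(gene_seq))
    background = Counter()
    for cds in genome_cds_seqs:  # pool counts per CDS to keep reading frames
        background.update(codon_counts(cds))
    bg_freq = to_frequencies(background)
    return sqrt(sum((g - b) ** 2 for g, b in zip(gene_freq, bg_freq)))


# Toy sequences for illustration only; these are not real genes.
genome = ["ATGGCTGCTAAAGCTTAA", "ATGGCTAAAGCTGCTTAA"]
candidate = "ATGTTTTTTCCCTTTTAA"
print(codon_deviation(candidate, genome))
```

A score of this kind needs no sequence alignment, which matches the abstract's description of dev as alignment-free; a real implementation would still have to choose the distance measure and decide how to handle start and stop codons.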
Keywords/Search Tags: CRISPR-Cas system, anti-CRISPR proteins (Acrs), identification algorithm of Acrs, Cas protein annotation, cas loci annotation