
Prediction of the Inhibitory Ability of Acr Proteins

Posted on: 2022-11-27    Degree: Master    Type: Thesis
Country: China    Candidate: C Ma
GTID: 2480306764969179    Subject: Computer Software and Application of Computer
Abstract/Summary:
The CRISPR-Cas (clustered regularly interspaced short palindromic repeats and CRISPR-associated proteins) system is an adaptive immune system widely found in prokaryotes. By studying how it defends against foreign nucleic acids, researchers have developed efficient gene-editing tools such as CRISPR-Cas9 and CRISPR-Cpf1. In recent years, researchers have also validated a class of small proteins that can inhibit the CRISPR-Cas system: anti-CRISPR proteins (Acrs). On one side, genomic and sequence features have been used to discover new Acr families; on the other, Acrs have been applied in many fields through analysis of their three-dimensional (3D) structures and their inhibitory mechanisms. In this thesis, we predict the inhibitory activity of Acrs, hoping that this work can serve as a link between the discovery of new Acr families and the application of Acr mechanisms.

The prediction workflow is as follows. First, we built a 160-item dataset of Acr inhibitory ability from experimental data on previously validated Acrs, including plaque-based infection efficiency, in vitro DNA cleavage, mammalian genome-editing efficiency and other assays; it contains 78 positive and 82 negative samples. Second, based on Markov-chain correlation, we extracted features describing inhibitory strength from gene sequences and from protein sequences separately. After splitting the dataset 4:1 into a training set and a test set, we formed the final feature descriptors by feature selection with the t-test and RFE (recursive feature elimination) on the training set. Third, we trained SVM and XGBoost models and evaluated the resulting models on the test set. Comparing these approaches, the best model was XGBoost trained on protein-sequence features, which achieved the highest AUC, 0.841, with a variance of 0.0101 over five repetitions of five-fold cross-validation. We also applied the model to candidate Acrs newly collected in this thesis, suggesting that the XGBoost model is also applicable to practical prediction tasks.
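The abstract does not say exactly how the Markov-chain features are computed. One common formulation, assumed here rather than taken from the thesis, estimates first-order transition probabilities P(b | a) between adjacent symbols, giving 20 × 20 = 400 features per protein sequence (or 4 × 4 = 16 per nucleotide sequence). A minimal sketch; the function name `markov_transition_features` is hypothetical:

```python
from itertools import product
from typing import Dict

# The 20 standard amino acids; any other letter (X, B, U, ...) is skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def markov_transition_features(seq: str, alphabet: str = AMINO_ACIDS) -> Dict[str, float]:
    """Hypothetical first-order Markov feature extractor (an assumption, not the
    thesis code). Returns P(b | a) for every ordered pair (a, b) in `alphabet`:

        P(b | a) = count(a immediately followed by b) / count(a followed by anything)

    Transitions involving symbols outside `alphabet` are ignored; a symbol that
    never precedes another valid symbol gets probability 0 for all successors.
    """
    seq = seq.upper()
    pair_counts = {a + b: 0 for a, b in product(alphabet, repeat=2)}
    from_counts = {a: 0 for a in alphabet}
    # Count adjacent (a, b) transitions along the sequence.
    for a, b in zip(seq, seq[1:]):
        if a in from_counts and b in from_counts:
            pair_counts[a + b] += 1
            from_counts[a] += 1
    # Normalize each row of the transition matrix.
    return {
        pair: (count / from_counts[pair[0]] if from_counts[pair[0]] else 0.0)
        for pair, count in pair_counts.items()
    }
```

For a gene sequence the same function can be called with `alphabet="ACGT"`. Each row of the resulting transition matrix sums to 1 for symbols that have at least one successor, so sequences of different lengths yield comparable feature vectors.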
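The evaluation protocol (five repetitions of five-fold cross-validation, reported as a mean AUC and a variance) can be sketched with the Python standard library alone. This is an illustrative sketch, not the thesis code: the `fit_predict` callback is a hypothetical interface standing in for the SVM or XGBoost model, the folds are assumed to be stratified, and the reported variance is assumed to be the sample variance over all 25 per-fold AUCs.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = probability that a random positive outscores a random negative
    (ties count 1/2), i.e. the normalized Mann-Whitney U statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle indices within each class and deal them round-robin into k folds,
    so every fold keeps roughly the overall positive/negative ratio."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


# Hypothetical model interface: (X_train, y_train, X_test) -> scores for X_test.
FitPredict = Callable[[list, list, list], Sequence[float]]


def repeated_cv_auc(X: Sequence, y: Sequence[int], fit_predict: FitPredict,
                    k: int = 5, repeats: int = 5, seed: int = 0) -> Tuple[float, float]:
    """Run `repeats` rounds of stratified k-fold CV and return the mean and the
    sample variance of the k * repeats per-fold AUCs."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        for test_idx in stratified_folds(y, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            # Train on the other k-1 folds, then score the held-out fold.
            scores = fit_predict([X[i] for i in train_idx],
                                 [y[i] for i in train_idx],
                                 [X[i] for i in test_idx])
            aucs.append(roc_auc([y[i] for i in test_idx], scores))
    return statistics.mean(aucs), statistics.variance(aucs)
```

With 78 positive and 82 negative samples, each stratified test fold holds roughly 16 positives and 16 negatives, so every fold contains both classes and the per-fold AUC is always defined.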
Keywords/Search Tags: Anti-CRISPR Proteins, Feature Selection, Machine Learning, Inhibitory Prediction