
Prediction Of C2H2 Zinc Finger Protein And Transcription Factors Based On Support Vector Machine

Posted on: 2024-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z Liu
Full Text: PDF
GTID: 2530307139985039
Subject: Biophysics
Abstract/Summary:
Transcription factors bind to specific nucleotide sites upstream of genes and thereby influence transcription. Their regulatory role is important in eukaryotic transcription: by binding to different sites on the DNA, they promote or inhibit gene expression. Predicting transcription factors helps us understand them better and supports research into the pathogenesis and treatment of some diseases. The largest category of transcription factors is the zinc finger proteins. Because the zinc finger motifs in these proteins differ, they can bind to different sites and carry out different regulatory processes. The largest category of zinc finger proteins is the C2H2 zinc finger proteins. Predicting C2H2 zinc finger proteins helps us understand their structure, function and regulatory mechanisms, and supports research in genetics, epigenetics and medicine.

In this thesis, a C2H2 zinc finger protein dataset is built from the UniProt database, and three types of feature information are extracted: amino acid composition, auto-covariance of the average chemical shift, and dipeptide composition. C2H2 zinc finger proteins are then predicted with a support vector machine, and the accuracy is 87.86% in the jackknife test. Next, F-score, mRMR and MRMD are used to reduce the dimension of the amino acid composition and dipeptide composition features. For amino acid composition the accuracy is not improved, but the dimension is reduced to thirteen. For dipeptide composition the accuracy is 90.21% after dimension reduction. Finally, prediction with multi-feature information reaches an accuracy of 92.55%.

The training and independent datasets of transcription factors used in this thesis are taken from Liu et al. Four types of feature information are extracted (amino acid composition, protein blocks, dipeptide composition and the conjoint triad feature), and transcription factors are predicted with a support vector machine. With protein blocks, the accuracy is 86.54% in the jackknife test. F-score, mRMR and MRMD are then used to reduce the dimension of the protein blocks, dipeptide composition and conjoint triad features; after dimension reduction of protein blocks the accuracy is 87.74%. Prediction with multi-feature information reaches an accuracy of 88.22% in the jackknife test, which is 1.68% higher than that of Liu et al. Finally, on the independent transcription factor datasets, the accuracy with dipeptide composition is 84.91% in the jackknife test, and prediction with multi-feature information reaches an accuracy of 87.26%.
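To make the feature extraction and feature ranking described in the abstract more concrete, here is a minimal Python sketch. It is not the thesis code: the function names (`amino_acid_composition`, `dipeptide_composition`, `f_score`, `rank_features`) are hypothetical, non-standard residues are simply skipped, and the F-score follows the commonly used two-class definition (between-class separation of the means divided by the sum of the within-class sample variances), which may differ in detail from the variant used in the thesis.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def _clean(seq):
    """Upper-case the sequence and drop non-standard symbols (a simplification)."""
    return "".join(r for r in seq.upper() if r in AMINO_ACIDS)


def amino_acid_composition(seq):
    """Return the 20-dimensional vector of residue frequencies."""
    seq = _clean(seq)
    n = len(seq)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the 400-dimensional vector of adjacent residue-pair frequencies."""
    seq = _clean(seq)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        counts[seq[i:i + 2]] += 1
    total = max(len(seq) - 1, 1)  # avoid division by zero for very short input
    return [counts[d] / total for d in DIPEPTIDES]


def f_score(pos, neg):
    """F-score of one feature, given its values in the positive and negative class.

    Assumes at least two samples per class (sample variance uses n - 1).
    """
    def mean(values):
        return sum(values) / len(values)

    m_pos, m_neg, m_all = mean(pos), mean(neg), mean(pos + neg)
    var_pos = sum((x - m_pos) ** 2 for x in pos) / (len(pos) - 1)
    var_neg = sum((x - m_neg) ** 2 for x in neg) / (len(neg) - 1)
    denom = var_pos + var_neg
    if denom == 0:
        return 0.0  # constant feature in both classes: treat as uninformative
    return ((m_pos - m_all) ** 2 + (m_neg - m_all) ** 2) / denom


def rank_features(x_pos, x_neg):
    """Return feature indices sorted from highest to lowest F-score."""
    n_features = len(x_pos[0])
    scores = [
        f_score([row[j] for row in x_pos], [row[j] for row in x_neg])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

Dimension reduction in this style keeps only the top-ranked features and re-evaluates the classifier as features are added. The jackknife test mentioned in the abstract is leave-one-out cross-validation: each protein is held out once and predicted by a model trained on all the others. The abstract does not name the software used; one option, as an assumption, would be to pass the resulting vectors to scikit-learn's `SVC` together with `LeaveOneOut`.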
Keywords/Search Tags: Zinc finger protein, Transcription factors, Feature information, Prediction, Dimension reduction