
Prediction Of Human Papillomavirus Integration Sites Based On Deep Learning

Posted on: 2024-01-21 | Degree: Master | Type: Thesis
Country: China | Candidate: Y P Sun
GTID: 2530307109981209 | Subject: Computer system architecture
Abstract/Summary:
DNA integration is a form of recombination in which extrachromosomal DNA, such as that of viruses, plasmids, or phages, is inserted into the chromosomal DNA of a host cell. The integration locus usually lies on the chromosome and plays a role in DNA expression. Viruses have a simple structure and usually contain only one type of nucleic acid, either DNA or RNA. Viral integration can decrease or increase gene expression and can raise the risk of disease, including cancer. The integration of human papillomavirus (HPV) into the human genome is an important step in cancer progression, and HPV can severely harm host cells. HPV poses a major risk to human health and is a major cause of several cancers. The E6 and E7 proteins of HPV are known to interfere with host tumor-suppressor pathways, but the exact mechanism of action has not been fully characterized. Studies have shown that HPV can integrate its genome into host genes and that the integration mechanism depends strongly on the local genomic environment. Studying the integration mechanism can deepen the understanding of HPV and support vaccine development, which in turn can help in treating related diseases and cancers, so predicting HPV integration sites is of considerable value.

With the continued development of deep learning, most current research on DNA integration sites adopts deep learning methods. Research on HPV integration sites, however, is still at a preliminary stage, and the tools proposed so far for predicting them have two main kinds of problems. In terms of features, they rely on a single feature type and propose no features specific to DNA sequences. In terms of the model, they have too many network parameters and suffer from vanishing gradients. The latest HPV integration site prediction tool, DeepHPV, shares these problems. To address them, this thesis proposes DSHP, a deep learning model for HPV integration site prediction based on sequence features.

This study makes several contributions in feature engineering. To enable the network to better predict HPV integration sites, six features are extracted from DNA sequences: the 3-bit Z-curve encoding reflects both the detailed distribution of bases in the sequence and its overall structure; GC content is a fundamental property of a DNA sequence; the cumulative GC skew reveals the points where the GC skew changes sign; the k-mer encoding identifies the most frequent base patterns; the ATGC ratio encoding measures the base-ratio characteristics of the whole sequence; and the K-gap frequency describes the composition to the left and right of any given k-length segment.

In terms of model design, compared with DeepHPV, this study reduces the number of network parameters and mitigates the problems caused by vanishing gradients. In terms of performance, DSHP showed some improvement over existing HPV integration site prediction tools, demonstrating that it can effectively predict potential HPV integration sites from DNA sequencing data. Finally, a comprehensive ablation experiment was designed to verify the validity of the features and the robustness of the model. In 5-fold cross-validation, the accuracy (ACC) and AUC of DSHP were 0.914 and 0.934, respectively; in 10-fold cross-validation, they were 0.933 and 0.941, respectively. These results support the validity of DSHP.
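
As an illustration of the kind of sequence features named in the abstract, the following is a minimal Python sketch of four of them (GC content, cumulative GC skew, k-mer frequencies, and Z-curve coordinates). It uses common textbook definitions, not necessarily the exact encodings used in DSHP, which the abstract does not specify. All function names are hypothetical, and the cumulative GC skew is computed per base (+1 for G, -1 for C) as a simplifying assumption.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cumulative_gc_skew(seq):
    """Running sum that adds 1 for each G and subtracts 1 for each C.

    Assumption: a per-base simplification; sign changes in the slope
    mark the transition points of the GC skew.
    """
    total = 0
    skew = []
    for base in seq.upper():
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        skew.append(total)
    return skew


def kmer_frequencies(seq, k=3):
    """Relative frequency of every overlapping k-mer in the sequence."""
    seq = seq.upper()
    n = len(seq) - k + 1
    if n <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(n))
    return {kmer: count / n for kmer, count in counts.items()}


def z_curve(seq):
    """Z-curve coordinates (x, y, z) after each base of the sequence.

    x: purines (A, G) minus pyrimidines (C, T)
    y: amino bases (A, C) minus keto bases (G, T)
    z: weak hydrogen-bond bases (A, T) minus strong ones (G, C)
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base in counts:
            counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append(((a + g) - (c + t), (a + c) - (g + t), (a + t) - (g + c)))
    return points
```

In a pipeline of this kind, each function would be applied to a fixed-length window around a candidate site and the outputs concatenated into the feature vector fed to the network; the window length and the concatenation order are design choices that the abstract does not state.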
Keywords/Search Tags: HPV, HPV integration, Deep learning, DNA sequence feature selection, Bioinformatics