Machine Learning-Based Impact Prediction Tool for Copy-Number Variation

Posted on: 2020-06-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y R Tao
Full Text: PDF
GTID: 2404330596967378
Subject: Biomedical engineering
Abstract/Summary:
Advances in high-throughput DNA sequencing have accelerated the identification of variants in the human genome. However, inferring the effects of germline copy-number variants (CNVs) remains a challenge. Although there have been preliminary attempts at germline CNV impact prediction, none of the existing tools provides a quantitative prediction of CNV pathogenicity. Here we present a novel computational tool that predicts the pathogenicity of copy-number variants with an XGBoost-based algorithm. The model incorporates 85 individual CNV annotation features spanning coding, non-coding, and intergenic regions. As a component feature, we also included the population-based allele frequency of each germline CNV, which was systematically curated and calculated with a maximum clique method. In total, we collected structural variants from over forty thousand normal samples, drawn from seven major ethnic groups around the world, as a reference. The model achieved robust performance, with an AUC of 0.9537 and an F1-score of 0.9645 on the testing dataset and an AUC of 0.9736 on an independent validation dataset. A web server is freely available to all users.
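
The abstract names a maximum clique method for the population frequency feature but gives no details. The following is a minimal sketch of one way such a calculation could work, not the thesis's implementation. It assumes that two CNV calls from different samples count as the same event when their reciprocal overlap is at least 50%, and that the frequency is the size of the largest set of mutually matching calls divided by the number of samples. The function names, the tuple layout `(sample, chrom, start, end)`, and the 0.5 threshold are all illustrative assumptions.

```python
from itertools import combinations


def reciprocal_overlap(a, b):
    """Reciprocal overlap of two (chrom, start, end) intervals, in [0, 1]."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def max_clique(adj):
    """Bron-Kerbosch with pivoting; returns one largest clique as a set."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest so far.
            if len(r) > len(best):
                best = set(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best


def clique_frequency(calls, n_samples, min_overlap=0.5):
    """Frequency of the most widely shared CNV among the given calls.

    calls: list of (sample, chrom, start, end) tuples (hypothetical layout).
    Edges join calls from different samples whose reciprocal overlap
    reaches min_overlap, so a clique holds at most one call per sample.
    """
    adj = {i: set() for i in range(len(calls))}
    for i, j in combinations(range(len(calls)), 2):
        (si, *ri), (sj, *rj) = calls[i], calls[j]
        if si != sj and reciprocal_overlap(ri, rj) >= min_overlap:
            adj[i].add(j)
            adj[j].add(i)
    clique = max_clique(adj)
    return len(clique) / n_samples, sorted(clique)


# Toy input: made-up coordinates, not data from the thesis.
example_calls = [
    ("s1", "chr1", 100, 200),
    ("s2", "chr1", 110, 210),
    ("s3", "chr1", 105, 195),
    ("s4", "chr1", 180, 400),
    ("s5", "chr2", 100, 200),
]
frequency, members = clique_frequency(example_calls, n_samples=10)
print(frequency, members)
```

Requiring a clique rather than a chain of pairwise overlaps prevents loosely related calls from being merged into one event through intermediate intervals. Exact maximum clique search is exponential in the worst case, so a cohort of this size would need the calls partitioned by locus first.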
Keywords/Search Tags: XGBoost, Copy Number Variation, Pathogenicity, Feature