
Construction And Research Of Chinese Population Regional Inference Model Based On DNA Methylation

Posted on: 2023-11-08
Degree: Master
Type: Thesis
Country: China
Candidate: C C Sun
GTID: 2530306794967719
Subject: Special medicine
Abstract/Summary:
Objective: With the continuous emergence of high-throughput DNA methylation data, a large number of DNA methylation sites related to the occurrence of diseases and tumors have been discovered. Phenotypes such as regional origin and physical characteristics result from the interaction between genotype and environment. Current research on population differences is mainly based on SNPs, and there are few reports on epigenetic differences. The purpose of this study was to explore whether the epigenome differs among Chinese populations and to screen for the sites that differ.
Methods: 1. The ChAMP package in R was used to preprocess the methylation data and filter out low-quality data, the BMIQ method was used to normalize the β values, the SVD method was used to detect the relationship between batch effects and methylation level, and the ReFACTor algorithm was used to calculate principal components representing different cell types, which were used as covariates in downstream analysis. 2. (1) 483 Han male samples were divided into southern and northern Han groups; (2) 825 male samples were divided into eight groups according to their provinces of origin. GLINT software was used for EWAS analysis of the genome-wide methylation chip data of these samples. 3. Based on the EWAS results, LASSO regression was used to screen sites. 4. Multiple logistic regression was used to construct (1) a prediction model for the Han population in northern and southern China and (2) a prediction model for populations in different provinces of China. 5. The accuracy of these models was evaluated by ten-fold cross-validation. The evaluation indexes included Kappa, sensitivity, specificity, positive predictive value and negative predictive value.
Results: 1. A group of CpG sites with significant differences between the Han populations in the south and the north was screened out. The accuracy of the multivariate logistic regression model was 99.03%, and the Kappa was 0.9796. Ten repetitions of ten-fold cross-validation all gave accuracies above 98%, with an average accuracy of 98.79%, and the model's other prediction performance indexes were all above 0.95. 2. CpG sites with significant differences were screened out for Xinjiang, Guangxi, Jiangxi, Shanxi, Inner Mongolia, Sichuan, Shandong and Henan respectively. On a 30% test set, the prediction accuracy of the multivariate logistic regression models was above 96%, and the Kappa was above 0.82. The results of ten-fold cross-validation were all above 96%, and the models' other prediction performance indexes were all above 0.76. Using a hierarchical inference strategy, a multi-level dichotomous inference model was constructed to distinguish the populations of these eight provinces.
Conclusion: This study shows that there are epigenetic differences between Han populations in the north and south and between populations in different provinces of China, which lays a foundation for further studies on epigenetic differences between populations in different regions.
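The evaluation indexes named in the Methods (accuracy, Kappa, sensitivity, specificity, positive and negative predictive value) can all be derived from a two-class confusion matrix. The following is a minimal Python sketch of those standard formulas, not the thesis's own code: the function name `binary_metrics`, the `BinaryMetrics` container, the "north"/"south" labels and the toy predictions are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    kappa: float
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a metric is undefined.
    return numerator / denominator if denominator else math.nan


def binary_metrics(y_true, y_pred, positive):
    """Standard two-class evaluation indexes (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = _ratio(accuracy - p_chance, 1 - p_chance)

    return BinaryMetrics(
        accuracy=accuracy,
        kappa=kappa,
        sensitivity=_ratio(tp, tp + fn),  # true positive rate
        specificity=_ratio(tn, tn + fp),  # true negative rate
        ppv=_ratio(tp, tp + fp),          # positive predictive value
        npv=_ratio(tn, tn + fn),          # negative predictive value
    )


# Toy labels only; these are not data from the study.
truth = ["north"] * 4 + ["south"] * 6
preds = ["north", "north", "north", "south",
         "south", "south", "south", "south", "south", "north"]
metrics = binary_metrics(truth, preds, positive="north")
print(metrics)
```

In a ten-fold cross-validation setting, a function like this would be applied to the held-out predictions of each fold and the per-fold values averaged. Kappa is reported alongside accuracy because it corrects for the agreement expected by chance, which matters when group sizes are unbalanced.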
Keywords/Search Tags: Forensic Genetics, Epigenetics, DNA methylation, EWAS, Chinese population