
Research And Application Of Feature Modeling Algorithm Based On Age Prediction

Posted on: 2021-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Gao
GTID: 2370330629952688
Subject: Computer application technology

Abstract/Summary:
Aging is a complex process characterized by a comprehensive decline in physiological functions and associated with an increased risk of multiple diseases. In an aging society, reliable age biomarkers and accurate age prediction are important for the effective management and prioritization of medical services and patient resources. Age prediction and age markers play an active role in modern medicine and are also an important research direction in forensic science: accurately predicting an individual's age can help investigators narrow the search for suspects in a criminal investigation, greatly reducing the human and material resources required. An age biomarker is defined as a biological parameter of an organism. Recently, DNA methylation has been found to be an effective marker for age prediction, and some CpG sites have been observed to become hypomethylated or hypermethylated with age.

At present, methylation measurement methods are generally divided into three categories: absolute quantitative, relative quantitative and whole-genome DNA methylation measurement. Genome-wide measurement can now detect the methylation of nearly one million CpG sites, while the number of samples used in a study is far smaller than this. DNA methylation data are therefore a typical case of small sample size and high dimensionality. A model trained directly on such data tends to predict well on the training set but poorly on the test set. In addition, not all methylation sites are age-related, and measuring every site each time leads to high cost and waste. To avoid overfitting and poor generalization, and to reduce cost, DNA methylation sites should be screened before modeling.

This thesis focuses on feature modeling for age prediction. We propose a three-step feature selection algorithm, AgeGuess, which combines Filter and Wrapper methods. In the first step, the maximal information coefficient is used to preliminarily screen age-related CpG sites, taking advantage of the speed of the Filter method. Redundant features are then removed using the accuracy of the Wrapper method; SVR-RFE and BackFS are used in this stage. AgeGuess finally selected 107 methylation age markers, which were used to construct a regression model with a mean absolute error of 1.9859 between predicted and actual age. We also compared AgeGuess with commonly used feature selection methods: Pearson correlation, mutual information, univariate F-regression, L1-RFE and L2-RFE. With the same number of features, the feature subset selected by AgeGuess performed better than those selected by the other methods.

Next, we tested AgeGuess on another data set. Applied to the EPIC data set, AgeGuess produced a prediction model with a mean absolute error of 2.4780, showing that the method is also applicable to other data. The number of CpGs selected from the EPIC data is 388, of which 214 also appear in the 450K array and the other 174 are unique to the EPIC chip. The EPIC data model may need 6% unique methylation features in the 450K array to accurately describe the aging process. We then studied the influence of gender on the age prediction model and found that it can be further improved by building two gender-specific models. The selected methylation sites were mapped to their corresponding genes, and age-related genes such as ELOVL2, KLF14, CCDC102B, ATPAF1 and ALDOA were found in the all-gender group. In addition, other studies have found that other types of data, such as transcriptome and glycosylation data, can also be used for age prediction. In follow-up work, we will add these data types to our age prediction research.
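The filter-then-wrapper idea described above can be sketched roughly as follows. This is a minimal illustration, not the AgeGuess implementation: the data are synthetic, scikit-learn's `mutual_info_regression` stands in for the maximal information coefficient (which scikit-learn does not provide), BackFS is assumed here to mean a greedy backward feature elimination, and the names `cv_mae`, `three_step_select`, `n_filter` and `n_rfe` are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import RFE, mutual_info_regression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVR


def cv_mae(X, y, cols):
    """5-fold cross-validated mean absolute error of a linear SVR on the given columns."""
    pred = cross_val_predict(SVR(kernel="linear"), X[:, cols], y, cv=5)
    return float(np.mean(np.abs(pred - y)))


def three_step_select(X, y, n_filter=50, n_rfe=15):
    """Hypothetical filter + wrapper selection; returns (column indices, CV MAE)."""
    # Step 1 (filter): rank every site by a dependence score with age and keep the top n_filter.
    # Assumption: mutual information is used here as a stand-in for MIC.
    scores = mutual_info_regression(X, y, random_state=0)
    kept = np.argsort(scores)[::-1][:n_filter]

    # Step 2 (wrapper): recursive feature elimination driven by linear SVR weights (SVR-RFE).
    rfe = RFE(SVR(kernel="linear"), n_features_to_select=n_rfe, step=5)
    rfe.fit(X[:, kept], y)
    cols = [int(c) for c in kept[rfe.support_]]

    # Step 3 (wrapper): greedy backward elimination (our assumed reading of BackFS).
    # Drop a site whenever removing it lowers the cross-validated MAE.
    best = cv_mae(X, y, cols)
    improved = True
    while improved and len(cols) > 1:
        improved = False
        for c in list(cols):
            trial = [k for k in cols if k != c]
            err = cv_mae(X, y, trial)
            if err < best:
                best, cols, improved = err, trial, True
                break
    return sorted(cols), best


# Synthetic "small sample, high dimension" data: 60 samples, 300 sites with values in [0, 1].
# Only the first five sites carry an age signal; the rest are noise.
rng = np.random.default_rng(0)
n_samples, n_sites = 60, 300
X = rng.uniform(0.0, 1.0, size=(n_samples, n_sites))
age = 20 + 12 * X[:, :5].sum(axis=1) + rng.normal(0.0, 1.0, n_samples)

selected, mae = three_step_select(X, age)
print(f"{len(selected)} sites kept, cross-validated MAE = {mae:.2f}")
```

Because the sites are selected on the same samples that are later used for cross-validation, the reported error in this sketch is optimistic; a held-out test set, or selection nested inside each fold, would give a fairer estimate.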
Keywords/Search Tags: Age prediction, Methylomic Biomarker, Machine Learning, Feature Selection, Regression