| Objective: Age prediction of an unknown individual can facilitate case investigations and disaster victim identification. Estimating the age of known persons whose age is unclear can provide important clues in legal affairs. When DNA from biological evidence left at a crime scene does not match any profile in the database, age information provides a valuable investigative lead. The age of individuals can be determined by techniques that rely on morphological measures of teeth and skeletal remains, but this approach is restricted to samples with a nearly complete skeleton and is influenced by subjective factors. In recent years, several molecular methods have been proposed, based on telomere shortening, mitochondrial DNA deletion, signal-joint T-cell receptor excision circles (sjTRECs) and mRNA. Although these markers show good correlation with age, the age estimation models built on them suffer from low precision, poor reproducibility and poor applicability. For a long time, DNA methylation (DNAm) has been a “black spot” for forensic scientists. DNAm, which has emerged as one of the most promising methods for predicting age in forensics, with a mean absolute deviation of about 3–5 years in the predicted age, provides high accuracy but has several limitations, such as requiring relatively large amounts of DNA and complicated bioinformatics analysis. Recently, some potential candidate biomarkers for age estimation have come into play. Accumulating evidence suggests roles for microRNAs (miRNAs) and circular RNAs (circRNAs) in regulating a large variety of processes during aging. These non-coding RNAs (ncRNAs) generally act as post-transcriptional regulators of gene expression. Therefore, we suggest that they can be considered as novel biological age predictors. Here, we performed circRNA sequencing in two age groups, analyzed microarray data from the Gene Expression Omnibus (GEO) and ArrayExpress databases with integrated bioinformatics methods, and conducted forensic application
evaluation studies. Methods: 1. Age-associated miRNA biomarker candidates were selected from public databases. The ArrayExpress and GEO public databases were screened using bioinformatic analysis according to the following criteria: (1) studies using miRNA microarray or transcriptome microarray methods for detection, (2) studies with human peripheral blood samples, and (3) studies that included healthy control samples with a sample size of at least 15 young and old individuals. Data preprocessing was performed on the datasets that met the study objectives. Differential expression analysis and correlation analysis were performed on the miRNA expression profile datasets. For datasets with clear age grouping, the "limma" package was used to screen candidate age-related miRNAs, and the screening criteria were an absolute value of log2(fold change) greater than 1 and a false discovery rate (FDR) corrected P value less than 0.05. For datasets without clear age grouping, candidate age-associated miRNAs were screened by Spearman correlation analysis, and the screening criteria were an absolute value of the correlation coefficient rho greater than 0.2 and an FDR-corrected P value less than 0.05. The candidate age-associated miRNAs were validated by RT-qPCR experiments. Target genes of the age-associated miRNAs were predicted using the TargetScan, miRTarBase and miRWalk public databases, and GO enrichment analysis and KEGG pathway analysis were performed on the set of target genes. 2. Peripheral blood circRNA expression profiles of healthy unrelated individuals of different ages, including 4 in the young group (20-29 years old) and 4 in the old group (50-62 years old), were examined using circRNA next-generation sequencing (NGS). Age-associated circRNA biomarker candidates were screened with the "limma" package, and the screening criteria were an absolute value of log2(fold change) greater than 2 and a P
value less than 0.001. Age-associated circRNAs were divided into three groups based on their expression characteristics: circRNAs expressed only in individuals in the younger group, circRNAs expressed only in individuals in the older group, and circRNAs that were significantly differentially expressed between the age groups. Candidate age-associated circRNAs were validated by RT-qPCR experiments. 3. The experimentally validated age-related ncRNAs were used to construct age prediction models in an additional 200 blood samples. All samples were randomly divided into two sets: a training set (80% of all subjects) to construct the age prediction model and a testing set (the remaining 20%) to evaluate the model’s prediction performance. Several machine learning algorithms were applied to fit models, including regression tree, Bagging, random forest regression (RFR), support vector regression (SVR) and XGBoost. To compare the performance of the different models, the root mean square error (RMSE) and mean absolute error (MAE) relative to chronological age were calculated for the testing set. 4. Assessment of the age estimation models for forensic applications. Based on the previously constructed age prediction models, the following aspects were evaluated: (1) Sensitivity: one blood sample was taken and a gradient dilution of the RNA starting template amount was performed to test the sensitivity of the RT-qPCR method. (2) Reproducibility: one sample was taken and the quantitative assay was repeated three times to verify the reproducibility of the age prediction models. (3) Anti-degradation: 3 samples were prepared as blood spots and left for 0, 1, 7, 14, 28 and 90 days to test whether expression of the target non-coding RNAs remained stable and to compare the model predictions for the same sample in the fresh and degraded states. (4) Cross-tissue applicability: 3 samples each of saliva, semen, menstrual blood and vaginal secretion were tested to assess whether there is tissue or body-fluid
specificity of the non-coding RNAs in common human body fluids and, if there is no expression specificity, to further assess how applicable the models are across different body fluids. Results: 1. (1) Three datasets, E-MTAB-3303, E-MTAB-1231 and GSE89042, were selected. A total of 171 human peripheral blood samples from individuals aged 17-104 years were included. (2) According to the screening criteria, a total of 55 age-associated miRNAs were identified in the E-MTAB-3303 dataset (40 up-regulated and 15 down-regulated); 117 age-associated miRNAs were identified in the E-MTAB-1231 dataset; and 40 age-associated miRNAs were identified in the GSE89042 dataset (30 up-regulated and 10 down-regulated). Finally, a total of 18 miRNAs with significant age-related changes and consistent trends in at least two datasets were selected as candidate age-related miRNAs for RT-qPCR validation. (3) After RT-qPCR validation, 11 age-associated miRNAs were retained for subsequent age estimation model construction. (4) GO enrichment analysis and KEGG pathway analysis of the target genes of the age-associated miRNAs showed that these genes were involved in various important cellular physiological processes, and the largest number of target genes were enriched in the cellular senescence pathway. 2. (1) Age-associated circRNAs were divided into three groups: circRNAs expressed only in the young group (10 circRNAs, 0.7%); circRNAs expressed only in the old group (141 circRNAs, 10%); and circRNAs significantly differentially expressed between the age groups (1403 circRNAs, 921 up-regulated and 482 down-regulated). The top 5 circRNAs in the three groups were selected as candidate age-associated circRNAs for RT-qPCR validation. (2) After RT-qPCR validation, 4 age-associated circRNAs were retained for subsequent age prediction model construction. 3. (1) Five different machine learning algorithms were used for modelling, and the results showed that the mean absolute error (MAE) values for age estimation
ranged from 3.68 to 6.536 years on the training set and from 6.84 to 7.985 years on the testing set. The random forest regression (RFR) model and the support vector regression (SVR) model outperformed the other models on the testing set. (2) Models were also constructed with miRNAs only and with circRNAs only, and the results showed that the MAE values were between 8.1 and 10.9 years for circRNA-only modeling and between 9.1 and 12.6 years for miRNA-only modeling. (3) Subgroup analysis by age showed that the prediction error was larger for younger and older individuals; the ages of younger individuals tended to be overestimated, while those of older individuals tended to be underestimated. (4) Separate models were constructed for male and female samples, and the results showed that the prediction errors for male samples were significantly lower than those for female samples in the training set. Sex had a slight influence on the accuracy of the models in the current study. 4. (1) Sensitivity: Among the 15 non-coding RNAs, the minimum detectable RNA input amount was 0.1 ng for 9 non-coding RNAs and 0.01 ng for the remaining 6 non-coding RNAs; (2) Repeatability: RNA extracted from one blood sample from an individual with a true age of 45 years was tested 3 times, giving estimated ages of 40.5, 43.8 and 39.7 years with the RFR method; (3) Anti-degradation: three blood samples (all from individuals with an actual age of 30 years) were prepared as blood spots and stored for different numbers of days, after which the expression of the target non-coding RNAs was measured. The results showed that the 15 non-coding RNAs were detected in blood spots within 90 days but all showed different degrees of degradation; the stability of the miRNAs was better than that of the circRNAs, and the stability of the internal reference gene U6 was better than that of 18S rRNA. (4) Cross-tissue applicability: The age estimation system constructed in this study could be better applied to age inference in semen samples and vaginal secretion samples, with
MAE values ranging from 5.858 to 8.074 years and from 3.854 to 7.733 years for the five models, respectively. Overall, the age prediction models constructed in this study had good applicability in semen, menstrual blood and vaginal secretion samples, while they performed poorly in saliva samples. Conclusion: 1. In this study, we conducted circRNA-seq and analyzed several miRNA microarray datasets with integrated bioinformatic methods. Differentially expressed non-coding RNAs screened from NGS and online databases were experimentally validated using RT-qPCR tests. 2. An independent set of 200 blood samples (20–80 years old) was used to develop age prediction models based on 15 age-related non-coding RNAs (11 microRNAs and 4 circular RNAs). Different machine learning algorithms for age prediction were applied. This study demonstrates that the non-coding RNA aging clock has potential for predicting chronological age and will be a usable biological marker in routine forensic investigation for predicting the donor age of biological samples. 3. In this study, a systematic evaluation of the non-coding RNA-based age prediction models for forensic applications showed that non-coding RNAs have high sensitivity and resistance to degradation, and have good applicability in common human body fluids. |
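
The two error metrics used to compare the models (MAE and RMSE) and the random 80%/20% split can be illustrated with a minimal sketch. It is not the authors' code, and the abstract does not name the software used for modelling. The function names and the integer sample IDs are assumptions made here for illustration. The only study data used are the three replicate estimates reported for the 45-year-old donor.

```python
import math
import random


def mean_absolute_error(true_ages, predicted_ages):
    """MAE: average of |predicted - true| over all samples."""
    errors = [abs(p - t) for t, p in zip(true_ages, predicted_ages)]
    return sum(errors) / len(errors)


def root_mean_square_error(true_ages, predicted_ages):
    """RMSE: square root of the mean squared error."""
    squared = [(p - t) ** 2 for t, p in zip(true_ages, predicted_ages)]
    return math.sqrt(sum(squared) / len(squared))


def train_test_split_ids(sample_ids, train_fraction=0.8, seed=0):
    """Randomly split sample IDs into training and testing lists."""
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Replicate estimates reported for the 45-year-old donor (RFR model).
true_ages = [45.0, 45.0, 45.0]
predicted_ages = [40.5, 43.8, 39.7]
replicate_mae = mean_absolute_error(true_ages, predicted_ages)
replicate_rmse = root_mean_square_error(true_ages, predicted_ages)

# Hypothetical IDs 0..199 stand in for the 200 blood samples (assumption).
train_ids, test_ids = train_test_split_ids(range(200))

print(f"MAE = {replicate_mae:.2f} years, RMSE = {replicate_rmse:.2f} years")
print(len(train_ids), len(test_ids))
```

For the three replicates this gives an MAE of about 3.67 years and an RMSE of about 4.07 years. These figures are plain arithmetic on the reported estimates and are not results stated in the study. The split yields 160 training and 40 testing IDs.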