Age estimation model
Comparison of models
We constructed three types of age estimation models (single regression,
elastic net regression, and SVR) based on the DNA methylation levels of
CpGs adjacent to SLC12A5 (SLC12A5-1, -2, -3, and -4), POU4F2 (POU4F2-1,
-2, -3, and -4), VGF (VGF-1, -2, and -3), and SCGN (SCGN-1 and -2) and
compared their performances (Table 2; Supplementary Figures S_R2_1, 2,
3, and 4).
Among the single regression models, the one using the methylation level
of SLC12A5-4 performed best; the mean absolute error (MAE) after LOOCV
was 1.6 (Figure 4a). The formula for age estimation was as follows:
“Estimated Age” = (−1.962e-11 + 0.9808 × “methylation level of
SLC12A5-4”) × 10.113 + 12.550
where 10.113 is the standard deviation and 12.550 is the mean of the
training data.
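The single regression formula can be written as a small function. The following is a minimal Python sketch, not the authors' R script. The function and constant names are illustrative, and the sketch assumes the input methylation level is on the same scale that was used when the model was fitted (the near-zero intercept suggests standardized values, but this is an assumption).

```python
# Constants copied from the single regression formula in the text.
INTERCEPT = -1.962e-11
SLOPE_SLC12A5_4 = 0.9808
TRAIN_SD = 10.113    # standard deviation of training data
TRAIN_MEAN = 12.550  # mean of training data


def estimate_age_single(methylation_slc12a5_4: float) -> float:
    """Estimate age from the SLC12A5-4 methylation level.

    Assumption: the input is on the scale used when the model was fitted.
    """
    # Linear predictor on the model's internal scale.
    scaled = INTERCEPT + SLOPE_SLC12A5_4 * methylation_slc12a5_4
    # Back-transform using the training-data standard deviation and mean.
    return scaled * TRAIN_SD + TRAIN_MEAN
```

With an input of 0, the function returns a value essentially equal to the training mean, because the intercept is negligible.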
The best-performing elastic net regression model included the
methylation levels of SLC12A5-4, POU4F2-2, and VGF-2; the MAE after
LOOCV was 1.5 (Figure 4b). The formula for age estimation was as
follows:
“Estimated Age” = (−1.717e-11 + 0.6728 × “methylation level of
SLC12A5-4” + 0.1652 × “methylation level of POU4F2-2” + 0.1535 ×
“methylation level of VGF-2”) × 10.113 + 12.550
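The elastic net formula has the same shape, with a weighted sum over three CpGs. The sketch below is one hedged way to express it in Python. The dictionary keys and function name are illustrative choices, and the inputs are again assumed to be on the scale used for model fitting.

```python
from typing import Mapping

# Coefficients copied from the elastic net formula in the text.
INTERCEPT = -1.717e-11
COEFFICIENTS = {
    "SLC12A5-4": 0.6728,
    "POU4F2-2": 0.1652,
    "VGF-2": 0.1535,
}
TRAIN_SD = 10.113    # standard deviation of training data
TRAIN_MEAN = 12.550  # mean of training data


def estimate_age_elastic_net(methylation: Mapping[str, float]) -> float:
    """Estimate age from the methylation levels of the three selected CpGs.

    Assumption: each value is on the scale used when the model was fitted.
    Raises KeyError if a required CpG is missing from the input.
    """
    scaled = INTERCEPT + sum(
        coef * methylation[cpg] for cpg, coef in COEFFICIENTS.items()
    )
    # Back-transform using the training-data standard deviation and mean.
    return scaled * TRAIN_SD + TRAIN_MEAN
```

Keeping the coefficients in a mapping makes it harder to pair a weight with the wrong CpG than passing three positional arguments would.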
The best-performing SVR model used the methylation levels of SLC12A5-1,
-2, -3, and -4; the MAE after LOOCV was 1.3 (Figure 4c). The R script
used to estimate age is available in the Supplementary File. Details of
the parameters used in the elastic net regression and SVR models are
shown in Supplementary Table S_R1.
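For readers unfamiliar with the metric, the MAE after LOOCV is obtained by leaving out one sample at a time, fitting the model to the remaining samples, predicting the held-out sample, and averaging the absolute errors. The following Python sketch illustrates this for a simple one-predictor least-squares model. It is not the authors' pipeline: the function names are hypothetical, and any preprocessing the study applied within each fold (for example, standardization) is not reproduced here.

```python
from statistics import mean


def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def loocv_mae(xs, ys):
    """Mean absolute error over leave-one-out folds.

    xs and ys must be lists of equal length (at least three samples,
    with distinct x values in every training fold).
    """
    errors = []
    for i in range(len(xs)):
        # Hold out sample i and train on the rest.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_simple_linear(train_x, train_y)
        prediction = intercept + slope * xs[i]
        errors.append(abs(prediction - ys[i]))
    return mean(errors)
```

Because every sample is predicted by a model that never saw it, this error is a less optimistic estimate than the error on the training data.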
Influences of age, sex, and growth environment on the model
We used linear regression analysis to identify the factors that affect
Δage and |Δage| in the best model (i.e., the SVR model with four CpGs
adjacent to SLC12A5). When Δage was used as the
dependent variable, the best regression model included age, growth
environment, and the interaction between age and growth environment as
explanatory variables (adjusted R² = 0.1869; Table 3 and Figure 5).
Among those variables, the interaction between age and
growth environment was statistically significant (Figure 5b). When
|Δage| was used as the dependent variable, the best
regression model included age, growth environment, and the interaction
between age and growth environment as explanatory variables (adjusted
R² = 0.186; Table 4 and Figure 6). Among those variables, growth
environment was statistically significant (Figure 6d). The explanatory
variables that were statistically significant in the other models (i.e.,
the single regression model and elastic net regression model) are shown
in Supplementary Tables S_R2, 3, 4, and 5.
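The regression models in this subsection combine two main effects with their interaction and are summarized by adjusted R². The Python sketch below shows how such a design row and the standard adjusted R² formula can be expressed. The function names are hypothetical, and coding growth environment as a 0/1 indicator is an assumption, since the coding is not stated here.

```python
def adjusted_r2(r2: float, n_samples: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_predictors - 1 <= 0:
        raise ValueError("need n_samples > n_predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


def design_row(age: float, environment: int) -> list:
    """Row [1, age, environment, age * environment] for a model with an
    age-by-environment interaction.

    Assumption: growth environment is coded as a 0/1 indicator.
    """
    return [1.0, age, float(environment), age * environment]
```

The interaction column lets the slope of the dependent variable on age differ between growth environments, which is what a significant interaction term indicates.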