Age estimation model and model validation
To establish the age estimation models, we used only one sample per
individual wild bear (i.e., 15 samples from wild bears in total). For
wild bears sampled multiple times, samples were chosen to cover as wide
an age range as possible, and some were selected at random (see
Supplementary Figure S_M1).
Based on the pyrosequencing results, we generated three types of age
estimation model: a single regression model that requires the
methylation level of only one CpG, and two multivariable models (elastic
net regression and support vector regression [SVR]) that require the
methylation levels of multiple CpGs. Age and DNA methylation levels were
standardized before model fitting. Single regression models were
generated using the R function “lm”. Elastic net regression
models, a type of penalized regression that has often been used in age
estimation models for other species (Horvath et al. 2013), were
generated using the R package “glmnet”. Optimized parameters (alpha
and lambda) were obtained using “cv.glmnet”. The SVR models, which have
been reported to outperform elastic net regression for age estimation
(Xu et al. 2015; Fan et al. 2022), were generated using the R
package “e1071”. The parameters (cost, gamma, and epsilon) were
determined using the “tune” command with fixed settings
“type = eps-regression, kernel = radial”. All models were validated
using leave-one-out cross-validation (LOOCV), in which a single sample
is held out as test data and the model is trained on all remaining
samples. This procedure is repeated once for each sample, so that every
sample is predicted exactly once by a model that did not see it during
training.
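The LOOCV procedure can be illustrated with a minimal sketch for the
single-CpG regression, written in Python with the standard library only.
It is not the authors' R code: the helper names (standardize,
fit_simple_ols, loocv_predict), the data, and the choice to compute the
standardization parameters from the training fold only are assumptions
made for illustration. For ordinary least squares, standardization does
not change the back-transformed predictions; it matters for penalized
models such as elastic net, whose penalty depends on coefficient scale.

```python
from statistics import mean, pstdev


def standardize(values, reference):
    """Z-score `values` using the mean and SD of `reference`."""
    mu, sd = mean(reference), pstdev(reference)
    return [(v - mu) / sd for v in values]


def fit_simple_ols(x, y):
    """Intercept and slope of the least-squares line y = a + b*x
    (the same model that R's lm(y ~ x) fits)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def loocv_predict(meth, age):
    """Hold out each sample in turn, fit on the rest, predict its age."""
    preds = []
    for i in range(len(age)):
        train_x = meth[:i] + meth[i + 1:]
        train_y = age[:i] + age[i + 1:]
        # Assumption: scaling parameters come from the training fold only.
        a, b = fit_simple_ols(standardize(train_x, train_x),
                              standardize(train_y, train_y))
        z_test = standardize([meth[i]], train_x)[0]
        z_pred = a + b * z_test
        # Back-transform the standardized prediction to years.
        preds.append(z_pred * pstdev(train_y) + mean(train_y))
    return preds


if __name__ == "__main__":
    # Hypothetical methylation levels (proportions) and ages (years).
    meth = [0.12, 0.18, 0.25, 0.31, 0.40, 0.47]
    age = [1.0, 3.0, 4.5, 7.0, 9.0, 12.0]
    for m, a, p in zip(meth, age, loocv_predict(meth, age)):
        print(f"methylation={m:.2f} age={a:.1f} predicted={p:.2f}")
```

The loop structure is the same for the multiple-CpG models; only the
fitting step inside the loop would change.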
To evaluate whether age, sex, or growth environment affects the
deviation of the age estimates, we fitted linear regression models with
Δage (predicted age − chronological age) or |Δage| (absolute difference
between predicted and chronological age) as the dependent variable and
the three factors, together with all pairwise interactions between them,
as explanatory variables (Qi et al. 2021). Models were constructed using
the R function “lm”, and model selection was performed using the
“MuMIn” package.