Age estimation model and model validation
To establish an age estimation model, we used only one sample per individual wild bear (i.e., 15 wild-bear samples in total). For wild bears that had been sampled multiple times, samples were selected to cover as wide an age range as possible, and some were selected randomly (see Supplementary Figure S_M1).
Based on the pyrosequencing results, we generated three age estimation models: a single regression model that requires the methylation level of only one CpG, and two multiple regression models (elastic net regression and support vector regression [SVR]) that require methylation levels at multiple CpGs. Age and DNA methylation levels were standardized before being entered into the models. Single regression models were generated using the R command “lm”. Elastic net regression models, a type of penalized regression often used in age estimation models for other species (Horvath et al. 2013), were generated using the R package “glmnet”. Optimized parameters (alpha and lambda) were obtained using “cv.glmnet”. The SVR models, which have been reported to outperform elastic net regression models for age estimation (Xu et al. 2015; Fan et al. 2022), were generated using the R package “e1071”. The parameters (cost, gamma, and epsilon) were determined using the “tune” command with the fixed settings “type = eps-regression, kernel = radial”. We performed leave-one-out cross-validation (LOOCV) to validate all models. In LOOCV, a single sample is held out for testing and all remaining samples are used as training data; this procedure is repeated once for each sample, so that every sample serves as the test case exactly once.
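The LOOCV procedure can be illustrated with a minimal sketch for the single regression case. This is not the R code used in the study: it is written in Python with only the standard library, the function names `fit_line` and `loocv_predictions` are hypothetical, the data values are invented for illustration, the standardization step is omitted for brevity, and the mean absolute error is used here only as one possible summary of the held-out predictions.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def loocv_predictions(x, y):
    """Predict each sample from a model trained on all other samples."""
    preds = []
    for i in range(len(x)):
        # Hold out sample i; train on the remaining n - 1 samples.
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        preds.append(a + b * x[i])
    return preds


# Hypothetical data (not from the study): methylation level (%) at one CpG
# and chronological age (years) for eight individuals.
methylation = [72.0, 68.5, 65.0, 61.0, 58.5, 55.0, 50.5, 47.0]
age = [2.0, 4.0, 6.0, 9.0, 11.0, 14.0, 18.0, 21.0]

predicted = loocv_predictions(methylation, age)
mae = mean(abs(p - a) for p, a in zip(predicted, age))
print(f"LOOCV mean absolute error: {mae:.2f} years")
```

Because each prediction comes from a model that never saw the held-out sample, the resulting errors estimate how the model would perform on a new individual, which is useful when the sample size is too small to set aside a separate test set.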
To evaluate whether age, sex, or growth environment affects the prediction error of the age estimation model, linear regressions were fitted with Δage (predicted age − chronological age) and |Δage| (the absolute difference between predicted and chronological age) as dependent variables, and with these three factors, together with the pairwise interactions among them, as explanatory variables (Qi et al. 2021). Model construction and selection were conducted using the “lm” command and the “MuMIn” package in R.
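The two dependent variables capture different aspects of model error, as the following minimal sketch shows. It is an illustration in Python rather than the R code used in the study; the function name `age_deviations` is hypothetical and the ages are invented values.

```python
def age_deviations(predicted, chronological):
    """Return (delta_age, abs_delta_age) lists for paired ages in years."""
    delta = [p - c for p, c in zip(predicted, chronological)]
    return delta, [abs(d) for d in delta]


# Hypothetical predicted and chronological ages (years), not study data.
predicted_age = [5.5, 9.0, 12.5, 18.0]
chronological_age = [4.0, 10.0, 12.0, 20.0]

delta_age, abs_delta_age = age_deviations(predicted_age, chronological_age)
print(delta_age)      # signed errors: positive means overestimation
print(abs_delta_age)  # error magnitudes, regardless of direction
```

Signed errors can cancel when averaged, so Δage indicates systematic over- or underestimation, whereas |Δage| reflects the size of the error irrespective of its direction.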