Demographic Modeling
One of the most important recent advances in phylogeography is the development of model selection as a framework for inferring evolutionary processes (e.g., Carstens et al. 2013). After exploring our data using STRUCTURE, we constructed eleven demographic scenarios (Fig. 2) to test within a machine-learning model-selection framework (Smith & Carstens 2020). For each focal species, these alternative demographic hypotheses include divergence between the coastal and inland populations that occurred either before or after the Pleistocene glaciations. The pre-Pleistocene divergence scenarios (models B-G in Fig. 2) place the divergence of the populations at the time of the xerification of the Columbia Basin (Waring & Franklin 1979), which followed the uplift of the Cascade Mountains (Priest 1990). In the recent dispersal models (models H-K in Fig. 2), the ITR populations arise solely through post-Pleistocene dispersal of coastal migrants; these models imply a recent divergence time, because coastal migrants could only have recolonized the inland region after the last glacial retreat, ~10 kya (Waitt & Thorson 1983). The migration scenarios include divergence with migration, in which gene flow between the coastal and ITR populations ceases a substantial time after divergence. In the divergence-with-secondary-contact models, migration between the coastal and ITR populations resumes, at the earliest, after the retreat of the Cordilleran ice sheet, ~10 kya (Waitt & Thorson 1983). Thus, the ancient vicariance with secondary contact models (models C & G) encompass both the older “ancient vicariance” and “recent dispersal” scenarios of Brunsfeld et al. (2001). The modeled bottleneck events are those hypothesized to have occurred in the populations at the onset of, and throughout, the Pleistocene; the subsequent population expansion events occur after the glacial retreat, possibly as recently as ~3,500 ya (Whitlock 1992, Mehringer 1996).
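To make the structure of these scenarios concrete, the sketch below encodes one of them (ancient vicariance with secondary contact, a Pleistocene bottleneck, and post-glacial expansion) as a demographic model using the msprime Demography API in Python. This is purely illustrative rather than the specification used in the analysis, which was written as delimitR/fastsimcoal2 templates; all population sizes, times (in generations), and migration rates are placeholder values, not the study's priors.

```python
import msprime

# Illustrative sketch of one Fig. 2 scenario: ancient vicariance with secondary
# contact, a Pleistocene bottleneck, and post-glacial expansion. All sizes,
# times (in generations), and rates are placeholders, not the study's priors.
demography = msprime.Demography()
demography.add_population(name="coastal", initial_size=50_000)
demography.add_population(name="inland", initial_size=50_000)      # ITR population
demography.add_population(name="ancestral", initial_size=100_000)

# Secondary contact: gene flow occurs only after the glacial retreat, so
# (viewed backward in time) migration is switched off at ~10 kya, taken here
# as ~1,000 generations purely for illustration.
demography.set_symmetric_migration_rate(["coastal", "inland"], rate=1e-4)

# Post-glacial expansion: going backward in time, present-day sizes shrink to
# bottleneck sizes that persisted through the Pleistocene (expansion placed at
# ~3,500 ya, here ~350 generations, again only as an illustration).
demography.add_population_parameters_change(time=350, population="coastal", initial_size=5_000)
demography.add_population_parameters_change(time=350, population="inland", initial_size=5_000)

demography.add_symmetric_migration_rate_change(time=1_000, populations=["coastal", "inland"], rate=0)

# Ancient vicariance: the coastal/inland split far predates the Pleistocene,
# at the xerification of the Columbia Basin (placeholder time).
demography.add_population_split(time=200_000, derived=["coastal", "inland"], ancestral="ancestral")

print(demography.debug())  # epoch-by-epoch summary of the specified model
```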
Before using our data to assess these models, we assessed our statistical power using simulations. Specifically, we simulated genomic data similar to the empirical genomic data generated for Thuja plicata and Tsuga heterophylla under each of the eleven models. We used the R package delimitR (Smith & Carstens 2020), which relies on the multidimensional SFS (mSFS) and the machine-learning algorithm ABC random forests (Pudlo et al. 2015) for model selection. For this, we simulated joint SFS (jSFS) under the eleven demographic scenarios we deemed plausible for both species (Fig. 2) using fastsimcoal2 (Excoffier 2011, Excoffier et al. 2013), generating 10,000 simulated jSFS for each model. We then converted each jSFS into an mSFS by flattening the matrix and binning its cells into a coarser representation of the original. In delimitR (Smith & Carstens 2020), we used the mSFS of the simulated datasets as the predictor variables to train an RF classifier (Breiman 2001, Liaw & Wiener 2002, Pudlo et al. 2018) to distinguish among the eleven demographic models. This allowed us to assess the limits of the inferences we could make with respect to phylogeographic model selection and to construct a confusion matrix summarizing the differentiability among the models. It also generated the classifier used in model selection for our empirical datasets.
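As a schematic illustration of the flatten-and-bin step and the classifier training (not the delimitR/abcrf pipeline itself), the Python sketch below coarsens simulated joint SFS into binned vectors and fits a random-forest classifier, using out-of-bag predictions to obtain an error rate and a confusion matrix. The per-model arrays are fabricated toy data so the example runs end to end, and the binning rule is a simplification of delimitR's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

RNG = np.random.default_rng(7)
N_BINS = 4  # number of coarse classes per SFS cell; illustrative stand-in for delimitR's binning

def coarsen_jsfs(jsfs, bin_edges):
    """Flatten a 2D joint SFS and replace each cell count with a coarse bin index."""
    return np.digitize(jsfs.ravel(), bin_edges)

# Toy stand-in for the simulated data: one array of joint SFS per model A-K,
# each of shape (n_sims, coastal alleles + 1, inland alleles + 1).
sims = {m: RNG.poisson(lam=5.0 + i, size=(1_000, 11, 11)) for i, m in enumerate("ABCDEFGHIJK")}

# Global bin edges derived from the pooled simulated cell counts.
pooled = np.concatenate([s.ravel() for s in sims.values()])
bin_edges = np.unique(np.quantile(pooled, np.linspace(0, 1, N_BINS + 1)[1:-1]))

X = np.vstack([np.array([coarsen_jsfs(s, bin_edges) for s in sims[m]]) for m in sims])
y = np.concatenate([[m] * len(sims[m]) for m in sims])

# Random-forest classifier on the binned mSFS; out-of-bag predictions provide
# the error rate and the confusion matrix summarizing model differentiability.
rf = RandomForestClassifier(n_estimators=500, oob_score=True, n_jobs=-1, random_state=7)
rf.fit(X, y)
oob_pred = rf.classes_[rf.oob_decision_function_.argmax(axis=1)]
conf = confusion_matrix(y, oob_pred, labels=rf.classes_)
print(f"overall out-of-bag error: {1 - rf.oob_score_:.3f}")
print(conf)
```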
Following the construction of a demographic-model RF classifier, we assessed the demographic models (Fig. 2) for the empirical datasets of Thuja plicata and Tsuga heterophylla. These classifiers are cross-validated during construction by testing the accuracy of the decision trees with out-of-bag error rates: simulated data that were not used to construct a given decision tree are classified by that tree, so the data being tested are never included in the construction of the trees that classify them. This yields an overall error rate for the classifier as well as model-specific misclassification rates, which indicate how often each model class is incorrectly identified and which model it is mistaken for.
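Continuing the sketch above, the out-of-bag confusion matrix can be converted into the model-specific misclassification rates described here; this is only a schematic stand-in for the cross-validation output reported by delimitR/abcrf.

```python
# Per-model misclassification rates from the out-of-bag confusion matrix.
# Rows are the true (simulated) model; columns show how often it was assigned
# to each model by trees that never saw that simulation.
rates = conf / conf.sum(axis=1, keepdims=True)
for i, true_model in enumerate(rf.classes_):
    print(f"model {true_model}: misclassified {1 - rates[i, i]:.2%} of the time")
```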
We constructed six different classifiers to match the six empirical jSFS, which differ in their coastal and inland sample sizes and numbers of unlinked SNPs (Table 1). We then used the appropriate classifier to make predictions for the 100 corresponding subsampled jSFS. We summarized support for each model and each dataset by the number of votes each model received, and we then estimated the posterior probability of the best-supported model.
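The prediction step can be sketched in the same way: each subsampled, binned empirical jSFS is classified by the corresponding trained forest, and support is tallied across trees and replicates. The observed vectors below are random stand-ins, and note that abcrf estimates the posterior probability of the best model with a separate regression forest, which a raw vote share does not reproduce.

```python
# Classify 100 stand-in "subsampled empirical" mSFS vectors with the trained
# forest and tally support for each model across trees and replicates.
obs = np.array([coarsen_jsfs(RNG.poisson(lam=8.0, size=(11, 11)), bin_edges)
                for _ in range(100)])
votes = rf.predict_proba(obs)   # averaged per-tree class probabilities, a proxy for vote fractions
support = votes.sum(axis=0)     # summed over the 100 subsampled jSFS
best = rf.classes_[support.argmax()]
print(dict(zip(rf.classes_, support.round(1))))
print("best-supported model:", best)
# The posterior probability reported in the study comes from abcrf's
# regression forest, not from this raw vote share.
```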