Demographic Modelling
The STRUCTURE results provided some guidance for deciding how many populations to model in the demographic models investigated. Since the primary goal of this study is to make inferences regarding the evolutionary history of the ecosystem dominants, we decided to model two populations on the basis of geography (i.e., samples from either the inland or coastal forests), which largely corresponds to the division between the two coastal and one inland population in the K = 3 STRUCTURE analysis. We combine the clusters that contain coastal samples because these are not structured in a geographically meaningful manner (Fig. 3), so organizing them by cluster would combine genetically similar populations that aren’t necessarily geographically close.
We developed eleven demographic models (Fig. 2) to assess the phylogeographic history of each species. In these models, we considered both ancient and recent divergence events, and various migration scenarios, including divergence with and without migration and secondary contact. We also model possible bottleneck events associated with the onset of the Pleistocene (~2.5 Mya). We modeled population expansion to be associated with population regrowth after glacial retreat ~10 Kya (Fig. 2). We used fastsimcoal2 to simulate DNA sequence data, setting the number of loci and variable sites to match the empirical data, and then summarize that data using jSFS. We simulated 100 datasets for three different dataset sizes and for each species. No missing data can be included in the calculation of the jSFS, therefore for a particular locus to be included, it must be present in all individuals. Thus, there is a trade-off between numbers of individuals and SNPs considered in the analysis.
The analyses of model identifiability resulted in low error rates for classifying each of the eleven models (Table 1, Fig. S3, S4) Most models were classified correctly most of the time, with all of them having a classification accuracy above 0.72 (Fig. S3, S4), except for the models with a recent divergence event between coastal and inland populations. We therefore collapsed these models, which all varied in the presence/absence of migration and bottleneck and expansion events, into a single recent dispersal model (Fig. 4, Fig. S5). This decreased the overall error rate dramatically (Table 1); the identifiability of the recent dispersal class of models increased to 0.90.
The first classifier, with all eleven models, was used to make predictions using the observed jSFS for each species (Fig. 5). For each dataset size, we used 100 different jSFS that were constructed by subsampling unlinked SNPs randomly. For Thuja plicata , all datasets had the highest prediction probability for the same model: ancient divergence between coastal and inland populations, followed by a bottleneck in both populations, and then population expansion contemporaneous to secondary contact between populations due to migration (“AV + sc + bot/exp”, Fig. 5; model G in Fig. 2). On average, each Thuja plicata dataset received 552/1000 votes for that model and had an average posterior probability of 0.72 (Fig. 5). Meaning some datasets, or certain subsampled jSFS, had higher and/or lower prediction probabilities for this model.
The results were different for Tsuga heterophylla in that each dataset did not have the same prediction probability. Those with most SNPs supported models similar to Thuja plicata (“AV + sc + bot/exp”, Fig. 5; model G in Fig. 2). On average, this model received 564/1000 votes in the classifier for each observed jSFS and had an average estimated posterior probability of 0.83 (Fig. 5). With fewer SNPs included in the jSFS, but more samples represented in the population, the model that had the highest prediction probability was model C, or ancient divergence between coastal and inland populations, followed by secondary contact between populations due to migration (“AV + sc”, Fig. 2), which is very similar to model G. The difference being that model G includes the population bottleneck and expansion that are not modeled in the former model C (“AV + sc”, Fig. 2). On average, this model received 532 /1000 votes in the classifier for each observed jSFS and had an average estimated posterior probability of 0.78. We suspect this lower model support could be due to the fact that, in comparison to Thuja plicata , there were fewer SNPs to inform parameters associated with the additional process – population bottleneck and expansion – as well as fewer individuals sampled from each population.