Demographic Modelling
The STRUCTURE results provided some guidance for deciding how many
populations to model in the demographic models investigated. Since the
primary goal of this study is to make inferences regarding the
evolutionary history of the ecosystem dominants, we decided to model two
populations on the basis of geography (i.e., samples from either the
inland or coastal forests), which largely corresponds to the division
between the two coastal and one inland population in the K = 3
STRUCTURE analysis. We combine the clusters that contain coastal samples
because these are not structured in a geographically meaningful manner
(Fig. 3), so organizing them by cluster would combine genetically
similar populations that aren’t necessarily geographically close.
We developed eleven demographic models (Fig. 2) to assess the
phylogeographic history of each species. In these models, we considered
both ancient and recent divergence events, and various migration
scenarios, including divergence with and without migration and secondary
contact. We also model possible bottleneck events associated with the
onset of the Pleistocene (~2.5 Mya). We modeled
population expansion to be associated with population regrowth after
glacial retreat ~10 Kya (Fig. 2). We used fastsimcoal2
to simulate DNA sequence data, setting the number of loci and variable
sites to match the empirical data, and then summarize that data using
jSFS. We simulated 100 datasets for three different dataset sizes and
for each species. No missing data can be included in the calculation of
the jSFS, therefore for a particular locus to be included, it must be
present in all individuals. Thus, there is a trade-off between numbers
of individuals and SNPs considered in the analysis.
The analyses of model identifiability resulted in low error rates for
classifying each of the eleven models (Table 1, Fig. S3, S4) Most models
were classified correctly most of the time, with all of them having a
classification accuracy above 0.72 (Fig. S3, S4), except for the models
with a recent divergence event between coastal and inland populations.
We therefore collapsed these models, which all varied in the
presence/absence of migration and bottleneck and expansion events, into
a single recent dispersal model (Fig. 4, Fig. S5). This decreased the
overall error rate dramatically (Table 1); the identifiability of the
recent dispersal class of models increased to 0.90.
The first classifier, with all eleven models, was used to make
predictions using the observed jSFS for each species (Fig. 5). For each
dataset size, we used 100 different jSFS that were constructed by
subsampling unlinked SNPs randomly. For Thuja plicata , all
datasets had the highest prediction probability for the same model:
ancient divergence between coastal and inland populations, followed by a
bottleneck in both populations, and then population expansion
contemporaneous to secondary contact between populations due to
migration (“AV + sc + bot/exp”, Fig. 5; model G in Fig. 2). On
average, each Thuja plicata dataset received 552/1000 votes for
that model and had an average posterior probability of 0.72 (Fig. 5).
Meaning some datasets, or certain subsampled jSFS, had higher and/or
lower prediction probabilities for this model.
The results were different for Tsuga heterophylla in that each
dataset did not have the same prediction probability. Those with most
SNPs supported models similar to Thuja plicata (“AV + sc +
bot/exp”, Fig. 5; model G in Fig. 2). On average, this model received
564/1000 votes in the classifier for each observed jSFS and had an
average estimated posterior probability of 0.83 (Fig. 5). With fewer
SNPs included in the jSFS, but more samples represented in the
population, the model that had the highest prediction probability was
model C, or ancient divergence between coastal and inland populations,
followed by secondary contact between populations due to migration (“AV
+ sc”, Fig. 2), which is very similar to model G. The difference being
that model G includes the population bottleneck and expansion that are
not modeled in the former model C (“AV + sc”, Fig. 2). On average,
this model received 532 /1000 votes in the classifier for each observed
jSFS and had an average estimated posterior probability of 0.78. We
suspect this lower model support could be due to the fact that, in
comparison to Thuja plicata , there were fewer SNPs to inform
parameters associated with the additional process – population
bottleneck and expansion – as well as fewer individuals sampled from
each population.