Demographic inference and molecular dating
The marked genetic divergence of the UCSD juncos with respect to
neighboring populations documented to date (Fudickar et al. 2017;
Rasner et al. 2004), has been considered to be the result of the
founder event in the 1980’s and the novel selective forces associated
with the newly colonized habitat. Alternatively, UCSD juncos may derive
from a population that diverged elsewhere over a longer period before
colonizing UCSD. The effect of a potential founder event just ca. 30
generations ago on the pattern of genetic variation observed in UCSD
juncos needs to be confirmed, as the population could potentially be
part of a larger population, so that no actual reduction in effective
population size took place upon the colonization of the campus. To
explore these competing hypotheses, we performed model comparison under
the likelihood framework developed in fastSIMCOAL2 (Excoffier et
al. 2013). Even though methods based on SFS for demographic inference
using low density genomic data have been successfully applied to systems
at comparable temporal scales (e.g. McCoy et al. 2014;
Sendell-Price et al. 2020), GBS data have limitations when
attempting to infer demographic events that took place just a few
generations ago (Beichman et al. 2018). In an attempt to account
for this, we implemented a general model that was as simple as possible,
with a single cladogenetic event, and setting narrow bounds around
splitting times for the different hypotheses to be tested. Fortunately,
we had an approximate estimate of the current census size of the UCSD
population, which was used to inform the models.
We established two temporal ranges (recent = 20-60 generations ago; and
postglacial = 4,000 – 12,000 generations ago; generation time = 1.5
years) for two potential demographic events (population split from
source lineage with or without a founder event) resulting in four
possible models to compare: recent population split following a founder
event (Fig. 2A); postglacial population split followed by a recent
founder event (Fig. 2B); recent population split without a founder event
(Fig. 2C); and postglacial population split without a founder event
(Fig. 2D). In addition, two different gene migration scenarios were
implemented: a ‘strict isolation’ scenario with migration rates set to
zero; and an ‘isolation with migration’ scenario where migration rates
can vary freely. The current effective size of the UCSD population (Pop.
Size 1 in Fig. 2) was estimated within a range of 35 to 280 individuals
based on previous direct censuses (Atwell et al. 2014; Fudickaret al. 2017) in scenarios A and B, while it was allowed to vary
in models C and D. The number of colonizers in scenarios A and B was set
to vary between 2 and 280. Mutation rate was fixed at
2.21x10-9 in all the cases (Nam et al. 2010;
Smeds et al. 2016), and a generation time of 1.5 years was
assumed to calibrate parameters and estimate divergence times. The
likelihood of each of the final eight models was estimated with
fastSIMCOAL2 along with the time of population splitting, the current
and ancestral effective population sizes, and the gene migration rates
in those models where they were implemented.
As input data we used the folded site frequency spectra (SFS) generated
from GBS data. We retained the samples corresponding to the two study
groups from the ‘Full Dataset’ and filtered out sex-linked positions to
avoid distortions related to loci with different effective sizes. We
applied the HWE filter, resulting in a matrix of 828,699 SNP loci.
Because singletons are important for estimating parameters and
likelihoods, no MAF filters were applied. The SFS were generated with
easySFS (https://github.com/isaacovercast/easySFS), maximizing the
number of segregating sites as recommended by the authors (I. Overcast,
pers. comm.). Parameter values with the highest likelihood were
estimated under each of the models after 50 cycles of the algorithm,
with 150,000 coalescent simulations per cycle. This procedure was
replicated 100 times and the set of parameter values with the highest
final likelihood was retained as the best point estimate. To compare
models we applied the Akaike Information Criterion (AIC; Akaike 1998).
For estimating 95% support limits for the parameters estimated under
the best model, we followed the same bootstrap procedure than for
RADpainter using blocks of 51 SNPs to produce 100 bootstrapped datasets.