Demographic inference and molecular dating
The marked genetic divergence of the UCSD juncos with respect to neighboring populations documented to date (Fudickar et al. 2017; Rasner et al. 2004), has been considered to be the result of the founder event in the 1980’s and the novel selective forces associated with the newly colonized habitat. Alternatively, UCSD juncos may derive from a population that diverged elsewhere over a longer period before colonizing UCSD. The effect of a potential founder event just ca. 30 generations ago on the pattern of genetic variation observed in UCSD juncos needs to be confirmed, as the population could potentially be part of a larger population, so that no actual reduction in effective population size took place upon the colonization of the campus. To explore these competing hypotheses, we performed model comparison under the likelihood framework developed in fastSIMCOAL2 (Excoffier et al. 2013). Even though methods based on SFS for demographic inference using low density genomic data have been successfully applied to systems at comparable temporal scales (e.g. McCoy et al. 2014; Sendell-Price et al. 2020), GBS data have limitations when attempting to infer demographic events that took place just a few generations ago (Beichman et al. 2018). In an attempt to account for this, we implemented a general model that was as simple as possible, with a single cladogenetic event, and setting narrow bounds around splitting times for the different hypotheses to be tested. Fortunately, we had an approximate estimate of the current census size of the UCSD population, which was used to inform the models.
We established two temporal ranges (recent = 20-60 generations ago; and postglacial = 4,000 – 12,000 generations ago; generation time = 1.5 years) for two potential demographic events (population split from source lineage with or without a founder event) resulting in four possible models to compare: recent population split following a founder event (Fig. 2A); postglacial population split followed by a recent founder event (Fig. 2B); recent population split without a founder event (Fig. 2C); and postglacial population split without a founder event (Fig. 2D). In addition, two different gene migration scenarios were implemented: a ‘strict isolation’ scenario with migration rates set to zero; and an ‘isolation with migration’ scenario where migration rates can vary freely. The current effective size of the UCSD population (Pop. Size 1 in Fig. 2) was estimated within a range of 35 to 280 individuals based on previous direct censuses (Atwell et al. 2014; Fudickaret al. 2017) in scenarios A and B, while it was allowed to vary in models C and D. The number of colonizers in scenarios A and B was set to vary between 2 and 280. Mutation rate was fixed at 2.21x10-9 in all the cases (Nam et al. 2010; Smeds et al. 2016), and a generation time of 1.5 years was assumed to calibrate parameters and estimate divergence times. The likelihood of each of the final eight models was estimated with fastSIMCOAL2 along with the time of population splitting, the current and ancestral effective population sizes, and the gene migration rates in those models where they were implemented.
As input data we used the folded site frequency spectra (SFS) generated from GBS data. We retained the samples corresponding to the two study groups from the ‘Full Dataset’ and filtered out sex-linked positions to avoid distortions related to loci with different effective sizes. We applied the HWE filter, resulting in a matrix of 828,699 SNP loci. Because singletons are important for estimating parameters and likelihoods, no MAF filters were applied. The SFS were generated with easySFS (https://github.com/isaacovercast/easySFS), maximizing the number of segregating sites as recommended by the authors (I. Overcast, pers. comm.). Parameter values with the highest likelihood were estimated under each of the models after 50 cycles of the algorithm, with 150,000 coalescent simulations per cycle. This procedure was replicated 100 times and the set of parameter values with the highest final likelihood was retained as the best point estimate. To compare models we applied the Akaike Information Criterion (AIC; Akaike 1998). For estimating 95% support limits for the parameters estimated under the best model, we followed the same bootstrap procedure than for RADpainter using blocks of 51 SNPs to produce 100 bootstrapped datasets.