Genotype-environment association analysis
We used genotype-environment association (GEA) analysis to identify selective pressures potentially driving rapid adaptive divergence of UCSD juncos. Here, we used redundancy analysis (RDA), a canonical ordination method that combines regression and principal components analysis enabling an estimate of the variance of a set of response variables explained by a number of constraining or explanatory variables (Borcard et al. 2011; Legendre & Legendre 1998; Van Den Wollenberg 1977). In RDA applied to GEA studies, explanatory variables are usually environmental parameters, while response variables correspond to the individual genotypes at each variant position. We chose RDA because of its ability to accommodate multiple explanatory variables simultaneously. We used geo-referenced environmental data, extracted for the coordinates at which junco breeding individuals were sampled to capture the distinct environmental conditions at UCSD with respect to the native range of Oregon juncos. We started by selecting a subset of the 19 variables available in the WorldClim database (Hijmanset al. 2005), chosen a priori according to their relevance to junco ecology (Miller 1941; Nolan et al. 2002). They measured mean temperature and precipitation over the year (BIO1 and BIO12); mean temperature and precipitation over the warmest quarter (BIO10 and BIO18), which corresponds generally to the junco breeding season; isothermality, which refers to how the range of day-to-night temperature differs from the range of summer-to-winter (BIO3); and seasonality of temperature and precipitation (BIO4 and BIO15). We also included three vegetation variables from the Moderate Resolution Imaging Spectroradiometer (MODIS) satellites as available in https://modis.gsfc.nasa.gov: percent tree cover (TREE), Normalized Difference Vegetation Index (NDVI, a measure of canopy greenness), and NDVI’s annual standard deviation (std_NDVI). In addition, we included the high-quality elevation data provided by the NASA Shuttle Radar Topographic Mission (SRTM), downloadable from http://www2.jpl.nasa.gov/srtm, for a total of eleven variables. To reduce the number of explanatory variables while retaining those maximizing differences between habitats at UCSD and elsewhere, we computed the sum of the pairwise Mahalanobis distances (Mahalanobis, 1936) between UCSD juncos and each of the other forms (as listed in Table 1) for each variable separately. Environmental variables whose sums across the pairwise computed distances were higher than average were retained, consisting of BIO1, BIO10, BIO4, NDVI and SRTM. Based on the variance inflation factor (VIF), BIO1 was eliminated from the set to reduce correlation among explanatory variables. The SNP dataset implemented as the matrix of response variables was obtained by excluding UCSD n.b.s. individuals from the ‘Full Dataset’, as well as positions with rates of missing data over 0.5 computed over each form. The HWE filter was also applied for a final dataset of 132,343 SNPs. Because RDA does not allow missing data, we replaced non-genotyped positions with the major frequency allele. This approach may hinder genotype-environment signals, resulting in a higher false-negative rate, but allows us to screen a larger fraction of the genome.
Identification of candidate genesTo ask if signals of genotype-environment association recovered in the RDA were driven by selection rather than by demography related effects, we searched for functional genes associated with the loci showing the highest contribution to the correlation patterns. Candidate regions under selection were identified as those containing SNPs with RDA loadings that departed six times or more the standard deviation from mean loading (Kamvar et al. 2017, tutorial by Forester in popgen.nescent.org/2018-03-27_RDA_GEA.html). We extracted 2,000-bp regions flanking these SNPs (1,000 bp from the SNP of interest in each direction of the sequence, a conservative distance in terms of linkage disequilibrium for passerines, Backström et al. 2006), and overlapping annotated genes were identified by similarity using BLAST2 (Tatusova & Madden 1999) and the annotation of the B10K consortium for the Junco genome (Feng et al. 2020). To further explore how adaptive variability is structured among Oregon junco forms and test for divergence specifically in UCSD juncos, we computed pairwiseF ST values (Weir & Cockerham 1984) based on identified candidate gene loci among all forms, as well as per-locus between UCSD and its more closely related group (see Results).
All R analyses were carried out in RStudio v 1.1.463 (R_Studio_Team 2015) with R v3.5.3 (R_Core_Team 2019). See Appendix I in Supporting Information for R scripts.