Genotype-environment association analysis
We used genotype-environment association (GEA) analysis to identify
selective pressures potentially driving rapid adaptive divergence of
UCSD juncos. Here, we used redundancy analysis (RDA), a canonical
ordination method that combines regression and principal components
analysis enabling an estimate of the variance of a set of response
variables explained by a number of constraining or explanatory variables
(Borcard et al. 2011; Legendre & Legendre 1998; Van Den
Wollenberg 1977). In RDA applied to GEA studies, explanatory variables
are usually environmental parameters, while response variables
correspond to the individual genotypes at each variant position. We
chose RDA because of its ability to accommodate multiple explanatory
variables simultaneously. We used geo-referenced environmental data,
extracted for the coordinates at which junco breeding individuals were
sampled to capture the distinct environmental conditions at UCSD with
respect to the native range of Oregon juncos. We started by selecting a
subset of the 19 variables available in the WorldClim database (Hijmanset al. 2005), chosen a priori according to their relevance
to junco ecology (Miller 1941; Nolan et al. 2002). They measured
mean temperature and precipitation over the year (BIO1 and BIO12); mean
temperature and precipitation over the warmest quarter (BIO10 and
BIO18), which corresponds generally to the junco breeding season;
isothermality, which refers to how the range of day-to-night temperature
differs from the range of summer-to-winter (BIO3); and seasonality of
temperature and precipitation (BIO4 and BIO15). We also included three
vegetation variables from the Moderate Resolution Imaging
Spectroradiometer (MODIS) satellites as available in
https://modis.gsfc.nasa.gov: percent tree cover (TREE), Normalized
Difference Vegetation Index (NDVI, a measure of canopy greenness), and
NDVI’s annual standard deviation (std_NDVI). In addition, we included
the high-quality elevation data provided by the NASA Shuttle Radar
Topographic Mission (SRTM), downloadable from
http://www2.jpl.nasa.gov/srtm, for a total of eleven variables. To
reduce the number of explanatory variables while retaining those
maximizing differences between habitats at UCSD and elsewhere, we
computed the sum of the pairwise Mahalanobis distances (Mahalanobis,
1936) between UCSD juncos and each of the other forms (as listed in
Table 1) for each variable separately. Environmental variables whose
sums across the pairwise computed distances were higher than average
were retained, consisting of BIO1, BIO10, BIO4, NDVI and SRTM. Based on
the variance inflation factor (VIF), BIO1 was eliminated from the set to
reduce correlation among explanatory variables. The SNP dataset
implemented as the matrix of response variables was obtained by
excluding UCSD n.b.s. individuals from the ‘Full Dataset’, as well as
positions with rates of missing data over 0.5 computed over each form.
The HWE filter was also applied for a final dataset of 132,343 SNPs.
Because RDA does not allow missing data, we replaced non-genotyped
positions with the major frequency allele. This approach may hinder
genotype-environment signals, resulting in a higher false-negative rate,
but allows us to screen a larger fraction of the genome.
Identification of candidate genesTo ask if signals of genotype-environment association recovered in the
RDA were driven by selection rather than by demography related effects,
we searched for functional genes associated with the loci showing the
highest contribution to the correlation patterns. Candidate regions
under selection were identified as those containing SNPs with RDA
loadings that departed six times or more the standard deviation from
mean loading (Kamvar et al. 2017, tutorial by Forester in
popgen.nescent.org/2018-03-27_RDA_GEA.html). We extracted 2,000-bp
regions flanking these SNPs (1,000 bp from the SNP of interest in each
direction of the sequence, a conservative distance in terms of linkage
disequilibrium for passerines, Backström et al. 2006), and
overlapping annotated genes were identified by similarity using BLAST2
(Tatusova & Madden 1999) and the annotation of the B10K consortium for
the Junco genome (Feng et al. 2020). To further explore
how adaptive variability is structured among Oregon junco forms and test
for divergence specifically in UCSD juncos, we computed pairwiseF ST values (Weir & Cockerham 1984) based on
identified candidate gene loci among all forms, as well as per-locus
between UCSD and its more closely related group (see Results).
All R analyses were carried out in RStudio v 1.1.463 (R_Studio_Team
2015) with R v3.5.3 (R_Core_Team 2019). See Appendix I in Supporting
Information for R scripts.