Environmental Association Analysis
To elucidate the association between climate and genetic variation,
we applied three approaches: redundancy analysis (RDA), latent factor
mixed models (LFMM) and BAYPASS. RDA is a multivariate method that
models linear relationships between a set of explanatory variables and
a set of response variables, allowing the genetic variance associated
with each environmental factor to be estimated simultaneously (Forester et al.,
2018). Because RDA and LFMM require complete data sets, we imputed
each missing genotype as the most common allele at that locus within
the individual's ancestral cluster, based on the optimal number of
clusters (k) identified in the SNMF output. The individual genotypes
(response variables) were then constrained by the climate variables
(explanatory variables) using the rda function in the VEGAN package
2.5-1 in R (Oksanen et al., 2018). The anova.cca function was used to test for RDA significance using
999 permutations of the environmental variables. We did not explicitly
control for population structure, because RDA performs better without
explicit population structure inputs (Forester et al., 2018).
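The RDA workflow described above (cluster-wise imputation, constrained ordination, permutation test) can be sketched in Python with NumPy. This is a simplified illustration under stated assumptions, not the vegan implementation: genotypes are assumed to be coded 0/1/2 and the modal genotype code within each cluster stands in for "the most common allele"; the cluster labels are hypothetical stand-ins for SNMF assignments; and the global test compares a pseudo-F ratio (constrained versus residual variance) with values obtained after randomly permuting the rows of the climate matrix. All function names and data are hypothetical.

```python
import numpy as np


def impute_by_cluster(geno, cluster):
    """Fill missing genotypes (NaN) with the modal genotype code at each
    locus among individuals assigned to the same ancestral cluster.
    Assumption: modal 0/1/2 code used as a proxy for 'most common allele'."""
    geno = geno.copy()
    for k in np.unique(cluster):
        rows = cluster == k
        block = geno[rows]                       # boolean indexing -> copy
        for j in range(block.shape[1]):
            col = block[:, j]                    # view into block
            missing = np.isnan(col)
            if missing.any() and (~missing).any():
                values, counts = np.unique(col[~missing], return_counts=True)
                col[missing] = values[np.argmax(counts)]
        geno[rows] = block                       # write the filled block back
    return geno


def rda(Y, X):
    """Redundancy analysis: multivariate least-squares regression of the
    centred response matrix Y on the standardised predictors X, followed by
    a PCA (via SVD) of the fitted values. Returns the proportion of variance
    constrained by X, the constrained eigenvalues and the SNP loadings."""
    n, q = X.shape
    Yc = Y - Y.mean(axis=0)
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    fitted = Xc @ B
    r2 = (fitted ** 2).sum() / (Yc ** 2).sum()
    _, s, Vt = np.linalg.svd(fitted, full_matrices=False)
    eig = s[:q] ** 2 / (n - 1)                   # at most q constrained axes
    return r2, eig, Vt[:q]


def rda_permutation_test(Y, X, n_perm=999, seed=0):
    """Global test: pseudo-F of the observed RDA compared with pseudo-F values
    obtained after randomly permuting the rows of the environmental matrix."""
    rng = np.random.default_rng(seed)
    n, q = X.shape

    def pseudo_f(Xm):
        r2, _, _ = rda(Y, Xm)
        return (r2 / q) / ((1 - r2) / (n - q - 1))

    f_obs = pseudo_f(X)
    f_perm = np.array([pseudo_f(X[rng.permutation(n)]) for _ in range(n_perm)])
    p_value = (np.sum(f_perm >= f_obs) + 1) / (n_perm + 1)
    return f_obs, p_value


# Hypothetical toy data: 60 individuals, 40 SNPs (0/1/2), 2 climate variables.
rng = np.random.default_rng(1)
n, n_snps = 60, 40
climate = rng.normal(size=(n, 2))
cluster = rng.integers(0, 2, size=n)             # stand-in for SNMF assignments
logit = np.zeros((n, n_snps))
logit[:, :8] = 1.5 * climate[:, [0]]             # first 8 SNPs track climate 1
geno = rng.binomial(2, 1 / (1 + np.exp(-logit))).astype(float)
geno[rng.random(geno.shape) < 0.05] = np.nan     # about 5% missing data

geno_imp = impute_by_cluster(geno, cluster)
r2, eig, loadings = rda(geno_imp, climate)
f_obs, p_value = rda_permutation_test(geno_imp, climate, n_perm=199)
```

With real data, Y would be the imputed individuals × SNPs genotype matrix and X the individuals × climate-variables matrix. The default n_perm=999 matches the number of permutations used here; the toy run uses 199 only for speed.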
We also used LFMM to test for climate associations (Frichot et al.,
2013). LFMM applies a univariate regression model to assess
genotype-environment associations, using the optimal k value estimated
in SNMF as the number of latent factors to control for ancestral
population structure. The analyses were performed independently for
each climate variable, with 30,000 iterations each (the first 15,000
discarded as burn-in). Median z-scores were combined from five runs for
each variable and recalibrated by computing the genomic inflation
factor, λ, and then dividing the scores by λ. p-values were then
adjusted manually so that their histogram was approximately flat,
ideally with a peak close to zero (false discoveries were controlled
with the Benjamini-Hochberg algorithm using q = 0.01). We used λ = 0.45
in the adjustment function to flatten the histogram, following the
steps and R script available in the LFMM manual. To account for
multiple comparisons, we applied a false discovery rate (FDR) threshold
of 0.05 to all runs. Lastly, we used a hierarchical Bayesian model implemented
in BAYPASS (Gautier, 2015), based on the model from BayEnv (Coop et al.,
2010). A population covariance matrix (Ω) was estimated by running the
core model five times, each run with 100,000 iterations (the first
50,000 discarded as burn-in), and averaging the results. The averaged
covariance matrix was then used in the AUX covariate model (100,000
iterations; 50,000 as burn-in), which was likewise run five times and
averaged to obtain the final results.
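Averaging the replicate Ω estimates amounts to an element-wise mean. Because a mean of symmetric positive semi-definite matrices is itself symmetric positive semi-definite, the averaged matrix remains a valid covariance matrix for the AUX model. A minimal sketch in Python with NumPy, assuming the five replicate Ω matrices have already been loaded as arrays; the function name and toy matrices are hypothetical, and reading or writing BAYPASS files is not shown.

```python
import numpy as np


def average_omega(omegas):
    """Element-wise mean of replicate estimates of the population covariance
    matrix Omega (one square matrix per independent run of the core model)."""
    stack = np.stack([np.asarray(o, dtype=float) for o in omegas])
    if stack.shape[1] != stack.shape[2]:
        raise ValueError("each Omega must be a square matrix")
    if not np.allclose(stack, stack.transpose(0, 2, 1)):
        raise ValueError("each Omega must be symmetric")
    return stack.mean(axis=0)


# Hypothetical replicate estimates for 4 populations: a shared covariance
# structure plus small run-to-run noise (symmetrised so each stays symmetric).
rng = np.random.default_rng(7)
A = rng.normal(size=(4, 4))
base = A @ A.T / 4 + np.eye(4) * 0.1
runs = []
for _ in range(5):
    noise = rng.normal(scale=0.01, size=(4, 4))
    runs.append(base + (noise + noise.T) / 2)

omega_mean = average_omega(runs)
```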
Significant SNPs were identified as those with a Bayes factor (BF) > 3
(Kass & Raftery, 1995). Like LFMM, BAYPASS is based on a mixed linear
model that accounts for potentially confounding allele frequency
variance due to population structure. However, comparing the results of
the two approaches may help reveal any influence of population structure
(Forester et al., 2018; Ahrens et al., 2021a).
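The LFMM recalibration and false discovery steps can be sketched in Python with NumPy and the standard library. Assumptions: the z-scores from the five runs of one climate variable are already available as an array (parsing LFMM output is omitted); following the LFMM tutorial script as we understand it, "dividing the scores by λ" is applied to the squared median z-scores, treated as χ² statistics with 1 degree of freedom; the lam argument allows λ to be fixed by hand, as was done here with λ = 0.45; and q is the Benjamini-Hochberg FDR level. Function names and toy data are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

# Median of a chi-square distribution with 1 df, i.e. (z_0.75)^2 (about 0.455).
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def recalibrate(z_runs, lam=None):
    """Combine LFMM z-scores from replicate runs and apply genomic control.

    z_runs: array of shape (n_runs, n_loci) for one climate variable.
    lam: genomic inflation factor; estimated from the data when None,
         or fixed by hand (e.g. after inspecting the p-value histogram).
    Returns (adjusted p-values, lambda used).
    """
    z = np.median(z_runs, axis=0)                 # median z-score per locus
    if lam is None:
        lam = float(np.median(z ** 2) / CHI2_1_MEDIAN)
    chi2_adj = z ** 2 / lam                       # squared z-scores divided by lambda
    # Upper tail of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_adj = np.array([math.erfc(math.sqrt(x / 2)) for x in chi2_adj])
    return p_adj, lam


def benjamini_hochberg(p, q):
    """Indices of loci retained by the Benjamini-Hochberg step-up
    procedure at false discovery rate q."""
    p = np.asarray(p)
    m = p.size
    order = np.argsort(p)
    passes = np.nonzero(p[order] <= q * np.arange(1, m + 1) / m)[0]
    if passes.size == 0:
        return np.array([], dtype=int)
    # Reject every locus ranked at or below the largest passing rank.
    return np.sort(order[: passes[-1] + 1])


# Hypothetical data: 5 LFMM runs x 2000 loci with inflated null z-scores
# (sd 1.3) and 10 strongly associated loci (z = 8).
rng = np.random.default_rng(42)
true_z = rng.normal(0.0, 1.3, size=2000)
true_z[:10] = 8.0
z_runs = true_z + rng.normal(0.0, 0.1, size=(5, 2000))

p_adj, lam = recalibrate(z_runs)
candidates = benjamini_hochberg(p_adj, q=0.01)
```

In this toy example the estimated λ is close to 1.3² because the simulated null z-scores are inflated by that factor; dividing by λ restores an approximately uniform p-value distribution for the unassociated loci, which is what the flat histogram criterion checks.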