Population structure and genetic diversity analyses
To explore genome-wide population structure among Oregon junco forms, we
applied principal components analysis (PCA) based on SNP data. SNP loci
under linkage disequilibrium were filtered out from the ‘Full Dataset’
with bcftools (Danecek & McCarthy 2017) applying an
R2 limit of 0.2 in windows of 10K bp. Using vcftools,
positions with less than 50% of individuals genotyped for each
taxon/population were removed from the data matrix, along with those
presenting a minor allele frequency (MAF) below 0.02. We implemented a
threshold for SNP loci showing highly significant deviations from
Hardy-Weinberg equilibrium (HWE) with a p-value of
10-4 to filter out false variants arising from the
alignment of paralogous loci. To fulfil neutrality assumptions in
population structure analyses, we used PCADAPT (Luu et al. 2017)
to detect and exclude sites putatively under selection, applying a
q-value threshold of 0.1, resulting in a final data matrix of 21,879
SNPs (hereafter referred to as the ‘Neutral Dataset’ in downstream
analyses). The PCA was conducted using the function snpgdsPCAavailable in SNPrelate to obtain the eigenvectors to be plotted.
We also examined population divergence in Oregon juncos with the program
STRUCTURE (Pritchard et al. 2000) and the ‘Neutral Dataset’. We
converted the vcf file to STRUCTURE format using PGDspider
(Lischer & Excoffier 2012) version 2.0.5.1. Bash scripts to perform the
analyses were created with STRAUTO (Chhatre & Emerson 2016), and we ran
the program five times per K value, with K ranging from 1 to 10 after
running a preliminary analysis to infer the lambda value. The burn-in
was set to 50,000 iterations and the analysis was run for an additional
100,000 iterations. Similarity scores among runs and graphics were
computed with CLUMPAK (Kopelman et al. 2015).
We computed pairwise F ST (Weir & Cockerham 1984)
values from the ‘Neutral Dataset’ after excluding UCSD n.b.s. samples
(SNP matrix size = 38,247) whose breeding range is uncertain, using the
hierfstat R package (Goudet et al. 2015). We also used the
program populations from the Stacks pipeline (Rochette et al.2019) to compute the observed (HO) and expected
heterozygosity (HE), and nucleotide diversity (π) based
on autosomal positions for each form.