Population structure and genetic diversity analyses
To explore genome-wide population structure among Oregon junco forms, we applied principal components analysis (PCA) based on SNP data. SNP loci under linkage disequilibrium were filtered out from the ‘Full Dataset’ with bcftools (Danecek & McCarthy 2017) applying an R2 limit of 0.2 in windows of 10K bp. Using vcftools, positions with less than 50% of individuals genotyped for each taxon/population were removed from the data matrix, along with those presenting a minor allele frequency (MAF) below 0.02. We implemented a threshold for SNP loci showing highly significant deviations from Hardy-Weinberg equilibrium (HWE) with a p-value of 10-4 to filter out false variants arising from the alignment of paralogous loci. To fulfil neutrality assumptions in population structure analyses, we used PCADAPT (Luu et al. 2017) to detect and exclude sites putatively under selection, applying a q-value threshold of 0.1, resulting in a final data matrix of 21,879 SNPs (hereafter referred to as the ‘Neutral Dataset’ in downstream analyses). The PCA was conducted using the function snpgdsPCAavailable in SNPrelate to obtain the eigenvectors to be plotted.
We also examined population divergence in Oregon juncos with the program STRUCTURE (Pritchard et al. 2000) and the ‘Neutral Dataset’. We converted the vcf file to STRUCTURE format using PGDspider (Lischer & Excoffier 2012) version 2.0.5.1. Bash scripts to perform the analyses were created with STRAUTO (Chhatre & Emerson 2016), and we ran the program five times per K value, with K ranging from 1 to 10 after running a preliminary analysis to infer the lambda value. The burn-in was set to 50,000 iterations and the analysis was run for an additional 100,000 iterations. Similarity scores among runs and graphics were computed with CLUMPAK (Kopelman et al. 2015).
We computed pairwise F ST (Weir & Cockerham 1984) values from the ‘Neutral Dataset’ after excluding UCSD n.b.s. samples (SNP matrix size = 38,247) whose breeding range is uncertain, using the hierfstat R package (Goudet et al. 2015). We also used the program populations from the Stacks pipeline (Rochette et al.2019) to compute the observed (HO) and expected heterozygosity (HE), and nucleotide diversity (π) based on autosomal positions for each form.