Variant calling
We evaluated GBS read quality using FASTQC (Andrews 2010) after sorting
them by individual with AXE (Murray & Borevitz 2017) and performed the
trimming and quality filtering treatment using Trim Galore (Krueger
2015), excluding all reads outside of a range of 40 to 90 bp long.
Adapter removal stringency was set to 1 and the quality parameter ‘q’ to
20. GBS reads were then mapped to the Junco hyemalis genome
(BioProject accession number PRJNA493001; for assembly details see Friiset al. 2018) using the mem algorithm in the Burrows-Wheeler
Aligner (BWA, Li & Durbin 2009). Read group assignment and generation
of BAM files was carried out with Picard Tools version 2
(http://broadinstitute.github.io/picard). We used the Genome Analysis
Toolkit (GATK, McKenna et al. 2010) version 3.6-0 to call the
individual genotypes with the HaplotypeCaller tool. We then used the
GenotypeGVCFs tool to gather all the per-sample GVCFs files generated in
the previous step and produce a set of jointly-called SNPs and indels
(GATK Best Practices, Auwera et al. 2013; DePristo et al.2011) in the variant call format (vcf ). We retained biallelic
SNPs and applied GATK generic hard-filtering recommendations consisting
on QualByDepth (QD) > 2.0; FisherStrand (FS) <
60.0; RMSMappingQuality (MQ) > 40; and
MappingQualityRankSumTest (MQRankSum) > -12.5. Since GBS
homologous reads have identical starts and ends, filters depending on
position within reads (ReadPosRankSum and StrandOddsRatio) were not
applied. Individuals with rates of missing data above 0.6 were excluded
at this point. Using vcftools (Danecek et al. 2011), we then
excluded those SNPs out of a range of coverage between 4 and 50 or with
a genotyping phred quality score below 40. The resulting dataset
(hereafter referred to as ‘Full Dataset’ in downstream analyses)
consisted of 158 individuals (Table 1) and 1,448,676 SNPs with a
per-individual average coverage of 4.25 and a missing data rate of 0.65.