Genotyping by DArTseq Platform
Sequencing of the 280 jarrah individuals was undertaken using DArTseq™ technology at Diversity Arrays Technology Pty Ltd (Canberra, Australia). This technology uses a double-digestion complexity reduction method for next-generation sequencing (Kilian et al., 2012). Genome complexity is reduced by using a combination of PstI and HpaII enzymes in digestion/ligation reactions with two different adapters, each corresponding to one of the two restriction-enzyme overhangs. The PstI-compatible adapter includes a flowcell attachment sequence, a sequencing primer sequence and a variable-length barcode region. Diversity Arrays Technology’s proprietary bioinformatics pipeline was used to demultiplex and align the raw fastq files. Identical sequences were then collapsed into “fastqcall files”, which were used in the secondary pipeline for DArT P/L’s proprietary SNP calling algorithm (DArTsoft14). The minimum read depth for each individual was set to 6, and the average read depth was 30.93 across all SNPs, ensuring call quality for all SNPs and individuals. Only nucleotide substitutions were called as SNPs. One randomly chosen SNP was retained per 75 bp sequence to avoid linkage disequilibrium bias. The full data set was then filtered in R (R Core Development Team, 2020) using custom scripts. We applied a minor allele frequency (MAF) threshold of 2%, which equates to a minor allele count of 11 calls, to minimise the inclusion of sequencing errors. The maximum proportion of missing data was set to 6% across individuals (a minimum of 263 individuals scored for each SNP). These thresholds were chosen because they translate to, on average, estimating a population-level allele frequency from nine individuals, which is adequate for EAA-type methods and for identifying SNPs under selection (Ahrens et al., 2021a). Linkage disequilibrium (LD) was calculated within each chromosome using the function LD.Measures in the R package LDcorSV (Mangin et al., 2012).
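The MAF and missing-data filters described in this paragraph can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the custom R scripts used in the study. It assumes diploid genotypes coded as 0, 1 or 2 copies of the alternate allele, with None marking a missing call. The function names and the toy data are hypothetical. The thresholds are passed as counts, following the figures quoted in the text (11 minor-allele calls and 263 scored individuals for 280 samples). Counting the minor allele in allele copies, rather than in individuals, is an assumption.

```python
from typing import List, Optional, Sequence

# Assumed coding: 0, 1 or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def scored_count(snp: Sequence[Genotype]) -> int:
    """Number of individuals with a non-missing genotype at this SNP."""
    return sum(g is not None for g in snp)


def minor_allele_count(snp: Sequence[Genotype]) -> int:
    """Copies of the rarer allele among scored individuals (diploid assumed)."""
    alt_copies = sum(g for g in snp if g is not None)
    total_copies = 2 * scored_count(snp)
    return min(alt_copies, total_copies - alt_copies)


def filter_snps(
    snps: Sequence[Sequence[Genotype]],
    min_minor_count: int,
    min_scored: int,
) -> List[int]:
    """Return the indices of SNPs that pass both thresholds."""
    return [
        i
        for i, snp in enumerate(snps)
        if scored_count(snp) >= min_scored
        and minor_allele_count(snp) >= min_minor_count
    ]


# Toy data (hypothetical): four SNPs scored in six individuals.
toy_snps = [
    [0, 0, 1, 2, None, 1],       # 5 scored, minor allele count 4: kept
    [0, 0, 0, 0, 0, 0],          # monomorphic: removed
    [None, None, None, 1, 2, 0], # only 3 scored: removed
    [2, 2, 2, 1, 2, 2],          # minor allele count 1: removed
]
kept = filter_snps(toy_snps, min_minor_count=2, min_scored=5)
```

For a data set of the size described here, the corresponding call would be filter_snps(snps, min_minor_count=11, min_scored=263).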
To ensure adequate independence between SNPs and prevent potential linkage bias, the dataset was filtered by the within-chromosome LD r² coefficient: where the r² value between two SNPs was >0.5, only one of the two SNPs, chosen at random, was retained for analysis.
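The pruning rule can be sketched as follows, again in Python and under stated assumptions. Here r² is approximated as the squared Pearson correlation between genotype dosages, which stands in for, and is not equivalent to, the measures returned by LD.Measures in LDcorSV. The sketch further assumes complete genotype vectors with no missing calls, and that it is run once per chromosome. The order in which pairs are visited and the seeded random choice are illustrative choices, and all names are hypothetical.

```python
import random
from itertools import combinations
from typing import Dict, List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated as unlinked
    return cov * cov / (var_x * var_y)


def ld_prune(
    genotypes: Dict[str, Sequence[float]],
    threshold: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """For each SNP pair on one chromosome with r^2 above the threshold,
    drop one member of the pair at random; return the retained SNP names."""
    rng = random.Random(seed)
    retained = set(genotypes)
    for a, b in combinations(genotypes, 2):
        # Skip pairs in which one SNP has already been dropped.
        if a in retained and b in retained:
            if r_squared(genotypes[a], genotypes[b]) > threshold:
                retained.discard(rng.choice((a, b)))
    return [name for name in genotypes if name in retained]


# Toy chromosome (hypothetical): snp1 and snp2 are perfectly correlated,
# snp3 is uncorrelated with both.
toy_chromosome = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],
    "snp3": [2, 0, 1, 1, 0, 2],
}
pruned = ld_prune(toy_chromosome, threshold=0.5, seed=1)
```

With the toy data, exactly one of snp1 and snp2 is retained together with snp3. Which of the two survives depends on the seed.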