Genotyping by DArTseq Platform
Sequencing of the 280 jarrah individuals was undertaken using
DArT-Seq™ technology at Diversity Arrays
Technology Pty Ltd (Canberra, Australia). This technology uses a double
digestion complexity reduction method for next generation sequencing
(Kilian et al., 2012). The reduction of the genome is accomplished by
using a combination of PstI and HpaII enzymes in digestion/ligation
reactions with different adapters corresponding to two different
restriction-enzyme overhangs. The PstI-compatible adapter is designed to
include a flowcell attachment sequence, a sequencing primer sequence and a
variable-length barcode region. Diversity Arrays Technology’s
proprietary bioinformatics pipeline was used to demultiplex and align
the raw fastq files. Identical sequences were then collapsed into
“fastqcall files”. These files were used in the secondary
pipeline for DArT P/L’s proprietary SNP calling algorithm
(DArTsoft14). The minimum read depth for each individual was set to 6, and
the average read depth was 30.93 across all SNPs, supporting reliable
calls across SNPs and individuals. Only nucleotide substitutions were
called as SNPs, and a single randomly chosen SNP was
retained per 75 bp sequence to avoid linkage disequilibrium bias.
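
The one-SNP-per-sequence rule can be illustrated with a short Python sketch. It is not DArT's proprietary pipeline: the function name, the input mapping from SNP identifier to sequence identifier, and the fixed random seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def one_snp_per_sequence(snp_to_sequence, seed=0):
    """Keep one randomly chosen SNP for each sequence.

    snp_to_sequence maps a SNP identifier to the identifier of the
    sequence it was called on (both are hypothetical names).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_sequence = defaultdict(list)
    for snp_id, sequence_id in snp_to_sequence.items():
        by_sequence[sequence_id].append(snp_id)
    # Draw a single SNP from every group of SNPs sharing a sequence.
    return sorted(rng.choice(snp_ids) for snp_ids in by_sequence.values())
```

Sequences carrying a single SNP pass through unchanged; only sequences with several SNPs are thinned.
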
The full data set was then filtered in R (R Core Development Team, 2020)
using custom scripts. We applied a minor allele frequency (MAF) threshold
of 2%, which equates to a minor allele count of 11, to minimise the
inclusion of sequencing errors. The maximum proportion of missing data
per SNP was set to 6% (a minimum of 263 individuals scored for each SNP).
These thresholds were chosen because they allow a population-level allele
frequency to be estimated from, on average, nine individuals, which is
adequate for EAA-type methods and for identifying SNPs under selection
(Ahrens et al., 2021a). Linkage disequilibrium (LD) was calculated
within each of the chromosomes using the function LD.Measures in
LDcorSV (Mangin et al., 2012). To guarantee adequate independence
between SNPs and prevent potential linkage bias, the dataset was
filtered by the within-chromosome LD r² coefficient:
if the r² value between two SNPs was >0.5,
only one of the two SNPs, chosen at random, was retained for analysis.
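
The MAF, missing-data and LD filters described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the custom R scripts used in the study: genotypes are assumed to be coded as 0/1/2 copies of the alternate allele with `None` for missing calls, r² is approximated by the squared correlation between genotype counts rather than by `LD.Measures`, and the data layout, function names and pairwise pruning order are hypothetical.

```python
import random
from itertools import combinations


def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 genotype counts, ignoring missing (None) calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def missing_rate(genotypes):
    """Proportion of individuals with no call at this SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)


def r_squared(a, b):
    """Squared correlation of genotype counts (a simple stand-in for LD r²)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


def filter_snps(snps, min_maf=0.02, max_missing=0.06, max_r2=0.5, seed=0):
    """snps maps SNP id -> (chromosome, genotype list); returns retained ids."""
    rng = random.Random(seed)
    # Step 1: MAF and missing-data thresholds.
    kept = [
        snp_id
        for snp_id, (_, genotypes) in snps.items()
        if minor_allele_frequency(genotypes) >= min_maf
        and missing_rate(genotypes) <= max_missing
    ]
    # Step 2: within each chromosome, drop one SNP at random from
    # every pair whose r² exceeds the threshold.
    removed = set()
    for a, b in combinations(kept, 2):
        if a in removed or b in removed:
            continue
        (chrom_a, geno_a), (chrom_b, geno_b) = snps[a], snps[b]
        if chrom_a == chrom_b and r_squared(geno_a, geno_b) > max_r2:
            removed.add(rng.choice([a, b]))
    return [snp_id for snp_id in kept if snp_id not in removed]
```

With 280 diploid individuals there are 560 allele copies per SNP, so a 2% MAF threshold corresponds to 0.02 × 560 = 11.2 copies, in line with the minor allele count of 11 quoted above. The all-pairs loop is quadratic in the number of SNPs; it keeps the sketch short but would need a windowed or per-chromosome implementation for a full dataset.
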