Genetic analysis
Reads were quality filtered with FASTX TOOLKIT, retaining only reads in which at least 90% of the bases had a Phred score of 20 or higher (Gordon, Hannon, and Others 2010). These reads were mapped to the well-assembled genome of the congener, Acropora millepora (Fuller et al. 2020), with bowtie2 (Langmead and Salzberg 2012). Of the 299 samples sequenced, 291 passed the initial filter of not being more than 3 standard deviations below the mean log(depth-quality). Depth-quality is the proportion of sites covered at 5x or higher; it is calculated by the script plotQC.R, which is part of the 2b-RAD analytical pipeline (https://github.com/z0on/2bRAD_denovo). All retained samples had depth-quality exceeding 24%. We then calculated individual heterozygosities using ANGSD v0.921 (Korneliussen, Albrechtsen, and Nielsen 2014) and removed three high-heterozygosity outliers that were more than five standard deviations above the average (the next-highest sample was 1.4 standard deviations above the average). Such high-heterozygosity outliers are most likely mixtures of two or more genotypes, possibly due to accidental mixing during library preparation, and were removed because they could create a false signal of relatedness.
The initial hierarchical clustering was based on the identity-by-state (IBS) distance matrix generated by ANGSD. Genetic structure was analyzed using PCAngsd (Meisner and Albrechtsen 2018), and using the functions capscale and adonis2 from the R package vegan (Oksanen et al. 2007) applied to the IBS matrix for non-clonal samples. PCAngsd initially detected spurious genetic structure (K=2) that was strongly correlated with the number of sites passing filters (minimum mapping quality = 20, minimum base call quality = 20, p-value for being a true SNP = 1e-6, genotyping rate across individuals = 80%). This was likely an artifact of PCR duplicates, which were retained in the original version of the 2b-RAD protocol used here.
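The two sample-level filters described above (low depth-quality and high heterozygosity) can be illustrated with a minimal sketch. This is not the code of plotQC.R or of the repository; the function names and the dictionary inputs are hypothetical, and the use of the sample standard deviation computed over all samples is an assumption.

```python
import math
import statistics


def low_depth_outliers(depth_quality, n_sd=3.0):
    """Return sample IDs whose log(depth-quality) is more than n_sd
    standard deviations below the mean log(depth-quality).

    depth_quality: dict mapping sample ID -> proportion of sites
    covered at 5x or higher (hypothetical input format).
    """
    logs = {s: math.log(q) for s, q in depth_quality.items()}
    mean = statistics.mean(logs.values())
    sd = statistics.stdev(logs.values())  # sample SD (assumption)
    return {s for s, v in logs.items() if v < mean - n_sd * sd}


def high_het_outliers(heterozygosity, n_sd=5.0):
    """Return sample IDs whose heterozygosity is more than n_sd
    standard deviations above the mean heterozygosity.

    heterozygosity: dict mapping sample ID -> individual
    heterozygosity estimate (hypothetical input format).
    """
    mean = statistics.mean(heterozygosity.values())
    sd = statistics.stdev(heterozygosity.values())  # sample SD (assumption)
    return {s for s, h in heterozygosity.items() if h > mean + n_sd * sd}
```

In such a sketch the depth filter would be applied first, and heterozygosity would then be evaluated only for the samples that passed it, matching the order of the steps in the text.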
To remove this artifact, we restricted the analysis to the 257,264 sites (both variable and invariable) that were shared among the three samples with the fewest sites passing the mapping quality and base call quality filters. After additional filtering (80% genotyping rate in the whole dataset, p-value for being a true SNP = 1e-5, strand bias p-value cutoff 1e-3, heterozygote bias p-value cutoff 1e-3), 11,089 variable sites were retained, and PCAngsd no longer detected any genetic structure. These sites were used to construct the updated IBS distance matrix in ANGSD and to calculate relatedness and pairwise site frequency spectra in NGSrelate v.2 (Korneliussen and Moltke 2015). To avoid distortion of the ordination space by highly similar siblings, only one sibling was included when constructing the ordination. Coordinates of the left-out siblings were then predicted from their distances to other samples using the predict.cca function of the vegan package in R.
Nucleotide diversity (\(\pi\)) was calculated as per-chromosome theta (the expected number of differences between two chromosome copies), estimated by the ANGSD utilities realSFS and thetaStat, divided by the number of genotyped sites in the chromosome. Significance of the difference in genetic diversity between adults and juveniles was inferred using a linear mixed model with a fixed effect of age class and a scalar random effect of chromosome, using the R package lme4 (Bates et al. 2015). For plotting Fig 1F, we computed deviations of \(\pi\) from each chromosome's mean, to better illustrate which groups were unusual in their diversity. The code to reproduce these analyses is available at the GitHub repository https://github.com/z0on/Yap_siblings.git.
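The per-site \(\pi\) calculation and the per-chromosome centering used for plotting can be sketched as follows. This is only an illustration under stated assumptions: the function name and the record layout (group, chromosome, summed theta, number of genotyped sites) are hypothetical and do not reproduce the actual thetaStat output format or the repository code.

```python
from collections import defaultdict


def pi_deviations(records):
    """Compute per-site pi for each (group, chromosome) pair and its
    deviation from the mean pi of that chromosome across groups.

    records: iterable of (group, chromosome, theta_sum, n_sites),
    where theta_sum is the per-chromosome theta and n_sites is the
    number of genotyped sites on that chromosome (hypothetical layout).
    Returns a dict mapping (group, chromosome) -> (pi, deviation).
    """
    # Per-site nucleotide diversity: chromosome-wide theta / genotyped sites
    pi = {(g, c): theta / n for g, c, theta, n in records}

    # Mean pi of each chromosome, averaged over groups
    by_chrom = defaultdict(list)
    for (_, chrom), value in pi.items():
        by_chrom[chrom].append(value)
    chrom_mean = {c: sum(v) / len(v) for c, v in by_chrom.items()}

    # Center each value on its chromosome's mean
    return {key: (value, value - chrom_mean[key[1]])
            for key, value in pi.items()}
```

Centering on each chromosome's mean removes between-chromosome differences in baseline diversity, so the remaining variation reflects differences among groups, which is the same source of variation that the random effect of chromosome absorbs in the mixed model.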