WGS data, filtering and phylogenetic analysis
As the source of SNPs, we selected four scaffolds that together cover 1,567,760 bp of the male-specific part of the caprine Y-chromosome. These are unplaced in the ARS1 assembly, but for a large part closely match a recent Y-chromosomal contig of the Saanen_v1 assembly (Table S2; Li et al., 2021) and contain the single-copy Y-chromosomal genes SRY, DDX3Y and ZFY with the SNPs that define the major haplotypes Y1A, Y1B, Y2A and Y2B (Çinar Kul et al., 2015; Lenstra & Econogene Consortium, 2005; Waki et al., 2015). The genes USP9Y, UTY, DDX3Y and ZFY lie close together near one end of the male-specific Y-chromosomal region and are well separated from SRY (Li et al., 2021). The selected contigs have a low overall level of apparent heterozygosity, indicating a high frequency of hemizygous markers (Table S2).
In a preliminary study (https://www.biorxiv.org/content/biorxiv/early/2020/02/17/2020.02.17.952051.full.pdf), we used WGS data from the Sequence Read Archive for 70 mainly Asian and Moroccan male goats. We extracted the genotypes of 2350 SNPs without female or male-heterozygous scores, with <5% missing scores per SNP and with a minor allele frequency (MAF) >0.02.
For a more comprehensive global coverage, we used the WGS datasets generated by the VarGoats project (Denoyelle et al., 2021, www.goatgenome.org/vargoats.html) as well as whole-genome sequences that have been published previously or submitted to the Sequence Read Archive (Table S3). Four Y-chromosomal scaffolds were selected with a low average frequency of heterozygous variants, since heterozygous scores cannot arise from hemizygous Y-chromosomal SNPs (Table S2). A VCF file for these scaffolds was imported into Plink and contained 65556 variants, 54032 of which were SNPs. Of the 424 male samples, 34 were from wild goats, whereas 354 domestic goats had a call rate of >5%. After discarding two domestic goats of unknown breed origin, we kept 386 wild or domestic goats. SNPs were selected in four consecutive steps. (i) In 670 female goats with <1% scores for the Y-chromosomal SNPs (i.e. free of male contamination), 36804 SNPs had a call rate of zero and were thus considered male-specific. (ii) Of these, 29035 SNPs did not show a heterozygous score in the 424 males and were considered hemizygous. (iii) Of these, 16495 had a call rate of >1% in the 388 male goats (354 domestic, 34 wild). (iv) 552 SNPs had a MAF in the 354 male domestic goats of >95%, and of the remaining 15943 SNPs, 9977 had at least one score in wild goats, giving a total of 10529 SNPs representing male-specific Y-chromosomal variation in domestic and/or wild goats.
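The first three selection steps can be illustrated with a minimal Python sketch. It is not the Plink workflow used in the study: the genotype coding (0, 1, 2, None), the function names and the toy data are assumptions made for illustration only, and step (iv) is left out.

```python
from typing import Dict, List, Optional

# Assumed genotype coding for this sketch (not taken from the study):
# 0 = reference call, 2 = alternative call, 1 = heterozygous call, None = missing.
Calls = Dict[str, Optional[int]]   # sample id -> call
Genotypes = Dict[str, Calls]       # SNP id -> calls per sample


def call_rate(calls: Calls, samples: List[str]) -> float:
    """Fraction of the given samples that have a non-missing call."""
    if not samples:
        return 0.0
    called = sum(calls.get(s) is not None for s in samples)
    return called / len(samples)


def select_male_specific_snps(
    genotypes: Genotypes,
    females: List[str],
    males: List[str],
    min_male_call_rate: float = 0.01,
) -> List[str]:
    """Keep SNPs that pass steps (i) to (iii) described in the text."""
    kept = []
    for snp, calls in genotypes.items():
        # (i) male-specific: no call at all in any female
        if call_rate(calls, females) > 0:
            continue
        # (ii) hemizygous: no heterozygous call in any male
        if any(calls.get(m) == 1 for m in males):
            continue
        # (iii) sufficient call rate in males
        if call_rate(calls, males) <= min_male_call_rate:
            continue
        kept.append(snp)
    return kept


# Hypothetical toy data: two females (f1, f2) and three males (m1 to m3).
toy = {
    "snp_a": {"f1": None, "f2": None, "m1": 0, "m2": 2, "m3": 0},
    "snp_b": {"f1": 0, "f2": None, "m1": 0, "m2": 2, "m3": 0},      # scored in a female
    "snp_c": {"f1": None, "f2": None, "m1": 1, "m2": 2, "m3": 0},   # heterozygous male
    "snp_d": {"f1": None, "f2": None, "m1": None, "m2": None, "m3": None},
}
selected = select_male_specific_snps(toy, ["f1", "f2"], ["m1", "m2", "m3"])
```

In this toy example only `snp_a` passes all three filters; the other SNPs each fail one of the steps.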
Allele-sharing distances between individuals were calculated using Plink or Mega (Tamura et al., 2011) and visualized in Neighbor-Joining trees using the programs Splitstree (Huson & Bryant, 2006) and Mega.
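For hemizygous markers, an allele-sharing distance between two individuals reduces to the proportion of SNPs at which their alleles differ. The sketch below assumes that SNPs missing in either individual are skipped; it is an illustration of the principle and not the implementation of Plink or Mega, and all names in it are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Haplotype = List[Optional[str]]  # one allele per SNP, None = missing


def allele_sharing_distance(a: Haplotype, b: Haplotype) -> float:
    """Proportion of differing alleles among SNPs scored in both individuals."""
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue  # assumption: skip SNPs missing in either individual
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else float("nan")


def distance_matrix(haplotypes: Dict[str, Haplotype]) -> Dict[Tuple[str, str], float]:
    """Pairwise distances for all ordered pairs of individuals."""
    names = list(haplotypes)
    return {
        (p, q): allele_sharing_distance(haplotypes[p], haplotypes[q])
        for p in names
        for q in names
    }


# Hypothetical toy haplotypes over four SNPs.
goats = {
    "goat1": ["A", "C", "G", "T"],
    "goat2": ["A", "T", "G", None],
    "goat3": ["G", "T", "A", "T"],
}
dist = distance_matrix(goats)
```

A matrix of this kind is the usual input for Neighbor-Joining tree construction.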
For construction of a Median-Joining network (Bandelt et al., 1999), we selected 286 male goats and 27 bezoars, omitting transboundary breeds outside their region of origin and balancing the breed representation by analyzing ≤18 individuals per breed. With 1734 polymorphic SNPs, the program Popart (Leigh & Bryant, 2015) generated a network of 91 haplotypes (Table S4).
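The balancing of breed representation and the collapsing of individuals into haplotypes can be sketched as follows. The text does not state how the ≤18 individuals per breed were chosen, so random subsampling with a fixed seed is an assumption of this sketch, as are the function names and the assumption of complete (no missing data) haplotype strings.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def cap_per_breed(
    samples: List[Tuple[str, str]], max_per_breed: int = 18, seed: int = 0
) -> List[str]:
    """Return sample ids with at most max_per_breed individuals per breed.

    samples is a list of (sample_id, breed) pairs. Subsampling is random
    (an assumption; the selection rule is not given in the text).
    """
    by_breed = defaultdict(list)
    for sample_id, breed in samples:
        by_breed[breed].append(sample_id)
    rng = random.Random(seed)
    kept = []
    for breed in sorted(by_breed):
        ids = sorted(by_breed[breed])
        if len(ids) > max_per_breed:
            ids = sorted(rng.sample(ids, max_per_breed))
        kept.extend(ids)
    return kept


def collapse_haplotypes(sequences: Dict[str, str]) -> Counter:
    """Count how many individuals carry each distinct haplotype string."""
    return Counter(sequences.values())


# Hypothetical toy data: 25 goats of breed A, 3 of breed B.
toy_samples = [(f"A{i:02d}", "A") for i in range(25)] + [
    (f"B{i:02d}", "B") for i in range(3)
]
kept_ids = cap_per_breed(toy_samples)
haplotype_counts = collapse_haplotypes({"g1": "ACG", "g2": "ACG", "g3": "ATG"})
```

The haplotype counts, together with the haplotype sequences, correspond to the kind of input that network programs such as Popart work from.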