WGS data, filtering and phylogenetic analysis
We selected as the source of the SNPs
four scaffolds that together cover 1,567,760 bp of the male-specific
part of the caprine Y-chromosome. These scaffolds are unplaced in the ARS1
assembly but for the most part closely match a recent Y-chromosomal
contig of the Saanen_v1 assembly (Table S2) (Li et al., 2021). They
contain the single-copy Y-chromosomal genes SRY, DDX3Y and ZFY, which harbor the SNPs that define the major haplotypes Y1A, Y1B, Y2A
and Y2B (Çinar Kul et al., 2015; Lenstra & Econogene Consortium, 2005;
Waki et al., 2015). The genes USP9Y, UTY, DDX3Y and ZFY are located near one end of the male-specific Y-chromosomal
region and are well separated from SRY (Li et al., 2021). The
selected contigs have a low overall level of apparent heterozygosity,
indicating a high frequency of hemizygous markers (Table S2).
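To illustrate the apparent-heterozygosity criterion, the following Python sketch computes, for each scaffold, the fraction of called male genotypes that are heterozygous. It assumes genotypes are supplied as VCF-style GT strings (e.g. `0/1`, `1|1`, `1`, `./.`). The input layout and all function names are hypothetical and are not taken from the original pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def alleles_of(gt: str) -> List[str]:
    """Split a VCF GT string such as '0/1', '1|1' or '1' into its alleles."""
    return gt.replace("|", "/").split("/")


def is_called(gt: str) -> bool:
    return "." not in alleles_of(gt)


def is_het(gt: str) -> bool:
    a = alleles_of(gt)
    return "." not in a and len(set(a)) > 1


def apparent_heterozygosity(
    records: Iterable[Tuple[str, List[str]]]
) -> Dict[str, float]:
    """records: one (scaffold, male GT strings) tuple per variant.

    Returns, per scaffold, heterozygous calls / all called male genotypes.
    """
    het: Dict[str, int] = defaultdict(int)
    called: Dict[str, int] = defaultdict(int)
    for scaffold, gts in records:
        for gt in gts:
            if is_called(gt):
                called[scaffold] += 1
                het[scaffold] += is_het(gt)  # bool counts as 0 or 1
    return {s: het[s] / n for s, n in called.items() if n > 0}
```

A scaffold that lies truly in the male-specific region should give values close to zero, because hemizygous loci cannot produce genuine heterozygous calls.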
In a preliminary study
(https://www.biorxiv.org/content/biorxiv/early/2020/02/17/2020.02.17.952051.full.pdf),
we used WGS data from the Sequence Read Archive for 70 mainly Asian and
Moroccan male goats. We extracted the genotypes of 2350 SNPs with no
female or male-heterozygous scores, <5% missing scores per SNP
and a minor allele frequency (MAF) >0.02.
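The preliminary SNP filter can be sketched as below. The sketch assumes biallelic SNPs with VCF-style GT strings and counts each homozygous or haploid male call as a single hemizygous allele. The thresholds follow the text (<5% missing, MAF >0.02). The function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def alleles_of(gt: str) -> List[str]:
    return gt.replace("|", "/").split("/")


def is_called(gt: str) -> bool:
    return "." not in alleles_of(gt)


def hemizygous_allele(gt: str) -> Optional[str]:
    """Allele of a homozygous or haploid call; None if missing or heterozygous."""
    a = alleles_of(gt)
    if "." in a or len(set(a)) != 1:
        return None
    return a[0]


def passes_preliminary_filter(female_gts: List[str], male_gts: List[str],
                              max_missing: float = 0.05,
                              min_maf: float = 0.02) -> bool:
    # No called genotype in any female: the SNP should be Y-specific.
    if any(is_called(gt) for gt in female_gts):
        return False
    # No heterozygous male call: a true Y-chromosomal SNP is hemizygous.
    if any(is_called(gt) and hemizygous_allele(gt) is None for gt in male_gts):
        return False
    if not male_gts:
        return False
    alleles = [a for a in map(hemizygous_allele, male_gts) if a is not None]
    # Require fewer than 5% missing male calls for this SNP.
    if 1 - len(alleles) / len(male_gts) >= max_missing:
        return False
    counts = Counter(alleles)
    # Minor allele frequency among the (haploid) male calls; 0 if monomorphic.
    maf = min(counts.values()) / len(alleles) if len(counts) > 1 else 0.0
    return maf > min_maf
```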
For more comprehensive global coverage, we used the WGS datasets
generated by the VarGoats project (Denoyelle et al., 2021,
www.goatgenome.org/vargoats.html)
together with whole-genome sequences that have been published previously or
submitted to the Sequence Read Archive (Table S3). The four Y-chromosomal
scaffolds were selected for a low average frequency of heterozygous
variants, since a heterozygous call cannot represent a hemizygous Y-chromosomal SNP (Table
S2). A VCF file for these scaffolds was imported into Plink and
contained 65556 variants, 54032 of which were SNPs. Of the 424 male
samples, 34 were from wild goats and 354 were domestic goats with a call
rate >5%. After discarding two domestic goats of
unknown breed origin, we kept 386 wild or domestic goats. SNPs were
selected in four successive steps. (i) In 670 female goats with
<1% scores for the Y-chromosomal SNPs (i.e. free of
male contamination), 36804 SNPs had a call rate of zero and were therefore
considered male-specific. (ii) Of these, 29035 SNPs did not show a heterozygous
score in any of the 424 males and were considered hemizygous. (iii) Of these,
16495 had a call rate >1% in the 388 male goats (354 domestic, 34 wild).
(iv) 552 SNPs had a MAF >95% in the 354 male domestic
goats, and of the remaining 15943, 9977 SNPs had
at least one score in wild goats, giving a total of 10529 SNPs
representing male-specific Y-chromosomal variation in domestic and/or
wild goats.
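The four-step SNP selection can be sketched as a cascade. This is a simplified illustration, not the pipeline actually used. It assumes VCF-style GT strings grouped per SNP into female, domestic-male and wild-male lists. Two simplifications apply. First, steps (ii) and (iii) use the same set of males here, whereas the text uses 424 males for (ii) and 388 for (iii). Second, the domestic-goat criterion of step (iv) is passed in as a hypothetical predicate `domestic_ok` instead of being hard-coded.

```python
from typing import Callable, Dict, List


def is_called(gt: str) -> bool:
    return "." not in gt.replace("|", "/").split("/")


def is_het(gt: str) -> bool:
    a = gt.replace("|", "/").split("/")
    return "." not in a and len(set(a)) > 1


def call_rate(gts: List[str]) -> float:
    return sum(map(is_called, gts)) / len(gts) if gts else 0.0


def select_y_snps(snps: Dict[str, Dict[str, List[str]]],
                  domestic_ok: Callable[[List[str]], bool]) -> List[str]:
    """snps: SNP id -> {'female': [...], 'domestic': [...], 'wild': [...]} GT lists.

    Returns the SNP ids retained after steps (i)-(iv).
    """
    kept = []
    for snp_id, g in snps.items():
        males = g["domestic"] + g["wild"]
        # (i) male-specific: zero call rate in the (uncontaminated) females
        if call_rate(g["female"]) > 0:
            continue
        # (ii) hemizygous: no heterozygous score in any male
        if any(map(is_het, males)):
            continue
        # (iii) call rate > 1% across the male goats
        if call_rate(males) <= 0.01:
            continue
        # (iv) hypothetical domestic-goat criterion, or at least one wild-goat score
        if domestic_ok(g["domestic"]) or any(map(is_called, g["wild"])):
            kept.append(snp_id)
    return kept
```

Because step (iv) is a union, an SNP that is uninformative in domestic goats is still kept when it has been called in a wild goat. This matches the text's sum of 552 domestic-informative SNPs and 9977 wild-scored SNPs.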
Allele-sharing distances between individuals were calculated with
Plink or Mega (Tamura et al., 2011) and visualized as
Neighbor-Joining trees with Splitstree (Huson
& Bryant, 2006) and Mega.
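For hemizygous Y-chromosomal calls, an allele-sharing distance reduces to the proportion of jointly called SNPs at which two males carry different alleles. The sketch below computes that distance and builds an unrooted tree with the standard Saitou-Nei neighbor-joining algorithm, returning it as a Newick string. It is a from-scratch illustration, not the Plink, Mega or Splitstree implementation. The input format, with one allele per SNP or `None` for a missing call, is an assumption.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def allele_sharing_distance(a: List[Optional[str]],
                            b: List[Optional[str]]) -> float:
    """Share of jointly called SNPs at which two hemizygous samples differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def neighbor_joining(names: List[str],
                     d: Dict[Tuple[str, str], float]) -> str:
    """Saitou-Nei neighbor joining for >= 3 taxa; d[(i, j)] given once per pair.

    Returns an unrooted tree in Newick format.
    """
    dist: Dict[Tuple[str, str], float] = {}
    for (i, j), v in d.items():
        dist[(i, j)] = dist[(j, i)] = v

    def D(i: str, j: str) -> float:
        return 0.0 if i == j else dist[(i, j)]

    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {i: sum(D(i, k) for k in nodes) for i in nodes}
        # Join the pair minimising the Q criterion (n-2)d(i,j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        li = 0.5 * D(i, j) + (r[i] - r[j]) / (2 * (n - 2))
        lj = D(i, j) - li
        u = f"({i}:{li:.4g},{j}:{lj:.4g})"  # new internal node, labelled by its subtree
        for k in nodes:
            if k not in (i, j):
                dist[(u, k)] = dist[(k, u)] = 0.5 * (D(i, k) + D(j, k) - D(i, j))
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Join the last three nodes at a single central node.
    a, b, c = nodes
    la = 0.5 * (D(a, b) + D(a, c) - D(b, c))
    lb = 0.5 * (D(a, b) + D(b, c) - D(a, c))
    lc = 0.5 * (D(a, c) + D(b, c) - D(a, b))
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"
```

On an additive distance matrix, neighbor joining recovers the generating tree. For example, the distances of the tree ((A:1,B:2):1,(C:1,D:3)) yield `(C:1,D:3,(A:1,B:2):1);`.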
For construction of a Median-Joining network (Bandelt et al., 1999), we
selected 286 male goats and 27 bezoars, omitting transboundary breeds
outside their region of origin and balancing the breed representation by
analyzing ≤18 individuals per breed. With 1734 polymorphic SNPs, the
program Popart (Leigh & Bryant, 2015) generated a network of
91 haplotypes (Table S4).
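The sample selection that precedes the network can be sketched as follows. The sketch drops transboundary breeds sampled outside their region of origin and caps each breed at 18 individuals. Random subsampling is used for the cap, which is an assumption. It then restricts the data to SNPs that are polymorphic in the selection and counts identical haplotypes, which become the observed nodes of the network. The median-vector construction of Bandelt et al. (1999), as implemented in Popart, is not reproduced. Haplotypes are assumed to be complete strings with no missing calls, and the `breed_origin` mapping and all names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (sample_id, breed, sampling_region, haplotype string over the SNP panel)
Sample = Tuple[str, str, str, str]


def select_for_network(samples: List[Sample], breed_origin: Dict[str, str],
                       max_per_breed: int = 18, seed: int = 1) -> List[Sample]:
    """Drop breeds sampled outside their region of origin, then cap each breed."""
    rng = random.Random(seed)
    by_breed: Dict[str, List[Sample]] = defaultdict(list)
    for s in samples:
        breed, region = s[1], s[2]
        # Breeds absent from breed_origin are treated as local to where they were sampled.
        if breed_origin.get(breed, region) == region:
            by_breed[breed].append(s)
    kept: List[Sample] = []
    for group in by_breed.values():
        if len(group) > max_per_breed:
            group = rng.sample(group, max_per_breed)
        kept.extend(group)
    return kept


def collapse_haplotypes(samples: List[Sample]) -> Tuple[int, Counter]:
    """Restrict to SNPs polymorphic in the selection and count identical haplotypes."""
    haps = [s[3] for s in samples]
    if not haps:
        return 0, Counter()
    poly = [k for k in range(len(haps[0])) if len({h[k] for h in haps}) > 1]
    return len(poly), Counter("".join(h[k] for k in poly) for h in haps)
```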