Inferring population structure
To identify the genetic structure of A. corniculatum, 423
individuals from 18 populations with Sanger sequences were assigned to
a putative number of clusters using a Bayesian clustering approach with
STRUCTURE 2.3.4 (Falush, Stephens, & Pritchard, 2003; Hubisz, Falush,
Stephens, & Pritchard, 2009; Pritchard, Stephens, & Donnelly, 2000).
The program infers the K genetic clusters from which the sampled
individuals originate and simultaneously assigns each individual to
these clusters on the basis of posterior probabilities. The maximum K was set
to 12, and for each K, 10 replicates were conducted. Each run consisted
of 1 × 10⁶ Markov chain Monte Carlo (MCMC) iterations
with a burn-in of 2 × 10⁵ under an assumed model of
admixture and correlated allele frequencies. The most likely K was
determined by the delta K statistic (Evanno, Regnaut, & Goudet, 2005)
using STRUCTURE HARVESTER. The population structure results were
displayed graphically with DISTRUCT 1.1 (Rosenberg, 2004), in which each
individual is shown as a line segment partitioned into K coloured
components that represent the individual’s estimated membership
coefficients in the K clusters.
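The delta K criterion can be illustrated with a minimal Python sketch. It assumes the log-likelihood values Ln P(D) of the replicate runs have already been collected for each K; the function name and the toy values are hypothetical and are not data from this study. For simplicity, the second-order difference is taken on the replicate means, so the result may differ slightly from other implementations of the statistic.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: delta K} from a dict mapping K to replicate Ln P(D) values.

    delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), where L is the mean
    over replicates. It is undefined for the smallest and largest K.
    """
    ks = sorted(lnp_by_k)
    mean_l = {k: mean(lnp_by_k[k]) for k in ks}
    sd_l = {k: stdev(lnp_by_k[k]) for k in ks}
    delta_k = {}
    for k in ks:
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue  # no neighbour on one side
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        if sd_l[k] > 0:  # skip K values whose replicates are identical
            delta_k[k] = second_diff / sd_l[k]
    return delta_k


# Hypothetical Ln P(D) values for K = 1..4 with three replicates each.
toy_runs = {
    1: [-5000, -5002, -4998],
    2: [-4500, -4501, -4499],
    3: [-4450, -4460, -4440],
    4: [-4430, -4450, -4410],
}
delta_k = evanno_delta_k(toy_runs)
best_k = max(delta_k, key=delta_k.get)
```

In this toy example the largest delta K occurs at K = 2, where the likelihood gain levels off and the replicates agree closely.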
We also performed principal component analysis (PCA) on the SNP
frequency matrix (summarizing the frequency of each SNP in each
Illumina-sequenced population) using the “princomp” function in R
(Venables & Ripley, 2013) to test whether the SNP frequencies differed
among populations. An analysis of molecular variance (AMOVA) was
performed with the “pegas” R package to partition the variance
components hierarchically at the level of populations and at the level
of clusters of populations. We performed this analysis for each of the
six Sanger-sequenced genes.
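As an illustration of the PCA step, the following Python sketch applies a PCA to a population-by-SNP frequency matrix. It is not the R “princomp” call used in the analysis: it uses a singular value decomposition in NumPy, assumes that rows are populations and columns are SNPs, and runs on simulated frequencies. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def pca_on_frequencies(freq):
    """PCA of a (populations x SNPs) allele-frequency matrix.

    Returns the population scores on each component and the proportion
    of variance explained by each component.
    """
    x = np.asarray(freq, dtype=float)
    centered = x - x.mean(axis=0)  # centre each SNP column
    u, s, _vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                 # coordinates of the populations
    variance = s ** 2
    explained = variance / variance.sum()
    return scores, explained


# Simulated data: two groups of three populations and 50 SNPs.
rng = np.random.default_rng(0)
base_a = rng.uniform(0.1, 0.9, size=50)
base_b = rng.uniform(0.1, 0.9, size=50)
freq = np.vstack([
    np.clip(base + rng.normal(0.0, 0.02, size=50), 0.0, 1.0)
    for base in (base_a, base_a, base_a, base_b, base_b, base_b)
])
scores, explained = pca_on_frequencies(freq)
```

If SNP frequencies differ between groups of populations, the groups separate along the first component, which then accounts for most of the variance.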
The inferred genetic structure was further checked gene by gene by
constructing a haplotype network for each gene and mapping the
haplotypes geographically. Haplotypes of six nuclear genes across the 18
populations were inferred using DnaSP 5.10 (Librado & Rozas, 2009), and
the networks were constructed by an expectation-maximization algorithm
with A. floridum as the outgroup. The networks were visualized
using NETWORK 5.0
(http://www.fluxus-engineering.com/)
(Bandelt, Forster, & Röhl, 1999) and plotted on a map using GenGIS
(Parks et al., 2009).
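Mapping haplotypes geographically requires the frequency of each haplotype in each population. The Python sketch below shows one way to tabulate these counts from phased sequences. It is only an illustration and does not reproduce the DnaSP, NETWORK, or GenGIS workflow; the function name, population labels, and sequences are hypothetical.

```python
from collections import Counter, defaultdict


def haplotype_counts(phased):
    """Collapse identical sequences into haplotypes and count them.

    phased: iterable of (population, sequence) pairs, one per phased copy.
    Returns (ids, counts): ids maps each distinct sequence to a label
    (H1, H2, ... in order of first appearance); counts maps each
    population to a Counter of haplotype labels.
    """
    ids = {}
    counts = defaultdict(Counter)
    for population, sequence in phased:
        label = ids.setdefault(sequence, f"H{len(ids) + 1}")
        counts[population][label] += 1
    return ids, counts


# Hypothetical phased sequences from two populations.
records = [
    ("pop1", "ACGT"),
    ("pop1", "ACGT"),
    ("pop1", "ACGA"),
    ("pop2", "ACGA"),
    ("pop2", "TCGA"),
]
ids, counts = haplotype_counts(records)
```

The per-population counts can then be drawn, for example, as pie charts at the sampling locations.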
Haplotypes were also inferred for the 93 genes sequenced on the Illumina
platform using the method developed by Z. He et al. (2019). He et al.
validated the accuracy of this method to infer haplotypes by sequencing
individuals using the Sanger method. Details of the procedure have been
described previously (Wang et al., 2021). We obtained 392 gene segments,
84 of which were longer than 300 bp, and a haplotype network was
constructed for each segment longer than 300 bp using NETWORK 5.0
(http://www.fluxus-engineering.com/)
(Bandelt et al., 1999).