Inferring population structure
To identify the genetic structure of A. corniculatum, 423 individuals from 18 populations with Sanger sequences were assigned to a putative number of clusters using the Bayesian clustering approach implemented in STRUCTURE 2.3.4 (Falush, Stephens, & Pritchard, 2003; Hubisz, Falush, Stephens, & Pritchard, 2009; Pritchard, Stephens, & Donnelly, 2000). The program infers K genetic clusters of origin for the sampled individuals and simultaneously assigns each individual to these clusters based on posterior probabilities. The maximum K was set to 12, and 10 replicate runs were conducted for each K. Each run consisted of 1 × 10⁶ Markov chain Monte Carlo (MCMC) iterations with a burn-in of 2 × 10⁵ under the admixture model with correlated allele frequencies. The most likely K was determined by the ΔK statistic (Evanno, Regnaut, & Goudet, 2005) using STRUCTURE HARVESTER. The population structure results were plotted with DISTRUCT 1.1 (Rosenberg, 2004); each individual is shown as a line segment partitioned into K coloured components, which represent the individual’s estimated membership coefficients in the K clusters.
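As an illustration of how the ΔK criterion selects K, the following minimal Python sketch computes ΔK = |L(K+1) − 2L(K) + L(K−1)| / s[L(K)], where L(K) is the mean Ln P(D) over the replicate runs at a given K and s[L(K)] is its standard deviation across those runs. The function name delta_k and the input values are hypothetical and chosen only for illustration; the sketch is not the code used by STRUCTURE HARVESTER.

```python
from statistics import mean, stdev


def delta_k(ln_prob):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    ln_prob maps each K to the list of Ln P(D) values from its replicate runs.
    """
    avg = {k: mean(values) for k, values in ln_prob.items()}
    result = {}
    for k in sorted(ln_prob):
        if k - 1 not in avg or k + 1 not in avg:
            continue  # delta K is undefined at the smallest and largest K
        sd = stdev(ln_prob[k])
        if sd == 0:
            continue  # identical replicates would cause a division by zero
        second_diff = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical Ln P(D) values (3 replicates per K), for illustration only.
example = {
    1: [-1000, -1002, -998],
    2: [-800, -801, -799],
    3: [-790, -792, -788],
    4: [-785, -786, -784],
}
scores = delta_k(example)
best_k = max(scores, key=scores.get)  # K with the largest delta K
```

Because ΔK depends on both neighbouring values of K, it cannot be evaluated at the smallest or the largest K tested.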
We also performed principal component analysis (PCA) on the SNP frequency matrix (summarizing the frequency of each SNP in each Illumina-sequenced population) using the “princomp” function in R (Venables & Ripley, 2013) to examine whether the SNP frequencies differed among populations. An analysis of molecular variance (AMOVA) was performed with the “pegas” R package to partition the genetic variance hierarchically at the levels of population and cluster of populations. We performed this analysis separately for each of the six Sanger-sequenced genes.
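The variance partitioning behind AMOVA can be sketched as follows. This is a simplified, single-level version (individuals within populations only) written in Python for illustration; it omits the additional cluster level and the permutation test, and it is not the “pegas” implementation. The function name amova_one_level and the toy distance matrix are hypothetical.

```python
from itertools import combinations


def amova_one_level(sq_dist, pops):
    """One-level AMOVA from a matrix of squared pairwise distances.

    sq_dist[i][j] is the squared distance between individuals i and j;
    pops[i] is the population label of individual i.
    Returns (sigma2_among, sigma2_within, phi_st).
    """
    n_total = len(pops)
    groups = {}
    for idx, label in enumerate(pops):
        groups.setdefault(label, []).append(idx)
    n_pops = len(groups)

    def ssd(members):
        # Sum of squared deviations: pairwise squared distances / group size.
        pair_sum = sum(sq_dist[i][j] for i, j in combinations(members, 2))
        return pair_sum / len(members)

    ssd_total = ssd(range(n_total))
    ssd_within = sum(ssd(m) for m in groups.values())
    ssd_among = ssd_total - ssd_within

    ms_among = ssd_among / (n_pops - 1)
    ms_within = ssd_within / (n_total - n_pops)

    # Weighted average group size used for unbalanced designs.
    n0 = (n_total - sum(len(m) ** 2 for m in groups.values()) / n_total) / (n_pops - 1)

    sigma2_within = ms_within
    sigma2_among = (ms_among - ms_within) / n0
    total = sigma2_among + sigma2_within
    phi_st = sigma2_among / total if total else 0.0
    return sigma2_among, sigma2_within, phi_st


# Toy example: two hypothetical populations that share no haplotypes.
labels = ["A", "A", "B", "B"]
fixed = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
sigma_a, sigma_w, phi = amova_one_level(fixed, labels)
```

In this toy case all variation lies among populations, so the among-population fraction (Φ_ST) reaches its maximum; when every pair of individuals is equally distant, it drops to zero.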
The inferred genetic structure was further examined gene by gene by constructing a haplotype network for each gene and mapping the haplotypes geographically. Haplotypes of the six nuclear genes across the 18 populations were inferred using DnaSP 5.10 (Librado & Rozas, 2009), and the networks were constructed by an expectation-maximization algorithm with A. floridum as the outgroup. The networks were visualized using NETWORK 5.0 (http://www.fluxus-engineering.com/; Bandelt, Forster, & Röhl, 1999) and plotted on a map using GenGIS (Parks et al., 2009).
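Mapping haplotypes geographically requires, for each gene, a table of haplotype counts per population. The following Python sketch shows one way such a table could be built from already phased sequences; it does not perform phasing or network construction, and the function name tabulate_haplotypes, the population labels, and the sequences are hypothetical.

```python
from collections import Counter, defaultdict


def tabulate_haplotypes(records):
    """Collapse phased sequences into haplotypes and count them per population.

    records is an iterable of (population, phased_sequence) pairs.
    Returns (hap_ids, counts): hap_ids maps each distinct sequence to a label
    such as 'H1'; counts maps each population to a Counter of those labels.
    """
    hap_ids = {}
    counts = defaultdict(Counter)
    for pop, seq in records:
        seq = seq.upper()  # treat upper- and lower-case bases as identical
        if seq not in hap_ids:
            hap_ids[seq] = f"H{len(hap_ids) + 1}"  # label in order of first appearance
        counts[pop][hap_ids[seq]] += 1
    return hap_ids, counts


# Hypothetical phased sequences from two populations, for illustration only.
records = [
    ("P1", "ACGT"),
    ("P1", "ACGT"),
    ("P1", "ACGA"),
    ("P2", "ACGA"),
    ("P2", "acga"),
]
hap_ids, counts = tabulate_haplotypes(records)
```

The per-population counts can then be drawn as pie charts at the sampling sites, which is the kind of display a haplotype map typically uses.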
Haplotypes were also inferred for the 93 genes sequenced on the Illumina platform using the method developed by Z. He et al. (2019), who validated its accuracy by Sanger sequencing of individuals. The details of this method have been described in a previous publication (Wang et al., 2021). We obtained 392 gene segments, 84 of which were longer than 300 bp, and a haplotype network was constructed for each segment longer than 300 bp using NETWORK 5.0 (http://www.fluxus-engineering.com/; Bandelt et al., 1999).