Sequence analysis and genetic diversity estimation
The sequences of 18 populations obtained from Sanger sequencing were aligned and edited in SeqMan 7.1.0. For each population, the nucleotide diversity (π) and DNA polymorphism (Watterson’s θ) were calculated using DnaSP 5.10 (Librado & Rozas, 2009).
We obtained the reference sequences of the 93 genes by sequencing one A. corniculatum individual using the Sanger method (see the supplementary file of He et al., 2019). The reference sequences ranged in length from 203 to 2422 bp. The short reads produced by Illumina sequencing were mapped to the reference sequences using MAQ 0.7.1 (H. Li, Ruan, & Durbin, 2008) with the following parameters: a mutation rate between reference and read of 0.002, a threshold of 200 for the sum of mismatched base qualities, and a minimum read mapping quality of 30. To exclude false-positive mismatches, we calculated the mismatch rate at each position along the read and for each base quality score. We trimmed the first and last 10 bases of each read and filtered out bases with a quality score below 20. Single nucleotide polymorphisms (SNPs) were also identified using MAQ 0.7.1 (H. Li et al., 2008). To avoid bias from sequencing errors, we discarded sites with insufficient coverage (<100 reads) and sites with a minor allele frequency below 1/2N in each population, where N is the number of individuals (Z. He et al., 2013). The allele frequencies at each SNP site in a population were obtained from the read depth of each allele.
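The site-level filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. The function name call_snp and the input format (a dictionary of per-allele read depths for one site in one population) are hypothetical, and the minor allele is assumed here to be the second most abundant allele at the site.

```python
from typing import Dict, Optional


def call_snp(depths: Dict[str, int], n_individuals: int,
             min_coverage: int = 100) -> Optional[Dict[str, float]]:
    """Return depth-based allele frequencies for one site in one population,
    or None if the site fails either filter (hypothetical helper)."""
    coverage = sum(depths.values())
    # Filter 1: discard sites covered by fewer than min_coverage reads.
    if coverage < min_coverage:
        return None
    # Assumption: the minor allele is the second most abundant allele.
    ranked = sorted(depths.values(), reverse=True)
    minor_depth = ranked[1] if len(ranked) > 1 else 0
    # Filter 2: discard sites whose minor allele frequency is below 1/(2N).
    if minor_depth / coverage < 1.0 / (2 * n_individuals):
        return None
    # Allele frequency = depth of each allele / total depth at the site.
    return {allele: d / coverage for allele, d in depths.items() if d > 0}
```

For example, with N = 20 individuals the frequency threshold is 1/40 = 0.025, so a site with depths A = 150 and G = 50 is retained with frequencies 0.75 and 0.25, whereas a site with depths A = 199 and G = 1 is discarded.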
For the Illumina data, we estimated the nucleotide polymorphism (Watterson’s θ) of each gene using the method of Z. He et al. (2013). The nucleotide diversity (π) of each gene was estimated according to Nei’s formula (Nei, 1987) with an in-house script. To estimate absolute genetic divergence between populations, we computed pairwise DXY following the formula of Nei and Li (1979). DXY values were summed over all SNPs, and the sum was normalized by the effective sequence length, defined for each pair of populations as the number of sites without missing data in either population. We also estimated Wright’s F statistic (FST) (Wright, 1950) from these data.
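As a rough illustration of the π and DXY calculations, the following Python sketch works from per-site allele frequencies. It is not the in-house script. The function names are hypothetical, the sample-size correction n/(n - 1) assumes that n is the number of sampled chromosomes (2N for diploids), and per-site DXY is taken as the probability that two alleles drawn from different populations differ.

```python
from typing import Dict, List, Tuple

Freqs = Dict[str, float]  # allele -> frequency at one site in one population


def site_pi(freqs: Freqs, n_chrom: int) -> float:
    """Per-site diversity: n/(n-1) * (1 - sum of squared allele frequencies)."""
    heterozygosity = 1.0 - sum(p * p for p in freqs.values())
    return heterozygosity * n_chrom / (n_chrom - 1)


def gene_pi(sites: List[Freqs], n_chrom: int, length: int) -> float:
    """Sum per-site diversity over SNP sites and normalize by gene length."""
    return sum(site_pi(f, n_chrom) for f in sites) / length


def site_dxy(x: Freqs, y: Freqs) -> float:
    """Probability that an allele from population X differs from one from Y."""
    identical = sum(p * y.get(allele, 0.0) for allele, p in x.items())
    return 1.0 - identical


def gene_dxy(pairs: List[Tuple[Freqs, Freqs]], effective_length: int) -> float:
    """Sum per-site DXY over SNPs and normalize by the effective length,
    i.e. the number of sites without missing data in either population."""
    return sum(site_dxy(x, y) for x, y in pairs) / effective_length
```

In this sketch, monomorphic sites contribute zero to both sums, so only SNP sites need to be passed in, while the denominators count all sites that passed the filters.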