Sampling, sequencing and genetic analysis
We sampled a total of 327 individuals of A. hueti (27), I. mcclellandii (52), P. pectoralis (60), L. lutea (149), and S. torqueola (39) from 36 sites (Supporting information, Figure 1). From each individual, blood or muscle samples were collected and stored at -80°C. Genomic DNA was extracted using a QIAGEN Blood/Tissue Extraction Kit following the manufacturer’s instructions. Two mitochondrial genes, cytochrome b (Cytb ) and cytochrome c oxidase subunit 1 (COI ), were selected for this part of experiments. The mitochondrial genome is known to experience higher substitution rate than nuclear genome (Ballard and Whitlock 2004). This is more evident in vertebrates, such as birds and mammals (Lynch et al. 2006) with substitution rate 25 folds higher in mitochondrial than in nuclear genome. Therefore, the mitochondrial marker serves as a suitable marker for the intraspecies study with rich informative polymorphism.
Both mitochondrial genes were amplified from all 327 samples using published primers (Supporting information; Amer et al. 2013). PCR was performed using High fidelity Superstar PCR Mix (GenStar, Beijing, China). The 30 μL PCR mixture contained 15 μL PCR Mix, 1.5 μL of each primer of 10μM, 3 μL of template DNA and 9 μL of deionized water. The PCR program was set as follows: denaturation at 94°C for 4 min; 30 cycles of 94°C for 10 s; annealing for 30 s (temperature in Supporting information); extension at 72°C for 90 s; and final extension at 72°C for 8 min. The amplicons were sent to Tian Yi Hui Yuan (Guangzhou, China) for Sanger sequencing from both ends. To rule out sequencing errors and produce an exact alignment, sequences were manually checked and the ends trimmed in MEGA X (Kumar et al. 2018).
We concatenated the 2 genes (COI + Cytb ) into one sequence of 1681-1861 bp for further analysis (Table 1). Aligned sequences were imported to DNaSP 6.0 to calculate θ, π and Tajima’s D, and to produce a haplotype file (Rozas et al. 2017). Single nucleotide polymorphisms (SNPs) were retained and formatted into 5 matrices for further analysis of the 5 species, respectively. A haplotype network was calculated for each species with Network 10.0, using the median joining algorithm (Polzin and Daneshmand 2003). Because the uneven sample size was not suitable for regular estimates of diversity based on allele frequency, such as Fst (Nei and Miller 1990, Cruickshank and Hahn 2014), we calculated the number of segregating non-synonymous (na) and synonymous (ns) sites based on the JC69 model (Jukes and Cantor 1969). Pairwise distance, N, was approximated (using Python scrips) as the mean of among-individual ns+na values. Populations with only one individual were removed in calculations based on population genetic data. Genetic estimate based on species and genetic clustering, as mentioned in the following text, still involved all the sequences.
To determine population clustering, we performed genetic clustering on the time tree using gmyc function in the R package splits(Ezard et al. 2021). The aligned sequences were tested for best substitution model in MEGA X (Kumar et al. 2018). A time tree was built using BEAST with substitution rate of 10-8 site/year ((Nguyen and Ho 2016), 107 MCMC chain length, best fitted substitution model “HKY” for all species and other parameters as default. The time tree was imported to R for clustering analysis using gmyc .