Population structure within the distribution range of A. corniculatum
The Sanger sequences of 423 individuals from 18 populations were used to
identify the uppermost hierarchical level of the genetic clusters. The
optimal K was estimated to be two according to Evanno’s method (Figure
S1). One cluster included the Bali, La-un, Ngao, Bangladesh, Pambala,
Rekawa, Dongzhai, Danzhou, Sanya, Wenchang, and Yalong populations,
which cover almost the whole Asian range except the southern South
China Sea. This cluster is defined as the “Indo-Malayan” group. The
other cluster included the Kuching, Chai-ya, Sibu, U-Daintree,
L-Daintree, Darwin, and Sorong populations, which cover the whole
Australasia region and the southern South China Sea (Figure 2). We call
the second cluster the “Pan-Australasia” group.
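For readers unfamiliar with Evanno's method, the following minimal sketch shows how the ΔK statistic can be computed from replicate STRUCTURE runs. It assumes the common simplification of taking the absolute second difference of the mean ln P(D) across K and dividing by the standard deviation at K. The function name and the log-likelihood values are hypothetical and are not taken from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute Evanno's delta K from replicate ln P(D) values.

    lnp_by_k maps each tested K to a list of ln P(D) values, one per
    replicate run. Delta K is defined only for K values that have both
    neighbours (K - 1 and K + 1) and a non-zero standard deviation.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    sds = {k: stdev(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means and sds[k] > 0:
            # Absolute second-order rate of change of the mean likelihood.
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second_diff / sds[k]
    return delta


# Hypothetical replicate log-likelihoods (illustration only).
toy_runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4000.0, -4001.0, -3999.0],
    3: [-3950.0, -3960.0, -3940.0],
    4: [-3940.0, -3900.0, -3980.0],
}
delta = evanno_delta_k(toy_runs)
best_k = max(delta, key=delta.get)  # delta K peaks at K = 2 for this toy input
```

The optimal K is taken as the value at which ΔK peaks. The smallest and largest tested K cannot be evaluated because the second difference needs both neighbours.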
To test whether a substructure exists within each group defined above,
the same STRUCTURE clustering was applied to each group. The optimal K
in the “Indo-Malayan” group was estimated to be four according to
Evanno’s method (Figure S2). The populations around the Gulf of Bengal
(“Gulf of Bengal” subgroup, including La-un, Ngao, Bangladesh, Pambala,
and Rekawa) share one component; the single Bali population on Java
Island (“Bali” subgroup) constitutes another; and the remaining
northern South China Sea populations (“n-SCS” subgroup, including
Dongzhai, Wenchang, Danzhou, Sanya, and Yalong) consist of two admixed
components (Figure 2).
Hence, the “n-SCS” populations are more plausibly considered one
subgroup. In the “Pan-Australasia” group, two components (Figure S3)
are distributed in the southern South China Sea (“s-SCS” subgroup,
including Chai-ya, Kuching, and Sibu) and Australasia (“Australasia”
subgroup, including U-Daintree, L-Daintree, Darwin, and Sorong),
respectively (Figure 2). Hence, the 18 populations cluster into two
groups and further into five subgroups.
The clustering pattern revealed by the Sanger sequencing data was
validated using Illumina data. The PCA clustering based on the SNP
frequency matrix revealed an approximately consistent pattern: (1) the
“s-SCS” subgroup (represented by Kuching and Chai-ya) and
“Australasia” subgroup (represented by U-Daintree, Darwin, and Sorong)
grouped in the upper-right corner; (2) the “n-SCS” populations
(represented by Sanya, Yalong, and Wenchang) were grouped in the
lower-left corner; and (3) the populations of the “Gulf of Bengal”
(represented by La-un and Ngao) were grouped between the “n-SCS”
population cluster and the single “Bali” population. Hence, the
differentiation between the “Indo-Malayan” and “Pan-Australasia”
groups is higher than that among subgroups within a group. Within the
“Indo-Malayan” group, the “n-SCS” subgroup may be less differentiated
from the “Gulf of Bengal” subgroup than from the “Bali” subgroup.
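As an illustration of how a PCA on a population-by-SNP allele frequency matrix separates groups, the sketch below extracts the first principal component by power iteration using only the Python standard library. It is not the pipeline used in this study. The function name and the toy frequencies are hypothetical, and in practice a numerical library would normally be used instead.

```python
import math


def first_principal_component(freqs, n_iter=500):
    """Return (scores, loadings) for PC1 of a populations x SNPs matrix."""
    n, m = len(freqs), len(freqs[0])
    # Center each SNP column on its mean frequency across populations.
    col_means = [sum(row[j] for row in freqs) / n for j in range(m)]
    x = [[row[j] - col_means[j] for j in range(m)] for row in freqs]
    # Power iteration on X^T X, evaluated as X^T (X v).
    v = [1.0 / (j + 1) for j in range(m)]  # arbitrary non-uniform start
    for _ in range(n_iter):
        xv = [sum(r[j] * v[j] for j in range(m)) for r in x]
        w = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("frequency matrix has no variance")
        v = [c / norm for c in w]
    # Project each population onto the leading axis.
    scores = [sum(r[j] * v[j] for j in range(m)) for r in x]
    return scores, v


# Hypothetical allele frequencies: rows are populations, columns are SNPs.
toy_freqs = [
    [0.9, 0.8, 0.1],  # population A1
    [0.8, 0.9, 0.2],  # population A2
    [0.1, 0.2, 0.9],  # population B1
    [0.2, 0.1, 0.8],  # population B2
]
scores, loadings = first_principal_component(toy_freqs)
```

Populations with similar allele frequencies receive PC1 scores of the same sign and similar magnitude, which is what places them close together in a PCA plot.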
The FST and DXY statistics provide direct estimates of population
differentiation and divergence. The FST values estimated from Sanger
data agree well with the clustering pattern revealed above (Figure 2a).
Both datasets show lower FST values between populations within each
subgroup (0.07–0.39 from Solexa data and 0.01–0.54 from Sanger data)
and higher FST values between populations from different subgroups
(0.42–0.53 from Solexa data and 0.37–0.92 from Sanger data) (Figures 2a
and S4). We performed an AMOVA to determine the hierarchical
percentages of variation, based on the FST matrix of each of the six
Sanger-sequenced genes. The majority of the variation (83.63–97.20%)
was attributed to “among subgroups” in all genes except A414 (40.13%)
(Table S3). Consistently, little variation was attributed to “among
populations within subgroup” or “within populations” (Table S3). The
DXY statistics estimated from the six Sanger-sequenced genes showed
clear divergence between the five subgroups but markedly lower
divergence between populations within each subgroup (Figure S5).
Interestingly, the DXY estimated from Solexa data did not show high
divergence (0.78–3.01) among the population pairs around the SCS
(including populations in the “n-SCS” and “s-SCS” subgroups, Figure
S6).
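To make the two statistics concrete, the sketch below computes DXY as the mean proportion of differing sites between sequences drawn from two populations, and FST with the Hudson-style ratio 1 - Hw/Hb (mean within-population diversity over between-population divergence). The text does not state which estimators were used, so this choice is an assumption. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_diff(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dxy(pop_x, pop_y):
    """Mean pairwise difference between sequences of two populations."""
    total = sum(pairwise_diff(a, b) for a in pop_x for b in pop_y)
    return total / (len(pop_x) * len(pop_y))


def nucleotide_diversity(pop):
    """Mean pairwise difference among sequences within one population."""
    pairs = list(combinations(pop, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def fst_hudson(pop_x, pop_y):
    """FST as 1 - Hw / Hb (assumed estimator, see lead-in)."""
    within = (nucleotide_diversity(pop_x) + nucleotide_diversity(pop_y)) / 2
    between = dxy(pop_x, pop_y)
    if between == 0:
        raise ValueError("populations are identical; FST is undefined")
    return 1 - within / between


# Hypothetical aligned haplotypes (illustration only).
pop_x = ["AAAAAAAAAA", "AAAAAAAAAT"]
pop_y = ["TTTTTAAAAA", "TTTTTAAAAT"]
toy_dxy = dxy(pop_x, pop_y)
toy_fst = fst_hudson(pop_x, pop_y)
```

Under this formulation, FST is high when between-population divergence greatly exceeds within-population diversity, whereas DXY measures absolute divergence and does not depend on within-population diversity.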