Global population structure
To investigate the genetic clustering of P. vivax populations, we used
the biallelic SNPs as input for PCA and phylogenetic analysis. Both
the PCA and the phylogenetic tree reveal three major clusters
consistent with geographical origin (Figure 2A and B). Isolates
from ESEA and MSEA form a differentiated cluster in the vicinity of
isolates from OCE. Isolates from AFR cluster close to isolates from WAS;
however, these two regions are clearly separated along the fourth
principal component of the PCA (Supplementary Figure 1) and form
separate clades in the tree (Figure 2B). Isolates from LAM form a
distinct cluster in the PCA and a distinct clade in the tree. Together,
these results indicate that the global P. vivax population is
genetically diverse, as also reflected in its high nucleotide diversity
(Supplementary Figure 2), and geographically structured.
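To illustrate how a PCA of this kind can be computed from biallelic SNP
calls, the following is a minimal Python sketch. It is not the pipeline
used in this study: the function name snp_pca, the 0/1 allele encoding
(one row per isolate, one column per SNP) and the toy matrix are all
assumptions made for illustration only.

```python
import numpy as np


def snp_pca(genotypes, n_components=4):
    """PCA of a (n_samples, n_snps) matrix of alternate-allele counts.

    Hypothetical helper for illustration; assumes no missing calls.
    Returns sample coordinates and the fraction of variance per component.
    """
    g = np.asarray(genotypes, dtype=float)
    # Monomorphic sites carry no information about structure; drop them.
    g = g[:, g.std(axis=0) > 0]
    # Centre each SNP so that components describe deviations from the mean.
    centered = g - g.mean(axis=0)
    # SVD of the centred matrix gives the principal axes directly.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, len(s))
    coords = u[:, :k] * s[:k]
    explained = (s ** 2) / np.sum(s ** 2)
    return coords, explained[:k]


# Toy data (invented): isolates 0-2 and 3-5 differ at SNPs 0, 2, 3 and 4.
toy = np.array([
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0],
])
coords, explained = snp_pca(toy, n_components=2)
print(np.round(coords, 2))
print(np.round(explained, 2))
```

In this toy example the two groups of isolates take opposite signs on
the first component. A separation that only appears on a later
component, such as the one between AFR and WAS on the fourth, would be
inspected by requesting at least four components and plotting the
fourth column of the coordinates.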
Admixture analysis estimated ten (K=10) geographically distinct
ancestral populations (Figure 2C). All genomes from AFR, WAS and OCE
were predicted to belong predominantly to a single shared ancestry
within each region, whereas genomes within each of the LAM, ESEA and
MSEA regions belong to distinct subpopulations (i.e. ancestral
populations within a region; Figure 2C). Admixture (predicted ancestry from more than
one cluster) is mostly observed between subpopulations within a region
(e.g., in LAM and ESEA) and rarely between regions, with the
exception of the admixture with WAS observed in AFR.
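The notion of admixture used here (predicted ancestry from more than
one cluster) can be sketched from a matrix of per-isolate ancestry
proportions, as produced by admixture-type software. The Python example
below is a minimal sketch under stated assumptions: the function name
summarize_ancestry, the 0.8 cut-off for calling an isolate admixed and
the toy proportions are hypothetical and are not taken from this study.

```python
import numpy as np


def summarize_ancestry(q, threshold=0.8):
    """Assign each isolate a dominant cluster and flag admixed isolates.

    q: (n_samples, K) ancestry proportions, each row summing to 1.
    threshold: ASSUMED cut-off; an isolate is called admixed when its
    largest ancestry proportion falls below this value.
    """
    q = np.asarray(q, dtype=float)
    if not np.allclose(q.sum(axis=1), 1.0, atol=1e-6):
        raise ValueError("each row of q must sum to 1")
    dominant = q.argmax(axis=1)          # index of the largest proportion
    admixed = q.max(axis=1) < threshold  # no single ancestry dominates
    return dominant, admixed


# Toy proportions for three isolates and K=3 clusters (invented values).
q_toy = [
    [0.98, 0.01, 0.01],
    [0.55, 0.40, 0.05],
    [0.02, 0.03, 0.95],
]
dominant, admixed = summarize_ancestry(q_toy)
print(dominant.tolist(), admixed.tolist())
```

With these toy values, only the second isolate is flagged as admixed,
because its largest ancestry proportion (0.55) is below the assumed
cut-off. The choice of cut-off changes how many isolates are called
admixed, so it should be reported alongside any such summary.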
In the phylogenetic tree, isolates from WAS form two separate clades,
with the upper clade containing the isolates from India (Figure 2B).
This separate subpopulation could not be confirmed in the admixture
analysis, which estimated a single ancestral cluster for this region
(Figure 2C).
Therefore, while Indian isolates might be distinct from other isolates
in WAS, all P. vivax isolates from this region share a common
ancestry. The highest level of admixture is observed among the three
subpopulations in LAM (mixed ancestry proportions for K7 and K10 and,
to a lesser extent, K4), indicating shared ancestry or gene flow
between these subpopulations (Figure 2C).