Variant detection
Sequencing reads were first aligned using BWA version 0.7.17 to the
human reference genome obtained from the Genome Reference Consortium
Human Build 38 patch release 13 (GRCh38.p13). Reads that did not map to the
human reference genome in proper pairs were extracted using samtools
version 1.10 (-F 2, i.e. excluding alignments with the proper-pair flag set)
and subsequently aligned to the P. vivax PvP01 reference genome from
PlasmoDB (version 46) using BWA.
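The `-F 2` option makes samtools drop every alignment whose SAM FLAG has bit 0x2 ("each segment properly aligned according to the aligner") set. Everything else is kept: unmapped reads, pairs with one unmapped mate, and discordant pairs. The minimal Python sketch below reproduces only that bitwise test on integer FLAG values. The function names are hypothetical, and the sketch does not read BAM files.

```python
# Bit 0x2 of the SAM FLAG field marks "each segment properly aligned
# according to the aligner" (SAM specification, FLAG field).
FLAG_PROPER_PAIR = 0x2


def passes_exclude_filter(flag: int, exclude_mask: int) -> bool:
    """Mimic `samtools view -F MASK`: keep a record only if none of the MASK bits are set."""
    return (flag & exclude_mask) == 0


def reads_kept_by_minus_F2(flags):
    """Indices of records that `-F 2` keeps, i.e. reads not in a proper pair on the host genome."""
    return [i for i, flag in enumerate(flags) if passes_exclude_filter(flag, FLAG_PROPER_PAIR)]


if __name__ == "__main__":
    example_flags = [
        99,  # paired, proper pair, mate reverse, first in pair -> dropped
        77,  # paired, read unmapped, mate unmapped, first in pair -> kept
        65,  # paired, first in pair, mapped but not a proper pair -> kept
    ]
    print(reads_kept_by_minus_F2(example_flags))  # [1, 2]
```

A bitwise test is used because FLAG packs many independent properties into one integer, so `-F` discards a read if any listed bit is present.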
Duplicate reads were removed with Picard’s MarkDuplicates (version
2.22.4). Variant detection was performed using the Genome Analysis
Toolkit (GATK) version 4.1.4.1. First, HaplotypeCaller was run in GVCF
mode on each chromosome separately. The resulting GVCF files were then
combined with GenomicsDBImport and jointly genotyped with GenotypeGVCFs,
yielding one VCF file per chromosome. The VCF files were filtered
according to GATK best practices. Finally, for most downstream analyses,
variants were restricted to the core genome (the 14 nuclear chromosomes,
excluding subtelomeric regions, low-complexity domains, and the apicoplast
and mitochondrial sequences) using the BCFtools query command.
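Picard MarkDuplicates treats reads (or read pairs) that share the same unclipped 5' alignment position(s) and orientation as duplicates. Within each group it retains, by default, the one with the highest summed base quality. By default it only flags the others; setting REMOVE_DUPLICATES=true drops them from the output. The sketch below illustrates the grouping idea for single-end reads only. It is a simplification, not Picard's implementation: it ignores mate positions, read groups, optical duplicates, and Picard's tie-breaking. The `Read` type, the function name, and the chromosome names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal record for one aligned single-end read."""
    name: str
    chrom: str
    unclipped_5prime: int          # 5' coordinate after undoing soft clipping (alignment end for reverse reads)
    reverse: bool                  # True if aligned to the reverse strand
    base_qualities: Tuple[int, ...]


def find_duplicates(reads: Iterable[Read]) -> Set[str]:
    """Return names of reads flagged as duplicates.

    Reads sharing chromosome, unclipped 5' position and strand are grouped;
    within each group the read with the highest sum of base qualities is kept
    (the first one seen wins ties) and all others are reported as duplicates.
    """
    best = {}
    duplicates: Set[str] = set()
    for read in reads:
        key = (read.chrom, read.unclipped_5prime, read.reverse)
        incumbent = best.get(key)
        if incumbent is None:
            best[key] = read
        elif sum(read.base_qualities) > sum(incumbent.base_qualities):
            duplicates.add(incumbent.name)
            best[key] = read
        else:
            duplicates.add(read.name)
    return duplicates


if __name__ == "__main__":
    reads = [
        Read("r1", "chrA", 1000, False, (30, 30, 30)),
        Read("r2", "chrA", 1000, False, (40, 40, 40)),  # same start/strand, better quality
        Read("r3", "chrA", 1000, True, (20, 20, 20)),   # opposite strand: not a duplicate
        Read("r4", "chrA", 2000, False, (35, 35, 35)),
    ]
    print(find_duplicates(reads))  # {'r1'}
```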
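The per-chromosome joint-genotyping workflow described above can be sketched as a set of GATK command lines. The sketch covers HaplotypeCaller in GVCF mode (`-ERC GVCF`), consolidation with GenomicsDBImport, and joint genotyping with GenotypeGVCFs reading the workspace through a `gendb://` URI. The Python below only builds the argument lists and does not run them. All file names, sample names, chromosome names, and the workspace naming scheme are hypothetical, and any additional options the study may have used are not shown.

```python
import shlex
from typing import Dict, List


def haplotypecaller_cmd(reference: str, bam: str, chrom: str, out_gvcf: str) -> List[str]:
    # Call one sample on one chromosome, emitting a GVCF (-ERC GVCF).
    return ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam,
            "-L", chrom, "-ERC", "GVCF", "-O", out_gvcf]


def genomicsdbimport_cmd(gvcfs: List[str], chrom: str, workspace: str) -> List[str]:
    # Consolidate all samples' GVCFs for this chromosome into a GenomicsDB workspace.
    cmd = ["gatk", "GenomicsDBImport", "--genomicsdb-workspace-path", workspace, "-L", chrom]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd


def genotypegvcfs_cmd(reference: str, workspace: str, out_vcf: str) -> List[str]:
    # Joint-genotype all samples; the workspace is read through a gendb:// URI.
    return ["gatk", "GenotypeGVCFs", "-R", reference,
            "-V", f"gendb://{workspace}", "-O", out_vcf]


def per_chromosome_plan(reference: str, samples: List[str],
                        chromosomes: List[str]) -> Dict[str, List[List[str]]]:
    """For each chromosome, the ordered commands that produce one joint VCF."""
    plan = {}
    for chrom in chromosomes:
        gvcfs = [f"{s}.{chrom}.g.vcf.gz" for s in samples]
        workspace = f"genomicsdb_{chrom}"
        steps = [haplotypecaller_cmd(reference, f"{s}.dedup.bam", chrom, g)
                 for s, g in zip(samples, gvcfs)]
        steps.append(genomicsdbimport_cmd(gvcfs, chrom, workspace))
        steps.append(genotypegvcfs_cmd(reference, workspace, f"{chrom}.vcf.gz"))
        plan[chrom] = steps
    return plan


if __name__ == "__main__":
    plan = per_chromosome_plan("PvP01.fasta", ["sample1", "sample2"], ["chr01"])
    for cmd in plan["chr01"]:
        print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)`. Keeping one GenomicsDB workspace per chromosome mirrors the one-VCF-per-chromosome output described in the text, and it lets chromosomes be processed independently.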
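The text does not state which GATK best-practice filtering strategy was applied, so it is unclear whether VQSR or hard filtering was used. Assuming hard filtering, the sketch below applies the generic SNP thresholds published in the GATK documentation to one site's annotations (INFO fields plus the QUAL column): QD < 2.0, QUAL < 30, SOR > 3.0, FS > 60, MQ < 40, MQRankSum < -12.5, ReadPosRankSum < -8.0. These thresholds are an assumption, not the study's settings. In practice this step is usually run with GATK VariantFiltration rather than custom code.

```python
from typing import Callable, Dict, List

# GATK's generic hard-filtering recommendations for SNPs: a site FAILS a
# filter when the expression is true. Assumed here; not the study's settings.
SNP_HARD_FILTERS: Dict[str, Callable[[float], bool]] = {
    "QD": lambda v: v < 2.0,
    "QUAL": lambda v: v < 30.0,
    "SOR": lambda v: v > 3.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def failed_filters(annotations: Dict[str, float]) -> List[str]:
    """Names of the hard filters a SNP fails; absent annotations are skipped."""
    return [name for name, fails in SNP_HARD_FILTERS.items()
            if name in annotations and fails(annotations[name])]


if __name__ == "__main__":
    good = {"QD": 25.0, "QUAL": 800.0, "SOR": 0.7, "FS": 1.2, "MQ": 60.0}
    poor = {"QD": 1.5, "QUAL": 45.0, "FS": 75.0, "MQ": 60.0}
    print(failed_filters(good))  # []
    print(failed_filters(poor))  # ['QD', 'FS']
```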
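Restricting variants to the core genome amounts to an interval-membership test. This minimal sketch assumes the core regions are supplied as BED-style records: 0-based, half-open, and non-overlapping within each chromosome. The intervals and contig names below are hypothetical placeholders, not the real PvP01 core-genome coordinates. Contigs absent from the region list, such as organellar sequences, are never treated as core. The sketch converts each 1-based VCF position to 0-based and uses binary search to find the last interval starting at or before it.

```python
import bisect
from typing import Dict, List, Tuple

BedRecord = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def build_index(records: List[BedRecord]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Group core-genome intervals by chromosome into sorted start/end lists."""
    grouped: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in records:
        grouped.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in grouped.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def in_core_genome(index: Dict[str, Tuple[List[int], List[int]]], chrom: str, pos: int) -> bool:
    """True if 1-based VCF position `pos` lies in a core interval.

    Assumes intervals on a chromosome do not overlap; contigs absent from the
    index (e.g. organellar sequences) are never core.
    """
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    pos0 = pos - 1  # convert VCF 1-based to BED 0-based coordinates
    i = bisect.bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


if __name__ == "__main__":
    core = build_index([("chr01", 100, 200), ("chr01", 500, 900), ("chr02", 0, 50)])
    variants = [("chr01", 150), ("chr01", 100), ("chr01", 201), ("chr02", 50), ("API", 10)]
    print([v for v in variants if in_core_genome(core, *v)])  # [('chr01', 150), ('chr02', 50)]
```

With BCFtools, the same restriction is typically expressed by passing a regions file. For example, `bcftools query -R core.bed -f '%CHROM\t%POS\n' chr01.vcf.gz` (file names hypothetical) does this; `-R` requires an indexed VCF, and a file ending in `.bed` is read with BED coordinates.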