Whole genome sequencing
We received no data for one sample. For two samples, both in sequencing
pool ANHU_003, we received only one demultiplexed fastq file despite
their being sequenced across two lanes. Across 283 individuals,
our sequencing runs produced 5.4 billion short reads, and in more than
99% of samples at least 90% of reads had a quality score above Q30.
On average, 98.3% of the sequence reads mapped to
the reference genome per individual, and individual coverage ranged from
0X to 4.7X with an average of 2.2X. We removed individuals (N = 35) from
the dataset that met any of the following criteria: failure to
sequence, indicated by a very low number of raw reads (< 1000);
poor mapping to the reference genome (< 50% of reads mapped);
or low individual coverage (< 1.0X).
We also removed five individuals that were outliers in an initial PCA and
may have been misidentified. We identified two pairs of potentially related
individuals and removed one individual from each pair. The remaining 241
individuals had on average 98.7% of the sequence reads mapped to the
reference genome, and their coverage ranged from 1.0X to 4.45X with an
average of 2.5X. The number of loci used for analyses ranged from 22,902
for the PCA (SNPs present in all individuals) to 934,225,517 for the
calculation of theta (all base pairs with sufficient coverage; Table S3).
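
The exclusion criteria and sample counts reported above can be expressed as a simple filter. The Python sketch below is illustrative only and is not the pipeline used in this study. The record fields (`raw_reads`, `mapped_fraction`, `mean_coverage`) and the function names are hypothetical; only the three thresholds and the counts 283, 35, 5, 2, and 241 come from the text.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the text; everything else here is an assumption.
MIN_RAW_READS = 1000         # fewer raw reads: treated as a failed sample
MIN_MAPPED_FRACTION = 0.50   # lower fraction of mapped reads: poor mapping
MIN_COVERAGE = 1.0           # lower mean depth (X): low individual coverage


@dataclass
class SampleQC:
    """Hypothetical per-individual QC summary."""
    sample_id: str
    raw_reads: int
    mapped_fraction: float   # proportion of reads mapped, between 0 and 1
    mean_coverage: float     # mean depth of coverage, in X


def fails_qc(sample: SampleQC) -> bool:
    """Return True if the sample meets any of the three exclusion criteria."""
    return (
        sample.raw_reads < MIN_RAW_READS
        or sample.mapped_fraction < MIN_MAPPED_FRACTION
        or sample.mean_coverage < MIN_COVERAGE
    )


def filter_samples(samples: List[SampleQC]) -> List[SampleQC]:
    """Keep only the samples that pass every QC criterion."""
    return [s for s in samples if not fails_qc(s)]


def remaining_individuals(total: int, qc_failures: int,
                          pca_outliers: int, related_removed: int) -> int:
    """Bookkeeping for the number of individuals left after all removals."""
    return total - qc_failures - pca_outliers - related_removed


# Made-up example records, for illustration only.
example = [
    SampleQC("ind_A", raw_reads=20_000_000, mapped_fraction=0.987, mean_coverage=2.5),
    SampleQC("ind_B", raw_reads=500, mapped_fraction=0.90, mean_coverage=0.0),
    SampleQC("ind_C", raw_reads=15_000_000, mapped_fraction=0.42, mean_coverage=1.8),
    SampleQC("ind_D", raw_reads=6_000_000, mapped_fraction=0.98, mean_coverage=0.7),
]
kept = filter_samples(example)
```

With the reported counts, `remaining_individuals(283, 35, 5, 2)` gives the 241 individuals retained for analysis: 35 removed by the quality filters, five as PCA outliers, and one from each of the two potentially related pairs.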