Genetic variation and gene flow
To estimate nucleotide diversity, we first grouped samples into six
regions (Fig. 1A) based on geography and expansion history: Washington
(WAS), Humboldt California (HUM), Bay Area California (BAY), Central
Valley California (VAL), Pacific coast of southern California (PAC), and
eastern expansion samples (EAS). Because sample size can affect
estimates of genetic diversity, we downsampled each population to the
smallest sample size (N = 13) by randomly selecting 13 individuals from
each population for downstream calculations. A folded
site frequency spectrum (SFS) was estimated for each downsampled
population in two steps: ANGSD was used to generate a site allele
frequency file (parameters in Table S1), from which realSFS estimated
the SFS (-fold 1). Finally, each SFS was used as a prior (-pest) to
estimate diversity statistics (-doThetas) in ANGSD. We estimated pairwise
genetic differentiation (FST) between samples grouped by county, for
counties with at least five individuals, using ANGSD (parameters in
Table S1) and realSFS fst stats on polymorphic sites. We estimated global
heterozygosity per individual for five to ten individuals per county
(Table S2) from per-individual site frequency spectra estimated with
ANGSD (parameters in Table S1) and realSFS (-fold 1). To assess the
direction of gene flow among the defined populations, we calculated the
directionality index ψ (Peter and Slatkin 2013). First, we estimated
pairwise two-dimensional (2D) SFSs from the site allele frequency files
generated for each population SFS. We then calculated ψ using equation
1b of Peter and Slatkin (2013). This index detects asymmetries in
pairwise site frequency spectra that are indicative of successive founder
events, and thus identifies the geographic origin and direction of an
expansion.
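The downsampling step described above (randomly selecting N = 13 individuals per population) can be sketched in Python as follows. This is a minimal illustration assuming each population is stored as a list of sample IDs in a dictionary; the function name `downsample` and all sample IDs are hypothetical and not part of our pipeline.

```python
import random


def downsample(populations, seed=None):
    """Randomly subsample every population to the size of the smallest one.

    populations maps a population label (e.g. "WAS") to a list of sample IDs.
    Returns a new dict with the same labels and equal-sized ID lists.
    """
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    n_min = min(len(ids) for ids in populations.values())
    # random.sample draws without replacement, so no individual is chosen twice
    return {label: sorted(rng.sample(ids, n_min))
            for label, ids in populations.items()}


# Hypothetical sample IDs; the study's regions are WAS, HUM, BAY, VAL, PAC, EAS
example = {"WAS": [f"WAS_{i:02d}" for i in range(20)],
           "EAS": [f"EAS_{i:02d}" for i in range(13)]}
print({label: len(ids) for label, ids in downsample(example, seed=1).items()})
```

Fixing the seed makes the random subset reproducible, which matters when the same downsampled individuals must be reused for every downstream statistic.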
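To make the diversity statistics concrete, the sketch below computes per-site Watterson's theta and nucleotide diversity (pi) directly from a folded SFS using the standard estimators. It is a simplified illustration, not a reimplementation of ANGSD, which instead derives these statistics from per-site posterior probabilities with the SFS as a prior (-pest). We assume the folded SFS is a vector with n // 2 + 1 bins for n sampled chromosomes (for diploids, n = 2N = 26 when N = 13); the function names and the input values are hypothetical, and the bin layout of actual realSFS output should be checked before use.

```python
def harmonic_number(n):
    """a_n = sum_{i=1}^{n-1} 1/i, the Watterson correction for n chromosomes."""
    return sum(1.0 / i for i in range(1, n))


def thetas_from_folded_sfs(sfs, n):
    """Per-site Watterson's theta and nucleotide diversity (pi) from a folded SFS.

    sfs[k] is the number of sites whose minor allele is carried by k of the n
    sampled chromosomes, for k = 0 .. n // 2; sfs[0] counts monomorphic sites.
    """
    if len(sfs) != n // 2 + 1:
        raise ValueError("a folded SFS for n chromosomes has n // 2 + 1 bins")
    total_sites = sum(sfs)
    segregating = sum(sfs[1:])  # every non-zero bin is a segregating site
    theta_w = segregating / harmonic_number(n)
    # k * (n - k) is symmetric in k and n - k, so folding does not change pi
    n_pairs = n * (n - 1) / 2
    theta_pi = sum(k * (n - k) * sfs[k] for k in range(1, len(sfs))) / n_pairs
    return theta_w / total_sites, theta_pi / total_sites


# Hypothetical folded SFS for n = 4 chromosomes (not data from this study)
print(thetas_from_folded_sfs([90, 6, 4], n=4))
```

Both estimators only need the folded spectrum: Watterson's theta uses the total number of segregating sites, and the pairwise-difference weight k(n - k) is identical for an allele and its complement.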
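For the per-individual heterozygosity estimates, the SFS of a single diploid individual reduces to a few bins, and heterozygosity is the fraction of sites falling in the heterozygous bin. The sketch below assumes bin 1 holds the expected number of heterozygous sites, which is true for both a 2-bin folded and a 3-bin unfolded single-individual SFS. The function names, the county lookup, and the input values are hypothetical; averaging by county is shown only as one possible way to summarize the values in Table S2.

```python
from statistics import mean


def individual_heterozygosity(sfs):
    """Proportion of heterozygous sites from one diploid individual's SFS.

    sfs[1] is the expected number of heterozygous sites; the remaining bins
    hold homozygous sites (bin 0 when folded, bins 0 and 2 when unfolded).
    """
    if len(sfs) not in (2, 3):
        raise ValueError("expected a 2-bin (folded) or 3-bin (unfolded) SFS")
    total = sum(sfs)
    if total <= 0:
        raise ValueError("SFS contains no sites")
    return sfs[1] / total


def county_mean_heterozygosity(sfs_by_individual, county_of):
    """Average individual heterozygosity within each county (hypothetical summary)."""
    per_county = {}
    for individual, sfs in sfs_by_individual.items():
        het = individual_heterozygosity(sfs)
        per_county.setdefault(county_of[individual], []).append(het)
    return {county: mean(values) for county, values in per_county.items()}


# Hypothetical single-individual spectra and county assignments
spectra = {"ind_01": [99_800.0, 200.0], "ind_02": [99_700.0, 300.0]}
counties = {"ind_01": "County_X", "ind_02": "County_X"}
print(county_mean_heterozygosity(spectra, counties))
```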
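realSFS fst stats reports both an unweighted and a weighted genome-wide FST. As we understand the ANGSD documentation, the unweighted value averages the per-site FST ratios, whereas the weighted value divides the summed per-site numerators by the summed denominators. The sketch below shows that difference on hypothetical per-site (numerator, denominator) pairs; the function name is ours, and skipping sites with a zero denominator is an assumption rather than documented realSFS behavior.

```python
def genome_wide_fst(site_components):
    """Summarize per-site FST components into genome-wide estimates.

    site_components is an iterable of (a, b) pairs whose ratio a / b is the
    per-site FST. Returns (unweighted, weighted).
    """
    ratios = []
    sum_a = 0.0
    sum_b = 0.0
    for a, b in site_components:
        if b <= 0:
            continue  # assumption: drop sites whose per-site ratio is undefined
        ratios.append(a / b)
        sum_a += a
        sum_b += b
    if not ratios:
        raise ValueError("no sites with a positive denominator")
    unweighted = sum(ratios) / len(ratios)  # average of per-site ratios
    weighted = sum_a / sum_b                # ratio of summed components
    return unweighted, weighted


# Hypothetical per-site components for one pair of counties
print(genome_wide_fst([(0.10, 0.20), (0.00, 0.80), (0.05, 0.50)]))
```

The weighted form gives more influence to sites with larger denominators (typically more informative, higher-variance sites), so the two summaries can differ noticeably when many sites carry little information.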
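The directionality index can be illustrated with a short sketch of our reading of ψ: the mean difference in derived-allele frequency between two populations, taken over sites where the derived allele is present in both, computed from an unfolded (polarized) 2D SFS in which both populations are sampled at n chromosomes. The exclusion of sites fixed in both populations, the normalization, and the sign convention are assumptions of this sketch; the exact definition should be checked against equation 1b of Peter and Slatkin (2013). The function name and input tables are hypothetical.

```python
def psi(sfs2d):
    """Directionality index from an unfolded 2D SFS (our reading; see lead-in).

    sfs2d[k][l] is the number of sites with k derived alleles in population A
    and l derived alleles in population B, both sampled at n chromosomes, so
    the table is (n + 1) x (n + 1). Positive values mean derived alleles are,
    on average, at higher frequency in B than in A.
    """
    n = len(sfs2d) - 1
    if n < 1 or any(len(row) != n + 1 for row in sfs2d):
        raise ValueError("expected a square (n + 1) x (n + 1) table")
    weighted_diff = 0.0
    n_sites = 0.0
    for k in range(1, n + 1):          # derived allele present in A
        for l in range(1, n + 1):      # derived allele present in B
            if k == n and l == n:
                continue               # fixed in both: not polymorphic
            count = sfs2d[k][l]
            weighted_diff += (l - k) * count
            n_sites += count
    if n_sites == 0:
        raise ValueError("no sites with the derived allele in both populations")
    # mean difference in derived-allele frequency, (l - k) / n, across sites
    return weighted_diff / (n * n_sites)


# Hypothetical 2D SFS for n = 4: every shared site is at higher frequency in B
table = [[0.0] * 5 for _ in range(5)]
table[1][3] = 10.0
print(psi(table))
```

Under the allele-surfing rationale behind ψ, successive founder events push derived alleles to higher frequency toward the expansion front, so with this sign convention a positive value would place population B farther from the origin than population A.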