Genetic variation and gene flow
To estimate nucleotide diversity, we first grouped samples into six regions (Fig. 1A) based on geography and expansion history: Washington (WAS), Humboldt California (HUM), Bay Area California (BAY), Central Valley California (VAL), Pacific coast of southern California (PAC), and eastern expansion samples (EAS). Because sample size can affect estimates of genetic diversity, we downsampled each population to the smallest sample size (N=13) by randomly selecting that number of individuals from each population for downstream calculations. For each downsampled population, we produced a site allele frequency file in ANGSD (for parameter details see Table S1), from which a folded site frequency spectrum (SFS) was estimated using realSFS with the -fold option. Each SFS was then used as a prior (-pest) to estimate diversity statistics (-doTheta) in ANGSD. We estimated pairwise divergence between samples grouped by county, for counties with at least five individuals, using ANGSD (for parameter details see Table S1) and realSFS (fst stats) on polymorphic sites. We estimated global heterozygosity per individual for five to ten individuals per county (Table S2), using ANGSD (for parameter details see Table S1) and realSFS (parameters: -fold 1) to create site frequency spectra. To assess the direction of gene flow among the defined populations, we calculated a directionality index, ψ (Peter and Slatkin 2013). First, we created pairwise 2D SFSs from the site allele frequency files generated for each population SFS. We then calculated ψ using equation 1b from Peter and Slatkin (2013), which detects asymmetries in pairwise site frequency spectra that are indicative of successive founder events and thus identifies the geographic origins and directionality of expansions.
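The ψ calculation from a pairwise 2D SFS can be illustrated with a minimal Python sketch. It is not the pipeline used here. It assumes a square 2D SFS of derived-allele counts with equal sample sizes of n chromosomes per population, and it assumes the commonly stated form ψ = Σ (i − j) f_ij / n, where f_ij is normalized over sites at which the derived allele is present in both populations; this form should be checked against equation 1b of the cited paper before use. The function name `directionality_index` and the toy matrix are hypothetical.

```python
import numpy as np


def directionality_index(sfs2d):
    """Hypothetical helper: directionality index (psi) from a square 2D SFS.

    sfs2d[i, j] is the (expected) number of sites with i derived copies in
    population 1 and j derived copies in population 2. Both populations are
    assumed to have the same number of sampled chromosomes, n.
    """
    sfs = np.asarray(sfs2d, dtype=float)
    if sfs.ndim != 2 or sfs.shape[0] != sfs.shape[1]:
        raise ValueError("expected a square 2D SFS (equal sample sizes)")

    n = sfs.shape[0] - 1  # number of chromosomes sampled per population

    # Assumption: condition on the derived allele being present in both
    # populations, i.e. keep only cells with i >= 1 and j >= 1.
    shared = sfs[1:, 1:]
    total = shared.sum()
    if total == 0:
        raise ValueError("no shared derived alleles in the 2D SFS")

    i = np.arange(1, n + 1)[:, None]  # derived count in population 1
    j = np.arange(1, n + 1)[None, :]  # derived count in population 2

    # Mean difference in derived allele frequency at shared sites.
    return float(((i - j) * shared).sum() / (n * total))


# Toy example with n = 4 chromosomes per population (not real data).
toy = np.zeros((5, 5))
toy[3, 1] = 10.0  # ten sites: 3 derived copies in pop 1, 1 in pop 2
psi = directionality_index(toy)
```

Under this sign convention, a positive value means shared derived alleles are on average at higher frequency in population 1, which under the serial founder-effect reasoning would suggest that population 1 lies farther from the expansion origin than population 2. Transposing the matrix flips the sign, and a symmetric 2D SFS gives ψ = 0.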