Predictive Phylogeography
Prior to analyzing genomic data, we predicted whether Thuja plicata and Tsuga heterophylla are expected to harbor cryptic diversity using the random forest (RF) classifier developed for this system by Espíndola et al. (2016) and Sullivan et al. (2019). For the predictor variables, we gathered occurrence data previously used for predictive phylogeography of species in the PNW (Espíndola et al. 2016, Sullivan et al. 2019) and occurrence data from recently investigated species (Smith et al. 2017, Smith et al. 2018, Ruffley et al. 2018). These occurrence data are a combination of GBIF records and field collections, and we used them to extract bioclimatic variables from WorldClim v2 (Fick & Hijmans, 2017). Along with these bioclimatic variables, taxonomic rank and discrete trait variables, such as life stage at dispersal, outcrosser or selfer, dispersal mechanism, and trophic level (Table S3; Sullivan et al. 2019), were used as the predictor variables in the RF classifier.
The response variable was whether or not a species harbors cryptic diversity (“cryptic” vs. “non-cryptic”). To date, twelve species complexes with disjunct ranges in the PNW have been investigated in a phylogeographic framework (Avise et al. 1987): Ascaphus truei / A. montanus (Nielson et al. 2001, Metzger et al. 2015), Plethodon idahoensis / P. vandykei (Carstens et al. 2004), Prophysaon coeruleum (Wilke & Duncan 2004), Microtus richardsoni (Carstens et al. 2005), Dicamptodon aterrimus and complex, and D. tenebrosus (Steele et al. 2005), Salix melanopsis (Brunsfeld et al. 2006; Carstens et al. 2013), Chonaphe armata (Espíndola et al. 2016), Haplotrema vancouverense (Smith et al. 2017), Alnus rubra (Ruffley et al. 2018), Prophysaon dubium/andersoni (Smith et al. 2019), and Hemphillia sp. complex (Rankin et al. 2019). Of these, eight species were classified as non-cryptic according to the respective study, meaning that the species does not harbor a deep divergence between populations and has also not experienced significant gene flow between populations (Table S3). The remaining six species/complexes were classified as cryptic because the coastal and inland populations are deeply diverged and in some cases are described as different species.
We constructed four RF classifiers using different combinations of the available predictor variables: 1) bioclimatic variables only, 2) bioclimatic variables and taxonomy, 3) bioclimatic variables and life history traits, and 4) bioclimatic variables, taxonomy, and life history traits. In each case, we predicted the probability of a species being cryptic. For each classifier, we report the average out-of-bag error rate, which is the proportion of simulations that were misclassified out of all the simulations left out of the construction of the classifier, averaged across classes.
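The four predictor combinations and the class-averaged error summary can be sketched as follows. This is a minimal illustration, not the code used in the analysis: the variable-group names, the function name, and the toy labels are hypothetical, and it assumes that "averaged across classes" means the unweighted mean of the per-class out-of-bag misclassification rates.

```python
from collections import defaultdict

# Hypothetical predictor groups; the column names are placeholders,
# not the names used in the original data set.
BIOCLIM = [f"bio{i}" for i in range(1, 20)]          # 19 bioclimatic variables
TAXONOMY = ["taxonomic_rank"]
TRAITS = ["life_stage_at_dispersal", "outcrosser_or_selfer",
          "dispersal_mechanism", "trophic_level"]

# The four predictor combinations described in the text.
PREDICTOR_SETS = {
    "bioclim": BIOCLIM,
    "bioclim+taxonomy": BIOCLIM + TAXONOMY,
    "bioclim+traits": BIOCLIM + TRAITS,
    "bioclim+taxonomy+traits": BIOCLIM + TAXONOMY + TRAITS,
}


def class_averaged_oob_error(true_labels, oob_predictions):
    """Return (mean per-class error, per-class error dict).

    oob_predictions[i] is the label predicted for record i by the trees
    that did not see record i during training, or None if the record was
    never out of bag.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for truth, predicted in zip(true_labels, oob_predictions):
        if predicted is None:
            continue  # record was in every bootstrap sample; skip it
        totals[truth] += 1
        if predicted != truth:
            errors[truth] += 1
    if not totals:
        raise ValueError("no out-of-bag predictions available")
    per_class = {label: errors[label] / totals[label] for label in totals}
    return sum(per_class.values()) / len(per_class), per_class


# Toy example with made-up labels (not data from the study).
truth = ["cryptic"] * 4 + ["non-cryptic"] * 6
oob = (["cryptic"] * 3 + ["non-cryptic"]
       + ["non-cryptic"] * 3 + ["cryptic"] * 3)
mean_error, by_class = class_averaged_oob_error(truth, oob)
print(mean_error, by_class)
```

In the toy example the per-class error rates are 0.25 and 0.5, so the class-averaged error (0.375) differs from the overall proportion misclassified (0.4). This difference matters when the two classes are unequally represented in the training data.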
With each of these classifiers, we predicted the presence or absence of cryptic diversity for Thuja plicata and Tsuga heterophylla separately. To do this, we gathered occurrence records for each species: Thuja plicata (791; 569 GBIF records and 222 field collections; Table S4) and Tsuga heterophylla (468; 346 GBIF records and 111 field collections; Table S5). We excluded all occurrence records from GBIF that fell outside of the PNW temperate rainforest (35° to 65° latitude, −160° to −100° longitude). We used these locality coordinates to extract values for 19 bioclimatic variables from WorldClim v2 (Fick & Hijmans, 2017) on 5 Feb 2019 at a resolution of 30 arc-seconds (~1 km²). We also assembled trait data to match the trait data collected for PNW taxa for predictive phylogeography in Sullivan et al. (2019). Using these data, we applied the four classifiers, following the procedure of Sullivan et al. (2019), to predict the probability of each species being cryptic. We ultimately aimed to validate these predictions using the phylogeographic model testing described below. Once validation was successful, we included the newly classified species data gathered in this study to assess whether the overall accuracy of the classifier improved with the addition of two plant species.
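The geographic filter applied to the GBIF records can be illustrated with a short sketch. The record structure, the function names, and the example coordinates below are hypothetical, and treating the bounds as inclusive is an assumption. Following the text, only GBIF records are filtered and field collections are retained as they are.

```python
from typing import Iterable, List, NamedTuple


class Occurrence(NamedTuple):
    """Hypothetical minimal occurrence record."""
    source: str        # "GBIF" or "field"
    latitude: float    # decimal degrees, north positive
    longitude: float   # decimal degrees, east positive


# Study extent stated in the text (bounds treated as inclusive here).
LAT_MIN, LAT_MAX = 35.0, 65.0
LON_MIN, LON_MAX = -160.0, -100.0


def within_study_extent(record: Occurrence) -> bool:
    """True if the record falls inside the latitude/longitude box."""
    return (LAT_MIN <= record.latitude <= LAT_MAX
            and LON_MIN <= record.longitude <= LON_MAX)


def filter_occurrences(records: Iterable[Occurrence]) -> List[Occurrence]:
    """Drop GBIF records outside the extent; keep all other records."""
    return [r for r in records
            if r.source != "GBIF" or within_study_extent(r)]


# Made-up example records (not data from the study).
records = [
    Occurrence("GBIF", 47.6, -122.3),   # inside the box: kept
    Occurrence("GBIF", 30.0, -120.0),   # latitude too low: dropped
    Occurrence("GBIF", 50.0, -90.0),    # longitude too far east: dropped
    Occurrence("field", 46.7, -116.9),  # field collection: kept
]
kept = filter_occurrences(records)
print(len(kept), "of", len(records), "records retained")
```

The retained coordinates would then be passed to whatever raster-extraction tool is used to read the 19 bioclimatic values at each locality. That step is omitted here because it depends on the software used.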