Predictive Phylogeography
Prior to analyzing genomic data, we predicted whether Thuja plicata and
Tsuga heterophylla harbor cryptic diversity using the random forest (RF)
classifier developed for this system by Espíndola et al. (2016) and
Sullivan et al. (2019). For the predictor variables, we gathered
occurrence data previously used for predictive phylogeography of species
in the PNW (Espíndola et al. 2016, Sullivan et al. 2019)
and occurrence data from recently investigated species (Smith et
al. 2017, Smith et al. 2018, Ruffley et al. 2018). These
occurrence data are a combination of GBIF records and field collections,
and we used them to extract bioclimatic variables from WorldClim v2
(Fick & Hijmans, 2017).
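Extracting bioclimatic values at an occurrence point amounts to finding the raster cell that contains the coordinate. The sketch below illustrates that lookup at the 30 arc-second resolution used in this study. The grid geometry (a global grid whose top-left corner is at 90° N, 180° W, with row 0 along the northern edge) and all function names are assumptions for illustration, not the WorldClim file specification or the software actually used.

```python
import math

# Hypothetical grid geometry, assumed for illustration only: a global raster
# whose top-left corner sits at (90 N, 180 W), with square cells `arcsec`
# arc-seconds on a side and row 0 along the northern edge.
NORTH_EDGE = 90.0
WEST_EDGE = -180.0


def cell_index(lat, lon, arcsec=30):
    """Return (row, col) of the raster cell that contains (lat, lon)."""
    cells_per_degree = 3600 / arcsec  # 120 cells per degree at 30 arc-seconds
    row = math.floor((NORTH_EDGE - lat) * cells_per_degree)
    col = math.floor((lon - WEST_EDGE) * cells_per_degree)
    return row, col


def extract_value(layer, lat, lon, arcsec=30):
    """Read one layer (a row-major nested list) at an occurrence coordinate."""
    row, col = cell_index(lat, lon, arcsec)
    return layer[row][col]


def extract_all(layers, lat, lon, arcsec=30):
    """Return {layer_name: value} for every layer, e.g. bio1 through bio19."""
    return {name: extract_value(layer, lat, lon, arcsec) for name, layer in layers.items()}
```

In practice a GIS or raster library would read the layer files and handle their real extent; the point here is only the coordinate-to-cell arithmetic, where 30 arc-seconds gives 120 cells per degree.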
Along with these bioclimatic variables, taxonomic rank and discrete
trait variables, such as life stage at dispersal, outcrosser or selfer,
dispersal mechanism, and trophic level (Table S3, Sullivan et al. 2019),
were used as the predictor variables in the RF classifier.
The response variable was whether a species harbors cryptic diversity
(“cryptic” vs. “non-cryptic”). To date, twelve species complexes with
disjunct ranges in the PNW have been investigated in a phylogeographic
framework (Avise et al. 1987): Ascaphus truei / A. montanus (Nielson et
al. 2001, Metzger et al. 2015), Plethodon idahoensis / P. vandykei
(Carstens et al. 2004), Prophysaon coeruleum (Wilke & Duncan 2004),
Microtus richardsoni (Carstens et al. 2005), the Dicamptodon aterrimus
complex and D. tenebrosus (Steele et al. 2005), Salix melanopsis
(Brunsfeld et al. 2006, Carstens et al. 2013), Conaphe armata (Espíndola
et al. 2016), Haplotrema vancouverense (Smith et al. 2017), Alnus rubra
(Ruffley et al. 2018), Prophysaon dubium / P. andersoni (Smith et al.
2019), and the Hemphillia sp. complex (Rankin et al. 2019). Of these,
eight species were classified as non-cryptic according to the respective
study, meaning that the species does not harbor a deep divergence
between populations and has also not experienced significant gene flow
between populations (Table S3). The remaining six species/complexes were
classified as cryptic because their coastal and inland populations are
deeply diverged and, in some cases, are described as different species.
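Because the cryptic/non-cryptic label is assigned per species while the bioclimatic predictors are measured per occurrence record, each record presumably inherits its species' trait values and label before training. A minimal sketch of one plausible way to do that join, assuming discrete traits are one-hot encoded; the trait names and category levels below are hypothetical placeholders, not the actual coding in Table S3.

```python
# Hypothetical trait vocabularies, for illustration only; the real
# categories are those listed in Table S3 (Sullivan et al. 2019).
TRAIT_LEVELS = {
    "dispersal_stage": ["adult", "juvenile", "seed"],
    "mating_system": ["outcrosser", "selfer"],
}


def one_hot(trait, value):
    """Encode one discrete trait value as 0/1 indicator columns."""
    return {f"{trait}={level}": int(value == level) for level in TRAIT_LEVELS[trait]}


def build_rows(records, species_traits, species_label):
    """Attach species-level traits and the cryptic label to each occurrence record.

    records: list of dicts with a "species" key plus bioclimatic values.
    species_traits: {species: {trait: value}} for the traits in TRAIT_LEVELS.
    species_label: {species: "cryptic" or "non-cryptic"}.
    """
    rows, labels = [], []
    for rec in records:
        sp = rec["species"]
        # Start from the record's own bioclimatic columns.
        row = {k: v for k, v in rec.items() if k != "species"}
        # Copy the species-level traits onto this record as indicator columns.
        for trait, value in species_traits[sp].items():
            row.update(one_hot(trait, value))
        rows.append(row)
        labels.append(species_label[sp])
    return rows, labels
```

Taxonomic rank could be encoded the same way, as another set of indicator columns, so that each predictor combination corresponds to selecting a group of columns.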
We constructed four RF classifiers using different combinations of the
available predictor variables: 1) bioclimatic variables only, 2)
bioclimatic variables and taxonomy, 3) bioclimatic variables and life
history traits, and 4) bioclimatic variables, taxonomy, and life history
traits. Each classifier predicts the probability of a species being
cryptic. We report the average out-of-bag (OOB) error rate for each
classifier, which is the proportion of training records misclassified by
the trees whose bootstrap samples excluded them, averaged across classes.
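The four predictor sets and the class-averaged OOB error described above can be sketched as follows. The column names are hypothetical placeholders, and the function assumes the out-of-bag prediction for each training record has already been obtained from a fitted forest.

```python
# Hypothetical column groups; real column names depend on how the data were assembled.
BIOCLIM = [f"bio{i}" for i in range(1, 20)]  # the 19 WorldClim bioclimatic variables
TAXONOMY = ["taxon_rank"]                     # placeholder taxonomy column(s)
TRAITS = ["dispersal_stage", "mating_system", "dispersal_mechanism", "trophic_level"]

# One entry per RF classifier.
FEATURE_SETS = {
    "bioclim": BIOCLIM,
    "bioclim+taxonomy": BIOCLIM + TAXONOMY,
    "bioclim+traits": BIOCLIM + TRAITS,
    "bioclim+taxonomy+traits": BIOCLIM + TAXONOMY + TRAITS,
}


def class_averaged_oob_error(y_true, y_oob):
    """Mean of per-class error rates computed from out-of-bag predictions.

    y_oob[i] is the vote of only those trees whose bootstrap sample
    did not include record i.
    """
    errors = []
    for cls in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        wrong = sum(1 for i in idx if y_oob[i] != cls)
        errors.append(wrong / len(idx))  # error rate within this class
    return sum(errors) / len(errors)     # unweighted mean across classes
```

For example, with 4 cryptic and 6 non-cryptic records of which one of each is misclassified, the pooled OOB error is 2/10 = 0.20, whereas the class-averaged error is (1/4 + 1/6)/2 ≈ 0.21, so the smaller class carries equal weight. If scikit-learn were used, `RandomForestClassifier(oob_score=True)` exposes per-record OOB class probabilities as `oob_decision_function_`; the text does not state which RF implementation was used.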
With each of these classifiers, we separately predicted the presence or
absence of cryptic diversity in Thuja plicata and Tsuga heterophylla. To
do this, we gathered occurrence records for each species: Thuja plicata
(791 records: 569 from GBIF and 222 field collections; Table S4) and
Tsuga heterophylla (468 records: 346 from GBIF and 111 field
collections; Table S5). We excluded all GBIF occurrence records that
fell outside the PNW temperate rainforest (35° to 65° latitude, −160° to
−100° longitude). We used these locality coordinates to extract values
of the 19 bioclimatic variables from WorldClim v2 (accessed 5 Feb 2019;
Fick & Hijmans, 2017) at a resolution of 30 arc-seconds (~1 km²). We
also assembled trait data matching those collected for other PNW taxa in
previous predictive phylogeography work (Sullivan et al. 2019). Using
these data, we applied the four classifiers, following the procedure of
Sullivan et al. (2019), to predict the probability of each species being
cryptic. We ultimately aimed to validate these predictions using the
phylogeographic model testing described below. After the predictions
were validated, we added the newly classified species data gathered in
this study to assess whether the overall accuracy of the classifier
improved with the addition of two plant species.
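The record filtering and species-level prediction steps can be sketched as below. The bounding box is taken from the text, with edges assumed inclusive, and only GBIF records are filtered, as stated above. How per-record predictions are combined into a single species-level probability is not spelled out here; the mean of per-record probabilities used below is an assumption for illustration, and Sullivan et al. (2019) should be consulted for the actual rule.

```python
# PNW temperate rainforest bounding box used to filter GBIF records.
LAT_MIN, LAT_MAX = 35.0, 65.0
LON_MIN, LON_MAX = -160.0, -100.0


def in_pnw(lat, lon):
    """True if a coordinate falls inside the bounding box (edges assumed inclusive)."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def filter_records(records):
    """Drop GBIF records outside the box; keep field collections unchanged.

    records: list of dicts with "lat", "lon", and "source" ("gbif" or "field").
    """
    return [r for r in records if r["source"] != "gbif" or in_pnw(r["lat"], r["lon"])]


def species_probability(per_record_probs):
    """Assumed aggregation: mean per-record probability of the "cryptic" class."""
    return sum(per_record_probs) / len(per_record_probs)
```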