INTRODUCTION

A universal attribute of all species is that their geographic distributions are limited. The abiotic and biotic factors that jointly determine this distribution are expected to be numerous, posing a serious empirical challenge to their identification and quantification. One approach is to employ species distribution models (SDM), which attempt to explain species presence data with a large set of predictor variables (Elith & Leathwick, 2009). Although the approach is generally limited to the use of environmental variables (or their proxies) as predictors of suitable habitat, SDM have provided new insights into species requirements that are akin to the “fundamental niche” (Hutchinson, 1957) by incorporating constraints set by biotic interactions (Leach, Montgomery, & Reid, 2016).
Geographic distributions are, of course, dynamic and dependent upon changing environmental conditions. Species also vary in their sensitivity to shifting environmental conditions, and will respond differently to the same changes (Hickling, Roy, Hill, Fox, & Thomas, 2006; Malcolm, Markham, Neilson, & Garaci, 2002), including the possibility of failure to track new conditions altogether (Loarie et al., 2009). While there is growing evidence of climate change-induced range shifts in many taxa, predicting its ecological and evolutionary implications remains a central challenge (Parmesan, 2006) For example, climatic variation is undoubtedly linked to natural changes in community composition over geological timescales, and there is growing evidence of rapid changes in climate being linked to the invasion and expansion of alien species (Guo, Sax, Qian, & Early, 2012). Changing climatic conditions has also led to more frequent contact between historically separate species, and this can result in hybridisation (Vallejo-Marín & Hiscock, 2016) and, in some cases, species collapse (Njiru, Mkumbo, & van der Knaap, 2010). For these reasons, the responses of hybridizing species to environmental change has been touted as a particularly important “window” on climate change (Taylor, Larson, & Harrison, 2015).
In the present paper, we took the opportunity to evaluate potential range shifts in a parapatric pair of insect species that appear to hybridize in overlapping regions of their respective ranges.Phymata  americana Melin and P. pennsylvanicaHandlirsch are two of the most common North American species in the genus (Family: Reduviidae), with the former more northerly in distribution, extending west across the American Midwest and Canadian prairies, and the latter mostly concentrated in the northeastern United States. Hybridisation in wild populations has been suspected or inferred (Punzalan & Rowe, 2017; Swanson, 2013); consistent with this, current molecular phylogenetic data fail to distinguish between the two (Masonick, Michael, Frankenberg, Rabitsch, & Weirauch, 2017; Masonick & Weirauch, 2020), despite substantial morphological divergence (Punzalan & Rowe, 2017). Both species are generalist predators occurring in temperate habitats, where they utilize a wide range of plant species as hunting sites (Balduf, 1939, 1941; Yong, 2005), suggesting considerable niche overlap. In at least one of the species, climatic variables (e.g., environmental temperature) play a key role in thermoregulation, which is linked to mating activity (Punzalan, Rodd, & Rowe, 2008a).Thermoregulatory abilities have been linked to melanic traits, and within- and between-species latitudinal variation in these traits (Punzalan & Rowe, 2015) indirectly supports the importance of climatic variables in the ecology of Phymata . Although there is evidence that their ecological requirements are consequential to their life histories, there is a shortage of knowledge regarding their ecology. Thus, there is value in understanding their habitat requirements and predicting their current and future ranges. We used biogeographic climate data and SDM to characterize the current and recent historical range of these two species, and forecast future distributions under several scenarios of anthropogenic climate change. We hypothesized that different sets of candidate abiotic factors are limiting the respective ranges of the two species, resulting in their current distributions; given anthropogenic climate warming, we hypothesized that their future overlapping ranges would increase.

MATERIALS AND METHODS

Ambush bug distribution data was compiled from specimens examined by one of the authors (DP) at the American Museum of Natural History (AMNH), Carnegie Museums of Pittsburgh, Canadian National Collection of Insects, Arachnids, and Nematodes, Royal Ontario Museum, Smithsonian National Museum, University of Guelph Insect Collection, and the University of Michigan Museum of Zoology. Identifications considered questionable by DP were excluded from subsequent analyses. These data were supplemented with information available from museum databases provided by the Spencer Entomological Collection, at the Beaty Biodiversity Museum (https://www.zoology.ubc.ca/entomology/) and the Plant Bug Inventory maintained by the AMNH (http://research.amnh.org/pbi/). We also gathered data from two citizen science websites, iNaturalist.org and BugGuide.net. Data collected from iNaturalist.org was included if the species identity was verified by Paul Masonick (UC Riverside), an authority on Phymata systematics and curator of the iNaturalist project “Uncovering the ambush bugs” (https://www.inaturalist.org/projects/uncovering-the-ambush-bugs). Data collected from BugGuide.net requires secondary identification verification before publishing on the Phymata webpage and was assumed to be accurate.
For accessions lacking latitude and longitude data, we supplemented these data manually using Google Earth. Given that bug sightings occurred at a specific point, but its accuracy was not reflected on Google Earth, coordinates were rounded to include degree and hour only (i.e., minutes and seconds were not used). This is because locality data were only accurate down to the city level, and, in some cases, were accurate down to the location of the field station, research station, or building. For the present purposes, degrees and hours should be sufficient. The localities of all sightings are mapped in Figure 1.
Although museum and citizen science databases go to lengths to confirm species identifications, the possibility of misidentifying individuals is inevitable, as is heterogeneity in sampling methods and reporting. To mitigate the effects of such errors, the datasets were inspected, and we removed any data points that were outside of the range of North America that were a result of errors in the data entry of latitudes or longitudes, as well as any species identifications that were unreliable (i.e. indicated as uncertain by the collector/photographer).
Locality data was also divided into two subsets, according to: “historical”, referring to sightings before 1970, and “current” referring to sightings from 1970-present. Preliminary inspection of the “current” and “historical” distributions of each species suggests that median latitudes of P. americana were decreasing, indicating a southward shift (Mann-Whitney U = 109360 , n = 1075 ,p -value = 0.00019), while P. pennsylvanica median latitudes were increasing, indicating a northward shift (U = 127290, n = 970, p -value = 0.00076). This was consistent with our suspicion that there are opposing changes in the ranges of both species, and motivated the current study. Additionally, visual comparisons of the historical and current distributions suggested an increase in the overlap of P. americana and P. pennsylvanica distributions (Figure S1.1 in Appendix 1). For subsequent analyses, we restricted the data from 1970-2000 (see Table S1.1, Appendix 1 in Supporting Information), and we refer to this as the current distribution. This was done in order to prevent temporal mismatch with the bioclimatic data which contained environmental data from 1970-2000.
We obtained climate data in raster form from the WorldClim database (Fick & Hijmans, 2017), which covered all global land areas except for Antarctica. The grid data was in 2.5 arcminutes (approximately 4.5 square kilometers). The updated (2.0) version of Worldclim’s current environmental data was used to investigate the influence of environmental variables on ambush bug distributions. This environmental data ranged from 1970 to 2000 and included 19 bioclimatic variables (Table S1.2, Appendix 1) derived from monthly temperature and precipitation measurements. For any pair of variables with Pearson’s r > 0.8, one variable was removed to minimize problems due to collinearity, based on numerous MAXENT runs (see below). This resulted in 11 bioclimatic variables used for each species. To map future distributions, data from future climate variables were collected from Worldclim’s original (1.4) version (as there is currently no updated 2.0 version), using the same 11 bioclimatic variables. Future data were downscaled from data created using the Community Climate System Model (CCSM4) global climate model (GCM) from the Fifth Assessment Report by the Intergovernmental Panel on Climate Change (IPCC). Climate variables were projected into 2050 and 2070, with four Representative Concentration Pathways (RCP) trajectories representing four possible climate change scenarios dependent on atmospheric greenhouse gas concentration. The best-case scenario for atmospheric greenhouse gases is represented by the lowest RCP of 2.6. As RCP increases, the atmospheric greenhouse gas concentration increases as well, up to RCP8.5, the business-as-usual scenario. Each bioclimatic variable was scaled to each of the four RCPs; the distributions from these models are mapped.
MAXENT
We used maximum entropy modeling of species’ geographic distributions (MAXENT), an approach often favoured when restricted to presence data and considered robust even when data is limited. In the absence of information about environmental conditions, we assumed the probability of a species’ occurrence within a grid was 0.5 (the default). When a species was found within a grid for which there is information about environmental conditions, MAXENT improves the model using the environmental variables.
We randomly set aside 25% of the data for the ‘training’ phase, and assumed a logistic model in all runs, which gives an estimate of the probability of ambush bug presence within a grid. Our subsequent analyses were based on the averaged values for 10 runs. A preliminary run was conducted using all 19 bioclimatic variables to identify variables that contributed 0% gain to the model. If these variables were correlated with other variables that contributed to the model, the variables with 0% gain were removed.
We inspected response curves and jackknife plots to evaluate the effect of different continuous environmental variables, and to determine the relative importance of these variables. Response curves were generated for each individual predictor while all other predictors were set at their average. Geographical projections of the models in the form of heat maps were used to visualize the predicted probability of ambush bug occurrence under current and future environmental conditions. Heat maps were then converted to binary presence/absence maps using thresholds on ArcMap (ESRI 10.6.1). The threshold used to create binary maps was “10th percentile training presence logistic threshold”. This threshold was selected as it assumes that 90% of the predicted occurrences will accurately predict the potential range, while 10% of the predicted occurrences may be erroneous. This results in a more conservative threshold and is more commonly used with species distribution data collected over a longer period of time by different observers (Rebelo & Jones, 2010).
Raster math calculations were drawn from methods used to calculate the “suitability status change index” (SSCI), adopted from (Ceccarelli & Rabinovich, 2015). In order to compare the change in suitable habitat, the future predicted distribution was subtracted from the current predicted distribution. For both P. americana and P. pennsylvanica , current suitable habitat was assigned “1” and current unsuitable habitat was assigned “0”, while future suitable habitat was assigned “2” and future unsuitable habitat was assigned “0”. The difference between current and future predicted distributions resulted in: “-1” = suitable habitat will become unsuitable; “0” = unsuitable habitat remains unsuitable; “1” = suitable habitat remains suitable”; “2” = unsuitable habitat becomes suitable.
We calculated the percent of overlap between P. americana andP. pennsylvanica, projected for 2070, to assess changes in potential contact zones. The predicted ranges of P. americana was subtracted from the predicted ranges of P. pennsylvanica and the same method was used to calculate the change in suitable habitats. The suitable habitat of P. pennsylvanica was reclassified to be “0” for unsuitable habitat, and “2” for suitable habitat, while the current suitable habitat and current unsuitable habitat remained “1” and “0” respectively for P. americana . Subtracting rasters resulted in: “-1” = suitable habitat for P. americana only; “0” = unsuitable habitat for both species; “1” = suitable habitat for both species, indicating potential overlap; “2” = suitable habitat for P. pennsylvanica only. The attributes table for each generated map gives the total number of grids for suitable and unsuitable habitat. Using these ratios, the percent change in future predicted distributions under different RCP trajectories and the degree of overlap between the two species was quantified.

RESULTS

A total of 226 observations were available within the interval for which climate data was available, 104 for P. americana and 122 forP. pennsylvanica (Table S1.1, Appendix 1). Using these data, the models produced by the MAXENT approach were statistically well supported, as the ratio of true positives (i.e., sensitivity) to false positives (i.e., 1–specificity) was maximized. Inspection of the Receiver Operating Curves (ROC), and the Area Under the Curve (AUC) for both training and test data were greater than 0.90, indicating excellent model performance (Araújo, Pearson, Thuiller, & Erhard, 2005).
For both species, precipitation and temperature were identified as the strongest predictor of occurrence (see Appendix 2). Precipitation Seasonality (BIO15), the deviation of monthly precipitation from the annual average, was the environmental variable with the largest relative percent contribution to the P. americana ranges. Response curves indicate that the highest probability of P. americana occurrence was at a lower precipitation seasonality, or localities with low variation in monthly precipitation (Figure 2). Isothermality, defined as a ratio of the diurnal temperature range to the annual temperature range, was the variable with the greatest permutation importance. ForP. pennsylvanica , precipitation was also implicated as an important factor as indicated in the response curves (Figure 2), though in this case it was Precipitation of the Driest Quarter (BIO17). The models suggest an optimal precipitation of about 200 millimeters during the driest quarter, in which P. pennsylvanica has the highest probability of occurrence. Below precipitation levels of 140 millimeters and above precipitation levels of 300 millimeters, the probability ofP. pennsylvanica occurrence decreases drastically to less than half of the probability of occurrences at optimal precipitation. Additionally, there was support for the Mean Temperature of the Coldest Quarter (BIO11) as it was the variable with the greatest permutation importance. The probability of P. pennsylvanica occurrences was greatest at a temperature of about -2°C during the coldest quarter. Above a temperature of 3°C and below a temperature of -6°C, the probability of P. pennsylvanica occurrences decrease to less than half of the probability of occurrences at optimal temperatures.
Predicted future distributions were mapped as habitat suitability models for P. americana and P. pennsylvanica at RCP8.5 (Figure 3), and the predicted changes in suitable habitats are summarized as percentages (Table 1). Generally, the percentage of suitable habitats is predicted to increase for P. americana . The greatest percent increase of suitable habitats occurs at RCP6.0, with a 2.7% increase in 2050 and a 2.0% increase in 2070, compared to current suitable habitats. The direction of the range increase is mostly northwestward and southward. The increase in the percentage of suitable habitats forP. pennsylvanica occurs at a lesser extent when projected for 2050 and 2070. Similar to P. americana , the greatest percent increase of suitable habitats for P. pennsylvanica is a 0.8% increase which occurs at RCP6.0 in 2050, and the direction of the predicted range expansion is southward. Notably, there is no change in the percentage of suitable habitats at an RCP of 4.5 predicted in 2050. However, predictions for 2070 indicate that the percentage of suitable habitat either will decrease at an RCP2.6 and RCP4.5 or remain constant at RCP6.0 and RCP8.5 relative to current suitable habitats. Although there is no predicted change in the percentage of suitable habitats, there are some fluctuations around range edges where suitable habitat is expected to become unsuitable and vice versa.
At all RCP trajectories, there is a slight increase in overlapping ranges of the two species (Figure 4 ). The largest increase in overlap is 0.7%, which occurs at RCP6.0 and RCP8.5. This translates to a contraction of regions that contain only a single species. ForP. pennsylvanica the largest reduction in predicted range area of 2.8% occurs at RCP6.0. However, at an RCP6.0, habitats that currently only contain P. americana is predicted to increase. This suggests that the changes in the amount of suitable habitat will result in the range shift of P. americana towards the range of P. pennsylvanica , resulting in a larger degree of overlap. Additionally, at RCP2.6, 4.5 and 8.5, the amount of unsuitable habitat decreases, with the greatest decrease of 0.4% at RCP8.5 This indicates that a small amount of habitat that was previously unsuitable is now an area where either one or both species can inhabit. Conversely, at RCP6.0, there is a 0.1% increase of habitat unsuitable for both species.

DISCUSSION

Distributions in a changing world
Our models predict different responses of P. americana andP. pennsylvanica to anthropogenic climate change, which may correspond to their respective niche requirements. Our forecasts predict range expansions of both species is forecasted into 2050, but with a larger range expansion for P. americana (see Appendix 3). This suggests that the ranges of both species may be able to keep up with short-term predicted climate change. But by 2070, the range expansion ofP. americana is predicted to be more modest, while P. pennsylvanica ranges were predicted to remain the same or contract. It should be noted that the occurrence and environmental data used in this study spans 1970 to 2000, which we referred to as the “current” range. Naturally, validating the predictions derived from the models will require observation and updating as the movement (presumably) proceeds.
A sometimes underappreciated underlying assumptions of the models is that species are currently in equilibrium with the environment, and that species ranges are expected to shift as a consequence of changing environmental conditions (Elith, Kearney, & Phillips, 2010; Guisan & Thuiller, 2005). For example, despite our data suggesting that P. pennsylvanica has a relatively restricted range, it is possible that both Phymata species are still responding to a recent climatic event, or have already begun responding to recent climate change at the time that occurrence and climatic data was collected. The predicted distributions for P. pennsylvanica indicates particularly prominent variation around this species’ range edges, possibly indicating that P. pennsylvanica is near its climatic optimum (Araujo & Pearson, 2005; Hutchinson, 1957). In comparison, there appears to be more habitat that satisfies the niche of P. americana given different climate change scenarios. Although it seems contradictory, at first, that the range of suitable habitat for P. pennsylvanica is not predicted to shift northward in response to forecasted environmental conditions, we propose several explanations. As suitable habitats for P. pennsylvanica are predicted to be strongly dependent on winter temperature and precipitation, it is possible that future winter conditions do not change as drastically as other environmental variables that have lesser effects, as predicted by the model. Although it remains to be seen whether the forecasted changes in climatic conditions are realized, the predicted range expansions (and potential range overlap) suggest increased hybridisation opportunities and a larger arena for competition. The potential consequences are difficult to predict, and depend on a number of factors including rates of dispersal, the fitness of hybrids, and the possibility of character displacement (Goldberg & Lande, 2007; Pfennig, Kelly, & Pierce, 2016). Such biotic interactions are not accounted for in our models, but will almost certainly have important influence on realized distributions (Hof, Jansson, & Nilsson, 2012). This also highlights a critical limitation of SDM in that they typically omit biotic interactions (Bulgarella, Trewick, Minards, Jacobson, & Morgan-Richards, 2014), and the integration of biological interactions with abiotic information remains one of the frontiers in modeling species distributions (Anderson, 2017; Elith & Leathwick, 2009). Furthermore, the models only make projections of potentially suitable habitats, but do not exclude the possibility that some populations may successfully persist at or beyond the predicted range margins of the ‘preferred’ habitat (e.g. due to local adaptation and/or metapopulation dynamics). There is also no guarantee that populations will always successfully track spatial shifts in environmental regimes, in which case the models may underestimate the possibility and rate of local extirpation. Nevertheless, our models provide a starting point for generating hypotheses, and adds to a growing recognition that the current trajectory of climate warming can have important eco-evolutionary ramifications.
Overall, our results are consistent with effects of climate change that is highly variable across species, geographic regions and over time (Menzel et al., 2002). In other taxa, a diverse spectrum of range shifts have been well documented (Chen, Hill, Ohlemüller, Roy, & Thomas, 2011). Variability in responses to different climate change scenarios at different timepoints in the future is seen in studies that have investigated both individual species (Dowling, 2015; Ning, Wei, & Feng, 2017) and groups of species (Rebelo, Tarroso, & Jones, 2010; Urbani, D’Alessandro, & Biondi, 2017). Different emissions scenarios (i.e., different RCPs) may have opposite effects on distributions, where a lower RCP induces range expansions and higher RCP projections lead to range contractions (Wang et al., 2018). Temporal variation has also been reported, where species were predicted to face extinction due to climate change at the end of the century, even though current distributions were predicted to expand (Rebelo et al., 2010). Additionally, predicted trends of range shifts may also be dependent on the amount of uncertainty incorporated in climate data sets (Parra & Monahan, 2008). For instance, our present study used four climate change projections in order to capture several potential future distributions, but there are several other projected concentration pathways that encompass a wider range of possible future greenhouse gas emissions. Due to the variability present in these predictions, modelled scenarios should be used as guides that are ultimately supplemented by additional sampling or modelling; any long-term trends may be obscured by short-term range expansions or contractions. The use of SDM such as MAXENT are critical tools for predicting range shifts but these distributions are contingent upon the emission scenarios used.
Abiotic determinants of a niche
Species that inhabit the same geographic range may exhibit high ecological similarity, but imperfect niche overlap will permit coexistence (Darwell & Althoff, 2017). The distinct yet overlapping distributions of P. americana and P. pennsylvanicasuggests that different bioclimatic variables act to limit ranges. Here, we identify the variables that are candidates for determining the ranges of P. americana and P. pennsylvanica .
Our analyses consistently implicate precipitation as an important determinant of the abiotic limits of both species, whereby P. americana and P. pennsylvanica have different optima. Our results highlight the possibility that natural selection mediated by abiotic factors may be specific to life stage (Arnold & Wade, 1984). For example, the driest quarter corresponds to the period when eggs of both species are dormant in winter diapause. During this period, eggs, which are laid on plant material at the end of summer, are likely to be near the ground, and the amount of precipitation might translate to potentially vulnerability to flooding or inundation in the following spring. Previous work in one species (Mason, 1976) has also invoked winter conditions during the egg stage as an important period for triggering phenotypically plastic responses later in life. For many organisms, strong fluctuations in the environment can be the source of severe selection, possibly explaining why our analyses also recovered variability in precipitation (during the summer, when juvenile and adult bugs are present) as important in predicting occurrences (i.e., exclusion).
Mirroring the results for precipitation, our analyses also identified mean and variability of temperature as potentially important determinants of geographic distribution. The distribution of P. pennsylvanica was particularly dependent upon the mean temperature of the coldest quarter, possibly pointing to challenges in the overwintering of eggs as a mechanism restricting P. pennsylvanicato southern latitudes. Although identification of specific mechanisms is beyond the scope of the present work, the importance of temperature and fluctuating environmental conditions is consistent with an extensive body of literature on thermal ecology in insects, including a series of studies demonstrating fluctuating and temperature-dependent selection in ambush bugs (Punzalan et al., 2008a; Punzalan, Rodd, & Rowe, 2008b, 2010).
Spatial errors in species distribution models
Errors in species occurrence data is virtually inevitable, resulting from inaccuracies in georeferencing, imprecision in latitude and longitude co-ordinates, or uncertainty in locality descriptions. However, relative to other species distribution modelling methods based on occurrence data, MAXENT has been found to maintain predictive accuracy even with locational errors. MAXENT is also less sensitive to a limited sample compared to other SDM (Wisz et al., 2008), and performs well, so long as the data is comprised of widely distributed localities. If, however, the subset of the data used is spatially biased, it may exacerbate bias in estimates. Citizen science data may be particularly prone to opportunistic collection, and hence, biased occurrence data. However, in the present study, the contributions of citizen science data limited to only three data points retained in any of the model; the subset of the observations that temporally overlapped with the available environmental data happened to be comprised mostly of museum data. The data used in the models originated from multiple sources and databases and consist of samples across much of the previously assumed range ofPhymata .
A potentially more pressing concern is the potential errors arising from species misidentification (i.e., misidentifying P. pennsylvanicaindividuals as P. americana or vice versa), as it may result in seemingly robust but inaccurate models (Lozier, Aniello, & Hickerson, 2009). MAXENT has been found to be robust against these systematic biases relative to other SDM, but nevertheless, it is for this reason that we attempted to remove data corresponding to questionable identifications, which included a large portion of the citizen science data. Future studies involving species distribution modelling could surely benefit from the addition of citizen science data (Tiago, Pereira, & Capinha, 2017), as these databases improve and provided that species occurrence data are sufficiently widely distributed.

FIGURE LIST

Figure 1. Locations of P. americana sightings in red and P. pennsylvanica sightings in blue; this includes both museum and citizen science data and spans a time frame from 1864-2018.
Figure 2. Response curves of P. americana (top) and P. pennsylvanica (bottom) to their strongest respective predictors. Red indicates the mean response averaged over the 10 replicate MAXENT runs, while blue indicates one standard deviation. For P. americana, BIO15 (Precipitation Seasonality) had the largest percent contribution, while BIO3 (Isothermality) had the largest permutation importance. For P. pennsylvanica, BIO17 (Precipitation of the Driest Quarter) had the largest percent contribution, and BIO11 (Mean Temperature of the Coldest Quarter) had the largest permutation importance.
Figure 3. Distributions of P. americana in red and P. pennsylvanica in blue. The top row depicts predicted current (1970-2000) distributions with points. The second and third row shows projected for 2050 and 2070 respectively at RCP8.5, with the lighter shades of blue and red indicate previously suitable habitats becoming unsuitable, and the darker shade of blue and red indicate previously unsuitable habitats becoming suitable.
Table 1. The percentage of change of suitable habitat under different RCP trajectories in comparison to current predicted distributions.
Figure 4. Projected overlap of distributions of P. americana and P. pennsylvanica in 2050 at RCP2.6 (top left), RCP4.5 (top right), RCP6.0 (bottom left), and RCP8.5 (bottom right). Red indicates locations that are only suitable for P. americana, blue indicates locations that are only suitable for P. pennsylvanica, and purple indicates locations that are suitable for both taxa.