4.1 Species delimitation
One objective of this study was to explore the utility of a large-scale, single-locus DNA barcode analysis of the genus Polypedilum to investigate its molecular diversity and to compare the adequacy of molecular species delimitation approaches. Our results suggest that tree-based algorithms are more suitable than distance-based ones because they integrate evolutionary theory and do not require arbitrary thresholds (Schwarzfeld & Sperling, 2015). In our study, ABGD and ASAP produced unreasonable delimitations and did not propose consistent species hypotheses. These approaches are known to over-lump and to perform poorly on speciose datasets such as ours, whereas their success rate increases markedly for small populations (Dellicour & Flot, 2015; 2018). In contrast to the over-lumping of ABGD and ASAP, the Barcode Index Number (BIN) system assigned by BOLD is known to oversplit species because of the low intracluster distance (2.2%) applied at the initial clustering step of the RESL algorithm (Ratnasingham & Hebert, 2013). Song et al. (2018) found similar results when applying the BIN system to delimit Polypedilum species, mostly from East Asia.
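The sensitivity of distance-based delimitation to the chosen cutoff can be illustrated with a minimal sketch. The code below is not ABGD, ASAP, or RESL; it is a plain single-linkage clustering written for illustration, and the function name `cluster_by_threshold` and the toy distance matrix are assumptions, not data from this study.

```python
def cluster_by_threshold(dist, threshold):
    """Single-linkage clustering: join two sequences whenever their
    pairwise distance is at or below the threshold.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Returns a sorted list of clusters, each a list of sequence indices.
    """
    n = len(dist)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Walk to the root of i's tree, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical distances among four sequences (proportions, not percent).
toy_dist = [
    [0.00, 0.01, 0.04, 0.09],
    [0.01, 0.00, 0.04, 0.09],
    [0.04, 0.04, 0.00, 0.09],
    [0.09, 0.09, 0.09, 0.00],
]

for cutoff in (0.022, 0.05, 0.10):
    clusters = cluster_by_threshold(toy_dist, cutoff)
    print(cutoff, len(clusters), clusters)
```

With these toy values, a strict cutoff splits the sequences into more clusters and a permissive one lumps them together, which is the behaviour that makes the MOTU count depend on the threshold rather than on the data alone.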
Among the drawbacks of distance-based methods is the lack of a universal threshold that fits all taxa (Yang & Rannala, 2017). Several DNA barcoding studies have tried to determine a fixed threshold value. Hebert et al. (2004) suggested that interspecific divergence should be at least 10 times as large as intraspecific divergence, the so-called “10 × rule”. However, different best-fit thresholds seem to apply to different taxonomic groups (Havermans et al., 2011). For example, a threshold of 2-3% was indicated for some Ephemeroptera, Plecoptera and Trichoptera (Zhou et al., 2010) and 3-5% for some dipteran species groups (Lin et al., 2015; Nzelu et al., 2015), while a threshold of 5-8% was suggested for species of Polypedilum by Song et al. (2018). Another downside of distance-based approaches is that they do not incorporate evolutionary relationships into their algorithms (Kapli et al., 2017). Tree-based methods are not influenced by such thresholds, since they use phylogenetic inference for a more precise barcode assignment (Song et al., 2018).
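As a worked illustration of the “10 × rule”, the sketch below derives a group-specific threshold as ten times the mean intraspecific distance. It assumes uncorrected p-distances on aligned sequences and uses invented toy sequences; the function names `p_distance` and `ten_times_threshold` are hypothetical, and published applications typically use model-corrected distances instead.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def ten_times_threshold(sequences_by_species):
    """Return 10 x the mean intraspecific p-distance.

    sequences_by_species maps a species label to a list of aligned
    sequences. Species with a single sequence contribute no pairs.
    """
    intra = [
        p_distance(a, b)
        for seqs in sequences_by_species.values()
        for a, b in combinations(seqs, 2)
    ]
    if not intra:
        raise ValueError("no species has two or more sequences")
    return 10 * sum(intra) / len(intra)


# Toy alignment (hypothetical, 10 sites per sequence).
toy = {
    "sp_A": ["AAAAAAAAAA", "AAAAAAAAAT"],  # one difference in ten sites
    "sp_B": ["CCCCCCCCCC", "CCCCCCCCCC"],  # identical sequences
    "sp_C": ["GGGGGGGGGG"],                # singleton, no intraspecific pair
}

threshold = ten_times_threshold(toy)
print(f"10x threshold: {threshold:.2f}")
```

Because the threshold is computed from the intraspecific variation of the dataset at hand, it changes from one taxonomic group to another, which is consistent with the differing best-fit values reported above. Singletons contribute nothing to the estimate, so datasets dominated by singletons yield a threshold based on few species.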
Applied to our dataset, sGMYC and PTP tended to recover more MOTUs than delineations based on distance-based methods and the morphological species concept. The Poisson Tree Process (PTP) relies on the distribution of branch lengths in the gene tree to identify species status (Zhang et al., 2013). The tree and its branch lengths are inferred from a sequence alignment using maximum likelihood and are then treated as error-free (Rannala & Yang, 2020). In our study, the number of recovered MOTUs differed greatly among the PTP methods, with a difference of 109 MOTUs between the bPTP and sPTP results. mPTP was the most conservative and commonly underestimated species numbers by lumping singleton species, represented in our tree by isolated branches, into MOTUs. In line with our results, other studies have found that the mPTP algorithm recovers fewer species than other approaches (e.g., da Silva et al., 2018; Parslow et al., 2021).
The sGMYC analysis based on a single gene revealed the presence of 370 MOTUs (likelihood ratio: 600.4823, confidence interval: 349-383, threshold time: -0.01053644). This species-delimitation algorithm relies on the priors and parameters used to construct the ultrametric tree (Ceccarelli et al., 2012) and tends to overestimate species diversity compared to other methods (Paz & Crawford, 2012; Miralles & Vences, 2013; Talavera et al., 2013; Kekkonen & Hebert, 2014). In our study, the sGMYC method seems to be the most accurate: despite its hypothesized tendency to oversplit, it recovered substantially fewer putative species than the bPTP and sPTP analyses. Moreover, the sGMYC approach has been suggested to suit datasets with large numbers of singleton taxa (Talavera et al., 2013), which is what we observe for Polypedilum. Based on these considerations, we chose the putative species delimited by the sGMYC method as the basis for the biogeographical analyses.