4.1 Species delimitation
One of the objectives of this study was to explore the utility of a
large-scale single-locus DNA barcode analysis of the genus Polypedilum
to investigate its molecular diversity and compare
the adequacy of molecular species delimitation approaches. Our results
suggest that tree-based algorithms are more suitable than
distance-based ones because they integrate evolutionary theory and do
not require arbitrary thresholds (Schwarzfeld & Sperling, 2015). In
our study, ABGD and ASAP produced implausible delimitations and did not
yield consistent species hypotheses. These approaches are known to
over-lump and to perform poorly on more speciose datasets such as ours,
whereas their success rate increases markedly for small populations
(Dellicour & Flot, 2015; 2018). In contrast to ABGD and ASAP’s
over-lumping, the Barcode Index Number (BIN) system, assigned by BOLD,
is known to oversplit species because of the low intracluster
distance (2.2%) applied at the initial clustering step of the RESL
algorithm (Ratnasingham & Hebert, 2013). Similar results were found by
Song et al. (2018), who also applied the BIN system to delimit
Polypedilum species, mostly from East Asia.
Among the drawbacks of distance-based methods is the lack of a universal
threshold that fits all taxa (Yang & Rannala, 2017). Several DNA
barcoding studies have tried to determine a fixed threshold value.
Hebert et al. (2004) suggested that interspecific divergence should be
at least 10 times as large as intraspecific divergence, the so-called
“10 × rule”.
However, it seems that different best-fit thresholds apply to different
taxonomic groups (Havermans et al., 2011). For example, a threshold of
2-3% was indicated for some Ephemeroptera, Plecoptera and
Trichoptera (Zhou et al., 2010), and 3-5% for some dipteran species
groups (Lin et al., 2015; Nzelu et al., 2015), while a threshold of 5-8%
for species in Polypedilum was suggested by Song et al. (2018).
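The sensitivity of distance-based delimitation to the chosen threshold can be illustrated with a minimal Python sketch. It is not the procedure used in this study or in any of the cited programs: the function names (`p_distance`, `ten_times_threshold`, `cluster_motus`), the use of uncorrected p-distances, the single-linkage clustering, and the toy sequences are all assumptions made for illustration, and the 10 × rule is read here as ten times the mean intraspecific distance.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance: proportion of differing sites in two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def ten_times_threshold(seqs_by_species):
    """Ten times the mean intraspecific p-distance over all labelled species (assumed reading)."""
    dists = [
        p_distance(a, b)
        for seqs in seqs_by_species.values()
        for a, b in combinations(seqs, 2)
    ]
    if not dists:
        raise ValueError("need at least one species with two or more sequences")
    return 10 * sum(dists) / len(dists)


def cluster_motus(seqs, threshold):
    """Count single-linkage clusters: sequences within `threshold` share a MOTU."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(seqs))})


# Hypothetical toy alignment (20 sites), not real Polypedilum barcodes.
toy_alignment = [
    "A" * 20,
    "A" * 19 + "T",
    "A" * 16 + "T" * 4,
    "C" * 10 + "A" * 10,
]

for threshold in (0.03, 0.08, 0.16, 0.60):
    print(threshold, cluster_motus(toy_alignment, threshold))
```

On this toy alignment the number of clusters falls from four to one as the threshold is raised, which is the behaviour that makes a single fixed cut-off hard to justify across taxa.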
Another downside of distance-based approaches is that they do not
incorporate evolutionary relationships into their algorithms (Kapli et al.,
2017). Tree-based methods are not influenced by such thresholds, since
they use phylogenetic inference for a more precise barcode assignment
(Song et al., 2018).
Applied to our dataset, sGMYC and PTP tended to recover more putative
species than delineations made with distance-based methods and the
morphological species concept. The Poisson Tree Process (PTP) relies on
the distribution of branch lengths in the gene tree to identify
species boundaries (Zhang et al., 2013). The tree and branch lengths are
inferred from a sequence alignment using maximum likelihood and then
treated as error-free (Rannala & Yang, 2020). In our study, there
was a large difference in the number of MOTUs recovered among the PTP
methods: the bPTP and sPTP results differed by 109 MOTUs. mPTP was the
most conservative and commonly underestimated species numbers by
lumping singleton species, represented in our tree by
isolated branches, into MOTUs. In line with our results, other studies
have found that the mPTP algorithm leads to a lower number of recovered
species when compared with other approaches (e.g., da Silva et al., 2018;
Parslow et al., 2021).
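The branch-length reasoning behind PTP can be sketched in a few lines of Python. This is a simplified illustration, not the PTP, bPTP or mPTP implementation: it assumes that branch lengths within each class follow an exponential distribution with its own rate, it takes the assignment of branches to the between-species and within-species classes as given rather than searching over delimitations, and the function names and branch lengths are hypothetical.

```python
import math


def exp_loglik(branch_lengths):
    """Maximised log-likelihood of branch lengths under a single exponential rate."""
    n = len(branch_lengths)
    total = sum(branch_lengths)
    if n == 0:
        return 0.0
    if total <= 0:
        raise ValueError("branch lengths must sum to a positive value")
    rate = n / total  # maximum-likelihood estimate of the exponential rate
    return n * math.log(rate) - rate * total


def two_class_loglik(between, within):
    """Log-likelihood when between- and within-species branches get separate rates."""
    return exp_loglik(between) + exp_loglik(within)


# Hypothetical branch lengths (substitutions per site), for illustration only.
between_species = [0.20, 0.30, 0.25]
within_species = [0.001, 0.002, 0.0015, 0.001]

null_model = exp_loglik(between_species + within_species)
split_model = two_class_loglik(between_species, within_species)
print(split_model - null_model)  # gain in log-likelihood from using two rates
```

A large gain in log-likelihood indicates that long and short branches are better described by two processes than by one. Because the input branch lengths are treated as known, any error in the inferred tree carries straight through to the delimitation.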
The sGMYC analysis based on a single gene revealed the presence of 370
MOTUs (likelihood ratio: 600.4823, confidence interval: 349-383,
threshold time: -0.01053644). This species-delimitation algorithm relies
on the priors and parameters used to construct the ultrametric tree
(Ceccarelli et al., 2012), and tends to overestimate species diversity
compared to other methods (Paz & Crawford, 2012; Miralles & Vences,
2013; Talavera et al., 2013; Kekkonen & Hebert, 2014). In our study,
the sGMYC method seems to be the most accurate since it recovered
substantially fewer putative species than the bPTP and sPTP analyses
despite its hypothesized oversplitting. Moreover, the sGMYC approach has
been suggested to suit datasets with large numbers of singleton taxa
(Talavera et al., 2013), which is what we observe for Polypedilum.
Based on the aforementioned considerations, we chose
the putative species delimited by the sGMYC method as the basis for the
biogeographical analyses.