Are ‘natural’ clusters necessarily less homogeneous?
Although a degradation of cluster homogeneity is implicit in our model,
the degree to which this is realised is likely to be highly dependent on
the structure of individual datasets. In our case study, the
mis-classification rate achieved by Chameleon was half that of k-means
at the cost of a 10% reduction in cluster homogeneity. We speculate
that if the clusters Chameleon retrieved in our dataset are indeed
irregular in shape, then our results suggest they are unlikely to be
highly elongated, and that variability in our data structure tends toward
uneven density rather than irregular shape.
The question of whether ‘natural’ clusters necessarily have fewer
diagnostic species is more difficult to resolve based on our analyses. A priori, we inclined to the notion that more heterogeneous
clusters would mean fewer diagnostic species, which is the pattern reflected in
our results. However, Schmidtlein et al. (2010) demonstrated that
Isopam, an algorithm that adapts to irregular cluster shapes,
consistently out-performed other algorithms in terms of the number of
indicator species (sensu Dufrêne & Legendre 1997) and was also
highly ranked in terms of the number of species with standardized phi
>0.35 (Tichý & Chytrý 2006). Higher numbers of
diagnostic species could reflect the sampling of a wider species pool,
since samples sharing no species can occupy the same cluster if they comprise
an interconnected neighbourhood (Schmidtlein et al. 2010).
However, it cannot be ruled out that the higher numbers of diagnostic species are
an artefact of Isopam’s partitioning of the ordination space by
medoids, notwithstanding the fact that the ordination axes are adjusted to
accommodate non-linearities (and hence irregularities).
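To make the diagnostic-species criterion above concrete, the following is a minimal sketch of the phi coefficient of association between one species and one cluster, computed from presence/absence data. It implements the plain (unstandardised) phi only; the standardisation for unequal group sizes of Tichý & Chytrý (2006) is not reproduced here. The function name, argument names and the convention of returning 0.0 when phi is undefined are assumptions made for illustration, not part of our analysis.

```python
from math import sqrt


def phi_coefficient(present, in_cluster):
    """Unstandardised phi between a species and a target cluster.

    present    -- one boolean per plot: is the species recorded in the plot?
    in_cluster -- one boolean per plot: does the plot belong to the cluster?
    """
    if len(present) != len(in_cluster):
        raise ValueError("both sequences must have one entry per plot")

    N = len(present)       # plots in the dataset
    N_p = sum(in_cluster)  # plots in the target cluster
    n = sum(present)       # occurrences of the species in the dataset
    # occurrences of the species inside the target cluster
    n_p = sum(1 for p, c in zip(present, in_cluster) if p and c)

    denom = sqrt(n * N_p * (N - n) * (N - N_p))
    if denom == 0:
        # phi is undefined when a margin is empty or full;
        # returning 0.0 is an assumed convention for this sketch
        return 0.0
    return (N * n_p - n * N_p) / denom
```

Under this sketch, a species would be flagged as diagnostic for a cluster when the returned value exceeds a chosen threshold, such as the 0.35 applied to standardised phi above.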
On the evidence of our results, we conclude that our original contention
is supported: cluster solutions derived by algorithms sensitive to
data structure are unlikely to be as compact or homogeneous as those
derived by optimising central tendency, although the differences may not
always be pronounced, depending on the characteristics of individual
datasets and the thematic scale of investigation. We therefore suggest
that further research is required into metrics which give insights into
how well cluster solutions model the structure of vegetation data (e.g.
within-cluster inter-connectedness, mis-classification rates) to better
understand the potential trade-offs involved in maximising homogeneity
or indicator values.