Conclusion
Our study demonstrates that scale-dependent irregularities in vegetation
data can exist and potentially affect the utility and stability of
clustering solutions underlying vegetation classification schemes. The
existence of clusters of irregular shape and density implies that novel
metrics are required to evaluate them, because clusters that are
‘natural’, in the sense that they reflect human responses to visual cues
(Barton et al. 2019), are unlikely to score well on traditional
metrics that assume a spheroidal model (Aho et al. 2008).
Evaluating the utility of such cluster solutions requires metrics which
assess inter-connectivity rather than central tendency.
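As a toy illustration of how central-tendency and inter-connectivity criteria can disagree, the pure-Python sketch below scores two labellings of two concentric rings of points: the ‘natural’ inner/outer labelling and a left/right split. `within_cluster_ss` is an ordinary within-cluster sum of squares. `knn_connectivity` is a hypothetical score introduced only for illustration (the share of each sample's k nearest neighbours that carry its label). It is not a metric used in this study or one defined by Chameleon, and the rings are synthetic, not vegetation data.

```python
import math

# Toy data and a hypothetical connectivity score; not the metrics used in this study.

def ring(radius, n):
    """n evenly spaced points on a circle, offset by half a step from the axes."""
    return [(radius * math.cos(2 * math.pi * (i + 0.5) / n),
             radius * math.sin(2 * math.pi * (i + 0.5) / n)) for i in range(n)]

def within_cluster_ss(points, labels):
    """Central-tendency criterion: summed squared distance to each cluster centroid."""
    total = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in members)
    return total

def knn_connectivity(points, labels, k=5):
    """Hypothetical inter-connectivity score: share of each sample's k nearest
    neighbours (Euclidean) that carry the same cluster label."""
    agree = 0
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        agree += sum(labels[j] == labels[i] for _, j in others[:k])
    return agree / (k * len(points))

points = ring(1.0, 40) + ring(3.0, 40)
rings = [0] * 40 + [1] * 40                      # inner ring vs outer ring
halves = [int(x >= 0) for x, _ in points]        # left half vs right half

for name, labels in (("rings", rings), ("halves", halves)):
    print(name, round(within_cluster_ss(points, labels), 1),
          round(knn_connectivity(points, labels), 3))
```

With these inputs the left/right split has the lower within-cluster sum of squares, so a spheroidal criterion prefers it. The ring labelling, by contrast, keeps every sample's five nearest neighbours within its own cluster.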
While the results presented here demonstrate the potential utility of
the Chameleon algorithm in vegetation science, several issues require
further investigation. Although Chameleon produced
informative solutions at broad thematic scales, solutions derived with
different parameters varied markedly, and some were clearly inferior to
those of the traditional algorithms we evaluated. While its theoretical
advantages are widely cited, we found very few examples in the
literature to guide how Chameleon should be parameterised, and none
pertaining to the analysis of vegetation data. Although the algorithm can be
implemented with a wide range of distance metrics, we opted to import a
distance matrix which underpins Consistent Classification Sections (CCS,
sensu De Cáceres et al. 2015) within our study area in
order to maximise the potential for integrating our results with those
CCSs. On the basis of our trials, we recommend small rather than large
neighbourhood sizes, and either omitting the agglomerative
phase or restricting the number of partitions to no more than twice the
number of samples in the desired solution. Although Karypis (2003)
recommended using a cluster-weighted single linkage function in the
implementation of Chameleon, we found this induced chaining in our
solutions, whereas the cluster-weighted complete linkage function produced
satisfactory results; we therefore recommend the latter if an agglomerative
step is employed.
using other datasets is clearly required.
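Chameleon first sparsifies the data into a k-nearest-neighbour graph and then merges sub-clusters in an agglomerative phase (Karypis et al. 1999). The pure-Python sketch below illustrates two behaviours discussed above, using small toy distance matrices in place of an imported one. First, a smaller neighbourhood size k leaves weakly linked groups disconnected. Second, plain single linkage chains where complete linkage does not. The helper names are hypothetical, and the symmetric graph construction is an assumption. The linkages are ordinary unweighted single and complete linkage, not Cluto's cluster-weighted functions or Chameleon's own merging criteria.

```python
from collections import deque

# Hypothetical helpers for illustration only; not Cluto's or Chameleon's own code.

def knn_graph(dist, k):
    """k-nearest-neighbour graph from a precomputed distance matrix.
    Assumption: edge i-j is kept if either sample is among the other's k nearest."""
    n = len(dist)
    adj = [set() for _ in range(n)]
    for i in range(n):
        ranked = sorted((dist[i][j], j) for j in range(n) if j != i)
        for _, j in ranked[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def components(adj):
    """Connected components of the graph, found by breadth-first search."""
    seen, groups = set(), []
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        groups.append(sorted(group))
    return groups

def agglomerate(dist, n_clusters, linkage):
    """Naive agglomerative clustering; linkage=min gives single, max gives complete."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)

def matrix(positions):
    """Toy stand-in for an imported dissimilarity matrix: 1-D absolute differences."""
    return [[abs(p - q) for q in positions] for p in positions]

# Two groups on a line: small k keeps them apart, larger k bridges them.
two_groups = matrix([0, 1, 2, 3, 10, 11, 12])
for k in (2, 4):
    print("k =", k, components(knn_graph(two_groups, k)))

# Gaps that widen steadily along a line: single linkage chains, complete does not.
chain = matrix([0, 10, 21, 33, 46, 60])
print("single  ", agglomerate(chain, 2, min))
print("complete", agglomerate(chain, 2, max))
```

With these toy inputs, k = 2 yields two components and k = 4 yields one. For the two-cluster cut, single linkage absorbs points one at a time and leaves the last sample isolated, whereas complete linkage returns groups of four and two.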
Finally, there is some uncertainty in relation to how the algorithm can
be implemented. We employed the Cluto clustering package (Karypis 2003)
distributed by Chameleon’s authors; however, we noted some
inconsistencies between the parameters it offers and the
description of the algorithm (Karypis et al. 1999). Furthermore,
Barton et al. (2019) have suggested that Cluto’s implementation does
not entirely embody the Chameleon concept. They
offer an alternative implementation which deserves evaluation, although
it relies on a different partitioning algorithm because the original is
proprietary.
In summary, while our results support the notion that the Chameleon algorithm
is theoretically better suited to the task of elucidating vegetation
classes, the characteristics of its solutions, and the ways in which
these improve upon those retrieved by traditional clustering approaches,
require further quantification. We suggest this is a worthwhile
endeavour because Chameleon offers a conceptually simple model, can
process very large datasets quickly and potentially presents a solution
to the problem of integrating plot-based classifications across
hierarchical levels.