Conclusion
Our study demonstrates that scale-dependent irregularities can exist in vegetation data and can potentially affect the utility and stability of the clustering solutions underlying vegetation classification schemes. The existence of clusters of irregular shape and density implies that novel metrics are required for their evaluation, because clusters which are ‘natural’ in the sense that they reflect human responses to visual cues (Barton et al. 2019) are unlikely to score well on traditional metrics that assume a spheroidal model (Aho et al. 2008). Evaluating the utility of such cluster solutions requires metrics which assess inter-connectivity rather than central tendency.
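To make the distinction between inter-connectivity and central tendency concrete, the following is a minimal sketch of one possible neighbourhood-based penalty. It is not the metric used in this study. The function name `knn_connectivity`, the parameter `k` and the 1/rank weighting are assumptions chosen for illustration. Each sample is penalised for every one of its `k` nearest neighbours that falls in a different cluster, so the score depends only on local neighbour relationships and not on distance to a cluster centroid.

```python
def knn_connectivity(dist, labels, k=3):
    """Neighbourhood-based penalty for a clustering (lower is better).

    dist   : square matrix (list of lists) of pairwise dissimilarities
    labels : cluster label of each sample, in the same order as dist
    k      : number of nearest neighbours inspected per sample (assumed value)

    A sample contributes 1/rank for each of its k nearest neighbours that
    belongs to a different cluster; a score of 0 means every sample shares
    a cluster with all k of its nearest neighbours.
    """
    n = len(labels)
    total = 0.0
    for i in range(n):
        # Other samples ordered from nearest to farthest (ties keep index order).
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: dist[i][j])
        for rank, j in enumerate(neighbours[:k], start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank
    return total
```

Because only rank order within each row of the matrix is used, a sketch of this kind can be applied to any dissimilarity measure and does not favour compact, spheroidal clusters over elongated or unevenly dense ones.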
While the results presented here demonstrate the potential utility of the Chameleon algorithm in vegetation science, several issues require further investigation. Although Chameleon produced informative solutions at broad thematic scales, solutions derived with different parameters varied markedly, and some were clearly inferior to those of the traditional algorithms we evaluated. While its theoretical advantages are widely cited, we found very few examples in the literature to guide how Chameleon should be parameterised, and none pertaining to the analysis of vegetation data. Although the algorithm can be implemented on a wide range of distance metrics, we opted to import a distance matrix which underpins Consistent Classification Sections (CCS, sensu De Cáceres et al. 2015) within our study area in order to maximise the potential for integrating our results with those CCSs. On the basis of our trials, we recommend the use of small neighbourhood sizes over large and either omitting the agglomerative phase or restricting the number of partitions to no more than twice the number of samples in the desired solution. Although Karypis (2003) recommended using a cluster-weighted single linkage function in the implementation of Chameleon, we found that this induced chaining in our solutions. The cluster-weighted complete linkage function produced satisfactory results, and we recommend this function if an agglomerative step is employed. Further experimentation with each of these parameters using other datasets is clearly required.
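The influence of neighbourhood size can be illustrated with a simplified sketch of the k-nearest-neighbour graph on which Chameleon operates (Karypis et al. 1999). The code below is not Cluto's implementation. The helper names `knn_graph` and `count_components`, the use of an unweighted graph, and the rule that an edge exists when either sample lists the other as a neighbour are all assumptions made for illustration.

```python
def knn_graph(dist, k):
    """Unweighted k-nearest-neighbour graph as a dict of neighbour sets.

    Assumption: an edge is kept if either sample lists the other among
    its k nearest neighbours.
    """
    n = len(dist)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: dist[i][j])[:k]
        for j in nearest:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def count_components(adj):
    """Number of connected components, found by depth-first search."""
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components
```

In a toy example with two well-separated groups of three samples, `k = 2` leaves the groups as separate components, whereas `k = 3` forces each sample to reach into the other group and joins them into a single component. This is one plausible reading of why larger neighbourhoods may blur structure that smaller neighbourhoods preserve, although the behaviour on real vegetation data will depend on the dataset.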
Finally, there is some uncertainty in relation to how the algorithm can be implemented. We employed the Cluto clustering package (Karypis 2003) distributed by Chameleon’s authors; however, we noted some inconsistencies between the parameters offered and the description of the algorithm (Karypis et al. 1999). Furthermore, Barton et al. (2019) have suggested that Cluto’s implementation does not entirely embody the Chameleon concept. Barton et al. (2019) offer an alternative implementation which deserves evaluation, although it relies on a different partitioning algorithm because the original is proprietary.
In summary, while our results support the notion that the Chameleon algorithm is theoretically better suited to the task of elucidating vegetation classes, the characteristics of its solutions, and the ways in which these improve upon those retrieved by traditional clustering approaches, require further quantification. We suggest this is a worthwhile endeavour because Chameleon offers a conceptually simple model, can process very large datasets quickly and potentially presents a solution to the problem of integrating plot-based classifications across hierarchical levels.