Abstract
Questions: Traditional clustering methods generally assume data
are structured as discrete hyper-spheroidal clusters to be evaluated by
measures of central tendency. If vegetation data do not conform to this
model, they may be clustered incorrectly. What are the
implications for cluster stability and evaluation if clusters are of
irregular shape or density?
Location: Southeast Australia
Methods: We define misclassification as the placement of a
sample in a cluster other than that of its nearest neighbour and
hypothesise that: i) optimising homogeneity incurs the cost of higher
rates of misclassification; and ii) misclassification varies with
thematic scale. We compared the performance of an algorithm (Chameleon),
which operates on interconnectivity and is thus sensitive to the shape
and distribution of clusters, with that of three traditional algorithms
over varying scales.
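The misclassification measure defined above can be illustrated with a minimal sketch. It assumes Euclidean distance between samples (the dissimilarity measure actually used is not stated here), and the function name `misclassification_rate` and the toy data are hypothetical, chosen only for illustration.

```python
import math

def misclassification_rate(samples, labels):
    """Fraction of samples whose nearest neighbour lies in a different cluster.

    Assumption: Euclidean distance; any dissimilarity measure could be
    substituted for math.dist.
    """
    n = len(samples)
    mismatched = 0
    for i in range(n):
        # Index of the nearest other sample to sample i
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(samples[i], samples[j]),
        )
        if labels[nearest] != labels[i]:
            mismatched += 1
    return mismatched / n

# Toy data: the last sample sits closest to cluster 0 but is labelled 1
samples = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0), (0.25, 0.0)]
labels = [0, 0, 1, 1, 1]
rate = misclassification_rate(samples, labels)
```

In this toy case only the last sample has a nearest neighbour in another cluster, so one of the five samples counts as misclassified.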
Results: Chameleon-derived solutions had lower rates of
misclassification and only marginally higher heterogeneity than those of
k-means in the range 15–60 clusters, but their metrics converged at
finer thematic scales. Solutions derived by agglomerative clustering had
the best metrics (and those derived by divisive clustering the worst),
but both produced inferior high-level solutions to those of Chameleon by
merging distantly related clusters.
Conclusions: Our results suggest that Chameleon may have an
advantage over traditional algorithms at thematic scales at which data
exhibit discontinuities and variable structure, potentially producing
more stable solutions (due to lower rates of misclassification) while
scoring lower on traditional metrics of central tendency. Chameleon’s
advantages are less obvious in the partitioning of continuous data;
however, its graph-based partitioning protocol facilitates hierarchical
integration of solutions.