Performance of Chameleon under combinations of varying parameters: single linkage
Trends in the misclassification rate and average within-cluster
homogeneity of Chameleon cluster solutions generated using the weighted
single-link functions are summarised in Figure 3. The misclassification
rate rose with increasing neighbourhood size (Figure 3A). This result
may reflect aberrations caused by forcing members of small clusters to
form links with samples in other clusters, as illustrated by Chameleon's
assignment of the simulated data presented in Figure 1 under
neighbourhoods of different sizes (Figure 4). Solutions derived by
agglomeration from 30 sub-partitions had consistently lower rates of
misclassification, but beyond 30 sub-partitions solutions became
increasingly uneven (chaining), and misclassification rates became
meaningless because a high proportion of samples were concentrated in a
few clusters. Chaining was not corrected by directing the algorithm to
prioritise large clusters over small ones in the partitioning phase.
However, more even clusters were produced when the cluster-weighted
complete-link function was employed in the agglomerative phase of the
algorithm, and subsequent analyses were performed using this option, as
described in the next section. There was no clear trend in
within-cluster homogeneity with increasing neighbourhood size when the
agglomeration phase was omitted (Figure 3B). Solutions derived by
agglomeration from 30 sub-partitions had the highest homogeneity with a
neighbourhood size of 100. Beyond 30 sub-partitions, homogeneity showed
no clear trend and varied erratically depending on the unevenness of the
solutions.
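The mechanism proposed above, in which a large neighbourhood forces members of a small cluster to link to samples outside it, can be illustrated with a minimal k-nearest-neighbour sketch. It assumes that neighbourhood size corresponds to k in a plain Euclidean k-nearest-neighbour graph; the toy data and the function names (knn_neighbours, cross_cluster_links) are hypothetical, and this is not Chameleon's own graph-construction code.

```python
import math

def knn_neighbours(points, k):
    """Indices of the k nearest other points (Euclidean distance) for every point."""
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
    return neighbours

def cross_cluster_links(points, labels, k):
    """For each point, count how many of its k nearest neighbours belong to another cluster."""
    return [
        sum(labels[j] != labels[i] for j in nbrs)
        for i, nbrs in enumerate(knn_neighbours(points, k))
    ]

# Hypothetical toy data: a tight 3-member cluster (label 0) far from a 10-member cluster (label 1).
small = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
large = [(5.0 + 0.1 * i, 5.0) for i in range(10)]
points = small + large
labels = [0] * len(small) + [1] * len(large)

for k in (2, 5, 8):
    links = cross_cluster_links(points, labels, k)
    # Cross-cluster links of the small-cluster members, then the worst case in the large cluster.
    print(k, links[:3], max(links[3:]))
# prints:
# 2 [0, 0, 0] 0
# 5 [3, 3, 3] 0
# 8 [6, 6, 6] 0
```

A 3-member cluster can supply at most two within-cluster neighbours per member, so any k above 2 forces k - 2 links into the other cluster, whereas members of the 10-member cluster keep all their links internal up to k = 9.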
Solutions of 15 clusters generated using the cluster-weighted complete-link
function exhibited higher rates of misclassification and lower
within-cluster homogeneity when either the neighbourhood size (n) or the
number of sub-partitions (a) in the agglomerative phase was increased,
although increasing n disproportionately affected the misclassification
rate, while increasing a disproportionately affected cluster homogeneity
(Figure 5).
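The difference between single-link and complete-link merging that underlies the chaining described above can be shown with a naive agglomerative sketch on hypothetical one-dimensional data. It uses plain, unweighted linkage; the cluster-weighted functions used in Chameleon's agglomerative phase are not reproduced here, and the function agglomerate is illustrative only.

```python
from itertools import combinations

def agglomerate(points, n_clusters, linkage):
    """Naive agglomerative clustering of 1-D points into n_clusters groups.

    linkage="single":   cluster distance = distance between the closest members
    linkage="complete": cluster distance = distance between the farthest members
    """
    combine = min if linkage == "single" else max
    clusters = [[p] for p in points]  # start with every point as its own cluster

    def cluster_distance(pair):
        i, j = pair
        return combine(abs(a - b) for a in clusters[i] for b in clusters[j])

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest linkage distance.
        i, j = min(combinations(range(len(clusters)), 2), key=cluster_distance)
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(clusters, key=len, reverse=True)

# Hypothetical 1-D data: gaps between neighbours widen steadily (10, 11, 12, 13, 16).
points = [0, 10, 21, 33, 46, 62]
for method in ("single", "complete"):
    sizes = [len(c) for c in agglomerate(points, 3, method)]
    print(method, sizes)
# prints:
# single [4, 1, 1]
# complete [2, 2, 2]
```

Because single linkage judges clusters by their closest members, the chain of points is absorbed step by step into one growing cluster, whereas complete linkage judges clusters by their farthest members, penalises the growing cluster and here yields groups of equal size.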
Both the rate of mis-classification and within-cluster homogeneity
increased with increasing thematic resolution (Figure 6). Chameleon
solutions derived using small neighbourhood sizes and either modest
numbers of sub-partitions (twice the number of classes in the solution)
or no agglomeration phase were better (lower rates of misclassification
and higher homogeneity) than those derived with the divisive algorithm,
but worse than those derived with the agglomerative algorithm (Figure 6).
However, 15-class solutions derived by Chameleon were more even than
those produced by either the agglomerative or divisive algorithms
(Figure 7). Chameleon solutions were better than those of k-means at
broad thematic scales (15–60 classes) but equivalent at finer scales
(90–250 classes). Chameleon also produced more even 15-class solutions
than k-means (Figure 7).
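This section does not state how the evenness of solutions in Figure 7 was quantified. One common choice, assumed here for illustration only, is Pielou's evenness J = H / ln(k), where H is the Shannon entropy of the cluster-size proportions and k is the number of non-empty clusters; J equals 1 when all clusters are the same size and approaches 0 as samples concentrate in a single cluster. The function name and example partitions are hypothetical.

```python
import math

def pielou_evenness(cluster_sizes):
    """Pielou's J = H / ln(k) for the size distribution of k non-empty clusters."""
    sizes = [s for s in cluster_sizes if s > 0]
    if len(sizes) < 2:
        raise ValueError("evenness needs at least two non-empty clusters")
    total = sum(sizes)
    # Shannon entropy of the proportion of samples in each cluster.
    h = -sum((s / total) * math.log(s / total) for s in sizes)
    return h / math.log(len(sizes))

# Hypothetical 15-sample partitions into three clusters.
print(pielou_evenness([5, 5, 5]))   # perfectly even: about 1.0
print(pielou_evenness([6, 5, 4]))   # slightly uneven
print(pielou_evenness([13, 1, 1]))  # chained: most samples in one cluster
```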
Clusters in Chameleon solutions were generally characterised by fewer
diagnostic species than those derived using the traditional algorithms
(Table 2). However, species diagnostic of Chameleon clusters
corresponded more closely with those characterising units of a reference
classification for our study area than did those diagnostic of clusters
derived by the agglomerative or divisive algorithms, both in the range
of units represented and in having less overlap between unrelated units
(Tables 3a, 3b, 3c). Clusters derived by k-means retrieved units of the
reference classification with an efficiency similar to that of Chameleon
(Table 3d).
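The criterion used to identify diagnostic species is not given in this section. A widely used fidelity measure in vegetation classification, assumed here for illustration, is the phi coefficient of association between a species' presence/absence and membership of a cluster: phi = (N*n_p - n*N_p) / sqrt(n*N_p*(N - n)*(N - N_p)), where N is the number of plots, N_p the number of plots in the cluster, n the number of plots containing the species and n_p the number containing it within the cluster. The species names, data and threshold below are hypothetical and are not the values used in this study.

```python
import math

def phi_fidelity(presence, labels, cluster):
    """Phi coefficient between a species' presence (0/1 per plot) and membership of `cluster`."""
    N = len(presence)                                 # total plots
    N_p = sum(1 for lab in labels if lab == cluster)  # plots in the cluster
    n = sum(presence)                                 # plots containing the species
    n_p = sum(1 for x, lab in zip(presence, labels) if x and lab == cluster)
    denom = math.sqrt(n * N_p * (N - n) * (N - N_p))
    if denom == 0:
        return 0.0  # species in all or no plots, or cluster empty or covering all plots
    return (N * n_p - n * N_p) / denom

def diagnostic_species(matrix, labels, cluster, threshold):
    """Species whose phi for `cluster` exceeds a (hypothetical) threshold."""
    return [sp for sp, pres in matrix.items()
            if phi_fidelity(pres, labels, cluster) > threshold]

# Hypothetical data: 10 plots, 4 in cluster A and 6 in cluster B.
labels = ["A"] * 4 + ["B"] * 6
matrix = {
    "species_1": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # concentrated in cluster A
    "species_2": [1, 1, 0, 0, 1, 1, 1, 0, 0, 0],  # spread across both clusters
}
print(round(phi_fidelity(matrix["species_1"], labels, "A"), 3))  # 0.816
print(phi_fidelity(matrix["species_2"], labels, "A"))            # 0.0
print(diagnostic_species(matrix, labels, "A", threshold=0.5))    # ['species_1']
```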