The Chameleon algorithm
Chameleon models the feature space as a k-nearest neighbour graph (sparse graph) with samples forming vertices connected by links that are proportional to pairwise similarity between samples (Figure 2). The user specifies the number of links between samples (neighbourhood range) and then in the first phase, links are progressively dissolved (in order of increasing similarity) until a user-specified number of sub-partitions has formed. In this partitioning phase the algorithm seeks to minimise the summed length of all links hence minimising the affinity between samples in different sub-partitions (Karypis, 1999). Sub-partitions are then (optionally) merged using a hierarchical agglomerative clustering algorithm to resolve the number of groups required of the solution. An advantage of this approach is that it encapsulates the concept of environmental /compositional continua by weighting cluster interconnectivity over homogeneity. That is, samples that are distantly related may still share a cluster if they are linked by strongly interconnected neighbours. One of the key features of the Chameleon algorithm is that the structure of inter-sample relationships is preserved through the partitioning phase because co-membership of sub-partitions is dependent on pairwise inter-sample connectivity. In contrast, traditional algorithms merge or split samples progressively and the outcome at each step depends on comparing samples with intermediate clusters, the compositional characteristics of which are artificial and reflect the range of the samples merged (Han et al ., 2012).