Comparing clustering solutions with a reference classification
We assessed the extent to which the clustering solutions (15 classes) produced by each algorithm retrieved the species sets characterising the units of an established subcontinental-scale vegetation classification that covers 800,000 km² in southeastern Australia (Keith 2004), including the study area (c. 11% of the total area). The reference classification was developed top-down, based on an extensive review of vegetation studies, field reconnaissance and qualitative synthesis of the vegetation maps available at the time (Keith 2004). Its highest level (vegetation formation) is based on structural/physiognomic features. Formations are subdivided into vegetation classes based on geographically distinct expressions of structural and compositional features. Fifteen of the 99 vegetation classes recognised in the reference classification are mapped within the study area and are described with lists of indicative species (Keith 2004). For each clustering solution, we identified the species diagnostic of each cluster as those occurring significantly more frequently in the cluster's samples than in the dataset as a whole (cumulative hypergeometric probability > 0.999). We compared these with the indicative species of the reference classes by compiling a confusion matrix, with the units of the two classifications as rows and columns. Each cell value was calculated as the proportion of a reference class's indicative species that were identified as diagnostic of the corresponding cluster.
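The two steps described above (a hypergeometric test for diagnostic species, followed by a confusion matrix of species overlap) can be illustrated with a minimal Python sketch. It is not the code used in the study. The function names, the representation of samples as sets of species names, and the reading of the criterion as P(X < k) > 0.999 (where k is the number of cluster samples containing the species) are all assumptions made for illustration.

```python
from math import comb


def hypergeom_cdf(k, N, K, n):
    """P(X <= k) for X ~ Hypergeometric(N samples, K occurrences, n drawn)."""
    if k < 0:
        return 0.0
    upper = min(k, K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(upper + 1))
    return favourable / comb(N, n)


def diagnostic_species(samples, labels, threshold=0.999):
    """Return {cluster: set of species diagnostic of that cluster}.

    samples: list of sets of species names (one set per sample).
    labels:  list of cluster labels, parallel to samples.
    Assumption: a species is diagnostic when the probability of seeing
    fewer than its observed k occurrences by chance exceeds the threshold.
    """
    N = len(samples)
    totals = {}  # species -> number of samples containing it (K)
    for species_set in samples:
        for sp in species_set:
            totals[sp] = totals.get(sp, 0) + 1

    result = {}
    for cluster in sorted(set(labels)):
        members = [s for s, lab in zip(samples, labels) if lab == cluster]
        n = len(members)
        diag = set()
        for sp, K in totals.items():
            k = sum(sp in s for s in members)
            if hypergeom_cdf(k - 1, N, K, n) > threshold:
                diag.add(sp)
        result[cluster] = diag
    return result


def confusion_matrix(reference, diagnostic):
    """Rows are reference classes, columns are clusters.

    Each cell is the proportion of the reference class's indicative
    species that are diagnostic of the cluster.
    """
    return {
        ref: {
            cl: (len(ref_sp & diag_sp) / len(ref_sp)) if ref_sp else 0.0
            for cl, diag_sp in diagnostic.items()
        }
        for ref, ref_sp in reference.items()
    }
```

In this sketch a ubiquitous species is never diagnostic, because its frequency within a cluster cannot exceed its frequency across the dataset. Each row of the matrix need not sum to one, since a reference species may be diagnostic of several clusters or of none.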