2.4 Performance of the image-matching software
To test which image-matching software most accurately matched crops of the same individual, we created two separate datasets for the Kenyan and Zimbabwean populations. To select suitable crops, we used the four-step image pre-processing method described above, and we also visually inspected discarded crops to avoid missing suitable ones. We then manually identified individuals from the dataset of right-flank crops, to provide a standard against which automated identifications could be compared, and randomly selected two crops per individual. To prevent similar lighting conditions and posture from biasing the software towards matching images of the same individual, we ensured that the two selected images were taken on different days. The two resulting datasets consisted of 104 individuals from the Kenyan population and 48 individuals from the Zimbabwean population. To increase the size of the Zimbabwean dataset, we also included left-flank crops for 41 individuals and horizontally mirrored these crops to enable comparison with the right-flank crops, increasing the total number of unique flanks from the Zimbabwean population to 89. The coat pattern of wild dogs differs between the right and left flanks, and we have no reason to expect that including left-flank crops would bias our results.
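To make this selection step concrete, the following is a minimal sketch in R of how two crops per individual, each taken on a different day, might be drawn, and how left-flank crops can be mirrored horizontally. The data frame `crops` and its columns `individual_id`, `date`, and `file` are hypothetical placeholders rather than the names used in our pipeline, and the mirroring is illustrated with the "magick" package, which is one convenient option rather than a tool prescribed by our workflow.

```r
# Sketch of crop selection and left-flank mirroring (illustrative only).
library(dplyr)
library(magick)

set.seed(42)  # for reproducible random selection

# Randomly pick two crops per individual, ensuring they come from different days
selected <- crops %>%
  group_by(individual_id) %>%
  filter(n_distinct(date) >= 2) %>%   # keep individuals photographed on >= 2 days
  group_by(individual_id, date) %>%
  slice_sample(n = 1) %>%             # at most one crop per capture day
  group_by(individual_id) %>%
  slice_sample(n = 2) %>%             # then two crops, each from a different day
  ungroup()

# Horizontally mirror a left-flank crop so it can be compared with right flanks
mirror_left_flank <- function(path, out_path) {
  img <- image_read(path)
  image_write(image_flop(img), out_path)  # image_flop() mirrors horizontally
}
```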
We analysed the Kenyan dataset with each of the three image-matching software packages: Hotspotter, WildID, and I3S-Pattern. We then analysed the Zimbabwean dataset with Hotspotter and WildID only: I3S-Pattern was not tested with the Zimbabwean dataset because tests with the Kenyan dataset showed it to be considerably less accurate than the other packages, and because inputting images and assigning reference points was considerably more time-consuming.
We also examined whether image background removal increased the accuracy of WildID and Hotspotter. I3S-Pattern was not included in this analysis because it requires users to manually outline the animal within the program, and therefore does not take the background into account in its default use. We compared the image-matching results obtained using images from which we manually cropped just the individuals’ flanks with those based on crops of complete individuals from which the background was automatically removed (see Figure S2). For three of the 178 images from the Zimbabwean site, the algorithm failed to isolate the wild dog, instead cropping out vegetation in the foreground; for these images, a manually cropped flank of the wild dog was used.
To compare the image-matching performance of each software package, we examined the 10 crops identified as most similar to the sample individual. We used the first 10 ranked images because the accuracy of the best-performing software started to level off around this rank, indicating that inspecting the first 10 matches could maximise recognition rates while minimising the time spent visually inspecting and confirming potential matches. We used a mixed-effects logistic regression to test for differences in the efficacy of the software packages. The response variable was a binary variable describing whether or not an individual was successfully matched within the first 10 ranked images, and software package was the explanatory variable. Individual identity was included as a random effect to avoid pseudoreplication. Post-hoc pairwise comparisons were carried out using Tukey contrasts. This analysis was performed separately for the Zimbabwean and Kenyan datasets. Models were fit using the “lme4” package (v. 1.1-27.1, Bates et al., 2015) in R (version 4.0.4, R Core Team, 2020).
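As an illustration, a model of this form could be fit as follows. The data frame `matches` and its columns `matched_top10`, `software`, and `individual_id` are hypothetical placeholders. The Tukey contrasts are shown here via the "multcomp" package, which is one common way to obtain them for "lme4" models, though not necessarily the exact implementation we used.

```r
# Sketch of the software-comparison model (illustrative only).
library(lme4)
library(multcomp)

matches$software <- factor(matches$software)

# Mixed-effects logistic regression: success within the top 10 ranked images
# as a function of software package, with individual identity as a random
# intercept to avoid pseudoreplication
m_software <- glmer(matched_top10 ~ software + (1 | individual_id),
                    data = matches, family = binomial)
summary(m_software)

# Post-hoc pairwise comparisons between software packages (Tukey contrasts)
summary(glht(m_software, linfct = mcp(software = "Tukey")))
```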
Previous studies have shown that the image-matching performance of different software packages is affected by database size (Matthé et al., 2017). Therefore, to compare software performance on wild dogs from the Kenyan and Zimbabwean populations, we randomly selected a subset of the Kenyan individuals to equal the number of unique flanks in the Zimbabwean dataset (n = 89). We then used the best-performing software package identified in the previous step of the analysis to rerun the image-matching analysis for both datasets. Differences in software performance between the two populations were assessed using a mixed-effects logistic regression with a binomial error structure and logit link. The response variable was whether or not a match was detected within the first 10 ranked images, and study site (Kenya or Zimbabwe) was the explanatory variable. To correct for possible differences in image quality, two proxies for image quality were included in the model. First, we included image size (the total number of pixels in the crop) as a continuous predictor. Second, all images were visually scored on a scale of 1 to 3, based on how well their distinguishing marks could be recognised, following Nipko, Holcombe & Kelly (2020): a score of 1 was given to images that were out of focus, of a moving animal, or badly lit; a score of 2 to images of intermediate quality; and a score of 3 to images in which all features were clearly visible (for examples, see Figure S3). Score was included as a fixed effect, and individual identity was included as a random effect. Furthermore, a Wilcoxon rank-sum test was performed to test for differences between the quality scores of crops from Kenya and Zimbabwe. The model was fit using the “lme4” package (v. 1.1-27.1, Bates et al., 2015) in R (version 4.0.4, R Core Team, 2020).
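The between-site analysis could be sketched as below. Again, the data frame `site_matches` and its columns `matched_top10`, `site`, `n_pixels`, `quality_score`, and `individual_id` are hypothetical placeholders; scaling the pixel count and treating the quality score as a categorical fixed effect are illustrative modelling choices, not necessarily those made in our analysis.

```r
# Sketch of the between-site comparison (illustrative only).
library(lme4)

# Subsample Kenyan individuals to match the Zimbabwean dataset size (n = 89)
set.seed(42)
kenya_ids <- unique(site_matches$individual_id[site_matches$site == "Kenya"])
keep_ids  <- sample(kenya_ids, 89)
dat <- subset(site_matches,
              site == "Zimbabwe" | individual_id %in% keep_ids)

# Mixed-effects logistic regression: match success by study site, correcting
# for image quality via crop size (scaled for numerical stability) and the
# visual quality score, with individual identity as a random intercept
m_site <- glmer(matched_top10 ~ site + scale(n_pixels) +
                  factor(quality_score) + (1 | individual_id),
                data = dat, family = binomial)
summary(m_site)

# Wilcoxon rank-sum test for differences in quality score between sites
wilcox.test(quality_score ~ site, data = dat)
```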

3. Results