2.2 Pre-processing steps
To automate the selection of suitable images for image-matching, we
developed a five-step image pre-processing method (Figure 2).
2.2.1 Detecting and cropping individuals from images
The aim of the first step in the image pre-processing method was to
automatically detect and crop wild dog individuals from the images. To
do this, we used the Microsoft AI for Earth MegaDetector (hereafter
‘MegaDetector’; Beery, Morris & Yang, 2019), which automatically detects
and crops animals in images. We assessed the efficacy of this method by
visually recording the presence of wild dogs in a subset of 1060 images
from the Kenyan dataset and 246 images from the Zimbabwean dataset, and
comparing the results to the cropped images (hereafter ‘crops’) produced
by the MegaDetector for the same subset of images. In this way, we
obtained the MegaDetector’s number of true positives (wild dogs that
were successfully detected), false positives (detections that did not
contain a wild dog), and false negatives (wild dogs that were found by
visual inspection but not by the MegaDetector). All images contained
wild dogs, so there were no true negatives in the dataset.
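For illustration, this cropping step can be reproduced from MegaDetector’s standard batch-output JSON, in which each detection carries a category label, a confidence score, and a normalised bounding box. The sketch below is a minimal example; the file names and confidence threshold are illustrative placeholders, not values from the study.

```python
import json
from PIL import Image

# Minimal sketch: crop animal detections from a MegaDetector
# batch-output JSON file. File names and the confidence threshold
# are illustrative placeholders.
CONF_THRESHOLD = 0.8   # assumed detection-confidence cut-off
ANIMAL_CATEGORY = "1"  # 'animal' in the MegaDetector category map

with open("megadetector_output.json") as f:  # hypothetical output file
    results = json.load(f)

for entry in results["images"]:
    image = Image.open(entry["file"])
    width, height = image.size
    for i, det in enumerate(entry.get("detections", [])):
        if det["category"] != ANIMAL_CATEGORY or det["conf"] < CONF_THRESHOLD:
            continue
        # MegaDetector boxes are normalised [x_min, y_min, width, height]
        x, y, bw, bh = det["bbox"]
        box = (int(x * width), int(y * height),
               int((x + bw) * width), int((y + bh) * height))
        image.crop(box).save(f"{entry['file']}_crop{i}.jpg")
```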
2.2.2 Aspect-ratio filtering
The aim of the second step in the image pre-processing method was to
filter out images that were unsuitable for identification due to the
individual’s body rotation in the image. We considered crops suitable
for image-matching if approximately ≥80% of the individual’s flank was
visible, and the angle between the image axis and the animal’s flank was
less than approximately 30°, i.e., the flank was facing the camera.
Crops where the angle between the image axis and the animal’s flank
exceeded 30° were expected to be narrower than crops suitable for
image-matching and therefore exhibit a relatively low aspect-ratio. By
contrast, crops where the flank was concealed, because the individual
was lying down or obscured by vegetation, were expected to be
considerably wider and exhibit a relatively high aspect-ratio. These
criteria were visually assessed for the crops that the MegaDetector
produced in the previous step. We then calculated the range of
aspect-ratios for suitable crops, i.e., where an unobscured flank was
facing the camera, using the “jpeg” package (Urbanek, 2021) in Program
R (version 4.0.4, R Core Team, 2020). Images with an aspect-ratio
outside of this range were removed from the dataset.
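The aspect-ratio calculation was performed in R; for illustration, an equivalent sketch in Python is shown below, with placeholder bounds standing in for the empirically derived range.

```python
from pathlib import Path
from PIL import Image

# Sketch of the aspect-ratio filter. The bounds below are
# placeholders; the accepted range was derived empirically from
# visually verified crops of unobscured, camera-facing flanks.
MIN_RATIO, MAX_RATIO = 1.0, 2.5  # assumed width/height bounds

def is_suitable(path):
    with Image.open(path) as img:
        width, height = img.size
    return MIN_RATIO <= width / height <= MAX_RATIO

suitable_crops = [p for p in Path("crops").glob("*.jpg") if is_suitable(p)]
```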
2.2.3 Selecting standing individuals
Not all sitting or lying individuals could be filtered out solely using
image aspect-ratios. Therefore, the aim of the third step in the image
pre-processing method was to filter out the remaining crops that were
unsuitable for identification because the individual’s body position,
i.e., sitting or lying, obscured the full coat pattern. To do this, we
trained a Convolutional Neural Network (CNN) to classify crops as either a
standing wild dog or a sitting wild dog. To obtain data to train this
image classifier, we used the full image catalogues from both sites (n =
11205). The crops produced by steps 1 and 2 of the pre-processing (n =
21745) were then manually classified as either containing a standing
wild dog (n = 13500) or a sitting wild dog (n = 6512). We removed all
crops depicting anything other than wild dogs (e.g., birds, rocks or
logs), or wild dogs where it could not be confirmed whether they were
standing or sitting, because only a small part of the animal was visible
(n = 1733). We trained a CNN using the remaining 20012 pre-processed
crops to classify them as either containing a standing wild dog or not.
The CNN was built using TensorFlow (Abadi et al., 2016) in Python
(version 3.6.10). The model was trained with 16012 crops, validated with
2000 crops, and tested with 2000 crops.
CNNs consist of convolutional layers (Albawi, Mohammed & Al-Zawi,
2017): filter layers which digitally ‘slide’ over the image and aim to
recognise specific features. The convolutional layers pass a map of
specific features to the next layer, a max-pooling layer. The
max-pooling layer reduces the resolution of this feature map, thus
reducing the importance of the position of features within this map.
This step can help prevent the model from becoming too fine-tuned to
the training data, which causes over-fitting and reduces the
generalisability of the classifier. After this, a dropout layer is
applied, which randomly removes 50% of the connections made between
layers during training. This benefits the model by teaching it to
recognise robust features, again preventing over-fitting. The data are
then passed on to a flattening layer, which turns the data into a
one-dimensional vector that is passed on to the final two layers.
First, the vector goes through a fully-connected (dense) layer, which
combines all the data from the previous layer and produces prediction
scores from the inputs. Second, an output layer turns these scores into
a single prediction: standing, or not standing (for a more detailed
description of CNNs, see O’Shea & Nash, 2015 and Albawi, Mohammed &
Al-Zawi, 2017).
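As a minimal sketch, this layer sequence can be expressed in Keras as follows; the input size, filter count, and dense-layer width shown here are illustrative assumptions (the tuned values are described below).

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch of the layer sequence described above. The input size,
# filter count, and dense-layer width are illustrative assumptions.
model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(128, 128, 3)),  # assumed input size
    layers.MaxPooling2D((2, 2)),   # reduce feature-map resolution
    layers.Dropout(0.5),           # randomly remove 50% of connections
    layers.Flatten(),              # to a one-dimensional vector
    layers.Dense(128, activation="relu"),   # produces prediction scores
    layers.Dense(1, activation="sigmoid"),  # standing vs. not standing
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
```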
The number of convolutional layers and the number of filters that they
comprise were optimised using KerasTuner (O’Malley et al., 2019).
KerasTuner runs CNNs with a range of values, and automatically selects
the model with the highest validation accuracy, i.e., the proportion of
correct classifications on the validation database. KerasTuner ran CNNs
with between one and three convolutional layers, with 16, 32 and 64
filters per layer, and with a kernel size (the number of pixels in the
filters) of 3x3 pixels. This was done for 20 different random
combinations for the number of convolutional layers and number of
filters per layer. Test runs showed that the maximum accuracy was
reached before the 70th epoch, and therefore each
combination was run for 70 epochs, meaning that the training data were
passed through the CNN 70 times. The learning rate of the model, that
is, the size of the steps by which the model updates its weights during
training, was also optimised with KerasTuner, testing rates of 10⁻³,
10⁻⁴, and 10⁻⁵ with the optimal
number of convolutional layers. The model with the highest test accuracy
was selected as the final model.
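A sketch of this search using KerasTuner’s random-search interface is shown below. For brevity, the learning rate is searched jointly with the architecture here, whereas in the study it was tuned after fixing the optimal number of convolutional layers; the data variables are placeholders.

```python
import keras_tuner as kt
import tensorflow as tf
from tensorflow.keras import layers

def build_model(hp):
    # One to three convolutional layers with 16, 32 or 64 filters
    # each, 3x3 kernels, and a tunable learning rate.
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(128, 128, 3)))  # assumed input size
    for i in range(hp.Int("conv_layers", 1, 3)):
        model.add(layers.Conv2D(hp.Choice(f"filters_{i}", [16, 32, 64]),
                                (3, 3), activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Dropout(0.5))
    model.add(layers.Flatten())
    model.add(layers.Dense(1, activation="sigmoid"))
    learning_rate = hp.Choice("learning_rate", [1e-3, 1e-4, 1e-5])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy",
                        max_trials=20)  # 20 random combinations
tuner.search(x_train, y_train, epochs=70,          # placeholder arrays
             validation_data=(x_val, y_val))
```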
2.2.4 Separating left and right flanks
The aim of the fourth step in the image pre-processing method was to
separate crops depicting left- and right-flanks of a wild dog, because
image-matching software packages can only match images for one side of
the animal. To do this, we built another CNN to automate the separation
of left- and right-flanks. To obtain training data for this CNN, we
visually classified all crops used for the CNN in step three that showed
a standing dog with its side facing the camera (n = 12357) as showing
either the right flank (n = 6140) or the left flank (n = 6217). We
optimised this CNN’s parameters
as described in step three of the image pre-processing method, using
KerasTuner to find the optimal number of convolutional layers and
learning rate. Each CNN ran for 100 epochs, because test runs showed
that this model took longer than the previous model to reach its maximum
accuracy. The first layer of this CNN was an average-pooling layer,
which reduced the resolution of the input images by a factor of four
and thereby helped prevent overfitting. This layer was added because
preliminary runs showed that this CNN was more prone to overfitting
than the CNN developed in step three of the image pre-processing method.
We used 9857 crops as training data, 1246 as validation data, and 1246
as testing data. All other layers were identical to those of the
previous CNN. For
the full model conditions, see Table S1 in Supporting Information.
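A sketch of this modification is shown below; the 2 × 2 pool size is an assumption consistent with the described factor-of-four reduction in resolution (halving each dimension quarters the number of pixels), and the remaining layers follow the previous sketch.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch of the flank classifier: identical to the previous CNN
# except for an initial average-pooling layer. The 2x2 pool size
# and input size are assumptions.
model = tf.keras.Sequential([
    layers.AveragePooling2D((2, 2), input_shape=(128, 128, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),  # right vs. left flank
])
```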
2.2.5 Image background removal
Lastly, we removed the backgrounds of the remaining suitable images
using the “rembg” package in Python (Gatis, 2020). Removing the
background eliminated the risk of it confounding image-matching
results, while also removing the need to manually select an
individual’s flank.
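A minimal sketch of this step, with illustrative file names, is shown below; “rembg” accepts raw image bytes and returns the image with its background removed.

```python
from rembg import remove

# Minimal sketch of background removal; file names are illustrative.
with open("crop.jpg", "rb") as f:
    input_bytes = f.read()

output_bytes = remove(input_bytes)  # image bytes, background removed

with open("crop_nobg.png", "wb") as f:  # PNG preserves transparency
    f.write(output_bytes)
```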