Discussion

Plumes were identified visually in a subset of the dataset to produce a labeled dataset for training the model. Because of the large number of images, labeling was very time-consuming. To make the process more efficient, we used statistical heuristics to find points of separation between images that contained plumes and those that did not. This approach, whilst useful, was not always conclusive, and any such technique carries a risk of selection bias. Labeling was further complicated by variations in weather and visibility across time periods, which made plumes difficult to identify in some images. In addition, usage patterns changed across the time periods, so the frequency, volume, and type of plumes varied considerably. We therefore had to take care when selecting the training set. We needed to ensure that multiple reviewers examined a suitably representative set of images, so that enough plumes were tagged across different conditions for the model to be more robust to environmental variations.
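The paragraph above does not specify which statistical heuristics were used, so the following is only a minimal sketch under stated assumptions. It assumes each image has already been reduced to a single hypothetical plume score, for example a per-image brightness or contrast statistic. It then applies Otsu's method, one standard way to find a separation point in a bimodal score distribution, swapped in here for illustration. To reduce the selection bias noted above, the sketch samples images from both sides of that threshold within every (period, weather) stratum, rather than reviewing only images the heuristic flags. It then assigns each sampled image to two reviewers. The field names `id`, `period`, `weather`, and `score`, and the functions `otsu_threshold`, `build_review_set`, and `assign_reviewers`, are illustrative assumptions, not the pipeline used in this work.

```python
import random
from collections import defaultdict


def otsu_threshold(scores, bins=64):
    """Split 1-D scores into two classes by maximising between-class variance (Otsu's method)."""
    lo, hi = min(scores), max(scores)
    if lo == hi:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in scores:
        counts[min(int((s - lo) / width), bins - 1)] += 1
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(scores)
    sum_all = sum(c * m for c, m in zip(counts, centres))
    best_t, best_var = lo, -1.0
    w0, sum0 = 0, 0.0
    for i in range(bins - 1):
        # Class 0 = bins 0..i, class 1 = bins i+1..bins-1.
        w0 += counts[i]
        sum0 += counts[i] * centres[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        var = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var > best_var:
            best_t, best_var = lo + (i + 1) * width, var
    return best_t


def build_review_set(images, threshold, per_side, seed=0):
    """Sample up to `per_side` images below and above the threshold in every (period, weather) stratum."""
    rng = random.Random(seed)
    strata = defaultdict(lambda: ([], []))  # key -> (below, above)
    for img in images:
        below, above = strata[(img["period"], img["weather"])]
        (above if img["score"] >= threshold else below).append(img)
    review = []
    for key in sorted(strata):
        for side in strata[key]:
            review.extend(rng.sample(side, min(per_side, len(side))))
    return review


def assign_reviewers(review, reviewers, per_image=2):
    """Round-robin each image to `per_image` reviewers (distinct as long as per_image <= len(reviewers))."""
    n = len(reviewers)
    return {
        img["id"]: [reviewers[(i + j) % n] for j in range(per_image)]
        for i, img in enumerate(review)
    }


if __name__ == "__main__":
    # Synthetic, hypothetical data: two periods x two weather conditions, 20 images each.
    demo_rng = random.Random(42)
    images = [
        {"id": f"{p}-{w}-{k}", "period": p, "weather": w,
         "score": demo_rng.uniform(0.0, 0.3) if k % 2 else demo_rng.uniform(0.6, 1.0)}
        for p in ("2019", "2020") for w in ("clear", "haze") for k in range(20)
    ]
    t = otsu_threshold([img["score"] for img in images])
    review = build_review_set(images, t, per_side=3)
    print(round(t, 3), len(review), assign_reviewers(review, ["A", "B", "C"])[review[0]["id"]])
```

Sampling both sides of the threshold in each stratum means that reviewers also see images the heuristic scored as plume-free. This offers a rough check on how often the heuristic misses plumes under particular weather or usage conditions.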