Figure 2. A schematic representation of the model setup. Panel
a) shows the SUMMA runtime process. Parameters and meteorological forcing
data, as well as the state variables from the previous timestep, are fed
to SUMMA to compute all fluxes, which are used to update the state
variables for the subsequent timestep. The purple box labeled
“Turbulent heat flux” highlights the process representation that we
modify in our experiment. Panel b) shows the ways we represent the
turbulent heat fluxes. One of the options from panel b) replaces the
purple box in panel a). SA is the standalone SUMMA representation, as
described in section 2.2. NN1W and NN2W are our DL-based representations
described in section 2.3. Thus, SUMMA-x represents one of the three
model configurations where x is one of SA, NN1W, or NN2W.
The first network we trained took meteorological forcing data for the
current timestep, vegetation and soil types, and the calibrated SUMMA
parameter values as input. We included the calibrated parameters for two
reasons: they give the neural networks the same information that was
available during calibration, allowing a more direct comparison, and they
may serve as a proxy for site characteristics associated with different
responses among the sites. The neural network outputs the bulk latent and
sensible heat fluxes at the half-hourly timescale. We denote this network
NN1W, for
Neural-Network-1-Way, because this configuration only takes
meteorological forcing data and parameters, which cannot be changed by
the rest of the SUMMA calculations. That is, the neural network provides
information about turbulent heat fluxes to SUMMA, but SUMMA does not
provide any internally-derived information to the neural network.
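To make the setup concrete, the sketch below shows how a one-way network of this kind could be assembled with Keras. The input dimensions, layer sizes, and names are illustrative assumptions, not the exact architecture used in our experiments.

```python
from tensorflow import keras

# Illustrative input dimensions: per-timestep meteorological forcing,
# one-hot encoded vegetation and soil types, and calibrated SUMMA parameters.
N_FORCING, N_VEG, N_SOIL, N_PARAMS = 7, 12, 8, 10
n_inputs = N_FORCING + N_VEG + N_SOIL + N_PARAMS

# A simple densely connected network mapping the inputs for a single
# half-hourly timestep to the bulk latent and sensible heat fluxes.
nn1w = keras.Sequential([
    keras.layers.Input(shape=(n_inputs,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(2),  # [latent heat flux, sensible heat flux]
])
nn1w.compile(optimizer="adam", loss="mse")
```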
The second network we trained took all of the same input data as the
NN1W configuration, as well as a number of additional inputs that are
derived states taken from the output of the coupled SUMMA-NN1W
simulations. We included surface vapor pressure, leaf area index,
surface soil layer volumetric water content, depth averaged transpirable
water (as a volumetric fraction), surface soil layer temperature, depth
averaged soil temperature, and a snow-presence indicator. These
variables were chosen because they are either used in the process-based
SUMMA parameterizations for latent or sensible heat, or affect how the
heat flux is partitioned among the soil, vegetation, and snow domains. At
runtime this network uses the additional
variables as calculated internally by SUMMA, rather than the ones
provided during training from NN1W. We denote this network NN2W, for
Neural-Network-2-Way, because SUMMA internal states provide feedback to
the ML model. That is, the neural network is provided inputs which are
dependent on the state variables derived internally by SUMMA, which in
turn depend on the turbulent heat fluxes that are predicted by the
neural network.
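The two-way network differs only in its input vector: the seven SUMMA-derived states listed above are appended to the NN1W features. A minimal sketch of that feature construction, with illustrative names, is shown below; during training the states come from the coupled SUMMA-NN1W output, while at runtime SUMMA supplies them directly.

```python
import numpy as np

# The seven additional SUMMA-derived states (ordering is illustrative):
# surface vapor pressure, leaf area index, surface soil water content,
# depth-averaged transpirable water, surface soil temperature,
# depth-averaged soil temperature, and a snow-presence flag.
N_STATES = 7

def build_nn2w_inputs(nn1w_features: np.ndarray,
                      summa_states: np.ndarray) -> np.ndarray:
    """Concatenate the NN1W inputs with SUMMA-derived state variables.

    During training, `summa_states` is taken from the coupled SUMMA-NN1W
    simulations; at runtime, SUMMA provides these values internally, so the
    process model feeds information back to the neural network.
    """
    return np.concatenate([nn1w_features, summa_states], axis=-1)
```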
After training, each network was saved and translated into a format that
can be loaded into Fortran via the Fortran Keras Bridge (FKB) package
(Ott et al., 2020). FKB translates a limited subset of Keras model files
(architecture, weights, biases, and activation functions) into a file
format that can be loaded by the FKB Fortran library, which implements
several simple components for building and evaluating neural networks in
Fortran, such as the deep-dense architecture used here.
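The export step could look roughly like the following. The conversion call is an assumption based on FKB's KerasWeightsProcessing utility; the exact module path, function name, and arguments may differ between FKB versions.

```python
# After training, the Keras model is saved to HDF5 (e.g. with
# nn1w.save("nn1w.h5")) and then translated to FKB's plain-text format.
# The call below is a hypothetical use of FKB's KerasWeightsProcessing
# conversion utility; the import path and signature are assumptions.
from KerasWeightsProcessing.convert_weights import h5_to_txt

h5_to_txt(weights_file_name="nn1w.h5", output_file_name="nn1w.txt")
```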
We then extended SUMMA (which is written in Fortran) to allow for the
use of these neural networks to simulate the turbulent heat fluxes.
Normally, SUMMA splits the calculation of turbulent heat fluxes between
the vegetation and soil domains. Because we estimate these fluxes as bulk
quantities, we implemented them entirely within the soil domain and
specified that the model should skip any computation of vegetation
fluxes. We then
specified that all ET resulting from the neural network’s estimate of
latent heat be taken from the soil domain as transpiration, according to
SUMMA’s internal routines. We chose this rather than taking all of the
ET as soil evaporation because this allowed for a wider range of ET
behaviors. In our simulations, the domain was split into nine soil
layers, with a 0.01 m deep top layer. In SUMMA soil evaporation is only
taken from the top soil layer and the shallow surface soil depth in our
setup would not have allowed for sufficient storage to satisfy the
predicted ET for many of the vegetated sites. Water removed as
transpiration is weighted by the root density in each soil layer, which
generally provides a large enough reservoir to satisfy the evaporative
demand predicted by the neural networks. Another side effect of our
decision to take all ET as transpiration is that snow sublimation is
removed from the model entirely. As we will show in the results, the
amount of snow sublimation in the SA simulations is negligible at most
of our FluxNet sites, so we believe that this is an acceptable
simplification for our initial demonstration. In cases where the neural
network predicts greater evaporation than the water available in the
soil, SUMMA enforces the water balance and limits the evaporation to the
amount it can satisfy. A brief comparison of the computational cost and
runtimes
associated with training both NN1W and NN2W is provided in the
supporting information.
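To illustrate the partitioning logic in simplified form (in Python rather than SUMMA's Fortran), the sketch below distributes a neural-network ET demand across soil layers in proportion to root density and caps each layer's withdrawal at the water it can supply. Layer counts and values are illustrative; SUMMA's actual routines handle this internally.

```python
import numpy as np

def et_as_transpiration(et_demand, root_density, transpirable_water):
    """Distribute a total ET demand (m per timestep) over the soil column.

    et_demand          : total ET implied by the NN latent heat prediction
    root_density       : per-layer root density weights
    transpirable_water : per-layer water available for transpiration (m)

    Returns the per-layer withdrawal and the actual (possibly limited) ET.
    """
    weights = root_density / root_density.sum()
    requested = et_demand * weights
    # Enforce the water balance: no layer can supply more than it holds.
    withdrawn = np.minimum(requested, transpirable_water)
    return withdrawn, withdrawn.sum()

# Example: nine soil layers, as in our setup, with a thin 0.01 m top layer
# that alone could not satisfy the ET demand at vegetated sites.
roots = np.array([0.05, 0.10, 0.15, 0.20, 0.20, 0.15, 0.10, 0.03, 0.02])
water = np.full(9, 5e-4)  # illustrative transpirable water per layer (m)
withdrawn, actual_et = et_as_transpiration(1.0e-3, roots, water)
```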
3 Results
We present our results in two categories. First, we compare the
performance of the coupled neural network simulations to the standalone
calibrated simulations (SA). We use two common metrics to evaluate the
performance of the simulated turbulent heat fluxes: the Nash-Sutcliffe
efficiency (NSE) and the Kling-Gupta efficiency (KGE). Using the two
metrics in tandem helps ensure that our results are robust and do not
hinge on a single evaluation criterion (Knoben et al., 2019). Then, we
explore how the inclusion of
NN-based parameterizations for turbulent heat fluxes affects the overall
model dynamics. This analysis is crucial to ensure that the new
parameterizations do not lead to unrealistic simulations of other
processes.
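For reference, minimal implementations of the two metrics might look like the following. The KGE shown uses the original three-component formulation with correlation, variability, and bias terms; this is an illustrative sketch, not necessarily the exact variant used in our analysis.

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(sim, obs):
    """Kling-Gupta efficiency and its components.

    Returns (KGE, correlation, variability ratio, bias ratio) using the
    original three-component formulation.
    """
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = np.corrcoef(sim, obs)[0, 1]      # correlation component
    alpha = sim.std() / obs.std()        # variability (variance) component
    beta = sim.mean() / obs.mean()       # bias component
    return (1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2),
            r, alpha, beta)
```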
3.1 Performance analysis
Figure 3 shows the cumulative distribution functions of the performance
metrics across all sites, evaluated on the half-hourly data for all
non-gap-filled periods. For all cases we see that both NN1W and NN2W
outperformed the SA simulations. NN1W showed a median increase in NSE of
0.07 for latent heat and 0.12 for sensible heat, while NN2W showed a
median increase in NSE of 0.10 for latent heat and 0.14 for sensible
heat. Similarly, for KGE these were 0.10 (latent) and 0.21 (sensible)
for NN1W and 0.17 (latent) and 0.23 (sensible) for NN2W. Examination of
the individual KGE components (bias, variance, and correlation) shows
consistent improvements in all three components for both NNs. Overall,
we see that the NN2W configuration slightly outperforms the NN1W
configuration. However, it is possible that in both cases there are
additional performance gains to be made with better model architectures
and/or training procedures. We will return to this in the Discussion.