Figure 7. Difference in diurnal phase lag from observation. Positive values indicate that the simulated phase lag leads the observed phase lag.
4 Discussion
Our analysis shows that the DL parameterizations outperformed the standalone simulations for both latent and sensible heat fluxes. Most of the bulk gains in performance from the NN-based configurations stemmed from large improvements at sites where the SA configuration performed poorly. This is important to note because our SA simulations were calibrated at each site (and the calibration period was included in the evaluation), while all NN-based simulations were trained out of sample in both time and space. This indicates that our NN-based configurations would likely be better able to represent turbulent heat fluxes in regions without measurements, implying that deep learning may be suitable for regionalization applications.
Both of the NN-based configurations represented the diurnal phase lag between shortwave radiation and the turbulent heat fluxes better than SA. Renner et al. (2020) explored the ability of the land surface models used in the PLUMBER experiments (Best et al., 2015) to reproduce the observed diurnal phase lag, finding deviations from the observed phase lag similar to those of our SA simulations. This indicates that the NN-based approach has learned something that has not been codified in PBHMs, and could provide better insight into how turbulent heat fluxes are generated at the scales at which FluxNet towers operate. It is difficult to state definitively why the NN-based simulations were more accurate than SA's process-based parameterizations. Even if the functional forms used in SA were correct, the model parameters may be difficult to determine. Zhao et al. (2019) achieved good predictive performance with a standalone (that is, not coupled to a larger model) machine-learning model that used a neural network to estimate the resistance term of the bulk transfer equations and then computed the heat fluxes from the standard equations. Such an approach would likely work well in the coupled setting as well.
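As a minimal sketch of this hybrid idea (not the implementation of Zhao et al., 2019), the example below uses a stand-in Keras network to estimate an aerodynamic resistance and then evaluates the standard bulk transfer equation for sensible heat; the feature set, layer sizes, and constants are illustrative assumptions only.

```python
import numpy as np
import tensorflow as tf

# Hypothetical hybrid parameterization in the spirit of Zhao et al. (2019):
# a neural network estimates the bulk transfer resistance, and the flux is
# then computed from the standard bulk transfer equation.

RHO_AIR = 1.2    # air density (kg m-3), nominal surface value
CP_AIR = 1005.0  # specific heat of air (J kg-1 K-1)

# Stand-in network: maps meteorological features (e.g. wind speed, stability,
# vegetation height) to an aerodynamic resistance r_a (s m-1).
resistance_net = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="softplus"),  # keep r_a positive
])

def sensible_heat_flux(features, t_surface, t_air):
    """Bulk transfer sensible heat flux H = rho * cp * (Ts - Ta) / r_a."""
    r_a = resistance_net(features).numpy().squeeze() + 1e-6  # avoid divide-by-zero
    return RHO_AIR * CP_AIR * (t_surface - t_air) / r_a

# Example call with made-up values for a single time step.
features = np.array([[2.5, 0.1, 1.0, 0.5]], dtype=np.float32)
print(sensible_heat_flux(features, t_surface=295.0, t_air=290.0))
```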
We also found that the NN2W configuration maintained higher performance than either NN1W or SA at longer than daily timescales and more accurately reproduced the observed long-term evaporative fraction. This indicates that the synergy between the deep-learned parameterization and the soil-moisture state evolution in SUMMA captured the long-term dynamics better than either a purely machine-learned or a purely process-based approach. This lends credibility to our proposition that combining data-driven and physics-based approaches will likely lead to better simulations than a rigid adherence to either method by itself.
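For reference, the long-term evaporative fraction discussed here is the standard ratio of latent heat to total turbulent heat flux. The short sketch below shows one way it could be computed from flux series; the variable names and values are illustrative, not our analysis code.

```python
import numpy as np

def evaporative_fraction(latent_heat, sensible_heat):
    """Long-term evaporative fraction EF = sum(LE) / sum(LE + H).

    Computed from accumulated fluxes so that sub-daily variability cancels;
    inputs are arrays of latent and sensible heat flux (W m-2).
    """
    le = np.nansum(latent_heat)
    h = np.nansum(sensible_heat)
    return le / (le + h)

# Example with made-up daily-mean fluxes (W m-2).
le = np.array([80.0, 95.0, 110.0, 70.0])
h = np.array([60.0, 55.0, 40.0, 65.0])
print(evaporative_fraction(le, h))  # ~0.62
```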
These performance gains came at the cost of drastically simplifying the way in which we represented evapotranspiration. The SA simulations partition the latent heat fluxes among the soil, snow, and vegetation domains separately, while the NN simulations were set up to represent latent heat only as a bulk flux, with withdrawals taken from each soil layer according to the root density in that layer. As a result, the SA simulations can represent a more diverse range of conditions. While this was not a problem for the NN simulations on average, we identified two locations where our simplification of how ET is taken from the soil led to poor performance. At US-WCr and US-AR2 both NN configurations underestimated ET because the soil was too dry to meet the evaporative demand for much of the time. At these two sites the NN simulations performed significantly worse than the SA simulations, indicating a clear failure mode of the neural-network-based approach. This shortcoming might be addressed by developing strategies that better partition the latent heat fluxes among the soil, snow, and vegetation domains. This would also allow snow sublimation to be added back in, reducing the number of modifications that must be made to SUMMA in order to run with an embedded neural network.
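To make the simplification concrete, the sketch below distributes a bulk ET withdrawal across soil layers by root fraction and caps each withdrawal at the water available in that layer. The function and values are hypothetical and do not reproduce SUMMA's internal soil-water accounting, but the unmet demand it reports mirrors the failure mode seen at the dry sites.

```python
import numpy as np

def withdraw_et_by_roots(total_et, root_fraction, available_water):
    """Distribute a bulk ET withdrawal across soil layers by root fraction.

    total_et: bulk water loss implied by the latent heat flux (mm per step)
    root_fraction: relative root density per layer (normalized internally)
    available_water: extractable water in each layer (mm)
    Returns (withdrawal per layer, unmet demand); unmet demand appears when
    the soil cannot supply the flux the network predicts.
    """
    weights = np.asarray(root_fraction, dtype=float)
    weights = weights / weights.sum()
    demand = total_et * weights
    withdrawal = np.minimum(demand, available_water)
    unmet = demand.sum() - withdrawal.sum()
    return withdrawal, unmet

# Example: roots concentrated near a dry surface layer.
withdrawal, unmet = withdraw_et_by_roots(
    total_et=3.0,
    root_fraction=[0.5, 0.3, 0.15, 0.05],
    available_water=np.array([0.5, 2.0, 2.0, 2.0]),
)
print(withdrawal, unmet)  # 1.0 mm of demand goes unmet
```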
Other neural network architectures will likely lead to further performance improvements. Many recent studies that used neural networks to model hydrologic systems have shown that Long Short-Term Memory (LSTM) networks are superior at learning timeseries behaviors compared to the methods used here (Feng et al., 2020; Frame et al., 2020; Jiang et al., 2020; Kratzert et al., 2018). Convolutional neural networks (CNNs) have been used extensively to learn from spatially distributed fields (Geng & Wang, 2020; Kreyenberg et al., 2019; Liu & Wu, 2016; Pan et al., 2019). Taking advantage of these specialized architectures in existing PBHMs like SUMMA will require investment in tools and workflows. At the time of writing, the FKB library supports only densely connected layers and a few simple activation and loss functions. Implementing recurrent and convolutional layers in the FKB library, or in another framework that can couple ML models with PBHMs, would open many possibilities for future research. Additionally, implementing more specialized activation functions and loss functions (such as NSE or KGE) would offer more flexibility for a wider range of applications.
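To illustrate the current constraint, the sketch below builds a Keras model that stays within the dense layers and simple activations described above as supported; the layer sizes and output variables are arbitrary, and the final export step to Fortran is tool-specific and therefore omitted.

```python
import tensorflow as tf

# A densely connected network using only layer types and activations that a
# simple Fortran bridge such as FKB can currently represent; LSTM or Conv
# layers would fall outside that subset.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2),  # e.g. latent and sensible heat flux
])
model.compile(optimizer="adam", loss="mse")
model.summary()

# Saving in HDF5 format keeps the weights in a form that external converters
# can read; the exact conversion path into Fortran depends on the tool used.
model.save("dense_flux_model.h5")
```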
Alongside better tools for incorporating machine learning into process-based models, the development and identification of workflows for machine and deep learning tasks will be necessary for wider adoption in the field. For instance, we initially trained the NN2W networks using the SA soil states, which were drastically different from the spun-up states in the NN configurations. This led to almost identical performance in the NN1W and NN2W simulations, because the soil states encountered in the coupled simulations were very different from what the network saw during training. Only after recognizing this and training the NN2W on the states predicted by the NN1W simulations were we able to achieve better performance from the NN2W simulations. Understanding whether an iterative train-spinup-retrain workflow can avoid overfitting while still providing representative training data will be important for future studies.
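The sketch below outlines one possible form of such an iterative train-spinup-retrain loop using toy stand-ins (a least-squares "network" and a fake coupled run). It is meant only to show the shape of the workflow, not the actual SUMMA coupling or our training code.

```python
import numpy as np

# Hypothetical sketch of an iterative train / spin-up / retrain workflow.
# The "network" is a stand-in linear model and run_coupled_simulation is a
# toy state update; both are placeholders for the real SUMMA + NN coupling.

rng = np.random.default_rng(0)
forcing = rng.normal(size=(1000, 4))          # meteorological inputs
observations = forcing @ np.ones(4) + 1.0     # synthetic flux "observations"

def train_network(features, targets):
    # Least-squares stand-in for fitting the neural network.
    coefs, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return coefs

def run_coupled_simulation(coefs, features):
    # Toy "coupled run": produce model states that depend on the current
    # parameterization, mimicking spin-up under the updated fluxes.
    fluxes = features @ coefs[: features.shape[1]]
    return np.cumsum(fluxes)[:, None] / len(fluxes)  # fake soil-state column

# Round 0: train without states (one-way coupling analogue).
network = train_network(forcing, observations)
for _ in range(3):
    states = run_coupled_simulation(network, forcing)
    # Retrain with the states the coupled model actually produces, so the
    # training data matches what the network sees once embedded.
    network = train_network(np.hstack([forcing, states]), observations)
```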
Similarly, it is unclear whether there would be significant difficulties in calibrating either of the NN-based models in new basins, as we did for the SA simulations. In particular, we do not know whether the output of the neural networks is sensitive to the values of the calibration parameters. We included the calibrated parameter values in the training of the NN-based configurations so that both optimization procedures had access to the same types of information. In future studies it may be worthwhile to explore whether these parameters are necessary, or how regionalization of data-driven approaches should best be codified. It is also unclear whether our NN-based configurations can be calibrated efficiently for other processes such as streamflow.
Finally, model architectures that separate process parameterizations as cleanly as possible will allow for more robust and rapid development of ML parameterizations of other processes. Building modular and general-purpose ways to incorporate machine learning into process-based models will allow researchers to evaluate different approaches more efficiently. Exploring and answering these practical questions will likely lead to community-accepted practices that can be adopted to accelerate research on other applications.
5 Conclusions
We have shown that coupling DL parameterizations for the prediction of turbulent heat fluxes into a PBHM outperforms existing physically based parameterizations while maintaining mass and energy balance. We coupled our neural networks into SUMMA in two different ways, both of which showed significant out-of-sample performance improvements over the at-site-calibrated standalone SUMMA simulations. The one-way coupling (NN1W), despite being conceptually simpler and not taking any model states as inputs, improved simulations almost as much as the more complex two-way coupling (NN2W) at the sub-daily timescale. Both of the new parameterizations better represent the observed diurnal cycles, and NN2W better represented the long-term evaporative fraction as well as both turbulent heat fluxes at longer than daily timescales. We found that NN1W was also able to accurately predict sensible heat fluxes at greater than daily timescales, indicating that even “simple” DL parameterizations show great promise for coupling into PBHMs.
While we consider our new parameterizations a step forward in incorporating ML techniques into traditional process-based modeling, we have only scratched the surface of the many avenues that will surely be explored. We used the simplest possible network architecture, a deep, densely connected network. For spatial applications we suspect that CNN layers will prove invaluable, and recurrent layers such as LSTMs have been dominant in the timeseries domain. More sophisticated architectures, such as neural ordinary differential equations (Ramadhan et al., 2020) or those discovered through neural architecture search (Geng & Wang, 2020), are bound to be both more efficient and more interpretable than our dense networks. The opportunities for incorporating ML-based models into the hydrologic sciences, and for learning from them, are virtually untapped. We believe that as the community builds tools and workflows around the existing ML ecosystems we will be able to unlock this potential.
Acknowledgments, Samples, and Data
We would like to thank Yifan Cheng and Yixin Mao for reading and
commenting on an early version of this manuscript. Their comments
improved the clarity and framing of our work. The code to process,
configure, calibrate/train, run, and analyze the FluxNet data is
available at https://doi.org/10.5281/zenodo.4300929. The SUMMA
model configuration for SA is available at
https://doi.org/10.5281/zenodo.4300931.
The SUMMA model configuration for NN1W is available at
https://doi.org/10.5281/zenodo.4300932. The SUMMA model
configuration for NN2W is available at
https://doi.org/10.5281/zenodo.4300933. We would like to
acknowledge high-performance computing support from Cheyenne
(doi:10.5065/D6RX99HX) provided by NCAR’s Computational and Information
Systems Laboratory, sponsored by the National Science Foundation.
References
Ball, J. T., Woodrow, I. E., & Berry, J. A. (1987). A Model Predicting Stomatal Conductance and its Contribution to the Control of Photosynthesis under Different Environmental Conditions. In J. Biggins (Ed.), Progress in Photosynthesis Research: Volume 4, Proceedings of the VIIth International Congress on Photosynthesis, Providence, Rhode Island, USA, August 10–15, 1986 (pp. 221–224). Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-94-017-0519-6_48
Best, M. J., Abramowitz, G., Johnson, H. R., Pitman, A. J., Balsamo, G., Boone, A., et al. (2015). The Plumbing of Land Surface Models: Benchmarking Model Performance. Journal of Hydrometeorology, 16(3), 1425–1442. https://doi.org/10.1175/JHM-D-14-0158.1
Bonan, G. (2015). Ecological Climatology: Concepts and Applications. Cambridge University Press.
Brenowitz, N. D., & Bretherton, C. S. (2018). Prognostic Validation of a Neural Network Unified Physics Parameterization. Geophysical Research Letters, 45(12), 6289–6298. https://doi.org/10.1029/2018GL078510
Camuffo, D., & Bernardi, A. (1982). An observational study of heat fluxes and their relationships with net radiation. Boundary-Layer Meteorology, 23(3), 359–368. https://doi.org/10.1007/BF00121121
Chollet, F. (2015). Keras. Retrieved from https://github.com/fchollet/keras
Clark, M. P., Nijssen, B., Lundquist, J. D., Kavetski, D., Rupp, D. E., Woods, R. A., et al. (2015). A unified approach for process-based hydrologic modeling: 1. Modeling concept. Water Resources Research, 51(4), 2498–2514. https://doi.org/10.1002/2015WR017198
Dee, D. P., Uppala, S. M., Simmons, A. J., Berrisford, P., Poli, P., Kobayashi, S., et al. (2011). The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Quarterly Journal of the Royal Meteorological Society, 137(656), 553–597. https://doi.org/10.1002/qj.828
Feng, D., Fang, K., & Shen, C. (2020). Enhancing streamflow forecast and extracting insights using long-short term memory networks with data integration at continental scales. ArXiv:1912.08949 [Cs, Stat]. Retrieved from http://arxiv.org/abs/1912.08949
Foken, T. (2008). The Energy Balance Closure Problem: An Overview. Ecological Applications, 18(6), 1351–1367. https://doi.org/10.1890/06-0922.1
Frame, J., Nearing, G., Kratzert, F., & Rahman, M. (2020). Post processing the U.S. National Water Model with a Long Short-Term Memory network (preprint). EarthArXiv. https://doi.org/10.31223/osf.io/4xhac
Geng, Z., & Wang, Y. (2020). Automated design of a convolutional neural network with multi-scale filters for cost-efficient seismic data classification. Nature Communications, 11(1), 3311. https://doi.org/10.1038/s41467-020-17123-6
Hu, C., Wu, Q., Li, H., Jian, S., Li, N., & Lou, Z. (2018). Deep Learning with a Long Short-Term Memory Networks Approach for Rainfall-Runoff Simulation. Water, 10(11), 1543. https://doi.org/10.3390/w10111543
Jiang, S., Zheng, Y., & Solomatine, D. (2020). Improving AI System Awareness of Geoscience Knowledge: Symbiotic Integration of Physical Approaches and Deep Learning. Geophysical Research Letters, 47(13), e2020GL088229. https://doi.org/10.1029/2020GL088229
Jung, M., Reichstein, M., & Bondeau, A. (2009). Towards global empirical upscaling of FLUXNET eddy covariance observations: validation of a model tree ensemble approach using a biosphere model, 13.
Kidston, J., Brümmer, C., Black, T. A., Morgenstern, K., Nesic, Z., McCaughey, J. H., & Barr, A. G. (2010). Energy Balance Closure Using Eddy Covariance Above Two Different Land Surfaces and Implications for CO2 Flux Measurements. Boundary-Layer Meteorology, 136(2), 193–218. https://doi.org/10.1007/s10546-010-9507-y
Knoben, W. J. M., Freer, J. E., & Woods, R. A. (2019). Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores. Hydrology and Earth System Sciences, 23(10), 4323–4331. https://doi.org/10.5194/hess-23-4323-2019
Kratzert, F., Klotz, D., Brenner, C., Schulz, K., & Herrnegger, M. (2018). Rainfall-Runoff modelling using Long-Short-Term-Memory (LSTM) networks. Hydrology and Earth System Sciences Discussions, 1–26. https://doi.org/10.5194/hess-2018-247
Kreyenberg, P. J., Bauser, H. H., & Roth, K. (2019). Velocity Field Estimation on Density-Driven Solute Transport With a Convolutional Neural Network. Water Resources Research, 55(8), 7275–7293. https://doi.org/10.1029/2019WR024833
Lathière, J., Hauglustaine, D. A., & Friend, A. D. (2006). Impact of climate variability and land use changes on global biogenic volatile organic compound emissions. Atmospheric Chemistry and Physics, 19.
Li, L., Wang, Y.-P., Yu, Q., Pak, B., Eamus, D., Yan, J., et al. (2012). Improving the responses of the Australian community land surface model (CABLE) to seasonal drought. Journal of Geophysical Research: Biogeosciences, 117(G4). https://doi.org/10.1029/2012JG002038
Liu, Y., & Wu, L. (2016). Geological Disaster Recognition on Optical Remote Sensing Images Using Deep Learning. Procedia Computer Science, 91, 566–575. https://doi.org/10.1016/j.procs.2016.07.144
Matott, L. S. (2017). OSTRICH: an Optimization Software Tool, Documentation and User’s Guide, Version 17.12.19. University at Buffalo Center for Computational Research. Retrieved from www.eng.buffalo.edu/~lsmatott/Ostrich/OstrichMain.html
Moshe, Z., Metzger, A., Elidan, G., Kratzert, F., Nevo, S., & El-Yaniv, R. (2020). HydroNets: Leveraging River Structure for Hydrologic Modeling. Retrieved from https://arxiv.org/abs/2007.00595v1
Musselman, K. N., Clark, M. P., Liu, C., Ikeda, K., & Rasmussen, R. (2017). Slower snowmelt in a warmer world. Nature Climate Change, 7(3), 214–219. https://doi.org/10.1038/nclimate3225
Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D., Frame, J. M., et al. (2020). What Role Does Hydrological Science Play in the Age of Machine Learning? Water Resources Research, e2020WR028091. https://doi.org/10.1029/2020WR028091
Niu, G.-Y., Yang, Z.-L., Mitchell, K. E., Chen, F., Ek, M. B., Barlage, M., et al. (2011). The community Noah land surface model with multiparameterization options (Noah-MP): 1. Model description and evaluation with local-scale measurements. Journal of Geophysical Research: Atmospheres, 116(D12). https://doi.org/10.1029/2010JD015139
Ott, J., Pritchard, M., Best, N., Linstead, E., Curcic, M., & Baldi, P. (2020). A Fortran-Keras Deep Learning Bridge for Scientific Computing. ArXiv:2004.10652 [Cs]. Retrieved from http://arxiv.org/abs/2004.10652
Pan, B., Hsu, K., AghaKouchak, A., & Sorooshian, S. (2019). Improving Precipitation Estimation Using Convolutional Neural Network. Water Resources Research, 55(3), 2301–2321. https://doi.org/10.1029/2018WR024090
Pastorello, G., Trotta, C., Canfora, E., Chu, H., Christianson, D., Cheah, Y.-W., et al. (2020). The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data. Scientific Data, 7(1), 225. https://doi.org/10.1038/s41597-020-0534-3
Ramadhan, A., Marshall, J., Souza, A., Wagner, G. L., Ponnapati, M., & Rackauckas, C. (2020). Capturing missing physics in climate model parameterizations using neural differential equations. ArXiv:2010.12559 [Physics]. Retrieved from http://arxiv.org/abs/2010.12559
Rasp, S., Pritchard, M. S., & Gentine, P. (2018). Deep learning to represent subgrid processes in climate models. Proceedings of the National Academy of Sciences, 115(39), 9684–9689. https://doi.org/10.1073/pnas.1810286115
Renner, M., Brenner, C., Mallick, K., Wizemann, H.-D., Conte, L., Trebs, I., et al. (2019). Using phase lags to evaluate model biases in simulating the diurnal cycle of evapotranspiration: a case study in Luxembourg. Hydrology and Earth System Sciences, 23(1), 515–535. https://doi.org/10.5194/hess-23-515-2019
Renner, M., Kleidon, A., Clark, M., Nijssen, B., Heidkamp, M., Best, M., & Abramowitz, G. (n.d.). How well can land-surface models represent the diurnal cycle of turbulent heat fluxes? Journal of Hydrometeorology, 1–56. https://doi.org/10.1175/JHM-D-20-0034.1
Shen, C. (2018). A Transdisciplinary Review of Deep Learning Research and Its Relevance for Water Resources Scientists. Water Resources Research, 54(11), 8558–8593. https://doi.org/10.1029/2018WR022643
Tolson, B. A., & Shoemaker, C. A. (2007). Dynamically dimensioned search algorithm for computationally efficient watershed model calibration. Water Resources Research, 43(1). https://doi.org/10.1029/2005WR004723
Tramontana, G., Jung, M., Schwalm, C. R., Ichii, K., Camps-Valls, G., Ráduly, B., et al. (2016). Predicting carbon dioxide and energy fluxes across global FLUXNET sites with regression algorithms. Biogeosciences, 13(14), 4291–4313. https://doi.org/10.5194/bg-13-4291-2016
Wilson, K., Goldstein, A., Falge, E., Aubinet, M., Baldocchi, D., Berbigier, P., et al. (2002). Energy balance closure at FLUXNET sites. Agricultural and Forest Meteorology, 113(1), 223–243. https://doi.org/10.1016/S0168-1923(02)00109-0
Zhao, W. L., Gentine, P., Reichstein, M., Zhang, Y., Zhou, S., Wen, Y., et al. (2019). Physics-Constrained Machine Learning of Evapotranspiration. Geophysical Research Letters, 46(24), 14496–14507. https://doi.org/10.1029/2019GL085291