Figure 7. Difference in diurnal phase lag from observation. Positive values indicate that the simulated phase lag leads the observed phase lag.
4 Discussion
Our analysis shows that the DL parameterizations were able to outperform the standalone simulations for both latent and sensible heat fluxes. Most of the bulk gains in performance from the NN-based configurations stemmed from drastic improvements at sites where the SA configuration performed poorly. This is important to note, since our SA simulations were calibrated at site (and included the calibration period in the evaluation), while all NN-based simulations were trained out of sample in both time and space. This indicates that our NN-based configurations would likely be better able to represent turbulent heat fluxes in regions without measurements, implying that deep learning may be suitable for regionalization applications.
Both of the NN-based configurations represented the diurnal phase lag between shortwave radiation and turbulent heat fluxes better than SA. Renner et al. (2020) explored the ability of the land surface models used in the PLUMBER experiments (Best et al., 2015) to reproduce the observed diurnal phase lag, finding deviations from the observations similar to those of our SA simulations. This indicates that the NN-based approach was able to learn something that has not yet been codified in PBHMs, and could provide better insight into how turbulent heat fluxes are generated at the scales at which FluxNet towers operate. It is difficult to state definitively why the NN-based configurations were more accurate than SA's process-based parameterizations. Even if the functional forms of the SA parameterizations were correct, the model parameters may be difficult to determine. Zhao et al. (2019) achieved good predictive performance with a standalone (that is, not coupled to a larger model) machine-learning model that used a neural network to estimate the resistance term of the bulk transfer equations and then computed the heat fluxes from the standard equations. Such an approach would likely work well in the coupled setting as well.
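As a concrete illustration of that kind of hybrid structure, the sketch below shows a learned surface resistance feeding the standard bulk transfer equations. It is a minimal sketch only: the variable names, constants, and the placeholder resistance value are our own assumptions, not code from Zhao et al. (2019).

```python
# Minimal sketch of a hybrid scheme in which a neural network supplies only the
# hard-to-parameterize surface resistance, and the heat fluxes are then computed
# from the standard bulk transfer equations so the physics stays explicit.
RHO_AIR = 1.2    # air density [kg m-3]
CP_AIR = 1005.0  # specific heat of air at constant pressure [J kg-1 K-1]
LV = 2.5e6       # latent heat of vaporization [J kg-1]

def bulk_fluxes(t_surf, t_air, q_surf, q_air, r_aero, r_surf_nn):
    """Sensible and latent heat fluxes [W m-2] from bulk transfer equations,
    with the surface resistance supplied by a learned model."""
    sensible = RHO_AIR * CP_AIR * (t_surf - t_air) / r_aero
    latent = RHO_AIR * LV * (q_surf - q_air) / (r_aero + r_surf_nn)
    return sensible, latent

# In practice r_surf_nn would come from a trained network, e.g.
# r_surf_nn = resistance_net.predict(forcings)  # hypothetical trained model
h, le = bulk_fluxes(t_surf=295.0, t_air=293.0, q_surf=0.012, q_air=0.009,
                    r_aero=50.0, r_surf_nn=120.0)
```

The appeal of this design is that the learned component is confined to a single uncertain term, while conservation constraints remain enforced by the surrounding equations.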
We also found that the NN2W configuration maintained higher performance than either NN1W or SA at longer than daily timescales and more accurately reproduced the observed long-term evaporative fraction. This indicates that the synergy between the deep-learned parameterization and the soil-moisture state evolution in SUMMA captured the long-term dynamics better than either a purely machine-learned or purely process-based approach. This lends credibility to our proposition that combining data-driven and physics-based approaches will likely lead to better simulations than rigid adherence to either method by itself.
These performance gains came at the cost of drastically simplifying the way in which we represented evapotranspiration. The SA simulations partition the latent heat fluxes among the soil, snow, and vegetation domains separately, while the NN simulations were set up to represent latent heat only as a bulk flux, with withdrawals taken from each soil layer in proportion to the root density in that layer. The SA simulations are therefore able to represent a more diverse range of conditions. While this was not a problem for the NN simulations on average, we identified two locations where our simplification of how ET is withdrawn from the soil led to poor performance. At US-WCr and US-AR2 both NN configurations underestimated ET because the soil was too dry to meet evaporative demand for much of the time. At these two sites the NN simulations performed significantly worse than the SA simulations, indicating a clear failure mode of the neural-network-based approach. This shortcoming might be addressed by developing strategies that better partition the latent heat fluxes among the soil, snow, and vegetation domains. This would also allow snow sublimation to be added back in, reducing the number of modifications that must be made to SUMMA in order to run with an embedded neural network.
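The following is a minimal sketch of the root-density-weighted withdrawal described above, to make the simplification concrete. The function name, constants, and unit conversion are illustrative assumptions rather than the actual SUMMA implementation.

```python
# Sketch: split a bulk latent heat flux into per-layer soil-water sinks, with
# each layer weighted by its root density (the simplification discussed above).
import numpy as np

LV = 2.5e6          # latent heat of vaporization [J kg-1]
RHO_WATER = 1000.0  # density of liquid water [kg m-3]

def layer_withdrawals(latent_heat_flux, root_density):
    """Convert a bulk latent heat flux [W m-2] into water-equivalent
    withdrawal rates [m s-1], one per soil layer."""
    weights = np.asarray(root_density, dtype=float)
    weights = weights / weights.sum()              # normalize root densities
    et_rate = latent_heat_flux / (LV * RHO_WATER)  # total ET as a depth rate
    return et_rate * weights                       # one sink term per layer

# Example: four soil layers with roots concentrated near the surface.
sinks = layer_withdrawals(latent_heat_flux=150.0,
                          root_density=[0.5, 0.3, 0.15, 0.05])
```

When the upper layers are already dry, sinks of this form cannot be satisfied, which is consistent with the underestimated ET we observed at US-WCr and US-AR2.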
Other neural network architectures will likely lead to further performance improvements. Many recent studies that used neural networks to predict hydrologic systems have shown that long short-term memory (LSTM) networks are better at learning timeseries behaviors than the methods used here (Feng et al., 2020; Frame et al., 2020; Jiang et al., 2020; Kratzert et al., 2018). Convolutional neural networks (CNNs) have been used extensively to learn from spatially distributed fields (Geng & Wang, 2020; Kreyenberg et al., 2019; Liu & Wu, 2016; Pan et al., 2019). Taking advantage of these specialized architectures in existing PBHMs like SUMMA will require investment in tools and workflows. As of the time of writing, the FKB library supports only densely connected layers and a few simple activation and loss functions. Implementing recurrent and convolutional layers in the FKB library, or in another framework that can couple ML models with PBHMs, would open many possibilities for future research. Additionally, implementing more specialized activation functions and loss functions (such as NSE or KGE) would offer more flexibility for a wider range of applications.
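As an example of the kind of custom objective mentioned above, the following is a minimal sketch of an NSE-based loss in Keras. It covers only the Python training side; exporting such a model through FKB would additionally require the loss and any non-standard activations to be supported on the Fortran side, which is exactly the gap discussed here.

```python
# Sketch of a custom loss that maximizes Nash-Sutcliffe efficiency (NSE) by
# minimizing its negative, computed per batch.
import tensorflow as tf

def nse_loss(y_true, y_pred):
    """Negative NSE: 0 when predictions equal the batch mean, -1 at a perfect fit."""
    numerator = tf.reduce_sum(tf.square(y_true - y_pred))
    denominator = tf.reduce_sum(tf.square(y_true - tf.reduce_mean(y_true)))
    return numerator / (denominator + tf.keras.backend.epsilon()) - 1.0

# Usage with an otherwise standard densely connected network:
# model.compile(optimizer="adam", loss=nse_loss)
```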
Alongside better tools for incorporating machine learning into process-based models, the development and identification of workflows for machine and deep learning tasks will be necessary for wider adoption in the field. For instance, we initially trained the NN2W networks using the SA soil states, which were drastically different from the spun-up states in the NN configurations. This led to nearly identical performance in the NN1W and NN2W simulations, since the soil states the network encountered in the coupled NN2W simulations were very different from the SA states it saw during training. Only after recognizing this and retraining the NN2W on the states predicted by the NN1W simulations were we able to achieve better performance from the NN2W configuration. Understanding whether an iterative train-spinup-retrain workflow can provide representative training data while avoiding overfitting will be important for future studies.
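A schematic of such an iterative loop is sketched below. The helper functions are hypothetical placeholders for the actual training and coupled SUMMA runs; only the control flow is meant to be illustrative.

```python
# Schematic train -> spin-up -> retrain loop. The two helpers stand in for the
# real training step and the coupled model run that generates model states.
def train_network(forcings, observations, states=None):
    return {"trained_on_states": states is not None}  # placeholder "network"

def run_coupled_model(network, forcings):
    return [0.3] * len(forcings)  # placeholder soil-moisture states

def iterative_training(forcings, observations, n_rounds=2):
    states, network = None, None
    for _ in range(n_rounds):
        # First round trains without model states (NN1W-like); later rounds
        # retrain on the states the coupled configuration actually produces,
        # so training and inference see consistent inputs (NN2W-like).
        network = train_network(forcings, observations, states)
        states = run_coupled_model(network, forcings)
    return network

network = iterative_training(forcings=list(range(24)), observations=list(range(24)))
```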
Similarly, it is unclear whether there would be significant difficulties in calibrating either of the NN-based models in new basins, as we did for the SA simulations. In particular, we do not know whether the output of the neural networks is sensitive to the values of the calibration parameters. We included the calibrated parameter values in the training of the NN-based configurations in order to provide the same types of information to both optimization procedures. In future studies it may be worthwhile to explore whether these parameters are necessary, or how regionalization of data-driven approaches is best codified. It is also unclear whether our NN-based configurations can be calibrated efficiently for other processes such as streamflow.
Finally, model architectures that separate process parameterizations as cleanly as possible will allow for more robust and rapid development of ML parameterizations of other processes. Building modular, general-purpose ways to incorporate machine learning into process-based models will allow researchers to evaluate different approaches more efficiently. Exploring and answering these practical questions will likely lead to community-accepted practices that can be adopted to accelerate research on other applications.
5 Conclusions
We have shown that coupling DL parameterizations for the prediction of turbulent heat fluxes into a PBHM outperforms existing physically based parameterizations while maintaining mass and energy balance. We coupled our neural networks into SUMMA in two different ways, both of which, when evaluated out of sample, showed significant performance improvements over the at-site calibrated standalone SUMMA simulations. The one-way coupling (NN1W), despite being conceptually simpler and not taking any model states as inputs, improved simulations almost as much as the more complex two-way coupling (NN2W) at sub-daily timescales. Both of the new parameterizations better represent the observed diurnal cycles, and NN2W better represents the long-term evaporative fraction as well as both turbulent heat fluxes at longer than daily timescales. We found that NN1W was also able to accurately predict sensible heat fluxes at greater than daily timescales, indicating that even “simple” DL parameterizations show great promise for coupling into PBHMs.
While we consider our new parameterizations a step forward in incorporating ML techniques into traditional process-based modeling, we have only scratched the surface of the many avenues that will surely be explored. We used the simplest possible network architecture, a deep, densely connected network. For spatial applications we suspect that CNN layers will prove invaluable, and recurrent layers such as LSTMs have been dominant in the timeseries domain. More sophisticated architectures such as neural ordinary differential equations (Ramadhan et al., 2020) or those discovered through neural architecture search (Geng & Wang, 2020) are bound to be both more efficient and more interpretable than our dense networks. The opportunities for incorporating ML-based models into the hydrologic sciences, and for learning from them, are virtually untapped. We believe that as the community builds tools and workflows around the existing ML ecosystems we will be able to unlock this potential.
Acknowledgments, Samples, and Data
We would like to thank Yifan Cheng and Yixin Mao for reading and commenting on an early version of this manuscript. Their comments improved the clarity and framing of our work. The code to process, configure, calibrate/train, run, and analyze the FluxNet data is available at https://doi.org/10.5281/zenodo.4300929. The SUMMA model configuration for SA is available at https://doi.org/10.5281/zenodo.4300931. The SUMMA model configuration for NN1W is available at https://doi.org/10.5281/zenodo.4300932. The SUMMA model configuration for NN2W is available at https://doi.org/10.5281/zenodo.4300933. We would like to acknowledge high-performance computing support from Cheyenne (doi:10.5065/D6RX99HX) provided by NCAR’s Computational and Information Systems Laboratory, sponsored by the National Science Foundation.
References
Ball, J. T., Woodrow, I. E., & Berry, J. A. (1987). A Model Predicting Stomatal Conductance and its Contribution to the Control of Photosynthesis under Different Environmental Conditions. In J. Biggins (Ed.), Progress in Photosynthesis Research: Volume 4 Proceedings of the VIIth International Congress on Photosynthesis Providence, Rhode Island, USA, August 10–15, 1986 (pp. 221–224). Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-94-017-0519-6_48
Best, M. J., Abramowitz, G., Johnson, H. R., Pitman, A. J., Balsamo, G., Boone, A., et al. (2015). The Plumbing of Land Surface Models: Benchmarking Model Performance. Journal of Hydrometeorology, 16(3), 1425–1442. https://doi.org/10.1175/JHM-D-14-0158.1
Bonan, G. (2015). Ecological Climatology: Concepts and Applications. Cambridge University Press.
Brenowitz, N. D., & Bretherton, C. S. (2018). Prognostic Validation of a Neural Network Unified Physics Parameterization. Geophysical Research Letters, 45(12), 6289–6298. https://doi.org/10.1029/2018GL078510
Camuffo, D., & Bernardi, A. (1982). An observational study of heat fluxes and their relationships with net radiation. Boundary-Layer Meteorology, 23(3), 359–368. https://doi.org/10.1007/BF00121121
Chollet, F. (2015). Keras. Retrieved from https://github.com/fchollet/keras
Clark, M. P., Nijssen, B., Lundquist, J. D., Kavetski, D., Rupp, D. E., Woods, R. A., et al. (2015). A unified approach for process-based hydrologic modeling: 1. Modeling concept. Water Resources Research, 51(4), 2498–2514. https://doi.org/10.1002/2015WR017198
Dee, D. P., Uppala, S. M., Simmons, A. J., Berrisford, P., Poli, P., Kobayashi, S., et al. (2011). The ERA-Interim reanalysis: configuration and performance of the data assimilation system. Quarterly Journal of the Royal Meteorological Society, 137(656), 553–597. https://doi.org/10.1002/qj.828
Feng, D., Fang, K., & Shen, C. (2020). Enhancing streamflow forecast and extracting insights using long-short term memory networks with data integration at continental scales. arXiv:1912.08949 [cs, stat]. Retrieved from http://arxiv.org/abs/1912.08949
Foken, T. (2008). The Energy Balance Closure Problem: An Overview. Ecological Applications, 18(6), 1351–1367. https://doi.org/10.1890/06-0922.1
Frame, J., Nearing, G., Kratzert, F., & Rahman, M. (2020). Post processing the U.S. National Water Model with a Long Short-Term Memory network (preprint). EarthArXiv. https://doi.org/10.31223/osf.io/4xhac
Geng, Z., & Wang, Y. (2020). Automated design of a convolutional neural network with multi-scale filters for cost-efficient seismic data classification. Nature Communications, 11(1), 3311. https://doi.org/10.1038/s41467-020-17123-6
Hu, C., Wu, Q., Li, H., Jian, S., Li, N., & Lou, Z. (2018). Deep Learning with a Long Short-Term Memory Networks Approach for Rainfall-Runoff Simulation. Water, 10(11), 1543. https://doi.org/10.3390/w10111543
Jiang, S., Zheng, Y., & Solomatine, D. (2020). Improving AI System Awareness of Geoscience Knowledge: Symbiotic Integration of Physical Approaches and Deep Learning. Geophysical Research Letters, 47(13), e2020GL088229. https://doi.org/10.1029/2020GL088229
Jung, M., Reichstein, M., & Bondeau, A. (2009). Towards global empirical upscaling of FLUXNET eddy covariance observations: validation of a model tree ensemble approach using a biosphere model, 13.
Kidston, J., Brümmer, C., Black, T. A., Morgenstern, K., Nesic, Z., McCaughey, J. H., & Barr, A. G. (2010). Energy Balance Closure Using Eddy Covariance Above Two Different Land Surfaces and Implications for CO2 Flux Measurements. Boundary-Layer Meteorology, 136(2), 193–218. https://doi.org/10.1007/s10546-010-9507-y
Knoben, W. J. M., Freer, J. E., & Woods, R. A. (2019). Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores. Hydrology and Earth System Sciences, 23(10), 4323–4331. https://doi.org/10.5194/hess-23-4323-2019
Kratzert, F., Klotz, D., Brenner, C., Schulz, K., & Herrnegger, M. (2018). Rainfall-Runoff modelling using Long-Short-Term-Memory (LSTM) networks. Hydrology and Earth System Sciences Discussions, 1–26. https://doi.org/10.5194/hess-2018-247
Kreyenberg, P. J., Bauser, H. H., & Roth, K. (2019). Velocity Field Estimation on Density-Driven Solute Transport With a Convolutional Neural Network. Water Resources Research, 55(8), 7275–7293. https://doi.org/10.1029/2019WR024833
Lathière, J., Hauglustaine, D. A., & Friend, A. D. (2006). Impact of climate variability and land use changes on global biogenic volatile organic compound emissions. Atmos. Chem. Phys., 19.
Li, L., Wang, Y.-P., Yu, Q., Pak, B., Eamus, D., Yan, J., et al. (2012). Improving the responses of the Australian community land surface model (CABLE) to seasonal drought. Journal of Geophysical Research: Biogeosciences, 117(G4). https://doi.org/10.1029/2012JG002038
Liu, Y., & Wu, L. (2016). Geological Disaster Recognition on Optical Remote Sensing Images Using Deep Learning. Procedia Computer Science, 91, 566–575. https://doi.org/10.1016/j.procs.2016.07.144
Matott, L. S. (2017). OSTRICH: an Optimization Software Tool, Documentation and User’s Guide, Version 17.12.19. University at Buffalo Center for Computational Research. Retrieved from www.eng.buffalo.edu/~lsmatott/Ostrich/OstrichMain.html
Moshe, Z., Metzger, A., Elidan, G., Kratzert, F., Nevo, S., & El-Yaniv, R. (2020). HydroNets: Leveraging River Structure for Hydrologic Modeling. Retrieved from https://arxiv.org/abs/2007.00595v1
Musselman, K. N., Clark, M. P., Liu, C., Ikeda, K., & Rasmussen, R. (2017). Slower snowmelt in a warmer world. Nature Climate Change, 7(3), 214–219. https://doi.org/10.1038/nclimate3225
Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D., Frame, J. M., et al. (2020). What Role Does Hydrological Science Play in the Age of Machine Learning? Water Resources Research, e2020WR028091. https://doi.org/10.1029/2020WR028091
Niu, G.-Y., Yang, Z.-L., Mitchell, K. E., Chen, F., Ek, M. B., Barlage, M., et al. (2011). The community Noah land surface model with multiparameterization options (Noah-MP): 1. Model description and evaluation with local-scale measurements. Journal of Geophysical Research: Atmospheres, 116(D12). https://doi.org/10.1029/2010JD015139
Ott, J., Pritchard, M., Best, N., Linstead, E., Curcic, M., & Baldi, P. (2020). A Fortran-Keras Deep Learning Bridge for Scientific Computing. arXiv:2004.10652 [cs]. Retrieved from http://arxiv.org/abs/2004.10652
Pan, B., Hsu, K., AghaKouchak, A., & Sorooshian, S. (2019). Improving Precipitation Estimation Using Convolutional Neural Network. Water Resources Research, 55(3), 2301–2321. https://doi.org/10.1029/2018WR024090
Pastorello, G., Trotta, C., Canfora, E., Chu, H., Christianson, D., Cheah, Y.-W., et al. (2020). The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data. Scientific Data, 7(1), 225. https://doi.org/10.1038/s41597-020-0534-3
Ramadhan, A., Marshall, J., Souza, A., Wagner, G. L., Ponnapati, M., & Rackauckas, C. (2020). Capturing missing physics in climate model parameterizations using neural differential equations. arXiv:2010.12559 [physics]. Retrieved from http://arxiv.org/abs/2010.12559
Rasp, S., Pritchard, M. S., & Gentine, P. (2018). Deep learning to represent subgrid processes in climate models. Proceedings of the National Academy of Sciences, 115(39), 9684–9689. https://doi.org/10.1073/pnas.1810286115
Renner, M., Brenner, C., Mallick, K., Wizemann, H.-D., Conte, L., Trebs, I., et al. (2019). Using phase lags to evaluate model biases in simulating the diurnal cycle of evapotranspiration: a case study in Luxembourg. Hydrology and Earth System Sciences, 23(1), 515–535. https://doi.org/10.5194/hess-23-515-2019
Renner, M., Kleidon, A., Clark, M., Nijssen, B., Heidkamp, M., Best, M., & Abramowitz, G. (n.d.). How well can land-surface models represent the diurnal cycle of turbulent heat fluxes? Journal of Hydrometeorology, 1–56. https://doi.org/10.1175/JHM-D-20-0034.1
Shen, C. (2018). A Transdisciplinary Review of Deep Learning Research and Its Relevance for Water Resources Scientists. Water Resources Research, 54(11), 8558–8593. https://doi.org/10.1029/2018WR022643
Tolson, B. A., & Shoemaker, C. A. (2007). Dynamically dimensioned search algorithm for computationally efficient watershed model calibration. Water Resources Research, 43(1). https://doi.org/10.1029/2005WR004723
Tramontana, G., Jung, M., Schwalm, C. R., Ichii, K., Camps-Valls, G., Ráduly, B., et al. (2016). Predicting carbon dioxide and energy fluxes across global FLUXNET sites with regression algorithms. Biogeosciences, 13(14), 4291–4313. https://doi.org/10.5194/bg-13-4291-2016
Wilson, K., Goldstein, A., Falge, E., Aubinet, M., Baldocchi, D., Berbigier, P., et al. (2002). Energy balance closure at FLUXNET sites. Agricultural and Forest Meteorology, 113(1), 223–243. https://doi.org/10.1016/S0168-1923(02)00109-0
Zhao, W. L., Gentine, P., Reichstein, M., Zhang, Y., Zhou, S., Wen, Y., et al. (2019). Physics-Constrained Machine Learning of Evapotranspiration. Geophysical Research Letters, 46(24), 14496–14507. https://doi.org/10.1029/2019GL085291