the Creative Commons Attribution 4.0 License.
A Fortran-Python Interface for Integrating Machine Learning Parameterization into Earth System Models
Abstract. Parameterizations in Earth System Models (ESMs) are subject to biases and uncertainties arising from subjective empirical assumptions and incomplete understanding of the underlying physical processes. Recently, the growing representational capability of machine learning (ML) in solving complex problems has spurred great interest in climate science applications. Specifically, ML-based parameterizations have been developed to represent convection, radiation and microphysics processes in ESMs by learning from observations or high-resolution simulations, which has the potential to improve accuracy and reduce uncertainty. Previous works have developed ML surrogate models for some of these processes. These surrogate models need to be coupled with the dynamical core of ESMs to investigate their effectiveness and performance in a coupled system. In this study, we present a novel Fortran-Python interface designed to seamlessly integrate ML parameterizations into ESMs. This interface is highly versatile, supporting popular ML frameworks such as PyTorch, TensorFlow, and Scikit-learn. We demonstrate the interface's modularity and reusability through two cases: an ML trigger function for convection parameterization and an ML wildfire model. We conduct a comprehensive evaluation of the memory usage and computational overhead resulting from the integration of Python code into Fortran ESMs. By leveraging this flexible interface, ML parameterizations can be effectively developed, tested, and integrated into ESMs.
Status: final response (author comments only)
-
CEC1: 'Comment on gmd-2024-79', Juan Antonio Añel, 20 Jun 2024
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
We have detected two main problems. The first one is that you have archived the E3SM code on GitHub. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other alternatives for long-term archival and publishing, such as Zenodo.
The second problem is that given that your manuscript deals with machine learning techniques, you need to publish the training data to assure the replicability of your work. Again, the training datasets must be published in one of the repositories mentioned in our policy.
Therefore, please publish your code and data in one of the appropriate repositories and reply to this comment with the relevant information (link and DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy. The current situation with your manuscript is therefore irregular. If you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Finally, the Code and Data Availability section in your manuscript is titled "Data Availability Statement". The "Code" part is missing; please add it in any revised version.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2024-79-CEC1
-
AC1: 'Reply on CEC1', Tao Zhang, 22 Jun 2024
Dear editor,
We have uploaded the E3SM code and training data to Zenodo. Please find the details below:
1. E3SM codes: https://zenodo.org/records/12175988, https://doi.org/10.5281/zenodo.12175988
2. Dataset for machine learning trigger function: https://zenodo.org/records/12205917, https://doi.org/10.5281/zenodo.12205917
3. Dataset for machine learning wildfire model: https://zenodo.org/records/12212258, https://doi.org/10.5281/zenodo.12212258
Thanks,
Tao
Citation: https://doi.org/10.5194/gmd-2024-79-AC1
-
RC1: 'Comment on gmd-2024-79', Anonymous Referee #1, 05 Oct 2024
Summary
My understanding is that this paper's main purpose is to introduce and describe a general approach for coupling Python-based software components to primarily Fortran-based Earth System Models. This is a problem that has become of increasing interest in recent years, with the advent of machine-learning based parameterizations, which are often most convenient to write, train, and evaluate with various different Python frameworks. As the authors note, multiple strategies have been employed for coupling these ML models to ESMs in previous studies. The main contribution of the authors here is to describe an approach which involves writing Cythonized Python functions to carry out initialization and execution of an ML model, C functions which call those Cythonized Python functions as a bridge, and ultimately Fortran functions which bind to those C functions, which can be called from anywhere in the Fortran model. They document the approach to some extent, and then they describe using this framework in real-world scientific applications involving coupling ML parameterizations for a convective trigger function or fire burned area to E3SM, as well as some benchmarking test cases.
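As a rough illustration of the layering summarized above, the Python end of such a call chain might look like the sketch below. This is not the authors' code: the names `ml_init`, `ml_run`, and `_STATE` are hypothetical, and a weighted sum with a threshold stands in for a real PyTorch, TensorFlow, or scikit-learn model. In an actual coupling, functions like these would presumably be compiled with Cython and exposed to C, with the Fortran side binding to the C wrappers through `bind(C)` interfaces; those layers are omitted here.

```python
# Hypothetical Python side of a Fortran -> C -> Cython -> Python call chain.
# All names (ml_init, ml_run, _STATE) are illustrative assumptions and are
# not taken from the manuscript or its Zenodo archive.

# Module-level state keeps the "model" alive between calls, so the host
# model can initialize once and then call the run function every time step.
_STATE = {"weights": None, "bias": None}


def ml_init(weights, bias):
    """Set up the toy model once, e.g. during host-model initialization."""
    _STATE["weights"] = list(weights)
    _STATE["bias"] = float(bias)


def ml_run(chunk):
    """Evaluate the toy model for every column in a chunk.

    `chunk` is a list of per-column feature lists. The result holds one
    0/1 flag per column, standing in for something like a trigger decision.
    """
    if _STATE["weights"] is None:
        raise RuntimeError("ml_init must be called before ml_run")
    w, b = _STATE["weights"], _STATE["bias"]
    flags = []
    for features in chunk:
        # Weighted sum plus bias; a positive score sets the flag.
        score = sum(wi * xi for wi, xi in zip(w, features)) + b
        flags.append(1 if score > 0.0 else 0)
    return flags


# Example: initialize once, then evaluate a chunk of two columns.
ml_init(weights=[0.5, -1.0], bias=0.1)
result = ml_run([[1.0, 0.2], [0.0, 1.0]])
```

Splitting the interface into an initialization call and a per-chunk run call mirrors the "initialization and execution" split described in the summary. How arrays are actually passed across the language boundary is left out of this sketch.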
I have to admit that I found the scope of this paper to be somewhat sprawling. Documenting this approach for coupling ML models to Fortran models seems valuable, but I think more space and detail could have been devoted to that, with less space devoted to the details of the scientific applications, which the authors note will be described elsewhere. In terms of benchmarks, the direct comparison to the CFFI coupling approach seemed quite relevant, but other aspects like the impact of ML model type and complexity on performance seemed somewhat orthogonal to the choice of coupling method. For example, it does not seem surprising that more complex ML models will be more computationally expensive, regardless of the coupling approach. Maybe there is something I am missing about the motivation that the authors could describe more clearly, but as it stands now, I would like to see improvements to the focus of the manuscript before considering recommending publication.
General comments
- I found Table 1 to be somewhat vague. I feel like a simplified toy code example would go a long way toward illustrating what is required and how everything fits together. To me, for this paper, this is more important than the scientific details of the case studies, which the authors note will be described more fully in forthcoming papers. There is value in noting that this coupling approach has been successfully used in each of these real-world applications, but I do not think much more needs to be said beyond the general idea of each project, what kind of ML model is used in each, and maybe what the inputs and outputs are. In other words, Figures 4-7, illustrating the structure of these models and skill when they are coupled online, and much of the paragraphs that go along with them, feel outside the scope of this paper.
- What is the intended takeaway of the performance experiments with different types of ML models in Section 3.3? Is this not something that could be learned by profiling the computational performance of the ML models in isolation? There may be some value in documenting the relative cost of a typical ML model to a typical climate model simulation, but to some extent one can already get a sense of this through Figure 8(a) or previous ML parameterization papers. In practice the tradeoff will always need to be assessed on a case-by-case basis regarding whether the improvement in hybrid model skill justifies the additional computational cost of the ML model (i.e. this kind of discussion seems better suited for an application-specific paper).
Specific comments
Lines 79-84: I am not sure I follow the discussion in these lines. As I understand it, the key advance of Kochkov et al. (2023) is that their entire model (both the physics-based dynamics and the ML-based physics) is differentiable, enabling feedbacks between the two to be felt and accounted for in training. This is more significant than merely enabling greater flexibility in the ML model one can couple to a Fortran-based GCM. So long as the GCM is still written in legacy Fortran I do not think there is anything that can be done to easily enable differentiation through the entire hybrid model. In other words, you will still need to train the ML model in a purely "offline" sense. A software interface between the ML model and the GCM, however hard-coded or flexible it is, merely enables online testing, which is no doubt important, but not the same as enabling coupling during training.
Lines 115-117: I think it is fair to say that this approach offers access to calling any Python code from Fortran, of which the ML frameworks listed are obviously just a subset. The phrasing of this line makes it sound as though there is some flexibility, but some frameworks might not be supported.
Line 131: "[...] without disrupting the Fortran infrastructure." This feels maybe a bit overstated. Beyond calling the ML code itself within Fortran, which is maybe self-evident, the build system of the now hybrid Fortran/Python model needs to be updated to support these changes, which is not always trivial (e.g. it might be a little easier to build in a bespoke Fortran implementation of an ML model even though that is obviously much less flexible).
Lines 338-339: if it is to be included here, I think it should be noted that XGBoost is a totally different type of ML model than the CNNs implemented in PyTorch or TensorFlow, so it is not really an apples-to-apples comparison for computational performance. This is sort of alluded to in Line 350, but I think it could be made more explicit.
Lines 334-336: as I am sure you are aware, for pure Python, this is true, but most packages designed for numerical computation wrap C/C++ or Fortran. This is something that is also somewhat orthogonal to the framework one uses for coupling: if the Python code is a bottleneck, it will be a bottleneck no matter how it is coupled. To truly test the degree to which implementation language was a bottleneck one would need a baseline where the identical ML model was evaluated directly in Fortran (like in Rasp et al., 2018).
Lines 358-359: do you know if this is a deep fundamental issue with TensorFlow (i.e. hard to fix)?
Lines 373-383: for provenance it could be useful to see the code used to perform these tests. As far as I can tell it is not included in the Zenodo archive.
Figure 11: it is sort of surprising that the single column model is slower than the ne4 configuration. Is there not a way to get it to run faster than ne4?
Data Availability Statement: I understand the long-term value of storing the code in a Zenodo archive, but could you also include a link to the code on GitHub? This makes it easier for people to quickly read and review, rather than downloading and unpacking the code from Zenodo.
Technical corrections
Line 217: "applied it two" -> "applied it in two"
Line 218: "CAPE-based trigger function in deep convection" -> "CAPE-based trigger function in a deep convection"
Line 219: "machine-learnt" -> "machine-learned"
Line 404: "A same" -> "The same"
Line 427: "Compassion CNTL" -> "Comparison of CNTL"
References
Kochkov, D., Yuval, J., Langmore, I., Norgaard, P., Smith, J., Mooers, G., Klöwer, M., Lottes, J., Rasp, S., Düben, P., Hatfield, S., Battaglia, P., Sanchez-Gonzalez, A., Willson, M., Brenner, M. P., & Hoyer, S. (2024). Neural general circulation models for weather and climate. Nature, 632(8027), 1060–1066. https://doi.org/10.1038/s41586-024-07744-y.
Rasp, S., Pritchard, M. S., & Gentine, P. (2018). Deep learning to represent subgrid processes in climate models. Proceedings of the National Academy of Sciences, 115(39), 9684–9689. https://doi.org/10.1073/pnas.1810286115.
Citation: https://doi.org/10.5194/gmd-2024-79-RC1
-
AC2: 'Reply on RC1', Tao Zhang, 17 Nov 2024
-
RC2: 'Comment on gmd-2024-79', Anonymous Referee #2, 14 Oct 2024
Review of Zhang et al.
Zhang et al. present a Fortran-Python interface for machine learning applications in Earth System Models. In general, the paper is well written and addresses an important technical gap. The paper should be published once the authors have addressed a few concerns.
Impact on compute times
An explicit assessment of computational cost as a function of ML model parameters would strengthen this work greatly. The authors mention this, and perform a toy analogue, but in my experience there are often severe consequences from moving between simple case-study analogues and real implementation.
The lack of GPU implementation further weakens the manuscript. I understand that this might be out of scope, but the authors should more directly confront this limitation in the text.
Figure 11. Why is the ne4 gap so much larger?
Minor Comments/Requests for Clarification
Figure 1. This is extremely similar to the FKB logo (https://github.com/scientific-computing/FKB). I suggest changing it. The bridge is unnecessary and borders on copying.
Line 197: Is having the ML calls operate on chunks a requirement of the ML model structure? Do I need to design an ML model to predict output on chunks?
Line 479: Gradient computation is not the only reason for model complexity issues, particularly for the case here where models are used for inference.
Citation: https://doi.org/10.5194/gmd-2024-79-RC2
-
AC3: 'Reply on RC2', Tao Zhang, 17 Nov 2024