the Creative Commons Attribution 4.0 License.
Reducing Time and Computing Costs in EC-Earth: An Automatic Load-Balancing Approach for Coupled ESMs
Abstract. Earth System Models (ESMs) are intricate models used to simulate the Earth's climate, typically built from distinct, independent components, each dedicated to simulating specific natural phenomena (such as atmosphere and ocean dynamics, atmospheric chemistry, and the land and ocean biosphere). To capture the interactions between these processes, ESMs rely on coupling libraries, which oversee the synchronization and field exchanges among independently developed codes that typically run in parallel as a Multi-Program, Multi-Data (MPMD) application.
The performance achieved depends on the coupling approach, as well as on the number of parallel resources and the scalability properties of each component. Determining the appropriate number of resources to use for each component in coupled ESMs is crucial for efficient utilization of the High Performance Computing (HPC) infrastructures used in climate modelling. However, this task traditionally involves manually testing multiple process allocations by trial and error, which requires a significant time investment from researchers. This makes the process error-prone and, given the complexity of the task, often results in a loss of application performance. This paper introduces the automatic load-balance tool (auto-lb), a methodology and tool for determining the resource allocation to each component within coupled ESMs, aimed at improving the application's performance. Notably, this methodology is automatic and does not require expertise in HPC to improve the performance achieved by coupled ESMs. This is accomplished by minimizing the load imbalance: reducing each constituent's execution cost (core-hours) and the core-hours wasted in the synchronizations between them, without penalizing the execution speed of the entire model. This optimization is achieved regardless of the scalability properties of each constituent and the complexity of their dependencies during the coupling.
To achieve this, we designed a new performance metric called "Fittingness", which assesses the performance of a coupled execution by evaluating the trade-off between parallel efficiency and application throughput. This metric is intended for scenarios where optimality can depend on various criteria and constraints. Aiming for maximum speed might not be desirable if it decreases parallel efficiency and therefore increases the computational cost of the simulation.
The methodology was tested across multiple experiments using the widely recognized European ESM, EC-Earth3. The results were compared with real operational configurations, such as those used for the Coupled Model Intercomparison Project Phase 6 (CMIP6) and for the European Climate Prediction Project (EUCP), and validated on different HPC platforms. All of them suggest that the current approaches lead to performance loss, and that auto-lb can achieve better results in both execution speed and the core-hours needed. Compared to the EC-Earth standard-resolution CMIP6 runs, we achieved a configuration 4.7 % faster while also reducing the core-hours required by 1.3 %. Likewise, compared to the EC-Earth high-resolution EUCP runs, the method presented showed a 34 % improvement in speed, with a 6.7 % reduction in the core-hours consumed.
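To make the abstract's notion of core-hours wasted in synchronization concrete, the following is a minimal Python sketch of how such waste could be estimated for a coupled run. It is not the auto-lb algorithm and not the Fittingness metric, neither of which is defined on this page. The `Component` class, the `coupled_cost` function, the component names and all numbers are hypothetical, and the sketch assumes that all components run concurrently and wait for the slowest one at every coupling step.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One constituent of a coupled run (all values used below are hypothetical)."""
    name: str
    processes: int           # parallel processes allocated to this component
    seconds_per_step: float  # compute time between two coupling exchanges


def coupled_cost(components, steps):
    """Return (wall_hours, total_core_hours, wasted_core_hours).

    Assumption: components run concurrently and synchronize at every
    coupling step, so the slowest component sets the pace for all.
    """
    step_time = max(c.seconds_per_step for c in components)
    wall_hours = step_time * steps / 3600.0
    total = sum(c.processes for c in components) * wall_hours
    # Processes of faster components sit idle until the slowest one finishes.
    wasted = sum(
        c.processes * (step_time - c.seconds_per_step) * steps / 3600.0
        for c in components
    )
    return wall_hours, total, wasted


# Hypothetical allocation: the ocean finishes early and waits for the atmosphere.
run = [Component("atmosphere", 256, 9.0), Component("ocean", 128, 6.0)]
wall, total, wasted = coupled_cost(run, steps=1200)
print(f"wall-clock hours: {wall}, core-hours: {total}, wasted: {wasted}")
```

In this toy setup the waste comes entirely from the faster component, so moving processes from it to the slower one could reduce idle time, provided the slower component still scales at the larger process count.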
Status: open (until 17 Jan 2025)
-
CEC1: 'Comment on gmd-2024-155: No compliance with the policy of the journal', Juan Antonio Añel, 29 Oct 2024
reply
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your code on Git repositories in servers that do not comply with the standards for long-term archival and accessibility. Therefore, please publish your code (EC-Earth versions and the prediction script) in one of the appropriate repositories and reply to this comment with the relevant information (link and a permanent identifier for it (e.g. DOI)) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy. Therefore, the current situation with your manuscript is irregular.
In this way, if you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Also, you must include the modified Code and Data availability sections in a potentially reviewed manuscript, with the DOI of the code.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2024-155-CEC1
-
AC1: 'Reply on CEC1', Sergi Palomas, 14 Nov 2024
reply
Good afternoon,
Apologies for any inconvenience. The code is now available on a FAIR-aligned platform, Zenodo.
Here is the DOI: https://doi.org/10.5281/zenodo.14163512
Please let me know if you would like me to upload a revised manuscript with the updated "Code Availability" section, and I'll be happy to do so.
Best regards,
Sergi
Citation: https://doi.org/10.5194/gmd-2024-155-AC1
-
CEC2: 'Reply on AC1', Juan Antonio Añel, 15 Nov 2024
reply
Dear authors,
Unfortunately, your reply does not address the issues I pointed out in my previous comment, and the new repository only contains the prediction script, which is useless to replicate your work. As I made clear in my previous comment, you must include in the repository the code of the EC-Earth3 model that you use in your work.
Regards,
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2024-155-CEC2
-
AC2: 'Reply on CEC2', Sergi Palomas, 18 Nov 2024
reply
Dear Juan,
Apologies for the confusion. Regarding the EC-Earth3 model used in our work, we suggest the following wording for the "Code Availability" section of the manuscript (which includes the new repository for the prediction script and references to the Autosubmit workflow manager):
"""
The source code for the prediction script is publicly available at: https://doi.org/10.5281/zenodo.14163512 (Palomas, 2024).
The EC-Earth3 source code is accessible to members of the consortium through the EC-Earth development portal. Access to the EC-Earth3 source code can be requested from the EC-Earth community via
the EC-Earth website: http://www.ec-earth.org (last access: 18 November 2024). Model codes developed at ECMWF, such as the IFS atmospheric model, are the intellectual property of ECMWF and its member states. Therefore, access to the EC-Earth3 source code requires signing a software license agreement with ECMWF.
The version of EC-Earth used in this study is tagged as 3.3.3.1 in the repository.
The Autosubmit workflow manager is available as a Python package on PyPi (https://pypi.org/project/autosubmit/, last access: 18 November 2024), with its documentation and user guide hosted at https://autosubmit.readthedocs.io/en/master/ (last access: 18 November 2024).
"""
In LaTeX:
\codeavailability{The source code for the prediction script is publicly available at: https://doi.org/10.5281/zenodo.14163512 \citep{prediction-script-zenodo}.
The EC-Earth3 source code is accessible to members of the consortium through the EC-Earth development portal. Access to the EC-Earth3 source code can be requested from the EC-Earth community via
the EC-Earth website: \url{http://www.ec-earth.org} (last access: 18 November 2024). Model codes developed at ECMWF, such as the IFS atmospheric model, are the intellectual property of ECMWF and its member states. Therefore, access to the EC-Earth3 source code requires signing a software license agreement with ECMWF.
The version of EC-Earth used in this study is tagged as 3.3.3.1 in the repository.
The Autosubmit workflow manager is available as a Python package on PyPi (\url{https://pypi.org/project/autosubmit/}, last access: 18 November 2024), with its documentation and user guide hosted at \url{https://autosubmit.readthedocs.io/en/master/} (last access: 18 November 2024).}
This requires adding a new entry in the bibliography:
@misc{prediction-script-zenodo,
Howpublished = {sergipalomas/auto-lb\_prediction-script: version for publication (v1.0)},
author = {Palomas, S},
year = {2024},
doi = {10.5281/zenodo.14163512}
}
Which results in this line in the References:
Palomas, S.: sergipalomas/auto-lb_prediction-script: version for publication (v1.0), https://doi.org/10.5281/zenodo.14163512, 2024.
We believe this aligns with what has been accepted in similar publications such as https://gmd.copernicus.org/articles/15/2973/2022/
Best regards,
Sergi
Citation: https://doi.org/10.5194/gmd-2024-155-AC2
-
CEC3: 'Reply on AC2', Juan Antonio Añel, 04 Dec 2024
reply
Dear authors,
Regarding your reply, it would be better if you stored the version of EC-Earth that you use in this work in a private Zenodo repository. In this way, we can be sure that it is permanently stored and identified with a DOI, while you keep control over who can access it.
Please let us know if there is something that prevents you from doing this.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2024-155-CEC3
-
AC3: 'Reply on CEC3', Sergi Palomas, 12 Dec 2024
reply
Dear Juan,
Unfortunately, we do not have the rights to upload the EC-Earth code to a private repository. Developing and using EC-Earth requires a software license from ECMWF to cover the atmosphere model component, IFS. This license is managed directly through the EC-Earth portal. Detailed information on obtaining the license can be found here: https://dev.ec-earth.org/projects/ecearth3/wiki/How_to_get_a_software_license_from_ECMWF.
If access to the EC-Earth code is required for review purposes, we can facilitate this through the appropriate channels.
I hope this better clarifies the situation. If this needs to be explicitly mentioned in the "Code Availability" section, we are happy to update it accordingly.
Best regards,
Sergi
Citation: https://doi.org/10.5194/gmd-2024-155-AC3
-
CC1: 'Reply on AC3', Etienne Tourigny, 13 Dec 2024
reply
The linked webpage is only accessible to registered users of the EC-Earth development portal. We can provide details on the procedure upon request.
Citation: https://doi.org/10.5194/gmd-2024-155-CC1
Model code and software
Prediction script Sergi Palomas https://earth.bsc.es/gitlab/spalomas/prediction-script