the Creative Commons Attribution 4.0 License.
The Real Challenges for Climate and Weather Modelling on its Way to Sustained Exascale Performance: A Case Study using ICON (v2.6.6)
Abstract. The weather and climate model ICON (ICOsahedral Nonhydrostatic) is being used in high-resolution climate simulations in order to resolve small-scale physical processes. The envisaged performance for this task is 1 simulated year per day for a coupled atmosphere-ocean setup at global 1.2 km resolution. The necessary computing power for such simulations can only be found on exascale supercomputing systems. The main question we try to answer in this article is where to find sustained exascale performance, i.e. which hardware (processor type) is best suited for the weather and climate model ICON, and consequently how this performance can be exploited by the model, i.e. what changes are required in ICON's software design in order to utilize exascale platforms efficiently. To this end, we present an overview of the available hardware technologies and a quantitative analysis of the key performance indicators of the ICON model on several architectures. It becomes clear that domain-decomposition-based parallelization has reached its scaling limits, leading us to conclude that the performance of a single node is crucial to achieving both better performance and better energy efficiency. Furthermore, based on the computational intensity of the examined kernels of the model, we show that architectures with higher memory throughput are better suited than those with high computational peak performance. From a software-engineering perspective, a redesign of ICON from a monolithic to a modular approach is required to address the complexity caused by hardware heterogeneity and new programming models, and to make ICON suitable for running on such machines.
Status: open (extended)
RC1: 'Comment on gmd-2024-54', Anonymous Referee #1, 28 May 2024
reply
Adamidis et al. present benchmarking results of the ICON model on different hardware: CPU, GPU, and vector. This is an interesting comparison, especially given the direction of travel for HPC hardware and the use of accelerators.
It is mostly well written, and I recommend its publication in GMD given some minor revisions, mainly around the figures, which is where the majority of my issues with the manuscript are.
Figure 1 is a nice-looking figure, but I don't believe it adds anything to the discussion and is in fact misleading as it isn't really saying anything of substance at all. It doesn't reflect the point made in the text: "the appropriate programming model becomes a difficult task as not all of them support all types of accelerators" (lines 67-68), but the diagram just shows a wall with each brick representing a different programming model and arrows going indiscriminately to different hardware. It would instead be useful to show which programming models work on which hardware, either through a different diagram or a table.
Figure 2 likewise looks very nice but doesn't really contain any substance about how the ICON code is currently structured, how ICON-C will change the code structure, or how the different programming models will be applied. I would rather the authors show code snippets or pseudocode to illustrate how the modules could be ported to the new architectures using the different programming models, and how this would be done, e.g. manually or using a code generator? Other models, such as the LFRic weather and climate model (https://www.metoffice.gov.uk/research/approach/modelling-systems/lfric), are using tools such as PSyclone (https://psyclone.readthedocs.io) to try to achieve performance portability of the Fortran source code on different architectures.
Figure 3 is quite confusing. The ideal strong-scaling curves are very difficult to see as they are a light grey colour (the same as the gridlines behind them). The point types used (left- and right-facing triangles) are difficult to differentiate, and the fact that the colours for the global and nested domains are slightly different, but not different enough, makes it hard to unpick. The different line styles are also not discussed. The authors could consider making the plot much larger, using clearly distinct point styles, explaining the different line styles, and perhaps separating the global and nested results into different sub-figures.
Figure 4 is very interesting given the results in Figure 3 and the core numbers in Table 2. I would suggest highlighting the number of cores in the discussion around GPUs for Figure 3. It might also be helpful to highlight on Figure 4 the number of cores for each hardware type configuration, so it can be seen when the hardware becomes underutilised.
For Figure 5, I'm a bit unsure whether the timing for B1, as given in the example, will extend to the end of the K41 kernel, i.e. beyond the end of B1.
The plots in Figure 6 are very hard to see as the points are large and the lines are close together and overlapping. The points are also all clustered around a small section of the graph, but the scale is much larger in X and especially Y, mainly to include the legend. I would suggest plotting each point on its own graph, making a large multi-figure plot, and zooming in as much as possible onto (0.1:100, 1:50,000) ranges to allow the relevant areas to be seen as clearly as possible. I would likewise do the same for Figure 7 by combining Figures 6 and 7 in a single multi-panel plot.
Figure 8 is also really interesting. Do you know the energy usage from the CPU runs?
Minor comments:
Line 17: I don't believe “System” should be capitalised here.
Line 26: The term GPU is used without being defined (this occurs on line 40). Also, the discussion here is around x86 hardware being combined with GPUs, Vector, or ARM systems. Note that superchip hardware, such as NVIDIA's Grace-Hopper (https://resources.nvidia.com/en-us-grace-cpu/grace-hopper-superchip) is an ARM-GPU system, i.e. there is no x86 CPU, where the CPU itself is instead ARM-based.
Line 345-6: I think this sentence needs to be rephrased: "Not only the ICON community is currently on the way to using current and upcoming Exascale systems for high-resolution simulations." - do you mean something like "ICON is not the only model that is currently on the way to using current and upcoming Exascale systems for high-resolution simulations."
Line 356-7: Perhaps I'm missing something, but wouldn't halving the horizontal resolution result in a 4-fold increase in necessary resources, rather than 8-fold? I'm assuming that the level structure remains unchanged.
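The reviewer's arithmetic question can be made concrete. Halving the horizontal grid spacing alone quadruples the horizontal cell count (2x per horizontal dimension); a possible source of an 8-fold figure is that the time step must also halve under a CFL-type constraint (dt proportional to dx). The sketch below is illustrative only and assumes the vertical level structure is unchanged, as the reviewer does.

```python
# Hedged sketch of resource scaling when the horizontal grid spacing is
# refined by a factor `refine` (e.g. 2 = halved spacing), with the
# vertical level structure held fixed. Whether the time step also
# shrinks (CFL condition: dt ~ dx) decides between 4x and 8x.

def cost_factor(refine, cfl_timestep=True):
    """Relative computational cost of horizontal refinement by `refine`."""
    horizontal = refine ** 2                   # cells grow in x and y
    timesteps = refine if cfl_timestep else 1  # more steps if dt ~ dx
    return horizontal * timesteps

print(cost_factor(2, cfl_timestep=False))  # 4  (grid cells only)
print(cost_factor(2, cfl_timestep=True))   # 8  (grid cells + time step)
```

So both the reviewer's 4-fold (grid only) and the manuscript's 8-fold (grid plus time step) readings are internally consistent; the text should state which is meant.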
Citation: https://doi.org/10.5194/gmd-2024-54-RC1
AC1: 'Reply on RC1', Panagiotis Adamidis, 21 Jun 2024
reply
Dear Reviewer,
Please find our response to all of your remarks in the attached PDF file.
Best regards,
Panagiotis Adamidis on behalf of all Co-Authors