Concurrent calculation of radiative transfer in the atmospheric simulation in ECHAM-6.3.05p2

Heidari, Mohammad Reza; Song, Zhaoyang; Degregori, Enrico; Behrens, Jörg; Bockelmann, Hendryk

doi:https://doi.org/10.5194/gmd-14-7439-2021

Articles | Volume 14, issue 12

https://doi.org/10.5194/gmd-14-7439-2021

Articles | Volume 14, issue 12

Development and technical paper

06 Dec 2021

Development and technical paper |

| 06 Dec 2021

Concurrent calculation of radiative transfer in the atmospheric simulation in ECHAM-6.3.05p2

Mohammad Reza Heidari, Zhaoyang Song, Enrico Degregori, Jörg Behrens, and Hendryk Bockelmann

Abstract

The scalability of the atmospheric model ECHAM6 at low resolution, as used in palaeoclimate simulations, suffers from the limited number of grid points. As a consequence, the potential of current high-performance computing architectures cannot be used at full scale for such experiments, particularly within the available domain decomposition approach. Radiation calculations are a relatively expensive part of the atmospheric simulations, taking up to approximately 50 % or more of the total runtime. This current level of cost is achieved by calculating the radiative transfer only once in every 2 h of simulation. In response, we propose extending the available concurrency within the model further by running the radiation component in parallel with other atmospheric processes to improve scalability and performance. This paper introduces the concurrent radiation scheme in ECHAM6 and presents a thorough analysis of its impact on the performance of the model. It also evaluates the scientific results from such simulations. Our experiments show that ECHAM6 can achieve a speedup of over 1.9× using the concurrent radiation scheme. By performing a suite of stand-alone atmospheric experiments, we evaluate the influence of the concurrent radiation scheme on the scientific results. The simulated mean climate and internal climate variability by the concurrent radiation generally agree well with the classical radiation scheme, with minor improvements in the mean atmospheric circulation in the Southern Hemisphere and the atmospheric teleconnection to the Southern Annular Mode. This empirical study serves as a successful example that can stimulate research on other concurrent components in atmospheric modelling whenever scalability becomes challenging.

Download & links

Article (PDF, 14524 KB)

Supplement (10356 KB)

Download & links

Article (14524 KB)
Full-text XML
Supplement (10356 KB)
BibTeX
EndNote

How to cite.

Received: 16 Apr 2021 – Discussion started: 09 Jun 2021 – Revised: 18 Oct 2021 – Accepted: 12 Nov 2021 – Published: 06 Dec 2021

1 Introduction

Earth system modelling has traditionally been a computationally demanding domain with a continual increase in complexity and resolution. It is a major application of high-performance computing and represents a large variety of climate processes with diverse computational profiles and disparate performance optimization requirements. A primary subsystem of a typical Earth system model (ESM) is the atmospheric simulation, which resolves several physical processes including radiative transfer.

Radiative transfer is one of the most expensive parts for coarse- and low-resolution atmospheric simulations. This process is resolved to respond to the changing state of the chemical species which interact with the radiation (Balaji et al., 2016; Salby, 1996; Wallace and Hobbs, 2006). Solar energy is the driving force for the atmosphere through radiative transfer, which is the only physical process that is capable of exchanging energy between a planet like the Earth and the rest of the universe (Wallace and Hobbs, 2006). Energy transfer in the atmosphere involves electromagnetic radiation that can be separated into shortwave and longwave parts: shortwave, emitted by the sun, and longwave, emitted by the Earth's surface and the atmosphere (Wallace and Hobbs, 2006; Salby, 1996). There are several atmospheric processes – including greenhouse gases, aerosols, and clouds – that interact with electromagnetic radiation through the mechanisms of absorption, scattering, and emission. The level of interaction strongly depends on the state of the atmospherics particles (evolving by advection, cloud processes, and chemistry) and the optical properties (the wavelengths and intensity) of the incident radiation (Wallace and Hobbs, 2006).

In principle, the absorption of solar radiation by the atmosphere and the Earth's surface must be balanced by the longwave emission to space from the terrestrial radiation (Salby, 1996). It is crucial for atmospheric models to accurately represent the radiative transfer process (Rasp, 2019). Solving the problem is in essence straightforward (Wallace and Hobbs, 2006). However, this can be quite computationally demanding in practice, despite the simplifying approximations adopted in the radiation component (Balaji et al., 2016; Wallace and Hobbs, 2006). As a result, in most climate models around the world, this component is not called in every time step (Balaji et al., 2016). On the contrary, it is calculated at a coarser time step than the rest of the atmospheric physics, entirely pursuing a performance improvement rather than fulfilling any other technical objective (Balaji et al., 2016).

Over the years, various techniques have been used to represent radiative transfer in different models and maintain its calculation cost within acceptable limits. Morcrette (2000) discusses the effect of the temporal or spatial sampling techniques of the radiation inputs on the forecasts and analyses at the European Centre for Medium-Range Weather Forecasts (ECMWF). In a different approach, the radiative calculations are computed on a coarser grid than the one on which all other physical processes are implemented. Morcrette et al. (2008) report the implementation of a reduced grid for radiation in the ECMWF Integrated Forecasting System (IFS) model.

Resolving radiation transfer on coarser time and spatial resolutions, however, can lead to errors in weather and climate simulations. A report by Hogan and Bozzo (2015) describes a computationally efficient solution to this problem. It suggests updating the surface longwave and shortwave fluxes in every time step and grid point according to the local skin temperature and albedo. Hogan and Hirahara (2016) propose a careful treatment of the cosine of the solar zenith angle in models that calculate radiation every 3h. This solution aims at reducing the negative impacts of the biases that occur due to discrete sampling of solar zenith angle. A follow-up study by Hogan and Bozzo (2018) introduces a flexible new radiation scheme (ecRAD) in the ECMWF model which is around 41 % faster than the previous package. The report shows some improvements in the skill of weather forecasts by calling the radiation scheme more frequently for the same overall computational cost.

Another proposal is based on the re-organization of the radiation component within the atmospheric models. In this approach, the radiative transfer is calculated in separate tasks in parallel with the rest of the model, in pursuit of improved scalability and performance. In classical atmospheric modelling, all physical processes including solar radiation are resolved sequentially with respect to each other, thus creating a prohibitively long latency in the overall runtime. Re-organizing the radiation component as separate parallel tasks allows for the simultaneous calculation of radiative transfer along with the other atmospheric processes and removes the long response latency from the component. Mozdzynski and Morcrette (2014) report a re-organization of the radiation calculations in IFS at ECMWF and demonstrate the radiation-in-parallel configuration, in which calculating radiative transfer is performed on separate MPI (Message Passing Interface) processes in parallel with the rest of the model.

In a similar effort but in a broader sense, Balaji et al. (2016) propose coarse-grained component concurrency (CCC) to increase the level of concurrency within ESMs. This approach suggests re-organizing more lower- and higher-level components in parallel with each other and exploiting fine-grained parallelism within each component individually. Additionally, the report demonstrates the result of applying this approach to the radiation component of the Geophysical Fluid Dynamics Laboratory (GFDL) Flexible Modeling System (Balaji, 2004). In this use case, the atmospheric radiative transfer is configured to run in parallel with the atmospheric dynamics and all other atmospheric physics. This technique uses shared-memory parallelism and divides the available threads in an OpenMP (Open Multi-Processing) region between the parallel components.

This paper concentrates on the performance optimization of the radiation scheme in ECHAM6, which is the sixth generation of the atmospheric general circulation ECHAM (Stevens et al., 2013). The model was developed at the Max Plank Institute for Meteorology (MPI-M) in Hamburg. It is the traditional atmospheric component of the coupled Earth system model MPI–ESM, as described by Giorgetta et al. (2013). ECHAM6 benefits from spectral and finite-difference methods in five different grid resolutions, ranging from the coarse (CR) and low resolution (LR) to the very high resolution (XR). The CR or T31 is a truncation to 31 wave numbers in the spectral part and corresponds to a horizontal spatial resolution of 96 × 48 points in longitude and latitude. The LR or T63, however, corresponds to 192 × 96 points. Its special prominence in this research is due to its application in the German climate modelling initiative (PalMod), which aims at simulation of a complete glacial cycle (i.e. about 120 000 years) from the last interglacial to the Anthropocene (https://www.palmod.de, last access: 30 November 2021). However, a serious caveat remains as to the feasibility of such an ambitious project, which should be acknowledged in advance. In particular, a major concern has been raised over the poor performance of ECHAM6 suffering from the limited number of grid points at setups used in palaeoclimate simulation. For this reason, the performance optimization of the model is instrumental in ensuring the viability of such long-time simulations.

Experiments reveal that the radiation component is one of the most expensive computational parts in ECHAM6, at least for palaeoclimate simulation. Endeavours to adopt higher optimized radiative calculations will therefore be opportune for PalMod experiments. In response, two solutions have been investigated in parallel within the PalMod project to alleviate the computational burden of the radiative transfer on the atmospheric simulations in ECHAM6: single-precision arithmetic and the concurrent radiation scheme. The technique of mixed-precision calculations is a practice of reducing the time to solution of scientific algorithms whenever lower accuracy is permitted. Empirical studies have revealed that radiation calculations can benefit from reduced-precision arithmetic. Cotronei and Slawig (2020) discuss the results of applying single-precision arithmetic to the radiation calculations in ECHAM6. They indicate that this mathematical treatment accelerates the component by about 40 %, while the overall runtime of the model is reduced by 18 %.

This paper, on the other hand, presents a report on the concurrent radiation scheme applied to the atmospheric model ECHAM6 and provides a thorough analysis of the performance and stability of the model. Calculating radiative transfer in parallel with other atmospheric processes can potentially affect the model's accuracy since the radiation fields will always lag one more radiation time step behind in comparison with the classical scheme. This lag may have negative impacts on physical processes that benefit from a tighter coupling in time with radiation. The boundary layer clouds, particularly stratocumulus, are a good example. They are maintained by longwave cooling at cloud tops once they are formed. This could explain why Hogan and Bozzo (2018) found that calling radiation more frequently led to more skilful forecasts of near-surface temperature and low cloud cover.

In contrast to the OpenMP approach used by Balaji et al. (2016), the concurrent radiation scheme opts for the MPI parallelization in order to fully exploit the potential of higher concurrency in the model. In addition, encapsulating the radiation calculations in a distinct component and name space realizes the idea of separation of concerns (SoC), which results in more degrees of freedom. An immediate benefit includes the independent development and optimization of the radiation component from the main model pursuing higher throughput, which is essential for the ambitious long simulation runs of the PalMod project. This architectural merit also enables the potential of combining the virtues of the concurrent radiation scheme with other appropriate optimized solutions such as “single-precision arithmetic in ECHAM radiation” (Cotronei and Slawig, 2020) in the future.

The paper is organized as follows: Sect. 2 describes the classical approach to the radiation calculations in ECHAM6. In Sect. 3, we introduce the solution of a new radiation scheme in the model. Finally, the performance analysis and the scientific evaluation of the new scheme are presented in Sects. 4 and 5, respectively.

2 The classical radiation scheme in ECHAM6

Radiative transfer in ECHAM6 is represented with PSrad/RRTMG (a postscript to the Rapid Radiative Transfer Model for general circulation models – GCMs; Pincus and Stevens, 2013) for both shortwave and longwave parts of the electromagnetic spectrum (Stevens et al., 2013). The radiation component is one of the most expensive components within atmospheric physics. As a result, this component is stepped forward at a slower rate than the rest of the atmospheric physics in ECAHM6 as well as most of the climate models around the world (Balaji et al., 2016). In its flagship configuration, ECHAM6 updates the optical properties of radiation every 2 h, except for very high-resolution (T255) simulations in which the radiation is calculated hourly. Figure 1 shows the organization of the classical radiation scheme in ECHAM6. As is apparent in this figure, the radiative transfer is generally calculated once in every n normal atmospheric time steps, i.e. $Δ t_{rad} = n \times Δ t_{atm}$ .

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f01

Figure 1The organization of the classical radiation scheme in ECHAM6: the radiative transfer is resolved sequentially with respect to the other atmospheric physics and dynamics, and it is stepped forward at a slower rate than the other atmospheric processes.

Download

In this scheme, the radiation results are used beyond the state of the input tracers, which may be as much as Δt_rad−Δt_atm (approximately 2 h) behind. Figure 1 schematically gives a clear account of what happens. As it shows, there are multiple normal atmospheric time steps between two consecutive radiation time steps. The atmospheric calculations are provided with old feedback from the radiation component within the normal time steps. On non-radiation time steps between updates of the optical properties, longwave irradiance is rescaled based on the surface temperature, while shortwave irradiance is rescaled by the zenith angle (Stevens et al., 2013). Infrequent calculations of the radiative heating may result in numerical instability in climate models, as described by Pauluis and Emanuel (2004). As pointed out by Balaji et al. (2016), the use of the lagged state can be viewed as a potential source of discrepancy between the cloud field and the “cloud shadow field” seen by the radiation component. This therefore introduces numerical errors in atmospheric models and becomes considerably worse at higher resolutions (Xu and Randall, 1995; Balaji et al., 2016). Although the choice of larger radiation time steps (Δt_rad>Δt_atm) evidently reduces the overall runtime of simulations, the radiative portion is still considered relatively high for some configurations in ECHAM6. As shown in Fig. 2, the radiative calculations take up almost 40 % to 58 % of the total simulation time at the CR resolution (which is also one of the PalMOD settings), depending on the number of MPI processes assigned to the model.

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f02

Figure 2The relative time contribution of radiation calculations in ECHAM6 (using the classical radiation scheme) in the CR simulations.

Download

The entire crux of the problem in the classical radiation scheme can be attributed to the sequential organization of the components inside the model. In this architecture, atmospheric processes are stepped forward one at a time in every time step, and the computation time of each component directly contributes to the overall simulation time. In particular, the radiation component significantly delays the following calculation of other atmospheric physics and dynamics during the entire course of simulation. As is apparent in Fig. 1, this architecture prolongs the radiation time step in proportion to the high computational cost of the radiation component. It will be shown in the next section that this long response time of the radiation calculations, however, is not inevitable and can be avoided by re-organizing the component inside the model.

Moreover, the sequential organization of components in the classical ECHAM6 creates another obstacle that hinders the optimization of the model. In fact, ECHAM6 traditionally benefits from MPI and implements domain decomposition parallelism to expedite the computations. As Fig. 3 shows, the radiation calculations display a higher scalability than the main model in this framework. This can be attributed to the column-wise organization of the atmospheric physics and the embarrassingly parallel nature of the workload. In other words, since the individual columns (iterating the k index in an $(i, j, k)$ discretization) have no cross-dependency in (i,j), it allows for fine-grained parallelism (Balaji et al., 2016). This is the reason why the radiation component can intrinsically scale better beyond the limitations of the main model. Such a higher scalability is instrumental in reducing the high computational cost of the radiation calculations. However, the sequential architecture of the classical ECHAM6 restrains the benefit by forcing the radiation component to use the same computational resources as the rest of the model. As a consequence, the component is hindered by the limited scalability of the whole model. In the next section, however, it will be shown that re-organization of the components in ECHAM6 is essential to improve the overall performance of the model.

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f03

Figure 3The scaling curve of the radiation component vs. the atmospheric model ECHAM6 shows that the radiative transfer has a higher scalability and keeps scaling beyond the limitations of the main model.

Download

3 The concurrent radiation scheme

It was discussed in the previous section that, even in light of larger radiation time steps, radiation calculations impose a daunting cost on the atmospheric simulation at coarse and low resolution in ECHAM6. It was also shown that the sequential treatment of resolving atmospheric processes is implicated in the long response time of the radiation component in every radiation time step and restricts the benefit of the higher scalability of the radiation calculations.

The concurrent radiation scheme, on the other hand, puts forward a feasible solution to the problem. It implements an additional level of parallelism inside the model by applying coarse-grained component concurrency to the radiation calculations. This approach eliminates the high response latency from the radiation time step and paves the way for higher scalability in the model. In contrast to the classical scheme, the concurrent radiation scheme starts resolving the radiative transfer much earlier before the next radiation time step arrives. As a result, the main model receives feedback from the radiation component much faster upon the request. This technique minimizes the response latency of the component and reduces the overall simulation time. In this approach, the radiative transfer is calculated concurrently with other atmospheric processes along the course of normal time steps. The coupling fields are also exchanged between the radiation component and the main model within the radiation time steps but without the typical delay experienced in the classical scheme. Figure 4 describes a method for casting the radiative transfer as a concurrent component using distributed-memory computing. This technique organizes the radiation component and the main model on separate MPI processes and enables the concurrent calculation of the radiative transfer and other atmospheric processes.

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f04

Figure 4The re-organization of the radiation component in parallel with the rest of the atmospheric physics and dynamics in ECHAM6. In the first (radiation) time step, ATM (the main model) sends the input data to RAD (the radiation component), but it has to wait a long time until it receives the results from RAD. In the following radiation time steps, however, the data exchange takes place immediately one after the other (i.e. ATM first receives the results of radiation calculations and immediately provides the input data to RAD for the next radiation calculations). This way, ATM is supposed to experience a minimum idle time when it interacts with RAD.

Download

Due to the close time dependency of radiation on evolving model fields, the concurrency between the atmosphere and radiation component is only enacted between consecutive radiation time steps. In addition, the synchronization between the concurrent components takes place during each radiation time step. As is shown in the next section, the interprocess synchronization overhead, however, is negligible compared to the cost of the radiation calculations. The data exchange between the radiation and the atmospheric model benefits from the communication library YAXT (Behrens et al., 2014), which was developed at the German Climate Computing Center (DKRZ) in Hamburg. YAXT simplifies the formulation of the communication problem and generates suitable communication objects to efficiently execute the data exchange. The library is specially suited for bulk communication as given in this use case. This is due to the automatic generation of MPI data types that enable direct access to model data without requiring additional data copies or packing and unpacking overhead to create messages. YAXT is built on top of MPI, takes high-level descriptions of arbitrary domain decomposition, and automatically derives an efficient collective data exchange. Figure 5 shows the use of the YAXT library for coupling two concurrent components with different domain decomposition.

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f05

Figure 5The YAXT library facilitates MPI communication between concurrent components with different domain decomposition layouts.

Download

In addition, running the concurrent components on separate MPI processes is a prerequisite to applying arbitrary domain decomposition to the radiation calculations. It offers a way forward to scale the component beyond the limits traditionally imposed by the atmospheric model. Hence, the concurrent radiation scheme prepares the ground for improving the physical consistency between the radiative and physicochemical atmospheric states. This approach can accordingly minimize the discrepancy between Δt_rad and Δt_atm (Pauluis and Emanuel, 2004; Xu and Randall, 1995; Balaji et al., 2016). Furthermore, the independent resource allocation feature is essential for an efficient load balancing between concurrent components and parallel efficiency of the model. The next section presents these merits in more detail with some concrete examples.

4 Results I: performance analysis

This section presents the performance evaluation of the concurrent radiation scheme in ECHAM6 in comparison with the classical approach. It should be emphasized that the new scheme utilizes almost the same original implementation of radiation calculations with a radically different orchestration. The new organization therefore exerts a major impact on the overall simulation time rather than the pure computational performance of the radiation component. In consequence, the performance evaluation presented in this section explicitly aims at the assessment of the whole model and will not be limited to the radiation component. For the purpose of this study, a new version of ECHAM6 (based on ECHAM-6.3.05p2) is deployed with both classical and concurrent radiation schemes which can be configured to calculate the radiative transfer with or without separate MPI processes. Experiments are performed on the Mistral supercomputer at DKRZ on a machine configuration with Intel Haswell processors (E5-2680v3 12C 2.5 GHz) and a Mellanox FDR Infiniband high-speed interconnect. All runs allocate a layout of one MPI process per CPU core on the computing nodes equipped with two processors which are exclusively dedicated to the experiments. The performance results presented in this paper are obtained from the Atmospheric Model Intercomparison Project (AMIP) experiments performing simulations at the CR from 1976 to 1981.

To understand the impact of the concurrent radiation scheme on the overall performance, it is useful to extract the scaling curves of the model with both radiation schemes and study the gained speedup via the concurrent radiation scheme. For this purpose, the frequency at which the radiative transfer is calculated is by default every 2 h in all runs, i.e. $Δ t_{rad} = 8 \times Δ t_{atm}$ with the normal time step of 15 min. Additionally, once the model is configured to use the concurrent radiation scheme, an equal number of MPI processes, and thus identical domain decomposition, is assigned to the radiation component and the main model.

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f06

Figure 6The scaling curves of ECHAM6 using the classical and concurrent radiation schemes.

Download

Figure 6 compares two scaling curves which reflect the performance of the model with the classical (the blue curve) and concurrent radiation schemes (the red curve). The horizontal axis shows the total number of MPI processes allocated by the model. It is worth emphasizing that ECHAM6 (using the classical radiation scheme) uses the same MPI processes to calculate radiative transfer and other atmospheric processes. However, when the model is configured to use the concurrent radiation scheme, half of the allocated MPI processes are exclusively dedicated to the radiation calculations and the other half to the rest of the atmospheric physics and dynamics. The y axis, on the other hand, reflects the throughput of the model in terms of the number of simulated years per day (SYPD). As can be inferred from Fig. 6, ECHAM6 can achieve only a maximum performance of 450 SYPD at 576 MPI processes using the classical radiation scheme at the CR resolution. However, it yields a significant improvement using the concurrent radiation scheme and reaches a maximum performance of 734 SYPD at 1152 MPI processes. It is noteworthy that, due to the limited number of grid points at the CR resolution, running the classical model at higher domain decomposition is not justified theoretically. Needless to say, it does not attain any significant performance improvement in practice either, as asserted in Fig. 3 where the scaling curve of ECHAM6 tends to flatten towards the end. This should explain why the blue curve in Fig. 6 stops at 576 MPI processes as opposed to the red curve scaling beyond. On this account, the concurrent radiation scheme acquires a new significance as it becomes conducive to higher scalability of the model.

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f07

Figure 7The red curve shows the methodical speedup of ECHAM6 using the concurrent radiation scheme. The blue curve shows the methodical speedup that the model would achieve asymptotically.

Download

The red curve in Fig. 7 displays the methodical speedup of the model using the concurrent radiation scheme. Here, the methodical speedup means improved runtime of the model by making use of the proposed concurrency, in contrast to the classical definition of speedup, whereby additional resources are used for the same computation. The methodical speedup is therefore the ratio of the overall performance of the model using the concurrent radiation scheme (using 2N resources) divided by the performance of the model using the classical radiation scheme (using N resources). The methodical speedup S is defined as follows.

\begin{matrix} (1) & \begin{aligned} {SYPD}_{concurrent} = & The number of simulated years per \\ day with the concurrent radiation \\ scheme (using 2 N MPI processes) \end{aligned} \\ (2) & \begin{aligned} {SYPD}_{classical} = & The number of simulated years per day \\ with the classical radiation scheme \\ (using N MPI processes) \end{aligned} \\ (3) & Methodical speedup = S = \frac{{SYPD}_{concurrent}}{{SYPD}_{classical}} \end{matrix}

On this account, for each point on the speedup curve(s), the number of resources assigned to the model using the classical radiation scheme is half of the resources allocated by the model using the concurrent radiation scheme. Hence, the x axis indicates only the total number of allocated MPI processes to the model if the concurrent radiation scheme is used by the model. However, the model allocates half of the MPI processes shown at the x axis when it adopts the classical radiation scheme. The red curve shows that the model achieves an actual speedup ranging from a minimum of 1.6× to over 1.9× when it benefits from the concurrent radiation scheme. The blue curve, however, shows the asymptotic speedup which the model would achieve if there were no cross-dependency, and thus no communication latency, between the radiation component and the main model.

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f08

Figure 8Comparing the communication costs of data exchange between the radiation component and the main model to the computation time of the radiative transfer (the red curve) and other atmospheric processes (the blue curve) as well as to the overall simulation time (the green curve).

Download

Moreover, the coupling fields between the radiation component and the rest of the model are exchanged at every radiation time step and potentially contribute to the overall simulation time. There is nevertheless mounting evidence in Fig. 8 indicating that the communication overhead compared to both radiation and total runtime of the model is negligible (less than 1.4 %) and therefore has little impact on the total simulation time. Consequently, the performance of the model (using concurrent radiation scheme) is mainly affected by the relative cost of the radiation calculations, which has a strong dependency on the number of allocated computing resources.

Figure 9 shows the efficiency of the resource utilization when the model runs the concurrent radiation scheme. The parallel efficiency of the concurrent radiation scheme is defined as the ratio of the methodical speedup S to the relative number of allocated resources R, as shown below.

\begin{matrix} (4) & \begin{aligned} Ratio of allocated resources = R \\ = \frac{\begin{matrix} (resources assigned to the concurrent \\ radiation scheme) \end{matrix}}{\begin{matrix} (resources assigned to the classical \\ radiation scheme) \end{matrix}} \\ = \frac{2 N}{N} = 2 \end{aligned} \\ (5) & Parallel efficiency = E = \frac{S}{R} = \frac{S}{2} \end{matrix}

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f09

Figure 9The parallel efficiency of ECHAM6 (using the concurrent radiation scheme).

Download

As is visible in the figure, the model achieves a parallel efficiency of 80 % or more across the scaling curve. However, attaining the maximum parallel efficiency requires an optimal distribution of workload among MPI processes. A close investigation reveals a load imbalance between the concurrent components inside the model. This problem starts appearing when the radiation and other atmospheric calculations are configured to use identical domain decomposition and allocate an equal number of MPI processes. The measured parallel efficiency in Fig. 9 suggests that the MPI processes assigned to the radiation calculations experience an idle time during the course of simulation.

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f10

Figure 10The red curve (RAD) shows the increasing idle time of the MPI processes responsible for resolving radiative transfer. It suggests that RAD is a dominant computation in all the configurations before 576 MPI processes and does not wait for ATM. The blue curve (ATM), on the other hand, shows that resolving other atmospheric physics and dynamics is not a dominant computation for all the configurations before 576 MPI processes, and ATM therefore experiences a long idle time. However, ATM becomes dominant towards the end, inflicting a long idle time on RAD. Yet the idle time of ATM is not lifted completely. This can be attributed mainly to the unavoidable waiting time of ATM in the first (radiation) time step (as reflected in Fig. 4) and some infrequent (slightly) longer radiation time steps.

Download

Figure 10 depicts the length of time that the concurrent components have to remain idle while waiting for the slower component to catch up and become ready to exchange the radiation results. The red and blue curves show the respective idle times that the radiation component (denoted by RAD) and other atmospheric processes handled by the main model (denoted by ATM) experience. The idle times appear when the radiative transfer is resolved faster or slower than the other atmospheric calculations. At lower numbers of MPI processes, as shown in Fig. 10, calculating the radiative transfer takes longer and thus forces ATM to wait a relatively long time in each radiation time step for feedback from RAD. At higher numbers of MPI processes, however, RAD scales better and finishes the calculations faster than ATM. It therefore has to wait for the arrival of the next radiation time step so that the radiation results can be transferred to the main model. It should be noted that Fig. 10 shows the maximum length of time that an MPI process (assigned to RAD or ATM) has to wait for its peer MPI process (assigned to the other component) until it catches up. From Fig. 10, it can reasonably be inferred that the total idle time experienced by ATM and RAD becomes minimum at 384 MPI processes. This can also explain why the parallel efficiency in Fig. 9 has an extremum at this point. It is also apparent in Fig. 10 that the radiation component experiences a longer idle time as the number of MPI processes increases. This behaviour accounts for the higher scalability of the radiation component, which was already reflected in Fig. 3.

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f11

Figure 11A contrived model configuration in which the same domain decomposition is assigned to the radiation calculations and the main model. Here, this setup imposes an idle time on the radiation component, leading to an inefficient resource usage.

Download

The load imbalance between the concurrent processes is directly affected by the number of MPI processes assigned to the radiation component as well as the frequency at which the radiative transfer is calculated. Figure 11 schematically illustrates a contrived configuration in which the radiation component is forced to remain idle almost half of the total runtime. To remove such an idle time, Fig. 12 provides an inspiring example. It takes advantage of the higher scalability of the radiation component and reduces the radiation time step to half. This new configuration eliminates the idle time from the MPI processes assigned to the radiation component and hence removes the load imbalance between the concurrent components successfully. As a consequence, the resource utilization of the model improves significantly and attains a parallel efficiency close to 100 % without affecting the achieved speedup. This example presents a viable solution to decrease the gap between Δt_rad and Δt_atm in pursuit of a more consistent atmospheric model.

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f12

Figure 12A model configuration for assigning identical domain decomposition to the radiation component and the main model but reducing the radiation time step to half ( $Δ t_{rad} = n / 2 \times Δ t_{atm}$ ), hence improving the parallel efficiency of the model.

Download

The concurrent radiation scheme, however, puts forward a general solution to remove the load imbalance between the radiation component and the main model. This solution provides a remedy for the idle time imposed on the main model at some configurations (such as 48, 96, 192, 288, or 384 MPI processes as shown in Fig. 10) which exhibit a suboptimal resource efficiency due to the slow calculation of radiative transfer. In this approach, the radiation component is enabled to adopt finer domain decomposition and allocates a higher number of resources (in comparison to the main model) in order to catch up with the fast calculation of other atmospheric processes. By the same token, Fig. 13 suggests a configuration in which the radiation component adopts coarser domain decomposition and allocates a lower number of MPI processes compared to the main model. This arrangement is also a remedy to remove the load imbalance at the configurations (such as 576, 768, and 1024 MPI processes as shown in Fig. 10) in which the radiation component experiences a long idle time due to the slow calculation of other atmospheric processes.

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f13

Figure 13A model configuration for assigning coarser domain decomposition to the radiation calculations. This setup allows for allocating a lower number of MPI processes to the radiation component and hence improves the parallel efficiency of the model.

Download

In addition, the concurrent radiation scheme offers an opportunity for coupling the radiation component to the other atmospheric processes at every normal time step (i.e. Δt_rad=Δt_atm). This feature can ultimately bring the model to physical consistency between the radiative and physicochemical atmospheric states, although probably with a negligible impact on the model's accuracy. It is notable that the current implementation of the concurrent radiation scheme in ECHAM6 already provides the technical support for the adoption of finer or coarser domain decomposition for the radiation calculations. In particular, the YAXT library simplifies the data exchange between the concurrent components with disparate domain decomposition. The scientific viability of these schemes, however, requires further investigations, and the results will be presented in a follow-up paper.

5 Results II: scientific evaluation

We evaluate the concurrent radiation scheme in ECHAM-6.3.05 at the CR and LR configurations, hereafter termed as CR_CRR and LR_CRR, which differ in their horizontal grid spacings. Both CR_CRR and LR_CRR share the vertical resolution of 47 levels. The classical radiation scheme has been tuned to optimize the simulated climate (Mauritsen et al., 2019). However, the concurrent radiation scheme at these two configurations has not been individually tuned. The parameters of the convection scheme, e.g. the convection conversion rate for cloud water to rain in the concurrent radiation scheme, are the same as in the classical scheme. The evaluation and documentation of the concurrent radiation scheme in ECHAM-6.3 are based on the Atmospheric Model Intercomparison Project (AMIP) historical experiments. Experiments were performed according to the AMIP protocol (Eyring et al., 2016). The historical forcings include the emissions of short-lived species and long-lived greenhouse gases, solar forcing, stratospheric aerosol forcing, monthly sea surface temperatures, and sea ice concentrations. Experiments span from 1960 to 2013. The monthly output from 1980 to 2013 is taken for analysis. AMIP experiments performed with the classical ECHAM-6.3.05, as well as the observations and reanalysis, serves as a reference. Observational and reanalysis data used in this paper are listed in Table 1.

Jones et al. (2012)Adler et al. (2003)Loeb et al. (2012)Loeb et al. (2012)Dee et al. (2011)Dee et al. (2011)Dee et al. (2011)

Table 1Observation and reanalysis data for model evaluation.

Download Print Version | Download XLSX

5.1 Mean state

We first evaluate the simulated mean climate by the concurrent radiation scheme. CRUTEM4 observations are interpolated to the model grids for the comparison against simulated surface air temperature (SAT). Across all configurations, global annual bias in SAT over land exhibits similar spatial patterns (Figs. 14a, c and S1a, c in the Supplement). A general warm bias occurs over eastern Siberia, central Asia, the Tibetan Plateau, North America, and central Australia, while sporadic cold biases exist in south Asia, Africa, and northern South America. For a rough assessment, global root mean square error (RMSE) is calculated for SAT. Horizontal resolution exhibits a larger impact on the SAT bias than the radiation scheme. Both radiation schemes show a smaller SAT bias in LR relative to CR (Figs. 14a, c and S1a, c). The concurrent radiation scheme illustrates a somewhat ambiguous reduction in the annual SAT bias, although it is insignificant at the 95 % confidence interval (Fig. 14e).

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f14

Figure 14Annual bias in surface air temperature (SAT) over land (K) and total precipitation (mm d⁻¹) in the (a, b) concurrent and (c, d) classical radiation experiments relative to the Climatic Research Unit time series 4.01 data set (CRU TS; Harris et al., 2014) and the Global Precipitation Climatology Project data set v2.3 (GPCP; Adler et al., 2003) for the period 1980–2013, respectively. Differences in the (e) SAT over land and (f) total precipitation between the concurrent and classical radiation scheme for the period 1980–2013. Hatching indicates significance at the 95 % confidence interval using the Student's t test.

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f15

Figure 15Annual bias in shortwave cloud radiative forcing (CRF), longwave CRF, and net shortwave radiation fluxes (W m⁻²) at the top of the atmosphere (TOA) in the (a, b, c) concurrent and (d, e, f) classical radiation experiments relative to data from Clouds and the Earth's Radiant Energy System Energy Balanced and Filled product (CERES-EBAF) surface fluxes edition 4.0 for the period 2001–2013. Differences in the (g) shortwave CRF, (h) longwave CRF, and (i) net shortwave radiation at TOA between the concurrent and classical radiation scheme for the period 2001–2013. Hatching indicates significance at the 95 % confidence interval using the Student's t test.

All experiments suffer from severe precipitation bias over the tropics compared to the GPCP satellite measurements (Figs. 14b, d and S1b, d). Specifically, excessive precipitation is simulated over the tropical Indian Ocean, western tropical Pacific, and western tropical Atlantic Ocean, while deficient precipitation occurs over the east tropical Pacific Ocean, South America, and the eastern tropical Atlantic Ocean. Consistent with the response of SAT bias, the global RMSE decreases along with a refinement of horizontal resolution, whereas the concurrent radiation scheme leads to larger bias in the precipitation. Particularly, the precipitation bias over the equatorial region of South America and western equatorial Africa increases in the concurrent radiation (Figs. 14f and S1f), which is likely linked to the intensification of local SAT bias (Figs. 14e and S1e). Clouds play a complex and crucial role in the Earth's radiation budget, which significantly affects the simulated surface climate. To understand the biases in SAT, we evaluate the errors in cloud feedback by estimating cloud radiative forcing (CRF) following (Cess et al., 1990):

\begin{matrix} (6) & CRF = F_{all-sky} - F_{clear-sky}, \end{matrix}

where F_all-sky refers to radiative flux at the top of the atmosphere (TOA) and F_clear-sky represents the radiative flux assuming the absence of clouds. CRF is positive when clouds warm the surface atmosphere and vice versa. The radiative effect of clouds consists of two competing components in the radiation budget: warming impact on the surface through the emission of longwave (LW) radiation and cooling impact by shading for the shortwave (SW) radiation. The sign and spatial structure of CRF bias are largely determined by the SW CRF bias, which is partially compensated for by the LW CRF bias (Figs. 15 and S2). CRF exhibits a larger bias in CR relative to LR with both radiation schemes (Figs. 15a, b, d, e, and S2a, b, d, e). In CR_CRR and CR_SEQ, strong negative SW CRF biases exist over regions of upwelling and the western coast of Australia, whereas strong positive CRF biases occur over the Arctic, the Southern Ocean, the eastern tropical Indian Ocean, the northern equatorial Pacific, and Atlantic Ocean. A refinement of horizontal resolution largely alleviates negative CRF biases over the upwelling regions, whereas the CRF bias over other regions barely changes. Over the tropics, the concurrent radiation scheme shows a slight increase in the bias compared to the classical radiation scheme (Figs. 15g, h and S2g, h).

Increased CRF bias in the concurrent radiation scheme is largely attributed to the response of SW CRF errors (Figs. 15g and S2g), with a negligible contribution from the errors in LW CRF (Figs. 15h and S2h). Across all experiments, there are widespread discrepancies between the spatial distribution of biases in the CRF and SAT except over the Tibetan Plateau, where a positive CRF bias agrees well with the warm SAT bias. To further explore such inconsistency, biases in the net SW radiation at TOA are estimated (Figs. 15c, f and S2c, f). SAT biases over North America, the Tibetan Plateau, and central Australia are associated with the excessive net SW radiation at TOA, resulting from the negative biases of the surface albedo (not shown). In contrast, warm SAT biases over eastern Siberia and tropical South America exist along with deficient SW radiation biases, which suggests a dynamical cause for the biased SAT. A misrepresentation of the atmospheric circulation in ECHAM-6.3.05 may be responsible for this discrepancy. The concurrent radiation scheme exhibits strong negative biases in the net SW radiation at TOA in the tropics relative to the classical radiation scheme (Figs. 15i and S2i). This implies that model parameters concerning the cloud formation should be carefully adjusted for the concurrent radiation scheme. Specifically, the relative humidity threshold for cloud formation in the upper troposphere and the lowest model level should be changed to improve the match with observational records (Mauritsen and Roeckner, 2020).

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f16

Figure 16Mean bias in zonal-mean temperature (K) and zonal wind (m s⁻¹) in the (a, b) concurrent and (c, d) classical radiation experiments relative to ERA-Interim for the period 1980–2013 (shading) and the climatological mean from ERA-Interim (contours). Differences in (e) zonal-mean temperature and (f) zonal wind between the concurrent and classical radiation for the period 1980–2013. Hatching indicates significance at the 95 % confidence interval using the Student's t test.

Download

The zonal-mean temperature biases in the troposphere are smaller than in the stratosphere for all experiments (Figs. 16 and S3). In the lower troposphere (between 850 and 1000 hPa), a cold bias amounting to −4 K occurs over the Antarctic for all experiments. Thermal biases in the mid-troposphere are relatively small. Moderate warm biases up to 2 K extend from the tropics to the Arctic. In the lower stratosphere (between 250 and 100 hPa), all experiments suffer from prominent warm biases over the tropics and mid-latitudes, as well as severe cold biases over the high latitudes in both hemispheres. Such biases are significantly reduced in LR (Fig. S3a and c) relative to CR (Fig. 16a and c) experiments, consistent with the notion by Stevens et al. (2013) that zonal-mean temperature bias can be significantly reduced by enhancing the horizontal resolution. The concurrent radiation scheme barely affects the biases in zonal-mean temperature in CR simulations, yet it slightly reduces the biases in the lower stratosphere between 40 and 90^∘ S (Figs. 16e and S3e).

The patterns of zonal-mean westerly wind biases are common to all configurations (Figs. 16b, d and S3b, d) and reflect the meridional structure of temperature biases (Figs. 16a, c and S3a, c). In CR_CRR and CR_SEQ, temperature biases drive a northward shift of westerly winds between 30 and 60^∘ S. LR_CRR and LR_SEQ exhibit more alleviated westerly wind biases than their CR counterpart due to reduced temperature bias. In the tropics, westerly wind biases are characterized by overly strong easterly biases in the low and mid-troposphere and large westerly biases in the upper troposphere. The concurrent radiation scheme intensifies the easterly wind biases over the tropics in CR and LR (Figs. 16f and S3f). As suggested by Stevens et al. (2013), the tropical bias pattern is an indication of excessive heating associated with the deep convection.

5.2 Climate variability

To explore the simulated climate variability by the concurrent radiation, we present an analysis of the El Niño–Southern Oscillation (ENSO) teleconnection and interannual variability in the extratropics in the form of the Northern and Southern Annular Mode.

5.2.1 ENSO feedbacks and teleconnections

The El Niño–Southern Oscillation (ENSO) is the leading mode of interannual variability in the tropical Pacific. ENSO is mainly characterized by variations of sea surface temperature (SST) in the eastern and central equatorial Pacific. Multiple negative and positive coupled atmosphere–ocean processes that either favour or suppress the growth of SST anomalies govern the ENSO behaviour (Philander, 1989; Neelin, 1998; Jin et al., 2006). Equatorial Pacific SST anomalies associated with ENSO can affect the tropical convection and result in zonal shifts of the Walker circulation (Philander, 1989; Bayr et al., 2014). Further, the changes in convection stimulate Rossby waves that propagates to the middle and high latitudes through the atmospheric bridge. Many studies (Guilyardi et al., 2004; Guilyardi, 2006; Guilyardi et al., 2009) have shown that the atmospheric model dominates the simulated ENSO properties in coupled climate models. Therefore, it is crucial to evaluate the ENSO teleconnections that project the influence of ENSO globally.

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f17

Figure 17Covariance (cov(mm d⁻¹, mm d⁻¹)) of DJF (December–January–February) precipitation anomalies with the normalized time series of precipitation anomaly in the Niño4 region for (a) GPCP observations as well as the (b) concurrent and (c) classical radiation experiments. Covariance (cov(m, mm d⁻¹)) of DJF 500 hPa geopotential height anomalies in the Northern Hemisphere with the normalized time series of precipitation anomaly in the Niño4 region for (d) ERA-Interim, (e) concurrent, and (f) classical radiation experiments.

The covariance between the global DJF (December–January–February) precipitation anomalies and the DJF precipitation anomalies over the Niño4 region is shown in Figs. 17 and S4. CR_SEQ and LR_SEQ typically capture the pattern over the tropical Pacific, tropical Atlantic, and North America (Figs. 17c and S4c) as indicated by the GPCP observations. The concurrent radiation scheme exhibits similar teleconnection patterns as the classical radiation scheme, yet it influences the magnitude of the response to ENSO. CR_CRR underestimates the magnitude of atmospheric teleconnection over the Maritime Continent (Fig. 17b), whereas LR_CRR overestimates the teleconnection over this region (Fig. S4b). Additionally, CR_CRR and LR_CRR exhibit an artificially positive response in the southern tropical Indian Ocean relative to their counterpart using the classical radiation scheme.

Covariance of DJF geopotential height anomalies at 500 hPa (Z500) and normalized DJF precipitation anomalies over the Niño4 region are calculated to investigate the diabatic forcing of the tropical Pacific on the boreal winter atmospheric circulation in the Northern Hemisphere (Figs. 17 and S4). ERA-Interim depicts positive covariance over Canada and southern Greenland and negative relations over the northeast Pacific, north Atlantic, and Siberia. CR_SEQ is able to simulate the pattern over the northeast Pacific, Canada, and Greenland. However, it fails to capture the component over Siberia, and the signal over the North Atlantic is shifted northeastward. Increasing the horizontal resolution to LR shows no improvement for the teleconnection pattern with the classical radiation scheme. CR_CCR retains the main features of ENSO teleconnection as shown by CR_SEQ, whereas LR_CCR exhibits a weak response of atmospheric circulation to the diabatic forcing over the tropical Pacific compared to ERA-Interim (Fig. 17d). This implies that a tuning for the convection scheme in the concurrent radiation scheme is required.

5.2.2 Northern Annular Mode

The Northern and Southern Hemisphere Annular Mode (NAM and SAM, also known as the Arctic and Antarctic Oscillation) are the leading modes of variability of the extratropical circulation in both hemispheres. Both annular modes explain the month-to-month and year-to-year (especially the cold season) variability of the atmospheric circulation, which exhibit pronounced impacts on the climate in the mid-latitudes and the polar region (McAfee and Russell, 2008; Kidston et al., 2009). The NAM is defined as the leading empirical orthogonal function (EOF) of the DJF sea level pressure (SLP) in the Northern Hemisphere. As shown by the ERA-Interim reanalysis data, the NAM is characterized by zonally symmetric structures (Fig. 18a). There are large and positive loadings over the polar cap region, surrounded by zonally ring-shaped negative loadings centred over the northeast Pacific and the North Atlantic. The leading mode explains 30.4 % of total variance. All experiments show larger explained variance of the NAM than ERA-Interim (Figs. 18 and S5). Horizontal resolution does not show a strong influence on the NAM pattern, except that CR_CRR and CR_SEQ exhibit slightly higher explained variance relative to their LR counterparts. The concurrent radiation scheme captures the centres of action better than the classical scheme, especially over the North Atlantic sector. The centre of action over the North Atlantic is shifted eastward in LR_SEQ compared to ERA-Interim and LR_CRR.

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f18

Figure 18The leading empirical orthogonal functions (EOFs) of December–January–February (DJF) sea level pressure (SLP) anomalies calculated from (a) ERA-Interim as well as the (c) concurrent and (e) classical radiation experiments for the period 1980–2013. Linear regressions of the DJF 2 m air temperature on the NAM index (normalized PC1) from (b) ERA-Interim, (d) concurrent, and (f) classical radiation experiments for the period 1980–2013.

The atmospheric teleconnection associated with the NAM is calculated by regressing the DJF SAT anomalies on the time series of principal components (PC1) corresponding to the leading EOF (Figs. 18b, d, f and S5b, d, f). Associated with the positive phase of the NAM, warm SAT anomalies occur over Europe, Siberia, and the western United States, while cold temperature anomalies exist over western Alaska, far eastern Russia, Greenland, and eastern Canada (Fig. 18b). CR_SEQ significantly underestimates the response over Greenland, Baffin Bay, and Siberia but overestimates the response over western America. Additionally, an artificially negative response occurs over western Canada (Fig. 18b). CR_CCR shares similarities with CR_SEQ, with minor differences in the magnitude of the response, which is also seen in the LR experiments. The atmospheric teleconnection simulated by two LR experiments generally agrees better with ERA-Interim than the CR simulations, which reproduce the widespread negative SAT anomalies over Europe and Siberia (Fig. S5d and f).

5.2.3 Southern Annular Mode

Similarly, the JJA (June–July–August) SAM is defined as the leading EOF of JJA SLP anomalies in the Southern Hemisphere. Similar to the NAM, the SAM depicts large loadings over the Antarctic and three surrounding centres of action over the southern Pacific, Indian, and Atlantic Ocean at approximately 45^∘ S (Fig. 19a). CR_CCR and CR_SEQ simulate slightly westward-shifted centres of action over the southern Indian Ocean (Fig. 19c and e). The explained variances by the leading EOFs are overestimated in these two experiments. A refinement of horizontal resolution improves the simulation of the SAM pattern. Among all experiments, LR_CCR shows the best agreement with ERA-Interim (Figs. 19a and S6c). This may be linked to small biases in the mean atmospheric circulation (Fig. S3b).

https://gmd.copernicus.org/articles/14/7439/2021/gmd-14-7439-2021-f19

Figure 19The leading empirical orthogonal functions (EOFs) of June–July–August (JJA) sea level pressure (SLP) anomalies calculated from (a) ERA-Interim as well as the (c) concurrent and (e) classical radiation experiments for the period 1980–2013. Linear regressions of the JJA 2 m air temperature on the SAM index (normalized PC1) from (b) ERA-Interim, (d) concurrent, and (f) classical radiation experiments for the period 1980–2013.

The SAT response to the positive phase of SAM is computed by regressing the SAT anomalies on the PC1 of SLP in the Southern Hemisphere (Figs. 19 and S6). There are warm temperature anomalies over most of the Antarctic and Ross Sea, while cold anomalies exist over the Weddell. LR_CCR and LR_SEQ underestimate the SAT response over the Weddell Sea, whereas CR_CCR and CR_SEQ fail to reproduce the SAT response over both the Ross Sea and Weddell Sea (Figs. 19d, f and S6d, f). Overall, an increase in the horizontal resolution improves the simulated climate variability and its atmospheric teleconnections in both the Northern and Southern Hemisphere; however, the concurrent radiation scheme barely changes these features.

6 Conclusions

This paper presents the implementation of the concurrent radiation scheme in the atmospheric model ECHAM6 and demonstrates its impact on the performance and stability of the model. A detailed analysis shows that the radiative transfer is a relatively expensive component of the model, especially for the CR resolution and LR resolution, which are also used in palaeoclimate simulation. Although the component exhibits a higher scalability profile, the study reveals that it cannot freely scale to its full potential due to the sequential architecture of the classical ECHAM6. The concurrent radiation scheme, on the other hand, organizes the radiation component in parallel with the rest of the atmospheric physics and dynamics and hides its long computation time. The experiments explicitly asserted a noticeable model speedup across the scaling curve with a strong dependency on the relative computational cost of the radiative transfer to the other atmospheric processes. Unlike the classical scheme, this approach enables the radiation component to adopt any viable domain decomposition arbitrarily, which may differ from the main model's configuration. The component can accordingly follow a different scaling scheme and benefit from the higher scalability of radiation calculations. This salient feature can eventually decrease the discrepancy between the radiation time step Δt_rad and normal atmospheric time step Δt_atm with the objective of creating more physical consistency in the model.

The simulated mean climate and internal climate variability by the classical and concurrent radiation scheme have been evaluated. A suite of AMIP experiments has been performed on CR and LR configurations using the two radiation schemes. In terms of long-term mean state biases, e.g. biases in land surface temperature, precipitation, cloud radiative forcings, and zonal-mean temperature and wind, a refinement of horizontal resolution exhibits better agreement with the observations. The concurrent radiation scheme generally yields similar results as the classical radiation, except some minor improvements in the mean atmospheric circulation in the Southern Hemisphere. Regarding the climate variability and associated atmospheric teleconnections, LR simulations agree better with ERA-Interim than their CR counterparts. The concurrent radiation scheme on LR improves the atmospheric teleconnection to the SAM, which is likely linked to the alleviated bias in the mean circulation. On the other hand, the classical radiation scheme on LR shows better atmospheric teleconnection to ENSO than the concurrent radiation. One possible reason is that the classical radiation scheme in ECHAM6 has been properly tuned towards the observations. To conclude, the concurrent radiation scheme presented in this study substantially improves the scalability of ECHAM6, with the major features in the mean climate and internal variability retained.

Code and data availability

The source code of the atmospheric model ECHAM6 adopted for the project PalMod (for the concurrent execution of radiative transfer) and used for generating the plots presented in this paper is available under https://doi.org/10.35089/WDCC/SC_PalMod_ECHAM6 (MPI-M and DKRZ, 2021). The output data for generating the plots presented in this paper are available under https://doi.org/10.5281/zenodo.4589140 (Heidari et al., 2021).

Supplement

The supplement related to this article is available online at: https://doi.org/10.5194/gmd-14-7439-2021-supplement.

Author contributions

MRH wrote the Abstract and Sects. 1, 2, 3, and 4, and he contributed to the Conclusions. He also conducted the performance analysis and performed the required AMIP experiments. ZS wrote Sect. 5 and contributed to the Conclusions. He performed the AMIP experiments required in Sect. 5 and analysed the output. ED contributed to the AMIP experiment required for the performance analysis. JB wrote the excerpt on the YAXT library and provided detailed advice on the performance analysis. HB had the major role in funding acquisition and supervision of the project leading to this publication. He also provided detailed advice on the direction of the paper and the performance analysis. All co-authors contributed to the research and the text in this paper.

Competing interests

The contact author has declared that neither they nor their co-authors have any competing interests.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements

The authors are grateful to the colleagues at the Max Planck Institute for Meteorology for sharing their results and ideas on implementing a similar approach in the ICON model. The authors are also grateful to Thomas Ludwig and his Scientific Computing group at Universität Hamburg for their kind support. The authors would like to thank the reviewers for their corrections and helpful comments.

Financial support

This research has been supported by the Bundesministerium für Bildung und Forschung (grant no. FKZ:01LP1515A).

Review statement

This paper was edited by Fiona O'Connor and reviewed by Robin Hogan and one anonymous referee.

References

Adler, R. F., Huffman, G. J., Chang, A., Ferraro, R., Xie, P., Janowiak, J., Rudolf, B., Schneider, U., Curtis, S., Bolvin, D., Gruber, A., Susskind, J., Arkin, P., and Nelkin, E.: The version 2 Global Precipitation Climatology Project (GPCP) monthly precipitation analysis (1979–present), J. Hydrometeor., 4, 1147–1167, https://doi.org/10.1175/1525-7541(2003)004<1147:TVGPCP>2.0.CO;2, 2003. a, b

Balaji, V.: The Flexible Modeling System, in: Earth system modelling, edited by: Valcke, S., Redler, R., and Budich, R., Volume 3, Springer briefs in earth system sciences, Springer, Berlin, Germany, 33–41, 2012. a

Balaji, V., Benson, R., Wyman, B., and Held, I.: Coarse-grained component concurrency in Earth system modeling: parallelizing atmospheric radiative transfer in the GFDL AM3 model using the Flexible Modeling System coupling framework, Geosci. Model Dev., 9, 3605–3616, https://doi.org/10.5194/gmd-9-3605-2016, 2016. a, b, c, d, e, f, g, h, i, j, k

Bayr, T., Dommenget, D., Martin, T., and Power, S. B.: The eastward shift of the Walker Circulation in response to global warming and its relationship to ENSO variability, Clim. Dynam., 43, 2747–2763, 2014. a

Behrens, J., Hanke, M., and Jahns, T.: How to use MPI communication in highly parallel climate simulations more easily and more efficiently, in: EGU General Assembly Conference Abstracts, 27 April–2 May 2014, Vienna, Austria, p. 13569, 2014. a

Cess, R. D., Potter, G., Blanchet, G., Boer, J., Del Genio, A., Déqué, M., Dymnikov, V., Galin, V., Gates, W., Ghan, S., Kiehl, J., Lacis, A., Le Treut, H., Li, Z., Liang, X., McAvaney, B., Meleshko, V., Mitchell, J., Morcrette, J., Randall, D., Rikus, L., Roeckner, E., Royer, J., Schlese, U., Sheinin, D., Slingo, A., Sokolov, A., Taylor, E., Washington, W., Wetherald, R., Yagai, I., and Zhang, M.: Intercomparison and interpretation of climate feedback processes in 19 atmospheric general circulation models, J. Geophys. Rese.-Atmos., 95, 16601–16615, 1990. a

Cotronei, A. and Slawig, T.: Single-precision arithmetic in ECHAM radiation reduces runtime and energy consumption, Geosci. Model Dev., 13, 2783–2804, https://doi.org/10.5194/gmd-13-2783-2020, 2020. a, b

Dee, D. P., Uppala, S. M., Simmons, A. J., Berrisford, P., Poli, P., Kobayashi, S., Andrae, U., Balmaseda, M. A., Balsamo, G., Bauer, P., Bechtold, P., Beljaars, A. C. M., van de Berg, L., Bidlot, J., Bormann, N., Delsol, C., Dragani, R., Fuentes, M., Geer, A. J., Haimberger, L., Healy, S. B., Hersbach, H., Hólm, E. V., Isaksen, L., Kållberg, P., Köhler, M., Matricardi, M., McNally, A. P., Monge-Sanz, B. M., Morcrette, J.-J., Park, B.-K., Peubey, C., de Rosnay, P., Tavolato, C., Thépaut, J.-N., and Vitart, F.: The ERA-Interim reanalysis: configuration and performance of the data assimilation system, Q. J. Roy. Meteor. Soc., 137, 553–597, https://doi.org/10.1002/qj.828, 2011. a, b, c

Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., and Taylor, K. E.: Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization, Geosci. Model Dev., 9, 1937–1958, https://doi.org/10.5194/gmd-9-1937-2016, 2016. a

Giorgetta, M. A., Jungclaus, J. H., Reick, C. H., Legutke, S., Bader, J., Böttinger, M., V., Brovkin, T., Crueger, M., Esch, K., Fieg, K., Glushak, V., Gayler, H., Haak, H., Hollweg, T., Ilyina, S., Kinne, L., Kornblueh, D., Matei, T., Mauritsen, U., Mikolajewicz, W., Mueller, D., Notz, F., Pithan, T., Raddatz, S., Rast, R., Redler, E., Roeckner, H., Schmidt, R., Schnur, J., Segschneider, K., D. Six, M., Stockhause, C., Timmreck, J., Wegner, H., Widmann, K. H., Wieners, M., Claussen, J., Marotzke, B., and Stevens, B.: Climate and carbon cycle changes from 1850 to 2100 in MPI-ESM simulations for the Coupled Model Intercomparison Project phase 5, J. Adv. Model. Earth Syst., 5, 572–597, 2013. a

Guilyardi, E.: El Niño–mean state–seasonal cycle interactions in a multi-model ensemble, Clim. Dynam., 26, 329–348, 2006. a

Guilyardi, E., Gualdi, S., Slingo, J., Navarra, A., Delécluse, P., Cole, J., Madec, G., Roberts, M., Latif, M., and Terray, L.: Representing El Nino in coupled ocean–atmosphere GCMs: the dominant role of the atmospheric component, J. Climate, 17, 4623–4629, 2004. a

Guilyardi, E., Braconnot, P., Jin, F.-F., Kim, S. T., Kolasinski, M., Li, T., and Musat, I.: Atmosphere feedbacks during ENSO in a coupled GCM with a modified atmospheric convection scheme, J. Climate, 22, 5698–5718, 2009. a

Harris, I., Jones, P. D., Osborn, T. J., and Lister, D. H.: Updated high-resolution grids of monthly climatic observations – the CRU TS3.10 Dataset, Int. J. Climatol., 34, 623–642, https://doi.org/10.1002/joc.3711, 2014. a

Heidari, M. R., Song, Z., and Degregori, E.: Outputs of AMIP experiments using ECHAM-6.3.05p2 with the concurrent radiation scheme, Zenodo [data set], https://doi.org/10.5281/zenodo.4589140, 2021. a

Hogan, R. J. and Bozzo, A.: Mitigating errors in surface temperature forecasts using approximate radiation updates, J. Adv. Model. Earth Syst., 7, 836–853, 2015. a

Hogan, R. J. and Bozzo, A.: A flexible and efficient radiation scheme for the ECMWF model, J. Adv. Model. Earth Syst., 10, 1990–2008, 2018. a, b

Hogan, R. J. and Hirahara, S.: Effect of solar zenith angle specification in models on mean shortwave fluxes and stratospheric temperatures, Geophys. Res. Lett., 43, 482–488, 2016. a

Jin, F.-F., Kim, S. T., and Bejarano, L.: A coupled-stability index for ENSO, Geophys. Res. Lett., 33, L23708, https://doi.org/10.1029/2006GL027221, 2006. a

Jones, P. D., Lister, D. H., Osborn, T. J., Harpham, C., Salmon, M., and Morice, C. P.: Hemispheric and large-scale land surface air temperature variations: an extensive revision and an update to 2010, J. Geophys. Res., 117, D05127, https://doi.org/10.1029/2011JD017139, 2012. a

Kidston, J., Renwick, J., and McGregor, J.: Hemispheric-scale seasonality of the Southern Annular Mode and impacts on the climate of New Zealand, J. Climate, 22, 4759–4770, 2009. a

Loeb, N. G., Lyman, J. M., Johnson, G. C., Allan, R. P., Doelling, D. R., Wong, T., Soden, B. J., and Stephens, G. L.: Observed changes in top-of-the-atmosphere radiation and upper-ocean heating consistent within uncertainty, Nat. Geosci., 5, 110–113, https://doi.org/10.1038/NGEO1375, 2012. a, b

Mauritsen, T. and Roeckner, E.: Tuning the MPI-ESM1. 2 global climate model to improve the match with instrumental record warming by lowering its climate sensitivity, J. Adv. Model. Earth Syst., 12, e2019MS002037, https://doi.org/10.1029/2019MS002037, 2020. a

Mauritsen, T., Bader, J., Becker, T., Behrens, J., Bittner, M., Brokopf, R., Brovkin, V., Claussen, M., Crueger, T., Esch, M., Fast, I., Fiedler, S., Fläschner, D., Gayler, V., Giorgetta, M., Goll, D., Haak, H., Hagemann, S., Hedemann, C., Hohenegger, C., Ilyina, T., Jahns, T., Jimenéz-de-la-Cuesta, D., Jungclaus, J., Kleinen, T., Kloster, S., Kracher, D., Kinne, S., Kleberg, D., Lasslop, G., Kornblueh, L., Marotzke, J., Matei, D., Meraner, K., Mikolajewicz, U., Modali, K., Möbis, B., Müller, W., Nabel, J., Nam, C., Notz, D., Nyawira, S., Paulsen, H., Peters, K., Pincus, R., Pohlmann, H., Pongratz, J., Popp, M., Raddatz, T., Rast, S., Redler, R., Reick, C., Rohrschneider, T., Schemann, V., Schmidt, H., Schnur, R., Schulzweida, U., Six, K., Stein, L., Stemmler, I., Stevens, B., von Storch, J.-S., Tian, F., Voigt, A., Vrese, P., Wieners, K.-H., Wilkenskjeld, S., Winkler, A., and Roeckner, E.: Developments in the MPI-M Earth System Model version 1.2 (MPI-ESM1. 2) and its response to increasing CO₂, J. Adv. Model. Earth Syst., 11, 998–1038, 2019. a

McAfee, S. A. and Russell, J. L.: Northern annular mode impact on spring climate in the western United States, Geophys. Res. Lett., 35, L17701, https://doi.org/10.1029/2008GL034828, 2008. a

Morcrette, J.-J.: On the effects of the temporal and spatial sampling of radiation fields on the ECMWF forecasts and analyses, Mon. Weather Rev., 128, 876–887, 2000. a

Morcrette, J.-J., Mozdzynski, G., and Leutbecher, M.: A reduced radiation grid for the ECMWF Integrated Forecasting System, Mon. Weather Rev., 136, 4760–4772, 2008. a

Mozdzynski, G. and Morcrette, J.-J.: Reorganization of the radiation transfer calculations in the ECMWF IFS, European Centre for Medium-Range Weather Forecasts, https://doi.org/10.21957/pxjpl93ov, 2014. a

MPI-M and DKRZ: Source code of the Max Planck Institute atmospheric general circulation model (ECHAM6) adopted to the project PalMod, World Data Center for Climate (WDCC) at DKRZ [code], https://doi.org/10.35089/WDCC/SC_PalMod_ECHAM6, 2021. a

Neelin, J. D., Battisti, D. S., Hirst, A. C., Jin, F.-F., Wakata, Y., Yamagata, T., and Zebiak, S. E.: ENSO theory, J. Geophys. Res., 103, 14261–14290, 1998. a

Pauluis, O. and Emanuel, K.: Numerical instability resulting from infrequent calculation of radiative heating, Mon. Weather Rev., 132, 673–686, 2004. a, b

Philander, S. G.: Geophysical Interplays: El Niño, La Niña, and the southern oscillation, International Geophysics Series, vol. 46, Academic Press, San Diego, CA, USA, 293 pp., 1989. a, b

Pincus, R. and Stevens, B.: Paths to accuracy for radiation parameterizations in atmospheric models, J. Adv. Model. Earth Syst., 5, 225–233, 2013. a

Rasp, S.: Statistical methods and machine learning in weather and climate modeling, PhD thesis, Faculty of Physics, Ludwig-Maximilians-Universität, available at: https://edoc.ub.uni-muenchen.de/23867/1/Rasp_Stephan.pdf (last access: 30 November 2021), 2019. a

Salby, M. L.: Fundamentals of atmospheric physics, International geophysics series, vol. 61, Elsevier, New York, USA, 1996. a, b, c

Stevens, B., Giorgetta, M., Esch, M., Mauritsen, T., Crueger, T., Rast, S., Salzmann, M., Schmidt, H., Bader, J., Block, K., Brokopf, R., Fast, I., Kinne, S., Kornblueh, L., Lohmann, U., Pincus, R., Reichler, T., and Roeckner, E.: Atmospheric component of the MPI-M Earth system model: ECHAM6, J. Adv. Model. Earth Syst., 5, 146–172, 2013. a, b, c, d, e

Wallace, J. M. and Hobbs, P. V.: Atmospheric science: an introductory survey, 2nd edn., International geophysics series, vol. 92, Elsevier, https://doi.org/10.1016/C2009-0-00034-8, 2006. a, b, c, d, e, f

Xu, K.-M. and Randall, D. A.: Impact of interactive radiative transfer on the macroscopic behavior of cumulus ensembles. Part I: Radiation parameterization and sensitivity tests, J. Atmos. Sci., 52, 785–799, 1995. a, b

Articles

Short summary

To improve our understanding of climate system dynamics and their variability, the numerical atmospheric model ECHAM6 is used to simulate a complete glacial cycle over the past 120 000 years. However, performing such simulations takes a long time even on state-of-the-art supercomputers. To accelerate the model simulation, we propose calculating radiative transfer processes in parallel with adiabatic processes in the atmosphere, which reduces the simulation time by nearly half.