Stable climate simulations using a realistic GCM with neural network parameterizations for atmospheric moist physics and radiation processes

In climate models, subgrid parameterizations of convection and clouds are among the main sources of biases in simulated precipitation and atmospheric circulation. In recent years, driven by the rapid development of data science, machine learning (ML) parameterizations for convection and clouds have shown the potential to outperform conventional parameterizations. At present, most existing studies use aqua-planet and idealized models, and problems of simulation instability and climate drift persist. Developing a machine learning parameterization scheme in realistically configured models remains a challenging task. In this study, a group of deep residual multilayer perceptrons with strong nonlinear fitting ability is designed to learn a parameterization scheme from cloud-resolving model outputs. Multi-target training is used to best balance the fits across the diverse neural network outputs. The optimal machine learning parameterization, named NN-Parameterization, is then chosen among feasible candidates for both high performance and long-term simulation stability. The results show that NN-Parameterization performs well in multi-year climate simulations and reproduces reasonable climatology and climate variability in a general circulation model (GCM), running about 30 times faster than the superparameterized GCM with embedded cloud-resolving models. Under real geographical boundary conditions, the hybrid ML-physical GCM simulates the spatial distribution of precipitation well and significantly improves the frequency of precipitation extremes, which is largely underestimated in the Community Atmosphere Model version 5 (CAM5) at a horizontal resolution of 1.9°×2.5°. Furthermore, the hybrid ML-physical GCM simulates a stronger Madden-Julian Oscillation signal with a more reasonable propagation speed; in CAM5 this signal is too weak and propagates too fast.
This study is a pioneering effort to achieve multi-year stable climate simulations using a hybrid ML-physical GCM under actual land-ocean boundary conditions. It demonstrates the emerging potential of machine learning parameterizations in climate simulations. https://doi.org/10.5194/gmd-2021-299 Preprint. Discussion started: 29 September 2021. © Author(s) 2021. CC BY 4.0 License.


SPCAM setup and data generation
The GCMs used in this study are CAM5.2, developed by the National Center for Atmospheric Research, and its superparameterized version SPCAM (Khairoutdinov and Randall, 2001; Khairoutdinov et al., 2005). A complete description of CAM5 is given by Neale et al. (2012). The dynamic core of CAM5 has a horizontal resolution of 1.9°×2.5° and 30 vertical levels with a model top at about 2 hPa. To represent moist processes, CAM5 adopts a plume-based treatment of shallow convection (Park and Bretherton, 2009), a mass-flux parameterization scheme for deep convection (Zhang and McFarlane, 1995), and an advanced two-moment representation of cloud microphysical processes (Morrison and Gettelman, 2008; Gettelman et al., 2010). In the AMIP experiments we conducted, CAM5 is coupled to the Community Land Model version 4.0 (Oleson et al., 2010) as the land surface model and uses prescribed sea surface temperatures and sea ice concentrations.
In this study, SPCAM is used to generate the training data. In SPCAM, a two-dimensional (2-D) CRM is embedded in each grid column of the host CAM. The 2-D CRM has 32 grid points in the zonal direction and 30 vertical levels that are shared with the host CAM. The CRM handles convection and cloud microphysics in place of the conventional parameterization schemes, and radiation is calculated on the CRM subgrid to include cloud-radiation interaction at cloud scale (Khairoutdinov et al., 2005), referred to as CRM radiation hereafter. The other physical processes and the dynamic core are computed on the CAM grid as usual. One conceptual advantage of using SPCAM as the reference simulation is that the subgrid and grid-scale processes are clearly separated, making it easy to define the parameterization task for an ML algorithm (Rasp, 2020).

NN-Parameterization
In NNCAM, a DNN emulator is trained on target data from the 2-D CRM and CRM radiation to achieve better results than CAM5 with conventional parameterizations. In the following sections, we refer to this DNN emulator as the NN-Parameterization. As shown in Figure 1, the NN-Parameterization replaces the CRM moist physics and radiation calculations in SPCAM. It is coupled with the dynamic core and the other parameterization schemes in each time integration loop of NNCAM.
NN-Parameterization is a challenging deep learning application. It integrates NNs into a scientific computing program for continuous time integration. In the numerical model system, the prediction errors of the NNs are at risk of being amplified by the continuous iterations with the dynamic core and other physical processes, causing model state drift and even model crashes in NNCAM. This is expected to be even more difficult in GCMs with real land-ocean distributions: because of the energy exchange between the land and the atmosphere, the nonlinearity of the simulation system is more complicated than in idealized aqua-planet models, bringing more uncertainties and more numerical sensitivities. In this section, we propose a DNN-based parameterization that fits the grid average of the 2-D CRM data in SPCAM and, on this basis, achieves high computational performance and stable time integration.

Data sets
The NN-Parameterization, as a deep learning emulator of the CRM and the CRM radiation in SPCAM, is designed to replace physics processes in the host CAM5, including deep and shallow convection, cloud microphysics and macrophysics, and radiation. Therefore, the inputs of this emulator follow the CRM inputs, namely the grid-scale state variables and the forcings contributed by the dynamic core and planetary boundary layer diffusion: the specific humidity, the temperature T, the large-scale water vapor forcing, and the large-scale temperature forcing. Additionally, we select the surface pressure and the solar insolation (SOLIN) at the model top from the radiation module. The outputs of NN-Parameterization are the subgrid-scale tendencies of moisture and temperature at each model level, as well as the net shortwave and longwave radiative fluxes at both the surface and the top of the atmosphere (TOA). The heating here is composed of the moist heating in the CRM and the GCM-grid-averaged radiative heating from the CRM radiation module. Since we use real geography, we also include the direct and diffuse downwelling solar radiation fluxes as outputs of the NN-Parameterization to force the coupled land surface model, which is critical for improving the performance of NNCAM. Specifically, they are the solar downward visible direct flux to the surface (SOLS), the solar downward near-infrared direct flux to the surface (SOLL), the solar downward visible diffuse flux to the surface (SOLSD), and the solar downward near-infrared diffuse flux to the surface (SOLLD). These downwelling solar radiation fluxes, separated into direct and diffuse components, are required by the different land cover types and processes in the land surface model (Mooers et al., 2021). The precipitation is derived from the column integration of the predicted moisture tendency to maintain basic water conservation. Table 1 lists the input and output variables and their normalization factors. There are 30 model levels for each profile variable.
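The column integration that turns the predicted moisture tendency into precipitation admits a simple closed form (a sketch; the sign convention and pressure-coordinate formulation here are our assumptions, not taken verbatim from the paper):

```latex
% Surface precipitation flux implied by the predicted net column drying,
% with q the specific humidity, p pressure, g gravity:
P \;=\; -\frac{1}{g}\int_{p_{\mathrm{top}}}^{p_{s}}
        \left(\frac{\partial q}{\partial t}\right)_{\!\mathrm{NN}} \mathrm{d}p
% P is in kg m^{-2} s^{-1}; net column drying ((dq/dt)_NN < 0) gives P > 0.
```

Deriving precipitation this way, rather than predicting it as an extra output, ties the surface water flux directly to the column moisture budget.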
Therefore, the input vector consists of 122 elements (4 profile variables and 2 scalars), while the 68-element output vector is made of 2 profiles and 8 scalars. All input and output variables are normalized so that they have comparable magnitudes before they are fed into the NN-Parameterization for training, testing, and prognostic validation. The normalization factor for each variable, given in the supplementary codebase, is the maximum of its absolute values.
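A minimal sketch of this max-absolute-value normalization (our own illustration; the variable names, shapes, and random data are assumptions, with the paper's actual factors listed in its Table 1 and supplementary codebase):

```python
import numpy as np

def make_normalizer(train_batch):
    """Per-variable normalization factor: the maximum absolute value of each
    of the stacked input/output columns over the training data."""
    scale = np.max(np.abs(train_batch), axis=0)
    scale[scale == 0.0] = 1.0  # guard against constant-zero columns
    return scale

def normalize(x, scale):
    """Scale every column into [-1, 1] using the training-set factors."""
    return x / scale

# Illustrative shapes only: 122 inputs (4 profiles x 30 levels + 2 scalars)
# per sample, as in the vector layout described above.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 122))
scale = make_normalizer(X)
Xn = normalize(X, scale)
assert np.max(np.abs(Xn)) <= 1.0
```

The same factors computed on the training set would be reused at test time and inside the coupled model, so online inputs are scaled consistently with what the network saw in training.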
In this study, the target model SPCAM is run for 3 years, from January 1, 1997 to December 31, 1999, with a time step of 30 min. The first year of the SPCAM simulation is for spinup; the second and third years are used for training, with a total of 484 million samples. In the training process, 64% of the data are randomly selected for training, 16% are used for validation in every training epoch to monitor performance and prevent overfitting, and the remaining 20% are used for testing.
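The 64/16/20 partition can be sketched as follows (a hypothetical helper for illustration; the paper additionally redraws its validation subset each epoch, which this fixed split does not capture):

```python
import numpy as np

def split_indices(n_samples, seed=0):
    """Random 64% / 16% / 20% train / validation / test index split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.64 * n_samples)
    n_val = int(0.16 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(100_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 64000 16000 20000
```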

Multi-Target ResMLP
In this study, the NN-Parameterization is an isolated-column subgrid parameterization. This means that the inputs and outputs of NN-Parameterization are both 1-D vectors. Compared with machine learning tasks such as computer vision and natural language processing, the input and output data of NN-Parameterization are relatively simple. Therefore, we choose the commonly used multilayer perceptron (MLP) to achieve better generalization. In developing the NN parameterization scheme, we found that the NN has different fitting capabilities for different types of output variables. It usually has higher accuracy in predicting radiative fluxes and the temperature tendency, while the accuracy is lower in predicting the moisture tendency. This was also found by Gentine et al. (2018), in which the coefficient of determination (R²) of the radiative heating tendency is higher than that of the moisture tendency at most model levels. The physical processes behind the data, representing convection and radiation, are different and introduce high nonlinearity into the NN-Parameterization.
The distributions of the different physical variables also vary greatly. Using a single NN with one stacked target to train relatively independent physical variables, i.e., the moisture tendency, the temperature tendency, and the radiation fluxes, inevitably causes mutual interference. Since gradient descent is applied to optimize the network during training, mutual interference between different targets can cause the gradient directions used for descent to cancel out (Crawshaw et al., 2020; Zhang et al., 2021) and ultimately hinder the convergence of the network.
As shown in Figure 2, we divide the single network prediction target, a single stacked 1-D output, into multiple targets, and use multiple NNs to train these targets separately. By doing so, we avoid gradient cancellation between targets and improve both the convergence speed and the fitting accuracy when training the networks. Even after the division into multiple targets, the learning targets of NN-Parameterization still contain a great deal of nonlinearity.
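The target division above can be sketched as slicing the stacked 68-element output into independent per-network targets (a hypothetical grouping for illustration; the exact ordering and grouping follow the paper's Table 1 and may differ in detail):

```python
import numpy as np

# Assumed layout of the 68-element output vector: two 30-level tendency
# profiles followed by 8 scalar radiation fluxes, each trained by its own NN.
TARGET_SLICES = {
    "moisture_tendency":    slice(0, 30),   # 30-level profile
    "temperature_tendency": slice(30, 60),  # 30-level profile
    "radiation_fluxes":     slice(60, 68),  # 8 scalar fluxes
}

def split_targets(y):
    """Split a stacked (n_samples, 68) target array into per-network targets,
    so each group is optimized without interference from the others."""
    return {name: y[:, sl] for name, sl in TARGET_SLICES.items()}

y = np.zeros((5, 68))
targets = split_targets(y)
assert targets["moisture_tendency"].shape == (5, 30)
assert targets["radiation_fluxes"].shape == (5, 8)
```

Each group then gets its own loss and its own gradient-descent trajectory, which is what removes the cross-target gradient cancellation.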
This research therefore constructs a deeper neural network so that the NNs have stronger nonlinear fitting abilities. As the depth of a fully connected neural network increases, its performance saturates and then begins to decline as the depth increases further (He et al., 2016). For this reason, we use residual blocks to solve the problem of network degradation. By testing different depths and widths, we determined the network architecture with the optimal performance. As shown in Figure 3, a total of 7 residual blocks are used to form a deep neural network, and each residual block includes two 512-node-wide dense layers connected with a skip connection. The resulting deep neural network, with 14 layers and 3.5 million parameters, is named ResMLP hereafter. As shown in Figure 4, under multi-target training with the same hyperparameters, ResMLP's fitting accuracy (R²) for the different output variables is better than that of the MLP, achieving a balanced prediction, that is, the R² of each variable, especially the moisture and temperature tendencies, is as close to 1 as possible. Ott et al. (2020) found that, in aqua-planet configurations, the fitting accuracy of an NN parameterization is positively correlated with the stability of prognostic validation. In real-world prognostic validation, we also find that the fitting accuracy of NN-Parameterization has an important impact on stability. When NN-Parameterization is underfitted, NNCAM can only run for a few days before it crashes. To obtain a feasible NN-Parameterization, a good fit is necessary. As shown in Table 2, after multiple trials, we determined a hyperparameter configuration of ResMLP with high fitting accuracy, which supports stable NNCAM runs of at least several months.
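A PyTorch sketch of a ResMLP of this shape, i.e., 7 residual blocks of two 512-wide dense layers with skip connections (the activation function and the input/output projection layers are our assumptions; the true architecture is the one in Figure 3):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Two 512-wide dense layers bridged by an identity skip connection."""
    def __init__(self, width=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, width),
        )
        self.act = nn.ReLU()

    def forward(self, x):
        # Residual connection: the block learns a correction to its input.
        return self.act(self.body(x) + x)

class ResMLP(nn.Module):
    """7 residual blocks (14 dense layers) between input/output projections.
    Default output size assumes a 30-level tendency profile target."""
    def __init__(self, n_in=122, n_out=30, width=512, n_blocks=7):
        super().__init__()
        self.proj_in = nn.Linear(n_in, width)
        self.blocks = nn.Sequential(*[ResBlock(width) for _ in range(n_blocks)])
        self.proj_out = nn.Linear(width, n_out)

    def forward(self, x):
        return self.proj_out(self.blocks(torch.relu(self.proj_in(x))))

model = ResMLP()
y = model(torch.zeros(4, 122))
assert y.shape == (4, 30)
```

With these assumed projections the parameter count lands near 3.8 million, in the same range as the roughly 3.5 million quoted above.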
However, we find that a high R² does not guarantee the stability of real-world prognostic validation. We trained several groups of ResMLPs with doubled samples and epochs, increasing their R² by 1% to 2%. Surprisingly, NNCAM using these ResMLPs crashed even earlier than before. Thus, high fitting accuracy is necessary but not sufficient for both high-performance and long-term simulations. In this work, we propose a trial-and-error method to effectively find the optimal neural networks that guarantee both multi-year and high-performance simulations. First, we used the hyperparameter configuration in Table 2 as a baseline and prepared 50 groups of ResMLPs with similar R² as candidate models, using different training samples and epochs. Second, we conducted comprehensive prognostic tests on these candidate networks and obtained the feasible NN-Parameterization schemes that can support stable NNCAM simulations over multiple years. In our experience, a relatively short NNCAM simulation is enough to screen out the feasible networks for stable simulation: the ResMLP groups that cannot support long-term integration caused NNCAM to collapse within half a year of simulation, whereas the feasible groups coupled stably in NNCAM for the first half year and showed no sign of crashing in the 10-year prognostic simulation. Finally, the optimal NN-Parameterization was selected as the best-performing ResMLP group among the feasible candidates.
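The two-stage screening above can be summarized in a few lines (a sketch with an entirely hypothetical interface; `run_nncam` stands in for the actual coupled NNCAM run, which is far too expensive to call like this in practice):

```python
def screen_candidates(candidates, run_nncam, short_months=6, long_years=10):
    """Two-stage trial-and-error screening: candidates with similar offline
    R^2 first face a short coupled run; survivors are promoted to the
    long prognostic simulation."""
    feasible = [m for m in candidates if run_nncam(m, months=short_months)]
    stable = [m for m in feasible if run_nncam(m, months=12 * long_years)]
    return stable

# Toy stand-in for the coupled model: a candidate "survives" a run if its
# made-up stability score (in months) covers the requested run length.
run = lambda m, months: m["stability"] >= months
survivors = screen_candidates(
    [{"id": 0, "stability": 3}, {"id": 1, "stability": 200}], run)
assert [m["id"] for m in survivors] == [1]
```

The short first stage does the heavy filtering, consistent with the observation that unstable candidates crash within the first half year.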
implementation ensures that it can be flexibly deployed in GCMs. However, embedding a DNN in NNCAM is still a troublesome task, with no existing coupling framework supporting many of the latest network structures. This problem restricts developers from building more powerful DNNs and deploying them in NNCAM.
In this research, we regard the NN-Parameterization as a component model coupled to NNCAM. We developed a coupler to bridge the NN-Parameterization with the host CAM5. Through this coupler, the neural network communicates with the dynamic core and the other physical schemes in NNCAM at each time step. When NNCAM is running, as shown in ① of Figure 5, the coupler receives the state and forcing output from the dynamic core of the Fortran-based CAM5. For each input variable, we use the native MPI interface in CAM5 to gather the data of all processes into a tensor on the master process. Then, as shown in ② of Figure 5, the coupler transmits the gathered tensor through the data buffer to the NN-Parameterization running on the same node as the master process. The NN-Parameterization receives the input, infers the outputs, and transmits them back to the coupler.
As shown in ③ of Figure 5, the coupler first writes these tendencies and radiation fluxes back to the master process and then broadcasts the data to the CAM5 processes running on the computing nodes through the MPI transmission interface. The other parameterizations then obtain the predictions from the NN-Parameterization to complete the follow-up procedures (④ in Figure 5).
In practice, the DNN-GCM Coupler introduces a data buffer with a system-level interface that is accessible by both the Fortran-based GCM and the Python-based DNN without supplementary foreign-function code. This avoids code compatibility issues when building machine-learning-coupled numerical models. It supports all mainstream machine learning frameworks, including native PyTorch and TensorFlow. Based on the coupler, one can efficiently and flexibly deploy deep learning models in NNCAM and even take advantage of the latest neural network developments.
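The Python side of such a file-backed buffer can be sketched as follows (our own illustration, not the paper's implementation; the buffer path, dtype, and layout are assumptions, and in NNCAM the writing side would be Fortran stream I/O using the same agreed binary layout):

```python
import numpy as np

BUF = "nn_buffer.bin"  # hypothetical buffer file on the shared SSD

def write_buffer(arr, path=BUF):
    """GCM side (emulated here in Python): dump the gathered input tensor
    as a flat float64 stream that any language can read back."""
    arr.astype(np.float64).tofile(path)

def read_buffer(n_cols, path=BUF):
    """NN side: read the flat buffer and restore the (n_samples, n_cols)
    layout agreed on by both programs."""
    flat = np.fromfile(path, dtype=np.float64)
    return flat.reshape(-1, n_cols)

# Round-trip a toy batch of three 122-element input vectors.
x = np.arange(122 * 3, dtype=np.float64).reshape(3, 122)
write_buffer(x)
assert np.array_equal(read_buffer(122), x)
```

Because only a raw byte stream crosses the language boundary, neither side needs bindings to the other's runtime, which is the compatibility property the coupler relies on.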
All neural network models deployed through the DNN-GCM Coupler can use GPU-accelerated inference to achieve excellent computing performance. In this study, we ran SPCAM and NNCAM on 196 CPU cores; NNCAM also used 2 GPUs for acceleration. During the NNCAM runtime, each time step requires the NN-Parameterization to complete an inference and exchange data with NNCAM, a typical high-frequency communication scenario. We evaluated the amount of data that needs to be transmitted in each communication (about 20 MB for CAM5 at the horizontal resolution of 1.9°×2.5°) and decided to establish the data buffer on a high-speed solid-state drive to balance performance and compatibility. Accessing the data buffer takes about 1×10⁻² s per time step, which is sufficient to support efficient NNCAM simulation. The Simulation Years per Day (SYPD) of NNCAM based on the DNN-GCM Coupler is nearly 30 times higher than that of SPCAM and reaches half the speed of CAM5.

Offline Validation of NN-Parameterization
To assess how well the NN-Parameterization learns the subgrid tendencies from the CRM and the CRM radiation, it is necessary to conduct an offline evaluation before the prognostic experiments. We choose mean fields and the coefficient of determination (R²) as the two metrics in the offline testing. The mean diabatic heating and drying rates produced by convection and large-scale condensation in SPCAM and NN-Parameterization are in close agreement. Figure 6 shows the latitude-height cross-sections of the annual mean heating and moistening rates in SPCAM and the corresponding NN-Parameterization. At 5°N, SPCAM shows maximum latent heating in the deep troposphere, corresponding to deep convection at the ITCZ. In the subtropics, there is heating and moistening in the lower troposphere, corresponding to subtropical stratocumulus and shallow convection. In the midlatitudes, there is a secondary heating maximum below 400 hPa due to the midlatitude storm tracks. All these features are well reproduced by the NN-Parameterization. Note that in the midtroposphere, the ITCZ peak in the drying rates is slightly weaker in NN-Parameterization than in SPCAM (Figures 6c and 6d).
In addition to the mean fields, the high prediction skill of the NN-Parameterization is also shown in the spatial distribution of R². We choose pressure-latitude cross-sections to better demonstrate R² for 3-D variables such as the diabatic heating and moistening; zonal averages are therefore calculated in advance. For the diabatic heating, R² is above 0.7 over the entire mid to low troposphere, and the high-skill regions with R² greater than 0.9 are concentrated at low levels but extend into the midtroposphere in the storm tracks (Figure 7a). For the moistening rate, the high-skill zones are concentrated in the mid to upper troposphere (Figure 7b), leaving low-skill areas below. The regions with low accuracy are generally located in the mid to low troposphere in the tropics and subtropics, corresponding to deep convection at the ITCZ and shallow convection in the subtropics. The global distribution of R² for the derived precipitation is shown in Figure 8. Our NN-Parameterization shows great prediction skill globally, especially in the midlatitude storm tracks. The prediction skill is relatively low in many areas between 30°S and 30°N and over some midlatitude continents. In particular, the precipitation prediction skill is not ideal in the ITCZ deep convection regions. Moreover, for shallow convection in the subtropical eastern Pacific and subtropical eastern Atlantic, the precipitation prediction skill reaches its minimum, corresponding to the subtropical low-skill zones for the moistening rate (Figure 7b). Generally, the NN-Parameterization shows high performance in the offline testing of both mean fields and R².
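The offline skill metric used throughout this section is the standard coefficient of determination, which can be computed as follows (a minimal sketch; the paper would evaluate it per variable and per grid cell over the held-out test set):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot: the fraction
    of the target variance explained by the prediction."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([1.0, 2.0, 3.0, 4.0])
assert r2_score(y, y) == 1.0                      # perfect prediction
assert r2_score(y, np.full(4, y.mean())) == 0.0   # mean predictor baseline
```

R² of 1 means a perfect fit, 0 means no better than predicting the climatological mean, so the 0.7 and 0.9 contours above mark where the emulator explains most of the CRM's variance.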

Long-term Prognostic Validation
As described in Section 2.2, the NN-Parameterization is coupled in NNCAM to replace the conventional deep and shallow convection, microphysics and macrophysics, and radiation parameterizations, while the planetary boundary layer diffusion and the dynamic core are kept in the host GCM. Starting from the SPCAM checkpoint on July 1, 1998, NNCAM is run for nine and a half years, to December 31, 2007. After a half-year spinup, we choose the next four years, from January 1, 1999 to December 31, 2002, for evaluation. Although a long-term stable run of NNCAM is achieved, a very slow climate drift is still inevitable; thus, we do not include the last 5 years in the analysis. In addition, as the reference target model, SPCAM is only run from January 1, 1997 to December 31, 1999 due to its excessive consumption of computing resources. We use the results of its last two years for comparison with the NNCAM simulations. CAM5, as the coarse-grid conventionally parameterized control model, is run from January 1, 1998 to December 31, 2002; similarly, the first year is for spinup and the last four years are for analysis. In the analysis of the prognostic results, the following are selected to demonstrate climatology and variability: multi-year mean fields of temperature and humidity, precipitation, and the Madden-Julian Oscillation.

Vertical profiles of temperature and humidity
In this section, we first evaluate the vertical structure of the mean temperature and humidity. Figure 9 presents the zonally averaged vertical profiles of air temperature and specific humidity as simulated by NNCAM and CAM5, in contrast to the SPCAM simulations. Overall, NNCAM does an excellent job of reproducing the thermal structure. The resulting mean state in temperature and specific humidity of NNCAM closely resembles that of SPCAM throughout the troposphere. The larger deviations are temperature biases at the tropopause, where the cold-point region is thinner and warmer in NNCAM than in SPCAM and CAM5. In addition, there are cold biases above 200 hPa over the polar regions and slight dry biases over the equator in NNCAM.

Precipitation
Figure 10 shows the spatial distributions of winter (December-January-February) and summer (June-July-August) mean precipitation simulated by SPCAM, NNCAM, and CAM5, with the SPCAM results as the reference precipitation. In SPCAM (Figures 10a and 10b), massive precipitation is found in the Asian monsoon region and in the midlatitude storm tracks over the northwest Pacific and Atlantic oceans. In the tropics, the primary rainfall peaks are in the eastern Indian Ocean and the Maritime Continent. Furthermore, two zonal precipitation bands located at 0°-10°N in the equatorial Pacific and Atlantic oceans constitute the northern ITCZ. The southern South Pacific Convergence Zone (SPCZ) is mainly located around 5°S-10°S near the western Pacific warm pool and tilts southeastward as it extends into the central Pacific. The main spatial patterns of the SPCAM precipitation climatology are properly reproduced by both NNCAM and CAM5. In NNCAM, strong rainfall centers are well simulated over the tropical land regions of the Maritime Continent, the Asian monsoon region, South America, and Africa (Figures 10c and 10d). In addition, the heavy summertime precipitation over the northwestern Pacific is well represented in both SPCAM and NNCAM (Figures 10b and 10d). In CAM5, there is too little precipitation over that area (Figure 10f), a common bias in many GCMs. However, the equatorial region is too dry in NNCAM, especially over the sea surface in boreal winter (Figure 10c). Figure 11 shows the annually averaged zonal mean precipitation from SPCAM, NNCAM, and CAM5. NNCAM generally reproduces latitudinal variations very similar to those of SPCAM, but with weaker precipitation intensity near the equator. The precipitation of both NNCAM and CAM5 agrees with SPCAM in the global annual average (Figure 11a). The NNCAM results are even closer to SPCAM than those of CAM5 in boreal summer (Figure 11f). In the annual and DJF averages, the precipitation of NNCAM is underestimated over tropical continents but overestimated in subtropical land regions compared with the SPCAM targets (Figures 11d, e). When the ocean areas are included, the SPCZ in NNCAM is excessively separated from the ITCZ in boreal winter.

Madden-Julian Oscillation
The MJO is characterized by the eastward propagation of deep convective structures along the equator with an average phase speed of around 5 m s⁻¹. It generally forms over the Indian Ocean, strengthens over the Pacific, and weakens in the eastern Pacific due to interaction with cooler SSTs (Madden and Julian, 1971). Figure 14

Conclusions and Discussion
This study investigates the potential of DNN-based parameterizations embedded in SPCAM for reproducing long-term climatology and climate variability. We present NN-Parameterization, a group of organized ResMLPs, to emulate convection, cloud processes, and radiation in an SPCAM with a realistic global land-ocean distribution. The input variables of the NN-Parameterization include the specific humidity, temperature, large-scale water vapor and temperature forcings, surface pressure, and solar insolation. The output variables of the NN-Parameterization consist of the subgrid tendencies of moisture and temperature as well as radiation fluxes. The output variables are divided into multiple groups, and each group is trained as one target by an independent neural network following the same architecture. Both long-term stable and high-performance climate simulations are obtained in this work. To effectively bridge the host CAM and NN-Parameterization, we extend the coupler concept of earth system models to flexibly embed DNNs into the GCM. As a result, we can efficiently use DNNs to construct an NN-Parameterization that supports stable NNCAM simulations over multiple years. This study is the first attempt to achieve stable climate simulations using a hybrid ML-physical GCM configured with real land-ocean distributions.
The offline test shows the great skill of the NN-Parameterization in emulating the CRM and CRM radiation in SPCAM.
The overall diabatic heating and drying rates in the NN-Parameterization and SPCAM are in close agreement. When implemented in the host coarse-grid CAM5 to replace the conventional schemes of moist physics and radiation, the NN-Parameterization succeeds in an extensive long-term stable prognostic simulation and performs well in reproducing the mean vertical structures of temperature and humidity and the precipitation distributions. The prominent MJO signal and its phase speed are also well captured using our NN parameterization.
Machine learning parameterizations implemented in aqua-planet-configured 3-D GCMs have been well studied in previous research. Some faced instability in coupled simulations (Brenowitz and Bretherton, 2019), some tried to solve such instability through online learning (Rasp, 2020), and others achieved long-term stable prognostic simulations with deep fully connected neural networks (Rasp et al., 2018; Yuval et al., 2021) as well as random forests (Yuval and O'Gorman, 2020).