A multi-year short-range hindcast experiment for evaluating climate model moist processes from diurnal to interannual time scales

We present a multi-year short-range hindcast experiment and its experiment procedure for better evaluating both the mean state and variability of atmospheric moist processes in climate models from diurnal to interannual time scales to facilitate model development. We use the Community Earth System Model version 1 as the base model and perform a suite of 3-day long hindcasts every day starting at 00Z from 1997 to 2012. Three processes – the diurnal cycle of clouds during different cloud regimes over the Central U.S., precipitation and diabatic heating associated with the Madden-Julian Oscillation propagation, and the response of moist processes to sea surface temperature anomalies associated with the El Niño-Southern Oscillation – are evaluated as examples to demonstrate how one can better utilize simulations from this experiment design to gain insights into model errors and their connection to physical parameterizations or large-scale state. This is achieved by comparing the hindcasts with corresponding long-term observations for periods based on different phenomena. These analyses can only be done through this multi-year hindcast approach, which establishes robust statistics of the processes under a well-controlled large-scale environment. Furthermore, comparison of hindcasts to the typical simulations in climate mode with the same model allows one to infer what portion of a model's climate error directly comes from fast errors in the parameterizations of moist processes. As demonstrated here, model biases in the mean state and variability associated with parameterized moist processes usually develop within a few days, and manifest within weeks to affect the simulations of large-scale circulation and ultimately the climate mean state and variability. Therefore, model developers can achieve additional useful understanding of the underlying problems in model physics by conducting a multi-year hindcast experiment.
https://doi.org/10.5194/gmd-2020-39 Preprint. Discussion started: 15 April 2020. © Author(s) 2020. CC BY 4.0 License.

model can complement the traditional way of conducting AMIP-type model evaluation. In the present paper, we propose a multi-year short-range hindcast experiment and its experiment protocol for better evaluating both the mean state and variability of atmospheric moist processes in climate models from diurnal to interannual time scales to facilitate model development. We will demonstrate the unique value of diagnosing systematic model errors from diurnal to interannual time scales with this suite of multi-year short-range hindcasts paired with long-term observations, such as those from various satellites or from major field programs like the U.S. Department of Energy ARM Program. Process-level understanding can be achieved by comparing hindcasts with observations for periods based on the phenomena of interest rather than the climatological mean state. Three processes – the diurnal cycle of clouds during different cloud regimes at the ARM SGP Site, precipitation and diabatic heating associated with the Madden-Julian Oscillation (MJO), and the response of moist processes to sea surface temperature (SST) anomalies associated with the El Niño-Southern Oscillation (ENSO) – are evaluated as examples to gain insights into model errors and their connection to physical parameterizations. We also demonstrate that systematic errors in the mean state of moist processes over the global scale are very robust and do not show significant interannual variation in either error magnitudes or patterns over large spatial domains. The remainder of this manuscript is organized into three sections. Section 2 describes the hindcast experiment design, experiments performed and validation datasets. Section 3 presents three examples of how we can better utilize this suite of multi-year short-range hindcasts to evaluate the variability of moist processes over various time scales. Section 4 presents a summary.

Model and experiment design
All simulations were conducted with the CAM5 (version cesm1_0_5, FC5 compset, Neale et al. 2012) using the finite volume dynamical core at a horizontal resolution of 0.9° latitude by 1.25° longitude and 30 vertical levels. The land model is the Community Land Model version 4.0 (CLM4) with the same horizontal resolution.

The hindcast procedure is based on Ma et al. (2015). We applied the horizontal velocities, temperature, specific humidity and surface pressure from the ERA-Interim Reanalysis (Dee et al. 2011) for the initial atmospheric states. A nudging simulation with CAM5/CLM4 was also performed to acquire other necessary variables (e.g., cloud and aerosol fields) that are not available from the ERA-Interim Reanalysis for the atmospheric initial conditions. The nudging simulation started on 1 January 1996 and ended on 31 December 2012, with a 6 h relaxation time scale. Land initial conditions are taken from an offline land model simulation forced by reanalysis and observations including precipitation, surface winds, and surface radiative fluxes (N. Viovy 2013, unpublished data). The offline land model simulation covered 1990 to 2012, and we performed five cycles (1990 to 2012) to allow proper spin-up of the land conditions. The multi-year hindcast experiment is a suite of 3-day long hindcasts starting at 00Z every day for the years 1997 to 2012 (Figure 1), using the initial conditions obtained from the procedure described above. We concatenated each hindcast from 24-48 (48-72) hours lead time to form a pseudo Day 2 (Day 3) time series of 16 years' duration from 1997 to 2012. Day 1 data are not analyzed to minimize the impact of model spin-up (Ma et al. 2013). We also conducted a 16-year long Atmospheric Model Intercomparison Project (AMIP, Gates 1992) simulation with the same model for the same period. In this AMIP simulation, the state of the atmosphere evolves freely without constraints. Both experiments are prescribed with the National Oceanic and Atmospheric Administration (NOAA) Optimum Interpolation weekly SSTs and sea ice (Reynolds et al. 2002).
To compare with high-temporal-frequency observations collected at the ARM permanent sites as well as at various major field campaign locations within the simulation period, we additionally generated output at every model time step (30-minute interval) at these locations, in addition to output for the entire global domain. Figure 2 and Table 1 identify their geographical locations and output grids.
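The construction of the pseudo Day 2 and Day 3 time series can be illustrated with a minimal sketch. It assumes the hindcasts are already stacked into a NumPy array with one row per 00Z start date; the function name `build_lead_day_series` and the array layout are hypothetical choices for illustration and do not come from the actual processing scripts.

```python
import numpy as np

def build_lead_day_series(hindcasts, steps_per_day, lead_day):
    """Concatenate one lead-day window from every daily hindcast.

    hindcasts: array of shape (n_starts, n_steps, ...), one row per 00Z start,
               where n_steps spans the full 3-day hindcast (assumed layout).
    lead_day:  1-based lead day; 2 selects hours 24-48, 3 selects hours 48-72.
    """
    hindcasts = np.asarray(hindcasts)
    start = (lead_day - 1) * steps_per_day
    stop = lead_day * steps_per_day
    if lead_day < 1 or stop > hindcasts.shape[1]:
        raise ValueError("lead_day is outside the hindcast length")
    window = hindcasts[:, start:stop]
    # Merge (start date, step within day) into a single continuous time axis.
    return window.reshape((-1,) + hindcasts.shape[2:])

# Synthetic example: 4 daily starts, 3-day hindcasts with 6-hourly output.
steps_per_day = 4
hindcasts = np.arange(4 * 12, dtype=float).reshape(4, 12)
day2 = build_lead_day_series(hindcasts, steps_per_day, lead_day=2)
day3 = build_lead_day_series(hindcasts, steps_per_day, lead_day=3)
```

Note that the Day 2 segment from a hindcast started on a given date is valid for the following calendar date, so the time coordinate of the concatenated series must be shifted accordingly before it is paired with observations.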

Comparison datasets
Daily global observational precipitation is taken from the Global Precipitation Climatology Project Version 1.2 (GPCP, Adler et al. 2003). Absorbed shortwave flux at the top of the atmosphere (SWAbs), outgoing longwave radiation (OLR), and net surface shortwave and longwave fluxes are obtained from Clouds and the Earth's Radiant Energy System (CERES) Energy Balanced And Filled (EBAF) observations (Loeb et al. 2009; Kato et al. 2013, Edition 2.8). Total cloud fraction is from the International Satellite Cloud Climatology Project (ISCCP) D2 dataset (Rossow et al. 1999). Global winds and surface turbulent heat fluxes are from the ERA-Interim Reanalysis. Vertical profiles of cloud fraction at the ARM SGP site are from the ARM Best Estimate (ARMBE; Xie et al. 2010) Active Remote Sensing of Clouds data (ARSCL; Clothiaux et al. 2000, 2001). The available time period of each dataset is listed in Table 2. We interpolated all the datasets onto CAM5's grid for comparison.

Example analysis
Our goal here is to demonstrate, through three examples, the usefulness of the multi-year hindcasts in providing a different perspective on several long-standing moist process errors in GCMs.

Cloud regimes at the ARM SGP site
One common application of hindcasts for model evaluation is during major field campaigns, where intensive observations are available at very high temporal resolution. However, field campaigns are usually confined to a short period and cannot determine the robust aspects of certain cloud processes, which are available only from long-term monitoring as provided by satellites or permanent ground-based sites. From over ten years of cloud radar observations at the ARM SGP site, Zhang and Klein (2010)  hindcasts. We use the model cloud fraction for this comparison because the variables needed for a radar simulator (Zhang et al. 2018) were not saved at the time the hindcasts were done. This analysis, which cannot be achieved as easily from the usual AMIP simulations, is a more precise way of evaluating model parameterizations because it minimizes the impact of erroneous large-scale states on the clouds: the atmospheric large-scale state is closer to observations during each diurnal cycle of the hindcasts than it is in the AMIP simulation. Furthermore, multi-year hindcasts provide a sufficient number of events to make a meaningful comparison with observations, so that conclusions from such studies are more statistically robust. In Figure 3, the model overestimates high clouds regardless of cloud regime, even for clear-sky conditions. For clear-sky conditions, the model also shows middle- and low-level clouds. One possible explanation is that the deep convection scheme in the model is triggered whenever the convective available potential energy (CAPE) is larger than 70 J kg⁻¹. During the daytime in the warm seasons, CAPE is usually larger than this threshold and deep convection is easily triggered, resulting in the transport of water vapor and detrainment of cloud condensates. For the shallow convection regime, the model overestimates middle-level clouds by ~4-6% but underestimates shallow clouds by ~10%.
For the afternoon deep convective cloud regime, the model cannot simulate the transition from shallow to deep convective clouds. The deep convection clearly starts too early, at around 11 local time rather than 15 local time. Also, the model underestimates both shallow and middle-level cloud fraction by ~10%. The model completely misses the nighttime convection regime, and only shows some deep convection starting around noon. The too-early afternoon convection and the lack of nocturnal convection over land are common model problems reported in previous studies (e.g., Dai 2006; Jiang et al. 2006; Covey et al. 2016). We do realize that there are already small errors in the Day 2 large-scale state and that they can also contribute to the errors in the simulated cloud fields.
Nevertheless, their impact is still much smaller than that of the errors due to parameterization deficiencies.
With multi-year hindcasts and long-term cloud observations to build up robust statistics, these comparisons help identify specific cloud regime deficiencies under very similar large-scale meteorological conditions, and scheme developers can further focus on improving specific processes represented in the cloud and convection parameterizations.
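The regime-based comparison described above amounts to averaging the diurnal cycle of cloud fraction over all days assigned to a given regime, once for the observations and once for the hindcasts. The sketch below illustrates this under stated assumptions: each day is taken to carry a single regime label, the data are arranged as (day, time of day, level), and the names `composite_diurnal_cycle`, `labels`, `obs` and `model` are hypothetical. The numbers are synthetic and are not results from this study.

```python
import numpy as np

def composite_diurnal_cycle(cloud_fraction, regime_labels, regime):
    """Mean diurnal cycle over all days labelled with one regime.

    cloud_fraction: array of shape (n_days, n_times_per_day, n_levels).
    regime_labels:  sequence of n_days labels (assumed: one label per day).
    Returns the composite (n_times_per_day, n_levels) and the number of days used.
    """
    cloud_fraction = np.asarray(cloud_fraction, dtype=float)
    mask = np.asarray(regime_labels) == regime
    if not mask.any():
        raise ValueError(f"no days labelled {regime!r}")
    # nanmean tolerates gaps in the observational record.
    return np.nanmean(cloud_fraction[mask], axis=0), int(mask.sum())

# Synthetic example: 4 days, 2 times per day, 1 level.
labels = ["clear", "clear", "deep", "clear"]
obs = np.zeros((4, 2, 1))
for day, value in enumerate([0.2, 0.4, 0.9, 0.6]):
    obs[day] = value
model = obs + 0.1  # a uniform, made-up model offset

obs_mean, n_days = composite_diurnal_cycle(obs, labels, "clear")
model_mean, _ = composite_diurnal_cycle(model, labels, "clear")
bias = model_mean - obs_mean
```

Because the same observed regime labels select the days in both datasets, the resulting bias is conditioned on similar large-scale conditions, which is the point of the hindcast framework.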

Model biases associated with MJO
The MJO (Madden and Julian 1971, 1972) is the dominant mode of intra-seasonal variability in the tropics. The MJO has significant impacts on the global water cycle, as it can interact with many weather and climate phenomena. Nevertheless, contemporary GCMs still simulate the MJO poorly, with weak amplitude and a lack of eastward propagation (Jiang et al. 2015; Ahn et al. 2017). Recent studies suggest that the instability and propagation of the MJO are regulated by various feedback processes including cloud-radiation and wind-evaporation feedbacks (Sobel and Maloney 2012, 2013; Adams and Kim 2016; Ciesielski et al. 2017). These feedback processes need to be well represented in GCMs in order to produce a realistic MJO. A particularly relevant process responsible for the eastward propagation of the MJO is the "pre-conditioning" process, consisting of low-level moistening and a shallow convective heating structure at the eastern edge of MJO deep convection (e.g., Jiang et al. 2011; Johnson and Ciesielski 2013; Powell and Houze 2013; Xu and Rutledge 2014). This process destabilizes the environment, encouraging subsequent development of deep convection.
As each MJO event is unique, one can take advantage of the multi-year hindcasts to composite precipitation, winds and diabatic heating profiles (Q1, Yanai et al. 1973) based on observed MJO phases, with the focus on identifying robust model biases associated with the MJO. Figure 4 presents the observed composites of November to April 20-100 day band-pass filtered NOAA Interpolated Outgoing Longwave Radiation (OLR) anomalies and horizontal wind anomalies at 850 mb from ERA-Interim, as a function of the eight phases of the MJO (Wheeler and Hendon 2004). The observed MJO shows a core of deep convection (center of negative OLR anomalies) over the Indian Ocean around 80°E associated with low-level convergence in winds during Phase 2. The core of deep convection slowly propagates eastward, and the intensity of the OLR anomalies decreases after the core of the MJO crosses the Maritime Continent and reaches the central Pacific (Phases 6-8). Figure 5 shows composites of November to April precipitation and horizontal wind biases from Day 3 hindcasts, as well as 20-100 day band-pass filtered NOAA Interpolated OLR anomalies, as a function of the eight phases of the MJO. The observed OLR anomalies are superimposed to highlight the location of the MJO at each phase. We find that there is a dry bias in Day 3 hindcasts over the core of deep convection (center of negative OLR anomalies) associated with the MJO, and a wet bias to the east over the region of suppressed convection (center of positive OLR anomalies), for all the phases as the MJO moves eastward.
The dry bias is largest over the Indian Ocean during Phase 2, with a magnitude of ~ -6 mm day⁻¹, and the wet bias is largest over the western Pacific during Phase 8, with a magnitude of ~5-6 mm day⁻¹. The dry bias is usually attributed to the lack of organized convection in the model (Moncrieff et al. 2017), and the wet bias is consistent with the deep convection scheme being triggered too frequently even under suppressed large-scale conditions. Further, there is a persistent dry bias over Borneo and part of Sumatra for all the phases, indicating a possible local effect of the diurnal cycle of convection. The dry bias is more significant during Phases 4 and 5 as the MJO crosses the Maritime Continent. The 850 mb winds show a biased low-level convergence near the Equator, consistent with the excessive precipitation to the east over the region of suppressed convection.
During Phases 2 and 3, when the MJO is over the Indian Ocean, the anomalous Q1 profiles reveal that the magnitude of shallow heating is very weak (<0.4 K day⁻¹) to the east over the region of suppressed convection between 100°E and 120°E in Phase 2, and that the heating is not restricted to low levels between 120°E and 150°E in Phase 3. Instead, there is an anomalous heating associated with deep convection in Phase 3, which is not evident in the observations reported in previous studies (Jiang et al. 2011). This suggests that the model fails to simulate the pre-conditioning moistening processes and the gradual transition from shallow to deep convection as the MJO propagates.
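The phase compositing used in this section can be sketched as follows. The sketch assumes that a daily MJO phase (1 to 8) and amplitude are available for each hindcast day, for example from an index of the Wheeler and Hendon (2004) type. The amplitude threshold of 1.0 is a commonly used convention and is an assumption here, as are the function name `composite_by_mjo_phase` and the synthetic values.

```python
import numpy as np

def composite_by_mjo_phase(bias, phase, amplitude, min_amplitude=1.0):
    """Average a daily bias field over the days falling in each MJO phase.

    bias:      array of shape (n_days, ...), e.g. model minus observed precipitation.
    phase:     integer MJO phase (1-8) for each day.
    amplitude: MJO index amplitude for each day; weak days are excluded
               (the threshold is an assumed convention, not taken from the text).
    Returns a dict mapping phase number to the composite bias field.
    """
    bias = np.asarray(bias, dtype=float)
    phase = np.asarray(phase)
    amplitude = np.asarray(amplitude, dtype=float)
    composites = {}
    for p in range(1, 9):
        mask = (phase == p) & (amplitude >= min_amplitude)
        if mask.any():
            composites[p] = bias[mask].mean(axis=0)
    return composites

# Synthetic example: 5 days, 2 grid points (values are made up).
phase = np.array([2, 2, 2, 8, 8])
amplitude = np.array([1.5, 0.5, 2.0, 1.2, 1.1])
bias = np.array([[-6.0, 1.0], [0.0, 0.0], [-4.0, 3.0], [2.0, 5.0], [0.0, 7.0]])
composites = composite_by_mjo_phase(bias, phase, amplitude)
```

Selecting days by the observed phase, rather than by a phase diagnosed from the model, keeps the composites tied to the observed MJO location, which is only meaningful because the hindcast large-scale state stays close to the analysis.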

Variations of moist processes associated with ENSO
Being the leading mode of interannual variability in the tropics and extra-tropics, ENSO has a significant impact on both regional and global temperature, circulation, and moist processes through teleconnections. To gain insights into whether errors in the response of these fields to SST anomalies can be attributed to parameterization errors or whether errors in the circulation response to SST anomalies also contribute, one can further contrast the multi-year hindcasts with the behavior of a companion AMIP simulation with the same boundary conditions (SST and sea ice). To this end, we first selected several fields, computed their monthly anomalies, and then regressed these anomalous fields onto the Nino 3.4 index. Figure 7 shows the regression maps of precipitation, SWAbs, surface net flux (from atmosphere to the surface), and the surface zonal wind stress from observations, Day 2 hindcasts and the AMIP simulation (pattern statistics are shown in Table 2). These fields were selected because the tropical response of precipitation represents the atmospheric diabatic heating that forces circulation anomalies, while surface radiation, turbulent heat fluxes, and wind stress provide critical heat and momentum forcings for SST anomalies and govern the ENSO behavior. The performance of these fields in an uncoupled atmospheric GCM is considered highly relevant for evaluation when the GCM is coupled to an ocean model (Sun et al. 2006; Guilyardi et al. 2009). The responses of these fields from Day 2 hindcasts show better agreement with observations, in both spatial pattern and magnitude, than the AMIP response (right column in Figure 7). This is especially evident for precipitation, absorbed shortwave flux and zonal wind stress over the Western North Pacific, South Pacific Convergence Zone and Indian Ocean.
The remote teleconnections may be chaotic or poorly simulated by the model, causing a poor simulation in AMIP mode. The circulation anomalies are well constrained in the hindcasts, and the response to SST anomalies is much better. This shows that remote errors are mostly the result of poor circulation on long time scales, although the poor circulation may be caused by model physics in the first place and deteriorate through feedback processes with time. This is evident because there are still biases in the hindcasts, indicating problems in how the parameterizations represent those responses to SST changes even over the local Nino 3.4 region. Surface net flux and zonal wind stress also show a greater change between the hindcast and AMIP responses than precipitation and SWAbs do. It is reasonable for the latter two moist processes to show smaller changes, as they are fast processes and the biases associated with model parameterizations usually develop within a few days of model integration (Ma et al. 2014). It is also reasonable for zonal wind stress to show a greater change, as the low-level winds are well constrained in the hindcasts. For surface net heat flux, the errors come from various flux terms including radiation and turbulent heat fluxes, which are affected by both model physics and dynamics. Therefore, the net heat flux shows the lowest spatial correlation and root mean square errors in both hindcasts and the AMIP simulation compared to other fields.
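The anomaly and regression steps described in this section can be written compactly. The following is a minimal sketch, assuming monthly data that start in January and span whole years; the function names `monthly_anomalies` and `regress_onto_index` are hypothetical, and the data are synthetic with a known slope so that the result can be checked.

```python
import numpy as np

def monthly_anomalies(field):
    """Remove the mean annual cycle from monthly data.

    field: array of shape (n_months, ...) starting in January and
           covering whole years (assumed layout).
    """
    field = np.asarray(field, dtype=float)
    n_months = field.shape[0]
    if n_months % 12 != 0:
        raise ValueError("expected whole years of monthly data")
    by_year = field.reshape((n_months // 12, 12) + field.shape[1:])
    # Subtract the climatological mean of each calendar month.
    return (by_year - by_year.mean(axis=0)).reshape(field.shape)

def regress_onto_index(anomalies, index):
    """Least-squares slope of each grid point's anomalies against an index."""
    index = np.asarray(index, dtype=float)
    index = index - index.mean()
    anomalies = np.asarray(anomalies, dtype=float)
    anomalies = anomalies - anomalies.mean(axis=0)
    return np.tensordot(index, anomalies, axes=(0, 0)) / np.sum(index ** 2)

# Synthetic check: a field built from a known slope plus a seasonal cycle.
rng = np.random.default_rng(0)
n_years = 4
index = monthly_anomalies(rng.standard_normal(12 * n_years))
true_slope = np.array([[2.0, -1.0], [0.5, 0.0]])
seasonal_cycle = np.tile(np.arange(12.0), n_years)[:, None, None]
field = seasonal_cycle + index[:, None, None] * true_slope
slope_map = regress_onto_index(monthly_anomalies(field), index)
```

Applying the same two functions to observations, the Day 2 series and the AMIP simulation yields regression maps in units of the field per unit of the index, which can then be compared directly.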

Robustness of systematic errors
One question raised by earlier studies (Ma et al. 2014) of the correspondence between short- and long-time-scale errors is whether systematic errors of moist processes show significant interannual variation in the mean state biases. Figure 8 shows the pattern statistics between errors from the individual annual means in the hindcasts or AMIP simulation, and errors in the 16-year mean of the AMIP simulation (the reference fields) for precipitation, total cloud fraction (from the ISCCP cloud simulator), SWAbs, and OLR. Compared to the long-term mean errors in the AMIP simulation, annual mean errors of the individual years for these fields show very similar magnitudes of correlation and normalized spatial standard deviation in the hindcasts at either time lag. This is also the case for individual AMIP years, although the correlations and standard deviations show slightly larger spread. Compared to Day 2 hindcasts, the correlations and standard deviations from Day 3 hindcasts are closer to those from the AMIP simulation, indicating that the bias grows toward the AMIP bias with hindcast lead time. We further find that the magnitude of the correlations for annual mean errors between individual hindcast years and the long-term AMIP simulation is not sensitive to the ENSO phase in a given year for these fields. This is also the case if seasonal means are compared (figures not shown here). These results suggest that mean errors in the moist processes are very robust and do not show significant interannual variations. Indeed, averaging the hindcast errors over many years (indicated by "2" or "3" in Figure 8) only slightly improves the agreement with the AMIP reference field. Thus, one can identify robust model errors in the mean state from only one year of hindcasts with enough ensemble members. A similar conclusion with multiple years of short AMIP-type simulations was also suggested by Wan et al. (2014).
These results suggest that relatively short simulations will be effective at reducing the systematic moist process errors of a very high-resolution climate model for which regular multi-year simulations are too expensive. It is also of interest to compare the absolute magnitude of errors in individual years to that of the long-term systematic error in the AMIP simulation. To do so, we calculated the annual cloud error metrics proposed in Klein et al. (2013), shown in Figure 9.
These metrics are scalar measures of performance in simulating the space-time distribution of several cloud measures, with better performance indicated by smaller E values. It is not surprising that the hindcasts show better performance in all the cloud metrics, as the large-scale circulation and state are not too far from the reanalysis. This is also true for the interannual variations in global mean cloud radiative effect at the top of the atmosphere (Figure 10). We find that all the metrics and the cloud radiative effect show interannual variations, indicating that the circulation and state anomalies make a significant contribution to interannual variations, although these metrics and the errors in the cloud radiative effect are not sensitive to ENSO phase. We further find that there is a larger difference between hindcasts and AMIP in the total cloud amount error metric (ETCA), implying that errors in the large-scale circulation and state make a larger contribution to errors in ETCA than to errors in cloud radiative properties (Figure 10).
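The pattern statistics of the kind shown in Figure 8 (a spatial correlation and a normalized spatial standard deviation between an individual year's error field and the reference error field) can be computed as in the sketch below. Area weighting by the cosine of latitude and the removal of the area mean before computing the statistics are assumptions made here for illustration; the text does not state the exact weighting used. The function name `pattern_statistics` and the synthetic fields are hypothetical.

```python
import numpy as np

def pattern_statistics(test_error, ref_error, lat):
    """Centered pattern correlation and normalized spatial standard deviation.

    test_error, ref_error: error fields of shape (n_lat, n_lon).
    lat: latitudes in degrees, shape (n_lat,); cos(lat) area weighting is assumed.
    Returns (correlation, std of test_error divided by std of ref_error).
    """
    test_error = np.asarray(test_error, dtype=float)
    ref_error = np.asarray(ref_error, dtype=float)
    weights = np.cos(np.deg2rad(np.asarray(lat, dtype=float)))[:, None]
    weights = weights * np.ones_like(ref_error)
    weights = weights / weights.sum()
    # Remove the area-weighted mean so that only the spatial pattern is compared.
    t = test_error - np.sum(weights * test_error)
    r = ref_error - np.sum(weights * ref_error)
    std_t = np.sqrt(np.sum(weights * t ** 2))
    std_r = np.sqrt(np.sum(weights * r ** 2))
    correlation = np.sum(weights * t * r) / (std_t * std_r)
    return correlation, std_t / std_r

# Synthetic check: a field with the same pattern but twice the amplitude.
lat = np.array([-60.0, 0.0, 60.0])
ref_error = np.array([[1.0, -2.0], [0.5, 3.0], [-1.0, 0.0]])
year_error = 2.0 * ref_error + 3.0
correlation, std_ratio = pattern_statistics(year_error, ref_error, lat)
```

In this construction, a correlation near one together with a standard deviation ratio near one would indicate that a single year's error pattern closely matches the long-term reference error.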

Summary
In this study, we propose a multi-year short-range hindcast experiment and its experiment protocol for better evaluating both the mean state and variability of atmospheric moist processes in climate models from diurnal to interannual time scales to facilitate model development. We also demonstrate that one can obtain unique insight into robust systematic errors of GCM moist processes by diagnosing these processes with corresponding observations for periods based on different phenomena. The present study also demonstrates that it is now feasible to systematically evaluate climate model moist processes in deterministic weather-prediction mode, just as the moist processes in weather prediction models are often evaluated in analyses or re-analyses (Jakob 1999; Yang et al. 2006). Results from the multi-year hindcasts also suggest that systematic errors in the mean state of moist processes are very robust and do not show significant interannual variation in error magnitude or patterns over large spatial domains. In addition to the processes indicated above, further studies on monsoon variability (e.g., South American and Asian monsoons, Chen et al. 2019), land-atmosphere interactions (Phillips et al. 2017), or detailed MJO studies with longer hindcast duration (Klingaman et al. 2015) are currently being explored with these hindcasts. Indeed, GCMs usually perform more poorly for climate variability than for the mean state. As demonstrated in previous studies and here, model mean biases associated with parameterized moist processes usually develop within a few days and manifest within weeks to affect the simulations of large-scale circulation and ultimately the climate mean state and variability. Therefore, model developers can achieve useful understanding of the underlying problems in model physics by conducting multiple years of hindcasts, as demonstrated in the present work.
Although newer versions of CAM and CLM are now available (CAM6/CLM5), similar systematic errors associated with moist processes remain present in the latest model version. Therefore, it is still worthwhile to study these hindcasts. In the meantime, we also plan to conduct another suite of multi-year hindcasts with the latest DOE Energy Exascale Earth System Model (E3SM, Golaz et al. 2019). These hindcasts will also be made available to the community once completed.
Finally, the multi-year hindcast approach presented in this study is also intended as one of the experiment protocols that will be used in the Diurnal Cycle of Precipitation (DCP, https://portal.nersc.gov/cfs/capt/diurnal/) model intercomparison project under the Global Energy and Water cycle Exchanges (GEWEX) Global Atmospheric System Studies (GASS). This project aims to understand what processes control the diurnal and sub-diurnal variation of precipitation over different climate regimes in observations and in models. The project will also identify the deficiencies and missing physics in current GCMs to gain insights for further improving the parameterization of convection.

Code and data availability
The model code is CESM1 (cesm1_0_5, FC5 compset, F09_F09 resolution) and is available at http://www.cesm.ucar.edu/models/cesm1.0/. All necessary model input files are available at https://svn-ccsm-     Table 1 for detailed longitude and latitude information.