Short ensembles: an efficient method for discerning climate-relevant sensitivities in atmospheric general circulation models

This paper explores the feasibility of an experimentation strategy for investigating sensitivities in fast components of atmospheric general circulation models. The basic idea is to replace the traditional serial-in-time long-term climate integrations by representative ensembles of shorter simulations. The key advantage of the proposed method lies in its efficiency: since fewer days of simulation are needed, the computational cost is lower, and because individual realizations are independent and can be integrated simultaneously, the new dimension of parallelism can dramatically reduce the turnaround time in benchmark tests, sensitivity studies, and model tuning exercises. The strategy is not appropriate for exploring sensitivity of all model features, but it is very effective in many situations. Two examples are presented using the Community Atmosphere Model, version 5. In the first example, the method is used to characterize sensitivities of the simulated clouds to time-step length. Results show that 3-day ensembles of 20 to 50 members are sufficient to reproduce the main signals revealed by traditional 5-year simulations. A nudging technique is applied to an additional set of simulations to help understand the contribution of physics–dynamics interaction to the detected time-step sensitivity. In the second example, multiple empirical parameters related to cloud microphysics and aerosol life cycle are perturbed simultaneously in order to find out which parameters have the largest impact on the simulated global mean top-of-atmosphere radiation balance. It turns out that 12-member ensembles of 10-day simulations are able to reveal the same sensitivities as seen in 4-year simulations performed in a previous study. In both cases, the ensemble method reduces the total computational time by a factor of about 15, and the turnaround time by a factor of several hundred.
The efficiency of the method makes it particularly useful for the development of high-resolution, costly, and complex climate models.


Introduction
Climate, by definition, is the statistical characterization of the state of the earth's atmosphere, land, and ocean on time scales longer than a few months (e.g., IPCC, 2013). Because of the strong natural variability resulting from non-linear interactions between relevant processes, atmospheric general circulation models (AGCMs) used in sensitivity studies need to be integrated for multiple years, usually decades, in order to obtain statistically meaningful and robust signals. However, state-of-the-art AGCMs are computationally expensive to integrate when resolution is high, or when a large number of simulations are needed. Recent examples of such studies include those of Wehner et al. (2013), Zhao et al. (2013), Yang et al. (2012, 2013), and Qian et al. (2014), to name a few.
The high computational costs have motivated researchers to look for alternative methods to facilitate extracting signals from noise in climate models. For example, Kooperman et al. (2012) showed that anthropogenic aerosol indirect effects could be estimated from substantially shorter simulations if temperature and horizontal winds in the AGCM are relaxed (nudged) towards prescribed conditions to reduce variability in those fields, while allowing the model to calculate the responses to aerosol emissions in cloud, water, and aerosol fields. For more general applications, however, nudging can hide sensitivities in the constrained fields, as well as in feedback that involves these quantities.
In the climate modeling community, it has been widely recognized that fast processes (those that produce a model response to a perturbation on a timescale of days for simulations with fixed sea-surface temperature, such as those related to clouds) are important sources of discrepancies between the observed and simulated climate, and between the future climate projections provided by different models (Cess et al., 1990; Colman, 2003; Soden and Held, 2006; Ringer et al., 2006; Dufresne and Bony, 2008). In addition, it has been noticed that, when climate models are used in short-range weather-prediction experiments starting from realistic initial conditions, many of the key model biases form within a very short time period, i.e., hours to a few days (see e.g., Phillips et al., 2004; Williamson et al., 2005; Boyle et al., 2005; Rodwell and Palmer, 2007; Martin et al., 2010; Xie et al., 2012; Ma et al., 2013, 2014; Klocke and Rodwell, 2013). There has been increasing interest in running climate models in weather-prediction mode to diagnose model errors. The most well-known examples include the Climate Change Prediction Program–Atmospheric Radiation Measurement (CCPP-ARM) Parameterization Testbed (CAPT) initiative of the US Department of Energy (Phillips et al., 2004), and the phase II experiment of the Transpose Atmospheric Model Intercomparison Project (Transpose-AMIP II, Williams et al., 2013) that was run alongside phase 5 of the Coupled Model Intercomparison Project (CMIP5, Taylor et al., 2012).
In this study, we demonstrate that the important role of fast processes in the climate system can be exploited in more general ways to provide an alternate strategy to efficiently carry out model-sensitivity experiments and tuning exercises. The basic idea is to replace the traditional serial-in-time long-term climate integrations by generating representative ensembles of shorter simulations (details are discussed in later sections). Significant gain in computational efficiency can be expected for two reasons: firstly, unlike a serial-in-time multi-year simulation, the ensemble of realizations can be integrated simultaneously. This introduces an additional dimension of parallelism to better exploit modern supercomputer systems that consist of order 10^5-10^6 cores, leading to substantial reduction of the turnaround time in sensitivity experiments. Secondly, in comparison to a long-term integration which can be understood as an ensemble with autocorrelated realizations, the use of independent members increases the effective sample size. One can thus expect equally robust statistics to be obtained from a smaller number of simulation days, resulting in a reduction of total CPU time.
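The sample-size argument can be illustrated with a textbook approximation that is not taken from this study: if a daily time series is idealized as a first-order autoregressive process with lag-1 autocorrelation r, the number of effectively independent samples among n consecutive values is roughly n(1 − r)/(1 + r). The sketch below uses a hypothetical function name and an assumed, purely illustrative value of r.

```python
def effective_sample_size(n, lag1_autocorr):
    """Approximate number of independent samples among n AR(1)-correlated values.

    Assumption (not from the paper): the series behaves like an AR(1) process.
    """
    r = lag1_autocorr
    if not -1.0 < r < 1.0:
        raise ValueError("lag-1 autocorrelation must lie strictly between -1 and 1")
    return n * (1.0 - r) / (1.0 + r)


# Illustrative numbers only: 450 consecutive days with an assumed r = 0.8
serial_days = 450
n_eff = effective_sample_size(serial_days, 0.8)
print(round(n_eff))  # 450 * 0.2 / 1.8 = 50
```

Under these assumed numbers, 450 serially correlated days would carry about as much information on the mean as 50 independent realizations, which is the intuition behind the expected saving in simulation days.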
While the Transpose-AMIP-type evaluation focuses on comparison with observations to understand the initial development of model biases, in this study we are interested in model sensitivities to parametric and structural changes near the model's equilibrium climate. Using the Community Atmosphere Model, version 5 (CAM5, Neale et al., 2010, cf. Sect. 2), we present two examples to elaborate the ensemble strategy and evaluate its effectiveness (Sects. 3 and 4). Further discussions and conclusions are given in Sect. 5.

Model and initial conditions
The climate model used here is CAM5.1 (Neale et al., 2010), with a finite volume dynamical core that uses the numerical schemes of Lin and Rood (1996) and Lin (2004) to represent the hydrostatic adiabatic fluid dynamics and large-scale tracer transport. Deep convection is treated with the mass-flux-type parameterization of Zhang and McFarlane (1995), with further modifications by Richter and Rasch (2008) and Neale et al. (2008). Shallow convection is parameterized as in Park and Bretherton (2009). Large-scale condensation and stratiform cloud fraction are handled by the parameterization of Park et al. (2014). The stratiform cloud microphysics is represented by a two-moment scheme that explicitly calculates the mass and number concentrations of cloud liquid, cloud ice, rain, and snow (Morrison and Gettelman, 2008; Gettelman et al., 2008, 2010). The vertical transport of heat, momentum, and moisture by turbulent eddies is represented following the work of Bretherton and Park (2009). Solar and terrestrial radiation calculations are performed using the Rapid Radiative Transfer Model for GCMs (RRTMG, Iacono et al., 2008; Mlawer et al., 1997). The life cycle of aerosols is represented with a comprehensive module that describes the aerosol size distribution with three log-normal modes (MAM3, Liu et al., 2012). Land surface processes, including hydrological and biogeochemical processes, dynamical vegetation and biogeophysics, are handled by the Community Land Model, version 4 (CLM4, Lawrence et al., 2011). A detailed description of the CAM5 model can be found in Neale et al. (2010). All simulations in the present paper used the tropospheric version of CAM5 with 30 vertical layers, at a horizontal resolution of 1.9° latitude × 2.5° longitude. The default model time step for this configuration is 30 min.
As mentioned in the introduction, the motivation for exploring a new experimentation strategy is to reduce the wall-clock time and CPU time spent on model integration. We thus intend to perform as few simulations as possible, each as short as possible. This requires the ensemble members to be appropriately sampled, so that the ensemble average is representative of the long-term climate. Based on the viewpoint that climate is the "average weather", we initialize individual realizations using atmospheric states representing different synoptic patterns of the large-scale circulation. The source of such initial conditions could be global weather analyses, as done in CAPT and Transpose-AMIP, which would require interpolation and adjustments to take into account the different grids and topography used for the analysis data and by the CAM5 model. Initialization of the aerosol module would remain an issue, because detailed information about aerosol concentrations in different size ranges is not normally provided by the analyses. Considering that our focus here is not to compare with observations, it is not necessary to have a realistic initialization of model state variables that matches particular meteorological events. We therefore chose to use initial conditions generated by the GCM itself, using an inexpensive model configuration. For the application examples discussed in the present paper, the CAM5 model was integrated for 20 years at a 1.9° latitude × 2.5° longitude resolution using the default choices for model parameters and model time step, driven by annually cycled monthly mean climatological sea-surface temperature distributions and sea-ice concentrations. Emissions of aerosols and reactive gases are specified by their values in the year 2000 following Lamarque et al. (2010). Model state variables, including the meteorological fields, aerosol concentrations, and land surface variables, are archived at 5-day intervals in the "native" format of the initial condition files.
This initialization procedure requires minimal effort because output from a prior simulation can be used directly in the ensemble simulations, or conveniently interpolated for studies that involve different spatial resolutions. Also, the same archive can be used in different sensitivity studies. When the model configuration (e.g., parameters, resolution, or time step) changes, the simulated climate can change accordingly, in which case the integrations starting from the aforementioned initial conditions will need some time to adjust before entering the new quasi-equilibrium. Identification of the spin-up phase is one of the issues that we attempt to address in the following sections, and we will demonstrate that the initial adjustment is indeed short in the examples shown in Sects. 3 and 4.

Example I: time-step sensitivity of clouds
In this section, the utility of the ensemble approach is demonstrated using simulations in which the model time step used in CAM5 is reduced from the default value of 30 to 4 min. We focus on cloud and precipitation-related model variables. Our example is motivated by the desire (in a separate study with a focus on scientific issues) to characterize the time-step sensitivities of the atmospheric water cycle in CAM5, and to assess numerical convergence. Since we also want to distinguish different climate regimes, the analysis here focuses on a particular season (boreal winter) to avoid the additional complexity introduced by seasonal variations in geographic locations.
For evaluation purposes, two simulations (with 30 and 4 min time steps, respectively) were first performed for 5 years (plus a 1-year spin-up) in the conventional way. Sensitivities in the simulated climate were identified by comparing fields from the multi-year December-January-February (DJF) averages of the two simulations. This pair of simulations is hereafter regarded as the "reference" simulations.
We also performed ensembles of short simulations with the two time-step lengths. Within an ensemble, all realizations were assigned a start time of 1 January using different initial conditions drawn from the 20-year archive (cf. Sect. 2) of dates in the DJF season that were at least 10 days apart, in order to ensure independence and representativeness. The same set of initial conditions was used for the 30 and 4 min ensembles.
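The selection of initial conditions described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: it assumes a 365-day calendar, counts archive days from the start of the 20-year run, and picks DJF states greedily so that consecutive selections are at least 10 days apart. All names are hypothetical.

```python
DAYS_PER_YEAR = 365     # assumption: calendar without leap years
ARCHIVE_INTERVAL = 5    # days between archived states (from the text)
MIN_SEPARATION = 10     # minimum spacing between selected states (from the text)


def is_djf(day_of_year):
    """True for January-February (days 0-58) and December (days 334-364), 0-based."""
    return day_of_year < 59 or day_of_year >= 334


def select_initial_conditions(n_years=20):
    """Return archive days (counted from the start of the run) usable as initial states."""
    selected = []
    for day in range(0, n_years * DAYS_PER_YEAR, ARCHIVE_INTERVAL):
        if not is_djf(day % DAYS_PER_YEAR):
            continue
        # Greedy choice: keep a state only if it is far enough from the previous one
        if not selected or day - selected[-1] >= MIN_SEPARATION:
            selected.append(day)
    return selected


initial_days = select_initial_conditions()
print(len(initial_days), initial_days[:8])
```

Ensembles of a given size would then use a subset of this list, with the same subset shared by the 30 and 4 min experiments.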
Other aspects of the simulation set-up were identical for the long-term and short ensemble simulations. For example, both were forced by yearly cycled climatological sea-surface temperature and sea-ice concentrations, as well as the year 2000 emissions for aerosols and their precursors (Lamarque et al., 2010).
In the analysis, we concentrate on the geographical distribution and radiative properties of clouds. We focus on whether (i) ensemble averages of a moderate number of realizations can reasonably represent the long-term climate; (ii) integrations of a few days are sufficient to get rid of the spin-up phase; (iii) responses to time-step change detected with the ensemble approach agree quantitatively well with those revealed by the conventional long-term simulations; (iv) there is a clear gain of computational efficiency.
For evaluation purposes, the ensemble simulations were integrated for a period of 20 days.

Representing the mean state
Our evaluation of the ensemble strategy starts with question (i) by examining the mean state simulated with the default time step (30 min). Because the initial conditions were generated using the same model configuration and experimental setup (in other words, sampled from the same climate), there is no spin-up issue here. The question is how many realizations are needed to average out the "weather noise" and obtain the "climate signal". In Fig. 1, the vertically integrated total cloud amount is shown for the two different experimentation methods, where the 5-year DJF average in the long simulation is compared with the day-1 average of a 50-member ensemble. The agreement with the reference simulation is remarkable. High cloud fractions associated with the Intertropical Convergence Zone (ITCZ), the South Pacific Convergence Zone (SPCZ), midlatitude storm tracks, and high-latitude regions in the winter hemisphere are well captured. Less frequent occurrences of clouds over the subtropical ocean high pressure systems and desert areas are also well represented. It is worth noting that the moderately sized ensemble not only reproduces these basic features of the geographical distribution, but also captures the magnitude of total cloud amount quite well at most grid points. The same can be said for other key aspects of the model climate, as can be seen in Table 1. Global mean values in the 5-year simulation and the 1-day ensemble differ only by a few percent at most, and the pattern correlations are high (> 0.9). This suggests that, at least for the default model configuration, the ensemble average of very short integrations is a good representation of the long-term climatology. In the following, we demonstrate that the ensemble simulations are also able to accurately reproduce the response of cloud-related fields to parameterization changes (in this case, time-step length).
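The two summary statistics mentioned above, global means and pattern correlations, can be computed along the following lines. The exact definitions behind Table 1 are not given in the text, so this sketch assumes cos(latitude) area weighting on a regular latitude-longitude grid and a centered correlation; the toy fields are made-up numbers, not model output.

```python
import math


def area_weights(lats_deg, n_lon):
    """cos(latitude) weight for every grid point of a regular lat-lon grid (assumed)."""
    return [[math.cos(math.radians(lat))] * n_lon for lat in lats_deg]


def weighted_mean(field, weights):
    """Area-weighted mean of a 2-D field stored as a list of latitude rows."""
    total = sum(w * x for frow, wrow in zip(field, weights) for x, w in zip(frow, wrow))
    return total / sum(w for wrow in weights for w in wrow)


def pattern_correlation(a, b, weights):
    """Centered, area-weighted correlation between two 2-D fields."""
    mean_a, mean_b = weighted_mean(a, weights), weighted_mean(b, weights)
    cov = var_a = var_b = 0.0
    for arow, brow, wrow in zip(a, b, weights):
        for x, y, w in zip(arow, brow, wrow):
            cov += w * (x - mean_a) * (y - mean_b)
            var_a += w * (x - mean_a) ** 2
            var_b += w * (y - mean_b) ** 2
    return cov / math.sqrt(var_a * var_b)


# Toy 3 x 4 fields standing in for a long-term mean and an ensemble mean
lats = [-60.0, 0.0, 60.0]
reference = [[0.8, 0.7, 0.9, 0.8], [0.4, 0.6, 0.5, 0.3], [0.7, 0.9, 0.8, 0.6]]
ensemble = [[0.75, 0.72, 0.88, 0.83], [0.42, 0.55, 0.52, 0.33], [0.68, 0.93, 0.77, 0.62]]
w = area_weights(lats, 4)
print(weighted_mean(reference, w), weighted_mean(ensemble, w))
print(pattern_correlation(reference, ensemble, w))
```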

Fast response of clouds
Our ensemble simulations with 4 min time steps are initialized using snapshots of atmospheric and land-surface conditions, sampled from the model climate resulting from a 30 min time step. Because cloud processes operate on short timescales, we expect quick responses to changes in model time step. This is indeed observed in the simulations.
The 5-year simulations indicate that a reduction from a 30 to 4 min time step leads to an overall increase of total cloud cover. The most prominent signals occur in the shallow cumulus regions where the absolute changes range from about 10 % to more than 40 % in boreal winter (Fig. 2a), corresponding to typical relative changes from 20 % to more than 100 % (not shown). Such characteristic patterns are apparent in the ensemble simulations on the first model day (Fig. 2b). Although the differences are somewhat smaller than the 5-year DJF averages, statistical tests suggest they are significant at the 95 % confidence level. By the third day, the magnitudes of the differences between 4 and 30 min simulations are close to those seen in the 5-year average. The ensemble simulations can also capture changes in vertical structures. Figure 3 shows the zonally averaged stratiform cloud-ice mass concentration as an example. According to the 5-year simulations (Fig. 3b), a shorter time step leads to higher ice concentrations throughout the troposphere. The largest increases occur in the lower levels over the storm tracks, and in the tropical upper troposphere between 300 and 400 hPa, where deep convection detrains condensate into the environment. There is a secondary center of large increase near 150 hPa, corresponding to frequent homogeneous ice nucleation. The close resemblance between the 5-year DJF average (Fig. 3b) and the day-3 average of the ensemble results (Fig. 3c) indicates that the characteristic distributions of cloud ice are well established within a couple of model days.
We reinforce the conclusion of fast spin-up by showing the day-to-day variation of global mean cloud cover (Fig. 4a), vertically integrated liquid and ice water path (LWP and IWP, Fig. 4b-c), long-wave and shortwave cloud forcing (LWCF and SWCF, Fig. 4d-e), and large-scale precipitation rate (PRECL, Fig. 4f). The 5-year averages, as well as the year-to-year variations, are shown on the right-hand part of each panel for comparison. Figure 4 indicates that, during the 20-day integration period of the ensemble simulations, there is no obvious trend either in the 50-member averages of the 30 and 4 min ensembles or in their differences; in addition, the ensemble averages agree reasonably well with the 5-year averages. Therefore, for detecting fast changes in cloud properties and distribution, it is sufficient to perform simulations that are only a few days in length. The additional computing time spent on longer integration does not provide significantly more information.
Figure 4 also provides a quantification of the time-step sensitivity in the depicted variables with respect to their natural variability. Comparing differences between the 4 and 30 min ensemble averages with the ensemble spreads, one can conclude that the total cloud fraction, LWP and IWP (Fig. 4a-c) are more sensitive to the time-step change than SWCF, LWCF and PRECL are (Fig. 4d-f). On the other hand, although individual members from the 30 min and 4 min ensembles can have the same globally averaged SWCF, LWCF, or PRECL, the 95 % confidence intervals of the ensemble mean do not overlap on any day during the integration period, suggesting that the time-step sensitivities are nevertheless statistically significant.
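The overlap check on confidence intervals can be sketched as below. How the intervals in Fig. 4 were computed is not stated here, so the example assumes a normal approximation (a Student-t quantile would be more appropriate for small ensembles) and uses synthetic random numbers in place of model output.

```python
import math
import random
import statistics


def mean_confidence_interval(values, z=1.96):
    """Approximate 95 % confidence interval of the sample mean (normal approximation)."""
    mean = statistics.fmean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """True if two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Synthetic stand-ins for two 50-member ensembles (arbitrary units, assumed shift of 1.5)
rng = random.Random(1)
control = [rng.gauss(0.0, 1.0) for _ in range(50)]
perturbed = [rng.gauss(1.5, 1.0) for _ in range(50)]

ci_control = mean_confidence_interval(control)
ci_perturbed = mean_confidence_interval(perturbed)
print(ci_control, ci_perturbed, intervals_overlap(ci_control, ci_perturbed))
```

Individual members of the two synthetic ensembles can take similar values, yet the intervals of the means are much narrower than the member spread, which is why the means can still be separated.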

Ensemble size
So far we have shown the effectiveness of the short simulations using results from 50-member ensembles. We now demonstrate the robustness of the method and discuss the choice of ensemble size. The essence of the experimentation method we propose in this paper is to approximate the long-term temporal average by the ensemble average over a short period. The accuracy of this approximation naturally depends on the ensemble size and properties of the state variable in question.

Global averages
In Fig. 5, the accuracy of estimated global averages is analyzed for the ice water path and large-scale precipitation rate. Results are shown for 10, 20, 50, 90, and 180 ensemble members. At each ensemble size, the day-to-day variability of the ensemble mean daily average is indicated by the vertical extent of a filled box, with its top and bottom showing the maximum and minimum values during the 20-day simulation period. The 20-day averages are denoted by the black dot in each box. Based on the conclusion drawn from Fig. 4 about fast spin-up, it is reasonable to assume that, when a sufficiently large number of realizations are obtained, the ensemble mean values averaged over 20 days will indicate the long-term climatological mean within a small uncertainty induced by natural variability. Thus, the vertical size of a colored box in Fig. 5 can be used as a measure of approximation error in the global averages estimated from a single-day simulation at the corresponding ensemble size. In Fig. 5b and d, the 4/30 min differences are normalized by the 20-day average of the 180-member ensemble mean, in order to indicate the uncertainties in relative terms.
Figure 5 conveys several messages. First, the 20-day averages change very little with ensemble size (Fig. 5a and c), suggesting that the sampling method is representative in capturing the impact of time-step change. Second, as the number of independent realizations increases, variances in the daily average decrease, producing more accurate approximations of the long-term climatology (Fig. 5a and c). Third, different model variables are associated with different variability, and thus require different numbers of realizations. Last but not least, good estimates of the time-step sensitivities can be obtained with rather small ensembles. With 20 independent members, the global mean ΔIWP (ΔPRECL) calculated from a 1-day simulation agrees within 10 % (25 %) with the 180-member 20-day average (Fig. 5b and d). Similar accuracies are obtained in our experiments for the other variables shown in Fig. 4 for global averages. To capture regional differences, some highly variable fields may need more realizations, as discussed below.
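The dependence of the day-to-day range on ensemble size can be mimicked with synthetic data. The sketch below is only an illustration under strong assumptions: each member on each day is an independent draw from a unit-variance normal distribution, whereas real ensemble members are correlated from one day to the next. The function name and seed are arbitrary.

```python
import random


def daily_mean_range(n_members, n_days=20, seed=0):
    """Max minus min of the ensemble-mean daily value over n_days (synthetic data)."""
    rng = random.Random(seed)
    daily_means = []
    for _ in range(n_days):
        # Assumption: members are independent, unit-variance Gaussian draws
        members = [rng.gauss(0.0, 1.0) for _ in range(n_members)]
        daily_means.append(sum(members) / n_members)
    return max(daily_means) - min(daily_means)


# Same ensemble sizes as those discussed in the text
for size in (10, 20, 50, 90, 180):
    print(size, round(daily_mean_range(size), 3))
```

In this idealized setting the range shrinks roughly in proportion to one over the square root of the ensemble size, which is the behavior one would expect for independent realizations.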

Climate regimes
When assessing model sensitivities, it is often necessary to examine not only global averages, but also regional features and climate regimes. Because clouds are highly variable in their occurrence and properties, regional patterns are sometimes difficult to detect due to the low signal-to-noise ratio. For example, Kooperman et al. (2012) showed that to get a clear signature of the anthropogenic aerosol indirect effect, it is necessary to run CAM5 for multiple decades in conventional climate simulations.
Figure 6 shows the SWCF changes (ΔSWCF) caused by a reduction of model time step in the 5-year integrations (Fig. 6a) and, on day 3, in the ensemble simulations (Fig. 6b and c). Both methods reveal a systematic increase of SWCF in the trade cumulus regions, while the reduction of cloud forcing in the ITCZ and SPCZ is more clearly seen in the ensemble results. It is remarkable that the convergence zones emerge clearly in Fig. 6b with only 50 ensemble members, a result attributable to our initialization method that uses the same set of initial conditions for the 4 and 30 min simulations. In the early stage of the integration (first ∼ 5 days), the large-scale environmental conditions remain similar in each pair of ensemble members, resulting in the synoptic systems and convective activities occurring at similar locations, thus avoiding strong noise in ΔSWCF associated with synoptic-scale variability in the circulation.
In Fig. 6b, ΔSWCF in the shallow cumulus and deep convection regions have similar magnitudes, but those in the latter regime do not pass the statistical test because of the large natural variability of deep convection in the convergence zones. The evaluation procedure can be made more robust using regime compositing, e.g., by assessing the SWCF changes over tropical (20° S-20° N) ocean grid points where the convective precipitation is important (in this case, defined to be where the convective precipitation rate exceeds 3 mm day⁻¹). Figure 7 indicates that, for such a "deep convection SWCF", ensembles of 20 members are sufficient to distinguish the difference between the 4 and 30 min simulations. For the purpose of verification, we performed additional simulations and present them in Fig. 6c. The 360-member results confirm that the ΔSWCF patterns and magnitudes detected by the 50-member ensembles are not incidental.
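The regime compositing described above can be sketched as a masked average. The latitude band, the ocean restriction, and the 3 mm/day threshold come from the text; the cos(latitude) weighting, the data layout (lists of latitude rows), and the function name are assumptions, and the text does not say which simulation's precipitation field defines the mask.

```python
import math


def deep_convection_mean(delta_swcf, lats_deg, is_ocean, conv_precip, threshold=3.0):
    """Average delta_swcf over tropical ocean points with convective precip > threshold.

    All 2-D inputs are lists of latitude rows; conv_precip is in mm/day.
    Area weighting by cos(latitude) is an assumption of this sketch.
    """
    total = weight_sum = 0.0
    for j, lat in enumerate(lats_deg):
        if abs(lat) > 20.0:  # keep 20 S-20 N only
            continue
        w = math.cos(math.radians(lat))
        for i in range(len(delta_swcf[j])):
            if is_ocean[j][i] and conv_precip[j][i] > threshold:
                total += w * delta_swcf[j][i]
                weight_sum += w
    if weight_sum == 0.0:
        raise ValueError("no grid point satisfies the regime definition")
    return total / weight_sum


# Toy 2 x 2 example: only the point at the equator with 5 mm/day qualifies
toy_mean = deep_convection_mean(
    delta_swcf=[[5.0, 5.0], [2.0, 4.0]],
    lats_deg=[-30.0, 0.0],
    is_ocean=[[True, True], [True, True]],
    conv_precip=[[10.0, 10.0], [5.0, 1.0]],
)
print(toy_mean)  # 2.0
```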

Combining ensembles with nudging
The nudging technique has been repeatedly used in model evaluation and intercomparison studies as a method for constraining model meteorology, reducing uncertainties induced by natural variability, and facilitating comparison with observations (e.g., Jeuken et al., 1996; Feichter and Lohmann, 1999; Machenhauer and Kirchner, 2000; Ghan et al., 2001; Kooperman et al., 2012). Here, we briefly show that nudging can be applied in combination with ensembles to assess the contribution of physics-dynamics interaction to the model's time-step sensitivity.
Two sets of ensemble simulations, each with 50 members, were performed with the 30 and 4 min time steps, respectively, with the horizontal wind and temperature relaxed towards those from the unconstrained 30 min time-step simulations. Each pair of control (30 min time step) and sensitivity (4 min) experiments starting from the same initial conditions was nudged to the same temperature and wind fields, while different pairs were relaxed towards different large-scale conditions. As in Kooperman et al. (2012), a 6 h relaxation time was used. Figure 8 compares the globally/regionally averaged total cloud fraction, IWP, and SWCF in the free-running and nudged simulations. SWCF in the deep and shallow convection regions are presented separately because the two regimes are associated with opposite sensitivities to time step (cf. Fig. 6). As expected, the unconstrained and nudged 30 min simulations give very similar results. The ensemble averages are not distinguishable in a statistical sense; the 95 % confidence intervals are comparable, and the ensemble spreads are also similar. The 4 min simulations, on the other hand, are significantly different. When wind and temperature are constrained, the differences between 4 and 30 min simulations are reduced by about 30 % for the variables shown in the figure, suggesting that fast interactions (feedback) between resolved dynamics and parameterized physics increase the time-step sensitivity of the CAM5 model.
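Nudging of the kind used here is commonly written as a Newtonian relaxation term, −(X − X_ref)/τ, added to the tendency of the nudged variable X. The sketch below isolates that term and integrates it with a forward-Euler step for the two time-step lengths discussed in this section; it is an illustration with hypothetical names, not the CAM5 implementation, in which the relaxation acts alongside the model's dynamics and physics tendencies.

```python
import math

TAU_SECONDS = 6 * 3600.0  # 6 h relaxation time, as stated in the text


def nudge(value, reference, dt_seconds, tau_seconds=TAU_SECONDS):
    """One forward-Euler step of Newtonian relaxation toward a reference value."""
    return value + dt_seconds * (reference - value) / tau_seconds


def deviation_after(hours, dt_seconds, initial_deviation=1.0):
    """Remaining deviation from the reference after relaxing for a number of hours."""
    value, reference = initial_deviation, 0.0
    for _ in range(int(hours * 3600 / dt_seconds)):
        value = nudge(value, reference, dt_seconds)
    return value


# After one relaxation time the deviation should be close to exp(-1) of its initial value
for dt_minutes in (4, 30):
    print(dt_minutes, deviation_after(6, dt_minutes * 60.0), math.exp(-1.0))
```

With a 6 h time scale, the constraint is weak on the time scale of a single model step but strong enough to keep the large-scale wind and temperature of each pair of simulations close over a few days.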

Computational efficiency
The results presented above provide clear answers to the questions posed at the beginning of this section. To detect time-step sensitivities in cloud-related fields, it is sufficient to perform 20 to 50 independent 3-day simulations. The ensemble method reveals signals that agree well with those detected by 5-year simulations performed in the traditional way, but costs substantially less in terms of total CPU time, and dramatically less in terms of the experiment "completion time" in situations where there are more processors available than a single job can use effectively or is allowed to use without a long queuing time, and many realizations can be run simultaneously. Our experience showed that, on the Yellowstone supercomputer (Computational and Information Systems Laboratory, 2012) at the National Center for Atmospheric Research (NCAR) Computational and Information Systems Laboratory (CISL), a 5-year simulation with a 4 min time step typically takes about 4 to 7 days of wall-clock time to finish with 64 processes running in parallel (the actual duration depends on the amount of model output, as well as traffic in the queuing system). For the ensemble simulations, a set of 50 3-day simulations usually takes less than 20 min to finish, counting from the instant when the jobs are submitted until the point at which the last job is completed, resulting [...] investigation of time-step sensitivity. In the next section, we use an additional example to show that the method can also be very useful in other sensitivity studies.
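The savings implied by the numbers quoted in this section can be checked with a back-of-the-envelope calculation. The sketch assumes that the cost of a run scales with the number of simulated days at a given time step, that a model year has 365 days, and that the 1-year spin-up of the reference run counts toward its cost; these are assumptions of the illustration rather than statements from the text.

```python
DAYS_PER_YEAR = 365  # assumption: calendar without leap years

# Total simulated days per experiment (one time-step setting)
reference_days = (5 + 1) * DAYS_PER_YEAR   # 5 years plus 1-year spin-up
ensemble_days = 50 * 3                     # 50 members, 3 days each
cost_ratio = reference_days / ensemble_days

# Turnaround: 4 to 7 days of wall-clock time versus (at most) 20 minutes
ensemble_minutes = 20
turnaround_low = 4 * 24 * 60 / ensemble_minutes
turnaround_high = 7 * 24 * 60 / ensemble_minutes

print(cost_ratio, turnaround_low, turnaround_high)
```

Under these assumptions the cost ratio is 14.6 and the turnaround ratio lies between roughly 290 and 500, in line with the factor of about 15 and the factor of several hundred quoted in the abstract. Because the ensemble usually finishes in less than 20 min, the turnaround ratios are lower bounds.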
4 Example II: parametric sensitivity of the global mean top-of-atmosphere radiation balance 515 The parameterization schemes of sub-grid scale processes in AGCMs include various empirical, uncertain constants whose values are often adjusted to obtain desired radiation balance at the top of the model atmosphere (TOA), and to achieve good fidelity when evaluated against observations 520 (e.g., Mauritsen et al., 2012;Golaz and Levy, 2013).There is a large volume of literature discussing the sensitivities of model behavior to empirial parameters.In the context of global climate change, there is also increasing interest in assessing the impact of such parameters on the uncertainties in 525 future climate projections (e.g., Murphy et al., 2004;Stainforth et al., 2005;Collins et al., 2006).
Because there are a large number of adjustable parameters in AGCMs, and many of them have wide ranges of possible values, systematic investigations of model sensitivity in-530 evitably require numerous simulations.Earlier studies that varied the value of one parameter at a time (e.g., Lohmann and Ferrachat, 2010) only covered very small portions of the full parameter space.In recent years, the use of advanced sampling approaches such as Latin hypercube (McKay et al.,535 1979) and quasi-Monte Carlo method (Caflisch, 1998) have allowed more extensive explorations of the parameter space (e.g., Lee et al., 2012Lee et al., , 2013;;Zhao et al., 2013).Perturbing multiple parameters simultaneously not only allows for a dramatic reduction of the number of simulations needed for the 540 sensitivity study, but also provides the opportunity to investigate parameter interactions, leading to a more comprehensive understanding of model sensitivity.
On the other hand, even with efficient sampling approaches applied, systematic investigations of parametric 545 sensitivity are still inherently expensive because of the high dimensionality of the parameter space.For instance, to simultaneously perturbe O(10 1 ) parameters, one needs to sample O(10 2 ∼ 10 3 ) points from the parameter space to ensure sufficient coverage.Performing long-term climate sim-550 ulations with this many model configurations requires a substantial amount of computer time.In this section, we demonstrate that there are circumstances in which a very good characterization of the parametric sensitivity can be obtained with small ensembles of short integrations, resulting in a signifi-555 Figure 8.Comparison of the free-running (F) and nudged (N) simulations performed with 30 min (blue) and 4 min (green) time steps.Meanings of the whiskers, boxes, and hinges are the same as in Figs. 4 and 7.Each ensemble consists of 50 independent members.In the nudged simulations, temperature and horizontal wind were relaxed, using a nudging time scale of 6 h towards those from the unconstrained simulations that used 30 min time steps.Panels (a) and (b) show the globally averaged, vertically integrated total cloud fraction and ice water path, respectively.Panels (c) and (d) show the shortwave cloud forcing (SWCF) averaged over the shallow and deep convection regions.SWCF associated with shallow convection is the average over ocean grid points between 30 • N and 30 • S, where the frequency of occurrence of shallow convection is larger than 0.5, and the daily mean convective precipitation rate is lower than 1 mm day −1 .SWCF associated with deep convection is calculated in the same way as in Fig. 7 (cf.Sect.3.3.2).All results are shown for the third simulation day. 
in a reduction of turnaround time by a factor of several hundred.Such a fast turnaround will be particularly helpful when additional simulations are conducted with varied model configurations to identify the source of the time-step sensitivity, and when even smaller time steps are used to assess the convergence properties of the model behavior.
From the results above, we conclude that the ensemble method, as applied, is both effective and efficient for the investigation of time-step sensitivity. In the next section, we use an additional example to show that the method can also be very useful in other sensitivity studies.
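The turnaround figure quoted in this section can be checked with a few lines of arithmetic. The sketch below uses only the wall-clock numbers stated in the text (4 to 7 days for the 5-year run, at most about 20 min for the 50-member ensemble); the variable names are illustrative and are not taken from any model script.

```python
# Rough check of the turnaround speed-up quoted in the text.
MINUTES_PER_DAY = 24 * 60

# Wall-clock time of one 5-year simulation with a 4 min time step (4 to 7 days).
long_run_wallclock_min = (4 * MINUTES_PER_DAY, 7 * MINUTES_PER_DAY)

# Upper bound on the wall-clock time of fifty 3-day runs submitted together.
ensemble_wallclock_min = 20

# Because 20 min is an upper bound, these ratios are lower bounds on the speed-up.
turnaround_speedup = tuple(
    t / ensemble_wallclock_min for t in long_run_wallclock_min
)

print(f"turnaround speed-up: at least {turnaround_speedup[0]:.0f} "
      f"to {turnaround_speedup[1]:.0f}")
```

The resulting range is consistent with the "factor of several hundred" stated above.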

Example II: parametric sensitivity of the global mean top-of-atmosphere radiation balance
The parameterization schemes of sub-grid scale processes in AGCMs include various empirical, uncertain constants whose values are often adjusted to obtain the desired radiation balance at the top of the model atmosphere (TOA), and to achieve good fidelity when evaluated against observations (e.g., Mauritsen et al., 2012; Golaz and Levy, 2013). There is a large volume of literature discussing model sensitivities to empirical parameters. In the context of global climate change, there is also increasing interest in assessing the impact of such parameters on the uncertainties in future climate projections (e.g., Murphy et al., 2004; Stainforth et al., 2005; Collins et al., 2006).
Because there are a large number of adjustable parameters in AGCMs, and many of them have wide ranges of possible values, systematic investigations of model sensitivity inevitably require numerous simulations. Earlier studies that varied the value of one parameter at a time (e.g., Lohmann and Ferrachat, 2010) only covered very small portions of the full parameter space. In recent years, the use of advanced sampling approaches such as the Latin hypercube (McKay et al., 1979) and quasi-Monte Carlo (Caflisch, 1998) methods has allowed more extensive explorations of the parameter space (e.g., Lee et al., 2012, 2013; Zhao et al., 2013). Perturbing multiple parameters simultaneously not only allows for a dramatic reduction of the number of simulations needed for the sensitivity study, but also provides the opportunity to investigate parameter interactions, leading to a more comprehensive understanding of model sensitivity.
On the other hand, even with efficient sampling approaches applied, systematic investigations of parametric sensitivity are still inherently expensive because of the high dimensionality of the parameter space. For instance, to simultaneously perturb O(10^1) parameters, one needs to sample O(10^2) to O(10^3) points from the parameter space to ensure sufficient coverage. Performing long-term climate simulations with that many model configurations requires a substantial amount of computer time. In this section, we demonstrate that there are circumstances in which a very good characterization of the parametric sensitivity can be obtained with small ensembles of short integrations, resulting in a significant reduction in the computational cost.

Reference simulations
A recent study by Zhao et al. (2013) investigated the sensitivity of TOA radiative fluxes in present-day climate simulations to the values of 16 parameters in CAM5. The 16 parameters included 5 adjustable constants related to stratiform cloud microphysics (indices 1-5 in Table 2), 3 parameters related to the physical properties of aerosols (indices 6-8 in Table 2), and 8 scale factors for aerosol emissions (indices 9-16 in Table 2). To efficiently explore the high-dimensional parameter space, the quasi-Monte Carlo sampling method (Caflisch, 1998)   terms of sample dispersion. From the 16-dimensional parameter space, 256 sample points were drawn. Each sample point corresponds to one set of values for the 16 parameters, hereafter referred to as a "parameter combination". For each parameter combination, an AMIP (Atmospheric Model Intercomparison Project, Gates et al., 1998) simulation was conducted for the years 2000 to 2004. The average of the last 4 years (2001–2004) was used in their sensitivity analysis to identify which parameters have the largest impact on the model's radiation budget.
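To illustrate what drawing 256 low-discrepancy points from a 16-dimensional space looks like in practice, the sketch below generates a Halton sequence in pure Python. This is a swapped-in example of a quasi-Monte Carlo sequence, not necessarily the one used by Zhao et al. (2013); the function names and the unit parameter ranges are hypothetical.

```python
def first_primes(count):
    """Return the first `count` prime numbers (used as Halton bases)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def radical_inverse(index, base):
    """Mirror the base-`base` digits of `index` about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_sample(n_points, bounds):
    """Draw `n_points` Halton points scaled to the (low, high) pairs in `bounds`."""
    bases = first_primes(len(bounds))
    points = []
    for index in range(1, n_points + 1):
        point = [lo + (hi - lo) * radical_inverse(index, base)
                 for (lo, hi), base in zip(bounds, bases)]
        points.append(point)
    return points


# Hypothetical ranges: every parameter is scaled to [0, 1) for illustration.
bounds = [(0.0, 1.0)] * 16
combinations = halton_sample(256, bounds)  # 256 "parameter combinations"
print(len(combinations), len(combinations[0]))
```

In a real study each `(low, high)` pair would be replaced by the plausible range of the corresponding parameter. Plain Halton sequences can show correlations between high-numbered dimensions, so a scrambled or Sobol' sequence from an established library may be preferable.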

Short ensembles
In this study we demonstrate that it is possible to use short ensembles to reproduce the results of Zhao et al. (2013). The same 256 parameter combinations were used in our simulations, while each of their 4 + 1 year AMIP runs was replaced by an ensemble of short simulations started in each month of the year, so that the ensemble averages characterize the annual averages examined in the reference study. As in Sect. 3, the initial conditions were taken from a prior long-term simulation. The same set of 12 initial conditions was used for all 256 ensembles.

Spin-up time
It is worth noting that 11 out of the 16 perturbed parameters (indices 6 through 16 in Table 2) directly affect the concentrations of aerosols. How these aerosol-related parameters affect the TOA radiative fluxes is a key question to be answered by the sensitivity analysis. The AMIP simulations of Zhao et al. (2013) were initialized with zero aerosol mass and number concentrations. Such an initialization in CAM5 usually requires a spin-up of several months (or longer) before the aerosol concentrations have evolved and approach the climatological values. Therefore, the first simulation year was discarded in the study of Zhao et al. (2013).
For our ensembles, all simulations were started with aerosol concentrations that were spun up under the default model configuration and were consistent with the corresponding meteorological fields. This set-up is expected to require a shorter spin-up than the zero-aerosol conditions. On the other hand, after the aerosol emissions, solubility factors, and cloud parameters were perturbed (Table 2), we expect an initial adjustment of at least a few days, considering that the global mean aerosol lifetime is about 4 days in MAM3 (cf. Tables 3, 5-8 in Liu et al., 2012). To get a quantitative assessment of the spin-up time, we monitored the time evolution of the aerosol optical depth (AOD) in the ensemble simulations. In Fig. 9, the global mean AOD is shown for the first 60 days. The daily mean values that are averaged over the 256 ensembles are indicated by the thick curve. Variations among the ensemble averages are shown by the vertical bars, with the lower and upper ends indicating the minimum and maximum, respectively. As expected, the globally averaged AOD values of different ensembles are similar at the beginning of the integration. They quickly diverge in the next few days and then stabilize. After about 10 days, there are no substantial changes in either the average or the spread of the 256 ensembles. We thus use the day-10 average for the sensitivity analysis below. In other words, we compare the parametric sensitivities derived from the 12-member ensemble averages at day 10 with the results in Zhao et al. (2013), which were based on 4-year averages.
Geosci. Model Dev., 7, 1961–1977, 2014, www.geosci-model-dev.net/7/1961/2014/
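The spin-up assessment described above was made by inspecting the time series in Fig. 9. A minimal sketch of how such a check could be automated is given below; the 2 % tolerance, the function names, and the tiny synthetic data set are all hypothetical and serve only to show the logic.

```python
def daily_stats(aod):
    """aod[e][d] is the global mean AOD of ensemble e on day index d.

    Returns one (mean, minimum, maximum) tuple per day, taken across ensembles.
    """
    n_days = len(aod[0])
    stats = []
    for d in range(n_days):
        column = [series[d] for series in aod]
        stats.append((sum(column) / len(column), min(column), max(column)))
    return stats


def first_stable_day(stats, rel_tol=0.02):
    """Return the first day (1-based) from which the across-ensemble mean and
    spread (max - min) both change by at most rel_tol from one day to the next,
    for all remaining days. Returns None if no such day exists."""
    means = [s[0] for s in stats]
    spreads = [s[2] - s[1] for s in stats]

    def small_change(series, d):
        previous = series[d - 1]
        return abs(series[d] - previous) <= rel_tol * abs(previous)

    for start in range(1, len(stats)):
        if all(small_change(means, d) and small_change(spreads, d)
               for d in range(start, len(stats))):
            return start + 1  # convert 0-based index to 1-based day
    return None


# Synthetic example: two ensembles that diverge and then level off.
aod = [
    [0.10, 0.12, 0.14, 0.15, 0.15, 0.15],
    [0.10, 0.09, 0.08, 0.08, 0.08, 0.08],
]
print(first_stable_day(daily_stats(aod)))
```

With real output, `aod` would hold one series per parameter combination, and the tolerance would need to be chosen with the day-to-day variability of the ensemble averages in mind.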

Global mean radiation budget
Our analysis starts with the TOA net radiative flux (FNET). To give a first sense of the model's response to the parameter perturbation, Table 3 lists the mean and standard deviation of the 256 simulations/ensembles. Similar statistics are presented in the same table for the total cloud forcing (CF), as well as for the shortwave and long-wave cloud forcing (SWCF and LWCF). The mean FNET values obtained with the two methods differ by about 3 % (0.11 W m−2), while the discrepancies in CF, SWCF, and LWCF are smaller in terms of relative differences. Variations among the 256 experiments tend to be somewhat smaller in the 4-year AMIP simulations, probably because the substantially larger number of days involved in the temporal average leads to a stronger smoothing effect.
The sensitivities of global mean FNET to individual parameters are shown in Fig. 10. In each panel, the global mean 4-year averages (Fig. 10, upper row) or day-10 ensemble averages (Fig. 10, lower row) corresponding to the 256 parameter combinations are sorted into eight bins according to the value of one perturbed parameter. The square mark
The 4-year AMIP simulations (Fig. 10, upper row) indicate that the global mean FNET increases with dcs (the size threshold between cloud ice and snow) and factic (solubility factor of aerosols in convective clouds). It decreases with ai (a fall-speed parameter for cloud ice), cdnl (the minimum concentration of cloud droplet number), wsubmin (the minimum sub-grid vertical velocity for droplet activation), and e_sst (tuning factor for sea-salt emission). FNET is less sensitive to the other parameters.
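The binning behind Fig. 10 can be sketched as follows. The text does not state how the bin boundaries were chosen, so equal-width bins are assumed here; the function name and the synthetic data are hypothetical.

```python
def binned_means(param_values, outputs, n_bins=8):
    """Average `outputs` within `n_bins` equal-width bins of `param_values`.

    Returns one mean per bin, or None for a bin that received no samples.
    """
    lo, hi = min(param_values), max(param_values)
    if hi == lo:
        raise ValueError("parameter values must span a non-zero range")
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(param_values, outputs):
        # The largest parameter value is assigned to the last bin.
        k = min(int((x - lo) / width), n_bins - 1)
        sums[k] += y
        counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Synthetic example: 256 samples of one parameter and a linear response.
param_values = [i / 255 for i in range(256)]
outputs = [2.0 * x for x in param_values]
print(binned_means(param_values, outputs))
```

Applying the same function to the 4-year averages and to the day-10 ensemble averages would give the two rows of a figure like Fig. 10 for one parameter.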
As discussed in Zhao et al. (2013), the detected sensitivities in FNET are mainly attributable to clouds, while the contribution of clear-sky areas is relatively small. Therefore, in Fig. 11 we present the responses of SWCF and LWCF to the parameter perturbation. The long-wave cloud forcing is primarily affected by cloud microphysics parameters ai, as, cdnl, and dcs. The shortwave cloud forcing is additionally affected by wsubmin, and the aerosol-related parameters factic (solubility factor of aerosols in convective clouds) and e_so2 (tuning factor for the emission of anthropogenic SO2).
Comparing the upper and lower rows of Figs. 10 and 11, we see that the qualitative conclusions drawn in the previous paragraphs remain unchanged in the ensemble simulations, and that the quantitative details of the functional relationships between FNET/SWCF/LWCF and the perturbed parameters are also correctly reproduced by the short simulations. Considering that the ensemble results used to derive these relationships were averaged over only 12 realizations and 1 model day, the agreement with the 4-year climate simulations is rather remarkable.
In Figs. 10 and 11, the relative contributions of individual parameters to the total variation of FNET, SWCF, and LWCF are noted above the corresponding panels. These numbers were obtained by applying the generalized linear model (GLM) which assumes the relationships between the output variables (i.e., FNET, SWCF, and LWCF)     quadratic, and interaction terms. Percentages given in red in the figures are statistically significant at the 95 % confidence level. Details of the GLM fitting are described in Sect. 2.3.3 in Zhao et al. (2013), and are not repeated in this paper. Here, we only point out that the GLM provides a quantitative way to rank the relative importance of the empirical parameters in determining the total variation in the output variables. The rankings derived from the 4-year simulations and the day-10 ensembles agree quite well. For example, dcs, wsubmin, e_sst and factic are identified by both methods as the most influential parameters for FNET. In terms of the percentage contribution of individual parameters to the total variance, the results derived from the 4-year simulations and the short ensembles are also similar. There are a few cases in which the percentage is regarded as significant in the 4-year simulations but insignificant in the ensembles (e.g., e_soag for FNET, facti and e_acnum for LWCF), but these are typically associated with small contributions and thus should not be considered as large discrepancies.
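The exact model fitted is the one described in Zhao et al. (2013) and is not reproduced here. As a reminder of what a fit with quadratic and interaction terms looks like in general, a generic second-order form is sketched below. It is an assumption made for illustration, and the symbols $y$, $x_i$, $\beta$, and $\varepsilon$ are introduced here rather than taken from the reference study.

```latex
y = \beta_0 + \sum_{i=1}^{16} \beta_i x_i + \sum_{i=1}^{16} \beta_{ii} x_i^2
    + \sum_{1 \le i < j \le 16} \beta_{ij} x_i x_j + \varepsilon
```

In this form $y$ stands for an output variable (FNET, SWCF, or LWCF), $x_i$ for the 16 perturbed parameters, and $\varepsilon$ for the residual. The share of the variance in $y$ explained by the terms involving a given parameter is one way to express its relative contribution.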

Computational efficiency
The 12 10-day simulations cost about 1/15 of the total CPU time in comparison to the original 5-year (4 years plus a 1-year spin-up) simulations, a substantial reduction in computational cost. As for the turnaround time, on Yellowstone at NCAR CISL, the 256 × 12 simulations submitted as separate jobs finished within 8 h of wall-clock time. Typically, the queuing system allowed 50 to 100 jobs running in parallel. These numbers were smaller than the total number of parameter combinations (256). Therefore, in this case, the reduction of turnaround time was mainly achieved from the smaller number of simulation days required by the ensemble strategy. On larger computing facilities that could allow more than 256 simultaneous jobs from a single user, it would be possible to make fuller use of the available resources using the ensemble strategy, but not with the long-term simulations. On a dedicated system that could accommodate O(10^3) concurrent simulations, it would be possible to complete all of our ensemble simulations within 1 hour. Such a fast turnaround can be very useful in systematic sensitivity studies, where influential parameters can be identified from a large number of candidates within a reasonable time period, and more attention can subsequently be paid to the most important parameters. Furthermore, during the development of climate models, it is often necessary to adjust empirical parameters after major updates of model components, so that the long-term global mean TOA radiative flux stays close to zero. Since the global mean FNET, SWCF, and LWCF are among the most important metrics for model tuning, our results suggest that short ensembles can be useful in such exercises, as well.
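The 1/15 cost figure follows directly from the number of simulated days, as the short calculation below shows. It assumes a 365-day model year and that CPU cost scales with simulated days; the variable names are illustrative.

```python
import math

DAYS_PER_YEAR = 365  # assumption: no-leap calendar

n_combinations = 256                  # parameter combinations
members, days_per_member = 12, 10     # short ensemble per combination

reference_days = 5 * DAYS_PER_YEAR    # 4 analysed years plus 1 spin-up year
ensemble_days = members * days_per_member

cost_ratio = reference_days / ensemble_days   # reference cost / ensemble cost
total_jobs = n_combinations * members         # jobs submitted separately

# Number of successive batches if 50 or 100 jobs can run concurrently.
batches_at_50 = math.ceil(total_jobs / 50)
batches_at_100 = math.ceil(total_jobs / 100)

print(f"cost ratio: {cost_ratio:.1f}")
print(f"jobs: {total_jobs}, batches: {batches_at_100} to {batches_at_50}")
```

The batch counts illustrate why the queue limit of 50 to 100 concurrent jobs, rather than the length of an individual run, determined the observed turnaround.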

Conclusions and discussion
We have demonstrated that ensembles of short simulations can be used to estimate the fast responses of a climate model to perturbations. The strategy can produce signatures that agree quantitatively and qualitatively with those produced by traditional multi-year brute-force simulation strategies, at a fraction of the computational and wall-clock cost.
Our first example explored the response of simulated clouds to a change in model time step. Results show that 3-day integrations are sufficient to reproduce the time-step sensitivities seen in the commonly used 5-year climate simulations, due to the rapid response in cloud fields. For the global mean total cloud fraction, liquid water path, and ice water path, the time-step-induced changes can be clearly detected with 20 ensemble members. For the global mean large-scale precipitation rate, which has higher natural variability, and for the regional features of cloud forcing, robust signals can be detected from ensembles of 50 members. A combined use of ensembles and nudging led to the finding that interactions between the resolved dynamics and parameterized physics provide positive feedback that enhances the model's time-step sensitivity.
The second example demonstrated that the strategy is capable of characterizing sensitivities of the global mean TOA radiation budget to 16 empirical parameters related to stratiform cloud microphysics and aerosol life cycle. This type of investigation is inherently expensive in terms of computational cost, because a large number of simulations are needed to sufficiently sample the high-dimensional parameter space. Following a previous study by Zhao et al. (2013), we used the quasi-Monte Carlo method to obtain 256 sample points (parameter combinations) from the 16-dimensional parameter space. For each parameter combination, ensemble simulations were conducted with one realization starting from each month of the year 2001. We showed that parametric sensitivities of the global mean TOA FNET and cloud forcing derived from 12-member ensemble averages at day 10 agree very well with results obtained by Zhao et al. (2013), who used 4-year AMIP simulations in their analysis. The short ensembles correctly identified the most influential parameters for the FNET and cloud forcing, and successfully reproduced the functional relationships between these quantities and the perturbed parameters.
These results indicate that, although climate is, by definition, a long-term average and is associated with strong natural variability, fast processes and robust features exist that do not need very long simulations to characterize them. This fact is already widely known, and has formed the foundation for the CAPT and Transpose-AMIP activities in which climate models are run in weather-forecast mode to reveal the biases with respect to observations. Here, we have shown that the philosophy behind the Transpose-AMIP-type evaluation can be applied in more general ways to carry out sensitivity studies. Using short ensembles instead of traditional multiyear climate simulations, sensitivity studies can be carried out more efficiently, benefitting from a substantial reduction of the total CPU time spent on numerical integration, as well as a much faster turnaround in the investigation because the independent ensemble members introduce an additional dimension of parallelism that can be exploited with current flagship supercomputers.
The strategy discussed in this paper, which uses simulations that last a few days, certainly has limitations. It cannot be used as formulated here to investigate modes of climate variability or feedback mechanisms that operate on time scales of months to years, and thus cannot replace long-term simulations when long time scales are important. For example, in the time-step sensitivity experiments discussed in Sect. 3, while the 5-year simulations reveal an increase of DJF precipitation in the SPCZ when the time step is shortened (not shown), the ensemble simulations do not indicate statistically significant differences in this region. This is probably because systematic changes in the SPCZ involve feedback from the large-scale circulation that cannot sufficiently spin up in just a few days.
Nevertheless, since fast processes are important contributors to the sensitivities and uncertainties in current climate models, short ensembles can help to obtain first-order estimates of rapid responses in the climate system rather quickly. Such economical, approximate answers can be useful in various situations. For example, in systematic studies of parametric uncertainties, short ensembles can be used in preliminary investigations to pre-select influential parameters from a large number of candidates, and to narrow down possible ranges of parameter values. In convergence studies, short ensembles may be the only way to conduct simulations at ultrahigh spatial and/or temporal resolutions that would otherwise be impractical to complete. As the climate modeling community actively pursues higher resolutions, more physically based parameterizations, and inclusion of new, highly sophisticated processes, wide applications can be anticipated for the experimentation method discussed here.
It should be noted that, in this paper, we are advocating the ensemble method as a general strategy, not a recipe. As can be seen from the two examples, for different variables and physical processes, one must generate ensembles differently, and may need different spin-up times and/or ensemble sizes. The most beneficial experimental design for a particular research question needs to be worked out on a case-by-case basis. Whenever affordable, one should first evaluate the short ensembles against traditional climate simulations. If it is impractical to do so, we recommend testing the experimental design using a range of integration lengths and ensemble sizes, so as to obtain a better understanding of the robustness of the results.
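One simple way to test a design over a range of ensemble sizes is to ask, for each size, whether the 95 % confidence interval of the mean difference excludes zero. The sketch below does this with the Python standard library; it uses a normal approximation for the interval, and the function names and synthetic member-by-member differences are hypothetical.

```python
from statistics import NormalDist, mean, stdev

# Two-sided 95 % quantile of the standard normal (about 1.96). For small
# ensembles a Student-t quantile would give a slightly wider interval.
Z95 = NormalDist().inv_cdf(0.975)


def mean_with_ci(values):
    """Return (mean, lower, upper) of an approximate 95 % confidence interval."""
    m = mean(values)
    half_width = Z95 * stdev(values) / len(values) ** 0.5
    return m, m - half_width, m + half_width


def signal_detected(differences):
    """True if the confidence interval of the mean difference excludes zero."""
    _, lower, upper = mean_with_ci(differences)
    return lower > 0.0 or upper < 0.0


def smallest_detecting_size(differences, sizes):
    """Return the first ensemble size in `sizes` at which the signal is detected."""
    for n in sizes:
        if signal_detected(differences[:n]):
            return n
    return None


# Synthetic differences: a mean signal of 0.5 with alternating noise of +/- 2.
differences = [0.5 + (-1) ** k * 2.0 for k in range(100)]
print(smallest_detecting_size(differences, (10, 20, 50, 100)))
```

Repeating such a check for several output days as well as several ensemble sizes gives a rough picture of how robust a detected signal is before committing to a full set of experiments.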
An additional remark worth making here is that the definitions of fast and slow processes need to be understood in relative terms. In this paper, where an atmosphere-only GCM was used, we considered time scales of a few days as "short", and simulations of multiple years as "long". In other situations, fast and slow processes can be reclassified. For example, if one were interested in identifying how seasonal features, such as the Asian summer monsoon, responded to anthropogenic and natural forcings (e.g., Ganguly et al., 2012; Vinoj et al., 2014; Song et al., 2014), or to changes in model formulation (e.g., Zhou and Li, 2002; Chen et al., 2010), it might be possible to generate realizations of simulations that last a few months, and use ensemble averages to remove multi-year and multi-decade scale noise that would otherwise require hundreds of years of simulations. As such, the ensemble strategy may have much wider applications than demonstrated in the present paper.

Figure 1. Total cloud cover (unit: %) in CAM5 simulations using the default model time step (30 min). (a) 5-year December-January-February (DJF) average from a long-term climate simulation. (b) 50-member ensemble average of the first model day in a set of short simulations. Further details are explained in Sect. 3.1.

Figure 2. Differences in total cloud cover (unit: %) between simulations using 4 and 30 min time steps. (a) 5-year DJF average from a climate simulation performed in the traditional way; (b) 50-member ensemble average of the first simulation day; (c) as in (b) but for the third simulation day. Stippling in panels (b) and (c) indicates where the differences are statistically significant at the 95 % confidence level. Further details are explained in Sect. 3.2.

Figure 3. (a) Zonally averaged, 5-year DJF mean mass concentration of stratiform cloud ice (unit: mg kg−1) simulated by CAM5 using a 30 min time step. (b) 5-year DJF mean cloud ice mass concentration differences between simulations using 4 and 30 min time steps (unit: mg kg−1). (c) As in panel (b) but showing the 50-member ensemble mean at day 3. Stippling in panel (c) indicates locations where the differences are statistically significant at the 95 % confidence level. Further details are explained in Sect. 3.2.

Figure 4. Global mean values of some cloud-related variables from the 50-member ensembles and from the 5-year climate simulations. Blue and green indicate simulations performed with 30 and 4 min time steps, respectively. The left part of each panel shows the daily mean global averages in the first 20 days of the ensemble simulations. The lower and upper ends of the whiskers denote the 10th and 90th percentiles. The hinges in the middle indicate the ensemble mean. The filled boxes show the 95 % confidence interval of the mean. In the right part of each panel, the January and DJF averages of the 5-year climate simulations are shown. The bottom and top of each box correspond to the minimum and maximum January or DJF averages in the simulation period. Hinges in the middle indicate the 5-year average. Further details are explained in Sect. 3.2.

Figure 5. Impact of ensemble size on the estimated (a) global mean ice water path, (c) global mean large-scale precipitation rate, and (b, d) their sensitivity to model time step. In panels (a) and (c), blue and green correspond to simulations performed with 30 and 4 min time steps, respectively. The dots inside filled boxes are ensemble mean values averaged over the entire integration period (20 days). Top and bottom of the boxes denote the maximum and minimum daily averages. Similarly, the 4/30 min differences are shown in panels (b) and (d), except that all values are normalized by the 20-day average of the 180-member ensemble mean, in order to show the relative differences among the estimates associated with different ensemble sizes. Further details are explained in Sect. 3.3.

Fig. 6 .
Fig.6.Shortwave cloud forcing differences (unit: Wm −2 ) between simulations using 4 min and 30 min time steps.(a) 5 yr DJF average from a climate simulation performed in the traditional way; (b) day 3 average from ensemble simulations with 50 independent members; (c) day 3 average from ensemble simulations with 360 independent members.In panels (b) and (c), stippled regions are associated with differences significant at the 95 % confidence level.

Fig. 7. Shortwave cloud forcing (SWCF) associated with tropical deep convection in simulations performed with 30 min (blue) and 4 min (green) time steps. The SWCF is averaged over tropical ocean grid points where the daily mean convective precipitation rate exceeds 3 mm day⁻¹. As in Fig. 4, horizontal bars in the middle of filled boxes indicate the mean value of each ensemble. Lower and upper ends of the whiskers correspond to the 10th and 90th percentiles, respectively. Filled boxes show the 95 % confidence interval of the ensemble mean. Further details are explained in Sect. 3.3.
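The box-and-whisker statistics named in this caption (ensemble mean, 10th and 90th percentiles, and a 95 % confidence interval of the mean) can be illustrated with a minimal sketch. The function name `ensemble_summary`, the normal approximation for the confidence interval, and the synthetic member values are assumptions made here for illustration; the paper does not state which interval formula or percentile convention was used.

```python
import math
import statistics


def ensemble_summary(values, confidence=0.95):
    """Summarize one ensemble: mean, 10th/90th percentiles, CI of the mean."""
    n = len(values)
    mean = statistics.fmean(values)
    # Deciles: the first and last cut points are the 10th and 90th percentiles.
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    # Assumption: normal approximation (z is about 1.96 for a 95 % interval).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return {
        "mean": mean,
        "p10": deciles[0],
        "p90": deciles[-1],
        "ci_low": mean - z * sem,
        "ci_high": mean + z * sem,
    }


# Synthetic, made-up SWCF values (W m^-2) standing in for a 50-member ensemble.
members = [-60.0 + 0.2 * i for i in range(50)]
summary = ensemble_summary(members)
```

For small ensembles a Student's t quantile would be the more cautious choice than the normal quantile used in this sketch.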

Fig. 8. Comparison of the free-running ("F") and nudged ("N") simulations performed with 30 min (blue) and 4 min (green) time steps. Meanings of the whiskers, boxes and hinges are the same as in Figs. 4 and 7. Each ensemble consists of 50 independent members. In the nudged simulations, temperature and horizontal wind are relaxed towards those from the 30 min time step unconstrained simulations, using a nudging time scale of 6 h. Panels (a) and (b) show the globally averaged, vertically integrated total cloud fraction and ice water path, respectively. Panels (c) and (d) show the shortwave cloud forcing (SWCF) averaged over the shallow and deep convection regions. SWCF associated with shallow convection is the average over ocean grid points between 30° N and 30° S where the frequency of occurrence of shallow convection is larger than 0.5, and the daily mean convective precipitation rate is lower than 1 mm day⁻¹. SWCF associated with deep convection is calculated in the same way as in Fig. 7 (cf. Sect. 3.3.2). All results are shown for the third simulation day.
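The relaxation described in this caption can be sketched as a Newtonian nudging term, −(x − x_ref)/τ, added to the model tendency with τ = 6 h. The function names, the forward Euler time stepping, and the numerical values below are illustrative assumptions and are not taken from the CAM5 implementation.

```python
def nudging_tendency(value, reference, tau_seconds):
    """Relaxation tendency -(x - x_ref) / tau."""
    return -(value - reference) / tau_seconds


def step_with_nudging(value, reference, model_tendency, dt_seconds,
                      tau_seconds=6 * 3600.0):
    """Advance one forward Euler step with the nudging term added.

    Assumption: explicit Euler stepping; a real model may apply the term
    differently within its time-stepping scheme.
    """
    total = model_tendency + nudging_tendency(value, reference, tau_seconds)
    return value + dt_seconds * total


# Hypothetical case: temperature starts 2 K above the reference state,
# with no other tendency, stepped with a 4 min (240 s) time step for 6 h.
temperature = 282.0
reference_temperature = 280.0
for _ in range(90):  # 90 steps x 240 s = 6 h
    temperature = step_with_nudging(temperature, reference_temperature, 0.0, 240.0)
```

After one nudging time scale the initial departure has decayed to roughly 1/e of its starting value, which is the sense in which τ controls how tightly the nudged run follows the reference.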

Fig. 9. Time evolution of the global mean aerosol optical depth (AOD, at 550 nm wavelength) in the ensemble simulations described in Sect. 4. The thick curve shows the AOD averaged over 256 ensembles that used different values for 16 empirical parameters in the CAM5 model (cf. Table 2). Vertical bars indicate the spread (minimum to maximum) among the 256 ensembles.

Figure 10. Sensitivities of the global mean top-of-atmosphere net radiative flux (FNET, unit: W m⁻²) to the empirical parameters listed in Table 2. CAM5 simulations were carried out using 256 different model configurations corresponding to 256 sampling points drawn from the 16-dimensional parameter space (cf. Table 2 and Sect. 4.1). In each panel, the global mean FNET values corresponding to the 256 model configurations are sorted into eight bins according to the values of one perturbed parameter. The spread (minimum to maximum) of FNET within a bin is shown by a vertical bar, while the mean value is indicated by a square mark. Note that the FNET shown here is the anomaly relative to the mean of the 256 simulations/ensembles. The mean values that have been subtracted are given in Table 3. Numbers noted above the panels are the relative contributions of individual parameters to the total variation of FNET, estimated using a generalized linear model (cf. Sect. 4.4). Red font means the contribution is statistically significant at the 95 % confidence level. The upper row shows results obtained from the AMIP simulations of Zhao et al. (2013) (cf. Sect. 4.1). The lower row shows results from the ensemble simulations performed in this study.
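The binning described in this caption can be sketched as follows: FNET is converted to an anomaly relative to the mean over all configurations, the configurations are sorted into eight bins by one perturbed parameter, and the minimum, maximum, and mean are taken per bin. The function name `bin_by_parameter`, the equal-width bins in linear parameter space, and the synthetic data are assumptions for illustration only; the paper does not specify how bin edges were chosen.

```python
import statistics


def bin_by_parameter(param_values, fnet_values, n_bins=8):
    """Sort FNET anomalies into equal-width bins of one perturbed parameter."""
    mean_fnet = statistics.fmean(fnet_values)
    anomalies = [f - mean_fnet for f in fnet_values]
    lo, hi = min(param_values), max(param_values)
    width = (hi - lo) / n_bins  # assumes the parameter actually varies
    bins = [[] for _ in range(n_bins)]
    for p, a in zip(param_values, anomalies):
        # Clamp so that the largest parameter value falls in the last bin.
        idx = min(int((p - lo) / width), n_bins - 1)
        bins[idx].append(a)
    # One summary per bin; None marks an empty bin.
    return [
        {"min": min(b), "max": max(b), "mean": statistics.fmean(b)} if b else None
        for b in bins
    ]


# Synthetic example: 256 configurations with a made-up linear FNET response.
params = [i / 255 for i in range(256)]
fnet = [2.0 * p + 1.0 for p in params]
bin_stats = bin_by_parameter(params, fnet)
```

A clear trend of the bin means across the eight bins, large compared with the within-bin spread, is what marks a parameter as influential in plots of this kind.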

Figure 11.As in Fig. 10 but showing parametric sensitivities of the global mean (a) shortwave and (b) long-wave cloud forcing (unit: W m −2 ).

Table 1. Global mean values and pattern correlations of the atmospheric mean state in a multi-year climate integration and an ensemble of short simulations. The 5-year mean climatology of December-January-February (DJF) is compared to the 50-member ensemble mean of the day-1 average. All simulations were performed with a 30 min model time step. Further details are explained in Sect. 3.1.
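As an illustration of the pattern correlation reported in this table, the sketch below computes a centered, area-weighted correlation between two latitude-longitude fields. The cosine-of-latitude weighting, the removal of the weighted mean, and the function name `pattern_correlation` are assumptions; the exact definition used in the paper is given in Sect. 3.1 and may differ.

```python
import math


def pattern_correlation(field_a, field_b, lats_deg):
    """Centered, cos(latitude)-weighted correlation of two lat-lon fields.

    field_a and field_b are lists of rows, one row per latitude in lats_deg.
    """
    weights, a_vals, b_vals = [], [], []
    for row_a, row_b, lat in zip(field_a, field_b, lats_deg):
        w = math.cos(math.radians(lat))  # area weight for this latitude row
        for a, b in zip(row_a, row_b):
            weights.append(w)
            a_vals.append(a)
            b_vals.append(b)
    w_sum = sum(weights)
    mean_a = sum(w * a for w, a in zip(weights, a_vals)) / w_sum
    mean_b = sum(w * b for w, b in zip(weights, b_vals)) / w_sum
    cov = sum(w * (a - mean_a) * (b - mean_b)
              for w, a, b in zip(weights, a_vals, b_vals))
    var_a = sum(w * (a - mean_a) ** 2 for w, a in zip(weights, a_vals))
    var_b = sum(w * (b - mean_b) ** 2 for w, b in zip(weights, b_vals))
    return cov / math.sqrt(var_a * var_b)


# Tiny made-up fields on a 3 x 3 grid; field_b is a linear rescaling of field_a.
lats = [-45.0, 0.0, 45.0]
field_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]
field_b = [[2.0 * v + 3.0 for v in row] for row in field_a]
corr_same_pattern = pattern_correlation(field_a, field_b, lats)
```

Because the measure is centered and normalized, it is insensitive to offsets and amplitude differences between the two fields, which is why global mean values are listed separately in the table.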

Table 2. Parameters in the cloud microphysics and aerosol life cycle parameterizations in CAM5 that are perturbed in the sensitivity analysis described in Sect. 4. Adapted from Table 1 in Zhao et al. (2013).

Table 3. Global mean TOA net radiative flux (FNET), total cloud forcing (CF), shortwave cloud forcing (SWCF), and long-wave cloud forcing (LWCF) in the parametric sensitivity simulations described in Sect. 4. The numbers given are the average of 256 simulations/ensembles ± one standard deviation (unit: W m⁻²).

www.geosci-model-dev.net/7/1961/2014/ Geosci. Model Dev., 7, 1961-1977, 2014
and input parameters are polynomial functions that include linear,
H. Wan et al.: Experimentation strategy for climate models
