Reduced Complexity Model Intercomparison Project Phase 1: introduction and evaluation of global-mean temperature response

Reduced-complexity climate models (RCMs) are critical in the policy and decision-making space, and are directly used within multiple Intergovernmental Panel on Climate Change (IPCC) reports to complement the results of more comprehensive Earth system models. To date, evaluation of RCMs has been limited to a few independent studies. Here we introduce a systematic evaluation of RCMs in the form of the Reduced Complexity Model Intercomparison Project (RCMIP). We expect RCMIP will extend over multiple phases, with Phase 1 being the first. In Phase 1, we focus on the RCMs’ global-mean temperature responses, comparing them to observations, exploring the extent to which they emulate more complex models and considering how the relationship between temperature and cumulative emissions of CO2 varies across the RCMs. Our work uses experiments which mirror those found in the Coupled Model Intercomparison Project (CMIP), which focuses on complex Earth system and atmosphere–ocean general circulation models. Using both scenario-based and idealised experiments, we examine RCMs’ global-mean temperature response under a range of forcings. We find that the RCMs can all reproduce the approximately 1 °C of warming since pre-industrial times, with varying representations of natural variability, volcanic eruptions and aerosols. We also find that RCMs can emulate the global-mean temperature response of CMIP models to within a root-mean-square error of 0.2 °C over a range of experiments. Furthermore, we find that, for the Representative Concentration Pathway (RCP) and Shared Socioeconomic Pathway (SSP)-based scenario pairs that share the same IPCC Fifth Assessment Report (AR5)-consistent stratospheric-adjusted radiative forcing, the RCMs indicate higher effective radiative forcings for the SSP-based scenarios and correspondingly higher temperatures when run with the same climate settings. In our idealised setup of RCMs with a climate sensitivity of 3 °C, the difference for the ssp585–rcp85 pair by 2100 is around 0.23 °C (±0.12 °C) due to a difference in effective radiative forcings between the two scenarios. Phase 1 demonstrates the utility of RCMIP’s open-source infrastructure, paving the way for further phases of RCMIP to build on the research presented here and deepen our understanding of RCMs.


Introduction
Computing power is not sufficient to run our most comprehensive, physically complete climate models for every application of interest. Thus, for many applications, less computationally demanding approaches are used. One common approach is the use of reduced-complexity climate models (RCMs), also known as simple climate models (SCMs).
RCMs are designed to be computationally efficient tools, allowing for exploratory research, and have lower spatial (if any) and temporal resolution than complex models. Typically, they describe highly parameterised macro-properties of the climate system. Usually this means that they simulate the climate system on a global-mean, annual-mean scale, although some RCMs use coarse-resolution spatial grids and monthly time steps. As a result of their highly parameterised approach, RCMs can be of the order of a million or more times faster than more complex models (in terms of simulated model years per unit CPU time).
The computational efficiency of RCMs means that they can be used where computational constraints would otherwise be limiting. For example, in the hierarchy of climate models (RCMs, Earth system models of intermediate complexity (EMICs) and Earth system models (ESMs)), it is only RCMs that are sufficiently efficient for large probabilistic ensembles for hundreds of scenarios. In addition, some integrated assessment models (IAMs) require iterative climate simulations. In such cases, only RCMs are computationally feasible because hundreds to thousands of climate realisations must be integrated by the IAM for a single scenario to be produced. RCMs also enable the exploration of interacting uncertainties from multiple parts of the climate system or the constraining of unknown parameters by combining multiple lines of evidence in an internally consistent setup. In the context of the assessment reports of the Intergovernmental Panel on Climate Change (IPCC), a prominent example is the climate assessment of emission scenarios by IPCC Working Group 3 (WGIII). Hundreds of emission scenarios were assessed in the IPCC's Fifth Assessment Report (AR5; see Clarke et al., 2014) as well as its more recent Special Report on Global Warming of 1.5 °C (SR1.5; see Rogelj et al., 2018; Huppmann et al., 2018). (Scenario data are available at https://secure.iiasa.ac.at/web-apps/ene/AR5DB (last access: 22 October 2020) and https://data.ene.iiasa.ac.at/iamc-1.5c-explorer/ (last access: 22 October 2020) for AR5 and SR1.5 respectively; both databases are hosted by the IIASA Energy Program.) For the IPCC's forthcoming Sixth Assessment Report (AR6), it is anticipated that the number of scenarios will be in the several hundreds to a thousand (for example, see the full set of scenarios based on the Shared Socioeconomic Pathways (SSPs) at https://tntcat.iiasa.ac.at/SspDb, last access: 22 October 2020). Both the number of scenarios and the tight timelines of the IPCC assessments render it infeasible to use the world's most comprehensive models to estimate the climate implications of these IAM scenarios.

Evaluation of reduced-complexity climate models
The validity of the RCM approach rests on the premise that RCMs are able to replicate the behaviour of the Earth system and response characteristics of our most complete models. Over time, multiple independent efforts have been made to evaluate this ability. In 1997, an IPCC technical paper (Houghton et al., 1997) investigated the simple climate models used in the IPCC Second Assessment Report and compared their performance with idealised atmosphere-ocean general circulation model (AOGCM) results. Later, van Vuuren et al. (2011b) compared the climate components used in IAMs, such as DICE (Nordhaus, 2014) and FUND (Waldhoff et al., 2011); van Vuuren et al. (2011b) also included the RCM MAGICC (version 4 at the time; Wigley and Raper, 2001), which was used in several IAMs. They focused on five CO2-only experiments to quantify the differences in the behaviour of the RCMs used by each IAM. Harmsen et al. (2015) extended the work of van Vuuren et al. (2011b) to consider the impact of non-CO2 climate drivers in the Representative Concentration Pathways (RCPs). Recently, Schwarber et al. (2019) proposed a series of impulse tests for simple climate models in order to isolate differences in model behaviour under idealised conditions. Despite these efforts, the RCM community does not yet have a systematic, regular intercomparison effort. This led to the following statement in SR1.5: "The veracity of these reduced-complexity climate models is a substantial knowledge gap in the overall assessment of pathways and their temperature thresholds". This study provides a first step to fill this gap via a systematic intercomparison. A systematic intercomparison is also likely to provide other benefits, similar to those that the AOGCM and ESM modelling communities have gained over multiple iterations of CMIP (Carlson and Eyring, 2017). Developing a systematic comparison for RCMs will provide similar benefits to the RCM community, including building a community of reduced-complexity modellers, facilitating comparison of model behaviour, improving understanding of RCMs' strengths and limitations, and ultimately improving RCMs.
An ongoing comprehensive evaluation and assessment of RCMs requires an established protocol. The Reduced Complexity Model Intercomparison Project (RCMIP) proposed here provides such a protocol (also see https://www.rcmip.org/, last access: 22 October 2020). In the RCMIP community call (available at https://www.rcmip.org/, last access: 22 October 2020) RCMs were broadly defined as follows: "[...] RCMIP is aimed at reduced-complexity, simple climate models and small emulators that are not part of the intermediate complexity EMIC or complex GCM/ESM categories". In practice, we encouraged any group in the scientific community who identifies with the label of RCM to participate in RCMIP; see Table 1 for an overview of the models which participated in RCMIP Phase 1.
We aim for RCMIP to provide a focal point for further development and an experimental design which allows models to be readily compared and contrasted, mirroring the regular comparisons which are performed for AOGCMs and ESMs in each of CMIP's iterations. We intend for RCMIP to facilitate more regular and targeted assessment of RCMs.
Thus, whilst RCMIP mirrors many of the experimental setups developed within CMIP6, RCMIP focuses on RCMs and is hence not one of the official CMIP6-endorsed intercomparison projects (that are instead targeted at ESMs). Nonetheless, RCMs are part of the climate model hierarchy, so we aim to make comparing the RCMIP results with results from other modelling communities, specifically CMIP, as simple as possible. Accordingly, RCMIP replicates selected experimental designs of many of the CMIP-endorsed MIPs, particularly the DECK and ScenarioMIP (O'Neill et al., 2016) simulations.
In what follows, we describe RCMIP Phase 1. In Sect. 2, we detail the domain of RCMIP Phase 1 and its research questions. In Sect. 3, we provide an overview of the participating models and their configuration. In Sect. 4, we describe the experimental setup. In Sect. 5 we present results from RCMIP Phase 1, before presenting possible extensions in Sect. 6 and conclusions in Sect. 7.

Research questions
The key point of this paper is to introduce RCMIP, its goals and its setup. As a proof of concept, we also include key initial research questions, the implemented experimental setup and associated results from RCMIP's first phase.
2.1 Research question 1: is the reduced-complexity modelling community ready to run an intercomparison and how long would such an intercomparison take to run?
Model intercomparisons require significant effort on the part of the organising community and each of the modelling teams involved. The reduced-complexity modelling community has not undertaken such an effort previously; hence the first question is whether the community is ready to perform an intercomparison. In addition to whether an intercomparison is possible, the second part of the first question is how long and how much effort is required to perform the intercomparison. The most successful intercomparisons are built on standardised protocols for experiment design, model setup and data handling. To date, no such standards exist for the reduced-complexity modelling community.
Here we investigate how easily the benefits of systematic intercomparison can be brought to the reduced-complexity modelling community by performing the first of many envisaged rounds of intercomparison. In the process, we gain vital insights into the effort, timelines and scope which can reasonably be managed by the participating modelling teams. Such knowledge is vital for planning future efforts.
2.2 Research question 2: can reduced-complexity climate models capture observed historical global-mean surface air temperature (GSAT) trends?
The second research question focuses on a key metric for evaluating RCMs against observations. This research question evaluates the extent to which each RCM's approximations and parameterisations cause its response to deviate from observational data. However, given the limited number of observations available, comparing only with observations leaves us with little understanding of how RCMs perform in scenarios apart from a historical one in which anthropogenic emissions are heating the climate. Recognising that there is a range of possible futures, it is vital to also assess RCMs in other scenarios. Prominent examples include stabilising or falling anthropogenic emissions, strong mitigation of non-CO2 climate forcers and scenarios with CO2 removal. The limited observational set motivates RCMIP's third research question: evaluation against more complex models.

Research question 3: to what extent can reduced-complexity models emulate the global-mean temperature response of more complex models?
Whilst the response of more comprehensive models may not represent the behaviour of the actual Earth system, they are the best available representation of our understanding of the Earth system's physical processes. By evaluating RCMs against more complex models, we can quantify the extent to which the simplifications made in RCMs limit their ability to capture physically based model responses, for example the extent to which the approximation of a constant climate feedback in some RCMs limits the RCM's ability to replicate ESMs' longer-term response under either higher forcing or lower overshoot scenarios (Rohrschneider et al., 2019).

Research question 4: what can a multi-model ensemble of RCMs tell us about the difference between the SSP-based and RCP scenarios?
The SSP-based scenarios (O'Neill et al., 2016; Riahi et al., 2017) are the cornerstone of CMIP6's ScenarioMIP and are an update of CMIP5's RCP scenarios (van Vuuren et al., 2011a). One of the key intents behind some of the SSP-based scenarios is that they share the same nameplate 2100 radiative forcing level as the RCPs (e.g. ssp126 and rcp26, ssp245 and rcp45), the idea being that they would have similar climatic outcomes despite their different atmospheric concentration inputs. However, the nameplate radiative forcing comparisons between RCPs and SSPs were undertaken on the basis of IPCC AR5-consistent stratospheric-adjusted radiative forcings (Myhre et al., 2013). Taking into account new insights into respective CO2 and CH4 forcings, as well as effective radiative forcings, different climate responses can be expected. In fact, Wyser et al. (2020) suggest that the difference in atmospheric concentrations results in non-trivial differences in climate projections. Unfortunately, evaluating the scenario differences between RCP and SSP-based scenarios with a large, identical set of CMIP models is difficult because of the computational cost (many CMIP6 modelling groups will not perform all CMIP6 ScenarioMIP experiments, let alone perform extra CMIP5 experiments). With an ensemble of RCMs, we can provide further insight into how much the change in emissions pathways affects climate projections using identical models, building on the insights from the CMIP groups which can afford to run the required experiments. In addition, RCMs also offer one other benefit: they can diagnose effective radiative forcing directly. As a result, RCMs can provide more detailed insights into the reasons for differences because they provide a more detailed breakdown of the emissions-climate change cause-effect chain. In contrast, diagnosing effective radiative forcing from CMIP models is a difficult task which requires a number of extra experiments, all of which come at additional computational cost.

Research question 5: how does the relationship between cumulative CO2 emissions and global-mean temperature vary both between RCMs and within a parameter ensemble of an RCM?
The relationship between cumulative CO2 emissions and global-mean temperature is key to deriving the transient climate response to emissions (IPCC, 2018), a key metric in the calculation of our remaining carbon budget (Rogelj et al., 2019). Here we investigate how this relationship varies between RCMs and within a parameter ensemble from a given RCM. Whilst a multi-model ensemble demonstrates variance due to model structure, the parameter ensemble demonstrates variance that arises solely as a result of changes in the strength of the response of individual components. These insights build on results from experiments with more complex models (see e.g. Arora et al., 2020), which cannot perform such large perturbed parameter ensembles because of computational cost.

Participating models and their configuration
Fifteen models have participated in RCMIP Phase 1 (see Table 1 for an overview and links to key description papers; note that GIR has been renamed FaIR-v2 since the preparation of this paper). We encourage any other interested groups to join further phases of the project. Even within the reduced-complexity category, there is considerable variation in both model complexity and the number of climate components (Table 1). At the simplest end, we have the radiative-forcing-driven (see Sect. 4) impulse response models, represented by the AR5IR model variants. These models project global-mean temperature only and, in the setup submitted here, provide only annual-mean values (although they can be run at higher temporal resolution if desired). At the other end of the spectrum, we have MAGICC, which includes representations of 43 greenhouse gas cycles, includes parameterisations of the relationship between aerosol emissions and aerosol effective radiative forcing, distinguishes between different hemispheres and land/ocean regions of the globe, has 50 ocean layers in each hemisphere, and runs on a monthly time step internally (although all output is annual mean only). Some models take a more hybrid approach, increasing complexity in only a single component whilst retaining simplicity elsewhere. Examples of increased complexity in specific domains include OSCAR's regionalised land carbon cycle and EMGC's representation of natural variability.
An in-depth description of these models and their differences is beyond the scope of this paper (but is planned for future research). For readers interested in the details of all the participating models, we refer to the references provided in Table 1.

Model configuration
RCMs are usually highly flexible. Their response to anthropogenic and natural drivers strongly depends on the configuration in which they are run (i.e. their parameter values). In RCMIP Phase 1, we have requested that all models provide one set of simulations in which their equilibrium climate sensitivity is equal to 3 °C. Whilst this does not define the entirety of a model's behaviour, it removes a major cause of difference between model output which is not related to model structure. Within Phase 1 of RCMIP, we have given modelling groups the freedom to choose whether they apply any additional constraints or not.
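As a minimal sketch of what fixing the equilibrium climate sensitivity means in the simplest possible setting, consider a one-box energy balance in which equilibrium warming equals forcing divided by a constant feedback parameter. The function names and the assumed forcing for doubled CO2 (3.71 W m⁻²) are illustrative assumptions and are not taken from the RCMIP protocol or from any participating model.

```python
# Illustrative one-box energy balance; not the configuration of any RCMIP model.
F2X = 3.71  # ASSUMED effective radiative forcing for doubled CO2 (W m^-2)


def feedback_for_ecs(ecs, f2x=F2X):
    """Return the feedback parameter (W m^-2 K^-1) that yields the given
    equilibrium climate sensitivity (K) for a doubling of CO2."""
    if ecs <= 0:
        raise ValueError("ecs must be positive")
    return f2x / ecs


def equilibrium_warming(forcing, feedback):
    """Equilibrium warming (K) under a constant forcing (W m^-2)."""
    return forcing / feedback


# Configure the toy model so that its climate sensitivity is 3 K.
feedback = feedback_for_ecs(3.0)
warming_at_doubling = equilibrium_warming(F2X, feedback)
```

In models with more components the sensitivity generally depends on several parameters at once, so the single division above only conveys the idea of the constraint.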
On top of the 3 °C climate sensitivity configuration, we have also invited groups to submit two other configuration categories. The first is any other best guess or default configurations, where each participating modelling group is free to choose their own best guess (the details of which can be found in the references provided in Table 1). The second is configurations deliberately designed to emulate specific ESMs from CMIP5 and CMIP6. Given the complexities involved in calibration (see e.g. Meinshausen et al., 2011), not all modelling groups submitted such CMIP5- and CMIP6-specific configurations. However, for those groups that did, these emulation setups provide valuable insight into the extent to which the model's structure limits its ability to reproduce the behaviour of more complex models. Given the complexity of the topic, we leave decisions about how to calibrate their model up to the individual modelling teams (details of each group's approach can be found in the references provided in Table 1). A more top-down approach will be undertaken in a future phase of RCMIP (see Sect. 6).

Experimental design
RCMs generally model multiple steps in the emissions-climate change cause-effect chain, including gas cycles (emissions-to-concentration step), radiative forcing parameterisations (concentrations-to-radiative-forcing step) and temperature response (radiative-forcing-to-warming step). Here, effective radiative forcing and radiative forcing are defined following Myhre et al. (2013). In contrast to radiative forcing, effective radiative forcing includes rapid adjustments beyond stratospheric temperature adjustments and thus is a better indicator of long-term climate change.
Each point in the chain can be used as the starting point for simulations; i.e. the simulation might be defined in terms of prescribed concentrations, emissions or radiative forcing. In Phase 1 of RCMIP, we focus on experiments which are defined in terms of concentrations to facilitate a direct comparison with CMIP experiments, most of which are also defined in terms of concentrations.
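To make the last step of the chain (radiative forcing to warming) concrete, the following is a minimal sketch of a forcing-driven two-box impulse response model. The box amplitudes `q`, time scales `d` and the function name are assumptions chosen for illustration; they are not the parameters of the AR5IR variants or of any other participating model.

```python
import math


def two_box_response(forcing, q=(0.5, 0.3), d=(8.0, 400.0), dt=1.0):
    """Temperature response (K) of a two-box impulse response model.

    forcing: effective radiative forcing per time step (W m^-2), assumed
             constant within each step.
    q:       ASSUMED box amplitudes (K per W m^-2); their sum is the
             equilibrium warming per unit forcing.
    d:       ASSUMED box response time scales (years).
    """
    boxes = [0.0 for _ in q]
    temperatures = []
    for f in forcing:
        for j, (qj, dj) in enumerate(zip(q, d)):
            decay = math.exp(-dt / dj)
            # Exact update for a forcing held constant over the step:
            # each box relaxes towards qj * f with time scale dj.
            boxes[j] = boxes[j] * decay + qj * f * (1.0 - decay)
        temperatures.append(sum(boxes))
    return temperatures


# Example: response to an abrupt, sustained forcing of 3.7 W m^-2.
step_response = two_box_response([3.7] * 5000)
```

Starting the chain one step earlier, as in the concentration-driven experiments described here, would require an additional concentrations-to-forcing parameterisation in front of this function.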
RCMIP Phase 1 focuses on 19 experiments, which can be broken down into two categories: scenario-based and idealised. We provided all inputs following, and requested that all outputs follow, a standard format to facilitate ease of data analysis and re-use (Sect. S1 in the Supplement). This common data format was developed for RCMIP and combines elements of the integrated assessment community standard (Gidden and Huppmann, 2019) and the CMIP6 definitions of variables and scenarios.

Scenario-based experiments
Scenario-based experiments examine model responses to historical transient forcing as well as a range of future scenarios. The historical experiments provide a way to compare RCM output against observational data records (research question 2) and are complementary to the idealised experiments (Sect. 4.2), which provide a cleaner assessment of model response to forcing. The future scenarios probe RCM responses to a range of possible climate futures, both continued warming and stabilisation or overshoots in forcing. The variety of scenarios is a key test of model behaviour, evaluating them over a range of conditions rather than only over the historical period. Direct comparison with CMIP output then provides information about the extent to which the simplifications involved in RCM modelling are able to reproduce the response of the most advanced physically based ESMs (research question 3).
RCMIP Phase 1's scenario experiments are historical, ssp119, ssp126, ssp245, ssp370, ssp434, ssp460, ssp534over, ssp585, rcp26, rcp45, rcp60 and rcp85. We focus on simulations (historical plus future) which cover the range in forcing scenarios from the CMIP6 ScenarioMIP exercise (O'Neill et al., 2016; Riahi et al., 2017) and CMIP5 RCP scenarios (van Vuuren et al., 2011a). These quickly reveal differences in model projections over the widest available scenario range which can also be compared to CMIP6 output. The CMIP5 experiments are particularly useful as they provide a direct comparison between CMIP5 and CMIP6 scenarios (research question 4), something which has only been done to a limited extent with more complex models (Wyser et al., 2020).
All of these experiments are defined in terms of concentrations of well-mixed greenhouse gases. Here, "well-mixed greenhouse gases" refer to CO2, CH4, N2O, hydrofluorocarbons (HFCs), perfluorocarbons (PFCs) and hydrochlorofluorocarbons (HCFCs). However, scenario experiments include more than just well-mixed greenhouse gases, so these concentrations are supplemented by aerosol precursor species emissions, ozone-relevant emissions and natural effective radiative forcing variations. Here, "aerosol precursor species emissions" refer to emissions of sulfur, nitrates, black carbon, organic carbon and ammonia. "Ozone-relevant emissions" refer to emissions of carbon monoxide and non-methane volatile organic compounds (NMVOCs). For models which do not include the steps of aerosol emissions to effective radiative forcing or ozone-relevant emissions to ozone effective radiative forcing, prescribed effective radiative forcings can instead be used. Here "natural effective radiative forcing variations" refer to effective radiative forcing due to natural volcanic eruptions and changes in solar irradiance. All data sources are described in Sect. S2.
The key difference between the RCMIP experiments and the CMIP experiments is that some RCMs include more anthropogenic drivers than CMIP models. Specifically, CMIP models do not include the full range of HFC, PFC and HCFC species, instead using equivalent concentrations. In addition, some CMIP models will not include the effect of aerosol precursors such as nitrates, ammonia and organic carbon (McCoy et al., 2017).

Idealised experiments
In addition to the scenario-based experiments, RCMIP Phase 1 also includes a number of idealised experiments. All of these experiments are defined in terms of CO2 concentrations alone. These experiments provide an easy point of comparison with output from other models, particularly CMIP output, as well as information about basic model behaviour and dynamics which can be useful for understanding the differences between models.
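As an illustration of how simple such a CO2-only input can be, the sketch below builds a pathway in the style of the CMIP 1pctCO2 experiment, in which the concentration rises by 1 % per year from a pre-industrial baseline. The baseline value of 284.32 ppm and the function names are assumptions made for this example; the inputs actually used in RCMIP are those described in the Supplement.

```python
import math


def one_percent_pathway(c0=284.32, n_years=140, rate=0.01):
    """CO2 concentrations (ppm) for years 0..n_years, compounding at
    `rate` per year from the ASSUMED baseline concentration `c0`."""
    return [c0 * (1.0 + rate) ** year for year in range(n_years + 1)]


def years_to_multiply(factor, rate=0.01):
    """Number of years of compound growth needed to scale the
    concentration by `factor`."""
    return math.log(factor) / math.log(1.0 + rate)


pathway = one_percent_pathway()
doubling_time = years_to_multiply(2.0)      # a little under 70 years
quadrupling_time = years_to_multiply(4.0)   # a little under 140 years
```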
The experiments reveal differences in model response to forcing, particularly whether the RCM response to forcing includes non-linearities. In addition, these experiments also provide a direct comparison with CMIP experiments (i.e. more complex model behaviour) and are a key benchmark when examining an RCM's ability to emulate more complex models (research question 3). In these concentration-driven experiments, RCMs report emissions (often referred to as "inverse emissions") and carbon cycle behaviour consistent with the prescribed CO2 pathway. These inverse emissions are key to exploring the variation in the relationship between surface air temperature change and cumulative emissions of CO2 (Matthews et al., 2009; Meinshausen et al., 2009; Zickfeld et al., 2009) over a range of models and parameter values (research question 5).
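The relationship between warming and cumulative emissions can be summarised from a model's reported inverse emissions with a few lines of code. The sketch below accumulates annual emissions and fits a straight line through the origin; the function names, the choice of GtC as the unit and the through-origin fit are assumptions of this example rather than the analysis method used in RCMIP.

```python
def cumulative_sum(values):
    """Running total of a sequence of annual values."""
    total = 0.0
    out = []
    for value in values:
        total += value
        out.append(total)
    return out


def slope_through_origin(x, y):
    """Least-squares slope of y against x for a line forced through zero."""
    denominator = sum(a * a for a in x)
    if denominator == 0:
        raise ValueError("x must contain at least one non-zero value")
    return sum(a * b for a, b in zip(x, y)) / denominator


def warming_per_1000_gtc(annual_emissions_gtc, temperatures):
    """Warming (K) per 1000 GtC of cumulative CO2 emissions.

    annual_emissions_gtc: inverse emissions per year (ASSUMED unit: GtC/yr)
    temperatures:         surface air temperature change per year (K)
    """
    cumulative = cumulative_sum(annual_emissions_gtc)
    return 1000.0 * slope_through_origin(cumulative, temperatures)
```

A single slope hides any sub-linear or super-linear behaviour, so in practice the residuals of such a fit, or the slope over successive windows, would also need to be inspected.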

Output variables
Phase 1 of RCMIP focuses on five key output variables. The focus on a limited set allows us to discern major differences between RCMs and provides insights into the reasons for such differences. The first variable of interest is surface air temperature change. We choose this variable because it is comparable to available observations and CMIP output and is also policy-relevant.
In addition to surface air temperature change, we request total, anthropogenic, CO2 and aerosol effective radiative forcing. These forcing variables are key indicators of the long-term drivers of climate change within each model as well as being key metrics for the IAM community. In particular, aerosol effective radiative forcing is highly uncertain and a key source of difference between RCMs.
The final variable we request is CO2 emissions. Given that all our experiments are defined in terms of concentrations, we request CO2 emissions compatible with the prescribed CO2 pathways.

Results
Within 3 months of beginning RCMIP and publishing the protocols, 15 different RCMs submitted data. Given that this is the first phase of RCMIP, we expect even shorter turnarounds in future. The submitted results demonstrate that the RCM community, via RCMIP, now has the capacity to run multi-model studies, and to run them comparatively quickly. In addition, the number of participating modelling groups demonstrates that the RCMIP infrastructure is accessible to a wide range of modelling teams.
All the RCMs are able to capture the approximately 1 °C of warming seen in the historical observations (Fig. 1), compared to a pre-industrial reference period (Rogelj et al., 2019). However, the RCMs vary in the detail which they represent. Most of the RCMs include some representation of the impact of volcanic eruptions, most notably the drop in global-mean temperatures after the eruption of Mount Agung in 1963. In contrast, most of the RCMs do not capture natural variability driven by processes such as the El Niño-Southern Oscillation (Wolter and Timlin, 2011), the Pacific Decadal Oscillation (Zhang et al., 1997) and the Indian Ocean Dipole (Saji et al., 1999). The exception to this is the EMGC model, which includes representations of the impact of all of these processes. At the other end of the complexity spectrum, we have the CO2-only model, GREB. Unlike the other RCMs, GREB lacks the volcanic and aerosol-induced cooling signals of the 19th and 20th centuries.
Figure 1. Historical global-mean annual-mean surface air temperature (GSAT) simulations. Thick black line is observed GSAT (Rogelj et al., 2019). Medium-thickness lines are default configurations for RCMIP models. Thin grey solid lines are CMIP6 models. In order to provide time series up until 2019, we have used data from the combination of historical and ssp585 simulations.
RCMIP also facilitates a comparison of model calibrations and CMIP output (Fig. 2). Examining multiple emulation setups, we see that RCMs can reproduce the temperature response of CMIP models to forcing changes to within a root-mean-square error of 0.2 °C (Table 2). A detailed comparison is provided in the Supplement (Table S1 and Figs. S1-S24).
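For reference, the root-mean-square error between an emulation and its target can be computed as in the minimal sketch below. The function name is an assumption, and the sketch presumes that both series are already on the same time axis and relative to the same reference period.

```python
import math


def rmse(emulated, target):
    """Root-mean-square error between two equally long time series (K)."""
    if len(emulated) != len(target) or not emulated:
        raise ValueError("series must be non-empty and of equal length")
    squared_errors = [(e - t) ** 2 for e, t in zip(emulated, target)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, a short period of large mismatch (for example after a volcanic eruption) weighs more heavily than a small persistent offset of the same mean size.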
In scenario-based experiments, it appears to be harder for RCMs to emulate CMIP output than in idealised experiments. We suggest two key explanations. The first is that effective radiative forcing cannot be easily diagnosed in SSP-based scenarios and hence it is hard to know how best to force the RCM during calibration. The second is that the forcing in these scenarios includes periods of increase, sudden decrease due to volcanoes and longer-term stabilisation rather than the simpler changes seen in the idealised experiments. Fitting all three of these regimes is a more difficult challenge than fitting the idealised experiments alone. Only six models (Table 2) have been able to submit emulation configurations. Furthermore, each RCM is calibrated to a different number of CMIP models, with some modelling teams unable to provide any calibrations at all. The reason is that there is to date no common resource of calibration data from the CMIP6 repositories. The technical challenge of diagnosing, stitching together, creating area-weighted averages and de-drifting a large amount of CMIP6 output data within a short time period has turned out to be a hurdle for many modelling teams. As an offshoot of RCMIP, we attempt to address this challenge for the future by providing a unifying data portal (see https://cmip6.science.unimelb.edu.au/, last access: 22 October 2020, Nicholls et al., 2020b).
The ensemble of RCMs also provides insights into the differences between CMIP5 and CMIP6 generation scenarios ("RCP" and "SSP-based" scenarios respectively) when these scenarios are run with identical models (Fig. 3). In the selection of models which have submitted all RCP and SSP-based scenario pairs, the SSP-based scenarios are 0.20 °C (standard deviation 0.10 °C across the available models) warmer than their corresponding RCPs (Fig. 3b). This difference is driven by the 0.39 ± 0.24 W m⁻² larger effective radiative forcing in the SSP-based scenarios (Fig. 3d), which itself is driven by the 0.53 ± 0.44 W m⁻² larger CO2 effective radiative forcing in the SSP-based scenarios (Fig. 3f). These results add to the work of Wyser et al. (2020), which suggests that, even when run with the same model (in a concentration-driven setup), the SSP-based scenarios result in warmer projections than the RCPs. When we run one of the RCMs (MAGICC) with an AR5-consistent stratospheric-adjusted radiative forcing definition (Myhre et al., 2013), the SSP-based and RCP scenarios are within 6 % of each other in 2100 (although their AR5-consistent stratospheric-adjusted radiative forcing trajectories can differ by up to 15 % at different times over the 21st century). Thus, we find that the update to effective radiative forcing (Forster et al., 2016), mainly using the formulations presented in Etminan et al. (2016) plus any rapid adjustment terms (Smith et al., 2018b), increases the total forcing in the SSP-based scenarios, because their generally higher CO2 concentrations are partially, but not fully, offset by lower CH4 concentrations (see e.g. Fig. 11 in Meinshausen et al., 2020). There is a clear need for further, more comprehensive exploration of the differences between the RCP and SSP-based scenarios.
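The paired comparison described above can be sketched as follows: for each model that ran both members of a scenario pair, take the SSP-based value minus the RCP value, then summarise the differences across models. The model labels and numbers below are placeholders invented purely for illustration and are not RCMIP results; the use of the sample standard deviation is likewise an assumption.

```python
import statistics


def paired_differences(ssp_by_model, rcp_by_model):
    """SSP-based minus RCP value for every model present in both inputs."""
    common = sorted(set(ssp_by_model) & set(rcp_by_model))
    return {model: ssp_by_model[model] - rcp_by_model[model] for model in common}


def summarise(differences):
    """Mean and sample standard deviation of the per-model differences."""
    values = list(differences.values())
    return statistics.mean(values), statistics.stdev(values)


# PLACEHOLDER end-of-century warming values (K); not taken from RCMIP output.
ssp585 = {"model_a": 4.9, "model_b": 5.2, "model_c": 4.6}
rcp85 = {"model_a": 4.7, "model_b": 4.9, "model_c": 4.5, "model_d": 4.8}

differences = paired_differences(ssp585, rcp85)  # model_d has no pair and is dropped
mean_difference, sd_difference = summarise(differences)
```

Restricting the summary to models that ran both scenarios keeps structural differences between models from leaking into the scenario difference.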
Figure 3. Output from the RCP and SSP-based scenarios up until 2100. The left-hand column shows raw model output. The right-hand column shows the difference between RCP and SSP-based scenario pairs for a given model's output. The shaded range shows 1 standard deviation about the median (solid lines). Output is shown for surface air temperature change (GSAT, a and b), effective radiative forcing (c and d), CO2 effective radiative forcing (e and f) and aerosol effective radiative forcing (g and h). The results here are based on a limited set of models: CICERO-SCM, MAGICC, OSCAR, GIR (since renamed FaIR-v2) and FaIR. Only these models have performed the required RCP and SSP-based scenario pair experiments.

Finally, we present variations in the relationship between surface air temperature change and cumulative CO2 emissions from the 1pctCO2 and 1pctCO2-4xext experiments (Fig. 4). To date, only three models (GIR (since renamed FaIR-v2), MCE and OSCAR) have been able to provide the required outputs (in particular deriving inverse emissions from these concentration-defined experiments). From the available results, it is clear that the relationship between these two key variables varies over MCE's parameter ensemble, from weakly sub-linear to weakly super-linear. Such variation can have notable implications for the remaining carbon budget (Nicholls et al., 2020a). We also see that the MCE model's parameter ensemble covers a large range, dwarfing the differences between it and the GIR (since renamed FaIR-v2) and OSCAR models, which are shown here in their 3 °C climate sensitivity configurations. This suggests that, at least for RCMs, the response of individual components and their configuration is more important than model structure, although this conclusion is tempered by the paucity of available results.
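One simple way to quantify whether such a relationship is sub-linear or super-linear is to fit a power law, T = a E^b, to warming T against cumulative emissions E: an exponent b below 1 indicates a sub-linear relationship and b above 1 a super-linear one. The sketch below is an illustrative diagnostic rather than the analysis used for Fig. 4, and the function name `fit_power_law` is hypothetical.

```python
import numpy as np

def fit_power_law(cumulative_emissions, warming):
    """Fit warming = a * emissions**b by least squares in log-log space.

    Returns (a, b). Points with non-positive emissions or warming are
    dropped because their logarithm is undefined.
    """
    e = np.asarray(cumulative_emissions, dtype=float)
    t = np.asarray(warming, dtype=float)
    mask = (e > 0) & (t > 0)
    b, log_a = np.polyfit(np.log(e[mask]), np.log(t[mask]), 1)
    return np.exp(log_a), b
```

Applied to each member of a parameter ensemble, the spread of the fitted exponents gives a compact summary of how far the ensemble departs from strict proportionality.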
6 Options for future RCMIP phases

RCMIP Phase 1 provides proof of concept of the RCMIP approach to RCM evaluation, comparison and examination. However, Phase 1 has been limited to a very specific set of questions, and there is wide scope to use RCMs to examine other scientific questions of interest. In this section we present a number of ways in which further research and phases of RCMIP could build on the work presented in this paper.
The first is an exploration of probabilistic outputs. Most RCMs can be calibrated, i.e. have their parameters adjusted, such that they reproduce our best-estimate (typically median) observations. However, RCMs are also used in a probabilistic mode. In this mode a parametric ensemble is run for a given RCM and set of climate forcers. The results are then used to capture the likelihood that different climate changes will unfold, particularly the likelihood of reaching different warming levels. Given the widespread use of probabilistic distributions, particularly for quantifying likely ranges of climate sensitivity and climate projections (see e.g. Meinshausen et al., 2009; Skeie et al., 2018; Vega-Westhoff et al., 2019), examining the differences between existing probabilistic model setups is an obvious next step.
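The probabilistic mode can be sketched with a toy example. Here the "model" is only an equilibrium scaling of forcing by a sampled climate sensitivity, and the lognormal distribution, its parameters and the 3.7 W m−2 forcing for a doubling of CO2 are illustrative assumptions, not the setup of any participating RCM. The function names are hypothetical.

```python
import numpy as np

def toy_ensemble(forcing, n_members=10_000, seed=0):
    """Equilibrium warming (K) for a toy parametric ensemble.

    All numbers are illustrative assumptions: climate sensitivity is
    sampled from a lognormal distribution with a median of 3 K.
    """
    rng = np.random.default_rng(seed)
    ecs = rng.lognormal(mean=np.log(3.0), sigma=0.3, size=n_members)
    f2x = 3.7  # W m-2, assumed forcing for a doubling of CO2
    return ecs * forcing / f2x

def exceedance_probabilities(warming, levels=(1.5, 2.0, 3.0)):
    """Fraction of ensemble members whose warming exceeds each level."""
    warming = np.asarray(warming, dtype=float)
    return {level: float((warming > level).mean()) for level in levels}
```

For example, `exceedance_probabilities(toy_ensemble(forcing=2.6))` returns the fraction of members above each warming level. Comparing such fractions across RCMs for the same forcers is one way to examine differences between probabilistic setups.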
Secondly, there is a wide range of RCMs available in the literature. This variety can be confusing, especially to those who are not intimately involved in developing the models. An overview of the different models, their structure and relationship to one another (in the form of a genealogy) would help reduce the confusion and provide clarity about the implications of using one model over another.
Thirdly, emulation results have generally only been submitted for a limited set of experiments. Hence it is still not clear whether the emulation performance seen in idealised experiments also carries over to scenarios, particularly the SSP-based scenarios. As the number of available CMIP6 results continues to grow, this area is ripe for investigation and will lead to improved understanding of the limits of the reduced-complexity approach. The development of a common resource (see https://cmip6.science.unimelb.edu.au/, last access: 22 October 2020; Nicholls et al., 2020b) for RCM calibration will greatly aid this effort by ensuring that each group has access to the same set of calibration data.
Finally, whilst evaluating RCMs is a useful exercise, the root causes of the differences between them may not be clear. This can be addressed by performing experiments which specifically diagnose the reasons for differences between models, for example simple pulse emissions of different species or prescribed step changes in atmospheric greenhouse gas concentrations. Such experiments could build on existing research (van Vuuren et al., 2011b; Schwarber et al., 2019) and would allow even more comprehensive examination and understanding of RCM behaviour. This would require custom experiments, particularly for the carbon cycle, which is strongly coupled to other parts of the climate system. However, unlike in the case of ESMs, adding extra RCM experiments adds relatively little technical or human burden, because RCMs are computationally cheap and because RCMIP's standardised formats facilitate highly automated experiment pipelines.
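As an indication of how cheap such a diagnostic experiment is, the sketch below applies a step change in forcing to a generic two-layer energy balance model, integrated with an annual forward-Euler step. The parameter values are illustrative assumptions and are not the calibration of any participating model. The function name `two_layer_step_response` is hypothetical.

```python
import numpy as np

def two_layer_step_response(forcing, n_years, lam=1.2, gamma=0.7,
                            c_surface=8.0, c_deep=100.0):
    """Surface and deep-ocean temperature response to a forcing step.

    forcing: constant forcing applied from year 0 (W m-2)
    lam: climate feedback parameter (W m-2 K-1)
    gamma: heat exchange coefficient between layers (W m-2 K-1)
    c_surface, c_deep: heat capacities (W yr m-2 K-1)
    All default values are illustrative assumptions.
    """
    t_surface = np.zeros(n_years + 1)
    t_deep = np.zeros(n_years + 1)
    for i in range(n_years):  # forward Euler with a time step of 1 year
        exchange = gamma * (t_surface[i] - t_deep[i])
        t_surface[i + 1] = t_surface[i] + (forcing - lam * t_surface[i] - exchange) / c_surface
        t_deep[i + 1] = t_deep[i] + exchange / c_deep
    return t_surface, t_deep
```

In this toy model the surface temperature approaches forcing / lam at equilibrium, so differences in the transient response between two parameter sets can be attributed directly to the heat capacities and the exchange coefficient.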

Conclusions
RCMs are used in many applications, particularly where computational constraints prevent other techniques from being used. Due to their importance in climate policy assessments and in carbon budget calculations, as well as their applicability to a wide range of scientific questions, understanding the behaviour and output from RCMs is highly relevant and requires continuous updating with the latest science. Here we have presented the Reduced Complexity Model Intercomparison Project (RCMIP), an effort to facilitate the evaluation and understanding of RCMs in a systematic, standardised and detailed way. We hope this can greatly improve ease of use of, and familiarity with, RCMs.
We have performed RCMIP Phase 1, which provides an initial database of experiments conducted with 15 participating models from the RCM community. RCMIP Phase 1 focused on basic comparisons of RCMs with observed global-mean temperature changes, comparisons of RCMs with the global-mean temperature response of more complex models, the difference between the SSP-based and RCP scenarios, and an exploration of the relationship between cumulative CO2 emissions and surface air temperature change in the RCMs. These initial comparisons demonstrate that RCMIP's infrastructure is a useful tool for such intercomparisons and that the RCM community is able to perform such intercomparisons on timescales of the order of months. Further work will examine the relationship between different RCMs, RCMs' probabilistic projections and the cause of differences between RCMs. RCMIP fills a gap in our understanding of RCM behaviour, in particular, how different RCMs perform relative to each other as well as how they compare with observations. This gap is particularly important to fill given the widespread use of RCMs throughout the integrated assessment modelling community and in large-scale climate science assessments. We welcome requests, suggestions and further involvement from throughout the climate modelling research community. With our efforts, we aim to increase understanding of and confidence in RCMs, particularly for their many users at the science-policy interface.
The other participating models are not yet available publicly for download or as open source. Please also refer to their respective model description papers for notes and code availability.
Author contributions. ZN and RG conceived the idea for RCMIP. ZN, MM and JL set up the RCMIP website (https://www.rcmip.org/, last access: 22 October 2020), produced the first draft of the protocol and derived the data format. All authors contributed to updating and improving the protocol. ACC2 results were provided by KT and EK. AR5IR and Held et al. two-layer model results were provided by ZN. CICERO-SCM results were provided by JF, BS, MS and RBS. EMGC results were provided by LM, AH and RJS. ESCIMO results were provided by UG. FaIR results were provided by CS. GIR results were provided by NL. GREB results were provided by DD, CF, DM and ZX. Hector results were provided by AS and KD. MAGICC results were provided by MM, JL and ZN. MCE results were provided by JT. OSCAR results were provided by TG and YQ. WASP results were provided by PG. ZN wrote, except for the model descriptions, the first manuscript draft, produced all the figures and led the manuscript writing process with support from RG. All authors contributed to writing and revising the manuscript.

no. CE170100023). Katsumasa Tanaka benefited from state assistance managed by the National Research Agency in France under the "Programme d'Investissements d'Avenir" under the reference "ANR-19-MPGA-0008". Robert Gieseke has been supported by the German Federal Ministry for the Environment, Nature Conservation and Nuclear Safety (grant no. NO16_II_148_Global_A_IMPACT) while at PIK in the beginning of RCMIP. The EMGC work was supported by the NASA Climate Indicators and Data Products for Future National Climate Assessments (INCA) programme (award NNX16AG34G).
Review statement. This paper was edited by Carlos Sierra and reviewed by three anonymous referees.