ATTRICI 1.0-counterfactual climate for impact attribution

Climate has changed over the past century due to anthropogenic greenhouse gas emissions. In parallel, societies and their environment have evolved rapidly. To identify the impacts of historical climate change on human or natural systems, it is therefore necessary to separate the effect of different drivers. By definition this is done by comparing the observed situation to a counterfactual one in which climate change is absent and other drivers change according to observations. As such a counterfactual baseline cannot be observed it has to be estimated by process-based or empirical models. We here present ATTRICI (ATTRIbuting Climate Impacts), an approach to remove the signal of global warming from observational climate data to generate forcing data for the simulation of a counterfactual baseline of impact indicators. Our method identifies the interannual and annual cycle shifts that are correlated to global mean temperature change. We use quantile mapping to a baseline distribution that removes the global mean temperature related shifts to find counterfactual values for the observed daily climate data. Applied to each variable of two climate datasets, we produce two counterfactual datasets that are made available through the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP) along with the original datasets. Our method preserves the internal variability of the observed data in the sense that observed (factual) and counterfactual data for a given day remain in the same quantile in their respective statistical distribution. That makes it possible to compare observed impact events and counterfactual impact events. Our approach adjusts for the long-term trends associated with global warming but does not address the attribution of climate change to anthropogenic greenhouse gas emissions.


Introduction
Global mean temperature rises and has recently surpassed 1°C warming above pre-industrial levels. Climate change as a phenomenon of today exerts influence on, for example, freshwater resources, terrestrial water systems, coastal systems, oceans, food production systems, the economy, human health, security and livelihoods (IPCC 2014). The causal chain from climate change to impacts is often complex and intertwined with additional drivers, like land-use change, agricultural practices, forest management, building practices, or urban expansion (IPCC 2014).
Attribution aims to quantify the drivers of change in a system. The term is used differently in different research fields as illustrated in Fig. 1. The attribution of changes in the climate system itself (Hegerl et al. 2010;IPCC 2013) asks whether changes in climatic variables are induced by anthropogenic interference, i.e. the emission of greenhouse gases ("climate attribution"). As a sub-domain of this area of research, extreme event attribution aims to quantify to what degree the change of strength or occurrence probability of weather extremes can be attributed to anthropogenic climate forcing (Trenberth, Fasullo, and Shepherd 2015;NAS 2016;Stott et al. 2016). In contrast, "impact attribution", which we want to facilitate with the method and data presented here, aims to quantify to what degree observed changes in natural, human and managed systems can be attributed to climate change, no matter what caused the climate change. Instead of tracing back climate change to greenhouse gas emissions, the central task is to separate effects of climate change from the effects of non-climaterelated drivers. Such drivers are, for example, changes in management that alter climate-induced changes in crop yields (Butler, Mueller, and Huybers 2018;Iizumi et al. 2018;Zhu et al. 2019) or land-use changes adding to climate-driven changes in biodiversity (Hof et al. 2018). Accordingly, working group II of the IPCC AR5 (IPCC 2014, chap. 18) has defined that an impact of climate change is detected if the observed state of the system differs from a counterfactual baseline that characterizes the system's behavior in the absence of changes in climate where "climate change refers to any long-term trend in climate, irrespective of its cause" (IPCC 2014, chap. 18.2.1).
This counterfactual baseline may be stationary or may change over time, for example due to direct human influences such as changes in agriculture (Butler, Mueller, and Huybers 2018), water management, or land-use patterns (Wang and Hijmans 2019). As such, a counterfactual baseline cannot be observed, it has to be estimated by empirical or process-based models.
The counterfactual climate data introduced here is designed to facilitate climate impact attribution following the AR5 definition.
Our counterfactual climate data is derived from two observational climate datasets of the Inter-Sectoral Impact Model Comparison project (ISIMIP, www.isimip.org) that are described in the Data section. In combination with historical impact simulations based on these factual climate datasets (ISIMIP experiment obsclim), impact model simulations forced by the counterfactual climate data we present here (ISIMIP experiment counterclim) 1 allow for impact attribution according to the Many climate impacts manifest through extreme climate conditions. The approach presented here preserves the fluctuations of the original historical climate records as extreme events are adjusted by the "contribution of the observed trend to event magnitude" (Diffenbaugh et al. 2017). Simulated historical and counterfactual impacts based on our historical and counterfactual climate datasets can therefore be compared event by event.
As impact attribution does not need to address the cause of climate change, we can build the counterfactual dataset through the removal of climate change related shifts derived from the historical dataset itself. This makes it different from counterfactual climate model simulations, which draw a realization of climate that is different from the observed realization and therefore do not yield the same timing of extreme climate conditions.

Data
For ISIMIP3, we construct counterfactual climate data for two global observational datasets, which both have daily temporal and 0.5° spatial resolution. The first of these observational datasets is from phase 3 of the Global Soil Wetness Project (GSWP3; (Dirmeyer et al. 2006)) and covers the years 1901-2010. It is a dynamically downscaled and bias-adjusted version of the 20th Century Reanalysis (20CR; (Compo et al. 2011)) and has been used as a meteorological forcing dataset in several climate impact assessments (e.g., (Müller Schmied et al. 2016;Chang et al. 2017;Schewe et al. 2019)). GSWP3 has high temporal consistency as it was produced using the same reanalysis, downscaling and bias adjustment system for all years.
The second observational dataset, GSWP3-W5E5 was generated for phase 3a of ISIMIP and covers the years 1901 to 2016.
Next to GSWP3, it is based on W5E5 (Lange 2019a), a dataset that was compiled to support the bias adjustment of climate input data carried out in phase 3b of ISIMIP. W5E5 combines the WFDE5 dataset (WATCH Forcing Data methodology applied ERA5 reanalysis data; (Cucchi et al. 2020)) over land with data from the latest version of the European Reanalysis (ERA5; (Hersbach et al. 2020)) over the ocean. Since W5E5 only covers the years 1979 to 2016, which is insufficient for attribution studies, it was extended backward in time to generate GSWP3-W5E5, which covers the years 1901 to 2016. In this extended dataset, GSWP3 data for 1901 to 1978 were homogenized with W5E5 data using the ISIMIP2BASD v2.3 bias adjustment method (Lange 2019b(Lange , 2020 to reduce discontinuities at the 1978-1979 transition. While GSWP3-W5E5 is considered less consistent in time than GSWP3 despite this homogenization, its advantage is its high quality from 1979 onwards (Cucchi et al. 2020) and the coverage of the years 2011-2016.

Methodology
Assuming that "climate change refers to any long-term trend in climate, irrespective of its cause" (IPCC 2014, chap. 18) we here present a method to develop time series of stationary "no climate change" climate data from observational daily data by removing the long-term trend while preserving the internal day-to-day variability. In the following, we first describe our general approach and then provide a more formal description.
A very basic de-trending approach would estimate a linear trend separately for each day of the year and grid cell, and subtract this estimated trend from the daily data. This approach would yield trend lines that freely fluctuate from day to day.
To make better use of the available data and ensure a smooth variation of trends from one day to the other our method uses a functional form (finite number of periodic Fourier modes) to model the annual cycle. We also go beyond the very basic approach by setting up probability models with explicit representations of the statistical distribution of the climate variables, which allows for non-normal distributions to represent our data. This is particularly important for a probability model of precipitation that can account for positivity constraints and separate trends in the number of wet days and the intensity of precipitation on wet days.
Finally, we use global mean temperature instead of time as a predictor of the long-term changes in the different climate variables. Over the considered time period from 1901 to 2016, global mean temperature has not changed linearly. As global warming is considered to be an important driver of the long-term changes in regional climate variables, it is expected to be a better predictor of these changes than time. In fact, the classical "pattern scaling approach" builds on this assumption (Santer et al. 1990;Mitchell 2003). We use global warming as a potentially powerful predictor of regional climate change without claiming causality. As we only aim for stationary climate data by removing the long-term trend in observed climate data, irrespective of its cause, we use global mean temperature as a good candidate for a predictor that allows for achieving this aim, and finally check the created counterfactual climate data for stationarity.

Probability model
We aim to capture the statistics of a climate variable in the historical record with a parametric distribution A. We call this distribution the factual distribution of the climate variable. This distribution evolves in time through the time dependence of its parameters. We model the parameters as linear functions of both the global mean temperature and the annual cycle. This is exemplified for the expected value μ (T ,t ) of the Gaussian distribution in Eq. 1-3: Here T is the global mean temperature change since 1901, t is the number of days since the first day of the historic observations, and n is the number of modes used to approximate the periodicity of the climate variables. Each of the Fourier coefficients a k , b k is modeled as a linear function of the global mean temperature T : In particular The approach enables a long-term change of the parameter μ (T ,t ) through a 0 (T ) and a change of amplitude and phase of the annual cycle through a k (T ) and b k (T ). We produce a counterfactual distribution B from the factual distribution A by restricting T to the early period in which it does not deviate significantly from zero. The probabilistic model is illustrated for daily temperatures at an exemplary grid cell in panel A of Fig. 2. The difference between the expected values of distribution A (blue line) and B (orange line) is due to a vertical shift through a change in a 0 (see Eq. (3)) and a distortion of the annual cycle through a change in the Fourier coefficients (see Eq. (2)).
The parametric distribution type is the same for A and B. It depends on the climate variable and is listed in Table 1. We estimate the evolution of the factual distribution A individually for each climate variable and grid cell, using a Bayesian approach and building on the pymc3 python package (Salvatier, Wiecki, and Fonnesbeck 2016). We then utilize the distributions A and B to quantile-map each value from the observed dataset to a counterfactual value (Wood et al. 2004;Cannon, Sobie, and Murdock 2015;Lange 2019b). Quantile mapping is different for each day of the time series because our approach accounts for the annual cycle and a change in the annual cycle. In Fig. 2  We use a Gaussian distribution to model the variables tas, rlds, rsds, ps, and hurs. We introduce two auxiliary variables tasrange and tasskew to construct the tasmin and tasmax counterfactuals. We do not estimate counterfactual time series for tasmin and tasmax individually to avoid large relative errors in the daily temperature range as pointed out by (Piani et al. 2010). Following (Piani et al. 2010), we estimate counterfactuals of the daily temperature range, tasrange = tasmax -tasmin, and the skewness of the daily temperature, tasskew = (tas -tasmin) / tasrange. We then derive counterfactual tasmin and tasmax from counterfactual tas, tasrange and tasskew. We use a Gaussian distribution to model tasrange and tasskew.
We use the parameter model described in Eqs.
(1) to (3) for the expected value μ (T ,t ) for all variables with Gaussian distribution. Except for tasskew, we use one Fourier mode to only capture the major mode of the annual cycle. We use two modes for tasskew as it often shows two modes in the annual cycle. For all variables described with a Gaussian distribution, the standard deviation σ of the Gaussian distribution is assumed to be constant, i.e. independent of both time and global mean temperature. We choose σ to be constant for the Gaussian distribution because a dependency on global mean temperature leads to instabilities in the estimation of the parameters of μ (T ,t ).
The variables tasrange, rsds and hurs have bounds, which we handle in the quantile mapping step. By definition, tasrange is bound to be positive and taskew is bound to be positive and less than one. To avoid invalid values through quantile mapping, we do not quantile-map values that would result in counterfactual values outside the valid range. The counterfactual values are the same as the factual value in these cases. This happens rarely, as values close to the bounds are rare. Near-surface relative humidity (hurs) is bound to be positive and less than or equal to one, and surface downwelling shortwave radiation (rsds) is bound to be greater than or equal to zero. To avoid invalid counterfactual values for hurs and rsds, we set values that are outside the valid range after quantile mapping to the closest valid value.
We use a Weibull distribution to model surface wind speed (sfcWind). The distribution has a scale parameter α and a shape parameter β, which both need to be positive. We use the logistic transformation to map the parameter model of Eq. (1) to positive values to model the scale parameter β. We assume the shape parameter α to be independent of both time and global mean temperature.
The prior distributions used for the Bayesian estimation of the parameters in Eq. distribution has three parameters. The parameter p for the Bernoulli distribution describes the probability of a dry day. A given day is defined to be a dry day if the amount of rain is below a threshold of 0.1mm per day. Wet days are all days where the threshold is exceeded. The gamma distribution describes the amount of rain on wet days. We describe the gamma distribution with its expected value μ and its standard deviation σ . All three parameters are modeled as a linear function of the global mean temperature following Eq. (3). As opposed to the other variables, we do not model the annual cycle explicitly for precipitation. This is equivalent to the assumption that global mean temperature influences precipitation similarly throughout the year.
The dry-day probability p is bound to values between 0 and 1. Therefore we use a beta distribution as prior for the intercept value a 0 intercept in Eq. (3). For the slope value a 0 slope we use a shifted beta distribution that ranges from −a o intercept to 1 −a 0 intercept . In combination with the normalized global mean temperature predictor this limits the prior of p and ensures that the modeled p is within 0 ≤ p ≤1. For the parameters μ and σ , we use exponential distributions as priors to ensure positive values. Similar to the priors for p, the priors for the slope values are shifted by the intercept value. With this approach we detect independent trends in dry-day probability and wet-day precipitation intensity.
The computation of counterfactual values for precipitation via quantile mapping has some special cases that we handle as follows. We need to change the number of dry days if the dry-day probability p differs between distributions A and B. If p is larger in the counterfactual distribution, wet-days with the least precipitation amounts become dry days. If p is smaller in the counterfactual distribution, we pick dry days in the observed time series at random and assign them a small precipitation amount above the threshold.
A counterfactual huss is derived from counterfactual tas, ps and hurs using the equations of (Buck 1981) as described in (Boucher and Best 2010).

Results
We provide counterfactual "no climate change" daily forcing data for the 10 climate variables available in the GSWP3 and GSWP3-W5E5 datasets available within ISIMIP and listed in Table 1. The counterfactual datasets are free to download through the ISIMIP data portal 2 along with the underlying original GSWP3 and GSWP3-W5E5 data. In the following we illustrate key features of the GSWP3-W5E5 dataset. The respective figures for GSWP3 can be found in the Appendix.
To test whether our approach is able to remove the signal of climate change from the observed time series of each climate variable we compare differences between multi-year averages over the first and last 30 years of the factual and counterfactual data in Figure 4 for tas, tasmin, tasmax, pr and rsds, and in Figure 5 for rlds, ps, sfcWind, hurs and huss. Our 2 https://esg.pik-potsdam.de/search/isimip/?project=ISIMIP3a&product=input&dataset_type=Climate%20atmosphere %20counterfactual method strongly reduces the observed trends (Figure 4 and 5, left columns), yielding quasi-stationary time series for most locations and variables (Figure 4 and 5, right columns). Remaining non-stationarities are largest for precipitation in the Tibet region, Western and Eastern Africa and South America and for wind speed in Greenland. We expand on reasons for these exceptions in the discussion section.
In addition, we compare regional averages over 21 world regions (Giorgi and Francisco 2000) derived from the original and the counterfactual datasets. We show averages over Southern South America in Figures 6-8 (Figures 6 and 9). By construction, it retains the year-to-year variability, i.e. dry years stay dry and wet years stay wet.
To check that the annual cycle in the counterfactual dataset matches the annual cycle in the early period  of the factual data when climate change was still negligible, we show multi-year regional mean annual cycles in Figures 7 and 10 for Southern South America and Northern Europe, respectively (see Appendix for the other regions). The comparison of the early annual cycle to the late annual cycle in the original observational data shows that for many variables and regions the signal of climate change depends on the season, as monthly means have not changed uniformly and the shape of the annual cycle has changed.
In Southern South America, wind speeds have increased relatively uniformly over all months while trends in relative humidity vary strongly with the season (Figure 7). Our approach successfully captures seasonalities of trends and removes the changes so that the annual cycle of the counterfactual in the late period (orange line) is similar to the factual one in the early period (thick blue line) for all considered variables in Southern South America.
In Northern Europe, monthly means have changed relatively uniformly from the early to the late period except for surface air pressure ( Figure 10). Our approach successfully maps the late factual annual cycle (thin blue line) to the early factual annual cycle (thick blue line), as indicated by the counterfactual annual cycle (orange line). However, for precipitation and for surface pressure, the annual cycle is complex and encompasses several modes. Our model for precipitation, which adjusts dry day probability and the amount of rain, but has no explicit annual cycle, correctly captures the constant shift towards lower precipitation, but does not fully represent the seasonality of the trends. Therefore the annual cycle in the counterfactual dataset does not match the factual early annual cycle. Similarly, as we use only one Fourier mode to model the annual cycle of air pressure, we do not capture the complex pattern of the early period. This leads to counterfactual precipitation and air pressure (orange lines) that do not fully match the early factual cycle (thick blue lines).
To illustrate the performance of the method on the daily scale, daily regional mean factual and counterfactual values for years 2015 and 2016 are shown in Figures 8 and 11. In Southern South America (Figure 8), the shift from observations (blue dots) to the counterfactual (orange dots) for relative humidity is largest in November to February and minimal around June.

205
For surface air pressure, observations are mainly shifted in the second half of the year. This is different to wind, for which a long-term trend throughout the year is evident for Southern South America. Quantile mapping therefore reduces wind speeds throughout the year. In Northern Europe (Figure 11), surface air pressure, wind and relative humidity exhibit both positive and negative shifts from factual to counterfactual values at different times of the year.
We present plots for the other Giorgi regions analogous to Figures 6-11 in the Appendix. While for most regions our approach is appropriate to generate a stationary climate, there are some exceptions similar to the ones discussed for Northern Europe. In summary, in North Asia, Southeast Asia, the Tibet and the Sahara region, the shift in precipitation is not fully captured. In the Western Sahara region, surface downwelling shortwave radiation is not fully adjusted. The respective counterfactuals need to be applied with care for these regions.

Discussion
Answering the question whether and to what extent observed changes in natural, human and managed systems are already happening in response to ongoing climate change needs a comparison of the observed state of the considered system to its hypothetical state without climate change. Climate impact models can be considered as ideal tools to address this question as they are usually designed to represent the response of impact indicators to climate disturbances but also account for direct human interventions such as agricultural management changes, water abstraction or implementation of flood protection.
Within the model, individual drivers can be controlled such that a "full forcing" run (observed climate change + observed direct human interventions) can be compared to a counterfactual "no climate change" baseline run (counterfactual climate + observed direct human interventions). To be able to attribute observed changes in natural and human systems to climate change in this way, it is necessary that impact models are able to explain the observed changes in the considered system, that is, the full forcing run needs to reproduce the observed changes in the considered system. Despite the increasing explanatory power of impact models (Jägermeyr and Frieler 2018;Müller et al. 2017) they are still rarely used for climate impact attribution according to the AR5 definition. By providing climate forcing data for counterfactual "no climate change" runs, we facilitate climate impact attribution and utilize the strength of impact models to address the important question to what degree climate change is already affecting natural and human systems.
The proposed approach to generate counterfactual "no-climate change" forcing data is designed to remove the signal of climate change from observational climate data assuming that "climate change refers to any long-term trend in climate, irrespective of its cause", according to the AR5 definition. We apply our method to derive counterfactual stationary climate data from GSWP3 and GSWP3-W5E5. The counterfactual along with the observational datasets are provided through ISIMIP3a.
The design of the counterfactual data and the impact simulation framework does not directly allow for an attribution of impacts to human greenhouse gas and aerosol emissions, but only separates climate from other drivers of change. However, once this first step of separation is successful, attribution to anthropogenic emissions can be done in a second step (two-step attribution (Hegerl et al. 2010;Stone et al. 2013). Such a second step is necessary to for example attribute a fraction of an impact to a greenhouse gas emitter and support climate litigation (Marjanac, Patton, and Thornton 2017;Burger, Wentz, and Horton 2020).
Our method ultimately builds on the correlation between a regional climate variable and global mean temperature change to remove long-term trends in climate. Using global mean temperature as the predictor does not imply causality. It is well possible that changes in the statistics of regional climate variables have other origins than global warming. Our method cannot infer the origin of change in a regional climate variable and removes shifts in the data regardless of its cause as long as these shifts are correlated to global warming. For example, a shift in local climate data not related to global warming but regional forcing by land use changes or aerosol emission or induced by e.g., changes in measurement technology or data processing or human error would also be captured by our method as long as the effects are correlated with the global warming signal. Our method does not intend to replace climate simulations with counterfactual greenhouse gas forcings such as the histNAT CMIP6 experiments (Gillett et al. 2016) that are required to attribute changes in climate or impacts to anthropogenic emissions.
As meteorological observations were sparse in the early 20th century, data quality is generally lower in this earlier period than in the later periods covered by GSWP3 and GSWP3-W5E5. Yet our approach estimates model parameters for all available historical values at once (40177 values for GSWP3 and 42369 for GSWP3-W5E5) so that counterfactuals do not rely heavily on data from the early period. This at least partly mitigates the problem of data quality in the early period. It does not resolve a problematic non-stationary counterfactual for wind speeds over Greenland. Wind speeds are high in the early period and reach a low plateau by 1960 in both datasets, leading to increasing wind speeds in the counterfactual towards the high values of the early period ( Figures A22 and A81). Our results also show that the harmonization applied for the production of the GSWP3-W5E5 dataset did not work perfectly in all cases. For example, it introduced a jump in shortwave downwelling radiation over Northern Europe (Fig. 9), which is also present in the counterfactual data. Such discontinuities are not present in the GSWP3 dataset (see Data section).
Producing a precipitation counterfactual poses particular challenges as it involves the adjustment of the number of wet and dry days as well as the intensity of wet-day precipitation. Problems found for the Tibet region ( Figure 4) are probably due to discontinuities in the factual data at around 1930 ( Figure A52). While Figure 4 suggests that precipitation de-trending does not work well over tropical Africa and South America, time series of regional mean precipitation for the Amazon region, West Africa and East Africa ( Figures A4, A28 and A31) show that our counterfactuals for these regions show the expected behaviour and the impression from Figure 4 is only due to the capped color scale used there.
With the methods and data presented here, we aim to advance the field of climate impact attribution and reveal past and present societal and environmental sensitivities to climate change. Getting a better understanding of the drivers of observed changes in natural and human systems will help us to better estimate future risks related to ongoing global warming and develop adequate adaptation measures.