Representation of climate extreme indices in the ACCESS1.3b coupled atmosphere–land surface model

Abstract. Climate extremes, such as heat waves and heavy precipitation events, have large impacts on ecosystems and societies. Climate models provide useful tools for studying underlying processes and amplifying effects associated with extremes. The Australian Community Climate and Earth System Simulator (ACCESS) has recently been coupled to the Community Atmosphere Biosphere Land Exchange (CABLE) model. We examine how this model represents climate extremes derived by the Expert Team on Climate Change Detection and Indices (ETCCDI) and compare them to observational data sets using the AMIP framework. We find that the patterns of extreme indices are generally well represented. Indices based on percentiles are particularly well represented and capture the trends over the last 60 years shown by the observations remarkably well. The diurnal temperature range is underestimated, minimum temperatures (TMIN) during nights are generally too warm and daily maximum temperatures (TMAX) too low in the model. The number of consecutive wet days is overestimated, while consecutive dry days are underestimated. The maximum consecutive 1-day precipitation amount is underestimated on the global scale. Biases in TMIN correlate well with biases in incoming longwave radiation, suggesting a relationship with biases in cloud cover. Biases in TMAX depend on biases in net shortwave radiation as well as evapotranspiration. The regions and season where the bias in evapotranspiration plays a role for the TMAX bias correspond to regions and seasons where soil moisture availability is limited. Our analysis provides the foundation for future experiments that will examine how land-surface processes contribute to these systematic biases in the ACCESS modelling system.


Introduction
Climate extremes, including heat waves, heavy precipitation events and droughts, have important effects on ecosystems and society (Easterling, 2000; Ciais et al., 2005; Pall et al., 2011). Many climate extremes are related to natural variability (Arblaster and Alexander, 2012; Seneviratne et al., 2012). However, climate change can modify the frequency, intensity, spatial extent, duration, and timing of climate extremes, and can lead to events unprecedented in the historical record. Given the impact of extremes, it is important to understand their causes, how they might change in the future and the role of potential interacting processes and feedbacks that might amplify them. This is urgent given that some extremes appear to be increasing in frequency (Coumou and Rahmstorf, 2012; Donat and Alexander, 2012; Perkins et al., 2012) and that extremes are considered to be a particularly challenging aspect of climate change adaptation (IPCC, 2012).
Extreme events can be directly influenced by land-surface processes. Heat waves, for example, can be amplified by land-surface processes including dryness and decreased vegetation (Zaitchik et al., 2006; Fischer et al., 2007; Koster et al., 2009; Hirschi et al., 2010; Stéfanon et al., 2012; Lorenz et al., 2013). Limited soil moisture availability and less active vegetation decrease evapotranspiration and, therefore, more energy is available for the sensible heat flux, which increases temperatures (e.g. Seneviratne et al., 2010). Temperature variability is also affected by land-surface processes (Seneviratne et al., 2006), and Jaeger and Seneviratne (2010) found a tendency towards a greater impact of land-surface processes on maximum temperatures, as distinct from minimum temperatures. The land-atmosphere coupling mechanisms for maximum and minimum temperatures differ. A clear relationship between maximum temperature and realistic soil moisture initialisation was found by Hirsch et al. (2014). This is in contrast to minimum temperatures, where the influence of soil moisture is less clear given the role of net longwave emission in modulating nighttime temperatures. Interactions between the land surface and precipitation also exist, but the scale of the impact is less clear. It remains uncertain how soil moisture affects rainfall, with no agreement on the sign of the feedback (Findell and Eltahir, 2003; Ek and Holtslag, 2004; Taylor and Ellis, 2006). Pitman et al. (2012) analysed several global climate models to examine the impact of land-use changes on temperature and precipitation extremes and found effects that opposed the impact of increasing CO2 for some extreme indices and added to it for others. In short, to understand the changes in extremes linked with natural variability or increasing CO2 we need to understand how land-surface processes influence climate extremes.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Land-atmosphere feedbacks are difficult to investigate using observations alone because of the lack of suitable long-term data sets and the uncertainty regarding how feedbacks might change in the future with climate change. Climate models are useful tools for investigating land-atmosphere feedbacks and their influence on extreme events (Fischer et al., 2007; Jaeger and Seneviratne, 2010; Lorenz et al., 2010). In Australia, a new global earth system model (ACCESS) has been developed (Bi et al., 2013; Kowalczyk et al., 2013). ACCESS1.0 compares well with other models in the Coupled Model Intercomparison Project, version 5 (CMIP-5) with regard to the representation of extremes (Sillmann et al., 2013). A more recent version of the model, ACCESS1.3, has also been used to run CMIP-5 simulations and performs similarly to ACCESS1.0. Two major differences between ACCESS1.0 and ACCESS1.3 are the parameterisation of the land surface and the cloud scheme. In ACCESS1.0, the UK Meteorological Office Surface Exchange Scheme (MOSES) is used, but it is replaced in ACCESS1.3 by the Community Atmosphere Biosphere Land Exchange (CABLE1.8) model. The two versions of the model were compared by Kowalczyk et al. (2013), although this analysis did not examine the representation of extreme events.
In this study, we analyse the ability of the ACCESS1.3 model to simulate a selection of extremes. We use ACCESS1.3, but replace CABLE1.8 with CABLE2.0 (the most recent released version) in an overall modelling system labelled ACCESS1.3b. We use an Atmospheric Model Intercomparison Project (AMIP) style experimental design (Gates, 1992) involving simulations over the 1950-2012 period with prescribed sea surface temperatures and sea ice concentration. The use of the AMIP experimental design decreases the uncertainty in terms of sea surface temperatures and associated teleconnections including the El Niño-Southern Oscillation. Our analysis focuses on the simulation of climate extreme indices defined by the Expert Team on Climate Change Detection and Indices (ETCCDI), which are provided by two observational data sets by Donat et al. (2013a, b). Our goal is to assess the skill of ACCESS1.3b in simulating extremes and to identify systematic biases, strengths and weaknesses. This provides us with the foundation for future experiments aimed at resolving deficiencies, particularly where these relate to land-surface processes. This study is organised as follows: Sect. 2 gives an overview of the model and the data sets used for evaluation. Section 3 provides our results. Section 4 provides a discussion and finally Sect. 5 concludes our study.
... (Taylor et al., 2000; http://www-pcmdi.llnl.gov/projects/amip/AMIP2EXPDSN/BCS/amipbc_dwnld.php) and regridded and converted to the UM's data format at the UK Meteorological Office. We performed simulations at 1.25° latitude × 1.875° longitude resolution (N96 resolution), with 38 vertical levels and a 30 min time step. The simulation covers the 1950-2012 period; the first year is used as a spin-up period and is not included in the analysis. Orography in ACCESS1.3b is derived from the 30 arc-second GLOBE data set (GLOBE Task Team and others, 1999). However, since this data set has deficiencies over Australia (Bi et al., 2013), it is improved for the Australian region using the Geoscience Australia high-quality data set (Hilton et al., 2003).

The atmosphere: Unified Model (HadGEM3)
The atmospheric model in ACCESS1.3b is the Unified Model (UM) developed at the UK Meteorological Office (Davies et al., 2005; Martin et al., 2006). The equations solved in the UM are non-hydrostatic and fully compressible, and the advection scheme is semi-Lagrangian. The vertical coordinates are height based and follow the terrain, and a regular Arakawa C grid is used in the horizontal. The radiation scheme is a general two-stream scheme developed by Edwards and Slingo (1996), which was improved in terms of pressure and temperature scaling. In addition, the Tripleclouds scheme of Shonk and Hogan (2008) was included to improve the representation of horizontal cloud inhomogeneity. The radiation scheme is called eight times per day (3-hourly). Convection is parameterised by a modified mass flux scheme based on Gregory and Rowntree (1990). The boundary layer mixing scheme represents non-local mixing in unstable layers and includes an explicit entrainment parameterisation (Lock et al., 2000). The cloud microphysics scheme contains water vapour, total cloud fraction, cloud liquid water and cloud ice as prognostic variables, and we use the PC2 prognostic condensate scheme described in Wilson et al. (2008). Atmospheric chemistry includes sulfate, soot, biomass, dust (from IGBP soils, although values are very low), sea salt, and biogenic (climatology only) aerosols. Aerosol emissions are prescribed by monthly climatologies, and aerosols can be advected and deposited. More details can be found in Hewitt et al. (2011) and Bi et al. (2013).

The land surface: Community Atmosphere Biosphere Land Exchange (CABLE2.0)
Land-surface models simulate biogeophysical and biogeochemical processes and handle the exchange of surface fluxes between the land surface and the atmosphere. Since the extremes explored in this paper are intimately associated with how the land surface is parameterised, we provide some detail on how CABLE represents terrestrial processes. Further detailed descriptions of CABLE1.4 can be found in Wang et al. (2011) and of CABLE1.8 in Kowalczyk et al. (2013). CABLE consists of three submodels: (1) canopy processes, (2) soil and snow, and (3) carbon pool dynamics and soil respiration. Canopy processes are simulated by a one-layer two-leaf canopy scheme, distinguishing between sunlit and shaded leaves for the calculation of photosynthesis, stomatal conductance and leaf temperature (Wang and Leuning, 1998). The vegetation is placed above the ground, which allows for aerodynamic and radiative interactions between the ground and the canopy. CABLE includes a sub-grid tiling approach at the surface, meaning that several surface types can exist within a grid cell (ten vegetation types and three non-vegetated types are distinguished; up to five tiles can be used within each grid cell). The soil model has six layers and the Richards equation is solved for soil moisture, while soil temperature is calculated from the heat conduction equation. The snow model has three snowpack layers and calculates the temperature, density and thickness of the snow. The carbon pool model used is simple, and net primary productivity is calculated from the annual carbon assimilation corrected for respiratory losses (carbon fluxes are not assessed in this study). The differences between CABLE1.8 used in Kowalczyk et al. (2013) and CABLE2.0 used here are small. They include bug fixes and updated optical leaf properties (transmission and reflectance) that are better calibrated for the snow-free soil albedo used by ACCESS.
CABLE has been extensively evaluated (Abramowitz et al., 2008;Wang et al., 2011) and an earlier version was used in the Land Use Change IDentification of robust impacts (LUCID) project (Pitman et al., 2009;de Noblet-Ducoudré et al., 2012). Furthermore, Mao et al. (2011) documents the performance of a low-resolution GCM of intermediate complexity, CSIRO Mk3L, coupled to an earlier version of CABLE (version 1.4b) with a focus on terrestrial quantities. This analysis provides strong evidence that the coupled model produces a reasonable large-scale climatology. More recently, Zhang et al. (2013) ran CABLE2.0 offline with GSWP2 (Global Soil Wetness Project) forcing and compared it with other participating land-surface models in GSWP and gridded observations. They found that whilst global mean evapotranspiration (ET) simulated by CABLE agreed well with other land-surface models and observations, CABLE underestimated ET in the tropics and had significant runoff errors. In addition, CABLE showed a large sensitivity to soil and vegetation parameters in tropical rainforests and mid-latitude forest regions.

ETCCDI indices and data sets
The Expert Team on Climate Change Detection and Indices (ETCCDI) defined a set of 27 indices calculated from daily maximum (TMAX) and minimum (TMIN) temperatures and daily precipitation (http://www.climdex.org/indices.html). These indices were developed to investigate changes in intensity, duration and frequency of extreme climate events. Most of these indices describe moderate extremes with return periods of a year or shorter. We calculate all indices using freely available software (http://www.climdex.org/climdex_software.html) and compare the indices from our simulations to the HadEX2 data set (Donat et al., 2013b). Only a subset of the indices is analysed in detail (see Table 1). We chose four indices that examine the frequency of high (warm days TX90p, warm nights TN90p) and low (cool days TX10p, cool nights TN10p) temperature extremes, and one temperature index that measures the difference between the daily maximum and minimum temperature (diurnal temperature range, DTR). Two of the chosen indices examine wet precipitation extremes (maximum 1-day precipitation amount Rx1day, consecutive wet days CWD) and one index looks at dryness (consecutive dry days CDD). For temperature extremes, we chose mainly indices based on percentiles, which are relative to the base period 1961-1990, because they are applicable over all climate zones and show robust trends in observational data sets (Zhang et al., 2011; Donat et al., 2013b). The precipitation indices were chosen to examine several aspects of precipitation extremes: high precipitation amounts in Rx1day, high precipitation frequency in CWD, and low precipitation frequency and drought in CDD.
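To make the index definitions concrete, the following minimal sketch computes Rx1day, CDD, CWD and DTR for a single grid cell from short daily series. It is an illustration only and not the climdex software used in this study: the function names and the sample values are hypothetical, the 1 mm wet-day threshold follows the ETCCDI definitions, and details such as missing data and spells that span several years are ignored.

```python
WET_DAY_MM = 1.0  # ETCCDI wet-day threshold (mm per day)


def longest_run(flags):
    """Return the length of the longest run of True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def rx1day(precip):
    """Maximum 1-day precipitation amount (mm)."""
    return max(precip)


def cdd(precip):
    """Consecutive dry days: longest spell with daily precipitation < 1 mm."""
    return longest_run(p < WET_DAY_MM for p in precip)


def cwd(precip):
    """Consecutive wet days: longest spell with daily precipitation >= 1 mm."""
    return longest_run(p >= WET_DAY_MM for p in precip)


def dtr(tmax, tmin):
    """Diurnal temperature range: mean of daily TMAX minus TMIN (deg C)."""
    return sum(tx - tn for tx, tn in zip(tmax, tmin)) / len(tmax)


# Hypothetical daily values for one grid cell (not data from this study).
precip = [0.0, 0.2, 5.0, 12.5, 1.0, 0.0, 0.0, 0.4, 0.9, 3.0]
tmax = [30.0, 28.0, 26.0]
tmin = [18.0, 17.0, 16.0]

print(rx1day(precip), cdd(precip), cwd(precip), dtr(tmax, tmin))
# prints: 12.5 4 3 11.0
```

The percentile-based indices (TX90p, TN90p, TX10p, TN10p) additionally require calendar-day thresholds from the 1961-1990 base period and are therefore not included in this sketch.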

HadEX2 data set
The HadEX2 data set, described in detail by Donat et al. (2013b), contains 17 temperature and 12 precipitation indices. These are derived from daily maximum and minimum temperature and precipitation observations for the period covering 1901 to 2010. The indices were calculated for each station and then the monthly and annual indices were interpolated onto a 2.5° latitude × 3.75° longitude grid. Donat et al. (2013b) derived linear trends from the gridded fields and tested these trends for statistical significance. The high-quality in situ observations were primarily sourced from the European Climate Assessment and Dataset (ECA&D) and associated data sets in southeast Asia and Latin America, GHCN-Daily (USA-only), ETCCDI regional workshops and individual researchers. As a result, the spatial availability of HadEX2 data varies with time. Trend estimates can be influenced by the number of stations included in a data set. To compare time series and trends of model and observations we calculated global averages. We apply a time-independent mask to the model data and HadEX2, only including grid points where more than 50 years of observational data (out of 60) are available. This minimises the deteriorating effect of variable spatial coverage on the trend calculations. The spatial coverage of the Rx1day index is larger in monthly than annual fields because the decorrelation length scale is larger for monthly compared to yearly extreme precipitation indices. To obtain a spatial coverage that is as good as possible, we calculate the annual maximum Rx1day amounts from the maximum of the monthly Rx1day fields provided by HadEX2. This increases data availability in data-sparse regions (e.g. tropics); however, it needs to be taken into account that this may include stations which are less representative of a certain grid point. Therefore, grid points around areas with missing data need to be interpreted with care.
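The masking and averaging steps described above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the code used for the analysis: grid cells are represented by hypothetical labels, missing values by None, and the global average is weighted by the cosine of latitude, which is a common choice that the text does not specify.

```python
import math


def coverage_mask(obs_by_cell, min_years=50):
    """Return the cells with more than `min_years` non-missing annual values."""
    return {
        cell for cell, series in obs_by_cell.items()
        if sum(value is not None for value in series) > min_years
    }


def global_mean_series(series_by_cell, lat_by_cell, cells):
    """cos(latitude)-weighted mean per year over `cells`, skipping missing values."""
    n_years = len(next(iter(series_by_cell.values())))
    means = []
    for year in range(n_years):
        weighted_sum = weight_total = 0.0
        for cell in sorted(cells):
            value = series_by_cell[cell][year]
            if value is None:
                continue
            weight = math.cos(math.radians(lat_by_cell[cell]))
            weighted_sum += weight * value
            weight_total += weight
        means.append(weighted_sum / weight_total if weight_total > 0 else None)
    return means


def annual_max_from_monthly(monthly):
    """Annual maxima taken over whichever monthly maxima are available."""
    annual = []
    for start in range(0, len(monthly), 12):
        valid = [v for v in monthly[start:start + 12] if v is not None]
        annual.append(max(valid) if valid else None)
    return annual


# Hypothetical 60-year example: cell "D" has only 40 years of observations.
lat_by_cell = {"A": 0.0, "B": 0.0, "C": 60.0, "D": 60.0}
obs = {"A": [2.0] * 60, "B": [2.0] * 60, "C": [4.0] * 60,
       "D": [None] * 20 + [4.0] * 40}
model = {"A": [3.0] * 60, "B": [3.0] * 60, "C": [5.0] * 60, "D": [9.0] * 60}

mask = coverage_mask(obs)  # derived from the observations only; excludes "D"
obs_mean = global_mean_series(obs, lat_by_cell, mask)
model_mean = global_mean_series(model, lat_by_cell, mask)
```

Because the mask is computed once from the observations and then applied unchanged to both data sets and to every year, the model and observed global means always refer to the same set of grid points.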
When comparing models and gridded observational data sets for extremes, it needs to be kept in mind that scaling effects likely play a role. That is, the gridded observational data set is derived from annual extremes at each station, whereas the models represent a grid-point average for each day. Therefore, annual maxima from climate models are expected to be lower in intensity, especially for precipitation-based indices (Kiktev et al., 2003; Tebaldi et al., 2006).
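The scaling effect can be illustrated with synthetic data. The sketch below is hypothetical throughout (25 independent "stations", a 30 % wet-day probability and exponentially distributed amounts are arbitrary assumptions, not properties of HadEX2 or ACCESS1.3b). It compares the average of the station annual maxima, which is analogous to gridding station-based indices, with the annual maximum of the daily area average, which is analogous to a model grid cell.

```python
import random

random.seed(0)  # fixed seed so that the synthetic example is reproducible

N_DAYS, N_STATIONS = 365, 25

# Synthetic daily precipitation (mm) at independent stations in one grid cell.
stations = [
    [random.expovariate(1 / 8.0) if random.random() < 0.3 else 0.0
     for _ in range(N_DAYS)]
    for _ in range(N_STATIONS)
]

# Station-based view: annual maximum at each station, then averaged.
mean_of_station_maxima = sum(max(series) for series in stations) / N_STATIONS

# Model-like view: area average for each day, then the annual maximum.
daily_cell_mean = [
    sum(series[day] for series in stations) / N_STATIONS
    for day in range(N_DAYS)
]
max_of_cell_mean = max(daily_cell_mean)

print(f"mean of station maxima: {mean_of_station_maxima:.1f} mm")
print(f"maximum of cell mean:   {max_of_cell_mean:.1f} mm")
```

The maximum of an average can never exceed the average of the maxima, so the model-like value is always the smaller of the two. With spatially correlated stations, as in reality, the difference would be smaller than in this independent-station example.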

Other data sets
We use the HadGHCND gridded daily temperature data set derived from near-surface maximum and minimum temperature observations. It covers the period from 1951 to the present on a 2.75° latitude × 3.75° longitude grid. It was designed for the analysis of climate extremes and the evaluation of climate models. Note that the data coverage varies with time.
We also use the Global Precipitation Climatology Project (GPCP) Version-2 precipitation data set (http://www.esrl.noaa.gov/psd/data/gridded/data.gpcp.html). This is derived from a combination of satellite and rain-gauge measurements (Adler et al., 2003). GPCP is available as a global, monthly analysis of surface precipitation at 2.5° × 2.5° resolution from 1979 to the present (we use December 1979-November 2012 here). GPCP has been shown to agree well with ground-based observations (Ma et al., 2009; Pfeifroth et al., 2013).
The NASA "Clouds and the Earth's Radiant Energy System" (CERES EBAF Surface Ed2.7) data set provides satellite-based estimates of surface radiative fluxes. This data set was specifically created for evaluation of climate models (http://ceres-tool.larc.nasa.gov). It includes surface downwelling shortwave and longwave radiation, surface upwelling shortwave and longwave radiation and estimates for clear-sky radiation from 2001 to 2009. Kato et al. (2013) found that the biases over land were, on average, 21.7 W m−2 for downward shortwave and 21.0 W m−2 for downward longwave radiation. Therefore, biases between ±10 W m−2 are not taken into account in our analysis.
Given the sparse coverage and limited availability of flux observations, satellite estimates provide the "next-best approximation". Although these are strictly models, and not true observations, the algorithms are usually constrained by as much data as possible (e.g. the GLEAM ET product is driven with gridded precipitation observations), and hence, these products have a well-defined accuracy and are, therefore, useful for comparing against global climate models (GCMs), which have much larger degrees of freedom. For the calculation of the biases we used the coarsest grid involved, either interpolating the model output to the coarser grid of the observational data set or interpolating the observations to the model resolution. Table 2 summarises the data sets used for evaluation.

Statistical significance testing
We perform a modified t test, as described in Zwiers and von Storch (1995), to indicate which biases between the model run and observations are statistically significant for TMAX, TMIN and total precipitation (PTOT). This modified t test accounts for autocorrelation within the time series. Total precipitation, TMAX and TMIN are robust observations for long time series and are the underlying data of the extreme indices.
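The idea behind a t test that accounts for autocorrelation can be sketched with an effective sample size based on the lag-1 autocorrelation. The code below is a simplified stand-in and not the exact procedure of Zwiers and von Storch (1995), which uses a pooled variance and a lookup table for small effective sample sizes; here the critical value is taken from a normal approximation, and all function names and the synthetic series are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a series (must not be constant)."""
    m = mean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


def effective_sample_size(x):
    """n_e = n (1 - r1) / (1 + r1); negative r1 is treated as zero."""
    r1 = max(0.0, lag1_autocorrelation(x))
    return max(2.0, len(x) * (1.0 - r1) / (1.0 + r1))


def autocorr_adjusted_t(model, obs):
    """t statistic for the difference of means using effective sample sizes."""
    standard_error = math.sqrt(
        variance(model) / effective_sample_size(model)
        + variance(obs) / effective_sample_size(obs)
    )
    return (mean(model) - mean(obs)) / standard_error


def is_significant(model, obs, alpha=0.05):
    """Two-sided test; normal approximation to the t distribution (assumption)."""
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(autocorr_adjusted_t(model, obs)) > critical


# Synthetic, autocorrelated 60-value series with a constant 2 degC offset.
obs = [10.0 + math.sin(i) for i in range(60)]
model = [value + 2.0 for value in obs]

print(effective_sample_size(obs) < len(obs))  # autocorrelation reduces n_e
print(is_significant(model, obs))
```

Positive autocorrelation reduces the effective sample size and therefore widens the standard error, which makes the test more conservative than a standard t test applied to the same series.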

Probability density functions and skill score
We calculate probability density functions (PDFs) for TMIN and TMAX from the model and observations to investigate which parts of the distributions are most important for the biases. The PDFs are based on the fields TMAX(time, lat, lon) and TMIN(time, lat, lon), which contain monthly means for the 1951-2011 period for the corresponding season and region. We use R's kernel density function with the default Gaussian smoothing kernel and a bandwidth estimated via a normal reference distribution to plot the lines.
We use a skill score defined in Perkins et al. (2007), which measures the overlap between two PDFs by summing, over all bins, the minimum of the two probabilities in each bin. A perfect skill score equals one, whereas values close to zero indicate poor agreement. We use a bin width of 0.5 °C for the calculation of the histograms, as in Perkins et al. (2007).
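A minimal sketch of this overlap score is given below. It follows the description above (common bins of 0.5 °C, relative frequencies, sum of the bin-wise minima); the function names, the choice of the lower bin edge and the sample values are illustrative assumptions.

```python
import math


def binned_frequencies(values, bin_width, lower_edge, n_bins):
    """Relative frequency of `values` in equal-width bins starting at `lower_edge`."""
    counts = [0] * n_bins
    for value in values:
        index = min(int((value - lower_edge) / bin_width), n_bins - 1)
        counts[index] += 1
    return [count / len(values) for count in counts]


def perkins_skill_score(model, obs, bin_width=0.5):
    """Overlap of two empirical PDFs: sum over bins of the smaller frequency."""
    lower_edge = math.floor(min(min(model), min(obs)))
    upper = max(max(model), max(obs))
    n_bins = int((upper - lower_edge) / bin_width) + 1
    model_freq = binned_frequencies(model, bin_width, lower_edge, n_bins)
    obs_freq = binned_frequencies(obs, bin_width, lower_edge, n_bins)
    return sum(min(m, o) for m, o in zip(model_freq, obs_freq))


# Hypothetical temperatures (deg C): the model is shifted by +1 degC.
obs = [10.1, 10.6, 11.1, 11.6]
model_shifted = [11.1, 11.6, 12.1, 12.6]

print(perkins_skill_score(obs, obs))            # identical PDFs -> 1.0
print(perkins_skill_score(model_shifted, obs))  # half of the bins overlap -> 0.5
```

A shift of the whole distribution lowers the score even when the shape is reproduced perfectly, which is why the score is reported together with a description of the tails.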

Results
First we present the seasonal averages of TMAX, TMIN and PTOT. Daily TMAX, TMIN and total precipitation data form the basis for the calculation of the ETCCDI indices. The seasonal averages are calculated over December-January-February (DJF), March-April-May (MAM), June-July-August (JJA) and September-October-November (SON). Then we present biases in several (annual) ETCCDI indices before investigating the causes of the differences between model and observations.

Minimum and maximum temperature and total precipitation
We calculate seasonal averages from daily TMIN and TMAX for the 1951-2011 period from ACCESS1.3b (Figs. 1a and 2a) and compare them to gridded observations from HadGHCND (Figs. 1b and 2b). The overall seasonal patterns are reproduced reasonably well by ACCESS. TMAX shows a negative bias in most regions except North America and parts of southeastern Europe and Africa in JJA (Fig. 1c), whereas TMIN shows a positive bias almost globally (Fig. 2c), except for the Arctic and Himalayas. Since HadGHCND does not have complete coverage in all grid boxes over the whole time period we analysed, regional biases can be influenced by temperature trends, e.g. in East Africa, where data are available only between ∼1960 and 1990. The opposing TMAX and TMIN biases commonly lead to a good simulation of the mean temperature (Kowalczyk et al., 2013, Fig ... Hemisphere mid-latitudes of ∼2 °C, exceeding 5 °C over North America. Figure 2, in contrast, suggests a warm bias of ∼2 °C in TMIN almost everywhere, exceeding 5 °C over North Asia in DJF, and North America in JJA. Global patterns of PTOT are well represented in ACCESS1.3b compared to GPCP during the 1980-2012 time period (Fig. 3a and 3b).
ACCESS1.3b tends to overestimate total seasonal precipitation (Fig. 3c) in most regions, although there is a small underestimate over Europe in most seasons. The wet precipitation bias is largest in the tropics (exceeding 5 mm d−1), but elsewhere it is generally small (< 1 mm d−1). A large negative precipitation bias exists in India in JJA and SON, where the monsoon is displaced. This bias has previously been reported by Kowalczyk et al. (2013) and Bi et al. (2013). These biases are statistically significant, indicated by stippling, in most regions.

ETCCDI indices
Results for ETCCDI indices in ACCESS1.3b are compared to the HadEX2 data set. Since the indices are calculated for station data in HadEX2 and then gridded, one would expect model output to appear smoother, with fewer extremes, than the observational data set (Donat et al., 2013b). In particular, one would expect precipitation-based extremes calculated from station-based observations to be more intense. However, we did not find a general underestimation of the variability in the extreme indices in the model. The percentile-based indices are expected not to differ much from HadEX2, since the percentile thresholds are calculated from the model data themselves, so that the exceedance frequency is 10 % on average during the base period (1961-1990) by definition. Hence, differences between ACCESS and HadEX2 are mainly driven by different trends and do not depend on biases in absolute values of TMAX and TMIN. The two indices that examine cold extremes, cool nights (TN10p) and cool days (TX10p), are shown in Fig. 4. The ACCESS1.3b model represents the global patterns of both TN10p (Fig. 4e) and TX10p (Fig. 4f) reasonably well. ACCESS1.3b also captures the decreasing trends in TN10p (Fig. 4g) and TX10p (Fig. 4h).
The hot extremes (warm nights, TN90p and warm days, TX90p) are also captured well by ACCESS1.3b (Fig. 5). The regional differences for hot extremes are larger than for cold extremes. There is a large overestimate in the occurrence of TN90p in the Southern Hemisphere, particularly over South America, but this difference also affects North America, Australia and southern Africa (Fig. 5e). Similar regions are affected by an overestimation of TX90p (Fig. 5f). Despite these regional differences, ACCESS1.3b estimates the global increasing trends in both TN90p (Fig. 5g) and TX90p (Fig. 5h) remarkably well. We note that this might be because data availability in some of the regions with large differences is too low to pass the requirement of 50+ years of data in HadEX2 to be included in the global average. Also note the close agreement in interannual variability between ACCESS and HadEX2, suggesting a strong influence from sea surface temperatures, which are prescribed here, on hot extremes.
The final temperature index is the diurnal temperature range (DTR, Fig. 6). This is simulated poorly by ACCESS1.3b and is globally underestimated by up to 4 °C (Fig. 6c). This result is anticipated given the seasonal overestimation of TMIN and underestimation of TMAX. The underestimation of DTR is shown clearly in the global time series (Fig. 6d). We put this large underestimation into context with CMIP-5 simulations and reanalysis in the discussion in Sect. 4.
The annual maximum 1-day precipitation (Rx1day) shows regions of both overestimation and underestimation (Fig. 7). The pronounced underestimation over India is clearly related to the missing monsoon (Fig. 7c). ... the underestimation of summer rainfall. Central Eurasia also shows an underestimation of Rx1day due to the underestimation in total precipitation during summer, autumn and winter. Overall, ACCESS1.3b underestimates Rx1day (Fig. 7d) between 1951 and 2010. We also analysed Rx5day (maximum annual consecutive 5-day precipitation), which showed an overestimation by ACCESS in the global average (not shown). However, this is an artefact of how we calculate these indices in HadEX2, as maxima out of the monthly maxima, which have a better coverage than the annual maximum (see Sect. 2.3), and of the lack of observational data in the tropics. This problem is less pronounced for Rx1day; however, biases around the areas with missing values in HadEX2 have to be interpreted with care.
Consecutive wet days (CWD) are clearly overestimated over the Northern Hemisphere (which is where CWD can be derived, given the low coverage in the Southern Hemisphere), while consecutive dry days (CDD) are underestimated (Fig. 8). There is no clear overall trend in the time series of CWD and CDD (Fig. 8g and h). Overall, there is a clear picture of ACCESS1.3b heavily overestimating consecutive wet days, and underestimating consecutive dry days, in those regions where the observations are complete enough to derive these indices. The biases in extreme precipitation indices are largely influenced by the bias in total precipitation in ACCESS1.3b. This is not a surprise; climate models commonly produce rain too often but at too low an intensity (the "drizzle problem", e.g. Dai, 2006). On a global scale, ACCESS1.3b has too many consecutive wet days, so it rains too often, and it underestimates the maximum 1-day precipitation (Rx1day). The biggest bias identified is the underestimation of the diurnal temperature range, due to an overestimation of TMIN and an underestimation of TMAX. Therefore, the next section focuses on the distributions of TMIN and TMAX.

Probability density functions of TMAX and TMIN
The probability density functions (PDFs) of TMAX and TMIN for ACCESS1.3b and the HadGHCND data set are shown in Fig. 9 (DJF) and Fig. 10 (JJA). We restrict our analysis of the PDFs to four regions with good data coverage in HadGHCND. These regions are defined in Table 3 and correspond to Asia, Australia, Europe and North America. The results from the PDFs are summarised in each panel using the skill score defined by Perkins et al. (2007), which measures the overlap of the PDFs (perfect agreement is a skill score of 1.0).
For DJF, the PDFs of the observational data set are reproduced well in the three northern hemispheric regions (Asia, Europe, and North America). In Asia (Fig. 9a), the lower tail of the TMAX distribution is almost perfectly captured. The upper tail is also captured well, although there is a small deviation between 10 and 20 °C. In TMIN the upper tail is well reproduced by ACCESS1.3b, but the lower tail shows a bias of ∼5 °C, with TMIN values around −20 °C simulated too frequently. In North America (Fig. 9d), the biases are the opposite; the upper tail of TMAX is better captured than the lower tail, while the lower tail of the TMIN distribution is reproduced well. For Europe (Fig. 9c), both TMAX and TMIN distributions only show small deviations from the observations. For Australia (Fig. 9b), the upper tail of TMAX is reasonably captured, but the mean of TMAX is underestimated and the lower tail shows a bias of ∼5 °C. The lower tail of TMIN in Australia is better captured than the upper tail, but the whole PDF of TMIN is shifted to the right in the model. Overall, the PDFs for both TMAX and TMIN are simulated with a skill score exceeding 0.8 for all regions except Australia in DJF. There is a clear problem with the PDF for TMAX in Australia in DJF, linked to a large bias associated with the lower tail of the distribution.
In JJA (Fig. 10), the distributions of TMIN and TMAX are reproduced better for Australia (Fig. 10b) than for the northern hemispheric regions Asia and North America. The lower tail of TMIN is almost perfect but the upper tail has a bias of ∼5 °C. For TMAX the upper tail is reproduced well, but the lower tail is shifted to the left in the model by about 3 °C. Overall, however, ACCESS1.3b captures TMIN and TMAX for Australia in JJA with a skill score exceeding 0.8. The PDFs for the Northern Hemisphere regions are less well captured than in DJF (Fig. 10a, c and d), with half the skill scores below 0.8 for these regions. For Europe (Fig. 10c), the lower tail of TMIN is captured well, but the upper tail is slightly overestimated. For TMAX, the PDF is shifted to the left in the model by ∼3 °C and the mean of the distribution is underestimated. However, for Europe the skill scores in JJA are still larger than 0.8. TMAX in Asia (Fig. 10a) shows a similar picture, although the biases are larger than in Europe. TMIN is shifted to the right, especially the main peak, which is also underestimated, leading to a low overall skill score. In North America (Fig. 10d), only the lower tail of TMIN is reasonably reproduced by the model; the main peak is underestimated and the upper tail is shifted to the right by ∼5 °C. The lower tail for TMAX in North America in JJA is too low and the upper tail too high in ACCESS1.3b. Generally, the lower tail of TMIN is reproduced better than the upper tail, whereas the upper tail of TMAX is often reproduced better than the lower tail.
Geosci. Model Dev., 7, 545-567, 2014, www.geosci-model-dev.net/7/545/2014/
Fig. 11. Biases between ACCESS1.3b and CERES in net shortwave radiation (a) and net longwave radiation (b). The time period considered is the period of overlap between the model run and the observations, 2001-2009.

Discussion
The driver of temperatures at the Earth's surface is the surface radiation balance, but different components of the radiation balance are associated with T MIN and T MAX . The daily minimum temperature, which normally occurs just before sunrise, is mainly determined by longwave radiation the night. The magnitude of incoming long waves (LW IN ) depends on sky temperature and emissivity and is affected by cloud cover and humidity, while outgoing LW depends on the emissivity and temperature of the Earth's surface. Maximum temperatures during the day are dependent on the incoming solar radiation (SW) and modulated by cloud cover and aerosols. Surface temperatures are also affected by the surface albedo, availability of soil moisture for evapotranspiration and stability conditions of the atmosphere. We examined the biases in net longwave (LW NET ) and net shortwave (SW NET ) from ACCESS1.3b to explain the biases in temperature. The CERES satellite product is used to estimate the biases in the radiative fluxes, which has a well-defined level of accuracy. When compared to CERES, ACCESS generally has an excess amount of SW absorbed at the surface (Fig. 11a) in all seasons. In the Northern Hemisphere this is small in DJF and largest in JJA, where the bias exceeds 50 W m −2 over Europe and North America. There are other regions with biases exceeding 50 W m −2 , including central Africa, India and the Amazon delta (Fig. 11a). The high bias in SW NET (Fig. 11a) is likely associated with a low cloud bias enabling excessive incoming SW. This is evident in JJA over India, where the Indian monsoon is severely underpredicted; see Fig. 3. The bias in LW NET is generally negative (Fig. 11b), especially in the arid and semi-arid areas, pointing to either outgoing LW being overestimated or incoming LW being underestimated. Outgoing LW radiation is directly proportional to the surface temperature to the 4th power. 
In areas with positive biases in TMIN and TMAX, ACCESS overestimates outgoing LW, which could explain the negative bias in LW_NET in central Eurasia and North America in JJA. Overall, the largest errors in SW_NET are in JJA in the Northern Hemisphere and India, as well as in the Amazon delta in SON. The largest biases in LW_NET occur in warm seasons in the arid and semi-arid areas of North Africa, central Eurasia, the Middle East, India, North America and Australia. Together, the biases in LW and SW lead to an overall overestimation of total net radiation in the Northern Hemisphere spring and summer and in most of the tropics (not shown).
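To make the fourth-power dependence of outgoing LW on surface temperature concrete, a linearised estimate can be written down. This is only a rough sketch: it assumes a grey-body surface with emissivity close to 1 and a reference surface temperature of 288 K, both illustrative values that are not taken from the model, and it treats the temperature bias as a bias in the radiating surface temperature.

```latex
\begin{align}
LW_{\mathrm{OUT}} &= \varepsilon \sigma T_s^{4},
  \qquad \sigma = 5.67 \times 10^{-8}\ \mathrm{W\,m^{-2}\,K^{-4}}, \\
\Delta LW_{\mathrm{OUT}} &\approx 4 \varepsilon \sigma T_s^{3}\, \Delta T_s
  \approx 5.4\ \mathrm{W\,m^{-2}\,K^{-1}} \times \Delta T_s
  \qquad (\varepsilon = 1,\ T_s = 288\ \mathrm{K}).
\end{align}
```

Under these assumptions, a warm bias of 3 °C in surface temperature would raise outgoing LW by roughly 16 W m⁻², which gives a sense of how temperature biases of the size discussed here can feed back on LW_NET.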
We calculate temporal correlations between the biases in TMIN, TMAX and radiation per season at each grid point (using the NCAR Command Language "NCL" function "escorc") where biases are larger than ±1 °C or ±10 W m⁻², respectively. The bias in TMIN correlates strongly with the bias in incoming LW (Fig. 12a). This provides further evidence to associate this temperature bias with problems in the ACCESS1.3b simulated cloud cover. Franklin et al. (2013b) evaluated cloud fraction in ACCESS1.3 in detail. They found that clouds are represented reasonably well, but with differences in the horizontal distribution. These include a tendency towards too few clouds throughout the subtropics and trade wind regions, an underestimation of up to 25 % in DJF over Russia as well as an underestimation of 30 % in JJA over North America. The bias in TMAX correlates with the bias in SW_NET, but the correlation is weaker than for TMIN and LW_IN. For example, regions with large negative biases in TMAX (Fig. 1c) in the Himalayas, the Arctic, and southwestern South America, which are persistent in all seasons, do not always correspond to a negative bias in SW_NET (Fig. 11a).
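The correlation step can be sketched in Python as follows. This is not the NCL code used for the analysis: it is a NumPy stand-in for a lag-0 Pearson correlation along the time axis, and it assumes that the ±1 °C and ±10 W m⁻² thresholds are applied per time step before correlating, which the text does not spell out. The function name `masked_bias_correlation` and the `min_samples` guard are hypothetical.

```python
import numpy as np

def masked_bias_correlation(t_bias, flux_bias, t_min_abs=1.0,
                            flux_min_abs=10.0, min_samples=3):
    """Lag-0 Pearson correlation over time at each grid point.

    t_bias, flux_bias: arrays of shape (time, lat, lon).
    Assumption: time steps where either bias is within its threshold
    are excluded before the correlation is computed.
    """
    t = np.asarray(t_bias, dtype=float)
    f = np.asarray(flux_bias, dtype=float)
    keep = (np.abs(t) > t_min_abs) & (np.abs(f) > flux_min_abs)
    n = keep.sum(axis=0)
    safe_n = np.where(n > 0, n, 1)  # avoid division by zero
    # Means over the retained time steps only.
    t_mean = np.where(keep, t, 0.0).sum(axis=0) / safe_n
    f_mean = np.where(keep, f, 0.0).sum(axis=0) / safe_n
    t_anom = np.where(keep, t - t_mean, 0.0)
    f_anom = np.where(keep, f - f_mean, 0.0)
    cov = (t_anom * f_anom).sum(axis=0)
    norm = np.sqrt((t_anom**2).sum(axis=0) * (f_anom**2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        r = cov / norm
    # Grid points with too few retained samples are reported as NaN.
    return np.where(n >= min_samples, r, np.nan)
```

Masking small biases first keeps grid points where the model agrees with the observations from contributing correlations that are dominated by noise.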
In the Northern Hemisphere summer, correlations between SW_NET biases and TMAX biases are strong (Fig. 12b) and usually exceed ∼0.8. However, in SON and MAM, and in particular in DJF in the Northern Hemisphere, the correlation between SW_NET biases and TMAX biases becomes weaker and even negative at some grid points (Fig. 12b). The weaker correlations between biases in SW_NET and biases in TMAX (Fig. 12b) point to factors other than atmospheric processes playing a role, and these are likely to be linked to land processes. ACCESS1.3b is generally lacking in its capacity to capture TMAX. This was apparent in Fig. 1c for TMAX and Fig. 6 for DTR. Reflecting on Figs. 9 and 10, the simulation of TMAX was shifted to the left in the ACCESS1.3b model in both DJF and JJA in all four regions. It is noteworthy that the largest biases tended to be at the lower tail of the PDF for TMAX (only in Europe and Asia in DJF was this not true). The most straightforward explanation for this is linked with evapotranspiration. For instance, Watterson (1997) found close spatial correlations between DTR and SW_NET minus the evaporative and sensible fluxes, or LW_NET. Examining how well a land-surface model simulates evapotranspiration is challenging because a bias in this quantity can result from poor forcing (rainfall, SW and LW), poor surface states (soil moisture) or poor parameterisation of the relationship between the states and the fluxes. It is also challenging because there are considerable uncertainties in estimates of evapotranspiration from observation-based products. We use GLEAM (see Sect. 2.5), recognising that this product is a model-based estimate of evapotranspiration and that there are likely significant uncertainties associated with the estimates. To reduce the influence of these uncertainties we use a second ET product, the multidata synthesis LandFlux-EVAL. Figure 13a shows the simulation of evapotranspiration in ACCESS1.3b compared with GLEAM, and Fig. 13b shows the same for LandFlux-EVAL.
There is a systematic bias in simulated evapotranspiration, commonly reaching 30 W m⁻² and regionally exceeding 50 W m⁻². In almost all cases, ACCESS1.3b simulates excess evapotranspiration. This is in contrast to Zhang et al. (2013), who found an underestimation of ET in the tropics in offline CABLE2.0 runs. There are, however, some important exceptions: there is too little ET over the Indian subcontinent in JJA and SON, linked with the failure of the monsoon in this model. There is also a lack of evapotranspiration over parts of North America, despite the excess SW, in JJA. However, the pattern of excess evapotranspiration shown in Fig. 13a and b is large-scale and systematic. The patterns of the evapotranspiration biases are dissimilar to the LW_NET biases (Fig. 11b) and are weakly linked to the biases in SW_NET (Fig. 11a). The largest positive ET biases occur in densely forested areas (e.g. tropics) and in the Northern Hemisphere in summer. As shown in Fig. 1c, most of the biases in TMAX are small or negative, except over the mid-latitudes of the Northern Hemisphere in JJA, where the biases are closely linked to the negative rainfall bias in the model (Fig. 3c). This general low bias in TMAX could be explained by the excessive evapotranspiration, which has been found in other climate models as well. Figure 13c shows the temporal correlation (calculated as for temperature and radiation using NCL's "escorc") between biases in TMAX and biases in evapotranspiration (or latent heat flux, LH, in W m⁻²). Small biases (within ±1 °C for TMAX and within ±10 W m⁻² for LH) are masked to focus on the correlation of significant biases. We expect a negative correlation in areas where either ET is too low and TMAX is too high or ET is too high and TMAX is too low. There are many regions where the biases in LH and TMAX are negatively correlated. These include regions over the mid-latitudes of the Northern Hemisphere in JJA, Eurasia in SON and the Southern Hemisphere in DJF and JJA.
There are also large areas where the correlation is positive, including parts of the mid-latitudes of the Northern Hemisphere in SON, high latitudes in MAM and southeast Asia in MAM, JJA and SON. Unfortunately, using LH to explain biases in TMAX is limited by major gaps in TMAX observations. Despite this, the regions where the clearest negative correlations are found, and the seasons in which they occur, are not unexpected. Areas with negative correlations correspond to areas where ET is limited by soil moisture availability, whereas areas where the correlation between TMAX and ET biases is positive correspond to areas where ET is limited by radiation/temperature (see Seneviratne et al., 2010; Jung et al., 2010; Wang and Dickinson, 2012). In regions where ET is limited by soil moisture, a high influence from the land surface on temperature is expected due to strong land-atmosphere coupling (e.g. Seneviratne et al., 2010; Mueller and Seneviratne, 2012). These tend to be transitional regions between wet and dry climates during the summer season in both hemispheres. This link between biases and coupling is an area we will pursue in the future.
One question might be how the ACCESS1.3b simulation of the ETCCDI indices compares with other models. Our use of AMIP makes a direct comparison with other models unfeasible. However, Sillmann et al. (2013) have provided an evaluation of climate extreme indices from CMIP-5 models for the present climate. In addition to HadEX2, they included four reanalysis data sets in their analysis. Some of the reanalyses also show large biases relative to the observations, partly due to different computational approaches when calculating indices from daily grid-point averages in comparison to grids of station extremes (Donat et al., 2014). Therefore, biases between reanalyses/model output and observations are expected to some degree because of scaling effects. Sillmann et al. (2013) concluded that CMIP-5 models are generally able to simulate climate extremes and their trend patterns in comparison to HadEX2. The percentile indices TN10p, TX10p, TN90p and TX90p compare very well with CMIP-5 since they are calculated relative to their specific PDF and are thus insensitive to biases in absolute temperature values. Sillmann et al. (2013) also found that models and reanalyses disagree with HadEX2 for DTR. HadEX2 shows much larger values for DTR than the median of the analysed CMIP-5 models and most reanalyses. The question might arise whether the comparison of the models to HadEX2 is fair for DTR. As mentioned in Sect. 2.5, extreme indices derived from model output are expected to be less intense than those derived from station observations. The spatial-scale mismatch between the model and HadEX2 probably explains a small part of the bias. However, the spatial mismatch between models and observations plays less of a role for indices based on monthly averages such as DTR. HadGHCND TMAX and TMIN seasonal averages also suggest an underestimation of the DTR.
In addition, Lewis and Karoly (2013) found deficiencies in the CMIP-5 models in simulating trends in DTR. Hence, the underestimation of DTR is a common problem in many climate models, although it remains possible that the model-derived DTR is not directly comparable with the observation-derived value. Rx1day was not considered in Sillmann et al. (2013). For Rx5day, ACCESS1.3b's global mean is higher than the median of the CMIP-5 ensemble investigated by Sillmann et al. (2013); CWD is at the lower end of the CMIP-5 models and CDD is also lower than the CMIP-5 median. ACCESS1.0 was among the models that reproduce most temperature and precipitation indices reasonably well in Sillmann et al. (2013). Therefore, overall, ACCESS1.3b performs comparably to other CMIP-5 models for ETCCDI, with some indices simulated particularly well, and others in a more limited way.
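The insensitivity of percentile indices to biases in absolute temperature can be illustrated with synthetic data. The sketch below is a simplification: it uses a single 90th-percentile threshold for the whole base period, whereas the ETCCDI definition of TX90p uses calendar-day thresholds. The data, the 3 °C offset and the function name `tx90p` are hypothetical choices for the example only.

```python
import numpy as np

def tx90p(tmax, base):
    """Percentage of days in `tmax` above the 90th percentile of `base`.

    Simplification: one threshold for the whole base period rather than
    the calendar-day thresholds of the ETCCDI definition.
    """
    threshold = np.percentile(base, 90)
    return 100.0 * np.mean(np.asarray(tmax) > threshold)

rng = np.random.default_rng(0)
base_obs = rng.normal(25.0, 4.0, size=3000)    # synthetic base period
recent_obs = rng.normal(26.0, 4.0, size=3000)  # synthetic later, warmer period

cold_bias = -3.0  # constant model offset in deg C, chosen for illustration
base_model = base_obs + cold_bias
recent_model = recent_obs + cold_bias

# The threshold moves with the biased base period, so the index is unchanged,
# while an absolute quantity such as the mean carries the full offset.
obs_index = tx90p(recent_obs, base_obs)
model_index = tx90p(recent_model, base_model)
mean_offset = recent_model.mean() - recent_obs.mean()
```

A constant offset cancels because each data set is compared against its own base-period threshold. Biases that change the shape of the distribution, such as the tail errors seen in the PDFs, would not cancel in this way.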

Conclusions
To provide a benchmark for how well the ACCESS1.3b climate model simulates extremes, we undertook an AMIP-style simulation over the 1950–2012 period with prescribed sea surface temperatures and sea ice concentration. Our goal was to identify strengths and weaknesses in the ACCESS1.3b modelling system to provide a basis for experiments and model developments to resolve these weaknesses. Our analysis is founded on the capacity of the model to simulate daily TMAX, TMIN and precipitation. From these three variables we calculated climate extremes derived by the Expert Team on Climate Change Detection and Indices (ETCCDI). This work builds on earlier analyses by Kowalczyk et al. (2013) and Bi et al. (2013) of the mean climate of the ACCESS1.3 model, which included CABLE1.8 rather than CABLE2.0. These analyses showed that ACCESS1.3 captured the large-scale mean temperature and precipitation well, and compared favourably with other climate models in CMIP-5. Our analysis highlighted a large (2–6 °C) cold bias in the simulation of TMAX in all seasons and in all regions except North America. We also showed a large positive bias (1–5 °C) in TMIN in all seasons and in all regions. As a consequence, ACCESS1.3b fails to represent the diurnal temperature range well in comparison with the HadEX2 data. However, the model captures patterns and trends in the indices for cool nights (TN10p) and cold days (TX10p) extremely well, although there is an overestimation in the change in both indices between ∼1975 and 2010. Warm nights (TN90p) and warm days (TX90p) are also captured well. ACCESS1.3b simulates rainfall indices with variable skill. Rainfall intensity (Rx1day) is simulated reasonably well, but consecutive wet days are strongly overestimated and consecutive dry days are strongly underestimated in the model. The biases in temperature-related indices are very likely associated with a large positive bias in net shortwave radiation (Fig. 11a) and a large negative bias in net longwave radiation (Fig. 11b). Some of the precipitation biases are related to the common "drizzle" problem. Our results highlight challenges in simulating climate extremes in climate models, a result previously identified (Kiktev et al., 2003; Kharin et al., 2013; Sillmann et al., 2013). However, our results provide a benchmark from which we will now examine how land processes can be improved to capture these extremes. There are some clear ways forward for improving the model. Some of the biases are likely linked with a bias in simulating evapotranspiration and this will be a priority to resolve. For example, the GLACE methodology (Koster et al., 2006) could be applied to quantify the degree of land-atmosphere coupling in ACCESS. Other biases might be linked with albedo, especially the correct parameterisation of snow albedo, which is a common challenge in land-surface models. It will be more challenging to identify how to improve the cloud climatology, but by identifying these biases, and the impact these have on extreme indices, we provide a clear statement of the state of ACCESS1.3b and a benchmark from which the model can be improved.
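The link between the "drizzle" problem and the wet- and dry-spell indices can be illustrated with a small sketch. It assumes the usual ETCCDI convention that a wet day has at least 1 mm of precipitation, with CDD and CWD being the longest runs of dry and wet days. The two ten-day series are invented for the example and carry the same total rainfall. The helper names `longest_run` and `cdd_cwd` are hypothetical.

```python
import numpy as np

def longest_run(flags):
    """Length of the longest run of True values in a 1-D boolean sequence."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

def cdd_cwd(precip_mm, wet_threshold=1.0):
    """Return (CDD, CWD) for a 1-D series of daily precipitation in mm.

    Assumption: a wet day has precipitation >= 1 mm (ETCCDI convention).
    """
    wet = np.asarray(precip_mm, dtype=float) >= wet_threshold
    return longest_run(~wet), longest_run(wet)

# Two invented series with the same 20 mm total over ten days.
intermittent = [0, 0, 12, 0, 0, 0, 0, 8, 0, 0]            # few, heavier events
drizzly = [1.5, 0, 4, 3, 2, 1.5, 0, 3, 2.5, 2.5]          # frequent light rain

cdd_i, cwd_i = cdd_cwd(intermittent)
cdd_d, cwd_d = cdd_cwd(drizzly)
```

In this toy case the drizzly series has a longer wet spell and a shorter dry spell than the intermittent one despite the identical total, which is the direction of the CWD and CDD biases reported for the model.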