GlobSim (v1.0): deriving meteorological time series for point locations from multiple global reanalyses

Simulations of land-surface processes and phenomena often require driving time series of meteorological variables. Corresponding observations, however, are unavailable in most locations, even more so, when considering the duration, continuity and data quality required. Atmospheric reanalyses provide global coverage of relevant meteorological variables, but their use is largely restricted to grid-based studies. This is because technical challenges limit the ease with which reanalysis data can be applied to models at the site scale. We present the software toolkit GlobSim, which automates the downloading, interpolation and scaling of different reanalyses – currently ERA5, ERA-Interim, JRA-55 and MERRA-2 – to produce meteorological time series for user-defined point locations. The resulting data have consistent structure and units to efficiently support ensemble simulation. The utility of GlobSim is demonstrated using an application in permafrost research. We perform ensemble simulations of ground-surface temperature for 10 terrain types in a remote tundra area in northern Canada and compare the results with observations. Simulation results reproduced seasonal cycles and variation between terrain types well, demonstrating that GlobSim can support efficient land-surface simulations. Ensemble means often yielded better accuracy than individual simulations and ensemble ranges additionally provide indications of uncertainty arising from uncertain input. By improving the usability of reanalyses for research requiring time series of climate variables for point locations, GlobSim can enable a wide range of simulation studies and model evaluations that previously were impeded by technical hurdles in obtaining suitable data.


Introduction
Models that represent the interactions between the land surface and the atmosphere are often used to investigate biogeochemical, cryospheric and hydrologic phenomena.Because they require meteorological forcing -with daily or finer temporal resolution, for extended periods and without gaps -site-specific applications such as process studies or model testing are limited to few locations where high quality ground observations are available.In this context, global atmospheric reanalyses can substitute for lacking observations or supplement incomplete records.They assimilate a broad range of observations into numerical weatherprediction models, usually have coarse grid spacing (10-100 km) and are often used for large-area studies in atmospheric and hydrological modeling (e.g., Žagar et al., 2018;Albergel et al., 2018).Their application to point locations (e.g., Ekici et al., 2015;Westermann et al., 2016), however, is currently limited, likely because accessing data is technically involved.The software GlobSim, which is presented here, aims to contribute to overcoming this obstacle.
The suitability of reanalysis data for individual projects depends on the environment studied, the skill of the reanalysis in representing it and characteristics of the intended application.Several global reanalysis products are available to drive simulations or ensembles of simulations.Their relative suitability for specific simulation studies and locations is likely to vary because they rely on different assumptions, parameterizations or assimilated input data, and these differences result in biases that are spatially heterogeneous and specific to one or several meteorological variables (Decker et al., 2012;Zhang et al., 2016).The use of model ensembles is established to evaluate uncertainty or improve predic-Published by Copernicus Publications on behalf of the European Geosciences Union.
tive accuracy in simulation studies (e.g., Tebaldi and Knutti, 2007).Ensembles consist of multiple simulations for a given location and time that differ in one or more ways.For instance, they can be generated by varying the forcing data (e.g., Guo et al., 2018), the models themselves (e.g., Westermann et al., 2015b) or the structure and parameters of a model (e.g., Gubler, 2013).Because meteorological time series that drive models are a major source of uncertainty, having multiple reanalysis products available for ensemble simulations is an important step in estimating and reducing overall simulation uncertainty.
Four main challenges impede the use of reanalysis data for simulating point locations.(i) Technical delivery: available data are well documented but their structure largely reflects the needs and conventions common in atmospheric simulation, and as a consequence, individuals used to working with data from meteorological stations may find their handling difficult.For example, ERA-Interim (ERAI) provides certain variables as accumulations over time intervals and these must be disaggregated if instantaneous values are desired.(ii) Differences between reanalyses: reanalyses differ in their spatial and temporal grids as well as the conventions and units used for their variables and files (cf.Arsenault et al., 2018).(iii) Spatial scale: the coarse-grid reanalysis data require spatial interpolation to the locations of interest and heterogeneous environments such as mountains or coasts may require additional, subsequent scaling procedures (e.g., Fiddes and Gruber, 2014;Sen Gupta and Tarboton, 2016;Cao et al., 2017).(iv) Differences between reanalysis and model: reanalysis data usually require unit conversion, computation of derived variables (e.g., wind direction from northward and eastward wind speeds) and temporal interpolation in order to be suitable for driving specific models.Although addressing these four main challenges is not conceptually difficult, it does represent a technical hurdle that must be overcome in order to more fully materialize the benefits of reanalysis data for driving models at point locations.
This contribution describes the software GlobSim, named as a portmanteau of "global simulator", which produces time series from multiple reanalyses for driving model simulation at point locations.As a demonstration, we apply Glob-Sim to ground-surface temperature (GST) simulations in a densely instrumented and well-described location in a tundra environment near Lac de Gras, Northwest Territories, Canada.This demonstrator is relevant for investigating permafrost and its changes, one example of a research field where the lack of data to drive simulations is particularly severe.The objectives of this study are (1) to describe the software GlobSim and test its results for blunders and (2) to quantify the performance of simulations supported by Glob-Sim in a demonstrator application.For this, we compare ensemble members and means to statistical summaries of observations for different terrain types.Because reanalyses are imperfect and their performance can vary regionally and temporally (Decker et al., 2012;Fiddes and Gruber, 2014), the demonstrator can inspire, but not quantitatively underpin, applications in other areas, at other times and using other models.The study area has been chosen to test whether the combination of coarse-scale reanalysis data with fine-scale information on surface and subsurface characteristics (the terrain types) can reproduce the seasonal temperature cycles observed and the resulting fine-scale differentiation of ground temperature.This is relevant to potential users of atmospheric simulation data and also for the development of atmospheric models, as it can inform decisions related to the trade-off between increased resolution in the atmosphere or increased tiling (subgrid) resolution of their land-surface components.Finally, improved simulation of land-surface processes and phenomena will become increasingly important for supporting decision making under climate change because it can help to estimate likely future environmental conditions.In this context, flexible model evaluation and application globally is an important first step.

Downscaling of reanalyses
Most global atmospheric models produce output at coarse spatial scales (10-200 km), and for many applications, these data need to be downscaled, a process that has been described as making the link between the state of variables representing a large space and the state of variables representing a much smaller space (Benestad et al., 2008).For this, two main approaches exist: dynamical downscaling (e.g., Bieniek et al., 2016), which relies on nested atmospheric models with increasingly fine resolution and decreasing spatial extent, and empirical-statistical downscaling (e.g., Daly et al., 2008), which employs observations to derive mapping functions for linking coarse and fine scales.As GlobSim is intended to operate in areas without observations, this section emphasizes empirical-statistical downscaling methods that are tolerant to application far away from the observation with which they were derived.
Several empirical-statistical methods exist to downscale gridded meteorological data or to spatialize observations in heterogeneous environments.These applications are different but, especially when aiming to perform downscaling in areas without observations, show significant overlap.For example, Hungerford et al. (1989); Liston and Elder (2006) and Thornton et al. (2012) produced spatial data by interpolating observations based on topoclimatic variables (e.g., elevation, slope angle and slope aspect) and vegetation, and Daly et al. (2008) produced grids of mean monthly precipitation and temperature with a method that is frequently used in application studies (e.g., Jafarov et al., 2012).More recently, lapse rates for adjusting surface air temperature to fine-scale topography were derived from reanalysis pressure-level data (Gruber, 2012;Fiddes and Gruber, 2014).This allows lapse rates to vary temporally and physically consistent with the atmospheric conditions.Cao et al. (2017) further developed the surface air temperature downscaling by parameterizing fine-scale inversions such as cold air pooling.Downscaling methods for other variables such as shortwave and longwave radiation, precipitation and wind speed suitable for mountains exist (Fiddes and Gruber, 2014;Sen Gupta and Tarboton, 2016) and can inform application also in gently sloping terrain, and the potential of these scaling methods has been demonstrated in simulation studies (e.g., Fiddes et al., 2015;Westermann et al., 2015a).

Ensemble simulation
Model uncertainty arises from input data and the models themselves (Gupta et al., 2005;Gubler et al., 2013) but the quantitative evaluation of this uncertainty is difficult when models are complex (Murphy et al., 2004;Wang et al., 2016).This is also true when simulations of land-surface processes or phenomena are forced by reanalyses because their uncertainty -arising from, e.g., the observational data, assumptions, model structure, initialization and parameters usedpropagates into the final results.Here, ensemble simulation based on multiple reanalyses allows exploring the relative contribution that reanalysis quality has on the overall uncertainty of final results obtained for variables related to landsurface processes or phenomena.Additionally, the average of ensemble members has been shown to improve predictive accuracy relative to individual simulations (e.g., Tebaldi and Knutti, 2007;McGuire et al., 2016).One of the most widely known ensemble simulation examples is the Coupled Model Intercomparison Project (CMIP), which aims to contribute to the understanding of past, present and future climate changes in a multi-model context.

Structure and approach
GlobSim is a Python (version 3.7) software package designed to download and process important global atmospheric reanalyses and to derive time series with consistent variables, units and time intervals for specific locations (Fig. 1).It comprises three parts: (i) downloading for retrieving original data; (ii) interpolation of original, gridded variables to point locations as time series; and (iii) scaling of sitelevel time series to common units and temporal resolution, and possibly, the application of additional empirical downscaling function.GlobSim is designed to be controlled via simple parameter files containing keyword-value pairs (e.g., download area, date range, output locations, variables).As downscaling methods can be easily added, it provides a basis for broader application and development of existing methods (Fiddes andGruber, 2012, 2014;Sen Gupta and Tarboton, 2016;Cao et al., 2017).

Reanalyses
The four reanalyses currently implemented in GlobSim (Table 1) are ERAI, ERA5, the Japanese 55-year Reanalysis (JRA-55) and the Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2).Earlier reanalyses, such as CFSR, ERA-40, JRA-25 and MERRA, were not implemented in GlobSim because they have either been superseded by newer products or have ended.During the writing of this contribution, the discontinuation of ERAI in 2019 has been announced.
ERAI has 60 levels in the vertical dimension, with the highest at 1 mb, and the data are interpolated to 37 pressure levels (Dee et al., 2011).A reduced Gaussian grid with approximately uniform 79 km (T255) spacing for surface and other grid-point fields is used.ERAI covers the period from 1 January 1979 to 2019.The data contain analyses (at 00:00, 06:00, 12:00 and 18:00 UTC) for surface and pressure levels, as well as forecasts of instantaneous (e.g., 2 m surface air temperature, air temperature and relative humidity) and accumulated (e.g., total precipitation and radiation components; from 00:00 and 12:00 UTC, in 3, 6, 9 and 12 h steps) variables.
ERA5 is the fifth-generation atmospheric reanalysis produced by ECMWF to replace ERAI.Data are currently available from 1979 onward and later in 2019 are expected to be available starting in 1950.ERA5 is produced with a horizontal resolution of 31 km, a temporal resolution of 1 h and 137 vertical model levels, although the output pressure levels are identical to ERAI (Hersbach and Dee, 2016).ERA5 assimilates improved input data that better reflect observed changes in climate forcing, as well as many new or reprocessed observations that were not available during the production of ERAI.It also provides an estimate of uncertainty based on a 10-member ensemble with a temporal resolution of 3 h and spatial resolution of 62 km (Albergel et al., 2018).
JRA-55 is produced using a four-dimensional data assimilation system that uses many types of satellite data (Kobayashi et al., 2015).It is the second Japanese atmospheric reanalysis and covers the period from 1958 to nearreal time.JRA-55 has a spatial resolution of 1.25 • for assimilation (TL319) and 37 vertical pressure levels (1000-1 mb) for most variables in upper air, except dew-point depression, specific humidity, relative humidity, cloud cover, cloud water, cloud liquid water and cloud ice, which are produced for 27 levels from 1000 to 100 mb, only.The temporal resolution is 6 h for all levels and data (Kobayashi et al., 2015).

Download
GlobSim downloads reanalysis data from each sponsoring agency (Table 1).Downloads are mediated through application program interfaces (APIs) which allow scripting access to data servers through Python.Server requests are split into temporal chunks in order to efficiently download large datasets.Different types of data (surface analysis, surface forecasts and pressure-level analysis) are queried individually.The outputs are stored as NetCDF4 files for each temporal chunk and reanalysis field, retaining the structure and conventions used by the sponsoring agencies.

Interpolation
GlobSim spatially interpolates gridded variables to the latitude and longitude of point locations using bilinear interpolation through the Python interface (ESMPy) of the Earth System Modeling Framework (ESMF) toolkit (O'Kuinghttons et al., 2016).Pressure-level variables are also interpolated spatially on each pressure level and in a subsequent step, vertical interpolation is performed at that location.First, geopotential height is normalized to obtain pressure-level elevation (E) (m) for each time step: where φ is the geopotential height (m 2 s −2 ) and g 0 is the acceleration due to gravity of 9.807 m s −2 .Then, pressurelevel variables are vertically interpolated to the desired elevation.Geopotential height and other pressure-level variables are linearly extrapolated to locations where the pressure is greater than that of the lowest level with reanalysis data based on the values of the two lowest available pressure levels (Yessad, 2018).Although the 3-D interpolation combining surface and pressure-level information is not demonstrated here due to the relatively flat study area, it will be useful for further development of GlobSim by integrating additional scaling methods.

Scaling
We use the term "scaling" to describe the conversion of variables and units from their reanalysis-specific origins to a common standard as well as the possible application of additional temporal interpolation and downscaling procedures to produce data at fine spatial and temporal scales.For example, 3-hourly shortwave radiation from reanalyses can be interpolated to hourly resolution and used at mountain locations.When, additionally, fine-scale terrain-Sun geometry The lines represent accumulated variables in the forecasts which are reported as the integral of a flux over the time interval spanned by each arrow.Because the surface forecasts in ERAI are accumulated from the beginning of the respective forecast cycle (the solid blue lines) rather than from the end of the previous step, they must be disaggregated (the dashed blue lines) using Eq. ( 2) to prevent overlap in the integration periods.
and horizon shading are taken into account, the resulting data are likely to be more accurate (Fiddes and Gruber, 2014).The output variables of GlobSim are named following CF conventions (v1.6) with standard_name attributes that are consistent with the units used in the UDUNITS packages.The units of meteorological variables are first converted to the standard ones and then interpolated to achieve the required temporal resolution.Some of the forecast variables are not directly obtained from the reanalyses but instead are derived based on calculations involving other available variables (e.g., wind speed may be calculated from its northward and eastward components).
All analyzed fields in ERAI (e.g., surface analysis and pressure levels) and many forecast fields (e.g., temperature) are instantaneous.Some forecast variables (e.g., precipitation amount, surface downwelling longwave flux in air), however, are provided as accumulations in the reanalyses (Fig. 2).This means that each value is described as a change relative to another time in the forecast cycle instead of as an instantaneous flux.In contrast to the other reanalyses, the forecasts in ERAI are accumulated from the beginning of the respective forecast cycle (i.e., from 00:00 or 12:00 UTC; the solid blue lines in Fig. 2) rather than from the last step of the previous forecast cycle.
In the scaling procedure, ERAI forecasts are first disaggregated to total amounts starting from the end of each previous step (the dashed blue lines in Fig. 2) for respective forecast cycle (00:00 and 12:00 UTC) and can be expressed as where I denotes values spatially interpolated from the reanalysis grid (see Sect. 3.3.2),and S is the scaled results for that point location.The subscript s denotes the step and d the disaggregated value.The accumulated variables in forecasts are then converted to averages at the prescribed time resolution by dividing the length of the time step: where the temporal scale factor (t n ) could be given as where t res and t out is the raw temporal resolution of reanalysis and required temporal resolution of outputs with the unit of hours (t).For ERAI, the disaggregated downwelling shortwave flux in air was found to be slightly negative (> −0.05 W m −2 ) at some time steps.These values were interpreted to be artifacts and set to zero.The surface downwelling longwave flux in air (LW d ) is not directly available from MERRA-2 and is instead calculated as the sum of the longwave flux emitted from surface (LW e ) and the surface net downward longwave flux (LW n ): (5) In ERAI, relative humidity (RH) (%) is derived by following Lawrence (2005): where T a (K) is the near-surface air temperature and T d (K) is the dew-point temperature.Wind speed and direction are calculated from the eastward (U ) and northward (V ) components: and where W s is the wind speed in m s −1 , W d is wind direction in degrees and a tan 2 is the two-argument arctangent used to compute an unambiguous angle when converting from Cartesian to polar coordinates.

Demonstrator application
Reanalysis products are carefully designed and tested before release.In addition, many studies have evaluated their performance by intercomparison, by comparison with observations (e.g., Jiang et al., 2015) and by applying them to model simulations (e.g., Albergel et al., 2018;Beck et al., 2019).For this reason, we focus on the application of GlobSim for demonstration rather than the direct testing of reanalysis variables or their interpolated products.An intercomparison of GlobSimderived meteorological variables for the test area (Fig. 3) helps to appreciate differences or detect blunders in conversion.Below, we use GlobSim to drive ground-surface temperature (GST) simulation using a permafrost/land-surface model for a remote location in northern Canada underlain by permafrost.We then analyze ensemble results and their deviance from observations.

Study area
The research area is centered at 64 • 42 N, 110 • 36 W, near the north shore of Lac de Gras in the Northwest Territories, Canada (Fig. 4).The area is located within the zone of continuous permafrost, the mean annual air temperature (MAAT) is −9.0 • C and the total annual precipitation is 284 mm during September 1998-August 2007 (Ekati A; Environment Canada, 2019), with about 50 % occurring as snow (Jones et al., 2003).Typically, snow cover lasts for about 7 months and because of strong winds (Hu et al., 2003), snow depth shows strong spatial variability; it is shallowest in much of the higher-elevation, convex terrain (e.g., tops of eskers) and deepest in low-lying areas with taller shrub vegetation or in the lee of larger terrain features (Holubec et al., 2003).
The area has undulating to moderately rugged topography dominated by glacial features (Kerr et al., 1997;Dredge et al., 1999), mostly on the order of 10-20 m in relief (Dredge et al., 1999) and composed of glacial till (Haiblen et al., 2018).Till deposits are described according to their thicknesses, as veneers (< 2 m), blankets (2-10 m) or hummocky (5-30 m).Eskers are the prevalent glaciofluvial deposits, reaching heights of 35 m and occasionally containing massive ice on the order of 2-5 m thick (Haiblen et al., 2018;Wolfe et al., 1997).The poorly drained low-lying areas have peat deposits and are generally associated with ice-wedge polygons.The region is within the southern Arctic ecozone described by continuous shrub tundra (Wiken et al., 1996).The most common shrubs are dwarf birch (Betula pumila) and Labrador tea (Ledum decumbens).Uplands are well drained with lichen and mosses, whereas wetlands are typically colonized with sedges and mosses.

Observations and quality control
GST was measured at 156 locations in order to capture the fine-scale spatial variability and to test the performance of GlobSim in supporting GST simulation beneath different terrain classes.Study plots measuring 15 m × 15 m were established that reflect the different terrain types in the area (Gruber et al., 2018).Each plot was instrumented with

Distinguishing terrain types
The variability of GST regimes near Lac de Gras is controlled by many factors, the most significant of which are surficial geology, vegetation type and height and topography (cf.Hu et al., 2003).The 156 sites are hence grouped into 10 classes based on surface and subsurface characteristics as described at a scale of 15 m × 15 m (Fig. 5).This classification is subjective and was developed to identify common terrain types that could be easily recognized in the field and that explain a significant part of the observed spatial variation of GST.Surface offset is used here to quantify local ground temperature variations (Smith and Riseborough, 2002).It is defined as SO = MAGST − MAAT, where MAGST is the mean annual ground-surface temperature and MAAT is the mean annual air temperature.

Process-based numerical model
GEOtop (version 2.0), a physically based numerical model, is used for this demonstrator application because it describes the complex abiotic processes in permafrost environments well (e.g., Endrizzi and Marsh, 2010;Gubler et al., 2013;Endrizzi et al., 2014;Pan et al., 2016).It represents the heat and water transfer in soil as well as the energy transfer between the soil and the atmosphere.Additionally, it solves equations describing the interactions of water and energy during soil freezing and thawing (Dall'Amico et al., 2011;Endrizzi and Gruber, 2012) and simulates the water and energy transfer in the snow cover.GEOtop, driven by ERA-Interim forcing data, was found to be suitable for reproducing ground temperature observations (Mugford et al., 2012;Fiddes et al., 2015).

Model settings and parameters
GEOtop was forced by the four reanalyses obtained via GlobSim, providing hourly data for precipitation, wind velocity, wind direction, short-and longwave radiation, relative humidity and near-surface air temperature.Given the gentle topography and small test area, the time series were derived for the geographic center of measurement sites, only (Fig. 4).Model parameters and initial conditions were specified for each terrain type based on field observations.For example, the uppermost soil type was derived from drill logs and soil pit measurements (Table 2).Vegetation height was measured within each 1 m × 1 m subplot around the data logger (Table 3).Some parameters, which are challenging to measure directly, were then subsequently refined, within the range of plausible values (cf.Gubler et al., 2013), through experimentation with the model.The snow correction factor, which scales the simulated amount of snow via precipitation amount, is used to capture the differences among the 10 terrain classes.It was determined by fitting the simulated meltout date of the snow (Schmid et al., 2012) to observations (Fig. 6).Snow blowing is parameterized as wind compaction in 1-D (Pomeroy et al., 1993) for all the terrain types except the tall shrub site.
Soil parameters were estimated based on soil textural data collected at each plot (Subedi, 2016).More specifically, the soil particle thermophysical properties (e.g., thermal conductivity and heat capacity) were estimated based on typi-   cal values for common material types in each unit.The soil freezing characteristic curve parameters (van Genuchten α, van Genuchten n, saturated and residual water content) for rock and organic material were taken from Gubler (2013).
For mineral soil, values were obtained by averaging the soil textural data within each class and applying the ROSETTA model v3.0 (Zhang and Schaap, 2017).For the ice-wedge trough, the uppermost layer was assigned a the water/ice content of 95 % to simulate the ice-wedge itself.Leaf and stem area index (LSAI) and surface density of canopy were esti-mated based on measured LAI and moss thickness.Canopy fraction was determined based on plot photos (Gruber et al., 2018).Although the same vegetation reflectivity (0.11) and transmissivity (0.15) was used for each terrain type, overall reflectivity and transmissivity are in fact dependent on vegetation and subsurface conditions.A lower boundary condition of zero heat flux was used for these shallow simulation.Although borehole analysis revealed a heat flow of 0.046 W m −2 based on temperature measurements in two deep boreholes near Lac de Gras (Mareschal and Jaupart, 2004), it is not appropriate to apply these measurements to the shallow profile simulations because of the transient changes in the temperature profile near the surface during recent decades.We use 30 soil layers with a total depth of 12 m (Fig. 7) and an initial soil temperature of −2 • C. Reanalyses data for July 2000-June 2010 were used to spin up the model by running it 10 times (100 years) before simulations were conducted from July 2010 onward.
Reanalysis produces multi-decadal meteorological variables (Table 1), and this makes simulating long-term changes of land-surface processes possible.To demonstrate the utility of GlobSim for supporting long-term simulation, we conducted an additional deep ground temperature simulation from 1980 to 2017 for a single terrain type.The soil profile increased to 60 layers with a total depth of 50 m via expending the bedrock layer.The model was spun up by repeating the reanalysis of January  SV is short vegetation, IWP is ice-wedge polygon, SG is sedges and grass, TB is till blanket, IWT is ice-wedge trough, SH is shrubby hummock, SF is sedge fen, and TS is tall shrub.The soil water content is specified for each soil layers present in Fig. 7. DecayCoeffCanopy is the decay coefficient of the eddy diffusivity profile in the canopy, and VegSnowBurying is the coefficient of the exponential snow burying of vegetation.The different stratigraphic profiles are listed in Fig. 7.The simulation depths of 12 and 50 m were used for GST and long-term simulations, respectively.
times (500 years).To improve simulation efficiency, we simplified GEOtop simulation by assuming the vegetation and soil moisture were constant over time.This is warranted, as we aim to simply demonstrate the potential for long-term application.

Comparison of observation and simulation
To compare simulations with observations, the mean bias (BIAS) and root mean squared error (RMSE) were computed for each time series as and where T mod is the modeled temperature, T obs is the observed temperature, and N is the total number of measurements.

Observed temperature
During the 2 years measured, the MAGST (SO) at individual sites span a large range of 11.43 • C, from −8.03 (0.46) • C on an esker with low vegetation, soil moisture and snow cover (Fig. 6), to 3.40 (11.89) • C in the tall shrubs, which retain snow (Fig. 5, Table 4).Daily GST shows seasonal patterns that are similar for each terrain type, MAGST ranges within terrain types are reduced to 1.19-6.50• C, and mean surface offsets differ clearly.
The surface air temperature is best approximated by ERA5, which has the lowest daily RMSE of 1.94 • C and bias of 0.15 • C, while MERRA-2 was the worst with an RMSE of 3.65 • C and a bias of −1.62 • C (Table 5).During 2015-2017, the mean MAAT derived from the four reanalyses was −8.11 ± 1.25 • C, which is in good agreement with the observed MAAT of −8.32 • C. The ensemble mean, which had a daily RMSE of 1.87 • C, outperformed all individual reanalyses during the measured period and reduced RMSE by 0.07-1.78• C (Fig. 8).MAAT is mean annual air temperature, MAGST is mean annual ground-surface temperature, and SO is surface offset.
Figure 7. Soil profiles for the different terrain types used to partition surface temperature observations.Parameters for each of the subsurface materials are provided in Table 2.The stratigraphic unit (I-V) associated with each of the terrain types is listed in Table 3.In the long-term simulation, the layer of bedrock is extended to 50 m.
Figure 8c-l compare the daily ensemble means of simulated GST to observation means for the 10 terrain types.The performance varies significantly among terrain types and reanalyses, with the RMSE ranging from 1.09 to 3.00 • C (Table 5).Even within the same terrain class, the mean RMSE difference for the four reanalyses was 0.60 ± 0.22 • C and was up to 0.89 • C for the sedge fen terrain type.The overall RMSE for the four reanalyses was 1.96 • C for daily means and 0.92 • C for annual means.
Most (81 %) of the simulated daily GST is within the observation range, although the spread in simulated values is generally smaller than in observations (Fig. 8).The spread of model results is generally greatest sometime between Jan-uary and May of each year, with the exact timing depending on the terrain type.The variability of observations also depends on the terrain type, with the lowest variability at sedge fen sites and the greatest at till blanket sites.The ensemble mean shows good agreement with the observation mean, with RMSE ranging from 1.02 to 2.45 • C depending on the terrain type.
The GST ensemble mean usually performed better compared to the individual ensemble members based on single reanalyses, as was the case with surface air temperature.For 6 of 10 terrain types, the simulated GST ensemble mean achieved better results based on the RMSE.Moreover, the ensemble mean had the smallest RMSE for the daily GST when averaged over all sites.Consequently, the RMSE was reduced by 0.01-0.31• C for daily GST and by 0.00-1.01• C for SO as a whole (Fig. 9).Overall, simulations forced by MERRA-2 achieved the best performance for MAGST with the BIAS and RMSE of −0.03 and 0.75 • C, respectively.Compared to other individual reanalyses, the simulations forced by ERA5 often yield the second best RMSE (8/14) if not the best (2/14) (Table 5).
The long-term simulation shows warming trends for MAAT and for modeled annual mean ground temperature (MGT) at all depths.The modeled warming rate of the ensemble mean was 0.42 • C per decade for MAAT and 0.42, 0.21 and 0.13 • C per decade for the annual mean ground temperature at depths of 0.1, 10 and 20 m, respectively (Fig. 10).

Interpretation
The observed spatial variability of GST, with MAGST difference up to 11.5  method for terrain type used here is reasonable.Furthermore, it shows that conceptualizing and simulating GST via terrain types can be expected to explain a significant part of the variation observed.
The variables in GlobSim output are similar between reanalyses at one co-located point, indicating that major blunders are unlikely to exist in the GlobSim code.The data approximate surface air temperature near Lac de Gras well and GST simulation results from GEOtop agree well with observations both in terms of summary statistics (Table 5) and the reproduction of the underlying seasonal ground thermal regime (Fig. 8).While quality requirements and simulation quality will differ with application and geographic area, our demonstrator shows the suitability of reanalyses and Glob-Sim for driving process-based numerical simulation at a site scale.It shows that the combination of driving meteorological data, site conditions and a suitable model is able to reproduce spatial differences as well as the temporal evolution of ground temperature.The availability of multi-decadal reanalysis time series opens the door for better investigating transient processes and phenomena at and below the land surface.Ground temperature is one of many possible variables to investigate and in this study was chosen as an exemplar because it reflects the motivation and expertise of the authors.
The greatest model spread occurs between January and May when the thickness of the insulating snowpack accumulates differences in meteorological variables over the winter period and plays an important role in controlling ground temperature.This increase in model spread is also conspicuously absent in the esker terrain type, for which the effect of snow is minimal.The low variability of observed GST for the sedge fen terrain type may be due in part to the small number of measurement sites in this terrain type (n = 4).However, the tall shrubs class has the same number of measurements and exhibits variability similar to other terrain types with a higher number of measurements (e.g., shrubby hummocks, n = 20).This suggests that the observed spatial variation of GST in each terrain type is a good first-order indicator of its full range of variability.
The relatively small range of GST in the model ensemble when compared to the observations is likely due to the small number of simulations, in which only differing reanalyses are used but not perturbed physics or differing land-surface models.The large difference of RMSE for simulated groundsurface temperature based on the four forcing reanalyses indicates the potential uncertainty caused by forcing datasets.Ensemble means often achieve better performance compared to the individual ensemble members, indicating the potential of this approach to improve simulation results.Both findings underscore the value of using more than one reanalysis for driving simulation studies.
MERRA-2 underestimated surface air temperature near Lac de Gras, while the simulation overestimated the ground temperature.As a result, the BIAS of MAGST forced by MERRA-2 is very close to 0 and the RMSE is smallest due to opposing biases canceling out.This is also demonstrated by the highest RMSE of surface offset for MERRA-2.

Observed and simulated temperature
Our results highlighted the magnitude of spatial heterogeneity in ground temperature (see also Smith, 1975;Gubler et al., 2011;Schmid et al., 2012;Morse et al., 2012;Gisnås et al., 2014)   The ensemble ranges of modeled and observed GST (Fig. 8) reflect two distinct sources of variability.The former stems from differences in the forcing data, and the latter is due to terrain characteristics.However, both ranges inform how well a simulation would represent a particular type of location within the study area and while a direct comparison of the two ranges may not be valid, the observed variability helps contextualize the uncertainty introduced by the forcing data relative to changes in GST over as little as a few me-ters.From this study, we see that the relative importance of reanalysis uncertainty varies seasonally and spatially.
While we do not aim to evaluate the quality of reanalyses themselves, the calculated air temperature bias indicates how accurate reanalyses and GlobSim are for our study area.It is therefore to contextualize these results with those of studies dedicated to a formal evaluation of the reanalyses.For example, in Greenland, Reeves Eyre and Zeng (2017) found average monthly SAT biases averaged over all stations of 0.  found 6-hourly ERAI bias values to mostly fall between −1.5 and 4.5 • C but were as large as 7.5 • C (Decker et al., 2012).The SAT bias calculated at our sites ranges from −1.62 • C (MERRA-2) to 1.18 • C (ERAI) (Table 5).The numbers of this and previous studies remind us that the application of reanalyses for simulating surface phenomena is bound to be imperfect and that the success of application in one area cannot be transferred to other areas uncritically.

Advantages and limitations of GlobSim
Previous work has investigated parameter uncertainty with ensemble simulation (Harp et al., 2016;Gubler et al., 2013) but not investigated the effect of uncertainty in the forcing data.Other studies have used multiple forcing datasets to drive ensembles at the grid scale.Jafarov et al. (2012) used a five-member GCM composite product for driving a permafrost model on a 2 km × 2 km grid using mean monthly air temperature and precipitation.Here, the coarse tempo-ral resolution and the GCM-specific realization of weather events preclude some types of detailed investigation with observations.Guo et al. (2017) used three different reanalyses to evaluate the effect of forcing data on permafrost model uncertainty on a 0.5 • × 0.5 • grid, excluding fine-scale variation.For simulation at finer scales, dynamic and/or statistical downscaling of GCM and regional climate model (RCM) outputs has been used (Salzmann et al., 2007;Marmy et al., 2016).The downscaling and debiasing, however, are often limited to areas with detailed observations.
Although the demonstrator presented here is relatively simple, it addresses a critical gap in permafrost research and likely for other modeling communities as well.It provides a basis upon which to implement improved downscaling methods and to work towards debiasing GCM results with reanalyses (Cannon, 2016) at the point scale.Specifically, GlobSim helps to (1) fill gaps in meteorological observations caused by instrument failure, (2) evaluate models at locations where observations to compare with model results (ground temperature in this demonstrator) are available, but meteorological observations to drive models are lacking and (3) predict environmental phenomena that are driven by atmospheric conditions (permafrost in this demonstrator) at locations for which no observational data exist.Such predictions could also support the analysis and interpretation of field manipulation experiments or long-term monitoring data.In keeping with the permafrost example, these experiments could investigate the effects of vegetation change or different snow management practices (O'Neill and Burn, 2017).In remote locations, long-term meteorological observations are sparse, which limits the application of models that require detailed inputs.The recent publication of two such datasets illustrates the importance -and the general lack -of complete records which can be used to both force and evaluate models (Boike et al., 2019(Boike et al., , 2018)).While the methods contained in GlobSim are simple, it nevertheless provides an important simplification of the application of reanalysis data toward simulation studies outside atmospheric science.As GlobSim outputs use the CF conventions (Hassell et al., 2017), it contributes to the ease of using multiple data sources.
Our results have demonstrated the performance of Glob-Sim for site-level simulation in one particular field area with gentle topography.Simulation accuracy for other areas and application will differ, especially in mountains and near coasts where topography and other heterogeneity presents additional challenges for downscaling.Additional scaling rules such as TopoScale (Fiddes and Gruber, 2014), RED-CAPP (Cao et al., 2017) or those implemented in MSDH (Sen Gupta and Tarboton, 2016) may be added in the future, making GlobSim more suitable in mountains.

ERA5 ensemble
ERA5 provides uncertainty estimates for all parameters at 3 h intervals and at a horizontal resolution of 62 km.This is achieved using an ensemble of 10 members that differ in their assimilated observations, initial conditions and model structure.In other words, ERA5 itself is an ensemble simulation.In GlobSim, only the ensemble mean of ERA5 is currently used.In a next step, efforts will be devoted to compare ensembles of ERA5 and fully incorporate them into GlobSim.

Conclusions
We describe and test the software GlobSim, which has been designed to support ensemble simulations at a the site level.Specially, GlobSim is designed to easily retrieve, interpolate and scale reanalyses in order to produce time series of meteorological variables with common structure, temporal resolution and units.It currently supports four reanalyses: ERAI, ERA5, JRA-55 and MERRA-2.We demonstrate the utility of GlobSim by driving a model of ground-surface temperature in a tundra environment with permafrost and comparing its output to observations.Our results support three conclusions: 1. GlobSim improves the usability of reanalyses for landsurface simulations by deriving time series of climate variables in uniform format from multiple reanalyses for point locations.
2. GlobSim enables efficient ensemble simulations at single or multiple points.
3. Compared to simulations forced by individual reanalyses, ensemble means often yielded better performance with reduced RMSE in addition to providing information about predictive uncertainty.

Figure 1 .
Figure 1.GlobSim schematic: reanalysis data are processed in three main steps based on user-specified parameter files.The color of each box identifies user input and output (light blue), GlobSim processing (orange) and raw reanalysis data (grey).

Figure 2 .
Figure 2. The representation of surface analysis (SA), surface forecast (SF) and pressure-level (PL) data in the four reanalyses supported by GlobSim.The circles and triangles represent instantaneous variables which are reported at specific temporal resolutions.The lines represent accumulated variables in the forecasts which are reported as the integral of a flux over the time interval spanned by each arrow.Because the surface forecasts in ERAI are accumulated from the beginning of the respective forecast cycle (the solid blue lines) rather than from the end of the previous step, they must be disaggregated (the dashed blue lines) using Eq.(2) to prevent overlap in the integration periods.
B.Cao et al.:  GlobSim 1.0 -meteorological time series for point locations from multiple global reanalyses

Figure 3 .
Figure 3. Intercomparisons of monthly variables derived from GlobSim which are used in GEOtop during the period September 2015-August 2017.SW d and LW d is the surface downwelling shortwave flux in air and downwelling longwave flux in air, respectively.

Figure 4 .
Figure 4. Study area and approximate location of measurement sites.Many point locations overlap at the scale of the map, so the positions of their markers have been dispersed to better illustrate the number and type of sites.The centroid of all measurement sites (red star) was used as the location for which GlobSim reanalysis data were derived.The top right inset map shows the location of the study area (red square) within the Northwest Territories (blue region) of Canada.The elevation tinted hillshade basemap is provided by the Natural Resources Canada Federal Geospatial Platform Elevation Data Web Mapping Service and is derived from the Canadian Digital Elevation Model (CDEM).Water-body outlines were obtained from the ArcGIS online data library.

Figure 5 .
Figure 5.The 10 terrain types used to partition the landscape and observations.

Figure 6 .
Figure 6.Simulated ensemble mean daily snow depth using different snow correction factors (SCFs) to reproduce the snow meltout date for terrain types during the period September 2015-August 2017.Max is the maximum snow depth during the measured period in meters.

Figure 8 .
Figure 8.Comparison of ensemble (ENS) simulation results with observations (OBS) for (a) daily near-surface air temperature (SAT); (b) daily ground-surface temperature (GST) for all the sites and; (c-l) daily GST for individual terrain classes.The date range for all figures is September 2015 to September 2017.
Figure 9.Comparison of ensemble results and observations for (a) mean annual ground-surface temperature (MAGST) and (b) surface offset (SO) for the 10 terrain types.

Figure 10 .
Figure 10.Changes of (a) mean annual air temperature (MAAT) derived from reanalyses, modeled annual mean ground temperature (MGT) at the depth of (b) 0.1 m, (c) 10 m and (d) 20 m for shrubby hummocks (SH) terrain type of between 1980 and 2017.The numbers in the lower right of each figure correspond to the magnitude of the warming trend for each reanalysis and their ensemble mean with the unit of • C per decade.ENS corresponds to ensemble mean.The trend lines were calculated using a linear regression model, and all the slopes were statistically significant with p < 0.05.

Table 1 .
Characteristics of the four reanalyses currently supported by GlobSim.

Table 2 .
GEOtop soil parameters used to simulate different stratigraphic units.

Table 3 .
GEOtop parameters for different terrain types.

Table 4 .
Summary of temperatures in the 10 terrain types.
• C over 30 km, highlights the need for spatially differentiated simulation in order to represent the different thermal regimes among locations or terrain types as well as their different transient responses to climate forcing.The reduced standard deviation and range of observed MAGST within terrain types indicates that the classification www.geosci-model-dev.net/12/4661/2019/Geosci.Model Dev., 12, 4661-4679, 2019

Table 5 .
Deviance of simulations from the mean of observations within each terrain type.The smallest BIAS and RMSE values for each row are indicated in bold font.