Parameter calibration and stomatal conductance formulation comparison for boreal forests with adaptive population importance sampler in the land surface model JSBACH

We calibrated the JSBACH model with six different stomatal conductance formulations using measurements from 10 FLUXNET coniferous evergreen sites in the Boreal zone. The parameter posterior distributions were generated by the adaptive population importance sampler (APIS), then the optimal values were estimated by a simple stochastic optimisation algorithm. The model was constrained with in-situ observations of evapotranspiration (ET) and gross primary production (GPP). We identified the key parameters in the calibration process. These parameters control the soil moisture stress function and the 5 overall rate of carbon fixation. The JSBACH model was also modified to use a delayed effect of temperature for photosynthetic activity in spring. This modification enabled the model to correctly reproduce the springtime increase in GPP for all conifer sites used in this study. Overall, the calibration and model modifications improved the coefficient of determination and the model bias for GPP with all stomatal conductance formulations. However, only the coefficient of determination was clearly improved for ET. The opti10 misation resulted in best performance by the Bethy, Ball-Berry and the Friend and Kiang stomatal conductance models. We also optimised the model during a drought event in a Finnish Scots pine forest site. This optimisation improved the model behaviour, but resulted in significant changes to the parameter values except for the unified stomatal optimisation model (USO). Interestingly, the USO model demonstrated the best performance during this event.


Introduction
Plants exchange carbon dioxide (CO 2 ) and water vapour (H 2 O) with the atmosphere. Sufficient soil water, irradiance and adequate temperature are required to maintain the exchange rates during the growing season. Disturbances in these conditions such as drought, cold temperature or low radiation cause the plants to respond to the environmental stress via stomatal closure and the decrease in photosynthesis and transpiration (Lagergren and Lindroth, 2002;Mäkelä et al., 2004;Gao et al., 2017). 5 The capability of plants to recover from such events depends on species and their adaptation to site conditions (Kozlowski and Pallardy, 2002). Stress is part of the normal annual cycle of the plants, but occasionally it may exceed the limits of recovery.
Soil water deficit and high water vapour pressure deficit can result in suppressed plant transpiration (Bréda et al., 1993;Kropp et al., 2017). Globally, soil drought has been recognised as one of the main limiting factors for plant photosynthesis (Nemani et al., 2003) and boreal forests are known to occasionally suffer from soil drought (Muukkonen et al., 2015;Gao et al., 10 2016). The recovery of photosynthetic capacity in spring has been connected to temperature history, and to frequency of severe night frosts (Bergh et al., 1998;Bergh and Linder, 1999) that can reverse the recovery. Understanding, and correctly modelling, these phenomena are especially important for boreal forests (Bonan, 2008) under changing environmental conditions. Ecosystem and land surface models, describing the plant photosynthesis, transpiration and soil hydrology related processes, usually include descriptions and parameterisations for various stress effects. These parameters often lack a theoretical foun- 15 dation (Gao et al., 2002;Medlyn et al., 2011) and descriptions of vegetation drought response and phenology have been recognized to need better formulations and design (Richardson et al., 2012;Powell et al., 2013;Xu et al., 2013;Medlyn et al., 2016). These deficiencies restrict a model's predictive capability under changing environmental conditions, and call for specific parameterisations for different plant types and vegetation zones.
Stomatal conductance models describe the pathway of CO 2 and water through the leaf stomata by an electric circuit analogy 20 (Nobel, 1999). The variations in stomatal opening and mesophyll structure are interpreted as resistances to water flow and the process is idealised via generalised parameterisation. Stomatal conductance models mainly differ in their choice of variables driving the stomatal closure, and their performance has been recently assessed in modelling studies by e.g. Egea et al. (2011); Knauer et al. (2015); Franks et al. (2018). However, it can be hypothesised that the choice of the stomatal conductance model affects the ecosystem model parameters more broadly as the stomatal conductance formulations vary in their responses to the 25 different conditions. A holistic assessment of the performance of the stomatal conductance models together with ecosystem model parameter optimisation has been missing.
In many other studies, where the aim has been to optimise land surface model parameters, the optimisation is based on estimating the gradient of the cost function: Knorr et al. (2010) for JSBACH, Kuppel et al. (2012); Peylin et al. (2016) for ORCHIDEE and Raoult and Luke (2016) for JULES. Gradient-based methods are faster than Markov chain Monte Carlo 30 (MCMC) methods as they strongly steer the sampling process to reach a minimum in the cost function (see e.g. Gelman et al., 2013). This approach also enables a more indefinite setting of parameter ranges (limits for acceptable parameter values) when compared to methods that sample the full parameter space. However, they are prone to get stuck in local minima, especially when the dimensionality of the parameter space increases. In the last few years, similar parameter estimations have also been done for CLM by Post et al. (2017) using the DREAM (zs) (MCMC) algorithm with multiple chains, and for JULES by Iwema et al. (2017) with the BORG algorithm that employs multiple optimisation algorithms simultaneously. The DREAM algorithm is fully iterative, which limits the number of parallel processes to the number of parallel chains in use (when we do not account for the possibility of the model parallelisation that can be substantial). The applicability of the BORG algorithm is dependent on the algorithms in use and the expertise of the user (to choose the right algorithms etc.).

5
APIS is a Monte Carlo (MC) method that can be run iteratively as presented by Martino et al. (2015) but it is also straightforward to parallelise, since all samples prior to each adaptation (in our simulations 2000 draws) can be drawn and estimated simultaneously. This latter feature is useful to decrease the amount of real time required to run the algorithm when computer resources are not the limiting factor -APIS requires considerably fewer sequential estimates than typical Markov chain methods. In the iterative mode, automatic stopping rules can be easily implemented to indicate when additional samples are not 10 required to improve the estimates. The APIS algorithm samples the full parameter space (as do MCMC methods) and can utilise a mixture of parameter prior distributions. Therefore, APIS can estimate complicated multidimensional probability distributions with relative ease. These aspects make APIS an attractive alternative to the other sampling and optimisation methods mentioned above.
In this study we apply the land surface model JSBACH for 10 boreal coniferous evergreen forest eddy covariance sites 15 to examine the performance of different stomatal conductance models, and their effect on calibrated parameters related to photosynthesis, phenology and hydrology. First, we utilise APIS to sample the full parameter space with the different stomatal conductance formulations and to locate different modes of the target distributions (peaks of high probability). Second, using the distributions generated by APIS as the prior distributions, we optimise the parameters using a simple stochastic optimisation method. Finally, we assess the inter-site variability and the robustness of the calibrated parameters together with different 20 stomatal conductance formulations. Optimised parameters for a specific drought are also investigated and compared with the parameters for the general optimisation.

Materials and methods
We will next introduce the measurement sites, followed by the model and modifications made to it. Afterwards we will give a general overview of the simulations as well as the sampling process, the algorithms and methods used to analyse the results. 25

Sites and measurements
We use data from 10 FLUXNET (doi:10.17616/R36K9X) sites characterised as coniferous evergreen forests. Site descriptions with appropriate references are provided in Table 1. The site-level half-hourly eddy-covariance (EC) measurements were quality checked and gap-filled when needed to produce continuous half-hourly and daily time series. The gap-filled and lowquality (based on FLUXNET data quality flags) measurements were masked, and the daily aggregates (usually means) were 30 accepted as part of the calibration process if at least 60% of the values between 4:00 and 20:00 (i.e. daytime measurements) for that day were unmasked. The daily aggregates of ET and GPP were used to calibrate and validate the model, whereas the half-hourly data were used as climate forcing (as explained later in Section 2.4. Based on the quality and quantity of their respective measurements, the sites were divided into calibration and validation sites. Essentially, if we have enough data from a site, it is used for both calibration and validation purposes. We required the site to have at least eight years of measurements, where the first five were used for calibration, and the consecutive three for 5 validation. Otherwise we used the site only for a three year validation. The FLUXNET datasets were missing both the longand shortwave radiation for the two Russian sites, Fyodorkovskoye (RU-Fyo) and Zotino (RU-Zot). These were generated from ERA Interim data. The soil types of all of these sites can mostly be identified as mineral soils with varying sand, clay and peat contents. Fyodorovskoye and Poker Flat (US-Prr) are natural peatlands and Lettosuo (FI-Let) is a drained peatland site.
The measurement error in the EC flux data were separated into systematic and random errors. The main systematic errors 10 (density fluctuations, high-frequency losses, calibration issues) were taken into account as part of the post-processing of the data, and the random errors tend to dominate the uncertainty of the instantaneous fluxes. The random error is often assumed Gaussian but can be more accurately approximated by a symmetric exponential distribution (Richardson et al., 2006). It increases linearly with the magnitude of the flux, with a standard deviation typically less than 20% of the flux (Richardson et al., 2008;Rannik et al., 2016). Our treatment of the measurement (and model) errors is explained in Section 2.9.
15 Table 1. Descriptions for the sites used in this study sorted by their FLUXNET identifier. The first six sites are used for both calibration and validation purposes, with the first five years of each site used for calibration. The last three years as well as the last four sites are used for validation only. The reported elevation is in meters above sea level, LAI is the one-sided leaf area index and the average stand age is in years, along with average annual precipitation (P) in mm and temperature (T) in degrees Celsius. . We ran JSBACH offline using meteorological measurements from the flux towers to force the model. Implications of this one-way coupling with the atmosphere include lack of feedback from the surface energy balance to the atmosphere, i.e. latent and sensible heat fluxes and surface thermal radiation do not directly affect prescribed air temperature or humidity. Similarly, the feedback of the surface to the vertical transfer coefficients within the atmospheric surface layer is missing as the wind speed that drives mixing is prescribed. Furthermore, since we use site level data (each site is represented as a single grid point), the grid resolution does not affect the results. 5 We focus only on the most essential parts of JSBACH relating to our work. A more complete model description with details on e.g. soil heat transfer, water balance and coupling to the atmosphere can be found in Roeckner et al. (2003), whereas Raddatz et al. (2007) provides a more descriptive synopsis on land-surface interactions, Reick et al. (2013) complements both with an addition of land cover change processes, and Hagemann and Stacke (2015) introduces soil hydrological mechanisms within a multilayer scheme applying five layers.

10
In JSBACH, the land surface is divided into grid-cells, which are split into bare soil and vegetative areas. The vegetative area is further divided into tiles representing the most prevalent vegetation classes, called plant functional types (PFTs) ). In our site-level simulations, the model was set to use only one PFT, coniferous evergreen trees. The seasonal development of leaf area index (LAI) for the trees is regulated by air temperature and soil moisture with a single limiting value (for all sites) for the maximum of LAI. This maximum value was fixed and the site-specific fractions of vegetative area were 15 adjusted to reproduce the measured site level LAI.
The predictions of phenology are produced by the Logistic Growth Phenology (LoGro-P) sub-model in JSBACH (Böttcher et al., 2016). Photosynthesis is described by the biochemical photosynthesis model (Farquhar et al., 1980). Following Kattge et al. (2009), we set the maximum electron transport rate (J max ) at 25 degrees Celsius to 1.9 times the maximum carboxylation rate (V C,max ), which is in line with e.g. Leuning (2002); Ueyama et al. (2016). The photosynthetic rate is dependent on the used 20 stomatal conductance formulation, introduced in Section 2.3. Radiation absorption is estimated by a two stream approximation within a three-layer canopy (Sellers, 1985). Especially in sparse canopies, radiation absorption is affected by clumping of the leaves which is here taken into account according to the formulation by Knorr (1997).
Parameters detailing site-specific soil properties, such as soil porosity and field capacity, were derived from FLUXNET datasets and the references in Table 1. We approximated the soil composition and generated these properties following Hage-25 mann and Stacke (2015).

Modifications to the JSBACH model
All parameters of interest, presented in Table 2, were extracted from the JSBACH model code to an external file to facilitate the simulations. The default values of newly added parameters (not originally in JSBACH: τ , q, g 0 , g 1 ) were derived from a synthesis of literature values. Most of the parameter ranges (limiting values for the parameters) were adapted from our previous 30 work on a similar topic (Mäkelä et al., 2016). The parameter grouping was done to enhance optimisation and the mechanism is explained in Section 2.7. Group I consists of parameters most directly affecting photosynthesis, group II parameters are intimately involved with soil moisture, and group III are the logistic growth phenology (LoGro-P) model parameters. The equations governed by these parameters are presented in Appendix A. The start of the growing season in the JSBACH model is defined by a "spring event" in the LoGro phenology model (appendix A3) that induces leaf growth. The phenology model calculates a sum of ambient temperature (heatsum) since last autumn that is above the cutoff value T alt , presented in Eq. (A10). It also calculates a variable threshold, defined in (A12), for the heatsum to reach. The threshold decreases based on the number of days the ambient temperature is below T alt , whereas the heatsum increases. When the heatsum reaches the threshold, the plant leaves are free to grow.

5
However, coniferous evergreen trees do not shed all of their leaves for winter and the existing foliage enables them to quickly initiate photosynthesis in the following spring. The start of the photosynthetically active season in the model has been observed to occur too early in the Boreal region by e.g Böttcher et al. (2016). In order to correct this behaviour i.e. to restrain the respiration and photosynthesis of conifers in the early spring, we utilise a delayed effect of temperature for photosynthetic activity, introduced by Mäkelä et al. (2004). To calculate the reduction, we must first define the state of photosynthetic acclimation that 5 Mäkelä et al. (2004, p.371) present as: "an aggregated measure of the state of those physiological processes of the leaves that determine the current photosynthetic capacity at any moment".
The state of acclimation (S) is calculated from air temperature (T ) with a delay prescribed by parameter τ (this is similar to the calculation of T S in appendix A14). S is then inserted into sigmoidal relation Eq. (1) to calculate a factor γ, a formulation that is adapted here from Kolari et al. (2007). Finally, γ is used to reduce the photosynthetic efficiency in Eq. (A1). T 1/2 denotes 10 the inflection point where γ reaches half of γ max , k is the curvature of the function and γ = 1 when S ≥ 10.
The JSBACH model was also modified to include altogether six different stomatal conductance formulations following Knauer et al. (2015). These formulations include the pre-existing Baseline and Bethy versions as well as the Ball-Berry model and three of its variants. Model information is gathered in Table 3 for easy referencing and the detailed formulations are given 15 in appendix B. The limits of the slope of the stomatal conductance formulation parameter (g 1 ) were set to reflect commonly observed values from physiological measurements (Egea et al., 2011). The limits of g U SO  We have also included two additional parameters (a and d in Table 2) for the Friend and Kiang (Friend and Kiang, 2005) stomatal conductance formulation in B3. These parameters were not originally included in the optimisation, but the resulting 20 cost function (9) values were poor when compared to the other formulations. At that point, these parameters were included in the optimisation process. This increases the degrees of freedom for the Friend and Kiang model by two and therefore may give it an advantage when compared to the other Ball-Berry type formulations, which has to be considered in the interpretation of the results.
All of the stomatal conductance models contain an empirical water stress factor β, which reduces stomatal conductance as a function of volumetric soil water content (θ).
In JSBACH, the stomatal conductance (g s ) is primarily resolved to estimate carbon fixation. The same g s is then later used to calculate transpiration (A8). In the original JSBACH formulation (i.e. the Baseline version), the g s is first resolved for 5 unstressed canopy and then scaled by the water stress factor β. The Bethy approach is similar, but the conductance can also be limited by water supply (B2). In cases when the water supply is not the limiting factor, the calculations are similar to the Baseline version. In all of the empirical Ball-Berry variants, the stomatal conductance can be written as g s = g 0 + cβg 1 . The residual conductance (g 0 ) and the slope of the function (g 1 ) are both formulation specific parameters as well as the factor c, that incorporates net photosynthesis and effects of atmospheric humidity and CO 2 concentration. The parameters g 0 and g 1 are 10 part of our sampling and optimisation processes (group I in Table 2 when applicable).
The water stress factor (β) limits the carbon fixation and transpiration via the stomatal conductance formulation. Following Egea et al. (2011), it is also used to directly limit the net assimilation rate (A n ), as seen in (A1). The additional scaling (or limiting) factor for A n takes the form β q , so it is a function of both soil water content θ and the parameter q. Maximal reduction is achieved when q = 1 and the reduction factor reverts to β. The minimal reduction occurs when q = 0 and the reduction factor 15 resembles a step function (at θ = θ pwp ). For any other value of q, it is a continuous convex function between the two extremes

Model simulations
The site level measurements, used as model inputs, are air temperature, air pressure, precipitation, humidity, wind speed and CO 2 concentration as well as short-and longwave and potential shortwave radiation. Additionally, evapotranspiration (ET) and 20 gross primary production (GPP), derived from the eddy covariance (EC) measurements, are used to constrain and evaluate the model (as explained later in Sections 2.8 and 2.9). We drive the model with half-hourly data but output daily values.
The initial state of the JSBACH model can be generated from predefined values of state variables (usually empty initial storage pools) or the model can be restarted from a file describing the state of some previous run. Depending on the area of interest, a model spin-up may be required to bring the model into a steady state. In our simulations, some of the more slowly 25 changing variables (e.g. soil water content and LAI) need to be equilibrated, so a spin-up is required. This can be achieved by running the model over a set of measurements multiple times, each time restarting from the final state of the previous run.
The calibration period consists of the first five years given for the calibration sites in Table 1. The spin-up is achieved by looping over these five years, altogether four times (20-year spin-up) and then saving the state of the model at the end of the run. The actual calibration is started from the beginning of the calibration period, using the previously saved state variables. To 30 reduce any bias this induces, the first year in the calibration run is removed from the cost function calculations. The spin-ups for the validation sites in Table 1 are similarly generated.
During the summer 2006, the Hyytiälä (FI-Hyy) measurement site suffered from a severe drought (Gao et al., 2017), leading to visible discolouration of needles. These events are difficult for models to capture and hence are of interest to modellers.
We have previously and unsuccesfully attempted to optimise the JSBACH model (Mäkelä et al., 2016)

15
Using Bayes' rule on conditional probability we can write the parameter posterior density (p(θ θ θ, M|x)) as a function of the likelihood (L(x|θ θ θ, M)), parameter prior distributions (π(θ θ θ)) and the model evidence (Z(x|M)). As usual and from here on, we do not write M in the Bayes' formula: We can now utilise the posterior density as a probability density for the parameters and infer the expectation values: Above θ θ θ i is the i-th element of the parameter vector. Generally, Eq. (4) cannot be analytically solved, hence it is usually estimated numerically. Commonly this is achieved by one of the many Markov chain Monte Carlo (MCMC) methods, but in this study we apply the adaptive population importance sampler (APIS) defined by Martino et al. (2015). APIS (Martino et al., 2015) is a Monte Carlo (MC) method that utilises a population of importance samplers (IS) to jointly estimate the target pdf 25 (p(θ θ θ|x)) and the normalising constant (Z(x)) by a deterministic mixture approach (Veach and Guibas, 1995;Owen and Yi, 2000), whereas the MCMC methods do not care about the value of Z. We denote the importance sampling density as q(θ θ θ).
Above r is the reweighing factor that is the driving force in importance sampling. We will next give a summary description of the sampling process with comparison to a general multichain MCMC approach (since MCMC methods are more commonly 30 used in these types of situations).
1. The initialisation of a multichain MCMC sampler and APIS are very similar. In our simulations, APIS is set up as 40 simultaneous and independent importance samplers. This is similar to an independent 40-chain MCMC sampler. Each sampler or chain has a random starting location drawn from a uniform distribution defined by the parameter ranges, given in Table 2. The initial sampling (or prior) distribution for each sampler is also randomly generated -we use truncated Gaussian distributions with diagonal covariance matrices, where the standard deviations are randomised. The sampling 5 distributions will evolve throughout the process.
2. In an MCMC setup, the model would be run once (for each chain), evaluated and then the draw (parameter values) accepted or rejected accordingly. In APIS, instead of a single element (one run) we use a sample size of 50. This means that we draw 50 elements with each IS sampler (or "chain") independently. These draws are then evaluated and reweighted as presented in Eq. (5). 3. The 50 reweighted draws (for each IS sampler separtely) are used to calculate a new location for the sampling distribution. This location is automatically accepted (no rejection criteria) and we also adapt the shape of the distribution using the self-normalising AMIS estimator by Cornuet et al. (2012).
4. Additionally, all of the draws in APIS are used to calculate "global" estimates of the parameter expected values. This process utilises the deterministic mixture approach (Veach and Guibas, 1995;Owen and Yi, 2000) and is fully iterative 15 with no need for any recalculations as the previous estimates are directly adjusted (no information is lost either).
MCMC chains track the evolution of single elements, and occasionally adjust the sampling distribution. The sample size in APIS is larger (it is not a Markov chain method) and the focus is on the evolution of the locations of the sampling distributions, not on the individually drawn elements. These location parameters are expected to be around all the modes of the target and the deterministic mixture ensures the stability of the estimation of the (global) parameter expected values. As an importance 20 sampler, APIS is also a variance reducing method.
Before taking a more detailed look at APIS, we make some further notes about the sampling process. The first element of the 50 draws (item 2 in the list above) is always fixed as the current mean. We run the spin-up (Section 2.4) and generate the model starting state only for the proposal means, and use the same state for the other 49 draws (perturbed around the proposal mean). This requirement stems from a need to reduce computational time as running the model to a steady state is costly. This 25 approach might induce some discrepancies, but they are mitigated by removing the first year of the calibration simulations (as explained in Section 2.4). We also slightly reduce the importance weights of the 49 samples (more reduction for samples further from the proposal mean), when calculating the new location parameters (item 3 in the list above) -the reduction only (slightly) slows the adaptation of the IS sampler locations. Finally, we note that this approach ensures that we run the proposal means, that are the focus in APIS, with the correct spin-up.

Adaptive population importance sampler
Normally, only the location parameters of the IS proposals are adapted, but we also adapt the shape parameters using the selfnormalising AMIS estimators by Cornuet et al. (2012). APIS is able to utilise different or a mixture of normalised proposals densities, but we use truncated Gaussian proposals with diagonal covariance matrices.
In our simulations, APIS is formed of 40 independent IS estimators. Each estimator draws a sample θ θ θ i , i ∈ {1, ..., N }, of 5 size N = 50 at a time from their own proposal distribution q j (θ θ θ), j ∈ {1, ..., M }, M = 40. The estimator then calculates the importance weights (w ij = p(θ θ θi|x) qj (θ θ θi) ) for each sample. The location (µ µ µ j ) and shape (C j ) parameters (Cornuet et al., 2012) of each proposal are updated using only samples (and weights) drawn from q j . The new shape parameters are formed as a mean of the previous estimate and C j , as calculated below.
The simple IS estimators alone are rarely sufficient if the target is even slightly complicated. One classical way of tackling this problem is to join multiple IS estimators together. The simplest approach is to calculate the weights for each of these estimators separately and to normalise the result by the combined sum of all weights. However, this leaves the estimators susceptible to "bad" proposals. APIS suppresses the bad proposals by utilising the deterministic mixture approach (Veach and Guibas, 1995;Owen and Yi, 2000) presented in Eq. (7), where each proposal q j is evaluated at all the drawn samples and 15 weighed by the amount of samples drawn (N j = 50) from that proposal. This is equivalent to joining the normalised proposal densities together and evaluating the joint pdf.
The parameter expectation values and the normalising constant in Eq. (5) can now be estimated by Monte Carlo integration using weights calculated in Eq. (7). 20

Parameter optimisation
The APIS algorithm is a rather robust method meant for examining the full target probability distribution and locating the modes of the target distribution. Adaptation in APIS utilises multiple draws simultaneously, which can easily lead to few parameters controlling this process (the marginal density of one or few parameters dominates the calculations). Since we also did not run the model spin-up for all drawn samples (although the discrepancies should be minimal), we utilise a simple custom 25 stochastic optimiser to locate the optimal set of parameter values. This optimiser is run after the APIS calibration simulations and separately for the drought period. The optimiser utilises the exact same datasets (calibration, validation, observations etc.) as APIS, the spin-up is generated for all drawn samples separately and the initial state of the algorith is the mean value of the APIS final configuration (location parameters).
Our optimiser is a simple random sampler amplified by the "velocity" of the last jump (the idea is similar to Hamiltonian 30 or Hybrid Monte Carlo by Duane et al. (1987)). We draw a set of samples from a small Gaussian proposal distribution in the vicinity of the current best estimate and calculate the cost function for the samples. Whenever a better point is found (smaller cost function), we jump to that (update the mean of the proposal distribution). The "velocity" of the jump (for us merely distance of change in each parameter) is then added to the new mean (with a maximal limit of one standard deviation in the proposal distribution), but it is reduced and eventually removed if a better sample is not found.
The covariance matrix of the proposal distribution is recalculated at predefined intervals (for all parameters). Additionally, 5 we utilise a subset sampling procedure, where the samples are first drawn from the full parameter space, in the next step they are drawn only from group I in Table 2 (the rest are kept at their current optimal values), followed by groups II and III and then back to the full parameter space. When the number of parameters is reduced, we are more likely to find a better set of parameter values. We have kept the parameters mostly affecting the same processes in the same group, but some dependencies may not be apparent and hence it is also important to draw samples from the full parameter space.

Simulation analysis
Even though APIS is not a Markov chain method, we can (naively) interpret the evolution of the location paramaters of each IS sampler as chains. The resulting 40 chains have random starting positions but they are relatively short (we present results from the Bethy calibration, where the chains were adjusted 100 times), hence we did not discard any of the samples. We test the convergence of these chains with the Gelman-Rubin diagnostic tests (Gelman and Rubin, 1992), comparing the variance 15 between the chains to the variance within each chain, and calculating the potential scale reduction factors (R). We also test the stability of the (parameter) global expected value estimate (using the deterministic mixture approach) by calculating the difference of the final global expected value and the mean of the location parameters (at each iteration). We denote this test as δ and report the number of the iterations when this difference is below 5% of the parameters range, given in Table 2.
In order to visualise the results, we have utilised a Gaussian kernel density estimation (KDE) to produce distributions from 20 the APIS simulation location parameters. In practice, KDE places a Gaussian distribution centred at each sample and the constructed composite distribution is an estimate of the underlying actual distribution. The bandwidth for the distributions is calculated using the Scott's rule (Scott, 2004): the data covariance matrix is multiplied by a factor n −1 d+4 , where n is the number of data points and d is the number of dimensions.
The effectiveness of each parameter was calculated from the final state of each optimisation process. This was done by 25 first setting all parameters to their optimised values. Then we (evenly) sampled each parameter separately from their range of acceptable values, given in Table 2, and calculated the corresponding cost functions. For each parameter the maximum difference in these cost function values (and the optimised value) was recorded. The parameters (within each optimisation) were then ordered by these numbers (with highest difference meaning highest effectiveness) and separated into three groups with highest (most effective) and lowest (least effective) effectiveness values, and the rest. This effectiveness relates to how the 30 APIS "sees" the sampling process -the 50 draws are evaluated simultaneously and a very effective parameter can easily mask the influence of a less effective (the marginal density of one or few parameters dominates the calculations).
We report the slope of the regression line (b) and the coefficient of determination (r 2 ), between the observations (y i ) and the model output (x i ). The slope of the regression line is highly indicative of the model bias (difference of the expected values of the observations and the model). Hence we interpret the bias directly from b (in our results the regression lines pass near origin so the differences this induces are negligible).

Cost function
The Bayesian framework requires a likelihood function that optimally combines pointwise model and observational errors. The

5
JSBACH model error is unknown as is the (pointwise) observation error. We could use a general type of error estimate (such as that of 20% of the flux value) for the observations, but would have to include a minimal site and instrumentation dependent precision. In this study, the full error is treated as Gaussian white noise. Because of these limitations, we are calling and defining our likelihood as a cost function. It is calculated with the same parameter values for each site, using site spesific daily measurements with the gap-filled, low-quality and winter (between the 315th and the 75th day of the year) values removed 10 (resulting in N ET and N GP P points). These site level estimates are averaged to produce the actual cost function, which is then returned for the algorithm to produce an estimate that is independent of the characteristics of any single site.
The cost function (9) 20 We also use a modified version of this cost function, where the NMSE's are weighted by factors based on coefficients of determination (r 2 ) defined in Eq. (8). This latter cost function is only used during the separate drought period optimisation for Hyytiälä. During the drought we are more interested in the correct timing of the change in GPP and ET fluxes, rather than the size of the actual change. The aim is to correctly reproduce the changes in the water use efficiency (WUE) of plants, which we interpret here as the pointwise ratio of (ecosystem level) GPP to ET. The NMSE values ensure that the overall amplitude of the 25 fluxes will remain satisfactory.
3 Results First we present the performance of the APIS algorithm and the parameters themselves, followed by site and stomatal conductance model specific results, and finally an examination of the Hyytiälä drought event in 2006. For simplicity, we use the name of the stomatal conductance model to refer to the JSBACH model utilising that stomatal conductance formulation.
The evolution of the APIS algorithmic process is presented in Fig. 1 for three parameters from the calibration of the Bethy 5 model. The chosen parameters highlight different levels of identifiability for the algorithm (with the given cost function). The first parameter (f C3 ) shows a well identifiable situation, where the algorithm quickly locates the area of high probability. The second parameter (θ dr ) is also identifiable but the speed of convergence is diminished. The last example (C decay ) represents situations where the parameter is not constrained. We have included images of the APIS chains for the other parameters as supplement S1 along with parameter posterior estimates at 20 iterations with the Bethy and Ball-Berry formulations.

10
We also report the results of the Gelman-Rubin (Gelman and Rubin, 1992) and δ tests in Table 4. Both of these tests indicate that the algorithm is performing well at 20 iterations -the values ofR ≈ 1, which means that further simulations are unlikely to improve the variance estimates. However, for some parameters, the convergence of the global estimate is slow (as also seen in the supplementary image S1 for e.g. τ , c b and q). The APIS sampling process did not reveal any multimodal distributions and thus provided suitable initial conditions for the optimisation.

Optimised parameters
The results of the optimisation process are gathered in Table 5. There is an overall agreement on the values of the most prevalent parameters (see the bold and italic characters in Table 5 between the models). Most notably, the permanent wilting point (θ pwp ) and the point above which transpiration is unaffected by soil moisture stress (θ tsp ) have been significantly lowered. The LoGro phenology parameters, which affect the timing of the spring and autumn events, are expected to contribute only little to the 20 cost function. The coniferous evergeen trees do not shed all their leaves for winter and therefore the timing of the bud burst is not as critical as for e.g. deciduous trees. Additionally, because of the existing foliage, the state of acclimation parameter τ that depicts the reduction of carbon assimimilation in the early spring likely dominates the phenology parameters that determine when new leaves start to grow.  Table 7 g0 1.0E-3 --4.7E-3 4.7E-3 4.4E-3 4.2E-3 Table 7 g1 Some of the parameters have converged to their limiting values, which can reflect deficiencies in the model structure or the preset parameter ranges. Convergence to the boundary can also be a problem in model calibration, but in this experiment, the algorithms were able to cope with the situation as APIS located the area of high probability and the optimiser located the 5 maxima. The different parameter effectiveness levels reported in Table 5 can be roughly equated to the identifiability situations in Fig. 1. The effectiveness levels are highly situational (e.g. they depend on the sampling limits in Table 2 given for each parameter) and merely reflect the parameter identifiability in the APIS process. Low effectiveness complements the test results in Table 4, as the tests may indicate good performance for a parameter (e.g. for S range ) that is ineffective in the simulations.

Annual cycles
We present the average annual cycles for the validation period and for all sites in Fig. 2 using the Bethy formulation that is part of the standard model. The annual cycles generated with the other stomatal conductance models are added as supplement 5 S2. The parameters of the regression lines (b and r 2 ) between the measured and modelled ET and GPP fluxes of all the models are gathered in Table 6. These indicators have been calculated using all corresponding values regardless of the quality of the data. The sites are in the same order as in Table 1 with the six calibration sites first, followed by the four sites used only for validation. We have also included a supporting synthesis of the b and r 2 values between the model simulations with the default and optimised parameter values as supplement S3.

10
The optimisation has improved the model bias and the correlation coefficients for the GPP in Fig. 2 for nearly every site, with the exception of deteriorating bias for Poker Flat (US-Prr) and Zotino (RU-Zot). Additionally, the improvement in the timing of the springtime increase in the GPP is apparent. All of the correlation coefficients for the ET in Fig. 2 have also been improved but the model bias has mostly increased.

Drought event 15
The resulting parameter values, from the optimisation during the drought conditions in Hyytiälä (FI-Hyy) in the summer of 2006, are presented in Table 7. Setting the maximum carboxylation rate to a constant value (V C,max = 52.0) enabled the full use of the dynamical range of q -the idea was to ensure that V C,max does not dominate the optimisation, any value for q is possible and it is able to influence the outcome. The LoGro phenology parameters and τ were fixed to their optimised values, presented in Table 5, as they should not be affected by the drought. Likewise, the values of other parameters (not presented in 20 Table 7) were set as compromises between the stomatal conductance formulations.
We can now compare the parameter values in Table 7 to those in Table 5. The values of the relative humidity parameter (θ hum ) and the residual stomatal conductance (g 0 ) have remained nearly unchanged, but for the rest of the parameters have quite varied values. The leaf internal-to-external CO 2 concentration (f C3 ) as well as the slope of the stomatal conductance (g 1 ) are at the lower bound (expect g 1 for BB). Noticeably, the USO optimisation only changes the values of θ tsp and q, and leaves 25 the rest of the parameters almost untouched.
The changes these different parameterisations have on the model output are visualised in Fig. 3. All of the stomatal conductance models, with default parameterisation, suffer from too low ET values before (and during) the actual drought. This behaviour was corrected during the general optimisation, but has partially re-emerged with the dry period parameters for the Baseline, Ball-Berry, Leuning, and to a lesser degree the Friend and Kiang formulations. Most of the models also exhibit too 30 high ET values during the actual drought with the generally optimised parameter values. This behaviour was also corrected with the dry period optimisation, but the Baseline and especially the Bethy model now suffer from a too strong drawdown of ET. These models also demonstrate the too strong drawdown for the GPP. The GPP itself was greatly improved with both Table 6. Slope of the regression line (b) and the coefficient of determination (r 2 ) for the different stomatal conductance formulations during the validation period with the optimised parameters. We have written the best values of b and r 2 in boldface for each site, and italicised the abbreviations of the separate validation sites. optimisations and for all models. The dry period optimisation of the USO model also managed to correct the erroneous GPP of the general optimisation during the actual drought, where as the GPP of other formulations has remained roughly the same as with the general optimisation. The USO formulation results in the best fits for r 2 and b with the dry period optimisation. The Bethy and the USO models demonstrate the most variability in the β-function values in Fig. 3 (rightmost panels), for the dry period optimisation. We selected these two stomatal conductance formulations to examine the changes to the water use Finally, we used both optimised parameter sets (Table 5 and 7) to produce the ET and GPP cycles for all sites and stomatal 10 conductance models. This analysis (not shown) verified that in general conditions the Table 5 parameter values produced better estimates in general. The b and r 2 values for the ET were systematically better for all stomatal conductance formulations (except one). There was some variation in the indicators for the GPP, where approximately a third of the values (of mostly r 2 ) are better with the dry period parameter set. These differences are mostly attributed to increased model bias (decreased b) that is explained by the lower values of g 1 . Overall, the more general optimisation provided systematically better or comparable 15 results to the dry period optimisation. The exception is the USO formulation, which had an approximately 1:1 distribution of best values for both variables in-between the parameter sets.

Discussion
We will first discuss the validity of our approach and the simulation setup, followed by examination of the success of the modifications made to the model, and close with some further remarks on the parameter values.

Validity of the simulations
Before we calibrated the model, we fixed the limiting value for LAI and adjusted the site-specific vegetative area fractions to reproduce the measured site level maximum of LAI. In the simulations, we focused on boreal coniferous forests, where light penetration is deep and the light conditions are homogenous -consequently we could assume a homogenous leaf distribution.
Furthermore, the JSBACH model takes into account leaf clumping and we can assume the leaf orientation and shape to be 5 similar throughout the study sites. Therefore, we argue that reproducing the site level maximum of LAI is appropriate approach in this study. Together with parameter calibration it has resulted in improved ET and GPP fluxes as can be verified from the b and r 2 values in Fig. 2. The improvements in b and r 2 are mostly seen in the GPP flux, which can be explained by the fact that the stomatal conductance in JSBACH is primarily resolved for carbon assimilation, and the same conductance is then used for transpiration (A8). Additionally, GPP is derived from the EC measurements by flux partitioning -this tends to remove some 10 of the flux instabilities (that are still present in the ET).
We encountered difficulties in reproducing the fluxes for the validation sites with low LAI (i.e RU-Zot and US-Prr). This can be a consequence of the area scaling as the adjustment linearly changes the proportions between vegetative area and bare soil.
Another reason is the lack of the site understory in these simulations. For example, approximately half of the CO 2 fluxes (and consequently roughly half of the GPP) for Poker Flat are produced by the site understory (Ikawa et al., 2015). Additionally, 15 there are also many parameters describing site-specific soil properties (such as porosity) that were not part of the optimisation and may be inaccurate. These effects may also be pronounced due to the changes in parameters affecting soil moisture as well as the area scaling.
There were no clear differences between sites dominated by pine or spruce. Neither did we notice any particular effect on the bias, NMSE or correlation coefficient that could be explained by geographical location, stand age or annual precipitation 20 or temperature. We optimised the model for individual (calibration) sites as well (not shown). Mostly this changed the values of parameters (such as V C,max and g 1 ) affecting the amplitude of the modelled fluxes. These parameters can be viewed to be more site-specific, a characteristics that is reduced in a multi-site calibration -the possibility of highly site-specific properties (and parameter values) can also explain the difficulties in reproducing the validation site observations. We are omitting these results as single-site optimisation can be viewed as overfitting the model and the results do not provide any additional insights. 25 The APIS performance tests (Gelman-Rubin and δ) indicate that the algorithm is performing well at 20 iterations but the convergence of the global estimate for some parameters is slow. This is mostly a direct result of the normalisation of the cost function that inflates the target distribution, which reduces the parameter sensitivity to observations and gives too much weight to the initial locations and draws. Without the normalisation, the algorithm would also converge faster. Additionally, APIS is meant to examine the full target distribution with only some sequantiality -20 iterations (or less) should be sufficient for APIS 30 to locate the modes of the target. In longer APIS simulations, the global estimate would likely benefit from discarding the first half of the samples but this would require the estimate to be recalculated at each iteration (from the drawn samples) as it could not be calculated iteratively.

Delayed effect of temperature
We modified the JSBACH model by introducing the delayed effect of temperature for photosynthesis to restrain the respiration and photosynthesis of conifers in spring. The effect of this (delayed increase in GPP) is apparent in the annual GPP cycles of CA-Qfo, FI-Hyy, FI-Ken, FI-Sod and RU-Zot in Fig. 2. The delay is in place for the other sites as well, but the effect is less apparent in the figure. This delay is to a lesser extent also reflected in transpiration, and consequently in ET, as can be seen e.g. came to a similar conclusion when they encounter a near-flat distribution for τ in the range of 1-12 days. In our simulations τ exhibits larger optimal values (nearly 15 days), which is most likely due to the model adapting to the multi-site calibration (as 15 sites have different characteristics, a longer acclimation period accounts better for these variations).

Stomatal conductance models
We examined the model behaviour with six stomatal conductance formulations and the resulting b and r 2 values are presented in Table 6. The best performance (bolded values) in simulated ET is achieved by the BB model for bias and the Bethy formulation for r 2 . These two models also share the best performance in the GPP bias, whereas the best r 2 values for the GPP are 20 demonstrated by the F &K model, followed by the USO formulation. Calculating the number of best values demonstrated by each model, we obtain that the best performance is shared by the Bethy (12) and F&K (12) formulations, followed by the BB (11) model. However, we note that some of the "best values" are only marginally better than comparable values. Additionally, we used two more parameters (a and d) for the F&K formulation than for the other Ball-Berry formulations. Likewise, we could have, for example, included the factor D 0 (B3) in the optimisation, which would have likely improved the performance 25 of the Leuning model. Similarly to the results by Knauer et al. (2015), based on this (general) calibration, there is no clear single candidate for the best stomatal conductance formulation.
The model behaviour was also examined during the Hyytiälä drought of 2006. Some of the parameter values were kept fixed during these simulations, most of the fixed parameters should not affect the drought period calibration but there are exceptions, such as the maximum carboxylation rate V C,max . It can be argued that e.g. both the parameters V C,max and g 1 should decrease 30 (Egea et al., 2011) during the drought but we decided to fix V C,max to get a better response for q. The best fit to the observations was achieved by the USO formulation, as seen in Fig. ??, with remarkably similar parameter values to the general optimisation.
The USO model was also able to (somewhat) replicate the "δ" shape of the drought in Fig. 4.
The stomatal conductance function (g s = g 0 +cβg 1 ) incorporates also the soil water parameters θ tsp and θ pwp in the form of the β-function as portrayed in Eq.
(2). The changes in the values of these parameters (mostly g 1 , θ tsp and θ pwp ) are intertwined.
During the drought, the decrease in the optimised values of g 1 is expected as the plants close their stomata to minimise the loss of water by transpiration (Egea et al., 2011;Zhou et al., 2013). The same effect is also achieved by increasing the values of θ tsp and θ pwp as this decreases the values of the β-function. The higher values of g 1 during the more general optimisation are 5 better reflected by Franks et al. (2018), whereas the lower values during the drought are more in accordance with physiological observations by Egea et al. (2011). Likewise, Lin et al. (2015) found higher values for g 1 (both boreal area and gymnosperm trees) using the USO model. In general, the site level estimates of (g 0 and) g 1 are sensitive not only to the stomatal conductance formulation but also e.g. to the structure of the underlying model and the value of other parameters, such as maximum carboxylation rate (V C,max ).

10
Wang ( 1996) reported g 1 = 3.78 (in Table 1 and maximum rate of electron transport J max . We would suggest that the limiting values θ pwp and θ tsp should be optimised or fixed before introducing additional tuning factors such as mesophyll conductance or scaling the β in multiple ways in the stomatal conductance formulations (Egea et al., 2011). Our simulation setup for q corresponds to the configuration 5 (C5) by 15 Egea et al. (2011), with variables q = q B and fixed value q S = 1.

Parameter values
Some of the parameters in this study have been calibrated before by e.g. Kattge et al. (2009);Knorr et al. (2010). Our approach differs from these as we required the model to reproduce the site level maximum of LAI. In contrast e.g. Knorr et al. (2010) found the structural limit for (all-sided) LAI to be 4.2, which is considerably lower than the measured LAI for many of the 20 sites in Table 1. Our approach directly scales the vegetative area, so it also scales GPP and also the amount of rain available for plants (as rain is directed to bare soil and vegetative area). This means that the parameter values should not be directly compared without taking the different paradigms into account. However, our optimised V C,max values are in-between 62.5 reported by Kattge et al. (2009) and 29.3 by Knorr et al. (2010), and are in line with the yearly cycle presented by Ueyama et al. (2016). 25 The exponential scaling factor q in Eq. (A1) of the β-function (2), was revealed to be ineffective in our optimisation as indicated in Table 5. In our simulations, this situation arises as the effective range of the β-function has been lowered by reducing θ pwp and θ tsp . The actual soil moisture is rarely below the fraction θ tsp , so q is constrained with a very limited number of datapoints, and thus has only minimal effect on the fluxes and the cost function. Therefore, the values presented for q in Table 5 can be unreliable and even unrealistic. This situation is remedied in the drought period optimisation when the soil 30 moisture is low. The resulting values for q in Table 7 have a wide dispersion, although they are mostly on the lower end. This signifies that the additional GPP reduction is mostly gradual, with a steep decrese near the permanent wilting point θ pwp .
The values of soil water parameters are closely grouped in the optimisations except for the values of θ pwp during the drought. This can occur due to a larger impact, of the different stomatal conductance formulations on the accumulating soil water content, than assumed -this can also be seen from the differences in the β-function values in Fig. 3. Furthermore, the values of θ tsp and θ pwp have been considerably lowered from their default values in both optimisations. This change can be perceived in at least two different ways. Either the boreal forests are not generally limited by soil moisture stress (except in the case of extreme drought) or the water retention capabilities of the soil (in the model) have been systematically overestimated.
The latter seems unlikely, in the light of results by e.g. Gao et al. (2016).

Conclusions
The adaptive population importance sampler (APIS) is a recent method, capable of estimating complicated multidimensional probability distributions using a population of different proposal densities. The algorithm was able to produce reasonably stable estimates for most parameters quickly. Prior to calibrating the model, we adjusted the site-specific vegetative area fractions to reproduce the measured site level maximum of LAI. This practical approach resulted in improved ET and GPP fluxes, although 10 we encountered difficulties in replicating these for sites with low LAI. The model parameters were optimised simultaneously for all sites without any additional site level tuning. The parameters that were most effective in the optimisation processes, were consistent for all stomatal conductance formulations.
The introduction of the S-function, to delay the start of the vegetation active season, has corrected the springtime increase in GPP for conifers throughout the sites used in this study. The parameters θ tsp and θ pwp , that set the range for the soil moisture  License Agreement (http://www.mpimet.mpg.de/en/science/models/license/). The modifications to the model, described in this paper, have been uploaded to Github and they can be accessed by contacting the authors at jarmo.makela@fmi.fi (after access to the actual model has been approved). For any questions, we encourage you to contact the authors at jarmo.makela@fmi.fi.

Appendix A: Parametric equations within JSBACH
In this appendix we present the most relevant equations that are governed by the parameters in Table 2. The appendix is divided into sections that coincide with the parameter groups.

A1 Photosynthesis
The Farquhar model (Farquhar et al., 1980) is based on the observation that the assimilation rate in the chloroplast is limited either by the carboxylation rate (V C ), induced by the Rubisco enzyme, or the light-limited assimilation rate (J E ). The total rate of carbon fixation is reduced by the amount of dark respiration (R d ), resulting in net assimilation rate (A n ). The experimental scaling factor β q (Egea et al., 2011) is based on soil moisture stress in Eq.
(2), that takes effect (β < 1) when soil moisture is 10 significantly reduced. This scaling is used by all stomatal conductance formulations. We have also introduced here in equation form the actual reduction to photosynthesis by γ from the delay in the start of the vegetation active season in Eq. (1).
Oxygenation of the Rubisco molecule reduces the carboxylation rate, which is given as: Here C i and O i are the leaf internal CO 2 and O 2 concentrations, Γ is the photorespiratory CO 2 compensation point, K C and K O are Michaelis-Menten constants parameterising the dependence on CO 2 and O 2 concentrations. Furthermore, leaf internal CO 2 concentration depends on the external (ambient) concentration C a (in the Baseline and Bethy formulations and unstressed conditions) by: Likewise, the light-limited assimilation rate can be expressed as a function on electron transport rate (J), which is a function of radiation intensity (I) in the photosynthetically active band, the maximum electron transport rate (J max ) and the quantum efficiency for photon capture (α): A2 Soil water 25 In JSBACH the soil water budget is based on several reservoirs (e.g. skin, soil, bare soil, rain intercepted by canopy etc.) and the different formulations are plentiful. We present here only the most crucial of these. Changes in volumetric soil water (θ s , not to be confused with relative soil water content θ = θs θ f c ) due to rainfall (R), evapotranspiration (ET ), snow melt (M ), surface runoff (R s ) and drainage (D) are calculated with a geographically varying maximum field capacity (θ f c ) and soil water density The interception parameter (p int ) also affects the amount of water intercepted by vegetation and bare soil which further 5 affects evaporation and transpiration. The skin reservoir is limited by w skin and excess water is transferred to soil water.
Likewise when the soil water content (θ) is greater than parameter θ dr , the excess water is rapidly drained (in addition to the limited drainage below this threshold), where d, d min and d max are constant parameters: Evaporation from wet surfaces (E ws ) depends on air density (ρ), specific humidity (q a ), saturation specific humidity (q s ) 10 at surface temperature (T s ) and pressure (p s ) and aerodynamic resistance (R a ). The aerodynamic resistance depends on heat transfer coefficient (C h ) and horizontal velocity (v h ).
Transpiration from vegetation (T v ) is likewise formulated but additionally depends on the stomatal resistance of the canopy (R c ), which is an inverse of the stomatal conductance and as such, depends on which conductance model is used.

15
T Evaporation from dry bare soil (E s ) also has an added dependence on surface relative humidity (h s ) calculated from soil dryness: The total evapotranspiration is a weighted average of E ws , T v and E s , where the weights are based on fill levels of reservoirs 20 and the vegetative fraction of the grid cell.

A3 Logistic Growth Phenology (LoGro-P) model
The parameters from the LoGro-P are mainly used to determine the spring and autumn events for JSBACH. To determine the date of the spring event we first introduce a few additional variables, namely the heatsum S T (d), the number of chill days C(d) and the critical heatsum S crit (d). T (d) denotes the mean temperature at day d. 25 S Heatsum S T (d) cumulates the amount of "heat" above the parameter T alt after the previous growing season. The actual starting date d 0 of the summation need not be known since it is enough to start the summation "reasonably late" after the last growth season.
The number of chill days is calculated as the number of days when the mean temperature is below T alt . Here H() denotes 5 the Heaviside step function and the summation starts at the day (d a ) of the last autumn event.
S crit (d) = S min + S range e −C(d)/C decay (A12) The critical heatsum (S crit ) decreases as the number of chill days C(d) increases, with an exponential memory loss parameter C decay . The spring event happens when: The autumn event requires the definition of one more variable, the (pseudo) soil temperature (T s (t)), which at time t is calculated as an average air temperature (T ) with an exponential memory loss (T ps ). The autumn event occurs when T s falls below a certain threshold. In the equation N is the normalization constant and τ is the length of a time step.
Appendix B: Stomatal conductance formulations 15 In this appendix we present the stomatal conductance model formulations used in this study. In the original JSBACH formulation, the Baseline model (Knorr, 1997), the photosynthetic rate is resolved in two steps. First the stomatal conductance under conditions with no water stress is assumed to be controlled by photosynthetic activity (Schulze et al., 1994). Here the leaf internal CO 2 concentration is assumed to be a constant fraction (C i,pot = f C3 C a ) of ambient CO 2 concentration (C a ). This allows for an explicit resolution of the photosynthesis (Knorr, 1997). Then the impact of soil water availability is accounted 20 for by a soil moisture-dependent multiplier (β) that is identical for each canopy layer (Knorr, 1997).
After accounting for soil water stress, the net assimilation rate (A n ) and intercellular CO 2 concentration are (C i ) are recalculated using g s , and integrated over the leaf area index to produce canopy level estimates.
In the Bethy approach (Knorr, 2000), the unstressed canopy conductance (G c,pot ) is calculated similarly to the Baseline 25 model, but potentially further limited by the water supply function of the maximum transpiration rate (T supply = βT max ).
T max is a fixed and predefined upper limit for transpiration as in Knauer et al. (2015).
The potential (unstressed) transpiration rate (T pot ) is a function of air density (ρ), saturation specific humidity (q s ) at given temperature and pressure, specific humidity (q a ), aerodynamic conductance (G a ) and unstressed canopy conductance (G c,pot ).
After this scaling, the net assimilation rate and intercellular CO 2 concentration are recalculated as in the Baseline model.
The Ball-Berry variants relate the stomatal conductance (g s ) to empirically fitted parameters g 0 (mol m −2 s −1 ) and g 1 (unitless, except for g U SO 1 which has units of √ kPa) that respectively represent the residual stomatal conductance and the slope 5 of the function. The stomatal conductance is a function of the net assimilation rate (A n ), the water stress factor (β) and the atmospheric CO 2 concentration (C a ). The original Ball-Berry formulation (Ball et al., 1987) also depends on relative humidity at leaf surface (h s ). In the Leuning model (Leuning, 1995), the CO 2 concentration is reduced by the CO 2 compensation point (Γ) as well as scaled by the vapour pressure deficit (D s ) and a constant (D 0 ) depicting the stomatal sensitivity to changes in D s . The Friend and Kiang model (Friend and Kiang, 2005) adds an exponential dependency on the difference of specific (q a ) 10 and saturation specific humidity (q sat ) with empirically fitted constants a = 2.8 and b = 80. The unified stomatal optimisation model (Medlyn et al., 2011) also adds a dependency to the vapour pressure deficit (D s ).      . Hyytiälä site water use efficiency for the Bethy and USO formulations. Scatter plotted are the dry period 5-day running averages of ET and GPP, coloured by the intensity of the drought (β-function). The left column depicts the model with the more generally optimised parameter values, the middle column with the drought optimisation and the right column presents the corresponding observations, coloured by the same intensity as in the middle column. The grey points are from the corresponding time during the two previous years.