Land surface model photosynthesis and parameter calibration for boreal sites with adaptive population importance sampler

We calibrated the JSBACH model with six different stomatal conductance formulations using measurements from 10 FLUXNET coniferous evergreen sites in the Boreal zone. The parameter posterior distributions were generated by adaptive population importance sampler and the optimal values by a simple stochastic optimisation algorithm. The observations used to constrain the model are evapotranspiration (ET) and gross primary production (GPP). We identified the key parameters in the calibration process. These parameters control the soil moisture stress function and the overall rate of carbon fixation. 5 We were able to improve the coefficient of determination and the model bias with all stomatal conductance formulations. There was no clear candidate for the best stomatal conductance model, although certain versions produced better estimates depending on the examined variable (ET, GPP) and the used metric. We were also able to significantly enhance the model behaviour during a drought event in a Finnish Scots pine forest site. The JSBACH model was also modified to use a delayed effect of temperature for photosynthetic activity. This modifica10 tion enabled the model to correctly time and replicate the springtime increase in GPP (and ET) for conifers throughout the measurements sites used in this study.


Introduction
Plants exchange carbon dioxide (CO 2 ) and water (H 2 O) with atmosphere. Sufficient soil water, irradiance and adequate temperature are required to maintain the exchange rate during the growing season. Disturbances in these conditions such as drought, 15 cold temperature or low radiation cause the plants to respond to the environmental stress via stomatal closure and the drawdown of photosynthesis and transpiration (Lagergren and Lindroth, 2002;Mäkelä et al., 2004;Gao et al., 2017). The capability of plants to recover from such events depends on species and their adaptation to site conditions (Kozlowski and Pallardy, 2002).
Stress is part of the normal annual cycle of the plants, but occasionally it may exceed the limits of recovery.
Soil water deficit and high water vapour pressure deficit can result in suppressed plant transpiration (Bréda et al., 1993;Kropp et al., 2017) and boreal forests occasionally suffer from soil drought (Muukkonen et al., 2015;Gao et al., 2016). Globally soil drought has been recognised as one of the main limiting factors for plant photosynthesis (Nemani et al., 2003). The recovery 5 of photosynthetic capacity in spring has been connected to temperature history, and to frequency of severe night frosts (Bergh et al., 1998;Bergh and Linder, 1999) reversing the development. Understanding, and correctly modelling, these phenomena are especially important to boreal forests (Bonan, 2008) under changing environmental conditions. Ecosystem and land surface models, describing the plant photosynthesis, transpiration and soil hydrology related processes, usually include descriptions and parametrisations for various stress effects. These parameters often lack a theoretical foundation 10 (Gao et al., 2002;Medlyn et al., 2011) and descriptions of vegetation drought response and phenology have been recognized to need better representations Powell et al., 2013;Xu et al., 2013;Medlyn et al., 2016). These deficiencies restrict the model's predictive capability under changing environmental conditions, and call for specific parametrisations for different plant types and vegetation zones.
The stomatal conductance models describe the pathway of CO 2 and water through the leaf stomata by an electric circuit anal-15 ogy (Nobel, 1999). The variations in stomatal opening and mesophyll structure are interpreted as resistances to the water flow and the process is idealised via generalised parameterisation. The stomatal conductance models mainly differ in their choice of variable driving the stomatal closure and their performance has been recently assessed in modelling by e.g. Egea et al. (2011); Knauer et al. (2015); Franks et al. (2018). However, it can be hypothesised that the choice of the stomatal conductance model affects the ecosystem model parameters more broadly than just those directly related to the stomatal conductance formulations. 20 A holistic assessment of the performance of the stomatal conductance models together with parameter optimisation has been missing.
In this study we apply the land-surface model JSBACH for 10 boreal coniferous evergreen forest eddy covariance sites to examine the performance of different stomatal conductance models, and their effect on calibrated parameters related to photosynthesis, phenology and hydrology. We will assess the inter-site variability and focus on a specific drought period at 25 one site. We will provide an assessment of the robustness of the calibration of parameters together with different stomatal conductance descriptions. We utilise the adaptive population importance sampler to sample the full parameter space with the different stomatal conductance formulations and to locate different modes of the target (peaks of high probability).

Materials and methods
The site level measurements, required by the model as input variables, are air temperature, air pressure, precipitation, hu-30 midity, wind speed and CO 2 concentration as well as short-and longwave and potential shortwave radiation. Additionally evapotranspiration (ET) and gross primary production (GPP), derived from the eddy covariance (EC) measurements, are used to constrain and evaluate the model.

Sites and measurements
We use data from 10 FLUXNET (doi:10.17616/R36K9X) sites characterised as coniferous evergreen forests. Site descriptions with appropriate references are gathered in Table 1. The site level half-hourly measurements were quality checked and gapfilled when needed to produce continuous half-hourly and daily time series. The gap-filled and low quality (based on FLUXNET data quality flags) measurements were masked and the daily aggregates accepted as part of the calibration process if at least 5 60% of values between 4:00 and 20:00 for that day were unmasked.
Based on the quality and quantity of their respective measurements, the sites were divided into calibration and validation sites. Essentially, if we have enough data (at least eight years and without too many gaps), the site is used for both calibration and validation purposes (with measurements separated into successive time periods), otherwise the site is used only for validation. The FLUXNET datasets were missing both the long-and shortwave radiation for the two Russian sites, Fyodorkovskoye 10 (RU-Fyo) and Zotino (RU-Zot). These were generated from ERA Interim data. The soil types of all of these sites can mostly be identified as mineral soils with varying sand, clay and peat contents. Fyodorovskoye and Poker Flat (US-Prr) are natural peatlands and Lettosuo (FI-Let) is a drained peatland site.
The measurement error in the EC flux data can be separated into systematic and random errors. The main systematic errors (density fluctuations, high-frequency losses, calibration issues) are taken into account as part of the post-processing of the data, 15 and the random errors tend to dominate the uncertainty of the instantaneous fluxes. The random error is often assumed Gaussian but can be more accurately approximated by a symmetric exponential distribution (Richardson et al., 2006). It increases linearly with the magnitude of the flux, with a standard deviation typically less than 20% of the flux (Richardson et al., 2008;Rannik et al., 2016). Our treatment of the measurement (and model) error is explained in section 2.9. Table 1. Descriptions for the sites used in this study sorted by their FLUXNET identifier. The first six sites are used for both calibration and validation purposes, and the following four for validation only. The reported elevation is in meters above sea level, LAI is the one-sided leaf area index and the average stand age is in years, along with average annual precipitation (P) in mm and temperature (T) in degrees Celsius.  (Tesfa et al., 2014;Singh et al., 2015).
We focus only on the most essential parts of JSBACH relating to our work. A more complete model description with details 10 on e.g. soil heat transfer, water balance and coupling to the atmosphere can be found in Roeckner et al. (2003), whereas Raddatz et al. (2007) provides a more descriptive synopsis on land-surface interactions, Reick et al. (2013) supplements both with land cover change processes and Hagemann and Stacke (2015) introduces soil hydrological mechanisms within a multilayer scheme applying five layers.
In JSBACH the land surface is a fractional structure where the land grid-cells are divided into tiles representing the most 15 prevalent vegetation classes called plant functional types (PFTs) within each grid cell . The grid cell is first divided into bare soil and vegetative area which is furthermore fractionally divided into PFTs. In our site level simulations, the model was set to use only one PFT, coniferous evergreen trees. The seasonal development of leaf area index (LAI) for the trees is regulated by air temperature and soil moisture with a single limiting value (for all sites) for the maximum of LAI. This maximum value was fixed and the site-specific fractions of vegetative area were adjusted to reproduce the measured site level 20 LAI.
The predictions of phenology are produced by the Logistic Growth Phenology (LoGro-P) model of JSBACH (Böttcher et al., 2016). Photosynthesis is described by the biochemical photosynthesis model (Farquhar et al., 1980). Following Kattge et al. (2009) we set the maximum electron transport rate (J max ) at 25 degrees Celsius to 1.9 times the maximum carboxylation rate (V C,max ), which is in line with e.g. Leuning (2002); Ueyama et al. (2016). The photosynthetic rate is dependent on the used 25 stomatal conductance formulation, introduced in chapter 2.3. Radiation absorption is estimated by a two stream approximation within a three-layer canopy (Sellers, 1985). Especially in the sparse canopies, the radiation absorption is affected by clumping of the leaves which is here taken into account according to the formulation by Knorr (1997).
Parameters detailing site-specific soil properties, such as soil porosity and field capacity, were derived from FLUXNET datasets and the references in Table 1. We approximated the soil compositions and generated these properties following Hage-30 mann and Stacke (2015).

Modifications to the JSBACH model
All parameters of interest, presented in Table 2, were extracted from the JSBACH model code to an external file. The default parameter values are those used in the model and, in the case where we have introduced the parameters to the model, they are a synthesis of literature values. Most of the parameter ranges were adapted from our previous work on similar topic (Mäkelä et al., 2016), but they were also evenly sampled and adjusted keeping the setup of our simulations in mind. The parameter grouping 5 was done to enhance optimisation and the mechanism is explained in chapter 2.6. Group I consists of parameters most directly affecting photosynthesis, group II parameters are intimately involved with soil moisture and group III are the logistic growth phenology (LoGro-P) model parameters. The equations governed by these parameters are presented in Appendix A. Start of the growing season in the JSBACH model is defined by a "spring event", in the LoGro phenology model (appendix A3), that induces leaf growth. The phenology model calculates a sum of ambient temperature (heatsum) since last autumn that is above the cutoff value T alt , presented in Eq. (A10). It also calculates a variable threshold, defined in (A12), for the heatsum sum to reach. The threshold decreases based on the number of days the ambient temperature is below T alt , whereas the heatsum increases. When the heatsum reaches the threshold, the plant leaves are free to grow.

5
However, coniferous evergreen trees do not shed all of their leaves for winter and are therefore in readiness for the following spring. The start of the photosynthetically active season in the model has been observed to occur too early in the Boreal region by e.g Böttcher et al. (2016). To correct this behaviour i.e. to restrain the respiration and photosynthesis of conifers, we utilise a delayed effect of temperature for photosynthetic activity, introduced by Mäkelä et al. (2004). To calculate the reduction, we must first define the state of photosynthetic acclimation that Mäkelä et al. (2004, p.371) present as: "an aggregated measure of 10 the state of those physiological processes of the leaves that determine the current photosynthetic capacity at any moment".
The state of acclimation (S) is calculated from air temperature (T ) with a delay prescribed by parameter τ (this is similar to the calculation of T S in appendix A14). S is then inserted into sigmoidal relation Eq. (1) to calculate a factor γ, a formulation that is adapted here from Kolari et al. (2007). Finally γ is used to reduce the photosynthetic efficiency in Eq. (A1). T 1/2 denotes the inflection point where γ reaches half of γ max , b is the curvature of the function and γ = 1 when S ≥ 10.
The JSBACH model was also modified to include altogether six different stomatal conductance formulations following Knauer et al. (2015). These formulations include the pre-existing Baseline and Bethy versions as well as the Ball-Berry model and three of its variants. Model information is gathered in Table 3 for easy referencing and the detailed formulations are given in appendix B. The limits of the slope of the conductance formulation parameter (g 1 ) were set to reflect commonly observed 20 values from physiological measurements (Egea et al., 2011). Likely due to human error, the bounds for g U SO 1 were not adjusted accordingly. All of the stomatal conductance models contain an empirical water stress factor β, which reduces stomatal conductance as a function of volumetric soil water content (θ).
In the original JSBACH formulation, the Baseline version, the stomatal conductance (g s ) is first resolved for unstressed canopy and then scaled by β. The Bethy approach assumes that transpiration is either limited by atmospheric demand or water 5 supply. In cases when water supply is not the limiting factor, the calculations are similar to the Baseline version. In all of the empirical Ball-Berry variants, the stomatal conductance can be written as g s = g 0 +cβg 1 . The residual conductance (g 0 ) and the slope of the function (g 1 ) are both formulation specific parameters as well as factor c, that incorporates net photosynthesis and effects of atmospheric humidity and CO 2 concentration. The parameters g 0 and g 1 are part of our sampling and optimisation processes (group I in Table 2 when applicable).

Parameter estimation
The usual problem in parameter estimation is to infer a parameter distribution or a (optimal) set of parameter values x ∈ X ⊂ R n (such as the maximum a posteriori or MAP estimate) with a given set of observations or measurements y ∈ R d on some model. Generally the parameter probability density function (pdf), with the given measurements, p(x|y) cannot be directly accessed. However, it can be represented as a product of the parameter prior distribution p(x) and the likelihood (or data 15 distribution) p(y|x) of the measurements with the given parameters. Using Bayes' rule on conditional probability we can write the posterior pdf as: In the last step we have substituted to the same notation as in Martino et al. (2015). We describe the modelling setup with the observation equation y = M (z, x) + e. The model M produces estimates of the observations y from the model state z (that 20 includes the driving data) using the current parameter values x. The residuals e are gathered by the likelihood function l(y|x) (formulated in section 2.9), which depicts how we handle the model-data mismatch.
The denominator Z(y) is called the partition function (or model evidence) and it is often disregarded as it does not depend on the drawn sample x and can thus be viewed as a constant. Many Markov chain Monte Carlo (MCMC) methods simply state that p(x|y) ∝ g(x)l(y|x) = π(x) and leave it at that, however Z can be useful in e.g. model selection. On a more general 25 setting (since APIS is able to handle these), the expected value of any measurable function f (x), with respect to the pdf π(x), can be expressed as: In the simulations at hand, the function f is the identity operator, but for completeness sake we will write it out as a function.

Adaptive population importance sampler
The adaptive population importance sampler (APIS) (Martino et al., 2015) is a Monte Carlo (MC) method for efficiently computing integrals involving complicated multidimensional target probability density functions, such as Eq. (4). This method utilises a population of importance samplers (IS) to jointly estimate the target pdf π(x) and the normalising constant Z by a deterministic mixture approach (Veach and Guibas, 1995;Owen and Yi, 2000). Normally during this process only the location 5 parameters of the IS proposals are adapted, but we also adapt the shape parameters. The APIS is able to use different or a mixture of normalised proposals densities, but we utilise truncated Gaussian proposals with diagonal covariance matrices.
A basic IS estimator draws N samples x i , i ∈ {1, ..., N } from a single proposal distribution q(x). The estimator then calculates importance weights (w i ) for each sample and approximates the integral Eq. (4) as: This evaluation is rarely sufficient if the target is even slightly complicated. One classical way of tackling this problem is to join multiple IS estimators together. The simplest approach is to calculate the weights for each of these estimators separately and to normalise the result by the combined sum of all weights. The basic IS estimator, and by extensions the simply joint estimator, is driven by the discrepancy between the target and proposal densities. This feature leaves the estimator susceptible to "bad" proposals.

15
The APIS suppresses the bad proposals by utilising the deterministic mixture approach (Veach and Guibas, 1995;Owen and Yi, 2000) presented in Eq. (6), where each proposal q j , j ∈ {1, ..., M } is evaluated at all the drawn samples and weighed by the amount of samples drawn (N j ) from that proposal. This is equivalent to joining the normalised proposal densities together and evaluating the joint pdf.
In the APIS adaptation, the estimates for I and Z in Eq. (6) are iteratively adjusted and the location and shape parameters of the proposal distributions are updated. In our experiments, this is done at predefined intervals, after drawing N = 50 samples from each of our M = 40 proposals. The location (µ µ µ j ) and shape (C j ) parameters for each proposal are updated using only samples drawn from that proposal and weights calculated by the basic IS estimator: The final configuration of the APIS proposal means (µ µ µ j ) is expected to be around all the modes of the target, the parameter posterior distribution p(x|y). The shape parameter adaptation is not normally part of the APIS procedure, but we have adapted the diagonal elements of the proposal covariance matrix using the self-normalising AMIS estimators by Cornuet et al. (2012).
The new covariance is the weighted mean of the previous covariance and the new estimate.
There is one final note to be made about the sampling process. The first sample of each proposal is always fixed as the 30 proposal mean. This requirement is due to our desire to reduce the needed computational time -we generate the model starting the discrepancy this induces, we slightly scale the importance weights based on the distance of the corresponding sample to the mean of the proposal. This scaling is only used in the adaptation of location and shape parameters.

Parameter optimisation
The APIS algorithm is a rather robust method meant for examining the full target probability distribution and e.g. locating the 5 modes of the target distribution. In order to find the optimal set of parameters (within a reasonably short amount of iterations), we utilise a simple custom stochastic optimiser, with starting points set at the modes of the target. This optimiser is run after the APIS calibration.
Our optimiser is a simple random sampler amplified by the "velocity" of the last jump (the idea is similar to Hamiltonian or Hybrid Monte Carlo by Duane et al. (1987)). We draw a set of samples from a small Gaussian proposal distribution in the 10 vicinity of the current best estimate and calculate the cost function for the samples. Whenever a better point is found (smaller cost function), we jump to that (update the mean of the proposal distribution). The "velocity" of the jump (for us merely distance of change in each parameter) is then added to the new mean (with a maximal limit of one standard deviation in the proposal distribution), but it is reduced and eventually removed if a better sample is not found.
The covariance matrix of the proposal distribution is recalculated at predefined intervals (for all parameters). Additionally the 15 samples are first drawn from the full parameter space, in the next step they are drawn only from group I in Table 2 (the rest are kept at their current optimal values), followed by groups II and III and then back to the full parameter space. When the number of parameters is reduced, we are more likely to find a better set of parameter values. We have kept the parameters mostly affecting the same processes in the same group, but some dependencies may not be apparent and hence it is also important to draw samples from the full parameter space. 20

Simulations
The initial state of the JSBACH model can be generated from predefined values of state variables that usually describe empty initial storage pools or the model can be restarted from a file describing the state of some previous run. Depending on the area of interest, a model spin-up may be required to bring the model into a steady state. In our simulations some of the more slowly changing variables (e.g. soil water content and LAI) need to be equilibrated, so a spin-up of 20 years is used. This can 25 be achieved by running the model over a set of measurements multiple times, each time restarting from the final state of the previous run.
The calibration period consists of the first five years given for the calibration sites in Table 1. The spin-up is achieved by looping over these years multiple times and then saving the state of the model at the end of the run. The actual calibration is started from the beginning of the calibration period, using the previously saved state variables. To reduce any bias this induces, 30 the first year in the calibration run is removed from the cost function calculations.
In APIS simulations the spin-up is generated each time the proposal means are adapted. With the optimisation algorithm, run after the APIS simulations, the spin-up is generated for each sample. The validation run for the six calibration sites continues directly after the calibration run (so the spin-up is created, calibration period run and followed by the validation run when applicable). The spin-ups for the four validation sites in Table 1 are similarly generated and the first year of the period discarded.
During the summer 2006, the Hyytiälä (FI-Hyy) measurement site suffered from a severe drought (Gao et al., 2017). These events are difficult for models to capture, hence we have also calibrated the model parameters using Hyytiälä measurements from 2006. The spin-up was the same as for the calibration period, but at the end of the spin-up the model was run forward to

Simulation analysis
We have utilised a Gaussian kernel density estimation (KDE) to produce distributions from generated samples. In practice, KDE places a Gaussian distribution centred at each sample and the constructed composite distribution is an estimate of the 10 underlying actual distribution. The bandwidth for the distributions is calculated using the Scott's rule (Scott, 2004): the data covariance matrix is multiplied by a factor n −1 d+4 , where n is the number of data points and d is the number of dimensions. The effectiveness of each parameter was calculated from the final state of each optimisation process. This was done by first setting all parameters to their optimised values. Then we (evenly) sampled each parameter separately from their range of acceptable values and calculated the corresponding cost functions. For each parameter the maximum difference in these cost 15 function values (and the optimised value) was recorded. The parameters (within each optimisation) were then ordered by these numbers (with highest difference meaning highest effectiveness) and separated into three groups with "high", "average" or "low" effectiveness value.
We report the slope of the regression line (b) and the coefficient of determination (r 2 ), between the observations (y i ) and the model output (x i ). The slope of the regression line is highly indicative of the model bias (difference of the expected values of 20 the observations and the model). Hence we interpret the bias directly from b (in our results the regression lines pass near origin so the differences this induces are negligible).

Cost function
The cost function (9) in our simulations is based on the mean squared error (MSE) estimates of gross primary production (GPP) 25 and evapotranspiration (ET). The residual of each variable is divided by the mean of observations, as has been previously done successfully by e.g. Mäkelä et al. (2016); Knauer et al. (2015); Groenendijk et al. (2010); Trudinger et al. (2007). We make use of this approach since we needed to balance two series of different magnitudes (ET and GPP).
Optimally we would scale the residuals by a covariance vector, combining the model and observational errors, but unfortunately this is not possible. The model error is unknown as is the (pointwise) observation error. We could use a general type 30 of error estimate (such as that of 20% of the flux value) for the observations, but would have to include a minimal site and instrumentation dependent precision. In this study, the full error is treated as Gaussian white noise. The cost function is calcu- lated for each site separately, with the gap-filled and low-quality points removed (leaving N ET and N GP P points), and then averaged over the sites.
We also use a modified version of this cost function, where the normalised mean squared error estimates (NMSE) are weighted by factors based on coefficients of determination (r 2 ) defined in Eq. (8). This latter cost function is only used during 5 the separate drought period optimisation for Hyytiälä.
3 Results and discussion We will begin this section by examining the performance of the APIS algorithm and the parameters themselves, followed by stomatal conductance model specific site level results and lastly an examination of the Hyytiälä drought event in 2006. 10 For convenience sake, and to simplify sentences, we will equate the name of the stomatal conductance model to that of the JSBACH model utilising that conductance formulation.
The evolution of the APIS algorithmic process is presented in Fig. 1 for three parameters from the calibration of the Bethy model. The parameters were chosen to highlight different levels of identifiability for the algorithm (with the given cost function). The first parameter (f C3 ) is an example of a well identifiable situation, where the algorithm quickly locates the area of 15 high probability. The second parameter (θ dr ) is also identifiable but the speed of convergence is diminished. The last example (C decay ) represents situations when the algorithm is unable to constrain the parameter. This can happen when the signal from a parameter (the effect the parameter would have on the values of the cost function) is too weak (when compared to the signals from other sources) for the algorithm to capture. We have included parameter posterior distributions (at 20 iterations) as supplement (S1). 20 We used 40 proposal distributions and drew 50 samples from each, accumulating to 2 000 samples from the 18-21 (depending on the stomatal conductance model) dimensional parameter space at a time. After 20 such iterations (40 000 samples) the algorithm produced reasonably stable distributions for most parameters. The APIS sampling process did not reveal any multimodal distributions and thus provided suitable initial conditions for the optimisation. The results of the optimisation process are gathered in Table 4. Some of the parameters have converged near their limiting values, which can reflect deficiencies in 25 the model structure or the preset parameter ranges. Convergence to a liming value can also be a problem in model calibration, but in this expirement the algorithms were able to cope with the situation as APIS located the area of high probability and the optimiser located the maxima. The different parameter effectiveness levels reported in Table 4 can be roughly equated to the identifiability situations in Fig. 1. The effectiveness levels are highly situational (e.g. they depend on the sampling limits set for each parameter) and merely reflect the parameter identifiability in the APIS process.

Optimised parameters
A curious reader might have noticed that the last two parameters in Table 4 have not been introduced. These are empirical pa-20 rameters explicit for the Friend and Kiang (Friend and Kiang, 2005) stomatal conductance formulation in B3. We calibrated the model without these parameters and the resulting cost function (9) values were poor when compared to the other formulations.
At that point, these parameters were included in the optimisation process. This increases the degrees of freedom for the Friend and Kiang model by two and therefore may give it an advantage when compared to the other Ball-Berry type formulations, which has to be considered in the interpretation of the results. 25 Before we delve into Table 4, we would like to draw the reader's attention to the exponential scaling factor q in Eq. (A1) of the β-function (2) for reducing photosynthesis. This is an experimental parameter that has been revealed to be ineffective in our optimisation. However, ineffectiveness does not equal irrelevance -in the case of q, the situation arises as the effective range of the β-function has been reduced by lowering θ tsp . The actual soil moisture in our simulations is rarely below this fraction, so q is optimised with a very limited number of datapoints and therefore the values presented for it in Table 4 can be 30 are unreliable and even unrealistic.
The LoGro phenology parameters, that affect the timing of the spring and autumn events, are expected to contribute only little to the cost function. The coniferous evergeen trees do not shed all their leaves for winter and therefore the timing of  Table 6 g1 Table 3 --9.9 8.8 10.9 1.6 Table 6 a 2.8 - bilities of the soil (in the model) have been systematically overestimated. We will focus more thoroughly on these and the parameters, that were part of the drought period optimisation, in section 3.3.
Some of the parameters in this study have been calibrated before by e.g. Kattge et al. (2009); Knorr et al. (2010). Our approach differs from these as we required the model to reproduce the site level maximum of LAI by fixing the limiting value and adjusting the site-specific fractions of bare soil and vegetative area. In contrast e.g. Knorr et al. (2010) found the structural 5 limit for (all-sided) LAI to be 4.2, which is considerably lower than the measured LAI for many of the sites in Table 1. Our approach directly scales the vegetative area, so it also scales GPP and e.g. the amount of rain available for plants (as rain is  came to a similar conclusion when they encounter a near-flat distribution for τ in the range of 1-12 days. In our simulations τ exhibits slightly larger optimal values, which is most likely due to the model adapting to the multi-site calibration (as sites 15 have different characteristics, a longer acclimation period accounts these variation better).

Site level results
We present the average annual cycles for the validation period and for all sites in Fig. 2 Table   5. These indicators have been calculated using all corresponding values regardless of quality. The sites are in the same order as in Table 1 with the six calibration sites first, followed by the four sites used only for validation.
On a first glance, the optimisation has improved the model results in Fig. 2 for all of the calibration sites and at least for There are two further explanations for the less-than-perfect predictive skill seen. The first is our choice of adjusting the sitespecific fractions of vegetative and bare soil areas to reproduce the measured site level maximum of LAI. This can affect the model behaviour especially for sites with very low (Zotino) or high (Lettosuo (FI-Let)) site level LAI as the adjustment linearly changes the proportions between vegetative area and bare soil. The second explanation is that there are also many parameters describing site-specific soil properties (such as porosity) that were not part of the optimisation and may be inaccurate. The 5 latter may also be pronounced due to the changes in parameters affecting soil moisture.
There were no clear differences between sites dominated by pine or spruce. Neither did we notice any particular effect on the bias, MSE or correlation coefficient that could be explained by geographical location, stand age or annual precipitation or temperature. We optimised the model for individual (calibration) sites as well. Mostly this changed the values of parameters (such as V C,max and g 1 ) affecting the amplitude of the modelled fluxes. These parameters can be viewed to be more site-10 specific, a characteristics that is reduced in a multi-site calibration -the possibility of highly site-specific properties (and parameter values) can also explain the difficulties in reproducing the validation site observations. We are omitting these results as single-site optimisation can be viewed as overfitting the model and the results do not provide any additional insights.
We can compare the b and r 2 values with the various stomatal conductance models, reported in Table 5, but the overall differences are small. The r 2 values are reasonably good for both ET and GPP, even for the low LAI sites RU-Zot and US-Prr.

15
The prominent bias in both variables for the low LAI sites is apparent here as well.
Similarly to the results by Knauer et al. (2015), there is no clear candidate for the best stomatal conductance formulation.
The different metrics (b, r 2 ) and variables (ET, GPP) indicate better performance for some model versions, but these seem to be variable and metric specific. The best values tend to cluster to Bethy (13), F&K (12) and BB (12) formulations. However the results are not conclusive and for example we could have included the factor D 0 (that depicts stomatal sensitivity to changes 20 in vapour pressure deficit D s ) in the optimisation, which would have likely improved the performance of the Leuning model.

Drought event in Hyytiälä
We have previously and unsuccessfully attempted to optimise the model (Mäkelä et al., 2016) for the Hyytiälä (FI-Hyy) exceptionally dry period (Gao et al., 2017). The drought was severe, leading e.g. to visible discolouration of needles. We focus now directly on the extended dry period (190-260th day of the year in 2006), during which the actual drought is mostly in 25 effect between 215-240th DOY. The fixed parameter values during this optimisation are presented in Table 4. The maximum carboxylation rate V C,max was set to a value that enabled the full use of the dynamical range of q and it was an approximate mean of the optimised values. The LoGro phenology parameters and τ were fixed to their optimised values as they should not be affected during the drought. The other set values were compromises between the stomatal conductance formulations but these parameters should not take different values during the dry period. We note here that these decision were made from the 30 modelling perspective and knowing the limitations of the models. It could be argued that i.e. the parameters V C,max and g 1 should fluctuate (Egea et al., 2011) during the drought but this does not fit in the model paradigm.
The other major change in the dry period optimisation was the use of the modified cost function (10). We have reduced the number of free parameters and essentially set the amplitude of GPP by fixing the parameter V C,max . Hence we are more interested in the correct timing of the change in GPP and ET fluxes, rather than the size of the actual change. The aim is to correctly reproduce the changes in the water use efficiency (WUE) of plants, which we interpret here as the pointwise ratio of (ecosystem level) GPP to ET. The normalised MSE values in cost function (10) ensure that the overall amplitude of the fluxes will remain satisfactory. The resulting optimised parameter values are presented in Table 6. We can now compare the parameter values in Table 6 to those in Table 4. The values of relative humidity parameter θ hum and 5 the residual stomatal conductance g 0 have remained nearly unchanged, but for the rest of the parameter we see wildly different results. Noticeably the USO optimisation only changes the value of θ tsp and leaves the rest of the parameters almost untouched. This is probably partly due to our mistake of not adjusting the limits for g U SO 1 properly before any model calibration. In this instance, the mistake works in our favour and we argue this because of two reasons. The drought period is very short and any optimisation, with such limited data, can result in overfitting the model. Additionally, when the parameter values change 10 radically under some special conditions, the fault can also be in the modelled processes. In the USO optimisation, these pitfalls have been avoided as only one of the parameter values has been changed (and the parameter in question merely shifts the upper bound for the β-function).
The most noteworthy is the behaviour of the slope of the stomatal conductance (g 1 ), where the previously high values (and close to the upper bound) have been radically lowered. For two of the Ball-Berry variants (we disregard USO) it has reached the 15 lower bound. However, we would argue that these g 1 parameters are close to their optimal values in this setting. The parameter g BB 1 has reached its optimal value and as the models in question are very similar and exhibit comparable results in Fig. 3, the optimal values are close. Overall, the decrease in the values of g 1 during the drought is expected as the plants close their stomata to minimise the loss of water by transpiration (Egea et al., 2011;Zhou et al., 2013).
The higher values of g 1 during the more general optimisation are better reflected by Franks et al. (2018), whereas the lower The site level estimates of (g 0 and) g 1 are sensitive to the stomatal conductance formulation but also e.g. to the general structure of the underlying model and the value of other parameters, such as maximum carboxylation rate (V C,max ). Wang ≈ 5 for Sodankylä while estimating the variation in the values of V C,max and maximum rate of electron transport J max . We would suggest that the limiting values θ pwp and θ tsp should be optimised or fixed before introducing additional tuning factors e.g. to mesophyll conductance or scaling the β in the stomatal conductance formulation (Egea et al., 2011).
The water stress in JSBACH is modelled via the soil moisture stress function (β). When the slope (g 1 ) of the stomatal 10 conductance function (g s = g 0 + βg 1 ) converges to the lower limit, the only viable option to further reduce carbon assimilation is to increase the β-function limits. The upper limit (θ tsp ) of this function is increased from the previously optimised values but is still far lower than the default value of 0.75 and the resulting values are fairly congruent (disregardin the Baseline formulation). The lower limit of the β-function (θ pwp ) exceeds the optimised value for all of the parametrisations and the default value of 0.35 for the Baseline and Bethy formulations. The differences in the optimised values of θ pwp can occur due 15 to a larger impact, of the different stomatal conductance formulations to the accumulating soil water content, than assumedthis can be seen from the differences in the β-function values in Fig. 3. Additionally we could be overfitting the model, but the underlying model structure might also be lacking in e.g. by missing processes. Likely, all of these factor play a role.
The changes to modelled fluxes are visualised in Fig. 3. The Baseline, Bethy and USO formulations demonstrate a considerable increase in the agreement in GPP between the model and observations when compared to the default setting or the 20 previous more general optimisation. The GPP of other formulations has remained roughly the same as with the more generally optimised parameter values. The Baseline, Ball-Berry, Leuning, and to a lesser degree the Friend and Kiang formulations, now suffer from the too low ET values before the actual drought. The Bethy model has a too strong drawdown of both ET and GPP during the drought. The USO formulation results in the best fits for r 2 and b with the dry period optimisation and it makes full use of the dynamical range of the β-function. Overall, the dry period optimisation was succesful for the USO model and to a 25 lesser for the Bethy formulation as well -the results for the other variants are mixed and inconclusive.
We selected two of the stomatal conductance formulations, Bethy and USO, to examine the changes to the WUE of plants during the extended dry period. The highlighted observations in Fig. 4 show a clear path of development for the drought. Both observational colourings are similar (as are the β-functions) and depict first a linear decrease in both ET and GPP, followed by a rapid decline in ET and a delayed decline in GPP. The recovery of plants from the drought can also be seen the colouring 30 starts to turn lighter. The models depict a more linear response of GPP to ET as the drought develops, although with USO we can see a bit more similarities in the pattern of the values. The β-function, when optimised to the dry period, seems to be a reasonably good indicator of drought, when compared to the indicators presented by Gao et al. (2017).
Lastly, we inspected the ET and GPP cycles (not shown) for the whole validation period with both optimised parameter sets, all stomatal conductance formulations and all calibration sites. The b and r 2 values for ET were better for all stomatal conduc- Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2018-313 Manuscript under review for journal Geosci. Model Dev. tance formulations (except one) using the more generally optimised parameter set. There is some variation in the indicators for GPP, where approximately a third of the values (mostly r 2 ) are better with the dry period parameter set. These differences are mostly attributed to increased model bias (decreased b) that is explained by the lower values of g 1 . Overall, the more general optimisation provided systematically better or comparable results to the dry period optimisation. The exception is the USO formulation, which had an approximately 1:1 distribution of best values for both variables in-between the parameter sets.

4 Conclusions
The adaptive population importance sampler (APIS) is a recent method, capable of estimating complicated multidimensional probability distributions using a population of different proposal densities. The algorithm was able to produced reasonably stable estimates for most parameters quickly. Prior to calibrating the model, we adjusted the site-specific vegetative area fractions to reproduce the measured site level maximum of LAI. This practical approach was successful, although we encountered 10 difficulties in replicating ET and GPP fluxes for sites with low LAI. The model parameters were optimised simultaneously for all sites without any additional site level tuning. The parameters mostly affecting the optimisation processes were consistent for all stomatal conductance formulations.
The introduction of the S-function, to delay the start of the vegetation active season, has corrected the springtime increase in GPP for conifers throughout the sites used in this study. The parameters θ tsp and θ pwp , that set the range for the soil mois-  either by the carboxylation rate (V C ), induced by the Rubisco enzyme, or the light-limited assimilation rate (J E ). The total rate of carbon fixation is reduced by the amount of dark respiration (R d ), resulting in net assimilation rate (A n ). The experimental scaling factor β q (Egea et al., 2011) is based on soil moisture stress in Eq.
(2), that takes effect (β < 1) when soil moisture is significantly reduced. This scaling is used by all stomatal conductance formulations. We have also introduced here in equation form the actual reduction to photosynthesis by γ from the delay in the start of the vegetation active season in Eq. (1).
Oxygenation of the Rubisco molecule reduces the carboxylation rate, which is given as: Here C i and O i are the leaf internal CO 2 and O 2 concentrations, Γ is the photorespiratory CO 2 compensation point, K C and K O are Michaelis-Menten constants parametrizing the dependence on CO 2 and O 2 concentrations. Furthermore, leaf internal 15 CO 2 concentration depends on the external (ambient) concentration C a (in the Baseline and Bethy formulations and unstressed conditions) by: Likewise, the light-limited assimilation rate can be expressed as a function on electron transport rate (J), which is a function of radiation intensity (I) in the photosynthetically active band, the maximum electron transport rate (J max ) and the quantum 20 efficiency for photon capture (α):

A2 Soil water
In JSBACH the soil water budget is based on several reservoirs (e.g. skin, soil, bare soil, rain intercepted by canopy etc.) and the different formulations are plentiful. We present here only the most crucial of these. Changes in soil water (θ s , not to be 25 confused with volumetric soil water content θ = θs θ f c ) due to rainfall (R), evapotranspiration (ET ), snow melt (M ), surface runoff (R s ) and drainage (D) are calculated with a geographically varying maximum field capacity (θ f c ). The interception parameter (p int ) also affects the amount of water intercepted by vegetation and bare soil which further affects evaporation and transpiration. The skin reservoir is limited by w skin and excess water is transferred to soil water.
Likewise when the soil water content (θ) is greater than parameter θ dr , the excess water is rapidly drained (in addition to the limited drainage below this threshold), where d, d min and d max are constant parameters: Evaporation from wet surfaces (E ws ) depends on air density (ρ), specific humidity (q a ), saturation specific humidity (q s ) at surface temperature (T s ) and pressure (p s ) and aerodynamic resistance (R a ). The aerodynamic resistance depends on heat transfer coefficient (C h ) and horizontal velocity (v h ).
Transpiration from vegetation (T v ) is likewise formulated but additionally depends on the stomatal resistance of the canopy 10 (R c ), which is an inverse of the stomatal conductance and as such, depends on which conductance model is used.
Evaporation from dry bare soil (E s ) also has an added dependence on surface relative humidity (h s ) calculated from soil dryness: The total evapotranspiration is a weighted average of E ws , T v and E s , where the weights are based on fill levels of reservoirs and the vegetative fraction of the grid cell.

A3 Logistic Growth Phenology (LoGro-P) model
The parameters from the LoGro-P are mainly used to determine the spring and autumn events for JSBACH. To determine the date of the spring event we first introduce a few additional variables, namely the heatsum S T (d), the number of chill days C(d) 20 and the critical heatsum S crit (d). T (d) denotes the mean temperature at day d.
Heatsum S T (d) cumulates the amount of "heat" above the parameter T alt after the previous growing season. The actual starting date d 0 of the summation need not be known since it is enough to start the summation "reasonably late" after the last growth season. The number of chill days is calculated as the number of days when the mean temperature is below T alt . Here H() denotes the Heaviside step function and the summation starts at the day (d a ) of the last autumn event.
S crit (d) = S min + S range e −C(d)/C decay (A12) The critical heatsum (S crit ) decreases as the number of chill days C(d) increases, with an exponential memory loss parameter C decay . The spring event happens when: The autumn event requires the definition of one more variable, the (pseudo) soil temperature (T s (t)), which at time t is calculated as an average air temperature (T ) with an exponential memory loss (T ps ). The autumn event occurs when T s falls below a certain threshold. In the equation N is the normalization constant and τ is the length of a time step.

Appendix B: Stomatal conductance formulations
In this appendix we present the stomatal conductance model formulations used in this study. In the original JSBACH formulation, the Baseline model (Knorr, 1997), the photosynthetic rate is resolved in two steps. First the stomatal conductance under conditions with no water stress is assumed to be controlled by photosynthetic activity (Schulze et al., 1994). Here the leaf internal CO 2 concentration is assumed to be a constant fraction (C i,pot = f C3 C a ) of ambient CO 2 concentration (C a ). This 15 allows for an explicit resolution of the photosynthesis (Knorr, 1997). Then the impact of soil water availability is accounted for by a soil moisture-dependent multiplier (β) that is identical for each canopy layer (Knorr, 1997).
After accounting for soil water stress, the net assimilation rate (A n ) and intercellular CO 2 concentration are (C i ) are recalculated using g s , and integrated over the leaf area index to produce canopy level estimates. 20 In the Bethy approach (Knorr, 2000), the unstressed canopy conductance (G c,pot ) is calculated similarly to the Baseline model, but potentially further limited by the water supply function of the maximum transpiration rate (T supply = βT max ).
T max is a fixed and predefined upper limit for transpiration as in Knauer et al. (2015).
The potential (unstressed) transpiration rate (T pot ) is a function of air density (ρ), saturation specific humidity (q s ) at given 25 temperature and pressure, specific humidity (q a ), aerodynamic conductance (G a ) and unstressed canopy conductance (G c,pot ).
After this scaling, the net assimilation rate and intercellular CO 2 concentration are recalculated as in the Baseline model. The Ball-Berry variants relate the stomatal conductance (g s ) to empirically fitted parameters g 0 (mol m −2 s −1 ) and g 1 (unitless, except for g U SO 1 which has units of √ kPa) that respectively represent the residual stomatal conductance and the slope of the function. The stomatal conductance is a function of the net assimilation rate (A n ), the water stress factor (β) and the atmospheric CO 2 concentration (C a ). The original Ball-Berry formulation (Ball et al., 1987) also depends on relative humidity at leaf surface (h s ). In the Leuning model (Leuning, 1995), the CO 2 concentration is reduced by the CO 2 compensation point 5 (Γ) as well as scaled by the vapour pressure deficit (D s ) and a constant (D 0 ) depicting the stomatal sensitivity to changes in D s . The Friend and Kiang model (Friend and Kiang, 2005) adds an exponential dependency on the difference of specific (q) and saturation specific humidity (q sat ) with empirically fitted constants a = 2.8 and b = 80. The unified stomatal optimisation model (Medlyn et al., 2011) also adds a dependency to the vapour pressure deficit (D s ).