Limiting the parameter space in the Carbon Cycle Data Assimilation System (CCDAS)

Terrestrial ecosystem models are employed to calculate the sources and sinks of carbon dioxide between land and atmosphere. These models may be heavily parameterised. Where reliable estimates are unavailable for a parameter, it remains highly uncertain, and parameter uncertainty can contribute substantially to overall model output uncertainty. This paper builds on the work of the terrestrial Carbon Cycle Data Assimilation System (CCDAS), which here assimilates atmospheric CO2 concentrations to optimise 19 parameters of the underlying terrestrial ecosystem model (the Biosphere Energy Transfer and Hydrology scheme, BETHY). Previous experiments have shown that the identified minimum may contain non-physical parameter values. One way to combat this problem is to use constrained optimisation, which prevents the optimiser from searching non-physical regions. Another technique is to add penalty terms to the cost function whenever the optimisation searches outside a specified region. A further method is to use parameter transformations, where the optimisation is carried out in a transformed parameter space, ensuring that the optimal parameters at the minimum lie in the physical domain. We compare these different methods of achieving meaningful parameter values, finding that the parameter transformation method gives consistent results, whereas the other two provide no useful results.


Introduction
The response of the global carbon cycle to future changes in climate is highly uncertain. It has been proposed that there is a positive climate-carbon cycle feedback that might significantly accelerate climate change; the study of Friedlingstein et al. (2006) used 11 Earth system models with an interactive carbon cycle and two simulations with each model to isolate the feedback between climate change and the carbon cycle. All of the models showed that future climate change would reduce the efficiency of the Earth system, and in particular the land biosphere, in absorbing the anthropogenic carbon perturbation, with an additional CO2 burden of between 20 and 200 ppm by 2100 between the two most extreme models. Friedlingstein et al. (2006) estimated that this rise in CO2 would lead to a further warming of 0.1 to 1.5 °C.
The sources and sinks of carbon dioxide between land and atmosphere can be calculated using terrestrial ecosystem models (TEMs). State-of-the-art TEMs, such as the Biosphere Energy Transfer and Hydrology (BETHY) scheme (Knorr, 2000), encapsulate large numbers of biogeochemical processes and hence involve a large number of parameters. Results from TEMs can diverge markedly, indicating limited understanding and representation of the processes involved. The study of Sitch et al. (2008) used five dynamic global vegetation models (DGVMs) to model contemporary terrestrial carbon cycling. They coupled the DGVMs to a fast "climate analogue model" based on the Hadley Centre General Circulation Model and ran the coupled models to the year 2100 using four Special Report Emissions Scenarios. The most extreme projections differed by up to 494 PgC of cumulative land uptake across the DGVMs over the 21st century (over 50 years of anthropogenic emissions at current levels; Sitch et al., 2008), although they remained consistent with the contemporary global land carbon budget. Furthermore, Huntingford et al. (2013) explored uncertainties of potential future carbon loss from tropical forests. They found that the DGVM response uncertainty dominated over variation between emission scenarios and climate models.
There are various sources of uncertainty within the model, for example structural uncertainty, which depends on the formulation of individual processes and their numerical representation. Another source is parametric uncertainty, which results from the uncertainty of the process parameter values used in the model's parameterisation, either due to a lack of knowledge or to upscaling to larger spatial domains. Model parameter values are commonly based on "expert knowledge". Where little information is known, this can be just an educated guess. If estimates are unavailable for a parameter, it remains highly uncertain, and parameter uncertainty can contribute substantially to overall model output uncertainty. In this case, parameter estimation to constrain the model against observations can be very useful.
Many parameter estimation methods, such as gradient-based, Kalman filter, Monte Carlo inversion, Levenberg-Marquardt and genetic algorithms, use the Bayesian approach (Tarantola, 1987, 2005), which combines probability density functions (PDFs) of observational information, prior information and the model dynamics. Four-dimensional variational (4D-Var) schemes use the gradient of the model for the optimisation of parameters; this is usually provided by the adjoint. These approaches are generally computationally efficient, but unlike some other data assimilation methods, for example the Markov chain Monte Carlo method, they may identify only a local minimum. Another weakness of 4D-Var schemes is that they concentrate only on the optimal solution without considering uncertainties. However, some 4D-Var schemes, such as the one used in the Carbon Cycle Data Assimilation System (CCDAS) (Rayner et al., 2005), are able to approximate posterior parameter uncertainties using the inverse of the second-order derivative of the cost function with respect to the parameters (the Hessian) at the global minimum.
Generally, Gaussian distributions are assumed for the prior probability distributions of the parameters. This is not always a good assumption, as some parameters are restricted to certain values; many are positive, for example, and some are restricted between two values, such as a fraction between 0 and 1. Another example is the terrestrial carbon parameter Q10, which regulates the response of the decomposition rate of organic material to changes in temperature and is known to be greater than 1 ("A rule of thumb widely accepted in the biological research community is that ... the Q10 of decomposition is two"; Davidson and Janssens, 2006). Where parameters are limited to certain values, optimal solutions can contain non-physical parameter values, as was seen in Koffi et al. (2012) when using CCDAS (Rayner et al., 2005) without attempting to limit the parameter space. There, the optimal value of one of the parameters in the photosynthesis scheme was negative, which is unrealistic and would imply a reversed photosynthesis. Kaminski et al. (2012) used, in addition, quadratic and double-bounded transformations to achieve a limited parameter space. Further, in Trudinger et al. (2007), an optimisation inter-comparison study of parameter estimation methods in terrestrial biogeochemical models, and in Fox et al. (2009), another inter-comparison project, the parameter space needed to be limited to avoid non-physical values.
A simple method of avoiding these non-physical values would be to place hard constraints within the search algorithm. Byrd et al. (1995) described a limited-memory quasi-Newton algorithm for solving large nonlinear optimisation problems, which can be applied to parameter estimation.
Alternatively, it is possible to modify the cost function formulation by adding a so-called penalty term associated with some of the parameters. The penalty term is zero when the parameter is within its specified limits and increases as the parameter moves further away from these limits. This has been implemented in a study to estimate the turnover time of terrestrial carbon (Barrett, 2002). A genetic algorithm was used to improve consistency between estimated model parameters and data. All of the parameters were limited between two values and a penalty term was added whenever they violated these constraints.
A further option for avoiding non-physical values would be to alter the estimation problem by using a parameter transformation (i.e. a nonlinear change of the parameters' PDFs) so that the parameter limits can never be reached. Simon and Bertino (2009) performed a twin experiment with a coupled ocean ecosystem model (HYCOM-NORWECOM) and an ensemble Kalman filter (EnKF), with and without parameter transformations to limit parameters to positive values. The study compared the EnKF with parameter transformations against the plain EnKF with post-processing of results, where negative values are increased to zero. These two methods led to similar results; however, the parameter transformations had an advantage in efficiency. Simon and Bertino use the term "Gaussian anamorphosis"; we will continue to use the term "parameter transformation".
Within CCDAS, a parameter transformation from a Gaussian prior parameter distribution to a log-normal prior parameter distribution is already routinely in use for some selected parameters, such as the Q10 parameters. Koffi et al. (2012) showed that the choice of prior parameter distribution can have a great effect on the parameters' uncertainty and the resulting flux field. In their experiments, a log-normal PDF on prior parameters reduced the sensitivity of the net CO2 exchange flux (net ecosystem productivity, NEP) to the observational network as well as to the transport model. In that study, the differences in NEP between two configurations are quantified by calculating the root mean square difference (RMSD) over all grid cells and all months in the study period. After applying the log-normal PDF, the RMSD between the observational networks went from 42 gC m−2 yr−1 to 16 gC m−2 yr−1.
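The RMSD diagnostic used here can be sketched as follows; the function name and the array shapes (months by grid cells) are illustrative, not CCDAS code.

```python
import numpy as np

def rmsd(nep_a, nep_b):
    """Root mean square difference between two NEP fields.

    nep_a, nep_b: arrays of shape (n_months, n_cells), e.g. monthly
    NEP in gC m-2 yr-1 on the model grid (illustrative shapes).
    """
    diff = np.asarray(nep_a) - np.asarray(nep_b)
    return np.sqrt(np.mean(diff ** 2))
```

Averaging over all grid cells and all months before taking the square root reproduces the single summary number quoted above.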
This paper builds upon the findings of Koffi et al. (2012) and systematically investigates the ability of the three above-mentioned methods to limit the parameter space within CCDAS.
The outline of the paper is as follows. First, we give an overview of the data assimilation system and the model, going on to describe the parameter limiting methods and the experiments (Sect. 2). Section 3 presents the results and discussion, and Sect. 4 concludes.

Methodology
CCDAS employs the terrestrial ecosystem model BETHY (Knorr, 2000) and an atmospheric tracer transport model, TM2 (Heimann, 1995), along with prescribed CO2 fluxes comprising land-use change, sea surface-atmosphere exchange flux and fossil fuel emissions (Rayner et al., 2005; Scholze et al., 2007) that are not calculated by the BETHY model. The biosphere model parameters are estimated using the variational approach. The configuration of CCDAS has been comprehensively described by Scholze (2003), Rayner et al. (2005) and Ziehn et al. (2011b). Here, we provide a brief summary and an explanation of the points where we differ.

Data assimilation system
There are two steps to the data assimilation in CCDAS, as can be seen in Fig. 1. The first uses the full version of BETHY to assimilate space-borne remote sensing data of vegetation activity to optimise the model's phenology and hydrology. The second uses a simplified form of BETHY, taking the optimised leaf area index (LAI) and soil moisture fields from the full version as input.
This paper focuses on the soil carbon balance, a simplified part of the second step. This simplification of the model keeps the parameters that control net primary productivity (NPP) fixed; previous studies (Rayner et al., 2005; Scholze et al., 2007) have demonstrated that atmospheric CO2 data constrain these parameters only moderately. The NPP fields are calculated by an additional forward simulation covering the full 25-year simulation period after the first step. They are then used as input, similar to the soil moisture fields from the first step.
Posterior parameter values are obtained via iterative minimisation of a cost function J(x). The cost function measures the mismatch between the parameter vector x and its prior x_0, and between the modelled concentrations M(x) and the observations c, where each is weighted by the uncertainties C_x0 and C_c of the prior and the observations, respectively (Rayner et al., 2005):

J(x) = 1/2 (x − x_0)^T C_x0^−1 (x − x_0) + 1/2 (M(x) − c)^T C_c^−1 (M(x) − c).

The formulation of the cost function uses a Bayesian approach (Tarantola, 1987, 2005) and reflects an assumption of Gaussian probability distributions on the observed concentrations and the prior information on the parameters (explained further in Ziehn et al., 2012). Minimisation of the cost function uses the gradient of J with respect to the parameters x at each iteration. Transformation of Algorithms in Fortran (TAF) (Giering and Kaminski, 1998; Kaminski et al., 2003) is used to generate derivative code from the model's source code.
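The two quadratic terms of the Bayesian cost function can be sketched as follows; `model` is a stand-in for the mapping M from parameters to concentrations, and all names are illustrative rather than CCDAS code.

```python
import numpy as np

def cost(x, x0, C_x0, c_obs, C_c, model):
    """Bayesian 4D-Var cost function sketch:
    J(x) = 1/2 (x-x0)^T C_x0^-1 (x-x0) + 1/2 (M(x)-c)^T C_c^-1 (M(x)-c).

    `model` maps the parameter vector to modelled concentrations;
    C_x0 and C_c are the prior and observational error covariances.
    """
    dp = x - x0                 # mismatch to the prior
    dc = model(x) - c_obs       # mismatch to the observations
    # Solve C v = d instead of forming explicit inverses.
    return 0.5 * dp @ np.linalg.solve(C_x0, dp) + 0.5 * dc @ np.linalg.solve(C_c, dc)
```

Using `np.linalg.solve` rather than explicit matrix inversion is the usual numerically stable way to apply the inverse covariances.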
At the minimum of the cost function, the Hessian approximates the inverse covariance of the parameter uncertainties (Tarantola, 1987) and can therefore be used to estimate the posterior uncertainties in the process parameters. The Hessian is calculated by using TAF once more to differentiate the gradient vector in forward mode with respect to the process parameters. Although there is a significant reduction of the cost function within a few tens of iterations, many more iterations are required to achieve the near-zero gradient of a cost function minimum, which is needed for the Hessian assumptions to hold.
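Numerically, the posterior uncertainty estimate amounts to inverting the Hessian at the minimum; a minimal sketch (the helper names are illustrative, and the Hessian itself would come from the TAF-generated derivative code):

```python
import numpy as np

def posterior_covariance(hessian):
    """At the cost-function minimum, the inverse Hessian approximates
    the posterior parameter covariance (Tarantola, 1987)."""
    return np.linalg.inv(hessian)

def uncertainty_reduction(prior_sigma, post_sigma):
    """Percentage reduction in 1-sigma uncertainty relative to the prior."""
    return 100.0 * (1.0 - post_sigma / prior_sigma)
```

The square roots of the diagonal of the returned covariance are the posterior 1-sigma uncertainties reported for the parameters.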
When using the gradient-based approach, it is possible that only a local minimum is identified. Therefore, an ensemble of optimisations is performed, with each optimisation starting from a slightly perturbed point in parameter space. If they all converge to the same minimum, we have confidence that we have found a minimum that is more likely to be the global minimum within the physical parameter space.
Using the atmospheric tracer transport model TM2, the fluxes calculated by BETHY are mapped onto atmospheric concentrations for comparison with observations of CO2 obtained from the GLOBALVIEW database (GLOBALVIEW-CO2, 2008). As in previous studies (Rayner et al., 2005), we use global monthly mean atmospheric CO2 concentration data from 41 sites, but here we use data from a 25-year period (1979 to 2003).
As the interest of this study is the natural CO2 exchange flux between land and atmosphere, the remaining fluxes contributing to the atmospheric CO2 content are added separately. We use the estimates of Houghton (2008) for the land-use flux, without seasonality or interannual variability, following the procedure of Rayner et al. (2005). The flux pattern and magnitude of ocean CO2 exchange are taken from Takahashi et al. (1999) and estimates of interannual variability from Le Quéré et al. (2007). Background fluxes for fossil fuel emissions, based on the flux magnitudes from Boden et al. (2009), are prescribed following the method of Scholze et al. (2007).

Terrestrial biosphere model and parameters
BETHY, a process-based model of the terrestrial biosphere (Knorr, 2000), simulates carbon uptake and soil respiration within a full energy and water balance and phenology scheme. The grid resolution of BETHY in this study is 2° × 2°, with the global vegetation mapped onto 13 plant functional types (PFTs) based on Wilson and Henderson-Sellers (1985). Each grid cell can contain up to three PFTs, with the presence of each PFT specified by its fractional coverage.
In BETHY, NEP is defined as

NEP = NPP − R_S,f − R_S,s,

where R_S,s and R_S,f are the respiration fluxes from the slowly and rapidly decomposing soil carbon pools. Input to the fast pool is parameterised by the annual course of LAI for deciduous PFTs and a constant fraction of the leaf carbon pool for evergreen PFTs. Soil respiration is simulated to be soil moisture and temperature dependent, assuming the functional dependencies

R_S,f = (1 − f_s) C_f k_f,
R_S,s = C_s k_s,

where C_f and C_s represent the sizes of the fast and slow carbon pools, respectively, and f_s the fraction of decomposition from the fast pool transferred to the long-lived soil carbon pool. The rate constants are

k_f = ω^κ Q_10,f^((T_a − 25)/10) / τ_f,
k_s = ω^κ Q_10,s^((T_a − 25)/10) / τ_s,

where ω is the dimensionless plant-available soil moisture, i.e. divided by the field capacity of the soil in the respective grid cell (a value between 0 and 1), T_a the air temperature, κ a soil moisture dependence parameter, Q_10,f and Q_10,s the temperature dependence parameters for the fast and slow pools, and τ_f and τ_s the pool turnover times at 25 °C.
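The pool fluxes can be sketched numerically as below. The exact functional forms are assumptions for illustration: the moisture response is taken as ω^κ, the Q10 temperature response is referenced to 25 °C (the turnover-time reference temperature), and a fraction (1 − f_s) of fast-pool decomposition is respired while the remainder feeds the slow pool.

```python
def rate_constant(omega, T_a, kappa, q10, tau):
    """Decomposition rate constant: moisture response omega**kappa,
    Q10 temperature response referenced to 25 C, turnover time tau
    (functional form assumed for illustration)."""
    return (omega ** kappa) * q10 ** ((T_a - 25.0) / 10.0) / tau

def soil_respiration(C_f, C_s, f_s, k_f, k_s):
    """Respired CO2: a fraction (1 - f_s) of fast-pool decomposition is
    respired (the rest is transferred to the slow pool); slow-pool
    decomposition is respired in full."""
    R_f = (1.0 - f_s) * C_f * k_f
    R_s = C_s * k_s
    return R_f + R_s
```

At full moisture (ω = 1) and 25 °C the rate constant reduces to 1/τ, i.e. the pool decays on its nominal turnover time.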
A parameter can either be global or differentiated by certain criteria (in this study, PFT). In this simplified version with NPP kept fixed, there are six controlling parameters; five are global and one, the β parameter, is PFT dependent. There is an additional parameter, the offset, representing the atmospheric carbon dioxide concentration at the beginning of the optimisation period, giving 19 process parameters, as can be seen in Table 3. Also shown in Table 3 are the prior uncertainties. As little information is known about some of the parameters, we have chosen to start with larger, realistic uncertainties. The five global parameters are Q_10,f and Q_10,s, the temperature dependence parameters for the fast and slow pool; τ_f, the fast pool turnover time at 25 °C; f_s, the fraction of decomposition from the fast pool to the slowly decomposing soil carbon pool; and κ, the soil moisture dependence parameter. The PFT-dependent parameter β, described in Eq. (7) and in more detail in Ziehn et al. (2012), is the carbon balance parameter and determines whether a PFT is a long-term source (β > 1) or sink (0 < β < 1):

NEP = NPP(1 − β). (7)

Note that NEP and NPP in Eq. (7) denote temporal averages over the full 25-year simulation period at each subgrid cell. The β parameter is strictly positive and, whilst it has no physical upper bound, it should not be unrealistically large; a value of 10, for example, would indicate that locations covered by this PFT have a net flux, NEP, of 9 times the magnitude of NPP, as described in Ziehn et al. (2011a). Therefore, an upper bound of 2 is a reasonable selection and is the value we have chosen when bounding this parameter.
We distinguish between the physical model parameters p_i and the parameters as seen by the optimisation routine, the control variables x_i. Control variables have variance 1. In this sense, all the parameters are on the same dimensionless scale, so a change of 1 on that scale in the value of each parameter contributes equally to the value of the cost function. Furthermore, the control variables have PDFs assuming a Gaussian distribution, as mentioned above. To obtain the control variables and to achieve the unit uncertainty, physical parameter values are divided by their prior standard deviation:

x_i = p_i / σ_p0,i. (8)

Limiting the parameter space
It is not always the case that the physical parameters are distributed in a Gaussian way. A Gaussian distribution, for example, assigns positive probability to negative values, and, as mentioned, some model parameters are only physically meaningful with strictly positive values. Three methods of avoiding these non-physical parameter values are examined in this paper. Two of the methods incorporate the bounding directly into the optimisation. The first, constrained optimisation, seeks a solution within the physically meaningful parameter space. The second adds a penalty term to the cost function when the optimiser begins to search the non-physical domain, encouraging it to stay within the physically meaningful parameter space. The final method investigated in this paper, parameter transformations, performs the optimisation in a transformed parameter space, which ensures that, when back-transformed, the minimum is always in the physically meaningful parameter space. In addition to testing these three methods, we go on to investigate the effect of different parameter transformations on the inferred target quantities and their posterior uncertainties. One particular transformation, the log transformation, has already been used in CCDAS and was found to have a large impact on the optimised parameter values and also on the resulting flux fields, as explained above (Koffi et al., 2012). In addition to this log-normal transformation, we propose two other transformations: quadratic and double-bounded log. The quadratic and log transformations are used to provide a lower bound on a parameter, and the double-bounded log can be used to provide both an upper and a lower bound. We will examine the effect of using these different parameter transformations on parameter values and their uncertainties.
The essential difference between the three approaches is the form of the prior (and thus posterior) PDFs in (physical) parameter space. Both the constrained and the penalty function approaches produce a prior of Gaussian shape inside the bounds or non-penalised region. Outside, the constrained approach produces a zero probability, while the penalty approach produces a non-zero probability consisting of a gradual reduction of the Gaussian probability with increasing distance from the bounds. The parameter transformation approach produces a zero probability outside the bounds and a non-zero but non-Gaussian probability within the bounds.
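These three prior shapes can be compared numerically for a single lower-bounded parameter. The densities below are unnormalised sketches: the bound a = 0, the penalty form and the unit-log-normal transformed prior are illustrative assumptions, not the CCDAS configuration.

```python
import numpy as np

def gauss(x, mu=0.0, sig=1.0):
    """Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

def prior_constrained(p, mu, sig, a=0.0):
    """Gaussian inside the bound, exactly zero outside."""
    return np.where(p >= a, gauss(p, mu, sig), 0.0)

def prior_penalty(p, mu, sig, a=0.0, D=10.0, m=4):
    """Gaussian times exp(-penalty): non-zero but increasingly damped
    outside the bound (illustrative penalty form)."""
    pen = np.where(p < a, D * (a - p) ** m, 0.0)
    return gauss(p, mu, sig) * np.exp(-pen)

def prior_transformed(p, a=0.0):
    """Log transform p = a + exp(x) with x ~ N(0, 1): a log-normal
    density above the bound, zero below (change of variables)."""
    out = np.zeros_like(p, dtype=float)
    ok = p > a
    out[ok] = gauss(np.log(p[ok] - a)) / (p[ok] - a)
    return out
```

Evaluating the three functions on a grid of p values reproduces the qualitative picture described above: a truncated Gaussian, a damped Gaussian tail and a non-Gaussian density confined to the physical region.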

Constrained optimiser
When using constrained optimisation, the optimiser can only choose from amongst a restricted, well-defined set. Minimisation of the cost function is done via a gradient-based algorithm updating an approximation of the Hessian through the L-BFGS-B method (Byrd et al., 1995; Zhu et al., 1997), which limits the control parameter space to the restricted set. This is a variant of the Davidon-Fletcher-Powell (DFP) formula (Fletcher and Powell, 1963; Press et al., 1996).
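A minimal sketch of this approach using SciPy's implementation of the Byrd et al. (1995) L-BFGS-B algorithm; the quadratic toy cost is purely illustrative, with its unconstrained minimum deliberately placed outside the allowed region of the first parameter.

```python
import numpy as np
from scipy.optimize import minimize

def cost(x):
    """Toy cost whose unconstrained minimum is at x = (-0.5, 1.0),
    i.e. outside the 'physical' domain of the first parameter."""
    return (x[0] + 0.5) ** 2 + (x[1] - 1.0) ** 2

def grad(x):
    """Analytic gradient (stand-in for the adjoint-derived gradient)."""
    return np.array([2.0 * (x[0] + 0.5), 2.0 * (x[1] - 1.0)])

# Hard bounds: first parameter restricted to [0, 2], second unbounded.
res = minimize(cost, np.array([1.0, 0.0]), jac=grad,
               method="L-BFGS-B", bounds=[(0.0, 2.0), (None, None)])
```

With the bound active, the returned solution sits exactly on the boundary (x[0] = 0), analogous to the bounded β parameters ending up at their bounds in the CONS experiment.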

Penalty term in the cost function
For the penalty term optimisation, we use BFGS but add a penalty term to the cost function when the optimiser begins to search a non-physical region, in the form

Σ_{r=1}^{R} P_r(D_r, g_r, δ_r, µ_r), R = 19 (i.e. the number of parameters), (9)

where D_r is a penalty factor that scales the penalty function, g_r is the threshold function, δ_r invokes the penalty when the threshold is violated and µ_r determines the sensitivity of the penalty function to threshold violation (taking even, integer values):

P_r = D_r δ_r g_r^µ_r, with g_r = α_r − α*_r and δ_r = 0 if the threshold is satisfied, δ_r = 1 if it is violated,

where α_r is the current value of the rth parameter and α*_r is the threshold value, i.e. the value beyond which the threshold is violated and the penalty imposed.
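A sketch of the penalty construction; the specific form P_r = D_r δ_r g_r^µ_r and the scaling of the threshold function g_r are assumptions consistent with the description above, and the defaults mirror the D = 10^4, µ = 4 choice of the PEN experiment.

```python
def penalty(alpha, alpha_star, D=1e4, mu=4, lower=True):
    """Penalty D * g**mu applied only when the threshold is violated
    (delta = 1). An even mu keeps the penalty positive on both sides
    and twice differentiable at the threshold.
    lower=True treats alpha_star as a lower bound."""
    g = alpha - alpha_star
    violated = g < 0 if lower else g > 0
    return D * g ** mu if violated else 0.0

def augmented_cost(J, params, thresholds):
    """Cost function plus one penalty term per bounded parameter."""
    return J + sum(penalty(a, t) for a, t in zip(params, thresholds))
```

Inside the allowed region the penalty vanishes, so the augmented cost function and its minimum coincide with the original ones whenever the solution is physical.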

Parameter transformations
Depending on the transformation used for a parameter, different equations are used to convert it from the model parameter p_i into the control variable x_i. The equations give control variables with a variance of 1. Where no transformation is used (i.e. the prior is assumed to have a Gaussian distribution), the parameters are just normalised using Eq. (8), as mentioned above; p_0,i is the prior value of the ith model parameter and σ_p0,i is its prior uncertainty. As further options we have a double-bounded log, a lower-bounded log and a quadratic transformation.
Where a parameter has a lower and an upper bound, a and b, the parameter transformation from optimisation space to physical space is given by an equation of the form

p_i = a + (b − a) e^x_i / (1 + e^x_i).

For the log transformation with only a lower bound of a, this simplifies to an equation of the form

p_i = a + e^x_i.

The quadratic transformation with lower bound a is computed by a function of the form

p_i = a + x_i^2.

Minimisation of the cost function is achieved via a gradient-based algorithm updating an approximation of the Hessian through the Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula (Fletcher and Powell, 1963; Press et al., 1996), a quasi-Newton method.
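The three back-transformations from control space to physical space can be sketched as follows, assuming a logistic form for the double-bounded case, an exponential for the lower-bounded log and a shifted square for the quadratic; normalisation constants involving the prior value and uncertainty are omitted for clarity.

```python
import numpy as np

def from_control_double_bounded(x, a, b):
    """Double-bounded log (logistic) transform: any real x maps
    strictly into the interval (a, b)."""
    return a + (b - a) / (1.0 + np.exp(-x))

def from_control_log(x, a):
    """Lower-bounded log transform: any real x maps into (a, inf)."""
    return a + np.exp(x)

def from_control_quadratic(x, a):
    """Quadratic transform: any real x maps into [a, inf)."""
    return a + x ** 2
```

However far the optimiser wanders in control space, the back-transformed parameter stays inside its physical bounds, which is exactly the property the transformation method relies on.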

Experiments
We performed a total of five experiments investigating the impact of the three parameter space restriction methods on the results of the optimisation. Table 1 provides an overview of the experiments and how they differ.
Previous experiments with this reduced version of CCDAS using no parameter transformations indicated three β parameters (8, 13, 18) that were either negative or extremely high, so in this paper, unless otherwise stated, they have all been limited between 0 and 2 using a double-bounded log transformation. The rest of the β parameters have been left untransformed (i.e. assumed Gaussian), as they did not require any bounding since their posterior values already lay between 0 and 2.
To explore the effect that parameter transformations have in the model, the treatment of the Q_10,f parameter was varied between Gaussian, log and quadratic, whilst keeping all but the three β parameters' (8, 13 and 18) treatments Gaussian (experiments PTG, PTL and PTQ). For the default penalty term optimisation, we only added a penalty term when the β parameter for crops (parameter 18) became negative. In Eq. (9), we chose D_18 = 10^4 as the penalty factor, since this is of the same order of magnitude as the cost function minimum from previous experiments, and µ_18 = 4 for the sensitivity value, as it has to be positive but we do not use 2 since we use the second derivative to calculate posterior uncertainties. The other two β parameters (8 and 13) were still transformed using the double-bounded log transformation (experiment PEN). In the default constrained optimisation, the three β parameters (8, 13 and 18) were restricted between 0 and 2 by the hard limits imposed by the constrained optimiser (experiment CONS). For each of the experiments above, four extra optimisations (together building an ensemble of five optimisations) were performed with the default prior parameter values randomly perturbed by up to 10 %. This ensures that, if most of the optimisations converge to the same minimum, we have found a robust solution.
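The ensemble of starting points can be generated as in the following sketch; the function name and the seeded random-number generator are illustrative.

```python
import numpy as np

def perturbed_starts(x0, n_members=5, max_frac=0.10, seed=0):
    """Build an ensemble of starting points: the default prior plus
    n_members - 1 copies with each element perturbed uniformly by up
    to max_frac (here 10 %) of its value."""
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    starts = [x0]
    for _ in range(n_members - 1):
        starts.append(x0 * (1.0 + rng.uniform(-max_frac, max_frac, x0.size)))
    return starts
```

Each member is then handed to the optimiser independently, and agreement of the converged minima across members is taken as evidence of robustness.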

Results and discussion
We present the results of the different experiments, with a focus on the parameter transformations, as these are the experiments that successfully located a minimum within the physical parameter space. The other two methods were not successful and so are of limited use. We commence with the constrained optimisation (CONS), then briefly discuss the penalty term experiment (PEN) and finish with the results from the parameter transformations (PTG, PTL, PTQ). An overview of the optimisation results is presented in Table 2.

CONS: here, for the default prior parameter values, the optimisation did not converge and reached the preset maximum number of iterations (5000). We did not continue this optimisation, as the number of iterations was already about 10 times the average number of iterations for the parameter transformation experiments. At this point there had been a significant reduction in the cost function, by around a factor of 550, but the bounded parameters (8, 13, 18) were exactly at their bounds of 0 or 2, which, at least in the case of 0 for a β parameter, does not make sense. Furthermore, there was not a near-zero gradient. The other four ensemble members terminated after fewer iterations (10-382) without finding a minimum, because of internal numerical problems within the optimiser. There was some reduction in the cost function, by between a factor of around 20 and 400, but there was still a very large gradient of at least 4000, and all four of these ensemble members had negative values for the soil moisture dependence parameter, κ.
Since this method of limiting the parameter space was unsuccessful for this problem, uncertainties have not been calculated.
PEN: all five optimisations converged and found a minimum in a mathematical sense (a zero, or at least close to zero, gradient). However, they did not achieve the parameter bounding, as the limited β parameter (parameter 18) was slightly negative (−0.024), contributing a penalty term of 0.8 to the cost function. As the penalty was non-zero, the experiment was not successful in our aim of limiting the parameter. The optimisation is able to offset this small penalty contribution by achieving a smaller input to the cost function from the data and the parameters. We performed further experiments adding a penalty to parameter 8 as well, with no successful bounding of these parameters. We also increased the penalty factor by a factor of 100, but still the β parameter was slightly negative. Again, uncertainties have not been calculated, due to the unsuccessful optimisation of the parameters to physically meaningful values.

PTG, PTL, PTQ: using the parameter transformations, we were able to successfully limit the parameter space. In this case, the transformation of Q_10,f did not seem to have an effect on the final value of the cost function. Of the 15 (3 × 5) optimisations, 12 converged to the same minimum in the cost function of J = 9667 (reduced from an initial value of 5 294 051 when using the prior parameter values) and took between 174 and 876 iterations. The other three (one from each of the Gaussian, log and quadratic cases) converged to a different value of 9515, but were outside of the physical parameter space, since another of the β parameters (parameter 9) was negative (−0.057), and are therefore not relevant. Having been reduced from over 10^7 to 10^−3, the gradient at the minimum of the cost function can be considered sufficiently small to indicate that a minimum has been located for all three parameter transformation experiments. We calculate posterior parameter uncertainties and also propagate these uncertainties onto the net carbon flux using a linearisation of the model (Kaminski et al., 2003).
Prior and optimised parameter values for all the parameter transformation experiments are shown in Table 3, together with prior and posterior uncertainties and the percentage reduction in uncertainty. The three β parameters that were double-bounded show their upper and lower percentiles, equivalent to one standard deviation. The global parameters behave consistently with previous studies (Ziehn et al., 2011a). The temperature dependence parameter of the fast carbon pool, Q_10,f, is somewhat reduced to 1.07 compared to its initial value of 1.5, although this change is within the range of the prior parameter uncertainty. The temperature dependence parameter of the slow pool, Q_10,s, is increased from its initial value of 1.5 to 1.82, which again lies within the one-sigma range of the parameter's prior uncertainty. The two Q_10 parameters' posterior uncertainties are lowered by more than one order of magnitude, which confirms the result of Scholze et al. (2007) that atmospheric CO2 data constrain the parameters of soil respiration relatively well.
Table 3. Controlling parameters and their initial and optimised values for each experiment, parameters' initial and posterior uncertainty (equivalent to one standard deviation), and the percentage reduction in uncertainty (relative to the upper standard deviation) after the optimisation for the parameter transformation experiments. For the three β parameters that were transformed using the double-bounded log transformation, upper and lower percentiles, equivalent to one standard deviation, are shown.

The fast pool turnover time τ_f is also within the prior uncertainty range of one standard deviation, increasing from 1.5 to 3.46, as is the soil moisture dependence parameter, κ, which is reduced from 1 to 0.57. The small posterior uncertainty of this parameter indicates that it is also well constrained by the data. The optimised value of the fraction f_s, however, is outside the prior uncertainty range, increasing from 0.2 to 0.74. It behaves similarly to previous studies (Ziehn et al., 2011a). Again, the posterior uncertainty is very small. Lastly, the offset parameter also behaves consistently with Ziehn et al. (2011a). The posterior uncertainties for all the global parameters are reduced by over 95 % compared to their prior uncertainty. This is due in part to the fact that the global atmospheric CO2 network strongly observes those parameters that act globally at all subgrid cells, and is further explained by the fact that moderately large prior parameter uncertainty values are used.

For the PFT-dependent β parameters, the uncertainty reduction varies between 5 and 90 % and is thus clearly less than for the global parameters. This is partly due to the β parameter being differentiated by PFT, which means each PFT is less well observed by the atmospheric network.
The cost function reduction of all five experiments is shown in Fig. 2 up to the first 400 iterations on a log scale. By 400 iterations, all of the parameter transformations (PTG, PTL, PTQ) and the penalty term optimisation (PEN) had converged, but the constrained optimisation (CONS) had not. The rest of the constrained optimisation's performance is shown inset in Fig. 2 on a linear scale. The majority of the cost function reduction (around two orders of magnitude) occurs within the first 30 iterations. After this, convergence is slower, but the optimisation continues until a near-zero gradient is achieved, which indicates that we have found a minimum. Only at the minimum can the inverse of the Hessian be used to estimate the posterior parameter uncertainty.
The gradient value for all five optimisations is shown in Fig. 3, again up to 400 iterations, where we can see that the parameter transformation and penalty term experiments have converged. The constrained optimisation has not been included up to its full 5000 iterations; it continues in much the same way and does not achieve a near-zero gradient. The different methods, and the minimisations from different starting points, converge differently because they are solving different problems: each change in the formulation of the cost function results in a different optimisation problem, and when an optimisation starts at a different point in the parameter space, it follows a different trajectory to a minimum.
Figure 4 shows a time series of our target output quantity, global mean NEP, along with its uncertainties. We calculated the values for all three parameter transformation experiments, but as they are all within the same numerical limits, only the Gaussian case is shown. The global mean NEP time series and their uncertainties resemble those of Ziehn et al. (2011b), because we use exactly the same set-up with identical forcing and assimilation data. We also calculated NEP using the parameter values obtained from the constrained and penalty experiments. In each year, NEP from these two cases differs little from the parameter transformations (between 2 and 4 %), so we have not added it to Fig. 4. The resulting NEP fields look very similar to each other, as do the posterior parameter values, but since the constrained and penalty experiments each yielded at least one non-physical value, we do not consider them further. The effect of these non-physical parameters does not show up in aggregated quantities such as annual global values, or even annual grid cell values, since the NEP of a grid cell is the sum of the NEP of the individual PFTs within that grid cell.
We also show the covariance between the flux uncertainties, which, as in Ziehn et al. (2011b), we express using the uncertainty correlation matrix of diagnostics, R_d, defined as

R_d^{i,j} = C_d^{i,j} / (σ_i σ_j),

where C_d^{i,j} is element (i, j) of the error covariance matrix C_d of global net CO2 exchange flux (NEP) per year, and σ_i is the posterior uncertainty of diagnostic i, obtained from the diagonal element C_d^{i,i} of the matrix C_d. For mean global NEP in the Gaussian case, this uncertainty correlation matrix is shown in Fig. 5. There are a large number of negative correlations, which is the reason for the relatively small overall uncertainty of the global mean NEP over the whole period 1979 to 2003. However, the uncertainty of the global mean NEP for a single year (as shown in Fig. 4) is substantially larger. It is worth noting that, between the different parameter transformations, there is no difference in the uncertainty correlation matrix, and thus none in the posterior parameter covariances either.
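The conversion from a covariance matrix to the correlation matrix above is a standard operation; a minimal sketch with illustrative values (not CCDAS output):

```python
import numpy as np

def uncertainty_correlation(C_d):
    """Convert an error covariance matrix C_d into the uncertainty
    correlation matrix R_d, with R_d[i, j] = C_d[i, j] / (sigma_i * sigma_j)
    and sigma_i = sqrt(C_d[i, i])."""
    sigma = np.sqrt(np.diag(C_d))
    return C_d / np.outer(sigma, sigma)

# Small synthetic 2 x 2 covariance matrix for illustration:
C_d = np.array([[4.0, -1.0],
                [-1.0, 9.0]])
R_d = uncertainty_correlation(C_d)
# Diagonal entries are exactly 1; the off-diagonal entry is -1 / (2 * 3).
```

Negative off-diagonal entries, as in this toy example, are what allow the multi-year mean uncertainty to be smaller than the single-year uncertainties.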
It seems, in general, that a lower cost function value can be achieved when the optimiser is allowed to search the whole space. For example, in one of the parameter transformation experiments, the cost function at the minimum was 9515, but one of the β parameters (parameter 9) was negative (−0.057). It is possible that the global minimum lies within the non-physical space: the model is highly non-linear with a complex, 19-dimensional parameter space, and, from a purely mathematical point of view, a smaller minimum can be found outside the physically meaningful parameter space. However, this does not constitute a solution to our optimisation problem. Another possible reason for finding a minimum outside the physically valid parameter space is that the model, as it stands, is missing or does not fully describe a relevant process, so that the optimisation has to compensate by choosing non-physical parameter values. The analysis of such non-physical parameter values can provide useful information for further model development. However, this is not always feasible, and limiting the parameter space with parameter transformations therefore seems to be the most effective way to ensure physically meaningful parameter values.

Conclusions
We systematically investigated the effects of different methods of limiting the parameter space, which is an emerging issue in parameter optimisation studies. In our simplified set-up of CCDAS, two of the methods were not successful: both the constrained and the penalty term optimisations produced values outside of the physically meaningful parameter space, and the former did not converge to a minimum at all. Parameter transformations, however, were successful in locating an optimal solution within the limits. All of the physically meaningful ensemble members converged to the same minimum, so we can be confident that this is the global minimum. We tested two parameter transformations against standard scaling and found that these three experiments all reached the same minimum, indicating that the transformation does not alter the optimisation problem. This is in contrast to the study of Koffi et al. (2012). However, we note that Koffi et al. used a more complex system involving 57 parameters, compared to our 19. Furthermore, in Koffi et al. (2012), the optimisations did not converge to a minimum and were stopped after a certain reduction in the cost function value, without obtaining a near-zero gradient. In our experiments, all the optimisations with parameter transformations converged to a minimum with a final gradient approaching zero. A future experiment of interest may involve systematically investigating parameter transformations within the fully complex model.
In our experience, we would therefore recommend parameter transformations as the most suitable solution to the problem of limiting parameter spaces. As the transformations are applied outside of the optimisation routine, this is a good general method for any problem involving restricted parameter sets. For CCDAS, the quadratic transformation is slightly preferred over the log transformation, as it may require fewer iterations to achieve convergence.

Figure 2 .
Figure 2. Cost function value for all five experiments in log scale for the first 400 iterations and cost function value for constrained optimisation from 400 to 5000 in linear scale (dark blue: Gaussian, green: log, red: quadratic, light blue: constrained, pink: penalty).

Figure 3 .
Figure 3. Gradient value for all five experiments in log scale up to 400 iterations (dark blue: Gaussian, green: log, red: quadratic, light blue: constrained, pink: penalty).

Figure 4 .
Figure 4. Time series of annual global mean net ecosystem productivity (NEP) with posterior uncertainty.

Figure 5 .
Figure 5. Uncertainty correlation matrix of global mean NEP.

Table 1 .
Experiments performed to investigate the impact of different methods of limiting the parameter space on the optimisation.

Table 2 .
Values of the cost function, the contributions from data and parameters, the gradient, the number of iterations to achieve convergence and how many optimisations from the ensemble of five converged to this value for the different parameter transformations of Q 10,f and the constrained and penalty term experiments.