the Creative Commons Attribution 4.0 License.

Special issue: The Transport Matrix Method (TMM) for ocean biogeochemical...

**Methods for assessment of models**| 05 May 2022

# A derivative-free optimisation method for global ocean biogeochemical models

Sophy Oliver Coralia Cartis Iris Kriest Simon F. B Tett and Samar Khatiwala

^{1},

^{2},

^{3},

^{4},

^{1}

**Sophy Oliver et al.**Sophy Oliver Coralia Cartis Iris Kriest Simon F. B Tett and Samar Khatiwala

^{1},

^{2},

^{3},

^{4},

^{1}

^{1}Department of Earth Sciences, University of Oxford, South Parks Road, Oxford OX1 3AN, UK^{2}Mathematical Institute, University of Oxford, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG, UK^{3}GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel, Düsternbrooker Weg 20, 24105 Kiel, Germany^{4}School of GeoSciences, University of Edinburgh, Crew Building, Alexander Crum Brown Road, Edinburgh EH9 3FF, UK

^{1}Department of Earth Sciences, University of Oxford, South Parks Road, Oxford OX1 3AN, UK^{2}Mathematical Institute, University of Oxford, Radcliffe Observatory Quarter, Woodstock Road, Oxford OX2 6GG, UK^{3}GEOMAR Helmholtz-Zentrum für Ozeanforschung Kiel, Düsternbrooker Weg 20, 24105 Kiel, Germany^{4}School of GeoSciences, University of Edinburgh, Crew Building, Alexander Crum Brown Road, Edinburgh EH9 3FF, UK

**Correspondence**: Sophy Oliver (sophy.oliver@oriel.ox.ac.uk)

**Correspondence**: Sophy Oliver (sophy.oliver@oriel.ox.ac.uk)

Received: 28 May 2021 – Discussion started: 30 Sep 2021 – Revised: 14 Mar 2022 – Accepted: 27 Mar 2022 – Published: 05 May 2022

The skill of global ocean biogeochemical models, and the earth system models in which they are embedded, can be improved by systematic calibration of the parameter values against observations. However, such tuning is seldom undertaken as these models are computationally very expensive. Here we investigate the performance of DFO-LS, a local, derivative-free optimisation algorithm which has been designed for computationally expensive models with irregular model–data misfit landscapes typical of biogeochemical models. We use DFO-LS to calibrate six parameters of a relatively complex global ocean biogeochemical model (MOPS) against synthetic dissolved oxygen, phosphate and nitrate “observations” from a reference run of the same model with a known parameter configuration. The performance of DFO-LS is compared with that of CMA-ES, another derivative-free algorithm that was applied in a previous study to the same model in one of the first successful attempts at calibrating a global model of this complexity. We find that DFO-LS successfully recovers five of the six parameters in approximately 40 evaluations of the misfit function (each one requiring a 3000-year run of MOPS to equilibrium), while CMA-ES needs over 1200 evaluations. Moreover, DFO-LS reached a “baseline” misfit, defined by observational noise, in just 11–14 evaluations, whereas CMA-ES required approximately 340 evaluations. We also find that the performance of DFO-LS is not significantly affected by observational sparsity, however fewer parameters were successfully optimised in the presence of observational uncertainty. The results presented here suggest that DFO-LS is sufficiently inexpensive and robust to apply to the calibration of complex, global ocean biogeochemical models.

Ocean biogeochemical models are a key tool in understanding the cycling of nutrients and carbon in the ocean.
They are used to quantify the uptake of greenhouse gases such as CO_{2} emitted by human activity, of which
the ocean has absorbed roughly a third since the start of the industrial revolution (Khatiwala et al., 2009; DeVries, 2014),
as well as assess the impact of increasing concentrations of greenhouse gases on ocean ecosystems. Such
models are also an important component of the earth system models (ESMs) used to project future climate
change. In global ocean biogeochemical models the complex interactions between biota, nutrients, oxygen and
carbon are typically heavily parameterised. The skill of such models can be improved by either subjective manual or systematic
tuning of the parameter values against observations. The latter uses numerical optimisation algorithms which seek to find
the minima of a “misfit function” – often defined as the root mean squared difference between the model and
observations – within the parameter space. However, biogeochemical models are seldom
subjected to such tuning because of their large computational expense and the long spin-up time required for
chemical and biological tracers to reach equilibrium (Wunsch and Heimbach, 2008; Khatiwala et al., 2012). Moreover, optimisation
algorithms must be able to navigate a generally irregular misfit landscape. Efficient and
robust optimisation methods are thus of considerable interest to the ocean biogeochemical and broader climate
modelling community.

Previous ocean biogeochemical calibration studies have more frequently been carried out on computationally less expensive zero-dimensional (e.g. Kidston et al., 2013) and 1-dimensional models (e.g. Chen and Smith, 2018; Xiao and Friedrichs, 2014; Ward et al., 2010; Spitz et al., 1998), regional models (e.g. Melbourne-Thomas et al., 2015; Zhao et al., 2005), or steady-state global models (e.g. Kwon and Primeau, 2006, 2008). However, with the aid of fast “offline” circulation schemes, such as those using transport matrix methodology (e.g. Khatiwala et al., 2005; Li and Primeau, 2008) which can be applied to time-dependent biogeochemical models, more recently, complex global ocean biogeochemical models have also begun to be systematically optimised to observations (e.g. Kriest et al., 2017, 2020; Sauerland et al., 2019; Niemeyer et al., 2019; Kriest, 2017).

Optimisation methods can generally be split into two broad categories. Derivative-based algorithms such as Gauss–Newton (Hartley, 1961) use derivatives within the parameter space to locate minima. The calculation of derivatives, which can be undertaken using finite differences or automatic differentiation and adjoints (Griewank and Walther, 2008), can be prohibitively expensive in some cases, such as when evaluating the misfit function is computationally costly or noisy (chapters 8 and 9 of Nocedal and Wright, 2006). By contrast, derivative-free algorithms (Conn et al., 2009) may require fewer evaluations per iteration and are typically better adapted to handle noisy misfit functions. An example of the latter is “covariance matrix adaptation evolution strategy” (CMA-ES; Hansen, 2016). CMA-ES was applied by Kriest et al. (2017) to optimise six parameters within the Model of Oceanic Pelagic Stoichiometry (MOPS; Kriest and Oschlies, 2015), by minimising a globally averaged misfit incorporating annual mean dissolved phosphate, nitrogen and oxygen. This constituted one of the first successful attempts at systematic tuning of a relatively complex global biogeochemical model. CMA-ES was subsequently used by Sauerland et al. (2019) for multiobjective calibration of MOPS by including oxygen minimum zones as a misfit metric, and by Kriest et al. (2020) who compared the influence of different general circulation models on parameter optimisation.

While the development and application of CMA-ES is an important step forward, its evaluation cost per iteration, as well as overall computational cost, is prohibitively expensive for routine use. In Kriest et al. (2017), for example, the misfit function had to be evaluated at least 950 times to achieve a sufficiently low misfit. As each evaluation requires running the biogeochemical model to equilibrium (3000 years in that study), this would be prohibitively expensive for the more complex models run at resolutions typical of the current generation of ESMs. Here, we explore the application of another, computationally less expensive algorithm, “derivative-free optimisation by least squares” (DFO-LS; Cartis et al., 2019), to the same problem set-up as in Kriest et al. (2017). We first compare the performance of CMA-ES and DFO-LS to optimise six biogeochemical parameters against the output of a reference run of MOPS where the parameters are known. We examine in this “twin” experiment the ability of the algorithms to recover the true parameters, and the computational cost incurred. True oceanic observations contain observational uncertainty; therefore we also investigate the impact of optimising in the presence of observational uncertainty by adding noise to the synthetic observations. Lastly, we evaluate the performance of DFO-LS when given sparse data. Sparse scattered oceanic observations are commonly mapped onto a regular grid using methods such as objective interpolation, introducing significant error, especially in regions such as the Southern Ocean with poor data coverage. The structure of the paper is as follows: Section 2 describes the methodology, Sect. 3 the results, Sect. 4 the discussion and Sect. 5 the conclusions.

## 2.1 Ocean biogeochemical model

The Model of Oceanic Pelagic Stoichiometry (MOPS-2.0) is a global ocean biogeochemical model, which simulates the cycling of nine biogeochemical tracers, namely dissolved inorganic and organic phosphate, dissolved inorganic nitrate, dissolved oxygen, phytoplankton, zooplankton, and detritus (Kriest and Oschlies, 2013, 2015), with the possibility to include the carbon cycle. MOPS is coupled to the transport matrix method (TMM; Khatiwala et al., 2005; Khatiwala, 2007, 2018), an efficient numerical method for “offline” simulation of biogeochemical tracers. In this study we use monthly mean transport matrices and other physical forcing fields (including temperature, salinity, sea ice and winds) derived from a relatively coarse resolution (2.8^{∘} ×2.8^{∘} × 15 levels)
configuration of MITgcm (Marshall et al., 1997) driven by climatological momentum, heat and freshwater fluxes
(Dutkiewicz et al., 2005).

## 2.2 Biogeochemical model parameters

The behaviour of MOPS is controlled by several parameters, of which we have chosen the same six parameters to consider for calibration as chosen in the previous optimisation study by Kriest et al. (2017). The detailed definitions and possible ranges of these parameters
are described in that paper. Briefly, four of these parameters are mainly restricted to the epipelagic and mesopelagic zones
of the ocean, as they involve phytoplankton and zooplankton. *I*_{C} and *K*_{PHY} are the phytoplankton half-saturation constants for light absorption and phosphate uptake, respectively. *μ*_{ZOO} is the zooplankton maximum grazing rate and *k*_{ZOO} the zooplankton quadratic mortality rate. The remaining two parameters influence the remineralisation and sinking of particulate organic matter (POM). ${R}_{{\mathrm{O}}_{\mathrm{2}}:\mathrm{P}}$ is the ratio of oxygen consumption to phosphate release during remineralisation when oxygen is available, and *b*^{∗} is the exponent of the “Martin curve”, a power law function that describes the attenuation of POM flux with depth (Martin et al., 1987).

## 2.3 Parameter sensitivities

Local sensitivities for each of the six parameters were calculated at one location within parameter space, whereby only the parameter whose sensitivity was to be calculated was perturbed from its target value by 10 % of its range while the other parameter values remained constant.

## 2.4 Optimisation algorithms

Optimisation algorithms iteratively evaluate the misfit between model and observations, then vary the model parameter inputs with the aim of finding a lower misfit. Here, every evaluation of the misfit requires running the biogeochemical model to equilibrium (3000 years), then calculating the misfit between the model outputs and real (or synthetic) observations of dissolved oxygen, phosphate and nitrate. In general the misfit “landscapes” of biogeochemical models tend to be nonlinear, as found by Kriest et al. (2017) for example, who converged to multiple local minima. “Twin” experiments are used to determine if an optimiser can find the global minimum within the misfit function landscape, whereby the misfit is calculated between the model outputs and synthetic observations. The synthetic observations are created by the model with a known parameter configuration; therefore the global minimum (is zero) and optimal parameter values are known. We compare the performance of two different optimisation algorithms, by using twin experiments.

### 2.4.1 CMA-ES

The Covariance Matrix Adaptation Evolution Strategy (CMA-ES; Hansen, 2016) is a widely used stochastic evolutionary algorithm, for use on a “black box” misfit function. By design, CMA-ES is an unconstrained solver, that is, parameters are not restricted to be within specified bounds. To ensure that parameters lie within reasonable bounds, a penalty score is added to the misfit when any parameter value goes outside of their specified range, as also done by Kriest et al. (2017). During each iteration, a population size of *λ* biogeochemical parameter vectors are sampled from a multi-variate normal distribution, which is fully described by a mean and a positive definite matrix of covariances. CMA-ES then requires the misfit function to be evaluated at these *λ* locations in the parameter space. The results of these are used to update the mean and covariance of the multi-variate normal distribution, before another *λ* biogeochemical parameter vectors are sampled for the next iteration. With each iteration the population should be guided towards areas of the parameter landscape which provide lower expected misfit values, aiming to converge on the parameter configurations which provide the best misfits. This process has been well illustrated by Kriest et al. (2017, see their Fig. 2), who previously used CMA-ES to optimise MOPS. CMA-ES carries out a global search of the parameter space; therefore it seeks to find the minimum over the parameter space. In order to achieve this, CMA-ES can require thousands of function evaluations (e.g. 950–3460 required by Kriest et al., 2017). The CMA-ES code used in this study, which is based on the (*μ*/*μ*_{W},*λ*)-CMA-ES algorithm of Hansen (2016), is summarised in Appendix A. The optimisation code was sourced from the Supplement by Kriest et al. (2017), with some editing to make it compatible with our chosen optimisation framework (see Code and data availability section). As in the previous Kriest et al. (2017) study, we use a population size *λ* of 10; i.e. in each iteration of CMA-ES, the misfit function is evaluated 10 times.

### 2.4.2 DFO-LS

Derivative-free optimisation using least squares (DFO-LS) is an iterative algorithm for minimising a function *f*(** x**) (Cartis et al., 2019), where

**is the**

*x**n*-dimensional vector of parameters, each of which is constrained within specified bounds. DFO-LS can take into account individual terms of the misfit function and use their structure to improve convergence. Mathematically, DFO-LS solves the nonlinear least-squares problem:

where *D* is a bounded domain of ℝ^{n}, and *r*_{i}(** x**) denotes individual terms in the misfit function. DFO-LS starts at an initial location within the parameter space and then moves through the space to provably find a local minimum (see Appendix A.2 in Cartis et al., 2019, for convergence and complexity rates). The algorithm is illustrated in Fig. 1 and summarised in Appendix B.

DFO-LS must be given a starting location within the parameter space from which to initialise. In the initial iteration of DFO-LS, the misfit function is evaluated at the starting location and at *n* additional locations nearby (where *n* is the number of parameters to be optimised), with their proximity determined by DFO-LS settings (see Table B1). In subsequent iterations, typically only one function evaluation is needed, and often only a handful are needed to achieve significant misfit reduction.

Using this set of evaluated points, DFO-LS creates a quadratic approximation to the underlying true (unknown) misfit function (Cartis et al., 2019) and calculates the minimum of this function within a “trust region” centred around the starting point. The true misfit function is then evaluated at this location. If it is found to be worse than the misfit at the existing *n*+1 points, it is rejected. The trust region is shrunk and the procedure is repeated. On the other hand, if it is found to be lower than the best of the *n*+1 points then it is accepted, and the point corresponding to the highest misfit amongst the previous *n*+1 points is discarded. A new quadratic approximation is calculated for this *n*+1 set of points, and the procedure is repeated. Thus, at any iteration DFO-LS keeps track of *n*+1 points in parameter space and the point with the lowest misfit is considered as that iteration's best set of parameters. The algorithm is terminated based on three specified criteria: (1) the maximum allowed number of function evaluations is exceeded, (2) the trust region radius is shrunk below a specified size, and (3) misfit reduction progress is identified as being too slow.

Unlike CMA-ES, DFO-LS is more of a “local” optimisation method. However, there is strong numerical evidence from the derivative-free optimiser Py-BOBYQA, upon which DFO-LS is based (Cartis et al., 2021), that it is able to find global minima. To increase the likelihood of finding the global minimum, DFO-LS can either be manually re-initiated from different starting locations in the parameter space, or automatically “restarted” once it determines that the reduction in the misfit is progressing too slowly. During a restart the trust region expands, allowing DFO-LS to search for points potentially outside the local minimum it may be trapped in, and move towards a lower minimum elsewhere. This can be done by either a “hard” restart, whereby the (expensive) misfit function is re-evaluated at *n*+1 new locations within the expanded trust region, or by a “soft” restart, whereby DFO-LS only “shifts” some of the current *n*+1 points in parameter space to geometry-improving points (Cartis et al., 2019). The former is more computationally expensive; therefore we do not use it here, although soft restarts are allowed. To increase confidence that DFO-LS has found the global minimum, we also initiate from multiple starting points.

## 2.5 Misfit functions

Every evaluation of our misfit requires running the biogeochemical model for 3000 years before calculating the misfit between the model outputs and synthetic observations. While both CMA-ES and DFO-LS minimise a single misfit function, DFO-LS can exploit the structure of the misfit function. Thus, if the misfit is defined as per Eq. (1), we only provide CMA-ES with *f*(** x**) whereas the individual

*r*

_{i}(

**) are supplied to DFO-LS. There is no maximum suggested value for**

*x**d*, the number of

*r*

_{i}terms; therefore the misfit at every grid point within the model domain could be provided to DFO-LS. However, many of the individual

*r*

_{i}misfits would be physically close to each other in the model and therefore will respond similarly to perturbations in the biogeochemical parameters being optimised, which will result in a heavier weighting to this location of the ocean model. To avoid this, we define

*r*

_{i}to take into account the spatial structure of the misfit by partitioning the ocean into previously established biome regions of similar ocean biogeochemical properties (Henson et al., 2010; Weber et al., 2016, provided by Raffaele Bernardello, Barcelona Supercomputing Centre, personal communication, 2018), several of which were further split by depth at 1000 m (see Fig. 2) for a total of 19 regions. For every region

*j*, we further calculate a misfit for each of the three tracers

*q*(phosphate, nitrate, oxygen) used in the optimisation. The objective

*f*(

**) is thus composed of $\mathrm{19}\times \mathrm{3}=\mathrm{57}$ terms of the following form:**

*x*
where *m*_{qi}(** x**) is the model solution with parameters

**at grid point**

*x**i*for tracer

*q*and

*o*

_{qi}the corresponding observation (the synthetic observations provided by a reference run of MOPS). The misfit is normalised by the volume-weighted mean tracer concentration for that region and weighted, first, by individual grid point volumes

*V*

_{i}relative to the volume

*V*

_{j}of region

*j*and, second, by the region's total volume relative to the global ocean volume

*V*

_{global}. Real oceanic observations have a degree of uncertainty associated with them due to spatio-temporal oceanic processes, e.g. from small-scale processes such as unresolved eddies. To account for this we add a noise term

*ϵ*

_{iq}, which is the added noise due to uncertainty associated with tracer

*q*for every grid box

*i*in the model. The total global misfit ${f}_{\mathrm{T}}^{\mathit{\u03f5}}\left(\mathit{x}\right)$ is then defined as

The total misfit function is broadly similar to Kriest et al. (2017), with the main difference being the incorporation of the 19 biome regions.

The non-noisy equivalent of these misfit terms are *r*_{qj} and *f*_{T} as in Eqs. (2) and (3) with *ϵ*=0. We also define “baseline” misfits ${r}_{qj}^{\mathrm{base}}$ and ${f}_{\mathrm{T}}^{\mathrm{base}}$, which are the misfits due to the noise alone in the special case where the model outputs equal the observations (in Eq. 2: *m*_{qi}=*o*_{qi}). ${f}_{\mathrm{T}}^{\mathrm{base}}$ give an indication of the termination criteria when optimising the model to real noisy oceanic observations, as optimising below this threshold would serve no useful purpose.

To specify a realistic noise field we take the standard deviation variable provided in the World Ocean Atlas database (WOA18 Garcia et al., 2018a, b). Since our misfit is defined with respect to annual mean data we require an annual mean standard deviation without the variability of the seasonal cycle. To do so we take the numerical mean (weighted by number of observations) of the monthly standard deviations reported in the WOA18 dataset for the upper 800 m (phosphate and nitrate) or 1500 m (oxygen), and the annual standard deviation below those depths. These standard deviations fields were linearly interpolated onto the model grid and then multiplied by three different Gaussian noise fields to create three separate noise (*ϵ*) realisations. The baseline misfit terms ${r}_{qj}^{\mathrm{base}}$ and ${f}_{\mathrm{T}}^{\mathrm{base}}$ were calculated as an average over these realisations.

As mentioned above, the “observations” in this study are from a reference simulation of MOPS, hereafter referred to as MOPS-ref, run with the following parameter values: ${R}_{{\mathrm{O}}_{\mathrm{2}}:\mathrm{P}}=\mathrm{170}$ mmol O_{2}: mmol P, *I*_{C}=24 W m^{−2}, *K*_{PHY}=0.03125 mmol P m^{−3}, *μ*_{ZOO}=2 d^{−1}, *k*_{ZOO}=3.2 (mmol P m^{−3})^{−1} d^{−1} and ${b}^{\ast}=\mathrm{0.858}$ (see Table C1).

## 2.6 Optimisation experimental design and solver settings

In this study we seek to (1) compare the performance of CMA-ES and DFO-LS on noise-free observations, (2) investigate DFO-LS's performance on noisy observations, and (3) investigate the impact of sparse observations on the ability of DFO-LS to recover the true parameters. In order to do so we carried out the following series of experiments (see Table 1 for the corresponding experiment labels):

*Noise-free experiments.*
In the noise-free experiments we attempted to recover all six parameters. For this a single run of CMA-ES was performed (labelled C_SMOOTH). For DFO-LS two experiments were carried out (D_SMOOTH_{1} and D_SMOOTH_{2}), starting from two different locations in parameter space that were chosen to be relatively far from the target parameters. The parameter values for these and all other experiments are listed in Table C1.

Both CMA-ES and DFO-LS are controlled by various solver settings. For CMA-ES the main ones are the number of sequential generations and the population size. As per Kriest et al. (2017) we set these to 200 and 10, respectively. The solver settings used by DFO-LS are summarised in Table B1. Of the noise-free experiments D_SMOOTH_{1} and D_SMOOTH_{2}, the former had DFO-LS settings regarding trust region management (**tr_radius**) which are more suitable for a noisy misfit function, while the latter is more suitable for a smooth misfit function. Therefore, D_SMOOTH_{1} and D_SMOOTH_{2} vary both in starting values and trust region management. As D_SMOOTH_{1} was slightly more successful, the trust region management settings were set to be more suitable for a noisy misfit function in all subsequent experiments.

*DFO-LS experiments with observational uncertainty.*
To understand optimisation performance in the presence of observational uncertainty, noise was added to the reference observations (see Sect. 2.5). Three such optimisation runs, each with a different noise realisation, were carried out with DFO-LS (D_NOISY_{1}, D_NOISY_{2}, D_NOISY_{3}) starting from the same location in parameter space, to minimise the noisy misfit function ${f}_{\mathrm{T}}^{\mathit{\u03f5}}$. The goal was to see if DFO-LS could recover all six of the MOPS-ref target parameter values.

*DFO-LS experiments with sparse observations.*
There are large areas of the ocean which have not been sampled adequately or at all (e.g. Fig. 2). While it is possible to fill in the gaps in the data using objective interpolation methods, this might not always work well in the presence of large gradients. In a last set of experiments we therefore compared how DFO-LS performs in the presence of data sparsity (D_SPARSE_{1} and D_SPARSE_{2}), by only using observations at model grid points for which the corresponding locations in WOA18 contain data, with its corresponding performance in the absence of data sparsity (D_SMOOTH_{1} and D_SMOOTH_{2}).

## 3.1 Parameter sensitivities

To provide insight into why some parameters were tuned better or worse in the following optimisation experiments, the sensitivity of the misfit function to an individual perturbation in each parameter has been calculated and shown in Fig. 3. The greatest change in the misfit was caused by perturbing *b*^{∗} by 10 % of its range, followed by *I*_{C}. The parameter with the lowest influence on the misfit at this local point in parameter space was *K*_{PHY}, in which a 10 % perturbation caused a misfit change of only $\mathrm{4.1}\times {\mathrm{10}}^{-\mathrm{6}}$.

## 3.2 Noise-free experiments

The results for all twin optimisation experiments are summarised in Figs. 4 and 5, and in Appendix C (Tables C1 and C2), which show the starting and optimised parameter values, and parameter recovery information. In the subsequent sections we then plot both the global misfit and parameter values for every function evaluation throughout each individual optimisation experiment (Figs. 6–13). During one CMA-ES iteration we evaluate the misfit function 10 times (the population size). Therefore for CMA-ES we plot the minimum (best) and maximum misfits, and we plot the parameter values corresponding to the best misfit, and the minimum and maximum parameter values of each population.

To both reiterate how DFO-LS works and fully explain the DFO-LS figures, we briefly describe the optimisation process in terms of expected misfit reduction and parameter trajectories. First, DFO-LS evaluates the misfit function *n*+1 times near to the chosen starting point; therefore in the first seven evaluations we do not expect a misfit reduction. After these initial evaluations, DFO-LS attempts to minimise the misfit function and there will be both successful evaluations (the resulting misfit is lower than previously found in the optimisation) and unsuccessful ones (the misfit is not lower). There may also be restarts, directly after which unsuccessful evaluations are common as DFO-LS perturbs the parameter values to get out of a possible local minimum. Therefore on every DFO-LS figure we have plotted the misfit or parameter values for every evaluation (both successful and unsuccessful) as scattered points, and successful ones in a solid line.

On every figure of total global misfits the expected baseline misfit (${f}_{\mathrm{T}}^{\mathrm{base}}$; see Sect. 2.5) is also plotted, below which any misfit reduction is within observational noise levels. On every parameter trajectory plot the “recovery zone” is also shown. This indicates the range of parameter values within ±5 % of the target value, normalised by the total range (upper bound minus lower bound) for that parameter. We consider a parameter to have been “recovered” by a certain number of evaluations, when all subsequent parameter values corresponding to successful evaluations remain within this recovery zone.

*CMA-ES experiment.*
Figure 6 shows that during the optimisation C_SMOOTH by CMA-ES the global misfit decreases significantly from 10^{−1} to $\approx {\mathrm{10}}^{-\mathrm{4}}$ within the first 500 function evaluations. Subsequently progress slows down as the misfit is reduced by only 1 more order of magnitude over the next 1000 evaluations. Progress then significantly improves, with the misfit decreasing from $\approx {\mathrm{10}}^{-\mathrm{5}}$ to 10^{−9} within the final 500 evaluations. Note that the spikes in the maximum global misfit near the 1650th evaluation was due to the added penalty factor when one of the parameter values in this population had a value just outside of its allowed range. While this experiment did not include noise, we note that CMA-ES required 309 evaluations to reach the baseline misfit, beyond which any misfit reduction would have been within observational noise levels.

Figure 7 shows how the six parameters were optimised towards the MOPS-ref target parameter values by CMA-ES. The targets were found relatively quickly within the initial 500 evaluations for the parameters ${R}_{{\mathrm{O}}_{\mathrm{2}}:\mathrm{P}}$, *I*_{C} and *b*^{∗}, corresponding to the initial fast misfit reduction previously shown in Fig. 6. The *μ*_{ZOO} and *k*_{ZOO} targets were found next after approximately 1000 evaluations, after which the optimiser began tuning *K*_{PHY} towards its target until it located after 1700 evaluations. If C_SMOOTH had been terminated once the observational noise level was reached after 309 evaluations, ${R}_{{\mathrm{O}}_{\mathrm{2}}:\mathrm{P}}$, *I*_{C} and *b*^{∗} would have been optimised to their MOPS-ref values relatively well, while *K*_{PHY}, *μ*_{ZOO} and *k*_{ZOO} would still be far from their target values.

*DFO-LS experiments.*
To compare the performance of DFO-LS with CMA-ES we carried out two optimisation experiments with DFO-LS (D_SMOOTH_{1} and D_SMOOTH_{2}) starting from two different locations in parameter space, with differing parameters controlling the DFO-LS trust region shrinking speed. Optimisation D_SMOOTH_{1} had slower trust region shrinking settings to allow it to better handle an irregular misfit function. Figure 8 shows the comparison between both experiments' reduction of the global misfit. In both cases there was rapid initial misfit decrease from near 10^{−1} to 10^{−3} within 30 model evaluations. Optimisation D_SMOOTH_{2} showed slightly slower misfit reduction, needing 35 evaluations to reach the baseline misfit, while D_SMOOTH_{1} only required 20 to reach the baseline, beyond which any misfit reduction is within observational noise levels. In both D_SMOOTH_{1} and D_SMOOTH_{2} DFO-LS managed to reduce the misfit to below 10^{−5} within 45–49 evaluations and then initiated restarts to reduce it further. As D_SMOOTH_{1} performed slightly better than D_SMOOTH_{2}, especially between evaluations 15–45, D_SMOOTH_{1} trust region shrinking settings were used as defaults for all subsequent experiments.

Figure 9 shows how the six parameters were optimised towards the MOPS-ref target parameter values by D_SMOOTH_{1} and D_SMOOTH_{2}. In both experiments within the first 30 evaluations ${R}_{{\mathrm{O}}_{\mathrm{2}}:\mathrm{P}}$, *I*_{C}, *k*_{ZOO} and *b*^{∗} were optimised to relatively close to their targets, and *μ*_{ZOO} within the first 45 evaluations. The parameter the misfit function was least sensitive to, *K*_{PHY}, was successfully optimised by D_SMOOTH_{1}; however, was not successfully optimised at all by D_SMOOTH_{2}. If D_SMOOTH_{1} and D_SMOOTH_{2} had been terminated once reaching the noise baseline, after 20 and 35 evaluations respectively, ${R}_{{\mathrm{O}}_{\mathrm{2}}:\mathrm{P}}$, *I*_{C}, *b*^{∗} and *k*_{ZOO} would have been optimised to their MOPS-ref values relatively well.

## 3.3 DFO-LS experiments with observational uncertainty

To assess the impact of observational uncertainty we carried out three experiments in which DFO-LS was initialised from the same starting location in parameter space, but with three different realisations of random noise added to the observations (see Sect. 2.6). As seen in Fig. 10 DFO-LS managed to reduce the misfit to very close to the average baseline misfit within 30 evaluations with a reduction in misfit from $\sim {\mathrm{10}}^{-\mathrm{1}}$ to $\sim {\mathrm{10}}^{-\mathrm{3}}$. Closer to the end of the optimisation runs, restarts were initiated to encourage further misfit reduction, hence the large variations in misfit. Figure 11 shows that the initial misfit reduction corresponds to improved values for the parameters ${R}_{{\mathrm{O}}_{\mathrm{2}}:\mathrm{P}}$, *I*_{C}, *K*_{PHY} and *b*^{∗}. DFO-LS seems to compensate for the noise by increasing the values for the parameters *μ*_{ZOO} and *k*_{ZOO}.

## 3.4 DFO-LS experiments with sparse observations

In a final set of experiments we examine whether DFO-LS is able to successfully optimise MOPS given a sparse set of observations (see Sect. 2.6). The experiments D_SPARSE_{1} and D_SPARSE_{2} were initialised from the same location in parameter space as D_SMOOTH_{1} and D_SMOOTH_{2}, respectively, but the former were optimised using observations sub-sampled at grid points corresponding to locations in the un-interpolated WOA18 database. In these experiments no noise was added to the observations. Figure 12 shows that the two optimisations using full observations (D_SMOOTH_{1} and D_SMOOTH_{2}) converged to slightly lower misfits than when using sparse observations. Figure 13 shows that D_SMOOTH_{1} successfully recovered all six parameters within 42 evaluations, while D_SPARSE_{1} only successfully recovered ${R}_{{\mathrm{O}}_{\mathrm{2}}:\mathrm{P}}$, *I*_{C} and *b*^{∗} throughout the optimisation. The above results would indicate a poorer optimisation when using sparse observations; however, when starting from a different location in parameter space, D_SMOOTH_{2} successfully recovered five parameters within 44 evaluations, while D_SPARSE_{2} recovered the same five after only 26 evaluations. This suggests that even with sparse observations it is possible to successfully optimise a global ocean biogeochemical model such as MOPS.

## 4.1 CMA-ES vs. DFO-LS optimisation performance

Our comparison of the two optimisation algorithms shows that DFO-LS could recover all six target parameter values within ∼40 evaluations of MOPS, while CMA-ES achieved the same goal within ∼1700 evaluations. By “recover” we mean optimised to within ±5 % (normalised by the parameter range) of the target value. DFO-LS reduced the misfit to below the observational uncertainty threshold within 20–35 evaluations, while CMA-ES required 309 evaluations. DFO-LS is thus significantly more efficient for this particular problem and may, in general, be more practical for optimising more than a small handful of parameters. However, we note that the multiple evaluations CMA-ES requires can be run in parallel. In contrast, DFO-LS, except for the initial *n*+1 evaluations, runs sequentially.

CMA-ES is a single-objective optimiser, while DFO-LS can use information from multiple misfit values instead of just one. Therefore it can exploit more information to allow for a faster reduction in the misfit. Neither algorithm can completely guarantee a global optimum solution, although CMA-ES carries out a more global search than DFO-LS. There is significant evidence DFO-LS can find the global optimum (Cartis et al., 2021), but to increase confidence in the final solution it can be combined with a globalising method such as starting from different points in parameter space or using the DFO-LS restart functionality.

Both methods struggled with one of the parameters *K*_{PHY}, due to the misfit function's low sensitivity to this parameter (as found by perturbing the parameter values in each direction and computing the gradient). DFO-LS had not begun to tune this parameter for one of the experiments (D_SMOOTH_{2}) before we terminated it at a maximum of 70 evaluations, although it did find *K*_{PHY} when initiated from a different starting point (D_SMOOTH_{1}). CMA-ES also had difficulty in tuning *K*_{PHY} and only started optimising this parameter after all the other parameters were recovered at ∼1200 evaluations. The maximum number of DFO-LS evaluations was set to 70 as it is a sequential algorithm; therefore it was impractical to allow too many more evaluations. Had it been allowed to run longer the expectation is it would begin tuning *K*_{PHY} once the other five were sufficiently tuned, as was the case with CMA-ES. However, the computational expense of continuing DFO-LS to recover *K*_{PHY} was deemed too costly, particularly due to the fact that this parameter has so little influence on the misfit function and therefore did not impact the successful misfit reduction achieved by DFO-LS. This low sensitivity to *K*_{PHY} was also seen by Kriest et al. (2017), who determined the surface observations contribute too little to the total misfit, rendering the misfit function insensitive to perturbations in parameters that mainly influence the surface ocean (e.g. *K*_{PHY}). To help overcome this in future work, one could put more weighting on surface data when formulating the misfit.

## 4.2 Calibrating to uncertain observations

Real oceanic observations come with associated uncertainty due to measurement error, temporal variations such as seasonal and diurnal cycles, and meso-scale variability due to factors such as eddies and the movement of fronts. Here we have studied how this uncertainty raises the base of the misfit function, below which any optimisation of the biogeochemical model would be within the uncertainty level. We determined this baseline for the misfit function (or termination threshold) using the standard deviations of the observational data; however, others have defined it as the global optimum of a surrogate formulation of the biogeochemical model (Sauerland et al., 2017). In the present case and with the chosen set of oceanic observations, the model was significantly optimised before reaching levels of observational uncertainty, particularly due to optimisation of the parameters which the model is most sensitive to, as was determined by perturbing each parameter while holding the others fixed and calculating the misfit. In this case it was *I*_{C} (the phytoplankton half-saturation for light) and *b*^{∗} (the increase in particle sinking speed with depth). Somewhat surprisingly, a parameter the model is less sensitive to, ${R}_{{\mathrm{O}}_{\mathrm{2}}:\mathrm{P}}$ (the ratio of oxygen consumption to phosphate release during remineralisation), was also well optimised before reaching the baseline. Despite the low sensitivity, possibly caused by narrow parameter bounds, the high optimisation potential by this parameter may be due to the fact that the misfit function includes both oxygen and phosphate. It could also be due to the fact that this parameter has a non-local effect, as it influences the flux of oxygen and phosphate to the deeper ocean, hence to ocean basins further along the “conveyor belt”. This is also the case for *b*^{∗} (Kwon and Primeau, 2006; Kriest et al., 2012) but even more so as it also influences the vertical flux of all three of the tracers. In order to optimise less sensitive parameters before reaching noise levels one could introduce more metrics into the misfit calculation to help constrain these parameters, for example phytoplankton and zooplankton data, or additional oxygen constraints, such as the location of oxygen minimum zones as done by Niemeyer et al. (2019).

## 4.3 Calibrating to sparse observations

We also investigated the ability of DFO-LS to optimise MOPS in the presence of sparsity in the observational data. The results shown here suggest that there is no significant difference in the performance of DFO-LS when tuning to data at every grid point versus a subset of grid points, in line with earlier findings by Kriest et al. (2010). Interpolating can introduce large errors, on the order of 20 % (Garcia et al., 2018a, b), particularly in poorly sampled regions such as the Southern Ocean. However, our experiments suggest that it is possible to use un-interpolated observations, but it is important to start the optimiser from multiple locations in parameter space, or to generously allow restarts, in the presence of a complex misfit function with many local minima. These multiple runs clearly can be run in parallel.

This study compared the efficiency and performance of two derivative-free optimisation algorithms, CMA-ES and DFO-LS, applied to MOPS, a global ocean biogeochemical model with seven prognostic tracers. The two methods were used to tune six of the parameters that control the behaviour of MOPS. We found that DFO-LS has a significantly lower computational cost when compared to CMA-ES, between one and two orders of magnitude, which is important considering that global ocean biogeochemical models are computationally expensive, as they must be integrated for several thousand years to reach equilibrium. DFO-LS exploits more information when minimising the misfit function and therefore has more scope for reducing the misfit faster than CMA-ES. However, as DFO-LS is more of a local optimiser than CMA-ES, it should be paired with a globalising method such as starting from different initial points in parameter space, which can easily be run in parallel.

Future work will involve applying DFO-LS to tune the MEDUSA biogeochemical model (Yool et al., 2011, 2013) to real observations. MEDUSA is more typical of the biogeochemical models which are embedded within earth system models (in the case of MEDUSA, UKESM) that are used to project climate change.

Below is a simplified description of the (*μ*/*μ*_{W},*λ*)-CMA-ES algorithm (Hansen, 2016).

*λ*=10, $\mathit{\mu}=\mathit{\lambda}/\mathrm{2}$, evolution paths, covariance matrix

**C**=

**I**, distribution mean, step size and maximum generation number.

**while**maximum generation is not reached and fitness distribution is not flat

**do**

**Sample population of new probability distribution**

**for**$k=\mathrm{0},\mathrm{1},\mathrm{2},\mathrm{\dots}\mathit{\lambda}$

**do**

*k*

**end**

**for**

**Update probability distribution**

**end**

**while**

Below is a simplified description of the DFO-LS algorithm in the context of how it has been used in this study. Not all technical details are included, such as safeguarding steps to improve the geometry of points and the quality of the model; therefore see the full description in Cartis et al. (2019).

**Require:**

*n*, starting point

*x*_{0}∈ℝ

^{n}, minimum trust region radius (

*p*

_{end}), if hard or soft restarts are allowed (see Sect. 2.4.2), and maximum number of true misfit function evaluations.

*n*+1 points within the initial trust region to build the initial interpolation set

*{*

*Y*_{0}

*}*(this can be done in parallel).

**for**$k=\mathrm{0},\mathrm{1},\mathrm{2},\mathrm{\dots}$

**do**

**if**we have exceeded the maximum number of true misfit function evaluations

**then**

**end**

**if**

*s*_{k}to this point.

*x*_{k}+

*s*_{k}.

**if**the misfit is significantly decreased

**then**

**if**misfit decrease is not significant

**then**

**end**

**if**

*{*

*Y*_{k+1}

*}*by replacing the worst point with the new accepted point to maintain a set of

*n*+1 points.

**else**

**if**the trust region radius is smaller than

*p*

_{end}

**then**

**end**

**if**

**end**

**if**

**end**

**for**

The base TMM and MOPS code used for the ocean biogeochemical simulations are available to download from https://doi.org/10.5281/zenodo.1246300 (Khatiwala, 2018). Transport matrices and forcing fields required to perform the simulations can be downloaded from https://doi.org/10.5281/zenodo.5517238 (Khatiwala, 2021). Modifications to the MOPS code for the specific experiments described in this paper, along with model output and scripts to recreate the figures shown here, are available from https://doi.org/10.5281/zenodo.5517626 (Oliver et al., 2021). The OptClim optimisation framework used in this study to couple any climate model to any optimiser is available at https://doi.org/10.5281/zenodo.5517610 (Oliver and Tett, 2021). This includes the CMA-ES optimisation code taken from the Supplement of Kriest et al. (2017) and adapted to work with OptClim.

SK and CC conceived the project and together with SO designed the experiments. SO carried out the experiments and analysis and wrote the paper with contributions from all co-authors.

The contact author has declared that neither they nor their co-authors have any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Computing resources were provided by the University of Oxford Advanced Research Computing (ARC) facility (https://doi.org/10.5281/zenodo.22558, Richards, 2015).

This research was supported by multiple funders and grants. Studentship and funding was provided to Sophy Oliver by the Natural Environmental Research Council (grant no. NE/L002612/1), the Oxford Doctoral Training Partnership in Environmental Research, and the Met Office. Samar Khatiwala was supported by UK NERC (grant nos. NE/M020835/1 and NE/P019218/1). When developing OptClim, Simon F. B. Tett was supported by UK NERC (grant no. NE/L012146/1).

This paper was edited by Qiang Wang and reviewed by Benoit Pasquier and one anonymous referee.

Cartis, C., Fiala, J., Marteau, B., and Roberts, L.: Improving the flexibility and robustness of model-based derivative-free optimization solvers, ACM T. Math. Software, 45, 1–35, https://doi.org/10.1145/3338517, 2019. a, b, c, d, e, f

Cartis, C., Roberts, L., and Sheridan-Methven, O.: Escaping local minima with local derivative-free methods: a numerical investigation, Optimization, 0, 1–31, https://doi.org/10.1080/02331934.2021.1883015, 2021. a, b

Chen, B. and Smith, S. L.: CITRATE 1.0: Phytoplankton continuous trait-distribution model with one-dimensional physical transport applied to the North Pacific, Geosci. Model Dev., 11, 467–495, https://doi.org/10.5194/gmd-11-467-2018, 2018. a

Conn, A. R., Scheinberg, K., and Vicente, L. N.: Introduction to Derivative-Free Optimization, Society for Industrial and Applied Mathematics (SIAM), https://doi.org/10.1137/1.9780898718768, 2009. a

DeVries, T.: The oceanic anthropogenic CO_{2} sink: Storage, air‐sea fluxes,
and transports over the industrial era, Global Biogeochem. Cycles, 28,
631–647, https://doi.org/10.1002/2013GB004739, 2014. a

Dutkiewicz, S., Follows, M. J., and Parekh, P.: Interactions of the iron and phosphorus cycles: A three-dimensional model study, Global Biogeochem. Cycles, 19, 1–22, https://doi.org/10.1029/2004GB002342, 2005. a

Garcia, H., Weathers, K., Paver, C., Smolyar, I., Boyer, T., Locarnini, R., Zweng, M., Mishonov, A., Baranova, O., Seidov, D., and Reagan, J.: World Ocean Atlas 2018, Volume 3: Dissolved Oxygen, Apparent Oxygen Utilization, and Dissolved Oxygen Saturation, edited by: Mishonov, A., NOAA Atlas NESDIS, 83, 38 pp., https://www.ncei.noaa.gov/access/world-ocean-atlas-2018/ (last access: 5 May 2022), 2018a. a, b, c

Garcia, H., Weathers, K., Paver, C., Smolyar, I., Boyer, T., Locarnini, R., Zweng, M., Mishonov, A., Baranova, O., Seidov, D., and Reagan, J.: World Ocean Atlas 2018. Vol. 4: Dissolved Inorganic Nutrients (phosphate, nitrate and nitrate+nitrite, silicate), edited by: Mishonov, A., NOAA Atlas NESDIS 84, 35 pp., https://www.ncei.noaa.gov/access/world-ocean-atlas-2018/ (last access: 5 May 2022), 2018b. a, b, c

Griewank, A. and Walther, A.: Evaluating derivatives. Principles and techniques of algorithmic differentiation., Society for Industrial and Applied Mathematics, 2nd edn., https://doi.org/10.1137/1.9780898717761, 2008. a

Hansen, N.: The CMA Evolution Strategy: A Tutorial, arXiv preprint arXiv:1604.00772, http://arxiv.org/abs/1604.00772 (last access: 1 February 2021), 2016. a, b, c, d

Hartley, H. O.: The Modified Gauss-Newton Method for the Fitting of Non-Linear Regression Functions by Least Squares, Technometrics, 3, 269–280, 1961. a

Henson, S. A., Sarmiento, J. L., Dunne, J. P., Bopp, L., Lima, I., Doney, S. C., John, J., and Beaulieu, C.: Detection of anthropogenic climate change in satellite records of ocean chlorophyll and productivity, Biogeosciences, 7, 621–640, https://doi.org/10.5194/bg-7-621-2010, 2010. a, b

Khatiwala, S.: A computational framework for simulation of biogeochemical tracers in the ocean, Global Biogeochem. Cycles, 21, 1–14, https://doi.org/10.1029/2007GB002923, 2007. a

Khatiwala, S.: samarkhatiwala/tmm: Version 2.0 of the Transport Matrix Method software (v2.0), Zenodo [code], https://doi.org/10.5281/zenodo.1246300, 2018. a, b

Khatiwala, S.: MITgcm 2.8deg Transport Matrix configuration, Zenodo [data set], https://doi.org/10.5281/zenodo.5517238, 2021. a

Khatiwala, S., Visbeck, M., and Cane, M. A.: Accelerated simulation of passive tracers in ocean circulation models, Ocean Model., 9, 51–69, https://doi.org/10.1016/j.ocemod.2004.04.002, 2005. a, b

Khatiwala, S., Primeau, F., and Hall, T.: Reconstruction of the history of
anthropogenic CO_{2} concentrations in the ocean, Nature, 462, 346–349,
https://doi.org/10.1038/nature08526, 2009. a

Khatiwala, S., Primeau, F., and Holzer, M.: Ventilation of the deep ocean constrained with tracer observations and implications for radiocarbon estimates of ideal mean age, Earth Planet. Sc. Lett., 325–326, 116–125, https://doi.org/10.1016/j.epsl.2012.01.038, 2012. a

Khatiwala, S., Palmieri, J., Yool, A., Oliver, S., and Martin, A.: The Transport Matrix Method interface to the MEDUSA 2.0 global ocean biogeochemical model, in preparation, 2022.

Kidston, M., Matear, R., and Baird, M. E.: Phytoplankton growth in the Australian sector of the Southern Ocean, examined by optimising ecosystem model parameters, J. Marine Syst., 128, 123–137, https://doi.org/10.1016/j.jmarsys.2013.04.011, 2013. a

Kriest, I.: Calibration of a simple and a complex model of global marine biogeochemistry, Biogeosciences, 14, 4965–4984, https://doi.org/10.5194/bg-14-4965-2017, 2017. a

Kriest, I. and Oschlies, A.: Swept under the carpet: organic matter burial decreases global ocean biogeochemical model sensitivity to remineralization length scale, Biogeosciences, 10, 8401–8422, https://doi.org/10.5194/bg-10-8401-2013, 2013. a

Kriest, I. and Oschlies, A.: MOPS-1.0: towards a model for the regulation of the global oceanic nitrogen budget by marine biogeochemical processes, Geosci. Model Dev., 8, 2929–2957, https://doi.org/10.5194/gmd-8-2929-2015, 2015. a, b

Kriest, I., Khatiwala, S., and Oschlies, A.: Towards an assessment of simple global marine biogeochemical models of different complexity, Prog. Oceanogr., 86, 337–360, https://doi.org/10.1016/j.pocean.2010.05.002, 2010. a

Kriest, I., Oschlies, A., and Khatiwala, S.: Sensitivity analysis of simple global marine biogeochemical models, Global Biogeochem. Cycles, 26, 1–15, https://doi.org/10.1029/2011GB004072, 2012. a

Kriest, I., Sauerland, V., Khatiwala, S., Srivastav, A., and Oschlies, A.: Calibrating a global three-dimensional biogeochemical ocean model (MOPS-1.0), Geosci. Model Dev., 10, 127–154, https://doi.org/10.5194/gmd-10-127-2017, 2017. a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p

Kriest, I., Kähler, P., Koeve, W., Kvale, K., Sauerland, V., and Oschlies, A.: One size fits all? Calibrating an ocean biogeochemistry model for different circulations, Biogeosciences, 17, 3057–3082, https://doi.org/10.5194/bg-17-3057-2020, 2020. a, b

Kwon, E. Y. and Primeau, F.: Optimization and sensitivity study of a biogeochemistry ocean model using an implicit solver and in situ phosphate data, Global Biogeochem. Cycles, 20, 1–13, https://doi.org/10.1029/2005GB002631, 2006. a, b

Kwon, E. Y. and Primeau, F.: Optimization and sensitivity of a global biogeochemistry ocean model using combined in situ DIC, alkalinity, and phosphate data, J. Geophys. Res.-Oceans, 113, 1–23, https://doi.org/10.1029/2007JC004520, 2008. a

Li, X. and Primeau, F. W.: A fast Newton-Krylov solver for seasonally varying global ocean biogeochemistry models, Ocean Model., 23, 13–20, https://doi.org/10.1016/j.ocemod.2008.03.001, 2008. a

Marshall, J., Adcroft, A., Hill, C., Perelman, L., and Heisey, C.: A finite-volume, incompressible navier stokes model for studies of the ocean on parallel computers, J. Geophys. Res., 102, 5753–5766, https://doi.org/10.1029/96JC02775, 1997. a

Martin, J. H., Knauer, G. A., Karl, D. M., and Broenkow, W. W.: VERTEX: carbon cycling in the northeast Pacific, Deep-Sea Res. Pt. I, 34, 267–285, https://doi.org/10.1016/0198-0149(87)90086-0, 1987. a

Melbourne-Thomas, J., Wotherspoon, S., Corney, S., Molina-Balari, E., Marini, O., and Constable, A.: Optimal control and system limitation in a Southern Ocean ecosystem model, Deep-Sea Res. Pt. II, 114, 64–73, https://doi.org/10.1016/j.dsr2.2013.02.017, 2015. a

Niemeyer, D., Kriest, I., and Oschlies, A.: The effect of marine aggregate parameterisations on nutrients and oxygen minimum zones in a global biogeochemical model, Biogeosciences, 16, 3095–3111, https://doi.org/10.5194/bg-16-3095-2019, 2019. a, b

Nocedal, J. and Wright, S. J. (Eds.): Numerical Optimization, Springer, 2nd edn., ISBN : 978-0-387-22742-9, 2006. a

Oliver, S. and Tett, S.: OPTCLIMSO Optimisation Framework (Version 1), Zenodo [code], https://doi.org/10.5281/zenodo.5517610, 2021. a

Oliver, S., Cartis, C., Kriest, I., Tett, S., and Khatiwala, S.: Code and data archive to accompany “A derivative-free optimisation method for global ocean biogeochemical models”, Oliver et al. 2021 (Version 2), Zenodo [data set], https://doi.org/10.5281/zenodo.5517626, 2021. a

Richards, A.: University of Oxford Advanced Research Computing, Zenodo, https://doi.org/10.5281/zenodo.22558, 2015. a

Sauerland, V., Löptien, U., Leonhard, C., Oschlies, A., and Srivastav, A.: Error assessment of biogeochemical models by lower bound methods (NOMMA-1.0), Geosci. Model Dev., 11, 1181–1198, https://doi.org/10.5194/gmd-11-1181-2018, 2018. a

Sauerland, V., Kriest, I., Oschlies, A., and Srivastav, A.: Multiobjective Calibration of a Global Biogeochemical Ocean Model Against Nutrients , Oxygen, and Oxygen Minimum Zones, J. Adv. Model. Earth Sy., 11, 1285–1308, https://doi.org/10.1029/2018MS001510, 2019. a, b

Spitz, Y. H., Moisan, J. R., Abbott, M. R., and Richman, J. G.: Data assimilation and a pelagic ecosystem model: Parameterization using time series observations, J. Marine Syst., 16, 51–68, https://doi.org/10.1016/S0924-7963(97)00099-7, 1998. a

Ward, B. A., Friedrichs, M. A. M., Anderson, T. R., and Oschlies, A.: Parameter optimisation techniques and the problem of underdetermination in marine biogeochemical models, J. Marine Syst., 81, 34–43, https://doi.org/10.1016/j.jmarsys.2009.12.005, 2010. a

Weber, T., Cram, J. A., Leung, S. W., DeVries, T., and Deutsch, C.: Deep ocean nutrients imply large latitudinal variation in particle transfer efficiency, P. Natl. Acad. Sci., 113, 8606–8611, https://doi.org/10.1073/pnas.1604414113, 2016. a, b

Wunsch, C. and Heimbach, P.: How long to oceanic tracer and proxy equilibrium?, Quaternary Sci. Rev., 27, 637–651, https://doi.org/10.1016/j.quascirev.2008.01.006, 2008. a

Xiao, Y. and Friedrichs, M. A. M.: The assimilation of satellite-derived data into a one-dimensional lower trophic level marine ecosystemmodel, J. Geophys. Res.-Oceans, 119, 2691–2712, https://doi.org/10.1002/2013JC009433, 2014. a

Yool, A., Popova, E. E., and Anderson, T. R.: Medusa-1.0: a new intermediate complexity plankton ecosystem model for the global domain, Geosci. Model Dev., 4, 381–417, https://doi.org/10.5194/gmd-4-381-2011, 2011. a

Yool, A., Popova, E. E., and Anderson, T. R.: MEDUSA-2.0: an intermediate complexity biogeochemical model of the marine carbon cycle for climate change and ocean acidification studies, Geosci. Model Dev., 6, 1767–1811, https://doi.org/10.5194/gmd-6-1767-2013, 2013. a

Zhao, L., Wei, H., Xu, Y., and Feng, S.: An adjoint data assimilation approach for estimating parameters in a three-dimensional ecosystem model, Ecol. Model., 186, 235–250, https://doi.org/10.1016/j.ecolmodel.2005.01.017, 2005. a

- Abstract
- Introduction
- Methodology
- Results
- Discussion
- Summary
- Appendix A: CMA-ES algorithm description
- Appendix B: DFO-LS algorithm description
- Appendix C: Optimisation results tables
- Code and data availability
- Author contributions
- Competing interests
- Disclaimer
- Acknowledgements
- Financial support
- Review statement
- References

- Abstract
- Introduction
- Methodology
- Results
- Discussion
- Summary
- Appendix A: CMA-ES algorithm description
- Appendix B: DFO-LS algorithm description
- Appendix C: Optimisation results tables
- Code and data availability
- Author contributions
- Competing interests
- Disclaimer
- Acknowledgements
- Financial support
- Review statement
- References