Biogeochemical models, capturing the major feedbacks of the pelagic ecosystem
of the world ocean, are today often embedded into Earth system models which
are increasingly used for decision making regarding climate policies. These
models contain poorly constrained parameters (e.g., maximum phytoplankton
growth rate), which are typically adjusted until the model shows reasonable
behavior. Systematic approaches determine these parameters by minimizing the
misfit between the model and observational data. In most common model
approaches, however, the underlying functions mimicking the biogeochemical
processes are nonlinear and non-convex. Thus, systematic optimization
algorithms are likely to get trapped in local minima and might lead to
non-optimal results. To judge the quality of an obtained parameter estimate,
we propose determining a preferably large lower bound for the global
optimum that is relatively easy to obtain and that helps to assess the
quality of an optimum generated by an optimization algorithm. Due to the
unavoidable noise component in all observations, such a lower bound is
typically larger than zero. We suggest deriving such lower bounds based on
typical properties of biogeochemical models (e.g., a limited number of
extremes and a bounded time derivative). We illustrate the applicability of
the method with two real-world examples. The first example uses real-world
observations of the Baltic Sea in a box model setup. The second example
considers a three-dimensional coupled ocean circulation model in combination with
satellite chlorophyll

Earth system models are widely used to assess the consequences of climate
change and explore climate engineering options

In contrast to ocean physics, which is derived from first principles, current
biogeochemical modules are based on empirical relationships. Thus, several
studies compare models of different complexities

Biogeochemical processes are nonlinear, non-convex, and complexly entangled.
Therefore, as stressed by several foregoing studies, associated model–data misfit
measures comprise an unknown number of local optima, and the results of an optimization provide no proof of
whether an obtained parameter set is globally optimal or
not

Still, it is crucial to find a global optimum to assess the quality of a
certain model formulation. Lacking a proof on the global optimality of
chosen parameters, it is difficult to determine whether a model–data misfit is
mainly caused by the parameter choice or attributed to other sources of
uncertainty, like those concerning model equations or observational
data

The following section focuses on some typical properties of biogeochemical
models which lead to the relaxed problems described above. The choice of the
respective model properties is also based on the fact that efficient tailored
algorithms for solving the associated relaxed problems are readily available.
In Sect.

Comparing model output to observational data requires a criterion to measure
the misfit between both data sets. To apply an automated optimization
algorithm, such a measure needs to be reduced to a single real number. We
provide commonly used measures in the
following subsection. In Sect.

A quality assessment of biogeochemical models usually compares available
observational data

For the sake of simplicity, we will consider scalar data in the following,
assuming that both

Objective judgment about the differences between observational data and model
output requires an associated measure

There are several possible measures for the model–data misfit that have been
used to evaluate biogeochemical models

As mentioned above, we consider scalar observations

In order to find such a lower bound

Pick some properties that the model output satisfies for all permitted parameters

Solve the optimization problem detached from the parametric model.
More precisely, we minimize the sum of squared errors over all functions

The procedure yields a lower bound for the original optimization problem, as the feasible set of the relaxed problem is larger and contains every possible output of the original model.
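
For a single monotone segment, the relaxed problem is an isotonic (monotone) least-squares regression, which the pool adjacent violators algorithm (PAVA) solves exactly. A minimal sketch in Python with NumPy (the function name `pava` and the toy data are our own illustration):

```python
import numpy as np

def pava(y):
    """Pool adjacent violators: least-squares fit of a
    nondecreasing sequence to the observations y."""
    values, weights = [], []
    for v in np.asarray(y, dtype=float):
        values.append(v)
        weights.append(1)
        # merge the last two blocks while monotonicity is violated
        while len(values) > 1 and values[-2] > values[-1]:
            w = weights[-2] + weights[-1]
            values[-2:] = [(weights[-2] * values[-2] + weights[-1] * values[-1]) / w]
            weights[-2:] = [w]
    return np.repeat(values, weights)

y = np.array([1.0, 3.0, 2.0, 4.0])
fit = pava(y)                            # -> [1.0, 2.5, 2.5, 4.0]
rmse = np.sqrt(np.mean((fit - y) ** 2))  # lower bound on the RMSE of
                                         # any monotone model output
```

Since the nondecreasing sequences form a superset of all outputs a monotonically increasing parametric model can produce at these sample points, no parameter choice can attain an RMSE below `rmse`.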

We start with an example that is not directly related to biogeochemical
models but which serves as a basis for the approaches in
Sect.

The optimization problem (

When simulating periodic systems, the model might (intentionally or unintentionally) not resolve all frequencies that occur in the corresponding observational data. Models that resolve only low frequencies relative to the data (e.g., NPZD models that aim to capture the main characteristics of an annual cycle) take a correspondingly limited number of extreme values within a given time interval, e.g., a seasonal cycle. This situation is sketched in Fig. 1.

Synthetic data (blue dots) and corresponding output (blue crosses)
of a periodic model function (blue curve). As the model frequencies are low,
both the model function and its samples take only two local extremes. The
segments before, after, and between the extremes (times

The fact that each segment between two subsequent extreme values is
monotonically increasing/decreasing allows us to apply the methods introduced
in Sect.
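
The segment-wise monotone structure makes the relaxed problem solvable by searching over the positions of the extremes and fitting each segment monotonically. The following sketch (our own construction, restricted to a single interior maximum for brevity) enumerates the split point and combines an increasing and a decreasing pool-adjacent-violators fit:

```python
import numpy as np

def pava(y, increasing=True):
    """Least-squares monotone fit (pool adjacent violators)."""
    y = np.asarray(y, dtype=float)
    if not increasing:
        return -pava(-y)
    values, weights = [], []
    for v in y:
        values.append(float(v))
        weights.append(1)
        while len(values) > 1 and values[-2] > values[-1]:
            w = weights[-2] + weights[-1]
            values[-2:] = [(weights[-2] * values[-2] + weights[-1] * values[-1]) / w]
            weights[-2:] = [w]
    return np.repeat(values, weights)

def best_unimodal_fit(y):
    """Best least-squares fit with at most one interior maximum:
    search the split point, fit increasing before and decreasing after."""
    y = np.asarray(y, dtype=float)
    best_fit, best_sse = None, np.inf
    for k in range(len(y) + 1):
        fit = np.concatenate([pava(y[:k], True), pava(y[k:], False)])
        sse = float(np.sum((fit - y) ** 2))
        if sse < best_sse:
            best_fit, best_sse = fit, sse
    return best_fit, np.sqrt(best_sse / len(y))  # fit and RMSE lower bound

fit, lb = best_unimodal_fit([1.0, 2.0, 5.0, 3.0, 2.0])  # already unimodal, so lb == 0
```

With two allowed extremes, as assumed for the NPZD examples below, the same idea searches over two split points with alternating monotone directions.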

The change rates of biogeochemical processes like growth and decay have
natural limits. In the presence of noise, observational data are very likely
to exhibit higher variations than a model that is devoted to comparatively
slow interactions. In other words, noise (or unresolved periodic processes
with high frequencies and high amplitudes) cannot be well approximated by
models that mimic processes of lower variation, i.e., models with small
changes in a given time step. These processes are characterized by a small
absolute derivative. If we are able to postulate general bounds on the
derivatives of a parametric model function

General bounds on the first time derivative (steepness) of

It is also possible to add linear constraints to the QP which consider bounds
on higher-order derivatives of
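
As a concrete illustration, such a steepness-constrained least-squares problem can be reparameterized by the first fitted value plus bounded increments, which turns the QP into a bounded-variable least-squares problem. A sketch using SciPy (the function name and toy data are our own):

```python
import numpy as np
from scipy.optimize import lsq_linear

def steepness_bounded_fit(t, y, max_slope):
    """Least-squares fit f with |f_{i+1} - f_i| <= max_slope * (t_{i+1} - t_i).
    With x = [f_1, increments], f = A x is linear, so the QP reduces to a
    bounded-variable least-squares problem."""
    n = len(y)
    dt = np.diff(t)
    A = np.tril(np.ones((n, n)))          # cumulative-sum design matrix
    lb = np.r_[-np.inf, -max_slope * dt]  # first value free,
    ub = np.r_[np.inf, max_slope * dt]    # increments bounded
    res = lsq_linear(A, y, bounds=(lb, ub))
    f = A @ res.x
    return f, np.sqrt(np.mean((f - y) ** 2))  # fit and RMSE lower bound

t = np.array([0.0, 1.0, 2.0])
f, lb = steepness_bounded_fit(t, np.array([0.0, 10.0, 0.0]), max_slope=1.0)
# the spike at t = 1 cannot be tracked with slope <= 1, so lb > 0
```

Bounds on higher-order derivatives add further linear constraints of the same kind, e.g., on differences of consecutive increments for the second derivative.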

Clearly, we can combine model properties into a joint QP, e.g., if the model
has two local extremes within a window of interest and bounded
steepness. We can apply the combination of Eqs. (

Similar to the approach in Sect.

We first aim to examine the extent to which the minimum model–data misfit of a parameterized model can deviate from the corresponding minimum misfit of a proposed non-parametric relaxation. Clearly, the difference between both misfits also depends on the characteristics of the observational data, that is, noise level and data density. We therefore derive statistics about that dependency using synthetic observations.
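
A Monte Carlo sketch of such a ratio statistic, under simplifying assumptions of our own (the true model is a monotone cubic, so a single monotone regression serves as the non-parametric relaxation; noise level and sample size are arbitrary choices):

```python
import numpy as np

def pava(y):
    """Least-squares nondecreasing fit (pool adjacent violators)."""
    values, weights = [], []
    for v in np.asarray(y, dtype=float):
        values.append(v)
        weights.append(1)
        while len(values) > 1 and values[-2] > values[-1]:
            w = weights[-2] + weights[-1]
            values[-2:] = [(weights[-2] * values[-2] + weights[-1] * values[-1]) / w]
            weights[-2:] = [w]
    return np.repeat(values, weights)

rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 200)
truth = t ** 3                     # monotone cubic as the "true" model
ratios = []
for _ in range(100):
    y = truth + rng.normal(0.0, 0.1, t.size)       # synthetic observations
    fit_p = np.polyval(np.polyfit(t, y, 3), t)     # optimized parametric model
    rmse_p = np.sqrt(np.mean((fit_p - y) ** 2))
    rmse_r = np.sqrt(np.mean((pava(y) - y) ** 2))  # relaxed lower bound
    ratios.append(rmse_r / rmse_p)
mean_ratio = np.mean(ratios)  # a ratio close to 1 indicates a tight bound
```

Varying the noise level and the number of sample points in this sketch reproduces the qualitative dependency studied in the following experiments.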

We generate the synthetic observations by adding white noise to

A cubic polynomial, synthetic observational data generated by adding
white noise to

The related RMSE between the synthetic data and this
piece-wise monotonic fit is

For our statistics about the proposed error assessment methods we are
interested in the ratio

The synthetic observational data of Fig.

Here, for both relaxations we assume a maximum model steepness of

To derive robust statistics, we repeat the experiment

Ratios (times 100) between the misfit of the parametric model (cubic
polynomial) to synthetic observations (the model output plus white noise) and
the misfit of the corresponding non-parametric regression model. We state the
ratios for different noise levels

The approach to calculate lower bounds on the model–data misfit by using
property-based model relaxations stems from the intuition that the overall
shape of the optimized parametric model and that of the non-parametric
relaxation should be similar if the relaxation describes the main properties
of the original model well. The amount of similarity is reflected by the
ratios stated in Table

We observe that the data must be rather dense in order to reach good error
ratios, especially with low levels of noise. This dependence is plausible
because small numbers of observations and low levels of noise cause
small difference quotients

For example, consider a target ratio of 85 % to be reached
for all

Having evidence that the lower bounds on the model–data misfit become tight
with sufficiently dense observations, we want to countercheck whether an optimized
parametric model that slightly differs from the actual process behind the
observational data has a significantly worse model–data misfit in comparison
with its non-parametric relaxation. This time, we generate

Synthetic data obtained by adding noise to the function

With regard to the data density, the ratio

Synthetic observations as in Fig.

We repeat the experiment with noise levels of

Ratios (times 100) between the misfit of the parametric model to
synthetic observations (the model output plus white noise) and the misfit of
the corresponding non-parametric regression model. The ratios are given for
different noise levels

The experiments help to identify conditions under which we may distinguish
the “truth” from “distortions of the truth”. Sufficient conditions
are given if the misfit ratio for the true parametric model, say

We now consider two real-world examples with the aim of fitting chlorophyll

Our first example considers observations from the Bornholm Basin in the
Baltic Sea at 55.15

We fit a NPZD box model to the data. It is based on a model
of

In a next step, we assess our result by
examining the lower bounds. Following the procedure outlined at the end
of Sect.

Seasonally adjusted Bornholm observation time series of phytoplankton,
together with fits of the considered NPZD model using the parameters
adjusted for the global model configuration (red) and the parameters
optimized for the local model version (blue). The (black) reference plot
is the minimum-error data fit with regard to the properties that at most
two extremes are taken and that the steepness is at most

Some indication of an even smaller gap between the attained model–data misfit
and the globally optimal misfit of the NPZD model is given by the following
additional step. The RMSE of the calibrated NPZD model is the empirical
standard deviation

In a second example, we consider global observations of marine
chlorophyll

Observed and simulated annual mean chlorophyll

The observational data are quite rugged for larger regions of the ocean while the simulations are comparably smooth everywhere, due to the resolution of the model. Therefore, we can expect positive lower bounds on the model–data misfit.

The RMSE model–data misfit is

We repeated our experiment restricted to the Southern Ocean (below

Our aim is to complement research on the calibration of biogeochemical models by calculating lower bounds on their best-attainable model–data misfit. We utilize two general model properties for this purpose: a limited number of extremes and a bounded model steepness. We also consider the combination of both properties. The reason to consider such non-parametric model properties is that they yield efficiently solvable (relaxed) optimization problems, whereas optimizing the original parametric model is computationally demanding.

In our experiments (Sect.

Our second constraint, a limited number of extremes, is generally relatively
easy to determine for common, rather smooth biogeochemical models. An
applicable number of extremes can be determined if a regression with more
extremes only barely reduces the misfit any further. But here one should also keep
the model structure in mind. Simple models can be limited in reproducing
specific shapes of the seasonal cycle. Based on the model structure, we
assumed only two extremes for our NPZD real-world example in
Sect.

Unsurprisingly, the combination of tight steepness bounds with a limited
number of extremes yields even better lower bounds on the minimum-attainable
model–data misfit than both properties separately. Finally, all our model
relaxations require a rather large number of observations (per chunk) in
order to yield convincingly tight bounds (see Table

Our contribution considers the root mean square error (RMSE) as an objective
measure of the model–data misfit because it eases formulating certain model
properties in terms of convex optimization problems and resorting to
corresponding tailor-made efficient algorithms. However, the suggested model
properties also allow us to deduce optimization problems which are
efficiently solvable if other misfit measures are used. For example, the sum
of absolute errors can be dealt with in terms of linear programs (LPs) by
including auxiliary variables and auxiliary linear constraints to express
absolute values. Also, the efficient methods of
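
Such an LP can be sketched as follows for the sum of absolute errors under a monotonicity constraint (a construction of our own using SciPy; the variables are the fitted values f and auxiliary bounds e on the absolute residuals):

```python
import numpy as np
from scipy.optimize import linprog

def monotone_l1_fit(y):
    """Minimize sum_i |f_i - y_i| over nondecreasing f via an LP with
    auxiliary variables e_i >= |f_i - y_i|; the variable vector is [f, e]."""
    y = np.asarray(y, dtype=float)
    n = y.size
    c = np.r_[np.zeros(n), np.ones(n)]  # minimize the sum of the e_i
    I = np.eye(n)
    A, b = [], []
    A.append(np.hstack([I, -I]))        # e_i >= f_i - y_i  <=>  f_i - e_i <= y_i
    b.append(y)
    A.append(np.hstack([-I, -I]))       # e_i >= y_i - f_i  <=>  -f_i - e_i <= -y_i
    b.append(-y)
    D = np.zeros((n - 1, 2 * n))        # monotonicity: f_i - f_{i+1} <= 0
    D[np.arange(n - 1), np.arange(n - 1)] = 1.0
    D[np.arange(n - 1), np.arange(1, n)] = -1.0
    A.append(D)
    b.append(np.zeros(n - 1))
    res = linprog(c, A_ub=np.vstack(A), b_ub=np.concatenate(b),
                  bounds=[(None, None)] * n + [(0, None)] * n)
    f = res.x[:n]
    return f, res.fun / n               # fit and mean absolute error bound

f, mae = monotone_l1_fit([1.0, 3.0, 2.0, 4.0])
```

The same auxiliary-variable trick carries over to the segment-wise and steepness-bounded relaxations, since all their constraints are linear.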

Concerning the number of local extremes, our proof-of-concept experiments are
restricted to a maximum of two (four) extremes according to the properties of
the respective parametric models. However, solutions can even be calculated
efficiently if the model output is assumed to take a large maximum number of
extremes

A small gap between the misfit of some property-based model relaxation and the misfit of the optimized original model proves that further parameter calibration is not required. In contrast, a large gap between both misfits does not necessarily mean that the calibration of the chosen model is bad, nor does it mean that the model is an incorrect representation of the processes of interest. Our experiments indicate that a large gap tends to prove the inadequacy of a model (calibration) only if enough observations are available. Otherwise, the chosen property-based relaxations might fit the observations too well.

On the other hand, a small gap between the optimal misfit of a property-based
non-parametric relaxation and the misfit of the original parametric model can
even be reached with an inappropriate parametric model structure if there is
too much noise in the data. The experiments in
Sect.

We presented an approach for proving that a parametric model is well calibrated, i.e., that changes of its free parameters can no longer lead to a much better model–data misfit. The intention is motivated by the fact that calibrating global biogeochemical ocean models is important but computationally expensive.

Generally, the aim is to determine an optimal parameter set such that a predefined metric of the model–data misfit is minimal. To keep the number of required expensive model simulations as small as possible, we suggest calculating “tight” lower bounds on the lowest achievable model–data misfit. Our objective is to utilize properties of the original model that are satisfied for all permitted parameters and that lead to easily solvable optimization problems. Here, we focus on two such model properties to derive our lower bounds on the model–data misfit: a maximum time derivative and a maximum number of extremes per time unit.

Indeed, our experiments show that the achieved bounds can come quite close to
the optimized misfit of the original model if many observations are
available. However, a problem with global observational data (e.g., World
Ocean Atlas data) is that they are often sparse in time. For example, if we
examine annual cycles of periodic processes with monthly observations, our
lower bound approach will only succeed if we overlay (seasonally adjust)
measurement data of several years in order to reach the required
data coverage. Long-term time series from observing platforms like
BATS

Assuming the error between model output and observations to be Gaussian
distributed noise, an obtained lower bound on the RMSE is
also a lower bound on the empirical standard deviation

Optimize the model parameters with regard to the corresponding model–data misfit.

Calculate lower error bounds on the model–data misfit by using appropriate assumptions about the model properties.

Accept if the ratio
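
The acceptance test in the last step can be sketched as a one-line decision, where the ratio compares the lower bound with the attained misfit (the function name and the 85 % default threshold are our own illustrative choices):

```python
def accept_calibration(model_rmse, lower_bound_rmse, target_ratio=0.85):
    """Accept the calibrated parameters when the lower bound on the
    best-attainable misfit is close enough to the attained misfit."""
    return lower_bound_rmse / model_rmse >= target_ratio
```

If the test fails, either the calibration should be continued or, given sufficiently dense observations, the model structure itself should be questioned.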

Implementations of the applied methods are
available on GitHub (

We explicitly state the free parameters and equations of the NPZD box
model that has been studied in

Parameters of the considered NPZD model with their physical units, allowed ranges, and optimized values.

Here, the hyperbolic MM equations

The authors declare that they have no conflict of interest.

This work is a contribution to the project “Reduced Complexity Models” (supported by the Helmholtz Association of German Research Centres (HGF)), the DFG-supported project SFB754, and the DFG Cluster of Excellence “The Future Ocean”. The authors received very helpful comments from two anonymous reviewers. We are grateful to Heiner Dietze for technical assistance. Edited by: Christoph Müller. Reviewed by: three anonymous referees.