A new test statistic for climate model evaluation has been developed that potentially mitigates some of the limitations that exist for observing and representing field and space dependencies of climate phenomena. Traditionally, such dependencies have been ignored when climate models have been evaluated against observational data, which makes it difficult to assess whether any given model is simulating observed climate for the right reasons. The new statistic uses Gaussian Markov random fields to estimate field and space dependencies within a first-order grid point neighborhood structure. We illustrate the ability of Gaussian Markov random fields to represent empirical estimates of field and space covariances using “witch hat” graphs. We further use the new statistic to evaluate the tropical response of a climate model (CAM3.1) to changes in two parameters important to its representation of cloud and precipitation physics. Overall, the inclusion of dependency information did not significantly alter the recognition of those regions of parameter space that best approximated observations. However, there were some qualitative differences in the shape of the response surface that suggest how such a measure could affect estimates of model uncertainty.

Climate scientists are interested in developing new metrics for assessing how
well climate simulations reproduce observed climate for purposes of comparing
models, driving model development, and evaluating model prediction
uncertainties

A test statistic is a metric that includes information about the significance of modeling errors.

for determining likelihood measures of different model configurations. The climate assessment community remains somewhat skeptical that any single metric is sufficient to judge a climate model's scientific credibility. Climate phenomena involve interactions among multiple fields (observables) across a wide range of timescales and space scales, from minutes to decades (and longer) and from meters to planetary scales. Synthesizing the many ways that a climate model can be tested against observational data is therefore a considerable challenge.

The most common approach to climate model evaluation among climate scientists
is to display maps of long-term means of well-known fields (e.g.,
temperature, sea-level pressure, precipitation) whose distributions are
familiar and well understood, in order to identify sources of model error.
Taylor metrics, which are often generated as part of model evaluation, are based
on spatial means of squared grid point errors for individual fields

Here we present a new test statistic based on Gaussian Markov random fields
(GMRFs) that addresses some of the current challenges in estimating the
significance of modeling errors across multiple fields by taking into account
the field and space dependencies that exist within observations. One perhaps
under-recognized challenge in this regard is the limited number of
observations available to quantify these dependencies.
Data assimilation is commonly used to fill in gaps in the observational
record

The present application of GMRFs operates on long-term means. While it may be
possible to extend GMRFs to capture time dependencies

The sections of this paper explain, test, and provide examples of how various
components of GMRFs work. Section

A Gaussian Markov random field (GMRF) is a special case of a multivariate
normal distribution. The density of a normal random vector
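For reference, the standard multivariate normal log-density written in terms of the precision matrix Q = Σ^{-1} is log p(x) = -(n/2) log(2π) + (1/2) log|Q| - (1/2)(x - μ)ᵀ Q (x - μ). A GMRF is a normal vector whose precision has Q_ij ≠ 0 (for i ≠ j) only when points i and j are neighbors. The pure-Python sketch below evaluates this standard formula, using a Cholesky factor to obtain log|Q|; the function names are hypothetical, and a practical implementation would exploit the sparsity of Q rather than dense loops.

```python
import math


def cholesky(Q):
    """Return lower-triangular L with Q = L L^T; raise ValueError if Q is not positive definite."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = Q[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("precision matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def gmrf_logpdf(x, mu, Q):
    """Log-density of a normal vector with mean mu and precision Q (dense, for illustration)."""
    n = len(x)
    L = cholesky(Q)
    log_det_Q = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log|Q| from the Cholesky diagonal
    d = [xi - mi for xi, mi in zip(x, mu)]
    quad = sum(d[i] * Q[i][j] * d[j] for i in range(n) for j in range(n))
    return -0.5 * n * math.log(2.0 * math.pi) + 0.5 * log_det_Q - 0.5 * quad


# With Q = I and x = mu, the result reduces to -log(2*pi) for n = 2.
print(gmrf_logpdf([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
```

Working with Q instead of Σ is the practical point of a GMRF: Q is sparse under a neighborhood structure, whereas its inverse Σ is generally dense.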

The GMRF-based expression that we have developed for quantifying the significance of differences between model output and observations is
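A minimal sketch of how a statistic of this general kind can be evaluated, assuming (our assumption, not a reproduction of the authors' formula) a quadratic form C = dᵀ (Q_F ⊗ Q_S) d. Here d stacks model-minus-observation differences field by field, Q_F is a field-to-field precision matrix, and Q_S is a spatial precision matrix. Because block (f, g) of Q_F ⊗ Q_S equals Q_F[f][g]·Q_S, the cost can be computed as a sum over field pairs of Q_F[f][g]·D_fᵀ Q_S D_g without ever forming the full matrix. All names and numbers below are hypothetical.

```python
def kron(A, B):
    """Kronecker product A (x) B for matrices stored as lists of rows."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]


def quad_form(x, M, y):
    """Return x^T M y."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def gmrf_cost(D, QF, QS):
    """Hypothetical cost d^T (QF (x) QS) d, evaluated block by block.

    D[f] holds the model-minus-observation difference for field f at every grid point;
    QF is a field-to-field precision matrix and QS a spatial precision matrix.
    """
    F = len(D)
    return sum(QF[f][g] * quad_form(D[f], QS, D[g]) for f in range(F) for g in range(F))


def gmrf_cost_dense(D, QF, QS):
    """Same quantity, forming the full Kronecker product (feasible only for small grids)."""
    d = [v for field in D for v in field]  # stack the fields one after another
    return quad_form(d, kron(QF, QS), d)


# Two hypothetical fields on three grid points.
D = [[1.0, -0.5, 0.2], [0.3, 0.0, -1.0]]
QF = [[1.0, 0.4], [0.4, 1.0]]
QS = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
print(gmrf_cost(D, QF, QS), gmrf_cost_dense(D, QF, QS))  # equal up to rounding
```

With Q_F and Q_S both set to identity matrices the cost collapses to the plain sum of squared grid point errors, a score that ignores dependencies; the off-diagonal entries are what bring field and space relationships into the comparison.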

The precision operator of a GMRF

reflects the kind of spatial dependency we assume our data have, and

yields a legitimate covariance matrix,

Graphical representation of a

Consider

Neighbors of
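A sketch of a first-order neighborhood on a regular grid, assuming that the neighbors of a point are the four points sharing an edge with it (north, south, west, east) and that the grid does not wrap around at its edges; on a global longitude grid one might instead wrap in longitude. Points are indexed row-major, and the function names are hypothetical.

```python
def first_order_neighbors(i, j, nrow, ncol):
    """Grid points sharing an edge with (i, j); assumes no wraparound at the grid edges."""
    candidates = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < nrow and 0 <= c < ncol]


def adjacency(nrow, ncol):
    """Symmetric 0/1 matrix W with W[k][l] = 1 when points k and l are first-order neighbors.

    Points are indexed row-major: k = i * ncol + j.
    """
    n = nrow * ncol
    W = [[0] * n for _ in range(n)]
    for i in range(nrow):
        for j in range(ncol):
            k = i * ncol + j
            for r, c in first_order_neighbors(i, j, nrow, ncol):
                W[k][r * ncol + c] = 1
    return W


# Interior points have 4 neighbors, edge points 3, corner points 2.
print(first_order_neighbors(1, 1, 3, 3))  # [(0, 1), (2, 1), (1, 0), (1, 2)]
print(first_order_neighbors(0, 0, 3, 3))  # [(1, 0), (0, 1)]
```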

A problem arises in that one of the eigenvalues of the
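Assuming for illustration that the precision has the graph-Laplacian form Q = D - W (D diagonal with each point's neighbor count, W the 0/1 neighbor matrix), every row of Q sums to zero. The constant vector therefore satisfies Q·1 = 0, so Q has a zero eigenvalue: it is only positive semidefinite and cannot be inverted to give a covariance. One common remedy, shown here as a generic example rather than as the paper's own fix, is to add a hypothetical κ > 0 to the diagonal, which raises every eigenvalue by κ.

```python
import math


def laplacian_precision(nrow, ncol, kappa=0.0):
    """Q = D - W + kappa*I on a first-order (rook) lattice, row-major indexing.

    kappa is a hypothetical regularization parameter, not taken from the paper.
    """
    n = nrow * ncol
    Q = [[0.0] * n for _ in range(n)]
    for i in range(nrow):
        for j in range(ncol):
            k = i * ncol + j
            for r, c in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= r < nrow and 0 <= c < ncol:
                    Q[k][r * ncol + c] = -1.0  # -W entry
                    Q[k][k] += 1.0             # D entry: count of neighbors
            Q[k][k] += kappa
    return Q


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def is_positive_definite(A, tol=1e-12):
    """Attempt a Cholesky factorization; succeed only if every pivot exceeds tol."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


Q0 = laplacian_precision(3, 3)
print(matvec(Q0, [1.0] * 9))      # all zeros: the constant vector has eigenvalue 0
print(is_positive_definite(Q0))   # False
print(is_positive_definite(laplacian_precision(3, 3, kappa=0.1)))  # True
```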

The generalization of

Consider

In this last expression, one can see that the inverse of
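A small numerical check of the Kronecker structure, assuming (as the conclusions describe) that the joint precision of several fields is the Kronecker product of a field-to-field matrix A and a space-to-space matrix B, with values stacked field by field. For invertible A and B the standard identity (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1} holds, so the joint covariance follows from inverting the two small factors separately. The matrices below are hypothetical.

```python
def kron(A, B):
    """Kronecker product A (x) B for matrices stored as lists of rows."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inv2(M):
    """Inverse of a 2 x 2 matrix (enough for this illustration)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2.0, 0.5], [0.5, 1.0]]    # hypothetical field-to-field precision (2 fields)
B = [[2.0, -1.0], [-1.0, 2.0]]  # hypothetical space-to-space precision (2 points)
Q = kron(A, B)                  # 4 x 4 joint precision, values stacked field by field
P = matmul(Q, kron(inv2(A), inv2(B)))
err = max(abs(P[i][j] - (1.0 if i == j else 0.0)) for i in range(4) for j in range(4))
print(err < 1e-12)  # True: inv(A) (x) inv(B) inverts A (x) B
```

For F fields on N grid points this replaces one FN × FN inversion by an F × F and an N × N inversion.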

“Witch hat” graphs for air temperature on a

GMRFs provide a way to approximate field and space dependencies contained in
the inverse covariance matrix

Correlation matrix between four fields from CAM3.1.

Three versions of the GMRF-based cost as a function of two CAM3.1
parameters

To compare the space and field dependencies approximated by the GMRF with
empirical estimates, we need to determine an optimal value for

It may not be so obvious what the diagonal elements of

Figure

Different field contributions to the GMRF-based costs for a slice of
Fig.

To illustrate any differences that may exist between empirical estimates of
the covariance matrix
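One way to inspect the covariance a GMRF implies (our reading of a “witch hat” graph, stated as an assumption) is to invert Q and lay out, over the grid, the covariance between one grid point and every other point: the surface peaks at the chosen point and falls off away from it, like a hat. The sketch uses the hypothetical lattice precision D - W + κI with κ = 0.5 on a 5 × 5 grid; an empirical counterpart would replace Sigma with a sample covariance estimated from data.

```python
def lattice_precision(nrow, ncol, kappa):
    """Hypothetical precision Q = D - W + kappa*I on a first-order (rook) lattice."""
    n = nrow * ncol
    Q = [[0.0] * n for _ in range(n)]
    for i in range(nrow):
        for j in range(ncol):
            k = i * ncol + j
            Q[k][k] = kappa
            for r, c in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                if 0 <= r < nrow and 0 <= c < ncol:
                    Q[k][r * ncol + c] = -1.0
                    Q[k][k] += 1.0
    return Q


def invert(M):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(M)
    A = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0.0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


nrow = ncol = 5
Q = lattice_precision(nrow, ncol, kappa=0.5)  # kappa = 0.5 is an arbitrary illustrative value
Sigma = invert(Q)                             # covariance implied by the GMRF
center = (nrow // 2) * ncol + ncol // 2       # row-major index of the middle grid point
# Covariance between the center point and every grid point, laid out on the grid.
hat = [[Sigma[center][i * ncol + j] for j in range(ncol)] for i in range(nrow)]
for row in hat:
    print(" ".join(f"{v:6.3f}" for v in row))
```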

Figure

In this section we show how including field and space dependencies through a
GMRF affects comparisons of the Community Atmosphere Model (CAM3.1)

The observational data used to evaluate the model come from an
ECMWF ERA-Interim reanalysis product

A total of 64 experiments were completed, varying each of the two parameters
within an

The correlation matrix,

The primary field correlations are the values of (

Figure

Using Gaussian Markov random fields (GMRFs), we have developed a new test
statistic, a scalar measure of model skill or cost, for evaluating the extent
to which climate model output captures observed field and space relationships.
The challenge has been that few observations exist to establish a meaningful
observational basis for quantifying field and space relationships of climate
phenomena. Much of the data typically used for model evaluation are suspected
of having their own relationship biases, introduced by the numerical model
used to synthesize measurements into gridded products. The GMRF-based metric
overcomes some of these
limitations by considering field and space variations within a neighborhood
structure, thereby lowering the metric's data requirements. The form of the
metric separates space and field dependencies using a Kronecker product that,
when multiplied out, has all the terms necessary to represent how different
points in space are tied together across multiple fields. We also include a
scalar

R code and data for generating Figs.

This material is based upon work supported by the US Department of Energy Office of Science, Biological and Environmental Research Regional & Global Climate Modeling Program under award numbers DE-SC0006985 and DE-SC0010843. Alvaro Nosedal-Sanchez was partially supported by the National Council of Science and Technology of Mexico (CONACYT).

Edited by: P. Ullrich
Reviewed by: two anonymous referees