Description of algorithms for co-locating and comparing gridded model data with remote-sensing observations

MACC-II,III, Monitoring Atmospheric Composition and Climate, is the current pre-operational Copernicus Atmosphere Monitoring Service (CAMS). It provides data records on atmospheric composition for recent years, present conditions and forecasts (a few days ahead). To support the quality assessment of the CAMS products, the EU FP7 project Network Of ground-based Remote-Sensing Observations (NORS) created a server to validate the gridded MACC-II,III/CAMS model data against remote-sensing observations from the Network for the Detection of Atmospheric Composition Change (NDACC), for a selected set of target species and pilot stations. This paper describes in detail the algorithms used in this validation server. Amongst others, the algorithms take into account the horizontal displacement of the measured profiles from the location of the instrument, the vertical averaging and uncertainty propagation.


Introduction and notations
MACC-III, Monitoring Atmospheric Composition and Climate (http://copernicus-atmosphere.eu), is the current preoperational Copernicus Atmosphere Monitoring Service (CAMS).It combines state-of-the-art atmospheric modeling with Earth observation data to provide information services covering European air quality, global atmospheric composition, climate forcing, the ozone layer and UV and solar energy, and emissions and surface fluxes.The EU FP7 R&D NORS project (Demonstration Network Of groundbased Remote-Sensing Observations in support of the Copernicus Atmospheric Service, http://nors.aeronomie.be) was set up to support the development and generation of fit-for-purpose CAMS data products and services by providing quality information based on validation results.In NORS the validation will be carried out using NORS data products which are essentially ground-based remote-sensing data from the Network for the Detection of Atmospheric Composition Change (NDACC; see http://www.ndacc.org),optimized for the needs of the CAMS validation.
The validation processes carried out in NORS were created to be compliant with best practices as defined by the international community: all validation results include traceability information (see the Global Earth Observation System of Systems (GEOSS), Quality Assurance for Earth Observation, QA4EO, 2010, and follow the validation road map for Copernicus atmospheric data and services formalized in the MACC validation protocol, Lambert, 2010;Huijnen and Eskes, 2012).The validation support by NORS is delivered as a web-based application that generates default validation reports in an operational and automatic way, but which can also be used for the generation of dedicated user-driven validation reports on demand (http://nors-server.aeronomie.be).
NORS is a demonstration project: it focuses on a limited number of target data products from a limited number of pilot NDACC stations representative of four major atmospheric regimes (Table 1).
The validation service is built such that it is easily expandable to a larger number of stations and instruments and to additional CAMS data products for which NDACC can provide independent reference data.The MACC-II,III validation subproject (VAL) has a focus on reactive gases and aerosol composition on a global scale, which coincides partially with the NORS/NDACC target species.Duplication with existing validation tasks in MACC is not an issue because MACC-Published by Copernicus Publications on behalf of the European Geosciences Union.VAL produces 3 monthly validation reports for the nearreal-time services of MACC-II,III, and 6 monthly validation reports (updates) for the reanalysis services of MACC-II,III (http://www.copernicus-atmosphere.eu/services/aqac/ global_verification/validation_reports).The results of the NORS validation server have been used in the most recent VAL reports and in the latest reanalysis report.
The paper is organized as follows: in Sect. 2 a brief description of the measurement and model data is presented.For the measurements, the basic GEOMS (Generic Earth Observation Metadata Standard, Retscher et al., 2011) variables and their notations are introduced.For the model data, the algorithms to set up a vertical height grid and to do unit conversions are outlined.Section 3 contains the essential steps for comparing a single measurement with a model output.It describes the re-gridding, temporal and spatial co-location and smoothing algorithms for model vertical profile data.Section 4 contains further details on how the validation results are presented in their final output, e.g., time averaging of column data or uncertainty propagation.
Regarding notation conventions, all multiplications are scalar; matrix multiplication is denoted by a central dot •.Most of the notations used in the algorithms can be found in Tables 2, 3 and A1.For example, the profile array O M 3 is linked to the MACC parameter go3 and denotes a target species vertical profile.To distinguish between the MACC and NORS vertical profile, a superscript notation is used, e.g., O M 3 is the MACC model profile and O N 3 is the NORS measurement profile.If it is clear from the context to what class a data array belongs (model or measurement), the superscript is omitted to ease the notation.

NORS data
NORS data files are delivered in rapid delivery mode (not later than 1 month after acquisition) to the NDACC database rapid delivery directory, if not fully in final form, or to the corresponding NDACC database station directory if in its final form both with respect to data versioning (PI reviewed vs. operational) and file temporal coverage.For each measurement technique the NDACC data format has to be compliant with a pre-defined template (see http://avdc.gsfc.nasa.gov).The mapping of the template variable names to the mathematical concepts used throughout this paper, e.g., the target profile, the averaging kernel (AVK), is depicted in Table A1.Table 2 lists the GEOMS variables that all templates have in common (the dimensions are exemplary).All measurement techniques, except LIDAR, report averaging kernels, a priori profiles and uncertainties.
For each measurement technique, site and target species, a specific sensitivity range can be determined.For further details on sensitivity ranges and typical AVK's, see the data user document that was developed within the framework of the NORS project (De Mazière et al., 2013).

MACC data
At present, the validation server validates in operational mode the following forecast runs: near-real-time operation suite (NRT o-suite), the experiment running the TM5 3D atmospheric chemistry-transport model and the experiment with MOZART chemistry (for further details see https://www.gmes-atmosphere.eu/oper_info/nrt_info_for_users, http://tm.knmi.nl).The model data are generated regularly on specific output times t o (every 12 h for NRT o-suite and every 24 h for the other two models) and each output contains 3 hourly forecast data valid in the future of the output (see Fig. 1).The experiments are downloaded from the Meteorological Archival and Retrieval System (MARS) archive on their native Integrated Forecast System (IFS) res- olutions 1 on regular Gaussian grids: T255N128 for NRT osuite and T159N80 for the other models.
For each of these experiment versions, the validation server generates a time sequence of MACC model data with a time interval of 3 h between two MACC output times (see Fig. 1) such that the forecast validity times are as close as possible to the corresponding output times.
Table 3 describes the MACC data fields that are used in the validation server.The first column gives the MARS short 1 An IFS resolution is denoted by TxNy: x indicates the spectral truncation and y is the number of latitudes between a pole and the equator (for regular Gaussian grids, the number of longitude bands is 4y).name or the parameter id.The dimension column is only indicative (these values change when considering other model grids.For notational convenience we fix the dimensions to the grid N128).Table 3 shows the notations of the MACC variable as they will appear in the algorithms.
The surface height variable z s [m] is the geopotential height field corresponding to the surface pressure.It is not available in the MARS archive for the MACC experiments under consideration and is downloaded directly from ECMWF (European Centre for Medium-Range Weather Forecasts) for several resolutions available.The server will automatically attach to a given model the geopotential field that matches the native IFS resolution of the model.

Vertical height grid for MACC data
This section contains a detailed description of the calculation of a vertical height grid for MACC data, starting from the vertical pressure coordinate.The algorithm described here calculates directly the height coordinate for the model pressure levels (i.e., the middle of a layer) and not for the pressure interfaces (layer boundaries) of the model as it is described in the IFS documentation.Throughout this document i ϕ denotes the index for the longitude dimension, i λ the latitude and i the vertical dimension.
Algorithm.(Pressure grid) Following the standard ECMWF procedure, the vertical pressure coordinate p is set up in the following way: the surface pressure p s equals p s (i ϕ , i λ ) = e lnsp(i ϕ ,i λ ) , and the pressure on the level boundaries is The pressure p is increasing in the vertical, i.e., p(61, •, •) corresponds to the surface pressure p s and p(1, •, •) is the pressure at the top of the atmosphere.The vertical pressure coordinate on model levels then equals All profile data (for the MARS variables go3, co, ch4, etc., except aergn03) are defined on model pressure levels p m .
To construct a vertical height vector out of the pressure grid, the server calculates the molar mass of humid air.
Algorithm.(Relative humidity) Let M X denote the molar mass of species X.By definition, the value q(i, i ϕ , i λ ) is the fraction where n H 2 O represents the number of molecules of water vapor in n a molecules of air (n a is the number of molecules in the ith layer at height p m (i, i ϕ , i λ )).Similarly, n da denotes the number of particles dry air in n a mol molecules air.Using the fact that n a = n da +n H 2 O and In the next algorithm, we use the fraction M da /M a explicitly: with M da = 28.960g mol −1 , M H 2 O = 18.015 g mol −1 , and R da and R H 2 O the gas constants for dry air and water vapor, respectively.For unit conversion algorithms, the molar mass of humid air M a (g mol −1 ) at the grid point (i, i ϕ , i λ ) is used explicitly: Algorithm.(Height grid) We use the MACC pressure and temperature to construct a height coordinate in the usual way: we start with z s and construct recursively a height for the upper levels using the hydrostatic balance equation gM a dz = −RTd ln(p) (see Andrews, 2010).The latter equation is rewritten using the virtual temperature gM da dz = −RT ν d ln(p), with and recursively for i = 59, . .., 1: ).The validation server requires the knowledge of the thickness of the layers for re-gridding purposes.The heights of these boundaries of the layers do not match the MACC pressure grid p boundaries.Because these boundaries will only be used for re-gridding purposes, we believe that the differences that this approach may imply are negligible.

Algorithm. (MACC boundaries)
The boundary height vector z B at a chosen grid point (i ϕ , i λ ), consists of a lower boundary z B (1, i) and an upper boundary z B (2, i) for the ith layer.The layer boundaries are calculated as the midpoints between layers, i.e., for i = 1, . .., 59 (recall that z denotes the decreasing MACC grid height vector): The outer boundaries are determined from The server checks that the lowest boundary does not become negative and that the upper boundary does not exceed the top of atmosphere z toa = 120 km: if z B (1, 60) < 0 and z(60) ≥ 0, the lowest boundary is set to z B (1, 60) = 0, and if z B (2, 1) > z toa , and z(1) < z toa , the upper boundary is set to z B (2, 1) = z toa .
Remark.The above boundary algorithm only requires the knowledge of the layer heights.Therefore, the same algorithm is used for generating boundaries on a NDACC data product, if the product does not provide ALTI-TUDE.BOUNDARIES.
Remark.In the validation server, the above algorithms, although described to be calculated on the full MACC grid, are actually performed on MACC data that are already horizontally interpolated to the site location (see Sect. 3.3).This reduces significantly the required computation time.

Unit conversions
The validation server will align the MACC data with the measurement data.This means that all model profile data are re-gridded to the measurement's vertical grid and converted to measurement's units.Typically the MACC profile data are given in mass mixing ratio (MMR, kg kg −1 ), and the measurement data in volume mixing ratio (VMR, ppv) or number density (ND, mol m −3 ).
Algorithm.(Unit conversions) As an example, assume an O 3 profile is given in MMR.To convert it to VMR, the profile is multiplied with the factor M a /M O 3 .To convert VMR to ND: The partial column profile is derived from the ND profile, using the layer thickness z = z B (2, . ..) − z B (1, . ..): The above formula with the layer thickness is also used to scale a profile of optical depths to an optical thickness profile.

Re-gridding of profile data
Re-gridding of profile data are done with conservation of mass in mind (or total optical depth, in the case of aerosol data).As an example, assume that a target profile (in partial column units for a concentration profile or optical depth for an aerosol extinction profile) is defined on a vertical height grid z S with boundaries z S B (the superscript S is used to identify this grid as the source grid).To re-grid the profile to a new grid with layer heights z E and boundaries z E B (E is external), we construct a transformation matrix D that contains the fractions of how each external grid layer is covered by a source grid layer.The coefficients of the transformation matrix D satisfy 0 ≤ D(i, j ) ≤ 1 where i runs over the dimension of z E and j runs over the dimension of the source grid z S .A situation is depicted in Fig. 2, where the external grid is coarser than the source grid (i.e., external layers overlap multiple source layers).
Algorithm.(Layer height weighted re-gridding) Assume the ith coarse grid layer overlaps with the j th source grid layer.Then the element in the ith row of D corresponding to the j th source layer is an interpolation factor: .
If there is no overlap between ith external layer and j th source layer, the corresponding element in D(i, j ) equals 0. For the situation in Fig. 2, the ith row will take the following form (source layers (j + 1) and (j − 3) contain the

B. Langerock et al.: Description of algorithms for comparing gridded model data
height boundaries for the ith layer of z E layer ( j + 1) layer j layer ( j − 1) layer ( j − 3) Interpolation factors for grid layers: green source grid layers are only partially overlapped by the ith destination layer.
.5 The dark gray layers in Fig. 2 have integer coefficients in D as they are completely covered by the ith row of z E B .The sum of all rows in the transformation matrix is a vector with the dimension of the source grid that contains a coefficient of 1 for every source layer that is completely covered by an external layer.The re-gridded profile is obtained from matrix multiplication: O R 3 = D • O 3 .A final step in the regridding process is to set profile values to void (or NaN) for layers of the external grid that are not or only partly covered by the source grid.

Temporal co-location
Section 2.2 describes the construction of the MACC timely data sequence t M 1 , t M 2 , etc.The time between two subsequent MACC data validity times is constant and denoted by > 0 (typically = 3 h).Around each MACC time t M the server puts a pre-defined time window δ of length 0 < δ ≤ as indicated in Fig. 3 with a gray rectangle.
A NORS measurement at time t N will be used to validate a MACC data instance valid at time t M if |t N − t M | < δ/2.This choice implies that a single MACC data instance at time t M is validated against several measurements.But a single measurement will not validate multiple MACC data instances.If a NDACC measurement cannot be related to a MACC time (for instance, in the situation of t N 4 in Fig. 3), the NDACC measurement will not be used in the validation.
This type of co-location is used for all NDACC products; however, depending on the nature of the species and the availability of measurements, the window δ may differ from one target species to another (see Table 4).For example, due to the high diurnal variability of stratospheric NO 2 , UVVIS.DOAS.ZENITH.NO2 (UVVIS.DOAS stands for UV-visible (MAX)differential optical absorption spectroscopy) measurements have δ = 1 2 h.

Spatial co-location and smoothing
Not all measurement techniques measure the state of the atmosphere directly above the instrument's location, e.g., FTIR measurements measure direct sunlight, and the probed column of air varies with the local measurement time.
A similar situation occurs for UVVIS.DOAS.ZENITH and UVVIS.DOAS.OFFAXIS measurements.The vertical profile that is extracted from the MACC model at the site's location, should take this off location of the measured air mass into account.
The UVVIS template includes the latitude and longitude coordinate for the probed air mass at each height grid point (the latitude and longitude GEOMS variables).For FTIR measurements, the server uses an off-line routine to calculate the latitude and longitude GEOMS variables when not available in the NDACC file.Depending upon the availability of these variables, the server distinguishes two situations.

Latitude and longitude not available: microwave radiometer and LIDAR
In this case a vertical profile is extracted at the site's location by means of a standard bilinear interpolation to get from the MACC latitude-longitude grid to a vertical profile at the instrument's latitude and longitude at all altitudes.The horizontally interpolated profile is re-gridded to the measurement vertical grid (i.e., the external grid is z E B = z N B and the source grid is z S B = z M B ).

Latitude and longitude available: FTIR and UVVIS.DOAS
If the horizontal coordinates of the probed air mass for each measurement layer are available, the co-located model profile is constructed per measurement layer; i.e., the MACC regridded profile value for the ith measurement grid layer z N equals the value for layer i of the horizontally interpolated (towards (ϕ N (i), λ N (i))) and consequently the vertically regridded (towards z N (i)) MACC profile.
The values for the measurement grid that are not (or only partially) covered by the MACC grid are void.

Alignment of re-gridded model data
The re-gridding algorithm described in Sect.3.1 acts on partial column profiles (or optical depth for aerosol).Further unit conversion is required since the measurement's averag- ing kernel typically acts on VMR profiles.To do this unit conversion, we use temperature T N and pressure p N profiles provided along with the measurement data, i.e., the partial column of air for each layer in the measurement grid equals This implies that the validation is based on the model's partial column/optical depth profile and that further manipulation on both model and measurement data are done using equal conversion factors.
Aerosol optical depth profiles require one further manipulation.Typically, the measurement's optical depth profile is measured at a specific wavelength (say λ N = 477 nm) which does not coincide with the wavelength of the model optical depth profile at 532 nm.
Algorithm.(Optical depth wavelength scaling) To convert the model optical depth profile to the measurement's wavelength, we use the Ångström exponent α (Ångström, 1929), calculated from the MACC array of optical depths aergn03 (use the pair of total OD's at 532 nm and at the wavelength closest to the measurement λ N = 477 nm, in this case 469 nm): Under the assumption that the Ångström coefficient is height independent and the measurement's wavelength λ N falls within the validity range of the above estimate for α, the re-gridded model optical depth profile is then scaled with the factor λ N 532 −α .

Application of the measurement's averaging kernel
In the following, it is understood that the re-gridded model profile, e.g., O M,R 3 , has been converted to the units of the measurement's averaging kernel.Smoothing of model profile data ensures that the model profile and measurement profile coincide on height levels outside the measurement's sensitivity range (see also below); i.e., there can be no biases between model and measurement outside the sensitivity range.Furthermore, according to the GEOMS template files (FTIR; UVVIS; see Retscher et al., 2011), the reported measurement uncertainties do not include the smoothing uncertainty which makes the smoothing of model data mandatory in order to have a complete uncertainty budget for the validation statistics.Examples of typical AVKs for each measurement technique can be found in the NORS documentation (De Mazière et al., 2013;Richter et al., 2013).and as a final step, after applying the above smoothing formula, the values in the resulting smoothed model profile O M,S 3 corresponding to the initial void layers of the re-gridded profile are replaced by NaN.In this way, the AVK is applied to the model target profile, even if the model does not provide data on the entire AVK grid.The above method ensures that these outside layers do not contribute in the smoothing operation.Figures 4 and 5 show an example of the above-described regridding and smoothing algorithms applied to a stratospheric O 3 FTIR profile measurement.
In some cases, the measurement data only provide a retrieved total column (e.g., the UVVIS.DOAS.ZENITH measurements).In that case the above formula is adapted such that the column averaging kernel is used.In this case A N is a transformation (with the shape of a vector) acting on profiles of partial columns, and the result is a scalar total column value: , The resulting smoothed model total column/optical depth is void if the measurement grid outranges the model grid.The model grid height typically reaches 65 km.For the FTIR, LI-DAR and microwave radiometer (MWR) data, the measurements are reported on grids up to 100 km.

Representation of validation results
In order to get statistics on the validation results (see Huijnen and Eskes, 2012), all individual (per measurement) results are brought to a chosen fixed vertical grid.Such a common grid can be a single layer grid to get a partial column or a true vertical grid to calculate a mean (difference) profile.Another possibility implemented by the server, is the re-gridding towards a subgrid of the measurement grid where each partial column overlaps layers whose cumulative sum of the degrees of freedom (the diagonal elements of the AVK) is approximately one.As a general rule, the representation vertical grid is either a chosen fixed grid from a measurement or from the MACC model.This choice is such that the coarsest grid is chosen for representation purposes: for LIDAR, the vertical grid of the measurement may be finer than the model grid and in this case the representation grid is chosen to be a fixed model  is done with the algorithm described earlier (i.e., applied to profile data of partial columns/optical depths).

Uncertainty propagation
NDACC uncertainties can be reported as standard deviation values σ , or covariances S. Uncertainty propagation requires the knowledge of covariance matrices.If either the systematic or random uncertainty matrix contains fill values, the matrix is filled with NaNs.NaN uncertainties are not shown in the reports.The measurement uncertainty covariance matrix S N (random or systematic) is propagated to this chosen representation grid using the transformation matrix D described in Sect.3.1 with z N B as source grid and the representation grid as external grid (matrix multiplication): The covariance matrices in the formula above are in partial column units (e.g., mol m −2 ).To convert a covariance matrix The plot shows the rows of the AVK matrix, color coded according to the height that corresponds to each row.The matrix elements of an AVK are without unit (the depicted AVK acts on O 3 profiles relative to the a priori profile).The black dashed line is the sensitivity (the sum of each row): for rows with zero sensitivity the smoothing formula returns the a priori.
in VMR unit to partial column units, the partial column profile of air a N is used: the (i, j )th component of the covariance in VMR units is multiplied with a N (i)a N (j ).
To transform a covariance matrix from optical thickness units towards optical depth, the same formula is used where a N is replaced by the vector of layer thickness z N .

Sensitivity and partial columns
Depending on the measurement technique, site and species, the measurement sensitivity may differ.Table 4 contains the lower and upper boundaries used by the server as the boundaries for the partial columns and profile data.The lower boundary is site dependent and adapted with the instrument's height.These height boundaries are typical boundaries in which the measurement has sensitivity and are derived by looking at representative AVK matrices (see De Mazière et al., 2013;Richter et al., 2013, for more information on the different sensitivity ranges).
Algorithm.(Smoothing of model data) The smoothed model profile is obtained by the standard smoothing formula on the re-gridded model profile O Mwhere the (possibly) void (NaN) re-gridded model profile values are well taken care of.By substituting a zero for these void layers in the difference profile O M

Figure 4 .
Figure 4. Example of a model O 3 partial column profile (dashed blue) and FTIR measured profile (black).The model profile is first re-gridded to the measurement grid (blue) and smoothed (red) at Jungfraujoch.The lower two horizontal lines show the lowest layer height of the model (lowest) and site (highest).Because a partial column profile depends linearly on the layer thickness of the grid, the right plot shows the layer thickness profile for model and measurement grids.Above 40 km the measurement (black), a priori (gray) and smoothed model (red) profiles coincide because the sensitivity of the measurement decreases (see Fig.5).

Figure 5 .
Figure 5. Example of a O 3 FTIR measurement averaging kernel.The plot shows the rows of the AVK matrix, color coded according to the height that corresponds to each row.The matrix elements of an AVK are without unit (the depicted AVK acts on O 3 profiles relative to the a priori profile).The black dashed line is the sensitivity (the sum of each row): for rows with zero sensitivity the smoothing formula returns the a priori.

Table 2 .
List of GEOMS variable names common for all GEOMS templates (dimensions are indicative for 100 measurements on a grid with 47 layers).

Table 3 .
MACC variables specifications.MMR is mass mixing ratio.The dimensions are indicative for a model with IFS resolution T255N128.

Table 4 .
List of default z min , z max values for NORS target species, based upon the measurement's sensitivity range and upper boundaries of MACC models, and of time windows for temporal co-location, based on species temporal variability.is the time between two subsequent MACC data validity times -see text (target values may contain wildcards if applicable to multiple targets).