Articles | Volume 17, issue 13
https://doi.org/10.5194/gmd-17-5225-2024
Methods for assessment of models | 08 Jul 2024

Merged Observatory Data Files (MODFs): an integrated observational data product supporting process-oriented investigations and diagnostics

Taneil Uttal, Leslie M. Hartten, Siri Jodha Khalsa, Barbara Casati, Gunilla Svensson, Jonathan Day, Jareth Holt, Elena Akish, Sara Morris, Ewan O'Connor, Roberta Pirazzini, Laura X. Huang, Robert Crawford, Zen Mariani, Øystein Godøy, Johanna A. K. Tjernström, Giri Prakash, Nicki Hickmon, Marion Maturilli, and Christopher J. Cox
Abstract

A large and ever-growing body of geophysical information is measured in campaigns and at specialized observatories as a part of scientific expeditions and experiments. These collections of observed data include many essential climate variables (as defined by the Global Climate Observing System) but are often distinguished by a wide range of additional non-routine measurements that are designed to document not only the state of the environment but also the drivers that contribute to that state. These field data are used not only to further understand environmental processes through observation-based studies but also to provide baseline data to test model performance and to codify understanding to improve predictive capabilities. To address the considerable barriers to utilizing these diverse and complex data for observation–model research, the Merged Observatory Data File (MODF) concept has been developed. A MODF combines measurements from multiple instruments into a single file that complies with well-established data format and metadata practices and has been designed to parallel the development of corresponding Merged Model Data Files (MMDFs). Using the MODF and MMDF protocols will facilitate the evolution of model intercomparison projects into model intercomparison and improvement projects by putting observation and model data “on the same page” in a timely manner. The MODF concept was developed especially for weather forecast model studies in the Arctic. The surprisingly complex process of implementing MODFs in that context refined the concept itself. Thus, this article explains the concept of MODFs by providing details on the issues that were revealed and resolved during that first specific implementation. Detailed instructions are provided on how to make MODFs, and this article can be considered a MODF creation manual.

1 Introduction

The Merged Observatory Data File (MODF) concept is based on the simple principle of combining measurements made by multiple co-located instruments from research observatories and campaigns into a single file that complies with already established data stewardship standards. Here, “observatory” refers to a facility that measures an extensive inventory of co-located geophysical variables that have been chosen with the intention of investigating specific, usually interrelated, physical processes in order to answer hypothesis-driven science questions. In this context, an observatory could be a land site or a research vessel. While it is standard scientific operating procedure to co-locate research-grade instruments both continuously at observatories and episodically for field campaigns (often side by side with routine operational station instruments with long operational histories), there are generally no standard procedures for coordinated data management such as those that have been developed for operational data. Thus, the data from separate instruments can be scattered between separate files with different authors, formats, metadata, physical archive locations, and use restrictions.

The specific MODF realization presented here is for Arctic observatories (Uttal et al., 2016; Mariani et al., 2022) and field campaigns that resulted from initiatives established during the Year of Polar Prediction (YOPP; Goessling et al., 2016; Jung et al., 2016; Wilson et al., 2023). One key YOPP activity was the YOPP supersite Model Intercomparison Project (YOPPsiteMIP; Day et al., 2023), which was designed to facilitate the process-based validation of numerical weather prediction (NWP) models at polar locations during special observing periods (SOPs). The concept of MODFs and their forecasting analogs, Merged Model Data Files (MMDFs), was motivated by the YOPPsiteMIP community's desire to have the same variables from observations and models in easy-to-use files of the same structure in order to explore small-scale parameterized processes that are not represented well in the forecast models. These MODFs thus provide an integrated observation database to support model process representation through parameterization improvements for weather forecasts in the polar regions. At the same time, they also facilitate comparative observational studies across Arctic sites.

The MODF concept addresses the problem that research-grade, process-level observations are currently underutilized for model evaluation of parameterization deficiencies. As weather forecasting models increase in complexity and include detailed representations of land, the ocean, ice, and snow in addition to the atmosphere, it is increasingly important to evaluate processes using observations recorded throughout the whole earth system column, including the fluxes at their interfaces, to inform model development. This requires the use of multi-variate process-oriented diagnostic methods which utilize data that span these multiple components to understand model error. An essential component of the MODF concept is that analogous Merged Model Data Files (MMDFs) can be developed with extracted model data from locations near and around the observatory sites. Together, these are defined as Merged Data Files (MDFs). The MDFs, which bring together observations from different earth system components as well as model output in a standard file format, provide the basis for this and will support model intercomparison and improvement projects (MIIPs) as an evolution from model intercomparison projects (MIPs). The idea of MIIPs (a new acronym that we define here) does not imply that previous MIPs did not have productive results. However, through the development of matched MODF and MMDF data sets, it is implied that what have often been intercomparison studies can be smoothly extended to model improvements.

2 Background

Using observations and model outputs symbiotically is an area of ongoing effort and research with recognized challenges (Holtslag et al., 2013). Many levels of MIPs have been in progress for decades (Stephens et al., 2023). In addition, existing efforts and methodologies facilitate the usage of increasingly heterogeneous data sets in model–observation fusion efforts through assimilation (Gettelman et al., 2022) and for inputs into multivariate artificial-intelligence analyses (Boukabara et al., 2021). There are well-organized systems for managing the data from operational surface networks, upper-air networks, and satellites that are uploaded into the Global Telecommunications System (GTS), which is overseen by the World Meteorological Organization (WMO) (see Global Telecommunications System (GTS), 2023; WMO, 2020). However, GTS data are only readily available directly to national forecasting centers (which presumably have developed institutionally specific reading and ingesting routines) and via products developed by WMO institutional repositories (https://climatedata-catalogue-wmo.org/, last access: 27 June 2024) (Bojinski et al., 2014; Lavergne et al., 2022). The latter have necessarily gone through various quality control, formatting, averaging, and (sometimes) interpolation steps to create globally uniform products. As a result of the standardized processing, information on high-resolution, rapid, and extreme events (Sardeshmukh et al., 2015) is likely to have sometimes been lost.

The MODF schema has been specifically developed for managing observatory and campaign research observations as opposed to operational observations. Research observations target local-scale and often rapid or extreme processes that are intended to lead to the discovery of the physics within the atmosphere as well as the physics that governs the coupling processes between the atmosphere and the underlying surface. The surface can be land, ocean, ice, or any combination of the three, often with obfuscating layers of plant and/or snow cover that are themselves components of the system and separate objects of study. Research-grade data pose many additional challenges regarding data latency, accessibility and uptake issues, and institutional ownership compared to data that have been managed more systematically specifically for operational purposes.

The data science community is aware of these issues and, in response, has developed FAIR (findable, accessible, interoperable, reusable) data principles (Wilkinson et al., 2016) that can be applied to individual data sets. However, Wilkinson et al. (2016) explicitly stated that “These high-level FAIR Guiding Principles precede implementation choices, and do not suggest any specific technology, standard, or implementation–solution; moreover, the principles are not, themselves, a standard or a specification.” Whereas many FAIR solutions are implemented by web services (e.g., Buck et al., 2019), the MODF concept described here can be considered an alternative: an integrated data product that is based on the same metadata conventions that have been developed for web service solutions. The considerations and steps described here for creating MODFs can therefore be considered a particular FAIR implementation–solution specifically for observatory and campaign data.

3 The MODF concept

Figure 1 is a conceptual schematic of the end-to-end process involving data collection, data quality control (QC) and processing, metadata information collection, and data amalgamation into netCDF files (Unidata, 2023) that follow the NetCDF Climate and Forecast (CF) metadata conventions (Eaton et al., 2022, hereafter the “CF conventions”). The process for turning model forecast output into MMDFs is qualitatively similar, including the use of a particular set of global and variable attributes, since the specifications were developed by modelers as well as observationalists.

Figure 1. Instruments, measurements, data processing, and metadata generation for (a1) the atmospheric state at the surface, (a2) broadband radiation at the surface, (a3) surface turbulent fluxes, (a4) upper-atmospheric profiles, (a5) the terrestrial subsurface, and (a6) ocean and sea ice/snow. This information is collected into netCDF files with specified (b) global attributes and (c) variable attributes that are compliant with FAIR principles.


The different types of geophysical data (a1 through a6 in Fig. 1) that can typically comprise observatory and campaign research data are described below.

3.1 Surface atmospheric measurements

Surface atmospheric measurements at observatories or during campaigns are made with research-grade thermometers, hygrometers, and anemometers, with measurements often performed at a higher frequency and with more carefully calibrated sensors compared to operational weather stations. Such observations are sometimes performed side by side with operational weather service station measurements, providing context to multi-decadal operational records. Research meteorological measurements are frequently redundant; they are performed at multiple locations across a site (on a scale smaller than an NWP model grid cell) or at different levels on towers to get detailed profiles for the near-surface boundary layer. The variables measured are the temperature, pressure, relative humidity, and eastward and northward components of the wind speed. Both operational and research measurements of these atmospheric state variables can be considered to have well-quantified uncertainties determined by the instrument calibration and tolerances.

Broadband surface radiation is measured by radiometers that measure both incoming and outgoing shortwave and longwave radiation. There are multiple commercial and experimental instrument options for measuring broadband radiation, and in the Arctic, a particularly wide range of methods must be applied to keep glass domes clear of obstructions (e.g., ice, snow, dust, or sea salt), such as heating, ventilation, and manual cleaning (Cox et al., 2021). The Baseline Surface Radiation Network (BSRN) (Ohmura et al., 1998) is a global repository for standardized radiation products that are traceable to the WMO World Radiometric Reference. However, the BSRN is not designed to accommodate short-term campaign data sets and does not account for different methods for data QC and processing (Matsui et al., 2012; Long and Shi, 2006, 2008).

Surface turbulent flux variables are calculated and inferred by a number of different methods. Eddy-correlation techniques (Kaimal and Finnigan, 1994) are based on measurements by fast-response sonic anemometers and hygrometers with built-in fast-response temperature sensors. It is often necessary to make considerable site-specific adjustments to processing methods, accounting for the local surface roughness, sensor height, and obstructed wind-direction sectors. In the polar regions, sensors operate near the thresholds of instrument ratings for detection and environmental conditions and have to be quality checked for periods of riming. There are specific challenges relating to the cold snow-and-ice-covered surface conditions over land (Grachev et al., 2018) and over the ice pack of the central Arctic Ocean (Andreas et al., 2010; Cox et al., 2023). Eddy-correlation methods are only valid under stringent environmental conditions, which results in frequent data gaps. Alternatively, turbulence variables can be calculated by bulk-aerodynamic methods (Monin and Obukhov, 1954; Mahrt and Sun, 1995), but results are not as meaningful when comparing to models as this becomes a model-to-model rather than a model-to-observation comparison. Commercial packages for calculating latent, sensible, and gas fluxes have been developed and compared (Mauder et al., 2008; Fratini and Mauder, 2014), but caution must be exercised when using the results, and it is important to understand the basis on which variables are derived by proprietary commercial or custom software. The resulting flux variables are the latent and sensible heat fluxes, friction velocity, surface stress (momentum flux), drag coefficient, kinematic temperature scale, Monin–Obukhov stability parameter, and dissipation rate of the turbulent kinetic energy. 
Given the variety of physical conditions and complex methodologies for calculating turbulent fluxes, an assessment of the interoperability and consistency between flux products from different data collections is still necessary, despite the existence of operational flux data networks such as FLUXNET (Baldocchi et al., 2001) and AmeriFlux (Baldocchi et al., 1996; Boden et al., 2013).

Solid and liquid precipitation are both measured at the surface but are notoriously difficult to characterize accurately. Precipitation accumulation is traditionally measured at frequencies ranging from every hour to every 6 h by gauges within a surface precipitation network that is usually operated by operational weather centers or hydrological agencies. Measuring solid precipitation presents a number of unique challenges; snow pillows and the double-fence automated reference (DFAR) configuration around gauges provide reliable estimations of precipitating snow. However, snow precipitation measurements from WMO standard installations with single-Alter-shielded and unshielded gauges are affected by an undercatch of solid precipitation in windy conditions (Nitu et al., 2018; Kochendorfer et al., 2022). The WMO Solid Precipitation Intercomparison Experiment (SPICE) analyzed this undercatch and developed adjustment functions for correcting it (Kochendorfer et al., 2018; Wolff et al., 2015), which are now being used in verification practices (Køltzow et al., 2020; Buisán et al., 2020; Casati et al., 2023).

3.2 Upper-atmospheric profiles

Radiosonde data are typically treated as if they are instantaneous vertical profiles of the troposphere by collapsing the balloon-borne trajectory of the instrument package into a profile that is time indexed with the launch time. The measurements of temperature, dew-point temperature, pressure, humidity, and winds are high cadence (on the order of seconds) during the balloon ascent, with individual launches being low cadence (every 12 h, with standard launch times of 00:00 and 12:00 UTC). During intensive campaign periods, the launch frequency is often increased to 4–6 sondes per day. A full sounding can take 2 h to ascend and can travel up to 200 km horizontally, and important fine-scale information is available if the full original trajectory coordinates (time–height/pressure level–latitude–longitude) are maintained to assess the spatial displacement. Sometimes, special arrangements are made to continue to capture data from the radiosonde during its descent after the balloon bursts (Hartten et al., 2018), although care is necessary when using descent data because key assumptions built into the instrument design are being violated. For example, the temperature and humidity sensors are positioned on the sonde such that they sample air undisturbed by the instrument casing during ascent, but those conditions may or may not be met during descent (Ingleby et al., 2022).
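
As an illustration of why retaining the full trajectory matters, the following minimal sketch computes the horizontal drift of each sample from the launch point. It is not part of the MODF specification: the `SondeSample` record, its field names, and the three sample points are hypothetical, and the great-circle distance uses a spherical-Earth approximation.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


@dataclass
class SondeSample:
    """One radiosonde sample with its full trajectory coordinates (hypothetical layout)."""
    seconds_since_launch: float
    height_m: float
    lat_deg: float
    lon_deg: float
    temperature_k: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def drift_from_launch_km(samples):
    """Horizontal displacement of every sample from the first (launch) sample."""
    launch = samples[0]
    return [
        haversine_km(launch.lat_deg, launch.lon_deg, s.lat_deg, s.lon_deg)
        for s in samples
    ]


# Made-up ascent for illustration only (not real data).
ascent = [
    SondeSample(0.0, 10.0, 70.00, 20.00, 265.0),
    SondeSample(1800.0, 9000.0, 70.10, 20.50, 225.0),
    SondeSample(5400.0, 25000.0, 70.30, 21.50, 215.0),
]
drift = drift_from_launch_km(ascent)
```

A profile collapsed onto the launch time would discard `lat_deg`, `lon_deg`, and `seconds_since_launch`, so this drift could no longer be recovered when comparing against a model grid cell.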

Many observatories and campaigns support the operation of radars, lidars, sodars, profilers, and microwave radiometers which, separately and in combination, can remotely infer properties throughout the depth of the planetary boundary layer (PBL) and free atmosphere. The systems use active (transmission and the interpretation of the reflected signal) and passive (detection of the natural atmospheric signal) sensing techniques. Using a significant body of research on retrieval methods, the systems can determine properties such as the cloud base, cloud liquid-water path, cloud ice-water path, cloud liquid-water content, cloud ice-water content, hydrometeor sizes and shapes, snowfall rates (Matrosov et al., 2022), degree of riming, aerosol extinction coefficients, winds, temperature, and humidity. These advanced products may be obtained from diverse instrument hardware configurations and technologies as well as site-specific scanning and collection schedules. Robust site-independent retrieval methodologies that are consistent across networks – such as the products produced by Cloudnet (Illingworth et al., 2007) and the ARM Active Remote Sensing of Clouds product (Clothiaux et al., 2001) – have been developed; however, they may not be consistently implemented.

3.3 The terrestrial surface and subsurface

The terrestrial surface and subsurface are composed of different and complex soil, rock, ice (permafrost), and vegetation layers that can be covered with surface snow and ice that show evolving density, heat capacity, conductivity, and chemistry. Thermistor strings can be co-located with tensiometers; the variables measured are the gradients of temperature and moisture, respectively, with thermal conductivity and heat capacity determined as fixed intrinsic properties from soil samples. Thermistor strings can also be co-located with moisture probes, providing vertical profiles of soil temperature and moisture, respectively, down to depths ranging from 0.1 to 2 m, depending on the soil type. Snow depth, like precipitation, is a difficult quantity to measure representatively; techniques vary from using simple measurement stakes to downward-looking mast- or tower-mounted acoustic devices.

3.4 The ocean, sea ice, and the snow surface and subsurface

Ocean thermistor measurements are accompanied by conductivity and pressure measurements, which allow the determination of the salinity, temperature, and depth. Additional current meters allow the determination of turbulent fluxes using eddy correlation techniques similar in principle to those used for atmospheric fluxes. The measurement of sea-ice and snow macro- and microphysical properties below the atmosphere–ice or atmosphere–snow interface (i.e., the subsurface) is increasingly sophisticated, with upward- and downward-looking acoustic devices on ice buoys determining the ice thickness and snow depth (Zuo et al., 2018). Measurements are made of the snow and ice density, crystalline structure, and salinity via manual sampling.

4 MODFs for the Year of Polar Prediction

The WMO Polar Prediction Project (PPP) organized the Year of Polar Prediction (YOPP; PPP Steering Group et al., 2019; Jung et al., 2016), which concluded in 2022 (Wilson et al., 2023). The YOPPsiteMIP (Svensson et al., 2020; Day et al., 2023) working group envisioned matched sets of observation and model data to support model process diagnostics. The efforts of this working group resulted in a metadata schema for MODFs and MMDFs (collectively known as MDFs) as well as an iterative production workflow. The YOPPsiteMIP effort focused on polar terrestrial stations, but because it was anticipated that a similar strategy would be applied to the YOPP-endorsed Multidisciplinary drifting Observatory for the Study of Arctic Climate (MOSAiC) expedition (Shupe et al., 2022; Nicolaus et al., 2022), the schema was developed to also accommodate expected additional ocean and sea-ice observations from ships and on-ice platforms in the central Arctic Ocean.

The first steps in MODF production are to assemble the available data files that will feed into the MODF, extract the desired data, and acquire the corresponding metadata. During these steps, each individual instrument and instrument group requires both unique quality control and processing, together with attribution tracking, in order to produce the geophysical variables and to document their provenance. Given the heterogeneity and often research-grade nature of campaign data, the amount and quality of metadata for different variables is likely to be inconsistent. Although required metadata are ideally harvested from the internally documented data files, it is likely that many data sets will need additional metadata that will require interviewing the original data collectors (for an example of this other than our own, see Papoutsoglou et al., 2023). Once individual data and any available metadata from individual instruments and instrument suites are created and/or assembled, the merging process can begin.

As we developed the schema and workflow, we identified the following challenges and solutions.

Semantics. Data semantics address the issue of the same variable being given different names drawn from multiple or ad hoc naming conventions. A significant part of the MODF solution has been the development of an extensive schema based on already existing vocabulary standards.

Units. Variables' units are frequently absent entirely from data sets or expressed with nonstandard abbreviations (Hanisch et al., 2022). The MODF solution is to associate each variable with recommended units, typically as identified in the CF standard name table (2023). Eaton et al. (2022) explain how these are meant to be compatible with and generally recognized by the Unidata (2020) UDUNITS package.

Attribution. Perhaps the greatest MODF challenge is not technical but rather cultural. When data for a single MODF product come from instruments operated by multiple institutions and researchers, there are complex issues with acknowledging the original sources of the data (Pierce et al., 2019; Nature Editorial, 2022). Data from campaigns or programs usually have multiple institutions and individual researchers involved, all of whom have different performance metrics for original research. This often leads to official or implicit data embargoes so that the researchers who collected the data will have the first opportunity to publish research results. The design of MODFs is responsive to these issues as it provides a high level of attribution metadata, including links to original data and data producers, thereby supporting copious data citations (Vannan et al., 2020).

Data heterogeneity. Measurement heterogeneity is generated by differences in how data are collected and processed. The MODF solution is to, as much as possible, document variable derivation from the original data metadata: type of instrument, calibration information, method of deployment, quality control and processing histories, and original licenses.

Inconsistent cadences. Different variables are collected over a range of native cadences, varying from Hz for the fast-response sensors needed to record phenomena that can change on short timescales (typically in the atmosphere) to hourly or even daily sampling where that is sufficient (typically in the terrestrial subsurface), along with a wide range in between (typically for the ocean and ice sub-surfaces). The MODF solution is for each variable to retain its native recording cadence and to minimize any temporal averaging beyond that necessary for sensible data processing. In addition, original data sets are not interpolated to fill data gaps.

Redundancy. It is common during campaigns or at intensive observatory sites for there to be multiple measurements of the same variable (e.g., temperature). There is important information contained in redundant measurements when evaluating how representative a variable is of site characteristics or when comparing in situ observations to model grid cells and satellite footprints to evaluate local variability. The MODF solution is to include as many redundant measurements as possible, maintaining high-accuracy location information that helps define the observations' microclimates, including, for example, soil and vegetation characteristics.

Processing levels. Research-grade data sets typically require unique quality control and often extensive post-processing, which can only rarely be fully automated. Frequently, the state variables (e.g., temperature) are readily available in near real time, whereas more complex variables such as turbulent fluxes can have a wide range of availability, depending on whether they are output by commercial software (Fratini and Mauder, 2014), calculated from bulk variables, or based on customized calculations for complex environments. The MODF solution is to include full processing, data-quality, and usability metadata.

Versions. Since it is common for observed data to have versions, the MODF concept accommodates multiple product releases. Examples of data versions include the raw data (sometimes just voltages); data that have been minimally quality checked by the automated elimination of extreme physically unrealistic outliers; data that have been subjectively curated for unusual and, if possible, correctable operational or environmental conditions; and data that have undergone different levels of processing to produce the retrieved variables (those based on more direct measurements). Frequently, highly processed and certified data are available months or even years after the original collection period. In some cases, all processing levels of data are available and archived. Two obvious ways of handling the situation are to either replace the original data with more processed data as they become available (Fig. 2a) or to keep all versions of the data (Fig. 2b). MODFs' rich metadata include careful version tracking for individual variables, which encourages the latter.

Figure 2. There are two approaches for MODF augmentation (as new variables become available) and modification (as variables proceed through processing levels). In both approaches, variables that are immediately available (Var 1, Var 2) can be immediately ingested and variables can be added as they become available (Var 3, Var 4). The two approaches accommodate two strategies for variables that undergo post-processing, which results in products with quality-control (QC) levels. In (a), the original data (Var 3 v1.0) can be replaced with quality-controlled data (Var 3 v2.0). In (b), the original data (Var 3 v1.0) can be retained in the MODF, and quality-controlled data (Var 3 v2.0) can be added.
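
The two approaches can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the `Modf` and `VariableVersion` classes, their fields, and the sample values are hypothetical and do not represent how a netCDF MODF is actually written.

```python
from dataclasses import dataclass, field


@dataclass
class VariableVersion:
    name: str       # e.g. "Var 3"
    version: str    # e.g. "v1.0"
    qc_level: str   # free-text description of the processing level
    values: list


@dataclass
class Modf:
    # Keyed by (name, version) so several versions of one variable can coexist.
    variables: dict = field(default_factory=dict)

    def add(self, var):
        key = (var.name, var.version)
        if key in self.variables:
            raise ValueError(f"{var.name} {var.version} already present")
        self.variables[key] = var

    def replace_version(self, var):
        """Approach (a): drop every earlier version of this variable, then add."""
        for key in [k for k in self.variables if k[0] == var.name]:
            del self.variables[key]
        self.add(var)

    def retain_version(self, var):
        """Approach (b): keep earlier versions and add the new one alongside."""
        self.add(var)

    def versions_of(self, name):
        return sorted(v for (n, v) in self.variables if n == name)


# Made-up values for illustration only.
raw = VariableVersion("Var 3", "v1.0", "raw", [1.0, 2.0, 999.0])
qc = VariableVersion("Var 3", "v2.0", "quality controlled", [1.0, 2.0, None])

modf_a = Modf()
modf_a.add(raw)
modf_a.replace_version(qc)   # only v2.0 remains

modf_b = Modf()
modf_b.add(raw)
modf_b.retain_version(qc)    # v1.0 and v2.0 both remain
```

Keying on both name and version is one possible way to let approach (b) coexist with per-variable version tracking; other layouts would work equally well.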


5 The H-K variable schema table

The H-K variable schema table, developed for the YOPPsiteMIP (Hartten and Khalsa, 2022; hereafter the “H-K schema”), provides guidelines for creating both MODFs and MMDFs as netCDF files (Unidata, 2023) with consistent variable names and metadata. Hereafter, the discussion in this paper centers on those entries which are relevant to MODFs, although most of them are also relevant to MMDFs. The H-K schema follows the Attribute Convention for Data Discovery (2024; hereafter “ACDD”), the NetCDF Climate and Forecast (CF) metadata conventions (Eaton et al., 2022), and the ISO/TC 211 19100 series of standards for digital geographic information. Short variable names are compliant with the World Climate Research Programme (WCRP) Coupled Model Intercomparison Project, Phase 6 (CMIP6; Taylor et al., 2022) whenever possible, as these are in wide use in the geophysical research community.

For global attributes, the H-K schema requirements include identifying the file feature type, the file maker, the license, the location, information on when the data were collected, and a permanent identifier. The global attributes listed in the H-K table are minimum requirements and may be augmented by others from the ACDD or the CF conventions in order to include other metadata deemed important for provenance or usability. In particular, the product_version global attribute can be used to help clearly identify when a MODF has been augmented by new or reprocessed variables, and comment can always be used for free-text metadata or stable URLs linked to additional metadata files. For variable attributes, the H-K schema identifies vocabularies, units, individual variable attributions (i.e., who originally collected the data) and variable provenance, and it also presents a method for the differentiation of multiple measurements of the same geophysical variable (redundancy). The H-K schema is available in both the JSON (machine readable) and PDF (human readable) formats.

5.1 Global attributes

Global attributes are the descriptive metadata that are relevant for the entire MODF file. Table 1 lists recommended global attributes for MODFs. Some were chosen because they are highly recommended by ACDD. Others are merely recommended by ACDD, but they made sense for our purpose and seemed not overly burdensome for those most likely to be making MODFs. We chose additional attributes from metadata standards other than ACDD – e.g., the CF conventions and the DataCite Metadata Kernel v4.4 (DataCite Metadata Working Group, 2021) – because we felt they would help MODFs be FAIR. Some global attributes whose inclusion and consistent use are particularly important are described below. We use italics for the names of attributes and single quotes around variable names.

id. The global unique persistent identifier (PID). This global attribute does not supersede references to the PIDs associated with individual variables (Prakash et al., 2016), which can be provided within the variable attribute references. The MODF id remains constant as modifications or additions are made to the MODF file.

license. Specifies the terms of distribution. As with id, individual variables and groups of variables in the MODF may have unique license requirements that can be recorded in the variable attribute comment.

creator_name. The names of individuals who should be in the citation for the MODF (Jones et al., 2020; see Sect. 2.2.7.3.3). The DataCite Metadata Kernel v4.4 (DataCite Metadata Working Group, 2021) defines creators in this context as the main producers of the MODF file and also links them to authorship. (Note that the main producers of component data sets are credited within the variable attributes that come from their individual data sets.) Enabling those who create MODFs to receive appropriate credit for their work is a key element of making MODFs FAIR; data reuse depends in part on data accessibility, which, in turn, is more likely when data are “considered legitimate, citable products of research” (Data Citation Synthesis Group, 2014).

featureType. Although the original vision was that each site during a campaign would have a single MODF that would include all relevant variables, when considering the practicalities of archive submission, it became apparent that the file featureType needed to be specified to facilitate data services such as visualization tools. Most observatory data fit into the featureTypes timeSeries, timeSeriesProfile, and timeSeriesTrajectory. These featureTypes are defined by the temporal–spatial dimensions that are associated with individual variables: time only for timeSeries, time and height (or depth) for timeSeriesProfile, and time–height (or depth)–latitude–longitude for timeSeriesTrajectory.

missing_value. Establishes one consistent missing_value for all data in the MODF.

history. Provides an audit trail for the MODF file, documenting its provenance. Initially, this should include information about how it was created. Later, if the original file is modified (e.g., variables are corrected or added), information about what has changed should be added to the original information. The CF conventions recommend that each step be documented by a line containing a datestamp, the name of the person or entity who did the action, a brief description of the action, and any program(s) which accomplished the action (including relevant settings or command arguments). If the modifications involve the addition of a new version of an existing variable, the variable attributes associated with the new version, and the earlier version if it is not removed, should thoroughly document the change.
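As an illustration only, the history recommendation above can be sketched in a few lines of Python. The function names, the entry layout (datestamp, actor, action, program), the choice to add new entries below older ones, and all attribute values are assumptions made for this sketch; they are not part of the H-K schema or of the YOPPsiteMIP tooling.

```python
from datetime import datetime, timezone


def history_line(who, action, program="", when=None):
    """Build one history entry: datestamp, actor, action, and program(s)."""
    when = when or datetime.now(timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    line = f"{stamp} {who}: {action}"
    if program:
        line += f" ({program})"
    return line


def append_history(global_attrs, line):
    """Add a new entry after the existing ones so earlier steps are kept."""
    previous = global_attrs.get("history", "")
    global_attrs["history"] = f"{previous}\n{line}" if previous else line
    return global_attrs


# Hypothetical global attributes for a new MODF (all values are placeholders).
attrs = {
    "id": "placeholder-persistent-identifier",
    "featureType": "timeSeries",
    "product_version": "1.0",
}
created = datetime(2022, 3, 1, 12, 0, tzinfo=timezone.utc)
append_history(attrs, history_line(
    "A. Maker", "created MODF from tower files",
    "make_modf.py --site example", created))

# Later, a reprocessed variable is added: the version changes, the id does not.
updated = datetime(2022, 6, 15, 9, 30, tzinfo=timezone.utc)
attrs["product_version"] = "1.1"
append_history(attrs, history_line(
    "A. Maker", "added reprocessed air temperature", when=updated))

print(attrs["history"])
```

Keeping the entries as plain dictionary values means the same strings could later be written to a file with whichever netCDF library the MODF maker prefers.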

Table 1. Recommended global attributes for the H-K schema (adapted from Hartten and Khalsa, 2022).


5.2 Time, space, and site variables

In the H-K schema, variables have been divided into subgroups. There are three subcategories with time, space, and site information: temporal dimensions and variables; spatial dimensions and variables; and single-level fixed variables. Per the NetCDF user's guide (Unidata, 2023), dimensions “may be used to represent a real physical dimension, for example, time, latitude, longitude, or height … [or] to index other quantities, for example station or model-run-number.” Examples of these indexing variables are listed in Table 2 and discussed further below. Per the H-K schema, all dimensional and geophysical variables are given a short CMIP or CMIP-like variable name and are characterized by variable attributes long_name, standard_name, units, additional recommended attributes, and any other attributes the MODF creators feel are necessary to fully document the data.

Table 2. Examples of dimensions and variables related to time, space, and site information, together with the associated long_name, standard_name, units, and recommended attributes (extracted from Hartten and Khalsa, 2022).


“time”; “time15”; “time_sonde”. MODFs are intended to retain as much high-resolution information from the original data collection as possible. Averaging is limited to that necessary for processing (e.g., for eddy-correlation flux calculations), and data are not interpolated. Therefore, temporal dimensions support maintaining the original data collection cadences for individual variables. For instance, in MODFs, “time” is a generic temporal dimension, whereas “time15” is a time dimension associated with variables collected at 15 min intervals and “time_sonde” is the time dimension associated with the data collected during a radiosonde ascent. In the YOPPsiteMIP implementation of MODFs, we have chosen to append “N” or “_platform” to a generic CMIP name such as “time” in order to indicate a time array with a particular cadence in minutes or a time array tied to a particular instrument or platform. In keeping with CF metadata guidelines, all the differently named time variables can and should have the same standard_name attribute.
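The naming pattern described above can be sketched as a small helper. This is a minimal illustration, assuming that a name carries either a cadence or a platform suffix but not both; the function name and that restriction are ours and are not taken from the H-K schema.

```python
def time_dimension_name(cadence_minutes=None, platform=None):
    """Compose a time dimension name from the generic CMIP name "time".

    cadence_minutes: sampling interval in minutes, appended as "N".
    platform: instrument or platform label, appended as "_platform".
    """
    if cadence_minutes is not None and platform is not None:
        # Assumption of this sketch: one suffix per name.
        raise ValueError("give either a cadence or a platform, not both")
    if cadence_minutes is not None:
        return f"time{int(cadence_minutes)}"
    if platform is not None:
        return f"time_{platform}"
    return "time"


print(time_dimension_name())                    # generic dimension
print(time_dimension_name(cadence_minutes=15))  # 15 min cadence
print(time_dimension_name(platform="sonde"))    # radiosonde ascent
```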

“height_tower”. A spatial coordinate can be a single scalar value or a set of values (a dimension) that describes the location where geophysical measurements are collected. This can be an array of fixed values in the case where measurements are made at set levels, for instance, on an instrumented tower. A tower may also consist of multiple measurements of the same geophysical variable collected at different heights above the ground. For instance, observations of air temperature collected by three sensors located at the surface (2 m a.g.l.), on a pole (10 m a.g.l.), and on top of a building (20 m a.g.l.) could be combined into a single array designated as “_tower” so long as the sensors' horizontal positions were co-located.
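To make the tower example concrete, the sketch below combines three co-located temperature series into one array indexed by time and “height_tower”. The function name, the temperature values, and the missing-value sentinel are hypothetical; gaps are kept as missing values rather than interpolated, in line with the MODF practice of not interpolating data.

```python
def stack_tower_levels(series_by_height):
    """Combine per-height series into (heights, rows), one row per time step."""
    heights = sorted(series_by_height)
    lengths = {len(series_by_height[h]) for h in heights}
    if len(lengths) > 1:
        raise ValueError("all levels must share the same time axis")
    n_times = lengths.pop() if lengths else 0
    rows = [[series_by_height[h][i] for h in heights] for i in range(n_times)]
    return heights, rows


MISSING = -9999.0  # placeholder for the file-wide missing_value

# Hypothetical air temperatures (K) at 2, 10, and 20 m a.g.l., two time steps.
air_temperature = {
    10.0: [268.9, MISSING],  # the 10 m sensor dropped out at the second step
    2.0: [268.1, 268.4],
    20.0: [269.3, 269.5],
}

height_tower, ta_tower = stack_tower_levels(air_temperature)
print(height_tower)
print(ta_tower)
```

Here `ta_tower` would have the dimensions (time, height_tower), with `height_tower` holding the fixed sensor heights.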

“lat_sonde”; “lon_sonde”; “alt_sonde”. Spatial coordinates can also be a dimension variable. For instance, if information on the latitude, longitude, and/or altitude of a radiosonde during its ascent is retained, this information should be put into auxiliary coordinate variables that provide the location in space for the geophysical variables collected by the sonde. If the sonde output includes only varying altitudes and a fixed launch time, the geophysical data from each sonde flight should be put into a file with the featureType “profile”. If the sonde output includes varying altitudes and associated times, the geophysical data should be put into a file with the featureType “timeSeriesProfile”. In both cases, a scalar altitude variable should be used as the vertical dimension for each flight. However, if varying values of the latitude and longitude are also provided, the geophysical data from each sonde should be put into a file with the featureType “timeSeriesTrajectory”, with the reported altitude, latitude, and longitude being provided as coordinate variables for each flight. Finally, the “profile”, “timeSeriesProfile”, and “timeSeriesTrajectory” featureTypes all require that times monotonically increase. It would be tempting (and not against the requirements of featureType) to put geophysical data collected during the radiosonde descent and their accompanying spatial coordinates into the same variables used for the values collected during radiosonde ascent. However, we would recommend using a separate set of variables with a clearly different suffix and long_name set instead. Data collected during descent have different statistical characteristics than data collected during ascent (Stephan et al., 2021; Ingleby et al., 2022), and we think users should affirmatively choose, rather than accidentally use, such “off-label” data.
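The decision rules for radiosonde data can be summarized in a short sketch. The function names are hypothetical, the time check reads "monotonically increase" as strictly increasing, and splitting a flight at its highest reported altitude is one simple way (an assumption of this sketch, not a prescribed method) to separate ascent from descent samples.

```python
def sonde_feature_type(times_vary, lat_lon_vary):
    """Pick the featureType from what the sonde output reports."""
    if lat_lon_vary:
        return "timeSeriesTrajectory"
    if times_vary:
        return "timeSeriesProfile"
    return "profile"  # varying altitudes with a fixed launch time only


def times_increase(times):
    """True if every time is later than the one before it."""
    return all(a < b for a, b in zip(times, times[1:]))


def split_ascent_descent(altitudes):
    """Return (ascent_indices, descent_indices), split at the highest point."""
    if not altitudes:
        return [], []
    top = altitudes.index(max(altitudes))
    return list(range(top + 1)), list(range(top + 1, len(altitudes)))


# Hypothetical flight: altitude (m) and seconds since launch.
alt = [10.0, 500.0, 900.0, 700.0, 300.0]
sec = [0, 60, 120, 180, 240]

up, down = split_ascent_descent(alt)
alt_sonde = [alt[i] for i in up]            # ascent values
alt_sonde_descent = [alt[i] for i in down]  # separate, clearly suffixed variable

print(sonde_feature_type(times_vary=True, lat_lon_vary=False))
print(times_increase(sec), alt_sonde, alt_sonde_descent)
```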

“orog”. Certain fixed variables describe the site or the platform from which measurements are taken. An example of the former is the variable with the surface altitude (CMIP name: “orog”), which describes the site's altitude above the surface defined as the lower boundary of the atmosphere. Other common single-level fixed variables are “lat” (latitude) and “lon” (longitude), which identify a site's general location. If the site has distributed measurements, individual variables may have more refined “lat_platform” and “lon_platform” variables or dimensions.

5.3 Geophysical variable attributes and examples

There are six categories of geophysical variables: single-level atmosphere variables; surface and top-of-atmosphere (TOA) variables; atmospheric variables on model or instrument levels; subsurface terrestrial variables; oceanic variables on model or instrument levels; and sea ice variables. Examples of long_name, standard_name, and units attributes for observed geophysical variables that have been cataloged in the H-K schema are shown in Tables 3 and 4. A full listing of all the single-level atmospheric variables, the surface and TOA variables, and the atmospheric variables on instrument levels in the H-K schema version 1.2 is presented in Table A1, while a listing of the current H-K schema oceanic variables on model or instrument levels, subsurface terrestrial variables, oceanic single-level variables, and sea ice variables is found in Table A2. Some discussion of the examples in Tables 3 and 4 follows.

CMIP name. The Coupled Model Intercomparison Project (CMIP) names are taken from the CMIP6 Participation Guidance for Modelers, a program supported by the US Department of Energy (DOE) Program for Climate Model Diagnosis and Intercomparison. CMIP names prioritize terse, presumably code-efficient abbreviations such as “ta” (air temperature). To clearly identify the variables that were observed or derived redundantly from different methods or platforms, a suffix can be appended (e.g., _tower, _radar, _8m). Some standard measurements, such as 2 m temperature and 10 m winds, have unique CMIP names such as “tas”, “uas”, and “vas” (near-surface air temperature, eastward wind, and northward wind, respectively). In the case where a CMIP variable name has not been defined in the CMIP6 vocabulary, a CMIP-like name has been composed.

long_name. Fully describes the physical quantity and can be thought of as a useful attribute for labeling plot axes; in other words, it is the name that best communicates with humans about what the variable is. (Note that the long_name does not necessarily need to be in English.) The H-K schema provides ad hoc long_name definitions for all variables.

standard_name. Taken from the CF standard name table (2023), which is periodically updated based on community requests and discussion (see the “Discussion” link in CF Metadata Conventions, 2024). Standard names are constructed in conformance with the CF conventions (Hassell et al., 2017; Guidelines for Construction of CF Standard Names, 2024). Different variables can have the same standard name. For instance, “albs” and “albsn” both have a standard_name of surface_albedo, but the use of “albsn” is restricted to snow-covered areas. Redundant variables (such as multiple measurements of temperature or fluxes computed by bulk versus eddy-correlation methods) will have the same standard name and will require differentiation by adding a suffix to the CMIP name, using additional descriptors in the long_name, and possibly using accompanying spatial–temporal indices. The standard_name of a variable sometimes implicitly gives information about the directionality of a flux or a similar variable, information which is explicitly given in the CF standard name table and replicated in the “Notes” column of the full H-K schema. Specifically, to continue with the example of fluxes, “The sign convention is that ‘upwelling’ is positive upwards and ‘downwelling’ is positive downwards” is part of the definition of any variable with either of those two words in its standard_name.
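The relationship between suffixed CMIP names and shared standard names can be illustrated with plain dictionaries. In this sketch the helper names and the long_name strings are hypothetical; only the "ta" abbreviation, the example suffixes, and the air_temperature standard name come from the CMIP and CF vocabularies discussed above.

```python
from collections import defaultdict


def redundant_name(cmip_name, suffix):
    """Append a platform or method suffix to a CMIP (or CMIP-like) name."""
    return f"{cmip_name}_{suffix.lstrip('_')}"


def group_by_standard_name(variables):
    """Map each standard_name to the variable names that share it."""
    groups = defaultdict(list)
    for name, attrs in variables.items():
        groups[attrs["standard_name"]].append(name)
    return dict(groups)


# Two redundant air temperature measurements; long_name carries the detail.
variables = {
    redundant_name("ta", "_tower"): {
        "standard_name": "air_temperature",
        "long_name": "air temperature from tower sensors",  # placeholder text
        "units": "K",
    },
    redundant_name("ta", "8m"): {
        "standard_name": "air_temperature",
        "long_name": "air temperature at 8 m",  # placeholder text
        "units": "K",
    },
}

print(group_by_standard_name(variables))
```

Grouping by standard_name is one way a user could discover every redundant measurement of a quantity before deliberately choosing among them.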

Table 3. Examples of atmospheric variables, together with the associated long_name, standard_name, and units attributes (extracted from Hartten and Khalsa, 2022).


Table 4. Examples of non-atmospheric variables, together with the associated long_name, standard_name, and units attributes (extracted from Hartten and Khalsa, 2022).


Table 5 lists the other attributes that should be included in MODFs for geophysical variables. Some of the variable attributes in Table 5 are listed in the CF conventions as being for use as either global or data attributes, but the history attribute is listed as for global use only. We have encouraged its use with variables (data) in MODFs because we believe that this maintains the spirit, if not the letter, of the CF conventions; like institution, references, source, and title, history helps document the provenance and nature of the data included in these multi-institution, multi-sourced files. We also encourage MODF (and MMDF) makers to make use of additional variable attributes to share information about the variable contents with users. Brief explanations of some of the attributes in Table 5 follow, but those interested in creating MODFs should also review the definitions and explanations of attributes in the ACDD and the CF conventions.

original_name. Refers to the name of the variable in the original file from which it was extracted. This provides an important cross-reference in the case where a user may need to refer back to the original source data set.

instrument. Tracking instrumentation characteristics and, in many cases, calibration coefficients is critical to provide information for users who may be in the process of developing refined data processing, developing higher-order products, or doing instrument intercomparison studies.

references. Published material that describes either the data or the methods used to produce them should be listed here. Best practice is to include a URI (a DOI for a paper or a URL for a website).

source. Both the CF conventions and ACDD describe this as “the method of production of the original data”, by which they mean either the model that produced it or the instrument that gathered it. In either case, the idea is to give the user information that will help them understand what they're working with, including what assumptions or methods are inherent to the data. Therefore, the name and version of a numerical model or the type (and perhaps the make and model) of an instrument would be appropriate here.

history. Provides an audit trail for the data, documenting its provenance. When used as a variable attribute, this should trace what has been done to the data from its raw state until it was put into this file. We recommend that each step be documented by a line that starts with a date stamp and the version number of the variable, the person or entity who did the action, a brief description of the action, and any program(s) which accomplished the action (including relevant settings or command arguments).
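A minimal sketch of how the variable attributes described above might be assembled and checked follows. The list of expected attributes is limited to those discussed in this section (not the full Table 5), and the function names, the "v" version prefix, and every attribute value are placeholders chosen for illustration.

```python
# Attributes discussed in the text above; Table 5 may list more.
DESCRIBED_ATTRIBUTES = (
    "original_name", "instrument", "references", "source", "history",
)


def variable_history_line(date, version, who, action, program=""):
    """Start with the date stamp and variable version, then who/what/how."""
    line = f"{date} v{version} {who}: {action}"
    if program:
        line += f" ({program})"
    return line


def missing_attributes(var_attrs, expected=DESCRIBED_ATTRIBUTES):
    """List expected attributes that are absent or empty."""
    return [name for name in expected if not var_attrs.get(name)]


# Hypothetical attributes for a tower air temperature variable.
ta_tower_attrs = {
    "original_name": "T_air",  # name in the source file (placeholder)
    "instrument": "placeholder temperature probe",
    "source": "in situ tower measurement",
    "history": variable_history_line(
        "2022-03-01", "1.0", "A. Observer",
        "converted degC to K", "convert_units.py"),
}

print(ta_tower_attrs["history"])
print(missing_attributes(ta_tower_attrs))  # "references" was not supplied
```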

Table 5. Minimum recommended additional variable attributes for geophysical variables (adapted from Hartten and Khalsa, 2022).


Figure 3 describes the workflow and process that developed through the efforts of the YOPPsiteMIP team of observers, modelers, data scientists, and data managers. Key components include both the H-K schema and the A-M variable and attribute template table (Morris and Akish, 2022). The A-M table was developed for collating metadata that were not already encoded into the individual data files being used as input to MODFs, and it was used as a direct input to the MODF creation process. The development of the H-K schema and the A-M table was a highly iterative process, as indicated by the two-way arrow between (c) and (d) in Fig. 3, and is expected to continue to be so.

https://gmd.copernicus.org/articles/17/5225/2024/gmd-17-5225-2024-f03

Figure 3. The MODF workflow process. (a) Gather input files. (b) Interview contributing data principals to obtain necessary metadata that are not digitally encoded in the data files. (c) Utilize the A-M MODF template to construct the specific MODF framework-based inputs from (a) and (b) compliant with the H-K schema (d). MODF makers create the MODF (e) with inputs from (c) and the data values from individual files collected (a). (f) The MODF file. (g) MODF checking. (h) Upon “pass”, send the MODF to the archive. (i) Upon “fail”, assess the MODF. (j) In the case of a major fail, return the MODF to the MODF makers. (k) In the case of a minor fail, the archivist corrects and sends the MODF to the archive after rechecking (g). (l) Iterative developments between the H-K schema (d) and MODF checking (g). The people icons were adapted from one designed by VectorStock (image #45970079 at http://VectorStock.com, last access: 22 July 2023).
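The checking and routing steps (g) to (k) of the workflow can be sketched as follows. This is not the checker used for YOPPsiteMIP: the function name, the list of expected attributes, and especially the rule separating a major from a minor fail are assumptions, so the severity rule is passed in as a parameter rather than fixed in the code.

```python
NEXT_STEP = {
    "pass": "send to archive (h)",
    "minor_fail": "archivist corrects, then recheck (k)",
    "major_fail": "return to MODF makers (j)",
}


def check_global_attributes(global_attrs, expected, major_if_missing):
    """Return (outcome, missing) for a simple attribute-presence check.

    expected: attribute names the file should carry.
    major_if_missing: names whose absence the archive treats as a major
        fail (a local policy choice, hypothetical in this sketch).
    """
    missing = [name for name in expected if not global_attrs.get(name)]
    if not missing:
        return "pass", missing
    if any(name in major_if_missing for name in missing):
        return "major_fail", missing
    return "minor_fail", missing


expected = ["id", "license", "creator_name", "featureType", "history"]
major = {"id", "featureType"}  # illustrative policy only

candidate = {
    "id": "placeholder-persistent-identifier",
    "creator_name": "A. Maker",
    "featureType": "timeSeries",
    "history": "2022-03-01T12:00:00Z A. Maker: created MODF",
}

outcome, missing = check_global_attributes(candidate, expected, major)
print(outcome, missing, "->", NEXT_STEP[outcome])
```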

6 Discussion

We have presented the H-K schema (Hartten and Khalsa, 2022) and a production framework for organizing complex campaign and observatory data from multiple instruments into Merged Observatory Data Files (MODFs). The H-K schema also enables the formatting of forecast model output into corresponding Merged Model Data Files (MMDFs). MODFs and MMDFs, i.e., Merged Data File (MDF) collections, are compliant with existing metadata and data standards that support FAIR principles. The schema and the framework were developed by a YOPPsiteMIP working group of observers, modelers, and data managers. MODFs address the mundane but complex issues that arose from the YOPPsiteMIP vision of confronting polar weather-forecasting models with observations from richly instrumented sites during special observing periods. The issues addressed include data semantics, attribution for the original data, data provenance, different cadences, multiple measurements of the same variable from the same site (including the local subgrid-scale spatial distribution), versioning strategies to account for different levels of data processing, unit conventions, and missing data indicators. Because of the high cadence of many measurements (seconds to minutes), MODFs can be used to evaluate accumulating biases at model time-step increments, which are typically shorter than model output increments.

Although providing site, instrument, processing, and attribution metadata documentation is good practice, it is often neglected. Since we discovered that this essential MODF information was difficult to assemble after the fact, we recommend that, as part of the routine MODF development process, observers use datagrams during the development, deployment, and operation of sensors as a standard practice. Datagrams “are designed to document the life story of a data value from start to finish and provide a guide for humans to design, deploy, troubleshoot, repair, record, transmit, process, and archive data collected with measuring devices” (Morris and Uttal, 2022). In addition to proactively gathering observation metadata, the MIIP strategy would also be significantly supported if model outputs were extracted in the vicinity of observatories in real time. This can be difficult after the fact for routine output and impossible for time-step output. Furthermore, while we strongly encouraged YOPPsiteMIP MODF and MMDF makers to check their files for compliance before submitting, we have found it incredibly useful that the data manager at the host archive we worked with actively participated in checking the files (Tjernström, 2022) and, in some cases, made editorial corrections to bring the MODFs and MMDFs into full compliance with the H-K schema. This sort of effort at the data archive can be necessary when MDF providers do not have the resources to support the level of data management required to create fully compliant files.

Currently, the H-K schema includes geophysical variables across the atmosphere, snow, ice, terrestrial, and ocean systems. Appendix A lists the MODF (and MMDF) variables that are currently in the H-K schema (version 1.2). These are not exhaustive, and we expect that MODF users and makers will augment their MODF files to accommodate individual campaigns by including, for example, atmospheric aerosols, constituent gases, additional ocean and terrestrial variables, and ecosystem and biogeochemical data. Any additional variables should be incorporated for specific applications using the metadata standards described here. Some MODF variable collections (e.g., precipitation from different sensor systems or profiler data with sensor- and/or range-dependent measurement volumes) may require the discovery or creation of special variable attributes. If no standard exists, we recommend that the custom variable attribute should be submitted for consideration as a standard; directions are available via the “Discussion” link in CF Metadata Conventions (2024). However, MODF makers should keep in mind that additional variable metadata for complicated variables can also be shared via the references and comment attributes.

A first set of MODF files (Mariani et al., 2024) implementing the H-K schema from concept to production for YOPP special observing periods has been archived by the Norwegian Meteorological Institute, where a collection (https://thredds.met.no/thredds/catalog/alertness/YOPP_supersite/catalog.html, last access: 4 August 2023) of matched MODF and MMDF files is available for several Arctic ground stations. As of March 2024, these MODFs are not yet available for all the Arctic YOPP supersites, nor for any of the Antarctic YOPP supersites; those which are available do not yet contain all the observed variables from the individual sites. This situation reflects the difficulty involved with amalgamating data into MODF files. An initial analysis (Day et al., 2023) using the YOPPsiteMIP MODF–MMDF collection demonstrates, through a number of examples and case studies, how process-describing variables such as surface fluxes (longwave, sensible, and latent) and ground fluxes can be used to gain a deeper understanding of the nature of forecast errors.

The main motivation for the creation of the MODF and MMDF file format is to accelerate improvements in process description in numerical models by facilitating evaluation using suitable observations. The main strength is that the concept and format are co-developed by and have a purpose for both the modeling and observational communities. Another beneficial aspect is that what we have presented here is a structure that could be built upon by the data management community and incorporated into data center operations. The format is intentionally designed with growth in mind, allowing for new variables and feature types together with the possible development of community platforms for interaction. These aspects pave the way for widening the concept to include more sites, extending the time coverage for existing sites, and including more processes and additional research areas. While the material presented here is intended to help explain MODFs and how to create them, we encourage those making MODFs to explore the ACDD, the CF conventions, and the DataCite Metadata Kernel to more fully understand the possibilities for expansion.

We acknowledge that model–observation interoperability is based on standards that may be limiting for specific applications, so we expect that custom MODFs will be developed as necessary. In other words, the MODF concept is meant to be flexible while remaining within prescribed protocols. While there are few limitations to the MODF concept, currently, the types of data that are encompassed are designed for the YOPPsiteMIP goals (Jung et al., 2024). Further development depends on the engagement of communities and their interest in growing and expanding the format based on other research needs. The success of the MODF format is and will be dependent upon the interest of those data producers.

The concept of organizing multivariate observational data sets from a single site or platform into a unified product is not unique. Wei et al. (2021) describe how the International Consortium for Atmospheric Research on Transport and Transformation (ICARTT) data format conventions (Aknan et al., 2013) can be used to create merged in situ data products based on data from research aircraft carrying multiple sensors. The US DOE ARM program (Stokes and Schwartz, 1994) creates Climate Modeling Best Estimate (CMBE) data files (Xie et al., 2010) for their climate reference sites. Hogan and O'Connor (2004) describe how the Cloudnet project preprocesses cloud data from multiple remote sensors using appropriate ancillary observations and model forecasts and then makes the output available in netCDF format with rich metadata about the preprocessing. What is unique about MODFs is the strategy of simultaneously creating corresponding matched MMDF data files to support modeling verification and process evaluations. This addresses a long-standing issue with communication silos between observing and modeling sciences (Holloway et al., 2014; Sprintall et al., 2021; Neang et al., 2021).

We also expect that MODFs will be used to facilitate studies of interdisciplinary observation-based system science. The NSF workshop report Opportunities and Challenges of Arctic System Science (Vorosmarty et al., 2018) introduced the concept of “multiple ‘currencies’ that link the Arctic climate and environment – geophysical entities such as water, energy, carbon, and nutrients with quantifiable properties – and how they interact to produce and illuminate systems-level behaviors.” MODFs, by quantifying the currencies throughout a system with internally consistent standards, can serve to break down barriers between individual studies of separate components of the system that are typically divided along disciplinary lines, thereby advancing multidisciplinary process studies.

Generating MODFs will expand the usage of data from field campaigns by increasing data uptake and decreasing data latency. This will promote the usage of non-operational data that are currently underutilized and difficult to access comprehensively, specifically for environmental services requiring near real-time environmental intelligence. The US Interagency Arctic Research Policy Committee (IARPC) has defined environmental intelligence as “a system through which information about a particular region or process is collected for the benefit of decision makers through the use of more than one inter-related source.” Furthermore, the IARPC notes that “Traditionally, researchers collect data, develop models, and communicate results through well-established channels that are often slow and inefficient. While the vetting of scientific results ensures that the conclusions are of highest quality, the process is not well-aligned with the need for rapid information delivery in the face of environmental transitions that are putting stress on ecosystems and human populations.” MODFs will not only accelerate timely and relevant data access and scientific results for the primary researchers that collected the data; they will also support the iterative observations, modeling, and data systems that connect researchers, stakeholders, and decision makers to allow informed responses to environmental events. To this end, MODFs are designed to be living files that can be created in a timely manner with near-real-time variables and then augmented when additional variables become available. This is particularly important for situations in which observational data become obsolete before they can be utilized by environmental awareness and short-term prediction services for extreme events. 
MODFs can then be augmented with variables that are only available after extensive human-assisted processing (e.g., surface energy balance fluxes) for research purposes as well as with new versions of variables that require detailed quality control to produce higher-level and higher-reliability products. This addresses the concern that many data providers express about only releasing the most highly curated values (often resulting in data embargoes and release time lags) and recognizes that the level of necessary post-processing depends on the application, while still reducing the data latency so that rapid climate change events can be addressed in a timely manner.

Creating merged data products that are also findable, accessible, interoperable, and reusable is easy to say and hard to implement; it is expensive in any currency, be it time or people or computing. In developing the concept of MODFs and MMDFs, we have struggled over whether there is a realm between “nothing” and “everything”; over how to keep the quest for perfect compliance from preventing a good or even very good improvement. In the end, we think the guiding principle must be this: to remember that FAIR data are an ideal, and the implementation involves tradeoffs; some metadata is better than none, and anything that moves along this path is good.

Appendix A: Current measurable geophysical MODF and MMDF variables and their “essential climate variable” status

The H-K schema is a set of tables that is expected to be expanded and adjusted as the need arises for additional MODFs and MMDFs and for additions to existing ones. Therefore, researchers interested in using the H-K schema to guide file creation should always refer to the current online version; the DOI listed in the references of this article will always land on the most recent version of the schema, and when a new version of the H-K schema is published, the earlier versions will be prefaced with a note including a link to the latest version.

In this Appendix, we list the long_name attribute for all atmospheric (Table A1) and non-atmospheric (Table A2) geophysical variables, current as of the time of this article's submission. We do this for two reasons: to show the variety of variables incorporated into the H-K schema so far and to highlight the variables in the schema which the WMO has identified as essential climate variables (ECVs; Essential Climate Variables, 2023). Note that some of the ECVs are defined by the Global Climate Observing System (GCOS) in a manner other than the point measurements typically made at field sites. Those details are identified by footnotes in the tables. Lavergne et al. (2022) have proposed expanding the list of ECVs related to sea ice; the proposed sea ice ECVs that are in the H-K schema are separately highlighted in Table A2.

Table A1. long_name attributes of atmospheric variables (extracted from Hartten and Khalsa, 2022) that are measured at Arctic YOPP supersites or included in Arctic YOPP MMDFs (prefaced by “•”). Underlined variables are ECVs.

a “Total column ozone” is the listed ECV, while observations from a site are typically point values.
b The ECV listing is for maps, not point measurements, of albedo; snow albedo is not explicitly listed.
c The ECV listing specifies “land-biosphere evaporation from land”.


Table A2. Non-atmospheric variables (extracted from Hartten and Khalsa, 2022) that are measured at Arctic YOPP supersites or included in Arctic YOPP MMDFs (prefaced by “•”). Underlined variables are ECVs, with an asterisk (*) used for newly proposed sea ice ECVs (Lavergne et al., 2022).

a ECV listing is for just the wave height.


Code and data availability

The H-K variable schema table developed for the YOPPsiteMIP is archived on Zenodo (https://doi.org/10.5281/zenodo.6463464; Hartten and Khalsa, 2022) under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/, last access: 29 March 2024).

The A-M variable and attribute template table developed for the YOPPsiteMIP is archived on Zenodo (https://doi.org/10.5281/zenodo.6780400; Morris and Akish, 2022) under the Creative Commons Attribution 4.0 International License.

A preliminary set of MODF and MMDF files developed for the YOPPsiteMIP is available at https://thredds.met.no/thredds/catalog/alertness/YOPP_supersite/catalog.html (Norwegian Meteorological Institute, 2022). The catalog link provides information on licensing and crediting which is not repeated here, as the example files are not the subject of this article.

Author contributions

TU conceptualized the original MODF vision; CJC and MM participated in conceptualization by providing valuable perspectives on the practicality, utility, and application of the MODF concept. BC, GS, and JD provided methodology requirements from a numerical modeling perspective. LMH and SJK, in developing the H-K schema, provided methodology requirements from an observational perspective and also enhanced data curation. JAKT and OG guided the development of the H-K schema and MODF formats by providing methodology requirements from a data repository standpoint.

Software contributions were provided by JH and JAKT, who wrote Python functions to facilitate MODF creation, checking, and modification. Data curation, investigation, and software contributions were provided by EA, SM, NH, LXH, RC, ZM, EO'C, RP, JH, and MM, who served as MODF makers and, through that practical application of the MODF concept, developed the workflow indicated in Fig. 3. They also iteratively worked with LMH and SJK to further develop the H-K schema. JAKT contributed to the data curation, software, and validation by serving as the MODF curator and by iteratively working with LMH and SJK to further develop the H-K schema. GP and NH enhanced data curation by developing strategies to have MODFs include comprehensive formal attribution for individuals and institutions.

TU drafted the original manuscript, and TU, LMH, and SJK revised the draft, with commentary and revisions from BC, GS, JD, SM, EO'C, RP, ZM, JAKT, MM, and CJC. Visualizations, in the form of figures and tables, were drafted by TU and revised by TU and LMH.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements

This is a contribution to the Year of Polar Prediction (YOPP), a flagship activity of the Polar Prediction Project (PPP), initiated by the World Weather Research Programme (WWRP) of the World Meteorological Organization (WMO).

This work was supported in part by NOAA's Global Ocean Monitoring and Observing Program (FundRef https://doi.org/10.13039/100018302), the NOAA Physical Sciences Laboratory (Taneil Uttal, Leslie M. Hartten, Elena Akish, Sara Morris, and Christopher J. Cox), and the NOAA Global Monitoring Laboratory (Sara Morris). Leslie M. Hartten, Elena Akish, and Sara Morris were supported in part by NOAA cooperative agreements NA17OAR4320101 and NA22OAR4320151; Leslie M. Hartten was also supported by NOAA's Climate Program Office (Climate Observations and Monitoring Program, FundRef 100007298). This work was also supported in part by the US Department of Energy's Atmospheric System Research, an Office of Science Biological and Environmental Research program. Jonathan Day was supported by the European Union's Horizon 2020 research and innovation program through grant agreement 871120 (INTERACT III). Roberta Pirazzini was partly supported by the European Union's Horizon 2020 research and innovation program (projects INTAROS (grant 727890) and PolarRES (grant 101003590)).

Michael Gallagher (Univ. of Colorado–CIRES and the NOAA Physical Sciences Laboratory) established the MODF makers' GitLab and Slack channels and wrote Python functions to facilitate MODF creation. We are grateful to Dave Allured (Univ. of Colorado–CIRES and the NOAA Physical Sciences Laboratory) for extensive discussions about the CF conventions, which greatly improved the MODF and MMDF projects. We also thank Scott Landolt (NCAR–Research Applications Lab) for discussions about measuring snow. The comments from two anonymous reviewers helped us to clarify certain points and encouraged us to expand on others.

Financial support

This research has been supported by the Global Ocean Monitoring and Observing Program (grant no. 100018302), the NOAA Research (grant nos. NA17OAR4320101 and NA22OAR4320151), the Climate Program Office (grant no. 100007298), and the European Union's Horizon 2020 (grant nos. 871120, 727890, and 101003590).

Review statement

This paper was edited by Nina Crnivec and reviewed by two anonymous referees.

References

Aknan, A., Chen, G., Crawford, J., and Williams, E.: ICARTT File Format Standards V1.1, National Aeronautics and Space Administration (NASA), ESDS-RFC-019v1.1, 21 pp., https://espoarchive.nasa.gov/sites/default/files/archive/ESDS-RFC-019-v1.1_0.pdf (last access: 15 July 2023), 2013. 

Andreas, E. L., Persson, P. O. G., Grachev, A. A., Jordan, R. E., Horst, T. W., Guest, P. S., and Fairall, C. W.: Parameterizing Turbulent Exchange over Sea Ice in Winter, J. Hydrometeorol., 11, 87–104, https://doi.org/10.1175/2009JHM1102.1, 2010. 

Attribute Convention for Data Discovery 1–3: https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3, last access: 21 March 2024. 

Baldocchi, D., Valentini, R., Running, S., Oechel, W., and Dahlman, R.: Strategies for measuring and modelling carbon dioxide and water vapour fluxes over terrestrial ecosystems, Global Change Biology, 2, 159–168, https://doi.org/10.1111/j.1365-2486.1996.tb00069.x, 1996. 

Baldocchi, D., Falge, E., Gu, L., Olson, R., Hollinger, D., Running, S., Anthoni, P., Bernhofer, C., Davis, K., Evans, R., Fuentes, J., Goldstein, A., Katul, G., Law, B., Lee, X., Malhi, Y., Meyers, T., Munger, W., Oechel, W., Paw U, K. T., Pilegaard, K., Schmid, H. P., Valentini, R., Verma, S., Vesala, T., Wilson, K., and Wofsy, S.: FLUXNET: A New Tool to Study the Temporal and Spatial Variability of Ecosystem-Scale Carbon Dioxide, Water Vapor, and Energy Flux Densities, B. Am. Meteorol. Soc., 82, 2415–2434, https://doi.org/10.1175/1520-0477(2001)082<2415:FANTTS>2.3.CO;2, 2001. 

Boden, T. A., Krassovski, M., and Yang, B.: The AmeriFlux data activity and data system: an evolving collection of data management techniques, tools, products and services, Geosci. Instrum. Method. Data Syst., 2, 165–176, https://doi.org/10.5194/gi-2-165-2013, 2013. 

Bojinski, S., Verstraete, M., Peterson, T. C., Richter, C., Simmons, A. J., and Zemp, M.: The Concept of Essential Climate Variables in Support of Climate Research, Applications, and Policy, B. Am. Meteorol. Soc., 95, 1431–1443, https://doi.org/10.1175/BAMS-D-13-00047.1, 2014. 

Boukabara, S.-A., Krasnopolsky, V., Penny, S. G., Stewart, J. Q., McGovern, A., Hall, D., Ten Hoeve, J. E., Hickey, J., Huang, H.-L. A., Williams, J. K., Ide, K., Tissot, P., Haupt, S. E., Casey, K. S., Oza, N., Geer, A. J., Maddy, E. S., and Hoffman, R. N.: Outlook for Exploiting Artificial Intelligence in the Earth and Environmental Sciences, B. Am. Meteorol. Soc., 102, E1016–E1032, https://doi.org/10.1175/BAMS-D-20-0031.1, 2021. 

Buck, J. J. H., Bainbridge, S. J., Burger, E. F., Kraberg, A. C., Casari, M., Casey, K. S., Darroch, L., del Rio, J., Metfies, K., Delory, E., Fischer, P. F., Gardner, T., Heffernan, R., Jirka, S., Kokkinaki, A., Loebl, M., Buttigieg, P. L., Pearlman, J. S., and Schewe, I.: Ocean Data Product Integration Through Innovation-The Next Level of Data Interoperability, Front. Marine Sci., 6, 32, https://doi.org/10.3389/fmars.2019.00032, 2019. 

Buisán, S. T., Smith, C. D., Ross, A., Kochendorfer, J., Collado, J. L., Alastrué, J., Wolff, M., Roulet, Y.-A., Earle, M. E., Laine, T., Rasmussen, R., and Nitu, R.: The potential for uncertainty in Numerical Weather Prediction model verification when using solid precipitation observations, Atmos. Sci. Lett., 21, e976, https://doi.org/10.1002/asl.976, 2020. 

Casati, B., Robinson, T., Lemay, F., Køltzow, M., Haiden, T., Mekis, E., Lespinas, F., Fortin, V., Gascon, G., Milbrandt, J., and Smith, G.: Performance of the Canadian Arctic Prediction System during the YOPP Special Observing Periods, Atmosphere-Ocean, 61, 1–27, https://doi.org/10.1080/07055900.2023.2191831, 2023. 

CF Metadata Conventions: https://cfconventions.org, last access: 27 March 2024. 

CF Standard Name Table: https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html, last access: 18 July 2023. 

Clothiaux, E. E., Miller, M. A., Perez, R. C., Turner, D. D., Moran, K. P., Martner, B. E., Ackerman, T. P., Mace, G. G., Marchand, R. T., Widener, K. B., Rodriguez, D. J., Uttal, T., Mather, J. H., Flynn, C. J., Gaustad, K. L., and Ermold, B.: The ARM Millimeter Wave Cloud Radars (MMCRs) and the Active Remote Sensing of Clouds (ARSCL) Value Added Product (VAP), ARM user facility, Pacific Northwest National Laboratory, Richland, WA, United States, 56 pp., https://doi.org/10.2172/1808567, 2001. 

Cox, C. J., Morris, S. M., Uttal, T., Burgener, R., Hall, E., Kutchenreiter, M., McComiskey, A., Long, C. N., Thomas, B. D., and Wendell, J.: The De-Icing Comparison Experiment (D-ICE): a study of broadband radiometric measurements under icing conditions in the Arctic, Atmos. Meas. Tech., 14, 1205–1224, https://doi.org/10.5194/amt-14-1205-2021, 2021. 

Cox, C. J., Gallagher, M., Shupe, M. D., Persson, P. O. G., Solomon, A., Fairall, C. W., Ayers, T., Blomquist, B., Brooks, I. M., Costa, D., Grachev, A., Gottas, D., Hutchings, J. K., Kutchenreiter, M., Leach, J., Morris, S. M., Morris, V., Osborn, J., Pezoa, S., Preusser, A., Riihimaki, L., and Uttal, T.: Continuous observations of the surface energy budget and meteorology over the Arctic sea ice during MOSAiC, Sci. Data, 10, 519, https://doi.org/10.1038/s41597-023-02415-5, 2023. 

Data Citation Synthesis Group: Joint Declaration of Data Citation Principles, FORCE11, San Diego CA, https://doi.org/10.25490/a97f-egyk, 2014. 

DataCite Metadata Working Group: DataCite Metadata Schema Documentation for the Publication and Citation of Research Data and Other Research Outputs. Version 4.4, DataCite e.V., 82 pp., https://doi.org/10.14454/3w3z-sa82, 2021. 

Day, J., Svensson, G., Casati, B., Uttal, T., Khalsa, S.-J., Bazile, E., Akish, E., Azouz, N., Ferrighi, L., Frank, H., Gallagher, M., Godøy, Ø., Hartten, L., Huang, L. X., Holt, J., Di Stefano, M., Suomi, I., Mariani, Z., Morris, S., O'Connor, E., Pirazzini, R., Remes, T., Fadeev, R., Solomon, A., Tjernström, J., and Tolstykh, M.: The YOPP site Model Intercomparison Project (YOPPsiteMIP) phase 1: project overview and Arctic winter forecast evaluation, EGUsphere [preprint], https://doi.org/10.5194/egusphere-2023-1951, 2023. 

Eaton, B., Gregory, J., Drach, B., Taylor, K., Hankin, S., Blower, J., Caron, J., Signell, R., Bentley, P., Rappa, G., Höck, H., Pamment, A., Juckes, M., Raspaud, M., Horne, R., Whiteaker, T., Blodgett, D., Zender, C., Lee, D., Hassell, D., Snow, A. D., Kölling, T., Allured, D., Jelenak, A., Soerensen, A. M., Gaultier, L., and Herlédan, S.: NetCDF Climate and Forecast (CF) Metadata Conventions Version 1.10, https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html (last access: 29 March 2024), 2022. 

Essential Climate Variables: https://public.wmo.int/en/programmes/global-climate-observing-system/essential-climate-variables, last access: 13 September 2023. 

Fratini, G. and Mauder, M.: Towards a consistent eddy-covariance processing: an intercomparison of EddyPro and TK3, Atmos. Meas. Tech., 7, 2273–2281, https://doi.org/10.5194/amt-7-2273-2014, 2014. 

Gettelman, A., Geer, A. J., Forbes, R. M., Carmichael, G. R., Feingold, G., Posselt, D. J., Stephens, G. L., van den Heever, S. C., Varble, A. C., and Zuidema, P.: The future of Earth system prediction: Advances in model-data fusion, Sci. Adv., 8, eabn3488, https://doi.org/10.1126/sciadv.abn3488, 2022. 

Global Telecommunication System (GTS): https://community.wmo.int/en/activity-areas/global-telecommunication-system-gts, last access: 16 July 2023. 

Goessling, H. F., Jung, T., Klebe, S., Baeseman, J., Bauer, P., Chen, P., Chevallier, M., Dole, R., Gordon, N., Ruti, P., Bradley, A., Bromwich, D. H., Casati, B., Chechin, D., Day, J. J., Massonnet, F., Mills, B., Renfrew, I. A., Smith, G., and Tatusko, R.: Paving the Way for the Year of Polar Prediction, B. Am. Meteorol. Soc., 97, ES85–ES88, https://doi.org/10.1175/BAMS-D-15-00270.1, 2016. 

Grachev, A. A., Persson, P. O. G., Uttal, T., Akish, E. A., Cox, C. J., Morris, S. M., Fairall, C. W., Stone, R. S., Lesins, G., Makshtas, A. P., and Repina, I. A.: Seasonal and latitudinal variations of surface fluxes at two Arctic terrestrial sites, Clim. Dynam., 51, 1793–1818, https://doi.org/10.1007/s00382-017-3983-4, 2018. 

Guidelines for Construction of CF Standard Names: https://cfconventions.org/Data/cf-standard-names/docs/guidelines.html, last access: 27 March 2024. 

Hanisch, R., Chalk, S., Coulon, R., Cox, S., Emmerson, S., Sandoval, F. J. F., Forbes, A., Frey, J., Hall, B., Hartshorn, R., Heus, P., Hodson, S., Hosaka, K., Hutzschenreuter, D., Kang, C.-S., Picard, S., and White, R.: Stop squandering data: make units of measurement machine-readable, Nature, 605, 222–224, https://doi.org/10.1038/d41586-022-01233-w, 2022. 

Hartten, L. M. and Khalsa, S. J. S.: The H-K Variable SchemaTable developed for the YOPPsiteMIP, Zenodo [code], https://doi.org/10.5281/zenodo.6255666, 2022. 

Hartten, L. M., Cox, C. J., Johnston, P. E., Wolfe, D. E., Abbott, S., McColl, H. A., Quan, X.-W., and Winterkorn, M. G.: Ship- and island-based soundings from the 2016 El Niño Rapid Response (ENRR) field campaign, Earth Syst. Sci. Data, 10, 1165–1183, https://doi.org/10.5194/essd-10-1165-2018, 2018. 

Hassell, D., Gregory, J., Blower, J., Lawrence, B. N., and Taylor, K. E.: A data model of the Climate and Forecast metadata conventions (CF-1.6) with a software implementation (cf-python v2.1), Geosci. Model Dev., 10, 4619–4646, https://doi.org/10.5194/gmd-10-4619-2017, 2017. 

Hogan, R. J. and O'Connor, E. J.: Facilitating cloud radar and lidar algorithms: the Cloudnet Instrument Synergy/Target Categorization product, 14 pp., https://www.met.rdg.ac.uk/~swrhgnrj/publications/categorization.pdf (last access: 29 March 2024), 2004. 

Holloway, C. E., Petch, J. C., Beare, R. J., Bechtold, P., Craig, G. C., Derbyshire, S. H., Donner, L. J., Field, P. R., Gray, S. L., Marsham, J. H., Parker, D. J., Plant, R. S., Roberts, N. M., Schultz, D. M., Stirling, A. J., and Woolnough, S. J.: Understanding and representing atmospheric convection across scales: recommendations from the meeting held at Dartington Hall, Devon, UK, 28–30 January 2013, Atmos. Sci. Lett., 15, 348–353, https://doi.org/10.1002/asl2.508, 2014. 

Holtslag, A. A. M., Svensson, G., Baas, P., Basu, S., Beare, B., Beljaars, A. C. M., Bosveld, F. C., Cuxart, J., Lindvall, J., Steeneveld, G. J., Tjernström, M., and Van De Wiel, B. J. H.: Stable Atmospheric Boundary Layers and Diurnal Cycles: Challenges for Weather and Climate Models, B. Am. Meteorol. Soc., 94, 1691–1706, https://doi.org/10.1175/BAMS-D-11-00187.1, 2013. 

Illingworth, A. J., Hogan, R. J., O'Connor, E. J., Bouniol, D., Brooks, M. E., Delanoé, J., Donovan, D. P., Eastment, J. D., Gaussiat, N., Goddard, J. W. F., Haeffelin, M., Baltink, H. K., Krasnov, O. A., Pelon, J., Piriou, J.-M., Protat, A., Russchenberg, H. W. J., Seifert, A., Tompkins, A. M., van Zadelhoff, G.-J., Vinit, F., Willén, U., Wilson, D. R., and Wrench, C. L.: Cloudnet: Continuous Evaluation of Cloud Profiles in Seven Operational Models Using Ground-Based Observations, B. Am. Meteorol. Soc., 88, 883–898, https://doi.org/10.1175/BAMS-88-6-883, 2007. 

Ingleby, B., Motl, M., Marlton, G., Edwards, D., Sommer, M., von Rohden, C., Vömel, H., and Jauhiainen, H.: On the quality of RS41 radiosonde descent data, Atmos. Meas. Tech., 15, 165–183, https://doi.org/10.5194/amt-15-165-2022, 2022. 

Jones, M. B., Budden, A. E., Mecum, B., Clark, J., Brun, J., Lowndes, J., and McLean, E.: Data Science Training for Arctic Researchers, Arctic Data Center [data set], https://doi.org/10.18739/A24746R2N, 2020. 

Jung, T., Gordon, N. D., Bauer, P., Bromwich, D. H., Chevallier, M., Day, J. J., Dawson, J., Doblas-Reyes, F. J., Fairall, C., Goessling, H. F., Holland, M., Inoue, J., Iversen, T., Klebe, S., Lemke, P., Losch, M., Makshtas, A., Mills, B., Nurmi, P., Perovich, D., Reid, P., Renfrew, I. A., Smith, G., Svensson, G., Tolstykh, M., and Yang, Q.: Advancing Polar Prediction Capabilities on Daily to Seasonal Time Scales, B. Am. Meteorol. Soc., 97, 1631–1647, https://doi.org/10.1175/BAMS-D-14-00246.1, 2016. 

Jung, T., Wilson, J., Bazile, E., Bromwich, D., Casati, B., Day, J., De Coning, E., Eayrs, C., Grumbine, R., Inoue, J., Khalsa, S. J., Kristiansen, J., Lamers, M., Liggett, D., Olsen, S., Perovich, D., Renfrew, I., Sandu, I., Shupe, M., Smolyanitsky, V., Svensson, G., Sun, Q., Uttal, T., Werner, K., Yang, Q., and Heinrich, V. J.: The Year of Polar Prediction (YOPP): Achievements, impacts and lessons learnt, B. Am. Meteorol. Soc., accepted, 2024. 

Kaimal, J. C. and Finnigan, J. J.: Atmospheric Boundary Layer Flows: Their Structure and Measurement, Oxford University Press, New York, ISBN 9780195062397, 1994. 

Kochendorfer, J., Nitu, R., Wolff, M., Mekis, E., Rasmussen, R., Baker, B., Earle, M. E., Reverdin, A., Wong, K., Smith, C. D., Yang, D., Roulet, Y.-A., Meyers, T., Buisan, S., Isaksen, K., Brækkan, R., Landolt, S., and Jachcik, A.: Testing and development of transfer functions for weighing precipitation gauges in WMO-SPICE, Hydrol. Earth Syst. Sci., 22, 1437–1452, https://doi.org/10.5194/hess-22-1437-2018, 2018. 

Kochendorfer, J., Earle, M., Rasmussen, R., Smith, C., Yang, D., Morin, S., Mekis, E., Buisan, S., Roulet, Y.-A., Landolt, S., Wolff, M., Hoover, J., Thériault, J. M., Lee, G., Baker, B., Nitu, R., Lanza, L., Colli, M., and Meyers, T.: How Well Are We Measuring Snow Post-SPICE?, B. Am. Meteorol. Soc., 103, E370–E388, https://doi.org/10.1175/BAMS-D-20-0228.1, 2022. 

Køltzow, M., Casati, B., Haiden, T., and Valkonen, T.: Verification of Solid Precipitation Forecasts from Numerical Weather Prediction Models in Norway, Weather Forecast., 35, 2279–2292, https://doi.org/10.1175/WAF-D-20-0060.1, 2020. 

Lavergne, T., Kern, S., Aaboe, S., Derby, L., Dybkjaer, G., Garric, G., Heil, P., Hendricks, S., Holfort, J., Howell, S., Key, J., Lieser, J. L., Maksym, T., Maslowski, W., Meier, W., Muñoz-Sabater, J., Nicolas, J., Özsoy, B., Rabe, B., Rack, W., Raphael, M., de Rosnay, P., Smolyanitsky, V., Tietsche, S., Ukita, J., Vichi, M., Wagner, P., Willmes, S., and Zhao, X.: A New Structure for the Sea Ice Essential Climate Variables of the Global Climate Observing System, B. Am. Meteorol. Soc., 103, E1502–E1521, https://doi.org/10.1175/BAMS-D-21-0227.1, 2022. 

Long, C. N. and Shi, Y.: The QCRad Value Added Product: Surface Radiation Measurement Quality Control Testing, Including Climatology Configurable Limits, PNNL, Richland, Washington, United States, 70 pp., https://doi.org/10.2172/1019540, 2006. 

Long, C. N. and Shi, Y.: An Automated Quality Assessment and Control Algorithm for Surface Radiation Measurements, The Open Atmospheric Science Journal, 2, 23–37, https://doi.org/10.2174/1874282300802010023, 2008. 

Mahrt, L. T. and Sun, J.: The Subgrid Velocity Scale in the Bulk Aerodynamic Relationship for Spatially Averaged Scalar Fluxes, Mon. Weather Rev., 123, 3032–3041, https://doi.org/10.1175/1520-0493(1995)123<3032:TSVSIT>2.0.CO;2, 1995. 

Mariani, Z., Huang, L., Crawford, R., Blanchet, J.-P., Hicks-Jalali, S., Mekis, E., Pelletier, L., Rodriguez, P., and Strawbridge, K.: Enhanced automated meteorological observations at the Canadian Arctic Weather Science (CAWS) supersites, Earth Syst. Sci. Data, 14, 4995–5017, https://doi.org/10.5194/essd-14-4995-2022, 2022. 

Mariani, Z., Morris, S., Uttal, T., Akish, E., Crawford, R., Huang, L., Day, J., Tjernström, J., Godøy, Ø., Ferrighi, L., Hartten, L., Holt, J., Cox, C., O'Connor, E., Pirazzini, R., Maturilli, M., Prakash, G., Mather, J., Strong, K., Fogal, P., Kustov, V., Svensson, G., Gallagher, M., and Vasel, B.: Special Observing Period (SOP) Data for the Year of Polar Prediction site Model Intercomparison Project (YOPPsiteMIP), Earth Syst. Sci. Data Discuss. [preprint], https://doi.org/10.5194/essd-2023-497, in review, 2024. 

Matrosov, S. Y., Shupe, M. D., and Uttal, T.: High temporal resolution estimates of Arctic snowfall rates emphasizing gauge and radar-based retrievals from the MOSAiC expedition, Elementa: Science of the Anthropocene, 10, 00101, https://doi.org/10.1525/elementa.2021.00101, 2022. 

Matsui, N., Long, C. N., Augustine, J., Halliwell, D., Uttal, T., Longenecker, D., Niebergall, O., Wendell, J., and Albee, R.: Evaluation of Arctic broadband surface radiation measurements, Atmos. Meas. Tech., 5, 429–438, https://doi.org/10.5194/amt-5-429-2012, 2012. 

Mauder, M., Foken, T., Clement, R., Elbers, J. A., Eugster, W., Grünwald, T., Heusinkveld, B., and Kolle, O.: Quality control of CarboEurope flux data – Part 2: Inter-comparison of eddy-covariance software, Biogeosciences, 5, 451–462, https://doi.org/10.5194/bg-5-451-2008, 2008. 

Monin, A. S. and Obukhov, A.: Basic laws of turbulent mixing in the surface layer of the atmosphere, Contrib. Geophys. Inst. Acad. Sci. USSR, 151, 163–187, 1954. 

Morris, S. and Akish, E.: A-M Variable & Attribute Template Table developed for the YOPPsiteMIP, Zenodo [code], https://doi.org/10.5281/zenodo.6780400, 2022. 

Morris, S. and Uttal, T.: Datagrams: Diagrammatic Metadata for Humans, B. Am. Meteorol. Soc., 103, E1343–E1350, https://doi.org/10.1175/BAMS-D-21-0219.1, 2022. 

Nature Editorial: Time to recognize authorship of open data, Nature, 608, 8, https://doi.org/10.1038/d41586-022-00921-x, 2022. 

Neang, A. B., Sutherland, W., Beach, M. W., and Lee, C. P.: Data Integration as Coordination: The Articulation of Data Work in an Ocean Science Collaboration, Proc. ACM Hum.-Comput. Interact., 4, 256, https://doi.org/10.1145/3432955, 2021. 

Nicolaus, M., Perovich, D. K., Spreen, G., Granskog, M. A., von Albedyll, L., Angelopoulos, M., Anhaus, P., Arndt, S., Belter, H. J., Bessonov, V., Birnbaum, G., Brauchle, J., Calmer, R., Cardellach, E., Cheng, B., Clemens-Sewall, D., Dadic, R., Damm, E., de Boer, G., Demir, O., Dethloff, K., Divine, D. V., Fong, A. A., Fons, S., Frey, M. M., Fuchs, N., Gabarró, C., Gerland, S., Goessling, H. F., Gradinger, R., Haapala, J., Haas, C., Hamilton, J., Hannula, H.-R., Hendricks, S., Herber, A., Heuzé, C., Hoppmann, M., Høyland, K. V., Huntemann, M., Hutchings, J. K., Hwang, B., Itkin, P., Jacobi, H.-W., Jaggi, M., Jutila, A., Kaleschke, L., Katlein, C., Kolabutin, N., Krampe, D., Kristensen, S. S., Krumpen, T., Kurtz, N., Lampert, A., Lange, B. A., Lei, R., Light, B., Linhardt, F., Liston, G. E., Loose, B., Macfarlane, A. R., Mahmud, M., Matero, I. O., Maus, S., Morgenstern, A., Naderpour, R., Nandan, V., Niubom, A., Oggier, M., Oppelt, N., Pätzold, F., Perron, C., Petrovsky, T., Pirazzini, R., Polashenski, C., Rabe, B., Raphael, I. A., Regnery, J., Rex, M., Ricker, R., Riemann-Campe, K., Rinke, A., Rohde, J., Salganik, E., Scharien, R. K., Schiller, M., Schneebeli, M., Semmling, M., Shimanchuk, E., Shupe, M. D., Smith, M. M., Smolyanitsky, V., Sokolov, V., Stanton, T., Stroeve, J., Thielke, L., Timofeeva, A., Tonboe, R. T., Tavri, A., Tsamados, M., Wagner, D. N., Watkins, D., Webster, M., and Wendisch, M.: Overview of the MOSAiC expedition: Snow and sea ice, Elementa: Science of the Anthropocene, 10, 000046, https://doi.org/10.1525/elementa.2021.000046, 2022. 

Nitu, R., Roulet, Y.-A., Wolff, M., Earle, M., Reverdin, A., Smith, C., Kochendorfer, J., Morin, S., Rasmussen, R., Wong, K., Alastrué, J., Arnold, L., Baker, B., Buisán, S., Collado, J. L., Colli, M., Collins, B., Gaydos, A., Hannula, H.-R., Hoover, J., Joe, P., Kontu, A., Laine, T., Lanza, L., Lanzinger, E., Lee, G., Lejeune, Y., Leppänen, L., Mekis, E., Panel, J.-M., Poikonen, A., Ryu, S., Sabatini, F., Theriault, J., Yang, D., Genthon, C., van den Heuvel, F., Hirasawa, N., Konishi, H., Motoyoshi, H., Nakai, S., Nishimura, K., Senese, A., and Yamashita, K.: WMO Solid Precipitation Intercomparison Experiment (SPICE) (2012–2015), World Meteorological Organization (WMO), Geneva, Switzerland, IOM No. 1, 1443 pp., https://library.wmo.int/opac/ (last access: 21 August 2023), 2018. 

Norwegian Meteorological Institute: MET Norway YOPP Supersite Catalog, Norwegian Meteorological Institute [data set], https://thredds.met.no/thredds/catalog/alertness/YOPP_supersite/catalog.html (last access: 4 August 2023), 2022. 

Ohmura, A., Dutton, E. G., Forgan, B., Fröhlich, C., Gilgen, H., Hegner, H., Heimo, A., König-Langlo, G., McArthur, B., Müller, G., Philipona, R., Pinker, R., Whitlock, C. H., Dehne, K., and Wild, M.: Baseline Surface Radiation Network (BSRN/WCRP): New Precision Radiometry for Climate Research, B. Am. Meteorol. Soc., 79, 2115–2136, https://doi.org/10.1175/1520-0477(1998)079<2115:BSRNBW>2.0.CO;2, 1998. 

Papoutsoglou, E. A., Athanasiadis, I. N., Visser, R. G. F., and Finkers, R.: The benefits and struggles of FAIR data: the case of reusing plant phenotyping data, Sci. Data, 10, 457, https://doi.org/10.1038/s41597-023-02364-z, 2023. 

Pierce, H. H., Dev, A., Statham, E., and Bierer, B. E.: Credit data generators for data reuse, Nature, 570, 30–32, https://doi.org/10.1038/d41586-019-01715-4, 2019. 

PPP Steering Group, Bauer, P., Bradley, A., Bromwich, D., Casati, B., Chen, P., Chevallier, M., Dawson, J., Day, J., Doblas-Reyes, F. J., Fairall, C., Goessling, H., Gordon, N., Grumbine, R., Hoke, W., Holland, M., Inoue, J., Iversen, T., Jung, T., Khalsa, S. J. S., Klebe, S., Kristiansen, J., Lamers, M., Lemke, P., Liggett, D., Ljubicic, G., Massonnet, F., Makshtas, A., Mills, B., Nurmi, P., Olsen, S., Perovich, D., Reid, P., Renfrew, I., Sandu, I., Smith, G., Stewart, E., Smolyanitsky, V., Svensson, G., Swinbank, R., Tolstykh, M., Uttal, T., Werner, K., Wilson, J., and Yang, Q.: WWRP Polar Prediction Project Implementation Plan for the Year of Polar Prediction (YOPP), 80 pp., https://www.polarprediction.net/about/implementation-and-science-plans/ (last access: 15 July 2023), 2019. 

Prakash, G., Shrestha, B., Younkin, K., Jundt, R., Martin, M., and Elliott, J.: Data Always Getting Bigger – A Scalable DOI Architecture for Big and Expanding Scientific Data, Data, 1, 11, https://doi.org/10.3390/data1020011, 2016. 

Sardeshmukh, P. D., Compo, G. P., and Penland, C.: Need for Caution in Interpreting Extreme Weather Statistics, J. Climate, 28, 9166–9187, https://doi.org/10.1175/JCLI-D-15-0020.1, 2015. 

Shupe, M. D., Rex, M., Blomquist, B., Persson, P. O. G., Schmale, J., Uttal, T., Althausen, D., Angot, H., Archer, S., Bariteau, L., Beck, I., Bilberry, J., Bucci, S., Buck, C., Boyer, M., Brasseur, Z., Brooks, I. M., Calmer, R., Cassano, J., Castro, V., Chu, D., Costa, D., Cox, C. J., Creamean, J., Crewell, S., Dahlke, S., Damm, E., de Boer, G., Deckelmann, H., Dethloff, K., Dütsch, M., Ebell, K., Ehrlich, A., Ellis, J., Engelmann, R., Fong, A. A., Frey, M. M., Gallagher, M. R., Ganzeveld, L., Gradinger, R., Graeser, J., Greenamyer, V., Griesche, H., Griffiths, S., Hamilton, J., Heinemann, G., Helmig, D., Herber, A., Heuzé, C., Hofer, J., Houchens, T., Howard, D., Inoue, J., Jacobi, H.-W., Jaiser, R., Jokinen, T., Jourdan, O., Jozef, G., King, W., Kirchgaessner, A., Klingebiel, M., Krassovski, M., Krumpen, T., Lampert, A., Landing, W., Laurila, T., Lawrence, D., Lonardi, M., Loose, B., Lüpkes, C., Maahn, M., Macke, A., Maslowski, W., Marsay, C., Maturilli, M., Mech, M., Morris, S., Moser, M., Nicolaus, M., Ortega, P., Osborn, J., Pätzold, F., Perovich, D. K., Petäjä, T., Pilz, C., Pirazzini, R., Posman, K., Powers, H., Pratt, K. A., Preußer, A., Quéléver, L., Radenz, M., Rabe, B., Rinke, A., Sachs, T., Schulz, A., Siebert, H., Silva, T., Solomon, A., Sommerfeld, A., Spreen, G., Stephens, M., Stohl, A., Svensson, G., Uin, J., Viegas, J., Voigt, C., von der Gathen, P., Wehner, B., Welker, J. M., Wendisch, M., Werner, M., Xie, Z. Q., and Yue, F.: Overview of the MOSAiC expedition: Atmosphere, Elementa: Science of the Anthropocene, 10, 00060, https://doi.org/10.1525/elementa.2021.00060, 2022. 

Sprintall, J., Coles, V. J., Reed, K. A., Butler, A. H., Foltz, G. R., Penny, S. G., and Seo, H.: Using Process Studies to Improve Climate Modeling: Strategies for Success, B. Am. Meteorol. Soc., 102, 523–526, https://doi.org/10.1175/BAMS-D-19-0263.A, 2021. 

Stephan, C. C., Schnitt, S., Schulz, H., Bellenger, H., de Szoeke, S. P., Acquistapace, C., Baier, K., Dauhut, T., Laxenaire, R., Morfa-Avalos, Y., Person, R., Quiñones Meléndez, E., Bagheri, G., Böck, T., Daley, A., Güttler, J., Helfer, K. C., Los, S. A., Neuberger, A., Röttenbacher, J., Raeke, A., Ringel, M., Ritschel, M., Sadoulet, P., Schirmacher, I., Stolla, M. K., Wright, E., Charpentier, B., Doerenbecher, A., Wilson, R., Jansen, F., Kinne, S., Reverdin, G., Speich, S., Bony, S., and Stevens, B.: Ship- and island-based atmospheric soundings from the 2020 EUREC4A field campaign, Earth Syst. Sci. Data, 13, 491–514, https://doi.org/10.5194/essd-13-491-2021, 2021. 

Stephens, G. L., Polcher, J., Zeng, X., van Oevelen, P., Poveda, G., Bosilovich, M., Ahn, M.-H., Balsamo, G., Duan, Q., Hegerl, G. C., Jakob, C., Lamptey, B., Leung, R., Piles, M., Su, Z., Dirmeyer, P., Findell, K. L., Verhoef, A., Ek, M., L'Ecuyer, T., Roca, R., Nazemi, A., Dominguez, F., Klocke, D., and Bony, S.: The First 30 Years of GEWEX, B. Am. Meteorol. Soc., 104, E126–E157, https://doi.org/10.1175/BAMS-D-22-0061.1, 2023. 

Stokes, G. M. and Schwartz, S. E.: The Atmospheric Radiation Measurement (ARM) Program: Programmatic Background and Design of the Cloud and Radiation Test Bed, B. Am. Meteorol. Soc., 75, 1201–1222, https://doi.org/10.1175/1520-0477(1994)075<1201:TARMPP>2.0.CO;2, 1994. 

Svensson, G., Casati, B., Day, J., Uttal, T., Godøy, Ø., and Hartten, L.: YOPPsiteMIP – The YOPP site Model Inter-comparison Project, Alfred-Wegener-Institut, Bremerhaven, 15 pp., https://www.polarprediction.net/fileadmin/user_upload/www.polarprediction.net/Home/Organization/Task_Teams/Atmospheric_Processes/YOPP_Supersite_common_model_output_rev8.pdf (last access: 23 April 2023), 2020. 

Taylor, K. E., Durack, P. J., Elkington, M., Guilyardi, E., Hassell, D., Lautenschlager, M., and Stockhause, M.: CMIP6 Participation Guidance for Modelers, https://pcmdi.llnl.gov/CMIP6/Guide/modelers.html (last access: 15 July 2023), 2022. 

Tjernström, J.: Visualizing Process-Based Model Evaluation for Numerical Weather Prediction Models, Student thesis, 16 pp., http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-187157 (last access: 26 June 2024), 2022. 

Unidata: UDUNITS-2, version 2.2.28 (2.2.28), UCAR/Unidata Program Center [software], https://doi.org/10.5065/D6KD1WN0, 2020. 

Unidata: NetCDF User's Guide, version 1.1, NetCDF [data set], https://doi.org/10.26024/nw73-vm64, 2023. 

Uttal, T., Starkweather, S., Drummond, J. R., Vihma, T., Makshtas, A. P., Darby, L. S., Burkhart, J. F., Cox, C. J., Schmeisser, L. N., Haiden, T., Maturilli, M., Shupe, M. D., De Boer, G., Saha, A., Grachev, A. A., Crepinsek, S. M., Bruhwiler, L., Goodison, B., McArthur, B., Walden, V. P., Dlugokencky, E. J., Persson, P. O. G., Lesins, G., Laurila, T., Ogren, J. A., Stone, R. S., Long, C. N., Sharma, S., Massling, A., Turner, D. D., Stanitski, D. M., Asmi, E., Aurela, M., Skov, H., Eleftheriadis, K., Virkkula, A., Platt, A., Førland, E. J., Iijima, Y., Nielsen, I. E., Bergin, M. H., Candlish, L., Zimov, N. S., Zimov, S. A., O'Neill, N. T., Fogal, F., Kivi, R., Konopleva-Akish, E. A., Verlinde, J., Kustov, V. Y., Vasel, B., Ivakhov, V. M., Viisanen, Y., and Intrieri, J. M.: International Arctic Systems for Observing the Atmosphere: An International Polar Year Legacy Consortium, B. Am. Meteorol. Soc., 97, 1033–1056, https://doi.org/10.1175/BAMS-D-14-00145.1, 2016. 

Vannan, S., Downs, R. R., Meier, W., Wilson, B. E., and Gerasimov, I. V.: Data sets are foundational to research. Why don't we cite them?, Eos, 101, https://doi.org/10.1029/2020EO151665, 2020. 

Vorosmarty, C., Rawlins, M., Hinzman, L., Francis, J., Serreze, M., Liljedahl, A., McDonald, K., Piasecki, M., and Rich, R.: Opportunities and Challenges in Arctic System Synthesis: A Consensus Report from the Arctic Research Community, New York, 84 pp., https://www.arcus.org/publications/28459 (last access: 15 July 2023), 2018. 

Wei, Y., Shrestha, R., Pal, S., Gerken, T., Feng, S., McNelis, J., Singh, D., Thornton, M. M., Boyer, A. G., Shook, M. A., Chen, G., Baier, B. C., Barkley, Z. R., Barrick, J. D., Bennett, J. R., Browell, E. V., Campbell, J. F., Campbell, L. J., Choi, Y., Collins, J., Dobler, J., Eckl, M., Fiehn, A., Fried, A., Digangi, J. P., Barton-Grimley, R., Halliday, H., Klausner, T., Kooi, S., Kostinek, J., Lauvaux, T., Lin, B., McGill, M. J., Meadows, B., Miles, N. L., Nehrir, A. R., Nowak, J. B., Obland, M., O'Dell, C., Fao, R. M. P., Richardson, S. J., Richter, D., Roiger, A., Sweeney, C., Walega, J., Weibring, P., Williams, C. A., Yang, M. M., Zhou, Y., and Davis, K. J.: Atmospheric Carbon and Transport – America (ACT-America) Data Sets: Description, Management, and Delivery, Earth Space Sci., 8, e2020EA001634, https://doi.org/10.1029/2020EA001634, 2021. 

Wilkinson, M. D., Dumontier, M., Aalbersberg, I. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., Gonzalez-Beltran, A., Gray, A. J. G., Groth, P., Goble, C., Grethe, J. S., Heringa, J., 't Hoen, P. A. C., Hooft, R., Kuhn, T., Kok, R., Kok, J., Lusher, S. J., Martone, M. E., Mons, A., Packer, A. L., Persson, B., Rocca-Serra, P., Roos, M., van Schaik, R., Sansone, S.-A., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M. A., Thompson, M., van der Lei, J., van Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J., and Mons, B.: The FAIR Guiding Principles for scientific data management and stewardship, Sci. Data, 3, 160018, https://doi.org/10.1038/sdata.2016.18, 2016. 

Wilson, J., Jung, T., Bazile, E., Bromwich, D., Casati, B., Day, J., De Coning, E., Eayrs, C., Grumbine, R., Inoue, J., Khalsa, S. J. S., Kristiansen, J., Lamers, M., Liggett, D., Olsen, S. M., Perovich, D., Renfrew, I. A., Smolyanitsky, V., Svensson, G., Sun, Q., Uttal, T., and Yang, Q.: The YOPP Final Summit: Assessing Past and Forecasting Future Polar Prediction Research, B. Am. Meteorol. Soc., 104, E660–E665, https://doi.org/10.1175/BAMS-D-22-0282.1, 2023.

Wolff, M. A., Isaksen, K., Petersen-Øverleir, A., Ødemark, K., Reitan, T., and Brækkan, R.: Derivation of a new continuous adjustment function for correcting wind-induced loss of solid precipitation: results of a Norwegian field study, Hydrol. Earth Syst. Sci., 19, 951–967, https://doi.org/10.5194/hess-19-951-2015, 2015. 

World Meteorological Organization (WMO): Manual on the Global Telecommunication System: Annex III to the WMO Technical Regulations, 2015, WMO (Series), no. 386, Secretariat of the World Meteorological Organization, Geneva, Switzerland, 197 pp., ISBN 978-92-63-10386-4, https://library.wmo.int/idurl/4/35800 (last access: 26 June 2024), 2020. 

Xie, S., McCoy, R. B., Klein, S. A., Cederwall, T., Wiscombe, W. J., Jensen, M. P., Johnson, K. L., Clothiaux, E. E., Gaustad, K. L., Long, C. N., Mather, J. H., McFarlane, S. A., Shi, Y., Golaz, J.-C., Lin, Y., Hall, S. D., McCord, R. A., Palanisamy, G., and Turner, D. D.: CLOUDS AND MORE: ARM Climate Modeling Best Estimate Data: A New Data Product for Climate Studies, B. Am. Meteorol. Soc., 91, 13–20, https://doi.org/10.1175/2009BAMS2891.1, 2010. 

Zuo, G., Dou, Y., and Lei, R.: Discrimination Algorithm and Procedure of Snow Depth and Sea Ice Thickness Determination Using Measurements of the Vertical Ice Temperature Profile by the Ice-Tethered Buoys, Sensors, 18, 4162, https://doi.org/10.3390/s18124162, 2018. 

1

Differences generally arise for two reasons. First, forecast models can produce tendencies on time-step scales that are typically not available from field instruments. Second, model output is typically regularly distributed in time and space, whereas observations come from discrete instruments that may move irregularly in space, operate at different optimized cadences, and are subject to physical or power disruptions.

2

TOA variables are only relevant to MMDFs but are grouped with surface variables in the H-K schema and are therefore included for completeness.

Short summary
A Merged Observatory Data File (MODF) format to systematically collate complex atmosphere, ocean, and terrestrial data sets collected by multiple instruments during field campaigns is presented. The MODF format is also designed to be applied to model output data, yielding format-matching Merged Model Data Files (MMDFs). MODFs plus MMDFs will augment and accelerate the synergistic use of model results with observational data to increase understanding and predictive skill.