the Creative Commons Attribution 4.0 License.
Virtual Integration of Satellite and In-situ Observation Networks (VISION) v1.0: In-Situ Observations Simulator
Abstract. This work presents the first step in the development of the VISION toolkit, a set of Python tools that allows for easy, efficient and more meaningful comparison between global atmospheric models and observational data. Whilst observational data and modelling capabilities are expanding in parallel, there are still barriers preventing these two data sources from being used in synergy. These barriers arise from differences in spatial and temporal sampling between models and observational platforms: observational data from a research aircraft, for example, is sampled on specified flight trajectories at very high temporal resolution. Proper comparison with model data requires generating, storing and handling a large number of highly temporally resolved model files, resulting in a process which is data, labour, and time intensive. In this paper we focus on comparison between model data and in-situ observations (from aircraft, ships, buoys, sondes etc.). A stand-alone code, the In-Situ Observations Simulator, or ISO_simulator for short, is described here: this software reads modelled variables and observational data files and outputs model data interpolated in space and time to match observations. This model data is then written to NetCDF files that can be efficiently archived, due to their small size, and directly compared to observations. This method achieves a large reduction in the size of model data being produced for comparison with flight and other in-situ data. By interpolating global, gridded, hourly files onto observation locations, we reduce data output for a typical climate resolution run from ~3 Gb per model variable per month to ~15 Mb per model variable per month (a 200-fold reduction in data volume). The VISION toolkit is fast and easy to use, therefore enabling the exploitation of large observational datasets spanning decades for large-scale model evaluation.
Although this code has been initially tested within the Unified Model (UM) framework, which is shared by the UK Earth System Model (UKESM), it was written as a flexible tool and it can be extended to work with other models.
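The core idea of the abstract, interpolating gridded hourly model output in space and time onto observation points, can be illustrated with a minimal pure-Python sketch. This is not the ISO_simulator implementation: the function names `bracket` and `colocate`, the nested-list field layout `field[time][lat][lon]` and the toy grid are assumptions made for illustration, and the sketch ignores the vertical dimension and longitude wrap-around.

```python
from bisect import bisect_right


def bracket(axis, x):
    """Return (i, w): x lies between axis[i] and axis[i + 1], and w is the
    linear weight of the upper neighbour (0 <= w <= 1). axis must be ascending."""
    if not axis[0] <= x <= axis[-1]:
        raise ValueError(f"{x} is outside the axis range")
    i = min(bisect_right(axis, x) - 1, len(axis) - 2)
    w = (x - axis[i]) / (axis[i + 1] - axis[i])
    return i, w


def colocate(field, times, lats, lons, obs):
    """Linearly interpolate field[time][lat][lon] onto (time, lat, lon) points."""
    out = []
    for t, lat, lon in obs:
        it, wt = bracket(times, t)
        iy, wy = bracket(lats, lat)
        ix, wx = bracket(lons, lon)
        value = 0.0
        # Sum the 8 surrounding grid corners, each weighted by its proximity.
        for dt, a in ((0, 1 - wt), (1, wt)):
            for dy, b in ((0, 1 - wy), (1, wy)):
                for dx, c in ((0, 1 - wx), (1, wx)):
                    value += a * b * c * field[it + dt][iy + dy][ix + dx]
        out.append(value)
    return out


# Toy data (hypothetical): three hourly fields on a 2 x 3 lat-lon grid.
times = [0.0, 1.0, 2.0]   # hours
lats = [0.0, 10.0]        # degrees north
lons = [0.0, 10.0, 20.0]  # degrees east
field = [[[t + la / 10 + lo / 10 for lo in lons] for la in lats] for t in times]

# One observation per (time, lat, lon) tuple, e.g. points along a flight track.
track = [(0.5, 5.0, 15.0), (2.0, 10.0, 20.0)]
print(colocate(field, times, lats, lons, track))  # [2.5, 5.0]
```

Only one value per observation point is kept, which is where the reduction in data volume relative to the full gridded hourly files comes from.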
Status: closed
-
RC1: 'Comment on gmd-2024-73', Gijs van den Oord, 15 Jul 2024
Dear authors of the manuscript "Virtual Integration of Satellite and In-situ Observation Networks (VISION) v1.0: In-Situ Observation Simulator", I had the pleasure to read your manuscript describing a tool that will undoubtedly prove to be very useful and important for the validation and quality assessment of our climate models. However, I do have a few minor comments about this paper:
- On page 3, line 72 you speculate that this software could be a valuable tool for upcoming CMIP7 experiments. However, in the rest of the text, I get the impression that VISION is designed to work with hourly output from UM/UKESM with specific output file names. Furthermore, to my knowledge the most frequent global fields in CMIP are 3-hourly. Does the tool operate on CMIP6 CMORized output? If not, could you elaborate in the discussion what would be needed for VISION to claim a role in the CMIP7 assessment cycle?
- In section 2.3 you briefly discuss the output of the VISION tool. Is the output CF-compliant? Please mention if so.
- Table 2: you test different I/O libraries for NetCDF on performance. Surprisingly, the cf-python library is faster when reading a pp file with 36 fields than a single field. Could you elaborate on this? In a broader perspective, I'm not sure whether a numpy.print is a good indicator for the I/O performance of the actual VISION workload, especially for distributed lazy I/O libraries under consideration. Maybe extracting values along a trajectory would provide a better indication of the performance? Please address this concern in the text.
- Section 4: the examples only involve ozone concentration. Since the tool is presented as a general-purpose interpolator/collocator, one would expect multiple variables to be plotted for illustration.
Citation: https://doi.org/10.5194/gmd-2024-73-RC1
AC1: 'Reply on RC1', Maria Russo, 25 Oct 2024
The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2024-73/gmd-2024-73-AC1-supplement.pdf
-
RC2: 'Comment on gmd-2024-73', Thomas Clune, 16 Jul 2024
(1) I believe that it would be worth mentioning the following tangential point.
Another scenario where the overall approach described can be beneficial is for the creation of Nature Run (NR) simulations used for OSSEs. These will generally be much higher in resolution than climate simulations, and therefore allow for an even higher compression ratio. Further, practical limitations generally limit NR output to infrequent snapshots, whereas there is great research value in sampling at the model time step. However, to be practical, producing in-situ data in this configuration will generally require online processing to avoid the costly intermediate step of dumping full states to disk as the first step in the processing workflow. OTOH, online processing would allow for improved scalability of the interpolation step. It would be nice if future versions of VISION would allow for an online distributed interface for such scenarios.
(2) To a limited degree a similar approach has been used for field campaigns. E.g., https://github.com/GEOS-ESM/GMAOpyobs/blob/develop/src/pyobs/sampler.py
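The online sampling scenario raised in point (1) can be sketched as follows. This is an illustrative assumption, not a feature of VISION or of the sampler linked in point (2): the function `sample_online`, the callback `sample_fn` and the toy states are hypothetical. The sketch consumes model states one step at a time and interpolates in time onto the observations, so only two states are held in memory and no full state is written to disk.

```python
def sample_online(model_steps, obs, sample_fn):
    """Interpolate in time onto observations while the model is stepping.

    model_steps: iterable of (time, state) pairs in increasing time order.
    obs: list of (time, position) pairs sorted by time.
    sample_fn(state, position): spatial sampling of one state (hypothetical).
    Returns a list of (obs_time, value) pairs for the observations covered
    by the model steps.
    """
    results = []
    pending = iter(obs)
    current = next(pending, None)
    prev = None
    for t1, state1 in model_steps:
        if prev is not None:
            t0, state0 = prev
            # Observations earlier than the first model step cannot be bracketed.
            while current is not None and current[0] < t0:
                current = next(pending, None)
            # Handle every observation that falls between the two latest steps.
            while current is not None and current[0] <= t1:
                t_obs, position = current
                w = (t_obs - t0) / (t1 - t0)
                v0 = sample_fn(state0, position)
                v1 = sample_fn(state1, position)
                results.append((t_obs, (1 - w) * v0 + w * v1))
                current = next(pending, None)
        prev = (t1, state1)  # keep only the latest state for the next step
    return results


def toy_model():
    """Stand-in for a running model: yields a tiny two-cell state per step."""
    yield 0.0, [0.0, 10.0]
    yield 1.0, [2.0, 12.0]
    yield 2.0, [4.0, 14.0]


# Observations as (time, cell index); the sampler here is a plain lookup.
observations = [(0.5, 0), (1.25, 1)]
print(sample_online(toy_model(), observations, lambda state, cell: state[cell]))
# [(0.5, 1.0), (1.25, 12.5)]
```

In a distributed model the spatial sampling in `sample_fn` would run on whichever process owns the relevant part of the domain, which is the scalability gain the comment refers to.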
Citation: https://doi.org/10.5194/gmd-2024-73-RC2
AC2: 'Reply on RC2', Maria Russo, 25 Oct 2024
The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2024-73/gmd-2024-73-AC2-supplement.pdf
-
RC3: 'Comment on gmd-2024-73', Manuel Schlund, 31 Jul 2024
The paper “Virtual Integration of Satellite and In-situ Observation Networks (VISION) v1.0: In-Situ Observations Simulator” by Maria R. Russo et al. addresses an important scientific question: how can we accurately and effectively compare climate model output and observational data? The authors achieve this by co-locating model output to the same time and location as the corresponding observations, which is a very reasonable approach. This does not only reduce the amount of data that needs to be stored, but also allows a direct comparison of model output and observations.
The paper is well-structured and fits well into the scope of the journal. I really enjoyed reading it! Thus, I only have some minor comments which I hope can be used to improve the manuscript even further.
General Comments
- Regarding section 2.4 “Code optimization”: Could you give a rough estimate of how much real time running ISO_simulator takes? Are you using any of the parallel/distributed/out-of-core computing functionalities provided by cf-python or Iris via Dask? This could potentially lead to large performance improvements.
- Please consider publishing your code as a Python package on PyPI and/or conda-forge to enable installing it via “pip install ” or “conda install ”. This will greatly simplify the installation process, the dependency handling, and the inclusion of your code into other software. For example, other software products could simply do an “import <name_of_your_package>” in their code.
Specific Comments
- 23: I think it would be helpful to also include the acronym “ISO_simulator” into the title. You mention it very often in the paper, so I think it deserves to be there.
- 45: There are many models which also use unstructured grids (ICON, FESOM, etc.), so it’s probably better to avoid the term “regular grid”, which really is the opposite of an unstructured (or irregular) grid.
- 83-86: Mention what you need Iris for? Both other tools are mentioned here.
- Table 1: Please mention that input files can also be other formats than PP (like you do in the next paragraph).
- Table 2: Could you please explain what you mean by “Iris + structured UM loading” and why the difference is so big between “Iris” and that?
- 143: Iris 3.1.0 is very old (Sep. 2021), have you considered using a later version?
- 189-191: You already mentioned a lot of this in the paragraph l.180-185, maybe you can unify this?
Technical Corrections
- 35: “NERC” is not defined
- 114-115: “input variable” -> “command line argument”
- 188: “UAV” is undefined
Citation: https://doi.org/10.5194/gmd-2024-73-RC3
AC3: 'Reply on RC3', Maria Russo, 25 Oct 2024
The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2024-73/gmd-2024-73-AC3-supplement.pdf
- AC4: 'Reply on AC3', Maria Russo, 26 Oct 2024
Data sets
UKESM1 hourly modelled ozone for comparison to observations N. L. Abraham and M. R. Russo https://catalogue.ceda.ac.uk/uuid/300046500aeb4af080337ff86ae8e776
Continuous Cape Verde Atmospheric Observatory Observations L. J. Carpenter et al. https://catalogue.ceda.ac.uk/uuid/81693aad69409100b1b9a247b9ae75d5
FAAM ozone dataset 2010 to 2020. NERC EDS Centre for Environmental Data Analysis M. R. Russo, N. L. Abraham, and FAAM Airborne Laboratory https://catalogue.ceda.ac.uk/uuid/8df2e81dbfc2499983aa87781fb3fd5a
ATom: Merged Atmospheric Chemistry, Trace Gases, and Aerosols, Version 2 S. C. Wofsy et al. https://doi.org/10.3334/ORNLDAAC/1925
Model code and software
NCAS-VISION/VISION-toolkit: 1.0 M. R. Russo, S. L. Bartholomew, and N. L. Abraham https://doi.org/10.5281/ZENODO.10927302
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
437 | 133 | 117 | 687 | 21 | 15 |