the Creative Commons Attribution 4.0 License.
EAT v0.9.6: a 1D testbed for physical-biogeochemical data assimilation in natural waters
Abstract. Data assimilation (DA) in marine and freshwater systems combines numerical models and observations to deliver the best possible characterisation of a water body’s physical and biogeochemical state. This underpins the widely used 3D ocean state reanalyses and forecasts produced operationally by, e.g., the Copernicus Marine Service. The use of DA in natural waters is an active field of research, but testing new developments in a realistic setting can be challenging, as operational DA systems are demanding in terms of computational resources and technical skill. There is a need for testbeds that are sufficiently realistic but also efficient to run and easy to operate. Here, we present the Ensemble and Assimilation Tool (EAT): a flexible and extensible software package that enables data assimilation of physical and biogeochemical variables in a one-dimensional water column. EAT builds on established open-source components for hydrodynamics (GOTM), biogeochemistry (FABM) and data assimilation (PDAF). It is easy to install and operate, and flexible through support for user-written plugins. EAT is well suited to explore and advance the state of the art in DA in natural waters thanks to its support for (1) strongly and weakly coupled data assimilation, (2) observations describing any prognostic and diagnostic element of the physical-biogeochemical model, and (3) estimation of biogeochemical parameters. Its range of capabilities is demonstrated with three applications: ensemble-based coupled physical-biogeochemical assimilation, the use of variational methods (3D-Var) to assimilate sea surface chlorophyll, and the estimation of biogeochemical parameters.
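The abstract distinguishes strongly and weakly coupled assimilation. As a rough illustration only (EAT delegates the analysis to PDAF, and this is not its implementation), the sketch below uses a stochastic ensemble Kalman filter with perturbed observations in plain NumPy. Weak coupling is emulated by zeroing the cross-covariances between the physical and biogeochemical parts of the state, so a temperature observation can no longer change chlorophyll. The two-variable state and all numbers are hypothetical.

```python
import numpy as np


def enkf_analysis(X, y, H, R, rng, coupled=True, n_phys=0):
    """Stochastic EnKF analysis with perturbed observations.

    X: forecast ensemble, shape (n_state, n_ens)
    y: observations, shape (n_obs,)
    H: linear observation operator, shape (n_obs, n_state)
    R: observation error covariance, shape (n_obs, n_obs)
    If coupled is False, covariances between the first n_phys (physical)
    entries and the remaining (biogeochemical) entries are set to zero,
    so each component is updated only by observations of that component.
    """
    n_ens = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)           # ensemble anomalies
    P = A @ A.T / (n_ens - 1)                        # sample covariance
    if not coupled:
        P[:n_phys, n_phys:] = 0.0                    # drop physics-BGC cross-covariances
        P[n_phys:, :n_phys] = 0.0
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)    # Kalman gain
    # One perturbed copy of the observations per ensemble member
    Y = y[:, None] + rng.multivariate_normal(np.zeros(y.size), R, size=n_ens).T
    return X + K @ (Y - H @ X)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    temp = 15.0 + rng.normal(0.0, 1.0, 50)                       # physical variable
    chl = 1.0 - 0.3 * (temp - 15.0) + rng.normal(0.0, 0.05, 50)  # correlated BGC variable
    X = np.vstack([temp, chl])
    H = np.array([[1.0, 0.0]])                                   # only temperature observed
    R = np.array([[0.1]])
    y = np.array([17.0])
    for coupled in (True, False):
        Xa = enkf_analysis(X, y, H, R, np.random.default_rng(1), coupled, n_phys=1)
        print(f"coupled={coupled}: chl mean {chl.mean():.3f} -> {Xa[1].mean():.3f}")
```

With coupling, the warm temperature observation lowers chlorophyll through the negative sampled correlation; without it, chlorophyll is left untouched while the temperature update is identical.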
Status: final response (author comments only)
RC1: 'Comment on gmd-2023-238', Anonymous Referee #1, 03 Jan 2024
The authors present a data assimilation (DA) test bed based on a 1-dimensional physical-biogeochemical water column model. The manuscript contains 3 example applications for the DA framework, which highlights its versatility regarding different model configurations and DA techniques. It could be a useful tool for beginners to learn about DA, or for practitioners to test modifications to existing DA systems. The manuscript is well written and easy to follow. I tested one of the example applications on a computer, and it ran with just a few small issues. In the manuscript I would like to see a bit more accompanying information about the test cases, in particular, related to how easy it would be to modify some of the implementation aspects.
General comments
Overall, the three test cases that are presented in the manuscript, and also included as example applications in the downloadable software, are very instructive and helpful to a potential user of EAT. Each of the test cases appears to highlight an issue or weakness of the chosen modelling/DA approach, and each made me think of possible extensions to further investigate or mitigate those issues. Here, it would be useful to include more information in the manuscript describing how much user input or work would be required to extend the test cases.
Case in point, in test case 1 (Section 3.1) it would be interesting to examine the inclusion of more sources of uncertainty besides the 3 or 5 parameters that are included in the ensemble generation. I am not suggesting the authors make any changes to the test case, but it would be helpful to describe the amount of change required to include other parameters here (Fig 2 appears to suggest it requires just a small change in one of the Python files) or to introduce changes to the initial conditions. This additional information could be included in a final paragraph of the section.
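To make the question about adding sources of uncertainty concrete, here is a generic sketch (not the code in Fig. 2 and not EAT's API) of ensemble generation by lognormal parameter perturbation. The parameter names echo those mentioned by Referee #2, but their default values and spreads are invented for illustration. Under this design, including another uncertain parameter only means adding one entry to each dictionary.

```python
import numpy as np


def lognormal_ensemble(defaults, log_sigma, n_members, seed=None):
    """Perturb each parameter as default * exp(N(0, sigma**2)).

    The median of every perturbed parameter equals its default value and all
    draws remain positive, which suits rate constants and scale factors.
    """
    rng = np.random.default_rng(seed)
    return {
        name: value * np.exp(rng.normal(0.0, log_sigma[name], n_members))
        for name, value in defaults.items()
    }


# Hypothetical names and values: including one more source of uncertainty
# means adding one entry to each dictionary.
defaults = {"k_min": 1e-10, "scale_factor": 1.0}
log_sigma = {"k_min": 0.5, "scale_factor": 0.1}
members = lognormal_ensemble(defaults, log_sigma, n_members=20, seed=42)
# members["scale_factor"][i] would be written into the configuration of member i
```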
Similarly, in test case 2, the biogeochemical covariance is limited to the phytoplankton variables. Mention how easy it would be to expand it, for example, to include nitrate.
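For a linear observation operator, the minimum of the 3D-Var cost function has a closed form, which makes it easy to see what extending the biogeochemical covariance to nitrate would involve: a variable with zero variance and zero cross-covariances in the background-error covariance B receives no increment, so including nitrate amounts to supplying those entries. The three-variable state, the numbers, and the closed-form solve (an operational 3D-Var would use an iterative minimiser instead) are assumptions for illustration, not the configuration of test case 2.

```python
import numpy as np


def var3d_linear(xb, B, y, H, R):
    """Minimiser of the 3D-Var cost function for a linear observation operator.

    J(x) = 0.5 (x - xb)^T B^-1 (x - xb) + 0.5 (y - H x)^T R^-1 (y - H x)
    is minimal at xa = xb + B H^T (H B H^T + R)^-1 (y - H xb).
    """
    innovation = y - H @ xb
    gain = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    return xb + gain @ innovation


# Hypothetical state: [diatom chl, flagellate chl, nitrate]
xb = np.array([0.4, 0.6, 5.0])
H = np.array([[1.0, 1.0, 0.0]])      # observed: total surface chlorophyll
R = np.array([[0.04]])
y = np.array([1.5])

# Covariance limited to phytoplankton: nitrate row and column are zero
B_phyto = np.array([[0.04, 0.01, 0.0],
                    [0.01, 0.09, 0.0],
                    [0.0,  0.0,  0.0]])
# Extended covariance: nitrate variance and negative chlorophyll-nitrate covariances
B_ext = np.array([[0.04,  0.01, -0.05],
                  [0.01,  0.09, -0.08],
                  [-0.05, -0.08, 1.0]])

xa_phyto = var3d_linear(xb, B_phyto, y, H, R)
xa_ext = var3d_linear(xb, B_ext, y, H, R)
```

Both variants produce the same analysed total chlorophyll; only the extended B also adjusts nitrate.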
Then there is a (perhaps worrying) decline in subsurface chlorophyll brought about by the assimilation of surface chlorophyll. Here, EAT could be a nice test bed to evaluate modified observation operators; would this be easy to implement? For example, could a chlorophyll observation be considered the sum of the top 4 grid cells? Or -- more difficult -- could the observation operator be dynamically determined based on optical depth? Again, a small paragraph suggesting changes to the test case, and a description of the effort that would be involved, would be of use for many future users of EAT.
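The alternative observation operators raised above can be written as small functions of the model chlorophyll profile. The sketch below is hypothetical and independent of EAT's plugin interface. It assumes layers ordered from the surface downward (index 0 = surface, which may differ from the model's own ordering), layer thicknesses `h` in metres and a diffuse attenuation coefficient `kd` per layer. The top-4 variant uses a thickness-weighted mean rather than a plain sum so that the result stays a concentration; the optical variant weights layers by exp(-2τ), with τ the optical depth at the layer centre, reflecting two-way light attenuation.

```python
import numpy as np


def chl_top_layer(chl):
    """Operator using only the uppermost model layer (index 0 = surface)."""
    return chl[0]


def chl_top_layers(chl, h, n=4):
    """Thickness-weighted mean over the n uppermost layers."""
    return np.sum(chl[:n] * h[:n]) / np.sum(h[:n])


def chl_optical(chl, h, kd):
    """Mean weighted by exp(-2 * optical depth) at layer centres.

    kd: diffuse attenuation coefficient per layer (1/m); the factor 2
    accounts for light travelling down to each layer and back up.
    """
    tau_top = np.concatenate(([0.0], np.cumsum(kd * h)[:-1]))  # optical depth at layer tops
    tau_mid = tau_top + 0.5 * kd * h
    w = np.exp(-2.0 * tau_mid) * h
    return np.sum(w * chl) / np.sum(w)
```

Because `kd` could itself be diagnosed from the model state at each analysis time, the last variant is one way the operator could become "dynamically determined based on optical depth".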
In test case 3, finally, the authors suggest that perhaps more than one parameter should be included in the estimation. How difficult would it be to include more? Beginners like me do not know, but would be interested in learning more, especially if the change required is small (modification of a YAML file perhaps).
Beyond the test cases, it might also be of interest to some readers to include a description of some more sophisticated features of EAT -- or a brief description of what can be implemented. For example, is it possible to use pre-computed ensemble members for quick DA experiments without having to re-run the model? The authors further mention hybrid variational-ensemble schemes: are these already included in EAT, and what kind of coding skills are required to add a new DA technique?
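On the question of estimating more than one parameter in test case 3: in ensemble-based joint state-parameter estimation, parameters are commonly appended to the state vector and updated through their sampled covariance with the observed variables. The toy sketch below (hypothetical parameters `p1` and `p2`, a made-up model response, and a stochastic EnKF rather than whatever filter EAT configures through PDAF) shows that each additional parameter adds one row to the augmented ensemble. Parameters are estimated as logarithms to keep them positive.

```python
import numpy as np

rng = np.random.default_rng(3)
n_ens = 100

# Hypothetical parameters p1 and p2, estimated as ln(p) so they stay positive
log_p = rng.normal(0.0, 0.3, size=(2, n_ens))
# Toy "model": surface chlorophyll produced by each member's parameter values
chl = np.exp(log_p[0] - log_p[1])

# Augmented ensemble [ln p1, ln p2, chl]; estimating a third parameter
# would add one more row here and one more zero column to H
Z = np.vstack([log_p, chl])
H = np.array([[0.0, 0.0, 1.0]])   # only chlorophyll is observed
R = np.array([[0.01]])            # observation error variance
y = np.array([2.0])

# Stochastic EnKF analysis: parameters are updated through their sampled
# covariance with the observed chlorophyll
A = Z - Z.mean(axis=1, keepdims=True)
P = A @ A.T / (n_ens - 1)
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
Y = y[:, None] + rng.normal(0.0, np.sqrt(R[0, 0]), size=(1, n_ens))
Za = Z + K @ (Y - H @ Z)

p_analysis = np.exp(Za[:2])       # parameter values for the next forecast
```

Since the observed chlorophyll exceeds the ensemble mean, the analysis raises `p1` and lowers `p2`, consistent with the signs of their correlations with chlorophyll.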
As nice as it is to have a fast DA system, it is limited to a 1D (vertical water column) model. Such a model is useful, but it will not be able to serve as a test bed for many operational systems which use 3D models. As use cases for EAT, the authors mention "practical aspects such as the spatial correlation structure and regionalized setting of estimated parameters" (line 54), even though I would consider these two applications poor examples for the use of 1D models. Examining the effect of spatial correlations of satellite chlorophyll error, for example, would require a 3D DA setup; regionalized parameter estimates would require a number of 1D models, at least one for each region (many if the region boundaries are not set). Here, it would be good if the authors already acknowledged some of the limitations of 1D models in the introduction.
Finally, I wanted to bring up the name of the tool: It is probably too late to change EAT's name, but EAT expands to "Ensemble Assimilation Tool" in my mind (it is easy to miss the "and" in "Ensemble and Assimilation Tool") even though EAT supports variational DA as well. Furthermore, it would have been nice to include "Aquatic" or "Marine" in the abbreviation, but that is just a suggestion if a name change is still on the table.
Testing one of the example applications
I decided to run test case 2 "Biogeochemical state estimation with variational assimilation" on a Linux machine. Overall, it worked well and there were only a few minor hiccups. I downloaded the zip file that was referenced in the paper, unzipped it and followed the instructions in the `README.md` file. Installing EAT using conda was straightforward and worked right away. The free run worked flawlessly as well; only the DA experiment failed, because the output directory was missing:
```
$ mpiexec -n 1 python runVar.py : -n 1 eat-gotm
INFO:root:Model simulated period: 2019-01-01 00:00:00 - 2019-12-31 18:00:00
Traceback (most recent call last):
  File "/home/user/eat-applications/Variational/BFMvar/runVar.py", line 34, in <module>
    experiment.add_plugin(eatpy.plugins.output.NetCDF(outfile))
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.conda/envs/eat/lib/python3.12/site-packages/eatpy/plugins/output.py", line 14, in __init__
    self.nc = netCDF4.Dataset(path, "w")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/netCDF4/_netCDF4.pyx", line 2469, in netCDF4._netCDF4.Dataset.__init__
  File "src/netCDF4/_netCDF4.pyx", line 2028, in netCDF4._netCDF4._ensure_nc_success
PermissionError: [Errno 13] Permission denied: 'OUTNC/DAout.nc'
```
A quick `mkdir -p OUTNC` solved the issue, and I would recommend adding this directory to the zip (perhaps with a small placeholder file `OUTNC/README.md` which just mentions that this directory will be used for netCDF output).
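An alternative to shipping an empty directory is to let the run script create it before the NetCDF output plugin opens the file. A minimal sketch, assuming the output path used in the traceback; the helper name `ensure_parent_dir` is hypothetical and not part of eatpy.

```python
from pathlib import Path


def ensure_parent_dir(path):
    """Create the directory that will contain `path`; do nothing if it exists."""
    parent = Path(path).parent
    parent.mkdir(parents=True, exist_ok=True)
    return parent


if __name__ == "__main__":
    outfile = "OUTNC/DAout.nc"
    ensure_parent_dir(outfile)
    # In runVar.py this would precede the existing line
    # experiment.add_plugin(eatpy.plugins.output.NetCDF(outfile))
```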
Finally, I ran into some issues with the figure creation, too: `launcher.sh` was not executable, and after making it executable (`chmod u+x launcher.sh`), I received odd error messages because the file has DOS (CRLF) line breaks.
```
./launcher.sh: line 2: $'\r': command not found
./launcher.sh: line 10: $'\r': command not found
./launcher.sh: line 14: $'\r': command not found
[...]
```
This issue might not be easily reproducible, but the script even created an odd `^M^M` directory. After converting the file to UNIX (LF) line endings, it all worked and created the plots. I am not sure what to do about this particular issue (will UNIX line endings cause issues on Windows machines?), but including a proper shebang `#!/bin/bash` in `launcher.sh` at least stopped the script right away without creating directories etc., so I would at least recommend that. Overall, I was impressed by how well it worked, but I have not yet attempted to make any modifications to the experiment.
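Both launcher problems (CRLF line endings and a missing execute bit) could be repaired by a small helper along these lines. It is a hypothetical sketch, not part of the EAT distribution, and it only reports a missing shebang rather than inserting one.

```python
import stat
from pathlib import Path


def fix_shell_script(path):
    """Convert CRLF line endings to LF and set the owner's execute bit.

    Returns True if the script starts with a shebang line such as '#!/bin/bash'.
    """
    p = Path(path)
    data = p.read_bytes().replace(b"\r\n", b"\n")
    p.write_bytes(data)
    p.chmod(p.stat().st_mode | stat.S_IXUSR)
    return data.startswith(b"#!")


if __name__ == "__main__":
    if not fix_shell_script("launcher.sh"):
        print("launcher.sh has no shebang; consider adding '#!/bin/bash'")
```

When the scripts are kept in git, a `.gitattributes` entry such as `*.sh text eol=lf` keeps LF endings on checkout, including on Windows.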
Specific comments:
L 17: It is unclear what "This" is referring to; I would suggest using "DA".
L 78: Here (and maybe already in the abstract), spell out the abbreviations used or include a reference to section 2.
L 159: While I appreciate the discussion of the different coupling schemes and implementation details, as a new user of EAT, I would be more interested in what I need to do to run DA for my model. Case in point:
> "Online coupling is achieved by inserting function calls to PDAF. This can be done by augmenting the model source code itself, which then enables simulation of an ensemble of model states in a single execution of the model. Alternatively, for models already capable of ensemble simulations, it can be implemented in dedicated DA code after an ensemble of model results is received. The latter is the approach adopted by EAT. PDAF-specific additions to the code are usually four functions, all of which are placed outside of the actual numerical core of the model. Overall, the online coupling approach reduces the amount of data that needs to be written to files and allows efficient data assimilation, in particular when the forecast phase between two assimilation steps is short compared to the start-up time of a model, as is common for GOTM-FABM."
After reading it, I have learned some fundamentals about PDAF, but I am not sure what kind of effort is required for a typical user. Do I need to implement four functions, or is more work required? Are these Python or Fortran functions? I would suggest rewriting this paragraph with an emphasis on the EAT implementation and EAT-specific instructions. For example, instead of starting with "There are different strategies to couple a model with PDAF. The offline coupling uses ...", one could use "While PDAF supports both offline and online coupling, EAT uses online coupling to connect the model to the DA framework..."
L 186: Is the <RUNSCRIPT> here akin to what is shown in Fig. 3? Maybe add a short explanation of the required input.
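Regarding the online coupling discussed in the L 159 comment, the essential difference from offline coupling is that the ensemble stays in memory between forecast and analysis, so no restart files are written and no model start-up is repeated. The toy sketch below illustrates only that cycle: the relaxation model, the scalar EnKF update and all parameter values are hypothetical. Judging from the `mpiexec ... : ... eat-gotm` launch command in the test report, EAT runs the model and the Python DA script as separate MPI programs, whereas this sketch collapses both into one process and does not reflect how EAT or PDAF exchange data.

```python
import numpy as np


def model_step(x, dt, rate=0.1):
    """Toy model: every member relaxes towards 1 at the given rate."""
    return x + dt * rate * (1.0 - x)


def analysis(X, y, r, rng):
    """Stochastic EnKF update of a directly observed scalar state (H = 1)."""
    var = X.var(ddof=1)
    gain = var / (var + r)
    y_pert = y + rng.normal(0.0, np.sqrt(r), X.size)   # perturbed observations
    return X + gain * (y_pert - X)


def run_online(n_ens=20, n_steps=100, obs_every=10, dt=1.0, y=0.5, r=0.01, seed=0):
    """Forecast-analysis cycle with the ensemble held in memory throughout."""
    rng = np.random.default_rng(seed)
    X = rng.normal(2.0, 0.5, n_ens)          # initial ensemble
    for it in range(1, n_steps + 1):
        X = model_step(X, dt)                # forecast: all members advance one step
        if it % obs_every == 0:              # observation time
            X = analysis(X, y, r, rng)       # update and continue without restart files
    return X
```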
Fig 2: For readability, set the mean values explicitly, even though these are identical to the default.
Fig 3: I think it would be more useful to add some of the information from the caption to the code (in the form of comments).
Fig 2 and 3: Mention in the caption that this is Python code.
L 258: Are these text files in CSV (comma-separated values) format? Why not use YAML here as well, or a well-structured custom format to avoid/better catch user error?
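To illustrate what "a well-structured format to better catch user error" could mean, the hypothetical reader below assumes whitespace-separated records of the form `YYYY-MM-DD HH:MM:SS value sd`. This layout is an assumption, not necessarily EAT's actual observation file format. The reader reports malformed lines, non-positive standard deviations and time reversals with their line numbers instead of letting them surface later inside the DA run.

```python
import datetime


def read_obs(lines):
    """Parse whitespace-separated 'YYYY-MM-DD HH:MM:SS value sd' records.

    Blank lines and lines starting with '#' are skipped. Returns the valid
    records and a list of error messages that include line numbers.
    """
    records, errors = [], []
    last_time = None
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 4:
            errors.append(f"line {lineno}: expected 4 fields, found {len(fields)}")
            continue
        try:
            time = datetime.datetime.strptime(f"{fields[0]} {fields[1]}", "%Y-%m-%d %H:%M:%S")
            value, sd = float(fields[2]), float(fields[3])
        except ValueError as exc:
            errors.append(f"line {lineno}: {exc}")
            continue
        if sd <= 0:
            errors.append(f"line {lineno}: standard deviation must be positive")
        elif last_time is not None and time < last_time:
            errors.append(f"line {lineno}: time goes backwards")
        else:
            records.append((time, value, sd))
            last_time = time
    return records, errors
```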
Fig 5 and following: "(a)" labels are missing from the figures.
Fig 5: Science question, only related to the DA result: Are the thin layers of subsurface cooling an effect of including only a few mixing-related sources of uncertainty in the ensemble creation?
L 322: Is this the result of one "4D" data assimilation cycle with "asynchronous" assimilation, or were multiple cycles performed? Please add this information to the manuscript.
L 342: Does the assimilation "see" the subsurface chlorophyll maximum, or does the observation operator just work on the top layer of the model?
L 352: "their different components" I would not describe carbon as a "component" of phytoplankton, and would suggest changing it to "their elemental composition" or cell quotas if these are considered.
L 414: "we further use the “diat-MSP” abbreviation": Initially, I wasn't quite sure what was meant, I'd suggest being more explicit, for example by using "in the following we refer to this parameter as “diat-MSP”".
Citation: https://doi.org/10.5194/gmd-2023-238-RC1
AC1: 'Reply on RC1', Jorn Bruggeman, 08 Jan 2024
We are extremely grateful to the referee for their careful review and their persistence in running one of the provided example applications, even when this initially presented problems. We have now addressed these problems in an updated version of the example applications: https://doi.org/10.5281/zenodo.10463234. The main change in this new version is that all three examples are now run using Jupyter notebooks, which ensures compatibility with all supported platforms (Linux, Mac, Windows). This is also automatically tested: https://github.com/BoldingBruggeman/eat-paper-applications/actions.
We will comment in detail on all questions, suggestions and issues raised in a future response.
Citation: https://doi.org/10.5194/gmd-2023-238-AC1
AC2: 'Reply on RC1', Jorn Bruggeman, 07 Mar 2024
RC2: 'Comment on gmd-2023-238', Anonymous Referee #2, 31 Jan 2024
The authors present a generalized DA framework for 1D ocean applications. The proposed system, EAT, uses GOTM as the physical model, FABM as the biogeochemistry platform and PDAF as the DA software. The authors examined the new system at 3 different locations, assimilating various physical and biogeochemical data. They tested state estimation as well as joint state-parameter estimation. I believe the system is highly beneficial for the community and quite attractive given its portability and flexibility. The paper is well written and easy to read. I only have minor comments.
- When generating the initial ensemble, how did the authors decide on the variable distribution (e.g., lognormal) and its associated parameters? And for the perturbed parameters, I am wondering why the authors chose k_min and scale_factor over other ones.
- Transforming the state and the data during the update is interesting. I do believe that Gaussian-distributed variables are better suited for a Kalman-type correction. My only question is how you deal with the associated observation error variance when you transform the actual data. For instance, if I take the logarithm of the data, what would be the corresponding error in transformed space? I think the manuscript would benefit from such information.
- In section 3.1, why not add another experiment assimilating only bgc data? If assimilating physical data degrades chl, then assimilating only chl and adding that to Fig. 6 should be very informative.
- It's very typical in similar 1D applications from the literature to assimilate nutrient profiles. I'm surprised the authors didn't consider that. Is it because the authors don't have access to such data at the tested locations? How about the reanalysis dataset? Addressing subsurface biogeochemical uncertainties can be crucial for adjusting primary production (PP) across the entire water column (at least within the euphotic zone).
- Related to my previous point, I believe assimilating data in the vertical may produce different parameter configurations at different levels beneath the surface.
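One common answer to the question about observation errors under a log transform (whether EAT or the authors use it is not stated here) is to treat the observation as lognormal with mean y and standard deviation sd and to match the first two moments, which gives a log-space variance of ln(1 + (sd/y)²). For small relative errors this reduces to the first-order (delta-method) value sd/y, and a purely relative error maps to the same log-space error at every concentration. If y is interpreted as the mean rather than the median, the log-space observation itself would correspondingly be ln y minus half that variance. The observation values below are hypothetical.

```python
import numpy as np


def log_space_sd(y, sd):
    """Standard deviation of ln(Y) for a lognormal Y with mean y and std sd.

    Matching the first two moments gives sigma_ln**2 = ln(1 + (sd / y)**2).
    """
    return np.sqrt(np.log1p((sd / y) ** 2))


def log_space_sd_linearised(y, sd):
    """First-order (delta-method) approximation: d ln(y) = dy / y."""
    return sd / y


# A 30 % relative error maps to the same log-space error at every concentration
y = np.array([0.1, 1.0, 10.0])   # hypothetical chlorophyll observations
sd = 0.3 * y
sigma_exact = log_space_sd(y, sd)            # about 0.294 for each value
sigma_lin = log_space_sd_linearised(y, sd)   # 0.3 for each value
```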
Other comments:
- Line 20: that *are* sufficiently
- Line 69: Water column models are *ideal testbeds*
- Line 76: performing *state-parameter* estimation
- Line 172: This allows *the* user
- Line 179: and that *runs* the data
- Line 264: the model *serially*, without
- Line 503: different *configuration* options
Citation: https://doi.org/10.5194/gmd-2023-238-RC2
AC3: 'Reply on RC2', Jorn Bruggeman, 07 Mar 2024
Data sets
EAT example applications Jorn Bruggeman, Anna Teruzzi, Simone Spada, Jozef Skákala https://doi.org/10.5281/zenodo.10307316
Model code and software
EAT: Ensemble and Assimilation Tool Jorn Bruggeman, Karsten Bolding, Lars Nerger https://doi.org/10.5281/zenodo.10306436
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
357 | 88 | 35 | 480 | 19 | 18 |