the Creative Commons Attribution 4.0 License.
The Modular and Integrated Data Assimilation System at Environment and Climate Change Canada (MIDAS v3.9.1)
Abstract. The Modular and Integrated Data Assimilation System (MIDAS) software (version 3.9.1) is described in terms of its range of functionality, modular software design, parallelization strategy, and current uses within real-time operational and experimental systems. MIDAS is developed at Environment and Climate Change Canada for both operational and research applications, including all atmospheric data assimilation (DA) elements of the numerical weather prediction systems. The software is designed to be sufficiently general to enable other DA applications, including atmospheric constituents (e.g. ozone), sea ice, and sea surface temperature. In addition to describing the current MIDAS applications, a sample of the results from these systems is presented to demonstrate their performance in comparison with either systems from before the switch to using MIDAS software or similar systems at other NWP centres. The modular software design also allows the code that implements high-level components (e.g. observation operators, error covariance matrices, state vectors) to easily be used in many different ways depending on the application, such as for both variational and ensemble DA algorithms; for estimating the observation impact on short-term forecasts; and for performing various observation pre-processing procedures. The use of a single common DA software for multiple components of the Earth system provides both practical and scientific benefits, including the facilitation of future research on DA approaches that explicitly include the coupled connections between multiple Earth system components. To this end, work is currently underway to allow the use of MIDAS DA algorithms for initializing both deterministic and ensemble three-dimensional ocean model forecasts.
Status: closed
- RC1: 'Comment on gmd-2024-55', Anonymous Referee #1, 30 Apr 2024
Review of "The Modular and Integrated Data Assimilation System at Environment and Climate Change Canada (MIDAS v3.9.1)" by M. Buehner et al.
The manuscript provides an overview of the data assimilation software "MIDAS". The available data assimilation algorithms and the observation types that can be assimilated are briefly described. In addition, a short overview is given of further functionality: ensemble processing (e.g. inflation schemes for the data assimilation), observation pre-processing, estimation of observation impacts, analysis error estimation, and diagnostics and statistics. The structure of the software in terms of the different programs it includes is described, and examples of the modular structure are given. Likewise, the parallelization strategy is briefly described. After these descriptions of the functionality and structure of the software, a large number of application examples are provided (the manuscript states 'all applications ... are briefly described'). For some applications a figure is shown, but for others there is only a short paragraph of text. The manuscript concludes with a summary and a short description of future development plans. There are no explicit conclusions, only a statement that the flexibility of MIDAS could make the software useful for data assimilation research at Canadian universities and that preliminary efforts have started to make the software accessible (lines 597-599).
The manuscript fits within the scope of GMD, even though the software described is not a 'model' but data assimilation software. From my own experience I know that it can be difficult to publish about software, and finding a particular focus can be especially difficult. Unfortunately, the authors apparently tried to discuss too many aspects and failed to provide a sufficient focus. While the title and abstract state that the manuscript is about a particular version of MIDAS, the manuscript does not focus on this version and also contains passages that describe parts of the development history. For a particular version of the software, however, it is irrelevant whether some feature was 'recently introduced'. The description of the functionality is very superficial. Many features, e.g. the variants of supported covariance matrices (lines 155-166) or the variants of the ensemble-based Kalman filter LETKF (lines 182-189), are only briefly listed. Only the variational DA algorithm is explained with equations. However, this looks unbalanced compared to the otherwise short textual descriptions; moreover, the equations seem to show a standard 3D-Var scheme with control vector transformation, which is standard material for data assimilation specialists, while for non-specialists the description is too short. The description of the modular software design also contains few details beyond the fact that there are modules with different purposes (some define entities, while others focus on operations) and that the software uses a structure similar to 'classes' in object-oriented programming (but it does not use classes, although these are supported by Fortran). Likewise, the description of the parallelization strategy contains very few details, since it too is mainly text. The motivation for and the performance of the parallelization choice are not discussed. The application examples are short sections on 'all' possible applications. These are also not aimed at presenting what users of MIDAS could achieve, but rather seem aimed at showing the different operational and research applications at Environment and Climate Change Canada. The individual sections are also too short to provide sufficient detail, so that one hardly learns about particular functionalities of MIDAS. Only sometimes is a reference to a more detailed publication provided.
Overall, the purpose of the manuscript is not clear. The abstract states that "The ... MIDAS software (version 3.9.1) is described...". However, while the authors have made this version of MIDAS available on github.com, the software does not seem to be intended for use by others. For example, the documentation is mainly generated from in-code comments. It fails to provide sufficient information on how to use the software (e.g. the structure of input files and configuration files). Simply making software available online and generating documentation from in-code comments is insufficient to enable others to use it.
Overall, the manuscript is not at a sufficient level for a scientific publication; it leaves the impression of a technical report. For a scientific publication it is far too superficial in the methodological and functionality sections, and also in the cursory descriptions of the applications. I therefore do not see a chance to revise the manuscript to the level of a scientific publication, since this would essentially imply a full rewrite. Accordingly, I can only recommend rejecting the manuscript.
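For readers outside data assimilation, the 'standard 3D-Var scheme with control vector transformation' referred to above can be illustrated with a toy problem. The following is a minimal sketch under stated assumptions; it is not MIDAS code and does not use the manuscript's notation. The function name `analysis_via_control_vector`, the three-variable state, and all numbers are hypothetical. The increment is written as dx = L chi with B = L L^T, so the cost function becomes J(chi) = 0.5 chi^T chi + 0.5 (H L chi - d)^T R^-1 (H L chi - d), with innovation d = y - H xb. Because this toy J is quadratic and H is linear, the sketch solves for the zero of the gradient directly, instead of using the iterative minimizer that a large system would need.

```python
import numpy as np


def analysis_via_control_vector(xb, L, H, R, y):
    """Toy 3D-Var analysis using the control variable chi, where dx = L @ chi.

    xb : background state, shape (n,)
    L  : square root of the background error covariance, B = L @ L.T
    H  : linear observation operator, shape (p, n)
    R  : observation error covariance, shape (p, p)
    y  : observations, shape (p,)
    """
    d = y - H @ xb                      # innovation
    G = H @ L                           # observation operator in control space
    R_inv = np.linalg.inv(R)
    # Gradient of J: chi + G^T R^-1 (G chi - d). Setting it to zero gives
    # (I + G^T R^-1 G) chi = G^T R^-1 d, solved here directly.
    hessian = np.eye(L.shape[1]) + G.T @ R_inv @ G
    chi = np.linalg.solve(hessian, G.T @ R_inv @ d)
    return xb + L @ chi                 # map the control vector back to state space


# Hypothetical three-variable example with two observed variables.
xb = np.array([1.0, 2.0, 3.0])
B = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
L = np.linalg.cholesky(B)               # one valid choice of square root
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
R = 0.25 * np.eye(2)
y = np.array([1.5, 2.0])

xa = analysis_via_control_vector(xb, L, H, R, y)
print(xa)
```

For a linear H, the result can be checked against the textbook best linear unbiased estimate xb + B H^T (H B H^T + R)^-1 d, which gives the same analysis.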
Apart from the recommendation above, a scientific publication about MIDAS could certainly be valuable if it is prepared with sufficient care. To this end, I would like to provide some recommendations for a possible publication about MIDAS:
- Please be clear about the purpose of such a publication. It should contain clear explanations, in sufficient depth and detail, of the particularities of the software. The aim should obviously be to make the software 'usable' by a wider group of readers; otherwise there is no point in publishing about it. (A number of articles published in GMD discuss data assimilation software aimed at a wider group of possible users, and these might indicate what can be successful in GMD.)
- Please also consider carefully the target audience of the publication. A wide audience would enhance the value of the publication.
- For a scientific publication I recommend avoiding 'storytelling', which occurs in various places in the manuscript. To give just one example, lines 86-95 state "The implementation of the 4D-EnVar algorithm in the existing 4D-Var software was not practical..." It is unlikely that this contains any useful information for the readers. Moreover, the information is irrelevant for version 3.9.1 of MIDAS.
- It would also be useful to be clear about the question 'Why this code version?'. The manuscript states that it is about 'v3.9.1'. As a sub-sub-version this seems arbitrary, and it does not show an ambition to make the software usable (and useful) for others. For example, one could introduce a new major release (with sufficiently major changes) with the intention of providing real open-source software, and take this as the motivation to publish about it (obviously one cannot publish about each new release, only the most relevant ones).
- Application examples should be carefully chosen with the aim that they are relevant for the readers to learn from them.
- To be useful for readers, the software has to be 'usable'. This implies sufficient documentation of how it can be used. Example cases, e.g. toy models, would also be required. Without this, there is little purpose in making the code available on GitHub. Achieving a sufficient level of usability is in fact a larger task. Here, one should also carefully consider the question "Can we support users?"; if the answer is not clearly 'yes', one should perhaps refrain from publishing the code as open source.
As a final recommendation, I would suggest being particularly careful when comparing the functionality of one's 'own' software to other existing software. Here the risk is high that one misses some functionality, so that the description of the other software is incorrect or incomplete. For example, the manuscript states, relating to e.g. the DART and PDAF data assimilation software, that 'other systems previously mentioned all need to be compiled and executed together with the forecast model and exchange information between DA and forecast model through subroutine' (lines 70-72). However, DART was initially designed to use only file-based transfer of data and a separate program for the assimilation. PDAF, in turn, has for many years supported both direct coupling into a model code and a separate assimilation program with file-based data exchange (see e.g. https://pdaf.awi.de/trac/wiki/GeneralImplementationConcept?version=14). Further, the statement "While DART and PDAF were developed exclusively for applying ensemble DA algorithms to many different applications" (lines 64-65) is incorrect for PDAF. While PDAF was not 'primarily developed for operational NWP' (line 66), it is applied in both research and operational applications. For example, PDAF is used in the European Copernicus program (CMEMS) for operational forecasting in the Baltic Sea. The German Maritime and Hydrographic Agency also uses PDAF operationally (e.g. Bruening et al., 2021), and the Chinese National Marine Environmental Forecasting Center applies it for sea ice forecasting (see e.g. Liang et al., 2020). In this respect it is also unclear why 'operational NWP and related Earth system component DA' (line 66) for JEDI and OOPS is 'more general' (line 66) than 'applying ensemble DA algorithms to many different applications' (line 65). Indeed, what could be 'more general' than software developed for essentially any data assimilation application, as is the case for DART and PDAF? In comparison, 'operational NWP and related Earth system component DA' appears to be more restricted. I can only recommend avoiding the impression that the authors intend to downgrade the value of systems like DART and PDAF, both of which are successfully used for real applications of 'Earth system component DA' and not just for 'applying ensemble DA algorithms to many different applications'. Apart from this, both DART and PDAF are obviously more 'mature' software compared to JEDI, whose development history is very recent. Finally, PDAF provides not only 'ensemble DA algorithms' (line 65) but also 3D-variational methods. Both DART and PDAF also provide tools for observation handling and diagnostics.
References:
Bruening, T., Li, X., Schwichtenberg, F., Lorkowski, I. (2021) An operational, assimilative model system for hydrodynamic and biogeochemical applications for German coastal waters. Hydrographische Nachrichten, 118, 6-15, doi:10.23784/HN118-01
Liang, X., Zhao, F., Li, C., Zhang, L., Li, B. (2020) Evaluation of ArcIOPS sea ice forecasting products during the ninth CHINARE-Arctic in summer 2018. Adv. Polar Science, 31, 14-25, doi:10.13679/j.advps.2019.0019
Citation: https://doi.org/10.5194/gmd-2024-55-RC1
- RC2: 'Comment on gmd-2024-55', Chris Snyder, 16 Jun 2024
Review of: The Modular and Integrated Data Assimilation System at Environment and Climate Change Canada (MIDAS v3.9.1), by Buehner et al (https://doi.org/10.5194/gmd-2024-55)
Reviewed by: C. Snyder, NCAR
Recommendation: Accept
This manuscript summarizes the design and implementation of a modular data assimilation (DA) system for Environment Canada, and gives example results. The writing is clear and concise and the topic is relevant for GMD. The manuscript shares refinements and approaches that will be of interest for other DA systems.
I offer comments for the authors’ consideration, but I don’t need to see the manuscript again.
- The MIDAS design embraces the simplicity that comes from divorcing the DA from model integrations and accepts the I/O overhead that comes with it. Leaving aside 4DVar, where there are separate reasons to include interfaces to the model integration, I have seen arguments from other efforts that the I/O overhead will be unacceptable, both for high-resolution ensemble DA and for fully coupled DA. It would be useful for the manuscript to include some perspective on this choice to rely on file I/O. Is it simply that you have built efficient parallel I/O?
- Variable changes will be needed between the model state, the analysis variables, and the variables required by observation operators. How does MIDAS handle these? Could different models utilize the same code for variable changes?
- MIDAS has interesting capabilities within its observation operators, including the possibility of simulating based on slant paths and footprints that involve many model columns. I’d be interested to know more about how MIDAS handles the data distribution and parallelization in those cases. I don’t see how the data distribution of Fig. 3 works with slant paths that cross multiple model layers, for instance.
- MIDAS can be applied to a diverse set of applications. Unlike JEDI, DART, and PDAF, however, it is not (I think) designed to work interchangeably with different models in the same application. (E.g., swapping another atmospheric model for that used in the GDPS.) I’d be interested in discussion of that more limited scope and the MIDAS design. Does the limited scope permit simplifications or important design choices that are not possible in those other systems? Are those simplifications substantial?
- For JEDI citations, I suggest Liu et al. 2022 GMD as well, since it precedes Huang et al. 2023. There is unfortunately no great reference for the underlying developments that support both of those application papers; maybe Tremolet 2020 https://doi.org/10.25923/RB19-0Q26?
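The slant-path concern raised in the third comment can be made concrete with a small geometric sketch. Everything here is an assumption for illustration: flat-earth geometry, a horizontal decomposition of the domain into square tiles with each tile owned by one task (which may or may not match Fig. 3 of the manuscript), and the hypothetical function name `tiles_along_slant_path`. It is not MIDAS code.

```python
import math


def tiles_along_slant_path(x0_km, y0_km, zenith_deg, azimuth_deg,
                           level_heights_km, tile_size_km):
    """Return the (i, j) index of the tile containing the path at each level.

    The path starts at the surface point (x0_km, y0_km) and rises with the
    given zenith angle; the azimuth is measured clockwise from north (+y).
    Flat-earth assumption: horizontal offset at height h is h * tan(zenith).
    """
    zenith = math.radians(zenith_deg)
    azimuth = math.radians(azimuth_deg)
    owners = []
    for height in level_heights_km:
        offset = height * math.tan(zenith)
        x = x0_km + offset * math.sin(azimuth)
        y = y0_km + offset * math.cos(azimuth)
        owners.append((int(x // tile_size_km), int(y // tile_size_km)))
    return owners


# Hypothetical case: a path starting 5 km west of a tile boundary and
# slanting eastward, compared with a vertical column at the same location.
levels = [0.0, 5.0, 10.0, 20.0]
slanted = tiles_along_slant_path(95.0, 50.0, 60.0, 90.0, levels, 100.0)
vertical = tiles_along_slant_path(95.0, 50.0, 0.0, 90.0, levels, 100.0)
print(len(set(slanted)), len(set(vertical)))
```

In this setup the vertical column stays within one tile, while the slanted path needs model values from more than one tile, so either the columns must be gathered onto one task or the operator must be evaluated piecewise across tasks.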
Citation: https://doi.org/10.5194/gmd-2024-55-RC2
- AC1: 'Comment on gmd-2024-55', Mark Buehner, 26 Jul 2024