the Creative Commons Attribution 4.0 License.
Data-Informed Inversion Model (DIIM): a framework to retrieve marine optical constituents in the BOUSSOLE site using a three-stream irradiance model
Abstract. Within the New Copernicus Capability for Trophic Ocean Networks (NECCTON) project, we aim to improve the current data assimilation system by developing a method for accurately estimating marine optical constituents from satellite-derived Remote Sensing Reflectance. We developed and compared two frameworks by implicitly inverting a semi-analytical expression derived from the classical Radiative Transfer Equation. First, we used a Bayesian estimation, which provided retrievals of the optical constituents along with their uncertainties. Moreover, using historical in-situ measurements together with a Markov Chain Monte Carlo (MCMC) algorithm to adjust the model parameters, we were able to reduce the root mean square error (RMSE) between the retrieved data and in-situ observations. Second, we employed the Stochastic Gradient Variational Bayes (SGVB) framework to efficiently approximate the Maximum a Posteriori (MAP) estimates of the optical constituents while simultaneously finding the Maximum Likelihood Estimate (MLE) of the model parameters. This approach resulted in faster computations of the optical constituents compared to Bayesian estimations, with equivalent RMSE values between the retrieved data and in-situ observations. We showed that both the MCMC-based and SGVB-based algorithms were able to find sets of optimal parameters, which, due to correlations between them, are not unique. We conclude that both methods are consistent with the Radiative Transfer Equation. The first method provides reliable uncertainty estimations, while the second offers a faster alternative to standard inversion techniques, making it suitable for inversion and model optimization problems where MCMC algorithms are intractable.
Status: open (extended)
CEC1: 'Comment on gmd-2024-174 - No compliance with the policy of the journal', Juan Antonio Añel, 26 Dec 2024
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
In your Code and Data Availability section you provide a handful of links to sites containing information related to your manuscript. However, such sites are not acceptable for scientific publication. They include a GitHub site and Google Drive folders. You have included the DOI for a Zenodo repository containing part of the information, after an initial request from the Topical Editor; however, the Zenodo repository that you provide does not contain the necessary information to replicate the work that you present in your manuscript. For example, when trying to run the Python notebook on model availability, it tries to connect to the external GitHub site (and, under certain circumstances or configurations, it fails to do so).
I have to be clear here. All the models, code and data necessary to replicate your work must be contained in the Zenodo repository and must be standalone, that is, work without the need for additional download of software or data from third-party sites (excluding common libraries such as pandas, numpy, etc., which should be listed with their versions). Also, the Code and Data Availability section in your manuscript contains several references to sites that do not comply with our policy. This section is not there to advertise the "last version" of a software or the work of the authors, but to provide the specific information that assures compliance with the principle of scientific reproducibility. Including such unhelpful information only introduces unnecessary noise and complexity in the assessment of the compliance of your submitted manuscript. Therefore, please remove from this section all the information about sites that do not comply with the policy for permanent archival of code and data.
Therefore, the current situation with your manuscript is irregular. Please publish your code in one of the appropriate repositories and reply to this comment with the relevant information (a link and a permanent identifier for it, e.g. a DOI) as soon as possible, as we cannot accept manuscripts in Discussions that do not comply with our policy. Also, please include the relevant primary input/output data. Also, you must include a modified 'Code and Data Availability' section in a potentially reviewed manuscript, containing the DOI of the new repositories.
I note that if you do not fix this problem, we will have to reject your manuscript for publication in our journal.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2024-174-CEC1
AC1: 'Reply on CEC1', Carlos Enmanuel Soto Lopez, 08 Jan 2025
Dear Prof. Juan A. Añel
Geosci. Model Dev. Executive Editor
After carefully reviewing the code and data availability, we have uploaded to Zenodo a standalone version of the code used to reproduce the results from the article, as well as the data needed to reproduce them. In this new upload, we make no mention of other sites, and the code depends only on the following standard Python 3 libraries:
- networkx>=2.8.8
- torch
- torchvision
- torchaudio
- ConfigSpace>=1.1.1
- constrained-linear-regression>=0.0.4
- matplotlib>=3.7
- matplotlib-inline>=0.1.7
- mdurl>=0.1.2
- multiprocess>=0.70.16
- numpy>=1.24
- pandas>=2
- ray>=2.1
- scipy>=1.1
- seaborn>=0.13.2
- sympy>=1.12
- pvlib>=0.11.0
- tqdm>=4.66.5
- pyarrow
- tensorboardX
We have also prepared a README with instructions on how to install the required external packages with a Makefile. We also included a Python script (reproduce.py). By launching reproduce.py, the user is guided to reproduce the calculations and figures in the manuscript, with the option to run all calculations (-a), only the Bayesian computations (-b), only the sensitivity analysis and MCMC analysis (-m), or only the calculations related to training the neural network (-n).
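As a rough illustration of the command-line options described above, the following is a minimal sketch of how such flags could be parsed with `argparse`. It is not taken from the archived `reproduce.py`: the stage labels (`bayes`, `mcmc`, `nn`), the function names, and the choice to make the four flags mutually exclusive are all assumptions made for this example, and the archived script may instead guide the user interactively.

```python
import argparse
from typing import List

# Hypothetical stage labels for this sketch; the archived script may differ.
ALL_STAGES = ["bayes", "mcmc", "nn"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the four flags named in the reply (-a, -b, -m, -n)."""
    parser = argparse.ArgumentParser(
        prog="reproduce.py",
        description="Sketch of a driver that selects which computations to rerun.",
    )
    # Assumption: exactly one of the four options is given per run.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-a", dest="run_all", action="store_true",
                       help="run all calculations")
    group.add_argument("-b", dest="bayes", action="store_true",
                       help="run only the Bayesian computations")
    group.add_argument("-m", dest="mcmc", action="store_true",
                       help="run only the sensitivity and MCMC analysis")
    group.add_argument("-n", dest="nn", action="store_true",
                       help="run only the neural network training")
    return parser


def selected_stages(args: argparse.Namespace) -> List[str]:
    """Translate parsed flags into the list of stage labels to execute."""
    if args.run_all:
        return list(ALL_STAGES)
    return [stage for stage in ALL_STAGES if getattr(args, stage)]


# Example: requesting only the MCMC-related calculations.
print(selected_stages(build_parser().parse_args(["-m"])))  # prints ['mcmc']
```

Under these assumptions, passing two flags at once (for example `-a -b`) makes `argparse` exit with a usage error, which keeps each run tied to a single, clearly identified subset of the calculations.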
I'll send a version of the manuscript with a different 'Code and Data Availability' section, as well as the DOI identifier updated in the Citations section via email.
DOI of the new code and data archive: 10.5281/zenodo.14609747.
Link to the new code and data archive: https://doi.org/10.5281/zenodo.14609747
Best regards,
Carlos Enmanuel Soto Lopez
Citation: https://doi.org/10.5194/gmd-2024-174-AC1
RC1: 'Comment on gmd-2024-174', Anonymous Referee #1, 15 Apr 2025
This paper presents a new inversion setup for recovering marine optical constituents (chl_a, non-algal particles and CDOM) and optimal parameters for a three-stream marine radiative transfer model using satellite reflectance measurements and in situ data from the BOUSSOLE buoy located in the oligotrophic sector of the Ligurian Sea (western Mediterranean). Two inversion frameworks, one based on a Bayesian estimation approach including an MCMC algorithm and the other based on a neural network inversion method, are developed and tested using time series of observations collected during 2005-2012. The results of the demonstration illustrate how the methods work. The advantages and limitations of both inversion methods are briefly assessed and discussed.
The inversion of sea surface remote sensing reflectance data is a difficult and important subject that has been investigated for years, and the work presented here is one more step towards the direct assimilation of satellite reflectances into marine biogeochemistry models.
The article provides details about the two proposed algorithms, which are original and potentially useful to other users, but the many ad hoc algorithmic adjustments needed to ensure convergence raise questions about the robustness of the methods. They also raise questions about the reproducibility of the inversions under different conditions (different sites, data sets, etc.). Sections 4 and 5 in particular are difficult to understand and require significant clarification before they can be published. The benefits of the method should be better discussed and argued in relation to the methods currently used, and in particular with respect to the Copernicus marine products available today. The consistency of the parameter estimation results is also questioned (see section 5 comments).
A major revision is necessary before considering publication in GMD. Below is a list of major recommendations for the various sections of the paper, as well as more specific but important comments and questions to be addressed to improve understanding of the paper. In addition to this revision, a careful check of the language will be needed to avoid the imprecise wording found in the original manuscript.
Major comments
- Section 1. The authors refer to inversions of semi-analytical expressions of the RTE (l33-l34). As the term is mentioned in the abstract but not used afterwards, the reader could have doubts as to whether this is actually the case in the paper. If the complete RTE model (i.e. discretized in the vertical) is not used afterwards, this should be explained in the introduction. Similarly, it should also be stated from the beginning if the aim of the paper is to estimate IOPs at the sea surface only (assuming vertical homogeneity of marine optical components), and not throughout the water column.
- Section 2. I suggest moving this section after section 3 and explaining more clearly how the different data are related to the RTE model variables, surface boundary conditions, etc. In addition, I would suggest introducing a schematic diagram of the water column including the different fluxes, irradiances and variables of interest, data collected, etc. to improve readability.
- Section 3. It appears that the marine optical constituents are assumed to be constant in the vertical. It should be clarified whether vertical homogeneity is assumed, as this is far from a trivial assumption. It could significantly limit the applicability of the framework to locations where such an assumption does not hold, e.g. where a shallow subsurface chlorophyll maximum occurs. A discussion should be added explaining how the scheme could be extended for inhomogeneous water columns. As with the vertical homogeneity of optical constituents, it seems that temporal persistence is assumed over the daily cycle. Please confirm and provide some justification.
- Section 4, and section 4.4.2 in particular, is very difficult to understand and needs to be clarified. More generally, the sequence of algorithmic operations in the Bayesian inversion framework and the associated partitioning of the training data set need to be better explained, so that the reader understands the order in which the latent variables and the model parameters to be optimized are calculated. This clarification is necessary to fully understand the applicability of the method and the results in section 5. The algorithmic complexity of the sequence also needs to be quantified and explained more transparently. The list of optimized parameters is confusing in places. Please clarify the actual list of optimized parameters, referring to the lists displayed in Tables 1 and 2.
- As presented, the overall convergence of the method does not seem very robust and is subject to numerous adjustments depending on the dataset and DYFAMED site considered. Section 4.4.2: The sentence “we choose to perturb our parameters in such a way that we end up with a fourteen parameter space” is not understandable. What does the hyper-parameter alpha q (line 283) represent? The parameter perturbation method needs to be better explained (also related to Table 5). The last sentence of section 4.4 “... and sampled only uncorrelated values” is rather cryptic.
- There are no objective elements in the paper to guide the choice between the MCMC vs. SGVB methods. The GMD framework calls for guidelines to direct the user towards the most suitable method (Bayesian approach or neural network) instead of experimenting (as seems to be proposed) until a satisfactory converged solution is found.
- Section 5. The presentation of the results (section 5) is confusing. It is not clear what the purpose of the sensitivity study introduced at the beginning of the section is, since its main conclusion (the use of a single set of parameters over one year is sub-optimal) is not used thereafter anyway. This part should be removed, or better justified. What does the new bphy,Int parameter introduced here (Figure 3 and Table 5) mean? A new structuring of this section into three parts (results of the MCMC method, results of the SGVB method, comparison of the two methods) should be considered, including a more in-depth evaluation and interpretation of the results. For example, the reason why the IOP values found with SGVB differ very slightly from the original values compared with MCMC requires substantiated explanations.
- Figure D1 does not include a comparison with the chl_a concentration estimated using conventional ocean colour (OC) algorithms available in the CMEMS catalogue. It is strongly recommended to add these figures in the plot and provide comments about their consistencies, as one of the potentially key advantages of the proposed method is to be used in operational setups.
- Section 6. The question of the reproducibility of the inversions at other sites needs to be addressed explicitly and more convincingly. The discussion refers to some generic aspects (e.g. use of neural networks in earth sciences) but does not address how useful the proposed tools will be for other sites.
- One key missing item is a discussion of the value of the proposed approach with respect to existing ones (e.g. in the Copernicus framework). How accurate or comparable are the results of the inversion (e.g. D1a, others?) compared to standard Copernicus products? Please provide some quantitative comparisons to assess this aspect.
Specific comments and questions
- Abstract, « we conclude that both methods are consistent with the Radiative Transfer Equation » : what is really meant by this sentence ?
- MCMC = Markov Chain Monte Carlo (abstract) or Markov State Monte Carlo (section 4): please unify the nomenclature or explain the nuances
- Line 43-l44 and elsewhere : replace « density » of optical constituents by « concentration » ?
- Line 55 : It allows for …
- Line 85 « After filtering … » : please revise and clarify this sentence
- Line 90 : below instead of above ?
- Line 93-94 : what is the « heigh vertical variability » ? Please revise the sentence and better explain the rationale behind the choice of measurements at a depth of 9m
- Equation (1) : d/dh missing in second and third equations
- Section 3.1 : an equation is missing to describe how PAR is related to the direct, scattered and upward irradiances.
- Section 4.2. It is not clear if the process to adjust the alpha hyperparameter defining the model error is dependent on the accuracy of the surface boundary conditions (OASIM). Please provide comments and clarification.
- Line 180 : x(lambda) is a 5-component vector (including Eu) while it is a 4 component vector in eq(11). Please explain.
- Line 217: I don’t understand why in situ observations are available for 3 wavelengths only. This is not consistent with what is said in section 2.3. Please update section 2.3 accordingly.
- Line 238: his -> the
- Line 256: please explain the “standard error propagation scheme” used to compute the uncertainties.
- Figure 1: the Estimate of the optimal parameters (“Lambda tilde”) is not shown in the diagram.
- Line 346: inpute -> input.
- Table 4: why are only 9 parameters shown here?
- End of Appendix A: “For completeness, … has to be exchanged …”
- Line 518: “… these two errors”.
- Figure D1. Why the 2012 time series? Do you mean the 2005-2013 time series?
Citation: https://doi.org/10.5194/gmd-2024-174-RC1