This work is distributed under the Creative Commons Attribution 4.0 License.
Using Probability Density Functions to Evaluate Models (PDFEM, v1.0) to compare a biogeochemical model with satellite-derived chlorophyll
Christopher L. Follett
Jacob Bien
Stephanie Dutkiewicz
Sangwon Hyun
Gemma Kulk
Gael L. Forget
Christian Müller
Marie-Fanny Racault
Christopher N. Hill
Thomas Jackson
Shubha Sathyendranath
Download
- Final revised paper (published on 18 Aug 2023)
- Preprint (discussion started on 08 Nov 2022)
Interactive discussion
Status: closed
RC1: 'Comment on egusphere-2022-849', Lester Kwiatkowski, 05 Dec 2022
The authors here present an interesting combination of statistical approaches to compare the probability density functions of observed and simulated chlorophyll across biomes and Longhurst provinces. I congratulate the authors on the work that has gone into this manuscript. It is extremely well written and the rationale behind their approach is cogently explained. I thoroughly hope that it encourages the ocean biogeochemical community to move beyond the use of correlation and RMSD as standard measures of model performance. I am happy to recommend the manuscript be accepted as it is but have a few very minor recommendations given below.
Perhaps the authors could discuss how the approach might be extended to provide greater insight into model-observation mismatch? Where coincident physical observations are available, one could presumably subset model and observation data to assess, for example, the ability of models to capture the impact of regional heat waves on chlorophyll distributions.
As most biogeochemical models do not produce a ChlRrs output, perhaps the authors could offer some thoughts/recommendations on the use of the approach when only Chlmod output is available. The approach still seems to offer plenty of insight and this should perhaps be emphasized a bit more.
L37 Should RMSD be standard deviation? While the RMSD is sometimes also given in Taylor plots this doesn’t seem to be given in Figure 2.
L80 Suggest giving the model name here.
Fig1. Suggest increasing the label font size for the regions identified.
Fig 2. A point color legend for these Taylor plots would be helpful.
Fig4a. The colorbar should have a label.
Fig4b Suggest adding a point color legend.
Citation: https://doi.org/10.5194/egusphere-2022-849-RC1
RC2: 'Comment on egusphere-2022-849', Marcello Vichi, 05 Feb 2023
I had the opportunity to review earlier versions of this work when it was submitted to another journal. I disagreed with the editorial decision because of the methodological merit of this work. I am very pleased to see that the authors decided to rearrange the manuscript in the form of a model evaluation tool and present it in GMD. The last community effort on the validation of ocean biogeochemical models was published in a special issue of the Journal of Marine Systems in 2009 (some papers from that issue are referenced in the manuscript, like Jolliff et al. and Doney et al.). The authors have perfected the explanation, and the manuscript is very easy to read and to the point.
The concept of using PDFs to compare simulated properties is not new in the Earth sciences and in climate sciences, but this is usually addressed through visual comparisons, or using changes in the descriptors (median, mean, skewness, etc.). A recent effort from the SCOR WG on BGC model intercomparisons for simulations of the ocean iron cycle (FeMIP) advocated the use of PDFs to carry out this comparison (Rogerson and Vichi, 2021), but did not offer any objective way to measure the difference between the observed and modelled distributions. This is exactly what the proposed method is about, and its application goes beyond the focus on chlorophyll, as clearly demonstrated by the authors with the use of a variety of chl-related datasets (including model-derived ones, such as the irradiance-based chl).
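For readers unfamiliar with the metric at the center of the method, here is a minimal stdlib-only sketch (not the authors' PDFEM code): for two equally sized one-dimensional samples, the Earth Mover's (Wasserstein-1) distance reduces to the mean absolute difference between the sorted values.

```python
# Minimal sketch (not the authors' PDFEM code): for two equally sized 1-D
# samples, the Earth Mover's (Wasserstein-1) distance equals the mean
# absolute difference between the sorted values of the two samples.
def emd_1d(xs, ys):
    if len(xs) != len(ys):
        raise ValueError("this shortcut assumes equally sized samples")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Toy example: a "model" distribution that is a pure 0.5 shift of the
# "observations" has an EMD equal to the shift.
obs = [1.0, 2.0, 4.0, 5.0]
mod = [x + 0.5 for x in obs]
print(emd_1d(obs, mod))  # 0.5
```

For unequal sample sizes or weighted histograms, an off-the-shelf routine such as `scipy.stats.wasserstein_distance` computes the same quantity from the empirical CDFs.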
I would recommend swift publication with the very few technical corrections or suggestions listed below:
L315: I have been impressed by how good the model is in the western boundary currents, and less so in the eastern boundary currents. The good performance in the WBC is highlighted in the discussion, but there is no discussion of the EBUS. I would expect the model to perform equally well, given that in both cases primary production should occur at the surface. It may also be due to the broadness of the EBUS, which are difficult to separate from the adjacent sub-tropical gyres.
L331-332: I think the authors mean the section on data availability. However, I could not find the figures in the Zenodo preview. I would suggest uploading all the supplementary figures to the repository, since some users may be interested in looking at the various provinces without running the whole analysis.
L333-340 and Fig. 9: These are all examples from the Northern Hemisphere. It would be interesting to see the seasonality in the Southern Ocean, especially because it is further addressed in the discussion. This is less necessary if all the supplementary figures are made available in the repository. Figure 9 can be graphically improved by adding all the labels of the months on the y-axis, and also the tick labels in panels a and b. There is enough space to add the numbers there.
Sec. 3.6 I find this section title somewhat confusing. The EMD has already been used in previous results (Figs. 4b, 6, 7, and 8). Fig. 10 is a very useful spatial representation for determining the provinces with the highest discrepancy, so maybe the title should reflect the content of the section better. These graphs are, however, a little misleading, because the provinces that have a higher coastal coverage tend to have the higher EMD values. This is biased by the different distribution of land between the northern and southern hemispheres. The Southern Ocean EMD is also quite high in July (as reported in the discussion), but this does not come out clearly.
Another point is that Fig. 11 and Fig. 10 are not showing the same months. From the text, one would expect to find in Fig. 11 the same months shown in Fig. 10. Maybe some more explanation of the choice of the provinces in Fig. 11 would help the reader.

L375-386 I agree entirely with the point made on the Polar Biomes, and this analysis is very clear. However, the consequence seems to be that models tend to have earlier blooms, rather than delays (Hague and Vichi, 2018). This is also related to the point made at lines 435-439. The suggested paper does support the same mechanism suggested by the authors.
L387-389 The authors have demonstrated the power of this method with the Darwin-CBIOMES model, but it can be used with any other biogeochemical model (restricting the analysis to chl_mod). I would suggest making this point somewhere in the discussion.
References
Rogerson, J., Vichi, M. (2021): FeMIPeval. University of Cape Town. Software. https://doi.org/10.25375/uct.14528547.v1
Hague, M., Vichi, M., 2018. A Link Between CMIP5 Phytoplankton Phenology and Sea Ice in the Atlantic Southern Ocean. Geophysical Research Letters 45, 6566–6575. https://doi.org/10.1029/2018GL078061
Citation: https://doi.org/10.5194/egusphere-2022-849-RC2
AC1: 'Comment on egusphere-2022-849', Bror Jonsson, 22 Mar 2023
The authors would like to thank the reviewers for their insightful comments, which significantly helped improve the manuscript. Both reviewers feel that the manuscript makes a relevant contribution and should be published with minor modifications. Should the editor decide to accept our work, we have laid out the modifications that we would make before publication. They are all very straightforward. For each comment, we have first highlighted the issue, provided an answer, and described how the manuscript will be adjusted before publication. We are hopeful that the methods provided in our work will be of interest to the journal's audience and be helpful for many groups working with biogeochemical models.
Reviewer 1
The authors here present an interesting combination of statistical approaches to compare the probability density functions of observed and simulated chlorophyll across biomes and Longhurst provinces. I congratulate the authors on the work that has gone into this manuscript. It is extremely well written and the rationale behind their approach is cogently explained. I thoroughly hope that it encourages the ocean biogeochemical community to move beyond the use of correlation and RMSD as standard measures of model performance. I am happy to recommend the manuscript be accepted as it is but have a few very minor recommendations given below.
Perhaps the authors could discuss how the approach might be extended to provide greater insight into model-observation mismatch? Where coincident physical observations are available, one could presumably subset model and observation data to assess, for example, the ability of models to capture the impact of regional heat waves on chlorophyll distributions.
This is a good question and we are planning to at least add some more text about it in the discussion.
As most biogeochemical models do not produce a ChlRrs output, perhaps the authors could offer some thoughts/recommendations on the use of the approach when only Chlmod output is available. The approach still seems to offer plenty of insight and this should perhaps be emphasized a bit more.
We will add some more recommendations and discussions about how to use $Chl_{mod}$. This is, however, a big topic that probably should be addressed systematically by the community.
L37 Should RMSD be standard deviation? While the RMSD is sometimes also given in Taylor plots this doesn’t seem to be given in Figure 2.
This is a typo that we will fix. Thank you for finding it.
L80 Suggest giving the model name here.
We will add the model name as suggested.
Fig1. Suggest increasing the label font size for the regions identified.
We will increase the label font size.
Fig 2. A point color legend for these Taylor plots would be helpful.
We will add a legend.
Fig4a. The colorbar should have a label.
We will add a label to the colorbar.
Fig4b Suggest adding a point color legend.
We will add a color legend.
Reviewer 2
I had the opportunity to review earlier versions of this work when it was submitted to another journal. I disagreed with the editorial decision because of the methodological merit of this work. I am very pleased to see that the authors decided to rearrange the manuscript in the form of a model evaluation tool and present it in GMD. The last community effort on the validation of ocean biogeochemical models was published in a special issue of the Journal of Marine Systems in 2009 (some papers from that issue are referenced in the manuscript, like Jolliff et al. and Doney et al.). The authors have perfected the explanation, and the manuscript is very easy to read and to the point.
The concept of using PDFs to compare simulated properties is not new in the Earth sciences and in climate sciences, but this is usually addressed through visual comparisons, or using changes in the descriptors (median, mean, skewness, etc.). A recent effort from the SCOR WG on BGC model intercomparisons for simulations of the ocean iron cycle (FeMIP) advocated the use of PDFs to carry out this comparison (Rogerson and Vichi, 2021), but did not offer any objective way to measure the difference between the observed and modelled distributions. This is exactly what the proposed method is about, and its application goes beyond the focus on chlorophyll, as clearly demonstrated by the authors with the use of a variety of chl-related datasets (including model-derived ones, such as the irradiance-based chl).
L315: I have been impressed by how good the model is in the western boundary currents, and less so in the eastern boundary currents. The good performance in the WBC is highlighted in the discussion, but there is no discussion of the EBUS. I would expect the model to perform equally well, given that in both cases primary production should occur at the surface. It may also be due to the broadness of the EBUS, which are difficult to separate from the adjacent sub-tropical gyres.
We have not discussed the EBUS regions in detail as this is the topic of our next study, partly within the confines of the ESA-funded project PRIMUS (http://primus-atlantic.org). We will add some more discussion about the EBUS in this paper as well.
L331-332: I think the authors mean the section on data availability. However, I could not find the figures in the Zenodo preview. I would suggest uploading all the supplementary figures to the repository, since some users may be interested in looking at the various provinces without running the whole analysis.
Great suggestion! We will add all figures to the GitHub/Zenodo repository.
L333-340 and Fig. 9: These are all examples from the Northern Hemisphere. It would be interesting to see the seasonality in the Southern Ocean, especially because it is further addressed in the discussion. This is less necessary if all the supplementary figures are made available in the repository. Figure 9 can be graphically improved by adding all the labels of the months on the y-axis, and also the tick labels in panels a and b. There is enough space to add the numbers there.
We will add all figures to the repository and include more labels.
Sec. 3.6 I find this section title somewhat confusing. The EMD has already been used in previous results (Figs. 4b, 6, 7, and 8). Fig. 10 is a very useful spatial representation for determining the provinces with the highest discrepancy, so maybe the title should reflect the content of the section better. These graphs are, however, a little misleading, because the provinces that have a higher coastal coverage tend to have the higher EMD values. This is biased by the different distribution of land between the northern and southern hemispheres. The Southern Ocean EMD is also quite high in July (as reported in the discussion), but this does not come out clearly.
Good point, we will change the section header to "Global and seasonal patterns of Earth Mover's Distances" and add more figures to the appendix.
Another point is that Fig. 11 and Fig. 10 are not showing the same months. From the text, one would expect to find in Fig. 11 the same months shown in Fig 10. Maybe some more explanation of the choice of the provinces in Fig. 11 would help the reader.
The months presented in figure 10 are chosen to represent the boreal and austral winter/summer (January and July). The panels in figure 11 show the Longhurst regions and months with the highest EMDs to present the variation in distributions. Most months in figure 11 are close to each other (Feb, Mar, Apr) and not necessarily the months that are globally most interesting. We will clarify the text.
L375-386 I agree entirely with the point made on the Polar Biomes, and this analysis is very clear. However, the consequence seems to be that models tend to have earlier blooms, rather than delays (Hague and Vichi, 2018). This is also related to the point made at lines 435-439. The suggested paper does support the same mechanism suggested by the authors.
We will rewrite the text to address this concern.
L387-389 The authors have demonstrated the power of this method with the Darwin-CBIOMES model, but it can be used with any other biogeochemical model (restricting the analysis to $chl_{mod}$). I would suggest making this point somewhere in the discussion.
We will add this to the discussion.
Citation: https://doi.org/10.5194/egusphere-2022-849-AC1