Constraining the carbon cycle in JULES-ES-1.0
Douglas McNeall
Eddy Robertson
Andy Wiltshire
Abstract. Land surface models are an important tool in the study of climate change and its impacts, but their use can be hampered by uncertainties in input parameter settings and by errors in the models. We apply Uncertainty Quantification (UQ) techniques to constrain the input parameters of JULES-ES-1.0, the land surface component of the UK Earth system model UKESM1.0. We use an ensemble of historical simulations of the land surface model to rule out ensemble members and corresponding input parameter settings that do not match modern observations of the land surface. As JULES-ES-1.0 is computationally expensive, we use a cheap statistical proxy termed an emulator, trained on the ensemble of model runs, to rule out untested parts of parameter space. We use history matching, an iterated approach to constraining JULES-ES-1.0, running an initial ensemble and training the emulator, before choosing a second wave of ensemble members consistent with historical land surface observations. We rule out 88 % of the initial input parameter space as being statistically inconsistent with observed land surface behaviour. We use sensitivity analysis to identify the most (and least) important input parameters for controlling the global output of JULES-ES-1.0, and provide information on how parameters might be varied to improve the performance of the model and eliminate model biases.
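To make the workflow described in the abstract concrete, here is a minimal sketch of emulator-based history matching. It is not the authors' code: the parameter dimensions, ensemble size, observation values, and the toy simulator output are invented for illustration, and scikit-learn's Gaussian process regressor stands in for whatever emulator the study actually uses.

```python
# Sketch of emulator-based history matching (illustrative only; not the authors'
# implementation). Train a GP on an ensemble of (input, output) pairs from the
# simulator, then keep only candidate inputs whose implausibility against an
# observation stays below the conventional cutoff of 3.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Hypothetical design: 100 simulator runs over 5 normalised input parameters.
X_design = rng.uniform(0.0, 1.0, size=(100, 5))
y_design = X_design @ np.array([2.0, -1.0, 0.5, 0.0, 0.3])   # stand-in for a simulator output

emulator = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
emulator.fit(X_design, y_design)

# Observation of the real system, observational uncertainty, and a model discrepancy term.
obs, obs_var, disc_var = 0.8, 0.05 ** 2, 0.1 ** 2

# Dense sample of candidate inputs, screened with the cheap emulator.
X_cand = rng.uniform(0.0, 1.0, size=(50_000, 5))
mean, sd = emulator.predict(X_cand, return_std=True)
implausibility = np.abs(obs - mean) / np.sqrt(sd ** 2 + obs_var + disc_var)

nroy = X_cand[implausibility < 3.0]   # "not ruled out yet" parameter settings
print(f"ruled out {100.0 * (1.0 - len(nroy) / len(X_cand)):.1f}% of candidate space")
```

In an iterated (multi-wave) design, the next ensemble of simulator runs is then drawn from the retained (NROY) region and the emulator refitted.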
Status: closed
RC1: 'Comment on gmd-2022-280', Michel Crucifix, 08 Mar 2023
Review of "Constraining the carbon cycle in JULES-ES-1.0"
This is a convincing case study of calibration and sensitivity analysis of a land-surface model, using Gaussian process emulation, experiment design, history matching, and different techniques of sensitivity analysis.
Main comment
Editorial suggestions aside (naming conventions, graphics, clarifications, detailed below), my main comment of substance is about the validation of the emulator. The process is quite well explained, with a focus on leave-one-out analysis. But:
- Some variables are not so well emulated (shrubFrac_lnd_mean, fLuc_lnd_sum). The authors are aware and seem to have done their best, but the expected implications for the output of the sensitivity analysis are perhaps not clearly outlined.
- Little is said about the validation of the emulator variance. Visual inspection of Figures C1 and C2 helps a bit, but quantitative diagnostics could be helpful here.
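One such quantitative diagnostic, sketched below under the assumption that the emulators are Gaussian process regressions (the function and variable names are invented, not the authors'), is to standardise the leave-one-out errors by the predictive standard deviation and check that roughly 95 % of them fall within ±2.

```python
# Sketch of a leave-one-out check on emulator *variance* (not the authors' code).
# If the predictive variance is well calibrated, the standardised errors should be
# roughly standard normal, with ~95% of them inside +/-2.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

def loo_standardised_errors(X, y):
    """Refit the GP with each design point held out and return (y_i - mean_i) / sd_i."""
    errors = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5) + WhiteKernel(),
                                      normalize_y=True)
        gp.fit(X[keep], y[keep])
        mean, sd = gp.predict(X[i:i + 1], return_std=True)
        errors[i] = (y[i] - mean[0]) / sd[0]
    return errors

# Usage with a design (X: n_runs x n_params array, y: one scalar output per run):
# errs = loo_standardised_errors(X, y)
# print("fraction within +/-2 sd:", np.mean(np.abs(errs) < 2))   # expect ~0.95
```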
Page per page comment
page 2: I would suggest moving away from the very idea that there is a "best" configuration and uncertainty around it, because how you define the "best" configuration depends on a metric that is not unambiguously defined. Note also that what is meant by the "value" of a "configuration" is not clear. You want to delineate a space of configurations that is fit for purpose.
p. 3 line 76: edit (repeated "and then")
p. 7 l. 173: it could be helpful to give the parameter numbers of f0_io and b_wl_io, for an easier match between the text and the figures.
p. 7 l. 173: I disagree with the interpretation. Given the nature of the projection, what we can visualise are thresholds beyond which there is no chance of having zero carbon (we don't see green points). So the absence of an intersection further suggests that the 'no chance' zone for 'zero carbon' is enlarged.
Figure 3 and Figures C1 and C2: labels are quite clear in Figure 3, and hardly legible in Figs C1 and C2 (at least in the print copy; you need to look at the PDF). I now realise that the sets of simulated variables are different, so the whole affair is rather confusing. Why not stick to the same set of variables throughout the study?
Section 2.5:
- Technical question: how do you calibrate the emulator in the parameter space where experiments fail? The emulator will be unreliable there, so how do you safeguard interpretations (e.g. variance analysis coming from running the emulator)?
- What a "wave" is must be clearly defined at one point, and then the word must be used consistently. A wave is a plan of experiments with the full model (simulator). If you take that definition, then you can reformulate a few sentences: 'as we have seen from the first wave' --> 'as we have seen by inspecting the output of the first wave'.
- Can we have a more legible naming scheme (why the leading zero in wave00 and wave01)? Perhaps more explicit terms would help.
- At places you refer to "our modeller"; later "modeller Wiltshire" -> is it really necessary to name the co-author?
p. 13 l. 264: 'if history matching was perfect, 95% of ensemble members would have output'. I am not sure; I believe this is wrong. Suppose that the hyper-surface delineating I=3 is convex. In that case, 100% of the experiments inside this surface must have I<3. Maybe this region of I=3 is very complex and scattered, with many islands. But in any case, the link between I=3 being a 95% error measure and the rate of success of the emulator in delineating the I=3 region is not straightforward.
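For context on where the 95% figure comes from, the implausibility measure used in history matching is typically of the following form (the notation here is assumed, not copied from the manuscript):

```latex
I(x) \;=\; \frac{\lvert z - \mathrm{E}[f(x)] \rvert}
                {\sqrt{\operatorname{Var}[f(x)] \;+\; \sigma^{2}_{\mathrm{obs}} \;+\; \sigma^{2}_{\mathrm{disc}}}},
\qquad \text{retain } x \text{ if } I(x) < 3.
```

The cutoff of 3 is usually justified by the three-sigma rule (Vysochanskij–Petunin inequality): any unimodal distribution puts at most 4/81 ≈ 5% of its mass more than three standard deviations from the mean. That is where the "95%" attaches to the threshold; as the reviewer notes, it does not translate directly into a success rate for simulator runs chosen inside the retained region.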
l. 269: "is much higher" ... than what?
l. 276: "numerical problems": are these numerical problems caused by the specific choice of parameters, or could numerical problems occur quasi-stochastically for any parameter value? Whether it is one or the other has implications for whether we could be allowed to simply ignore failures in the emulator design, or whether we somehow need to keep track of the fact that one parameter region is dangerous / irrelevant.
Figure 5: what is the dark-red zone?
l. 306: my remark about how the emulator behaves in the 'unsafe' zone where the model is failing comes up again. What are the possible implications for the sensitivity analysis?
Figures 7, 8 and 10: what is the right-hand-side part of each figure? It does not seem to bring much information.
p. 27, l. 410: It's -> It is.
Perhaps one more point worth mentioning in the discussion is that the whole process is influenced by the way constraints are selected and aggregated. This is a place where non-epistemic value judgements may be injected (again, the myth of 'a' best parameter configuration).
Final thoughts
- Posterior variance is an important aspect of the GP approach, but it is used little in this study, except (crucially) in Equation 3. Again, this variance does not seem to be thoroughly assessed. Does it matter?
- Graphics: make sure labels are legible and, where possible, use more self-explicit labels.
- Graphics (again): the pdf size is big, and displays slowly; consider rasterizing some figures.
Citation: https://doi.org/10.5194/gmd-2022-280-RC1
RC2: 'Comment on gmd-2022-280', Anonymous Referee #2, 15 Mar 2023
The manuscript “Constraining the carbon cycle in JULES-ES-1.0” by McNeall et al. examines the use of Gaussian process emulation for exploring the parameter space relevant to the global carbon cycle in the land surface model JULES. It first uses a large perturbed parameter ensemble of JULES runs to train emulators for each of the desired output variables, and then exploits these to carry out history matching and sensitivity analyses. Overall, this is a thoughtful and well written paper, which may well pave the way for better understanding of how we parameterise our land surface models.
However, the paper does not deliver on its initial promise. No concrete results on the extent to which the carbon cycle has been constrained are reported. The closest the authors come to this are Figures 3, 4 and 5, which summarise ensembles of various model diagnostics under different levels of constraint, but I did not find these figures particularly informative, and if they have enabled the authors to make useful decisions about the parameters in JULES, that does not come across.
It is very telling that the discussion and conclusion of the paper focus on the technical aspects of what was done with the ensemble generation and emulation and how this could be improved. Discussion of what was learned about the JULES carbon cycle and the relevant parameters is largely absent from these sections. Do the authors recommend changing any of the JULES-ES parameters for future configurations based on this work? Great if so, but this is not mentioned. If a paper claiming to constrain the carbon cycle in a model does not put forward any proposed changes to the model parameters and/or processes, then can it really claim to have constrained the model? Overall, I was left wondering if the failed emulation had essentially limited the authors’ ability to comment on the carbon cycle.
Main corrections:
1) Despite the above comments, I believe there is much value in this paper, but the way it is framed is misleading. The simplest thing for the authors to do is to change the title to better reflect the content of the paper (something like “Emulation of the carbon cycle in JULES-ES to explore input parameter distributions” could be appropriate, but I do not intend this to be prescriptive). However, if the authors feel that I am wrong, then I think there needs to be a much more in-depth discussion of what has been found about the JULES carbon cycle. Ideally this should include some quantification of key global stocks and fluxes and/or discussion of what parameters or processes need to be changed in the model, and there should be ample discussion of this. As an aside, I note that the abstract of the current manuscript does not mention the carbon cycle or relevant variables.
2) It is not clear to me that the results are trustworthy. To what extent does the poor performance of the emulators influence the results? In Figure 5, apparently only 32% of ensemble members conform to the constraints, compared to the theoretical ideal of 95%. I accept that the 95% level is unobtainable given that emulators will never be perfect, but 32% seems, to me, to throw doubt over the validity of most of the results in the paper that follow. The cVeg emulator appears not to work, yet the analysis in the paper includes it in the formulation of the constraint despite that.
It is to the authors’ credit that they have been explicit about what has gone wrong, but quantification of the implications for the onward analysis is absent. There is a statement at line 300 that says “we judge that the emulator is accurate enough” but, despite having looked at Appendix C in detail, I am unable to arrive at the same conclusion. Perhaps I am just missing something, and in fact the results are robust despite the poor emulator skill, but if so, I think that needs to come across in a quantitative fashion in the manuscript, ideally in the main text. Maybe an easier option would be to drop the cVeg constraint, be explicit that this has been done, and then re-do the analysis in the paper with just the other three constraints.
In summary, I think one of the following is required: (a) quantification of the errors in the results due to the poor emulation or (b) removal of the cVeg emulator and re-calculation of the analysis in the paper.
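Option (b) amounts to a small computation on top of the existing emulators. A rough sketch follows; the output names and the imp dictionary are assumptions for illustration (following the implausibility calculation sketched earlier), not taken from the paper: combine per-output implausibilities by their maximum and compare the retained fraction of parameter space with and without cVeg.

```python
# Sketch of comparing history-matching constraints with and without cVeg
# (not the authors' code; output names and the `imp` dict are illustrative).
import numpy as np

def nroy_fraction(imp, outputs, cutoff=3.0):
    """Fraction of candidate inputs kept when implausibilities are combined by max."""
    combined = np.max(np.stack([imp[name] for name in outputs]), axis=0)
    return float(np.mean(combined < cutoff))

# imp[name] would be an array of implausibilities over the same candidate inputs,
# one array per constrained output, e.g. computed as in the earlier sketch.
# frac_all     = nroy_fraction(imp, ["nbp", "npp", "cSoil", "cVeg"])
# frac_no_cveg = nroy_fraction(imp, ["nbp", "npp", "cSoil"])
```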
Typos and minor corrections:
L20: “linit”
L63: “train a an emulator”
L98: “CO2” should have a subscripted “2”
L122: unclosed “(“
Figs 3 and 4: two subplots are labelled “c3PftFrac”; presumably one is for C4?
L280: “also see a constraints of”
L383: “modellers” needs a possessive apostrophe in this instance
L399: “sensitivity analysis analyses share”
L406: is “no-regrets” a specific term, or would “an exploratory analysis” suffice? I am not sure what is intended by “no-regrets”
L452: “bwlio” and “f0io” presumably need underscores, which have instead been translated to subscripts.
L460: “There (Level 2) using the emulator”
L499: “kriging” should be capital “K”.
L505: “give the modeller as far as possible about the” – missing word?
L540-542 and Table C1: The numbers in the text and table do not agree. Possibly I have misunderstood and they are not meant to, but it seemed they should.
Citation: https://doi.org/10.5194/gmd-2022-280-RC2
AC1: 'Comment on gmd-2022-280', Douglas McNeall, 20 Sep 2023
Our thanks to the editors and to the two reviewers of our manuscript for the constructive feedback. We have used the feedback to create a better manuscript, which we hope will address any concerns of the reviewers. Our responses to both reviewers are contained in the supplementary document.
Doug McNeall, on behalf of the authors.