the Creative Commons Attribution 4.0 License.
Calibrating and validating the Integrated Valuation of Ecosystem Services and Tradeoffs (InVEST) urban cooling model: case studies in France and the United States
Martí Bosch
Léa Tardieu
Aude Lemonsu
Cécile de Munck
Chris Nootenboom
Vincent Viguié
Eric Lonsdorf
James A. Douglass
Richard P. Sharp
Download
- Final revised paper (published on 18 Jun 2024)
- Supplement to the final revised paper
- Preprint (discussion started on 11 Sep 2023)
- Supplement to the preprint
Interactive discussion
Status: closed
- RC1: 'Comment on egusphere-2023-928', Harald Zepp, 29 Dec 2023
General comments
Models for determining the cooling intensity of blue and green surfaces in urban areas are much needed within the multifaceted assessment of urban ecosystem services. Planners expect models that provide meaningful results in terms of spatial temperature differences within an urban area and for planning scenarios. These two application cases form the background of the calibration study at hand. The paper focuses on the calibration of the InVEST model using data sets that reflect land surface temperatures (LST) as well as air temperatures (Tair) at both daytime and nighttime. It provides valuable insights into the significance of model parameters and the overall effect of calibration.
A crucial question for calibration is selecting an adequate data set that reflects the reference temperatures to a degree that comes close to reality. For land surface temperatures, satellite-based measurements (MODIS, Landsat) are commonly chosen. Much more delicate is the determination of air temperatures, which are available only from weather stations or a more or less dense network of often temporary auxiliary stations.
To be transparent, I declare that the author of this review has recently dealt with some of the issues raised in this comment (Zepp et al. 2023; Modeling the cooling effect of urban green spaces: The neglected control variables of ‘soil moisture’ and ‘biotope types’, in: Urban Forestry and Urban Greening 90, 128137. https://doi.org/10.1016/j.ufug.2023.128137).
Specific comments
The state of the art is well summarized. The same is true for the objectives, which are derived from the deficits that become obvious when comparing the previous model applications and, where they exist, the underlying calibrations. In this respect, the reviewer especially welcomes the attempt to calibrate modeled data against air temperatures. The authors are encouraged to more clearly address the questionable practice of modeling air temperatures and comparing them with LST, as the latter cannot reveal the effect of shading included in the models. The authors present the InVEST cooling model in a satisfactory manner that provides an understanding without turning to the referenced manual.
I detected some mismatch between the objectives outlined in the introduction and the abstract. The development of a calibration algorithm is mentioned first in the abstract. The reader might therefore expect a detailed explanation of the software tool and its strengths and weaknesses. In fact, however, only the calibration, and not the tool itself, is reported on. Only at the end does the first sentence of the conclusion take up the calibration tool again. The majority of the calibration results and findings could possibly have been achieved without the special tool. (?) So the title of the paper undoubtedly reflects its content, but it does not mention the calibration tool, instead focusing on what we learned from the calibration.
Calibration for the Paris example was done for the heat wave that occurred in mid-August 2003. The question arises why averaged ETo (August 1985–2005) was chosen as input data for a specifically defined heat wave. Wouldn't it have been more appropriate to calculate ETo for mid-August using the FAO ETo calculator? As a reference, the researchers used simulated air temperatures of a 1-km raster (TEB/Surfex).
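For context, the FAO ETo calculator implements the FAO-56 Penman–Monteith reference evapotranspiration equation, which requires only standard daily weather records for the period of interest:

$$
ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\frac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)}
$$

where $ET_0$ is in mm day⁻¹, $R_n$ is net radiation and $G$ soil heat flux (MJ m⁻² day⁻¹), $T$ is mean daily air temperature at 2 m (°C), $u_2$ wind speed at 2 m (m s⁻¹), $e_s - e_a$ the vapour pressure deficit (kPa), $\Delta$ the slope of the saturation vapour pressure curve, and $\gamma$ the psychrometric constant (both kPa °C⁻¹). Computing ETo for the specific heat-wave week would thus only need the corresponding daily records.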
Calibration for the Twin Cities of Minneapolis/St. Paul is based on interpolated air temperatures of a “dense sensor network”. The readers would like to know more about the density of the network. Was it dense enough to derive 1-km rasterized temperatures? Was it possible to account for the spatial patterns of the LULC presumably exhibiting different cooling capacities?
Results: One of the most important results is that calibration had only a limited effect on the goodness of fit for the correlation between observed and modeled air temperatures. Even the uncalibrated model yielded satisfactory results. This is in line with the reviewer's experience, as long as the spatial temperature differences are of more interest than absolute values.
The sensitivity analyses show that the model is sensitive to the weighting coefficients for air mixing, shading, and evaporation. This points to the general question of whether we have sufficiently diversified and differentiated crop coefficients for the calculation of evapotranspiration from ETo. May I suggest discussing this issue in the respective section.
Comments on the discussion: As a hypothesis, the strong correlation between modeled and observed air temperatures in the Paris example might partially be explained by similarities in the methods used in both data sets. The TEB/Surfex model considers the energy and water budget of land surfaces. By taking ETo as an input, InVEST also considers the energy balance. I think this is worth commenting on in the paper.
The freely available calibration tool for advanced applications is surely very helpful for further systematic calibration and testing. The tool has been extended in comparison to a former release.
In the conclusion, the authors raise the question of data resolution and data quality. How true! To narrow it down: for planning purposes in consolidated urban areas with little to moderate spatial expansion, the resolution of models should be increased (significantly finer than 1 km²) to be of wider practical use. The availability of measured air temperatures will remain a key issue.
Review questions (X = Yes)
1. Does the paper address relevant scientific modelling questions within the scope of GMD? Does the paper present a model, advances in modelling science, or a modelling protocol that is suitable for addressing relevant scientific questions within the scope of EGU?
X
2. Does the paper present novel concepts, ideas, tools, or data?
X, tool
3. Does the paper represent a sufficiently substantial advance in modelling science?
X
4. Are the methods and assumptions valid and clearly outlined?
Presentation of data (Tair) used for calibration should be extended.
5. Are the results sufficient to support the interpretations and conclusions?
X
6. Is the description sufficiently complete and precise to allow their reproduction by fellow scientists (traceability of results)? In the case of model description papers, it should in theory be possible for an independent scientist to construct a model that, while not necessarily numerically identical, will produce scientifically equivalent results.
see above; X
7. Do the authors give proper credit to related work and clearly indicate their own new/original contribution?
X
8. Does the title clearly reflect the contents of the paper? The model name and number should be included in papers that deal with only one model.
The title does not point to the development of a calibration algorithm, which is made available to the public. See my comment above.
9. Does the abstract provide a concise and complete summary?
I suggest integrating the tool's availability in the abstract.
10. Is the overall presentation well structured and clear?
X
11. Is the language fluent and precise?
X
12. Are mathematical formulae, symbols, abbreviations, and units correctly defined and used?
X
13. Should any parts of the paper (text, formulae, figures, tables) be clarified, reduced, combined, or eliminated?
no
14. Are the number and quality of references appropriate?
X
15. Is the amount and quality of supplementary material appropriate?
X
Citation: https://doi.org/10.5194/egusphere-2023-928-RC1
- AC1: 'Reply on RC1', Perrine Hamel, 02 Feb 2024
We thank the reviewer for their thorough review of the paper, made possible by their extensive expertise in the topic. Addressing these comments has significantly improved the manuscript. Below we respond to the few comments and suggestions outlined by the reviewer.
- State of the art: "The authors are encouraged to more clearly address the questionable practice to model air temperatures and compare them with LST, as the latter cannot reveal the effect of shading included in the models."
Response: We agree with the reviewer. We have added, l. 312: " We highlight, however, that the model was initially not developed for land surface temperatures. The fair performance for land surface temperatures is an artifact of the model’s simplified representation of air temperatures, which imperfectly represents the local energy balance, and the strong correlation of both air and surface temperatures with LULC."
- Paper objectives: "I detected some mismatch between the objectives, outlined in the introduction, and the abstract. The development of a calibration algorithm is mentioned first in the abstract. The reader might expect a detailed explanation of the software tool, its strength and weaknesses. In fact, however, only the calibration and not on the tool itself is reported on."
Response: We thank the reviewer for this comment. We have clarified in the abstract that we "further" developed an existing algorithm (as indeed the improvement is modest compared to the initial development of the algorithm). We have also clarified which improvements were made for this study: "The main improvement on the calibration tool developed for this study is the ability to use as reference temperatures either point data (e.g., a network of stations) or raster data (as is the case for the temperature data in this study). Other improvements are minor and of technical nature (as documented in the source code). The tool reports several performance metrics: mean absolute error (MAE), root mean square error (RMSE), and r2." (l. 216)
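The point-versus-raster reference handling described in this response can be illustrated with a minimal sketch. This is a hypothetical illustration with invented names, not the actual tool's interface: either a list of station records or a co-registered reference grid is reduced to paired (modeled, reference) temperature samples before computing performance metrics.

```python
def paired_samples(modeled, reference_points=None, reference_raster=None):
    """Collect (modeled, reference) temperature pairs for calibration.

    modeled          -- 2D list of modeled temperatures (the model output grid)
    reference_points -- list of (row, col, temperature) station records
    reference_raster -- 2D list aligned with `modeled`; None cells = no-data
    """
    pairs = []
    if reference_points is not None:
        # Point case: sample the modeled grid at each station's pixel.
        for row, col, temp in reference_points:
            pairs.append((modeled[row][col], temp))
    elif reference_raster is not None:
        # Raster case: compare cell by cell, skipping no-data cells.
        for i, model_row in enumerate(modeled):
            for j, model_val in enumerate(model_row):
                ref_val = reference_raster[i][j]
                if model_val is not None and ref_val is not None:
                    pairs.append((model_val, ref_val))
    return pairs
```

Once both reference types are reduced to the same paired form, a single optimization loop can score any candidate parameter set against either kind of data.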
- Calibration for the Paris example: "The question arises why averaged ETo (August 1985 – 2005) was chosen as input data for a specific defined heat wave. Wouldn’t it have and more appropriate to calculate ETo for mid-August using the FAO ETO-calculator."
Response: We agree that using the FAO ET0-calculator is another option. However, in order to minimize differences in model inputs, we have decided to use the outputs from the ALADIN model that is also used in the reference (modeled) data.
- Calibration for the Twin Cities of Minneapolis/St. Paul: "The readers would like to know more about the density of the network. Was it dense enough to derive 1-km rasterized temperatures? Was it possible to account for the spatial patterns of the LULC presumably exhibiting different cooling capacities?"
Response: We agree this is an important point for assessing the quality of the reference data. We have added details on the dense network and interpolation from Smoliak et al.'s data: "(~170 stations over 5000 km2, interpolated by cokriging using impervious surfaces)" (l. 205)
- Sensitivity analyses: "This points to the general question if we have enough diversified and differentiated crop coefficients for the calculation of the evapotranspiration from ETo. May I suggest to discuss this issue in the respective section."
Response: We agree and we have made this point explicit in the Discussion of the uncertainties in biophysical values ("(in particular crop coefficients – notably difficult to ascertain for urban land uses)", l. 354)
- "The strong correlation between modeled and observed air temperatures in the Paris example might partially be explained by similarities of the methods used in both data sets. The TEB/Surfex model considered the energy and water budget of Land surfaces. Taking ETo, InVEST also considers the energy balance. I think this is worth to be commented in the paper."
Response: While we agree with the similarities in basic principles, we highlight that the models are very different in their mathematical approaches. TEB/Surfex is much more refined temporally and able to represent more complex physical processes. As such, we decided not to emphasize the similarities in models, but rather that they are both heavily influenced by LULC.
Citation: https://doi.org/10.5194/egusphere-2023-928-AC1
- RC2: 'Comment on egusphere-2023-928', Anonymous Referee #2, 12 Jan 2024
General comments
The study evaluates the performance of InVEST urban cooling model estimates (i.e., day and nighttime air temperature) against alternative models for air temperature and satellite data for land surface temperature, using spatial correlations (r2) and mean absolute error (MAE). Moreover, the study compared model outcomes for Paris between two models (i.e., InVEST, TEB/Surfex model) under a greening scenario. This manuscript is timely and highly relevant to contribute to the latest discussion on modelling urban ecosystem services for urban planning and decision-making.
The manuscript is well written and structured. The summary of key model equations to orient the readers works very well, and the figures and tables are clear. The discussion reflects on relevant points and the conclusions drawn are well supported by the results.
I have a few minor suggestions to further improve the manuscript, particularly regarding the input data used, the calibration, and the performance assessment (Sections 2.3 and 2.4).
Specific comments
Can I assume that the model run before the calibration used the default parameter values as recommended by the InVEST urban cooling model user guide? Maybe that could be made more explicit in the manuscript. In line with that, as Table 2 shows no calibrated values for Paris (for daytime), can we assume that the parameter values were not changed 'after calibration'?
The authors may also briefly explain the performance criteria used (r2 and MAE) (using spatial data, right?)
InVEST does not predict LST, but the authors seem to expect a generally good correlation between LST and air temperature (as mentioned in lines 70-71; does this account for both daytime and nighttime temperatures?). Maybe that could be discussed a bit more in the manuscript, to explain to the reader why the study also included LST in its assessment (rather than using reference temperature estimates only).
Line 175-177: could the authors add a short description (in the main text or supplementary) of how the different parameter values were assigned? E.g., did the authors use default settings, or was another method applied? Was it done differently in the two case studies?
Line 199-207: different years have been used for the temperature data for the Twin Cities case study, while the authors state that they 'study the regional heat wave event in July 22, 2016' (line 123). This seems to be incorrect? Similarly, I was wondering why the authors used monthly reference evapotranspiration data averaged over the period 1985-2005 instead of monthly data for the year 2003 (as done for the temperature data). Could you explain why?
Line 257-259: could the authors briefly elaborate on the changes in parameter values compared to the values used before the calibration? For example, did the values change a lot, hence stressing the need to calibrate model parameters instead of using default model settings?
Table 1 shows for the Twin Cities a higher correlation for daytime air temperature with LST after the calibration. Maybe the authors could elaborate a bit more on this (e.g., in line 287).
Technical corrections
- Supplementary data of the Paris case study: it would be helpful to also translate the biophysical table into English. I also recommend adding either a short description of how the values were developed (e.g., which method was used to estimate Kc) or a reference where the method is further described.
- Lines 234, 236, 245-246: I suggest avoiding interpretations like 'overestimating' and 'underestimating' and instead being descriptive, e.g., the InVEST model shows higher temperatures in forested areas when compared to the reference data, as the reference data are also obtained from a model rather than from temperature measurements in the field.
- Line 253: I do not understand what the 'confirmation' refers to. Does it refer to the preceding sentence regarding the very low correlation for daytime air temperature in Paris? Can that be explained with the results of the sensitivity analysis with LST data?
- Line 214: the link to the user guide does not work
Citation: https://doi.org/10.5194/egusphere-2023-928-RC2
- AC2: 'Reply on RC2', Perrine Hamel, 02 Feb 2024
We thank the reviewer for their thorough review of the paper, which clearly shows their expertise in the field. Below, we respond to the specific comments.
Specific comments
- Can I assume that the model run before the calibration used the default parameter values as recommended by the InVEST urban cooling model user guide? Maybe that could be made more explicit in the manuscript. In line with that, as Table 2 shows no calibrated values for Paris (for daytime), can we assume that the parameter values were not changed 'after calibration'?
- The authors may also briefly explain the performance criteria used (r2 and MAE) (using spatial data, right?)
Response: We have clarified these points in the manuscript:
- l. 212: “The model outputs were first assessed without calibration, using default parameter values (500 m, 100 m, 0.2, 0.2, 0.6, for rmix, dcool, Wa, We, Ws, respectively).”
- l. 268: “Because of the very low r2 values for daytime temperatures in Paris, we considered calibration unsuccessful and we do not report the calibrated values.”
- l. 219: “We selected these metrics since MAE and RMSE are useful quantification of the uncertainty in model outputs, which is important from a user perspective. However, MAE and RMSE also depend on UHImax, which means that performance might be artificially good for areas with small urban heat island magnitudes. For this reason, we also report r2 (the default performance criterion for the optimization).”
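The three metrics named in this response can be expressed in a few lines. This is a minimal sketch of the standard definitions (function name is hypothetical), not code from the calibration tool: MAE and RMSE are in the units of the temperatures themselves (hence their dependence on UHImax magnitude), while r² here is the squared Pearson correlation, which is insensitive to a uniform bias.

```python
import math

def performance_metrics(modeled, observed):
    """Return (MAE, RMSE, r2) for paired modeled and observed temperatures."""
    n = len(modeled)
    errors = [m - o for m, o in zip(modeled, observed)]
    # MAE and RMSE: absolute error magnitudes, in degrees
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # r2: squared Pearson correlation (unaffected by a constant offset)
    mean_m = sum(modeled) / n
    mean_o = sum(observed) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(modeled, observed))
    var_m = sum((m - mean_m) ** 2 for m in modeled)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r2 = cov * cov / (var_m * var_o)
    return mae, rmse, r2
```

A model that is uniformly 1 °C too warm, for instance, scores MAE = RMSE = 1 but a perfect r² of 1.0, which is why reporting all three metrics together is informative.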
- InVEST does not predict LST, but the authors seem to expect a generally good correlation between LST and air temperature (as mentioned in lines 70-71; does this account for both daytime and nighttime temperatures?). Maybe that could be discussed a bit more in the manuscript, to explain to the reader why the study also included LST in its assessment (rather than using reference temperature estimates only).
Response: We agree with this point that was also raised by Reviewer 1. We have added some clarifications in the discussion, in addition to the point in the Methods. L. 312: " We highlight, however, that the model was initially not developed for land surface temperatures. The fair performance for land surface temperatures is an artifact of the model’s simplified representation of air temperatures, which imperfectly represents the local energy balance, and the strong correlation of both air and surface temperatures with LULC."
- Line 175-177: could the authors add a short description (in the main text or supplementary) of how the different parameter values were assigned? E.g., did the authors use default settings, or was another method applied? Was it done differently in the two case studies?
Response: As there are no default values for the biophysical parameters, we have derived the values from the literature. The references are provided in Appendix B, with the full biophysical table provided in supplementary data. The reviewer is correct that the methods are different for each case study, since some sources are specific to the case study (e.g., the (APUR, 2020) reference, cited in the main manuscript, is a report with values measured or modelled for Paris). We have also clarified that the UHImax values and Tref temperatures for the Twin Cities case studies were derived differently than for the Paris case study. We highlight this limitation in the Discussion:
- l. 328: “The poor performance of the model for daytime air temperature may also be attributed to errors in parameterizations, in particular the use of climate data for short periods (e.g., Aug 6-13th 2003 for Paris) vs. average values over several months, as was the case for some inputs (e.g., reference evapotranspiration in the Paris case study, or Tref and UHImax in the Twin Cities, see Section 2.3). Further investigation of these temporal dynamics should be explored in future work, although we highlight that they did not seem to impact the fair performance of the model for nighttime air temperatures or land surface temperatures.”
- Line 199-207: different years have been used for the temperature data for the Twin Cities case study, while the authors state that they 'study the regional heat wave event in July 22, 2016' (line 123). This seems to be incorrect? Similarly, I was wondering why the authors used monthly reference evapotranspiration data averaged over the period 1985-2005 instead of monthly data for the year 2003 (as done for the temperature data). Could you explain why?
Response: We thank the reviewer for highlighting these inconsistencies, which we have clarified in the manuscript. For the Twin Cities case study, we have modelled the regional heat wave event for LST but the average summertime temperatures for 2011-2014. (We have clarified this with better punctuation, l. 122.) In general, though, we agree that climate variables are not consistently derived from climate data series for both case studies. For reference evapotranspiration, the rationale is that the variable is only used for its spatial distribution rather than its absolute values. We have acknowledged these temporal inconsistencies in the Discussion, highlighting that they were not investigated in this study.
- l. 331: “Further investigation of these temporal dynamics should be explored in future work, although we highlight that they did not seem to impact the fair performance of the model for nighttime air temperatures or land surface temperatures.”
- Line 257-259: could the authors briefly elaborate on the changes in parameter values compared to the values used before the calibration? For example, did the values change a lot, hence stressing the need to calibrate model parameters instead of using default model settings?
Response: We have added this information to facilitate interpretation for the readers:
- l. 270: “The shade weight remains the highest after calibration, and values only changed by <15%.”
We have also clarified earlier the default values:
- l. 212: “The model outputs were first assessed without calibration, using default parameter values (500 m, 100 m, 0.2, 0.2, 0.6, for rmix, dcool, Wa, We, Ws, respectively).”
- Table 1 shows for the Twin Cities a higher correlation for daytime air temperature with LST after the calibration. Maybe the authors could elaborate a bit more on this (e.g., in line 287).
Response: We agree that the improvement for LST in the case of the Twin Cities is notable. We posit that this could be due to the LULC configuration of the Twin Cities but our analyses do not allow us to ascertain why the calibration was more effective for this dataset. We have highlighted this as an area for future research:
- l. 300: “Only in the Twin Cities for surface temperature did the calibration significantly improve model performance, possibly due to the LULC configuration in this landscape, although our analyses do not allow us to confirm this hypothesis”
Technical corrections
- Supplementary data of the Paris case study: it would be helpful to also translate the biophysical table into English. I also recommend adding either a short description of how the values were developed (e.g., which method was used to estimate Kc) or a reference where the method is further described.
Response: We have translated the biophysical table (supplementary data). The methods used to derive the biophysical table values were also added to the Appendix (for the Paris case study; those for the Twin Cities were already published, with the reference provided in the Appendix).
- Lines 234, 236, 245-246: I suggest avoiding interpretations like 'overestimating' and 'underestimating' and instead being descriptive, e.g., the InVEST model shows higher temperatures in forested areas when compared to the reference data, as the reference data are also obtained from a model rather than from temperature measurements in the field.
Response: We argue that the terms "overestimate" and "underestimate" are routinely used in modelling papers and do not imply a personal interpretation, but an objective bias (positive or negative).
- Line 253: I do not understand what the 'confirmation' refers to. Does it refer to the preceding sentence regarding the very low correlation for daytime air temperature in Paris? Can that be explained with the results of the sensitivity analysis with LST data?
Response: The confirmation refers to the limited effect of the calibration, i.e. if the model is not very sensitive to the parameters, using those parameters as calibration parameters will not have a strong effect. We have clarified this in the next sentence: l. 266: “which could explain the limited effect of calibration”.
- Line 214: the link to the user guide does not work
Response: We have fixed the link to the user guide.
Citation: https://doi.org/10.5194/egusphere-2023-928-AC2