I highly appreciate the efforts the authors have taken in response to my first review. All in all, the additional analyses and textual refinements have improved the robustness of the manuscript, which I deem almost ready for acceptance.
Yet one relatively important aspect needs further consideration: the selection of AWC scenarios based on r² and MAE. In particular, the authors now provide more detailed insights into the behavior of r² and MAE in the period prior to 2018, indicating – as suspected – a lower fit between simulated and observed mortality rates. However, they attribute this to the stochasticity of mortality prior to 2018, which I do not see supported by some of the display items. That is, some of the mortality peaks prior to 2018 in fact seem to be caused by drought, so a satisfactory model fit in that period is desirable, too.
While this aspect does not call the overall mortality implementation into question (which I personally consider a strong point of the manuscript, since it is not based on calibration but on model-internal mechanisms), it does have an impact on the presented results. To make this clear early on: I am not asking for major revisions of the results section. But I would like to see a few more aspects of falsification and visualization of uncertainty in the supplement, alongside a brief discussion/evaluation of these items in the results and discussion. This essentially amounts to two figures comparable to Figs. 3 and 4, but restricted to the calibration period 2000-2017. This would clearly visualize how a different period used for identifying the best AWC scenarios would affect the selection and, consequently, the simulated mortality rates. The authors can then briefly mention these effects in the corresponding results sections (lines 364-370 for beech and 438-444 for spruce) and elaborate on them in the discussion in lines 479-486 and/or 572-576. Doing so allows readers to fully capture the effects of period selection on model selection and the related results.
To make clear why I consider this point so important, I would like to bring up a hypothetical scenario: imagine the study had been done in 2017, with observational data available only up to that year. Back then, the selected AWC scenarios would have been different (as indicated by Fig. S 3.3.1). It would be interesting to see how these other models predict mortality after 2017, i.e. outside the period that was used as a baseline for selecting them as best fits. This hypothetical scenario exemplifies the current situation, in which we do not know what will happen in the future. Examining this hypothetical example would therefore help to assess the uncertainty related to the AWC-scenario selection approach and the related interpretations.
Apart from this, I only have a few minor points as outlined below.
Line 16: ‘four’ hypotheses is probably a legacy from the initial submission; I assume it should be three.
Line 37: it seems the word ‘of’ is missing in this sentence: understanding, forecasting and managing ‘of’ forest resistance.
Lines 111 and 122: it is justified to select only the sites from Knapp et al. (2024), but I nevertheless suggest reflecting in the text why northeastern Germany is basically empty, since readers may wonder (as I did) why that is the case.
It seems that section 2.2 is missing, please adjust (I guess section 2.3 should be 2.2)
Lines 232 and 234: BGB has not been introduced before. Or do the authors refer to BGR which was mentioned in line 225? Please clarify.
Lines 256-260: the authors may want to find a more prominent spot for this important point, e.g. at the beginning of section 2.3 (which I guess should be 2.2), to stress that the results represent model intrinsics and not empirical fits, which I personally regard as a major strength of the presented approach.
Line 282: I understand from your reply that your data were quasi-normally distributed, which is why you assumed a Gaussian fit. For the sake of clarity, I recommend adding this information here, since readers may have similar doubts as I did.
Lines 364-370: I highly appreciate that the authors have done this additional analysis to reflect the model-performance skills based on a subset of the data. Yet, I do not agree with the conclusion that most of the observed variance before 2018 originates from stochasticity, which also contradicts some of the statements in the results section (e.g. lines 350 and 373). Inspection of Fig. S 3.2.1.2 clearly shows that the years 2003, 2012, and 2015 were characterized by a higher drought index, which is mirrored in increased mortality rates in both species. Thus, the peaks in mortality rates prior to 2018 do not originate from stochasticity but from an increasing DI, which also becomes evident when inspecting Fig. S 3.3.2. As I see it, this mirrors the threshold-like behavior of the mortality model, which – as the authors conclude themselves – results in an overestimation of mortality for beech in general and for spruce in the years prior to 2014. Moreover, I would not call the increase of r² an artifact but rather a mathematical feature of how correlations are calculated. The mortality rates following 2017 increase the squared errors by an order of magnitude, and if this happens in both simulated and observed mortality, r² (and MAE) will increase substantially. Since your best-AWC-scenario selection is based on r² (and MAE), this is a crucial aspect to consider. Based on Fig. S 3.3.1, it seems that if the scenario selection had been based on the years prior to 2018, different scenarios would have been selected. The question then arises how these scenarios performed in the years after 2017. I am not asking the authors to change the scenario selection for the main text at this point, but I strongly recommend reflecting this behavior more clearly in the results and discussion (see also my main comment above). Ideally, the authors would show two supplementary display items resembling Figs. 3 and 4 but for the shorter calibration period 2000-2017.
My concern is that if at any point the aim is to use this kind of mortality model for projections into the future, I would rather rely on AWC scenarios whose r² also performs acceptably under less extreme conditions than 2018. Otherwise, the selected AWC scenarios might result in overly extreme mortality scenarios. Since model validation based on a subset of the data is a common standard, I see reflecting the uncertainty of the model performance metrics (r² and MAE) as mandatory for the sake of clarity.
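To illustrate the concern with entirely hypothetical numbers (not taken from the manuscript): if a simulation misses the smaller pre-2018 drought peaks but reproduces the post-2018 jump, a goodness-of-fit metric computed over the full period can look excellent while the pre-2018 fit is poor. A minimal sketch, using the coefficient of determination as a stand-in for the manuscript's r²:

```python
import numpy as np

def r2(obs, sim):
    """Coefficient of determination between observed and simulated series."""
    ss_res = np.sum((obs - sim) ** 2)
    ss_tot = np.sum((obs - np.mean(obs)) ** 2)
    return 1.0 - ss_res / ss_tot

years = np.arange(2000, 2023)
# Hypothetical observed mortality (%): small drought peaks in 2003/2015,
# then a jump after the extreme 2018 drought.
obs = np.full(years.size, 0.3)
obs[years == 2003] = 0.8
obs[years == 2015] = 0.7
obs[years >= 2018] = 3.0
# Hypothetical simulation: misses the pre-2018 peaks entirely but
# reproduces the post-2018 jump.
sim = np.full(years.size, 0.3)
sim[years >= 2018] = 2.8

pre = years < 2018
print(r2(obs, sim))            # full period: very high, driven by the jump
print(r2(obs[pre], sim[pre]))  # 2000-2017 only: poor (negative here)
```

The same logic applies to the MAE: the post-2017 values dominate the metric, so scenario rankings derived from the full period need not agree with rankings derived from 2000-2017 alone.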
Line 378: to ease the reading, I suggest adding ‘combined’ before ‘long- and short’, or revising to: the interplay of drought effects acting on longer and shorter time scales.
Lines 438-444: see my comment on lines 364-370. For spruce, it is also interesting to note that the increasing observed mortality after 2012 is not captured by the model. Given the higher complexity of the AWC-scenario performance for Norway spruce, I am even more curious to know what would happen if the models were selected based on performance metrics derived from the period 2000-2017. Again, I am not asking the authors to change the key display items of the results section, but I would highly appreciate a more comprehensive model falsification (see my suggestions above).
Lines 450-458: I highly appreciate the inclusion of this sensitivity analysis. The new results make complete sense and thereby strengthen your model implementation.
Line 468: While I generally agree with this statement, I want to highlight that roughly one third of the mortality observed in 2019 stemming from PIC arises from inciting factors. Also, I am puzzled by the fractions presented in Fig. S 3.3.2. I assume that the total mortality fraction adds up to 100% in each year. So, is the remaining part (which is not shown but adds up to 100%) stochastic mortality? At least I do not fully understand how this SI figure goes together with the main display items, which show different temporal dynamics of the mortality rates. Could the authors please elaborate on this? Maybe this also relates to your interpretation of stochasticity dominating the mortality before 2018, so getting this right is quite essential.
Lines 479-486: I appreciate this critical reflection of the newly added subsample analysis. Yet, I again want to stress that the patterns before 2018 do not only arise from stochasticity but appear to be triggered by drought events (e.g. 2012, 2015). If my suggestion of additional supplementary display items resembling Figs. 3 and 4 but for the period 2000-2017 is implemented, the authors could elaborate the discussion in this section to reflect how the period selection affects the AWC-scenario selection. This would also guide future research that mimics the presented approach and address aspects of uncertainty related to mortality projections based on the AWC scenarios. I understand that this is not the main point of the manuscript, but it will help to correctly guide related future work.
Lines 572-576: I highly appreciate this paragraph to foster cautious reproduction of the AWC-scenario approach. Maybe the points I suggested for lines 479-486 could instead be added here.
Line 586: please check this sentence: ‘but it does induce longer die-off once water availability improves’. This does not make sense in combination with the following sentence. Or is it simply the word ‘die-off’ that is ambiguous in the context of mortality discussions? That is, the stress dies off and not the trees? Please clarify and revise.
In this study, the authors present a refined version of the DVM ForClim (version 4.2) with the aim of more accurately simulating drought-induced mortality of Norway spruce and European beech. In particular, they implement predisposing, inciting, and – in the case of spruce – contributing factors, which they term a PI(C)-scheme. Importantly, the authors do not per se calibrate their model against observations, in order to test whether their model implementation mechanistically captures mortality.
The presented results indicate that for both species the stark increase in mortality observed during and after the extreme 2018 drought is reproduced. Yet, absolute mortality rates were largely overestimated for beech, whereas the ongoing high mortality of spruce was not captured by the simulations. Based on 105 AWC simulations, the authors moreover conclude that soil properties and soil heterogeneity are of high importance (particularly for beech). Finally, for spruce only the simulations including a bark-beetle component were able to reproduce recent mortality rates. Eventually, the authors advocate for incorporating such mechanistic schemes into DVMs rather than striving for statistical/empirical models, while also stressing the importance of an actual calibration of the model when simulating mortality under future conditions. As such, the study touches on an important topic in the context of dynamic vegetation models, namely the incorporation of drought-induced mortality, which to date remains a major challenge. Consequently, the study can be considered very suitable for the general audience and scope of GMD.
Yet, before the manuscript is publishable, some major aspects have to be considered.
Firstly, while I particularly appreciate the approach of not calibrating the model against observations, I wonder to what degree the 105 different AWC scenarios deployed, in combination with the observed mortality rates (which feature a stark increase after 2017), do not result in problems similar to those arising from classic empirical models. In particular, I wonder to what degree the high squared errors introduced by the high mortality rates after 2017 act as ‘influential outliers’, which have the potential to largely boost the main evaluation metric of r². Given this, it seems likely that those AWC models are selected which best match the observed mortality increase after 2017. But do they also represent the models with the best mechanistic mortality implementation? In other words: how would your models perform if only simulating the years 2000-2017? For spruce in particular this seems to play a role, since – based on supplementary figure 3.2.2.2 – the peak of mortality in some simulations occurred in 2021 (instead of 2018) or sometimes even already in 2017. Since all of the 105 simulations are based on similar model parameters, I wonder how such different mortality peaks can arise (stochasticity?) and to what degree the mortality implementation can really be considered robust. I believe these points need to be clearly highlighted when interpreting the results, since I am guessing that the r² will largely drop if the model is not evaluated over the full period. At least, the authors should – in addition to results representing the full period – show how consistent their model evaluation is when excluding the years after 2017, to avoid the influence of these extreme years. This would then provide a better picture of how the mortality implementation performs under less dry conditions (for instance, 2003 and 2015 were also quite dry in Germany). And it would address my concern that the model-selection procedure is biased by influential outliers, i.e. the extreme impact of the 2018 drought.
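To make the influential-outlier point concrete, here is a minimal sketch with invented mortality series (the values are purely illustrative, not taken from the study): the pre-2018 years show no sim-obs agreement at all, yet the shared post-2017 jump alone produces a very high Pearson r² over the full period:

```python
import numpy as np

# Hypothetical sim/obs mortality series (%): eight unremarkable years
# with no agreement, followed by two extreme years shared by both series.
obs = np.array([0.2, 0.4, 0.3, 0.5, 0.2, 0.3, 0.4, 0.2, 2.5, 2.8])
sim = np.array([0.4, 0.2, 0.5, 0.2, 0.3, 0.4, 0.2, 0.3, 2.4, 2.9])

# Pearson correlation over the full period vs. the first eight years only.
r_full = np.corrcoef(obs, sim)[0, 1]
r_pre = np.corrcoef(obs[:8], sim[:8])[0, 1]
print(r_full ** 2)  # very high: dominated by the two shared extremes
print(r_pre)        # negative: no agreement under normal conditions
```

This is exactly why evaluating the metric for 2000-2017 separately is informative: a single shared extreme can mask the absence of skill elsewhere.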
Secondly, while the authors conclude that the incorporation of a bark-beetle component as well as soil properties (mainly AWC) appear as major drivers of tree mortality, the model mechanisms causing the stark increase of mortality after 2017 are barely discussed. To provide the full picture, the authors should explore their model output more deeply in order to understand which environmental driver variables are responsible for the strong increase in mortality. Is this simply related to the extraordinary drought of 2018, or are there also predisposing factors (e.g. the dry year 2015) that contribute to this increase? This might also help to explain why the observed ongoing high spruce mortality after 2018 is not captured by the DVM, and it would also provide a better understanding of what may be simulated if the model is applied to climate projections.
Thirdly, I understand why the authors do not want to calibrate their model against observations, and I generally agree. Yet, some of the model parameterizations appeared somewhat arbitrary to me, and I wonder whether a sensitivity analysis of specific parameters wouldn't be meaningful to gain a better understanding of model behavior, which I think should be emphasized more in a model-development framework. For instance, the classification of the base annual probability for the bark-beetle outbreak classes as well as the factor of 2/3 applied to the inciting factors seem arbitrary, but likely have an impact on the model outcome. For future implementations of the PIC-scheme it would be very helpful to know how sensitively the model reacts to these settings. In other words: would it be possible to achieve models with similar or even higher performance when choosing different outbreak classes or a different factor? And to which of the two factors is the simulated mortality more sensitive? This information would provide readers with more guidance on how to implement comparable mortality mechanisms in other DVMs.
Finally, while the comparison of model output and observations is based on mean mortality across all sites, the spatial scale is largely ignored. I understand why this is the case (few mortality observations and the stochasticity of the model likely result in spatially varying patterns), but it nevertheless deserves a mention in the discussion and maybe 1-2 display items in the supplement to visualize the spatial patterns of simulated mortality. For instance, I wonder whether simulated mortality shows a spatial pattern or a rather random structure. If the former (spatial patterns), this might also point to the environmental drivers being mostly responsible for the mortality increase (see my second point above).
Only if these additional aspects are taken into consideration will the manuscript transparently show how the suggested PIC-scheme may enhance the accuracy of mortality simulations in DVMs, which I believe should be the major aim of the study. And even if some of the currently very convincing model-performance evaluations (r² of 0.72 for beech!) were to drop under a corresponding reanalysis (e.g. when adding validation metrics representative of the period 2000-2017 only), this is valuable and important information for the readers, since it would show that matching extreme patterns does not necessarily mean that mortality is generally well implemented (i.e. under less dry conditions). This in turn would also indicate the necessity of interpreting model output only very carefully if applied to future climate projections. And finally, understanding what exactly drives the enhanced mortality after 2017 within the model may shed more light on the actual mechanisms driving tree mortality, although inference from mechanistic models should be undertaken carefully.
Please find more detailed comments referring to specific sections of the manuscript below.
Abstract:
Line 16: isn't hypothesis 3 a logical consequence of hypothesis 2? I.e., if soil properties have a strong influence, local soil heterogeneity will automatically have a modulating impact.
Line 21: please quantify ‘hundreds’. How many plots in total?
Introduction:
In contrast to the abstract, you here combine the second and third hypotheses from the abstract. I personally prefer this combination, since H2 and H3 in the abstract are closely related. I suggest adapting the three hypotheses in the abstract accordingly (see also my point above).
Methods:
Line 100:
What is the reason for the large gap in northeastern Germany? Aren't there any sites with beech or spruce? I would at least expect a couple of beech sites here and there. If not, please briefly mention the reasons for this geographic gap.
Line 147: It seems that Marano et al., 2025 is currently under review. This obviously hampers the inspection of details as suggested. Would it make sense to show these details in the supplementary?
Line 150: If I understand correctly, the heterogeneity was artificially generated. I wonder how this reflects actual soil heterogeneity. It is also not fully clear to me whether you actually used existing soil maps to characterize the soil properties which eventually determine AWC (I later learned this information comes below). Since soil properties are quite crucial for drought-related mortality (as you claim yourself), the soil parameterization is a crucial step which deserves a more detailed description.
Line 182: please reword: ‘This heuristic approach we used combined’
Line 182: did you run a sensitivity analysis to see how these somewhat arbitrary boundaries affect your model outcome? Might be worth a try to see how influential this classification is and whether a different classification might provide better/different results.
Line 189: how is the stress status of trees defined/quantified? Please elaborate.
Line 196: this factor (2/3) is again somewhat arbitrary and would require a sensitivity analysis to quantify its impact on the model outcome.
Equation (5): according to equation 3, gGen can only reach values between 0 and 1, or exactly 2. I wonder whether this abrupt jump from (less than) one generation to two generations isn't arbitrary, or whether there is perhaps a typo in equation 3, given the query of 1.5 here. In any case, the threshold of 1.5 generations is again somewhat arbitrary. Please verify and potentially elaborate.
Line 219: that's the information I was expecting above. Maybe briefly mention above and refer to this section.
Line 236: I suggest showing a supplementary display item which depicts the original data and, in comparison, the min and mean AWC values achieved by your approach, to reflect how much of the original spatial variance in AWC is retained in your data. At present it is not clear to me how well your AWC scenarios actually mirror reported AWC.
Line 239: what is the reason for choosing this specific period, i.e. 2000-2022?
Line 243: A subtraction can lead to negative mortality rates. Did you encounter this? If so, how did you treat this?
Line 269: From equation 9 it seems you only used Msim, so why are you concerned about overfitting? Or did I miss something? If you are concerned about overfitting, variance inflation should be considered, e.g. by computing the VIF for the predictor variables and excluding highly collinear ones (but again, if only one predictor variable is used, this does not make sense). So I wonder which predictor variables you have been using at all. Please clarify and, if necessary, elaborate.
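Should multiple predictor variables indeed be involved, the VIF check is straightforward. A minimal from-scratch sketch (the predictors here are synthetic, purely for illustration):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X.

    VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing column j
    on all remaining columns (with an intercept)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2_j = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2_j)
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)   # nearly collinear with x1
X = np.column_stack([x1, x2, x3])
print(vif(X))   # x1 and x3 show strongly inflated VIFs; x2 stays near 1
```

A common rule of thumb is to flag predictors with VIF above 5-10 as problematically collinear.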
Line 275: I assume your R²adj values do not follow a Gaussian distribution, since they range from 0 to 1. Did you account for this in your GAM? Which data type/family did you specify in your GAM? Binomial? Please elaborate.
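If a Gaussian family is kept, one common safeguard for a (0,1)-bounded response is to fit the model on logit-transformed values and back-transform the predictions (a beta family would be the other standard option). A tiny sketch of the transform, using hypothetical R²adj values:

```python
import numpy as np

# Hypothetical R²_adj values, bounded in (0, 1). A Gaussian model fitted on
# the raw scale can predict values outside these bounds; fitting on the
# logit scale and back-transforming keeps predictions inside (0, 1).
r2_adj = np.array([0.12, 0.35, 0.48, 0.61, 0.72, 0.80])
eta = np.log(r2_adj / (1.0 - r2_adj))   # logit transform
back = 1.0 / (1.0 + np.exp(-eta))       # inverse logit (sigmoid)
print(np.allclose(back, r2_adj))        # round-trip recovers the values
```

Any prediction made on the eta scale, once passed through the inverse logit, is guaranteed to lie strictly between 0 and 1.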
Results:
Fig. 3, panel A: the dark end of the color scale for R² does not always allow the size of the MAE to be read. Please adjust. The same applies to Fig. 4.
Lines 325-336: I wonder to what degree the overall variance of the data affects your r². It would be interesting to compute r² for the period before 2018 only, to see how well the ‘average’ mortality under less dry conditions is captured by the models. It seems that your model parameterization is able to capture the stark increase in mortality after 2017, but I wonder to what degree the model mechanistically captures mortality or whether it simply reacts to one extreme year. This aspect deserves more careful thinking and interpretation, particularly if the model is later used to predict mortality rates based on projected climate data. This does not require rerunning the simulations but only evaluating their performance for a sub-period, which is a common procedure when evaluating model performance.
Fig. 4: to avoid misinterpretation, I suggest using the same range of r² values in the legend as for beech, to visually highlight that r² is much lower for spruce.
Line 386: Again, I wonder to what degree the extreme years after 2017 affect your model-selection process. Moreover, while I agree that the bark-beetle model is important to incorporate, it yet seems to require some improvements, given its inability to capture the prolonged impacts of the 2018 drought. Also, from Fig. 3.2.2.2 in the supplementary it seems that some model runs produced quite different mortality peaks (some in 2017, some in 2021). Since – if I understood correctly – the only difference between these runs was the AWC implementation, I wonder which circumstances have driven such temporally inconsistent mortality peaks. As noted above, I suggest gaining a deeper understanding of the actual climatic forces driving the mortality peaks, since this would also allow for a better mechanistic interpretation of the parameterization.
Discussion:
Line 446: but it is not yet fully clear what these key drivers are. In other words: which environmental circumstances have led to the stark increase in mortality after 2018? Please elaborate.
Line 464: I generally agree that soil properties are important in mediating drought, but some care needs to be taken when interpreting model performance, since what you describe here most likely relates to your model-specific parameterization of beech. In reality, this small-scale variability might not be as important for a relatively anisohydric species with a relatively deep rooting system.
Line 484: Again, I do agree that soil conditions are important, but we have to keep in mind that your interpretation relies on model output and thus mirrors how the model was parameterized. This need not directly mirror reality. Thus, I would be more careful when deriving implications for real systems from model output.
Line 488: when doing a species-specific calibration, a robust cross-validation should be undertaken to avoid artifacts introduced by influential outliers (such as the years after 2017). Please elaborate.
Line 539: You stressed to prioritize process understanding. Yet, the processes leading to the increased mortality after 2018 are barely discussed. Is this mostly related to one extremely dry year (2018), ongoing soil-drought, or predisposing factors? Please evaluate your model output accordingly to provide a deeper understanding of the underlying mechanisms.