the Creative Commons Attribution 4.0 License.
Coupling the Community Land Model version 5.0 to the parallel data assimilation framework PDAF: description and applications
Heye R. Bogena
Harry Vereecken
Harrie-Jan Hendricks Franssen
Download
- Final revised paper (published on 18 Jan 2022)
- Preprint (discussion started on 08 Sep 2021)
Interactive discussion
Status: closed
-
RC1: 'Comment on gmd-2021-38', Anonymous Referee #1, 21 Sep 2021
General Comments:
This manuscript coupled the CLM version 5.0 to a data assimilation framework (PDAF) to improve the simulation of soil moisture content at a forested site in Germany. The authors ran a series of assimilation experiments in which they adjust simulated soil water content (SWC) with observations of soil water at depths of 5, 20 and 50 cm. They ran additional simulations where they also adjusted soil sand/clay fraction, and finally organic matter content. They found that all assimilation runs performed better in terms of simulated soil moisture as compared to the (open loop) simulation. The assimilations in which both soil moisture and soil parameters were adjusted performed the best. The influence of water upon land surface models is well established and has an important influence upon surface energy balance, temperature, and vegetation (carbon) behavior. Thus, the use of soil moisture observations to improve land surface simulations in terms of updating both model states and parameters is a worthwhile pursuit.
With that being said, this reviewer found there were significant gaps in the cited literature and motivation for the work (See details below). The authors also did not explore or provide an explanation for how the vegetation behavior influenced the water behavior and contributed to soil moisture behavior and what implications this has for the carbon cycle. The reviewer also felt more justification was needed for why soil characteristics (clay/sand/organic matter) were adjusted in place of true hydraulic parameters within the model.
Scientific/Detailed Comments:
Line 16: Not clear to me how this coupling is ‘novel’. The Data Assimilation Research Testbed (DART) uses similar ensemble capability, and couples an EnKF to CLM5. (Raczka et al., 2021 https://doi.org/10.1029/2020MS002421).
Line 28: Seems like a dated citation (Overgaard et al., 2006)– perhaps reference CMIP5 or CMIP6 manuscripts that compare a range of LSMs performance (e.g. Arora et al., 2020; https://bg.copernicus.org/articles/17/4173/2020/)
Lines 34-36: Need to improve references to water limited regions and the work that has been done to improve water limitation and its connection to the carbon cycle (e.g. Raczka et al., 2021; Weider et al., 2017; Kennedy et al., 2019; https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2016JG003704).
Line 35-40: In general, citations provided here seem rather general, and not focused on particular research topic, which in this case is SWC, hydrology and impact upon latent and sensible heat. You might want to focus more directly on the representation of hydrology within CLM (Swenson et al., 2019; https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019MS001833 ), and why it suits DA for your application. Including specific advances in hydrology with CLM5.0 (Kennedy et al., 2019; https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2018MS001500) might also better motivate this work.
Line 45: Need more background into remote sensing products of soil moisture. There are a growing set of remotely-sensed soil moisture observations that should be referenced here – SMOS, SMAP, ESA-CCI. There are many emerging products. You should better motivate the use of DA precisely because the range of remotely-sensed products is expanding. Also the purpose of DA (especially EnKF) is that unobserved states (subsurface layers) can be adjusted based upon the model state covariance matrix of the modeling system.
Line 47: Awkward sentence, It is common practice….
Line 53: It is unclear what distinction is being made between ‘offline’ vs ‘online’ coupling in data assimilation frameworks. The authors state that ‘offline-coupled’ data assimilation is used for the Data Assimilation Research Testbed (DART) https://dart.ucar.edu/ (Anderson et al., 2009). Furthermore, in offline coupling ‘the framework wraps around the model and does not modify the model’. This is not true. Within DART – the state of the model is modified during the update step of the EnKF. Therefore, the assimilation updates the model state in time so that the trajectory of the model more closely matches the observations being assimilated. Furthermore, DART and CLM are interactive in that DART updates the model state within CLM, and the inflation parameters within DART (Gharamti et al., 2019; doi.org/10.1175/MWR-D-20-0101.1) are also updated in time and influence the ensemble spread of the CLM model state. More recent updates to the CLM code have included model components that call data assimilation components from DART directly. In applications outside of CLM, DART has been used to modify parameters (e.g. Zhang et al., 2021; doi.org/10.5194/tc-15-1277-2021.) within models as well. The authors need to reconsider their assertion that their DA coupling approach is ‘novel’.
Line 57: DART is commonly used with all components of the earth system within CESM including land (CLM), atmosphere (CAM), ocean (POP), and sea/land ice, as well as many other earth system models. See https://dart.ucar.edu/publications/
Line 84: “… [these manuscripts] concluded that the consideration of heterogeneous porosities can increase model performance depending on the model structure. In contrast to these detailed distributed catchment studies, we model the study site from the viewpoint of a larger regional model where the catchment is represented by a single grid cell.”
The previous modeling studies suggest that including a description of heterogeneous soil porosities will help model performance in a fine-scale catchment. Presumably a fine spatial scale description is needed to represent a catchment. Therefore it caught this reviewer off-guard that the authors propose to use a coarse, grid cell to represent catchment behavior (see Swenson et al., 2019). Perhaps provide motivation that a DA framework can improve modelling behavior through correcting for known biases in the system – or known errors in parameters.
Method section 2.1
“Furthermore, we investigate whether updating of the soil organic matter parameter via data assimilation can further improve the prediction of soil water with CLM5.”
Given your manuscript goals you need to provide some explanation within the CLM methods section of how soil organic matter influences soil water drainage. It is also slightly unclear what benefits either updating to the CLM 5 description or using the PDAF will bring to this analysis, be a bit more specific. Some of this information is included in the appendix (A1-A4), but a bit more explanation within the main text would be helpful.
May also want to mention in the methods of CLM 5 – that it updates the plant hydraulic stress representation (Kennedy et al., 2019) thereby influencing water-carbon coupling, and transpiration. The authors do not really discuss the influence of vegetation (water-carbon coupling) upon their SWC results.
Section 2.2.1
“For example, ensemble members can be generated based on perturbed soil parameters and atmospheric forcings.”
Not clear at this point how ensemble is generated for this experiment.
“The state vector x_i contains soil water content (model states), sand and clay fractions (parameters), and organic matter fractions (parameters) depending on the experiment as described in Section 3.3.”
It makes sense here to describe how the CLM soil column is constructed (i.e. PFTs, columns, layers etc) within Section 2.1. Are you updating all soil layers of the CLM model for SWC?
Section 2.3:
“Furthermore, for the optional parameter updating it is necessary to provide a function to transform the input parameters, e.g. soil texture, to the model parameters, e.g. the soil hydraulic parameters. CLM5 performs this transformation once during initialization to obtain the hydraulic parameters from the soil texture in the surface file.”
It was a bit confusing to this reviewer that the authors were referring to the soil characteristics such as clay/sand/organic matter as ‘parameters’. In general, parameters refer to numeric coefficients that influence model equations. This manuscript adjusts the soil characteristics to indirectly adjust the hydraulic parameters (A1-A4). In general, it seems parameter optimization should be limited to parameters which are difficult/impossible to measure. The soil characteristics, on the other hand, could be measured given how well the study site (watershed) seems to be observed already.
3.1 Study Site:
Very unclear how the CLM site-level or gridded simulation was setup. What was the size of the grid cell used in which the soil characteristics / topography were defined? How was this forested site initialized? Was it spun-up from near ground conditions or was a present-day compset used within CLM?
Section : 3.2.1
“The filtered raw data is then spatially and temporally averaged to fit the requirements of the model, i.e., daily averages for the three soil depths.”
I don’t think that’s a limitation or requirement of the model – CLM5.0 can be run on an hourly time basis thus assimilation could be performed hourly. Also there are roughly 25 subsurface potential soil layers in CLM, so it could potentially handle more soil depth observations depending upon the depth of the soil column at this location. I think you performed daily averages of all the soil observation locations to simplify the assimilation process, which is reasonable.
So you averaged all the forested (undisturbed) soil water observation locations into a single value for each depth?
Line 261: Lateral flows are not represented at all in CLM5 – no grid cell to grid cell communication. Surface and subsurface drainage is routed directly to rivers.
Line 287: Be more specific here: Perturbed inputs of *both* atmospheric forcing and soil characteristics of soil/clay and organic matter? What was the purpose of perturbing both? Could you use only atmospheric perturbations if the goal was to only assimilate SWC observations? The additional perturbation of the soil/clay, organic matter was necessary for the parameter updates? Provide a bit more explanation.
Line 287: Do you state anywhere what soil water variable in CLM you are adjusting? I assume it is the prognostic variable H2OSOI_LIQ, but there is also H2OSOI_ICE and the diagnostic variable H2OSOI. Also you are adjusting all vertical layers?
Figure 1: Any physical explanation of why the model would overestimate SWC at shallow depth (5 cm) and at the deepest layer (50 cm), but overestimate SWC at the middle depth (20 cm)? Curious of whether this could be related to the observational uncertainty of the SWC sensor – and what was used as the observation uncertainty? Also wondering if this behavior was related to the configuration of the root profile within CLM – how much of the root mass was within this layer and therefore what influence this had upon transpiration and removal of water within this soil layer?
This opens up other questions of what the forest state was for your model simulations including things like biomass and leaf area index from the site observations. Were these reasonable? Did you look at the simulated transpiration, evapotranspiration and GPP to determine if these values seemed reasonable? I don’t think you had flux tower observations available to check, but perhaps you could infer reasonable values from surrounding sites. The vegetation state will have an important impact of subsurface soil moisture state and to what effect this impacted your simulation is unclear. The vegetation state, including how it was initialized and how it was simulated (other than the PFT setting) was not discussed in this manuscript.
Table 3: It was not completely clear until I viewed this table that the model ‘parameters’ that were being adjusted within the assimilation were actually the soil characteristics of clay/sand and organic matter. The term ‘parameter’ is admittedly loosely defined in modeling applications, but in general, this typically refers to ‘coefficient’ values within the model code that (within a model like CLM) are specific for particular plant functional types. The surface characteristics of the soil, however, are typically prescribed and held constant. The reviewer recognizes that this manuscript is, in part, a demonstration of the capabilities of the assimilation system, and is apparently following the approach taken in (Naz et al., 2019) but physically, does it make sense to adjust the soil characteristics (generally fixed in time) such that they change with time? Would it not make more sense to adjust the numeric coefficients in equations A1-A4 instead of %sand and %clay? The authors acknowledge this at the very end of the conclusion section, but perhaps more justification could be provided earlier on in the manuscript.
If there were many soil moisture subsurface observations, were any soil characteristic observations available to check the posterior values of the soil characteristics?
Citation: https://doi.org/10.5194/gmd-2021-38-RC1 -
AC1: 'Reply on RC1', Lukas Strebel, 04 Oct 2021
Scientific/Detailed Comments: Referee comments in bold and author answer non-bold.
Line 16: Not clear to me how this coupling is ‘novel’. The Data Assimilation Research Testbed (DART) uses similar ensemble capability, and couples an EnKF to CLM5. (Raczka et al., 2021 https://doi.org/10.1029/2020MS002421).
Our intention of using the term ‘novel’ in this sentence was to highlight specifically the new coupling of CLM5 with PDAF in contrast to existing coupling of CLM5 to other data assimilation frameworks or previous versions of CLM to PDAF. We will rephrase the section to make this clearer.
Line 28: Seems like a dated citation (Overgaard et al., 2006)– perhaps reference CMIP5 or CMIP6 manuscripts that compare a range of LSMs performance (e.g. Arora et al., 2020; https://bg.copernicus.org/articles/17/4173/2020/)
We will update the dated citation with a more recent citation that highlights the complexity and range of current LSMs.
Lines 34-36: Need to improve references to water limited regions and the work that has been done to improve water limitation and its connection to the carbon cycle (e.g. Raczka et al., 2021; Weider et al., 2017; Kennedy et al., 2019; https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2016JG003704).
We agree and will add suggested references as well as other references more focused on soil water content.
Line 35-40: In general, citations provided here seem rather general, and not focused on particular research topic, which in this case is SWC, hydrology and impact upon latent and sensible heat. You might want to focus more directly on the representation of hydrology within CLM (Swenson et al., 2019; https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019MS001833 ), and why it suits DA for your application. Including specific advances in hydrology with CLM5.0 (Kennedy et al., 2019; https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2018MS001500) might also better motivate this work.
It is true that the references here are rather general. We chose them to highlight the application of CLM to single-point setups for a wide array of studies. We will rewrite this section to make this clearer focusing more on soil moisture and DA as you suggest.
Line 45: Need more background into remote sensing products of soil moisture. There are a growing set of remotely-sensed soil moisture observations that should be referenced here – SMOS, SMAP, ESA-CCI. There are many emerging products. You should better motivate the use of DA precisely because the range of remotely-sensed products is expanding. Also the purpose of DA (especially EnKF) is that unobserved states (subsurface layers) can be adjusted based upon the model state covariance matrix of the modeling system.
The emerging remotely-sensed soil moisture observation products are important and useful in combination with DA. We did not include many references to remotely-sensed products because our application used only in-situ measurements. However, we agree that remotely-sensed products are a good way to motivate DA applications and we will add this to the section.
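The referee's remark that an EnKF can adjust unobserved subsurface layers through the ensemble covariance can be illustrated with a minimal sketch. This is not the PDAF or DART implementation. It assumes a two-layer state, a single observation of the upper layer, and a stochastic (perturbed-observation) EnKF; all function names and numbers are hypothetical.

```python
import random
from statistics import fmean


def demo_prior(rng, n_members):
    """Hypothetical prior: two soil layers whose SWC errors are correlated."""
    members = []
    for _ in range(n_members):
        shared = rng.gauss(0.0, 0.03)  # error common to both layers
        members.append([
            0.35 + shared + rng.gauss(0.0, 0.01),        # layer 0 (observed)
            0.40 + 0.8 * shared + rng.gauss(0.0, 0.01),  # layer 1 (unobserved)
        ])
    return members


def enkf_update_scalar_obs(ensemble, obs_index, obs_value, obs_std, rng):
    """Stochastic EnKF update for one scalar observation of state[obs_index]."""
    n = len(ensemble)
    n_state = len(ensemble[0])
    means = [fmean(m[j] for m in ensemble) for j in range(n_state)]
    # Sample covariance of every state element with the observed element.
    cov_with_obs = [
        sum((m[j] - means[j]) * (m[obs_index] - means[obs_index])
            for m in ensemble) / (n - 1)
        for j in range(n_state)
    ]
    # Kalman gain: the unobserved layer gets a nonzero gain only via covariance.
    denom = cov_with_obs[obs_index] + obs_std ** 2
    gain = [c / denom for c in cov_with_obs]
    updated = []
    for m in ensemble:
        perturbed_obs = obs_value + rng.gauss(0.0, obs_std)
        innovation = perturbed_obs - m[obs_index]
        updated.append([m[j] + gain[j] * innovation for j in range(n_state)])
    return updated


rng = random.Random(0)
prior = demo_prior(rng, 50)
posterior = enkf_update_scalar_obs(prior, 0, obs_value=0.28, obs_std=0.02, rng=rng)
```

Because the two layers are positively correlated in the prior, an observation that is drier than the prior in the upper layer also dries the lower layer, even though the lower layer is never observed.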
Line 47: Awkward sentence, It is common practice.…
Yes, we will rephrase the sentence.
Line 53: It is unclear what distinction is being made between ‘offline’ vs ‘online’ coupling in data assimilation frameworks. The authors state that ‘offline-coupled’ data assimilation is used for the Data Assimilation Research Testbed (DART) https://dart.ucar.edu/ (Anderson et al., 2009). Furthermore, in offline coupling ‘the framework wraps around the model and does not modify the model’. This is not true. Within DART – the state of the model is modified during the update step of the EnKF. Therefore, the assimilation updates the model state in time so that the trajectory of the model more closely matches the observations being assimilated. Furthermore, DART and CLM are interactive in that DART updates the model state within CLM, and the inflation parameters within DART (Gharamti et al., 2019; doi.org/10.1175/MWR-D-20-0101.1) are also updated in time and influence the ensemble spread of the CLM model state. More recent updates to the CLM code have included model components that call data assimilation components from DART directly. In applications outside of CLM, DART has been used to modify parameters (e.g. Zhang et al., 2021; doi.org/10.5194/tc-15-1277-2021.) within models as well. The authors need to reconsider their assertion that their DA coupling approach is ‘novel’.
We acknowledge that the distinction between ‘offline’ vs ‘online’ is not clear. We use the definition of these terms in this context from PDAF as formulated for example in Kurtz et al. 2016. The distinction is about using the main memory or restart files for the transfer from model states to the DA framework and vice versa. We will correct the ‘the framework wraps around the model...’ sentence. We did not mean to imply that offline coupling does not affect model states or that DART and CLM are not interactive, but rather that in offline coupling it is often not necessary to modify the model source code. We were not aware of the recent updates to CLM that include calls to DART directly, since even recent publications like Raczka et al., 2021 and Zhang et al., 2018 still mention the use of restart files for the exchange between the model and DART. We will correct the comparison to DART in this section to include your corrections. We did not intend to assert that the DA coupling approach is novel; we specifically mention that we mostly re-use and modify existing software infrastructure. We only meant that our specific implementation of coupling CLM5 and PDAF is ‘novel’.
Line 57: DART is commonly used with all components of the earth system within CESM including land (CLM), atmosphere (CAM), ocean (POP), and sea/land ice, as well as many other earth system models. See https://dart.ucar.edu/publications/
We will modify the sentence and mention the use of DART in other CESM components and other earth system models.
Line 84: “... [these manuscripts] concluded that the consideration of heterogeneous porosities can increase model performance depending on the model structure. In contrast to these detailed distributed catchment studies, we model the study site from the viewpoint of a larger regional model where the catchment is represented by a single grid cell.”
The previous modeling studies suggest that including a description of heterogeneous soil porosities will help model performance in a fine-scale catchment. Presumably a fine spatial scale description is needed to represent a catchment. Therefore it caught this reviewer off-guard that the authors propose to use a coarse, grid cell to represent catchment behavior(see Swenson et al., 2019). Perhaps provide motivation that a DA framework can improve modelling behavior through correcting for known biases in the system – or known errors in parameters.
Yes, representing a catchment in detail requires a fine spatial scale simulation. However, in many applications it is not computationally feasible to represent catchments on such fine scales. Therefore, we think it is reasonable to demonstrate the application of a new DA coupling on the coarse scale of many applications but simplified to a single grid cell to highlight the direct effects of DA in the grid cell where observations are available. We agree that we can better motivate the DA framework and we will rephrase this section.
Method section 2.1
“Furthermore, we investigate whether updating of the soil organic matter parameter via data assimilation can further improve the prediction of soil water with CLM5.”
Given your manuscript goals you need to provide some explanation within the CLM methods section of how soil organic matter influences soil water drainage. It is also slightly unclear what benefits either updating to the CLM 5 description or using the PDAF will bring to this analysis, be a bit more specific. Some of this information is included in the appendix (A1-A4), but a bit more explanation within the main text would be helpful.
May also want to mention in the methods of CLM 5 – that it updates the plant hydraulic stress representation (Kennedy et al., 2019) thereby influencing water-carbon coupling, and transpiration. The authors do not really discuss the influence of vegetation (water-carbon coupling) upon their SWC results.
We will add more details about the CLM5 soil organic matter parameter and its relation to SWC into the main text and include the relations to the plant hydraulic stress representation in CLM5.
Section 2.2.1
“For example, ensemble members can be generated based on perturbed soil parameters and atmospheric forcings.”
Not clear at this point how ensemble is generated for this experiment.
“The state vector contains soil water content (model states), sand and clay fractions (parameters), and organic matter fractions (parameters) depending on the experiment as described in Section 3.3.”
It makes sense here to describe how the CLM soil column is constructed (i.e. PFTs, columns, layers etc) within Section 2.1. Are you updating all soil layers of the CLM model for SWC?
We will add a reference to section 3.2.2 and 3.2.3 for the specifics of ensemble generation for this specific experiment. We can add details of the CLM soil column structure as you suggest. It is correct that we are updating the SWC of all soil layers of the CLM model and will clarify this in the revised version.
Section 2.3:
“Furthermore, for the optional parameter updating it is necessary to provide a function to transform the input parameters, e.g. soil texture, to the model parameters, e.g. the soil hydraulic parameters. CLM5 performs this transformation once during initialization to obtain the hydraulic parameters from the soil texture in the surface file.”
It was a bit confusing to this reviewer that the authors were referring to the soil characteristics such as clay/sand/organic matter as ‘parameters’. In general, parameters refer to numeric coefficients that influence model equations. This manuscript adjusts the soil characteristics to indirectly adjust the hydraulic parameters (A1-A4). In general, it seems parameter optimization should be limited to parameters which are difficult/impossible to measure. The soil characteristics, on the other hand, could be measured given how well the study site (watershed) seems to be observed already.
We refer to the soil characteristics as parameters because we treat them as parameters in the joint state and parameter approach. As we already describe in the outlook, we agree that the parameter optimization should be applied to the hydraulic parameters directly. Using the soil characteristics as indirect parameters to be updated has been done in various previous studies and was therefore the baseline implementation. Nevertheless, we will mention in the revised version that the soil hydraulic parameters will be updated directly in future works.
3.1 Study Site:
Very unclear how the CLM site-level or gridded simulation was setup. What was the size of the grid cell used in which the soil characteristics / topography were defined? How was this forested site initialized? Was it spun-up from near ground conditions or was a present-day compset used within CLM?
The grid cell size used was roughly 3 km by 3 km. The model was initialized from a cold start and spun up according to the CLM5 documentation. We will add more details about this in the revised version.
Section : 3.2.1
“The filtered raw data is then spatially and temporally averaged to fit the requirements of the model, i.e., daily averages for the three soil depths.”
I don’t think that’s a limitation or requirement of the model – CLM5.0 can be run on an hourly time basis thus assimilation could be performed hourly. Also there are roughly 25 subsurface potential soil layers in CLM, so it could potentially handle more soil depth observations depending upon the depth of the soil column at this location. I think you performed daily averages of all the soil observation locations to simplify the assimilation process, which is reasonable.
So you averaged all the forested (undisturbed) soil water observation locations into a single value for each depth?
You are right, it is not a requirement or limitation of the model. We wanted to say that these are the requirements we defined for this model setup. We will correct this sentence accordingly. Yes, we averaged the large number of forested soil water observation locations into a single value for each depth (at 5, 20 and 50 cm, the three depths for which observations were available) and will specify this in the revision.
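The pooling described in this reply could look like the sketch below. It assumes every sensor reading within a day carries equal weight, which the reply does not specify; the function name and the tuple layout of a reading are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def daily_depth_means(readings):
    """Average SWC readings over all sensors and all times of day.

    readings: iterable of (timestamp, depth_cm, swc) tuples, where timestamp
    is a datetime, depth_cm is 5, 20 or 50, and swc is volumetric (m3/m3).
    Returns a dict keyed by (date, depth_cm) with one mean value per key.
    """
    grouped = defaultdict(list)
    for timestamp, depth_cm, swc in readings:
        grouped[(timestamp.date(), depth_cm)].append(swc)
    return {key: fmean(values) for key, values in grouped.items()}


# Hypothetical readings from two sensor locations on the same day.
example = [
    (datetime(2015, 6, 1, 0, 0), 5, 0.30),
    (datetime(2015, 6, 1, 12, 0), 5, 0.34),
    (datetime(2015, 6, 1, 6, 0), 20, 0.38),
]
daily = daily_depth_means(example)
```

Each key in `daily` then corresponds to one observation that would be passed to the assimilation step for that day and depth.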
Line 261: Lateral flows are not represented at all in CLM5 – no grid cell to grid cell communication. Surface and subsurface drainage is routed directly to rivers.
Yes, we just meant that lateral flow in the form of runoff to rivers is represented in CLM5. We will clarify this.
Line 287: Be more specific here: Perturbed inputs of *both* atmospheric forcing and soil characteristics of soil/clay and organic matter? What was the purpose of perturbing both? Could you use only atmospheric perturbations if the goal was to only assimilate SWC observations? The additional perturbation of the soil/clay, organic matter was necessary for the parameter updates? Provide a bit more explanation.
Yes, we used perturbed inputs of both atmospheric forcings and soil characteristics. We perturb both because the perturbation determines the ensemble spread and the ensemble spread represents the model uncertainty. Model uncertainty exists for both atmospheric forcings and the model parameters, i.e., the soil hydraulic parameters indirectly determined through the soil characteristics. Even in studies without parameter updates and with only SWC observations, soil characteristics are often perturbed. We will add these explanations in the revision.
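A minimal sketch of how such a perturbed ensemble might be drawn is given below. The noise models (mean-one log-normal multiplicative noise on precipitation, additive Gaussian noise on the texture percentages) and their standard deviations are assumptions for illustration only and are not taken from the manuscript; all names are hypothetical.

```python
import math
import random


def perturb_member(precip, sand_pct, clay_pct, rng,
                   precip_sigma=0.3, texture_sigma=10.0):
    """Return one ensemble member's (precip, sand_pct, clay_pct).

    precip_sigma and texture_sigma are hypothetical spread settings.
    """
    # Log-normal factor with expected value 1, so precipitation stays >= 0
    # and is unbiased on average.
    factor = math.exp(rng.gauss(-0.5 * precip_sigma ** 2, precip_sigma))
    new_precip = precip * factor

    # Additive noise on texture, clipped to a physically plausible range.
    sand = min(99.0, max(1.0, sand_pct + rng.gauss(0.0, texture_sigma)))
    clay = min(99.0, max(1.0, clay_pct + rng.gauss(0.0, texture_sigma)))
    # Sand and clay together cannot exceed 100 %; rescale if they do.
    total = sand + clay
    if total > 100.0:
        sand, clay = sand * 100.0 / total, clay * 100.0 / total
    return new_precip, sand, clay


rng = random.Random(42)
ensemble = [perturb_member(2.5, 40.0, 25.0, rng) for _ in range(20)]
```

Perturbing only the forcing would leave every member with identical soil texture, so the spread in `ensemble` would carry no information about soil parameter uncertainty.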
Line 287: Do you state anywhere what soil water variable in CLM you are adjusting? I assume it is the prognostic variable H2OSOI_LIQ, but there is also H2OSOI_ICE and the diagnostic variable H2OSOI. Also you are adjusting all vertical layers?
We do not specifically mention the CLM5 variable names. We use the diagnostic variable H2OSOI as a state variable and then adjust both prognostic H2OSOI_LIQ and H2OSOI_ICE variables for all vertical layers. We will make this clearer in the revision.
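One way an updated diagnostic H2OSOI could be mapped back onto the two prognostic variables is sketched below. It assumes the update preserves each layer's liquid-to-ice volume ratio, which the reply does not state, and it uses densities of 1000 and 917 kg m-3 for liquid water and ice, which should be checked against the model source. Function names are hypothetical.

```python
DENH2O = 1000.0  # assumed density of liquid water, kg m-3
DENICE = 917.0   # assumed density of ice, kg m-3


def volumetric_from_masses(h2osoi_liq, h2osoi_ice, dz):
    """Volumetric water content (m3/m3) from liquid and ice mass (kg m-2)
    in a layer of thickness dz (m)."""
    return h2osoi_liq / (dz * DENH2O) + h2osoi_ice / (dz * DENICE)


def masses_from_updated_volumetric(h2osoi_new, h2osoi_liq_old, h2osoi_ice_old, dz):
    """Split an updated volumetric content into liquid and ice masses.

    Assumption: the liquid fraction of the total water volume is unchanged
    by the assimilation update.
    """
    vol_liq_old = h2osoi_liq_old / (dz * DENH2O)
    vol_ice_old = h2osoi_ice_old / (dz * DENICE)
    vol_old = vol_liq_old + vol_ice_old
    # A completely dry layer is treated as all-liquid to avoid dividing by zero.
    liq_frac = vol_liq_old / vol_old if vol_old > 0.0 else 1.0
    liq_new = liq_frac * h2osoi_new * dz * DENH2O
    ice_new = (1.0 - liq_frac) * h2osoi_new * dz * DENICE
    return liq_new, ice_new


# Hypothetical layer: 0.1 m thick, mostly liquid with a little ice.
liq, ice = masses_from_updated_volumetric(0.30, 32.0, 2.0, dz=0.1)
```

Converting `liq` and `ice` back with `volumetric_from_masses` returns the updated volumetric value, so the diagnostic and prognostic variables stay consistent after the update.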
Figure 1: Any physical explanation of why the model would overestimate SWC at shallow depth (5 cm) and at the deepest layer (50 cm), but overestimate SWC at the middle depth (20 cm)? Curious of whether this could be related to the observational uncertainty of the SWC sensor – and what was used as the observation uncertainty? Also wondering if this behavior was related to the configuration of the root profile within CLM – how much of the root mass was within this layer and therefore what influence this had upon transpiration and removal of water within this soil layer?
We currently do not have a physical explanation of this model behavior. For simplicity, the observational uncertainty is assumed to be constant and set to an RMS of 2%. We did not explore the effect of the root profile on this behavior. We will discuss this in the revision.
This opens up other questions of what the forest state was for your model simulations including things like biomass and leaf area index from the site observations. Were these reasonable? Did you look at the simulated transpiration, evapotranspiration and GPP to determine if these values seemed reasonable? I don’t think you had flux tower observations available to check, but perhaps you could infer reasonable values from surrounding sites. The vegetation state will have an important impact of subsurface soil moisture state and to what effect this impacted your simulation is unclear. The vegetation state, including how it was initialized and how it was simulated (other than the PFT setting) was not discussed in this manuscript.
We looked at LAI and evapotranspiration and they are reasonable. ET is even close to flux tower observations. We did not include a figure for ET because the effects of SWC DA for this specific site were not significant and more analysis is required. We are currently in the process of performing more simulations for different sites and analyzing the effects of SWC DA on ET and other variables in a further study. For this study, we mainly wanted to demonstrate the coupling of PDAF to CLM5 and the direct and clear impact of SWC DA for a simple setup. We will discuss this in the revision.
Table 3: It was not completely clear until I viewed this table that the model ‘parameters’ that were being adjusted within the assimilation were actually the soil characteristics of clay/sand and organic matter. The term ‘parameter’ is admittedly loosely defined in modeling applications, but in general, this typically refers to ‘coefficient’ values within the model code that (within a model like CLM) are specific for particular plant functional types. The surface characteristics of the soil, however, are typically prescribed and held constant. The reviewer recognizes that this manuscript is, in part, a demonstration of the capabilities of the assimilation system, and is apparently following the approach taken in (Naz et al., 2019) but physically, does it make sense to adjust the soil characteristics (generally fixed in time) such that they change with time? Would it not make more sense to adjust the numeric coefficients in equations A1-A4 instead of %sand and %clay? The authors acknowledge this at the very end of the conclusion section, but perhaps more justification could be provided earlier on in the manuscript.
If there were many soil moisture subsurface observations, were any soil characteristic observations available to check the posterior values of the soil characteristics?
As mentioned in the previous question, we will be more careful with the terminology “soil characteristics” and “soil parameters”. We used the term ‘parameter’ more generally as a distinction to state variables rather than to differentiate between coefficients and prescribed constants. We fully agree that it makes more sense to adjust the coefficients in A1-A4 directly. The indirect approach using the soil characteristics was an established approach, but in future work we will adjust the numerical coefficients directly. As you suggest, we will mention this earlier in the manuscript and not just in the conclusions. There are soil characteristic observations at different locations in the catchment, but it is not a simple task to ‘average’ these discrete spatially distributed observations and compare them to the posterior soil characteristic value that represents the whole catchment / grid cell. We will discuss this in the revision.
Citation: https://doi.org/10.5194/gmd-2021-38-AC1
-
AC1: 'Reply on RC1', Lukas Strebel, 04 Oct 2021
-
RC2: 'Comment on gmd-2021-38', Anonymous Referee #2, 29 Sep 2021
The study presents the development of PDAF coupling with CLM version 5.0 for data assimilation. Further, the authors present a 10-year sensitivity simulation, using a single column model over a forested catchment in Germany. The simulation includes data assimilation (DA) with and without parameter updates. Compared to open loop runs, which exhibit a wet bias, the DA could improve the soil water content (especially at 5 cm and 20 cm) compared to the deeper layer (50 cm).
This study additionally provides a new DA capability to the larger CLM scientific community and also shows its potential for improving the model states by inclusion of joint parameter updates. However, there are several shortcomings in the present manuscript that need to be addressed before it is suitable for publication in GMD.
Major Comments
The motivation for this study is weak. The authors briefly mention the difference between online and offline DA (Ln 55), but they need to better motivate the coupling of CLM5.0 with PDAF. Is it more for the standalone DA with CLM5.0 or for CLM5.0 within the TSMP framework? What new does PDAF bring? How does it reduce the number of core-hours or computation time compared to other offline DA? And how does it scale with increasing domain size and simulation period? This needs to be discussed clearly.
Kurtz et al. (2016) already presented the PDAF coupling to TSMP including CLM3.5. So, what is new in this study? I assume that there must be substantial work involved in developing the PDAF interface around CLM5.0, which has a different software environment compared to earlier versions of CLM (e.g. CLM3.5). But it is not so clear in the current version of the manuscript.
Ln 85: This comes so suddenly. The authors need to provide better motivation to use single column model.
The literature review is another weak part of the manuscript. The authors make no effort in presenting their results in the context of previous findings. Also, does the improvement in soil moisture also improve the surface energy fluxes? For LSMs, improvements need to be explored for soil states as well as fluxes. And a discussion section is missing.
There is no README file or User manual to reproduce the results presented in this study, also please provide a web URL for Zenodo and cite this paper in the References. The upload should also include scripts for processing the figures and observation data for reproducibility.
Minor Comments
- Ln 10: Even tuned second generation LSMs can be “accurate”, here maybe the authors want to imply that third generation LSMs better represent the key physical processes. Also, check in the rest of the manuscript.
- Ln 11: more? What type of data?
- Ln 15: Is this further development of PDAF or addition of new interface to connect PDAF with new models?
- Ln 34: common might not be the right word here.
- Ln 48-53: This paragraph needs to be rephrased (framework, external framework, within framework). It has just too many frameworks.
- Ln 70: PDAF with joint state parameter update for CLM was also used in the following study:
Shrestha, P., W. Kurtz, G. Vogel, J.-P. Schulz, M. Sulis, H.-J. Hendricks Franssen, S. Kollet and C. Simmer (2018), Connection Between Root Zone Soil Moisture and Surface Energy Flux Partitioning Using Modeling, Observations, and Data Assimilation for a Temperate Grassland Site in Germany. JGR-Biogeosciences doi: 10.1029/2016JG003753
- Ln 73: “In this study, we present the coupling of ..”
- Ln 93: Rephrase. “The paper ends with “ is not appropriate.
- Ln 116: 1) variation methods, …2) sequential methods
- Ln 125: Perturbation vector missing in Eq. 1, where y is generally the observation vector. It is discussed much later in Ln 146. What is the measurement error?
- Section 2.3: There is always a discussion about older version, maybe the authors should discuss it before, and present their new formulation, rather than interchanging now and then. Maybe this would also highlight, what new work has been done.
- Ln 181: The “Figure 1” is not helpful, either improve or remove. Also, rephrase and elaborate the discussion.
- Ln 191: What is “CIME”?
- Ln 204: Maybe “clipping” ?
- Ln 218: Rephrase.
- Ln 232: in Wüstebach , and Belgium ?
- Ln 252: Explain the SWC unit.
- Ln 305: “to overestimate SWC” or “wet bias in SWC”
- Ln 333: What is variant here?
- Figures: Add subplot numbers (e.g., a), b))
- Figure 2: “In the diagram NMLST means namelist, SIM means simulation process, HIST means history file output, PID means PDAF identification number.” – this should appear as a legend in the figure.
- Figure 3 caption: red (solid line), light green (dotted line).
Citation: https://doi.org/10.5194/gmd-2021-38-RC2
-
AC2: 'Reply on RC2', Lukas Strebel, 26 Oct 2021
Scientific/Detailed Comments: Referee comments in bold and author answer non-bold.
Major Comments
The motivation for this study is weak. The authors briefly mention the difference between online and offline DA (Ln 55), but they need to better motivate the coupling of CLM5.0 with PDAF. Is it more for the standalone DA with CLM5.0 or for CLM5.0 within the TSMP framework? What new does PDAF bring? How does it reduce the number of core-hours or computation time compared to other offline DA? And how does it scale with increasing domain size and simulation period? This needs to be discussed clearly.
It is both for the standalone DA with CLM5, as shown in this study, and also the potential for future use in the complete TSMP framework. We intended to motivate both in the introduction, but we will improve this with comments below and comments from other referees. We did not perform a computational performance comparison to other DA frameworks for this specific application. General scaling behavior of PDAF has been shown in Nerger et al. (2013) and Kurtz et al. (2016). Without extensive computational studies, we do not want to discuss advantages and disadvantages of different DA frameworks in detail. Instead, we want to focus on the specific implementation and application of a new coupling that can be used to perform DA with CLM5.
Kurtz et al. (2016) already presented the PDAF coupling to TSMP including CLM3.5. So, what is new in this study? I assume that there must be substantial work involved in developing the PDAF interface around CLM5.0, which has a different software environment compared to earlier versions of CLM (e.g. CLM3.5). But it is not so clear in the current version of the manuscript.
The new developments in this study are the modifications to what Kurtz et al. (2016) presented. These modifications are necessary to interface with CLM5.0 which, as you mentioned, has a different software environment compared to earlier versions. We discuss the implementation and differences in section 2.3 in detail, but we will make the differences clearer and highlight the new developments more in the revised version.
Ln 85: This comes so suddenly. The authors need to provide better motivation to use single column model. The literature review is another weak part of the manuscript. The authors make no effort in presenting their results in the context of previous findings. Also, does the improvement in soil moisture also improve the surface energy fluxes? For LSMs, improvements need to be explored for soil states as well as fluxes. And a discussion section is missing.
We provide references to studies using single-point simulations (ln 35-43) to motivate our choice of a single grid cell setup and also discuss other studies for the specific study site (ln 79-85). However, we will make the motivation clearer in the revised version.
We provide context for this study in the literature review for single-point studies (ln 35-43), for data assimilation in LSMs (ln 59-73), for the specific software framework (ln 73-77), and for the specific site (ln 79-85). The first reviewer has already pointed out some additional studies that we will include in our literature review. We will also add more literature references and make the respective contexts clearer in the revised version.
We will include results and discussion for the changes to ET in the revised version. We included the contents of a discussion section in the conclusion section. We will make this clear in the revised version.
There is no README file or User manual to reproduce the results presented in this study, also please provide a web URL for Zenodo and cite this paper in the References. The upload should also include scripts for processing the figures and observation data for reproducibility.
We will create a README, and add scripts for processing and observation data in the revised version.
Minor Comments
Ln 10: Even tuned second generation LSMs can be “accurate”, here maybe the authors want to imply that third generation LSMs better represent the key physical processes. Also, check in the rest of the manuscript.
Yes, we intended to stress the improvements in the representation of physical processes. We will modify the sentence to make this clear.
Ln 11: more? What type of data?
Various types, from new satellite products to new in-situ measurement stations, also new cosmic-ray and flux tower sites. We will add this information in the revised version.
Ln 15: Is this further development of PDAF or addition of new interface to connect PDAF with new models?
It is the addition of a new interface to connect PDAF with a new model. We will correct this sentence in the revised version.
Ln 34: common might not be the right word here.
Common as in ‘often used’ but we will change the sentence to make this clearer.
Ln 48-53: This paragraph needs to be rephrased (framework, external framework, within framework). It has just too many frameworks.
We will rephrase the paragraph with fewer ‘frameworks’.
Ln 70: PDAF with joint state parameter update for CLM was also used in the following study: Shrestha, P., W. Kurtz, G. Vogel, J.-P. Schulz, M. Sulis, H.-J. Hendricks Franssen, S. Kollet and C. Simmer (2018), Connection Between Root Zone Soil Moisture and Surface Energy Flux Partitioning Using Modeling, Observations, and Data Assimilation for a Temperate Grassland Site in Germany. JGR-Biogeosciences doi: 10.1029/2016JG003753
We will include this reference as another example of joint state parameter update with PDAF and CLM3.5 in the revised version.
Ln 73: “In this study, we present the coupling of ..”
We will rephrase the sentence.
Ln 93: Rephrase. “The paper ends with “ is not appropriate.
We will correct the sentence in the revised version.
Ln 116: 1) variation methods, ...2) sequential methods
The reference we cite (Reichle 2008) calls them ‘variational methods’.
Ln 125: Perturbation vector missing in Eq. 1, where y is generally the observation vector. It is discussed much later in Ln 146. What is the measurement error?
We will move the inclusion of the perturbation to the observation vector closer to Eq. 1. For simplicity, the measurement error is assumed to be constant and set to an RMS of 2%. We will mention this in the revised version.
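The role of the perturbation and the constant measurement error can be illustrated with a minimal sketch of a perturbed-observation ensemble Kalman update for a single, directly observed soil water content value. This is not the PDAF implementation or Eq. 1 itself; the function name, the ensemble size, the example values, and the reading of "2%" as a standard deviation of 0.02 in volumetric units are all assumptions made for illustration.

```python
import random
import statistics


def enkf_update_scalar(ensemble, obs, obs_std, rng):
    """Perturbed-observation EnKF update for one directly observed state."""
    var_forecast = statistics.variance(ensemble)  # sample forecast variance
    var_obs = obs_std ** 2                        # constant measurement error variance
    gain = var_forecast / (var_forecast + var_obs)  # scalar Kalman gain
    updated = []
    for member in ensemble:
        # Each member assimilates its own perturbed copy of the observation
        perturbed_obs = obs + rng.gauss(0.0, obs_std)
        updated.append(member + gain * (perturbed_obs - member))
    return updated


rng = random.Random(42)
# Hypothetical wet-biased forecast ensemble of SWC [cm3/cm3]
forecast = [rng.gauss(0.40, 0.03) for _ in range(50)]
analysis = enkf_update_scalar(forecast, obs=0.32, obs_std=0.02, rng=rng)
```

Perturbing the observation separately for each member keeps the analysis ensemble spread consistent with the assumed measurement error, which is why the perturbation vector belongs next to the update equation.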
Section 2.3: There is always a discussion about older version, maybe the authors should discuss it before, and present their new formulation, rather than interchanging now and then. Maybe this would also highlight, what new work has been done.
We compare to the coupling with the older version of CLM three times in this section: 1) In the section about the difference in time stepping between CLM5 and TSMP. Here the comparison is not strictly necessary, but highlights that the approach to modify the driver is the same as before even if the software environment has changed significantly. 2) To point to the changes in CLM5 hydraulic parameter calculations, which include the new changes with the addition of soil organic matter. 3) To mention that the more complex software environment motivates the modification of the existing CLM5 ensemble mode.
Other comparisons in the section are not to older versions but to the framework of TSMP or PDAF specifically. We think it would be less useful to separate any comparisons, since they are mostly used to give context to new implementations. Nevertheless, we will highlight more clearly the new work that has been done in the revised version.
Ln 181: The “Figure 1” is not helpful, either improve or remove. Also, rephrase and elaborate the discussion.
We use ‘Figure 1’ as a visual aid to describe the structure of both the actual implementation and the paragraphs in the section. We will improve the figure by adding more context to it.
Ln 191: What is “CIME”?
CIME is the default CLM5.0 driver. We will add a definition for CIME to the revised version.
Ln 204: Maybe “clipping” ?
We will correct the sentence in the revised version.
Ln 218: Rephrase.
We will rephrase the sentences.
Ln 232: in Wüstebach , and Belgium ?
We will correct the sentence.
Ln 252: Explain the SWC unit.
We will add a definition for the volumetric soil water content.
Ln 305: “to overestimate SWC” or “wet bias in SWC”
We will rephrase the sentence.
Ln 333: What is variant here?
The online variant to differentiate from the offline variant of PDAF as discussed in the introduction and the next paragraph. We will clarify this in the text.
Figures: Add subplot numbers (e.g., a), b))
We will add subplot numbers.
Figure 2: “In the diagram NMLST means namelist, SIM means simulation process, HIST means history file output, PID means PDAF identification number.” – this should appear as a legend in the figure.
We will add a legend with the shorthand.
Figure 3 caption: red (solid line), light green (dotted line).
We will correct the formatting.
Citation: https://doi.org/10.5194/gmd-2021-38-AC2