the Creative Commons Attribution 4.0 License.
LandInG 1.0: A toolbox to derive input datasets for terrestrial ecosystem modelling at variable resolutions from heterogeneous sources
Christoph Müller
Jens Heinke
Sibyll Schaphoff
Abstract. We present the Land Input Generator (LandInG) version 1.0, a new toolbox for generating input datasets for terrestrial ecosystem models (TEM) from diverse and partially conflicting data sources. While LandInG 1.0 is applicable to process data for any TEM, it is developed specifically for the open-source dynamic global vegetation, hydrology and crop growth model LPJmL (Lund-Potsdam-Jena with managed Land).
The toolbox documents the sources and processing of data to model inputs and allows for easy changes to the spatial resolution. It is designed to make inconsistencies between different sources of data transparent, so that users can make their own decisions on how to resolve these, should they not be content with the default assumptions made here.
As an example, we use the toolbox to create input datasets at 5 and 30 arc minutes spatial resolution covering land, country, and region masks, soil, river networks, freshwater reservoirs, irrigation water distribution networks, crop-specific annual land use, fertilizer, and manure application. We focus on the toolbox describing the data processing rather than only publishing the datasets as users may want to make different choices for reconciling inconsistencies, aggregation, spatial extent or similar. Also, new data sources or new versions of existing data become available continuously and the toolbox approach allows for incorporating new data to stay up-to-date.
Sebastian Ostberg et al.
Status: closed
-
CC1: 'Comment on gmd-2022-291', Jinfeng Chang, 24 Jan 2023
Publisher’s note: this comment is a copy of RC1 and its content was therefore removed.
Citation: https://doi.org/10.5194/gmd-2022-291-CC1 -
RC1: 'Comment on gmd-2022-291', Jinfeng Chang, 24 Jan 2023
This is a comprehensive manuscript that describes a toolbox for generating commonly used input datasets for terrestrial ecosystem modelling at two spatial resolutions, 5’ and 30’. The generated datasets include static inputs such as the land-sea mask, country and region masks, soil texture and pH, river routing, and grid locations of lakes, rivers, dams and reservoirs, as well as dynamic inputs in the form of harmonized gridded annual land use and land management (irrigation and fertilization) for the historical period 1500-2017. The application of this toolbox for generating input datasets for LPJmL is presented as an example. The manuscript is well structured and very well written. I think it is a valuable effort to facilitate input generation. I only have a few suggestions, as follows.
- Given that 1) most of the source datasets used in this toolbox have a highest resolution of 5 arc minutes, 2) the spatial resolution of a TEM simulation usually (if not always) depends on the coarsest resolution among all input datasets, and 3) in many cases in this toolbox, aggregation can only be done to an integer multiple of the source resolution, it would be better to state the possible resolutions for each of the input datasets.
- For all the datasets, it is essential to provide not only the reference, but also the link to the source dataset, the access date (as datasets can be updated), the original data format, and the data content (e.g., the exact variable name used by the toolbox). Otherwise, the toolbox becomes much more difficult to use and less useful for users.
- It is understandable that the authors did not provide the created gridded maps, as they might contain source datasets that require a license to publish. But for those datasets that are publicly available and licensed for redistribution, it would be better to provide the resulting maps in addition to the code. For the resulting gridded land use and land management dataset, in particular, the strategy described in this manuscript is in some sense novel (or at least comprehensively described for the first time). Though the authors state that this manuscript is solely a description of the toolbox, putting the gridded land use and land management dataset into a public repository could be useful for the community.
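The integer-multiple constraint raised in the first suggestion can be illustrated with a minimal sketch. This is not code from LandInG; the function names `aggregation_factor` and `aggregate_grid` are hypothetical, and the sketch assumes the gridded variable is an extensive quantity (such as area) that can simply be summed over source cells.

```python
def aggregation_factor(source_arcmin, target_arcmin):
    """Return how many source cells fit along one side of a target cell.

    Hypothetical helper: raises ValueError if the target resolution is
    not an integer multiple of the source resolution.
    """
    ratio = target_arcmin / source_arcmin
    if ratio < 1 or abs(ratio - round(ratio)) > 1e-9:
        raise ValueError(
            f"{target_arcmin}' is not an integer multiple of {source_arcmin}'"
        )
    return int(round(ratio))


def aggregate_grid(values, factor):
    """Sum a 2-D grid (list of rows) into blocks of factor x factor cells.

    Assumes an extensive variable such as area, so block sums are meaningful.
    """
    nrows, ncols = len(values), len(values[0])
    if nrows % factor or ncols % factor:
        raise ValueError("grid dimensions must be divisible by the factor")
    result = []
    for r in range(0, nrows, factor):
        row = []
        for c in range(0, ncols, factor):
            # Add up all source cells that fall into this target cell.
            block_sum = sum(
                values[rr][cc]
                for rr in range(r, r + factor)
                for cc in range(c, c + factor)
            )
            row.append(block_sum)
        result.append(row)
    return result
```

Under these assumptions, going from 5’ to 30’ corresponds to a factor of 6, whereas a target of 12’ would be rejected because 12 is not an integer multiple of 5.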
Citation: https://doi.org/10.5194/gmd-2022-291-RC1 -
RC2: 'Comment on gmd-2022-291', Anonymous Referee #2, 09 Feb 2023
Review of “LandInG 1.0: A toolbox to derive input datasets for terrestrial ecosystem modelling at variable resolutions from heterogeneous sources” by Sebastian Ostberg, Christoph Müller, Jens Heinke, and Sibyll Schaphoff.
In their manuscript, Ostberg et al. describe the recently published version 1.0 of LandInG, focusing on the generation of detailed input data sets for TEMs and describing the algorithms used to derive inputs for LPJmL.
Overall this is a superb manuscript nearly ready for publication. Having used the LPJmL model more than a decade ago, I am thrilled by the improvement in the quality of documenting the input data sets, potentially extendable to other TEMs as well.
The manuscript could well be published as it is, though I have a few suggestions for minor improvements.
One thing I missed when reviewing the manuscript is a table summarizing the input data sets considered, preferably early in section 2. This would allow the reader to gain a quick overview.
I also found some of the highly detailed sections of the manuscript, for example section 2.5.3, somewhat more difficult to follow than other sections of the manuscript. This was largely due to the necessary level of detail, and improvement would be difficult – the authors need to be aware of it, though. One possible improvement might be to enable the reader to still understand the bulk of the document without needing to go into the sections dealing with the details, on the one hand by indicating which sections are safe to skip, and on the other hand by ensuring that no important information is lost to the reader skipping those sections. However, this may be more effort than it is worth, so I leave it to the authors to decide.
I thank the authors for the care taken in copy-editing, as there were nearly zero spelling or grammar errors to be found in the manuscript, something a reviewer unfortunately cannot take for granted.
Finally, the reader subconsciously expects to see some maps. Is there really nothing worthwhile showing in map form? Maybe the authors find one or two examples from section 3 that illustrate current capabilities or improvements in comparison to previous approaches? Being unfamiliar with the exact output, I cannot make specific suggestions, however I did wonder what difference the choice in land-sea mask makes in comparison to CRU, and in a number of places I would have liked to see changes in comparison to Schaphoff18, but I know most of those maps will be rather boring due to the few changed points and small magnitude of changes, so I again leave it to the authors to decide.
Citation: https://doi.org/10.5194/gmd-2022-291-RC2 -
RC3: 'Comment on gmd-2022-291', Anonymous Referee #3, 17 Feb 2023
General comments
The manuscript by Ostberg et al. presents a tool aiming at harmonizing an ensemble of heterogeneous datasets in order to generate consistent input data for terrestrial ecosystem models. The data considered in the study are very diverse, including information relative to land-sea mask, river routing, dams, land use, nitrogen fertilizers, ...
In addition to the description of the methods developed, the manuscript reports on the application of the tool at two spatial resolutions (5’ and 30’) and evaluates how the data generated through the harmonization procedure compare to the raw data and/or to other reference datasets. Both sections dedicated to the description of the tool (section 2) and to its evaluation (section 3) are relevant and highly detailed.
All land modelling groups face the problem of harmonizing raw input data. This is done within the model or externally, but often quickly and in a more or less clean way, and is rarely, if ever, documented. In this respect, the tool developed by Ostberg et al. is very welcome for the land modelling community, and the effort put into accurately describing and documenting the tool in the manuscript should be acknowledged.
The manuscript fits well into the scope of the journal GMD (although ESSD, another Copernicus journal, could also have been considered in my opinion) and the description of LandInG certainly merits publication. However, some rewriting or reshaping should be considered to ease the reading of the manuscript. Below are some points which could be improved in my opinion.
Specific comments
LandInG has been used for LPJmL but is generic enough to be used for other TEMs, possibly with some changes. Reading the manuscript gives the impression that the authors do not want to provide too much LPJmL-specific information about input requirements because of the genericity of the tool. As a consequence, it is not clear which input data are needed by LPJmL and need to be generated by the tool. In my opinion, the manuscript would gain from acknowledging that the toolbox has so far been used for LPJmL input data and from basing the description of the methods solely on this model. This does not take anything away from the genericity of the tool. The description of the LPJmL input data could be made clearer at the beginning of each subsection (2.2, 2.3, 2.4.1, 2.4.2, ...). This is a detail, but if focusing only on the LPJmL context, formulations such as “TEMs such as LPJmL” (lines 93, 259, 271, 274) could be removed from section 2.
A specific section relative to the Application of the LandInG toolbox for other TEMs could be envisaged at the end of the manuscript or at the end of section 2. It would lighten the description of the method while gathering all the information about the genericity and possible further development envisaged to gain in genericity, in a specific section. The attempt is not to list all input data needed by any TEM, but to identify some of them for which an update of LandInG would be needed. For instance, one about the description of natural vegetation, which is computed internally in LPJmL but often prescribed in many models (this point is mentioned lines 275-280). Another feature may concern the soil information, since some models (see for instance Chaney et al., 2018 https://doi.org/10.5194/hess-22-3311-2018) start setting soil properties at the tile level and not only at the grid cell level.
Some information on the general rationale behind all the data processing would be useful prior to describing the different steps taken to generate input data for LPJmL (sections 2.1 to 2.5). For example, in section 2.5 on “land use and land management”, subsection 2.5.1 focuses on “country level source data” but the authors do not first explain why such country-scale data are needed. In my opinion, the authors should explain in a few sentences in the first paragraph of section 2.5, before subsection 2.5.1, that some data provide information for hundreds of crop types but only at country scale (FAOSTAT), while other data are gridded but provide only total cropland area, for instance (HYDE). The authors want to take advantage of both. I think explaining this kind of rationale first, prior to going into all the details of the data processing, would be useful to the reader, not only for the data on “land use and land management” but for any kind of data (from section 2.1 to 2.5).
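The rationale of combining country-scale crop statistics with gridded total cropland can be made concrete with a deliberately simple sketch. It is an illustrative assumption, not the algorithm implemented in LandInG: the hypothetical function `allocate_country_crop` spreads one country-level crop area over that country's grid cells in proportion to each cell's total cropland.

```python
def allocate_country_crop(country_crop_area, cell_cropland):
    """Distribute a country-level crop area over grid cells.

    country_crop_area: crop area reported for the whole country
        (in the spirit of a country statistic; values here are made up).
    cell_cropland: dict mapping a cell identifier to the gridded total
        cropland area in that cell (same area unit as above).

    Illustrative proportional allocation only; a real harmonization has to
    deal with conflicts between sources, multiple crops, and data gaps.
    """
    total_cropland = sum(cell_cropland.values())
    if total_cropland == 0:
        # No gridded cropland to allocate to; return zeros for all cells.
        return {cell: 0.0 for cell in cell_cropland}
    return {
        cell: country_crop_area * area / total_cropland
        for cell, area in cell_cropland.items()
    }
```

For example, with a country total of 60 area units and two cells holding 10 and 30 units of cropland, this sketch assigns 15 and 45 units respectively, so the gridded pattern follows the cropland map while the country sum matches the statistic.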
Although all the content of section 3 is of value, this section is quite long and is not always easy to read. I would suggest shortening it and limiting it to the key results on the application of the tool at 5’ and 30’ resolution. If needed, part of the material and of the results could be moved to Appendices. Similarly, the section on ‘Technical notes’ could in my opinion be moved to an Appendix. If not using Appendices, I would encourage the authors to add an additional subsection level (3.X.X.X) in order to better structure this section and facilitate its reading.
In the Introduction section (line 22), the authors should add information on the objectives of the toolbox and of the manuscript before detailing what the sections contain. The name of the toolbox, whether in short form (LandInG) or long form (Land Input Generator), is not even mentioned in the Introduction. Regarding the objectives, the abstract is more informative than the introduction itself.
Minor comments / Technical corrections
In my opinion, the naming of section 2 and its subsections should relate to the data processing and harmonization procedures developed for generating the input data of LPJmL, and not to the raw data used, as is the case in the current manuscript.
Line 44: “do not seem to contain any land”. Could you be more affirmative?
Line 178: I would suggest replacing “land” by “other land categories”
Line 197: could you specify what the elevation above sea level is used for in LPJmL?
Line 198-199 : could you detail what are the data from GranD you used in the toolbox before mentioning (lines 200-201) that data without storage capacity or reservoir area are removed.
Figure 1, page 11: Could you specify on top of the figure, what are the data you use from the different datasets (GAEZ, HYDE, MON, ...), below the green and blue boxes.
Lines 322-323: “Since the cropland assumptions underlying MON differ from the HYDE cropland used here”. Could you say more on this?
Line 323: “total” instead of “global” ?
Line 657: move the reference to Table 4 after “135 × 10^6 km2”
Line 657: could you give a reference for the estimation of “3% of ocean” in the land area estimate?
Legend of Figure 2: “Dashed lines indicate the part of total and irrigated harvested areas that are based on gap-filling or extrapolation of country-level source data.” This is not so easy to visualize, and it is hard to understand why some lines are vertical while another covers the entire light blue area. The dashed lines on the dark blue are almost invisible.
Line 863-864: “The toolbox designates any cropland area that exceeds the sum of crop-specific growing areas in a grid cell as fallow land.” I think this information should be moved in section 2 where you describe the toolbox itself.
Figure 3: Y-axis unit, replace “TgN” by “TgN yr-1”
Line 954: “Teragrams N per year (TgN yr-1)” instead of “Teragrams (Tg)”
Line 955: “TgN yr-1” instead of “Tg” (the same lines 958 and 962)
Figure 4: add “N” in the x-axis unit “(kgN/ha)”
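The fallow-land rule quoted in the comment on lines 863-864 can be written as a small calculation. The sketch below is only an illustration of the quoted sentence; the function name `fallow_area`, the crop names, and the choice to clip negative differences to zero are assumptions, not details taken from the toolbox.

```python
def fallow_area(cropland_area, crop_areas):
    """Return the cropland area not covered by crop-specific growing areas.

    cropland_area: total cropland in a grid cell.
    crop_areas: dict mapping crop name to its growing area in that cell
        (same area unit as cropland_area).

    Assumption: if the crop-specific areas add up to more than the
    cropland, the fallow area is reported as zero rather than negative.
    """
    cropped = sum(crop_areas.values())
    return max(cropland_area - cropped, 0.0)
```

With 100 units of cropland and crops occupying 60 and 25 units, the remaining 15 units would be designated as fallow under this reading of the rule.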
Citation: https://doi.org/10.5194/gmd-2022-291-RC3 -
AC1: 'Response to referee comments on gmd-2022-291', Sebastian Ostberg, 31 Mar 2023
Model code and software
Code for LandInG v.1.0 sample application at 5 arc-minute and 30 arc-minute resolution Ostberg, Sebastian https://doi.org/10.5281/zenodo.7371650