CLIMFILL: A Framework for Intelligently Gap-filling Earth Observations
- ETH Zürich, Rämistrasse 101, 8092 Zürich, Switzerland
Abstract. Earth observations have many missing values. Their abundance and often complex patterns can be a barrier for combining different observational datasets and may cause biased estimates. To overcome this, missing values in geoscientific data are regularly infilled with estimates through univariate gap-filling techniques such as spatio-temporal interpolation. However, these mostly ignore valuable information that may be present in other dependent observed variables. Here we propose CLIMFILL, a multivariate gap-filling procedure that builds up upon simple interpolation by additionally applying a statistical imputation method that is designed to account for dependence across variables. In contrast to popular up-scaling approaches, CLIMFILL does not need a gap-free gridded "donor" variable for gap-filling. CLIMFILL is tested using gap-free ERA5 re-analysis data of ground temperature, surface layer soil moisture, precipitation, and terrestrial water storage to represent central interactions between soil moisture and climate. These observations were matched with corresponding remote sensing observations and masked where the observations have missing values. CLIMFILL successfully recovers the dependence structure among the variables across all land cover types and altitudes, thereby enabling subsequent mechanistic interpretations. Soil moisture-temperature feedback, which is underestimated in high latitude regions due to sparse satellite coverage, is adequately represented in the multivariate gap-filling. Univariate performance metrics such as correlation and bias are improved compared to spatiotemporal interpolation gap-fill for a wide range of missing values and missingness patterns. Especially estimates for surface layer soil moisture profit taking into account the multivariate dependence structure of the data. 
The framework allows tailoring the gap-filling process to different environmental conditions, domains, or specific use cases and hence can be used as a flexible tool for gap-filling a large range of remote sensing and in situ observations commonly used in climate and environmental research.
Verena Bessenbacher et al.
Status: closed
- CEC1: 'Comment on gmd-2021-164', Astrid Kerkweg, 21 Jul 2021
Dear authors,
in my role as Executive editor of GMD, I would like to bring to your attention our Editorial version 1.2: https://www.geosci-model-dev.net/12/2215/2019/
This highlights some requirements of papers published in GMD, which is also available on the GMD website in the ‘Manuscript Types’ section: http://www.geoscientific-model-development.net/submission/manuscript_types.html
In particular, please note that for your paper, the following requirement has not been met in the Discussions paper:
- "The main paper must give the model name and version number (or other unique identifier) in the title."
Please add a version number for CLIMFILL in the title upon your revised submission to GMD.
Yours,
Astrid Kerkweg
- AC1: 'Reply on CEC1', Verena Bessenbacher, 21 Oct 2021
- RC1: 'Comment on gmd-2021-164', Anonymous Referee #1, 18 Aug 2021
Review of « CLIMFILL: A Framework for Intelligently Gap-filling Earth Observations » by V. Bessenbacher et al.
This manuscript addresses an important problem: the gap-filling of global observations and the generation of continuous spatial and temporal data. It is well written and the methodology is mostly clear. That said, I have some reservations regarding the justification of the methodology and the validation approach. These are detailed below.
Major comments:
- The method is well described, but involves a number of modeling choices (initial interpolation method, clustering approach, random forest estimation and averaging) that are not always justified, except by the experimental results showing that “it works”. The problem is that I cannot be sure that it works with the current benchmarking. Indeed, the proposed method is compared only against the interpolation of step 1 of the proposed method itself. In my opinion this is not a sufficient benchmark. While it shows that steps 2-4 do have some added value compared to the extremely simple interpolation of step 1, added value against other interpolation methods is not demonstrated. By construction, it is expected that steps 1-4 perform better than step 1 alone. I suggest demonstrating the performance of the proposed approach by comparing it against something slightly more sophisticated, and already known to work in such contexts, for example (co-)kriging, possibly with a separate variogram model in each of the clusters defined in step 3.
- The introduction stresses, with reason, that the reproduction of the dependencies between variables is critical, and that these dependencies are complex. However, the evaluation metric used relies on the assumption that these distributions are Gaussian, whereas it is clearly not the case, as seen in figure 6. Instead of eq. 2, I suggest using a metric that considers a numerical description of the joint distribution, such as for example the Jensen-Shannon divergence (or many other possible divergences available in the literature). Applied to the distributions in figure 6, the computational cost would be minimal.
- Some elements in figure 6 do not allow me to fully evaluate the results of the proposed method. I see significant differences between the distributions, e.g. in d) there is an important bias towards values of soil moisture around 0.3, which seems more important than in c). More generally, the distribution in d) looks globally like c) but smoothed (comparable to the smudging effect of adding random noise). Similarly, in figure A1 there are important artifacts in the reproduction of the marginal distributions by CLIMFILL, which imply that the joint distribution is also inaccurate. Visually interpreting these effects is difficult because the joint distributions are presented as histograms, with counts of data instead of probability densities. As a result, the integral of the joint distributions is not the same, especially for b). These histograms should be normalized by their integrals to reflect probabilities rather than counts.
- In figure 11 as well as in the supplementary figures, I cannot see that CLIMFILL is systematically better than the simple interpolation. A quantitative assessment might help highlight such differences. Why not use the same regions in figure 12 as in figure 11? Is the focus on randomly chosen regions or on the areas of larger discrepancies?
- While I see the logic in separating the interpolation of a global trend (step 1) and detailed data-driven smaller scale features (steps 2-4), step 1 is a spatial KNN, which inherently assumes smoothness. This is a modeling decision having implications that are not evaluated. For example, the distributions in figure 6c and 6d present features that are absent from the dataset used to interpolate from (figure 6b). It means that some unobserved statistical properties have been created. It seems to me that the large peak in figure 6c and its smoother version in figure 6d are typical of nearest-neighbor algorithms that propagate a single nearest value far from observations.
- The interpolation approach is largely driven by the large number of features, which is fine and quite usual, but this means that it may not perform well in case of large gaps, or when some of these covariates are unknown. How does it perform when no covariates are present (or only e.g. topography and lat/lon)? Furthermore, it is mentioned in l. 155 and below that a shortcoming of existing gap-filling approaches is that they heavily rely on covariates and not on spatial relationships. As I understand it, CLIMFILL also relies largely on covariates (steps 2 and 3), and very little on spatial relationships (spatial dependence is only considered in step 1, and in a very loose way as through a nearest neighbor approach).
- One problem I see with the proposed approach is that it does not consider or attempt to quantify uncertainty. The values in the middle of a large gap are given with the same confidence as for a single pixel gap. Similarly, the uncertainty should be larger when few covariates are present. Furthermore, on l.232 it is mentioned that the different clusterings obtained (which may in some sense convey a sense of variability) are averaged, thus collapsing any uncertainty into a mean value.
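A minimal sketch of the kind of metric suggested in the second major comment, written in Python with the standard library only. It is not taken from the manuscript or from the CLIMFILL code: the function names `hist2d_prob` and `js_divergence`, the synthetic soil moisture and temperature values, and the bin ranges are all assumptions made for illustration. Both joint histograms are normalized to probabilities on a shared grid before the Jensen-Shannon divergence is computed, which also addresses the point about counts versus probabilities.

```python
import math
import random


def hist2d_prob(xs, ys, bins, x_range, y_range):
    """Bin paired samples on a fixed grid; return flattened bin probabilities."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    counts = [0] * (bins * bins)
    for x, y in zip(xs, ys):
        # Clamp indices to the grid so every sample lands in some bin.
        i = max(0, min(int((x - x_lo) / (x_hi - x_lo) * bins), bins - 1))
        j = max(0, min(int((y - y_lo) / (y_hi - y_lo) * bins), bins - 1))
        counts[i * bins + j] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2, so the result lies in [0, 1])."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Synthetic stand-ins (hypothetical values, not ERA5 data):
# "original" soil moisture and temperature, and a smoothed "gap-filled" version.
rng = random.Random(0)
soil_moisture = [rng.uniform(0.0, 0.5) for _ in range(5000)]
temperature = [300.0 - 20.0 * s + rng.gauss(0.0, 1.0) for s in soil_moisture]
temperature_filled = [t + rng.gauss(0.0, 2.0) for t in temperature]

grid = dict(bins=20, x_range=(0.0, 0.5), y_range=(285.0, 305.0))
p = hist2d_prob(soil_moisture, temperature, **grid)
q = hist2d_prob(soil_moisture, temperature_filled, **grid)
jsd = js_divergence(p, q)
print(f"Jensen-Shannon divergence between joint histograms: {jsd:.3f}")
```

With log base 2 the divergence is 0 for identical histograms and 1 for histograms with no overlapping bins, so it makes no Gaussian assumption about the joint distribution.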
Minor/editorial comments:
- Some of the references are quite outdated, such as Rubin (1976) that is mentioned repeatedly, whereas the literature on spatial statistics and geostatistics, which is precisely concerned with interpolation in similar spatio-temporal applications, is quite incomplete. Some starting points could be:
Cressie, N. and K. Wikle (2011). Statistics for Spatio-Temporal Data. New Jersey, Wiley.
Chilès, J. P. and P. Delfiner (2012). Geostatistics: Modeling Spatial Uncertainty: Second Edition.
- Limitations of Gaussian processes are mentioned in l. 136, but with no details. There are several applications in the literature where Gaussian processes (or other forms of random processes) have been successfully used with large datasets.
- 139: are well suited
- 153: provided by other variables
- Table 2, caption: it is not clear to me what is meant by “method class” in this table
- 170: outlook for possible future work
- Caption of figure 2: the framework is divided into four steps
- 203: constant maps describing properties
- 206: Please develop the motivation for the Takens Theorem. I do not see the link to the present approach, especially given the observational uncertainties considered here.
- 217: built from variables
- 216-218: This sentence seems to refer to the data format in the specific implementation described. Probably not needed in this methodological description.
- 244: I understand that the subscript “updated” stands for estimated value. A more common notation would be to use a hat.
- 244: the meaning of subscript m is unclear to me. Is it the same as in l. 217?
- 266: it is mentioned that the proposed approach could be used to interpolate sparse in-situ measurements. This could be expanded upon or removed, as I do not see how it could be achieved easily in the present form because the model is heavily data-driven. Same comment for l.489-490.
- Figure 5: why is the temporal window smaller in the future period than in the past?
- 305: the point is filled
- 323: are scalable
- 339: what criterion supports the statement that the shape of the distribution is well recovered?
- Legend of Figure 8: a) is described twice in the legend and c) is missing.
- Legend of Figure 8: “In swaths-only…”: incomplete sentence
- 361: Leads to…: incomplete sentence.
- 401: This last sentence is intriguing, especially during the presentation of results. It could be expanded upon in the discussion.
- 468: recovering the physical
- 498-500: “closing the largest gaps first”: it is not immediately clear to me that this would be the best strategy. One potential drawback of this approach might be (but I am not sure) to artificially reduce the uncertainty related to the large gaps, precisely where uncertainty is important. One could also argue that a strategy could be to start with areas that are fairly certain (i.e. small gaps).
- In figure A1, I recommend showing the precipitation on a log axis, and normalizing the joint distributions rather than displaying counts (same comment as for figure 6).
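The recommendation on figure A1 (log axis plus normalization) can be illustrated with a short Python sketch using only the standard library. It is an assumption-laden illustration rather than the authors' plotting code: `log_bin_edges` and `density_histogram` are hypothetical helpers, and the two lognormal samples merely stand in for a complete and a gappy precipitation record.

```python
import bisect
import math
import random


def log_bin_edges(lo, hi, n_bins):
    """Return n_bins + 1 logarithmically spaced edges between lo and hi (both > 0)."""
    step = (math.log10(hi) - math.log10(lo)) / n_bins
    return [10 ** (math.log10(lo) + k * step) for k in range(n_bins + 1)]


def density_histogram(values, edges):
    """Counts divided by (in-range sample size * bin width), so the integral is 1."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            # bisect_right puts v == edges[k] into bin k; clamp the top edge.
            k = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
            counts[k] += 1
    n_in_range = sum(counts)
    if n_in_range == 0:
        raise ValueError("no values fall inside the bin range")
    widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
    return [c / (n_in_range * w) for c, w in zip(counts, widths)], widths


# Hypothetical stand-ins: same distribution, different numbers of valid samples.
rng = random.Random(1)
complete = [rng.lognormvariate(0.0, 1.0) for _ in range(10000)]
with_gaps = [rng.lognormvariate(0.0, 1.0) for _ in range(3000)]

edges = log_bin_edges(0.01, 100.0, 16)
dens_complete, widths = density_histogram(complete, edges)
dens_gaps, _ = density_histogram(with_gaps, edges)

integral_complete = sum(d * w for d, w in zip(dens_complete, widths))
integral_gaps = sum(d * w for d, w in zip(dens_gaps, widths))
print(f"integrals: {integral_complete:.6f}, {integral_gaps:.6f}")
```

Raw counts of the two samples differ by roughly the ratio of their sizes, whereas the densities integrate to one in both cases and can be compared directly.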
- AC2: 'Reply on RC1', Verena Bessenbacher, 21 Oct 2021
- RC2: 'Review of “CLIMFILL: A Framework for Intelligently Gap-filling Earth Observations”', Rene Orth, 30 Aug 2021
Review of Bessenbacher et al., gmd-2021-164
“CLIMFILL: A Framework for Intelligently Gap-filling Earth Observations”
This study introduces a sophisticated procedure to gap-fill Earth observation time series
while benefitting from independently and concurrently observed related variables.
The authors showcase the method with reanalysis data where some parts are intentionally masked,
and the reconstructed estimates are finally compared with the original data. Thereby, they consider
ground temperature, terrestrial water storage, surface layer soil moisture and precipitation and
discuss the results both in terms of reconstructed individual time series, and for the interactions
between reconstructed variables compared with respective estimates from the original data.
-------------------
Recommendation:
I think the paper requires major revisions.
This is a useful and timely contribution for the Earth science community, and interesting for the
readership of Geoscientific Model Development. Benefitting from a growing suite of Earth observations,
complex statistical tools and machine learning applications are increasingly employed in Earth science research.
Mostly, these analysis tools require gap-free data which is often derived through gap-filling procedures.
In this context, improving the quality of the gap-filling by exploiting the relationships between the
independent Earth observations is a promising avenue.
However, I have some concerns regarding the description of the method and the benchmarking of the results,
as detailed below.
--------------------
General comments:
(1) Comparing the results from the plain interpolation with that at the end of all four steps
of the gap-filling procedure is interesting to understand the method and the relevance of the
various steps. However, it is not a suitable benchmarking exercise as it is to be expected that
the results after four steps are closer to the original ERA5 data than the result after the first
relatively crude interpolation step.
Instead, an established univariate gap-filling technique should be employed here as a benchmark
to illustrate under which circumstances the presented methodology offers benefits over previous
approaches. Also, this could reveal to which extent the gap filling can be improved by (i) complete
exploration of uni-variate time series beyond neighbors, versus (ii) a multivariate approach.
(2) I think it would be useful for future CLIMFILL users to give more guidance on the methods to
use in each step of the algorithm. Table 2 offers many possible choices, but in addition some recommendations
would be needed on when to use which method and why. Also, the selection of employed variables
is important as their inter-relations are a key source for the gap reconstructions, so also some
additional advice on this would be helpful.
(3) I think that the feature selection is a bit arbitrary and dependent on expert knowledge.
To somewhat address this issue, maybe several features could be used by default, such as the 34 features used
in the presented example and maybe even additional time lags and windows. Then, the random forest model
can be employed to rank the features by their importance (e.g. using SHAP value importance) to make
a more informed decision on the useful features. Finally, the gap-filling could be re-run with only
retaining relevant features.
(4) There is advanced statistical and data science language used across the manuscript and I recommend
to clarify this with additional information to allow a broader geoscientific audience to follow this
manuscript. Please see my respective suggestions in the specific comments below.
I do not wish to remain anonymous - Rene Orth.
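General comment (3) proposes ranking features by importance and re-running with only the relevant ones. The Python sketch below illustrates that workflow under stated assumptions: to stay dependency-free it swaps the random forest for a toy k-nearest-neighbour regressor and SHAP for permutation importance (the increase in test error when one feature column is shuffled). The data, the feature names, and all function names are hypothetical and unrelated to the 34 features of the manuscript.

```python
import random


def knn_predict(train_X, train_y, x, k=5):
    """Average the targets of the k nearest training rows (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), t)
        for row, t in zip(train_X, train_y)
    )
    return sum(t for _, t in dists[:k]) / k


def mse(train_X, train_y, test_X, test_y):
    """Mean squared error of the k-NN model on the test rows."""
    errs = [(knn_predict(train_X, train_y, x) - y) ** 2
            for x, y in zip(test_X, test_y)]
    return sum(errs) / len(errs)


def permutation_importance(train_X, train_y, test_X, test_y, rng):
    """Per feature: test MSE after shuffling that column minus the baseline MSE."""
    base = mse(train_X, train_y, test_X, test_y)
    scores = []
    for j in range(len(test_X[0])):
        col = [row[j] for row in test_X]
        rng.shuffle(col)
        shuffled = [row[:j] + [c] + row[j + 1:] for row, c in zip(test_X, col)]
        scores.append(mse(train_X, train_y, shuffled, test_y) - base)
    return scores


rng = random.Random(42)


def make_rows(n):
    """Synthetic data: target depends strongly on x0, weakly on x1, not on x2."""
    X, y = [], []
    for _ in range(n):
        row = [rng.random(), rng.random(), rng.random()]
        X.append(row)
        y.append(3.0 * row[0] + 0.5 * row[1] + rng.gauss(0.0, 0.1))
    return X, y


train_X, train_y = make_rows(300)
test_X, test_y = make_rows(100)

names = ["x0_strong", "x1_weak", "x2_noise"]  # hypothetical feature names
scores = permutation_importance(train_X, train_y, test_X, test_y, rng)
ranking = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: MSE increase {score:.3f}")
```

Features whose shuffling barely changes the error are candidates for removal before a re-run; with a random forest the same ranking step could use impurity-based, permutation, or SHAP importances instead.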
------------------
Specific comments:
line 2: estimates for what?
line 5: remove "up"
line 7: I agree that technically the algorithm does not require a gap-free donor variable; however
if all variables have gaps at the same time and if this period is longer, then the final gap-fill estimate
will naturally have a low quality
line 15: "profit", maybe rephrase as "are improved by"
lines 45, 144 & Table 1: Jung et al. 2019 and O & Orth 2021 are relevant studies in this context and
could be mentioned here
line 46: please clarify "scale somewhere between"
line 84: please clarify "difficult observational record"
lines 108/109 and 111 are in contrast to each other
line 151: this is unclear, please rephrase
line 154: "another" should be "other" I guess
Table 2, caption: "other" should be "another" I guess
Table 2, right column: "or more complex interpolation methods", "Guided by ...", these are not exactly examples
as the column title suggests
line 170: remove "on"
line 171: feels a bit random which letters are capitalized here and which are not
line 173: "the highly structured nature", please explain
Figure 2, caption: The framework is divided into four steps, not three.
line 178: Abbreviation CLIMFILL is mentioned earlier and should be explained at the first occasion
line 181: please clarify "correlation structure"
lines 203, 311: please clarify "constant"
line 216: quotation marks not needed
line 229: please clarify "stabilising the results"
line 231: please clarify "terminal clusters"
line 243: I think this should be "to overwrite the former estimates"
lines 250/251: "learns different weights", please clarify
Figure 3, caption: replace "substracting" with "subtracting"
line 272: How are deserts defined and detected?
line 311: It should be 4 and not 3 additional features I guess?
line 314: please clarify "non-normality"
line 316: How does this add up to 34?
line 319: "respectively" should be added after "clusters" I guess
line 326: I wonder if and how different spatial resolutions can affect the accuracy of the gap filling,
it would be great if the authors could shortly discuss this.
line 326: "where one fold is one year", please clarify
Figure 7, caption: what is "CLIMPUTE-RF"?
line 351: please clarify "det"
Figure 8, caption: sentences should not end with "with" and "create".
line 361: "This" should be added before "leads".
line 367, section 3.4: I very much like the idea of studying the performance of the gap-filling across missingness patterns and different severity of the gaps.
Figure 10, caption: the B-distance is not actually displayed in this figure
line 373: How exactly are the satellite swaths imitated?
line 401: I do not quite understand the point on the bias correction.
line 427: similar in "remotely sensed" data but underestimated in "satellite observations",
this should be the same thing?
Figure 2: The figure is rather small now and should be enlarged to make it easier to see all details.
Figure 4: The months axis should not go to 12.5
References:
Jung, M., et al., The FLUXCOM ensemble of global land-atmosphere energy fluxes, Sci. Data 6, 74 (2019).
O, S. and R. Orth, Global soil moisture data derived through machine learning trained with in-situ measurements,
Sci. Data 8, 170 (2021).
- AC3: 'Reply on RC2', Verena Bessenbacher, 21 Oct 2021
Model code and software
CLIMFILL Bessenbacher, Verena https://github.com/climachine/climfill