the Creative Commons Attribution 4.0 License.
Quantifying Causal Contributions in Earth Systems by Normalized Information Flow
Chin-Hsien Cheng
Simon A. T. Redfern
Abstract. To understand the plethora of important processes that are characterized by their complexity, from global pandemics to global climate change, it may be critical to quantify causal contributions between time series variables. Here, we examine an empirical linear relationship between the rate of changing causes and effects with various multipliers. Sign-corrected normalized information flow (nIF_{c}) tends to provide the best estimates of causal contributions, often in situations where such causality is poorly reflected by regressions. These include: i) causal contributions with alternating feedback (correlation) sign, ii) significant causal time lags, iii) significant noise contributions, and iv) comparison among many causes to an overall mean effect, especially with teleconnection. Estimates of methane-climate feedbacks with both observational and Earth system model CESM2 data are given as examples of nonlinear process quantification and model assessment. The relative causal contribution is hypothesized to be proportional to nIF, i.e. the ratio between entropy (degree of uncertainty) received from the cause-variable (i.e. information flow, IF) and the total entropy change of the effect-variable. Large entropy, associated with noise, deteriorates the estimates of total entropy change, and hence nIF, while the proportional relationship between the relative causal contribution and IF improves.
Status: closed

CC1: 'Comment on gmd2021196', Daniel Fiifi Hagan, 23 Jul 2021
A very interesting paper with a really good application which the community needs. Kudos! I have two concerns that I really hope the authors can address:
1. It seems to me that the authors have missed the recent papers of Liang's group which have addressed almost all the issues raised in this paper. I realized that in Cheng and Redfern's paper, the references to Liang's works are only up to 2018. I am convinced that these recent works will give more insight on how this study is framed. Please see below for these papers:
Normalized Multivariate Time Series Causality Analysis and Causal Graph Reconstruction (Entropy, mdpi.com)
A Note on Causation versus Correlation in an Extreme Situation (Entropy, mdpi.com)
El Niño Modoki thus far can be mostly predicted more than 10 years ahead of time (researchgate.net)
Measuring the importance of individual units in producing the collective behavior of a complex network (researchgate.net)
Panel Data Causal Inference Using a Rigorous Information Flow Analysis for Homogeneous, Independent and Identically Distributed Datasets (IEEE Xplore)
2. The authors suggest using the Pearson correlation sign to correct IF causality; however, I think this suggestion ignores the difference between the meaning of the sign of IF and the meaning of the sign of the Pearson correlation (PC). They are not the same. As the authors rightly noted, Liang provided some semantics for the interpretation of the sign of IF when positive or negative, and I believe they should not lose sight of that when considering the sign of the PC. PC signs merely suggest the direction of change between two consecutive time points of two time series. This is not the case for IF signs. Liang's own interpretations (which have evolved over the years) imply that the IF signs characterize entropy in the system being analyzed. So I disagree that we can simply integrate these two. I am convinced that when the changes in the time series are very small (so that correlations are mostly insignificant), this suggested formalism would break down. Thus, one may have to begin looking at the standard deviations or another quality to complement it. Moreover, Liang's notes on causation vs causality (which he has made very often in previous studies) shed more light on this. PC signs are qualitative while IF signs are quantitative. Recently though, Liang has advocated more for the use of the absolute IF, even though I personally believe there is more merit in paying attention to the signs (and I believe the authors agree with me on this point).
I could also comment on the contribution of noise and the different delays and their impacts which the authors have noted, but these are all well captured in Liang's paper (please see first link in the papers above), so I wish the authors would have a look at these studies and include the outcomes in this paper.
Again, I believe the applications in this paper are very necessary to the community, so I hope it gets out there to make others more aware of the potential of this formalism.
Cheers
Citation: https://doi.org/10.5194/gmd2021196CC1 
AC1: 'Reply on CC1', Simon Redfern, 27 Jul 2021
We are grateful for these comments, especially for sharing the recent papers by Liang. We will incorporate these into our revised paper and reframe our discussion where appropriate. In general, the findings from Liang’s recent papers are consistent with ours and help to better contextualise some of our findings. In particular, we will certainly cite the following:
 Liang, X.S., Normalized Multivariate Time Series Causality Analysis and Causal Graph Reconstruction. Entropy, 2021. 23(6): p. 679. The success of information flow under heavy-noise conditions demonstrated there supports our findings, showing the potentially improved estimates of causal contribution given by IF. We are also grateful to you for pointing us to "X. S. Liang, F. Xu, and Y. Rong, El Niño Modoki thus far can be mostly predicted more than 10 years ahead of time. Scientific Report, under revision", which we will be happy to cite.
 The first paper above, Liang (2021) Entropy 23: 679, also shows the capability of IF and nIF in analysing causality among multivariate time series and identifying confounders and self loops. In another paper (Liang, X. Measuring the importance of individual units in producing the collective behavior of a complex network. 2021. arXiv:2104.09290) it is noted that “the node with largest information flow is indeed most crucial for the (causal) network”. For our empirical examples, the causal contributions are calculated based on the interdependent function, while the dependencies carried forward over self loops are not counted as causal contributions. The character of IF shown in Liang’s papers explains why these carried-forward dependencies do not significantly affect the estimates.
 We have pointed out that, under large-noise conditions, the major inaccuracies in estimating causal contributions by nIF come from the noise rather than from IF. As a result, a potential future improvement in estimates of causal contributions might be achieved by differentiating the self-dependency information flow from other noise and considering more than one parent variable. We have noted this difficulty since “cumulative information flow does not equal to the sum of the information flows to other individual units”, as pointed out by Liang in Measuring the importance of individual units in producing the collective behavior of a complex network. 2021. arXiv:2104.09290.
Regarding your second comment: “the authors suggest using Pearson correlation sign to correct IF causality… seems to ignore the differences between the meaning of the signs of IF and the meanings of the sign of Pearson correlation (PC)”; this misconception may be a result of insufficient clarification in our manuscript, which we can address. We acknowledge the difference between the signs determined by these two methods. Both can be useful, but neither can replace the other. From the application point of view, in estimating the positive vs negative feedback between two time series, the sign of the correlation or covariance is more directly relevant. In our empirical assessment, we only try to verify our hypothesis (equation 7) and to test whether integrating the “magnitude of (normalized) information flow” and the “sign of Pearson correlation” could be useful. We already point out that an indirect, qualitative application of IF or nIF does not always fully utilize the determined causality, especially if we wish to quantify the varying interdependent contributions between causally related variables. We therefore try to make better use of the determined magnitude of IF and nIF, while agreeing that the application of the sign of IF should be explored further.
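To make the sign adjustment discussed in this reply concrete, the following is a minimal sketch in Python. It assumes the adjustment amounts to attaching the sign of the Pearson correlation between X and Y to the magnitude of an already computed IF or nIF value. The function name sign_corrected and the input value 0.3 are illustrative only and are not taken from the manuscript.

```python
import numpy as np

def sign_corrected(flow, x, y):
    """Return |flow| carrying the sign of the Pearson correlation of x and y.

    `flow` is a previously estimated IF or nIF value (hypothetical input).
    """
    r = np.corrcoef(x, y)[0, 1]      # Pearson correlation coefficient
    return np.sign(r) * abs(flow)    # magnitude from IF/nIF, sign from r

# Illustrative series: y moves against x, so the correlation is negative.
t = np.linspace(0.0, 10.0, 200)
x = np.sin(t)
y = -2.0 * x + 0.1 * np.cos(3.0 * t)

nif_c = sign_corrected(0.3, x, y)    # 0.3 is a made-up nIF magnitude
print(nif_c)
```

Under this reading, a correlation of exactly zero gives an adjusted value of zero, which is one way to see the concern raised in CC1 about series whose correlations are weak or insignificant.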
Citation: https://doi.org/10.5194/gmd2021196AC1 
CC2: 'Reply on AC1', Daniel Hagan, 11 Aug 2021
I am grateful to the authors for addressing the points I raised. I have a couple of follow up points:
A particular recommendation I would like to make here is that based on your comment, I think the sign of correlation is not a correction for the IF formalism, but rather a proposed alternative form of Liang's IF, and as such should not be written as a correction in the manuscript. A correction would imply that the original form is wrong. Probably, using "adjusted" might be more appropriate. Additionally, this clarity will allow anyone who might make use of this alternative form to carefully consider under which conditions this would be preferred.
Secondly, as the authors rightly pointed out, correlation is used to determine 'feedback'. IF does NOT measure feedback (I am hoping here that my definition of a feedback and the authors' are not different). Here, I refer to Seneviratne et al. 2010 (Earth-Science Reviews), which I think gets it right. A feedback describes a two-way coupling, where a coupling refers to the degree to which one variable controls another one; this is what I think IF measures. So correlation does describe a feedback, whereas IF does not. Here, I refer to an example in Lines 5-7 on page 4 of the manuscript (hopefully, I am reading the most current version): "Since the original positive/negative sign of IF refers to the increasing/decreasing trend of uncertainty (or decreasing/increasing trend of predictability) (Liang, 2014, 2018), to determine the direction of positive vs negative feedback, we apply a “sign-corrected” nIFc and IFc taking the sign given by the Pearson correlation coefficient between variables X and Y (i.e. RXY). Magnitudes indicate the strength of causality."
If I am not wrong, I would like to ask the authors to please revise the use of the word feedback. We ought to be careful when we use causality and feedback interchangeably. In Liang's 2016 paper, he did point out how the IF concept is grounded on the principle of nil causality, which would not allow for the two words to be used interchangeably (it seems).
I think this also raises another question of whether the IFc describes a feedback or a coupling.
Cheers!
Citation: https://doi.org/10.5194/gmd2021196CC2

RC1: 'Comment on gmd2021196', Anonymous Referee #1, 17 Aug 2021
The manuscript 'Quantifying Causal Contributions in Earth Systems by Normalized Information Flow' by Cheng and Redfern addresses an important scientific question: how do we best measure causal relationships in coupled systems. While a good part of the manuscript is well-written and while I generally find the topic exciting and worth publicising, I found it impossible to assess for myself if the presented method is indeed the advance it is claimed to be. This might be a result of me being unfamiliar with the cited Liang papers. In any case, I therefore stopped with a detailed review of the paper after section 3.1, because I felt I was unable to follow the detailed arguments and maths purely based on the material presented, and I unfortunately lack the time to read up on all of the cited literature myself. I thus conclude that a major revision to the introduction of the method is essential (at least) before I could recommend publication. I also feel that a clearer and more intuitive discussion of the results and examples would be a massive step forward in terms of increasing the potential impact of this paper. Below, I list a few detailed comments that I hope the authors will find helpful.
Major points:
 I indeed think that the comparison to other work could be extended (see also other comments in the public discussion). Next to the references included, this should also involve the following citation:
Runge et al. Detecting and quantifying causal associations in large nonlinear time series datasets. Science Advances 5, eaau4996 (2019). https://advances.sciencemag.org/content/5/11/eaau4996.abstract
 page 3: the motivation of the IF and nIF formulas (2) to (4) needs to be clarified. What is the intuition behind them? Given how central it is to the paper, a reference to Liang 2014 is not sufficient. In this context, a comparison to other methods might be particularly insightful. A discussion, which may well be focused on the ‘big picture’, would certainly also be useful for the general reader who may be less accustomed to think about concepts such as ‘transfer entropy’ and ‘causality’. In particular, a better explanation of why the method is different from a simple linear correlation (which can be causal or not) is required. Overall, I suggest a significant extension to section 2.1 (a page or more, stepping through the equations; describing them intuitively, maybe even using examples for each variable).
Further corrections:
 page 1, L.14/15: do you maybe mean: …, ‘especially concerning a network of teleconnections’?
 page 2: the first two sentences are duplicates. Please revise.
 page 2 l. 23: would downtune to: ‘yet established’
 page 2 l.28 you mean: ‘good quantitative measures of causal strength’? How would you define good here? Would be careful/thoughtful about the wording.
 page 3. L. 18/19: I don’t understand the reasoning for using m abs(nIF) here. Could you clarify?
 do you think only a linear and second order comparison is sufficient to establish a benchmark? How about other, e.g. nonparametric forms, which can capture more complex nonlinear relationships?
 Can you clarify why a calibration factor might be needed in reality? What does that tell us about the correctness/suitability of the approach?
 page 4, l. 2/3: isn’t the second term an important self-feedback aspect? I think this is quite a broad approximation that would require a problem-specific justification.
 page 4, l. 9: at this point it is still unclear to me why the magnitude would imply a strength of causality, or wrt l.5 why IF represents a ‘trend of uncertainty’. This has to be introduced more carefully and intuitively. I am sure that should be possible and is key to improve the accessibility of this paper.
 l. 31: in such a case
 l. 32: and how about the corresponding feedback of methane ON temperature?
 page 5: the entire discussion of the cases in Figure 1 and the test case should be extended and could be written much more clearly. I really have difficulties to follow. I suggest you motivate each problem and give real examples (even if you don’t calculate them – maybe point to citations). In Figure 1 are X and Y multidimensional in most cases I assume? If yes, this should be visualized as well compared to case (a).
 Figures 2, 3 and 4 are hard to decipher in terms of size of the labels.
Citation: https://doi.org/10.5194/gmd2021196RC1 
AC3: 'Reply on RC1', Simon Redfern, 20 Sep 2021
Responses to referee 1:
The manuscript 'Quantifying Causal Contributions in Earth Systems by Normalized Information Flow' by Cheng and Redfern addresses an important scientific question: how do we best measure causal relationships in coupled systems. While a good part of the manuscript is well-written and while I generally find the topic exciting and worth publicising, I found it impossible to assess for myself if the presented method is indeed the advance it is claimed to be. This might be a result of me being unfamiliar with the cited Liang papers. In any case, I therefore stopped with a detailed review of the paper after section 3.1, because I felt I was unable to follow the detailed arguments and maths purely based on the material presented, and I unfortunately lack the time to read up on all of the cited literature myself. I thus conclude that a major revision to the introduction of the method is essential (at least) before I could recommend publication. I also feel that a clearer and more intuitive discussion of the results and examples would be a massive step forward in terms of increasing the potential impact of this paper. Below, I list a few detailed comments that I hope the authors will find helpful.
We will revise the manuscript to update new references and improve its readability. In particular, we will incorporate the examples shown above which clarify the concerns raised here.
Major points:
 I indeed think that the comparison to other work could be extended (see also other comments in the public discussion). Next to the references included, this should also involve the following citation: Runge et al. Detecting and quantifying causal associations in large nonlinear time series datasets. Science Advances 5, eaau4996 (2019). https://advances.sciencemag.org/content/5/11/eaau4996.abstract
We will revise our introduction and discussion with further updated references.
 page 3: the motivation of the IF and nIF formulas (2) to (4) needs to be clarified. What is the intuition behind them? Given how central it is to the paper, a reference to Liang 2014 is not sufficient. In this context, a comparison to other methods might be particularly insightful. A discussion, which may well be focused on the ‘big picture’, would certainly also be useful for the general reader who may be less accustomed to think about concepts such as ‘transfer entropy’ and ‘causality’. In particular, a better explanation of why the method is different from a simple linear correlation (which can be causal or not) is required. Overall, I suggest a significant extension to section 2.1 (a page or more, stepping through the equations; describing them intuitively, maybe even using examples for each variable).
Our revision discusses the limitations of the regression method and of the most commonly used causal analysis (Granger causality). We hope the new key finding mentioned at the beginning of these responses helps address the concerns. Nevertheless, a detailed comparison with all other causal analyses is beyond the scope of this work.
Further corrections:
 page 1, L.14/15: do you maybe mean: …, ‘especially concerning a network of teleconnections’?
Yes
 page 2: the first two sentences are duplicates. Please revise.
Noted. Thanks.
 page 2 l. 23: would downtune to: ‘yet established’
Agree. Thank you.
 page 2 l.28 you mean: ‘good quantitative measures of causal strength’? How would you define good here? Would be careful/thoughtful about the wording.
Noted. We will rewrite the sentence. Nevertheless, since this manuscript focuses more on the practicality of methods based on mock-up data verification, rather than on derivation of the fundamental theory behind our hypothesis, it may be difficult to avoid all subjective adjectives in some places.
 page 3. L. 18/19: I don’t understand the reasoning for using m abs(nIF) here. Could you clarify?
Initially, we suspected that nIF plays a similar role to R^{2}, and we hence replaced R^{2} in mR^{2} by nIF to test the outcome. This, however, proved rather unnecessary since it does not resolve the problem of m from the regression method with low R^{2}. We intend to remove the discussion of mnIF as a quantifier in the revised manuscript and replace it with md.nIF, as discussed at the beginning of these responses.
 do you think only a linear and second order comparison is sufficient to establish a benchmark? How about other, e.g. nonparametric forms, which can capture more complex nonlinear relationships?
We are not trying to establish a benchmark showing that nIF (or md.nIF) would be the best approach for estimating causal contributions for the Earth system. There are many useful modelling approaches, including artificial intelligence etc. Nevertheless, we have established and empirically verified a simple and useful equation with the hypothesis that md.nIF is proportional to the causal sensitivity. To address this concern, we suggest that we may rephrase our title to “Assessing the Practicality of (modified) Normalized Information Flow for Quantifying Causal Contributions in Earth Systems”.
 Can you clarify why a calibration factor might be needed in reality? What does that tell us about the correctness/suitability of the approach?
As highlighted in comment AC2, the calibration factor is approximately equal to the maximal m when R^{2} 1 for md.nIF, or 2 x maximal m when R^{2} 1 for nIF. Our simple example illustrated in Figure R1 (of AC2) also illustrates that the strength of causality (the proportional relationship) is better reflected by IF, nIF, and md.nIF as compared to m in regressions.
 page 4, l. 2/3: isn’t the second term an important self-feedback aspect? I think this is quite a broad approximation that would require a problem-specific justification.
Indeed, the uncertainty of a time series variable has several sources: the cause variable, self-dependency, and impacts from other cause variables, amongst other factors. A feedback loop is complete only when causal contributions from both directions are present. Therefore, we now focus on IF(X→Y) and IF(Y→X) and their (modified) normalized forms. As discussed in AC2, there remain issues with the two terms in the normalized information flow, which will require further work beyond the scope of the current manuscript.
 page 4, l. 9: at this point it is still unclear to me why the magnitude would imply a strength of causality, or wrt l.5 why IF represents a ‘trend of uncertainty’. This has to be introduced more carefully and intuitively. I am sure that should be possible and is key to improve the accessibility of this paper.
At line 5 we discuss the original sign of IF. In brief, we consider that the “flow of uncertainty (i.e. magnitude of IF)” represents “causality”. When IF is positive, our interpretation is that the cause variable is providing more “uncertainty” to the effect variable, or destabilizing the effect variable. Negative IF implies reduced uncertainty received by the effect variable, or stabilization of the effect variable. Note that this explanation may not be the final word on the significance of the sign of IF, especially as its inventor Liang now tends to promote the use of its absolute magnitude. We will clarify this point in the revision, but shall not visit it in depth since the sign of IF is replaced by the sign of the correlation, which better describes whether the two variables are amplifying or attenuating each other.
 l. 31: in such a case
Noted.
 l. 32: and how about the corresponding feedback of methane ON temperature?
In the revised manuscript, we will cite Liang’s more recent paper, which highlights that an indirect causal influence usually has a much lower IF than a direct influence. This is why the method can still be applied in cases with bidirectional feedback. In the first version of our manuscript, all mock-up data examples contain bidirectional causal contributions, and the method still reasonably reflects the direct causal contributions. From physical principles, we already know that additional methane warms the climate, but that is not the focus of this paper and does not lessen the value of applying our method to understand how climate affects atmospheric methane concentrations.
 page 5: the entire discussion of the cases in Figure 1 and the test case should be extended and could be written much more clearly. I really have difficulties to follow. I suggest you motivate each problem and give real examples (even if you don’t calculate them – maybe point to citations). In Figure 1 are X and Y multidimensional in most cases I assume? If yes, this should be visualized as well compared to case (a).
Figure 1 illustrates the complexity of real-world problems with multiple causes and teleconnections. Examples given include the mock-up data and the methane-climate feedbacks. We will explore how to improve the clarity.
 Figures 2, 3 and 4 are hard to decipher in terms of size of the labels.
Noted. We will explore how to improve the Figure readability.
Citation: https://doi.org/10.5194/gmd2021196AC3

RC2: 'Comment on gmd2021196', Anonymous Referee #2, 02 Sep 2021
The paper uses the methodology developed by Liang over the last years for the information flow in coupled systems. It advocates the use of normalized information flow, or nIF, to (i) characterize causal relationships of the coupled variables and (ii) to assess the quality of model simulations from a causality perspective.
The paper is timely in the sense that causal methods are increasingly popular, with very different approaches nicely summarized e.g. by the Runge et al. (2019) paper cited in this manuscript where, as far as the reviewer remembers, the information flow of Liang is missing. So this could be a welcome contribution to causal approaches in the geosciences.
Unfortunately, the paper is excessively difficult to read. It starts with equation (2), which is eq. (5) of Liang (2021) in Entropy and should be referenced as such (this has been mentioned already in the comments); to the uninitiated it comes out of nowhere, and is not motivated but only described. Why should this particular algebraic combination of covariances be called "information flow"? In the same spirit, how should a reader easily grasp the meaning of the normalization factor Z from eq. (4)? How do you separate the changes in marginal entropy into the self-dependent and the noise term for observed (as opposed to generated) time series; in other words, how do you calculate Z and nIF in this situation? There is no hint given in ch. 2.1.
Next, extensive use is made of an artificial example (called mock-up data), but should the reader be interested in this, she has to refer to the Supplementary Material. Here, she finds a system of two coupled first-order equations for the variables X1 and Y1, written in a manner more complicated than necessary (the overall term dY1/dt is not taken out but repeated three times) and with seemingly arbitrary numerical constants (1.1, 1.5, 300, 1.8, 0.000005 and so on), which is not motivated by any means. The reviewer also wonders in which sense the noise terms on the right side of the Table deserve that name, being a sum of a constant, a deterministic trigonometric function of time, and the function itself. How could that be noisy, where is a stochastic process involved?
In the main text, there is talk about X and Y (not X1 and Y1) and in Figure 2, we suddenly have d(partial) X1, dX2 and dX3; what are these? In the heading of the Figure, it says that dX2=dX1 and dX3=dX1, so it is only one quantity after all? The graphs show three curves, so they are again different? The reader should look for red boxes according to the legend, but there are no red boxes in Fig. 2. And what do we see on the y axis, actually? The variables themselves? Their partial derivatives? The information flow? What does the axis legend "by mR^2" even mean?
At the beginning of ch. 3.1, there is talk of "where causality becomes more important"  as opposed to what? And how do you know that, given observations and measurements (only)?
The reviewer didn't get the concept of the "1:2:3 ratio" either nor how this could convert (apparently at high noise level) to a "1:2:3 ratio"  isn't that the same since pairwise signs would cancel out?
For the comparison of observations and model runs, here for CH4 growth rates, the reviewer has a hard time to discern the upper panels of Fig. 6 and 7. They look exactly identical, apart from the fact that the time axis for Fig. 7 is shorter (up to 2012). CESM2 can't be that "perfect"? Also, the observations/simulations give a rather blurred image along the latitudes and the time axis, whereas the estimates have a finer resolution. How is that possible, and how do the authors come to the conclusion that nIF is doing best, and that CESM2 fails to reproduce the spatial pattern?
The paper raises a lot of questions. It requires a substantial revision (major) before having the chance to come close to being readable. The potential of the method is present, but it has to be motivated much more explicitly and the examples have to be explained and shown in the main paper (a time series graph of the very target variables X and Y would be handy). A clearer demonstration is then needed of why nIF and its variants are superior to a regression approach, or in other words, why correlation and causality are different concepts, to the extent that you can have two causally connected variables with a Pearson correlation coefficient of zero.
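The last point can be illustrated with a very small sketch in Python. It is a static toy case and not one of the manuscript's examples: Y is fully determined by X through Y = X^2, yet on a grid of X values that is symmetric about zero the Pearson correlation is zero to floating-point precision.

```python
import numpy as np

# X on a grid symmetric about zero; Y depends on X and on nothing else.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

r = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient
print(abs(r) < 1e-9)          # True: the linear measure sees no relationship
```

A linear regression of Y on X would report a slope near zero here, even though changing X always changes Y.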
A promising application of the framework seems to be to determine the effective lag between cause and effect by lagging one of the two such that the nIF value (or one of its variants) is maximized, and compare this to a conventional cross correlation analysis. This is merely a suggestion since these lags are to be expected, in particular in the context of teleconnected variables.
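The lag scan suggested here can be sketched as follows in Python. The helper best_lag and the scoring function abs_corr are hypothetical names introduced for illustration. The score is pluggable, so an nIF estimator could be passed in place of the absolute Pearson correlation that is used below as the conventional cross-correlation baseline. The synthetic data, in which the effect follows the cause by three samples, are made up.

```python
import numpy as np

def best_lag(cause, effect, max_lag, score):
    """Scan lags 0..max_lag and return the lag with the highest score.

    At each lag, cause[t] is paired with effect[t + lag].
    """
    scores = []
    for lag in range(max_lag + 1):
        c = cause[: len(cause) - lag]   # cause values that have a partner
        e = effect[lag:]                # effect values shifted by `lag`
        scores.append(score(c, e))
    return int(np.argmax(scores)), scores

def abs_corr(a, b):
    """Absolute Pearson correlation, used as the baseline score."""
    return abs(np.corrcoef(a, b)[0, 1])

# Synthetic example: the effect follows the cause with a delay of 3 samples.
rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = np.roll(x, 3) + 0.5 * rng.standard_normal(2000)

lag, scores = best_lag(x, y, max_lag=10, score=abs_corr)
print(lag)
```

Comparing the lag returned with an nIF-based score against the one returned with abs_corr would give the comparison the referee describes.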
Citation: https://doi.org/10.5194/gmd2021196RC2 
AC4: 'Reply on RC2', Simon Redfern, 20 Sep 2021
Responses to referee 2:
The paper uses the methodology developed by Liang over the last years for the information flow in coupled systems. It advocates the use of normalized information flow, or nIF, to (i) characterize causal relationships of the coupled variables and (ii) to assess the quality of model simulations from a causality perspective.
The paper is timely in the sense that causal methods are increasingly popular, with very different approaches nicely summarized e.g. by the Runge et al. (2019) paper cited in this manuscript where, as far as the reviewer remembers, the information flow of Liang is missing. So this could be a welcome contribution to causal approaches in the geosciences.
Unfortunately, the paper is excessively difficult to read. It starts with equation (2), which is eq. (5) of Liang (2021) in Entropy and should be referenced as such (this has been mentioned already in the comments); to the uninitiated it comes out of nowhere, and is not motivated but only described. Why should this particular algebraic combination of covariances be called "information flow"? In the same spirit, how should a reader easily grasp the meaning of the normalization factor Z from eq. (4)? How do you separate the changes in marginal entropy into the self-dependent and the noise term for observed (as opposed to generated) time series; in other words, how do you calculate Z and nIF in this situation? There is no hint given in ch. 2.1.
The key to this paper is the exploration of the hypothesis (equation 7) that the causal sensitivity between two coupled variables is proportional to IF, nIF or md.nIF, together with the useful but conditional applications that follow from it. We feel that further detailed theoretical discussion of Liang’s earlier papers will not help clarify this, especially since the normalizing method is still being researched and has scope for potential future development. However, we are very grateful for the comments on different ways of thinking of Z, which have driven us to attempt to answer these questions by developing the use of md.nIF as outlined above.
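For readers asking how IF, Z and nIF can be obtained from observed series, the following Python sketch shows one possible calculation. It assumes the bivariate, linear-model estimator usually attributed to Liang (2014), written here as a least-squares fit of the forward-difference derivative dX1/dt on X1 and X2, which is algebraically the same as the covariance expression. The normalizer is taken to be the sum of the absolute values of the flow, the self-dependent term and the noise term. The function name liang_flow, the test system and its coefficients are illustrative assumptions and are not the manuscript's mock-up data.

```python
import numpy as np

def liang_flow(x1, x2, dt=1.0):
    """Estimate IF and nIF from x2 to x1 under a bivariate linear model."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    dx1 = (x1[1:] - x1[:-1]) / dt              # Euler forward difference
    a, b = x1[:-1], x2[:-1]
    # Least-squares fit: dx1 ~ f + p * x1 + q * x2
    design = np.column_stack([np.ones_like(a), a, b])
    (f, p, q), *_ = np.linalg.lstsq(design, dx1, rcond=None)
    cov = np.cov(a, b)
    c11, c12 = cov[0, 0], cov[0, 1]
    flow = q * c12 / c11                       # information flow x2 -> x1
    resid = dx1 - design @ np.array([f, p, q])
    noise = dt * np.mean(resid ** 2) / (2.0 * c11)   # noise contribution
    z = abs(flow) + abs(p) + abs(noise)        # assumed normalizer Z
    return flow, flow / z

# Made-up test system: x2 drives x1, but x1 does not drive x2.
rng = np.random.default_rng(1)
n = 50_000
e1, e2 = rng.standard_normal(n), rng.standard_normal(n)
x1, x2 = np.zeros(n), np.zeros(n)
for k in range(n - 1):
    x1[k + 1] = 0.5 * x1[k] + 0.5 * x2[k] + e1[k]
    x2[k + 1] = 0.6 * x2[k] + e2[k]

if_21, nif_21 = liang_flow(x1, x2)   # flow from x2 to x1
if_12, nif_12 = liang_flow(x2, x1)   # flow from x1 to x2
print(if_21, nif_21, if_12, nif_12)
```

In this toy system the estimated flow from x2 to x1 is clearly non-zero while the reverse flow is close to zero, which is the asymmetry that distinguishes the measure from a correlation coefficient. The split of Z into a self-dependent and a noise part follows directly from the fitted coefficient p and the fit residual.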
Next, extensive use is made of an artificial example (called mock-up data), but should the reader be interested in this, she has to refer to the Supplementary Material. Here, she finds a system of two coupled first-order equations for the variables X1 and Y1, written in a manner more complicated than necessary (the overall term dY1/dt is not taken out but repeated three times) and with seemingly arbitrary numerical constants (1.1, 1.5, 300, 1.8, 0.000005 and so on), which is not motivated by any means. The reviewer also wonders in which sense the noise terms on the right side of the Table deserve that name, being a sum of a constant, a deterministic trigonometric function of time, and the function itself. How could that be noisy, where is a stochastic process involved?
The Table is placed in the Supplementary because it does not provide much information beyond which equations were used. The Figures in the main text present the results that explain the situations where the proposed proportional relationship works better than regressions. We will improve the clarity of these Figures and the Tables. Regarding the “arbitrary numerical constants”, these refer to the calibration factor alpha in equation 1 which can be removed from the legends and labels in the revision.
In the main text, there is talk about X and Y (not X1 and Y1) and in Figure 2, we suddenly have d(partial) X1, dX2 and dX3: what are these? In the heading of the Figure, it says that dX2=dX1 and dX3=dX1, so it is only one quantity after all? The graphs show three curves, so they are again different? The reader should look for red boxes according to the legend, but there are no red boxes in Fig. 2. And what do we see on the y axis, actually? The variables themselves? Their partial derivatives? The information flow? What does the axis legend "by mR^2" even mean?
We thank the referee for pointing out this error in the original submission. The legend should be “𝜕X2_{Y1}/𝜕t = 2𝜕X1_{Y1}/𝜕t”, “𝜕X3_{Y1}/𝜕t = 3𝜕X1_{Y1}/𝜕t”. We missed out the 2 and 3 in front of 𝜕X1_{Y1}. The Figure captions will be corrected to eliminate this omission. The red boxes highlight trends with negative values of the ordinate, which represent the designed and estimated causal contributions (i.e. the partial derivatives) using the variety of methods explored. Thus, when we state “by mR^{2}” we refer to estimates based on mR^{2} as the multiplier in equation 1.
At the beginning of ch. 3.1, there is talk of "where causality becomes more important", but as opposed to what? And how do you know that, given observations and measurements (only)?
The word “important” refers to conditions in which such causality is not captured, or is misinterpreted, by regressions. We will rephrase this sentence.
The reviewer did not grasp the concept of the “1:2:3 ratio” either, nor how this could convert (apparently at high noise level) to a “-1:-2:-3 ratio”; isn’t that the same, since pairwise signs would cancel out?
Noise affects regressions differently, since both the sign and the magnitude of the gradient are influenced by the noise. Hence, when the sign is incorrect, the cancelling effect for the 1:2:3 ratio may result in a set of ratios in the order 3:2:1. This is seen in sub-figures a, e and i of the new Figure R1 described above. However, this does not greatly affect the magnitude of IF, nIF, and md.nIF, especially the IF. Hence, 1:2:3 may turn into -1:-2:-3 when the noise is too large, with the correlation sign then misrepresenting the direction.
For the comparison of observations and model runs, here for CH4 growth rates, the reviewer has a hard time distinguishing the upper panels of Fig. 6 and 7. They look exactly identical, apart from the fact that the time axis for Fig. 7 is shorter (up to 2012). CESM2 can’t be that “perfect”? Also, the observations/simulations give a rather blurred image along the latitudes and the time axis, whereas the estimates have a finer resolution. How is that possible, and how do the authors come to the conclusion that nIF is doing best, and that CESM2 fails to reproduce the spatial pattern?
Yes, the first rows are the same, because we were testing the observed and modelled temperatures and precipitations against the observed methane concentrations. A failure to capture the causal trend based on either modelled C_{CH4} or observed C_{CH4} implies improper underlying model processes. In fact, a typical research method for understanding the methane-climate feedback processes is to tune various parameters in process-based models to explore how best to fit the observed concentration and isotope-ratio trends. It is hence justifiable to show the inadequacy of CESM2 based on modelled climatic variables and observed C_{CH4}.
The relatively ‘blurred’ image from the observation is due to smoothing in reconstructing the observed trends. The apparently sharper resolution of the estimated contribution trends is not a result of image resolution, but has two possible origins: i) the estimated contributions better reflect the location of the causes of net emissions, while the observations are affected by atmospheric mixing; ii) the nIF method tends to accentuate sharp fluctuations because it is difficult to distinguish whether nIF is approaching 0 or 1.
We will also revise the examples to show estimates based on zonal C_{CH4} instead of (or in addition to) estimates based on global C_{CH4}. Also, we will include estimates based on md.nIF.
The paper raises a lot of questions. It requires a substantial (major) revision before having the chance to come close to being readable. The potential of the method is present, but it has to be motivated much more explicitly, and the examples have to be explained and shown in the main paper (a time series graph of the very target variables X and Y would be handy). This should be followed by a clearer demonstration of why nIF and its variants are superior to a regression approach, or in other words, why correlation and causality are different concepts, to the extent that you can have two causally connected variables with a Pearson correlation coefficient of zero.
We are grateful to the reviewer for their comments and will indeed carry out major revision to improve readability and clarity.
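The referee's closing point, that two variables can be causally connected while having a Pearson correlation coefficient of zero, can be illustrated with a minimal sketch. The data here are hypothetical and are not taken from the manuscript: `y` is fully determined by `x`, but the dependence is symmetric about zero, so the linear correlation vanishes up to floating-point error.

```python
import numpy as np

# Hypothetical illustration (not the manuscript's mockup data):
# x is sampled symmetrically about zero and y depends on x only through x**2.
x = np.linspace(-1.0, 1.0, 201)
y = x**2

# Pearson correlation between x and y; the positive and negative halves cancel.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.2e}")
```

A regression of `y` on `x` would therefore report a slope and R^{2} close to zero even though `x` determines `y` completely, which is the kind of situation in which a regression-based estimate of causal contribution is uninformative.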
A promising application of the framework seems to be to determine the effective lag between cause and effect by lagging one of the two such that the nIF value (or one of its variants) is maximized, and compare this to a conventional cross correlation analysis. This is merely a suggestion since these lags are to be expected, in particular in the context of teleconnected variables.
This will require further analysis. For nIF, we wonder if this lead-lag effect is more accurately captured when the nIF is around 0.5, which is often the case when there are significant causal contributions. However, some sharp spikes due to the dH_{Y}^{*}/dt term may distort the lead-lag information. We will explore whether this problem can be mitigated by md.nIF.
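The lag scan suggested by the referee could be sketched as follows. This is only an illustration under stated assumptions. It uses the bivariate maximum-likelihood estimator of Liang (2014), T_{2→1} = (C11 C12 C2,d1 − C12² C1,d1) / (C11² C22 − C11 C12²), rather than the manuscript's equation 2, and it scans the unnormalized |IF| instead of nIF or md.nIF. The function names `liang_if` and `scan_lags` and the synthetic series are hypothetical. Whether the maximum actually recovers the true lag is the open question raised in the reply above, so no expected result is claimed.

```python
import numpy as np

def liang_if(cause, effect, dt=1.0):
    """Bivariate information flow cause -> effect (Liang 2014 estimator, assumed here)."""
    x1 = np.asarray(effect, dtype=float)
    x2 = np.asarray(cause, dtype=float)
    dx1 = (x1[1:] - x1[:-1]) / dt          # Euler forward difference of the effect
    x1, x2 = x1[:-1], x2[:-1]              # align with the difference series
    c = np.cov(np.vstack([x1, x2, dx1]))   # sample covariance matrix (rows = variables)
    c11, c12, c22 = c[0, 0], c[0, 1], c[1, 1]
    c1d1, c2d1 = c[0, 2], c[1, 2]
    return (c11 * c12 * c2d1 - c12**2 * c1d1) / (c11**2 * c22 - c11 * c12**2)

def scan_lags(cause, effect, max_lag):
    """Return {lag: |IF|}, pairing cause[t] with effect[t + lag]."""
    n = len(cause)
    return {lag: abs(liang_if(cause[: n - lag], effect[lag:]))
            for lag in range(max_lag + 1)}

# Hypothetical synthetic data: an AR(1) cause driving the effect after a delay.
rng = np.random.default_rng(0)
n, true_lag = 2000, 3
cause = np.zeros(n)
effect = np.zeros(n)
for t in range(1, n):
    cause[t] = 0.9 * cause[t - 1] + rng.standard_normal()
    drive = cause[t - 1 - true_lag] if t - 1 - true_lag >= 0 else 0.0
    effect[t] = 0.8 * effect[t - 1] + 0.5 * drive + 0.1 * rng.standard_normal()

scores = scan_lags(cause, effect, max_lag=6)
best_lag = max(scores, key=scores.get)     # lag with the largest |IF|
for lag, value in scores.items():
    print(lag, round(value, 4))
```

The same loop could be run with the Pearson cross-correlation at each lag to obtain the conventional comparison the referee mentions.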
Citation: https://doi.org/10.5194/gmd2021196RC2
Citation: https://doi.org/10.5194/gmd2021196AC4

AC4: 'Reply on RC2', Simon Redfern, 20 Sep 2021

AC2: 'Comment on gmd2021196', Simon Redfern, 20 Sep 2021
Responses to the Referees’ Comments to Quantifying Causal Contributions in Earth Systems by Normalized Information Flow
We appreciate all the comments from referees. Before we proceed to detailed responses, we would like to highlight that our manuscript focuses more on the practicality of the method and hence lacks detailed theoretical background. Nevertheless, we would like to briefly share some further points that may help clarify some concerns and explain the results we have obtained:
When R^{2} in regression approaches 1 with very strong causal strength, nIF tends to approach 0.5 instead of 1. This differs from our earlier understanding. By carefully looking into the contributions of the three terms of Z (equation 4), namely IF_{X→Y}, dH_{Y}^{*}/dt and dH_{Y}^{noise}/dt, it can be seen that IF_{X→Y} and dH_{Y}^{*}/dt tend to become similar to each other while the dH_{Y}^{noise}/dt term approaches zero, so that nIF approaches 0.5. Furthermore, a small change in a mockup data value may often lead to contrasting values of nIF at ~0 and ~1. This partly explains some of the sharp fluctuations of the nIF estimates in our results. Nevertheless, an approximately proportional relationship between the causal sensitivity and nIF still holds, because most significant causal contributions occur when nIF is around 0.5. Comparing these outcomes with regression: the range of m/maximal-m is 0 to 1 in regression, while the corresponding range for nIF is 0 to 0.5. The calibration factor for nIF, alpha, should hence be approximately equal to 2 × maximal-m when R^{2} approaches 1.
We have also explored the results of removing the dH_{Y}^{*}/dt term from Z. Not surprisingly, this modified nIF (abbreviated as md.nIF) now approaches 1 when there is a strong causal influence without any other causal driver (see Fig. below). This also slightly improves the accuracy of the estimated causal contributions compared with the original nIF.
To summarize the pros and cons of the various methods: both IF and nIF are reflective of (and approximately proportional to) causal sensitivity (equation 7) to a certain extent. However, IF is not normalized, and problems surface when the ranges of variability differ; nIF helps (imperfectly) to normalize the causal signal, but the issues with the non-specified causal terms dH_{Y}^{*}/dt and dH_{Y}^{noise}/dt do lead to other effects.
On the other hand, regressions reflect causality poorly (a high R^{2} may still occur in a situation of nil causality), and a rather low R^{2} also weakens the estimates of causality owing to several effects, including the presence of extreme values, the occurrence of alternating sign within the analyzed period, and significant lead-lag times between cause and effect, alongside the problems associated with excluding strong noise contributions. Figure R1 (below) shows a case with a strong single-directional causal relationship (Y influences X but not the other way round). In the revised manuscript, we highlight the practicability of the (modified) normalized information flow for quantifying causal contributions, but we will also call for further study to improve methods to normalize the information flow.
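The limiting values described above (nIF near 0.5 and md.nIF near 1 under strong, noise-free causal influence) follow directly from the arithmetic of the normalizer. The sketch below assumes that Z is the sum of the absolute values of its three terms, as in Liang's normalization, and that md.nIF simply drops the dH_{Y}^{*}/dt term. The function names and the numerical inputs are hypothetical and chosen only to reproduce the limiting case.

```python
def nif(if_xy, dh_star, dh_noise):
    """|IF| divided by Z, assuming Z = |IF| + |dH*/dt| + |dH_noise/dt|."""
    z = abs(if_xy) + abs(dh_star) + abs(dh_noise)
    return abs(if_xy) / z if z > 0 else 0.0

def md_nif(if_xy, dh_noise):
    """Modified nIF: the dH*/dt term is removed from the normalizer (assumed form)."""
    z = abs(if_xy) + abs(dh_noise)
    return abs(if_xy) / z if z > 0 else 0.0

# Hypothetical strong-causality case: IF and dH*/dt are equal, noise term is zero.
strong_nif = nif(if_xy=0.8, dh_star=0.8, dh_noise=0.0)
strong_md = md_nif(if_xy=0.8, dh_noise=0.0)
print(strong_nif)  # 0.5
print(strong_md)   # 1.0

# Hypothetical noisy case: a large noise term pulls both measures towards 0.
noisy_nif = nif(if_xy=0.8, dh_star=0.8, dh_noise=8.0)
noisy_md = md_nif(if_xy=0.8, dh_noise=8.0)
```

Under these assumptions the factor of 2 in the calibration (alpha approximately 2 × maximal-m) is simply the ratio between the upper limits of the two measures in the noise-free case.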