Empirical Assessment of Normalized Information Flow for Quantifying Causal Contributions
Abstract. To understand complex processes such as global climate change, it is important to quantify causal contributions between time-series variables. Here, we examine the hypothesis that the normalized causal sensitivity (nCS) can be measured by the (modified) normalized information flow, nIF (or mdnIF). The instantaneous causal sensitivity is defined as the absolute causal contribution to the effect variable over the change in the cause variable. The nCS needs to be comparable i) among causes, ii) at different times, and iii) across locations. Therefore, if our hypothesis holds, the nIF must also fulfil these three requirements. We verify, empirically, that the causal contributions between variables can be reasonably estimated by the product of a constant “maximal causal sensitivity” and a modified nIF. Between opposite causal directions, the causal sensitivity can be further normalized by the larger of the two “maximal causal sensitivities”. Our method is useful when there are: i) strong but hard-to-quantify noise contributions to the effect variable, ii) significant causal time lags, with a need to estimate the lag, iii) many causes from various locations contributing to an overall mean effect, with a need to differentiate their causal contributions, or iv) higher-order causal contributions.
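Stated schematically (with symbols chosen here for illustration; the manuscript's own notation may differ), the hypothesis and the empirical estimate described in the abstract amount to

$$
\mathrm{nCS}_{X \to Y}(t) \;\approx\; \bigl|\mathrm{nIF}_{X \to Y}(t)\bigr|,
\qquad
\mathrm{CS}_{X \to Y}(t) \;\approx\; \mathrm{CS}_{\max}\,\bigl|\mathrm{mdnIF}_{X \to Y}(t)\bigr|,
$$

where CS is the instantaneous causal sensitivity, CS_max is the constant “maximal causal sensitivity”, and the causal contribution to Y then follows by multiplying the sensitivity by the change in the cause variable X, per the definition of sensitivity above.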
- Preprint (3671 KB)
- Supplement (18481 KB)
Status: closed
RC1: 'Comment on gmd-2022-106', Anonymous Referee #1, 21 May 2022
The manuscript ‘Empirical assessment of normalized information flow for quantifying causal contributions’ by Cheng and Redfern considers how causal sensitivity could be measured through information flow in the context of climate science. The main potential contribution lies in the empirical definition of the causal influence between variables (i.e. the causal sensitivity) as the product of a constant maximal causal sensitivity and a modified normalized information flow.
This version of the manuscript is much improved over a previous one that I reviewed, especially at the beginning. However, in the Results section, where the first test cases are evaluated, the clarity is again lost (for me, at least). I also still find the mathematical notation confusing, possibly because I am not familiar with a few papers frequently cited in the manuscript. Variables seem to come and go, becoming multidimensional or scalar at will, and I don’t see how different locations and time lags feed into the overall picture. Figure 1 is a good example, where thorough mathematical notation would allow the reader to comprehend the meaning immediately. I suggest introducing a consistent notation in a subsection that differentiates between all these points, and then using it throughout the manuscript and its figures.
Specific comments:
- l. 68: allow for a comparison?
- l. 69: ‘for simplicity’ probably suffices/is simple enough
- l. 79-81: I am slightly confused with respect to the notation here. If X and Y are single variables (as implied by the text), how can there then be a maximum that differs from the term itself? The text appears to imply that this is a statement over different locations and times; however, should this not lead to some sort of vector or matrix notation? L. 83/84 also seem to imply that there are at least multiple X. Is bold notation needed?
- l. 110-115: here it is implied that somehow local or non-local does not play a role, so I am starting to wonder here how this relates to the notation above. I also don’t understand at this point the link between the interchangeability over causes at different times and locations/identification of particular causes and how that links to natural methane emissions and global mean temperature. Is the global mean not exactly the opposite of identifying locations where specific processes lead to methane emissions? Maybe only rephrasing is needed? Could you clarify?
- l. 137-140: this raises the question as to why one would not use lagged relationships of X and Y in the set of variables? Is the discrete nature of lags a problem?
- Eqs. (11)-(13): my first impression is that the importance of shared causal influences will be problem-dependent. Could you clarify how each treatment would (or would not) help to generalize the concept, i.e. whether problem-dependence in how information flows would affect the validity of the choice of Z?
- l. 185: word ‘noise’ missing here somewhere?
- l. 187: I don’t understand why 21 steps of time lag equal 21% of each analysed time window. Can you explain the idea?
- l. 190: might be worth explaining that teleconnections stand for spatial interactions here? Again, how could the additional spatial dependency be better included in the mathematical notation employed here?
- l. 198-199: similar – suddenly X1, X2, X3 and Y1, Y2, Y3 are introduced, which I assume should indicate a problem with three X-variables and three Y-variables? Why this choice? Where was this introduced?
- Figure 1: again clarity of notation, e.g. in (b) what is the meaning of multiple arrows between X and Y? Representing multiple variables? Time lags? Spatial points? I don’t understand why there are no teleconnections here, but there are in the other subfigures? Somehow this has to do with the crossing arrows, but I doubt that many will understand why this is a way to symbolize teleconnections (or how they are imagined here). For me, the notation throughout the manuscript is confusing and still reduces the clarity too much. There are multiple processes (X and Y) which are related spatially and temporally (with potential lags)? However, how do these differ in the notation, how are they made obvious? Maybe write a subsection where you formally introduce the notation you are using and be consistent afterwards.
- Figure 2 - I am lost here: what do the different colours stand for? How is this a test? I need clear instructions on how to read this plot. Why are the results good? Why do they confirm the hypothesis? Why does the second column sometimes look like a flat line? Which lines should be the same? The previous section had already become less clear, but here, at the latest, I have literally no idea what is going on anymore. The reader has to work really hard to keep track. This needs to be improved.
- Figure 3 same.
- This could partly be helped, of course, by more clearly explaining what is going on in the Results sections here.
- Figure 5 same.
Citation: https://doi.org/10.5194/gmd-2022-106-RC1
- AC2: 'Reply on RC1', Simon Redfern, 13 Jul 2022
RC2: 'Comment on gmd-2022-106', Anonymous Referee #2, 06 Jun 2022
The paper is in essence a continuation of the methodological work by Liang on information flow. It starts with the concept of causal sensitivity, based on partial derivatives of an effect variable Y as dependent on a causal variable X, time, and other variables, and also potentially containing noise.
Full comprehension of the framework is not possible without referring to Liang's papers, and the reader is left wondering why the ratio of a partial derivative of Y to the total time derivative of X should bear the word "causal". Also, notation-wise, the total derivative of Y with respect to time would be a sum of the partial derivatives of Y with respect to all variables (X1, X2, ...) times their total time derivatives:
dY/dt = ∂Y/∂X1 · dX1/dt + ∂Y/∂X2 · dX2/dt + ... + ∂Y/∂t,
so if one wants to single out the part which is due to X1 (say), eq. (1) in line 81 would simply be
nCS = |∂Y/∂X1|
- or otherwise, the reviewer does not understand the notation ∂Y(X)/∂t.
But the main problem remains: why would this be called "causal" sensitivity?
Also, in applications in the Earth Sciences where time series of observations are available, how do you calculate the partial derivatives from them if you are ignorant of the underlying processes?
The connection to IF is presented only indirectly - by stating the hypothesis that nCS is approximately equal to |nIF|. If there is no alternative way to calculate nIF, how would you be able to test this hypothesis? Where is the independent definition of nIF ?
Later, one special case is considered - linear models (very unlikely to work for complex systems), where IF has a representation through covariance values. In addition, it seems that the authors believe in a decomposition of the normalization factor into a simple sum of the IF, a noise component and a "self-dependence" term. Why should it be that simple? And how would you be able to discern the three terms, given only the time series of X and Y? The surprising answer is in eqs. (8) to (10), which show that the self-dependence of Y is dependent on X, and the noise term (the contribution of other variables) is also dependent on X. How is that possible? What are the assumptions about the phase-space structure, stationarity, etc. which go into that?
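To make the covariance representation mentioned above concrete, the following is a minimal sketch (in Python rather than the authors' Matlab, with function and variable names chosen here for illustration) of Liang's (2014) bivariate maximum-likelihood estimator of the information-flow rate under the linear assumption. The further step of normalizing this rate into nIF, i.e. dividing by a factor that also contains the self-dependence and noise terms, is precisely what the reviewer questions and is not reproduced here.

```python
import numpy as np

def liang_information_flow(cause, effect, dt=1.0):
    """Bivariate information-flow rate T_{cause->effect} under the linear
    (Gaussian) assumption, following Liang's (2014) covariance-based
    estimator.  Illustrative sketch only, not the authors' code."""
    x = np.asarray(cause, dtype=float)
    y = np.asarray(effect, dtype=float)

    # Euler-forward estimate of dY/dt from the series itself
    dy = (y[1:] - y[:-1]) / dt
    x, y = x[:-1], y[:-1]                 # align with the forward difference

    def cov(a, b):
        return np.mean((a - a.mean()) * (b - b.mean()))

    Cyy, Cxx, Cxy = cov(y, y), cov(x, x), cov(x, y)
    Cydy, Cxdy = cov(y, dy), cov(x, dy)   # covariances with the tendency of Y

    # T_{X->Y} = (Cyy*Cxy*Cxdy - Cxy^2*Cydy) / (Cyy^2*Cxx - Cyy*Cxy^2)
    num = Cyy * Cxy * Cxdy - Cxy**2 * Cydy
    den = Cyy**2 * Cxx - Cyy * Cxy**2
    return num / den
```

In this framework, a clearly nonzero T_{X→Y} together with a near-zero value of the reverse call, liang_information_flow(effect, cause), is the usual indication of a one-way causal direction; the manuscript's hypothesis concerns the normalized magnitude of such values.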
The reviewer also notes that md3 = md2 whenever the sign inside the absolute-value bracket of eq. (13) is positive, and md3 = 2 md1 - md2 in the opposite case, so in which sense is md3 anything new once you have md1 and md2?
The "empirical tests" chapter lists no less than 8 artificially generated time series ("designed mock-up data sets") without, however, providing any details. The curious reader might want to reproduce the numerical results shown in the Figures, but there is no clue how to do that. What exactly did you choose as (say) "1D example with fluctuating self-dependency noise-contribution and a sinlge causal direction"? One has to refer to the supplement (not referenced to in lines 181-190 of the manuscript) to find answers to these questions - however, also this is difficult since mathematical notation is wrong (example: what does the sum "_2 ^nt 1/nt" mean? The summation index (n) can't be the upper limit of the sum itself, and if the user has to choose nt first, i.e. nt is a constant for the sum, the latter is juts (nt -1)/nt, which hardly makes sense? Fundamentally, if you have the partial derivative of X1(Y1) explicitly given as a time-varying function, you simply can't require that the partial derivative of Y1(X1) would be exactly zero, contradicting the inverse function theorem. What is going on here?
In the previous chapter, dependence on other variables was considered as "noise", and there was the self-dependence term. But now, in l. 177, self-dependent terms are suddenly also noise, adding to the confusion.
The reviewer was fully lost when there was talk about the "1:2:3" ratio for 1D examples in which X1, X2, X3, Y1, Y2 and Y3 occur. How is that a "1D" example?
The figures in the Results section are not illuminating to the reviewer. It seems that one has to recognize a 21-unit time lag from Fig. 3; however, even if one happens to know which panels have to be compared here, there are 1000 time steps shown, so the 21-unit lag would only make a difference of 2.1%; you would need a magnifier to see anything here. (The text in l. 187 talks about a 21% effect, which seems to indicate that windows of length 100 were analysed each time, but this is mentioned nowhere else and is not visible in the Figures.)
Eq. (20) seems to be a differential equation for Xadj, but not even the units fit here (unless both X and t are dimensionless, which would not be the case in any application). Admittedly, the reviewer did not even understand the "25-75% split" mentioned in l. 249.
The only way to work through the material presented is by going through the (uncommented!) Matlab scripts provided as a fileshare by the authors. Do you really expect this from a reviewer, let alone a "normal" reader?
As the reviewer does not see an easy fix to render the paper comprehensible, there are no detailed comments on the text (there would be many!). Still, there could be some interesting ideas here, not least since causal inference approaches (such as Granger causality, CCM, PCMCI, etc.) have become quite fashionable in the Earth Sciences in recent years, and the concept of "higher-order dependency" might be interesting. However, in its current form, the concepts are not communicated in a way that would render the paper acceptable for publication.
Citation: https://doi.org/10.5194/gmd-2022-106-RC2
- AC3: 'Reply on RC2', Simon Redfern, 13 Jul 2022
- AC1: 'Comment on gmd-2022-106', Simon Redfern, 13 Jul 2022