Empirical Assessment of Normalized Information Flow for Quantifying Causal Contributions
Abstract. To understand complex processes such as global climate change, it is important to quantify causal contributions between time-series variables. Here, we examine the hypothesis that the normalized causal sensitivity (nCS) can be measured by the (modified) normalized information flow, nIF (or mdnIF). The instantaneous causal sensitivity is defined as the absolute causal contribution to the effect variable over the change in the cause variable. The nCS needs to be comparable i) among causes, ii) at different times, and iii) across locations. Therefore, if our hypothesis holds, the nIF must also fulfil these three requirements. We verify, empirically, that the causal contributions between variables can be reasonably estimated by the product of a constant “maximal causal sensitivity” and a modified nIF. Between opposite causal directions, the causal sensitivity can be further normalized by the larger of the two “maximal causal sensitivities”. Our method is useful when there are: i) strong but hard-to-quantify noise contributions to the effect variable, ii) significant causal time lags, with a need to estimate the lag, iii) many causes from various locations contributing to an overall mean effect, with a need to differentiate their causal contributions, or iv) higher-order causal contributions.
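Stated schematically (with symbols chosen here for illustration; the manuscript's own notation may differ), the hypothesis and the empirical estimate described in the abstract amount to

$$
\mathrm{nCS}_{X \to Y}(t) \;\approx\; \bigl|\mathrm{nIF}_{X \to Y}(t)\bigr|,
\qquad
\mathrm{CS}_{X \to Y}(t) \;\approx\; \mathrm{CS}_{\max}\,\bigl|\mathrm{mdnIF}_{X \to Y}(t)\bigr|,
$$

where CS is the instantaneous causal sensitivity, CS_max is the constant “maximal causal sensitivity”, and the causal contribution to Y then follows by multiplying the sensitivity by the change in the cause variable X, per the definition of sensitivity above.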
- Preprint (3671 KB)
- Supplement (18481 KB)
Status: closed
RC1: 'Comment on gmd-2022-106', Anonymous Referee #1, 21 May 2022
The manuscript ‘Empirical assessment of normalized information flow for quantifying causal contributions’ by Cheng and Redfern considers how causal sensitivity could be measured through information flow in the context of climate science. The main potential contribution lies in the empirical definition of the causal influence between variables (i.e. the causal sensitivity) as the product of a constant maximal causal sensitivity and a modified normalized information flow.
This version of the manuscript is much improved over a previous one that I reviewed, especially at the beginning. However, in the Results section, where the first test cases are evaluated, the clarity is again lost (for me, at least). I also still find the mathematical notation confusing, possibly because I am not familiar with a few papers frequently cited in the manuscript. Variables seem to come and go, becoming multidimensional or scalar at will, and I don’t see how different locations and time lags feed into the overall picture. Figure 1 is a good example, where thorough mathematical notation would allow the reader to comprehend the meaning immediately. I suggest introducing a consistent notation in a subsection that differentiates between all these points, and then using it throughout the manuscript and its figures.
Specific comments:
- l. 68: allow for a comparison?
- l. 69: ‘for simplicity’ probably suffices/is simple enough
- l. 79-81: I am slightly confused with respect to the notation here. If X and Y are single variables (as implied by the text), how can there then be a maximum that differs from the term itself? The text appears to imply that this is a statement over different locations and times; however, should this not lead to some sort of vector or matrix notation? L. 83/84 also seem to imply that there are at least multiple X. Is bold notation needed?
- l. 110-115: here it is implied that somehow local or non-local does not play a role, so I am starting to wonder here how this relates to the notation above. I also don’t understand at this point the link between the interchangeability over causes at different times and locations/identification of particular causes and how that links to natural methane emissions and global mean temperature. Is the global mean not exactly the opposite of identifying locations where specific processes lead to methane emissions? Maybe only rephrasing is needed? Could you clarify?
- l. 137-140: this raises the question as to why one would not use lagged relationships of X and Y in the set of variables? Is the discrete nature of lags a problem?
- Eqs. (11)-(13): my first impression is that the importance of shared causal influences will be problem-dependent. Could you clarify how each treatment would (or would not) help to generalize the concept, i.e. whether problem-dependence in how information flows would affect the validity of the choice of Z?
- l. 185: word ‘noise’ missing here somewhere?
- l. 187: I don’t understand why 21 steps of time lag equal 21% of each analysed time window. Can you explain the idea?
- l. 190: might be worth explaining that teleconnections stand for spatial interactions here? Again, how could the additional spatial dependency be better included in the mathematical notation employed here?
- l. 198-199: similar – suddenly X1, X2, X3 and Y1, Y2, Y3 are introduced, which I assume should indicate a problem with three X-variables and three Y-variables? Why this choice? Where was this introduced?
- Figure 1: again clarity of notation, e.g. in (b) what is the meaning of multiple arrows between X and Y? Representing multiple variables? Time lags? Spatial points? I don’t understand why there are no teleconnections here, but there are in the other subfigures? Somehow this has to do with the crossing arrows, but I doubt that many will understand why this is a way to symbolize teleconnections (or how they are imagined here). For me, the notation throughout the manuscript is confusing and still reduces the clarity too much. There are multiple processes (X and Y) which are related spatially and temporally (with potential lags)? However, how do these differ in the notation, how are they made obvious? Maybe write a subsection where you formally introduce the notation you are using and be consistent afterwards.
- Figure 2 - I am lost here: what do the different colours stand for? How is this a test? I need clear instructions on how to read this plot. Why are the results good? Why do they confirm the hypothesis? Why does the second column sometimes look like a flat line? Which lines should be the same? The previous section had already become less clear, but here, at the latest, I have literally no idea what is going on anymore. The reader has to work really hard to keep track. This needs to be improved.
- Figure 3 same.
- This could partly be helped, of course, by more clearly explaining what is going on in the Results sections here.
- Figure 5 same.
Citation: https://doi.org/10.5194/gmd-2022-106-RC1
- AC2: 'Reply on RC1', Simon Redfern, 13 Jul 2022
RC2: 'Comment on gmd-2022-106', Anonymous Referee #2, 06 Jun 2022
The paper is in essence a continuation of the methodological work by Liang on information flow. It starts with the concept of causal sensitivity, based on partial derivatives of an effect variable Y as dependent on a causal variable X, time, and other variables, and also potentially containing noise.
Full comprehension of the framework is not possible without referring to Liang's papers, and the reader is left wondering why the ratio of a partial derivative of Y to the total time derivative of X should bear the word "causal". Also, notation-wise, the total derivative of Y with respect to time would be a sum of the partial derivatives of Y with respect to all variables (X1, X2, ...) times their total time derivatives:
dY/dt = ∂Y/∂X1 · dX1/dt + ∂Y/∂X2 · dX2/dt + ... + ∂Y/∂t,
so if one wants to single out the part which is due to X1 (say), eq. (1) in line 81 would simply be
nCS = |∂Y/∂X1|
- or otherwise, the reviewer does not understand the notation ∂Y(X)/∂t.
But the main problem remains: why would this be called "causal" sensitivity?
Also, in applications in the Earth Sciences where time series of observations are available, how do you calculate the partial derivatives from them if you are ignorant of the underlying processes?
The connection to IF is presented only indirectly - by stating the hypothesis that nCS is approximately equal to |nIF|. If there is no alternative way to calculate nIF, how would you be able to test this hypothesis? Where is the independent definition of nIF ?
Later, one special case is considered - linear models (very unlikely to work for complex systems), where IF has a representation through covariance values. In addition, it seems that the authors believe in a decomposition of the normalization factor into a simple sum of the IF, a noise component and a "self-dependence" term. Why should it be that simple? And how would you be able to discern the three terms, given only the time series of X and Y? The surprising answer is in eqs. (8) to (10), which show that the self-dependence of Y is dependent on X, and the noise term (the contribution of other variables) is also dependent on X. How is that possible? What are the assumptions about the phase-space structure, stationarity, etc. which go into that?
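To make the covariance representation mentioned above concrete, the following is a minimal sketch (in Python rather than the authors' Matlab, with function and variable names chosen here for illustration) of Liang's (2014) bivariate maximum-likelihood estimator of the information-flow rate under the linear assumption. The further step of normalizing this rate into nIF, i.e. dividing by a factor that also contains the self-dependence and noise terms, is precisely what the reviewer questions and is not reproduced here.

```python
import numpy as np

def liang_information_flow(cause, effect, dt=1.0):
    """Bivariate information-flow rate T_{cause->effect} under the linear
    (Gaussian) assumption, following Liang's (2014) covariance-based
    estimator.  Illustrative sketch only, not the authors' code."""
    x = np.asarray(cause, dtype=float)
    y = np.asarray(effect, dtype=float)

    # Euler-forward estimate of dY/dt from the series itself
    dy = (y[1:] - y[:-1]) / dt
    x, y = x[:-1], y[:-1]                 # align with the forward difference

    def cov(a, b):
        return np.mean((a - a.mean()) * (b - b.mean()))

    Cyy, Cxx, Cxy = cov(y, y), cov(x, x), cov(x, y)
    Cydy, Cxdy = cov(y, dy), cov(x, dy)   # covariances with the tendency of Y

    # T_{X->Y} = (Cyy*Cxy*Cxdy - Cxy^2*Cydy) / (Cyy^2*Cxx - Cyy*Cxy^2)
    num = Cyy * Cxy * Cxdy - Cxy**2 * Cydy
    den = Cyy**2 * Cxx - Cyy * Cxy**2
    return num / den
```

In this framework, a clearly nonzero T_{X→Y} together with a near-zero value of the reverse call, liang_information_flow(effect, cause), is the usual indication of a one-way causal direction; the manuscript's hypothesis concerns the normalized magnitude of such values.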
The reviewer also notes that md3 = md2 whenever the sign inside the absolute-value bracket of eq. (13) is positive, and md3 = 2 md1 - md2 in the opposite case, so in which sense is md3 anything new once you have md1 and md2?
The "empirical tests" chapter lists no less than 8 artificially generated time series ("designed mock-up data sets") without, however, providing any details. The curious reader might want to reproduce the numerical results shown in the Figures, but there is no clue how to do that. What exactly did you choose as (say) "1D example with fluctuating self-dependency noise-contribution and a sinlge causal direction"? One has to refer to the supplement (not referenced to in lines 181-190 of the manuscript) to find answers to these questions - however, also this is difficult since mathematical notation is wrong (example: what does the sum "_2 ^nt 1/nt" mean? The summation index (n) can't be the upper limit of the sum itself, and if the user has to choose nt first, i.e. nt is a constant for the sum, the latter is juts (nt -1)/nt, which hardly makes sense? Fundamentally, if you have the partial derivative of X1(Y1) explicitly given as a time-varying function, you simply can't require that the partial derivative of Y1(X1) would be exactly zero, contradicting the inverse function theorem. What is going on here?
In the previous chapter, dependence on other variables was considered as "noise", and there was the self-dependence term. But now, in l. 177, self-dependent terms are suddenly also noise, adding to the confusion.
The reviewer was fully lost when there was talk about the "1:2:3" ratio for 1D examples in which X1, X2, X3, Y1, Y2 and Y3 occur. How is that a "1D" example?
The figures in the Results section are not illuminating to the reviewer. It seems that one has to recognize a 21-unit time lag from Fig. 3; however, even if one happens to know which panels have to be compared here, there are 1000 time steps shown, so the 21-unit lag would only make a difference of 2.1%; you would need a magnifier to see anything here. (The text in l. 187 talks about a 21% effect, which seems to indicate that windows of length 100 were analysed each time, but this is mentioned nowhere else and is not visible in the Figures.)
Eq. (20) seems to be a differential equation for Xadj, but not even the units fit here (unless both X and t are dimensionless, which would not be the case in any application). Admittedly, the reviewer did not even understand the "25-75% split" mentioned in l. 249.
The only way to work through the material presented is by going through the (uncommented!) Matlab scripts provided as a fileshare by the authors. Do you really expect this from a reviewer, let alone a "normal" reader?
As the reviewer does not see an easy fix to render the paper comprehensible, there are no detailed comments on the text (there would be many!). Still, there could be some interesting ideas here, not least since causal inference approaches (such as Granger causality, CCM, PCMCI, etc.) have become quite fashionable in the Earth Sciences in recent years, and the concept of "higher-order dependency" might be interesting. However, in its current form, the concepts are not communicated in a way that would render the paper acceptable for publication.
Citation: https://doi.org/10.5194/gmd-2022-106-RC2
- AC3: 'Reply on RC2', Simon Redfern, 13 Jul 2022
- AC1: 'Comment on gmd-2022-106', Simon Redfern, 13 Jul 2022