Variational regional inverse modeling of reactive species emissions with PYVAR-CHIMERE-v2019

Fortems-Cheiney, Audrey; Pison, Isabelle; Broquet, Grégoire; Dufour, Gaëlle; Berchet, Antoine; Potier, Elise; Coman, Adriana; Siour, Guillaume; Costantino, Lorenzo

doi:https://doi.org/10.5194/gmd-14-2939-2021

Articles | Volume 14, issue 5

© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.

Model description paper

26 May 2021

Final revised paper (published on 26 May 2021)
Preprint (discussion started on 04 Sep 2019)

Interactive discussion

Status: closed

AC: Author comment | RC: Referee comment | SC: Short comment | EC: Editor comment

RC1: 'Review of GMD paper by Fortems-Cheiney et al.', Anonymous Referee #1, 09 Sep 2019
- AC1: 'Reply to referee #1', Audrey Fortems-Cheiney, 29 Apr 2020
SC1: 'Executive Editor Comment on gmd-2019-186', Astrid Kerkweg, 05 Nov 2019
- AC3: 'Reply to the Executive Editor comment', Audrey Fortems-Cheiney, 29 Apr 2020
RC2: 'Comments on "Variational regional inverse modeling of reactive species emissions with PYVAR-CHIMERE" by Audrey Fortems-Cheiney et al. 2019', Anonymous Referee #2, 05 Dec 2019
- AC2: 'Reply to Referee #2', Audrey Fortems-Cheiney, 29 Apr 2020

Peer-review completion

AR: Author's response | RR: Referee report | ED: Editor decision

AR by Audrey Fortems-Cheiney on behalf of the Authors (01 May 2020) Manuscript

ED: Referee Nomination & Report Request started (26 May 2020) by Ignacio Pisso

RR by Anonymous Referee #1 (01 Jun 2020)

Suggestions for revision or reasons for rejection

I thank the authors for the numerous additions and improvements to their paper. The description of the model and inversion system is now much more complete.
Nevertheless, I still see some important issues (see below) that should be addressed before I can recommend the paper for publication in GMD.

Main comments

1) There is an unusually large number of typos, grammatical and orthographical errors. This is quite distracting while reviewing a paper. I encourage the authors to put more effort into this aspect in the future.

2) The performance of the inversions is relatively poor, especially in the NOx case. Two possible explanations should be explored: (a) the number of iterations -- I doubt that the criterion of 90% reduction of the gradient of J is sufficient. Is computing time really such a hard constraint that more iterations cannot be tested? I would be curious to see the results for a 99% reduction. (b) The choice of a priori errors is perplexing: 30% in the case of NOx, 100% for CO. There is no possible justification for this. Furthermore, correlations are used for one compound, not for the other. The small NOx emission errors likely explain the poor match of a posteriori NO2 with OMI. I think we can all agree that the choice of a priori errors and correlations is arbitrary to some (large) degree. This is why it is important to assess whether the inversion results are dependent on such choices. Clearly this exploration is not considered important here, which I think is a mistake, especially in view of the weak bias reduction achieved with the current setup.
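
The sensitivity to the stopping criterion raised in comment 2 can be illustrated with a toy problem. The sketch below is not PYVAR-CHIMERE code. It minimizes a hypothetical two-parameter quadratic cost function J(x) = 1/2 (x - xb)^T B^-1 (x - xb) + 1/2 (Hx - y)^T R^-1 (Hx - y) by fixed-step steepest descent, and stops once the gradient norm has been reduced by a chosen fraction. All numbers, all names, and the choice of steepest descent are assumptions made for illustration only.

```python
import math

# Hypothetical toy setup (not from the manuscript): two control variables,
# two observations, diagonal B and R, and a linear observation operator H.
XB = [1.0, 1.0]                  # prior control vector
B_VAR = [0.09, 1.0]              # prior error variances (30 % and 100 % std)
H = [[1.0, 0.5], [0.2, 1.0]]     # linear observation operator
Y = [2.0, 1.8]                   # observations
R_VAR = [0.04, 0.04]             # observation error variances


def gradient(x):
    """Gradient of J: B^-1 (x - xb) + H^T R^-1 (Hx - y)."""
    resid = [sum(H[i][j] * x[j] for j in range(2)) - Y[i] for i in range(2)]
    grad = []
    for j in range(2):
        prior_term = (x[j] - XB[j]) / B_VAR[j]
        obs_term = sum(H[i][j] * resid[i] / R_VAR[i] for i in range(2))
        grad.append(prior_term + obs_term)
    return grad


def minimize(reduction, step=0.01, max_iter=100000):
    """Steepest descent from the prior; stop when the gradient norm has
    dropped by the requested fraction (0.90 means a 90 % reduction)."""
    x = list(XB)
    g0 = math.hypot(*gradient(x))
    for n_iter in range(max_iter):
        g = gradient(x)
        if math.hypot(*g) <= (1.0 - reduction) * g0:
            return x, n_iter
        x = [x[j] - step * g[j] for j in range(2)]
    return x, max_iter
```

Calling `minimize(0.90)` and `minimize(0.99)` and comparing the two solutions shows how much of the remaining adjustment the looser criterion leaves out in this toy case. It says nothing about the behaviour of the actual CHIMERE inversion, whose cost function is non-quadratic and far larger.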

Minor comments

Figures 4-5 Is it a coincidence that the right panel of Fig. 4 presents high values off the coast of Egypt, which is where the MACC-based run is most different from the standard one using LMDZ-INCA? It looks like the run without emissions used boundary and initial conditions from MACC.

l 365 Why is the median appropriate to take proper account of the AK?

Technical/language corrections

The appearance of mathematical symbols varies throughout the manuscript, for example the B1/2 notation at l 170 (also 173) and 293. Please use consistent fonts throughout the text and equations.

l 95 "Van" --> "van"

l 162 "high-non linearity" -> "high non-linearity"

l 162-165 : Difficult to read due to sentence within parentheses; please rephrase.

l 200: delete "a first time"

l 204 Problem with reference (parentheses)

l 210 A space is missing before "The adjoint"

l 212 "Then, it has been parallelized. This work" --> "The code has been parallelized. This task"

l 213 "whole code, associatedwith" --> "entire code, associated with"

l 213 missing space between sentences

l 214 missing space between sentences

l 216 "lead" --> "conducted"

l 219, 221, 222 Insert space after the bullet

l 219 "For the geometry" --> "Regarding geometry". Likewise for the following bullets.

l 233 "infer" -> "infers"

Figure 2 The blue could be a bit lighter for legibility (also in Figure 3)

l 248 "to constrain could be" --> "to be constrained might be"

l 252 Insert a comma after "add"

l 256-260 I cannot understand what is explained here. What is meant by "activity maps and/or masks for regions"? What is meant by "control of budgets"? Etc. Please rephrase.

l 264-265 "to define the diagonal standard deviation matrix SIGMA" : do you mean that the matrix is diagonal, or are you referring to the diagonal of the matrix? SIGMA has not been defined previously.

l 281 Insert space before "where"

l 291 Subscript "t" in "Ct"

l 293 "The calculations involving B1/2" : it would be helpful to use equation numbers, to help the reader figure out where such calculations were mentioned.

l 294 Subscripts for Ct and Cs

l 298 "computation time"

l 313 "Simplified scheme describing how...". "prepares" --> "prepare". In the colored box, what does "PYVAR y building" mean?

l 344 Insert a space before "OMI"

l 344 "to present an illustration"

l 381 and 383 "come from" --> e.g. "are obtained from"

l 388 Insert a space after "site"

l 391 "strongly driven" --> "strongly influenced"

Figure 4, the right panel is denoted using "a)" instead of "b)", please correct.
Fig 4 & 5 The legend could be clearer, e.g. "CO surface concentrations between 1-7 March 2015, simulated by CHIMERE...". You might drop "over Europe"

Figure 5 The legend says "relative difference" but the figure shows "ppbv"... Which one is it? In any case, the color scale is poorly chosen since almost all values are positive. Relative differences of 15% are still significant.

l 406 Drop "of their equivalents"

l 408-411 What is meant by "general trend in emissions"? Furthermore, if the underestimation persists throughout the year, it might still be due to specific activity sectors, might it not? Please rephrase in a more logical manner.

l. 412 "chemistry with OH" rephrase.

Figures 6 & 7 Legend: "Mean bias" --> "Mean biases"

l 419 & 423 "for 7-day" is ambiguous. Only one value for the period, or several?

l 427 "for 1-day" : as above.

l 433 "ofthe"

l 433 "We hardly have sources of estimates" : weird, re-phrase.

l 437 "is" --> "are"

l 439 remove the commas before the year in the references.

l 449 "obtained" --> "obtain"

l 451 "even 100% of ucertainty lead to" --> "even an uncertainty of 100% leads to"

l 463 "derivenby" --> "driven by"

l 480 Space missing after Portugal

l 500 Please write 1.4Ee+15 as 1.4(times)10^{15}. Same elsewhere in the paper (e.g. legend of Figure 7)
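
The question about SIGMA at l 264-265 turns on a common convention, assumed here rather than taken from the manuscript: the prior error covariance matrix is written B = SIGMA C SIGMA, where SIGMA is a diagonal matrix of error standard deviations and C is a correlation matrix. The sketch below builds such a B for three hypothetical grid cells with an exponentially decaying spatial correlation. The function name, the cell positions, and the correlation length are illustrative assumptions; the 30% and 100% standard deviations echo the values quoted in main comment 2.

```python
import math


def build_prior_covariance(sigmas, positions_km, corr_length_km):
    """Return B = SIGMA * C * SIGMA as a nested list.

    sigmas         : prior error standard deviations (diagonal of SIGMA)
    positions_km   : 1-D positions of the cells (hypothetical geometry)
    corr_length_km : e-folding length of the assumed exponential correlation
    """
    n = len(sigmas)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            distance = abs(positions_km[i] - positions_km[j])
            corr = math.exp(-distance / corr_length_km)  # C[i][j], 1 on the diagonal
            cov[i][j] = sigmas[i] * corr * sigmas[j]
    return cov


# Two cells with 30 % uncertainty and one with 100 % uncertainty.
B = build_prior_covariance([0.3, 0.3, 1.0], [0.0, 50.0, 100.0], 50.0)
```

In this construction the diagonal of B holds error variances (the squares of the entries of SIGMA), while SIGMA itself is diagonal and holds standard deviations, which is the distinction the comment asks the authors to spell out.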

RR by Anonymous Referee #3 (27 Jun 2020)

Suggestions for revision or reasons for rejection

General Comments:

This manuscript presents an added inverse modeling system capability for CHIMERE based on PYVAR, which has been successfully used with LMDz for GHG flux estimation. The manuscript describes the model development and shows some very promising results for CO and NOx. This version of the manuscript has improved relative to the original submission, addressing several of the previous reviewers' comments. I commend the authors for this revision. However, while this is relevant to the scope of GMD, I have several concerns regarding the clarity and presentation of its description and the substantive presentation of results, especially on quantifying some uncertainties or describing its fidelity. I suggest minor revisions to address the following concerns before publication:

1) Description of the system could be improved by:
a) clear differentiation of new developments in PYVAR for reactive species. In its current form, it appears to be just an implementation of PYVAR to CHIMERE. If so, detailed testing strategies are warranted to show the fidelity of i) the CHIMERE adjoint/TLM, ii) the PYVAR global minimization, especially for the current application to reactive species. How are these tested and quantified?
b) clear differentiation between PYVAR and 4D-Var. How is PYVAR different from other 4D-Var approaches? What are its advantages and disadvantages? What are its strengths and limitations? Please add some context.
c) clear description of Figure 3. In its current form, it shows a list or table of variables and parameters. It would be clearer if the details of this Figure were discussed. What do these variables represent? Are there utilities in the PYVAR system that are used? If so, it may be better to name and describe those utilities.
d) clear mathematical expressions and their representation (including consistent use of nomenclature). In particular, how is the time component of the cost function (i.e., 4D-Var versus PYVAR) treated in PYVAR? Can the “add”, “mult”, and “scale” corrections be incorporated in the cost function notation? Where are these corrections estimated and applied in the algorithm?
2) Presentation of results could be improved by:
a) Some diagnostics to check optimality of the inversion algorithm. For example, i) posterior error covariances (if calculated), especially error reduction estimates, and ii) RMSEs. This reviewer understands that comparison with independent measurements is beyond the scope of this paper. But at least provide some indication of its optimality. Results on scaling factors and increments can only be interpreted if compared to independent datasets and other approaches. However, one can show for example that the minimization reached its minimum or show the breakdown of the cost function to show that the observations are able to constrain some elements of the control vector. In its current form, this is shown qualitatively in Figures 5 to 8. At least quantify these by statistics other than mean biases. These are very promising results and should be highlighted more.
b) Spell and grammar checks, as well as clearer formats of tables.
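
One common way to test an adjoint of the kind asked about in comment 1a is the dot-product test: for a tangent-linear operator H and its adjoint H^T, the inner products <H dx, dy> and <dx, H^T dy> must agree to machine precision for arbitrary dx and dy. The sketch below applies this test to a small hypothetical linear operator. The operator, the function names, and the number of trials are assumptions for illustration and are not part of PYVAR-CHIMERE.

```python
import math
import random


def tangent_linear(dx):
    """Hypothetical linear model mapping 3 inputs to 2 outputs."""
    return [2.0 * dx[0] - dx[1], 0.5 * dx[1] + 3.0 * dx[2]]


def adjoint(dy):
    """Hand-written adjoint (matrix transpose) of tangent_linear."""
    return [2.0 * dy[0], -dy[0] + 0.5 * dy[1], 3.0 * dy[1]]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def dot_product_test(n_trials=10, seed=0):
    """Return the worst mismatch between <H dx, dy> and <dx, H^T dy>,
    normalized by |dx| * |dy|, over random perturbation pairs."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_trials):
        dx = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        dy = [rng.uniform(-1.0, 1.0) for _ in range(2)]
        lhs = dot(tangent_linear(dx), dy)
        rhs = dot(dx, adjoint(dy))
        scale = math.sqrt(dot(dx, dx) * dot(dy, dy))
        worst = max(worst, abs(lhs - rhs) / scale)
    return worst
```

A correct adjoint gives a mismatch at the level of floating-point round-off, whereas a coding error in `adjoint` typically produces a mismatch of order one. Reporting such a number for the CHIMERE adjoint would be one way to quantify the fidelity the comment asks about.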

Specific Comments:

1) Abstract: Please add some numbers for your results (especially error reduction)
2) Line 16: “in addition to greenhouse gases”. I understand that PYVAR has been used for GHG before, but I don’t think it’s used here. Good for the intro but perhaps not in the abstract. Focus on what exactly is new here.
3) Introduction: While discussion on emissions and inversions can be interesting, some of these are general statements which may not be exactly what this paper is specifically addressing and demonstrating. Focus on exactly which issues/problems the study addresses. For example, is this study addressing high resolution emission estimations (at 0.5 degrees?) or O3 or GHG? I understand that some of these statements motivate future work, but perhaps make it more concise and focus more on what exactly this paper, in its current form, addresses in terms of scientific problems. Why are other approaches not sufficient? What are the limitations of these approaches?
4) Line 102. Check spacing
5) Line 121. “in input of the CTM”. Please clarify
6) Line 124: “during the inversion process (surface fluxes …)” Perhaps make this two separate sentences?
7) Line 121-134: parameters versus variables? Are they the same?
8) Line 130: “observation errors”. Why not call it model-data mismatch?
9) Line 132: “given their prior estimates, the observations, the CTM and the associated uncertainties”. First, it may be better if the article “the” is omitted in the succeeding segments of the sentence. Second, “given the CTM” may not be an accurate way to describe this minimization.
10) Line 133: “in the following”. Please clarify or omit.
11) Line 135: What is xb? There is no mention of time in this section. Be consistent in notations (italic versus non-italic, bold vs not bold) for all expressions not just Eq. 1. What are the dimensions of these vectors and matrices?
12) Line 137: “state vector x”. Is this the same as control vector? What do you mean by state? Is a parameter a state?
13) Line 139: “includes the CTM”. Please clarify.
14) Line 157: I suggest placing Eq. 2 on a separate line. Should B be bold?
15) Line 142: “errors are assumed to be centered and to have Gaussian distribution”. What do you mean by centered? Unbiased? Is it necessary to assume Gaussianity? If so, then why not just solve the analytical solution? Because H is non-linear?
16) Line 159: “optimal solution”. What does optimality mean here?
17) Line 163: Careful on spacing between words
18) Line 172: I suggest placing the modified Eq. 2 on a separate line. How is this linearization implemented? i.e., where is this linearization point relative to the iteration interval? Does making shorter intervals improve minimization and representation of non-linearity?
19) Line 173: what do you mean by “norms”.
20) Line 175: how do you address local minimum?
21) Line 177-184: If there’s an estimate of posterior uncertainty in this system, is this used in the study? Please state which approach is used.
22) Figure 1. Are B and R fixed? Caption has bold fonts.
23) Line 199: “without chemistry a first time”. Please clarify.
24) Section 3.2 How do you diagnose if these adjoints are calculated accurately? Are there tests conducted for this purpose?
25) Line 207-216. Please check spacing between words.
26) Line 214: “lead with”. Please clarify.
27) Line 217-222. While important, I suggest to have them numbered but part of the paragraph rather than bulleted. And please elaborate each one.
28) Line 221: “when no species requires them”. Please clarify. Do you mean for GHG for all – chem, dep? Or for a particular species that does not have either of these processes?
29) Line 224: “currently operational”. Please clarify. Does this mean it is used in operational mode to forecast and predict? Also, is there a particular version of PYVAR and CHIMERE and PYVAR-CHIMERE used in this study?
30) Table 1 is very informative. Please format accordingly, especially separating the header as it becomes confusing to read. Not sure if the “example of the definition..” row should be there. Can it be in the title?
31) Section 3.3. Discussion of correction types is very informative as well. Is it possible to show how these are related to Eq. 1 to 3? Doesn’t the control vector consist of elements corresponding to each grid point and species? If so, how is “scale” implemented for maps or masks for regions?
32) Line 254: “which is similar to the control vector of budgets…” Please elaborate.
33) Line 256: “adding the obtained values to the …” please rephrase.
34) Line 259: “standard deviation coefficient”. Please clarify. Is it really a coefficient? And since this is an error covariance matrix, should the diagonal elements be error variance not error standard deviation?
35) Line 260-262: Very important statement. But please elaborate or rephrase. What is standard deviation of the uncertainty?
36) Line 266: “variances”. Are these error variances?
37) Line 270: “ error correlation between fluxes of CO and NOx, are not coded yet”. Please elaborate on its potential effect on your estimation?
38) Line 296: How about calling this “Observation Operators”?
39) Line 298: Please note spacing between words.
40) Section 3.4. I think this is very relevant. Please elaborate Figure 3. In its current form, it is not clear what this Figure represents and how we can use it to interpret results. I think coding of these operators is a vital step in the assimilation and should be given more emphasis. Are these utilities also available? How good are the adjoints of these operators? Are there tests to diagnose their accuracy?
41) Please check bold fonts in line 311 to 312
42) Line 314-318: Please indicate in your notation whether these are scalars or vectors, and please add the corresponding dimensions. What is the difference between lowercase c_m(o) and uppercase C_m(o)? What is x_a?
43) Line 328-334: This is also informative. Is there a reference for the parallelization approach in PYVAR and CHIMERE? How does it scale with more CPUs? 4 hours seems to be a long time, doesn’t it? Please elaborate and compare with other systems.
44) Line 336-343. Bold? Check spacing between words.
45) Line 391-392. Why are they not different?
46) Figure 5 caption. “differences are in %” is in contrast to the units in the figure.
47) Figure 6 and 7. Is it possible to show difference plots, and more statistics (RMSEs, correlation, bias, error reduction)? Are these really surface concentrations? They are column measurements, right? What about initial conditions? Have these changed as well, since they are part of the control vector? Superscript on units?
48) Section 4.2. Should this be presented prior to section 4.1.3 since some of the plots are for the posterior estimates?
49) Section 4.2.1. Can this be summarized in a table, with a little discussion in the text of the rationale for the choice of these parameters? Am I to assume that NOx emissions are estimated only for 1 day, and all days are the same? For CO, what do you mean by 7-day? Average? How are emissions incorporated in CHIMERE in terms of time? Is there a distribution? i.e., diurnal and weekly cycle?
50) Section 4.2.2. Please check spacing of words and bold fonts.
51) Section 4.2.3. Is it possible to break down the components of J? How about emission error reduction? How do you ensure that these increments are “resolved by the observations”? It would be great to see error reduction plots, if posterior error covariances are calculated. How about initial conditions? Did these change as well?
52) Line 508-516. What is the implication of this to overall cost and computing and optimality of minimization including error correlation of CO and NOX (and spatial correlation against superobbing) as well as increase in dimension of control vector? This also entails using this system at higher spatiotemporal resolution, right? It would be great to have a section on limitations before future implicatio

ED: Reconsider after major revisions (08 Jul 2020) by Ignacio Pisso

Dear Authors,

The reviewers acknowledge the work invested in preparing the responses to the previous reports. Nevertheless they still have questions regarding the manuscript and indicate a number of improvements that could be made. Please address their comments in order to make the manuscript ready for publication.

Best regards,

The editor

Report #2

General Comments:

This manuscript presents an added inverse modeling system capability for CHIMERE based on PYVAR, which has been successfully used with LMDz for GHG flux estimation. The manuscript describes its model development and show some very promising results for CO and NOx. This version of the manuscript has improved relative to original submission, addressing several of previous reviewers comments. I commend the authors for this revision. However, while this is relevant to the scope of GMD, I have several concerns with regards to the clarity and presentation of its description and substantive presentation of results, especially on quantifying some uncertainties or describing its fidelity. I suggest minor revisions to address the following concerns before publication:

1) Description of the system could be improved by:
a) clear differentiation of new developments in PYVAR for reactive species. In its current form, it appears to be just an implementation of PYVAR to CHIMERE. If so, detailed testing strategies is warranted to show the fidelity of i) CHIMERE adjoint/TLM, ii) PYVAR global minimization especially for the current application to reactive species. How are these tested and quantified?
b) clear differentiation between PYVAR and 4D-Var. How is PYVAR different than other 4DVar approaches? What are its advantages and disadvantages? What are its strengths and limitations? Please add some context.
c) Clear description of Figure 3. In its current form, it shows a list or table of variables and parameters. It would be clearer if details on this Figure are discussed. What do these variables represent? Are there utilities in PYVAR system that are used? If so, it may be better to name and describe those utilities.
d) clear mathematical expressions and their representation (including consistent use of nomenclature). In particular, how does the time component of the cost function (i.e., 4DVAr versus PYVAR) treated in PYVAR? Can the “add”, “mult”, and “scale” be incorporated in the cost function notation? Where are these correction estimated and applied in the algorithm?
2) Presentation of results could be improved by:
a) Some diagnostics to check optimality of the inversion algorithm. For example, i) posterior error covariances – if calculated especially error reduction estimates, ii) RMSEs. This reviewer understands that comparison with independent measurements is beyond the scope of this paper. But at least provide some indication of its optimality. Results on scaling factors and increments can only be interpreted if compared to independent datasets and other approaches. However, one can show for example that the minimization reached its minimum or show the breakdown of the cost function to show that the observations are able to constrain some elements of the control vector. In its current form, this is shown qualitatively in Figure 5 to 8. At least quantify these by statistics other than mean biases. These are very promising results and should be highlighted more.
b) Spell and grammar checks, as well as clearer formats of tables.

Specific Comments:

1) Abstract: Please add some numbers for your results (especially error reduction)
2) Line 16: “in addition to greenhouse gases”. I understand that PYVAR has been used for GHG before, but I don’t think it’s used here. Good for the intro but perhaps not in the abstract. Focus on what exactly is new here.
3) Introduction: While discussion on emissions and inversions can be interesting, some of these are general statements which may not be exactly what this paper is specifically addressing and demonstrating. Focus on what issues/problems exactly the study addresses. For example, is this study addressing high resolution emission estimations (at 0.5 degrees?) or O3 or GHG? I understand that some of these statements motivate future work but perhaps make it more concise and focus more on what exactly does this paper at its current form address in terms of scientific problems? Why are other approaches not sufficient? What are the limitations of these approaches?
4) Line 102. Check spacing
5) Line 121. “in input of the CTM”. Please clarify
6) Line 124: “during the inversion process (surface fluxes …)” Perhaps make this two separate sentences?
7) Line 121-134: parameters versus variables? Are they the same?
8) Line 130: “observation errors”. Why not call it model-data mismatch?
9) Line 132: “given their prior estimates, the observations, the CTM and the associated uncertainties”. First, it may be better if “the” article is omitted on the succeeding segments of the sentence. Second, “given the CTM” may not be accurate way to describe this minimization.
10) Line 133: “in the following”. Please clarify or omit.
11) Line 135: What is xb? There is no mention of time in this section. Be consistent in notations (italic versus non-italic, bold vs not bold) for all expressions not just Eq. 1. What are the dimensions of these vectors and matrices?
12) Line 137: “state vector x”. Is this the same as control vector? What do you mean by state? Is a parameter a state?
13) Line 139: “includes the CTM”. Please clarify.
14) Line 157: I suggest to may Eq 2 as a separate line. Should B be bold?
15) Line 142: “errors are assumed to be centered and to have Gaussian distribution”. What do you mean by centered? Unbiased? Is it necessary to assume Gaussianity? If so, then why not just solve the analytical solution? Because H is non-linear?
16) Line 159: “optimal solution”. What does optimality mean here?
17) Line 163: Careful on spacing between words
18) Line 172: I suggest to make the modified equation 2 as a separate line. How is this linearization implemented? i.e., where is this linearization point relative to the iteration interval? Does making shorter intervals improve minimization and representation of non-linearity?
19) Line 173: what do you mean by “norms”.
20) Line 175: how do you address local minimum?
21) Line 177-184: If there’s an estimate of posterior uncertainty in this system, is this used in the study? Please state which approach is used.
22) Figure 1. Are B and R fixed? Caption has bold fonts.
23) Line 199: “without chemistry a first time”. Please clarify.
24) Section 3.2 How do you diagnose if these adjoints are calculated accurately? Are there tests conducted for this purpose?
25) Line 207-216. Please check spacing between words.
26) Line 214: “lead with”. Please clarify.
27) Line 217-222. While important, I suggest to have them numbered but part of the paragraph rather than bulleted. And please elaborate each one.
28) Line 221: “when no species requires them”. Please clarify. Do you mean for GHG for all – chem, dep? Or for a particular species that do not have either of these processes?
29) Line 224: “currently operational”. Please clarify. Does this mean it is used in operational mode to forecast and predict? Also, is there a particular version of PYVAR and CHIMERE and PYVAR-CHIMERE used in this study?
30) Table 1 is very informative. Please format accordingly, especially separating the header as it becomes confusing to read. Not sure if the “example of the definition..” row should be there. Can it be in the title?
31) Section 3.3. Discussion of correction types is very informative as well. Is it possible to show how these are related to Eq. 1 to 3? Isnt it that the control vector consists of elements –corresponding to each grid point and species? If so, how is “scale” implemented to maps or masks for regions?.
32) Line 254: “which is similar to the control vector of budgets…” Please elaborate.
33) Line 256: “adding the obtained values to the …” please rephrase.
34) Line 259: “standard deviation coefficient”. Please clarify. Is it really a coefficient? And since this is an error covariance matrix, should the diagonal elements be error variance not error standard deviation?
35) Line 260-262: Very important statement. But please elaborate or rephrase. What is standard deviation of the uncertainty?
36) Line 266: “variances”. Are these error variances?
37) Line 270: “ error correlation between fluxes of CO and NOx, are not coded yet”. Please elaborate on its potential effect on your estimation?
38) Line 296: How about calling this “Observation Operators”?
39) Line 298: Please note spacing between words.
40) Section 3.4. I think this is very relevant. Please elaborate Figure 3. In its current form, it is not clear what this Figure represents and how we can use it to interpret results. I think coding of these operators is a vital step in the assimilation and should be given more emphasis. Are these utilities also available? How good are the adjoints of these operators? Are there tests to diagnose their accuracy?
41) Please check bold fonts in line 311 to 312
42) Line 314-318: Please highlight in your notations if these are scalars or vectors. And please add corresponding dimensions. What is the difference between small (c_m(o)) and big (C_m(o))? What is x_a?
43) Line 328-334: This is also informative. Is there a reference for parallelization approach in PYVAR and CHIMERE? How does it scale with more CPUs? 4 hours seem to be a long time, isn’t it? Please elaborate and compare with other systems.
44) Line 336-343. Bold? Check spacing between words.
45) Line 391-392. Why are they not different?
46) Figure 5 caption. “differences are in %” is in contrast to the units in the figure.
47) Figure 6 and 7. Is it possible to show difference plots? And more statistics (RMSEs, correlation, bias)? Error reduction? Are these really surface concentrations? They are column measurements, right? What about initial conditions? Have these changed as well, since they are part of the control vector? Superscript on units?
48) Section 4.2. Should this be presented prior to section 4.1.3 since some of the plots are for the posterior estimates?
49) Section 4.2.1. Can this be summarized in a table, with a brief discussion in the text of the rationale for the choice of these parameters? Am I to assume that NOx emissions are estimated only for 1 day, and all days are the same? For CO, what do you mean by 7-day? Average? How are emissions incorporated in CHIMERE in terms of time? Is there a distribution? i.e., diurnal and weekly cycle?
50) Section 4.2.2. Please check spacing of words and bold fonts.
51) Section 4.2.3. Is it possible to break down the components of J? How about emission error reduction? How do you ensure that these increments are “resolved by the observations”. It would be great to see error reduction plots, if posterior error covariances are calculated. How about initial conditions? Did this change as well?
52) Line 508-516. What is the implication of this to overall cost and computing and optimality of minimization including error correlation of CO and NOX (and spatial correlation against superobbing) as well as increase in dimension of control vector? This also entails using this system at higher spatiotemporal resolution, right? It would be great to have a section on limitations before future implications
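
Comments 24) and 40) above ask how the accuracy of an adjoint can be diagnosed. One widely used check is the dot-product test, which verifies that <Hx, y> equals <x, H^T y> for random vectors x and y. The sketch below is only an illustration under stated assumptions: the operator is a small random matrix, and the names `forward`, `adjoint` and `dot_product_test` are hypothetical and are not part of PYVAR or CHIMERE.

```python
import numpy as np


def forward(x, H):
    """Hypothetical linear operator: control space -> observation space."""
    return H @ x


def adjoint(y, H):
    """Adjoint of `forward`: observation space -> control space."""
    return H.T @ y


def dot_product_test(forward_op, adjoint_op, n_control, n_obs, rng):
    """Relative mismatch between <Hx, y> and <x, H^T y> for random x, y."""
    x = rng.standard_normal(n_control)
    y = rng.standard_normal(n_obs)
    lhs = np.dot(forward_op(x), y)
    rhs = np.dot(x, adjoint_op(y))
    return abs(lhs - rhs) / max(abs(lhs), abs(rhs))


rng = np.random.default_rng(0)
H = rng.standard_normal((5, 8))  # assumed toy operator, 8 controls, 5 observations

# A consistent forward/adjoint pair: mismatch at round-off level.
good_err = dot_product_test(
    lambda x: forward(x, H), lambda y: adjoint(y, H), 8, 5, rng
)

# A deliberately wrong adjoint (scaled by 2): the test exposes the error.
bad_err = dot_product_test(
    lambda x: forward(x, H), lambda y: 2.0 * adjoint(y, H), 8, 5, rng
)

print(f"consistent adjoint: {good_err:.2e}")
print(f"broken adjoint:     {bad_err:.2e}")
```

For a nonlinear model, the same test would be applied to the tangent-linear operator and its adjoint around a chosen linearization point.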

AR by Audrey Fortems-Cheiney on behalf of the Authors (29 Oct 2020) Manuscript

ED: Referee Nomination & Report Request started (25 Nov 2020) by Ignacio Pisso

RR by Anonymous Referee #1 (26 Nov 2020)

Suggestions for revision or reasons for rejection

I thank the authors for the substantial improvements to the paper in response to the 2 reviewers. There remains, however, a critical area which deserves more attention from the authors. Once this main issue is addressed, I'll be happy to recommend the article to this journal.

Major comment

The authors state that "the biases between OMI and simulated NO2 tropospheric columns is a complex topic (...) Addressing it properly is thus clearly out of the scope of this paper." This is true, but still, the inversion of emissions is expected to bring the model much closer to the observations even when the model and data are flawed. This article should show that the behaviour of the inversion system is well understood, of which I am not convinced. In the NOx inversion E, the simulated NO2 columns are increased by only about 10% over northern Germany and the Netherlands (based on Table 5 and Figure 6), despite large emission increments (>20%, possibly much more). Why this lack of sensitivity? Given the relatively long NOx lifetime in winter, there could be a strong dependence of the columns on the emissions during the preceding days. Does the system account for that? Something makes it difficult for the inversion system to reproduce the observations. All I ask is that the cause(s) for that inability are identified.

It is stated on l. 476-482 that the discrepancies might have different causes including biases in the observations, in the emissions, and in the model. Nevertheless, the basic assumption of inverse modelling is that errors in the emissions play the dominant part. Therefore, we expect a substantial reduction of the bias after inversion, at least if the observations do not have huge uncertainties. What are the relative uncertainties in the NO2 observations used here? The fact that only a few iterations were needed to reach near-convergence (Table 3) indicates that the errors in the observations are very high, and this could partly explain the poor performance of the inversion. Please clarify.
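
The link drawn above between observation errors and the size of the correction can be made explicit with the standard Bayesian variational cost function. The notation below is generic and assumed for illustration; it is not taken from the manuscript. Here x_b is the prior control vector, B and R are the prior and observation error covariance matrices, H is the observation operator, and y is the observation vector. The second line is the scalar, linear case H(x) = h x with error variances σ_b² and σ_o².

```latex
J(\mathbf{x}) = \tfrac{1}{2}(\mathbf{x}-\mathbf{x}_b)^{T}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
  + \tfrac{1}{2}\big(H(\mathbf{x})-\mathbf{y}\big)^{T}\mathbf{R}^{-1}\big(H(\mathbf{x})-\mathbf{y}\big)

% Scalar, linear case: setting dJ/dx = 0 gives the analysis increment
x_a - x_b = \frac{h\,\sigma_b^{2}}{h^{2}\sigma_b^{2}+\sigma_o^{2}}\,\big(y - h\,x_b\big)
```

When σ_o² is much larger than h²σ_b², the increment tends to zero and the analysis stays close to the prior, which is the behaviour the comment attributes to large observation errors.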

Minor comments

The authors state in their rebuttal that "in some cases, the minimizer finds a local minimum. Once in such a local minimum, the minimizer runs many simulations but cannot get out into a new direction and perform more iterations." Have you encountered clear instances of local minimum being found in the minimization process? From my experience, this is very unlikely because the non-linearity of the atmospheric system is not so high, and the background term of the cost function ensures that the solution is not very far from the a priori.

l. 583: after "over the Netherlands", add "and northern Germany". Please also adapt the abstract.

Legend of Table 5 : "over the Netherlands" --> "over northern Germany"

l. 537 "probability" --> "likelihood"

l. 545 "assume" --> "expect" (?)

Table 4 For the MB, RMSE and STD, replace e.g. 15.88 by 15.9

ED: Reconsider after major revisions (04 Dec 2020) by Ignacio Pisso

The reviewer is raising the following point:

Major comment

The authors state that "the biases between OMI and simulated NO2 tropospheric columns is a complex topic (...) Addressing it properly is thus clearly out of the scope of this paper." This is true, but still, the inversion of emissions is expected to bring the model much closer to the observations even when the model and data are flawed. This article should show that the behaviour of the inversion system is well understood, of which I am not convinced. In the NOx inversion E, the simulated NO2 columns are increased by only about 10% over northern Germany and the Netherlands (based on Table 5 and Figure 6), despite large emission increments (>20%, possibly much more). Why this lack of sensitivity? Given the relatively long NOx lifetime in winter, there could be a strong dependence of the columns on the emissions during the preceding days. Does the system account for that? Something makes it difficult for the inversion system to reproduce the observations. All I ask is that the cause(s) for that inability are identified.

It is stated on l. 476-482 that the discrepancies might have different causes including biases in the observations, in the emissions, and in the model. Nevertheless, the basic assumption of inverse modelling is that errors in the emissions play the dominant part. Therefore, we expect a substantial reduction of the bias after inversion, at least if the observations do not have huge uncertainties. What are the relative uncertainties in the NO2 observations used here? The fact that only a few iterations were needed to reach near-convergence (Table 3) indicates that the errors in the observations are very high, and this could partly explain the poor performance of the inversion. Please clarify.

AR by Audrey Fortems-Cheiney on behalf of the Authors (08 Jan 2021) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (27 Jan 2021) by Ignacio Pisso

RR by Anonymous Referee #1 (27 Jan 2021)

ED: Publish subject to minor revisions (review by editor) (27 Jan 2021) by Ignacio Pisso

The reviewer received your latest updates and now recommends publication after minor revisions are addressed:

From the reviewer:

"I thank the authors for investigating the lack of sensitivity through additional computations. Those tests are useful. Of course there is a strong non-linearity as the increased NOx emissions tend often to increase OH and decrease the lifetime of NOx. There might be other chemical effects, which would require a more detailed scientific study, as the authors state in the conclusions. Besides chemical effects, the lack of sensitivity is further enhanced by 1) the (presumably small) contribution of non-anthropogenic emissions, and 2) the contribution of emissions during the preceding days. I think this should be mentioned in the manuscript.

I can't say I agree entirely that the inversion system fails to reproduce the patterns of the observations **because** of this non-linearity. It does not help, of course, in the sense that very large emission increments are needed to overcome the negative feedbacks and match the observations. And very large increments are penalized in the cost function. In your test with anthropogenic emissions multiplied by 3, the relative emission increment was uniform, whereas in the optimisation, the system is free to modify the emission distribution. In a test with infinite a priori emission errors and no spatial correlation, the system would very probably do a better job. In your setup, with correlations and conservative emission error estimates, the system finds a compromise (which is perfectly reasonable). I would appreciate it if the discussion could reflect the fact that the choice of errors and correlations likely has a strong impact on the results."
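
The reviewer's point about prior errors and correlations can be illustrated with a toy linear inversion. The sketch below is not the PYVAR-CHIMERE setup: the ten-cell domain, the identity observation operator, the noise-free observations and all error values are assumptions chosen only to make the effect visible, and the function names are hypothetical. It compares a conservative, spatially correlated prior with a very loose, uncorrelated one when the true emissions contain a single hotspot.

```python
import numpy as np


def prior_covariance(sigma, corr_length, n):
    """Prior error covariance B; exponential correlations if corr_length > 0."""
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    corr = np.exp(-dist / corr_length) if corr_length > 0 else np.eye(n)
    return sigma**2 * corr


def analysis(x_b, y, H, B, R):
    """Analytic minimizer of the linear Bayesian cost function."""
    innovation = y - H @ x_b
    S = H @ B @ H.T + R  # innovation covariance
    return x_b + B @ H.T @ np.linalg.solve(S, innovation)


n = 10
x_b = np.ones(n)            # prior scaling factors (assumed)
x_true = np.ones(n)
x_true[4] = 3.0             # single hotspot: emissions three times the prior
H = np.eye(n)               # each cell observed directly (gross simplification)
y = H @ x_true              # noise-free synthetic observations
R = 0.5**2 * np.eye(n)      # assumed observation error variance

# Conservative prior errors with spatial correlation.
x_tight = analysis(x_b, y, H, prior_covariance(0.2, 3.0, n), R)
# Very large prior errors without spatial correlation.
x_loose = analysis(x_b, y, H, prior_covariance(5.0, 0.0, n), R)

inc_tight = x_tight - x_b
inc_loose = x_loose - x_b
print("hotspot increment, tight correlated prior:", inc_tight[4])
print("hotspot increment, loose uncorrelated prior:", inc_loose[4])
```

In this toy case the loose, uncorrelated prior recovers almost the full hotspot mismatch, while the conservative, correlated prior yields a smaller increment at the hotspot, a compromise between the prior and the observations.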

AR by Audrey Fortems-Cheiney on behalf of the Authors (28 Jan 2021) Author's response Author's tracked changes Manuscript

ED: Publish subject to minor revisions (review by editor) (25 Feb 2021) by Ignacio Pisso

Some corrections requested by referee 2 seem to be missing from version 4 of the manuscript onwards.

These points seem to be addressed neither in gmd-2019-186-author_response-version2.pdf
nor in gmd-2019-186-manuscript-version[5-7].pdf
Maybe they were not included in the uploaded version? Or am I missing something?

Up to point 33), the points are addressed in gmd-2019-186-manuscript-version5.pdf
The following points still seem to be missing:

34) Line 259: “standard deviation coefficient”. Please clarify. Is it really a coefficient? And since this is an error covariance matrix, should the diagonal elements be error variance not error standard deviation?
35) Line 260-262: Very important statement. But please elaborate or rephrase. What is standard deviation of the uncertainty?
36) Line 266: “variances”. Are these error variances?
37) Line 270: “ error correlation between fluxes of CO and NOx, are not coded yet”. Please elaborate on its potential effect on your estimation?

38) Line 296: How about calling this “Observation Operators”?

40) Section 3.4. I think this is very relevant. Please elaborate Figure 3. In its current form, it is not clear what this Figure represents and how we can use it to interpret results. I think coding of these operators is a vital step in the assimilation and should be given more emphasis. Are these utilities also available? How good are the adjoints of these operators? Are there tests to diagnose their accuracy?

The reviewer asked for clarifications on figure 3 but it was removed altogether, and further clarifications seem to be missing. Were those considered in an intermediate version?

42) Line 314-318: Please highlight in your notations if these are scalars or vectors. And please add corresponding dimensions. What is the difference between small (c_m(o)) and big (C_m(o))? What is x_a?

Matrices, vectors and scalars are still indicated with the same typeface in v7.

43) Line 328-334: This is also informative. Is there a reference for parallelization approach in PYVAR and CHIMERE? How does it scale with more CPUs? 4 hours seem to be a long time, isn’t it? Please elaborate and compare with other systems.

47) Figure 6 and 7. Is it possible to show difference plots? And more statistics (RMSEs, correlation, bias)? Error reduction? Are these really surface concentrations? They are column measurements, right? What about initial conditions? Have these changed as well, since they are part of the control vector? Superscript on units?

I agree with the reviewer that it is difficult to tell the differences apart. Could at least a suitable colour map be used in order to visually appreciate the differences?

48) Section 4.2. Should this be presented prior to section 4.1.3 since some of the plots are for the posterior estimates?
49) Section 4.2.1. Can this be summarized in a table, with a brief discussion in the text of the rationale for the choice of these parameters? Am I to assume that NOx emissions are estimated only for 1 day, and all days are the same? For CO, what do you mean by 7-day? Average? How are emissions incorporated in CHIMERE in terms of time? Is there a distribution? i.e., diurnal and weekly cycle?
These points seem unchanged; has the explanation been lost? The table may not be necessary, but some questions remain unanswered.

51) Section 4.2.3. Is it possible to break down the components of J? How about emission error reduction? How do you ensure that these increments are “resolved by the observations”. It would be great to see error reduction plots, if posterior error covariances are calculated. How about initial conditions? Did this change as well?

Only some of the questions are addressed in the text.

52) Line 508-516. What is the implication of this to overall cost and computing and optimality of minimization including error correlation of CO and NOX (and spatial correlation against superobbing) as well as increase in dimension of control vector? This also entails using this system at higher spatiotemporal resolution, right? It would be great to have a section on limitations before future implications

Part of the question is still not addressed.

AR by Audrey Fortems-Cheiney on behalf of the Authors (03 Mar 2021) Author's response Author's tracked changes Manuscript

ED: Publish as is (23 Mar 2021) by Ignacio Pisso

AR by Audrey Fortems-Cheiney on behalf of the Authors (06 Apr 2021)

Short summary

Up-to-date and accurate emission inventories for air pollutants are essential for understanding their role in the formation of tropospheric ozone and particulate matter, for anticipating pollution peaks and for identifying the key drivers that could help mitigate their emissions. Complementing bottom-up inventories, the system described here aims to update and improve knowledge of the high spatiotemporal variability of air pollutant emissions.