A standardized methodology for the validation of air quality forecast applications (F-MQO): Lessons learnt from its application across Europe
Lina Vitali
Kees Cuvelier
Antonio Piersanti
Alexandra Monteiro
Mario Adani
Roberta Amorati
Agnieszka Bartocha
Alessandro D'Ausilio
Paweł Durka
Carla Gama
Giulia Giovannini
Stijn Janssen
Tomasz Przybyła
Michele Stortini
Stijn Vranckx
Philippe Thunis
Abstract. A standardized methodology for the validation of short-term air quality forecast applications was developed in the framework of FAIRMODE activities. The proposed approach, focusing on specific features to be checked when evaluating a forecasting application, investigates the model's capability to detect sudden changes in pollutant concentration levels, to predict threshold exceedances and to reproduce air quality indices. The proposed formulation relies on the definition of a specific forecast Modelling Quality Objective and of Performance Criteria, which define the minimum level of quality to be achieved by a forecasting application when it is used for policy purposes. The persistence model, which uses the most recent observed value as the predicted value, is used as the benchmark for the forecast evaluation. The validation protocol has been applied to several forecasting applications across Europe, using different modelling paradigms and covering a range of geographical contexts and spatial scales. The method is successful, with room for improvement, in highlighting shortcomings and strengths of forecasting applications. This provides a useful basis for using short-term air quality forecasts as a supporting tool for providing correct information to citizens and regulators.
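As an illustration of the benchmarking idea summarized in the abstract, the sketch below contrasts a candidate forecast with the persistence benchmark and with observed threshold exceedances. It is only a schematic Python example: the function names and the simple skill ratio used here are illustrative assumptions, not the exact F-MQO formulation defined in the paper and implemented in the DELTA tool.

```python
import numpy as np

def rmse(forecast, obs):
    """Root mean square error between a forecast and the observations."""
    forecast, obs = np.asarray(forecast, float), np.asarray(obs, float)
    return np.sqrt(np.mean((forecast - obs) ** 2))

def skill_vs_persistence(model_today, obs_today, obs_yesterday):
    """Illustrative skill ratio: model RMSE divided by persistence RMSE.

    The persistence benchmark simply reuses yesterday's observation as
    today's forecast. Values below 1 mean the model beats persistence.
    NOTE: this ratio is an assumption for illustration, not the exact
    forecast MQO formula defined in the paper.
    """
    return rmse(model_today, obs_today) / rmse(obs_yesterday, obs_today)

def exceedance_counts(model_today, obs_today, threshold):
    """Hits, misses and false alarms for a concentration threshold."""
    m = np.asarray(model_today) > threshold
    o = np.asarray(obs_today) > threshold
    return {"hits": int((m & o).sum()),
            "misses": int((~m & o).sum()),
            "false_alarms": int((m & ~o).sum())}

# Synthetic daily PM10 concentrations (ug/m3) at four stations.
obs_yesterday = np.array([38.0, 52.0, 45.0, 60.0])
obs_today     = np.array([42.0, 48.0, 55.0, 58.0])
model_today   = np.array([40.0, 50.0, 51.0, 61.0])

print(skill_vs_persistence(model_today, obs_today, obs_yesterday))
print(exceedance_counts(model_today, obs_today, threshold=50.0))
```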
Status: open (until 28 Jun 2023)
CEC1: 'Comment on gmd-2023-65', Juan Antonio Añel, 05 May 2023
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived the Delta Tool Software on a repository not suitable for scientific publication. Therefore, please, publish your code in one of the appropriate repositories (see those listed in our policy), and reply to this comment with the relevant information (link and DOI) as soon as possible, as they should be available for the Discussions stage.
Also, you say that the relevant input/output data are available upon request. We can not accept this. All the data have to be published with the manuscript at the submission time, open and without restrictions to get access to them.
If you do not fix these issues, we will reject your manuscript for publication in our journal. I should note that, actually, your manuscript should not have been accepted in Discussions, given this lack of compliance with our policy. Therefore, the current situation with your manuscript is irregular.
Also, you must include in a potentially reviewed version of your manuscript the modified 'Code and Data Availability' section, with the DOIs of the code and data.
Regards,
Juan A. Añel
Geosci. Model Dev. Exec. Editor
Citation: https://doi.org/10.5194/gmd-2023-65-CEC1
AC1: 'Reply on CEC1', Philippe Thunis, 12 May 2023
Dear Dr. Añel
Following up on your comments, we have created a zenodo repository where all data and tools relevant to the above mentioned publication can be freely downloaded. We also updated the section on code availability in the paper itself with a link to this repository. Let me know how to proceed with the updated manuscript.
Best regards
Philippe Thunis on behalf of all co-authors
Citation: https://doi.org/10.5194/gmd-2023-65-AC1
CEC2: 'Reply on AC1', Juan Antonio Añel, 12 May 2023
Dear authors,
Thanks for your reply. However, the fact that you have created a repository does not solve the problem. You must publish here (in reply to this comment or my previous one) the link and DOI of the repository. Otherwise, we continue being unable to check if such a repository (and therefore your manuscript) complies with our policy, and any potential reader of the manuscript in Discussions will continue without access to it.
In this way, despite your reply, the situation about your manuscript has not changed, as your claim to have created a repository does not improve the situation of lack of accessibility to the code and replicability of your work. Therefore, if you do not reply by posting the necessary information, we continue having to reject your manuscript for publication.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2023-65-CEC2
AC2: 'Reply on CEC2', Philippe Thunis, 12 May 2023
Dear Dr Anel
The link to the repository is the following:
Philippe Thunis, & Lina Vitali. (2023). Supporting data and tool, for the paper "A standardized methodology for the validation of air quality forecast applications (F-MQO): Lessons learnt from its application across Europe" (Version v1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7928289
Best regards
Philippe Thunis
Citation: https://doi.org/10.5194/gmd-2023-65-AC2
CEC3: 'Reply on AC2', Juan Antonio Añel, 12 May 2023
Dear authors,
Thanks for adding the link to the repository. Unfortunately, as it is right now, it continues to not comply with our policy. Currently, in the Zenodo repository, you have deposited a binary file for the Delta Tool Software, not the code of the software. We need the code of the software. A binary file does not let us check if the code that you have used is correct or not.
Also, apparently, the Delta Tool is used to generate the output data to which you apply your methodology. Later in the text, you perform comparisons, etc. We need that you publish in the repository the software that you use for the remainder of computations in your work, not only the Delta Tool.
Also, part of the data that you have shared is in .CDF format. I do not know if the Delta Tool is able to read this format; however, it seems that it is a proprietary format for which specific proprietary software is necessary to read it. This precludes the replicability of your work. For example, I have not been able to open the CDF files that you have shared on my computer, so I can not check the data that they contain. Therefore, it would be good that you share them using a format that is open and complies with ISO international standards (e.g. .ods, .dat, .csv...)
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2023-65-CEC3
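In case it helps readers facing the same file-format issue raised above, the snippet below is a minimal sketch of exporting such data to an open CSV format. It assumes the .cdf files are NetCDF files readable by xarray (if they are instead IDL save files or NASA CDF files, a different reader would be needed); the file name is a placeholder.

```python
import xarray as xr

# Placeholder file name; the actual files are in the Zenodo repository.
ds = xr.open_dataset("station_timeseries.cdf")

# Flatten all variables to a table and write an open, ISO-friendly CSV file.
ds.to_dataframe().to_csv("station_timeseries.csv")
```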
AC3: 'Reply on CEC3', Philippe Thunis, 19 May 2023
Dear Dr. Anel
We have updated the repository accordingly. Please find below the new DOI link
Philippe Thunis, & Lina Vitali. (2023). Supporting data and tool, for the paper "A standardized methodology for the validation of air quality forecast applications (F-MQO): Lessons learnt from its application across Europe" (Version v2) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7949868
Please note that the methodology is included within the DeltaTool itself and that all data presented in the paper have been generated with this DeltaTool. They are all available in the repository together with the DELTA tool source code. Finally, Figure 6 has been produced with Excel with the corresponding data file available in FA5/additional_data.
Best regards
Philippe Thunis
Citation: https://doi.org/10.5194/gmd-2023-65-AC3
CEC4: 'Reply on AC3', Juan Antonio Añel, 21 May 2023
Dear authors,
Many thanks for your reply and for providing the pending assets for your manuscript. We can now consider the current version of your manuscript compliant with our Code and Data Policy.
As a side note, if I guess it correctly, the Delta tool is written in the IDL language. It would be good if you could include in your manuscript the details about the IDL interpreter that you use for your work (including the full version number), whether it is the proprietary one owned by Harris Geospatial or GDL from GNU.
Regards,
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2023-65-CEC4
RC1: 'Comment on gmd-2023-65', Anonymous Referee #1, 16 May 2023
Review of "A standardized methodology for the validation of air quality forecast applications (F-MQO): Lessons learnt from its application across Europe" by Lina Vitali et al.
Major comments:
This paper presents a methodology for the validation of air quality forecasts. The principle is based on the detection of sudden changes in pollutant concentrations, the identification of threshold exceedances and the capability to reproduce air quality indices. They apply their methodology to several forecasts across Europe. The article is well-written and fairly long but contains problems of principle that make it unacceptable as is. These three major comments are described below. I suggest major revisions in order to give the authors the opportunity to improve the content of the study.
1. The study suggests that the collaborative work will really blend the different approaches and draw out a general and new message. The confrontation of the different strengths and weaknesses could produce a more rigorous way of validating any type of forecast. But it is in fact a succession of applications of the same method to several particular cases. There is no homogeneous conclusion across the different results.
2. The forecast qualification method (persistence model and the statistical scores) is really basic and contains nothing new. It is recommended that the authors (i) propose new criteria to evaluate the forecasts, and (ii) use the different forecasts and the different models to try to produce a more multi-model conclusion about the forecast, more independent of the case studied. The persistence model is too basic to provide information. The forecast must be validated by comparison to (D-1) observational measurements (satellite, surface stations, soundings, etc.). Comparing the forecast results to a persistence model that does not really exist, and is by principle erroneous, is an approach that is difficult to understand.
3. I don't really agree with the expression 'fit for purpose' used several times. A chemistry-transport model is supposed to reproduce the physics and chemistry correctly, independently of its use: analysis, scenario or forecast. It can have different scores depending on its chemical mechanism, resolution and therefore representativeness. There is no discussion of the representativeness of the models used and therefore no real interpretation of the score values according to the cases studied and the models.
The conclusion is that additional tests are needed. Which ones, and why not do them before submitting the work?
Minor comments
The introduction gives a good overview of the currently existing forecast platforms and of how they evaluate the quality of their forecasts. Some sentences are very long and should be shortened, for example p.2, l.52.
Finally, the main goal of this paper is to add a new validation methodology by assessing the model's ability to simulate sudden changes and peaks of concentrations. First, this is not really new, and second, the use of the persistence model is not suited to obtaining realistic model vs. observation comparisons.
2. Methodology
p.3 l.74: It is not always helpful to replace expressions with acronyms. Too many acronyms tend to make a paper difficult to read. For example, you can explicitly write 'air quality forecast' in place of AQF.
p.4 l.109: please define the MQO acronym at its first use.
2.1 It is questionable whether there is any interest in using the 'persistence model'. Many models and institutes now produce forecasts, so such a basic approach is of little interest. Of course, comparing any other method to the 'persistence model' is bound to give better results. Statistically, even random chance is probably better than the persistence model.
The whole of section 2 does not provide any novelty: all the statistical scores are already well known and published, and widely used in all operational centers. Please highlight what is really new in this study.
3. Forecasting applications:
Table 2: The models are more than dispersion models; they are chemistry-transport models.
In general, the model presentation is very long and there is no comparison between the models (except in Table 2). Discussing the basic differences could provide more insight into the behaviour of the forecast results.
4. Results, lessons learnt and discussion
p.11 l.280: There is an effort to categorize the results according to the scale of simulation (European, national, regional), but this does not really lead to a discussion of the forecasting issues related to these scales: boundary conditions, nesting, emission resolution, etc.
p.13: Figures 1, 2, 3 and 4 are very complex, small and difficult to read. I was not able to read the 'summary statistics' part, and it is not really commented on in the document.
Citation: https://doi.org/10.5194/gmd-2023-65-RC1
RC2: 'Comment on gmd-2023-65', Anonymous Referee #2, 22 May 2023
The paper presents an interesting extension of a model validation approach based mainly on FAIRMODE activities and guidance documents.
Its novelty lies mainly in the extension of the already well documented methodology to forecast models. The case studies documented and used in the paper are also interesting and mostly new information.
MAJOR
The paper states correctly that the criteria for model performance evaluation should depend on the purpose of the model, but it should be stated even more clearly in the paper that the presented methodology is just an add-on to the existing methodology.
The suggested metric is probably useful for providing practical information specifically on model forecasting capabilities, but it should be used ONLY together with the "standard" evaluation, never as a standalone measure of the "fit-for-purpose" evaluation of forecast models.
Although the presented methodology seems to provide interesting and useful information for assessing forecast performance, it should be more clearly stated that the metric it defines is just one "random" choice: comparing everything to the persistence model is easy to do and provides some information on model forecast skills, but due to the extremely simplistic nature of the persistence model, the method does not necessarily capture all relevant measures for evaluating model performance.
The authors should state clearly that this methodology (at least not at this stage) does not really "validate" anything: there is no way to form any clear and absolute criteria classifying model performance as acceptable/unacceptable based on the presented methodology, so there is NOT enough justification for recommending:
“Therefore, it is recommended that forecasting applications fulfill the standard assessment MQO, as defined in Janssen and Thunis (2022), as well as the additional forecast objectives and criteria, as defined within the new specific protocol.”
But the methodology could be a good tool, especially for model developers, to find forecast-specific issues in their models, especially features which don't show up in the standard statistical model evaluation. So my simple suggestion is that the emphasis of the paper should be clearly stated to be more like experimenting with different forecast-relevant statistical metrics, rather than claiming that the new index MQOf has already been proven suitable as a real measure stating something definite about model forecasting skills, or even for comparing models with each other.
Instead, it would be good to add some discussion of the observed shortcomings of the methodology and suggest some improvements for it.
MINOR
1. One of the main justifications for the methodology seems to be the paper by Mittermaier et al. (2008). Please add some sentences covering also the weaknesses of this NWP-persistence approach, showing clearly that some issues were identified with the NWPs as well; the main reason for not "officially" selecting some real model as reference seemed to be the fear of failing too often against the reference, so the reason for suggesting persistence as reference was more a political compromise than a choice justified by science.
2. I might have just got lost in the very busy results section, but I did not find any reference/results for the standard MQO scores for the case studies. This would help to understand how model quality is related to model forecasting skill.
3. The figures should in general be much clearer: there is obviously a lot of information presented in them, but in many cases it is simply impossible to read.
Citation: https://doi.org/10.5194/gmd-2023-65-RC2