Implementation and application of Ensemble Optimal Interpolation on an operational chemistry weather model for improving PM2.5 and visibility predictions
Abstract. Data assimilation is an important way to reduce the uncertainty of atmospheric chemistry model inputs and to improve forecast accuracy. In this paper, an ensemble optimal interpolation (EnOI) assimilation system is developed for a regional online chemical weather numerical forecasting system (GRAPES_Meso5.1/CUACE), for operational use and efficient updating of the initial fields of chemical components. A heavy haze episode in eastern China was selected, and the key factors affecting the EnOI, such as localization length scale, ensemble size, and assimilation moment, were calibrated by sensitivity experiments. The impacts of assimilating ground-based PM2.5 observations on the model chemical initial field and on PM2.5 and visibility forecasts were investigated. The results show that assimilation of PM2.5 significantly reduces the uncertainty of the initial PM2.5 field. The mean error and root mean square error (RMSE) of initial PM2.5 for mainland China both decreased by more than 75 %, and the correlation coefficient improved to more than 0.95. Even greater improvements appear in North China. For the forecast fields, assimilation of PM2.5 improves PM2.5 and visibility forecasts throughout the lead time window of 24 h. The PM2.5 RMSE can be reduced by 10 %–21 % within 24 h, and the assimilation effect is most obvious in the first 12 h. Assimilation at 1200 UTC is more effective than at 0000 UTC for improving the forecast, because the discrepancy between simulation and observation is larger at 1200 UTC than at 0000 UTC, indicating that assimilation is more efficient when the model bias is larger. Assimilation of PM2.5 also improves visibility forecast accuracy significantly. A negative PM2.5 analysis increment corresponds to an increase in visibility, and a positive increment to a decrease.
It is worth noting that the improvement of visibility forecasting by assimilating PM2.5 is more obvious in the light pollution period than in the heavy pollution period, since visibility is much more affected by humidity during the heavy pollution period, which is accompanied by low or extremely low visibility. To further improve visibility forecasts, especially for extremely low visibility during severe haze pollution, relative humidity should be assimilated together with PM2.5.
Siting Li et al.
Status: final response (author comments only)
RC1: 'Comment on gmd-2022-207', P. Armand, 16 Nov 2022
- AC1: 'Reply on RC1', Ping Wang, 08 Apr 2023
RC2: 'Comment on gmd-2022-207', Anonymous Referee #2, 09 Dec 2022
- AC2: 'Reply on RC2', Ping Wang, 08 Apr 2023
The article by Li et al. is about the implementation of an Ensemble Optimal Interpolation (EnOI) method in the numerical chemistry & weather prediction system GRAPES_Meso5.1 / CUACE and the application of the method to try to improve PM2.5 and visibility forecasts of pollution episodes in Eastern China. The authors strive to calibrate the parameters of the EnOI, namely the "localization length scale", which is the spatial range of the assimilation, the ensemble size, and the time at which assimilation should be carried out. They also investigate the impact of assimilating PM2.5 observations on the simulated PM2.5 concentration field. The mean error (ME) and the root mean square error (RMSE) of the initial PM2.5 concentration field are reduced when assimilating the data. According to the authors, the forecasts of the PM2.5 concentration and visibility fields are also improved throughout the lead time window of 24 hours, especially when the assimilation time is 1200 UTC, because the discrepancy between simulation and observation is larger than at 0000 UTC. Again according to the authors, assimilating PM2.5 improves the visibility forecasts more for light pollution episodes than for heavy pollution episodes, which are more affected by humidity. Thus, for extremely low visibility during severe haze pollution, the authors recommend assimilating both PM2.5 and humidity observations.
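For readers less familiar with the method, the three parameters discussed above (localization length scale, ensemble size, and the scaling of the ensemble covariance) can be seen in a minimal sketch of one EnOI analysis step. This is an illustration under stated assumptions, not the implementation used in GRAPES_Meso5.1/CUACE: the function names `gaussian_taper` and `enoi_analysis`, the Gaussian form of the localization weights, the diagonal observation error covariance, and the placement of observations exactly on grid points are all hypothetical choices made here for brevity.

```python
import numpy as np

def gaussian_taper(dist, length_scale):
    """Distance-based localization weights (Gaussian form assumed for illustration)."""
    return np.exp(-0.5 * (dist / length_scale) ** 2)

def enoi_analysis(x_f, ensemble, obs, obs_idx, grid_xy,
                  length_scale, alpha=1.0, obs_var=1.0):
    """One EnOI update of a flattened forecast field (hypothetical helper).

    x_f      : (n,) forecast state, e.g. surface PM2.5
    ensemble : (n, N) static ensemble, e.g. N earlier hourly forecasts
    obs      : (m,) observations located at grid indices obs_idx
    grid_xy  : (n, 2) grid-point coordinates, same units as length_scale
    alpha    : scalar that scales the ensemble covariance
    obs_var  : observation error variance (diagonal R assumed)
    """
    n_members = ensemble.shape[1]
    # Anomalies about the ensemble mean approximate the background error.
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    obs_anomalies = anomalies[obs_idx, :]
    # Sample covariances P H^T (n, m) and H P H^T (m, m).
    p_ht = anomalies @ obs_anomalies.T / (n_members - 1)
    h_p_ht = obs_anomalies @ obs_anomalies.T / (n_members - 1)
    # Localization: damp covariances between distant points.
    obs_xy = grid_xy[obs_idx]
    d_grid_obs = np.linalg.norm(grid_xy[:, None, :] - obs_xy[None, :, :], axis=2)
    d_obs_obs = np.linalg.norm(obs_xy[:, None, :] - obs_xy[None, :, :], axis=2)
    p_ht = p_ht * gaussian_taper(d_grid_obs, length_scale)
    h_p_ht = h_p_ht * gaussian_taper(d_obs_obs, length_scale)
    # Analysis: x_a = x_f + alpha*P*H^T (alpha*H*P*H^T + R)^-1 (obs - H x_f).
    r = obs_var * np.eye(len(obs))
    innovation = obs - x_f[obs_idx]
    weights = np.linalg.solve(alpha * h_p_ht + r, innovation)
    return x_f + alpha * p_ht @ weights

# Toy 1-D example with synthetic numbers (not data from the paper).
rng = np.random.default_rng(0)
n, n_members = 50, 20
grid_xy = np.column_stack([np.arange(n, dtype=float), np.zeros(n)])
ensemble = rng.normal(50.0, 10.0, size=(n, n_members))
x_f = ensemble[:, -1].copy()
obs_idx = np.array([5, 20, 35])
obs = x_f[obs_idx] - 15.0  # pretend the model overestimates at the stations
x_a = enoi_analysis(x_f, ensemble, obs, obs_idx, grid_xy,
                    length_scale=5.0, alpha=0.5, obs_var=4.0)
```

In this sketch the analysis is pulled toward the observations near the stations, and the size and reach of the correction depend directly on `length_scale`, on the number of ensemble columns, and on `alpha`, which is why the calibration of these parameters matters for the results reported in the article.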
The work carried out by Li et al. essentially consists of optimal interpolation in which the model error covariance matrix is estimated by an ensemble approach, the members of the ensemble being previous time frames of the PM2.5 concentration field. Beyond the fact that the PM2.5 field which assimilates data at a given time, and at the times closely following it, does show better statistics against the observations, the results obtained do not seem very convincing to me. However, this work constitutes an approach which deserves to be studied and is therefore worthy of publication. The article could be improved by taking into account the following remarks.
L55 to 61 – Instead of listing a very large number of references, authors should briefly indicate what they contain.
L95 – Expand the acronym "EnOI" at least in the title of the section.
L102 – I don't understand the point of the double notation "x" and "psi". If there is a good reason, the authors should give it, otherwise the notations should be simplified.
L110 - In fact, the ensemble is used to estimate the error covariance matrix of the model. The authors could say it simply and directly.
L112 – The relationship between "psi_a", "psi_f" and "psi_i" should be given.
L113 - The acronym "AF" is never defined (even if its meaning can be guessed). The many acronyms used throughout the text should be made explicit and, in my opinion, fewer abbreviations should be used.
L114 – As the scalar "alpha" is used to weight the model and the observations, one would expect to see "1 – alpha" in formula (6). Are the authors sure about this formula?
L118 – Although the use of a length scale or spatial range in data assimilation is understood, this concept is poorly introduced. The expression "to avoid all observations…" is not clear at all and needs to be rephrased.
L118 – What does the "localization scheme" look like? The authors should give its formula.
L125 – We understand later in the reading of the article how the ensemble of PM2.5 concentration fields is constructed. It would be better if the authors explained it in this section of the article.
L129 – The sentence "Compared with the traditional EnOI..." can only be understood after reading the rest of the article. This sentence should be explained at this point in the article.
L134 – The horizontal and vertical dimensions and resolutions of the simulation domains used by the GRAPES_Meso5.1 and CUACE models should be indicated.
L166 – What do the authors call a "warm" restart? Is that really the right term?
L183 – Finally it is said that "the N hourly model forecasts before the assimilation moment were used as the ensemble samples to approximate" the model error covariance matrix. It could have been mentioned before.
L207 – Looking at figure 4, it seems very difficult to me to see what the optimal "localization length scale" is, especially since the values of the correlation (CORR) and of the error metrics (RMSE, MB and ME) are close.
L248 – This is the same problem as for my previous remark. The number of members in the ensemble seems to me chosen in a practical, if not "ad hoc", way. It should be studied whether other pollution episodes lead to the same choice of parameters "ensemble size" and "localization length scale".
L265 – Thanks to the authors for expanding the acronyms to facilitate reading.
L271 – What is a "sheet-like" concentration distribution? Is it really the correct term?
L287 – In Figure 8, there are very significant differences between the forecasts, both with and without data assimilation, and the observations, in particular on December 19 and December 23. Not only the amplitude, but mainly the dynamics of the concentrations are very different. Do the authors have an explanation for this?
L391 – The final part of the sentence "… and relatively consistent in (20)" seems to be lacking. Please, correct.
L305 – Figure 9 does not seem to me to show convincingly that the assimilation at 1200 UTC (DA12) is better than the assimilation at 0000 UTC (DA00). Have the authors looked at, from a more fundamental point of view, why this should be the case?
L311 – Is it a general property that assimilation at 1200 UTC would be better than assimilation at 0000 UTC? The authors should be more careful about their assertion.
L323 – There is no linear relationship between visibility and PM2.5 concentration. Thus, it is not surprising that the result of assimilation improves the result on visibility for light pollution episodes, whereas this improvement does not exist or is insignificant for heavy pollution episodes.
L334 – Also in Figure 11 (as reported for Figure 8), there are large discrepancies between the PM2.5 concentration predictions, with or without data assimilation, and the observations. Do the authors have an explanation for the differences, not only in the amplitudes but above all in the dynamics?
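The remark on L323 about the nonlinear relation between visibility and PM2.5 can be illustrated with a small sketch based on the Koschmieder relation, in which visibility is inversely proportional to the extinction coefficient. Everything specific here is an assumption made for illustration and is not taken from the article or from CUACE: the function name `visibility_km`, the mass extinction efficiency of 4 m2/g, the hygroscopic growth factor of the form (1 - RH)^(-gamma), and the neglect of gas and Rayleigh contributions.

```python
import numpy as np

def visibility_km(pm25, rh, mass_ext_eff=4.0, gamma=0.5):
    """Koschmieder visibility in km (illustrative parameter values).

    pm25         : PM2.5 mass concentration in ug/m3
    rh           : relative humidity as a fraction (0 to 1)
    mass_ext_eff : assumed dry mass extinction efficiency in m2/g
    gamma        : assumed exponent of the humidity growth factor
    """
    rh = np.clip(rh, 0.0, 0.99)            # avoid division by zero at saturation
    f_rh = (1.0 - rh) ** (-gamma)          # hygroscopic enhancement of extinction
    b_ext = mass_ext_eff * np.asarray(pm25, dtype=float) * f_rh  # in Mm^-1
    return 3912.0 / b_ext                  # 3.912 = -ln(0.02) contrast threshold

# Same 20 ug/m3 reduction of PM2.5, applied to a light and a heavy episode.
gain_light = visibility_km(30.0, 0.6) - visibility_km(50.0, 0.6)
gain_heavy = visibility_km(280.0, 0.6) - visibility_km(300.0, 0.6)

# Effect of humidity alone at a fixed heavy-pollution concentration.
vis_moderate_rh = visibility_km(300.0, 0.6)
vis_high_rh = visibility_km(300.0, 0.95)
```

Under these assumptions the same PM2.5 correction changes visibility far more in the light episode than in the heavy one, and at high concentrations a rise in relative humidity alone lowers visibility further, which is consistent with the expectation stated in that remark.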