Evaluation of air quality forecasting system FORAIR_IT over Europe and Italy at high resolution for year 2017

Air pollution represents a global threat leading to large impacts on health and ecosystems and many European areas still show a poor air quality. Many measures and policies have been adopted in the past decades at European, national, regional and even local level and many tools have been developed to tackle this issue. Among these tools, the European Air Quality Directive places more emphasis on the use of models for air quality assessment and management. Within this context, air quality forecasting systems play an important role in supporting decision makers when short-term actions are required to reduce 5 human health risks by limiting population exposure. In this framework, at European level within the Copernicus Atmosphere Monitoring Service (CAMS), the regional air quality models participating to the service provide 4-day daily forecasts of the main atmospheric pollutants concentrations, in the lowest layers of the atmosphere. This work presents the development and the performances evaluation of FORAIR_IT, an high-resolution air quality forecasting system operating at both European (20 km) and Italian (4 km) scales. Its skill results are compared with CAMS_50 interim ensemble reanalysis (IRA) ones and a long 10 lasting PM10 exceeding event, occurred in Emilia Romagna region in October 2017, is studied in more detail. Results show similar skill scores between FORAIT_IT and CAMS_50. Comparing the annual average of the monthly Root Mean Square Error Difference (RMSED) between FORAIT_IT first forecast day and CAMS_50, over the European domain the RMSED is 0.6,1.7,1.4 and 7.4 μg/m3 for daily mean PM10 and PM2.5 and daily maximum for NO2 and O3, respectively, while over the Italian domain it is -1.2,0.3,-4.3 and 3.8 μg/m3. The importance of increasing model resolution in the region of interest is 15 highlighted by the lower values of RMSED over Italy with respect to Europe. The results obtained by the detailed analysis of the PM10 exceeding event suggests the crucial role of the meteorological forcing in capturing both the timing and the intensity of the exceedances. As far as we know FORAIR_IT is the first forecasting system at high spatial resolution at Italian National level.

and originally it started on the track of QualeAria forecasting system (http://www.qualearia.it; D'Allura et al.), operating since 2007, then taking an autonomous path linked also to the computing infrastructure of ENEA CRESCO. The new forecasting system, whose upgrades are now funded by the Italian Ministry of Environment, is maintained and operated by ENEA (Italian National Agency for New Technologies, Energy and Sustainable Economic Development) completing the modelling tools of national air quality to support the implementation of the Air Quality Directive, as stated by the Italian law (Legislative Decree 5 155/2010). Section 2 describes the FORAIR_IT atmospheric modelling system and the used setup, while Section 3 shows the evaluation of forecasts at European and national levels for the year 2017. Section 4 discusses extensively a particulate matter (PM) episode in the Emilia Romagna Region, in the heart of Po Valley, an area often experiencing high pollution levels due to high anthropogenic emissions and unfavourable meteorological conditions. The last section summarizes the performances of forecasting system and foreseen developments are discussed.
10 2 Air quality forecasting system: description and setup The components of the modelling system are the same as in AMS-MINNI (Atmospheric Modelling System -Italian National Integrated Model to support the international negotiation on atmospheric pollution), whose performances have been evaluated in several studies over Italy (Mircea et al., , 2016Ciancarella et al., 2016;Vitali et al., 2019) and in the intercomparison exercise EURODELTA-III over Europe (Bessagnet et al., 2014(Bessagnet et al., , 2016). The modelling system is based on the chemical transport 15 model FARM -Flexible Air quality Regional Model - (Silibello et al., 2008(Silibello et al., , 2012Gariazzo et al., 2007;Kukkonen et al., 2012;Adani et al., 2015), the meteorological driver RAMS -Regional Atmospheric Modelling System - (Cotton et al., 2003), the pre-processor SURFPro -SURFace-atmosphere interface PROcessor - (Arianet, 2011) and the emission processor EMMA -EMission MAnager - (Arianet, 2014).
FORAIR_IT forecasts on daily basis the concentrations in the three following days. Simulations are carried out, in two-way nesting mode, on the two domains shown in Fig. 1. The first one (EU) covers Europe at about 20kmx20km spatial resolution; the second one (IT) covers Italy at 4kmx4km resolution. This study focuses on forecast assessment of four pollutants (PM10, PM2.5, O 3 , NO 2 ) over both domains. RAMS is a prognostic non-hydrostatic meteorological model, based on terrain following vertical coordinate system, mul-5 tiple grid two-way nesting scheme and nudging techniques. In FORAIR_IT application it operates with a polar stereographic grid centred over Italy and 35 unevenly spaced levels in sigma coordinates. Convection has been parametrized following Kuo (1974) on the coarse grid and explicitly solved on the fine one.
SURFPro is a diagnostic meteorological post-processor for the calculation of turbulent parameters. Different schemes are available to compute Planetary Boundary Layer (PBL) scaling parameters and horizontal and vertical diffusion coefficients. 10 Concerning mixing height evaluation, the maximum value between outcomes of Maul (1980) version of Carson (1973) encroachment method and mechanical mixing height by Venkatram (1980) has been used for daytime estimate. Instead, for night-time mixing height, the minimum value has been chosen between Nieuwstadt (1981) extension of Zilitinkevich (1972) formula and Venkatram (1980) empirical relationship results. The vertical diffusivity has been parameterized following Lange (1989), as implemented in CMAQ-Models3 (Byun et al., 1999), while the horizontal one is the sum of the values computed 15 using Smagorinsky (1963) formulation and an empirical function dependent on local stability class and wind speed. The chemical transport model FARM is a three-dimensional Eulerian model. It includes transport (advection and turbulent diffusion), gas phase chemistry by means of SAPRC-99 chemical scheme (Carter, 2000) and in-cloud chemistry. Aerosol processes are simulated by means of AERO3 (Binkowski, 1999;Binkowski and Roselle, 2003) for aerosol dynamics, ISOR-ROPIA (Nenes et al., 1998;Fountoukis and Nenes, 2007) for aerosol inorganic chemistry and SORGAM (Schell et al., 2001) 20 for secondary organic aerosol production. Wet and dry removal processes are also taken into account. The former is computed according to EMEP model approach (EMEP, 2003), distinguishing between in-cloud and below-cloud removal, the latter is evaluated by means of the well-known resistance model (Seinfeld and Pandis, 1998). It runs with automatic time step that is internally computed to satisfy the Courant-Friedrichs-Lewy (CFL) criteria.
Within FORAIR_IT, FARM has been implemented using 16 unevenly spaced Z-levels up to 10000 m, with the first one centred 25 at 20 m above the ground.
The forecast system needs chemical and meteorological boundary conditions as external real-time input data: chemical data are downloaded from the global service (C-IFS) of Copernicus Atmospheric Service (CAMS, http://atmosphere.copernicus.eu) using ftp protocol at 15 UTC corresponding to the 00Z run with a spatial resolution of 80 km; meteorological data come from and Correlation Coefficient (CORR) in panels a2(b2), a3(b3) and a4(b4) as a function of forecast day (D0, D1, D2),respectively. Monthly RMSE results, computed on daily averages for CAMS_50 and FORAIR_IT D0, are shown in Fig. 2  Europe, which was mainly driven by a significant long-range transport contribution (Tarasson et al., 2018;Fagerli et al., 2019).
Hourly RMSE values ( Fig. 2 a2), as a function of forecast time, present a daily cycle with two minima (one in the morning 15 and one in the afternoon) and two maxima (around midnight and midday, respectively). This behaviour can be observed for CAMS_50 results too but with a better defined bell-shape daily cycle. Moreover, in CAMS_50 cycle, midday maximum is as high as midnight one, whereas FORAIR_IT daily cycle has midnight maximum substantially higher than midday one, especially for the median and the 75 th percentile. RMSE performances remain almost unchanged as function of forecast time, even if a slightly positive trend can be observed and may be due to the decrease of accuracy in describing atmospheric forcing and 20 boundary conditions. Overall RMSE CAMS_50 results are better than FORAIR_IT ones; anyway ca. 3 µg/m 3 mean distance between median values may be considered an acceptable outcome considering the inherent nature diversity of the two realizations.
MMB values (Fig. 2 a3)  CORR (Fig. 2 a4) ranges from 0.6 in the late morning of the first day of forecast to ca. 0.35 in the afternoon of the third day.
As MMB, CORR decreases as a function of the forecast time and it is the statistical metric most affected by the forecast length.
CAMS_50 shows a better CORR than FORAIR_IT and this may be due to the assimilation of some of the observations. In summary, over the European domain, FORAIR_IT performance is comparable with CAMS_50 one on monthly basis RMSE, PM10 RMSED values are 0.6 and -1.2 µg/m 3 for European and Italian domain, respectively. The negative value highlights the 15 better perfomances of FORAIR_IT with respect to CAMS_50 on the Italian domain with respect to the European one, meaning how a higher resolution of the computational grid in the region of interest increases the forecast performances. Moreover the higher consistency between FORAIR_IT and CAMS_50 results is probably related to the fact that in CAMS_50 simulations the observations over Italy were not assimilated since they were not available on hourly basis. Table 2 shows RMSE, BIAS and CORR per forecast day (D0, D1, D2) and station area (rural, suburban, urban). As expected all the skills deteriorate as function of forecast time. However, the accuracy reduction is quite limited suggesting that the prediction could be extended to more days in advance. Highest errors (RMSE and BIAS) are found for background urban stations despite the higher correlation. Overall PM10 performances are consistent with the results obtained by other air quality 5 forecasting systems (e.g. Honoré et al., 2008;Žabkar et al., 2015). Group of panels a and b of Fig. 3 shows skill scores for PM2.5 computed over Europe and Italy respectively.
Monthly RMSE values, computed on European domain using daily averages of CAMS_50 results and of FORAIR_IT first

25
The CORR as function of the forecast time (panel a4 in Fig. 3) exhibits an accentuated daily cycle for FORAIR_IT simulation that is not present for CAMS_50 outcomes. CORR daily cycle shows the highest values at 9 and the lowest ones at 17 of each forecast day. A slight decrease of FORAIR_IT CORR values along with the forecast time is also observed: the highest median value is about 0.6 during the first forecast day and the lowest value is about 0.47 during the last one. This behaviour is quite similar to PM10 one, as well as the fact that CORR is the metric where the differences between FORAIR_IT and CAMS_50 30 performances are the highest. Mean FORAIR_IT CORR for the first forecast day is about 0.52 whereas CAMS_50 one is higher than 0.8. CORR is probably the metric most affected by the assimilation process; indeed, as already observed for PM10 outcomes, differences between FORAIR_IT and CAMS_50 correlation performances are not so evident, when computed over the Italian domain (panel b4 in Fig. 3).
In general, most of the FORAIR_IT skills show better results when computed on the Italian domain (group of panels b in 35 Fig. 3), regarding both the comparison with observations and the inter comparison with CAMS_50 performances.
Monthly RMSE values computed using first day forecast daily averages (panel b1 in Fig. 3) show the same seasonal cycle, already observed on the European domain, but the values are lower: the highest error is found in January (26 µg/m 3 ) and the lowest one in May and September (7 µg/m 3 ). Overall FORAIR_IT and CAMS_50 outcomes present the same pattern.
FORAIR_IT skills are slightly better than CAMS_50 ones during winter months, whereas they are slightly worse in spring.

5
FORAIR_IT RMSE as function of forecast day (panel b2 in Fig. 3) exhibits a slight increase along with the forecast time, however also for the last day of forecast the median value of FORAIR_IT RMSE is still lower than CAMS_50 one.
Also for the MMB skills (panel b3 in Fig. 3) FORAIR_IT performs better than CAMS_50: MMB median value is very close to 0 even if a slight degradation appears along the forecast days.
CAMS_50 performs better than FORAIR_IT only with respect to correlation (panel b4 in Fig. 3   Panels a and b of Fig. 4 show skill scores for Nitrogen Dioxide (NO 2 ), computed over Europe and Italy respectively.
On the European domain, the RMSE as function of months for FORAIR_IT D0 realization (panel a1 in Fig. 4)  Hourly RMSE values (panel a2 in Fig. 4) show very similar daily cycles for FORAIR_IT and CAMS_50 simulations, with 15 two peaks, one in the morning and one in the evening. The highest value is in the morning for CAMS_50, and in the evening for FORAIR_IT; moreover, in this case, a slight increase of FORAIT_IT RMSE values from D0 to D2 is also observed.
MMB median values (panel a3 in Fig. 4)  Also hourly RMSE skills (panel b2 in Fig. 4) are quite in agreement with performances on European domain. In particular, the same daily cycle is observed, with both the morning and the evening peaks. However values are higher, especially as far as 35 the evening peak is concerned. FORAIR_IT daily cycle is consistent with CAMS_50 one, even if the latter presents a wider range of variation. In particular CAMS_50 has two similar maxima in the morning and in the afternoon (about 20 µg/m 3 ) and FORAIR_IT has a lower maxima in the morning (16 µg/m 3 ) and similar to CAMS_50 maxima in the afternoon. As far as the minima are concern, CAMS_50 shows better skill than FORAIR_IT with morning values reaching 11 µg/m 3 and 14 µg/m 3 and afternoon values about 10 µg/m 3 and 12 µg/m 3 for CAMS_50 and FORAIR_IT respectively.

5
MMB skills (panel b3 in Fig. 4) in FORAIR_IT and CAMS_50 realizations show different daily cycles, as it occurs for PM10, PM2.5 and NO 2 over Europe. Furthermore, here the two curves are quite anti-correlated, a part from a small shift. Overall MMB skills show that underestimation occurrences are more frequent than overestimation ones, especially for CAMS_50. Performances trends over the three days forecast period do not show any degradations, indicating that forecast might be extended further without losing much accuracy.

10
According to correlation skills (panel b4 in Fig. 4)  for PM10 and PM2.5, may be due to either meteorological conditions, being more dispersive along with the forecast time, or to global boundary conditions, underestimating more in D2 than in D0. Generally, the best performances in terms of all the met-5 rics are found for rural stations and again the overall scores suggest that the forecast may be acceptable for more than three days.
19 https://doi.org/10.5194/gmd-2020-54 Preprint. Discussion started: 11 March 2020 c Author(s) 2020. CC BY 4.0 License. (on Italian domain) of the error associated to CRMSE. However, for some stations problems arise also in capturing the large observed daily excursion since the 5% of the stations has skill variance less than 0.7.
Correlation coefficient values are shown in panel a4 of Fig. 5 where a slight decrease of performances can be noticed along Europe as the reason of the performances deterioration in this area during the winter. Again, FORAIR_IT values are higher than CAMS_50 ones but the differences are less accentuated over Italy than over Europe.  5). Moreover, both these skills show a slight deterioration along with the forecast time.

10
In conclusion, the comparison between FORAIR_IT and CAMS_50 performances over Italian domain shows similar results with respect to European one, even if differences are less accentuated, probably due to a better representations of the processes thanks to the increase of spatial resolution on Italian FORAIR_IT domain.This is also evident from RMSED values that are 7.4 and 3.8 µg/m 3 over European and Italian domain, respectively.

PM episode on Po valley
Air pollution in Italy is one of the main environmental issues due to the high number of days with daily mean PM10 con- concentration underestimation is partly compensated by an overestimation of PM2.5, while from 17 th October to the 28 th the modelling system starts to underestimate also PM2.5, and consequently, the overall agreement between observed and predicted PM concentrations worsens. It is interesting also to highlight that during the considered period the variability of the observations increases, as evidenced by the increase in their standard deviation, while the model variation does not reflect this high variability. This could reflect local features in some meteorological variables that the model is not perfectly able to forecast (although the model well reproduces the absence of precipitation recorded over the studied period) or the effects of other variables, such as emission sources, that do not allow the accurate reconstruction of local processes. In order to understand the differences between observed and simulated variability the PM10 average concentration over the period 9 th -22 nd October was analyzed. The model average forecast (upper 15 panel 6) shows a typical regional distribution of PM with higher concentrations expected close to the Po Valley (in the north of ER region) and lower concentration in the south where the Appennini mountains chain begins. The model predictions highlight also an area with a strong gradient of PM10 concentration (around 15 µg/m 3 in few kilometers) stretching from NW to SE in ER region. This high gradient area may be particularly challenging for the model and specific analysis were then conducted to analyze its variability over the period under consideration. Daily PM10 concentration averages were computed for the period 20 7 th -24 th of October in order to evaluate the position of this gradient area over ER. Fig. A1 (in supplementary material) shows that, during the considered days, the advection of PM10 from the NW of Po Valley is highly variable with also significant inter day variability. In particular the highest PM10 concentrations are found over Lombardy region and they are transported toward ER region. The concentration plume changes several time its directions, from NW to SE, then to E and then again to SE. Consequently the highest gradient area shifted several times its position over ER with usually the north east part of ER  The observed and modelled PM2.5 composition is presented in Fig. 8 to capture the actual spatial and temporal variability in meteorological conditions results clear. However, despite the incomplete agreement in the absolute values between the model and the regional monitoring network, FORAIR_IT was able to predict the first peak of air pollution associated with stable meteorological conditions.

Summary and conclusions
Despite many efforts in reducing anthropogenic emission some exceedances of the European Directive standard limit value are 25 still recorded. In Italy, one of the main environmental concern, especially in the Po valley, are exceedances of the daily mean PM10 limits. Therefore, the forecasting of such events may be crucial for protecting both envirnoment and human health by applying special mitigation procedures. In the present work, the FORAIR_IT air quality forecasting system has been presented. Eventually, the analysis of the specific ER event shows that the capacity of capturing both the large scale processes and the small scale ones is crucial in order to capture the variability and the timing of the exceedances. In conclusion, FORAIR_IT may 10 be considered an up to date forecasting system whose skills are comparable with the ones presented in Copernicus (CAMS) program, and as far as we know it is the first forecasting system at high spatial resolution proposed at Italian National level.
Furthermore, FORAIR_IT may be a useful tool to be used by the policy makers in order to apply extraordinary procedure to prevent/mitigate exceedances. However, as forecasting systems require continuous developments and updates to release increasingly accurate forecasts, the following development are under investigation: the operationally evaluation of the first day of forecast with one day delay with respect to the production; the assimilation of insitu chemical measurements in order to better initialize the forecast; RMSE is not a dimensionless variable and requires knowledge of typical mean values to be interpreted, being also strongly dominated by the largest values, due to the squaring operation.

RM SE
BIAS gives information of systematic error of the model but, as well as the RMSE, depends on typical mean values of the pollutants taken into account.
Correlation is a statistical dimensionless indicator that measure the extent to which two or more variables fluctuate together.
bounded by the values -2 and +2.