Forecasting of regional methane from coal mine emissions in the Upper Silesian Coal Basin using the on-line nested global regional chemistry climate model MECO(n)(MESSy v2.53)

Methane is the second most important greenhouse gas in terms of anthropogenic radiative forcing. Since preindustrial times, the globally averaged dry mole fraction of methane in the atmosphere has increased considerably. Emissions from coal mining are one of the primary anthropogenic methane sources. However, our knowledge about different sources and sinks of methane is still subject to great uncertainties. Comprehensive measuring campaigns, as well as reliable chemistry climate models, are required to fully understand the global methane budget and to further develop future climate mitigation 5 strategies. The CoMet 1.0 campaign (May to June 2018) combined airborne in-situ, as well as passive and active remote sensing measurements to quantify the emissions from coal mining in the Upper Silesian Coal Basin (USCB, Poland). Roughly 502 kt of methane are emitted from the ventilation shafts per year. In order to help the campaigns flight planning, we performed 6day forecasts using the on-line coupled, three times nested global and regional chemistry climate model MECO(n). We applied three nested COSMO/MESSy instances going down to a spatial resolution of 2.8 km over the USCB. The nested global/regional 10 model system allows for the separation of local emission contributions from fluctuations in the background methane. Here we introduce the forecast setup and assess the model skill by comparing different observations with the individual forecast simulations. Results show that MECO(3) is able to simulate the observed methane plumes and the large scale patterns (including vertically integrated values) reasonably well. Furthermore we receive reasonable forecast results up to forecast day four.

model is required, which is provided by the on-line coupled model system "MESSyfied ECHAM and COSMO models nested n times" (MECO(n), Kerkweg and Jöckel, 2012b;Mertens et al., 2016). To increase the resolution of our forecasts, we apply a nesting approach with three simultaneously running COSMO/MESSy instances down to a spatial resolution of 2.8 km. Section 2.2 presents the model setup and the implementation of two different methane tracers. We describe the details of the new forecast system (Sect. 2.3) and discuss its evaluation. We evaluate the model performance by comparing the methane mixing 5 ratios simulated by the two finest resolved COSMO/MESSy instances with airborne observational data. In Sect. 3 we show the comparisons with data that were sampled using three different measuring methods during the CoMet 1.0 campaign. Moreover, we assess the forecast performance firstly by internal comparison of the individual forecast days with the analysis simulation of CoMet 1.0 (Sect. 4.1) and secondly by comparison of the forecast results with the observations of CoMet 1.0 (Sect. 4.2).

Model Description
The numerical global chemistry climate model ECHAM/MESSy (EMAC, Jöckel et al., 2010) consists of the Modular Earth Submodel System (MESSy) that is coupled to the general circulation model ECHAM5 (Roeckner et al., 2006). EMAC comprises various submodels that describe different tropospheric and middle atmospheric processes. It is operated with a 90 layer vertical resolution up to about 80 km altitude, a T42 spectral resolution (T42L90MA) and a time step length of 720 s. For our detailed evaluation of MECO(n) with respect to tropospheric chemistry is given in the fourth part of the MECO(n) publication series . In the present study we use MECO(3) based on MESSy version 2.53.
The MESSy submodel S4D (Jöckel et al., 2010) on-line samples the model results along a specific track of a moving object, such as air planes or ships. The simulation data is horizontally (and optionally also vertically) interpolated to the track and sampled at every time step of the model. This guarantees the highest possible output frequency (each model time step) of 5 respective vertical curtains along the track. The submodel SCOUT (Jöckel et al. 2010) on-line samples the model results as a vertical column at a fixed horizontal position. The high frequency model output is useful for comparison with stationary observations, such as ground-based spectroscopy or lidar measurements.

Model Setup
To resolve the local emissions from the ventilation shafts in the USCB, we operate MECO(n) with three nested instances,

Methane tracers
CoMet aims to quantify the methane emissions in the USCB region, which actually arise from coal mining. In order to separate these emissions within our model, we define two different methane tracers. One tracer takes into account all methane emission fluxes (hereafter called CH4_FX) and includes the background methane, which is advected into the model domain. The second tracer (hereafter called PCH4) only considers the point source emissions of the ventilation shafts. In this way, we are able 30 to trace back the methane enhancements of the first tracer CH4_FX (equivalent to what has been measured) to the coal mine emissions. Figure 3 shows an overview of both tracers, the involved submodels and the corresponding emission inventories. We Figure 1. Overview of all three COSMO/MESSy domains over Europe (CM50), over Central Europe (CM7) and over the USCB in Poland (CM2.8), as well as the corresponding temporal and spatial resolution. The black arrows indicate the data exchange between the different models. The driving model EMAC is nudged towards divergency, vorticity, temperature and the logarithm of surface pressure from ECMWF.
SST and SIC are prescribed as boundary conditions.  initialize these two independent tracers for EMAC and for all three COSMO/MESSy instances equally. The initial conditions for the forecast simulations are derived from a continuous analysis simulation, which is described in detail in Sect. 2.3.

CH 4 Point sources (PCH4)
The PCH4 tracer considers only point source emissions, that are emitted by the ventilation shafts of the various coal mines in the USCB. In Fig. 4   . Further details on the names and exact position of the different mines can be found in the Supplement. The total point source methane emissions in this area are estimated to be 465 kt/a (CoMet ED v1 inventory). Emissions of single coal 10 mines are split equally between the corresponding ventilation shafts. For the definition of point sources, we apply the MESSy submodel TREXP that is described in detail by Jöckel et al. (2010).

Gridded methane emissions (CH4_FX)
The second tracer is called CH4_FX and includes all methane emission fluxes, anthropogenic and natural. We use an inventory which consists of two different parts, both monthly averaged: the year 2012 of the EMPA inventory (Frank, 2018)   All ventilation shafts are gathered in the south-west of Poland close to the polish city Katowice and the Czech border. data are imported and transformed to the computational grid (IMPORT_GRID, Kerkweg and Jöckel, 2015). The emission fluxes are then converted into tendencies of the tracer CH4_FX (OFFEMIS, Kerkweg and Jöckel, 2012b, therein described as OFFLEM). Processes that are related to the methane chemistry in the model are described in the MESSy submodel CH4 (Frank, 2018). The submodel simulates the chemical loss of methane including the depletion by photolysis rate calculated by the submodel JVAL (Sander et al., 2014). The CH4 submodel uses predefined fields of the oxidation reaction partners OH, O 1 D 5 and Cl which, for our setup, are derived as monthly averages (2007-2016) from a previous interactive chemistry simulation and read by IMPORT_GRID.

The Forecast System
In order to achieve the best initial conditions of PCH4 and CH4_FX, the daily forecast simulations are branched from a continuous analysis simulation, which is essentially a hind-cast simulation until the start of the forecast day. In the analysis simulation 10 EMAC is nudged by Newtonian relaxation of temperature, vorticity, divergence and the logarithm of surface pressure towards the 6-hourly ECMWF operational analysis data. SST and SIC, derived from the same data set, are prescribed as boundary conditions for EMAC. The initial conditions of CH4_FX are derived as monthly climatological average (2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016) of the simulation SC1SD-base-01, which is similar to the RC1SD-base-10 simulation (described in detail by Jöckel et al. (2016)).
PCH4 is initialized with zero. The starting date of the analysis simulations is April 1st, 2018, which results in a spin up time 15 of 45 days. Nudging is applied in every model time step. The nudging fields (6-hourly data) and the prescribed SST and SIC (12-hourly data) are linearly interpolated in time. For this interpolation in time, starting and continuing the analysis simulation requires two nudging time steps ahead of the simulated time. An analysis simulation which should start at 00:00 UTC, hence requires the nudging data of the time steps 06:00 UTC and 12:00 UTC. Once the respective time period is simulated and the corresponding restart file is written, a new forecast simulation is triggered. The forecast branches as a restart from the analysis simulation and simulates a time period of six days by using the 6-hourly ECMWF operational forecast data for the EMAC nudging. PCH4 and CH4_FX are automatically initialized from the restart files. Throughout this process the analysis simulation continues. The forecast system is visualized schematically in Fig. 5. As soon as the pre-processed nudging files become 5 available, the analysis simulation runs for about 50 min. Each forecast simulation takes about 8 hours and the post processing takes another 1.5 to 2 hours. The 8 hours are for 144 message passing interface (MPI) tasks on an Intel Xeon E5-2680v3 based Linux Cluster (6 nodes à 12 dual cores), whereby 6, 18, 56, and 64 tasks were used for the model instances EMAC CM50, CM7, and CM2.8, respectively. In our example, a forecast that simulates a time period starting at forecast day one at 00:00 UTC, is readily post-processed on forecast day two at around 04:30 UTC (after approximately 28.5 hours). Throughout both 10 campaigns, forecasts were delivered every 12 hours and made available online on a web page. In order to guarantee a continuous and uninterrupted supply of forecasts, we run the simulations alternately on two independent HPC (High Performance Computing) Clusters. An example of a forecast web product, which shows the forecast starting on June 07, 2019 at 00:00 UTC can be found here: https://doi.org/10.5281/zenodo.3518926 . The post-processing include the vertical integration of PCH4 and CH4_FX into a total column dry air average mixing ratio, called XPCH 4 and XCH 4 for PCH4 and 15 CH4_FX, respectively. It is calculated as follows: where χ CH 4 is the methane mixing ratio, m dry stands for the mass of dry air in a grid box and summation is carried out over all vertical levels. Figure 6 shows the design of XPCH 4 and XCH 4 which appeared on the forecast website. It is an example of a snapshot during CoMet 1.0 simulated with CM2.8.

20
3 Evaluation of Analysis Simulation

Observational Data
During CoMet 1.0, methane was measured by active remote sensing. The instrument is an "integrated path differential absorp-   Table 1). Both data sets have a temporal resolution of 1 s. A Picarro CRDS G1301-m instrument was installed on board of the DLR research 5 aircraft Cessna 208B (D-FDLR) and operated by the DLR in Oberpfaffenhofen. We compare seven flight observations to our model. Data sets are named accordingly P1 -P7 (see Table 1) and have a temporal resolution of 1 s.
Upon completion of CoMet 1.0, we conducted the analysis and forecast simulations again and used the specific geographical flight track coordinates (in degrees), pressure altitudes (in hPa) and time steps (in UTC) of all flights for the S4D submodel.
The simulated data were then sampled as track-following curtains at each model time step; i.e. every 720 s, 240 s, 60 s and 10 30 s for EMAC, CM50, CM7 and CM2.8, respectively. However, our evaluation in this study only considers the two finest COSMO/MESSy instances CM7 and CM2.8. For the comparisons with the in-situ observation, the curtain is further subsampled onto the flight altitude by linear interpolation. As the observed data have a finer temporal resolution than the model   output, they are averaged over 60 s for CM7 and over 30 s for CM2.8. In order to compare our model results with those of the CHARM-F measurements, we calculate the dry air mixing ratio between surface and aircraft (in the following referred to as X f l CH4) using the S4D submodel output.

Comparison with Analysis results
As the analysis simulation is nudged towards the ECMWF operational analysis data, we assume that this simulation reproduces the observed meteorology best. Thus, in order to find the best estimate of our model performance, the observations are compared to the analysis simulation results first. The model performance is analyzed with respect to pattern similarity and amplitude, i.e. root mean square error (RMSE), standard deviation, correlation coefficient and normalized mean bias error 5 (NMBE): where χ sim is the simulated methane mixing ratio, χ obs stands for the observed methane mixing ratio and the summation is over all n time steps. Compared to the observations, all CH4_FX model results are equally biased towards lower methane mixing ratios (see Fig. S2-S5 in the Supplement). The systematic bias is due to the lower background methane in the model. 3), where global methane mixing ratios were lower than in 2018 (Nisbet et al., 2019). As OH is initialized from the same simulation (also as monthly climatological mean) as methane, the OH field might play a role, too. Yet, due to the short simulation period, this should not have a significant influence in our MECO(n) simulations. In order to evaluate the anomalies resulting from coal mine emissions, rather than the discrepancies in background methane, we apply a bias correction to all 3.2.2 (D-FDLR and HALO in-situ). In Sect. 3.2.3 we discuss all statistical results graphically. From a visual point, amplitudes also appear to be similar. Mismatches can be seen on 6 June, around 09:30 UTC (see Fig. 7 (a)) where model results are slightly shifted in time. Observed XCH 4 values follow a negative trend until 10:10 UTC, which is not simulated by the models. On June, 7 the observed amplitudes are larger than those of the model results. On both days CM7 and CM2.8 do not differ significantly from each other. Furthermore, the comparisons reveal a continuous and constant bias.

Comparison with CHARM-F observations
30 Simulated X f l CH 4 values are shifted towards smaller values compared to C1 and C2. Table 2 lists the root mean square error (RMSE) in µmol/mol, the NMBE in %, and the correlation coefficient for all comparisons of the observations with the model  results. NMBE is negative for all cases and ranges from -4.1 % to -5.3 %. NMBE as well as RSME are lower for C1 (0.08 µmol/mol) as for C2 (0.10 µmol/mol), which confirm the assumption of higher mean amplitude similarity (standard deviation) on June, 6 as on June, 7. and J2 cover the USCB, but also parts which are lying outside the smallest model domain (see Fig. 9). Gaps in the simulated CM2.8 mixing ratios mark the temporal leaving of the smallest area.  the fine structure of the observations around 200 hPa. This can be seen in Fig. 8 (a) at 13:20 UTC and in Fig. 8 Fig. 11 (a) and (b)). Figure 10 (a) and (b) compare the simulated CH4_FX tracer mixing ratios along the flight tracks to the observations. Pattern similarity is good for both flights and background methane shows little variability. to a specific coal mine, which is apparently not resolved well in CM2.8, and even less in CM7. In general, observed peaks in the afternoon flight are lower than those of the morning flight. This is also seen in the model results. Again, simulated methane peaks exceed the observational peaks, but not as significantly, as seen for P4. The NMBE is consequently lower for P5, with 1.17 % and 1.18 % for CM7 and CM2.8 respectively.  CM7 values and clearly show a more distinct structure. In the afternoon this difference is even more remarkable. CM2.8 is able to simulate the variability more precisely, whereas CM7 does not resolve the smaller patterns seen in the observations (e.g.   These findings indicate, that the simulated planetary boundary layer (PBL) during the morning is too low. On the contrary, the observed methane peaks between 09:10 and 09:35 UTC (a) can be seen in the S4D results. Here, the PBL already extended towards higher altitudes and the flight track is crossing the simulated methane plume (see panel (b)). Overall, CM7 and CM2.8

Comparison with in-situ measurements
show smaller methane mixing ratios than observed.

Taylor Diagram
Taylor diagrams combine three statistical metrics to better compare and interpret different model performances. They summarize standard deviation (radial distance from the origin), correlation coefficient (angle) and centered RMSE (dashed semi circles) in a single diagram (Taylor, 2001). Thanks to the normalization of standard deviation and centered RMSE (NRMSE), metrics become non-dimensional and different model results can be compared to each other. The point on the horizontal axis 5 displaying a normalized standard deviation of 1 outlines the point where model results fit perfectly the observations. Figure 13 shows the results of the statistical analysis of CM7 (circles) and CM2.8 (triangles) compared to the observations (Table 1).
The three data sets differ in measuring technique, as well as in their geographical extent, duration and daytime of sampling.
This makes it difficult to define an overall model skill. The meteorological situation, such as wind conditions or convection, and topographical features may lead to further uncertainties. The comparison to J1 and J2 is the best in the Taylor diagram.

10
Results are close to the reference point on the horizontal axis (normalized standard deviation = 1.0) and correlation coefficients are very high, especially for CM7. The measurements cover larger areas outside the USCB including for example the Czech Republic. Other than for C and P observations, the horizontal distribution of the methane plumes only plays a minor role. The JIG samples focus on the vertical gradient of methane in the atmosphere, which is well represented in the model (Fig. S1 in the Supplement). The comparisons to C are reasonably well, too. The C1 pattern statistically differs from the observations, 15 which may be due to a temporal or spatial shift of the plume in the model at the beginning of June, 6. But normalized standard deviations are close to 1 and CM7 and CM2.8 agree equally with the observations. Since CHARM-F measures the total column average mixing ratio, mismatches between actual and simulated PBLH, are less apparent. The comparisons to the smaller scale P observations assess the models ability to represent regional scale features like coal mine emission plumes. As described in Sect. 3.2.2, the skill also depends on how good the model actually simulates the PBLH. We can further see the highest variability between CM2.8 and CM7 for the comparisons with P. Depending on the grid size of the model, the very localized methane enhancements can be either more diluted or more intensified in the model results. And, mixing ratios sampled very close to the ventilation shafts are often not resolved by the model. Furthermore, high wind speeds, for example during the 5 sampling of P7 (see Supplement), lead to a low correlation with MECO(n) and a normalized standard deviation larger than 2. P7 is consequently not present in the Taylor diagram. Finally, C and J data were sampled on June 6 and 7, where wind conditions were stable. Additional data would be necessary in order to compare the models' skill on days with less perfect conditions, such as for P.

Evaluation of Forecast Skill
10 A good forecast should be able to simulate both, amplitude and pattern variability, of the observed methane mixing ratios in the atmosphere. To identify the temporal evolution of the forecast skill with each forecast day, we therefore calculate skill scores (after Taylor, 2001) that consider standard deviation and correlation coefficient. We used the two different skill scores and which either emphasize the similarity of the amplitudes, or the similarity of the patterns. R is the correlation coefficient between forecast and observation, R 0 is the maximum attainable correlation coefficient, and σ f is the ratio of the standard deviation of the forecast to that of the observation. We assume R 0 to be 1, although in reality maximum correlation coefficients between observations and simulation can not be reached due to differences in spatial and temporal resolution. The skill ranges between 20 0 and 1, with small values indicating low skill and high values indicating high skill. We use the analysis simulation as a reference observation to evaluate a theoretical forecast skill. As the forecasts are branched from the analysis simulation we aim to quantify the deviation of the forecast from the analysis with increasing forecast day. The results are discussed in Sect. 4.1.
In order to find the actual skill of the forecast, we further compare the different forecast days to the observations C1, C2, J1, J2, P4 and P5. Section 4.2 describes these results. of the area which reveals a skill score larger than 0.7. The results are shown in red for CM7 and in blue for CM2.8. Panels (a) and (b) refer to the different skill scores S V and S C , respectively.
On forecast day I to III, CM7 shows slightly larger values than CM2.8. This is most terse for S C , which puts greater emphasis on the correlation coefficient. However, differences between the two model instances are rather small. The forecast skill is very large at forecast day I. Here, the forecasts are branched from the analysis simulation, which results in a good agreement 5 between reference and forecast. Both skills decrease with increasing forecast day, whereby the skill in (b) shows a steeper decrease than the skill in (a). This suggests that the correlation between forecast and analysis reduces faster than the similarity of amplitudes. For S V , the area which exceeds a threshold of 0.7, covers about 65 % and 40 % at forecast day II and III, whereas for S C it only covers about 50 % and 25 %, respectively. From forecast day IV onwards, less than 20 % (S V ) or 10 % (S C ) of the area reveal a skill larger than 0.7. The lower correlation could be attributed to a displacement of the simulated 10 plume in time or space, which would also explain the fact that the normalized standard deviation remains within the given range.  Figure 15 shows the skill score S V calculated for the different forecast days I to VI when compared to the observations C1, C2, J1 and J2 (see panel (a)) and to the observations P1 -P7 (see panel (b)). Results for CM7 and CM2.8 are shown on the left and right side, respectively. In contrast to the theoretical skill, where S V and S C clearly decrease with increasing forecast day, a reduction of the skill is not obvious here. Whereas the theoretical skill score is defined to measure the skill, averaged downwind methane plumes, which are easier to forecast than the variability of the methane background in the overall model domain. Considering the difference between the single observations, S V is highest for J1 and J2 with values above 0.8. They are followed by C1 and C2 with values above 0.65 (except for C1 on forecast day I) and P1, P4, P5 and P6 mainly showing a skill between 0.6 and 0.8. The skill is lowest for the comparison with P2 and P7. S V emphasizes the similarity of amplitude between forecast and observation. This similarity seems to be highest with HALO in-situ and CHARM-F observations. As patterns and measurement techniques. J and C observations measure larger scale vertical and integrated horizontal distributions of methane, respectively. Both are well represented in MECO(n). Modelling the smaller scale features observed by P is however challenging, which is also reflected in the comparison with the forecasts. S V does not vary significantly among the different forecast days nor does it show any specific trend. Results for C1, P4, P5 and P7 drop at forecast day V, but increase again at forecast day VI. In contrast, P2 suddenly increases at forecast day V. Differences between CM7 and CM2.8 are rather small. 5 Figure 16 summarizes the results of the skill score S C . S C is generally lower than S V , which is due to higher weighting of the correlation coefficient. Overall, skill is best for J1, J2, C2, P4 and P5, meaning that model and observations correlate well here.

Actual Forecast Skill
As mentioned in Sect. 3.2.3, conditions for measuring the downwind methane plumes on these days were favorable and model results agree better with the observations. On the contrary, P2 and P7 show again very low values (c. also Fig. 15). In panel (

Discussion
Overall, the comparison of the analysis simulation with airborne derived measurements show that MECO(n) is able to sim- Apart from that, the timing of the simulated peaks is in good agreement for all observations. Compared to CHARM-F observations, the simulated X f l CH 4 peaks show similar or slightly lower amplitudes. underestimation of the simulated PCH4 peaks might be the fact that we assume a temporally constant methane release from the ventilation shafts. But in reality the emitted amount of methane varies from day to day. This might have a small influence on the results, but would not explain the large differences between PCH4 and the observations. Overall, CM7 is able to simulated the large scale observations (HALO in-situ) and the vertically integrated methane (CHARM-F) as precisely as CM2.8. When compared to small scale measurements (D-FDLR in-situ) the model overestimates the observed peaks. This is especially true 25 for the finer resolved CM2.8, where methane mixing ratios are larger than the mixing ratios simulated by CM7. Smaller grid cells may catch locally enhanced methane mixing ratios in the plume, whereas coarser grid cells cover a larger portion of the methane plume and mixing ratios may be more diluted. Additionally, CM2.8 is able to better simulate the fine structure of the small scale observations. However, the differences are rather small and the observed methane peaks are well represented in both model instances.

30
The theoretical forecast skill illustrates the deviation of the forecast from the analysis simulation. Results show a decreasing trend with increasing forecast day. Nevertheless, correlation and amplitude similarity of a single forecast day show a broad variation. The evaluation of the actual forecast skill reveals even less clear results. The amplitudes seem to be constant or at least do not show any specific trend with increasing forecast day, but the correlation between observation and forecast slightly decreases. Forecast day V seems to yield the lowest skill for P4 and P5 and also for C1 and J1, which is less obvious as for P4 35 and P5. Due to the fact that these observations were sampled during only one day, namely the 6th of June 2018 in the morning and in the afternoon, all comparisons for the fifth forecast day are related to the forecast simulation start date 2nd of June.
Disagreement may be due specific meteorological situation of this day. In order to make a general statement about the forecast skill, it would be necessary to compare additional observations within a broader time span.

5
We successfully conducted 6-day-forecast simulations of methane with the on-line coupled three times nested global and regional chemistry climate model system MECO(3). The forecasts branch from a continuous analysis simulation, where EMAC is nudged towards the operational ECMWF analysis data. This is essential for appropriate initial forecast conditions. We continuously delivered the forecasts during 1.0 and analyzed the model and forecast performance with respect to the observations. The advantage of using the global/regional model system is, that we are able to simulate both, the point source emissions and the 10 background methane. For the latter, it is essential to provide lateral boundary conditions to the nested model instances, which are consistent with the meteorology, i.e. the dynamical boundary conditions. This makes it possible to distinguish between local source emissions and fluctuations in the background methane, which is important for the quantification of different methane sources. Even though the data for Newtonian relaxation are first coarsened to a horizontal grid resolution corresponding to the T42 spectral truncation, and then nested three times down to a spatial resolution of 2.8 km, MECO (3)