The CAMS volcanic forecasting system utilizing near-real time data assimilation of S5P/TROPOMI SO2 retrievals

10 The Copernicus Atmosphere Monitoring Service (CAMS), operated by the European Centre for Medium-Range Weather Forecasts on behalf of the European Commission, provides daily analyses and 5-day forecasts of atmospheric composition, including forecasts of volcanic sulphur dioxide (SO2) in near-real time. CAMS currently assimilates total column SO2 retrievals from the GOME-2 instruments on MetOp-B and -C and the TROPOMI instrument on Sentinel-5P which give information about the location and strength of volcanic plumes. However, the operational TROPOMI and GOME-2 15 retrievals do not provide any information about the height of the volcanic plumes and therefore some prior assumptions need to be made in the CAMS data assimilation system about where to place the resulting SO2 increments in the vertical. In the current operational CAMS configuration, the SO2 increments are placed in the mid-troposphere, around 550 hPa or 5 km. While this gives good results for the majority of volcanic emissions, it will clearly be wrong for eruptions that inject SO2 at very different altitudes, in particular exceptional events where part of the SO2 plume reaches the stratosphere. 20 A new algorithm, developed by DLR for GOME-2 and TROPOMI and optimized in the frame of the ESA-funded Sentinel5P Innovation–SO2 Layer Height Project, the Full-Physics Inverse Learning Machine (FP_ILM) algorithm, retrieves SO2 layer height from TROPOMI in NRT in addition to the SO2 column. CAMS is testing the assimilation of these data, making use of the NRT layer height information to place the SO2 increments at a retrieved altitude. Assimilation tests with the 25 TROPOMI SO2 layer height data for the Raikoke eruption in June 2019 show that the resulting CAMS SO2 plume heights agree better with IASI plume height retrievals than operational CAMS runs without the TROPOMI SO2 layer height information and that making use of the additional layer height information leads to improved SO2 forecasts than when using the operational CAMS configuration. By assimilating the SO2 layer height data the CAMS system can predict the overall location of the Raikoke SO2 plume up to 5 days in advance for about 20 days after the initial eruption. 30

The Copernicus Atmosphere Monitoring Service (CAMS), operated by the European Centre for Medium-Range Weather Forecasts on behalf of the European Commission, provides daily analyses and 5-day forecasts of atmospheric composition, including forecasts of volcanic sulphur dioxide (SO2) in near-real time. CAMS currently assimilates total column SO2 retrievals from the GOME-2 instruments on MetOp-B and -C and the TROPOMI instrument on Sentinel-5P which give information about the location and strength of volcanic plumes. However, the operational TROPOMI and GOME-2 15 retrievals do not provide any information about the height of the volcanic plumes and therefore some prior assumptions need to be made in the CAMS data assimilation system about where to place the resulting SO2 increments in the vertical. In the current operational CAMS configuration, the SO2 increments are placed in the mid-troposphere, around 550 hPa or 5 km.
While this gives good results for the majority of volcanic emissions, it will clearly be wrong for eruptions that inject SO2 at very different altitudes, in particular exceptional events where part of the SO2 plume reaches the stratosphere. 20 A new algorithm, developed by DLR for GOME-2 and TROPOMI and optimized in the frame of the ESA-funded Sentinel-5P Innovation-SO2 Layer Height Project, the Full-Physics Inverse Learning Machine (FP_ILM) algorithm, retrieves SO2 layer height from TROPOMI in NRT in addition to the SO2 column. CAMS is testing the assimilation of these data, making use of the NRT layer height information to place the SO2 increments at a retrieved altitude. Assimilation tests with the 25 TROPOMI SO2 layer height data for the Raikoke eruption in June 2019 show that the resulting CAMS SO2 plume heights agree better with IASI plume height retrievals than operational CAMS runs without the TROPOMI SO2 layer height information and that making use of the additional layer height information leads to improved SO2 forecasts than when using the operational CAMS configuration. By assimilating the SO2 layer height data the CAMS system can predict the overall location of the Raikoke SO2 plume up to 5 days in advance for about 20 days after the initial eruption. 30 therefore try to avoid volcanic plumes and after the 2010 eruption of the Icelandic Eyjafjallajökull volcano (e.g. Stohl et al., 2011;Dacre et al., 2011;Thomas and Prata, 2011) European air traffic was grounded for several days. Forecasts of the location and the altitude of volcanic SO2 or ash plumes can therefore provide important information for the aviation industry.
Satellite retrievals of volcanic ash and SO2 can help to track volcanic plumes, as done by the Support to Aviation Control Service (sacs.aeronomie.be; Brenot et al., 2014) and the EUNADICS (European Natural Airborne Disaster Information and 45 Coordination System for Aviation) prototype Early Warning System (Brenot et al., 2021). These services, as well as plume dispersion modelling (e.g. de Leeuw et al., 2020;Harvey and Dacre, 2016), are used by the Volcanic Ash Advisory Centres (VAACs) to advise civil aviation authorities in case of volcanic eruptions. While SO2 is often used as a proxy for ash, the SO2 and ash plumes can be located at different altitudes and be transported in different directions as was the case for the Icelandic Grímsvötn eruption in 2011 (Moxnes et al., 2013, Prata et al., 2017. 50 The Copernicus Atmosphere Monitoring Service (CAMS), operated by the European Centre for Medium-Range Weather Forecasts (ECMWF) on behalf of the European Commission, provides daily analyses and 5-day forecasts of atmospheric composition, including forecasts of volcanic SO2 in near-real time (NRT). However, the assimilation and forecasting of volcanic SO2 plumes in NRT is difficult. Since the CAMS forecast system runs within 3 hours of the observations 55 being taken, information about volcanic SO2 emission strength and the altitude of SO2 plumes is usually not available, with only the total column-integrated SO2 amount (TCSO2) able to be provided to adjust the model's predictions. CAMS assimilates near-real-time TCSO2 using the method described in Flemming and Inness (2013) in its operational NRT system to routinely assimilate TCSO2 retrievals from the Global Ozone Monitoring Experiment-2 (GOME-2) instruments produced by Eumetsat's Satellite Application Facility on Atmospheric Composition Monitoring (ACSAF) and from the Sentinel-5P 60 Tropospheric Monitoring Instrument (TROPOMI) provided by the European Space Agency (ESA). Both are using retrievals developed by the German Aerospace Centre (DLR) which give information about the emitted volcanic SO2 and their horizontal location in NRT but do not provide any information about the altitudes of the volcanic plumes. Prior assumptions therefore need to be made in the CAMS data assimilation system about where in the vertical the resulting SO2 increments should be placed. In the absence of NRT height information, the default is to place the SO2 increments in the mid-65 troposphere, around 550 hPa or 5 km. Although clearly a simplified approach, the method is a reasonable approximation to the real situation, using the data assimilation procedure as a mid-tropospheric SO2 source in areas of elevated volcanic TCSO2. The SO2 analysis field will then be transported by the model's prevailing winds and thereby result in quite realistic volcanic SO2 plumes. While this method produces good results for a large number of volcanic eruptions that inject SO2 into the mid-troposphere, it will clearly be wrong for eruptions that inject SO2 at very different altitudes, in particular for the most 70 explosive events where part of the SO2 reaches the stratosphere. In those cases, the CAMS system will not be able to forecast the SO2 transport well, because the model SO2 plume will be located at the wrong altitude where the prevailing winds might transport the SO2 in the wrong direction or height. The availability and use of NRT information about the altitude of the volcanic plumes would greatly improve the quality of the CAMS SO2 analysis and subsequent forecasts.

75
For hindcasts of volcanic eruptions with a system that does not run in NRT it is easier to make use of better injection height information. In this case, observations about injection height and emission strength might be available. Furthermore, CAMS can run an ensemble of SO2 tracers emitted at different altitudes and determine the best altitude and emission strength from comparisons of the resulting model fields with the available TCSO2 observations, using a method described in Flemming and Inness (2013). The parameters (plume height and emission flux) derived in this way can subsequently be used to provide a 80 volcanic SO2 source term in the CAMS forecast model and can also be used in the data assimilation system to modify the SO2 background error standard deviation to peak at the corresponding model level. However, this is not possible in NRT. https://doi.org/10.5194/gmd-2021-219 Preprint. Discussion started: 10 September 2021 c Author(s) 2021. CC BY 4.0 License.
A new algorithm, developed by DLR for GOME-2 and adapted to TROPOMI, which is currently being optimized in the frame of the ESA-funded Sentinel-5P (S5P) Innovation-SO2 Layer Height Project (S5P+I: SO2LH), the Full-Physics Inverse 85 Learning Machine (FP_ILM) algorithm (Hedelt et al., 2019), retrieves SO2 layer height (LH) information from TROPOMI in NRT in addition to the SO2 column. This is different from the operational ESA NRT TROPOMI retrieval which does not provide plume height information. CAMS is testing the assimilation of the FP_ILM data, making use of the NRT LH information. In this paper we document the current use of the operational TCSO2 data in the CAMS data assimilation system, present results from assimilation tests with the FP_ILM TROPOMI SO2 LH data for the eruption of the Raikoke 90 volcano in June 2019 and show that making use of the NRT LH information leads to improved SO2 analyses and in particular SO2 forecasts. This paper is structured in the following way. Section 2 describes the SO2 datasets used in this study and Section 3 describes the CAMS model and SO2 data assimilation setup. Section 4 presents the results from the assimilation of TROPOMI data for 95 eruption of Raikoke in June 2019, including sensitivity studies to evaluate choices made for the SO2 background errors, and evaluates the quality of the resulting SO2 analyses and forecasts with and without LH information. Section 5 presents the conclusions.

Datasets
The SO2 satellite data currently used in the CAMS NRT system are the operational TCSO2 retrievals from TROPOMI on 100 S5P produced by ESA and from the GOME-2 instruments on MetOp-B and MetOp-C produced by Eumetsat's ACSAF.
These retrievals come with a volcanic flag, i.e. the data producers mark the pixels that are affected by volcanic SO2, and only pixels that are flagged as volcanic are assimilated in the CAMS system. Using TROPOMI in addition to GOME-2 has two advantages: (1) TROPOMI has better spatial coverage and a lower detection limit than GOME-2 and (2) because TROPOMI has a different overpass time (9.30 UTC for MetOp, 13.30 UTC for S5P) using both instruments improves the chances of 105 having an overpass over a volcano when an eruption happens or shortly afterwards.

NRT TROPOMI TCSO2 retrieval
TROPOMI on board the S5P satellite provides high-resolution spectral measurements in the ultraviolet (UV), visible (Vis), near infrared and shortwave-infrared parts of the spectrum, allowing several atmospheric trace gases to be retrieved, including SO2 from the UV-Vis part of the spectrum. The horizontal resolution of TROPOMI for the UV-Vis is 5.5 km x 3.5 110 km (7 km x 3.5 km before 6 August 2019) with daily global coverage. The theoretical baseline for the operational TROPOMI SO2 retrieval is described in Theys et al. (2017) and further information can be found in Algorithm Theoretical Basis Document (ATBD), Product User Manual (PUM) and readme files available from the TROPOMI website (http://www.tropomi.eu/documents/). The atmospheric SO2 vertical column density is retrieved in three fitting windows (312-326 nm, 325-335 nm and 360-390 nm) using a Differential Optical Absorption Spectroscopy (DOAS) method (Platt 115 and Stutz, 2008;Platt, 2017), in which the slant SO2 column is retrieved and converted into vertical columns by using air mass factors. The log-ratio of the observed UV-visible spectrum of radiation backscattered from the atmosphere and an observed reference spectrum are used to derive a slant column density, which represents the SO2 concentration integrated along the mean light path through the atmosphere. This is performed by fitting SO2 absorption cross-sections to the measured reflectance in a given spectral interval. In a second step, slant columns are corrected for possible biases. Finally, 120 the slant columns are converted into vertical columns by means of air mass factors obtained from radiative transfer calculations, accounting for the viewing geometry, clouds, surface properties and prior SO2 vertical profile shapes. A volcano activity detection algorithm going back to Brenot et al. (2014) is used to identify elevated SO2 values from volcanic https://doi.org/10.5194/gmd-2021-219 Preprint. Discussion started: 10 September 2021 c Author(s) 2021. CC BY 4.0 License. eruptions (see Table 1). CAMS only assimilates SO2 pixels that have flag values of 1 (enhanced SO2 detection) or 2 (enhanced SO2 detection in vicinity of known volcano). Furthermore, only TROPOMI SO2 pixels with values greater than 5 125 DU are assimilated in the operational CAMS system to avoid assimilating SO2 from outgassing volcanoes which are covered by SO2 emissions in the CAMS model. The TROPOMI SO2 data are averaged to the model resolution (TL511, about 40km) before being used in the CAMS system.   of the SO2 LH based on Sentinel-5 precursor/TROPOMI data using a coupled Principal Component Analysis and Neural Network approach including regression. This algorithm is an improvement of the original FP_ILM algorithm developed by Efremenko et al. (2017) for the retrieval of the SO2 LH based on GOME-2 data using a Principal Component Regression technique. Recently, this algorithm has also been adapted to retrieve SO2 LH data from the Ozone Monitoring Instrument (OMI) on the Aura satellite (Fedkin et al., 2021). Furthermore, the FP_ILM algorithm has been used for the retrieval of 140 ozone profile shapes (Xu et al., 2017) and the retrieval of surface properties accounting for bidirectional reflectance distribution function effects (Loyola et al., 2020). In general, the FP_ILM algorithm creates a mapping between the spectral radiance and atmospheric parameter using machine learning methods. The time-consuming training phase of the algorithm using radiative transfer model calculations is performed off-line, and only the inversion operator has to be applied to satellite measurements which makes the algorithm extremely fast, and it can thus be used in NRT processing environments. SO2 LH 145 is retrieved in NRT from TROPOMI UV earthshine spectra in the wavelength range 311-335 nm with an accuracy of better than 2 km for SO2 columns greater than 20 DU. In this paper we use v3.1 of the FP_ILM SO2 LH retrieval.

NRT GOME-2 TCSO2 retrieval
GOME-2 (Munro et al., 2016) on board the MetOp-A, -B and -C satellites measures in the UV and Vis part of the spectrum 150 (240-790 nm). MetOp-B and -C have a swath of 1920 km at 40 km x 80 km ground pixel resolution, while MetOp-A has a narrower swath of 960 km at 40 km x 40 km. Global coverage with GOME-2 is achieved within 1.5 days. The GOME-2 measurements allow for the retrieval of ozone and a range of atmospheric trace gases, including SO2 which is retrieved with the GOME Data Processor (GDP) developed by DRL that uses a DOAS method. GDP4.8 is used for GOME-2A and GOME-2B (with a fitting window from 315-326 nm) and GDP4.9 for GOME-2C (with a fitting window of 312-326 nm to 155 include the strong SO2 line at 313 nm). Input parameters for the DOAS fit include the absorption cross section of SO2 and the absorption cross sections of interfering gases, ozone and NO2, and a correction is made in the DOAS fit to account for the ring effect (rotational Raman scattering). An empirical interference correction is applied to the SO2 slant column values https://doi.org/10.5194/gmd-2021-219 Preprint. Discussion started: 10 September 2021 c Author(s) 2021. CC BY 4.0 License.
to reduce the interference from ozone absorption (Rix et al., 2012). To reduce the interference from ozone absorption, the retrieval includes the fitting of two pseudo ozone cross-sections following the approach of Puķīte et al. (2010). As in the case 160 for the TROPOMI dataset, a volcano activity detection algorithm is used to identify elevated SO2 values from volcanic eruptions. Such flags were implemented in GDP4.8 (see Table 2) and further improved in GDP4.9 to use the same flagging as for TROPOMI (see Table 1). CAMS only assimilates the GOME-2 SO2 data that are flagged as volcanic (value=1 for GDP4.8; value=1 or 2 for GDP4.9) and assimilates GOME-2B and GOME-2C in the NRT system operational in 2021. In this paper only SO2 data from GOME-2B are used. 165 Flag value Description

IASI SO2 plume altitude retrieval
The Infrared Atmospheric Sounding Instrument (IASI) is flying on board of EUMETSAT's MetOp-A (since 2006), MetOp-170 B (since 2012) and MetOp-C (since 2017) satellite platforms (Clerbaux et al., 2015). The instruments measure the upwelling radiances in the thermal infrared spectral range extending from 645 to 2760 cm −1 , with high radiometric quality, 0.5 cm −1 spectral resolution. A total of 120 views are collected over a swath of ∼ 2200 km using a stare-and-stay mode of 30 arrays of 4 individual elliptical pixels, each of which is 12 km diameter at nadir, increasing at the larger viewing angles. IASI provides global monitoring of total ozone, carbon monoxide, methane, ammonia, nitric acid and SO2, among others atmospheric 175 constituents.
The IASI/MetOp SO2 columnar data are operationally provided by the EUMETSAT's ACSAF. In Clarisse et al. (2012) a novel algorithm for the sounding of volcanic SO2 plumes above ∼5 km altitude was presented and applied to IASI observations. The algorithm is able to view a wide variety of total column ranges (from 0.5 to 5000 D.U.), exhibits a low 180 theoretical uncertainty (3-5 %) and near real time applicability which was demonstrated for the recent eruptions of Sarychev in Russia, Kasatochi in Alaska, Grimsvötn in Iceland, Puyehue-Cordon Caulle in Chile and Nabro in Eritrea (Tournigand et al., 2020.) A validation of this algorithm on the Nabro eruption observations using forward trajectories and CALIOP/CALIPSO space-born lidar coincident measurements is presented in Clarisse et al. (2014) where the expansion of the algorithm to also provide SO2 plume altitudes is further described. The IASI/MetOp SO2 ACSAF product includes five 185 SO2 column data at assumed layer heights of 7, 10, 13, 16 and 25 km, as well as a retrieved best estimate for the SO2 plume altitude and associated SO2 column. Note that the SO2 plume altitudes provided by this algorithm are quantized every 0.5km. This dataset is publicly available from https://iasi.aeris-data.fr/SO2_iasi_a_arch/ .
For the requirements of the validation against the CAMS experiments, all available IASI SO2 plume altitude retrievals for 190 the Raikoke volcano 2019 eruption were gridded onto a 1x1° grid at 3h intervals per day. The equivalent CAMS SO2 plume altitude, i.e. the altitude where the maximum SO2 load occurs in the CAMS SO2 profiles, was chosen for the collocations. In the case where two CAMS altitudes provided the same SO2 load, the mean was assigned as the CAMS SO2 plume altitude. https://doi.org/10.5194/gmd-2021-219 Preprint. Discussion started: 10 September 2021 c Author(s) 2021. CC BY 4.0 License.

CAMS model
The chemical mechanism of ECMWF's Integrated Forecast System (IFS) is a modified and extended version of the Carbon Bond 2005 chemistry scheme (CB05, Yarwood et al. 2005) chemical mechanism for the troposphere, as also implemented in the chemical transport model (CTM) TM5 (Huijnen et al., 2010). CB05 is a tropospheric chemistry scheme with 57 species and 131 reactions. The chemistry module of the IFS is documented in more detail in Flemming et al. (2015) and Flemming 200 et al. (2017) and more recent updates in Inness et al. (2019). The CB05 chemistry scheme is coupled to the AER aerosol bulk scheme (Remy et al. 2019) for the simulation of sulphate, nitrate and ammonium aerosols. More up-to-date information is available from atmosphere.copernicus.eu. In the model version used in this paper, the CAMS system uses the CAMS-GLOBANTv4.2 anthropogenic emissions (Granier et al., 2019) which include anthropogenic SO2, as well as a climatology of SO2 outgassing volcanic emissions based on satellite retrievals (Carn et al., 2016). 205

CAMS data assimilation system
The IFS uses an incremental four-dimensional variational (4D-Var) data assimilation system (Courtier et al. 1994). SO2 is one of the atmospheric composition fields that is included in the control vector and minimized together with the meteorological control variables in the CAMS system (Inness et al., 2015, Flemming andInness, 2013). The current 210 operational CAMS configuration uses a weak constraint formulation of 4D-Var which includes a model error term for the meteorological variables (Laloyaux et al., 2020) that corrects mainly the stratospheric temperature bias and also improves slightly the stratospheric winds. In the CAMS 4D-Var system, the control variables are the initial conditions at the beginning of the assimilation window, with the aim of providing the best initial conditions for the subsequent forecast. The background error covariance matrix in the ECMWF data assimilation system is given in a wavelet formulation (Fisher 2004(Fisher , 2006. This 215 allows both spatial and spectral variations of the horizontal and vertical background error covariances. The CAMS background errors are constant in time. The horizontal resolution of the NRT CAMS 2021 operational system as well as that of the data assimilation experiments presented in this paper is approximately 40 km, corresponding to a triangular truncation of TL511 or a reduced Gaussian grid with a resolution of N256 (more information can be found at https://confluence.ecmwf.int/display/FCST/Gaussian+grids). The operational CAMS system uses two minimisations (the so-220 called inner loops) at reduced horizontal resolution, currently at TL95 and TL159 corresponding to horizontal resolutions of about 210 km and 125 km. This means that wavenumbers up to 95/159 can be represented in the wavelet formulation for the background errors. For the experiments presented in this paper, slightly higher horizontal resolutions of TL159/TL255 were used for the inner loops (corresponding to about 125 and 80 km, respectively). The CAMS model and data assimilation system has 137 model levels in the vertical, between the surface and 0.01 hPa and uses a 12-hour 4D-Var configuration with 225 assimilation windows from 3 to 15 UTC and from 15-3 UTC.

CAMS NRT TCSO2 assimilation configuration (baseline configuration)
The SO2 data assimilated in the CAMS NRT configuration are total column values. To calculate the model equivalent of the observations the CAMS SO2 field is interpolated to the time and location of the measurements and the CAMS SO2 columns are calculated as a simple vertical integral between the surface and the top of the atmosphere. While the background error 230 statistics for most of the atmospheric composition fields (Inness et al., 2015) were either calculated with the National Meteorological Center (NMC) method (Parrish and Derber, 1992) or from an ensemble of forecast differences (following a method described by Fisher and Andersson, 2001)  The SO2 wavelet file in the NRT CAMS configuration (also called baseline configuration in this paper) is formed of diagonal vertical wavenumber correlation matrices, with the value on the diagonal controlled by a horizontal Gaussian 250 correlation function with a standard deviation of 250 km and a globally constant vertical standard deviation profile. The values of the elements on the diagonal of the vertical correlation matrix are the same at every level but vary for each wavenumber. If TCSO2 data are assimilated the largest correction to the model's background will be applied where the background errors are largest, i.e. in the mid-troposphere around 550 hPa. The CAMS SO2 analysis is univariate, i.e. there are no cross correlations between SO2 background errors and the other atmospheric composition control variables. 255

Data assimilation configuration for TCSO2 LH data
If information about the altitude of the volcanic SO2 layer is known in NRT a different approach can be followed. In this case, we use a background error standard deviation profile that is constant in height (e.g. red line in Fig. 1) and calculate the SO2 column not between the surface and the top of the atmosphere, but between the pressure values that correspond to the 260 bottom and the top of the retrieved volcanic SO2 layer. The depth of this layer is currently set in the FP_ILM retrieval as 2 km, which corresponds to the uncertainty of the retrieved layer height. This approach mimics the procedure of using averaging kernels with box profiles given for the SO2 layer. Results from sensitivity studies regarding the choice of the constant background error standard deviation value are given below in Section 4.2. The Raikoke volcano, located on the Kuril Islands south of the Kamchatka peninsula, erupted around 18 UTC on 21 June 2019 and emitted SO2 and ash in a series of explosive events until about 6 UTC on 22 July. The SO2 and ash plume rose to around 8-18 km (Muser et al., 2020;Grebennikov et al., 2020) meaning a considerable amount of the SO2 reached the 270 stratosphere. The volcanic cloud was transported around much of the northern hemisphere, was observed by TROPOMI and GOME-2 for about a month and was also observed with ground-based measurements (Vaughan et al., 2021;Grebennikov et al., 2020) and other satellites (Muser et al., 2020). Figure 2 shows the TCSO2 burden from the Raikoke eruption as calculated from NRT TROPOMI and GOME-2B data. The satellite data were gridded onto a 1⁰x1⁰ degree grid and the area of all grid cells with SO2 values greater than the listed threshold values was calculated. For a threshold of 1 DU the SO2 275 burdens from TROPOMI and GOME-2B were around 1.5 Tg and 1.1 Tg, respectively. These values agree with findings by de Leeuw et al. (2020) and make the eruption the largest since the eruption of the Nabro volcano in 2011 (de Leeuw et al, 2020;Goitom et al, 2015;Clarisse et al., 2014). The 'dip' in the TROPOMI SO2 burden after the initial peak is an artefact that results from missing observations in the TROPOMI NRT data. 285 Figure 3 shows a timeseries of the SO2 LH information that was retrieved for the Raikoke plume from TROPOMI with the FP_ILM retrieval. It shows that volcanic SO2 can be detected and the SO2 LH information retrieved for about 3 weeks after the eruption. The bulk of the SO2 was located above 300 hPa, (about 9 km) with a considerable amount above 200 hPa (about 12 km). This is considerably higher than the 550 hPa that is assumed as the plume location in the CAMS operational

Sensitivity studies for assimilation of TCSO2 data
Several data assimilation experiments were run for the period 22 June to 21 July 2019 to test the assimilation of the SO2 LH data and to compare the results with the CAMS baseline configuration, listed in   error standard deviation values. In all these experiments GOME-2 SO2 data were not assimilated, and GOME-2B is used as a fully independent dataset for the validation. 310 The low resolution of the minimisation (TL95/TL159 in the CAMS system operational in 2021) is a factor that limits the ability of the SO2 analysis to reproduce small-scale SO2 features seen in the observations because it gives a lower limit for the length scale of the horizontal background error correlations that can be used, i.e. for the operational CAMS configuration only wavenumbers up to 95/159 can be represented. The smallest wavelength (λmin) that can be represented by two grid 315 points on a linear grid is where R is the radius of the Earth and nmax the maximum wavenumber of the truncation (95 or 159 for the inner loops in the operational CAMS configuration), i.e. twice the size of a grid box. This means that the minimum wavelengths which can be represented with two grid points for TL95, TL159 and TL255 are about 420 km, 250 km, 160 km, respectively and smaller 320 scale horizontal structures cannot be represented in the background error wavelet formulation. Figure 4 illustrates this and shows horizontal SO2 correlations at the surface for horizontal background error length scales of 50km, 100km and 250 km for truncations of TL95, TL159 and TL255. The 'wriggles' seen in the TL95 (and to a lesser extent in the TL159) plots show that the shorter background error correlations length scales cannot be properly resolved at these truncations. Even at TL255 some minor oscillations are still visible for horizontal correlation length scales of 50 km. Therefore, to properly resolve 325 smaller-scale plumes the resolutions of the inner loops would need to be even higher than TL255. Figure 4 also illustrates how far an increment from a single SO2 observation would be spread out in the horizontal and therefore affect grid points away from the observation. The operational NRT CAMS configuration uses minimisations at TL95/TL159 and a length scale of 250 km for the horizontal SO2 background error correlations. For the data assimilation experiments shown in this paper we use inner loops 335 of TL159/TL255 to allow us to use a Gaussian correlation function with a length scale of 100 km and therefore resolve slightly smaller-scale features than in the operational NRT CAMS system. Figure 5 shows the CAMS TCSO2 analysis fields on 27 June 2019 resulting from the assimilation of the TROPOMI SO2 LH data when horizontal background error correlation length scales of 50, 100 and 250 km were used (experiments LH50, LH100, LH250), while using the same background error standard deviation profile of 1e -7 kg/kg in all cases. Also shown are the NRT TROPOMI and GOME-2B 340 TCSO2 retrievals for that day. The figure illustrates the large impact of the horizontal background error correlation length scale on the SO2 analysis, as the SO2 plume is considerably more spread out in the CAMS analysis when longer horizontal correlations are used, and that better agreement with the features seen in the observations is found for shorter horizontal correlations. Figure 6 shows timeseries of SO2 burden and plume area for a threshold of 5 DU from TROPOMI, GOME-2B and the three SO2 LH experiments to further assess the impact on the SO2 analysis of changing the horizontal correlation 345 https://doi.org/10.5194/gmd-2021-219 Preprint. Discussion started: 10 September 2021 c Author(s) 2021. CC BY 4.0 License. length scale. We see that the SO2 burden and plume area calculated from the observations are overestimated by all three 350 CAMS TCSO2 analyses. This overestimation is a well-known feature usually seen in the operational NRT CAMS volcanic SO2 assimilation. Figure 6 illustrates that a major factor causing this overestimation is the choice of the horizontal background error correlation length scale and that by choosing a length scale of 250 km the SO2 burden and plume area are about 6 times larger than for a length scale of 50 km. This implies that a limiting factor for correctly reproducing the SO2 burden and plume area in the CAMS analysis is the resolution of the inner loops as it limits the horizontal correlation length 355 scale that can be chosen for the background errors. A coarser inner loop resolution requires a longer horizontal length scale because shorter wavelengths cannot be resolved properly. If the aim is to reproduce finer-scale volcanic plumes with the CAMS data assimilation system, the horizontal resolution of the inner loops will have to be increased. For the main LH experiment used in this paper we decided to use a horizontal correlation length scale of 100 km which can be represented properly if the resolutions of the inner loops are TL159/TL255. 360 Another factor that influences the results of the SO2 analysis is the value of the background error standard deviation profile. This is illustrated in Figure 7 which shows time series of SO2 burden and plume area from TROPOMI, GOME-2B and three SO2 LH experiments with varying background error standard deviation values (0.7e -7 , 1.0e -7 , 1.4e -7 kg/kg). All experiments 370 used a horizontal background error correlation length scale of 100 km. The larger the background error standard deviation, the larger the correction that is made by the SO2 analysis and the larger the SO2 burden and plume area become. However, the impact of changing the background error standard deviation is not as big as changing the horizontal background error correlation length scale and increasing the standard deviation value from 0.7e -7 kg/kg to 1.4e -7 kg/kg doubles the SO2 burden and plume area. 375 For the remainder of this paper the LH experiment that uses a value of 0.7e-7 kg/kg for the background error standard deviation and a horizontal background error correlation length scale of 100 km is used (abbreviated as LHexp).

Results of TCSO2 assimilation tests for the Raikoke 2019 eruption 385
The SO2 analysis fields and 5-day forecasts for the Raikoke eruption from the SO2 layer height experiment (LHexp) and the baseline experiment with the CAMS configuration (BLexp) are now assessed in more detail. This assessment includes (1)  We evaluate the SO2 analyses and forecasts against GOME-2B and TROPOMI NRT TCSO2 retrievals. GOME-2B TCSO2 data are fully independent because they are not used in our SO2 assimilation experiments, and TROPOMI NRT TCSO2 retrievals are useful to demonstrate in how far the analyses manage to reproduce the TROPOMI TCSO2 values. It has to be kept in mind that the version of the FP_ILM SO2 LH retrieval used in this study (v3.1) attains its optimal accuracy of 2km 395 for SO2 columns greater than 20 DU and hence, in LHexp, no TCSO2 observations below 20 DU are assimilated. For the evaluation, the SO2 analyses and forecasts, as well as the satellite data, are gridded onto a 1⁰x1⁰ grid. Figure 8 shows a timeseries of the number of observations that are actively assimilated in both experiments, i.e. the number of 1⁰x1⁰ grid points with active observations, and illustrates that there are more active data in BLexp where NRT TROPOMI SO2 data with values greater than 5 DU are assimilated (i.e. as done in the operational CAMS system) than in LHexp where only data 400 with LH TCSO2 greater than 20 DU are assimilated.  Figure 9 also illustrates that GOME-2B and NRT TROPOMI TCSO2 show the same features of the plume, however the TROPOMI NRT lower detection limit facilitates the retrieval of smaller TCSO2 values around the edges of the plumes. The 415 FP_ILM SO2LH retrieval (v3.1) does not provide reliable information for TCSO2 < 20 DU and therefore only picks up those parts of the plume that are associated with the highest SO2 load. This also explains the lower number of active observations seen in Fig.8. Especially during the later stages of the eruption parts of the plume are missed by the FP_ILM SO2 LH retrieval. Nevertheless, when assimilating the FP_ILM SO2 LH data we find good agreement with the NRT TROPOMI data and the GOME-2B data in LHexp (Fig. 9, column 1) when the CAMS analysis reports SO2 values < 20DU. 420 Figure 10 shows timeseries of the SO2 burden from NRT TROPOMI, GOME-2B and the two experiments calculated for threshold values of 5 DU and 30 DU, and Figure 11 shows the corresponding timeseries of the plume areas. already seen in Figures 6 to 8, namely that the plumes are more spatially dispersed in the analysis than in the observations. The overestimation of the SO2 burden is larger in LHexp than in BLexp with maximum values of 3 Tg and 2 Tg, 430 respectively, compared to 1.5 and 1.2 Tg for NRT TROPOMI and GOME-2B. However, the plume area is larger in BLexp with maximum extent of about 1e 7 km 2 , compared to 0.8e 7 km 2 in LHexp and 0.2e 7 km 2 calculated from the observations. BLexp fails to capture the higher SO2 column values, leading to an underestimation of plume area and SO2 burden for a threshold of 30 DU, while LHexp does have TCSO2 values > 30 DU but overestimates both plume area and SO2 burden.  which lies between 0 and 1, with a value of 1 indicating a perfect score. We also us the critical success index (CSI), defined as CSI=hits/(hits+misses+false alarms) (3) which additionally considers the number of false alarms and again has values between 0 and 1 with 1 indicating a perfect 460 score (Nurmi, 2003). These are point based comparisons and might score badly for features that are close but slightly misplaced between observations and model.  Figure 12 shows the POD from LHexp and BLexp for various TCSO2 analysis thresholds (3, 5, 10, 30 DU) scored against NRT TROPOMI and GOME-2B data. The results are very similar for both satellites. The parts of the plume with lower TCSO2 values are well captured by both experiments with POD values above 0.9 for BLexp for most of the period and POD values above 0.8 for LHexp. The POD in LHexp decreases towards the end of the depicted period because the number of assimilated data drops strongly (see Fig. 8), while more observations are assimilated in BLexp at the later stage of the 475 episode. BLexp, however, does not capture the higher values observed by NRT TROPOMI and GOME-2B well while LHexp has a much higher POD for those parts of the plume. No values above 30 DU are detected after 5 July 2019. Figure 13 shows the CSI from LHexp and BLexp, the measure that also penalises the false alarms. As expected, these values are considerably lower than the POD (with maximum values around 0.6) because plume area and SO2 burden are 480 https://doi.org/10.5194/gmd-2021-219 Preprint. Discussion started: 10 September 2021 c Author(s) 2021. CC BY 4.0 License. overestimated in both experiments (see Fig. 9) leading to numerous false alarms. Both experiments behave similarly for the lower thresholds but TCSO2 values greater than 30 DU are again captured better in LHexp. In summary, as far as the TCSO2 analysis fields are concerned the performance of LHexp and BLexp is similar for TCSO2 490 columns below 10 DU, but BLexp does not capture the higher SO2 values as well as LHexp. Both experiments overestimate the SO2 burden and the plume area compared to the TROPOMI NRT and GOME-2B observations.

Vertical location of the SO2 plume
While the TCSO2 analyses from LHexp and BLexp score similarly in the detection of the TCSO2 plume observations by 495 GOME-2B and NRT TROPOMI, at least for values less than 10 DU, the vertical distributions of the SO2 plumes from the experiments differ considerably. Figure 14 shows vertical cross sections along 60⁰N between 120-300⁰E through the SO2 plume on 29 June 2019 from LHexp and BLexp. The figure illustrates that the bulk of the SO2 plume is located between 200-100 hPa in LHexp while it is located much lower, between 600-400 hPa, in BLexp. To assess which vertical distribution is more realistic, in Figure 15 we compare the plume heights from the experiments with SO2 altitudes derived from IASI 500 LATMOS ULB data (Clarisse et al., 2012) for the period 22 to 29 June 2019. The CAMS plume altitude was calculated as the altitude where the highest SO2 value were found in the CAMS SO2 profiles. The figure shows that the plume height in LHexp agrees well with the independent IASI plume altitude with a mean bias of 0.4±2.2 km, while BLexp underestimates the plume altitude with a mean bias of -5.1±2.1 km. Figure 15 illustrates that the altitude of the Raikoke SO2 plume in the CAMS analysis is considerably improved if SO2LH data are used than when using the baseline configuration.

Quality of the 5-day TCSO2 forecasts 515
Next, we assess the quality of the 5-day TCSO2 forecasts started from the LHexp and BLexp SO2 analyses. Figure 16 shows a timeseries of POD for a TCSO2 threshold of 5 DU from LHexp and BLexp for NRT TROPOMI and GOME-2B for the initial SO2 analysis and forecasts valid on the same day at different lead-times (24 to 120 hours). The figure shows that the skill decreases with increasing forecast lead-time in both experiments, but that the degradation of skill with forecast lead- during June while values drop considerably during July when even the short 24-hour forecasts from BLexp only have POD values between 0.2 and 0.4. In other words, in BLexp the skill of forecasting the location of the SO2 plumes seen by GOME-530 2B and the NRT TROPOMI one day in advance is similar to the skill of forecasting the SO2 plumes 4 days in advance in LHexp. The main reason for the lower forecast quality in BLexp is the fact that the SO2 plumes are located at the wrong altitude (see Fig. 15) so that the prevailing winds will not transport the SO2 in the correct direction.
To further assess the forecast skill, we use the fractional skill score (FSS) which is a spatial comparison. It was originally 535 used to assess the quality of precipitation forecasts (Roberts and Lean, 2007) but has more recently also been used to assess https://doi.org/10.5194/gmd-2021-219 Preprint. Discussion started: 10 September 2021 c Author(s) 2021. CC BY 4.0 License. the skill of dispersion models to capture volcanic plumes (de Leeuw et al, 2020;Dacre et al., 2016;Harvey and Dacre, 2016). The FSS is calculated using the ratio of the modelled and observed fractional coverage of the SO2 plume at each location for various horizontal scales (neighbourhoods) and thresholds, and it assesses how the skill of the forecast varies depending on those parameters. To calculate it we grid the model TCSO2 analyses and forecasts at various lead-times and the 540 NRT TROPOMI and GOME-2B TCSO2 observations on a 1⁰x1⁰ grid and create binary fields for the chosen thresholds (in our case for TCSO2 > 1, 3,5,10,20,30,50 DU). Then, for each grid point, the fraction of surrounding grid points that exceed the threshold is calculated from the model field and the observations. To establish at which horizontal scale the SO2 analysis or forecast is useful we repeat this exercise with neighbourhoods of varying scales (i.e. 1, 3, 5°, corresponding to neighbourhoods of 1, 9 and 25 grid boxes, respectively). An FSS of 1 means perfect alignment of the features in the 545 observations and the model and an FSS of 0 a total mismatch. We use values of FSS greater than 0.5 to define a simulation that has some skill. This value was also used by de Leeuw et al. (2020) and Harvey and Dacre (2016). The FSS for a neighbourhood of length n is calculated following Roberts and Lean (2007) as where MSE is the Mean Square Error and MSE(n)=0 for a perfect forecast of neighbourhood with length n. The reference MSE for each neighbourhood length n is given by:    (1) the skill timescale is longer for the smaller thresholds, i.e. the overall shape of the plume is easier to reproduce than smaller scale filaments with higher TCSO2 values, (2) the skill timescale drops with lead-time, (3) the skill timescale increases if the neighbourhood size is increased, with a larger increase for higher thresholds, pointing to errors in the location 570 of structures with higher TCSO2, which are reduced with a larger grid, and (4) that the skill timescale is greater in LHexp than in BLexp, leading to better forecasts of the plume longer in advance, as already seen in Fig. 16.
We now look at the individual panels in more detail. Figure 17a shows the skill timescales of the TCSO2 analyses from LHexp and BLexp against NRT TROPOMI and illustrates as already seen in Section 4.3.1 that these give similarly useful 575 TCSO2 fields (especially for the lower thresholds), but that the number of useful days is slightly larger in LHexp and that BLexp fails to capture the highest TCSO2 values. It is interesting to see the large number of days with FSS>0.5 for the threshold of 20 DU in LHexp, because this is the value below which no TCSO2 data are assimilated in LHexp. Figure 17b shows that the 24-hour forecasts in LHexp have similar skill to the analysis, which a skill timescale of 24 days for the 1 DU threshold and a neighbourhood size of 1°, illustrating that the 24-hour from the LHexp analysis can predict the overall 580 location of the SO2 plume very well. For higher thresholds (> 30 DU) this drops to about 5 days after the eruption, and there is no skill for a threshold of 50 DU. The skill of the 48 and 72-hour forecasts ( Fig. 17c and d) are similar to that of the 24hour one for the thresholds up to 10 DU, but at a neighbourhood size of 1° the higher values (>20 DU) have no skill anymore. As there is still skill on a 5-day timescale for these forecasts for a neighbourhood size of 3°, this suggests that it is the location of the filaments with high TCSO2 values that is not correct rather than the forecast not maintaining any of the 585 higher TCSO2 values. Even the 96 and 120-hour forecasts ( Fig. 17e and f) in LHexp have a skill timescale of slightly more than 20 days for the 1 DU threshold at 1°, but the skill drops markedly for the higher thresholds, and for the 120-hour forecasts skill is only found for thresholds up to 10 DU at 3° for up to 3 days after the eruption. Nevertheless, Fig. 17 shows that by assimilating SO2 LH data the CAMS system can predict the overall location of the SO2 plume up to 5 days in advance for about 20 days after the initial eruption. This corresponds to the time when the FP_ILMSO2 LH retrieval does not 590 detect volcanic SO2 anymore (see Fig. 8). Figure 17 shows that the BLexp analysis has skill timescales similar to LHexp, confirming what was already seen in Figures   12 and 13. Despite placing the SO2 cloud at the wrong altitude, the overall shape of the SO2 plume is still captured by the SO2 analysis. However, for higher thresholds the number of useful days after the eruption is smaller in BLexp and the 595 forecast skill drops more steeply with forecast lead-time than in LHexp. There is no skill for the 24-hour forecast at 1° for thresholds greater than 20 DU, and for the 48-hour forecasts the skill timescale for a 1 DU threshold at 1° is 15 days, compared to 23 days in LHexp. The skill timescale remains around 14 days in BLexp for the 72 and 96-hour forecasts for a 1 DU threshold at 1° and then drops to 6 days at 120-hours. For the 72 to 120-hour forecasts there is no skill for the higher thresholds for a neighbourhood size of 1°, pointing to a worse misplacement of the smaller scale features of the plume with 600 higher TCSO2 values than in LHexp. Figure 18 shows the skill timescale for LHexp and BLexp analyses and forecasts against GOME-2B data. For GOME-2B the number of useful forecast days are generally slightly lower, especially for a threshold of 1 DU which might just be an artefact because GOME-2B does not detect so many volcanic pixels with low values. For thresholds of 3-30 DU the GOME-605 2 results for a neighbourhood size of 1° or 3° are very similar to the TROPOMI results for all the forecast ranges, with skill timescales of about 10 days for forecast lead times up to 72 hour and around 5 days for the 96-hour forecasts. Again, the performance of BLexp is worse than of LHexp and for the 48-hour forecasts there is almost no skill in BLexp for the 1° neighbourhoods.

Conclusions
In this paper we document the procedure used to assimilate near-real time TCSO2 retrievals from the TROPOMI and 615 GOME-2 instruments in the operational CAMS NRT data assimilation system and explore the use of TROPOMI SO2 layer height data provided by the ESA-funded Sentinel-5P Innovation-SO2 Layer Height Project and produced with the Full-Physics Inverse Learning Machine algorithm (v3.1) developed by DLR. The assimilation of the FP_ILM SO2 LH data was tested for the 2019 Raikoke eruption and compared with results obtained when assimilating NRT TROPOMI TCSO2 data with the operational CAMS configuration. 620 While the operational CAMS approach of placing the SO2 increment in the mid-troposphere around 550 hPa gives surprisingly good results for the TCSO2 analyses and short-range forecasts in a lot of situation (including this case), the vertical distribution of SO2 in the baseline analysis is clearly wrong for the Raikoke eruption which injected a copious amount of SO2 into the stratosphere. By using the FP_ILM TROPOMI SO2 LH data this can be much improved as 625 comparison with the independent SO2 plume heights retrieved from IASI show. While the LH experiment agrees well with the IASI LATMOS ULB plume altitude retrievals, with a mean bias of 0.4±2.2 km, the baseline experiment underestimates the plume altitude with a mean bias of -5.1±2.1 km. Consequently, the assimilation of the FP_ILM LH data leads to much improved SO2 forecasts and should improve the usefulness of the CAMS SO2 forecasts for users and also for the aviation industry. 630 In the baseline experiment the forecast skill drops much more for longer forecast lead-times than in the LH experiment, which is seen when comparing point skill scores such as probability of detection and critical success index and when using the fractional skill score that also assesses spatial skill. Timeseries of the Probability of Detection score show that in the baseline experiment, the skill of forecasting the location of the Raikoke SO2 plume seen by GOME-2B and the NRT 635 TROPOMI one day in advance is similar to the skill of forecasting the SO2 plume 4 days in advance in LHexp. The FSS shows that compared to NRT TROPOMI, even the 120-hour forecasts of the LH experiment have a significant skill up to 20 days after the initial eruption for the prediction of TCSO2 for a 1 DU threshold and a neighbourhood size of 1°, suggesting that the overall location of the SO2 plume is well reproduced. The skill is smaller for higher TCSO2 thresholds (about 5 days https://doi.org/10.5194/gmd-2021-219 Preprint. Discussion started: 10 September 2021 c Author(s) 2021. CC BY 4.0 License. for forecast ranges up to 96-hours on a 1° grid), illustrating that it is more difficult to accurately predict the location of areas 640 with higher SO2 columns which usually have smaller spatial scales. The skill timescale is shorter for the baseline experiment, with values around 15 days after the initial eruption for the 1 DU threshold for forecast ranges up to 96-hours and 5 days for the 120-hours forecasts, but there is no skill for any of the higher thresholds at a neighbourhood size of 1° from 72-hour forecasts onwards. By assimilating FP_ILM SO2 LH data the CAMS system can predict the overall location of the Raikoke SO2 plume up to 5 days in advance for about 20 days after the initial eruption. 645 Our study also documents some issues of the CAMS TCSO2 assimilation approach, namely the overestimation of the SO2 burden and plume area by the data assimilation system, both in the operational configuration and when using the FP_ILM SO2 LH data. The main reason for this overestimation is the coarse horizontal resolution used in the minimisations (currently TL95 and TL159 in the operational CAMS system) which limits the wavenumbers that can be resolved in the wavelet 650 formulation of the SO2 background errors. This in turn limits the horizontal correlation length scale that can be used for the SO2 background errors and that determines how for the increments from individual observations are spread out in the horizontal. In this paper we used TL159/TL255 as the resolutions for the minimisations, but to properly resolve small scale structures the resolutions of the minimisations would have to be even higher. Obviously, this would increase the numerical cost of running the minimisation. 655 Other reasons that can contribute to an overestimation of the SO2 burden or plume area in the CAMS SO2 analysis could be the use of anthropogenic SO2 emissions in the CAMS model as the satellite data used for the comparisons are only the volcanic pixels. However, tests run without the anthropogenic emissions (not shown in this paper) did not show large differences compared to the experiments presented here, suggesting that this is not a big effect for the Raikoke eruption. 660 Another possibility could be the fact that the satellite might miss a plume or part of a plume, but that the whole plume is present in the model. Finally, for the FP_ILM LH data the retrieval is limited to TCSO2> 20 DU (in v3.1) and lower values that might correct an overestimation from the previous analysis cycle are not used. In future we hope to also test the assimilation of IASI SO2 retrievals with plume height information that would add extra information in the CAMS system.

665
One limitation in using the TROPOMI SO2 LH data is that the version of the FP_ILM retrieval used in this study (v3.1) only produces reliable information for TCSO2>20 DU so that most of the smaller volcanic eruptions that happen on a more regular basis than big explosive eruptions would be missed if only the FP_ILM TROPOMI SO2 LH data were assimilated in the CAMS NRT system. Improvements to the FP_ILM retrieval algorithm are on-going so that it should be possible to lower this limit in the future. 670

Code and Data availability
The model data used in this paper are available from https://apps.ecmwf.int/research-experiments/expver/ using the following DOIs for the 6 experiments: