Estimation of CH 4 emission based on an advanced 4D-LETKF assimilation system

Abstract. Methane (CH4) is the second major greenhouse gas after carbon dioxide (CO2), and its atmospheric abundance has increased substantially during recent decades, raising serious sustainability and climate change issues. Here, we develop a data assimilation system for in situ and column-averaged concentrations using a local ensemble transform Kalman filter (LETKF) to estimate surface emissions of CH4. The data assimilation performance is tested and optimized in idealized settings using observation system simulation experiments (OSSEs), where a known surface emission distribution (the truth) is retrieved from synthetic observations. We tested three covariance inflation methods to avoid covariance underestimation in the emission estimates, namely fixed multiplicative (FM), relaxation-to-prior spread (RTPS), and adaptive multiplicative. First, we assimilate synthetic observations at every grid point at the surface level. In such a case of dense observational data, the normalized root mean square error (RMSE) in the analyses over global land regions is smaller by 10 %-15 % with the RTPS covariance inflation method than with FM. The regionally integrated seasonal cycles of the flux estimated with RTPS inflation over 15 regions are in reasonable agreement with the true flux, with a global normalized annual mean bias of 0.04. We then assimilated column-averaged CH4 concentrations, obtained by sampling the model simulations at Greenhouse Gases Observing Satellite (GOSAT) observation locations and times, in another OSSE. Similar to the case of dense observational data, the RTPS covariance inflation method performs better than FM for GOSAT synthetic observations in terms of normalized RMSE (2 %-3 %) and in the comparison of integrated flux estimates with the true flux. The annual mean normalized RMSE (normalized mean bias) of the LETKF CH4 flux estimation with RTPS and FM covariance inflation is found to be 0.59 (0.18) and 0.61 (0.23), respectively. The χ² test performed for the assimilation of GOSAT synthetic observations suggests a strong underestimation of the background error covariance with both the RTPS and FM covariance inflation methods; however, the underestimation is much higher (always >100 %) for FM than for RTPS.
Regional CH4 emissions can be estimated from CH4 concentration fields and chemistry transport models using Bayesian synthesis approaches based on inverse modeling techniques (e.g., Enting, 2002). In such approaches, emissions are optimized at a coarse resolution (e.g., for a limited number of predefined regions), mostly using surface-based observations. CH4 concentrations are provided by the NOAA cooperative air sampling network sites (Lan et al., 2022) and by other networks through the World Data Centre for Greenhouse Gases (WDCGG) website, hosted by the Japan Meteorological Agency. In recent years, satellite measurements have been made by the Greenhouse Gases Observing Satellite (GOSAT) and the TROPOspheric Monitoring Instrument (TROPOMI) (Lorente et al., 2021), covering the globe at fine spatiotemporal scales. GOSAT has provided extensive global observations of column CH4 concentrations since 2009 (Yoshida et al., 2013). Some inverse modeling studies utilize satellite observations for CH4 flux estimation (Zhang et al., 2021; Maasakkers et al., 2016), but this requires enormous computational resources because more flux regions and more observations must be handled.
Grid-based CH4 flux optimization has also been performed using the adjoint technique (4-D Var data assimilation) and an ensemble Kalman filter (EnKF) but was limited to small sets of observations (Houweling et al., 1999; Meirink et al., 2008; Bruhwiler et al., 2014). Bruhwiler et al. (2014) followed the EnKF method of Peters et al. (2005) to estimate CH4 surface fluxes within an offline atmospheric chemistry tracer model (ACTM) framework. Techniques such as 4-D Var and EnKF are important for estimating CH4 fluxes since they can assimilate a large number of observations and handle high-resolution fluxes. In the EnKF system, a flow-dependent forecast error covariance structure is provided by ensemble model forecasts, and no adjoint model is needed, which makes it a simple but powerful tool for flux estimation. One limitation of the EnKF method is that the resolution of the state vector depends on the ensemble size, which can give spurious results if the number of ensemble members is much smaller than the rank of the error covariance matrix (Houtekamer and Zhang, 2016).
A local ensemble transform Kalman filter (LETKF) is a type of square-root EnKF that performs the analysis locally in space without perturbing the observations (Ott et al., 2002, 2004; Hunt et al., 2007). LETKFs are computationally efficient since the observations are assimilated simultaneously rather than serially, and it is simple to account for observation error correlation. Miyazaki et al. (2011) and Kang et al. (2012) demonstrated the implementation of an LETKF data assimilation system coupled with an ACTM for carbon-cycle research using atmospheric CO2 observations. The LETKF has also been extensively applied to the emission estimation of short-lived species using satellite data (Skachko et al., 2016; Miyazaki et al., 2019; Sekiya et al., 2021). In this work, we estimate CH4 fluxes using an LETKF data assimilation system. Assimilation windows ranging from 6 h (Kang et al., 2012) to several months (Bruhwiler et al., 2014) have been used, depending on the desired time resolution of the estimated emissions, which is often limited by the observational data density. The time frame over which the system behaves linearly, the time frame over which the observations respond to the control variables (which depends on atmospheric transport), and the observation abundance must also be taken into consideration. Within an assimilation window, where and when the fluxes are constrained by specific observations is determined by the correlation between the ensemble of prior fluxes and the ensemble of CH4 concentrations simulated by a forward model (Liu et al., 2016).
The main objective of this work is to develop an advanced 4-D data assimilation system based on an LETKF that simultaneously estimates atmospheric distributions and surface fluxes of CH4. Observation system simulation experiments (OSSEs) are conducted to assess the performance of the LETKF since it is important to test the system against known emissions (the truth). The OSSE LETKF setup for top-down CH4 flux estimation using an online ACTM is an essential step before the system is applied to real in situ and satellite observations.

Formulation of the LETKF system
We briefly describe the LETKF as applied to CH4 flux estimation; detailed derivations of the equations and the code implementation are given elsewhere (Hunt et al., 2007; Miyazaki et al., 2011; Miyoshi et al., 2010). The notation used here for the LETKF formulation is adopted from Kotsuki et al. (2017). In the LETKF, the background ensemble (the columns of matrix $x^b$) in a local region is evolved from a set of perturbed initial conditions. The background ensemble mean, $\bar{x}^b$, and its perturbation, $X^b$, are estimated from the ensemble forecast as

$$\bar{x}^b = \frac{1}{m}\sum_{i=1}^{m} x^{b(i)}, \qquad X^{b(i)} = x^{b(i)} - \bar{x}^b, \tag{1}$$

where $m$ indicates the ensemble size and $x^{b(i)}$ is the $i$th ensemble member. The background error covariance matrix $P^b$ spanned by the $m$-member ensemble is defined as

$$P^b = \frac{1}{m-1} X^b \left(X^b\right)^{\mathrm{T}}. \tag{2}$$

The analysis ensemble mean $\bar{x}^a$ is derived using the background ensemble mean $\bar{x}^b$ and the ensemble perturbations $X^b$ as

$$\bar{x}^a = \bar{x}^b + X^b w^a, \qquad w^a = \tilde{P}^a Y^{\mathrm{T}} R^{-1}\left(y^o - H\bar{x}^b\right), \tag{3}$$

where $H$, $Y$, $R$, and $\tilde{P}^a$ denote the linear observation operator, the ensemble perturbation matrix in observation space ($Y \equiv H X^b$), the observation error covariance matrix, and the analysis error covariance matrix in ensemble space, respectively.
The superscripts "o", "b", and "a" denote the observations, background (prior), and analysis (posterior), respectively. $w^a$ is the analysis weight vector in ensemble space (it defines the analysis increment) and is derived from the observational increment $y^o - H\bar{x}^b$. The analysis error covariance matrix $\tilde{P}^a$ in the $m$-dimensional ensemble space spanned by the ensemble perturbations (Hunt et al., 2007) is defined as

$$\tilde{P}^a = \left[(m-1)I + Y^{\mathrm{T}} R^{-1} Y\right]^{-1}. \tag{4}$$

Finally, the analysis ensemble perturbations $X^a$ at the central grid point are derived as

$$X^a = X^b\left[(m-1)\tilde{P}^a\right]^{1/2}, \tag{5}$$

where $\left[(m-1)\tilde{P}^a\right]^{1/2}$ is a multiple of the symmetric square root of the local analysis error covariance matrix in ensemble space and can be computed by a singular value decomposition. The LETKF solves the analysis update (Eqs. 3 and 5) at every model grid point independently by assimilating local observations within the localization cutoff radius.
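The analysis update in Eqs. (3) and (5) can be illustrated with a minimal NumPy sketch for a single local region. This is only a sketch under stated assumptions, not the implementation used in this study: the function name `letkf_update` and its argument layout are hypothetical, no inflation or localization is applied, and the symmetric square root is taken from an eigendecomposition of the (symmetric) ensemble-space matrix.

```python
import numpy as np

def letkf_update(xb_ens, H, R, yo):
    """Hypothetical single-region ETKF/LETKF update without inflation.

    xb_ens: (n, m) background ensemble, H: (p, n) linear observation operator,
    R: (p, p) observation error covariance, yo: (p,) observations.
    """
    n, m = xb_ens.shape
    xb_mean = xb_ens.mean(axis=1)                 # ensemble mean
    Xb = xb_ens - xb_mean[:, None]                # ensemble perturbations
    Y = H @ Xb                                    # perturbations in observation space
    Rinv_Y = np.linalg.solve(R, Y)                # R^{-1} Y
    # Inverse of the analysis covariance in ensemble space: (m-1) I + Y^T R^{-1} Y
    A = (m - 1) * np.eye(m) + Y.T @ Rinv_Y
    evals, evecs = np.linalg.eigh(A)              # A is symmetric positive definite
    Pa_tilde = (evecs / evals) @ evecs.T          # A^{-1}
    # Analysis weights and analysis mean
    wa = Pa_tilde @ (Rinv_Y.T @ (yo - H @ xb_mean))
    xa_mean = xb_mean + Xb @ wa
    # Symmetric square root [(m-1) Pa_tilde]^{1/2} and analysis perturbations
    W = (evecs * np.sqrt((m - 1) / evals)) @ evecs.T
    Xa = Xb @ W
    return xa_mean, Xa
```

For a linear observation operator, the resulting mean and sample covariance should coincide with the classical Kalman filter update built from the sample background covariance, which is a convenient consistency check for such a sketch.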
We applied a gross error check as quality control to exclude observations that are far from the first guess; the appropriate threshold for the gross error check was also examined. Figure 1 shows a schematic diagram of our LETKF setup with two ensemble members for three consecutive assimilation cycles with an 8 d assimilation window. The analysis is obtained at the midpoint of the assimilation window (Fig. 1). The analyzed (updated) surface flux is used for the next data assimilation cycle, which starts from the midpoint of the previous data assimilation window. The state vector augmentation approach is used to estimate the CH4 surface flux (Kang et al., 2012; Miyazaki et al., 2011).
The assimilation window size and the number of ensemble members are chosen based on computational efficiency and estimation accuracy. A larger assimilation window means that fluxes are constrained by more observations; however, it requires the handling of large matrices in the optimization, which is difficult in cases of dense observations, and it introduces sampling errors related to transport errors. In this study, a few sensitivity experiments were performed to justify the choice of assimilation window length and ensemble size when GOSAT synthetic observations are assimilated (Sect. 4.2).

Covariance inflation
The LETKF data assimilation needs variance inflation to mitigate the underdispersed ensemble. We tested three methods: fixed multiplicative (FM), relaxation-to-prior spread (RTPS), and adaptive multiplicative covariance inflation.
The fixed multiplicative (FM) inflation method (Anderson and Anderson, 1999) inflates the prior ensemble by inflating the background error covariance matrix $P^b$ defined in Eq. (2) as

$$P^b_{\mathrm{inf}} = \gamma P^b_{\mathrm{tmp}}, \tag{6}$$

where $P^b_{\mathrm{tmp}}$ represents the temporary (uninflated) background error covariance matrix, which is inflated by a factor $\gamma$ to give the inflated matrix $P^b_{\mathrm{inf}}$.
The other inflation methods used to prevent the reduction of ensemble spread are relaxation-to-prior perturbation (RTPP) (Zhang et al., 2004) and relaxation-to-prior spread (RTPS) (Whitaker and Hamill, 2012). The RTPP method relaxes the reduction of the ensemble spread after the ensemble perturbations are updated by blending the background and analysis ensemble perturbations as

$$X^a_{\mathrm{RLX}} = \alpha_{\mathrm{RTPP}} X^b + \left(1 - \alpha_{\mathrm{RTPP}}\right) X^a, \tag{7}$$

where $\alpha_{\mathrm{RTPP}}$ denotes the relaxation parameter of the RTPP. The RTPS inflation method relaxes the reduction of the ensemble spread by relaxing the analysis spread to the prior spread as

$$X^a_{\mathrm{RLX}} = \left(\frac{\alpha_{\mathrm{RTPS}}\,\sigma^b + \left(1 - \alpha_{\mathrm{RTPS}}\right)\sigma^a}{\sigma^a}\right) X^a, \tag{8}$$

where $\sigma$ and $\alpha_{\mathrm{RTPS}}$ denote the ensemble spread (of the background, $\sigma^b$, and the analysis, $\sigma^a$) and the relaxation parameter of the RTPS, respectively. The $\alpha_{\mathrm{RTPS}}$ parameter is bounded by [0, 1]. This study focuses mainly on the FM and RTPS covariance inflation methods. In addition, Miyoshi (2011) applied adaptive inflation by determining the multiplicative inflation factors at every grid point at every analysis step using the observation-space statistics derived by Daley (1992) and Desroziers et al. (2005):

$$\left\langle d\,d^{\mathrm{T}} \right\rangle = H P^b H^{\mathrm{T}} + R,$$

where the operator "$\langle\cdot\rangle$" denotes the statistical expectation, $d = y^o - H\bar{x}^b$ (observation minus first guess), and $R$ is the observation error covariance matrix.
The impact of using the adaptive multiplicative inflation method is discussed in the GOSAT synthetic observation assimilation experiments in Sect. 4.2.
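The three fixed-parameter inflation and relaxation schemes described in this section can be sketched in a few lines of NumPy. The function names below are hypothetical, the spread is taken as the per-variable ensemble standard deviation (an assumption about how $\sigma$ is computed), and FM inflation is written as a scaling of the perturbations by $\sqrt{\gamma}$, which is equivalent to multiplying the sample covariance by $\gamma$.

```python
import numpy as np

def inflate_fm(Xb, gamma):
    """Fixed multiplicative inflation: covariance scaled by gamma."""
    return np.sqrt(gamma) * Xb

def relax_rtpp(Xb, Xa, alpha):
    """Relaxation to prior perturbation: blend background and analysis perturbations."""
    return alpha * Xb + (1.0 - alpha) * Xa

def relax_rtps(Xb, Xa, alpha):
    """Relaxation to prior spread: rescale analysis perturbations per state variable.

    Xb, Xa: (n, m) perturbation matrices. Assumes a nonzero analysis spread.
    """
    sigma_b = Xb.std(axis=1, ddof=1)
    sigma_a = Xa.std(axis=1, ddof=1)
    factor = (alpha * sigma_b + (1.0 - alpha) * sigma_a) / sigma_a
    return factor[:, None] * Xa
```

With `alpha = 0` both relaxation schemes leave the analysis perturbations unchanged, and with `alpha = 1` RTPS restores the prior spread while keeping the analysis perturbation pattern, which is the main design difference from RTPP.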

MIROC4-ACTM
The Model for Interdisciplinary Research on Climate, version 4.0 (MIROC4)-based ACTM (hereafter referred to as MIROC4-ACTM) (Patra et al., 2018; Bisht et al., 2021) is used here for CH4 concentration simulations. The model simulations have been performed at a horizontal grid resolution of approximately 2.8° × 2.8° latitude-longitude (T42 spectral truncation) and with a hybrid vertical coordinate of 67 levels (Earth's surface to 0.0128 hPa; Watanabe et al., 2008). Bisht et al. (2021) performed a multi-tracer analysis and demonstrated the importance of the very well resolved stratosphere in MIROC4-ACTM, which better represents extratropical stratospheric variability and the simulated tropospheric dynamical fields. The meteorological fields in MIROC4-ACTM are nudged to the JMA reanalysis (JRA-55) data (Kobayashi et al., 2015).

Figure 1. Schematic of the LETKF setup. In the first assimilation window (Cycle 1), the dotted lines show the ensemble forecast of CH4 concentrations (with two ensemble members), the solid line shows the linear combination of the forecasts, and the filled circles show the observations of CH4 concentration. The data assimilation finds the linear combination of the ensemble forecasts by estimating the weight ($w^a$) that best fits the observations throughout the assimilation window. The analysis weight is applied to obtain the optimal surface fluxes ($F$) and the concentration of CH4 at the intermediate time of the data assimilation window. The updated analyzed concentration ensembles are used as initial conditions after relaxation ($X^a_{\mathrm{RLX}}$) (Eq. 8) for the next ensemble forecast. The spread of the ensemble members represents the forecast error. The schematic is adapted from Kalnay and Yang (2010) and Miyazaki et al. (2011).

3 Experimental setup

Construction of known surface emissions (truth)
The present OSSEs are intended to develop basic tuning strategies before actual data are assimilated, which helps accelerate the operational use of real observations. The OSSEs discussed here exploit the known "truth". The synthetic observations to be assimilated in the OSSEs are generated from nature runs, which use bottom-up surface emission (true) data to simulate global 3-D CH4 concentrations. The true surface CH4 emissions are prepared on a monthly scale from anthropogenic and natural sectors, minus the surface sinks due to bacterial consumption in the soil (Chandra et al., 2021). The anthropogenic emissions were obtained from the Emission Database for Global Atmospheric Research, version 4.3.2 inventory (EDGARv4.3.2) (Janssens-Maenhout et al., 2019), which includes the emissions from the major sectors, such as fugitive sources, enteric fermentation and manure management, and solid waste and wastewater handling. The biomass burning emissions are taken from the Global Fire Emissions Database (GFEDv4s) (van der Werf et al., 2017) and Goddard Institute for Space Studies emissions (Fung et al., 1991). The wetland and rice emissions are taken from the process-based model of the terrestrial biogeochemical cycle, Vegetation Integrated Simulator of Trace gases (VISIT) (Ito, 2019), which is based on Cao et al. (1996). Other natural emissions, such as those from the ocean, termites, and mud volcanoes, are taken from the TransCom-CH4 inter-comparison experiment. The total emissions are taken as the truth for the OSSEs, and the concentrations simulated by MIROC4-ACTM with these emissions are referred to as synthetic observations.

Prior flux preparation and LETKF setting
Based on our understanding of CH4 inverse modeling, the uncertainty in regional flux estimation is found to be 30 % or lower (Chandra et al., 2021). Therefore, we attempted to reproduce the true flux by starting with a prior flux that is lower than the true flux by 30 % (the prior flux has the same seasonal cycle as the true flux). MIROC4-ACTM is initialized with a spin-up of 3 years (2007-2009) using the prior flux distribution. The initial CH4 distribution on 1 January 2007 was taken from an earlier simulation of 27 years. A perturbation with a standard deviation of approximately 6 %-8 % is applied to the a priori flux as the initial ensemble spread, whereas no ensemble perturbation is applied to the initial CH4 concentration. The sensitivity of the CH4 flux estimation to the initial ensemble spread is discussed in Sect. 4.2. The perturbations applied to the prior fluxes are generated from random positive values with a normal distribution. The monthly prior emissions are linearly interpolated to 6-hourly intervals for use in the MIROC4-ACTM simulations for data assimilation. This study performs two LETKF data assimilation experiments. In these experiments, we provided the initial perturbation on a regional basis over land (53 different land regions; Chandra et al., 2021) and at every grid point over the ocean; no spatial error correlation between grid points is considered among ensemble members. However, in Sect. 4.2.5, we also discuss the sensitivity of the CH4 data assimilation to providing the initial ensemble spread at every grid point while considering the horizontal spatial error correlation between grid points among ensemble members, with a global mean correlation of 20 %.
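The construction of the initial flux ensemble described above can be sketched as follows. This is a hypothetical illustration only: the function name, the use of multiplicative scaling factors drawn from a normal distribution centered on 1, the 7 % relative standard deviation (chosen inside the stated 6 %-8 % range), and the clipping used to keep factors positive are all assumptions, and the exact sampling procedure used in this study may differ.

```python
import numpy as np

def make_prior_ensemble(true_flux, region_id, m=100, bias=0.7, rel_sd=0.07, seed=0):
    """Hypothetical prior flux ensemble with one scaling factor per region and member.

    true_flux: (n,) true flux per grid cell, region_id: (n,) integer region labels.
    Returns an (n, m) array of perturbed prior fluxes.
    """
    rng = np.random.default_rng(seed)
    prior = bias * np.asarray(true_flux, dtype=float)    # prior is 30 % below the truth
    regions = np.unique(region_id)                       # sorted unique region labels
    factors = rng.normal(1.0, rel_sd, size=(regions.size, m))
    factors = np.clip(factors, 1e-6, None)               # keep scaling factors positive
    idx = np.searchsorted(regions, region_id)            # region index of each grid cell
    return prior[:, None] * factors[idx, :]
```

Because every grid cell in a region shares the same factor for a given member, the perturbations are fully correlated within a region and uncorrelated between regions, which mirrors the regional perturbation over land; giving each ocean grid cell its own label would mimic the per-grid-point treatment over the ocean.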

Experiment 1: synthetic dense observation formulation
The OSSE setting with very accurate and dense surface observations is an attempt to demonstrate that the data assimilation system works reasonably well in estimating the true surface flux. Errors in the estimated flux could arise from an insufficient ensemble size and from the inflation methods implemented to overcome the undersampling, along with the simplified forecast process of the emissions. In real data assimilation, there are additional sources of potential errors, such as atmospheric transport and inappropriate prior or observation uncertainties. In our OSSEs, the CH4 fluxes mentioned in Sect. 3.2 are used as "true" fluxes in generating the synthetic observations (CH4 concentrations). In Experiment 1, the simulated surface layer CH4 concentrations at each grid point over the entire globe were used as synthetic observations. We added a constant measurement uncertainty of 5 ppb, which is typically achieved by present-day measurement systems (e.g., Lan et al., 2022). In this study, the CH4 observations are assimilated by applying observation error covariance localization (R localization; Kotsuki et al., 2020) to reduce the spurious spatial correlations that arise because the ensemble size is smaller than the number of degrees of freedom of the system:

$$R_{\mathrm{loc}} = R \cdot \exp\left[\frac{1}{2}\left\{\left(\frac{d_h}{\sigma_h}\right)^2 + \left(\frac{d_v}{\sigma_v}\right)^2\right\}\right],$$

where $d_h$ and $d_v$ denote the horizontal distance (km) and vertical difference (log[Pa]) between the analysis model grid point and the observation location. The tunable parameters $\sigma_h$ and $\sigma_v$ are the horizontal localization scale (km) and vertical localization scale (log[Pa]), respectively. Using this spatial localization technique, we have estimated the CH4 flux for each grid point by choosing the CH4 observations that influence the grid point using an optimal cutoff radius ($3.65\,\sigma_{h,v}$; Miyoshi et al., 2007), with a horizontal covariance localization ($\sigma_h$) of 2200 km and a vertical covariance localization ($\sigma_v$) of 0.3 in the natural logarithmic pressure (log[Pa]) coordinate.
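A minimal sketch of the observation selection and error inflation implied by this localization is given below. It assumes the commonly used Gaussian form of R localization, in which the observation error variance is multiplied by the inverse of a Gaussian function of distance, together with the 3.65σ cutoff and the Experiment 1 scales quoted above; the function name and the convention of returning an infinite variance for excluded observations are illustrative choices, not taken from the source.

```python
import numpy as np

def localized_obs_error_var(r_var, d_h, d_v, sigma_h=2200.0, sigma_v=0.3, cutoff=3.65):
    """Inflate observation error variances with distance (hypothetical helper).

    r_var: observation error variance (e.g. 5 ppb squared),
    d_h: horizontal distances (km), d_v: vertical differences (log[Pa]).
    Returns (localized variance, boolean mask of observations inside the cutoff).
    """
    d_h = np.abs(np.asarray(d_h, dtype=float))
    d_v = np.abs(np.asarray(d_v, dtype=float))
    # Observations beyond the cutoff radius in either direction are not assimilated
    used = (d_h <= cutoff * sigma_h) & (d_v <= cutoff * sigma_v)
    # Inverse Gaussian weight: distant observations get a larger error variance
    scale = np.exp(0.5 * ((d_h / sigma_h) ** 2 + (d_v / sigma_v) ** 2))
    return np.where(used, r_var * scale, np.inf), used
```

For example, with `r_var = 25.0` (5 ppb squared), an observation at the analysis grid point keeps its variance, one at a horizontal distance of 2200 km has its variance inflated by a factor of exp(0.5), and one at 9000 km is excluded because it lies beyond 3.65 × 2200 km.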
The localization is performed to improve the signal-to-noise ratio of the ensemble-based covariance. Numerous sensitivity experiments have been performed by varying the horizontal and vertical localization lengths in order to obtain the optimized CH4 flux that compares best with the truth. The LETKF assimilates the observations within the specified radius to solve for the analysis state at each grid point independently (Liu et al., 2016; Kotsuki et al., 2020). The state vector of the analysis includes the atmospheric CH4 concentration, which is a prognostic variable of the forecast model, and the state vector is further augmented by the surface CH4 flux, which is not a model prognostic variable. This augmentation enables the LETKF to estimate the flux directly through its background error covariance with the observed variables (Baek et al., 2006). The state vector augmentation is implemented similarly to Miyazaki et al. (2011), so that the CH4 flux is updated during the analysis step. The purpose of the simultaneous optimization of CH4 emissions and concentrations is to reduce the influence of uncertainty in the initial CH4 concentrations on the CH4 evolution during the assimilation window and to maximize the potential of the observations (Tian et al., 2014).
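The way state vector augmentation lets an unobserved flux be updated through its ensemble covariance with an observed concentration can be shown with a toy scalar example. Everything in this sketch is hypothetical: the function name, the single concentration observation, and the linear "transport" relation used in the usage note below stand in for the full forward model, and the mean update is written in plain Kalman-gain form rather than in the LETKF transform form.

```python
import numpy as np

def augmented_mean_update(conc_ens, flux_ens, yo, r_var):
    """Update the mean of an augmented state [concentration, flux] from one
    concentration observation (hypothetical toy example)."""
    Z = np.vstack([conc_ens, flux_ens])            # augmented ensemble, shape (2, m)
    m = Z.shape[1]
    z_mean = Z.mean(axis=1)
    Zp = Z - z_mean[:, None]
    # Covariance of each augmented variable with the observed concentration
    cov_with_conc = Zp @ Zp[0] / (m - 1)
    gain = cov_with_conc / (cov_with_conc[0] + r_var)
    za_mean = z_mean + gain * (yo - z_mean[0])
    return za_mean[0], za_mean[1]                  # analysis concentration, analysis flux
```

As a usage example, if the prior flux ensemble is centered 30 % below a true flux of 10 and the concentration responds linearly to the flux (`conc = 2.0 * flux`), an accurate observation of the true concentration pulls the analysis flux close to 10 even though the flux itself is never observed.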
The atmospheric CH4 concentration is changed during both the analysis and forecast steps. A challenge of this scheme is that the analysis increment is added to the model state at each analysis step without conserving the global total CH4 mass in the model, although the result is consistent with the observed local CH4 abundance.
In this case, the surface flux at every model grid point is analyzed with an 8 d assimilation window during the year 2010 with 100 ensemble members. The ensemble size and assimilation window are chosen based on the CH4 flux estimation accuracy obtained from sensitivity experiments for the ensemble size (60, 80, and 100) and the assimilation window (3 and 8 d), respectively (not shown).

Experiment 2: synthetic satellite observation formulation
One way to address the real-world CH4 flux estimation problem is to first make the OSSE dataset resemble real observations. In this OSSE, we have assimilated synthetic column-averaged CH4 concentrations with a coverage mimicking GOSAT satellite observations. We prepared a model-simulated column-averaged CH4 concentration (XCH4) dataset that is spatiotemporally sampled at the GOSAT observations as follows:

$$\mathrm{XCH_4} = \mathrm{XCH_{4(a\,priori)}} + \sum_j h_j a_j \left(\mathrm{CH_{4(ACTM)}} - \mathrm{CH_{4(a\,priori)}}\right)_j, \tag{10}$$

where XCH4 is the column-averaged model-simulated CH4 concentration, $\mathrm{XCH_{4(a\,priori)}}$ is the a priori column-averaged concentration, and $\mathrm{CH_{4(ACTM)}}$ and $\mathrm{CH_{4(a\,priori)}}$ are the CH4 profiles from the ACTM and the a priori, respectively. $h_j$ is the pressure weighting function ($j$ is the vertical layer index), and $a_j$ represents the averaging kernel for the column retrieval, which is the sensitivity of the retrieved total column to the various ($j$) atmospheric levels. In the next step, we added the same retrieval (XCH4) error as that of GOSAT to the ACTM-simulated XCH4 to make the OSSE more realistic and then attempted to estimate the true fluxes. In this case, the CH4 flux has been estimated for each grid point by choosing the CH4 observations within a cutoff radius ($3.65\,\sigma_{h,v}$), with a horizontal covariance localization ($\sigma_h$) of 5000 km and a vertical covariance localization ($\sigma_v$) of 0.35 in the natural logarithmic pressure (log[Pa]) coordinate. The optimal horizontal and vertical covariance localization values are chosen by trial and error (those giving the best CH4 flux estimate when compared with the truth). A long cutoff radius has been chosen because of the sparse observational coverage of GOSAT. Covariance localization is necessary to remove erroneous long-range correlations and to mitigate sampling errors in the ensemble-based error covariance with a limited ensemble size (Miyoshi et al., 2007; Greybush et al., 2011; Kotsuki et al., 2020).
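Equation (10) amounts to a weighted sum over vertical layers and can be sketched directly in NumPy. The function name and argument names are hypothetical; when the a priori column average is not supplied, the sketch assumes it equals the pressure-weighted mean of the a priori profile, which is a common convention but is not stated in the source.

```python
import numpy as np

def simulate_xch4(ch4_model, ch4_prior, h, a, xch4_prior=None):
    """Column-averaged CH4 from a model profile, following Eq. (10).

    ch4_model, ch4_prior: (L,) profiles from the model and the retrieval a priori (ppb),
    h: (L,) pressure weighting function, a: (L,) column averaging kernel.
    """
    ch4_model = np.asarray(ch4_model, dtype=float)
    ch4_prior = np.asarray(ch4_prior, dtype=float)
    h = np.asarray(h, dtype=float)
    a = np.asarray(a, dtype=float)
    if xch4_prior is None:
        # Assumption: a priori column average is the pressure-weighted a priori profile
        xch4_prior = np.sum(h * ch4_prior)
    return xch4_prior + np.sum(h * a * (ch4_model - ch4_prior))
```

With an averaging kernel of 1 at every level the result reduces to the pressure-weighted model column, and with a kernel of 0 it falls back to the a priori column, which illustrates how the kernel controls the weight given to the model profile at each level.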
The surface flux is analyzed at every model grid point with an 8 d assimilation window and 100 ensemble members; they are chosen based on the sensitivity experiments discussed in Sect. 4.2.
4 Results and discussion

Experiment with dense OSSEs
The normalized root mean square error (RMSE) is defined as

$$\mathrm{RMSE}_{\mathrm{norm}} = \frac{1}{\bar{x}^t}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x^a_i - x^t_i\right)^2},$$

where $x^a_i$ and $x^t_i$ are the analysis and true state at the $i$th model grid point, $n$ is the total number of grid points, and $\bar{x}^t$ represents the mean of the true flux. The time series of the normalized RMSE in the analyses over the global land region is shown in Fig. 2. The normalized global RMSE is calculated for the FM and RTPS inflation methods (Fig. 2) after assimilating synthetic observations at every grid point (Sect. 3.4). Notably, the experiment with the FM inflation method shows a 10 %-15 % larger error in estimating the surface CH4 flux than the experiment with the RTPS inflation method. One reason for the better RMSE with the RTPS inflation method is the higher number of degrees of freedom provided by the relaxation ($\alpha_{\mathrm{RTPS}}$) of the ensemble spread (Eq. 8), which could nudge the ensemble of CH4 concentrations towards the observations. The initial flux analysis spread using RTPS and FM is shown in the Supplement (Fig. S1); the initial analysis flux spread is larger over Brazil, tropical America, and Asia with RTPS inflation than with FM inflation. We performed numerous sensitivity tests with the RTPS inflation method and found that uniform relaxation is not adequate for some regions. Figure 2 shows the RMSE for FM, fixed RTPS ($\alpha_{\mathrm{RTPS}} = 0.4$, applied globally; the optimized value is obtained by manual fine-tuning), and conditional RTPS ($\alpha_{\mathrm{RTPS}} = 0.3$-0.7; different $\alpha_{\mathrm{RTPS}}$ values are applied regionally by manual fine-tuning). In the case of conditional RTPS, the optimal values of $\alpha_{\mathrm{RTPS}}$, i.e., 0.6, 0.3, and 0.7 for the regions south of 20° S, 20° S-20° N, and north of 20° N, respectively, were obtained from data assimilation sensitivity calculations in which $\alpha_{\mathrm{RTPS}}$ was varied for the three regions separately to best match the true states. We find that the conditional RTPS method improves the accuracy by ∼ 5 % compared to fixed RTPS and by 10 %-15 % compared to FM.
In the following, we discuss the results obtained using the conditional RTPS and FM inflation methods.
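The evaluation metrics used in this section can be written compactly as below. The normalized RMSE follows the definition given in the text (RMSE divided by the mean true flux); the normalized mean bias is written under the assumption that it is the mean difference between analysis and truth divided by the mean true flux, since its exact definition is not spelled out here. The function names are hypothetical.

```python
import numpy as np

def normalized_rmse(x_analysis, x_true):
    """RMSE of the analysis flux, normalized by the mean of the true flux."""
    x_analysis = np.asarray(x_analysis, dtype=float)
    x_true = np.asarray(x_true, dtype=float)
    return np.sqrt(np.mean((x_analysis - x_true) ** 2)) / np.mean(x_true)

def normalized_mean_bias(x_analysis, x_true):
    """Mean bias of the analysis flux, normalized by the mean of the true flux
    (assumed definition)."""
    x_analysis = np.asarray(x_analysis, dtype=float)
    x_true = np.asarray(x_true, dtype=float)
    return np.mean(x_analysis - x_true) / np.mean(x_true)
```

As a sanity check, a flux field that is uniformly 30 % below the truth, like the prior used in these OSSEs, gives a normalized mean bias of −0.3 under this definition.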
We also show the RMSE (not normalized) of the surface flux in the Supplement (Fig. S2). The flux RMSE has been estimated globally for both inflation methods and also for the region south of 20° N (by considering only those land grid points that fall south of 20° N; Fig. S2) for comparison. North of 20° N, the flux estimation error is higher (Fig. S2 in the Supplement), specifically during spring-summer, when CH4 emissions peak over most of the Northern Hemisphere regions (Fig. 3). The high uncertainty in the flux estimation over these regions during spring-summer (Fig. S2) could arise from the attenuation of the surface observation signal as a result of active vertical mixing. The RMSE during autumn (Fig. S2) is comparable for the global region and the region south of 20° N, which indicates that the RMSE arises from Southern Hemisphere regions, likely Brazil, where emissions peak during autumn (Fig. 3). Figure 3 compares the seasonal cycles of the regional total estimated fluxes for 15 terrestrial regions with those of the prior and true fluxes. The flux estimated using the RTPS inflation method agrees well with the true flux over the different regions. We intend to show the capability of the LETKF-estimated fluxes over these regions, based on surface observations, to reproduce the true fluxes in support of our understanding of the terrestrial biosphere CH4 cycle. These results are consistent with Fig. 2, with an annual global normal- It can also be seen from Fig. 3 that the estimated fluxes converge to the true fluxes over most of the regions after about 2-3 months.
To assess the degree of similarity between the estimated and true flux distributions, we show the monthly mean spatial flux distributions for June and November in Figs. 4 and 5, respectively, along with the bias in the prior and estimated fluxes. As shown in Figs. 4 and 5, the general spatial patterns of the true flux are estimated well. These results suggest that our LETKF system is capable of reproducing continental-scale spatial flux patterns when such idealized dense surface observational data are used. However, some clear differences between the flux estimates with the FM and RTPS inflation methods can be seen (Figs. 4 and 5); e.g., over the Eurasian and American continents, the analysis with RTPS shows a clear improvement over that with the FM covariance inflation method. The global mean normalized bias over land regions with the RTPS and FM covariance inflation methods is found to be −0.04 and −0.11, respectively, which shows that RTPS significantly improves the flux estimation compared to the FM covariance inflation method.

Experiment by mimicking the real satellite observational dataset
In this section we discuss the LETKF flux estimation by assimilation of GOSAT synthetic CH4 concentration observations. Figure 6 shows the model-simulated mean XCH4 concentration sampled spatiotemporally at the GOSAT observations during January and July 2010 (the sampling method is discussed in Sect. 3.4). For this case we present several LETKF sensitivity experiments, namely the sensitivity to (1) FM, RTPS, and adaptive multiplicative inflation; (2) the assimilation window; (3) the ensemble size; (4) the χ² test; and (5) the prior ensemble spread. In sensitivity experiments 1-4, the initial ensemble spread is generated with a method similar to that of Experiment 1, and the conditional RTPS inflation method is used. The conditional RTPS method is also used in Sect. 4.2.6 for CH4 flux estimation.

LETKF sensitivity to FM, RTPS, and adaptive multiplicative inflation
This study mainly emphasizes FM and RTPS inflation methods used in CH 4 LETKF data assimilation. The annual average normalized RMSE (absolute bias) with RTPS and FM covariance inflation is found to be 0.59 (0.18) and 0.64 (0.22), respectively. The RTPS inflation method performs better than the FM inflation method overall. In addition to RTPS inflation, a sensitivity test is also performed using an adaptive multiplicative inflation method.
In the adaptive inflation, we need to provide an initial multiplicative inflation factor at the beginning of data assimilation cycle (Cycle 1 in Fig. 1). Following the method of Miyoshi (2011), the multiplication inflation factor information calculated in the previous cycle (i.e., Cycle 1 in Fig. 1) is used for the next data assimilation cycle at every grid point (Cycle 2 in Fig. 1). We perform two sensitivity experiments. In the first (second) case, we provided 50 % (40 %) initial inflation in the beginning of Cycle 1 (Fig. 1). The normalized RMSE in the both the adaptive inflation sensitivity experiments is comparable (0.65, Supplement Fig. S3a) till July, but from the beginning of August, the RMSE increases exponentially in the first experiment. However, in terms of the χ 2 distribution, CH 4 flux estimation with the first sensitivity adaptive multiplicative inflation experiment (50 % initial inflation case) is better than with the second sensitivity experiment (Supplement Fig. S3b; χ 2 test described in Sect. 4.2.4). To identify the regions of high estimated CH 4 flux error, we have shown the background error spread in CH 4 flux estimation over 15 regions (Supplement Fig. S3c) and found that the spread over west and southeast Asia rises exponentially post-July, which indicates the rise of estimated CH 4 flux error over these regions in the first sensitivity adaptive multiplicative inflation experiment. Our analysis suggests that CH 4 flux estimation depends on the initial inflation factor provided in the beginning of the data assimilation cycle (Cycle 1, Fig. 1) in the adaptive multiplication method. Also, we need to be very careful to monitor the background error spread evolution with time to estimate the CH 4 flux with adaptive inflation; the χ 2 distribution analysis is not sufficient.
In the case of RTPP inflation, we found that the parameter α RTPP is very difficult to fine-tune because the estimated CH 4 flux is highly sensitive to it. We were unable to obtain an optimized α RTPP value for estimating the CH 4 flux. Whitaker and Hamill (2012) also demonstrated better accuracy of LETKF meteorological data assimilation with RTPS than with the RTPP covariance inflation method. They found that the RTPP method produces very large errors if the inflation parameter exceeds the optimal value.
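For reference, the two relaxation schemes compared here can be sketched as follows, using the standard forms given by Whitaker and Hamill (2012). This is an illustrative sketch, not the implementation used in this study; the array layout (state variables by ensemble members) and the function names `rtps` and `rtpp` are assumptions.

```python
import numpy as np

def rtps(Xa_pert, Xb_pert, alpha):
    """Relaxation to prior spread: rescale each analysis perturbation so the
    analysis spread is relaxed toward the background spread.

    Xa_pert, Xb_pert : analysis / background perturbations, shape (n, m)
    alpha            : relaxation parameter in [0, 1]
    """
    sig_b = Xb_pert.std(axis=1, ddof=1)
    sig_a = Xa_pert.std(axis=1, ddof=1)
    sig_a_safe = np.maximum(sig_a, 1e-12)          # guard against zero spread
    factor = 1.0 + alpha * (sig_b - sig_a) / sig_a_safe
    return Xa_pert * factor[:, None]

def rtpp(Xa_pert, Xb_pert, alpha):
    """Relaxation to prior perturbations: blend analysis and background
    perturbations member by member."""
    return (1.0 - alpha) * Xa_pert + alpha * Xb_pert

# Toy usage with one state variable and three members.
Xb = np.array([[-2.0, 0.0, 2.0]])
Xa = np.array([[-1.0, 0.0, 1.0]])
Xa_rtps = rtps(Xa, Xb, alpha=0.9)
Xa_rtpp = rtpp(Xa, Xb, alpha=0.9)
```

RTPS changes only the amplitude of the analysis perturbations at each grid point, whereas RTPP also mixes in the structure of the background perturbations.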

Assimilation window
The LETKF data assimilation window length determines the time span of the observations assimilated in each assimilation cycle. The sensitivity to two assimilation window lengths, 3 and 8 d, is shown in Supplement Fig. S4. These sensitivity experiments show that the 8 d long assimilation window estimates the CH 4 flux with better accuracy (∼ 10 %) than the 3 d assimilation window because more observational information is incorporated into the system with the 8 d long window. This study uses an 8 d assimilation window for CH 4 LETKF data assimilation.

Figure 7a shows the RMSE for different ensemble sizes. The RMSE stabilizes gradually as the ensemble size increases from 60 to 80 to 100 members. This dependence of the flux estimation on ensemble size suggests that the flux estimation could be improved further by increasing the number of ensemble members. In this study we use 100 ensemble members because of the high computational cost of solving large covariance matrices. The larger error in flux estimation when assimilating column-averaged synthetic GOSAT CH 4 observations compared to dense observations (Fig. 2) is likely due to the weaker constraint on surface fluxes provided by satellite observations and to their sparse coverage.

The actual retrieval errors are added in the synthetic GOSAT observations. Data are shown for 2 representative months, depicting the Southern and Northern Hemisphere differences in data coverage.

χ 2 test
We have carried out a χ 2 test to evaluate the background error covariance matrix (Miyazaki et al., 2012). For the χ 2 test, the innovation statistics are diagnosed from the observation minus forecast y o − H x b , the estimated error covariance in the observation space (H P b H T + R), and the number of observations k as

$$ Y = \frac{1}{\sqrt{k}} \left( H P^{b} H^{T} + R \right)^{-1/2} \left( y^{o} - H x^{b} \right). $$

Using this statistic, the χ 2 is defined as follows:

$$ \chi^{2} = \mathrm{trace}\left( Y Y^{T} \right). $$

The performance of the background error covariance matrix is judged by how far χ 2 lies above or below 1. The χ 2 value should converge to 1; a value higher (lower) than 1 indicates underestimation (overestimation) of the background error covariance matrix. Our results suggest that the background error covariance matrix is highly underestimated with both the RTPS and FM covariance inflation methods (Fig. 7b). However, the convergence of χ 2 towards 1 is better with RTPS than with the FM covariance inflation method, which indicates an improved representation of background errors and hence more appropriate data assimilation corrections with the RTPS inflation method. The χ 2 distribution starts saturating after March. The analysis after March shows that the underestimation of the background error covariance matrix is much higher (>100 %) with FM than with the RTPS covariance inflation method.
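The χ 2 diagnostic can be illustrated with a short sketch. It assumes the normalized innovation Y = (H P b H T + R)^(−1/2) (y o − H x b) / √k, following Miyazaki et al. (2012), so that trace(Y Y T) equals d T (H P b H T + R)^(−1) d / k with d the innovation vector. The function name `chi_square` and the array layout are assumptions, and no localization is applied.

```python
import numpy as np

def chi_square(d, Yb, R):
    """Chi-square diagnostic of the background error covariance.

    d  : innovations y^o - H x^b, shape (k,)
    Yb : background ensemble perturbations in observation space, shape (k, m)
    R  : observation error covariance, shape (k, k)
    """
    k, m = Yb.shape
    S = Yb @ Yb.T / (m - 1) + R          # H P^b H^T + R estimated from the ensemble
    # trace(Y Y^T) with Y = S^{-1/2} d / sqrt(k) reduces to d^T S^{-1} d / k
    return float(d @ np.linalg.solve(S, d)) / k

# Toy usage: two observations, three members, uncorrelated unit background variance.
Yb = np.array([[1.0, -1.0, 0.0],
               [1.0, 1.0, -2.0] / np.sqrt(3.0)])
chi2 = chi_square(np.array([2.0, 2.0]), Yb, np.eye(2))
```

In this toy case the innovations are larger than the combined background and observation errors can explain, so the diagnostic exceeds 1, which corresponds to an underestimated background error covariance.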

CH 4 LETKF sensitivity to the initial ensemble spread
A test case for CH 4 LETKF data assimilation has been performed, where the initial spread is provided by considering the initial perturbation on each model grid with spatial error correlation between grid points among ensemble members, with a global mean correlation of 20 %. In this case, we found that the analysis fluxes are extremely sensitive to the initial ensemble spread if prior fluxes are perturbed with more than 5 % prior uncertainty. Therefore, we used initial ensemble perturbation with only 2 % prior uncertainty. Reducing the initial ensemble spread reduces the CH 4 flux estimation sensitivity (>60 %). However, it also poses a challenge to mitigate the underdispersed background error covariance matrix.
We performed LETKF data assimilations in this case with the RTPS covariance inflation method (the optimized value α RTPS = 0.9 is applied uniformly) with an 8 d long assimilation window and 100 ensemble members and calculated the normalized RMSE between the analysis and true fluxes (Supplement Fig. S5). It is noteworthy that the estimated error between the analysis and true fluxes (Fig. S5) with this setting (grid-wise initial ensemble spread) is still larger (25 %) than in the case when the region-wise initial ensemble spread is used (Fig. 7a; 100 ensemble members). This suggests that the initial ensemble spread needs to be chosen meticulously so that it best represents the CH 4 variability among ensemble members when estimating the CH 4 flux. Note that the OSSEs used in this study did not consider the effects of model errors other than CH 4 fluxes, such as model transport errors. In real situations, model errors can have a substantial impact on flux estimates (Locatelli et al., 2013), which needs to be taken into account in the background covariances. Therefore, the optimal data assimilation setting can differ between the OSSEs presented in this study and real observation cases. Further efforts, e.g., conducting a more comprehensive OSSE that accounts for various model errors and performing various sensitivity calculations in real cases, would provide an improved understanding of the optimal inflation settings to improve CH 4 flux estimates in a follow-up study.

Estimated CH 4 flux analysis
Figure 8 shows the regional flux seasonal cycle comparison for the estimated fluxes over 15 terrestrial regions with the cycles of the prior and true fluxes. We have also shown assimilation results in the case of the FM inflation method in the Supplement (Fig. S6), which shows the flux estimation disagreement over more regions compared to the RTPS inflation method, e.g., for tropical and North America, the whole African continent, and Australia-New Zealand.
The GOSAT observations are shown in Figs. 6 and S7. We found only a marginal improvement in flux estimation over Central Africa after May (Fig. 8), which could be associated with the lower GOSAT coverage over this region (Fig. 6). On the other hand, over northern Africa, no improvement in flux estimation is found. In the case of the dense OSSE too (Fig. 3), we did not find satisfactory flux estimation over northern Africa, which is most probably related to the insufficient initial spread among ensemble members over this region (we used the same initial ensemble spread in both OSSE cases). Over Europe, GOSAT observations are remarkably fewer, specifically for the first few months (January-April; Supplement Fig. S7). Therefore, the flux update over Europe would be influenced by the observations from neighboring regions falling within the chosen cutoff radius, which are mainly in northern Africa, where the flux estimation is itself not satisfactory. It can also be noticed that the retrieval error added in this OSSE case is high over Europe (September-October; Fig. S7) and its adjacent sea (Mediterranean Sea; June-August), which could also affect the surface CH 4 flux estimation. Figures 9 and 10 show spatial patterns of the true and estimated fluxes obtained by assimilating the column-averaged CH 4 concentrations during June and November (Fig. 6). The RTPS covariance inflation method reproduces the true flux pattern better than the FM covariance inflation method. The spatial pattern obtained with the RTPS inflation method highlights the positive and negative biases in the estimated flux (Figs. 9 and 10) but generally agrees with the flux seasonal cycle plots shown in Fig. 8.
Our LETKF CH 4 data assimilation experiment, which assimilates GOSAT synthetic observations with the advanced RTPS covariance inflation method, estimates the time-evolving surface CH 4 fluxes better than the experiment with the FM covariance inflation method. The difficulty in estimating the surface CH 4 flux over a few regions may be overcome by applying additional methodologies, such as the simultaneous assimilation of surface observations and the use of information about the CH 4 flux climatology. A correction factor derived from an empirical formulation, which could use CH 4 flux climatology information, needs to be applied to maintain CH 4 mass conservation. This could be implemented by checking the simulated CH 4 burden gain between years in comparison with the observed CH 4 growth rates.

Summary
In this study, we have introduced a 4D-LETKF data assimilation system that utilizes MIROC4-ACTM as a forward model for CH 4 flux estimation. This study has extensively tested both FM and RTPS inflation methods for the LETKF CH 4 flux estimation. We have conducted two experiments to demonstrate the ability of the LETKF system to estimate the CH 4 surface flux globally. In Experiment 1, we have assimilated the synthetic dense surface CH 4 observations, while in Experiment 2, synthetic GOSAT CH 4 observations are assimilated. Based on the results of the sensitivity tests using FM and RTPS inflation methods in Experiment 1, we have found that RTPS inflation produces a significantly lower normalized RMSE (10 %-15 %) than the FM inflation method. In Experiment 2, we discussed LETKF parameters, such as different inflation techniques, ensemble size, assimilation window, initial ensemble spread sensitivity, and the χ 2 test. The ensemble size sensitivity test (this study uses a maximum of 100 ensemble members) suggests that more ensemble members could help to accurately represent the covariance matrix with a higher number of degrees of freedom. The assimilation window sensitivity test shows that an 8 d assimilation window reduces the normalized flux RMSE by about 10 % compared to a 3 d assimilation window in the case of GOSAT synthetic observations assimilation.
Our approach of assimilation with RTPS inflation could provide a higher number of degrees of freedom to fit the ensemble of CH 4 concentrations to the observed ones, resulting in improved analyzed fluxes. The RTPS inflation method is capable of obtaining reasonable flux estimates with a normalized annual mean bias of 0.04 and 0.61 in the case of dense surface synthetic observations and GOSAT synthetic observations, respectively. We demonstrated in our sensitivity OSSE with synthetic GOSAT observations that, over the American and African continents and also over Australia-New Zealand, the LETKF data assimilation with the FM inflation method does not show much improvement in the true flux estimation, but the RTPS inflation method reasonably estimates the true flux over most of these regions. One of the reasons for the better flux estimates with the RTPS inflation method is that it strongly limits the loss of analysis spread. In the CH 4 LETKF flux estimation, the surface CH 4 flux is not a prognostic state vector in the ACTM, which results in the continuous decay of spread in the analysis steps. The RTPS inflation method could mitigate such an underdispersed spread problem. This study finds that spatially homogeneous relaxation is not sufficient. It needs to be fine-tuned and applied conditionally.

Figure 10. Same as Fig. 9 but for November.
The sensitivity of LETKF CH 4 flux estimation to the initial ensemble spread needs to be carefully dealt with when applied to a real data assimilation system. A future OSSE combining an additive covariance inflation technique with the RTPS inflation method could be interesting for CH 4 LETKF data assimilation since, in additive covariance inflation, the initial estimated flux error cannot propagate. The state vector augmentation technique used here updates the flux after each data assimilation cycle, but it does not conserve the total atmospheric CH 4 amount, which is one of the limitations of this work. A correction factor needs to be implemented to conserve the total atmospheric CH 4 amount after completion of a few data assimilation cycles. We have not accounted for the transport error due to meteorological fields in this work; in the case of real observation data assimilation, a week-long window may introduce transport errors in the CH 4 analysis because of the nonlinear growth of ensemble perturbations.
Code and data availability. The LETKF source codes can be accessed from https://doi.org/10.5281/zenodo.7127658 (Bisht et al., 2022a). All the scripts for running the LETKF data assimilation software and the input and output result data files are available at https://doi.org/10.5281/zenodo.7098323 (Bisht et al., 2022b). The CH 4 ACTM simulation module coupled with MIROC4-AGCM can be accessed from https://doi.org/10.5281/zenodo.7118365 (Bisht et al., 2022c). The source code of MIROC4-AGCM is archived at https://doi.org/10.5281/zenodo.7274240 (Patra et al., 2022) with restriction because of the copyright policy of the MIROC developer community. This work did not contribute to the MIROC4 source code development.
Author contributions. The LETKF data assimilation experiments were designed by JSHB. PKP, MT, and TS helped to set up the LETKF code on MIROC4-ACTM for CH 4 data assimilation. The manuscript was prepared by JSHB, and analysis interpretation input and feedback were provided by PKP, TS, and KM. All coauthors, KM, TS, PKP, NS, MT, and YK, contributed to the writing and revision of the paper.