Several metrics have been proposed and utilized to diagnose the performance of linear Bayesian and geostatistical atmospheric inverse problems. These metrics primarily assess the reductions in the prior uncertainties, compare modeled observations to true observations, and check distributional assumptions. Although important, these metrics should be augmented with a sensitivity analysis to obtain a comprehensive understanding of the atmospheric inversion performance and to improve the quality of, and confidence in, the inverse estimates. In this study, we derive closed-form expressions of local sensitivities for various input parameters, including measurements, covariance parameters, covariates, and a forward operator. To further enhance our understanding, we complement the local sensitivity analysis with a framework for a global sensitivity analysis that can apportion the uncertainty in input parameters to the uncertainty associated with inverse estimates. Additionally, we propose a mathematical framework to construct nonstationary correlation matrices from a precomputed forward operator, which is closely tied to the overall quality of inverse estimates. We demonstrate the application of our methodology in the context of an atmospheric inverse problem for estimating methane fluxes in Los Angeles, California.

© 2022 Jet Propulsion Laboratory, California Institute of Technology. Government sponsorship acknowledged.

In atmospheric applications, inverse models are frequently used to estimate global- to regional-scale fluxes of trace gases from atmospheric measurements

The sensitivity analysis in this work covers both local and global approaches. We focus primarily on local sensitivity analysis (LSA), which measures the effect of a given input on a given output and is obtained by computing the partial derivatives of an output quantity of interest with respect to an input factor (see

In atmospheric trace gas inversions, LSA is the form of sensitivity analysis most often performed. In this context, LSA assesses how sensitive the posterior flux estimates are to the underlying choices or assumptions, such as the (1) observations included, (2) model data error covariance, (3) input prior information and its error, and (4) forward operator (for a discussion, see

The LSA can be grouped with standard information content approaches such as an averaging kernel and degrees of freedom for signal (DOFS; for details, see Sect.

In this study, we focus on the quality of the inverse estimates of the fluxes; that is, we provide diagnostic metrics that improve our understanding of how input choices affect the inverse flux estimates and thus help improve the quality of the inverse model. Specifically, in this technical note, we provide (1) closed-form expressions for conducting LSA by computing partial derivatives, (2) a scientifically interpretable framework for ranking thousands of spatiotemporally correlated input parameters with the same or different units of measurement, (3) a mathematical schema for conducting a global sensitivity analysis (GSA), and (4) a technique to assess the spatiotemporal correlation between the forward operators of two or more observations, which is tied to the overall diagnostics of the estimated fluxes and can lead to improved representation of errors in the forward operator.

In generic form, a linear inverse problem can be written as follows:

In a typical linear atmospheric inverse problem (see Fig.

The schema for performing a linear atmospheric inversion to obtain the estimates of the emissions of greenhouse gases. The middle column (the box with a green background) lists all of the input parameters that are required to perform an inversion, whereas the right column (the box with an orange background) lists the modeling process (box 7) and the output obtained after performing an inversion (box 8). Note that this work focuses on understanding and ranking the impact of the input parameters (boxes 3, 4, and 6) on the estimates of fluxes (box 8) and developing correlation structures from the forward operator (box 5).

Within the previously mentioned setup, the choice of the input parameters, including the forms of error structures, profoundly impacts the quality of the inverse estimates of fluxes. Understanding the impact of these input parameters is critical for evaluating the quality of the estimated fluxes. Thus, we first (Sect.

In inversions that assimilate all observations simultaneously, a forward operator for each observation is obtained from a transport model. These observations can come from multiple platforms, including in situ networks at fixed surface locations, intermittent aircraft flights, and satellites. In most studies, the spatiotemporal coverage of these forward operators is assessed visually by plotting their aggregated sum or mean over the spatial domain; standard quantitative metrics for evaluating their coverage and intensity in space and time are lacking. In this study, we present two metrics for this assessment, which are defined below. These metrics satisfy the triangle inequality and are distances in their respective metric spaces.

Note that, in the published literature on trace gas inversions, the forward operator obtained from a transport model is sometimes referred to as a sensitivity matrix, Jacobian, or footprint. To avoid misinterpretation, we refer to all of these as the forward operator. We demonstrate our methods with forward operators constructed by running a Lagrangian transport model; however, the proposed methods can also be applied in the Eulerian framework (see

The integrated area overlap measurement index (IAOMI) summarizes the shared information content between two forward operators and hence indirectly between two observations. It is, therefore, a measure of the uniqueness of the flux signal associated with an observation compared to other observations.
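Because the defining equation for IAOMI is not reproduced here, the following is only a minimal sketch of the idea, assuming two nonnegative forward operators on the same grid. It uses the weighted Jaccard (Ruzicka) similarity, the summed cell-wise minimum divided by the summed cell-wise maximum, as a hypothetical stand-in for IAOMI. The names `overlap_index` and `dissimilarity` are illustrative. One minus this similarity is the Soergel distance, which satisfies the triangle inequality for nonnegative vectors.

```python
import numpy as np

def overlap_index(h1, h2):
    """Hypothetical overlap index between two nonnegative forward operators.

    Weighted Jaccard (Ruzicka) similarity: sum(min(h1, h2)) / sum(max(h1, h2)).
    An illustrative stand-in, not necessarily the exact IAOMI formula.
    """
    h1 = np.asarray(h1, dtype=float).ravel()
    h2 = np.asarray(h2, dtype=float).ravel()
    if np.any(h1 < 0) or np.any(h2 < 0):
        raise ValueError("forward operators are assumed to be nonnegative")
    denom = np.maximum(h1, h2).sum()
    if denom == 0:
        return 1.0  # two empty footprints are treated as identical
    # Shared signal (cell-wise minimum) relative to the combined extent (cell-wise maximum)
    return np.minimum(h1, h2).sum() / denom

def dissimilarity(h1, h2):
    """1 - overlap (Soergel distance); 0 for identical operators, 1 for disjoint ones."""
    return 1.0 - overlap_index(h1, h2)
```

For two footprints on a 2 x 2 grid, `[[0.2, 0.0], [0.5, 0.3]]` and `[[0.1, 0.4], [0.5, 0.0]]`, the shared signal is 0.6 and the combined extent is 1.4, so the overlap is about 0.43.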

A Venn diagram that defines IAOMI in terms of two hypothetical forward operators

Intuitively, IAOMI is easiest to understand spatially. For a given time point, consider two forward operators

Note that the IAOMI defined in Eq. (

A dissimilarity measure can be obtained from

The spatiotemporal area of dominance (STAD) stems naturally from IAOMI. For any two forward operators,

For two forward operators

One can use
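The following sketch shows one way to compute a STAD-style map under stated assumptions; it is not the exact definition given by the equations above. It assumes that each site's forward operators have already been summed over all of its observations and time steps, giving one nonnegative vector per site. Each grid cell is then assigned to the site whose aggregated operator is largest there. The functions `dominance_map` and `dominance_fractions` and the `threshold` parameter are hypothetical.

```python
import numpy as np

def dominance_map(footprints, threshold=0.0):
    """Hypothetical STAD-style map: label each grid cell with the index of the site
    whose time-aggregated forward operator is largest there.

    footprints: (n_sites, n_cells) nonnegative forward operators, assumed to be
                already summed over all observations and time steps of each site.
    threshold:  cells where no site exceeds this value are labeled -1 (not covered).
    Ties are resolved in favor of the lower site index (np.argmax behavior).
    """
    f = np.asarray(footprints, dtype=float)
    labels = np.argmax(f, axis=0)
    labels[f.max(axis=0) <= threshold] = -1
    return labels

def dominance_fractions(labels, n_sites):
    """Fraction of all grid cells dominated by each site."""
    labels = np.asarray(labels)
    counts = np.bincount(labels[labels >= 0], minlength=n_sites)
    return counts / labels.size
```

The shared signal between two sites, which this map does not show, could be tracked separately, for example with the cell-wise minimum of their operators.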

For linear Bayesian and geostatistical inverse problems, the solutions (see

The analytical solutions for the unknown fluxes

Note that

The local sensitivity of
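The sketch below is a minimal illustration of the local sensitivity of the posterior mean with respect to the observations. It assumes the standard batch Bayesian form ŝ = s_p + K(z − H s_p) with gain K = Q Hᵀ(H Q Hᵀ + R)⁻¹. Here z denotes the observations, H the forward operator, s_p the prior mean, Q the prior error covariance, and R the model-data mismatch covariance. These symbols are assumed notation for the sketch. Under this form, ŝ is linear in z, so ∂ŝ/∂z is exactly K. Column k of K gives the change in every flux per unit change in observation k.

```python
import numpy as np

def gain_matrix(H, Q, R):
    """K = Q H^T (H Q H^T + R)^{-1}; for the assumed batch Bayesian form, d s_hat / d z = K."""
    psi = H @ Q @ H.T + R
    # Solve psi X = H Q, so X = psi^{-1} H Q; since psi and Q are symmetric, K = X^T
    return np.linalg.solve(psi, H @ Q).T

def bayesian_posterior_mean(z, H, s_p, Q, R):
    """Posterior mean s_hat = s_p + K (z - H s_p) (assumed batch Bayesian form)."""
    return s_p + gain_matrix(H, Q, R) @ (z - H @ s_p)
```

Because the map from z to ŝ is linear, perturbing observation k by one unit changes ŝ by exactly `K[:, k]`. This gives a quick numerical check of the closed form.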

Note that by utilizing

We can directly compute the local sensitivity of

Sometimes, it is essential to know the influence of the prior at any particular grid point, or in an area consisting of a few grid cells, within
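This sketch covers the influence of the prior at a grid point or in a small region. It assumes the same batch Bayesian form as above, with ŝ = s_p + K(z − H s_p). Under that form, the local sensitivity to the prior mean is ∂ŝ/∂s_p = I − K H. The response of all posterior fluxes to a unit shift of the prior in a region is then the sum of the corresponding columns. The function names and the `region_cells` argument are hypothetical.

```python
import numpy as np

def prior_sensitivity(H, Q, R):
    """d s_hat / d s_p = I - K H for the assumed batch Bayesian posterior mean."""
    psi = H @ Q @ H.T + R
    K = np.linalg.solve(psi, H @ Q).T  # K = Q H^T psi^{-1}
    return np.eye(Q.shape[0]) - K @ H

def region_prior_influence(H, Q, R, region_cells):
    """Change in every posterior flux for a unit shift of the prior in the listed cells."""
    S = prior_sensitivity(H, Q, R)
    return S[:, region_cells].sum(axis=1)
```

Columns with entries close to the unit vector indicate cells where the observations barely update the prior, so the posterior there simply follows the prior.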

The local sensitivity of

There might be times when we would like to know the sensitivity of the transport (

In order to compute the local sensitivities of

All of the quantities

The first two quantities
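The next sketch covers the forward operator and the covariance parameters. Under the same assumed batch Bayesian form, it differentiates ŝ = s_p + Q Hᵀ ψ⁻¹(z − H s_p), with ψ = H Q Hᵀ + R. It uses the product rule and d(ψ⁻¹) = −ψ⁻¹ dψ ψ⁻¹, which gives the first-order change in ŝ for arbitrary perturbations dH, dQ, and dR. These are ordinary matrix-calculus identities. The code is a sketch rather than the exact expressions of this study, and the function names are hypothetical.

```python
import numpy as np

def posterior_mean(z, H, s_p, Q, R):
    """s_hat = s_p + Q H^T psi^{-1} (z - H s_p), psi = H Q H^T + R (assumed form)."""
    psi = H @ Q @ H.T + R
    return s_p + Q @ H.T @ np.linalg.solve(psi, z - H @ s_p)

def posterior_mean_directional_derivative(z, H, s_p, Q, R, dH=None, dQ=None, dR=None):
    """First-order change in s_hat for perturbations dH, dQ, dR (others held fixed)."""
    dH = np.zeros_like(H) if dH is None else dH
    dQ = np.zeros_like(Q) if dQ is None else dQ
    dR = np.zeros_like(R) if dR is None else dR
    psi = H @ Q @ H.T + R
    w = np.linalg.solve(psi, z - H @ s_p)  # psi^{-1} (z - H s_p)
    dpsi = dH @ Q @ H.T + H @ dQ @ H.T + H @ Q @ dH.T + dR
    # Terms from d(psi^{-1}) = -psi^{-1} dpsi psi^{-1} and d(z - H s_p) = -dH s_p
    inner = np.linalg.solve(psi, dpsi @ w + dH @ s_p)
    return dQ @ H.T @ w + Q @ dH.T @ w - Q @ H.T @ inner
```

Two cases follow directly. For a scalar variance parameter σ² with R = σ²I, passing `dR=np.eye(m)` gives ∂ŝ/∂σ². For a single forward-operator entry H_ij, pass a `dH` that is zero except for a one at (i, j).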

The GSA is a process of apportioning the uncertainty in the output to the uncertainty in the input parameters. The term “global” stems from accounting for the effect of all input parameters simultaneously. This differs from LSA, where the impact of a small change in each parameter on the output is considered separately while all other parameters are held constant. Although valuable, a full GSA is challenging because it requires knowledge of the joint probabilistic variation (i.e., the covariance) of all input parameters, which is unavailable in most situations. However, it might sometimes be possible to know the approximate joint variation in a small subset of the input parameters (e.g., the covariance between

Broadly, we can consider

It is challenging to estimate covariance quantities such as the cross-covariance between

Even after simplification, the implementation of Eq. (
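A linearized version of this variance apportionment can be sketched as follows. It assumes the output is well approximated to first order by its local sensitivities J (outputs by inputs) and that an input covariance C is available. Then Var(output) ≈ diag(J C Jᵀ). Each input receives its own diagonal term plus half of every cross term it participates in. This first-order (delta-method) approximation is swapped in for illustration and is not necessarily the formulation used in this study. The function name is hypothetical.

```python
import numpy as np

def first_order_variance_apportionment(J, input_cov):
    """Approximate Var(output) ~= diag(J C J^T) and split it by input.

    J: (n_outputs, n_inputs) local sensitivities of the outputs.
    input_cov: (n_inputs, n_inputs) symmetric covariance of the inputs.
    Returns (total_var, contrib), where contrib[i, k] = J[i, k] * sum_l C[k, l] J[i, l].
    Each row of contrib sums to total_var; entries can be negative when inputs
    are negatively correlated.
    """
    J = np.asarray(J, dtype=float)
    C = np.asarray(input_cov, dtype=float)
    contrib = J * (J @ C.T)
    total = contrib.sum(axis=1)
    return total, contrib
```

With independent inputs, C is diagonal, and the contribution of input k reduces to J[i, k]² C[k, k].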

Besides the variance-based method, there are many different approaches for performing a GSA, as described in Appendix

In atmospheric inverse modeling, we encounter two situations when ranking the importance of parameters: the parameters have either the same or different units. Ranking parameters with the same units arises when we want to study the influence of a group of parameters, like observations with the same units. In contrast, ranking parameters with different units arises when we want to explore the impact of groups of parameters with dissimilar units of measurement, like observations in

In a regression-based approach, as described in Eq. (
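The sketch below illustrates bounded regression-based ranking. Because the exact weighting used here is not reproduced, it substitutes a Shapley-value (LMG) decomposition of the ordinary least-squares R². In the sketch, the estimated fluxes play the role of y and the group-level sensitivity vectors form the columns of X; this setup is an assumption. Adding a predictor never lowers the OLS R², so each share is nonnegative. By the Shapley efficiency property, the shares also sum to the full-model R². All weights are therefore bounded in [0, 1]. The function names are hypothetical, and the subset enumeration is only practical for a modest number of groups, such as the eight sites in the case study.

```python
import itertools
import math
import numpy as np

def r_squared(y, X):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def shapley_r2(y, X):
    """Shapley (LMG) shares of R^2 for each column of X; shares sum to the full R^2."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    cache = {}

    def r2(subset):
        # R^2 of the model using only the predictors in `subset` (a sorted tuple)
        if subset not in cache:
            cache[subset] = r_squared(y, X[:, list(subset)]) if subset else 0.0
        return cache[subset]

    shares = np.zeros(p)
    for k in range(p):
        others = [j for j in range(p) if j != k]
        for size in range(p):
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for S in itertools.combinations(others, size):
                shares[k] += weight * (r2(tuple(sorted(S + (k,)))) - r2(S))
    return shares
```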

Note that the least absolute shrinkage and selection operator (LASSO) or principal component analysis (PCA) can also rank parameters under multicollinearity. However, both of these methods result in unbounded weights. Furthermore, an “inference after selection” approach is ambiguous for the LASSO coefficients (see

The regression-based approach described above can rank parameters with the same and different units of measurement. However, an additional normalization step is required to obtain the overall rank of the parameters with varying units of measure, as in

Once the normalized sensitivity vectors are obtained for each group, the regression methodology described above can be used to rank the importance of each group. The ranking methodology proposed above does not account for nonlinear relationships between the estimated fluxes and the derivatives. If this is a concern, the strength of any nonlinear relationship can first be assessed by computing the distance correlation between the fluxes and the local derivatives of the parameters. If necessary, variable transformation techniques such as the Box–Cox transformation (see
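A minimal NumPy sketch of the sample distance correlation (Székely, Rizzo, and Bakirov) follows, for one flux vector and one derivative vector. Unlike the Pearson correlation, it is zero in the population only under independence, so it also detects nonlinear dependence. The function name is illustrative. The implementation builds full n-by-n distance matrices, so very large grids would need subsampling.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form) between two 1-D samples."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)  # pairwise distance matrices
    b = np.abs(y - y.T)
    # Double centering: subtract row and column means, add back the grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = max((A * B).mean(), 0.0)  # guard against tiny negative round-off
    dvar_x = (A * A).mean()
    dvar_y = (B * B).mean()
    if dvar_x * dvar_y == 0:
        return 0.0
    return float(np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y)))
```

For example, with x drawn uniformly from (−1, 1) and y = x², the Pearson correlation is near zero, while the distance correlation is clearly positive.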

Note that in most batch inversion methods, the DOFS is used to assess the information content provided by observations.
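For comparison with the LSA-based rankings, DOFS can be sketched as follows. It assumes the batch Bayesian form used in the earlier sketches, where DOFS is the trace of the averaging kernel (model resolution matrix) K H. Because trace(K H) = trace(H K), the diagonal of H K splits DOFS by observation. The function name and the symbols H, Q, and R are assumptions of the sketch.

```python
import numpy as np

def dofs(H, Q, R):
    """DOFS = trace(K H) for the assumed batch Bayesian case, plus a per-observation split."""
    psi = H @ Q @ H.T + R
    K = np.linalg.solve(psi, H @ Q).T  # K = Q H^T psi^{-1}
    A = K @ H                          # averaging kernel / model resolution matrix
    per_obs = np.diag(H @ K)           # trace(K H) = trace(H K)
    return np.trace(A), per_obs
```

With n observations, DOFS lies between 0 and n. It approaches 0 as the model-data mismatch variances grow large, because the observations then carry almost no information.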

Finally, all diagnostic methods applied in the context of any regression-based model can be used to understand the relationship between dependent and independent variables; however, what covariates to include in

To demonstrate the applicability of our methods, we utilize data from our published work on

For each observation included in the case study, a forward operator was obtained using the Weather Research and Forecasting Stochastic Time-Inverted Lagrangian Transport (WRF-STILT) model (see

In this work, we identify STAD for the 4 d period during which the inversion was performed. Over this period, STAD uniquely partitions the spatial domain of the study, as shown in Fig.

Study area with county boundaries, measurement locations, and the spatiotemporal area of dominance of measurement locations. The dotted black line shows the area constrained by observations, as shown in

Overall, the STAD of each site indicates the regions whose fluxes contribute most to the signal observed at that site over a given period. This allows us to associate changes in fluxes with the specific areas in the basin where emissions are likely to have decreased or increased. Some information in the observational signal is shared between observations from different sites. This shared information (not shown) can be computed as part of STAD and contributes to the overall basin-scale flux estimates that combine measurements from all sites. Note that STAD does not represent the network's coverage (i.e., the regions of emissions constrained by observations). These regions are smaller than STAD (see the gray outline in Fig.

Heatmap of the aggregated forward operators for the case study period.

One of the main goals of the sensitivity analysis after performing an inversion is to identify the observations that had the most influence on the flux estimates. Other than the observations, it is also essential to explore the importance of the different input parameters to an inversion, like variance parameters in

The importance scores and ranking of eight sites based on the sensitivity of the estimated fluxes (

The importance of individual measurements in influencing

The sensitivities

For the case study in this work, we find that the observations collected at the GRA site that is located nearest to the source of the Aliso Canyon gas leak are most influential in governing

Outliers have a significant impact on these rankings. A high weight associated with even one observation from a site can make that site more important than other sites. For example, if we remove the observation with the highest weight from each site, ONT becomes the most important site, followed by GRA, CMP, IRV, CIT, FUL, BND, and USC. As part of the sensitivity analysis, it is crucial to examine the influence of observations associated with high weights because they are likely to have a large impact on the flux estimates. Site-level importance should be judged not only by examining the aggregated ranking, as presented in Table

The ranking of each observation in influencing the estimates of fluxes can be obtained by examining the weights of the column vectors of

After the two-step normalization of

Some of these quantities are correlated and should be seen in conjunction. For example,

Grouped local sensitivities of the estimated fluxes (

Increased model resolution also lowers the importance of

Scatterplots of the relationships between

This study lays out the techniques for assessing the quality of the inferred estimates of fluxes. A sensitivity analysis is an important diagnostic tool for understanding the impact of the choices made with respect to input parameters on the estimated fluxes. However, it is not a recipe for selecting the proper forms of

The case study in this work is designed only to demonstrate the methodologies described in Sect.

Like

The overall importance of

Other than the aggregation error, the aggregation of the estimated fluxes also has profound implications, as it affects the robustness of the estimated fluxes. It can be proved (see Appendix

The computational cost of calculating the analytical partial derivatives is minimal, as it is a one-time operation bounded by the cost of the required matrix multiplications, which is at most

Finally, the estimation of STAD and of site importance can be influenced by data gaps; therefore, these analyses are not advised when the number of observations differs greatly between sites.

Our work makes novel and significant contributions that can improve the understanding of linear atmospheric inverse problems. It provides (1) a framework for post hoc analysis of the impact of input parameters on the estimated fluxes and (2) a way to understand the correlations in the forward operators of an atmospheric transport model. The authors are unaware of any work in which the local sensitivities with different units of measurement are compared to rank the importance of input parameters in a linear atmospheric inverse model.

Concerning forward operators, we provide mathematical foundations for IAOMI and JSD-based metrics. These two metrics can be used to construct a nonstationary error covariance for the atmospheric transport component of the model data mismatch matrix

The IAOMI and JSD-based metrics provide essential insights into the two critical, and only required, components of an inversion, namely observations and forward operators (e.g., the influence of an observation on the sources of the fluxes through STAD). These metrics can be computed before conducting an inversion and should be complemented by post hoc LSA, which is necessary for understanding the behavior of an inverse model. Overall, LSA can answer questions such as where, and in what order of precedence, an observation influenced the estimated fluxes. This kind of analysis is entirely different from uncertainty estimation, which quantifies the reduction in prior uncertainty due to the observations.

LSA is not a replacement for statistical tests that check the inverse models' underlying assumptions and model specifications, nor is it a recipe for selecting input parameters to an inverse model. However, as explained above, it has an essential role that can lead to an improved understanding of an atmospheric inverse model.

Many methods have been proposed and utilized to perform sensitivity analysis. These can be categorized as global and local sensitivity analyses. Global sensitivity analysis (GSA) includes Morris's (e.g.,

Compared to GSA, a local sensitivity method, such as Bayesian hyper-differential sensitivity analysis (HDSA;

The dissimilarity between forward operators can also be measured via entropy-based distances (for a definition, see
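As an entropy-based example, this sketch computes the Jensen–Shannon distance between two forward operators. It assumes each operator is nonnegative and is first normalized to sum to one, so that it can be treated as a probability distribution over grid cells. With base-2 logarithms, the square root of the Jensen–Shannon divergence is a metric bounded in [0, 1]. The function name is illustrative. SciPy's `scipy.spatial.distance.jensenshannon` with `base=2` computes the same quantity.

```python
import numpy as np

def js_distance(h1, h2):
    """Jensen-Shannon distance (sqrt of the base-2 JS divergence) between two
    nonnegative forward operators, each normalized to sum to one."""
    p = np.asarray(h1, dtype=float).ravel()
    q = np.asarray(h2, dtype=float).ravel()
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a = 0 contribute 0; a > 0 implies m > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return float(np.sqrt(max(jsd, 0.0)))
```

Identical footprints give a distance of 0, and footprints with no overlapping grid cells give a distance of 1.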

Here we show the proofs of two mathematical statements on the robustness and quality of the estimated fluxes, as mentioned in Sect.

The post-inversion aggregation or upscaling of any flux field
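Post-inversion aggregation can be written as a linear operator, as in this minimal sketch. It assumes a sum-aggregation matrix P that maps grid cells to coarser groups. The aggregated fluxes are then P ŝ and their covariance is P V Pᵀ, where V is the posterior covariance. Because this product includes the off-diagonal terms of V, posterior correlations between cells directly affect the aggregated uncertainty. The names `aggregation_matrix`, `aggregate`, and `labels` are hypothetical.

```python
import numpy as np

def aggregation_matrix(labels, n_groups):
    """P with P[g, i] = 1 if grid cell i belongs to group g (sum aggregation)."""
    labels = np.asarray(labels)
    P = np.zeros((n_groups, labels.size))
    P[labels, np.arange(labels.size)] = 1.0
    return P

def aggregate(s_hat, V_hat, labels, n_groups):
    """Aggregated posterior fluxes P s_hat and their covariance P V_hat P^T."""
    P = aggregation_matrix(labels, n_groups)
    return P @ s_hat, P @ V_hat @ P.T
```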

We can rearrange the columns of

Using the Cauchy–Schwarz inequality on

The aggregated forward operator

The first three properties follow directly from the construction, so here we prove only the fourth property.

So, not only is

Finally, the model resolution matrix for the inversion can be written as

The question, then, is what happens to the trace of the model resolution matrix in the aggregated case. We provide a proof for the simple batch Bayesian case in Lemma 1. The proof for the geostatistical case is similar and is left to the reader.

All of the code and data utilized in this study are available in the Supplement.

The supplement related to this article is available online at:

VY and SG contributed equally to the conceptualization, methodology, formal analyses, preparation, review, and editing of the paper. CM funded the work.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors thank Anna Karion, Kimberly Mueller, James Whetstone (National Institute of Standards and Technology, NIST), and Daniel Cusworth (University of Arizona, UA) for their review and advice on the paper.

This work has been partially funded by NIST's Greenhouse Gas Measurements Program. Support to the University of Notre Dame has been provided by NIST (grant no. 70NANB19H132), and support for the Jet Propulsion Laboratory (JPL) has been provided via an interagency agreement between NIST and NASA. A portion of this research has been carried out at the JPL, California Institute of Technology, under a contract with NASA (contract no. 80NM0018D0004).

This paper was edited by Leena Järvi and reviewed by Peter Rayner, Bharat Rastogi, and two anonymous referees.