Efficient ensemble generation for uncertain correlated parameters in atmospheric chemical models

Atmospheric chemical forecasts highly rely on various model parameters, which are often insufficiently known, as emission rates and deposition velocities. However, a reliable estimation of resulting uncertainties by an ensemble of forecasts is impaired by the high-dimensionality of the system. This study presents a novel approach to efficiently perturb atmospheric-chemical model parameters according to their leading coupled uncertainties. The algorithm is based on the idea that the forecast model 5 acts as a dynamical system inducing multi-variational correlations of model uncertainties. The specific algorithm presented in this study is designed for parameters which depend on local environmental conditions and consists of three major steps: (1) an efficient assessment of various sources of model uncertainties spanned by independent sensitivities, (2) an efficient extraction of leading coupled uncertainties using eigenmode decomposition, and (3) an efficient generation of perturbations for high-dimensional parameter fields by the Karhunen-Loéve expansion. Due to their perceived simulation challenge the method 10 has been applied to biogenic emissions of five trace gases, considering state-dependent sensitivities to local atmospheric and terrestrial conditions. Rapidly decreasing eigenvalues state high spatialand cross-correlations of regional biogenic emissions, which are represented by a low number of dominating components. Consequently, leading uncertainties can be covered by low number of perturbations enabling ensemble sizes of the order of 10 members. This demonstrates the suitability of the algorithm for efficient ensemble generation for high-dimensional atmospheric chemical parameters. 15


Introduction
Due to highly nonlinear properties of the atmosphere including its chemistry, forecast uncertainties vary significantly in space and time and among variables. During the last decades, increasing efforts have been put into estimating forecast uncertainties induced by different error sources. In this context, the method for generating an ensemble of forecasts is crucial as it determines the forecast probability distribution. While the represented details of the probability distribution increase with the number of 20 realizations, the ensemble size of high-dimensional atmospheric systems is limited by computational resources (Leutbecher, 2019). Thus, the major challenge is the generation of ensembles which sufficiently sample the forecast uncertainty within manageable computational efforts. This renders ensemble forecasting one of the most challenging research areas in atmospheric modeling (e.g. Bauer et al., 2015;Buizza, 2019).
In numerical weather prediction (NWP), different ensemble methods have been developed in order to account for uncer-25 tainties of initial conditions and the forecast model formulation. First studies were motivated by the fact that initial conditions induce dominant uncertainties to NWP systems. Bred vectors (BV, Toth and Kalnay, 1993) or Singular vectors (SV, Buizza et al., 1993) are used to efficiently generate initial perturbations along the directions of the fastest growing errors in a linearized or nonlinear forecast model, respectively. Another approach estimates uncertainties of initial conditions by applying random perturbations to observations (PO, Houtekamer et al., 1996) which are assimilated in the modeling system. 30 As errors in initial conditions cannot entirely explain forecast errors, two methods related to uncertainties within the NWP model have been developed. Firstly, the stochastic kinetic energy backscatter scheme (SKEBS, Shutts, 2005) accounts for uncertainties in the amount of energy which is backscattered from subgrid to resolved scales. The second group of methods focuses on uncertainties in model parameterizations, which rely on simplified assumptions about non-resolved processes. In the stochastic parameter perturbation scheme (SPP, Houtekamer et al., 1996), selected parameters within individual parame-35 terizations are multiplied with random numbers. In contrast, the stochastically perturbed parameterization tendencies scheme (SPPT, Buizza et al., 1999) considers uncertainties in the formulation of the parameterization schemes itself. Instead of perturbing individual parameters, total tendencies of state variables from all parameterizations are multiplied with appropriately scaled random numbers. Although perturbations are generated in a spatially and temporally correlated way, both, correlation scales and standard deviations of the random numbers are predefined as fixed values (e.g., Leutbecher et al., 2017;Lock et al., 40 2019).
While different methods for ensemble generation are successfully applied to NWP, less approaches are available for chemistry transport modeling. As chemistry transport models (CTMs) include a large number of trace gases and aerosol compounds, the dimension of the system is even higher than in NWP (Zhang et al., 2012a). Among other implications, this highdimensionality amplifies the amount of uncertainties which differ significantly between individual chemical compounds (Emili 45 et al., 2016). Besides using multi-model ensembles for estimation of forecast uncertainties (e.g., McKeen et al., 2007;Xian et al., 2019), there are only few attempts for ensemble generation within a single CTM. As CTMs are driven by meteorological forecasts, uncertainties in NWP are transferred to the chemical simulations. A comparably simple approach, which was used by Vautard et al. (2001) for the first time, employs an existing meteorological ensemble to drive the atmospheric chemical forecasts. However, estimations of chemical uncertainties solely driven by NWP ensembles do not necessarily represent re-50 lated uncertainties in CTMs. For example, Vogel and Elbern (2020) note that a global meteorological ensemble was not able to induce significant ensemble spread in surface-near forecasts of biogenic trace gases.
Multiple studies indicate that uncertainties of CTMs are mainly induced by uncertain model parameters -controlling emissions, chemical transformation and deposition processes -rather than initial conditions or meteorological forecasts (e.g., Elbern et al., 2007;Bocquet et al., 2015). Consequently, former attempts aim to account for uncertainties in model parameters or other 55 chemical input fields (for an overview see Zhang et al., 2012b, and references therein). However, perturbing parameter fields appears to suffer from the high-dimensionality of the system as independent perturbations of model parameters at each location and time remains impractical. Early studies like the one performed by Hanna et al. (1998) assume predefined uncertainties where perturbations are applied uniformly in space and time, ignoring any cross-correlations between parameters. This uniform of the dynamical system. Already Hanna et al. (1998) propose that introducing state-dependent uncertainties as well as crosscorrelations between parameters would provide a more realistic representation. The Karhunen-Loéve (KL) expansion provides an opportunity to account for such complex correlated uncertainties using eigenmode decomposition. While this approach is well established in engineering, it has rarely been applied in geophysical sciences. Goris and Elbern (2015) performed singular vector decomposition to determine optimal placement of trace gas observation sites. Siripatana et al. (2018) used the KL 70 expansion for dimension reduction in an idealized oceanographic ensemble data assimilation setup. To the knowledge of the authors, the KL expansion has not been applied in atmospheric chemistry ensemble modeling.
In order to address this issue, this study introduces a novel approach for optimized state-dependent parameter perturbation in atmospheric chemical models. The approach is based on the idea that the dynamical system induces multi-variational correlations of model states and uncertainties. In particular, the algorithm aims to provide (1) an efficient assessment of various 75 sources of uncertainties, (2) an efficient extraction of leading coupled uncertainties, and (3) an efficient generation of perturbations for high-dimensional parameter fields. The specific algorithm presented in Sect. 2 is designed for model parameters, which depend on model arguments like model inputs and configurations. Representative performance results are presented in Sect. 3 for biogenic emissions representing a highly uncertain, yet correlated set of parameters. Finally, Sect. 4 concludes this study by discussing benefits and limitations of the presented approach.

2 Algorithm
This section provides the theoretical description of the ensemble generation algorithm with respect to correlated model parameters. Making use of the Karhunen-Loéve (KL) expansion, the algorithm is denoted as KL ensemble algorithm thereafter. It is based on the fact that the forecast model acts as a dynamical system forcing spatial and multi-variational couplings of the atmospheric state. Thus, information on the size and coupling of forecast uncertainties can be extracted from differently configured 85 model simulations. The explicit algorithm presented here focuses on state-dependent model parameters which depend on the specific model setup. Generally, atmospheric models are sensitive to their specific simulation setup including a large variety of model inputs and configurations -henceforth denoted as model arguments. These arguments comprise a heterogeneous set including initial conditions, external input fields and the formulation of model parameterizations. For state-dependent parameters considered here, their sensitivities to the model setup are assumed to induce dominant uncertainties. Thus, the problem of 90 estimating multi-variational uncertainties is transferred to sensitivities to various model setups. The algorithm consists of three major steps which are described in the following: the estimation of sensitivities to the model setup including the assumption of independent sensitivities (Sect. 2.1), the extraction of leading coupled parameter uncertainties using a highly-parallelized eigenmode decomposition software (Sect. 2.2), and the ensemble generation by sampling perturbations from leading eigenmodes with the Karhunen-Loéve expansion (Sect. 2.3). A graphical overview of the major 95 steps composing the algorithm is given in Fig. 1.

Sensitivity Estimation
As a first step, essential uncertainties of the model parameters are formulated as multi-variational covariance matrix C ∈ R N ×N , where N is the dimension of the problem i.e. the total dimension of the set of considered model parameters. Generally, the covariances may be determined from any kinds of uncertainties like statistical model errors derived from operational 100 forecasts. Because this study focuses on state-dependent parameters, essential uncertainties are estimated from sensitivities of those parameters to different model arguments. These state-dependent sensitivities are realized as temporally-averaged sensitivity factors with respect to a selected reference. The temporal averaging makes the sensitivities being representative for a sufficient time interval for ensemble simulation.
For the formulation of the algorithm, each model argument i ∈ [1, I] is interpreted as arbitrary parameter with R i different The implementations represent different available options for each model argument including initial fields, sources of input information or selections of model parameterizations. Let Q s j (t) be the j-th forecast of a model parameter at discrete time t ∈ [t 1 , t T ] and position s ∈ S. Here, s ∈ S denotes the position in the field of considered parameters at all grid points, which is specified by the index set S = {s 1 , s 2 , ..., s N } (analogous to Ch. 4.2 in Xiu, 2010). Note that each j ∈ [0, J] represents one model setup {r 1 , r 2 , . . . , r I } as specific combination of implementations r i of each model argument From a stochastic point of view, the model parameters Q s j (t) can bee seen as induced pseudo-random variables. The pseudorandomness reflects the fact that these variables are treated to be random, but are in fact controlled by the model setup. In this regard, the selection of a specific implementation r i of a model parameter i is a deterministic modification interpreted as pseudo-stochastic realization. Then, the full set of available options R i of model argument i is a sample subspace of the 115 respective argument space. Being aware of this stochastic interpretation, the algorithm is formulated using the applicationoriented notation introduced above. An overview over the notation used in this section including the definition of variables is given in Appendix A.
The formulation of state-dependent sensitivities is based on Elbern et al. (2007) who demonstrated the suitability of amplification factors in the context of 4D-variational optimization of emissions. Let j * be a reference model setup (here j * := 1) 120 representing the selected reference implementation r i * = 1 of each model argument i ∈ [1, I]. Then, the model parameter Q s j (t) of model setup j at time t and position s is divided by its corresponding value from the reference configuration Q s j * (t). Thus, the sensitivity factor F s j is defined as temporal average of those over the time interval [t 1 , t T ]: Note that the sensitivity factors of the reference setup are one by definition: F s j * = 1 , ∀ s ∈ S.

125
Analogous to emission factors in Elbern et al. (2007), sensitivity factors are assumed to be lognormally distributed. Thus, the sensitivity factors are substituted to normally distributed sensitivities X s j in order to simplify their further treatment: Given the formulation of sensitivities, two different methods for covariance construction are presented in the following.

130
Generally, the covariance matrix should represent the complete set of essential uncertainties of the model parameters. Focusing on sensitivities to the model setup, essential uncertainties can be sampled using different implementations r i of multiple model Here, I is the total number of all model input and configuration options considered in the sensitivity analysis. Ideally, the covariance matrix is calculated from the sensitivities to all possible combinations of various implementations of each option. Given R i implementations r i ∈ [1, R i ] of each model argument i, the total number of all combined sensitivities With this, the full combined covariance between the sensitivities at positions s, s ∈ S is given by where µ fc (s/s ) := 1 is the mean value of full combined sensitivities at position s and s , respectively. For atmospheric model parameters, each sensitivity X s j requires its own forecast simulation. Thus, the full calculation of all combined sensitivities becomes computationally demanding even for a low number of implementations of a few model arguments. For example, considering six model arguments (I = 6) with two implementations (r i = 2, ∀ i = [1, I]) each, would already require J fc = J = 2 6 = 64 model executions prior to the ensemble generation. One method to overcome this issue is 145 to randomly sample a subset of J sc < J fc combined sensitivities and approximate the sampled combined mean and covariance by C sc (s, s ) := 1 However, a reasonable covariance estimation by random sampling still requires a large number of combined sensitivities 150 which would be in contradiction to the idea of an efficient ensemble generation approach. Thus, a different method is introduced below, which is based on the assumption of independent sensitivities.

Independent Sensitivities
As this study aims for an computationally efficient algorithm focusing on leading uncertainties, the computational efforts required for the estimation of sensitivities are critical. Thus, a new method for efficient covariance construction is introduced, 155 which reduces the number of required model executions prior to ensemble generation significantly. Instead of using a randomly sampled subset of combined sensitivities, the method uses only sensitivities with respect to single model arguments.
By assuming tangent linearity of sensitivities in the limits of imposed perturbations, these single sensitivities are extrapolated to approximate the full set of combined sensitivities. The assumption of tangent linearity equals mutual independence of the model arguments and thus every combined sensitivity factor F s j with arguments {r 1 , r 2 , . . . , r I } can be decomposed into a set 160 of independent sensitivity factors f s i to each single argument r i with i ∈ [1, I] Here, the independent sensitivity factors f s i are defined analogous to Eq. (1) using the model forecast ..,ri−1 * ,ri,ri+1 * ,...,r I * } (t) were only one argument r i differs from the reference setup: Further, with Eq. (2), every combined sensitivity X s j of implementation j at position s is given by were x s i (r i ) := ln f s i (r i ) is the independent sensitivity referring to a single modified model argument i ∈ [1, I] with implementation r i . Note that the independent sensitivity factors f s i (r i ) equal one when the implementation r i of model argument i equals the reference implementation r i * (analogous to Eq. (1) ). Consequently, the independent sensitivities x s i (r i ) vanish in 170 Eq. (7) for all i with r i = r i * and each combined sensitivity is given by the sum of independent sensitivities to those arguments, which differ from the reference setup In other words, the assumption of independence implies that the set of all combined sensitivities X s j j∈ [1,J] lies within a subspace which is spanned by the set of independent sensitivities x s i (r i ) ri∈ [1,Ri] i∈ [1,I] . Then, the full set of combined sensitivities 175 can be approximated by the subset of independent sensitivities following Eq. (8). This reduces the number of required forecasts with J indep J fc = J. Therewith, the independent method requires a significantly reduced number of simulations, one for the reference setup (r i * = 1 ∀i) and one for each other implementation r i ∈ [2, R i ] of each argument i ∈ [1, I].

180
Rather than approximating all combined sensitivities from independent sensitivities explicitly, the effects on mean sensitivities and covariances are derived in its general from in Appendix B. In the KL ensemble algorithm, the mean values µ indep (s) and covariances C indep (s, s ) of the sensitivities are directly calculated from the set of independent sensitivities by Note that the assumption of independence does not imply orthogonality between the input sensitivities. While the equations are exact under the given assumption, it might not be a sufficient approximation for many atmospheric processes.
The method of independent sensitivities allows the inclusion of additional uncertainties in a straightforward way. These additional uncertainties may originate form any other error source not represented as model arguments. For example, this 190 could be a known uncertainty in the formulation of the model itself. If such an additional uncertainty is given (e.g. from statistical evaluation), it can be included as additional sensitivity x s add with R add = 2. Based on Eq. (10), the independent mean and covariance including additional uncertainties are If the direction of the additional uncertainty is unknown (unsigned additional uncertatinty), the original definition of the mean values for independent sensitivities as given in Eq. (10a) is used instead of Eq. (11a). This ensures no impact of the additional uncertainty to the mean values of the parameters.

Eigenmode Decomposition
Once the multi-variational covariances are formulated, dominating directions of uncertainties are extracted as second step. This 200 extraction is realized by an eigenmode decomposition of the covariance matrix with λ d the d-th eigenvalue and ϕ d (s/s ) the s/s -th element of the corresponding eigenvector ϕ d ∈ R N for all d ∈ [1, N ]. As the presented ensemble approach focuses on dominant uncertainties, the D largest eigenvalues and corresponding eigenvectors are required λ 1 ≥ λ 2 ≥ · · · ≥ λ D with D < N . Here, the first eigenvalues represent the size of the most dominant uncertainties 205 and the corresponding eigenvectors their directions. Due to the high-dimensionality of atmospheric models, the covariance matrix may easily be of the order of 10 10 elements. This inhibits explicit storage of the matrix and makes the computation of the eigenproblem Eq. (12) very costly even for a low number of required eigenmodes (D N ). Therefore, a highly efficient software is required which is suitable for high-dimensional systems.
The ARPACK (ARnoldi PACKage, Lehoucq et al., 1997) package is a flexible tool for numerical eigen and singular value 210 decomposition. It is explicitly developed for large-scale problems and includes a set of specific algorithms for different types of matrices. The ARPACK software uses a reverse communication interface were the matrix needs to be given as operator acting on a given vector. APRACK makes use of the Implicitly Restarted Arndoldi Method (IRAM, Sorensen, 1997) which is based on the implicitly shifted QR-algorithm. As a covariance matrix is quadratic, symmetric and positive definite by construction, the IRAM method reduces to the Implicitly Restarted Lanczos Method (IRLM). In this study, the parallel version P-ARPACK 215 is used for the eigenmode decomposition, which balances the workload of processors and reduces the computation time. For a detailed description of the ARPACK software package see Lehoucq et al. (1997).

Ensemble Generation
The final step is the generation of an ensemble of perturbations based on the leading eigenmodes of parameter uncertainties.
This step makes use of the Karhunen-Loéve expansion -denoted as KL expansion thereafter -named after Karhunen (1947) and Loéve (1948). The following description is adopted from the notations of Schwab and Todor (2006) and Xiu (2010), to which the reader is referred for more details. In its discrete form, the KL expansion describes the s-th element of a stochastic process Y s ωp of dimension N as linear combination of orthogonal components with µ(s) denoting the mean value of the stochastic process and ω p its p-th random realization. Here, the deterministic fields 225 ψ d (s) are given by the eigenvalues λ d and eigenvectors ϕ d of the covariances of the stochastic process In this notation, the stochastic coefficients y d (ω p ) are independent random numbers with zero mean and unit standard deviation.
In the context of ensemble generation, the stochastic process is a set of perturbations Y s ωp s∈S whose essential uncertainties are formulated as covariance matrix of the sensitivities as defined in Sect. 2.1. Thus, the eigenvalues λ d and corresponding , the perturbation of ensemble member p ∈ [1, P ] at position s ∈ S is sampled from These perturbation factors will then be applied to the model parameters in the ensemble forecast.

245
The KL ensemble algorithm was implemented and integrated into the chemical data assimilation system EURAD-IM (EU-Ropean Air pollution Dispersion -Inverse Model). EURAD-IM combines a state-of-the-art chemistry transport model (CTM) with 4-dimensional variational data assimilation (Elbern et al., 2007). Based on meteorological fields precalculated by WRF-ARW (Weather Research and Forecasting -Advanced Research WRF, Skamarock et al., 2008), the Eulerian CTM performs forecasts of about 100 gas phase and aerosol compounds up to lower stratospheric levels. In addition to advection and diffu-250 sion processes, modifications due to chemical conversions are considered by the RACM-MIM chemical mechanism (Regional Atmospheric Chemistry Mechanism -Mainz Isoprene Mechanism, Pöschl et al., 2000;Geiger et al., 2003). Emissions from anthropogenic and biogenic sources as well as dry and wet deposition act as chemical sources and sinks, respectively. In this study, the EURAD-IM system provides forecasts of sensitivities to various model arguments used for covariance construction.
The concept of emission factors used for emission rate optimization in EURAD-IM was adapted in the KL ensemble approach.

255
The KL ensemble algorithm is applied to biogenic emissions which are known to be subject to large uncertainties. The  parameterizations (see Vogel and Elbern, 2020, for further details). For each argument i ∈ [1, 6], two different implementations are selected (R i = 2, ∀ i), the first of which is defined as reference implementation (r i * = 1, ∀ i). Significant effects are also found with respect to global meteorology, land surface model (LSM) and boundary layer schemes.
Here, weak nonlinear effects appear when RUC LSM is combined with the ACM2 boundary layer scheme or GFS global meteorology (panel 6 and 10 of Fig. 2a).
In contrast to the large amount of calculations required for combined sensitivities, the method of independent sensitivities is 295 additionally investigated. As described in Sect. 2.1.2, only sensitivities resulting from the change of a single model argument

Eigenmode Decomposition
Based on the respective formulation of combined or independent sensitivities, the leading eigenvalues and their associated 315 eigenvectors of the covariance matrices are calculated. For combined sensitivities, the eigenvalues given in Fig. 4 show a logarithmic decrease of about one order of magnitude within the first five modes. This indicates that the major uncertainties of the emissions factors are determined by a few leading directions. In other words, the fast decrease of leading eigenvalues confirms a high correlation of biogenic emissions through the domain and between different gases. The contribution of these  leading eigenmodes to local emission factors for each trace gas is given by the corresponding eigenvectors shown in Fig. 5 320 for isoprene. According to shape and size of the first eigenmode, it is almost exclusively induced by the sensitivity to land use information which is invariant to the other sensitivities. The subsequent eigenmodes represent common patterns of the remaining sensitivities which are therefore treated together. Figure 7 shows the complete leading eigenvectors for independent sensitivities with respect to all five biogenic trace gases.
As for combined sensitivities, the eigenmode decomposition extracts perpendicular components from the set of independent 325 sensitivities. The first eigenmode represents common features of the a-priori uncertatinty and the other independent sensitivities. While the second eigenmode is closely related to the effects of drought response and the land surface model, the sensitivity to land use dominates the third eigenmode.
The consideration of additional uncertainties in the independent method does not allow for a direct comparison of individual eigenmodes. The eigenvalues shown in Fig. 6 state a similar decrease of eigenvalues for independent sensitivities compared 330 to the combined method. Highly similar size and decrease rate of leading eigenvalues indicate a reasonable representation of the leading uncertainties by the independent method. However, nonlinearities arising from combined changes in the land surface model with global meteorology or boundary layer schemes are not captured by the linear assumption of independent sensitivities. Besides that, highly similar signals in eigenvectors for different biogenic gases shown in Fig. 7 state a considerably large correlation between those. Yet, individual patterns of each biogenic gas are also represented by the leading eigenmodes 335 for independent sensitivities. These patterns are also found for the eigenvectors of combined sensitivities (not shown) which confirms the suitability of the independent method with respect to multiple parameters.

Ensemble Generation
The different directions of the leading eigenmodes of the two methods prohibit a direct comparison of their perturbations. Due to the sufficient representation of dominant uncertainties and additional consideration of model-induced uncertainties, resulting

Discussion and Conclusions
This study introduces an optimized ensemble generation algorithm in which model parameters are efficiently perturbed according to their correlations. The approach is based on the fact that advection-diffusion-reaction equations including their forcing 355 parameters act as dynamical systems which induces multi-variational correlations of model states and uncertainties. It applies the Karhunen-Loéve expansion which approximates covariances of the model parameters by a limited set of leading eigenmodes. These modes represent the coupled leading uncertainties from which perturbations can be sampled efficiently. Based on this, stochastic sampling for ensemble generation is performed in an uncorrelated subspace spanned by the eigenmodes.
Generally, the presented algorithm is applicable to any set of model parameters in high-dimensional atmospheric systems, as 360 long as their joint uncertainties remain in the linear regime. Through the reduction of the sampling space, it is shown that the stochastic dimension of the problem can be reduced significantly. This makes the algorithm suitable for efficient ensemble generation of high-dimensional atmospheric models, where the computational costs are a critical and limiting quantity.
In the Karhunen-Loéve (KL) ensemble algorithm, perturbations are created from covariances of the stochastic process.
This means that for large ensemble sizes the statistics of the perturbations converge towards their input values determined 365 by the covariances. Consequently, the performance of the KL ensemble crucially relies on the formulation of the covariances.
Uncertainties not considered in the formulation of the covariances cannot be captured by the KL ensemble. The major benefit of the approach lies in the optimal properties of the perturbations focusing on leading uncertainties, providing an optimal coverage of uncertainties even for low ensemble sizes. Although the greatest benefit is achieved for highly correlated parameters, the algorithm enables the combination of major uncertainties even for uncorrelated parameters.

370
Focusing on model parameters which depend on local environmental conditions, state-dependent covariances are approximated from various related sensitivities. Generally, the covariances required for this approach can be defined in any way which is suitable to reflect the uncertainty of our knowledge. Covariance construction based on parameter sensitivities as presented in this study is just one among others. Potential deficiencies in the construction could be identified from a posteriori evaluation of the full ensemble. This would allow for an ongoing adjustment of the approach depending on the specific application.

375
As simulations of all possible combinations of sensitivities are computationally demanding, independent sensitivities are introduced in this study. Assuming tangent-linearity, multiple combined sensitivities can be represented by a low number of independent sensitivities. Representative results indicate that the major properties of leading sensitivities are captured by independent sensitivities. At the same time, using only 7 instead of 32 pre-calculated sensitivities reduces the computational effort of model simulations tremendously. Besides the reduction of computational resources, this method allows for the integration of 380 different kinds of uncertainties in a convenient way. However in many cases, the assumption of independent sensitivities may not be a good approximation. The user has to decide if the computational benefit justifies the neglection of nonlinear effects.
Once the sensitivities are calculated, the computational effort required for the generation of Karhunen-Loéve (KL) perturbations is mainly consumed by the numerical solution of the eigenproblem. The highly efficient parallelization of the solution of the eigenproblem by parallel-ARPACK renders the algorithm suitable for high-dimensional systems. Table 1  The presented application of the KL ensemble generation algorithm demonstrates its potential for an efficient estimation of forecast uncertainties induced by model parameters in high-dimensional atmospheric models. Specifically, the presented algorithm allows for (1) an efficient estimation of various sensitivities based on the assumption of independent sensitivities, (2) an efficient extraction of leading coupled uncertainties using highly parallelized eigenmode decomposition, and (3)  Author contributions. AV developed and implemented the algorithm, performed the simulations and wrote the manuscript. HE provided the basic idea, supervised the work, contributed to the developments and helped in the preparation of the manuscript.
Competing interests. The authors declare that they have to competing interests.

Position
s ∈ S Element in the index set representing the position of model parameter pn at grid box (xn, yn, zn).
-Independent f s i (r i ) Sensitivity factor w.r.t. single model argument i with implementation r i = r i * differing from the reference (assumed to be independent of other arguments).
Sensitivity X s j Sensitivity to model setup j at position s (see Eq. (2) ).
-Independent x s i (r i ) Sensitivity w.r.t. single model argument i with r i = r i * (assumed to be independent of other arguments).
Perturbation Y s ωp Perturbation as p-th random realization of the KL expansion at position s (see Eq. (15) ).

Perturbation
Factor x s 1 (r 1 ) + x s 2 (r 2 ) + · · · + x s 2 (r 2 ) + · · · + 1 R I · R I r I =1 x s I (r I ) The covariance C(s, s ) of all combined sensitivities X s/s j j∈ [1,J] at positions s/s ∈ S, respectively, can be calculated from the set of independent sensitivities x s/s i (r i ) ri∈ [1,Ri] i∈ [1,I] as follows: