Algorithmic differentiation for cloud schemes (IFS Cy43r3) using CoDiPack (v1.8.1)

Numerical models in atmospheric sciences not only need to approximate the flow equations on a suitable computational grid, they also need to include subgrid effects of many non-resolved physical processes. Among others, the formation and evolution of cloud particles is an example of such subgrid processes. Moreover, to date there is no universal mathematical description of a cloud, hence many cloud schemes have been proposed and these schemes typically contain several uncertain parameters. In this study, we propose the use of algorithmic differentiation (AD) as a method to identify parameters within the cloud scheme, to which the output of the cloud scheme is most sensitive. We illustrate the methodology by analyzing a scheme for liquid clouds, incorporated into a parcel model framework. Since the occurrence of uncertain parameters is not limited to cloud schemes, the AD methodology may help to identify the most sensitive uncertain parameters in any subgrid scheme and therefore help limiting the application of uncertainty quantification to the most crucial parameters.

1 Introduction 10 Modelling the atmosphere is a highly non-trivial task due to the multiscale and multicomponent nature of the atmospheric flow, where multiple physical processes on different length and timescales interact simultaneously (Orlanski, 1975). One particular result of the interaction of such processes is regularly observed in the sky: clouds appear and disappear. The evolution of a cloud (see, e.g., Lamb and Verlinde, 2011, for an in-depth treatment of cloud evolution) starts on the lengthscale of a few nanometers, where aerosol particles get wetted by the ambient water vapor leading to the formation of haze particles. If thermodynamic 15 conditions are fulfilled, i.e. the (relative) humidity is large enough, the haze particles grow further to become cloud droplets with typical diameters of about 10 µm − 30 µm. This growth process is described by combining Maxwell and Köhler theory (Rogers and Yau, 1989;Devenish et al., 2012;Köhler, 1936;Maxwell, 1877). Collisions of the cloud droplets eventually lead to rain drops with sizes even larger than 100 µm (see, e.g., Devenish et al., 2012;Grabowski and Wang, 2013, for a discussion including turbulence effects). Due to their weight, rain drops fall out of the cloud and form precipitation. Since all phase 20 transitions are connected with the release or consumption of latent heat, the formation and evaporation of a cloud can affect the ambient atmospheric flow by modifying the local buoyancy (see, e.g., Cotton et al., 2010). However, all aforementioned cloud processes are microphysical processes and not resolved in numerical models, in particular not in the operational models used for weather forecasts. In these models, the cloud itself is not resolved and instead considered as a subgrid process, calling for a representation of the impact of the cloud by using so-called "parameterizations" or "cloud schemes" (see, e.g., Khain et al., 2000). These schemes take the values of the resolved fields as input and compute the feedback of the unresolved process as an output.
In the literature, many cloud schemes are formulated as one-or two-moment schemes, predicting the mass mixing-ratio and, in case of a two-moment scheme, also the number concentration of the cloud species considered in the scheme (Khain et al.,5 2015), e.g. the number of cloud droplets per unit mass of dry air. However, at the moment no universal governing equation is available to describe the evolution of a cloud across all involved scales, explaining the existence of different formulations of the cloud processes. In addition, cloud schemes typically contain parameters with uncertain values, which may be introduced by artificial parameters or limited observational evidence of the precise value. A typical example for an uncertain parameter is the autoconversion rate, encoding the efficiency of colliding cloud droplets to form large and falling rain drops (Khain et al., 10 2015). Historically, the autoconversion rate was already introduced by Kessler, who advocated the partitioning of the whole size spectrum of droplets into small cloud droplets, with negligible sedimentation velocity, and falling rain drops (Kessler, 1969).
Most cloud schemes adopt the partitioning of the size spectrum, especially all schemes of Kessler-type. The cloud scheme to be described in Section 3 below is also of Kessler-type. More examples of processes with uncertainties in their description and of how uncertainties enter the description of clouds are discussed in Khain et al. (2015). 15 In any case, uncertain parameters introduce uncertainty in the cloud scheme at hand, and ultimately into the numerical model as a whole. To assess the uncertainty of the parameters, one usually performs sensitivity studies, e.g. by running ensemble simulations, where each ensemble member employs slightly different values for the parameters. However, usually the number of ensemble members is limited to small values posing an additional challenge to extract the desired signal from the simulations.
In this study, we propose the use of "algorithmic differentiation" (AD) as another way of identifying the parameters with 20 largest sensitivity. Although this method is well-known in computer science and engineering, its potential in meteorological contexts, and cloud physics in particular, is not yet fully exploited. Most applications of AD in meteorology are limited to individual studies investigating model sensitivities (usually with respect to initial values) by using adjoint models (e.g. Bischof et al., 1996b;van Oldenborgh et al., 1999;Kaminski et al., 1999;Xiao et al., 2008;Rauser et al., 2010;Zhang et al., 2013;Belikov et al., 2016) or studies targeting at applications in data assimilation (e.g. Le Dimet et al., 2002;Blessing et al., 2014). 25 We will introduce the technique in Section 2, but, in a nutshell, it provides the derivative of a given computer code with respect to selected parameters. A cloud scheme may be described as a function f , taking the flow characteristics y from the given gridbox together with parameters η as an input, and computing the feedback z of the cloud, i.e. z = f (y, η), as an output.
Assessing the sensitivity of the output z with respect to the parameters η amounts to computing the derivative dz dη . AD helps in evaluating this derivative by computing the derivative of the function f with respect to the parameters, where the hat-notation 30 indicates the implemented version of the mathematical function f in some computer code. Using this technique can help in the development of cloud schemes by providing the respective derivatives to machine accuracy in an automated fashion, i.e. without implementing finite difference approximations of f .
Recently, a field called "Uncertainty Quantification" emerged in mathematics as a more systematic combination of (numerical) analysis and statistics in order to study the propagation of uncertainties (e.g., Sullivan, 2015;Le Maître and Knio, 2010).

35
Although powerful methods for the investigation of uncertainties already exist, their practical use often limits the number of the considered uncertain parameters due to the curse of dimensionality, see Chertock et al. (2019) for an application in the context of cloud physics. Therefore, it is valuable to first identify the parameters with highest sensitivity using AD in order to limit a more rigorous or extensive investigation to these parameters.
We emphasize, that AD is not related to a specific application (as a cloud scheme) nor to a specific programming language.

5
Although we use an implementation in C++ together with an AD-tool suited for this language, AD-tools for other languages like Fortran are available (e.g. Bischof et al., 1996a). A list can be found on www.autodiff.org.
In this study, we will first explain the concept of algorithmic differentiation in Section 2, introduce the warm cloud scheme used for illustration purposes within an air parcel framework in Section 3 and show results in Section 4. Some concluding remarks are given in Section 5. 10 2 What is Algorithmic Differentiation?
Algorithmic differentiation (AD) is a mathematical theory that describes how the computation of derivatives in a computer program can be automatized. It was developed already in the early 80's and rediscovered several times over the past years. The most known resource is the book of Griewank and Walther (2008). A nice introduction is given by Neidinger (2010).
For the purpose of computing the derivative of a program, it is considered as a sequence of simple elemental (or intrinsic) 15 functions including the sine, cosine, multiplication, division, and addition. The concept of AD is to consider the given program as a composition of elemental functions and, by applying the chain rule, to compute its directional derivative by accumulating the derivatives of each elemental function within the program code. It is important to stress that AD does not generate a generalized representation of the derivative of the computer program. Instead, AD computes the derivatives alongside the execution path. The path might change due to conditional instructions within the code.

20
As an example, assume the computer program to be differentiated is given by where a, b, c, d are the input variables and w is the output. This program can now be split into elemental functions which yields the intermediate steps t 1 , t 2 required by AD as follows: (2) 25 For the application of the chain rule the Jacobian matrix has to be computed for each of these intermediate steps with respect to all input variables which is quite simple, e.g. for t 1 = t 1 (a, b, c, d) = a + b the matrix is (1, 1, 0, 0) since ∂t1 ∂a = ∂t1 ∂b = 1 and ∂t1 ∂c = ∂t1 ∂d = 0. However, for notational convenience it is common to drop the indication of the formal dependence of the elemental function t 1 and its derivatives on the input parameters c, d since they are unchanged in t 1 . In order to compute the directional derivative, the Jacobian matrix is multiplied with the desired direction (ȧ,ḃ,ċ,ḋ) T , where the dot notation is used in the AD theory to describe the corresponding derivative direction of a variable. Applying this process to the full procedure (2), the result iṡ t 1 =ȧ +ḃ, since, e.g., where we again neglected the formal dependency of t 1 on c and d. By computing the corresponding directional derivative statements in procedure (3) alongside the original statements in procedure (2), the directional derivative is computed for the whole computer program (1). Note that the choices (ȧ,ḃ,ċ,ḋ) = (1, 0, 0, 0) and (ȧ,ḃ,ċ,ḋ) = (0, 1, 0, 0) for the input directions for the AD computation results in the computation of the partial derivatives ∂w ∂a and ∂w ∂b .

10
The example shows the application of the forward AD mode on a simple computer program. The general theory of AD assumes that the elemental functions are evaluated at argument values where they are differentiable in a neighborhood of the argument value. Actually, many common elemental functions are differentiable everywhere as, e.g., the addition or multiplication, but there are elemental functions which are only differentiable except for some critical points. Examples for elemental functions with a critical point are the square root, the absolute value or division, which are not differentiable at the origin. It 15 mainly depends on the AD tool how these difficulties are treated. The AD tool "CoDiPack" used in this study provides the preprocessor option CODI_CheckExpressionArguments to throw an exception if such a critical point is encountered. In any case, the exact derivative of each elemental function, at least apart from the critical points, may be written down explicitly.
Note that conditional instructions (e.g. if-else switches) do not pose any problem, since this only alters the program path, i.e.
the specific sequence of elemental instructions which are executed. Since the whole computer program is a composition of the 20 differentiable elemental operations, the chain rule states that also the whole program is differentiable (apart from the critical points). Moreover, if the program is represented by the function f : R n → R m , the so-called forward mode of AD computeṡ In Equation (5), d f dx (x) is the Jacobian of the full program at state x, vectorẋ is the direction for which the program derivative is desired along the computational path and the variable x subsumes all input variables. In the notation of Section 1, we have 25 x = (y, η), i.e. x contains the cloud model and thermodynamic variables y together with the inherent parameters η. In this terminology, also an inherent parameter of the (cloud) model is now considered as a parameter, if this parameter is to be investigated. Equation (5) just states which result is computed by the forward mode of AD and not how it is computed. The evaluation of the derivative is done alongside of the primal computation of f (x) by applying Equation (5) to each elemental operation as 30 in the above example, i.e. the full Jacobian is never computed explicitly but accumulated by considering each code statement individually.
The second operation mode of AD is called the reverse AD mode. As will become clear, the reverse mode can be introduced by multiplying an adjoint direction from the left side to the derivative representation of the computer program, instead of multiplying a derivative direction from the right as in Equation (5). This yields the general equationx T =z T d f dx (x). The 5 standard AD notation for the adjoint of a variable v is the bar notationv and may be thought of as containing the immediate derivative of the current statement with respect to the particular variable.
As an example, for the statement w = t 1 * t 2 , the AD reverse mode evaluation is The information flow in Equation (6) is reversed for the adjoint variables: the input variable isw whilet 1 as well ast 2 are 10 output variables. Because of this reversal of the information flow, all reverse AD statements need to be evaluated in the reversed order. The reverse of the last statement of the program code f will be evaluated first, the second last statement as second, and so on. The reverse AD procedure for the example procedure (2) is then As discussed above, the statements from procedure (2) are now handled in the reverse order. The valuesā,b,c andd contain 15 the derivatives of w with respect to themselfs. Takingd as an example, according to the chain rule this is and the value of t 1 is taken from the primal evaluation of the program. By choosingw = 1 as input for the reverse AD mode, the adjoint variabled contains the derivative of procedure (2) with respect to the input d.
The example (7) shows the application of the reverse AD mode on a simple computer program. According to the general 20 theory of AD, given a computer program which can be represented by the function f : R n → R m , the reverse AD mode where, again, d f dx (x) denotes the Jacobian of f and d f dx (x) T its transpose whilez represents the desired direction for the derivative. The equation (9) again just states the result that is computed by the reverse mode of AD and not how it is computed. 25 The actual evaluation of the derivative is done without forming the Jacobian and instead by storing informations during the primal computation of f (x). Afterwards, a reverse sweep over the stored information is performed. This reverse sweep applies a slightly modified version of Equation (9) to each elemental function as in the above example.
Both operation modes of AD are connected via the discrete adjoint operator. Let ·, · n and ·, · m denote the Eucledian scalar products in R n and R m , respectively. We now select an arbitrary directionz ∈ R m , which we apply to the result of the 5 forward mode. This yields the equality by shifting the Jacobian matrix of f to the left side of the scalar product. This shows that the reverse mode is the discrete adjoint of the forward mode (see, e.g., Kalnay, 2003, for the use of adjoint models in atmospheric data assimilation).
The advantage of the reverse mode becomes clear, if we assume that we want to compute the full gradient of a function 10 f : R n → R m . In case of m = 1, i.e. a computer program with n input and a single output variable, the full gradient d f dx (x) is a matrix with n columns and one row, hence its transpose is a matrix with n rows and a single column. Since m = 1, the directionz is a vector with a single entry. Consequently, we obtain the resultx by computing (9) exactly once with the single inputz = (1) ∈ R 1 . Using the forward mode of AD, we infer from (5) that the computation of the full gradient requires n subsequent computations with the choicesẋ = e 1 ,ẋ = e 2 , . . . ,ẋ = e n and e i ∈ R n denotes the i-th unit vector. If n is large, 15 these n subsequent computations require much more computational effort than a single (but slightly more costly) computation using the reverse mode of AD.
In contrast, if n = 1, i.e. the computer program has a single input and m output variables, the forward mode of AD computes the full derivative with a single computation by choosing the 1 × 1-matrixẋ = (1) as input direction, whereas the reverse mode needs m computations; being costly for large values of m.

20
An alternative approach to compute the derivative of f : R n → R m in the direction d ∈ R n is to apply the finite difference with a (small) stepsize t > 0, requiring two evaluations of the program f . Instead of the approximation (11), one could alternatively choose a finite difference approximation of higher order (e.g. Grossmann and Roos, 2007), but these typically need 25 even more program evaluations. In contrast to AD, the finite difference approach requires the choice and tuning of the stepsize t > 0 to achieve the desired accuracy of the derivative. Moreover, the optimal value of the stepsize will in general depend on the selected direction d and the state x (see Elizondo et al., 2002, for a comparison with AD). These issues render the finite difference approach as quite unattractive but due to its simplicity it is often used, accepting all drawbacks of the method.
If f is a linear function, then an arbitrary stepsize t can be chosen for all directions. For non-linear functions, t should be 30 as small as possible to achieve the desired accuracy but large enough to avoid cancellation errors due to the difference in (11).
AD has the advantage of not having to choose and tune a stepsize. Since the derivative of each elemental function is known exactly and AD applies the chain rule, the computed derivatives are accurate up to machine precision (Griewank et al., 2012).
Moreover, using the reverse mode for computing the full gradient in case of only a small number of output variables, AD has the potential of being several times faster compared to using finite difference approximations.
AD is introduced into computer programs mostly in two ways: Either through operator overloading or through source 5 transformation. For C++, the majority of tools (as listed at www.autodiff.org) use the operator overloading approach which is also used by the AD tool CoDiPack (Sagebaum et al., 2017a) developed by the authors from the Scientific Computing group in Kaiserslautern and employed in this study. CoDiPack uses expression templates to reduce the amount of required information for the reverse AD mode. The data layout of the information is such that a minimal memory footprint is required and caching strategies of the processors can be applied. The general focus of CoDiPack is its application in High Performance Computing The source transformation approach is mostly used in Fortran source codes. Here, the code is parsed and new code is generated which adds the additional statements for the forward or reverse AD mode. Tapenade (e.g., Hascoët and Pascual, 2013) is the most wide-spread tool for source transformations in Fortran. It is written in Java and supports nearly all features of older Fortran standards. The support for more modern features in newer Fortran versions is an ongoing development. 15 In general AD can be applied to any computer program. After an initial effort, the derivative computations can be automatized in the sense that every change in the code will immediately affect the primal computation and also the derivative evaluation.
How much time the initial effort requires depends strongly on the code and which AD tool is applied. A general effort on analyzing software and detecting problematic code constructs is done in Hück et al. (2015). In general, operator overloading tools are usually quite easy to introduce into a code if it is well written and there is a distinct place where the computation type 20 of the program can be defined. Source transformation usually requires a much larger effort. In both cases an early introduction of AD into the code reveals wrong implementation assumptions and yields a cleaner code.

The Warm Cloud Scheme
As an application for AD, we consider a slightly generalized one-moment scheme for warm cloud microphysics, i.e. liquid clouds without ice, within a zero-dimensional air parcel framework. One-moment schemes are designed to predict the temporal 25 evolution of the mass of non-sedimenting cloud droplets, rain drops and water vapor, i.e. the mixing-ratios q c = Mc Ma , q r = Mr Ma and q v = Mv Ma where M c is the mass of cloud droplets, M r the mass of rain drops, M v the mass of water vapor and M a the mass of dry air. One-moment schemes have a long history and are governed by the classical partitioning of the droplet spectrum into non-sedimenting cloud droplets and larger rain drops, which fall down due to the gravitational acceleration (Kessler, 1969).
Although these schemes remain the default choice in many computational models, for example, the operational numerical 30 weather forecast models IFS (ECMWF, 2017), run by the European Center for Medium Range Weather Forecast (ECMWF), and COSMO (Doms et al., 2011), run by the German Weather Service (DWD), much consensus exists that two-moment schemes are in general more accurate (Igel et al., 2015). The difference between a one-moment and a two-moment scheme is that the two-moment scheme does not only predict the evolution of the mass mixing-ratios but also the corresponding number concentrations.
The one-moment warm cloud schemes of the IFS and the COSMO model may be written in generic form as (see Rosemeier et al., 2018;Porz et al., 2018) with the coefficients c, a 1 , a 2 , e 1 , e 2 , d, the exponents γ, β c , β r , δ 1 , δ 2 , ζ and the saturation ratio S = pv psat , comparing the 10 partial pressure p v of water vapor to the saturation vapor pressure p sat over a flat surface of water. The scheme includes the following processes: (i) Condensational growth of cloud droplets; (ii) Autoconversion, describing the formation of rain drops by colliding cloud droplets; (iii) Accretion, describing the collection of cloud droplets by falling rain drops; (iv) Evaporation of rain drops; (v) Sedimentation of rain drops out of the grid box. The term B subsumes the flux of rain drops falling from above into the considered gridbox. The splitting of the evaporation term in (12d) is due to the appearance of the ventilation 15 factor in the diffusional growth equation for the rain drops, taking a non-uniform distribution of water vapor around the falling rain drop into account.
The major differences of Equations (12) to the actual scheme in the operational model are the formulations of the sedimentation process as a sum of the incoming and outgoing fluxes B and dq ζ r , and the use of an explicit condensation term, which is usually circumvented by employing a saturation adjustment scheme (Asai, 1965;McDonald, 1963;Langlois, 1973;Soong 20 and Ogura, 1973; Kogan and Martin, 1994). Note, that the values of the coefficients may also depend on the environmental conditions. Also note, that the formulation in (12) does not contain a term for the activation of new cloud droplets. Within the operational models, the activation of cloud droplets is done with the help of the saturation adjustment, where the excess water vapor is converted into mass of cloud droplets, thus always activating the maximal number if not restricted otherwise.
After choosing an appropriate set of coefficients and exponents, Equations (12) represent a cloud scheme for a warm cloud 25 in the spirit of Kessler, although not every choice of parameters yields a physically meaningful scheme. Apart from the condensation term, the parameterizations of the other processes are not based on purely physical reasoning. Rather the structure of the terms represent in some sense ad-hoc formulations and approximations, but may also be motivated in the sense of population dynamics. The values of the coefficients and the exponents are usually obtained by fitting to observational data or results of detailed models (e.g. Khairoutdinov and Kogan, 2000). In any case, the precise values of the coefficients and the exponents are uncertain to some degree.
For simplicity, we consider the cloud scheme (12) within an adiabatic air parcel, providing a natural framework to start with in the development of cloud schemes. The closure of (12) is given by the evolution equations for pressure p and temperature T 5 dp dt where g denotes the gravitational acceleration, R = R a 1 + 1−ε ε qv 1+qv the gas constant for moist air, R a the gas constant for dry air, R v the gas constant for water vapor, ε = Ra Rv the quotient of the gas constants of dry air and water vapor, w the vertical velocity and L the latent heat of vaporization. The coefficient c of the condensation is given by for the condensational growth of a cloud droplet. In (15), K denotes the thermal conductivity of dry air and D the diffusivity 15 of water vapor in air. Note that the choice of a fixed constant cloud droplet number N c in (14) is motivated by the default Kessler-type warm cloud scheme of the IFS model (ECMWF, 2017).
Using the notation introduced in Section 1, the combined discretization of the governing Equations (12) and (13)  together with the parameters η = (a 1 , a 2 , e 1 , e 2 , d, γ, β c , β r , δ 1 , δ 2 , ζ) to compute the state of the system at the new time-level, i.e. computing z = f (y, η). Implementing f yields the function f , from which AD can compute the derivatives with respect to the parameters η.
4 Application of Algorithmic Differentiation 25 We implemented the air parcel model in C++ and discretized the ordinary differential equations using the classical explicit Runge-Kutta method of order four (Hairer et al., 1993), although any other method could be used as well. We chose the values of the parameters according to the warm rain scheme used in the operational forecast model IFS (ECMWF, 2017).
In this case, all exponents are independent of the environmental conditions and only the coefficients e 1 , e 2 of the evaporation depend on the environmental conditions. Table 1

Cloud Formation in Updraft
As the first example we consider an updraft velocity w = 1 m s −1 , the initial conditions 15 (q c (0), q r (0)) = 10 −10 kg kg −1 , 0 kg kg −1 for the mixing-ratios, and S(0) = 1 for the saturation ratio, integrated for 1950 s. The reason of not choosing q c (0) = 0 kg kg −1 is that Equation (12a) needs a non-zero initial value to allow a non-constant and non-zero solution. Physically, this reflects the fact that the cloud scheme (12) does not include a precise activation mechanism which predicts the number of activated cloud droplets as a function of saturation ratio. Such a mechanism is not included in the operational model due to their usage of a 20 saturation adjustment scheme (e.g. Kogan and Martin, 1994). Figure 1 shows the temporal evolution of the saturation ratio S (left panel), the cloud droplet mixing-ratio q c (middle panel), and the rain drop mixing-ratio q r (right panel). Apparently, the saturation ratio increases initially due to the adiabatic cooling, until cloud droplet mass increased enough to balance the source for saturation ratio from the adiabatic cooling by the diffusional growth of the cloud droplets. Since autoconversion is the only process to form rain, its formation starts after enough cloud droplet mass is available. The time interval between 900 s and 1200 s is a transition region, where the influence of autoconversion decreases while the influence of accretion increases, i.e. falling rain drops start to effectively collect cloud droplets and consequently increase the rain drop mass mixing-ratio q r . This may also be seen directly from the model equation (12): Accretion is modelled as a 2 q βc c q βr r , hence an increase of rain drop mass directly increases the influence of this process 5 and the further decrease in cloud droplet mass may be attributed to accretion. At about 1200 s the saturation ratio starts to increase again, since the decreasing cloud droplet mass diminishes the sink for water vapor due to their condensational growth.
Note that, according to Equation (12d), the evaporation term is inactive for supersaturated conditions with S ≥ 1.
The derivatives of the mixing-ratios with respect to the coefficients are shown in Figure 2, whereas Figure 3 shows the derivatives with respect to the exponents. As the upper left panel in Figure 2 shows, the coefficient with the largest sensitivity 10 for the cloud droplet mass q c up to about 1000 s is the coefficient a 1 for autoconversion (red curve). The large sensitivity during the initial stage of the cloud evolution implies that the main loss of cloud droplet mass can be attributed to the autoconversion process, rendering the autoconversion rate a critical parameter. Given that autoconversion is the only process to produce rain drops out of the cloud droplets, this result may be anticipated. The negative value of the derivative ∂qc ∂a1 indicates a decrease in q c if the value of a 1 is increased by a small amount, i.e. a larger autoconversion rate results in a faster decrease of the cloud 15 droplet mass. The transition region between 900 s and 1200 s, where the influence of autoconversion decreases and accretion increases, is also clearly visible as a decreasing magnitude of ∂qc ∂a1 and an increasing magnitude of the derivative ∂qc ∂a2 of cloud droplet mass mixing-ratio with respect to the accretion parameter a 2 (blue curve).
Inspecting the upper right panel in Figure 2, we observe a positive derivative ∂qr ∂a1 of the rain mixing-ratio with respect to the autoconversion coefficient with the same magnitude as ∂qc ∂a1 . This simply resembles the mass continuity, since a faster 20 autoconversion implies a faster decrease in the cloud droplet mass q c and an equally fast increase in the rain drop mass q r . The same is true for the accretion, i.e. the derivatives ∂qc ∂a2 , ∂qr ∂a2 (blue curves in the upper panels).  qr with respect to the coefficients but without the derivative ∂qr ∂d with respect to the sedimentation coefficient; the lower panel shows the derivatives of qr including the derivative with respect to the sedimentation coefficient d (purple curve). All panels correspond to the first case of an ascending air parcel with no pre-existing cloud.   The derivatives ∂qc ∂e1 , ∂qc ∂e2 of the cloud droplet mixing-ratio with respect to the rain-evaporation rate coefficients e 1 , e 2 as well as the derivatives ∂qr ∂e1 , ∂qr ∂e2 of the rain mixing-ratio with respect to the same coefficients are identically zero, being consistent with the fact, that the evaporation term is inactive within a supersaturated cloud parcel, see Equation (12d). However, the purple curve in the upper left panel, representing the derivative ∂qc ∂d of the cloud droplet mixing-ratio with respect to the sedimentation coefficient, is not identically zero although the sedimentation term is absent in Equation (12a) for the cloud droplet mixing-5 ratio q c . This is an example of an indirect sensitivity of q c to this coefficient: Altering the sedimentation coefficient modifies the sedimentation rate which obviously directly affects the rain mixing-ratio q r . This in turn feeds back to the cloud droplet mixing-ratio q c since the rain mixing-ratio q r enters Equation (12a) through the accretion term. We conclude that the AD methodology is able to detect such indirect effects. Moreover, as may be concluded from the left panel in Figure 2, this indirect sensitivity could easily be masked due to the comparable magnitude of the positive sensitivity ∂qc ∂d (purple curve) and the 10 negative sensitivity ∂qc ∂a2 (blue curve). The lower panel in Figure 2 shows the derivatives of q r with respect to all coefficients, in particular also with respect to the sedimentation parameter d (purple curve; this curve was not included in the upper right panel). From this figure it is evident, that q r is most sensitive to the sedimentation coefficient. Comparing the different scalings of the ordinate in the upper right and the lower panel corroborates this result. To summarize, at the beginning of this simulation, the most sensitive parameter 15 is the autoconversion rate to create rain drop mass. Towards the end of the simulation, enough rain drop mass formed and the sedimentation becomes more important, actually much more important than the autoconversion or the accretion since the derivatives ∂qr ∂a1 , ∂qr ∂a2 and ∂qr ∂d differ by almost four orders of magnitude. Figure 3 shows the derivatives of the mixing-ratios q c , q r with respect to the exponents. For both mixing-ratios, obviously the exponents β c , β r from the accretion process are most sensitive (blue and green curves).  Note that the sign of the curves is counter intuitive, because, e.g., positive values of the blue and green curves imply slower accretion after increasing the values of those exponents by a small amount. This behaviour is easily explained with the values of both mixing-ratios being smaller than unity: Increasing the exponent in the expression a 2 q βc c q βr r with 0 ≤ q c , q r < 1 leads to decreased values of the expression a 2 q βc c q βr r and consequently to a slower accretion process. It is important to keep such a behaviour in mind in interpreting the derivatives.

5
The second most sensitive exponent for the rain mixing-ratio is given by the exponent ζ from the sedimentation process, but only after enough rain drop mass has formed after about 1000 s. The magnitude of the derivatives with respect to ζ and the accretion exponents β c , β r are comparable and of opposite sign. Therefore, the influence of increasing these exponents simultaneously may cancel out.
Observe that the derivatives of both mixing-ratios with respect to the exponents δ 1 , δ 2 in Figure 3 are exactly zero, again 10 resembling the in-activeness of the evaporation term in Equation (12d) within the supersaturated cloud parcel.
As was already indicated, AD computes the derivatives of the implemented code, which in our case involves a numerical method, rather than derivatives of the (unknown) continuous solution of the governing differential equation. This distinction is important for the interpretation of the computed sensitivities and we are now going to discuss the subtleties connected with them. 15 First of all, we outline the fact, that since the numerical discretization of the governing differential equation depends on the timestep ∆t, the magnitude of all computed derivatives also depend on the timestep. Figure 4 illustrates this fact by showing the derivative ∂qc ∂a1 of the cloud droplet mass on the parameter a 1 for autoconversion for several timesteps. In this figure, the dependency on ∆t becomes obvious, however, the shape of the curve does actually not change. Rescaling the curves in Figure   4 by multiplying the values of the blue curve by 0.1 0.01 = 10, the values of the green curve by 0.1 0.001 = 100, and the values of the yellow curve by 0.1 0.0001 = 1000, all rescaled curves do coincide with the red curve (not shown), representing the derivative computed with timestep ∆t = 0.1. Although, this scaling with the timestep is quite intuitive due to the fact that if the timestep is larger, the simulation is done for a larger time interval and the sensitivities are expected to increase, one should be aware of 5 this scaling when trying to interpret the magnitudes of the derivatives.
A second issue regarding the meaning of the computed sensitivities is addressed in Bischof and Eberhard (1999). There are two possibilities of computing sensitivities: 1. Apply AD on the implemented code, which numerically approximates the unknown solution of the governing differential equation, or 10 2. derive an analytical differential equation (called the sensitivity or variational equation as an analogue of the forward AD mode or the adjoint equation as an analogue of the reverse AD mode), describing how the sensitivities evolve with time (see, e.g., Sandu et al. (2003) or Walther (2007) for details). In this approach, AD may enter the computation of the derivatives of the right-hand side of the governing equation.
The essential difference between both possibilities is, that the first approach explicitly includes the numerical method in the 15 computation of the sensitivities, while the second approach excludes any numerical method in the first place and instead creates an analytical differential equation, e.g. the sensitivity equation, for the description of the sensitivities and only in a second step approximates the exact solution of this equation using a numerical method.
In our study we choose the first option and include the numerical method in the differentiation, since this seems more appropriate if one is interested in how the sensitivities propagate through a given, already implemented numerical model. 20 We use a one-step numerical method with constant timestep ∆t to approximate the ordinary differential equation, hence the numerical solution y new at the new time-level is connected to the old solution y old at the old time-level by y new = y old + ∆t · Φ(y old , η, ∆t), where η represents the parameter vector as in Equation (17) and Φ is the numerical method. Moreover, we compute the derivatives at each timestep separately, i.e. the approximation y old is considered as independent of the parameters at the current time-level, hence ∂y old ∂η = 0. Consequently, in our case the AD methodology factually computes the derivative showing once again the observed dependency of the computed sensitivities on the timestep. In addition, (19) shows explicitly that the numerical method Φ is included in the differentiation, thus the computed sensitivities depend on the numerical method.  Table 2. Ratios of the derivatives of the mixing-ratios with respect to different parameters at t = 1000 s, computed for several timesteps ∆t.
All numbers are taken from the computations of the first case of an ascending air parcel with no pre-existing cloud. The numbers are rounded to four digits.
Note that we employed a constant timestep which, according to the results in Bischof and Eberhard (1999), ensures that the computed sensitivities are indeed approximations of the sensitivities of the unknown, exact solution. If we had used a numerical method with adaptive timestepping, this is not necessarily true, since in this case the current timestep depends on the solution which in turn depends on the parameters η, implying ∂∆t ∂η = 0 in general. Again, if one is interested in an approximate computation of the sensitivities of the exact solution, a correction is needed for this non-zero contribution of ∂∆t ∂η = 0, see 5 Bischof and Eberhard (1999). However, if one is interested in the sensitivities of the implemented code, no correction is needed.
Finally, it is not obvious that the computed sensitivities, which depend on the timestep, converge to the sensitivities of the exact but unknown solution in the limit ∆t → 0. In our case, since we do rely on an explicit Runge-Kutta method of order 4, the desired convergence is established in Walther (2007). An extension to more general Runge-Kutta method for the adjoint equation is presented in Sandu (2006).
Although the magnitude of the computed derivatives depend on the timestep of the numerical method, the relative magnitudes of the individual derivatives are independent of the timestep. Table 2 highlights this observation by comparing the ratios of some derivatives of the mixing-ratios at t = 1000 s, computed with several timesteps ∆t. The ratios of the derivatives shown in Table 2 are indeed approximately constant (except for effects of a coarse time resolution), implying that these ratios are 15 indeed independent of the timestep. Therefore, the derivative with respect to the most sensitive parameter will show the largest magnitude compared to the derivatives with respect to the other parameters, regardless of the chosen timestep of the numerical method involved. This retains the possibility to use the computed derivatives to identify the most sensitive parameters of the cloud scheme.

20
As the second example, we consider a downdraft with velocity w = −1 m s −1 , the initial conditions (q c (0), q r (0)) = 2 · 10 −4 kg kg −1 , 10 −4 kg kg −1 for the mixing-ratios and S(0) = 1.01 for the saturation ratio, representing an initial supersaturation of 1 %. The temporal evolution of the saturation ratio and the mixing-ratios are shown in Figure 5. The downward vertical motion of the air parcel causes the saturation ratio to decrease due to the adiabatic heating. However, until the complete evaporation of the cloud droplets at about 175 s (see the middle panel in Figure 5), the release of water vapor of the evaporating cloud droplets counteracts the decrease of the saturation ratio and keeps the air parcel only slightly subsaturated (left panel in Figure 5). After roughly 175 s, the saturation ratio decreases continuously, resulting in a substantially subsaturated air parcel. Consequently, the 5 available rain drops do not only sediment out of the air parcel but also evaporate due to the subsaturation (right panel in Figure   5). However, the release of water vapor of the evaporating rain drops is seemingly not able to counteract the decrease of the saturation ratio as was the case for the evaporating cloud droplets at the beginning of the simulation, but the precise sensitivities of rain drop evaporation and sedimentation cannot be deduced from the temporal evolution of the mass mixing-ratio q r .
The temporal evolutions of the derivatives of the mixing-ratios with respect to the coefficients are shown in Figure 6. 10 Contrary to the first case from Section 4.1, the air parcel rapidly becomes subsaturated with S ≤ 1 and the evaporation process in Equation (12d) is now active, hence no derivative is identically zero. Inspecting Figure 6, the most sensitive coefficient for both mixing-ratios is e 2 (yellow curve), corresponding to the ventilation coefficient within the formulation of the evaporation process of the rain drops, see Equation (12d). This result may be anticipated regarding the rain drop mass mixing-ratio q r , because the air parcel is subsaturated and the evaporation process directly affects the rain drop mixing-ratio. However, we 15 observe also a large sensitivity of the same parameter on the cloud droplet mixing-ratio q c . This feedback is, again, an indirect sensitivity originating from the accretion process: If the rain drop mass decreases faster due to a slight increase of the coefficient e 2 , the accretion gets slower and therefore less cloud droplets get collected by the falling rain drops. Consequently, the decrease of cloud droplet mixing-ratio is diminished.
The right panel in Figure 6 also shows that the second most sensitive coefficient for q r is given by the sedimentation rate 20 coefficient. This observation also answers the question which process is more sensitive to changes in its rate coefficient for the decrease of rain drop mass, seen in the right panel in Figure 5.  (yellow and purple curves in the right panel in Figure 6), a slight change in the evaporation rate coefficient e 2 will result in larger responses than a change in the sedimentation rate.
Although not visible in the left panel in Figure 6, the derivatives with respect to the coefficients a 1 , a 2 , d are all of about the same magnitude, while the sensitivity to the second evaporation coefficient e 1 is significantly smaller. 1e 6 qr,1 qr,0.9 qr,1 qr,1.1 Figure 8. Difference between the reference run with unperturbed coefficient e2, denoted as qx,1 for x ∈ {c, r} and using the perturbed second evaporation coefficient s · e2 with s ∈ {0.9, 1.1}, denoted as qx,0.9 or qx,1.1, for the cloud droplet mixing-ratio (left panel) and the rain mixing-ratio (right panel), respectively.
Inspecting Figure 7, illustrating the derivatives of both mixing-ratios to the exponents, the most sensitive exponents for the cloud droplet mixing-ratio q c are again the exponents β c , β r corresponding to the accretion process (blue and green curve in the left panel). For the rain drop mixing-ratio q r (right panel), the most sensitive exponent changes from the sedimentation exponent ζ (cyan curve) at the beginning to the exponent δ 2 , occurring in the second term e 2 q δ2 r of the evaporation term, being consistent with the large sensitivity of the corresponding rate coefficient e 2 , see the yellow curve in the left panel of Figure 6. 5 To summarize the second example: The AD methodology pinpoints the second summand e 2 q δ2 r of the evaporation term together with the exponents β c , β r of the accretion process to introduce the largest sensitivity in the model results. Although one could find the same sensitivities using classical sensitivity studies instead of AD, the AD methodology provides an immediate hint on, e.g., the sensitive coefficient e 2 for both mixing-ratios, see Figure 6, without having to carry out multiple model runs, where one perturbs each coefficient of the cloud scheme separately, one after the other. Moreover, the sensitivity of the cloud 10 droplet mixing-ratio q c to e 2 is indirect, rendering it difficult to identify this sensitivity directly using the ensemble approach, in particular because the governing Equation (12a) provides no hint due to the absence of coefficient e 2 . Given that even the simple cloud scheme (12) already contains five rate coefficients, perturbing each coefficient within a separate model run results in a significant total number of runs.
After the identification of the most sensitive parameters using AD, one can carry out further model runs, targeted at the 15 parameters which were identified beforehand. Figure 8 illustrates a possible further analysis step. It shows the difference between an unperturbed run, denoted by q x,1 with x ∈ {c, r} and two further runs with perturbed rate coefficient s · e 2 instead of e 2 , where s ∈ {0.9, 1.1} is a scaling parameter. Observe that the signal is consistent with the derivative, computed by AD, in Figure 6: The derivative ∂qr ∂e2 is negative, hence a slight increase of the coefficient should result in a smaller rain mixingratio q r and, consequently, the difference q r,1 − q r,1.1 should be positive. Similarly, the derivative ∂qc ∂e2 is positive, hence a slight increase of the coefficient should result in negative values for the difference q c,1 − q c,1.1 . Figure 8 shows exactly these tendencies (blue curves). The red curves show the resulting differences using the scaling parameter s = 0.9; note that the curves are not symmetric to each other.

Dependency on the Model Trajectory
After having discussed both exemplary cases individually, we now point at another important aspect of the AD methodology.
Given that AD was applied to the exactly same computational code, a comparison between, e.g., the derivatives of the cloud droplet mixing-ratio q c with respect to the rate coefficients (see the upper left panel in Figure 2 and the left panel in Figure   6) reveals that the corresponding curves are not equal to each other, but show a significant different behaviour. The only 10 difference between both examples were the values of the initial conditions. Consequently, the model trajectories between both runs evolved differently despite the fact that the computational code was unchanged. This observation is a crucial aspect of the AD methodology since it underlines that AD computes the derivative of the model, i.e. the computational code, along the particular model trajectory, rather than providing the derivatives for each possible state of the model. Therefore, the AD approach can provide the desired sensitivities for the particular evolution of the model state, posing the same limitations as the 15 computation of the derivatives using the aforementioned finite difference approach.

Conclusion
In this study, we presented and applied the technique of algorithmic differentiation (AD) in the context of cloud schemes, representing an important example of a subgrid parameterization of numerical models within the atmospheric sciences. In the literature, many different cloud schemes are suggested since at the moment, a universal governing equation for the full 20 description of a cloud is not available (in contrast to the Navier-Stokes equation for the description of a non-reacting flow), making it not possible to derive cloud schemes from a common universal basis. As a consequence, many ad-hoc assumptions are made within the formulations of the cloud processes typically leading to the introduction of uncertain parameters.
We propose the use of algorithmic differentiation in the development of cloud schemes in order to identify the most sensitive parameters in the adopted formulation along the simulated solution trajectory. The AD methodology is based on the 25 observation that each computer code is a large composition of only a few differentiable elemental operations, hence, by the chain rule, the code itself is differentiable apart from critical points, where the elemental functions are not differentiable. Since the derivatives of the elemental operations are known, the full computational code can be differentiated in a (semi-)automatic fashion. Moreover, the resulting derivatives are accurate to machine precision because the AD approach merely evaluates the exact derivative. 30 In the context of sensitivity studies, the AD approach yields the desired sensitivities of the parameters on the result of the computation requiring only a constant additional computational effort, see Griewank and Walther (2008). The forward mode of AD roughly doubles the number of code instructions since each statement is complemented with its derivative, hence also roughly doubles the code execution time for each run. Note that the forward mode needs the specification of the desired direction for the directional derivative in advance, thus it only computes a single directional derivative per run. This may rapidly become very computational intensive if there are many input parameters or one is interested in the full gradient. In this case one should use the vector forward mode which leads to a significant reduction in complexity. In contrast, the reverse 5 mode introduces more overhead than the forward mode for a single run, but the amount is independent of the number of input variables, among which are also the parameters to be investigated. Therefore, the reverse AD mode has the ability to outperform the forward mode in case of many input variables but only a small number of output variables because only a single run of the reverse mode is required to establish the derivatives with respect to each input variable, whereas the forward mode requires as many runs as the number of input variables. The fact that AD introduces only a constant computational 10 overhead is especially useful if the number of parameters is comparably high, since establishing an ensemble to investigate the sensitivity of the parameters quickly results in a high number of model runs. Moreover, using an ensemble of model runs to study parameter sensitivities additionally involves a thorough post-processing of all model output. In contrast, the AD approach clearly identifies the parameters with high sensitivity, regardless if the sensitivity is direct or indirect, and allows to focus a further post-processing on only the relevant parameters. 15 AD helps in computing the derivatives, but one has to keep in mind the subtleties: Using AD as advocated in this study implies that the computed derivatives involve the differentiation of the numerical method used to approximate the solution of the differential equation (12), (13). Consequently, the derivatives depend on the timestep and may change if the numerical method is replaced by another method. Nevertheless, the computed sensitivities resemble the actual sensitivities of the implemented code and comprise the correct result to tackle the question how uncertainties propagate through a given numerical model as, 20 e.g., a weather forecasting model.
We emphasize that the technique of AD is not restricted to a specific programming language nor to the analysis of cloud schemes. It is a generic technique which may help in the development of any (subgrid) scheme for a (geophysical) numerical model by providing informations about the sensitivities of the involved parameters.
In adopting a cloud scheme or any subgrid scheme, the question of how the inherent uncertainties of the scheme influence 25 the (numerical) solution of the model arises. In the context of the topic of this study, an example of the influences of a single parameter within a typical cloud scheme on the overall cloud development is discussed in Igel and van den Heever (2017a, b, c).
Answering the question of how uncertainties propagate within a given model is a highly non-trivial task. As already indicated in Section 1, methods from "Uncertainty Quantification" allow to assess this propagation (e.g., Sullivan, 2015;Le Maître and Knio, 2010), but taking many parameters simultaneously into account is challenging and computationally expensive. 30 Algorithmic differentiation allows to first identify the parameters which influence the result of a given parameterization at most, e.g. a cloud scheme, and limit the more extensive investigation to only these parameters; see Chertock et al. (2019) for an example of such an investigation. However, as AD by design computes the derivatives along the numerical solution trajectory, one might not detect all possible sensitivities, but at least the most sensitive parameters in "typical" situations.
Code availability. The code for the cloud model together with all scripts needed to reproduce the results in this study is available at https: //doi.org/10.5281/zenodo.3461483. The algorithmic differentiation tool "CoDiPack" may be found online at https://github.com/scicompkl/ codipack and the version 1.8.1, which is used in this study, is available at https://doi.org/10.5281/zenodo.3460682.
Author contributions. MB developed the cloud model code and carried out the numerical experiments under the supervision of AB and PS.
MS developed the algorithmic differentiation tool "CoDiPack" under the supervision of NRG. MB and MS prepared the manuscript with 5 reviews from NRG, AB and PS.
Competing interests. The authors declare that they have no conflict of interest.
Brinkmann acknowledge support by the DFG within the Transregional Collaborative Research Centre TRR165 "Waves to Weather" (www. wavestoweather.de), projects B7 and Z2.