Efﬁcient approximation of the incomplete gamma function for use in cloud model applications

. This paper describes an approximation to the lower incomplete gamma function γ l (a,x) which has been obtained by nonlinear curve ﬁtting. It comprises a ﬁxed number of terms and yields moderate accuracy (the absolute approximation error of the corresponding normalized incomplete gamma function P is smaller than 0.02 in the range 0 . 9 ≤ a ≤ 45 and x ≥ 0). Monotonicity and asymptotic behaviour of the original incomplete gamma function is pre-served.


Introduction
In cloud physics (and also in radar meteorology), it is common practice to use so-called gamma-distributions or generalized gamma distributions (Deirmendjian, 1975) to describe particle size distributions (PSD) of hydrometeors, either to fit observed distributions (Willis, 1984;Chandrasekar and Bringi, 1987) or to base parametrizations of cloud microphysical processes on it (e.g., Milbrandt and Yau, 2005b;Seifert and Beheng, 2006 and many others).As will be outlined below, this ansatz may lead to the necessity of computing ordinary and incomplete gamma functions.Particularly the incomplete gamma function poses certain practical computation problems in the context of cloud microphysical parametrizations used on supercomputers, which up to now hinders developers to apply parametrization equations involving this function.These problems can be alleviated by using a new approximation of this function that is introduced in Sect. 2 of this paper, which might ultimately lead to improvements in cloud microphysical parameterizations.
With regard to cloud-and precipitation particles, let y represent either the sphere volume equivalent diameter D or the particle mass m.Then, the distribution function f (r,t;y) describes the number of particles per volume at a specific location r at time t having mass/diameter in the interval [y,y + dy].Dropping the r-and t-dependence for simplicity, f (y) is said to be distributed according to a generalized gamma-distribution if it obeys f (y) = N 0 y µ exp −λy ν (1) with the four parameters N 0 , µ, ν and λ.If ν = 1, Eq. (1) reduces to "the" gamma-distribution.Note that if the widely used assumption m ∼ D b with b > 0 (e.g., Locatelli and Hobbs, 1974, and many others thereafter) holds (most simple case: water spheres with Published by Copernicus Publications on behalf of the European Geosciences Union. U. Blahak: Efficient approximation of the incomplete gamma function m ∼ D 3 ), the generalized gamma distribution is invariant under the transformation between the diameter-and the massrepresentation; only the values of the parameters are different.
The parameters for Eq. ( 1) are not necessarily independent in natural precipitation.For example, in case of rain drops there has been observed some degree of correlation (Testud et al., 2001;Illingworth and Blackman, 2002).
Often, cloud microphysics parametrizations require the computation of (or are -in case of "bulk" parametrizations -based entirely on) moments of the PSD functions, leading, in case of infinite moments, to the ordinary gamma function, which is defined as For example, the mass content L, which is of primary importance, is the first moment of the distribution with respect to the mass representation.Generally the moments M (i) are defined as hence the name "generalized gamma-distribution" for f (y).Such infinite moments, also with non-integer i, enter stateof-the-art bulk (moment) cloud microphysical parametrizations in many ways, e.g., in the computation of collision rates, deposition/evaporation rates and so on, which is extensively described in textbooks (e.g., Pruppacher and Klett, 1997) or in the relevant literature (e.g., Lin et al., 1983;Seifert and Beheng, 2006; to name just a few).However, many cloud microphysical processes involve some sort of spectral cut off, for example, collision processes like autoconversion or some 2-and 3-species collisions between solid and/or liquid particles.If, in bulk models, one wishes to parametrize such processes, e.g., the conversion of particles above a certain size/mass threshold to another species by a certain process (e.g., "wet growth"; collisions of ice particles with drops where the "outcome" depends on certain size ranges of the ice particles and the drops as proposed by Farley et al., 1989), this would require the computation of incomplete gamma functions.The lower incomplete gamma function is defined as For example, consider the transformation of intermediatedensity graupel particles to high-density hail particles in conditions of wet growth, which is important for hail formation.That is, if graupel particles are in an environment of high content of supercooled liquid water drops (high riming rate), then particles larger than a certain size/mass m wg cannot entirely freeze the collected supercooled water, because the latent heat of fusion cannot be transported away from the particles fast enough (e.g., Young, 1993).The non-frozen water might get incorporated into the porous ice skeleton of the graupel and might refreeze later, leading to an increase in bulk density ("hail").Following Ziegler (1985), the corresponding loss of L g (graupel) to L h (hail) might be simply parameterized by where the index g denotes graupel and t the numerical time step.On the right-hand side, now the upper incomplete gamma function γ u (a,x) = (a)−γ l (a,x) appears.Values of m wg are usually in a range equivalent to a diameter 1 mm, µ g is typically between −0.5 and 1, and ν g ≈ 1/3.Both N 0,g and λ g are > 0 but quite variable, so that the corresponding value of x in Eq. ( 4) might take on arbitrary values > 0.
Equation ( 5) is a simple but instructive example of a bulk cloud microphysical process parameterization, that comprises three typical features regarding the parameters a and x of the incomplete gamma function in cloud physics.First, the parameter a depends on the shape parameters µ and ν of the assumed distribution Eq.(1) and not on N 0 and λ.Now, in most of the established one-or two-moment bulk schemes, µ and ν are fixed parameters which do not change during a particular model simulation, so that a also remains fixed (only recently, authors start to make at least µ variable in a diagnostic way for some of the processes, e.g., Milbrandt and Yau, 2005a;Seifert, 2008).Second, the integration limit respectively size-or mass threshold m of the process parameterization (m wg in the above Eq.( 5) translates into the x-parameter λm ν of the incomplete gamma function.That means, even if m is a fixed parameter, x is not fixed because λ is variable.Therefore, in cloud physics either a is fixed and x varies during a model simulation (which will be of importance later) or both a and x vary; the case of a variable a and fixed x has not yet occured in literature to the best knowlegde of the author.Third, a attains non-integer values in most cases, depending on the choice of µ and ν.As it is outlined later, an integer value of a leads to an analytic expression for the incomplete gamma function, which facilitates its computation.Unfortunately, there is only a chance to get integer values of a if one assumes exponential particle size-or mass-distributions, which is a serious and perhaps in many cases unphysical restriction.
Dividing Eq. ( 4) by the (ordinary) gamma Function (a) = γ l (a,∞) leads to the normalized function with values monotoneously increasing from 0 for x = 0 to 1 for x → ∞.The majority of the increase from 0 to 1 occurs at values of x around a with a "band width" of about √ a (see black curves in Fig. 1, in anticipation of the following).
Taking the complement 1 − P (a,x) leads to the so-called upper normalized incomplete gamma function Q(a,x), which, upon multiplication with (a), gives the integral in Eq. (4) but with lower limit x and upper limit ∞, which is denoted by γ u (a,x).Computing any of γ l (a,x), γ u (a,x), P (a,x) or Q(a,x) will lead to any of the other functions by simple transformations.
Complete and incomplete gamma functions are wellknown and treated extensively in the mathematical literature, and there are various ways to compute these functions in practice.To start with (a), along with the well-known recurrence relations, Press et al. (1993) devise a very efficient and very accurate approximation for a > 0, which is sufficient for cloud microphysical applications, and which has been originally derived by Lanczos (1964).Incomplete gamma functions are analytic for integer values of a (see Eqs. 18-22 later in this paper); for arbitrary values of a, a widely used method devised again by Press et al. (1993) uses a series expansion of γ l (a,x) or a continued fraction expansion of γ u (a,x), depending on whether x is larger or smaller than a + 1.These expansions are summed up to a certain number of terms until convergence is reached.The required number of terms depends on a and x and on the desired numerical accuracy.
While this is a very accurate method, the numerical burden is comparatively high within the framework of cloud models and is hard to predict because of the variable number of terms required to reach the desired accuracy, and, on vector machines, this method leads to vectorization problems, which might drastically lower the computing performance.We believe that, among other reasons, these practical problems and the supposedly high computational costs with regard to the low computing performance up to now prevented many cloud physicists from extensively using parametrizations which involve incomplete gamma functions, such as Eq. ( 5).In some cases where incomplete gamma functions have been used, simple analytical approximations for very special and fixed values of a were employed, e.g., a piecewise linear "ramp" function of x for non-integer a in Cotton et al. (1986).Or, as in Farley et al. (1989), finite integrals similar to the one in Eq. ( 5) -analytically resulting in incomplete gamma functions -have been calculated numerically (which is perhaps even more costly than Press's method!).
However, if the parameter a is fixed during subsequent function evaluations (as is the case for many cloud microphysical process parametrizations in the context of bulk ap-  12) with coefficients given by Eq. ( 14) and Table 1. 19 Fig. 1.P (a,x) (black lines) for some values of a as function of x.Grey lines: proposed approximation Eq. ( 12) with coefficients given by Eq. ( 14) and Table 1.
proaches), the most efficient way in this case is perhaps interpolation from pre-computed regular lookup tables with respect to x at fixed a. "Regular" is meant here in the sense that the indices of neighbouring table values can be computed from the interpolation point instead of a grid point search.
Generally, this is achieved if the distance between tabulated values can be given by some functional form.Perhaps the most simple example is an equidistant table, where the indices of the neighbouring values x i and x i+1 in the table, required to obtain the interpolated value of γ l at a point x, can be explicitly computed from x, the starting point x 1 of the table and the table increment x, This is much more efficient and predictable compared to a non-regular table, where a search loop with a nested if-clause is necessary to find i.Another example of a regular lookup table would be logarithmically equidistant.
For the equidistant table, further efficiency is gained by pre-computing the constant 1/ x, so that the costly division is replaced by a cheap multiplication.If linear interpolation between neighbouring table values is used (accuracy might be gained from decreasing x), so that then altogether only one integer rounding operation and a few additions and multiplications have to be performed per γ l evaluation.This method has been used, e.g., by Harringon et al. (1995) and Walko et al. (1995).Alternatively one could use more costly higher order interpolation schemes and at the same time reduce x.
In case that a is not fixed, the lookup table would have to be two-dimensional, requiring two-dimensional interpolation techniques (bilinear in the most simple case).
www.geosci-model-dev.net/3/329/2010/Geosci.Model Dev., 3, 329-336, 2010 The necessary span of the table with respect to x at a certain value of a may be taken from 0 to the value x 995 where P (a,x 995 ) = 0.995.x995 as a function of a may be estimated from the approximative formula g 1 = 36.63g 2 = −0.1195 which has been obtained by nonlinear curve fitting and which is depicted in Fig. 2. At x > x 995 , P (a,x) ≈ 1 respectively γ l (a,x) ≈ (a).
Unfortunately such lookup tables also lead to vectorization problems because of memory bank conflicts, in case of many parallel vector tasks having to access the same table values at the same time.On such types of architectures, it is desirable to have an approximation formula at hand with a fixed number of mathematical terms, regardless of a and x.At the same time, for many applications a reduced accuracy may be acceptable, e.g., approximation errors for P within 0.01 absolute and/or 1% relative.Lanczos's very accurate approximation to (a) does already fulfill the requirement of a fixed number of terms.The purpose of this paper is to develop such an approximation also for γ l (a,x), although at a reduced accuracy.

Approximation of the lower incomplete gamma function
We seek an approximation by means of nonlinear curve fitting, because this leads to the desired fixed number of terms, as opposed to relying on the convergence of series expansions.To motivate a regression ansatz, series expansions are however useful.A series expansion of γ l (a,x) can be obtained by plugging the Taylor series of the exponential function exp(−t) into Eq.( 4) and integrating each term separately, which leads to (Abramowitz and Stegun, 1970) A different well-known representation (e.g., Press et al., 1993) is which can be obtained from Eq. ( 10) after tedious manipulation by separating the series representation of exp(−x) by means of the Cauchy-product and solving for the series coefficients by equating the pre-factors for each individual power of x.
It turns out that for x a the first two terms in Eq. ( 11) are sufficient to give a reasonable approximation.For larger x, an ever increasing number of terms is necessary.On the other hand, for x → ∞, P (a,x) approaches 1 seemingly similar to something like 1 − c −x 4 , with some positive number c 4 .The former approximation is asymtotically correct for x → 0, the latter for x → ∞.Therefore, a 4-parametric ansatz γl (a,x;c 1 ,c 2 ,c 3 ,c 4 ) is constructed which blends the former function (slightly modified by the the fitting parameter c 1 ) into the latter: For many fixed values of a ∈ [0.1,30], nonlinear curve fitting by the Levenberg-Marquardt-method with respect to x ∈ [0,x 995 (a)] leads to a large number of coefficient sets c 1 to c 4 , each coefficient being a function of a. γl (a,x) denotes the resulting fit, P (a,x) = γl (a,x)/ (a) and Q = 1 − P .The results of such curve fits have been found very pleasing, as the absolute approximation errors of the resulting normalized functions P and Q are generally less than 0.01 over the employed a-x-domain.This corresponds to a relative error |( P − P )/P | < 1% for x a and |( Q − Q)/Q| < 1% for x a.The relative errors might be larger when P (a,x) and Q(a,x) approach 0, but relative errors are not meaningful in that case.The fits show the same asymptotic behaviour as the original function and are found to always increase monotonically with increasing x, which is essential to be of practical use.12) as derived by nonlinear regression for fixed values of a. Black lines: nonlinear regression functions to each of c 1 through c 4 as functions of a after ansatz (14) and the coefficients from Table 1.21 Fig. 3. Numerical values (grey stars) of the regression coefficients c 1 through c 4 in Eq. ( 12) as derived by nonlinear regression for fixed values of a. Black lines: nonlinear regression functions to each of c 1 through c 4 as functions of a after ansatz (14) and the coefficients from Table 1.
As a last step, c 1 to c 4 are approximated as functions of a, again by nonlinear curve fitting.The results are where the particular functional forms have been arrived at by guessing and experimentation.The values for the parameters p 1 . . .p 6 , q 1 . . .q 4 , r 1 . . .r 4 and s 1 . . .s 5 are given in Table 1.Replacing in Eq. ( 12) c 1 . . .c 4 by ĉ1 . . .ĉ4 leads to the final approximation γl (a,x).Figure 1 shows examples of the normalized P (a,x) = γl (a,x)/ (a) (grey lines) as function of x for some fixed values of a in comparison to the original function P (a,x) (black lines), demonstrating a reasonably good agreement.However, Fig. 3 shows the coefficients c 1 to c 4 as obtained by the curve fits at constant a (grey stars) along with the corresponding approximations ĉ1 . . .ĉ4 given by Eqs. ( 14)-( 17) (black lines), and some difficulties at the lower range of a-values (a < 0.9) are apparent.Therefore, γl (a,x) (resp.P (a,x)) with the set of coefficients in Table 1 is not likely to be a good approximation to γ l (a,x) (resp.P (a,x)) for Fig. 4. Relative (left) and absolute (right) error of the proposed approximation P (a,x) = γ( a,x)/Γ(a) as function of a and x/(a + 1).The parameter x has been scaled by a + 1 because the main variation of P (a,x) with x takes place at x-values around a + 1.The vertical grey lines at a = 0.9 indicate the lower boundary with respect to a of the (a,x)-range where errors are acceptable.22 Fig. 4. Relative (left) and absolute (right) error of the proposed approximation P (a,x) = γ( a,x)/ (a) as function of a and x/(a + 1).The parameter x has been scaled by a + 1 because the main variation of P (a,x) with x takes place at x-values around a + 1.The vertical grey lines at a = 0.9 indicate the lower boundary with respect to a of the (a,x)-range where errors are acceptable.a < 0.9, a fact that is clearly demonstrated in Fig. 4, which depicts the relative and absolute errors ( P −P )/P (left plate) and P − P (right plate) as function of a and x.In these figures, x has been scaled by a + 1 since P varies mostly in the region of x around a + 1.
Where do these difficulties for small values of a come from?To understand this, it is instructive to look at the analytical formulas for γ l (a,x) in case of a being an integer m, starting with m = 1, The last finite sum representation has been deduced from the representations at m = 1,2,... and can be proved by a) taking the first derivative with respect to x, which yields x m−1 e −x as it should, and b) by the fact that the highest derivative (last element in the sum) equals (m − 1)! (which is (m)), so that for x = 0 the correct value γ l (m,0) = 0 is obtained.For small integer values of m, this representation is an efficient way to evaluate γ l .Now, concerning the difficulties for small values of a, γ l for a = 1 (Eq.18) is a simple exponential function which matches the ansatz (12) only in the limits c 2 → ∞, c 3 = 0 and c 4 = e.As a → 1, the coefficients c 2 to c 4 go towards these values, as is shown in Fig. 3, with the singularity of c 2 at a = 1.Of course at a = 1 the fitting algorithm leads to a "compromise"-solution with finite c 2 ; otherwise, a kind of branch-cut of the coefficients around a = 1 can be observed.For larger values of a, the analytical formulation of γ l gets more complicated (for a being an integer m, more and more terms get involved into the finite sum in Eq. ( 22) as m increases) and is more likely to be well describable by the ansatz (12).
For most cloud microphysical applications, the branch for a > 1 seems to be the important one.Therefore, the approximations ( 14)-( 17) are developed to represent mainly that branch.By altering the regression functions to, e.g., rational functions, it would perhaps be possible to find approximations which encompass both branches.
It turns out, however, that the approximations ( 14)-( 17) are applicable for a-values down to 0.9, because for a ≥ 0.9, the absolute error | P − P | given in Fig. 4 (right figure) remains below 0.02 everywhere, and the relative error |( P − P )/P | (left figure) is also quite small except when P gets close to 0. But here, the relative error is not a good error measure anyways.Beyond the range of a-values for which P has been fitted, it turns out that the approximation error remains within the same small limits up to a = 45.Above that, P is not a good approximation.
A further condition for P to be of practical use is that this function increases monotonically with respect to increasing x at fixed a, as does the original function P .One possibility to check monotonicity is to look at the sign of the partial derivative of P with respect to x. Figure 5 shows ∂ P (a,x)/∂x = 1/ (a) ∂ γl (a,x)/∂x (the formula is omitted for brevity) as function of a and x, and it is apparent that values are > 0 everywhere, which indicates the desired monotonicity.
Concerning the efficiency of the proposed approximation, it has been found that, on our scalar linux desktop computer using the gfortran compiler and high optimization, it is faster by a moderate factor 4 on average compared to the efficient method of Press et al. (1993) mentioned in the introduction.Here, the speed-up depends on a and x, that is, on the number of required terms and the desired accuracy in Press's method.The latter has not been changed from its original values.
A more impressive speed-up is gained, however, in the case where subsequent evaluations at the same value of a are needed, as is the case in cloud physics: many coefficients of the terms in the proposed approximation only depend on a and may be pre-computed once, so that we were able to obtain a speed-up factor of about 15 on average.But the biggest advantage is perhaps on vector CPUs, where the fixed number of terms makes vectorization easy for programmers and compilers.Also, there might be advantages on massively parallel machines in that the method avoids load imbalances otherwise caused by different numbers of terms in series expansions to reach convergence at different grid points.
Further experiments, in which the hyperbolic tangent in Eq. ( 12) has been replaced by the simple but accurate and continuously differentiable rational approximation tanh did not increase efficiency on our desktop computer.This is presumably because the necessary numerical division operation is very costly and because the tanh-function is already implemented in a very efficient way in state-of-the-art compilers.Nevertheless, if on a specific computer system the computation of tanh should cause efficiency problems, Eq. ( 23) might be tried instead.Observe that changing the value for c t would only rescale the function along the x-axis but preserve the differentiability and the "outer" function values −1 and 1, a fact that renders Eq. ( 23) a quite general blending function.

Summary
This paper describes an approximation to the lower incomplete gamma function which has been obtained by nonlinear curve fitting.It comprises a fixed number of terms and yields moderate accuracy (absolute approximation error of P is <0.02 in the range 0.9 ≤ a ≤ 45 and x ≥ 0), which should be enough for most cloud microphysical applications, but may be problematic for other applications.The proposed approximation P consists of Eq. ( 12) in combination with Eqs. ( 14)-( 17) and coefficients given in Table 1.Monotonicity and asymptotic behaviour of the original incomplete gamma functions are preserved, which is important.The method is generally only slightly more efficient in terms of the required number of floating point operations than the more accurate method of Press et al. (1993), but if subsequent evaluations at a certain fixed value of a are sought (which is often the case in cloud microphysics), then a significant performance increase can be obtained by precomputing certain terms and coefficients which only depend on a.On scalar architectures, however, the most efficient method to evaluate incomplete gamma functions is certainly interpolation of regular lookup tables as described in the introduction.
A great advantage of the proposed approximation P is its applicability on vector CPUs, because the formula with its fixed number of terms is well suited for vectorization, as opposed to, e.g., Press's method or other more accurate algorithms based on series-or continued-fraction representations.
With regard to massively parallel machines (where the use of equidistant lookup tables might pose memory issues), the fixed number of terms might be also beneficial to avoid load imbalances caused by different numbers of terms in series expansions to reach convergence at different grid points.

Fig. 1 .
Fig.1.P (a,x) (black lines) for some values of a as function of x.Grey lines: proposed approximation Eq. (12) with coefficients given by Eq. (14) and Table1.
Fig. 2. x 995 as function of a. x 995 is defined by P (a,x 995 ) = 0.995.

Fig. 3 .
Fig.3.Numerical values (grey stars) of the regression coefficients c 1 through c 4 in Eq. (12) as derived by nonlinear regression for fixed values of a. Black lines: nonlinear regression functions to each of c 1 through c 4 as functions of a after ansatz (14) and the coefficients from Table1.

Table 1 .
Coefficients p i , q i , r i , and s i for the approximation functions ĉ1 (a) ...ĉ4(a) in Eq. (14).