Parameterization of the collision-coalescence process using series of basis functions: COLNETv1.0.0 model development using a machine learning approach

A parameterization for the collision-coalescence process is presented, based on the methodology of basis functions. The whole drop spectrum is depicted as a linear combination of two lognormal distribution functions, leaving no parameters fixed. This basis-function parameterization avoids the classification of drops into artificial categories such as cloud water (cloud droplets) or rain water (raindrops). The total moment tendencies are predicted using a machine learning approach, in which one deep neural network was trained for each of the total moment orders involved. The neural networks were trained and validated using randomly generated data covering a wide range of the parameters employed by the parameterization. An analysis of the predicted total moment errors was performed, aimed at establishing the accuracy of the parameterization at reproducing the integrated distribution moments that represent physical variables. The applied machine learning approach shows a good level of accuracy when compared to the output of an explicit collision-coalescence model.


Introduction
Drop populations are represented using drop size distributions (DSDs). The first attempt at characterizing drop spectra was made by Marshall and Palmer (1948), who employed exponential distributions based on drop diameter to describe the DSDs.
More recently, the use of a three-parameter gamma distribution has shown good agreement with observations (Ulbrich, 1983). However, lognormal distributions have shown a better squared-error fit to measurements of rain DSDs than gamma or exponential distributions (Feingold and Levin, 1986; Pruppacher and Klett, 2010). The analysis of several important characteristics of the Brownian coagulation process showed that the lognormal distribution adequately represents the particle distributions (Lee et al., 1984, 1997). In addition, some authors have employed this type of distribution function to parameterize cloud processes with promising results (Clark, 1976; Feingold et al., 1998; Huang, 2014).
Another line of work simulates collision-coalescence stochastically with Lagrangian particle-based methods, which trace back to the stochastic coalescence framework of Gillespie (1972) and have been verified against several analytical solutions; the possible implications of this approach for cloud physics have also been discussed. Other variants of this approach are analysed in Alfonso et al. (2011), and it has also been used to simulate the subprocesses of autoconversion and accretion applying a Monte Carlo-based algorithm within the framework of Lagrangian cloud models (Noh et al., 2018). This approach is accurate and represents well the stochastic nature of the collision-coalescence of drops, but it is also computationally expensive, as a large number of particles is needed in each grid cell to calculate accurate statistics (Morrison et al., 2020). The cost of these schemes could be reduced by using simple methods to treat droplet activation, such as the Twomey CCN activation (Grabowski et al., 2018; Twomey, 1959). However, even considering those simplifications, the cost of a Lagrangian particle-based scheme is 25 % greater than bin microphysics, when considering a similar number of particles and bin variables per grid cell (Grabowski, 2020; Morrison et al., 2020). An alternative to these main approaches is a hybrid approach to parameterize the cloud microphysical processes.
This approach resembles the explicit approach in that it describes the shape of the DSD through a linear combination of basis functions (Clark, 1976; Clark and Hall, 1983), and it can be considered a middle point between bulk and bin microphysics. This is achieved by having time-varying distribution parameters, instead of fixed ones as is common in conventional bulk approaches. A system of prognostic equations is solved to obtain the parameter tendencies of the statistical distribution functions based on the evolution of their total moments (the combination of the statistical moments of the same order of all distribution functions involved), describing their tendencies due to condensation and collision-coalescence. Since the integration process covers the entire size spectrum, the artificial separation of the droplet spectrum is avoided, making the terms cloud droplet and raindrop meaningless (they are just drops), and it is possible to solve a fully closed system of equations without the need to keep any parameter of the distribution constant. Moreover, this integration needs to be performed only once per time step for all parameters. Another advantage of this approach is its independence from a specific collision kernel type; in the bulk approach, by contrast, a polynomial-type kernel such as the one derived by Long (1974) is frequently required in order to obtain analytical expressions from the integrals of the KCE. That said, a limitation of this approach is that the total moment tendencies have to be solved at each time step for the needed parameters. An alternative solution for this shortcoming is to calculate the moment rates in advance over a sufficiently wide range of parameters, and to store the results in lookup tables that are consulted several times at every time step.
Machine Learning (ML) is the study of computer algorithms that improve automatically through experience and the use of data (training) (Mitchell, 1997). ML algorithms build a model based on sample data in order to make predictions or decisions without being explicitly programmed to do so (Koza et al., 1996). They are used in a wide variety of applications, such as medicine, email filtering, and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks. In particular, neural networks (NNs) are especially well suited for solving non-linear fitting problems and for establishing relationships within complex data such as the outputs of the KCE. In the field of atmospheric sciences, the use of Deep Neural Networks (DNNs) has been extended to the parameterization of sub-grid processes in climate models (Brenowitz and Bretherton, 2018; Rasp et al., 2018), while in cloud microphysics, the autoconversion process was parameterized using DNNs with a superior level of accuracy when compared with equivalent bulk models (Alfonso and Zamora, 2021; Loft et al., 2018). Also, a partial parameterization of collision-coalescence was tested by Seifert and Rasp (2020), who developed a ML parameterization that includes the processes of autoconversion and accretion, describing the droplet spectra as a gamma distribution, and established a comparative study that exposed the advantages and disadvantages of the use of ML techniques in cloud microphysics. In order to eliminate the need to solve the rate equations for the total moments of the KCE at every time step (Thompson, 1968), or to resort to the use of lookup tables, we propose to predict the total moment tendencies using a ML approach within this parameterization.
Thus, the objective of this study is to apply DNNs to the parameterized formulation of the collision-coalescence process developed by Clark (1976) in order to replicate the rate equations for the total moments, eliminating the need for lookup tables or the numerical solution of integrals. By doing this, a complete ML-based representation of collision-coalescence could be achieved (excluding drop breakup).
The research article is structured as follows: in section 2, the parameterization framework is described, as well as the reference model used for comparison purposes; in section 3, the DNN methodology is explained, the network architecture is introduced, the training data set is generated, and the DNNs are trained and validated; in section 4, the experiment design is explained; in section 5, the results of the experiment are discussed, an assessment is made by contrasting them with the reference solution, and the predicted total moment errors are analyzed; and in section 6, several conclusions are drawn.

Formulation of the total moment tendencies
Under the framework of the parameterization developed in this study, any given drop spectrum can be approximated by a series of basis functions. The distribution that characterizes the evolution of the spectrum is therefore given by a linear combination of probability density functions:

f(r) = \sum_{i=1}^{I} f_i(r)     (1)

where the f_i(r) are the individual members of the set of distributions considered, I stands for the number of distribution functions that make up the set, and r refers to the radius of the drops. In the case at hand, a set of two statistical distributions is employed. At each time step, the rates of change of the parameters of each distribution are calculated. It should be noted that, in any set of distributions considered, all the members have the same type of distribution. For the proposed parameterization, as described in Clark (1976), distributions of lognormal type are used:

f_i(r) = \frac{N_i}{\sqrt{2\pi}\,\sigma_i r} \exp\left[-\frac{(\ln r - \mu_i)^2}{2\sigma_i^2}\right]     (2)

where \mu_i and \sigma_i^2 stand for the mean and variance of \ln r, respectively, while N_i represents the number concentration of drops. Considering that the moment of order p of any distribution can be defined as (Straka, 2009)

M_p = \int_0^{\infty} r^p f(r)\,dr     (3)

the following analytical solution of eq. (3) can be found for the moments of the lognormal distribution:

M_p^{(i)} = N_i \exp\left(p\mu_i + \frac{p^2\sigma_i^2}{2}\right)     (4)

Combining eqs. (1), (3) and (4), the p-th total moment of a linear combination of lognormal distributions can be expressed as (Clark and Hall, 1983)

\bar{M}_p = \sum_{i=1}^{I} M_p^{(i)}     (5)

The parameter tendencies are obtained by solving a linear system of the form A\,\dot{x} = F, where x groups the parameters of the distributions and F is the vector of total moment tendencies. Both A and F are normalized in order to achieve better numerical stability in the solution of the system of equations. The evolution of the distribution functions' parameters is then calculated by applying a simple forward finite-differences scheme (Clark and Hall, 1983), e.g. for the variance

(\sigma_i^2)^{k+1} = (\sigma_i^2)^k + \left(\frac{d\sigma_i^2}{dt}\right)^k \Delta t     (11)

with k being the time index in the finite-differences notation.
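As a quick illustration of eqs. (3) and (4), the closed-form lognormal moment can be checked against direct quadrature of the distribution. This is a minimal sketch, not the authors' code; the parameter values are illustrative and not taken from Table 1.

```python
import numpy as np

def trapz(y, x):
    # simple trapezoidal rule, kept local so the sketch is self-contained
    return 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(x))

def lognormal_pdf(r, N, mu, sigma):
    # Eq. (2): lognormal basis function in radius r.
    return N / (np.sqrt(2.0 * np.pi) * sigma * r) * np.exp(
        -(np.log(r) - mu) ** 2 / (2.0 * sigma ** 2))

def moment_analytic(p, N, mu, sigma):
    # Eq. (4): closed-form p-th moment of a single lognormal distribution.
    return N * np.exp(p * mu + 0.5 * p ** 2 * sigma ** 2)

def moment_numeric(p, N, mu, sigma, r_min=1e-2, r_max=1e5, n_pts=20000):
    # Eq. (3) by quadrature over a logarithmic radius grid.
    r = np.logspace(np.log10(r_min), np.log10(r_max), n_pts)
    return trapz(r ** p * lognormal_pdf(r, N, mu, sigma), r)

# Illustrative parameters: N = 100, modal radius 10, sigma = 0.3.
N, mu, sigma = 100.0, np.log(10.0), 0.3
max_rel_err = max(
    abs(moment_analytic(p, N, mu, sigma) - moment_numeric(p, N, mu, sigma))
    / moment_analytic(p, N, mu, sigma) for p in range(6))
```

For moment orders 0 to 5, the quadrature agrees with the analytical expression to within a small relative error, which is what makes eq. (4) a convenient diagnostic for the physical variables.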

Description of the calculation of the total moment tendencies due to collision-coalescence
The KCE determines the evolution of a DSD due to collision-coalescence. This equation can be expressed in continuous form, as a function of mass, as follows (Pruppacher and Klett, 2010):

\frac{\partial f(m)}{\partial t} = \frac{1}{2}\int_0^{m} f(m')\,f(m-m')\,K(m'|m-m')\,dm' - f(m)\int_0^{\infty} f(m')\,K(m|m')\,dm'     (12)

where K(m|m') is the collection kernel. Reformulating eq. (12) in the form of Thompson (1968), and as a function of radius, we can calculate the total moment tendencies (the vector F from the previous section) as follows:

\frac{d\bar{M}_p}{dt} = \frac{1}{2}\int_0^{\infty}\int_0^{\infty} P_p(r_1, r_2)\,\langle K(r_1|r_2)\rangle\, f(r_1)\,f(r_2)\,dr_1\,dr_2     (13)

where

P_p(r_1, r_2) = (r_1^3 + r_2^3)^{p/3} - r_1^p - r_2^p     (14)

\langle K(r_1|r_2)\rangle = \pi (r_1 + r_2)^2\, E(r_1, r_2)\,|V(r_1) - V(r_2)|     (15)

Equation (15) represents the hydrodynamic kernel, where V(r) is the drop terminal velocity and E(r_1, r_2) stands for the collection efficiencies taken from Hall (1980), which are based on a lookup table representing the effectiveness of drop collisions under given conditions. A set of two lognormal distributions (eq. (2)) is used as members of the set in eq. (1). Hence, the prognostic variables under the parameterization formulation are the corresponding parameters of both distribution functions: N_1, \mu_1, \sigma_1, N_2, \mu_2 and \sigma_2. At this point in the parameterization, the total moment tendencies would have to be calculated either by solving eq. (13) at each time step for all the moments involved, or by searching a lookup table calculated a priori. Instead, the following section explains in detail the proposed ML approach and its implementation.
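The structure of eqs. (13)-(15) can be sketched numerically. The following is a hedged illustration, not the paper's implementation: it uses a unit collection efficiency and an assumed Stokes-like fall speed in place of the Hall (1980) table, and a simple lognormal spectrum. Two properties of eq. (14) are visible directly: the 0-th moment tendency is negative (coalescence removes drops), and the 3rd moment tendency vanishes (mass is conserved).

```python
import numpy as np

def trapz(y, x):
    # trapezoidal rule along the last axis
    return 0.5 * np.sum((y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def weight(p, r1, r2):
    # Eq. (14): net change of the p-th moment when drops r1 and r2 coalesce.
    return (r1 ** 3 + r2 ** 3) ** (p / 3.0) - r1 ** p - r2 ** p

def kernel(r1, r2):
    # Stand-in for eq. (15): E = 1 and a Stokes-like fall speed V ~ r^2 are
    # ASSUMPTIONS here; the paper uses Hall (1980) collection efficiencies.
    V = lambda r_um: 1.19e6 * (r_um * 1e-4) ** 2            # cm s^-1, r in um
    return np.pi * ((r1 + r2) * 1e-4) ** 2 * np.abs(V(r1) - V(r2))

def moment_tendency(p, r, f):
    # Eq. (13): 0.5 * double integral of P_p * K * f(r1) * f(r2).
    R1, R2 = np.meshgrid(r, r, indexing="ij")
    integrand = weight(p, R1, R2) * kernel(R1, R2) * np.outer(f, f)
    return 0.5 * trapz(trapz(integrand, r), r)

r = np.logspace(0, 3, 400)                                  # radius grid (um)
f = 100.0 / (np.sqrt(2 * np.pi) * 0.3 * r) * np.exp(
    -(np.log(r) - np.log(10.0)) ** 2 / (2 * 0.3 ** 2))      # lognormal f(r)

dM0 = moment_tendency(0, r, f)   # negative: coalescence removes drops
dM3 = moment_tendency(3, r, f)   # ~0: 3rd moment (mass) is conserved
```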

Description of the reference model
To obtain a reference solution (KCE from now onwards), the explicit model developed by Bott (1998a, 1998b) is employed. This scheme is conservative with respect to mass and computationally very efficient. It is based on the numerical integration of the KCE (eq. 12), combined with the mass density function g(y, t) (Berry, 1967) in order to simplify the calculations:

g(y, t) = 3 m^2 f(m, t)     (17)

where y = \ln r and r is the radius of a drop of mass m. By substituting (17) in (12), we obtain the KCE for the mass density function (eq. (18); Bott, 1998a). The first integral on the right-hand side of eq. (18) represents the gain of drops of mass m due to collision-coalescence of two smaller droplets, while the second integral portrays the loss of drops of mass m captured by bigger drops (Bott, 1998a). For the numerical solution of eq. (18), a logarithmically equidistant mass grid is used, in which consecutive grid masses keep a constant ratio (m_{k+1} = \alpha m_k, k = 1, \ldots, n-1), where n is the total number of grid points.
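The grid construction and the transformation of eq. (17) can be sketched as follows. This is an illustration under stated assumptions: the grid ratio alpha and the toy spectrum f(m) are placeholders, not the values used in the reference model (the 300-point count matches the experiment setup described later).

```python
import numpy as np

def log_mass_grid(m0, n, alpha):
    """Logarithmically equidistant grid: m_{k+1} = alpha * m_k."""
    return m0 * alpha ** np.arange(n)

def mass_density(m, f_m):
    """Eq. (17): g(y, t) = 3 m^2 f(m, t), with y = ln r."""
    return 3.0 * m ** 2 * f_m

# alpha = 2**0.5 (mass doubling every two grid points) is an assumption
m = log_mass_grid(m0=1e-9, n=300, alpha=2.0 ** 0.5)   # grams, illustrative
f_m = np.exp(-m / 1e-6) / 1e-6                        # toy spectrum f(m)
g = mass_density(m, f_m)
```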

Machine Learning architecture and training data set
ML methodology can be classified into three main categories, according to the problem at hand: supervised, unsupervised and reinforcement learning. In our case, supervised learning is used. Supervised learning algorithms build a mathematical model of a set of data that contains both the inputs and the desired outputs (Russell and Norvig, 2010). Under this classification, there is previous knowledge of the set of input values \vec{x}_i and their corresponding outputs \vec{y}_i, with i = 1, 2, \ldots, n, where n is the number of input samples. The objective is to obtain a function h(\vec{x}) by means of which the output values are simulated reasonably well for new data \vec{x}. The set \{\vec{x}_i, \vec{y}_i\}, i = 1, 2, \ldots, n, is called the training data set. To test the performance of h(\vec{x}), the input and output data are separated into two different data sets: training and testing. As NNs are able to fit any non-linear function (Schmidhuber, 2015), a ML parameterization should approximate reasonably well the solution of the KCE in the form of eq. (13), given enough layers and neurons in the architecture of the network.

Neural network architecture
Deep Neural Networks are based on artificial neurons. Each neuron receives a set of input data, processes it and passes it to an activation function \varphi, which returns the activated output (Fig. 1). The activation value a_i^l of neuron i in layer l is determined as

a_i^l = \varphi\left(\sum_j w_{ij}^l a_j^{l-1} + b_i^l\right)     (21)

In eq. (21), w_{ij}^l are the weights connecting neuron i to the neurons of the previous layer and b_i^l is its bias. The selected algorithm for the minimization of the loss function (the mean squared error, MSE; eq. (22)) is Bayesian regularization, which updates the weight and bias values according to the Levenberg-Marquardt optimization (Marquardt, 1963). Backpropagation is used to calculate the Jacobian of the performance with respect to the weight and bias variables (Dan Foresee and Hagan, 1997; MacKay, 1992).
The designed DNN consists of one layer that receives the input data (input layer), three intermediate (hidden) layers with 20 neurons each, and an output layer with a single neuron that returns the simulated target values (Fig. 2).
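A minimal numpy sketch of the forward pass through this architecture (6 inputs, three hidden layers of 20 neurons, 1 output) is shown below. The random weights and the tanh activation are assumptions made only so the sketch runs; they are not the trained values or necessarily the activation used by the authors.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [6, 20, 20, 20, 1]                       # input, 3 hidden, output
weights = [rng.standard_normal((sizes[i + 1], sizes[i])) * 0.1
           for i in range(len(sizes) - 1)]
biases = [np.zeros(sizes[i + 1]) for i in range(len(sizes) - 1)]

def forward(x, phi=np.tanh):
    """Weighted sum plus activation at each layer; linear output layer."""
    a = x
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = W @ a + b
        a = z if l == len(weights) - 1 else phi(z)
    return a

y = forward(np.ones(6))   # one (normalized) total moment tendency
```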

Figure 2: Schematic representation of the architecture of the trained neural network used to calculate the total moment tendencies. The neural network receives six inputs and processes them by means of three hidden layers of 20 neurons each, and an output layer with a single neuron and one output value.

Generation of the training and validation data sets
The training procedure consists of feeding the DNN with six input values, corresponding to the parameters of each distribution, and with the total moment tendency of order p obtained from eq. (13) as a target. The NN training algorithm then processes those values in order to establish the relationships between the data provided. This process is repeated until all input and target data are processed. The resulting trained DNN should be able to estimate the total moment tendencies from any given set of distribution parameters that falls within the ranges of the training variables. A schematic representation of the trained NN with the inputs and output is shown in Fig. 3. In order to generate the training and test data sets, 100,000 drop spectra derived from the input variables are employed, over a wide range of the distribution parameters (N_1, \mu_1, \sigma_1, N_2, \mu_2 and \sigma_2). Those input parameters are used to calculate the total moment rates from eq. (13) and to train the DNNs. Five DNNs are trained, one for each total moment tendency involved in the formulation of the parameterization (moment orders 0 to 5), with the exception of the total moment of order 3, as total mass is not affected by the collision-coalescence process. The same training input parameters are used to train all NNs, varying only the target values corresponding to the total moment tendencies of each order.
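The sampling of the six input parameters can be sketched as below. The concentration range of 1 to 500 cm^-3 follows the text; the remaining ranges are placeholders, not the values of Table 1, and the targets (one per trained network) would be obtained by evaluating eq. (13) for each row.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 100_000
# (min, max) per input parameter; all except the first are PLACEHOLDERS
ranges = [
    (1.0, 500.0),                   # N1  [cm^-3] (range quoted in the text)
    (np.log(1.0), np.log(20.0)),    # mu1 = mean of ln r   (assumed)
    (0.1, 0.5),                     # sigma1               (assumed)
    (0.01, 10.0),                   # N2  [cm^-3]          (assumed)
    (np.log(20.0), np.log(100.0)),  # mu2                  (assumed)
    (0.1, 0.5),                     # sigma2               (assumed)
]
X = np.column_stack([rng.uniform(lo, hi, n_samples) for lo, hi in ranges])
# targets: d(M_p)/dt from eq. (13) evaluated per row (computation omitted)
```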
The physical variables related to the input parameters are shown in Fig. 4 for a better representation of the generated training clouds. The training and test data are created using a uniformly distributed random number generator; Table 1 lists the means and standard deviations, as well as the ranges (minimum and maximum values), of each predictor variable. Figure 4 shows that within the ranges of the training data (concentration from 1 cm-3 to 500 cm-3), the corresponding liquid water contents (LWC) are between 10-10 g cm-3 and 10-4 g cm-3, with the majority of the data concentrated between the limits of 10-8 g cm-3 and 10-5 g cm-3. Those values cover a sufficiently wide range of liquid water content to adequately represent warm clouds within the parameterization.

Figure 4: Physical variables corresponding to the training data, generated from the parameter ranges of Table 1 and calculated from eq. (4). The red dots represent the initial conditions for the experiment case included in Table 4. Only one in a hundred data points is shown.

Training and testing of the Deep Neural Network
From the available data, 80 % is employed in training the DNNs, and the remaining 20 % is used for testing purposes. The total moment tendencies (eq. 13) are solved using a trapezoidal rule over a logarithmic radius grid in the range 1 um <= r <= 10^4 um. The solutions of eq. (13) are called the target values. The mean and standard deviation of each calculated total moment rate are shown in Table 2.
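The 80/20 split can be sketched as follows; random shuffling before splitting is an assumption, as the text does not specify the selection scheme.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000                        # total number of samples
idx = rng.permutation(n)           # random shuffling (assumed)
n_train = int(0.8 * n)
train_idx, test_idx = idx[:n_train], idx[n_train:]
```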
The input and target values require normalization to facilitate the work of the optimization algorithm. All nodes in each layer of the DNN use the MSE as a loss function. The training procedure for a NN consists of processing a fragment of the total training data through the network architecture, then determining the prediction error and propagating the gradient of the loss function (MSE) back through the network in order to update the weight values. This algorithm is repeated iteratively over all training data until the performance index (MSE) is small enough or a predefined number of passes through all data is completed. One pass through all training data is known as an epoch. In this case, a maximum number of 1000 epochs is established, and a minimum value of 10^-7 is considered for the gradient function.
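The normalization step can be sketched as a z-scoring by the training-set statistics; this is one common choice and an assumption here, since the exact scheme is not specified in this section.

```python
import numpy as np

rng = np.random.default_rng(1)
X_train = rng.uniform(1.0, 500.0, size=(1000, 6))  # placeholder raw inputs

mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_norm = (X_train - mean) / std                    # zero mean, unit variance

# The same (mean, std) pair must be reused to normalize any new input before
# evaluating the trained network, and to de-normalize its output.
```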
Five DNNs are trained, one for each total moment tendency involved in the formulation of the parameterization (moment orders 0 to 5, except the 3rd moment, which is not calculated because the collision-coalescence process does not affect total mass). A variant of the training process, known as cascade-forward neural network training, is employed. This enables the network to establish more precise non-linear connections between the data provided, but results in a more computationally expensive training process (Karaca, 2016; Warsito et al., 2018). The main difference with respect to the standard (feed-forward) training procedure is that it includes a connection from the input and every previous layer to the following layers (see Fig. 2). As with feed-forward networks, a cascade network with two or more layers can learn any finite input-target relationship arbitrarily well, given enough hidden neurons. Performance (MSE) training records for the total moment tendencies calculated from eq. (13) are depicted in Fig. 5. The speed of convergence is similar in all cases, and all networks converged at epoch 1000. This occurred because the gradient value never fell below the minimum, so the training process kept refining the results until it reached the maximum number of epochs previously defined. Despite that, a good performance is achieved, with the MSE on the order of 10^-4 for all orders of the total moment tendencies, as shown in Table 3, where the best (final) MSE values for each trained DNN are listed in detail.
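The cascade-forward connectivity described above can be sketched in numpy: every layer receives the concatenation of the original input and the outputs of all previous layers, not just the immediately preceding one. Weights are random placeholders and tanh is an assumed activation; this is a structural illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, layer_sizes = 6, [20, 20, 20, 1]       # 3 hidden layers + output
# layer i maps the CONCATENATION of the input and all previous outputs
in_sizes = np.cumsum([n_inputs] + layer_sizes[:-1])   # 6, 26, 46, 66
weights = [rng.standard_normal((layer_sizes[i], in_sizes[i])) * 0.1
           for i in range(len(layer_sizes))]

def cascade_forward(x, phi=np.tanh):
    sources = [x]                                # input + previous activations
    for i, W in enumerate(weights):
        z = W @ np.concatenate(sources)          # cascade connections
        a = z if i == len(weights) - 1 else phi(z)   # linear output layer
        sources.append(a)
    return sources[-1]

y = cascade_forward(np.ones(n_inputs))
```

Compared with a plain feed-forward pass, the only structural change is the growing concatenation fed to each layer, which is what allows the extra input-to-layer connections at the cost of more weights to train.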
Since the values of the total moments are normalized in the DNN model (scale of 10^0), these MSE values indicate a high level of accuracy for the scale of the problem. Thus, the NN model developed reproduces very well the rates of the moments following the formulation of Clark (1976), which is a main objective of this research. Regression plots for the trained networks are depicted in Fig. 6. They compare the outputs obtained from evaluating the trained NNs with the test inputs against the targets from the test data set corresponding to each of the total moment tendencies obtained from eq. (13). Minor differences can be appreciated in the graphics, with the trained DNN models overestimating or underestimating the actual values. However, a good agreement was reached for all trained DNNs, with the values predicted by the DNNs matching the actual output from the solution of eq. (13) with a correlation coefficient between 0.9998 and 0.9999 in all cases (as shown in Table 3). The axis ranges of the graphics vary because the plotted data are non-normalized; thus, there are different ranges for each of the calculated total moment tendencies. This result means that the rates of the total moments obtained from Clark's parameterization of collision-coalescence (Clark, 1976) are accurately reproduced by the NN model developed.

Figure 6: Regression plots comparing the outputs of the trained neural networks, evaluated with the test inputs, against the targets from the validation data set corresponding to each of the total moment tendencies obtained from eq. (13). The y axis varies for each subplot because the plotted data are non-normalized.
Experiments with non-normalized training data were performed, yielding MSE values at least an order of magnitude higher. Those results are not shown in the present article due to the lower accuracy of the regression outputs. Neural networks give us a better way to estimate the values of the integral in eq. (13). They do not, of course, replace the computation of integrals, but since they have the ability to learn and model complex non-linear functions, once trained they allow it to be estimated efficiently for values of the parameters (N_1, \mu_1, \sigma_1, N_2, \mu_2 and \sigma_2) for which it has not been previously calculated.
Before the widespread adoption of machine learning, the alternative used by other authors (Clark, 1976; Clark and Hall, 1983; Feingold et al., 1998) was the lookup table, which stores a list of predefined values (the moment tendencies in this case). In the context of our work, the lookup table is a mapping function that relates the parameters of the basis functions (N_1, \mu_1, \sigma_1, N_2, \mu_2 and \sigma_2) to the total moment tendencies (d\bar{M}_p/dt).
However, functions computed from lookup tables usually have a limited domain, whereas we need functions whose domain is a set of contiguous values. Furthermore, every time we need to calculate the integral of eq. (13), a search algorithm must be executed in order to retrieve the moment tendency for a given set of parameters, and some kind of interpolation is needed to compute moment tendencies for parameter values for which it has not been calculated.
The advantage of the neural networks is that all the computational effort is dedicated to the training phase. Once trained, the networks replace the lookup tables and are able to map the parameters of the basis functions to the total moment tendencies.

WDM6 parameterization and experiment design
An experiment is performed with the objective of illustrating the behaviour of the ML-based parameterized model (P-DNN) and how it compares with the results of a traditional bulk parameterization and the reference model (KCE). This experiment should not be interpreted as an evaluation of the overall behaviour of P-DNN, but as an example of how it predicts the DSD and bulk variables. A detailed evaluation of the novel components of the P-DNN scheme was already carried out in the previous section.

Initial conditions and experiment design
A simulation covering t = 900 s (15 minutes) of system evolution is considered for all models, with a time step of \Delta t = 0.1 s. The initial parameters for the distribution functions of the parameterized model are shown in Table 4. The values in Table 4 are well within the parameter ranges established in Table 1, and were set following Clark (1976). The initial spectrum for the KCE was calculated from these parameters to ensure the same initial conditions for both models. A 300-point logarithmically equidistant grid was generated for the integration of the KCE, with radii in the range 0.25 um <= r <= 2.6 x 10^4 um. Equations (16) and (17) were used to transform the output of both models to make them comparable, while the bulk quantities from the KCE were integrated from the calculated spectra.
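The time integration of the parameterized model (eq. (11), forward Euler over 9000 steps of 0.1 s) can be sketched as follows. The tendency function is a placeholder for the five trained DNN evaluations, and the initial parameters other than the 200 cm^-3 concentration quoted in the results are illustrative, not those of Table 4.

```python
import numpy as np

dt, t_end = 0.1, 900.0             # s, as in the experiment design
n_steps = int(round(t_end / dt))   # 9000 steps

# Six parameters (N1, mu1, sigma1, N2, mu2, sigma2); N1 = 200 cm^-3 follows
# the text, the remaining values are ILLUSTRATIVE only.
params = np.array([200.0, np.log(10.0), 0.3, 0.1, np.log(50.0), 0.3])

def tendency(p):
    # Placeholder for the trained DNN evaluations (NOT the real model):
    # a mild relaxation so that the loop is runnable end to end.
    return -1e-5 * p

for _ in range(n_steps):
    params = params + tendency(params) * dt   # eq. (11), forward differences
```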

WDM6 parameterization
To better establish the accuracy of the P-DNN, an extra parameterization was included in the comparison with the reference solution. The selected parameterization is the WRF Double Moment 6-class bulk scheme (WDM6) (Lim and Hong, 2010), which was chosen because it is implemented in a well-known three-dimensional atmospheric model (WRF). The collision-coalescence section of that parameterization is explained in detail in Cohard and Pinty (2000), and treats the main warm microphysical processes in the context of a bulk two-moment framework. A scheme of this type is believed to be a pragmatic compromise between bulk parameterizations of precipitation, as proposed by Kessler (1969), and very detailed bin models. The inclusion of a prognostic equation for the number concentration of raindrops provides a better insight into the growth of large drops, which in turn can only improve the time evolution of the mixing ratios.
The scheme makes use of analytical solutions of the KCE for the accretion and self-collection processes, while autoconversion follows the formulation of Berry and Reinhardt (1974). This is done by using the generalized gamma distribution, which enables fine tuning of the DSD shape through the adjustment of two dispersion parameters. All the tendencies, except the autoconversion of cloud droplets, are parameterized based on continuous integrals that encompass the whole range of drop diameters within each water substance category (cloud droplets and raindrops). The threshold radius for considering a change of category is r = 41 um, so the gamma distributions employed are truncated at this value. With this method, the treatment of autoconversion is the weakest link in the scheme, because this process acts in the diameter range where the fuzzy transition between cloud droplets and raindrops is hardly compliant with a bimodal and spectrally wide (from zero to infinity) representation of the drops. As neither the KCE model (Bott, 1998a), Clark's parameterization (Clark, 1976), nor the developed ML model takes drop breakup into account, the formulation of this process included in Cohard and Pinty (2000) has been left out of the current implementation. This model is referred to as P-CP2000 in the following sections. For comparison purposes, all simulations share the same initial conditions. It should be noted that WDM6, being a conventional two-moment scheme, is focused on the evolution of the moments of order zero and three of a truncated gamma distribution function.

Spectra comparison
The output of the parameterized Deep Neural Network model (P-DNN) is the set of updated distribution parameters at every time step (N_1, \mu_1, \sigma_1, N_2, \mu_2 and \sigma_2). The physical variables related to the moments of the distributions, such as mean radius or liquid water content (LWC), are diagnosed from those parameters. Besides, we can calculate the shape and scale of the drop spectrum at any given time by integrating the distribution functions defined by these parameters. Figure 7 shows a comparison between the mass density spectra derived from the P-DNN and KCE models at three chosen times (300 s, 600 s and 900 s). At 300 s (first row of Fig. 7), there is a slow development of the total spectrum, with a clear mass transfer between both modes in the presented models. The parameter-generated spectrum from P-DNN fits the reference solution well, with a slight overestimation of the maximum mass in the second mode. The mean radii of the distributions are well represented by P-DNN. At 600 s and 900 s (second and third rows of Fig. 7), a third mode develops in the evolution of KCE that is not reproduced by P-DNN, which instead produces a wider second mode while still representing the mean radius and mass distribution well. The first mode is accurately represented at those times. An increase in mean radius can be observed, due to the effect of the collision-coalescence process.
The simulation results with P-CP2000 are clearly different from the others. The first noticeable difference is the existence of droplets smaller than those in the initial distribution. This is caused by the fixed distribution parameters employed in its formulation: the slope parameter is determined by an analytical expression and evolves with time within certain limits, but the parameters related to the spectral breadth are held fixed. For more information, please refer to Cohard and Pinty (2000).
Besides that, P-CP2000 performs poorly at all the represented simulation times when compared with KCE. It presents a pronounced tendency to run ahead of the KCE, leading to a faster-than-normal development of larger drops. In particular, the mass transfer is very noticeable at the end of the simulation; however, the first mode of P-CP2000 does not decrease proportionally. Figure 8 shows a comparison between the drop number concentration spectra derived from P-DNN, P-CP2000 and KCE at three chosen times (300 s, 600 s and 900 s). A generally good agreement is appreciated at all times for P-DNN, with its concentration values slightly underestimating the results from KCE. As the collision-coalescence process decreases the drop number concentration, there is no noticeable increase in the number of drops in the second mode of the distributions. However, an increase in the mean radius is observed that is consistent with the behaviour described in Fig. 7, where a related mass transfer between both distribution functions is seen.
Regarding P-CP2000, its spectra underestimate the KCE, and the lack of a second mode reaffirms the behaviour shown in Fig. 7. However, being a bulk parameterization, its strong points are not related to the description of the drop spectra, but to the representation of bulk quantities such as the total number concentration and mass content of the clouds. Figure 9 shows a comparison of two main bulk quantities (total number concentration and mean radius) obtained from P-DNN, P-CP2000 and KCE. The concentration and mean radius of KCE were obtained by integrating the drop number concentration spectra for the corresponding moment orders (0 and 1, respectively). As expected, the number concentration decreases with time due to the coalescence of drops, ranging from an initial value of 200 cm-3 to around 160 cm-3 in KCE. The concentration predicted by P-DNN underestimates the KCE values throughout most of the simulation time, with the differences reaching 10 cm-3 at 900 s. The P-CP2000 model achieves a relatively better representation of the drop number concentration, although it reaches the same differences as P-DNN by the end of the simulation.

Bulk quantities comparison
A similar behaviour is observed in the mean radius results, with a growth in drop size consistent with the decreasing values of the drop number concentration for P-DNN. However, both P-DNN and P-CP2000 predict the mean radius rather well (considering that their values are diagnosed in both models), with differences reaching only 0.5 um. This result is very important for P-DNN, since radius is the independent variable in the lognormal distributions that act as basis functions for this parameterization. In this regard, P-CP2000 performs somewhat worse than P-DNN for the mean radius, with the mean difference almost reaching 1 um, although it shows a monotony similar to both the KCE and P-DNN models.
There is a correlation between these results and those depicted in Fig. 8, where concentration decreases with time and the mean radius changes little throughout the simulation, when considering the total moment of order 1. The differences reached in this figure are related to the base model from which the NN model was created. Since the NN model only reproduces (very accurately) the rates of the moments as in Clark (1976), an accuracy improvement of the overall parameterization should not be expected. Despite that, the P-DNN model achieves physical consistency and behaves according to the rules of the simulated process, as evidenced in the increase of the mean radius (drop growth) and the decrease of the number concentration (drop coalescence). Regarding concentration, a decrease in f1 values is observed, due to the coalescence process, while a consistent increase in f2 is also appreciated. However, the increase in drop number concentration in f2 is not proportional, because there are fewer, bigger drops in the distribution, which are also colliding within the same distribution function (the equivalent of self-collection in bulk parameterizations). This is consistent with Fig. 8, in which the second mode in the concentration DSD is barely developed after 900 s of simulation time. Nevertheless, the general decrease of the total concentration value represents well the theory and observations of the parameterized process. Barros et al. (2008) found this same behaviour while revisiting the validity of the experimental results obtained by Low and List (1982), excluding drop breakup. The liquid water content (LWC) values (diagnosed) are depicted only to verify that mass is conserved under the formulation of P-DNN. The LWC of each of the distribution functions (f1 and f2) was obtained from the corresponding moment (order 3) calculated from eq. (4).
Effectively, total mass remains constant during the entire simulation, with a proportional mass transfer between f1 and f2. These results demonstrate that the behaviour of the P-DNN parameterization is physically sound, with a remarkable consistency between the different variables calculated, both bulk (N, r and LWC) and DSD related (concentration and mass density spectra).
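The mass-conservation check can be illustrated with a short sketch (the moment values and transfer rate below are hypothetical; in the parameterization the order-3 moments come from eq. (4)): since collision-coalescence only moves mass between f1 and f2, the summed LWC, computed as (4/3)πρ_w M3, must stay constant:

```python
import numpy as np

RHO_W = 1000.0  # kg m^-3, density of liquid water

def lwc_from_m3(M3):
    """LWC from the 3rd radius moment (per unit volume of air):
    LWC = (4/3) * pi * rho_w * M3."""
    return 4.0 / 3.0 * np.pi * RHO_W * M3

# hypothetical 3rd moments of the two basis functions [m^3 m^-3]
M3_f1, M3_f2 = 4.0e-7, 1.0e-7
total_start = lwc_from_m3(M3_f1) + lwc_from_m3(M3_f2)

# collision-coalescence only transfers mass: dM3_f1 = -dM3_f2
dt, rate = 1.0, 2.0e-10     # hypothetical transfer rate [m^3 m^-3 s^-1]
for _ in range(900):        # 900 s of simulation at dt = 1 s
    M3_f1 -= rate * dt
    M3_f2 += rate * dt

total_end = lwc_from_m3(M3_f1) + lwc_from_m3(M3_f2)
# total_end equals total_start up to floating-point rounding
```

The opposite-signed tendencies make the conservation exact by construction, which is what the constant total LWC in the simulation reflects.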

Total moment errors
An analysis of the predicted total moments was performed to further test the precision of P-DNN, given the importance of the statistical moments in calculating physical variables such as the mean radius and LWC. Table 5 shows the mean percent errors of the total moments predicted by P-DNN and P-CP2000. The percent error is taken relative to the moments of KCE, and the data were obtained by averaging the percent errors over the entire simulation. The moments for the solution of the KCE were computed by integrating the reference drop number concentration spectra using eq. (3), while the total moments from P-DNN and P-CP2000 were calculated using the predicted distribution parameters and solving eq. (5).
The defined gamma distribution equations for the moments are used in the case of P-CP2000. A reasonable degree of accuracy was achieved by P-DNN, with the mean error never surpassing 4 %. However, the data show that the total moments of order 0 to 2 are usually underestimated, while those of order 4 and 5 are slightly overestimated. This could result in calculated drop number concentration values lower than the actual ones, as seen in Fig. 9. As for P-CP2000, the model is not formulated to predict individual moments other than the zeroth and third. Thus, the other moments of the distributions are not well represented, as observed in the mean percent error, which reaches almost -61 % for the moment of order five, a stark contrast with the zeroth moment, whose percent error is only -1.3 %. This result indicates that P-DNN is adequate for representing the evolution of individual moments within certain ranges, when compared with more conventional bulk schemes. Figure 11 shows the time evolution of the percent error of the total moments throughout the simulation for P-DNN. The error of the total moment of order 3 is zero during the entire simulation because mass is not affected by the collision-coalescence process. The total moments of order 0, 1 and 2 overestimate the KCE in the first 300 s of simulation and underestimate them for the rest of the P-DNN run, with the percent error reaching a minimum value of -8 %. The opposite behaviour is appreciated for the total moments of order 4 and 5, which initially underestimate the KCE and overestimate it for the rest of the simulation.
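The mean percent error used for this comparison can be sketched as follows (the moment time series below are hypothetical placeholders for the actual P-DNN and KCE output):

```python
import numpy as np

def mean_percent_error(predicted, reference):
    """Mean percent error of a predicted moment series relative to a
    reference series, averaged over the entire simulation."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return 100.0 * np.mean((predicted - reference) / reference)

# hypothetical time series of a total moment (e.g. order 0, in cm^-3)
kce  = np.array([200.0, 190.0, 180.0, 170.0, 160.0])
pdnn = np.array([201.0, 188.0, 177.0, 166.0, 155.0])

err = mean_percent_error(pdnn, kce)  # negative -> underestimation on average
```

A signed mean (rather than a mean absolute error) preserves the sign information used above to distinguish systematic under- from overestimation of each moment order.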
However, for these orders the percent error is usually lower, with a maximum of 4 %. Generally, P-DNN performs well, with the percent error never reaching the 10 % threshold. However, further analysis on this topic is recommended in order to improve the accuracy of the parameterization.

Figure 11: Time evolution of the errors corresponding to the predicted moments from P-DNN. The percent error is taken relative to the moments of KCE. The moments of KCE were calculated by integrating the reference drop number concentration spectra using eq. (3), while the total moments from P-DNN were calculated using the predicted distribution parameters to solve eq. (5).

Conclusions
The presented approach to simulating the evolution of the droplet spectra due to collision-coalescence falls within the framework developed by Clark (1976) and Clark and Hall (1983). Under this approach, a dynamic framework has been established in Rodríguez-Genó and Alfonso (2021e). A hybrid parameterization for the process of collision-coalescence, based on the methodology of basis functions and employing a linear combination of two lognormal distributions, was formulated and implemented. All the parameters of the distributions are derived from the total moment tendencies, which in turn are calculated by means of five trained deep neural networks. In this way, we obtained a parameterized model that determines the evolution of the distribution parameters and, hence, the evolution of the DSD. The physical variables are diagnosed from the distribution moments. Within the framework of this parameterized model, there is no artificial classification of the water substance (cloud droplets or raindrops). Instead, we consider a full set of distribution parameters for each of the distribution functions included in the formulation of the parameterization, in order to describe the DSD in radius space. This kind of microphysical parameterization allows the use of an arbitrary number of probability density functions in linear combination to reproduce the drop spectrum.
The novel components of the P-DNN model were introduced and evaluated in Sect. 3. A total of five deep neural networks were trained to calculate the rates of the total moments following Clark (1976), using a cascade-forward architecture instead of the traditional sequential networks. The trained NNs were thus able to accurately reproduce the total moment tendencies, showing a very high correlation and a small MSE when compared with those calculated with the original formulation (Clark, 1976) (see Figs. 5 and 6, and Table 3). This demonstrated the precision and ability of the ML-based method to reproduce the rates of the total moments due to collision-coalescence, when trained with a sufficient number of data samples (100000 combinations).
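The cascade-forward idea can be sketched in a few lines (the layer sizes, activations and initialization here are illustrative assumptions, not the trained COLNET configuration): unlike a sequential network, each layer receives the original input concatenated with the outputs of all preceding layers:

```python
import numpy as np

rng = np.random.default_rng(0)

def cascade_forward(x, weights, biases):
    """Forward pass of a cascade-forward network: every layer sees the
    original input concatenated with all previous hidden activations."""
    acc = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = acc @ W + b
        if i < len(weights) - 1:          # tanh hidden layers, linear output
            acc = np.concatenate([acc, np.tanh(h)], axis=-1)
        else:
            return h                      # last layer: linear output

# hypothetical sizes: 8 distribution parameters in, 1 moment tendency out
n_in, hidden = 8, [32, 32]
weights, biases, width = [], [], n_in
for n in hidden:
    weights.append(0.1 * rng.standard_normal((width, n)))
    biases.append(np.zeros(n))
    width += n                            # cascade: input width grows
weights.append(0.1 * rng.standard_normal((width, 1)))
biases.append(np.zeros(1))

y = cascade_forward(rng.standard_normal((4, n_in)), weights, biases)  # (4, 1)
```

The skip connections from the input to every layer let later layers combine raw distribution parameters with learned features, which can help when the target tendencies depend on the inputs both linearly and nonlinearly.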
One experiment was performed to illustrate the behaviour of the DNN-based parameterization at the initial stages of cloud formation. The simulation results from P-DNN showed good agreement with the reference solution (KCE), for both the predicted DSD and the bulk quantities considered. Moreover, the P-DNN model demonstrated a physically sound behaviour, adhering to the theory of the collision-coalescence process, with overall consistency in the values of both prognosed and diagnosed variables. With the development of P-DNN, a parameterization for solving the entire collision-coalescence process (with the exception of drop breakup) has been developed using a ML methodology. To the best of the authors' knowledge, previous attempts at describing collision-coalescence using the same methodology have focused on superparameterizations for sub-grid processes (Brenowitz and Bretherton, 2018), or on formulations for specific sub-processes such as autoconversion (Alfonso and Zamora, 2021; Loft et al., 2018) and accretion (Seifert and Rasp, 2020). For comparison purposes, the bulk parameterization developed by Cohard and Pinty (2000) (P-CP2000) was also implemented.
According to the comparison with the bulk model, the main strength of P-DNN is its superior ability to represent the evolution of the total moments and the shape of the DSD, owing to its formulation based on time-varying distribution parameters. The predicted P-DNN concentration and mass spectra closely match those of the reference bin model (KCE), while showing good accuracy at forecasting bulk variables such as the drop number concentration and the parameter-derived mean radius. Furthermore, total mass is conserved throughout the entire simulation, which is remarkable given the inherent numerical diffusion of the first-order finite differences method applied to update the parameters at each time step. An analysis of the accuracy of the predicted total moments of P-DNN was performed, with the percent error relative to the KCE never exceeding 8 %. However, there is room for improvement in the calculation of the total moments. Thus, the authors recommend retraining the DNNs with a finer resolution in the parameters' values, and with a wider range of values in order to cover all possible combinations of parameters. Another recommendation is the inclusion of the drop breakup process in the formulation, given its influence on the behaviour at the large-drop edge of raindrop distributions. In addition, the use of ML eliminated the requirement of numerically integrating the total moment tendencies at each time step, and lookup tables for each predicted moment are no longer needed under this formulation.
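The first-order update mentioned above amounts to a forward Euler step on the total moments (the moment values and tendencies below are hypothetical; in the parameterization the tendencies are supplied by the trained DNNs, and the order-3 tendency is zero because coalescence conserves mass):

```python
def euler_update(moments, tendencies, dt):
    """First-order (forward Euler) update of the total moments,
    as used to advance the distribution parameters each time step."""
    return [m + dm * dt for m, dm in zip(moments, tendencies)]

# hypothetical moment vector (orders 0..5) and NN-predicted tendencies;
# collision-coalescence leaves the order-3 (mass) tendency at zero
M  = [200.0, 2400.0, 3.2e4, 5.1e5, 9.0e6, 1.8e8]
dM = [-0.05, -0.4, -2.0, 0.0, 450.0, 1.2e4]

M_new = euler_update(M, dM, dt=1.0)
# M_new[3] is unchanged: mass is conserved by construction of the tendencies
```

A higher-order time integrator could reduce the numerical diffusion noted above, at the cost of additional DNN evaluations per time step.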
To obtain a full warm cloud model, an extension of this neural network algorithm applied to condensation is proposed, following the same methodology of series of basis functions. A parameterization scheme such as this could be included in regional weather and climate models, as its initial conditions can be calculated from variables needed by more traditional bulk models.

Author contributions
Léster Alfonso performed the conceptualization of the article, designed the methodology, acquired funding and resources, supervised the work, and reviewed and edited the original draft. Camilo Fernando Rodríguez Genó performed the formal analysis and investigation, developed and implemented the software and code for simulation and visualization, validated the results, and wrote the original draft.

Code availability
The current version of COLNET (COLNETv1.0.0) used to produce the results presented in this paper is archived on Zenodo.