FALL3D-8.0: a computational model for atmospheric transport and deposition of particles, aerosols and radionuclides. Part I: model physics and numerics

This manuscript presents FALL3D-8.0, the last version release of an open-source code with 15+ years of track record and a growing number of users in the volcanological and atmospheric communities. The code has been redesigned and rewritten from scratch in the framework of the EU Center of Excellence for Exascale in Solid Earth (ChEESE) in order to overcome legacy issues and allow for successive optimisations that are already planned in the preparation of the code towards extreme-scale computing. However, this baseline version already contains substantial improvements in terms of model 5 physics, solving algorithms, and code accuracy and performance. The code, originally conceived for atmospheric dispersal and deposition of tephra particles, has been extended to model other types of particles, aerosols and radionuclides. The solving strategy has also been changed, replacing the former central-differences scheme for a high-resolution central-upwind scheme derived from finite volumes, which minimises numerical diffusion even in presence of sharp concentration gradients and discontinuities. The parallelisation strategy, Input/Output (I/O), model pre-process workflows and memory management have 10 also been reconsidered, leading to substantial improvements on code scalability, efficiency, and overall capability to handle much larger problems. ::: All ::::: these ::: new :::::::: features ::: and :::::::::::: improvements ::::: have :::::::::: implications ::: on :::::::::: operational ::::: model ::::::::::: performance :::: and ::::: allow, :::::: among ::::: other, :::::: adding :::: data ::::::::::: assimilation :::: and :::::::: ensemble ::::::: forecast :: in :::::: future ::::::: releases. : This paper details the FALL3D-8.0 model physics and the numerical implementation of the code.

optimisations, including a comparison on model performance and scalability with respect to v7.x. This paper shows only one illustrative example of FALL3D-8.0 model results for ash dispersal from the 2011 Cordón Caulle eruption. A companion paper (Prata et al., 2019) contains the second part, including a detailed FALL3D-8.0 model validation for several simulations that are part of the new benchmark suite of the code. Finally, Section 7 wraps the conclusions of the manuscript and outlines which 60 will be the next steps for further model optimisation.
2 New features in v8.0 FALL3D-8.0 introduces substantial improvements at different levels. From the point of view of model physics: • The code has been generalised to deal with species different from tephra, including other kinds of particles, aerosols and radionuclides. Different types of species have been defined and can be simulated using independent sets of bins.
• Weibull and bi-Weibull particle Total Grain Size Distributions (TGSDs) can now be generated in addition to Lognormal distributions. On the other hand, FALL3D-8.0 can estimate TGSDs of tephra particles directly from magma viscosity (magma composition) and eruption column height following the fit proposed by Costa et al. (2016a).
• An "effective bin" flag has been introduced. For a given specie, only those bins having this flag "on" are actually simulated, whereas bins tagged as "off" are frozen in terms of transport and used only for source term characterisation. This 70 option has been added because several model parameterisations for the emission (source ) term :::::: volcanic :::::: plume :::::: source :::::::::::::: parameterisations : depend on the whole granulometric spectrum of particles but, at the same time, model users are often interested only on a subset of the particle spectra (e.g. fine volcanic ash for long-range tephra dispersal simulations).
• For the specie "tephra", several classes of particle aggregates can now be defined in certain aggregation options, differently than the single-class aggregation option available in v7.x.

75
• Model parameterisations for physics have been revised to include more recent studies. Meteorological drivers have also been updated to add recent datasets (e.g. ERA-5) and to remove deprecated options.
• Periodic boundary conditions are now possible, permitting simulations on domains spanning from local to semi-global (pole singularities still remain).
• For some species, the initial model condition can be furnished from satellite retrievals. This "data insertion" option is a 80 preliminary step towards a full model data assimilation using ensembles.
From the point of view of numerics and code performance: • The solving strategy has been changed to a high-resolution central-upwind scheme (Kurganov and Tadmor, 2000), optionally combined with a 4th-order Runge-Kutta explicit scheme. This replaces the former Lax-Wendorff (LW) centraldifference scheme, which was known to be over-diffusive in case of sharp concentration gradients or discontinuities.
• A new parallelisation strategy exists based on a full 3D domain decomposition. The former trivial parallelisation on particle bins in v7.x has been removed because, in case of interaction among bins, it yielded to unnecessary communication penalties, resulting on poor code scalability.

90
• A much more efficient memory management exists to exploit contiguous cache memory positions along each spatial dimension. In some model configurations, this can imply a substantial gain on computing time.
• Parallel model I/O using netCDF-Par :::::: netCDF and parallel model pre-process. In addition, all the pre-process auxiliary programs have been embedded within the main code (a single multipurpose executable exists in v8.0). The code can now be run to perform different tasks individually or sequentially through a single parallel workflow. As a result, all 95 the pre-process, modelling and post-process workflows can now run as a single execution concatenating all tasks and without needing to write/read intermediate files to/from disk. In large problems, this saves substantial disk space and I/O time because the required model input data (e.g. interpolated meteorological fields) are already stored in each processor memory when running the FALL3D model task.
• A hierarchy of MPI communicators has been defined. This is actually not active yet in v8.0 but gives flexibility to extend 100 some code functionalities in a near future with little refactoring effort. For example, plans for future versions include ensemble modelling using the Parallel Data Assimilation Framework (PDAF) or model nesting. These will require of teams of processors associated to different ensemble members or model grids respectively. In continuum mechanics, the general form describing the passive transport of a substance mixed within a fluid (air) in a domain Ω derives from mass conservation which, in conservative flux-form and using the Eulerian specification, reads ::: (see :::::: Tables :: 1 ::: and : 2 ::: for ::: list :: of :::::::: symbols): where F = c u is the advective flux, G = c u s is the sedimentation flux, H = −K∇c is the diffusive flux, and S and I are 110 the source and sink terms respectively. In the expressions above, t denotes time, c is concentration (in kg m −3 ), u is the fluid velocity vector (i.e. wind velocity), u s is the terminal settling velocity of the substance, and K is the diffusion tensor (in m 2 s −1 ). Note that the definition of the diffusive flux H explicitly assumes the Fick's first law of diffusion.
Boundary conditions are imposed at the Dirichlet Γ D (inflow), Neumann Γ N (outflow), and Robin Γ R (ground) parts of the boundary of the computational domain Γ (with Γ = Γ D ∪ Γ N ∪ Γ R and Γ D ∩ Γ N ∩ Γ R = 0) as: n · H = 0 at Γ N ; n · (H + G) = n · D at Γ R wherec is the concentration prescribed at inflow (typicallyc = 0), n is the outwards unit normal vector, and D = c u d is the ground deposition flux (u d is the ground deposition velocity). Note that when the deposition flux D coincides with the sedimentation flux G (i.e. when u s = u d ), the boundary condition at ground reduces to the standard free flow condition imposed at Γ N .
Depending on the specie(s) under consideration, the mass source S and sink I terms in (1) can be decomposed as: where the superscripts denote emission source terms (S e ; see Sec. 3.2.3), wet deposition sinks (I w ; see Sec. 3.2.4), particle 135 aggregation source and sinks (S a and I a respectively; see Sec. 3.2.6), radioactive decay source and sinks (S r and I r respectively; see Sec. 3.2.7), and chemical reactions source and sinks (S c and I c ). Note that FALL3D-8.0 does not account for aerosol chemistry yet. However, the code has been designed to allow incorporating this functionality in future versions in a straightforward manner.
When the ADS equation (1) is discretised, species in the mixture are binned in n b discrete "classes" or bins, so that the total 140 concentration c at any point of the domain decomposes as the sum of bin concentrations, i.e., c = n b i=1 c i . Substitution of bin discretisation in (1) yields to n b equations (one per discrete bin), each formally identical to (1): Note that the n b equations for bins can be coupled by means of the source and sink terms, which define the eventual transfer of mass among different bins, e.g. in case of particle aggregation/disegregation, chemical reactions, formation of child radionu-145 clides, etc.

Model parameterisations
Parameterisations in FALL3D have been revised and updated, removing deprecated options and adding new options available from more recent studies. Table 4 shows the parameterisations implemented in FALL3D-8.0 that described in the following 150 subsections.

Diffusion tensor
The atmospheric flow is characterised by large horizontal to vertical aspect ratios of wind velocities and length scales, as well as by an anisotropic momentum diffusion with the horizontal diffusion coefficient being typically much larger than the vertical one (e.g. Schaefer-Rolffs and Becker, 2013). For this reason, model diffusion that accounts for sub-grid scale atmospheric eddies 155 is typically assumed anisotropic, with two distinct eddy diffusion coefficients along the horizontal and vertical dimensions: In the case of FALL3D-8.0, the horizontal coefficient k h can be either assumed constant or parameterised as in Byun and Schere (2006): where: In the equations above, ∆ g is a characteristic grid cell measure ( ::: e.g. ::: the ::::::::: equivalent ::: area :::::: length), α ∼ = 0.28 denotes the Smagorin-  (6) was proposed by Byun and Schere (2006) to overcome the dependency of horizontal diffusion on grid resolution, and combines a Smagorinsky sub-grid scale (SGS) model giving the diffusion by transport (k ht ) with dominates. In this way, the effect of the transportive diffusion is minimised for coarse grids, whereas for fine discretisations the numerical diffusion term is reduced.
The vertical diffusion coefficient k v can also be either assumed constant or parameterised according to the similarity theory and distinguishing among surface layer, Atmospheric Boundary Layer (ABL), and free atmosphere (e.g. Neale et al., 2010): where κ is the von Karman constant (κ = 0.4), z is the distance from the ground, u * is wind friction velocity, φ h is the atmospheric stability function for temperature, h p is the ABL height, l c is a characteristic length scale, u h is the horizontal wind velocity modulus, and F c is a stability function which depends on the Richardson number Ri. For l c and F c , FALL3D-8.0 adopts the relationships used by the CAM-4.0 model (Neale et al., 2010): where λ c is the so-called asymptotic length scale (λ c ≈ 30m). The atmospheric stability function for temperature φ h is calculated as: where β h = 5, γ h = 15, and L is the Monin-Obukhov length, defined as: with g denoting gravity,θ v the mean potential virtual temperature, and θ * the potential temperature scale. The parameters in (13) (i.e. L or u * /θ * ) are ideally furnished by the driving meteorological model. If not and alternatively, FALL3D-8.0 estimates 190 the friction velocity u * and the potential temperature scale θ * from the potential virtual temperature θ v and the Richardson bulk number Ri b as (Louis, 1979;Jacobson, 1999): where z r and z o denote the reference and the ground roughness heights, andθ v the average between the two vertical levels.
Given Ri b , one can estimate u * and θ * as: where P r t is the turbulent Prandtl number (P r t ≈ 1) and the stability functions G m and G h are given by (Louis, 1979;Jacobson, 1999):

Sedimentation velocity
Particle bins in the model are assumed to settle down with a sedimentation velocity u s = (0, 0, −w s ) equal to its terminal 205 velocity: where ρ a and ρ p denote air and particle density, d is the particle equivalent diameter, and C d is the drag coefficient that depends on the Reynolds number, Re = du s /ν a (ν a = µ a /ρ a being the kinematic viscosity of air and µ a its dynamic viscosity). For irregular particles, the drag coefficient C d has to be obtained from experimental measurements. FALL3D-8.0 includes several 210 parameterisations derived from laboratory results using natural and synthetic particles and that cover a wide range of particle sizes and shapes (characterised by sphericity, by circularity, or by some other model shape factor). Model options for the drag coefficient C d include: 1. The GANSER model (Ganser, 1993): 215 + 0.4305K 2 1 + 3305 Re K 1 K 2 where K 1 = 3/[(d n /d) + 2Ψ −0.5 ] and K 2 = 10 1.8148(−LogΨ) 0.5743 are two shape factors, d n is the average between the minimum and the maximum axis, and Ψ is the particle sphericity (Ψ = 1 for spheres). For calculating the sphericity, is practical to use the concepts of "operational" and "working sphericity", Ψ work introduced by Wadell (1933) and Aschenbrenner (1956), which are based on the determination of the volume and of the three dimensions of a particle 220 respectively: with P = S/I, Q = I/L, where L ::::::::: P = d 3 /d 2 , :::::::::: Q = d 2 /d 1 , ::::: where ::: d 1 is the longest particle dimension, I :: d 2 is the longest dimension perpendicular to L, and S :: d 1 , :::: and :: d 3 : is the dimension perpendicular to both L and I :: d 1 :::: and :: d 2 .
2. The PFEIFFER model (Pfeiffer et al., 2005), based on the interpolation of previous relationships by Walker et al. (1971) 225 and Wilson and Huang (1979): where ϕ = (b + c)/2a is the particle aspect ratio (a ≥ b ≥ c denote the particle semi-axes).
Note that, in any case, the terminal velocity w s is defined by a triplet (d, ρ p , Ψ). As a result, particles with similar values of the three parameters can be grouped within the same model bin.

Emissions
The emission source term for the i-th bin (S e i term in the bin equations (4)) gives the mass per unit of time and volume (units of kg m −3 s −1 ) released at each point (cell) of the computational domain. FALL3D-8.0 can generate and handle multiple types of emission sources, internally defined as a data structure made of n p discrete points, each "tagged" with a time-varying position and bin emission rate (source strength) M ip (in kg s −1 ). As a result, S e i in a model grid cell results from summing emissions 240 from all point sources laying within the cell volume V : The total source strength M o results from summing over all source points and bins, i.e.: Table 5 summarises the different (exclusive) options available in FALL3D-8.0 for the emission term and related source strength.

245
In detail: 1. The POINT option assumes that all mass is emitted from a single point (n p = 1) located at height z t above ground level: where f i is the i-th bin mass fraction.
Note that this option includes as end-members the POINT option (if z b = z t ) and a vertically uniform emission from ground to top (if z b = 0).

255
3. The SUZUKI option (Suzuki, 1983;Pfeiffer et al., 2005) assumes a mushroom-like vertical distribution of n p emission points depending on two dimensionless parameters A and λ :: A s :::: and :: λ s : ::::::: The Suzuki parameter A :: A s : controls the vertical location of the maximum of the emission profile, whereas the parameter λ :: λ s controls the distribution of the emitted mass around the maximum.

260
When any of the previous source options is defined for volcanic plumes, it is useful to prescribe the total source strength where a = 140.8 is a constant and H is the eruption column height expressed in km above the eruptive vent. Alternatively, the 1D model fit by Woodhouse et al. (2016) can also be used to provide M o depending on the surrounding atmospheric conditions: where N is the Brunt-Väisälä frequency, c o is a constant, and f is a function of the parameter W = 1.44γ/N :::::::::::: given by: f (W W ) =   1 + bW + cW 2 1 + aW 1 + bW + cW 2 1 + aW :::::::::::: with the coefficients a = 0.87 + 0.05β/α, b = 1.09 + 0.32β/α, and c = 0.06 + 0.03β/α, being α and β ::::::::::::::::::: a = 0.87 + 0.05β e /α e , :::::::::::::::::: b = 1.09 + 0.32β e /α e , :::: and :::::::::::::::::: c = 0.06 + 0.03β e /α e , ::::: being ::: α e ::: and ::: β e the radial and cross-wind plume entrainment coeffi- being C s , C v and C a the specific heat capacities at constant pressure of the solid pyroclasts, gas phase, and air respectively, T a and T o the air and vent magma mixture temperatures, and n 0 the mass gas :: gas ::::: mass fraction. in terms of the friction velocity. For example, emission scheme 1 (Westphal et al., 1987) considers: where F V is the vertical flux (in kg m −2 s −1 ), occurring only above a (constant) threshold friction velocity u * t (u * given in ms −1 ). An important limitation of (33) is that the vertical flux does not depend neither on particle size nor soil moisture. However, despite its simplicity, this parameterisation can be useful when information on soil characteristics 295 (e.g. particle sizes and densities, moisture, roughness, etc.) is unavailable or poorly constrained. Emission scheme 2 (Marticorena and Bergametti, 1995;Marticorena et al., 1997) considers: where the experimental coefficient S c (in cm −1 ) depends on the amount of available fine particles in the soil, and the threshold friction velocity is given by: with K = ρpgd ρa 1 + 0.006 ρpgd 2.5 and Re = 1331 × d 1.56 (the lower bound of the fit corresponds to particles of ≈ 10µm in size). Note that in (35), ρ p and ρ a are particle and air densities (expressed in g/cm 3 ), g is gravity (in cm/s 2 ), d is the particle size (in cm), Re is the Reynolds number parameterised as a function of the particle size, and u * t is given in cm/s.

305
Finally, emission scheme 3 (Shao et al., 1993;Shao and Leslie, 1997;Shao and Lu, 2000) considers that the uplift from surface of the fine fraction of soil particles is controlled by the bombardment of saltating particles of larger sizes (≥ 63 µm), which breaks the cohesive forces of smaller particles. Based on theoretical and experimental results, Shao et al. (1993) found an expression for the vertical flux of dust particles of size d ejected by the impact of saltating particles of size d s : ::::::: where α :: α s : (in m s −2 ) is the coefficient of sandblasting efficiency determined experimentally (Shao and Leslie, 1997) and F H is the horizontal flux (in kg m −1 s −1 ) of saltating particles: where c o is an empirical dimensionless constant close to 1. In this scheme, the threshold friction velocity u * t (d) is given 315 by: where γ is an experimental parameter ranging between 1.65 × 10 −4 and 5 × 10 −4 kg s 2 (a value of 3 × 10 −4 kg s 2 is assumed in FALL3D-8.0).
In FALL3D-8.0, dry and wet deposition mechanisms can be activated for any type of bin below a certain particle/aerosol size.
Dry deposition on the ground is imposed prescribing the deposition velocity through a Robin boundary condition in (2).
FALL3D-8.0 admits two dry deposition parameterisations, which describe the vertical depositional fluxes by Brownian diffusion and inertial impaction, parameterised through the Schmidt and the Stokes number respectively. The first option considers the mass-consistent formulation proposed by Venkatram and Pleim (1999): where r a describes the effects of aerodynamic resistance and r b the quasi-laminar resistance (e.g. Brandt et al., 2002, and references therein). The aerodynamic resistance r a can be calculated as: with z o denoting the ground roughness height and φ h the atmospheric stability function for temperature given by (12). The 330 quasi-laminar resistance r b can be expressed in terms of the Schmidt number Sc = ν/D and Stokes number St = w s u 2 * /(gν) (with ν kinematic viscosity of air, and D molecular diffusivity of particles) (e.g. Brandt et al., 2002): The second option is that proposed by Feng (2008), which essentially differs from (39) in the estimation of r b :

335
where c 1 = 0.0226, c 2 = 40300 and c 3 = 15330 are dimensionless parameterisation constants, Re * is the Reynolds number (computed with the friction velocity u * ), and a and b are coefficients that depend on the particle size and surface characteristics.
Note that Feng (2008) gives a and b best-fit values for 7 land use categories and 4 aerosol size modes: nuclei (up to 0.1 µm), accumulation (up to 2.5 µm), coarse (up to 10 µm), and giant (up to 100 µm). A cut-off is assumed above this size because the sedimentation velocity term w s dominates and therefore the dry deposition contribution can be neglected.

340
Wet deposition mechanisms in FALL3D-8.0 are assumed to occur only within the Atmospheric Boundary Layer (ABL) and the corresponding sink term in (3) is parameterised as: where Λ differs for in-cloud (ic) and below-cloud (bc) sinks. For below-cloud scavenging (precipitation), Λ bc is estimated from the total precipitation rate as (e.g. Brandt et al., 2002;Jung and Shao, 2006): where P is the precipitation rate (in mm h −1 ), and a = 8.4 × 10 −5 and b = 0.79 are two empirical constants. For in-cloud scavenging (rainout), the model considers a parameterisation based on the atmospheric relative humidity RH (in %) as in Brandt et al. (2002): , and RH s = 100% (saturation value). Two critical particle cut-off sizes of 100 and 1 µm are assumed for below and in-cloud scavenging respectively.

Gravity spreading of the umbrella region
Large explosive volcanic eruptions can generate gravity-driven transport mechanisms that dominate over passive transport close to the vent and cause a radial spreading of the cloud (e.g. Woods and Kienle, 1994;Sparks et al., 1997). In order to 355 simulate this mechanism, FALL3D-8.0 includes a gravity current model (see Costa et al., 2013, and the Erratum published in June 2019). This option consists on adding a radial velocity field to the background wind, so that contributions from both passive and density-driven mechanisms are accounted for. The added radial wind is centred above the eruptive vent in the umbrella region, and extended up to a radius R given by: :::::: where t is time since eruption onset, λ :: λ g : is an empirical constant constrained to ≈ 0.2 from Direct Numerical Simulations (Suzuki and Koyaguchi, 2009), N is the Brunt-Väisälä frequency, and q is the volumetric flow rate into the umbrella region, estimated as (Morton et al., 1956;Suzuki and Koyaguchi, 2009;Costa et al., 2013, as correct in Erratum 2019): :::::::::: where M o is the total source strength (i.e. mass eruption rate), k :: k e : is the air entrainment coefficient, and c a constant that from 365 varies from tropical to mid-latitude/polar locations: Given the radius R, the radial velocity field as a function of distance r is calculated as (Costa et al., 2013): where u r (R) is the front velocity: :::::: wind velocity using an exponential decay function of the cloud thickness h :: h c : as: :::::::::::: where d : r is the distance from current front. 375

Aggregation
Aggregation of tephra particles can occur inside the eruptive columns or even downwind in ash clouds during atmospheric dispersion, thereby affecting the sedimentation dynamics and deposition of volcanic ash. FALL3D-8.0 includes some simple a-priori aggregation options and a wet aggregation model  that can be activated for tephra bins. The a-priori options consist on user-defined or empirically-based pre-defined fractions of aggregating classes 380 being transferred to one or more class of aggregates at the source points (i.e. aggregation is performed before transport). In contrast, in the wet aggregation model, ash particles aggregate on a single effective class of diameter d A , i.e. aggregation only affects tephra bins with diameter smaller than d A , typically in the range 100-300 µm. This option can run embedded in FPLUME-1.0 or as stand-alone. Consider a tephra grain size distribution in which k particle bins can aggregate. Then, the aggregation model defines the source (S a ) and sink (I a ) bin terms for the corresponding k + 1 bins as: where d j (< d A ) and ρ j are, respectively, the diameter and density of particles in bin j, andṅ j is the number of particles per unit volume and time that aggregate. The model assumes that this is proportional to the total particle decay per unit volume ::: and :::: timeṅ tot , i.e.: is the number of primary particles of diameter d j in an aggregate of diameter d A , k f is a fractal pre-factor (k f ≈ 1), and D f is the fractal exponent (D f ≤ 3). The model estimates the total particle decay per unit timeṅ tot integrating the coagulation kernel over all particle sizes, depending on the sticking efficiency times a collision frequency function which accounts for Brownian motion, collision due to turbulence as a result of inertial effects, laminar and turbulent fluid shear, and differential sedimentation (see Costa et al., 2010;Folch et al., 2016, for details).

Radioactive decay
FALL3D-8.0 can handle the fate of radioactive material dispersed from accidental releases (e.g., Brandt et al., 2002;Leelössya et al., 2018). Five common species of radionuclides have been implemented, Cesium 134 Cs, 137 Cs and Iodine 131 I, which decay 400 to stable isotopes, and 90 Sr, which decays to the unstable isotope 90 Y (see Table 6).
Radionuclide species need to specify the source (S r n ) and the sink (I r n ) terms in (3), associated to the radioactive production or decay of the isotope n respectively. The radioactive decay term indicates the mass per unit volume of the isotopes of type n that decays per unit time: where c n is concentration (expressed in kg m −3 ) and k r a constant specific of each isotope (decay rate) that can be calculated from the radioactive element half life t 1/2 as k r = ln(2)/t 1/2 . Values of t 1/2 and k r for common radionuclides are reported in Table 6. Note that the decay term is more relevant for isotopes with short half lives, e.g. for 131 I, which has t 1/2 8 days (Brandt et al., 2002;Leelössya et al., 2018).
The radioactive decay term of the isotope n constitutes a sink I r n for the isotope itself. However, the decay of the isotope m, 410 father of isotope n, constitutes a source S r n for the isotope n: where p mn is the the relative probability of decay of the isotope m to the isotope n. If the isotope m has only one child n, the relative probability of the branch m → n is p mn = 1. Note from Table 6 that 134 Cs, 137 Cs and 131 I, decay to stable isotopes, whereas 90 Sr decays to 90 Y which, in turn, is unstable and decays to the stable isotope 90 Zr. The production rate S r n of 90 Y is 415 therefore equivalent to the decay rate of 90 Sr.
In FALL3D-8.0 the radioactive decay is implemented by firstly transporting the radionuclides for a time step ∆t, and then by evaluating the decay during the same time step. For radionuclides that decay to a stable isotope ( 134 Cs, 137 Cs and 131 I) it is considered that after a time step ∆t the concentration decreases as:

420
wheres for decay to an unstable isotope (Yttrium) the concentration varies as: where c Y and c Sr are, respectively, the concentrations of 90 Y and 90 Sr and k Y and k Sr are the corresponding decay rates.

425
Instrumentation onboard the new generation of geostationary satellites provides with an unprecedented level of spatial resolution and temporal frequency (2 to 4 km pixel size and 10 to 15 min observation period; see Table 7), yielding a quasi-global coverage considering the overlap of different existing platforms. This is very suitable for high-resolution model data assimilation and related uncertainty quantification, as well as to implement ensemble-based dispersal forecast systems. These aspects are still under development and hopefully will be part of next FALL3D

Coordinate mappings and scaling
Consider the ADS equation (1) written in a Cartesian system of coordinates (x, y, z) assuming a diffusion tensor as in (5) and a sedimentation velocity u s = (0, 0, −w s ) aligned with the vertical coordinate z: It is straightforward to discretise the above equation in a "brick-like" computational domain Ω c using a structured regular (i.e. 445 equally-spaced) mesh, although the regularity condition is typically relaxed across the vertical direction so that the vertical grid resolution increases close to ground, where higher gradients are expected. In order to use other coordinate systems, equation (59) can be written on a generalised orthogonal system of coordinates (X 1 , X 2 , X 3 ) (e.g. Toon et al., 1988;Byun and Schere, 2005): where C is the scaled concentration, (U, V, W ) are the scaled wind components, (K 1 , K 2 , K 3 ) are the scaled diffusion coefficients, and S * and I * are the scaled source and sink terms. The implementation of a generalised equation like (60) presents two major advantages. On one hand, the generalised equation reads formally equal to that in Cartesian coordinates, so that little computational penalty exists to map physical domains (e.g. accounting for Earth's curvature and topography) to a "brick-like" computational domain (see Figure 1) by using coordinate-dependent horizontal and vertical mappings. On the other hand, a 455 generalised form simplifies the structure and implementation of the code because the model can be solved on various horizontal (cartesian, spherical, Mercator, polar stereographic, etc.) and vertical (terrain following, σ-coordinates, etc.) coordinate systems using only one solving routine. To this purpose, one needs first to scale the model coordinates and some terms in the equation using adequate mapping and scaling factors, then solve for the scaled concentration C in the regular computational domain Ω c (as in Cartesian coordinates) and, finally, transform the scaled concentration back to the original one.

460
In general and given two orthogonal coordinate systems (x 1 , x 2 , x 3 ) and (X 1 , X 2 , X 3 ), coordinate mapping factors are given by the terms m ij of the Jacobian transformation matrix M :: M: but, for the transformations considered here, M :: M will always be diagonal with three non-zero components m 1 , m 2 and m 3 . For example, in the horizontal transformation to spherical Earth surface coordinates (λ, φ) one has: where R ::: R e is the radius of the Earth and λ, φ and γ are the longitude, latitude and colatitude respectively (in Rad). Trivially, this transformation yields to m 1 = sin γ and m 2 = 1. Table 8 gives horizontal mapping factors for different coordinate systems (Toon et al., 1988). In most practical cases, where h(x, y) is the terrain height and H ::: H T : the height of the top of the computational domain. In the σ-coordinate system the influence of the terrain decreases linearly with height, from terrain-following at the surface (X 3 = 0) to a rigid lid at the top of the computational domain (X 3 = H :::::::: X 3 = H T ). This option has been added to partially correct numerical oscillations in 475 the previous terrain-following model mapping (z = X 3 + h), which can appear near the surface in case of flows over mountain ranges and propagate upwards (Schar et al., 2002). Table 9 gives the vertical mapping factors for the different coordinate systems available in FALL3D-8.0. Once defined, these coordinate mapping factors are used to scale the variables and parameters that appear in the generalised equation (60).
The scaling of scalar quantities is straightforward since it only involves the Jacobian determinant of the transformation, whereas for the vertical component one has to consider that, in terrain following coordinate systems, the coordinate X 3 depends 485 also on (x, y) through the terrain elevation h(x, y): Basic algebra manipulation yields to the scaling factors shown in Table 10 for the different vertical coordinate systems. Note that the expression above implicitly contains the correction for topography in the vertical velocity. Finally, scaling factors for 490 diffusion coefficients can also be obtained after some manipulation (Toon et al., 1988;Byun and Schere, 2006).

Discretisation and solving algorithm
FALL3D solves for each model bin the 3D generalised equation (60) using a fractional step method, which splits the equation along each spatial direction (Folch et al., 2009): with the solving order of each resulting one-dimensional ADS equation being permuted in each successive time step in order to avoid any privileged direction. One advantage of this splitting strategy is that, even if each one-dimensional equation is solved explicitly in time, the result is a semi-implicit scheme that adds stability.
In FALL3D-8.0, the "brick-like" computational domain is discretised using a variation of the staggered Arakawa D-grid, in which the wind velocity components are evaluated at the respective cell faces and the rest of scalar quantities at the cell 500 centres (Fig. 2). Note that this configuration is very convenient for solving the 3D equation in a fractional manner because, when solving each one-dimensional case, wind velocities are already aligned with the boundaries of the corresponding onedimensional cells. The previous versions of FALL3D used the classical Lax-Wendorff (LW) central differences scheme for solving the resulting one-dimensional ADS equations in (67), combined with a slope-limiter to reduce numerical over/under shootings near discontinuities. This resulted on a second-order accuracy except near sharp concentration gradients where, nonetheless, accuracy remained higher than in a first order upwind method. The main advantage of using the LW scheme was its simplicity but, in contrast, it is well known that it introduces numerical dissipation that prevents accurate resolution of discontinuities, leading to over-diffusive results. In order to circumvent this drawback, FALL3D-8.0 uses instead a highresolution Kurganov-Tadmor (KT) scheme that can be combined either with a fourth-order explicit Runge-Kutta or with a first-order Euler time-marching method. The later option, even if less accurate, is still supported for computational efficiency 510 reasons because it implies 4 times less solver calculations.
Consider the general one-dimensional advection-diffusion equation for a scalar variable c(x, t) on its conservative form: where, in our particular case, F = c u is the advective flux and G = k∂c/∂x is the diffusive flux (as already introduced, u(x, t) and k(x, t) are the velocity and diffusivity respectively). Consider also a 1D computational domain discretised as in Figure 2, 515 where c is computed at cell centres ("mass" points) and u is stored at staggered cell boundaries, which do not need to be equally spaced. The semi-discrete form of the KT scheme can be written at centre of each cell i in terms of fluxes at boundaries i ± 1/2 as (Kurganov and Tadmor, 2000): where ∆x i is the i-th cell width and: Note that, in the expression above, the last equality holds because our flux G depends only on the gradient of c, i.e. G = G(∂c/∂x). In (70), a i±1/2 is the maximum absolute value of the eigenvalue of the Jacobian matrix of F (in our particular case, it reduces to a i±1/2 = |u i±1/2 |), and c r and c l are, respectively, right and left values at a cell boundary, computed as: and φ(r) is a flux limiter function (e.g. Sweby, 1984). Options available in FALL3D-8.0 are the well-known superbee φ s (r) 540 and minmod φ m (r) (Roe, 1986): φ s (r) = max(0, min(1, 2r), min(2, r)) φ m (r) = max(0, min(1, r)) Note that c is needed at two extra mass points in order to evaluate c l i−1/2 and c r i+1/2 at the left/right cell boundaries respectively. In other words, the "stencil" of the KT scheme needs two ghost nodes at the boundaries of the computational domain or a two-545 point halo for internal domains in case of parallel domain decomposition. Note also that the solving strategy derives from a one-dimensional finite-volume formulation that, in our one-dimensional case, is also in practice equivalent to use linear finite elements.
Time marching from t n to t n+1 = t n + ∆t in (69) can be performed with the explicit first-order in time Euler method (EU1): 550 or, alternatively, using the classical fourth-order Runge-Kutta method (RK4), in which: with: where the function f (t, c) is given by the RHS of (69 560 multiplied by a user-defined safety factor. This factor should theoretically be lower than 1 in fully explicit cases but, given the semi-implicit nature of the splitting algorithm, slightly larger values can also yield to stability.

Algorithm benchmarks
Three benchmark cases serve us to illustrate the gains of the KT+RK4 numerical scheme with respect the former LW+EU1 implemented in the previous versions of the code.

565
Example 1 considers the pure advection (k = 0) of a step-like discontinuity. Consider a 1D domain x ∈ [−1, 1] with an initial concentration of: that is advected by a uniform velocity field u = 1. Periodic conditions are imposed at the boundaries, so that "mass" leaving the computational domain at x = 1 is re-injected at x = −1. As a result, the initial condition is periodically recovered after 570 each cycle with a period t = 2. Results are shown in Figure 3. Note how, as opposed to LW, the KT scheme adds almost no numerical diffusion and preserves discontinuities. In addition, because of its higher order in time, the KT+RK4 scheme preserves the solution whereas for LW+EU1 accuracy deteriorates with time (compare the accuracy of the LW+EU1 solutions after 1 and 10 cycles in Figure 3).
Example 2 considers the classical 1D advection-diffusion problem for the onset of numerical instability in a domain x ∈ [−1, 1] 575 subject to the boundary conditions c = 0 at x = −1 and c = 1 at x = 1. The problem has a steady-state analytic solution given by: where P e = ul/2k = u/k is the Péclet number. Figure 4 shows the steady-state solutions for different Péclet numbers, illustrating also how KT+RK4 outperforms LW+EU1 as the Péclet number increases.

580
Example 3 considers a case with pure advection (k = 0) on a 2D domain (x, y) ∈ [−1, 1] × [−1, 1] with an initial condition at t = 0 given by a conic concentration distribution with a unit (c = 1) peak concentration centred at (x c , y c ) = (0, 0.695) and having a radius r = 0.1 (Figure 5a). The cone is advected by a rotating clock-wise velocity field centered at (0, 0): with an angular velocity Ω = π (in Rad s −1 ), so that each cycle is repeated with a period of t = 2. Concentration profiles along two transects A (y = y c ) and B (x = 0) after 2 cycles (i.e. at t = 4) are shown in Figure 5b and 5c respectively. Note again the substantial improvement in the KT+RK4 scheme, which performs well even in this numerically challenging test.
In FALL3D-8.0, the pre-process auxiliary programs have been parallelised and embedded in the code, so that a single executable 590 file exists for all the pre-process steps and execution workflow (see Figure 6). These formerly independent programs can still be run individually as model tasks (specified by a program call argument) or, alternatively, concatenated with the model in a single execution. In the first case, pre-process tasks generate output files that are later given as inputs to the model task. This is similar to what occurred in the previous v7.x but with the difference of a parallel pre-process. In contrast, the second option does not require intermediate file writing/reading and, therefore, saves disk space and overall computing time. In any case, all 595 tasks share a unique model input file and generate its own log file to track execution and report eventual warnings and errors.
Possible task options are summarised in Table 11 and include: 1. Task SetTgsd. This task can generate Gaussian and Bi-Gaussian particle grain size distributions in Φ (log-normal in diameter) or, alternatively, Weibull and Bi-Weibull distributions (Costa et al., 2016a), and assumes a linear variation of density and particle shape factor between two specified cut-offs. For other kind of grain size distributions, the user must 600 provide a total grain size distribution file (name.tgsd).
2. Task SetDbs. This task interpolates all the required meteorological data from the original grid of the driving meteorological models to the computational domain. Table 12 summarises the different meteorological drivers available in 3. Task SetSrc. This task generates different emission source terms (see Sec. 3.2.3 and Table 4), including the PLUME option based on the FPLUME-1.0 model . If necessary, it also performs a-priori tephra particle aggregation (Sec 3.2.6) and a TGSD cut-off in order to select the effective bins considered in the atmospheric transport.

Task
All. Finally, this task runs all previous tasks consecutively as a single parallel execution.
Parallelisation in FALL3D-8.0 considers a 3D domain decomposition, with freedom for user to choose the number of processors along each direction. In contrast, previous versions considered two different levels of parallelisation, one on particle bins and another on domain but only along the vertical dimension. Parallelisation on bins was convenient in v7.x because no interaction 625 among bins existed, but such a form of trivial parallelism has been deprecated given that it would now yield to unnecessary communication penalties. Note also that the full 3D domain decomposition allows solving on much larger grid sizes before reaching hardware memory limits.
In terms of run time performance, it is important to recall that the splitting algorithm combined with the RK4 time marching implies solving 4 times a series 1D equations along each spatial dimension, contrasting with the 1 single solution for the EU1 630 case. In other words, each time integration step of (67) using the RK4 scheme along X 1 implies solving 4 times n y × n z one-dimensional problems, the solution along X 2 implies solving 4 times n x × n z problems and so on. In order to minimise the number of communications between neighboring processors, it is more convenient to compute first each of the 4 partial increments in (78) for all the 1D problems on a given dimension. This optimises swapping communications among domain partitions because a single swapping request (i.e. one MPI send/receive call) communicates all the data necessary to compute 635 the next RK4 increment for the whole mesh. In contrast, if all partial increments in (78) were computed for each 1D problem, it would require as many swapping requests as 1D problems involved. Clearly, this solving approach is convenient but it presents two drawbacks. Firstly, it obviously increases the amount of memory required and, secondly, it poses an issue on memory data access, with potential increase of cache memory misses given the more frequent access to data. Moreover, when one solves for the dimensions stored on array lowest axes, memory data access is scattered and therefore less efficient. This issue has been 640 addressed in FALL3D-8.0 by re-arranging the components of the velocity and diffusion arrays, with the dimension to solve being stored always in the fastest axis, and also by transposing the concentration array that has to be updated at the first partial increment. Such a memory management strategy allows exploiting always contiguous cache memory positions on any spatial dimension.
All the aspects discussed above have improved substantially the performance and scalability of the code. As an example of 645 strong scaling (i.e. time to solution for a fixed problem size), let us consider a real-case ash dispersal simulation from the 2011 Cordón Caulle eruption (e.g. Collini et al., 2013). The model was configured as shown in Table 13 Figure 8b plots the corresponding parallel efficiencies, defined as: where t 1 is the total computing time with 1 processing unit, and t N the time with N processing units. Note that ideal strong scaling implies a parallel efficiency of 100%. Clearly, v8.x improves notably the scalability of the code, with a trend close to that of perfect scaling up to ≈100 processors (parallel efficiency ≥ 90%). Depending on the time integration scheme, values of parallel efficiency above 50% are obtained with up to 512 and 1024 processors for EU1 and RK4 respectively. This is in striking contrast with version v7.3.4, for which parallel efficiency already drops below 50% with only 16 processors (Fig. 8b).

660
A much better v8.x code performance is also observed not only in terms of code scalability but also in terms of total computing time. Figure 9 shows the computing time ratio (total elapsed time) between v7.3.4 and v.8 depending on the number of processors and the time integration scheme. The EU1 case (i.e. equal order of accuracy in time that in v7.3.4), shows a similar serial performance, only only with an insignificant computation overhead probably due to algorithmic differences. However, this overhead is rapidly balanced with only 4 processors, and v8.x with EU1 is already up to 4x faster than v7.3.4 with 64 665 processors. The RK4 case (i.e. 4-th order of accuracy in time) is around 4x slower than v.7.3.4 for the serial execution. This is a logical result given that the RK4 scheme needs to iterate 4 times more to solve in time. However, this penalty is also rapidly balanced as the number of processors increases, and RK4 already outperforms v7.3.4 in absolute terms with only a few teens of processors. In summary, v8.x clearly shows a much better scalability and performance (absolute computing time to solution for a given problem and computational resources) than v7.3.4. The companion paper (Prata et al., 2019) contains a detailed 670 model validation and shows that this is also true in terms of model accuracy.

Conclusions
After 15+ years, the atmospheric transport model FALL3D has been completely rewritten and modernised to overcome legacy constrains in the former release versions v7.x that precluded the introduction of new functionalities and seriously limited the scalability and performance of the code on hundreds or thousands of processors. With this, FALL3D-8.0 can be considered as 675 a baseline for the successive optimisations and preparation for Exascale computing. However, as detailed in the paper, version v8.x already contains remarkable improvements and updates on model physics, numerics, and performance. In particular, the code has been prepared to deal also with particles different from tephra, aerosols and radionuclides, includes new coordinate mapping options, a more efficient and less diffusive solving algorithm (KT) that can be combined with a high-order in time solver (RK4), and a better memory management and parallelisation strategy based on a full 3D domain decomposition. Strong 680 scaling results have shown perfect scaling with few hundreds of processors and a parallel efficiency above 50% with 1024 processors. This remarkable improvement is also true in terms of performance (total computing time), with v8.x outperforming the previous release by a factor of 4x with only 64 processors.
Only for resuspended particles (ash and dust)   Table 8. Horizontal mapping factors (x, y) ← (X1, X2) for different coordinate systems where λ is longitude, φ latitude, R the radius of the Earth, γ colatitude, and φo the latitude at which the projection is true. Mercator and polar stereographic projections use cartesian coordinates projected into the Earth surface, i.e. account for curvature. Regular Cartesian coordinates should be used only for local domains, where the Earth's curvature can be neglected.

All
Fall3d.x All problemname.inp npx npy npz Runs all previous tasks in a single execution (1) model input file (same for all tasks), (2) number of processors along each spatial dimension in domain decomposition. If not given, the execution is serial.