Open Access The Cryosphere

Abstract. The Lagrangian particle dispersion model FLEXPART was originally designed for calculating long-range and mesoscale dispersion of air pollutants from point sources, such that occurring after an accident in a nuclear power plant. In the meantime, FLEXPART has evolved into a comprehensive tool for atmospheric transport modeling and analysis at different scales. A need for further multiscale modeling and analysis has encouraged new developments in FLEXPART. In this paper, we present a FLEXPART version that works with the Weather Research and Forecasting (WRF) mesoscale meteorological model. We explain how to run this new model and present special options and features that differ from those of the preceding versions. For instance, a novel turbulence scheme for the convective boundary layer has been included that considers both the skewness of turbulence in the vertical velocity as well as the vertical gradient in the air density. To our knowledge, FLEXPART is the first model for which such a scheme has been developed. On a more technical level, FLEXPART-WRF now offers effective parallelization, and details on computational performance are presented here. FLEXPART-WRF output can either be in binary or Network Common Data Form (NetCDF) format, both of which have efficient data compression. In addition, test case data and the source code are provided to the reader as a Supplement. This material and future developments will be accessible at http://www.flexpart.eu .

LPDMs compute trajectories of a large number of infinitesimally small air parcels (also called particles) to describe the transport of air in the atmosphere.Unlike Eulerian models, Lagrangian models can accurately represent the emissions from point or line sources and accurately advect narrow plumes and filaments, as they do not suffer from the numerical diffusion that is inherent in discrete Eulerian models (Rastigejev et al., 2010).Additional benefits of Lagrangian models are their flexibility and small computational cost compared to Eulerian models.
However, Lagrangian models suffer from numerical errors due to interpolation in space and time of the simulated meteorological fields (Stohl et al., 1995).Furthermore, well-mixed particles (which should remain so) may artificially "unmix" when the stochastic particle movements representing turbulence are not treated accurately enough (Thomson, 1987) or when the vertical velocity is not very precisely mass balanced with horizontal winds.These issues can be partially addressed, using a mesoscale model with full control of the horizontal and vertical resolution and output time interval.
The current version, FLEXPART v9.02, uses, like former versions, a terrain-following Cartesian vertical coordinate and latitude/longitude as horizontal coordinates.It uses meteorological model-level data from ECMWF or pressurelevel data from NCEP's Global Forecast System (GFS).The reader is referred to Stohl et al. (2005) for a detailed description of FLEXPART version 6.2 and the updated FLEXPART user manual available from http://www.flexpart.eufor later versions of the model.
Although FLEXPART has mostly been used with input data from global meteorological models, the planetary boundary layer (PBL) turbulence parameterizations implemented are based on data obtained from small-scale field experiments, and hence are valid for the meso-and local scales.This has led to several attempts to provide a mesoscale version of FLEXPART driven by mesoscale meteorological model output, such as the Mesoscale Meteorological (MM5) model (Grell et al., 1994), the Weather Research and Forecasting (WRF) model (Skamarock et al., 2008), or the weather model prediction COSMO (Brunner et al., 2013).The first mesoscale version of FLEXPART, developed in 2000, used data from MM5 (Wotawa and Stohl, 2000).A more recent version was developed in 2007 using MM5 v3.7 and FLEXPART version 6.2 (Seibert and Skomorowski, 2007;Srinivas et al., 2006;Arnold et al., 2008).Although promising, this version was not further developed, due, in part, to the termination of support, development, and usage of the MM5 meteorological model.
At about the same time, a FLEXPART version that uses the WRF model output was developed at the Pacific Northwest National Laboratory (PNNL) and renamed PILT (Fast and Easter, 2006).Not only were the input data were changed in this version, but also the whole computational domain, following the native metric horizontal coordinates from WRF, were changed as well.Along with the source code, Fast and Easter (2006) provided detailed documentation of the modifications, test cases, and results of the new implemented features, as well as a list of remaining issues to address in the model in order to make it complete.PILT was considered a beta version at that time.It has, however, been successfully applied in several studies, basically for non-depositing species, demonstrating the validity of the concept and formulation (Doran et al., 2008;Pagano et al., 2010;de Foy et al., 2011de Foy et al., , 2012)).
The new version presented here has been developed mainly at the National Oceanic and Atmospheric Administration and the University of Colorado Cooperative Institute for Research in Environmental Sciences (CIRES), in cooperation with the Norwegian Institute for Air Research (NILU), the Technical University of Catalonia Institute of Energy Technologies (INTE) and the University of Alaska Arctic Region Supercomputing Center (ARSC).It has been successfully applied to simulate pollutant transport at mesoscale in complex terrain, and to estimate surface fluxes using in an inversion framework (Brioude et al., 2011(Brioude et al., , 2012a(Brioude et al., , b, 2013;;Angevine et al., 2013).FLEXPART-WRF combines the main characteristics of PILT and the FLEXPART v9.02.Furthermore, new features are introduced, which include new options for using different wind data (e.g.instantaneous or time-averaged winds), output grid projections, and numerical parallelization for computation efficiency.A novel scheme for skewed turbulence in the convective PBL has been implemented based on the formulation developed by Cassiani et al. (2013), which may give significant improvements, especially for small-scale applications.FLEXPART-WRF is a useful tool for representing scales smaller than those FLEXPART-ECMWF/GFS can represent, while keeping the strength and formulation of FLEXPART.It is recommended that one use WRF version 3.3 or higher in order to have the full palette of FLEXPART-WRF options available.
FLEXPART-ECMWF/GFS has been a free software ever since it was released, and so is FLEXPART-WRF.The code is released under the GNU General Public License (GPL), version 3. The code can be distributed and modified under the terms of this license, which states basically that subroutines from this software cannot be sold.The text of the license is included in the file COPYING in the source code archive.
In this document, we present the new features of FLEXPART-WRF, giving in-depth details on the differences between it, FLEXPART-ECMWF/GFS, and PILT.Different test cases are used to describe the main features.Performance evaluations, the source code, a step-by-step description of the installation/execution and a python-based visualization software are also provided.It is recommended that one first read the latest FLEXPART-ECMWF/GFS manual for a complete understanding of this paper.

Model description
In this section we describe the main specific attributes of FLEXPART-WRF v3.1 and the basic differences between it and its predecessors.The reader is referred to Stohl et al. (2005) for detailed information on the physics and numerics of the model that are not described here.

WRF
The Weather Research and Forecasting (WRF; http://www.wrf-model.org)modeling system is used for various forecast and analysis applications, from the microscale to the synoptic and even global scales.WRF includes dozens of parameterizations for boundary layer processes, convection, microphysics, radiation, and land surface processes, and several options for numerical schemes (Skamarock et al., 2008).As a limited-area model, WRF must be provided with boundary conditions from another meteorological analysis system.The goal of this subsection is to suggest considerations for successful FLEXPART-WRF simulations, not to describe WRF or all its possibilities.
There is an extensive but inconclusive literature on the proper WRF configurations to use.Angevine et al. (2012) evaluated the performance of several WRF configurations in comparison with a variety of data, with the specific goal of providing input for FLEXPART.Despite all its virtues, WRF, like any model system, has some inherent limitations (e.g.Arnold et al., 2012b) and uncertainties, not dealt with in this paper, that will propagate into the atmospheric transport modeling (Gerbig et al., 2008).We recommend that WRF simulations should be evaluated with meteorological observations in order to have confidence in the FLEXPART-WRF results.
In terms of direct relevance to driving FLEXPART, WRF is a non-hydrostatic mesoscale model that uses perturbation equations with respect to a dry hydrostatic base state.Some of the meteorological variables required by FLEXPART are calculated from a base value and a perturbation from the WRF output (Table 1).WRF uses pressure based terrainfollowing coordinates.The prognostic variables in WRF are mass-weighted and therefore help to conserve mass.
The WRF pre-processing module, WRF Preprocessing System (WPS), sets the computational domain, the geographical projection, and the resolutions both in the horizontal and in the vertical, and interpolates the meteorological fields (usually from a global model analysis or forecast) used as initial and boundary conditions.WPS also prepares and reprojects the static data for the runs (including land use and elevation), which usually comes from satellite information.These data should be considered carefully if fine resolution (∼ 1 km) is required (e.g.Arnold et al., 2012b).
The choice of meteorological data for the initialization, the land surface model, boundary layer scheme, and convection scheme are all important for an accurate WRF and therefore FLEXPART-WRF simulation.Initialization data choices are limited to what is available, especially in a forecast context.Land surface models range from being quite simple to being extremely complex, and have corresponding data requirements.In general, interpolating soil moisture from a global model to the WRF grid is not recommended (Di Giuseppe et al., 2011), but other choices add considerable complexity.As for the boundary layer, if the turbulent kinetic energy (TKE) from WRF is to be used in FLEXPART (see Sect. 2.5 below), a scheme that provides TKE must be used.At the time of writing, only the Mellor-Yamada-Janjic (Janjic, 2002) and Mellor-Yamada-Nakanishi-Niino (Nakanishi and Niino, 2006) schemes provide the required TKE variable.The user should also be aware that different PBL schemes calculate the PBL height differently, and then modify FLEXPART-WRF simulations if the height from WRF is to be used in FLEXPART (again, see Sect.2.5 below).The choice of whether or not to use a convective scheme in WRF depends on the situation.It is generally advisable to use a convective scheme for a horizontal grid spacing larger than 30 km.Convection schemes are in general not designed for grid spacing less than 10 km, and hence convection should be resolved explicitly by the model.This, however, requires a resolution of 1-2 km or less.There is no general rule for intermediate grid spacing.However, if a convective scheme is used in a WRF domain, a convective scheme should be used in FLEXPART as well, in order to parameterize subscale convection.

FLEXPART-WRF
FLEXPART-WRF v3.1 can handle the different map projections WRF is able to work with.Figure 1 presents the projections that are commonly used in WRF: Lambert conformal, stereographic and Mercator.The blue grid cells represent the center grid of the Arakawa C-grid used in WRF (Fig. 2).The green and red grids represent the two different FLEXPART output projections (see Sect. 2.7 for further details).
To conduct a FLEXPART experiment, different meteorological fields from WRF are needed.The surface pressure, horizontal winds at 10 m a.g.l., the tem-perature and dew point at 2 m a.g.l. are used to calculate different parameters used in PBL schemes.Some variables are optional.For instance, RAINNC and RAINC are not necessary for running FLEXPART-WRF with passive tracers; however, they are needed to calculate wet deposition.The output time interval (how often WRF output is provided to FLEXPART) should in general be as short as practical.Brioude et al. (2012b) have shown, for instance, that a time interval of 1 h is reasonable in complex terrain.
In the subsequent subsections, detailed information is given to explain specifics of FLEXPART-WRF. .Two types of projection can be defined for the FLEXPART output: A projection that follows the WRF projection output (green grid), the so-called irregular output grid, and a projection based on regular latitude/longitude coordinates (red grid), the so-called regular grid.See Sect.2.7 for further details on the FLEXPART output.

User input file
The FLEXPART-WRF model has a somewhat different structure than the FLEXPART-ECMWF/GFS version.In the FLEXPART-WRF, all the input files have been merged into a single control file (See, for instance, flexwrf.testcase0 in the Supplement).The format of the input file has descriptions of the different input needed so that experienced FLEX-PART users may easily identify the pathnames (the path name of the directory where the meteorological data and FLEXPART output are located), COMMAND (the list of options), AGECLASSES (the ageclasses used in the experiment), OUTGRID (the coordinates and vertical levels of the FLEXPART output domain), RECEPTORS (the coordinates of the receptors), SPECIES (a list of species that include molar weight, and wet and dry deposition parameters), RELEASES (coordinates of the release boxes) files used in FLEXPART-ECMWF/GFS, and can easily adapt scripts if needed.A similar structure, following a namelist, is planned for future versions of FLEXPART-ECMWF/GFS.
The rationale behind this consolidation of the input files is that, instead of duplicating the FLEXPART binary or directories that include the input files, a single binary and a single directory can be used for any number of FLEXPART-WRF experiments.Unlike PILT, FLEXPART-WRF allows the input file to have any user-defined name; this improves flexibility and helps automatic scripting.The input file has a free format except for the definition of the species used.However, attention has to be paid to the real or integer format required for the input values.Different input files covering different test cases are provided with the source code in the examples directory.A README file is given in the examples directory that describes the purpose of each input file in detail.To run those test cases, the WRF output files can be downloaded from the http://www.flexpart.euwebsite or from the Supplement.FLEXPART-WRF outputs for each of these cases are also included.
Beside the regular switches found in FLEXPART-ECMWF/GFS, new additional switches are available in FLEXPART-WRF.Two new switches allow the user to control the use of the input wind field information.Switch WIND_OPTION facilitates the choice of instantaneous wind with the Cartesian vertical velocity W (option 0), timeaveraged wind (option 1), instantaneous wind with eta dot as vertical velocity (option 2) or using a divergence based vertical velocity calculated in FLEXPART-WRF (option −1).It is recommended to use option 1. See Sect.2.3 for further details.A second option, TIME_OPTION, can be used to correct the reference time of the time-averaged wind fields compared to the other instantaneous fields of WRF if not corrected by a preprocessing program.
The switch SFC_OPTION allows the user to select different treatment of certain PBL and surface parameters, including PBL height, surface sensible heat flux, and friction velocity.With option 1, these parameters are taken directly from WRF; with option 0 they are diagnosed by FLEXPART.There is no particular recommendation since the PBL height out of WRF might have large uncertainties.
The switch TURB_OPTION allows the use of different PBL turbulence parameterization schemes.Option 1 uses the Hanna scheme used in FLEXPART-ECMWF/GFS.The switch CBL can be used to assume skewed rather than Gaussian turbulence in the convective boundary layer.Options 2 and 3 use the turbulent kinetic energy (TKE) from WRF. Figure 3 presents the vertical profile of the concentration of a passive tracer released from a point source near the surface in a convective PBL after 3 h of transport.In a convective PBL, the tracer is expected to be well mixed throughout the depth of the PBL after such a long transport time.While the tracer using the Hanna scheme and the skewed option CBL is nearly well mixed vertically, the schemes that use TKE from WRF show an unrealistic buildup of tracer near the PBL top.This might be due to an underestimation of TKE near the top of the PBL, the perfect reflection scheme used at the top of the PBL, or an overestimate of the PBL height (from WRF) to be used with TKE.Based on Brioude et al. (2012b) and Fig. 3, it is recommended that one use option 1, as it is, so far, the only option that conserves the well mixed criterion in the PBL.The switch RELEASE_COORD changes the coordinates in which FLEXPART release boxes are specified.This can be in meters, as given in the WRF grid coordinate system (option 0), or as latitude/longitude (option 1).
The switch OUTGRID_COORDlets the user decide on the projection to be used for the FLEXPART-WRF output.Option 0 uses the WRF projection, while option 1 generates a regular latitude-longitude grid.The definition of the output domain, the former OUTGRID file, uses coordinates according to OUTGRID_COORD.See the test cases for further details.

Parallelization aspects
Lagrangian models can be very efficient compared to Eulerian models.For instance, they do not need to calculate advection and diffusion for the entire model domain, but instead only need to calculate advection and diffusion where particles are actually located.In the case of a single point source, this is a large advantage.Furthermore, the computational cost of advection and diffusion calculations does not increase with the number of tracers because a particle can represent any number of tracers at the same time.One important drawback of LPDMs up until now, however, was the computational resources required to run these models with a large number of particles (on the order of millions).FLEX-PART has been designed to use a large number of particles while keeping computational costs acceptable.On current computers, FLEXPART typically needs less than 1 s to calculate the transport of 100 thousand trajectories per time step.This capability is particularly important when pollution dispersion from large areas or from numerous sources need to be simulated.As an example, simulating anthropogenic pollutants over the North American continent requires at least 10 million particles to avoid intolerable levels of noise in Geosci.Model Dev., 6, 1889-1904, 2013 www.geosci-model-dev.net/6/1889/2013/hourly averaged output fields of 0.25 • resolution.Of course, the number of particles needed in a FLEXPART experiment depends on the size of the domain, the resolution (in space and time) of the meteorological data and the FLEXPART output, and the distribution of the sources.Trivial parallelization can be easily programmed by using several FLEXPART instances in different processors.Note, though, that these runs must be independent from each other (i.e. using different random number sequences).However, treatment of the same meteorological fields in every FLEX-PART instance can amount to a substantial computational overhead, as can the post-processing (e.g.merging) of the output files.Therefore, and to simplify the post-processing, a real parallel version of FLEXPART was desired.
FLEXPART-WRF was programmed keeping the end-user needs in mind; it was intended for a range of different computational resources, from single CPU computers to multithread clusters with distributed memory.The source code thus consists of some common routines for both parallel and serial runs, and some specific routines either for serial or for parallel compilation.The later include the main skeleton FLEXPART routines that control all the integration times and routine calls, the interpolation of the meteorological fields to the particle positions and, of course, the initialization and transport of the particles.The interpolation from the WRF coordinates onto FLEXPART coordinates, the concentration calculations, and the trajectory integrations have all been parallelized in OPENMP compiler directives for parallel programming.The reading of the WRF input file could be parallelized as well using a special Network Common Data Form (NetCDF) library that handles parallel reading.However, this has not been implemented yet.Routines with OPENMP instructions do not have a specific name, since the compiler will interpret the OPENMP instructions only when OPENMP is used.Routines marked by the mpi label, however, are the specific routines that use Message Passing Interface (MPI) parallel programming in distributed memory.This mainly concerns the timemanager.f90routine and routines that distribute the memory among the nodes within the system.Hence, the user can use OPENMP and MPI to reduce the computation time needed to perform a FLEXPART-WRF experiment.
The random numbers are handled similarly to the way in which they are in the standard serial FLEXPART; a series of Gaussian random numbers is first generated and stored in memory, and subsequently used during the simulation.In the parallel code, independent Gaussian random numbers series are generated and stored for different MPI processes.This is achieved using different initial seeds coupled with the random number generator RANLUX (Luscher, 1994;James, 1994).If properly set, this generator theoretically ensures that different seeds create independent random series, which is not true for most generators.Two methods (using the option newrandomgen in par_mod.f90line 249) are available for picking the Gaussian random numbers from each series: one very quick but less rigorous, and one ensuring a high level of randomness.In the quick method, the series of Gaussian random numbers is partitioned based on the number of OPENMP threads, and each thread serially extracts a series from its own partition.This method is very fast but the resulting random number series are repetitive with a short period.In the rigorous method, every OPENMP process has its own independent, uniform random number generator based on the parallel Mersenne-Twister algorithm of Matsumoto and Nishimura (1998), Haramoto et al. (2008) as implemented by K. I. Ishikawa (http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/VERSIONS/FORTRAN/fortran.html).In this method, any OPENMP process can generate an independent, uniformly distributed random number whenever required, and use it to randomly select one of the Gaussian distributed numbers previously stored in memory.In this second case, the algorithm ensures the same level of randomness as in a single serial FLEXPART simulation; no repetitiveness should be observed within the resulting series of Gaussian numbers generated for any OPENMP process and no correlation should be observed between streams in different OPENMP processes.Of course, the statistical error of the simulation decreases by increasing the number of particles used for each release box (which is true for any Monte Carlo simulation).To achieve a bias of approximately 0.1 % or less in the random term used in the trajectories, we suggest using at least 1000 particles per FLEXPART release location.See the test cases for example.
The user can control the number of cores used by OPENMP by setting the system variable OMP_NUM_THREADS to the number of cores requested.(2) the number of trajectories used in the simulation.The larger the number of trajectories, the smaller the statistical sampling errors of the FLEXPART-WRF output.The optimization factors in Table 2 are given for the parallelization of the trajectory loop (in timemanager_mpi.f90)and the interpolation/concentration calculations (in verttransform.f90 and calcpar.f90).The optimization of the trajectory calculations will largely determine the time needed to perform a FLEXPART-WRF experiment with a large number of particles.The optimization of the interpolation/concentration calculations will determine the time needed to perform a FLEXPART-WRF run with a large input/output (e.g. using a large computational domain with dense grid spacing and/or many nests).However, since the reading process is not currently parallelized, the effective reduction of computation time will also be determined by the reading of the meteorological input data.These benchmark values are indicative only, since they depend on the computer system used.Our OPENMP tests were performed for different numbers of cores using a Xeon E31 225 processor with eight cores.Using OPENMP, the speed-up factor can be expected to be on the order of ∼ 4.6 using 6 cores.The overall speed-up factor (i.e., including the reading process of the WRF output) will depend on the time resolution of the input data, the size of the WRF domain, the time step used in FLEXPART-WRF and the number of trajectories.
The node-to-node intercommunication for MPI instances depends strongly on the connection speed between the nodes.The benchmark for the OPENMP+MPI experiments in Table 2 was made on a system with Gigabit ethernet connections.Each node had 12 cores.We limited our tests to four nodes, which is probably a typical value for such a FLEX-PART experiment.Our test was made with 40 million trajectories, a number typically needed for simulating forward plumes of anthropogenic pollutants over a large domain.Using four nodes, the overall speed-up time using MPI together with 6 OPENMP threads (including reading process and interpolation) was about 10.

Ingestion of the meteorological data
Like FLEXPART-ECMWF/GFS, FLEXPART-WRF defines the computational domain, characteristics and grid specifications from the input meteorological data.In the case of a nested input configuration, the coarsest and largest domain will be the one defining the geographical extent of the computational domain.WRF output can come in different projected coordinate systems.FLEXPART-WRF uses the same horizontal coordinates as WRF.The x and y coordinates are given in meters from the lower left corner of the WRF output domain.
FLEXPART-WRF reads the projection information to convert the WRF coordinates given in meters into latitude/longitude coordinates if necessary (i.e. if release boxes or output grid are defined in geographical coordinates).FLEXPART-WRF uses the cmapf_mod.f90and wrf_map_utils.f90modules and map_proj_wrf.f90routines to convert back and forth between WRF coordinates (in meters) and latitude/longitude coordinates for Lambert conformal, polar stereographic, Mercator, and latitude/longitude projections.FLEXPART-WRF tests the uncertainty in the projection transformation before beginning the experiment using latitude and longitude 2-D fields from WRF. FLEXPART-WRF terminates if the error is larger than 2 % of the horizontal grid spacing.
Unlike in FLEXPART-ECMWF/GFS, no singularities are expected at the pole, as WRF should be used with a polar stereographic projection if a pole is included in the domain.Hence, in contrast to the FLEXPART-ECMWF/GFS version, there is no specific treatment of the wind or coordinates near the pole.
Unlike PILT, map factors are used in FLEXPART-WRF to convert the displacement of the particles from the physical displacement on Earth to the WRF domain in advance.f90.While the grid distances x and y in the WRF domain are constant within the WRF domain, the corresponding distances on Earth are not.Therefore, 2-D map factors are used to calculate the true horizontal distance on Earth at any point in the WRF domain.The Lambert conformal, polar stereographic and Mercator projections in WRF are isotropic, which means that the map factors along x and y directions are identical.However, map factors along x and y directions differ for the latitude/longitude projection used for a WRF run at global scale.Map factor (MAPFAC_M) or map factors along x and y directions (MAPFAC_MX, MAP-FAC_MY) are read from WRF output.

Vertical interpolation and wind conversion
The WRF model output is on a Arakawa C-grid with terrainfollowing pressure based sigma levels (also called eta level in WRF manual), where the horizontal and vertical wind components are defined on a staggered grid (Fig. 2).In FLEXPART-WRF, the winds are interpolated onto the grid cell center so that all the meteorological data are on the same grid.Therefore, no external preprocessing of the WRF output is needed.
Wind data from WRF output is on sigma levels.The subroutine verttransform.f90interpolates the 3-D data onto the FLEXPART-WRF Cartesian terrain-following coordinates, applying the appropriate correction factors (Fig. 2).FLEXPART-WRF uses a Cartesian terrain-following coordinate system to save computation time, since the PBL turbulence parameterization is done in this coordinate system.In addition, an option (ADD_SFC_LAYER) is given to add a near surface level with the wind speed at 10 m a.g.l., and the 2 m temperature and the 2 m specific humidity read in from the WRF output.This option is not recommended if the first sigma level of a WRF run is below 10 m in altitude.Brioude et al. (2012b) have shown that large uncertainties can arise from using W , the vertical wind in Cartesian coordinates that is output from WRF by default, when significant orography is encountered in the domain.This version of FLEXPART-WRF therefore gives the possibility to use different vertical velocities, and allows choosing between timeaveraged and instantaneous winds.Depending on the choice, FLEXPART-WRF will require additional output fields from WRF.
By default, the WRF output includes instantaneous horizontal wind along x and y axes on sigma levels (U and V , in m s −1 ) and a geometric Cartesian vertical velocity (W , in m s −1 ).To convert W from a geometric vertical coordinate onto a terrain-following coordinate (either Cartesian or σ coordinate), a correction factor based on the gradient of orography has to be applied.This is expressed in the form of Eq. ( 1), where H is the orography from the WRF output and w the terrain-following vertical velocity.
If the WRF registry is modified, WRF can output an instantaneous mass-weighted sigma dot vertical velocity (WW, σ in s −1 ).σ and time-averaged σ follows the same conversion as the vertical time-averaged wind.with Z = z + H. (2) The last term on the right-hand side of the equation is negligible.The corrective terms (second and third terms) in Eq. ( 2) are much smaller in magnitude than those from Eq. ( 1).Using the instantaneous σ therefore reduces the mass conservation issues encountered when using the geometric Cartesian vertical velocity (Brioude et al., 2012b).
Since WRF version 3.3, an option allows the user to output mass-weighted, time-averaged winds on sigma levels (AVGFLX_RUM, AVGFLX_RVM, AVGFLX_WWM).Time average winds reduce the uncertainty of off-line Lagrangian trajectories because the time-mean wind is more representative for the average transport during a given time interval than the rather noisy instantaneous wind.The improvement owing to the use of time-averaged winds compared to instantaneous winds will depend on the meteorological situation and the time interval of the WRF output.See Brioude et al. (2012b) for a comparison between instantaneous and time-averaged winds as input for FLEXPART-WRF.
The mass-weighted winds are converted to uncoupled winds by applying a mass factor (MU and MUB).The time-averaged vertical wind is a mass-weighted time average σ , and therefore its vertical coordinate transformation in FLEXPART-WRF is equivalent to the one applied on the instantaneous σ in Eq. ( 2).
Since the time-averaged winds are integrated over the WRF time interval output, they are valid at time t − 0.5 t, with t being the WRF time interval output, and t being the valid time of the WRF output.The user has the possibility of setting the switch TIME_OPTION = 1 to let FLEXPART correct the valid time of the time-averaged wind internally.If the WRF output has been preprocessed and the valid time of the time-averaged wind is correct, TIME_OPTION = 0 has to be used.
If instantaneous σ or time-averaged winds cannot be found in WRF output, FLEXPART-WRF can determine the vertical velocity internally based on mass-conservation and the hydrostatic assumption.While this assumption is not necessarily true at the mesoscale, Brioude et al. (2012b) have found that the uncertainties of this divergence-based vertical velocity are much smaller than those of w.Brioude et al. (2012b) have shown that using σ instantaneous velocities or time-averaged sigma dot velocities lead to lower wind divergences and thus better results than Cartesian vertical velocity.The authors argue that there are two possible reasons for the larger errors when using w.The first one is that w cannot be accurately converted into a terrainfollowing vertical velocity due to the partial derivative terms of orography in Eq. ( 1), which has significant uncertainties.In the case of σ , on the other hand, the correction factor involves only the horizontal differences in geopotential between two terrain-following coordinates.The second reason might come from the fact that the mass-weighted w is a prognostic variable, while the mass-weighted σ is a diagnostic variable.The prognostic mass-weighted w might be more sensitive to numerical errors than σ .For more details the reader is referred to the original work by Brioude et al. (2012b) and Nehrkorn et al. (2010).The conversion to the FLEXPART internal vertical coordinate system is made in routine verttransform * .f90and the routines include the dependency on the vertical wind velocity selected by the user (Fig. 2).

Boundary layer treatment and turbulence
By default, two boundary layer parameters are read from the WRF output: the friction velocity u * (UST), and the sensible heat flux (HFX).If those variables are not available, a profile method is automatically used to calculate them.An option (SFC_OPTION) gives the user possibility of either reading the PBL height (PBLH) from WRF (SFC_OPTION=1) or letting FLEXPART calculate it using a Richardson number threshold (SFC_OPTION=0).The latter path uses the same boundary layer parameterization www.geosci-model-dev.net/6/1889/2013/ as in FLEXPART-ECMWF/GFS.Different PBL schemes in WRF calculate the boundary layer height differently, so the user must be aware of these differences if SFC_OPTION=1 is used.Like in FLEXPART-ECMWF/GFS, the user can decide if an additional term based on subgrid-scale variation of topography and local Froude number will be used in the PBL height calculation.With switch SUBGRID=1, the subgrid variation (envelope PBL height) is used.See Stohl et al. (2005) for further details on the envelope PBL height.The use of this subgrid parameterization is not recommended for mesoscale WRF simulations (horizontal grid spacing ≤∼ 10 km).
Unlike FLEXPART-ECMWF/GFS, FLEXPART-WRF has four different options for the turbulent wind parameterization in the PBL: -TURB_OPTION = 0: Turbulence is completely switched off and FLEXPART works as a nondispersive Lagrangian trajectory model.
-TURB_OPTION = 1: FLEXPART-WRF internally calculates PBL turbulent mixing using the Hanna turbulence scheme (Hanna, 1982).This scheme is based on boundary layer parameters PBL height, Monin-Obukhov length, convective velocity scale, roughness length and friction velocity.These parameters are either read from WRF or calculated within FLEXPART.Different turbulent profiles are used for unstable, stable or neutral conditions in the PBL.The switch CBL=1 can furthermore be used to assume skewed rather than Gaussian turbulence in the convective PBL.In this case, the CBL formulation of Luhar et al. (1996), extended to account for the gradient in air density formulated by Cassiani et al. (2013), is used for the drift term in the Lagrangian stochastic equation.This drift is based on a skewed probability density function (PDF constructed as a sum of two Gaussians distributions.The formulation assumes that the third moment of vertical velocity in convective conditions has the profile proposed by Luhar et al. (2000) and uses a transition function to modulate the third moment from the maximum possible values in fully convective conditions to zero in neutral conditions.In the present implementation a steady and horizontally homogeneous drift term is used even for unsteady and non-homogenous conditions as done in the standard FLEXPART drift.However, the skewed PDF drift seems to be more sensitive to this inconsistency and a re-initialization procedure of particle velocities has been introduced to avoid numerical instability (Cassiani et al., 2013).If selected, the CBL option requires shorter time steps (typically values of CTL = 10 and IFINE = 10 are used in the command section of the input file) than the standard Gaussian turbulent model in any case.With the computational time per time step being about 2.5 times that of the standard Gaussian formulation the CBL options is, therefore, much more computationally demanding.It is important to note that the CBL formulation smoothly transitions to a standard Gaussian formulation when the stability changes, but the actual equation solved inside the model for the Gaussian condition is still different from the one used in the standard FLEXPART turbulent option, since, in this equation, it is not the scaled velocity to be advanced, but the actual particle velocity.Full details of the CBL implementation can be found in Cassiani et al. (2013).
-TURB_OPTION = 2: The turbulent wind components are computed based on the prognostic turbulent kinetic energy (TKE) provided by WRF.TKE is partitioned between horizontal and vertical components based on surface-layer scaling and local stability with the Hanna scheme.
-TURB_OPTION = 3: TKE from WRF is used but the TKE is partitioned by assuming a balance between production and dissipation of turbulent energy.
TURB_OPTION=2 and TURB_OPTION=3 have been tested and found to violate the well-mixed criterion which states that a homogeneously mixed tracer in PBL should stay homogeneously mixed.This is especially a problem in complex terrain.The authors therefore advise using TURB_OPTION=1.
Above the PBL, turbulence is based on a constant vertical diffusivity of D z = 0.1 m 2 s −1 in the stratosphere, following Legras et al. (2003), and a horizontal diffusivity D h = 50 m 2 s −1 in the free troposphere.See Stohl et al. (2005) for further details.

Dry and wet removal processes
Until FLEXPART version 8, wet deposition was treated without differentiating between in-cloud and below-cloud scavenging.The wet scavenging process, as explained in Stohl et al. (2005), was implemented as a single decay process depending on a bulk scavenging coefficient which, in turn, depended on the precipitation rate extracted from the meteorological input data.Since version 8, in-cloud and belowcloud scavenging are treated differently, and the treatment of aerosols and gases follow different formulas.As explained in the online manual (http://www.flexpart.eu/wiki/FlexDoku),information about cloud base and top is needed and deduced from the relative humidity input field using an 80 % threshold.As for all the meteorological parameters, the value assigned to each particle is obtained by a nearest-neighbor interpolation method, both in time and space.The implementation of this scheme is somehow too simplified and can sometimes lead to unrealistic patterns in the deposition fields.The interpolation of the precipitation fields together with the precipitation disaggregation in the extraction routines generates smoothing, that, apart from the underestimation of the precipitation maxima, generates events in which a particle could encounter precipitation not associated with a precipitating cloud and thus, according to the FLEXPART-ECMWF/GFS version 8.0-9.02implementation, would not undergo scavenging.
A second problematic issue is that this approach overlooks the existence of convective precipitation occurring within subscale and unresolved convective clouds and thus underestimates the in-cloud scavenging.In FLEXPART-WRF the same issues needed to be addressed, but were adapted to the differences in temporal and horizontal resolutions and to the different disaggregation of the cumulative precipitation fields provided by WRF.Seibert and Arnold (2013) developed and implemented a better wet deposition scheme for the mainstream FLEXPART-ECMWF/GFS.In FLEXPART-WRF the same approach is taken and adapted to the particular needs of this version.This new scheme includes: 1.The 3-D cloud fraction from WRF is not used and does not mask the scavenging process.
2. The 3-D clouds are diagnosed differently.An initial value of 90 % in relative humidity is taken as a threshold for cloud existence, and to obtain the cloud base and height.If precipitation exists but no cloud is diagnosed, the relative humidity threshold is reduced recursively by 5 % incrementation until a cloud is detected.
If the relative humidity goes down to 30 %, the precipitation is mainly convective and FLEXPART detects only a cloud with a 6 km top, then FLEXPART reassigns the cloud thickness according to the precipitation intensity -from 0.5 km to 8 km for weak precipitation, and from 0 km to 10 km for heavy precipitation.
3. Cloud base and top are interpolated to the particle position.Neighbor grid cells without diagnosed clouds are not considered in the interpolation.
4. If no cloud is diagnosed, but precipitation exceeding a minimum rainfall rate is present, the bulk parameterization previous to version 8 is used with hard-coded scavenging coefficients (by default, aerosol scavenging coefficients).
A new study (Seibert and Philipp, 2013) improved the wet deposition scheme by modifying wet scavenging coefficients.The aerosol lifetime of radionuclides with this new scheme has been evaluated using measurements from the Comprehensive nuclear Test Ban Treaty network in the FLEXPART-ECMWF/GFS model.Improvement in the aerosol lifetime compared to wet deposition has been characterized.More investigations on the wet and dry deposition schemes are still needed.Dry deposition follows the same parameterization used in the latest version of FLEXPART-ECMWF/GFS.To simulate dry removal processes, FLEXPART-WRF needs to read additional land use and roughness length data available in the data directory where the source directory is located.Those files should be copied or linked into the same directory in which the FLEXPART-WRF binary is used.

Model output
Three choices of format are given to the user for the FLEXPART-WRF model output.First the user can output individual trajectory information (particle position in space and time, mass of species carried) by using the switch IPOUT to track individual particles.This option is also used for continuing a previous FLEXPART-WRF run by reading the information on the last position of each particle.The output can be formatted in ASCII or as a binary file.
The second choice is to output the center of mass and clustered particle positions with additional information (percentage of tropospheric air, PBL height, etc.) by using the switch IOUT=4 or 5.The results can be found in the trajectories.txtfile in the FLEXPART-WRF output directory and are only available in ASCII format.
The third choice is to distribute the information from each particle onto a regular grid using a uniform kernel.In FLEXPART-WRF, the uniform kernel is not used during the first 2 h after the particle's release in order to prevent smoothing in the vicinity of the source.This 2 h threshold is also applied to the gridded deposition, and for both the single and the nested domain configurations.The gridded output is an efficient format for comparing FLEX-PART results to measurements or other model results.The FLEXPART-WRF gridded output can be given in various units, for instance, as a concentration, a volume mixing ratio, or a source-receptor sensitivity (for backward trajectories).The switches IND_SOURCE and IND_RECEPTOR can be used to (1) modify the unit of the quantity released and (2) change the unit of the gridded output following the description by Seibert and Frank (2004).
The output grid coordinates can be defined using the coordinate of the lower left corner and the number of grid cells and either the resolution of the gridded FLEXPART-WRF output (switch OUTGRIDDEF=0) or the coordinate of the upper right corner (switch OUTGRIDDEF=1).
Two possibilities are given for the projection of the gridded output.The FLEXPART-WRF output can be defined following the WRF projection using the switch OUTGRID_COORD=0 (green grid in Fig. 1).The corners of the domain and its resolution are then defined in meters, with the origin being the lower left corner of the WRF domain.The choice of this so-called irregular (in the longitude/latitude sense) output grid has the advantage of making use of the entire WRF domain, and minimizing the interpolation error when applying the kernel.Coordinates in longitude and latitude of the FLEXPART-WRF output can be obtained from the latlon.txt(coordinates of the center grid) and lat-lon_corner.txt(coordinates of the lower left corners) files in the output directory or in the header file if NetCDF format is used.
The second possibility is to define a FLEXPART-WRF output with regularly spaced latitudes and longitudes using the switch OUTGRID_COORD=1 (red grid in Fig. 1).In this case, an interpolation is needed to apply the uniform kernel on a latitude/longitude grid.The associated numerical inaccuracies can be potentially important in a polar projection.However, it is the easiest option for comparing FLEXPART-WRF and FLEXPART-ECMWF/GFS results, or to use whenever output on a latitude/longitude grid is needed.
The FLEXPART-WRF gridded output is formatted either as a binary file or as a NetCDF file.The binary format is compressed by FLEXPART using a custom-designed algorithm which dramatically reduces the size of the sparse matrix data if there are many zero values.It follows the same format as FLEXPART-ECMWF/GFS output.Fortran, Matlab and Python reading routines are available for reading in the FLEXPART output.Further information is given at http://www.flexpart.eu.The NetCDF format uses NetCDF-4 libraries to compress the NetCDF FLEXPART-WRF output files.A header file is generated that includes the latitude and longitude coordinates and information on the simulation.A switch, NCTIMEREC, can be used to increase the number of time frames in a single NetCDF file.A NetCDF FLEXPART-WRF output file can include wet and dry deposition fields, concentration, mixing ratio fields and sourcereceptor relationship depending on the type of simulation or species used.Adding more time frames in NetCDF files reduces the overall disk space needed by the NetCDF FLEXPART-WRF output.The compression of the NetCDF file is similar to the binary format if NetCDF library version 4 is used.Using NetCDF library version 3 or earlier versions, the size of the output files is at least 4 times larger.

Installation, compilation and execution
FLEXPART-WRF is coded following the Fortran 95 standard, including module options.It has been tested with the Portland Group Fortran compiler, the Intel Fortran compiler, and the free GFORTRAN compiler in serial, OpenMP, and hybrid OpenMP-MPI modes.FLEXPART-WRF directly reads the WRF output without the need of any external preprocessors.WRF output is commonly given in a NetCDF format and therefore, FLEXPART-WRF requires compilation with NetCDF libraries that can read WRF output.To benefit from the compression capability of FLEXPART-WRF NetCDF output format, FLEXPART-WRF has to be linked to a NetCDF-4 library.
The Fortran subroutine par_mod.f90 is used to give the maximum dimension of different variables in the model.Prior to compilation, the user needs to modify the maximum dimensions of the WRF input files at line 126, the maximum number of nests and their maximum horizontal dimension at line 160, and the maximum number of species used in FLEXPART-WRF at line 211.Unlike FLEXPART v9.02 or earlier versions, FLEXPART-WRF does not need to have a maximum number of particles.The model automatically handles the number of particles released for each experiment to save memory space.
A single makefile is provided, makefile.mom,which provides the user with a ready-to-work makefile for any of the combinations mentioned above once the user has adapted the library path to her or his own computational environment.This helps inexperienced users with installation and compilation, and requires no in-depth knowledge of compilers and compiler options.
The FLEXPART-WRF model can be compiled in three different versions.The serial version does not require a specific library besides a NetCDF library and can be compiled by using the following: make -f makefile.momserial The second version is an OPENMP parallel version that uses more than one core to run the code in shared memory (single PC or single node).This version should be used on any personal computer that has a multi-core CPU (Central Processing Unit) and requires compiler libraries that support multi-threading in shared memory.This version is compiled by using the following: make -f makefile.momomp The third version is a hybrid parallel version that uses MPI and OPENMP.The system needs MPI libraries to be installed beside those that support shared multi-threading.This version can be used on a super computer, a cluster of PCs or any system that has distributed memory.We recommend using one MPI task per node, and using OPENMP for the multithreading in shared memory.This version is compiled by using the following: make -f makefile.mommpi If a module is modified (e.g.par_mod.f90or com_mod.f90),or if a different version (serial, omp or mpi) needs to be compiled, the object files have to be cleaned before compilation using the following: make -f makefile.momclean A FLEXPART-WRF experiment can be run using an input file named flexwrf.input.forward1using the following command: flexwrf31_pgi_mpi flexwrf.input.forward1 There is no naming convention for the input file passed to FLEXPART-WRF.Therefore, using the same FLEXPART-WRF binary to run several FLEXPART-WRF experiments at once is easy.

Visualization tool
The FLEXPART-WRF binary output format follows the same format as FLEXPART-ECMWF/GFS version.NCAR Graphics based on fortran, matlab subroutines and python subroutines can be used to read and generate plots from FLEXPART output.The software pflexible (https:// bitbucket.org/jfburkhart/pflexible)written in python provides a set of easily modifiable scripts, allowing the plotting of gridded output, such as concentrations at different vertical levels and deposition fields in single or nested output domains.The reader is referred to http://www.flexpart.eufor further details and latest improvement.
The NetCDF FLEXPART-WRF output format can be read and displayed using common visualization tools.

Conclusions
The official FLEXPART versions released so far have ingested input data from the global ECMWF or GFS models.While several derivations of FLEXPART exist, which allow using mesoscale model output, this paper has described the first official FLEXPART-WRF release.As both WRF and FLEXPART have large user communities, we are confident that this development will be of interest for many atmospheric scientists.The possibility to use input from mesoscale meteorological models for dispersion calculations opens a wide range of possibilities.Like FLEXPART-ECMWF/GFS, FLEXPART-WRF has been released under the GNU GPL and is made available at the new website http://www.flexpart.eu.Future updates will also be available from this website, which facilitates collaborative work by the FLEXPART developers.Users of the FLEXPART model are encouraged to register at this website and make their developments available to others, too.Future developments would include the following: -Adapt reading and transformation subroutines to other mesoscale models.
-Use of snow, 3-D cloud information, land use and soil information from WRF in deposition processes.
-Wet and dry deposition and differences among nests being further tested.
-Continue the work on using turbulence and boundary layer parameters from WRF.The use of the TKE 3-D field from WRF would help to resolve the problem of defining the PBL height.
-Implement new convective schemes or use WRF output of entrainment/detrainment to estimate redistribution of particles by subscale convection instead of using a convective scheme.
-Implement more sophisticated kernel methods, such as a Gaussian kernel, to reduce the smoothing effect of the kernel that can be large at high resolution.
-Include non-stationary and horizontally nonhomogenous drift terms in the stochastic equations.
resolved winds are the horizontal and vertical wind 3-D fields.The latitude and longitude 2-D fields are used to validate the projection calculation in FLEXPART.The map factor field is used to correct the displacement of the trajectories.The pressure 3-D field is used for density calculations and vertical coordinate transformations.The geopotential is used to interpolate the WRF vertical coordinate onto the FLEXPART vertical coordinate.The specific humidity and temperature 3-D fields are used for calculating air density and different parameters used in PBL parameterizations.

Fig. 1 .
Fig.1.Lambert conformal, Mercator and polar stereographic projections from WRF (blue grid).Two types of projection can be defined for the FLEXPART output: A projection that follows the WRF projection output (green grid), the so-called irregular output grid, and a projection based on regular latitude/longitude coordinates (red grid), the so-called regular grid.See Sect.2.7 for further details on the FLEXPART output.

Fig. 2 .
Fig. 2. Top: Arakawa C grid representing the staggering of the wind field (U, V , W ) and T , q v , TKE at the center of the grid.Bottom: terrain-following coordinates used in WRF (σ ) and FLEXPART (z).WRF fields are interpolated vertically onto the terrain following Cartesian coordinate z in subroutine verttransform.f90

Fig. 3 .
Fig. 3. Vertical profile of concentration of a tracer emitted from a point source at the surface in a convective PBL after 3 h of transport.Vertical profiles from each available PBL scheme in FLEXPART-WRFv3.1 are shown.

Table 2 .
Speed factors expected on the trajectory and interpolation calculations using different number of OPENMP threads, and on the trajectory calculations for different number of nodes using MPI threads in hybrid mode.The total speed-up factor in hybrid mode can be calculated by multiplying the speed factors for OPENMP and MPI.See the text for further details.
Table2presents speed-up factors for computation times using OPENMP and OPENMP+MPI, relative to the same FLEXPART run without parallelization.The time needed for a FLEXPART-WRF run to finish is determined mainly ) the time needed to read the WRF output and interpolate them onto the FLEXPART vertical coordinate, and www.geosci-model-dev.net/6/1889/2013/Geosci.Model Dev., 6, 1889-1904, 2013 by (1