A benchmark for testing the accuracy and computational cost of shortwave top-of-atmosphere reﬂectance calculations in clear-sky aerosol-laden atmospheres

. Accurate calculations of shortwave reﬂectances in clear-sky aerosol-laden atmospheres are necessary for various applications in atmospheric sciences. However : , computational cost becomes increasingly important for some applications such as data assimilation of top-of-atmosphere reﬂectances in models of atmospheric composition. This study aims to provide a benchmark that can help in assessing these two requirements in combination. We describe a protocol and input data for 44,080 cases involving various solar and viewing geometries, four different surfaces (one oceanic bidirectional reﬂectance function 5 and three albedo values for a Lambertian surface), eight aerosol optical depths, ﬁve wavelengths, and four aerosol types. We ﬁrst consider two models relying on the discrete ordinate method: VLIDORT (in vector and scalar conﬁgurations) and DISORT (scalar conﬁguration only). We use VLIDORT in its vector conﬁguration as a reference model and quantify the loss of accuracy due to (i) neglecting the effect of polarisation in the DISORT and VLIDORT (scalar) models and (ii) decreasing the number of streams in DISORT. We further test two other models: the 6SV2 model relying on the successive orders of 10 scattering method and FLOTSAM, a new model under development by two of the authors. Typical mean


Introduction
Accurate radiative transfer calculations in the Earth's atmosphere are necessary for some applications such as remote sensing of the atmospheric and surface properties, numerical weather prediction, and climate modelling.In this study : , we are interested more specifically in the calculation of radiances in the shortwave spectrum in clear-sky aerosol-laden atmospheres of the Earth for reasons explained below.Assuming perfect knowledge of the optical properties of the atmosphere and surface and a plane parallel atmosphere, solving the radiative transfer equation is a very well posed physical problem that essentially reduces to a (not-so-easy) algorithmic and numerical problem.Accurate methods have existed for a long time and testing the accuracy of newly developed methods has been standard practice in the radiative transfer community for some time (de Haan et al., 1987;Stamnes et al., 1988;Spurr, 2008;Kotchenova et al., 2006;Kotchenova and Vermote, 2007;Barlakas et al., 2016;Korkin et al., 2017).Well-used state-of-the-art radiative transfer models generally show agreement within 1% (or better) under most conditions tested.There has also been a number of intercomparison or benchmarking exercises for shortwave radiance calculations : , largely motivated by the requirements for accurate calculations in ground-based and satellite aerosol retrievals (Kotchenova et al., 2006;Kokhanovsky et al., 2010a;Emde et al., 2015Emde et al., , 2018)).A classical benchmarking exercise consists in comparing the model results to well-known solutions for simple cases of a Rayleigh scattering atmosphere (Coulson et al., 1960;Natraj et al., 2009).However such cases are very specific and do not represent the variety of conditions met in the real atmosphere.
Forecasts and reanalyses of atmospheric aerosols have been produced operationally by combining numerical models and aerosol observations in data assimilation systems (Bocquet et al., 2015;Hollingsworth et al., 2008).Aerosol information provided by satellite retrievals is often assimilated in aerosol models both in research (Collins et al., 2001) and operational (Benedetti et al., 2018) contexts.A widely assimilated quantity is the aerosol optical depth (AOD), a column integrated quantity of ::: that :::::::: quantifies : the interaction of the aerosols with the electromagnetic radiationin the atmosphere :::::: aerosols ::::: with :::::::::: atmospheric ::::::: radiation.So far, the assimilation of AOD retrievals has been successful in constraining the model variables towards the equivalent measured variables, but as the resolution, complexity : , : and quality requirements of the products vary, possible inconsistencies between the assumptions used for the satellite aerosol retrieval and the model or between the different satellite retrieval algorithms arise.Furthermore, there is some information on aerosol types in the observed reflected solar radiation that may not be included in AOD products and istherefore : , :::::::: therefore, : lost for data assimilation.
An alternative pathway to the assimilation of AOD retrieved from passive satellite measurements is the direct assimilation of these measurements ::::::::::::::::::: (Benedetti et al., 2018).The main challenge of doing so is basically that the satellite retrieval algorithm has to be replaced by a somehow equivalent procedure inside the data assimilation loop.In particular : , a shortwave radiative transfer model (RTM) would act as a major piece of the observation operator , and some stringent requirements on accuracy/quality, flexibility, : and runtime would have to be met.In the case of variational data assimilation, similar requirements would apply to the Jacobian of the RTM.The purpose of this benchmark paper is to assess the accuracy and computational cost of different RTMs to assimilate top-of-atmosphere reflectances to constrain aerosols.We will exclude the assessment of the Jacobian of the RTMs in this work.Successful implementation of this approach could help to improve the performance of the current aerosol data assimilation systems by providing: i) consistency between modelled aerosol type, aerosol vertical profile, atmospheric conditions : , and the radiative transfer calculation, ii) potentially better characterisation of the error covariance matrices for the observations, iii) flexibility when there is a change in satellite instrument, and iv) capability of assimilating simultaneously several different sources of aerosol information.Additionally, developments in this direction could help improving the numerical weather prediction (NWP) models by providing a functional shortwave observation operator to the data assimilation system.
This ::: The : latter also opens the possibility of improvements in the data assimilation of shortwave radiances for the atmospheric and surface properties (Martin et al., 2014), including cloudy atmospheres (Martin and Hasekamp, 2018).These authors show that the adjoint framework can be computationally efficient, but do not present the computation ::::::::::: computational : cost of the SHDOM and FSDOM models that they use.
To our knowledge, only a limited number of studies have been able to assimilate shortwave radiances in atmospheric composition models.Weaver et al. (2007) Migliorini (2012) discusses some conditions where the assimilation of radiances is equivalent to the assimilation of the retrieval products.For this to happen, the observation operator has to be reasonably (i.e., within the errors of the variable) approximated by a linear function of the control variable , and the background information has to be chosen such that the same information content of the observations is propagated in both assimilations.In our case, the first condition is usually assumed true, but the second condition is difficult to implement and to evaluate.
For operational data assimilation purposes, the computational cost of the radiance observation operator is crucial (Weng, 2007).While the computational burden of assimilating satellite radiances is expected to be larger than that of assimilating retrievals, it has not yet been specified, to our knowledge, how fast such an observation operator has to be.A rough estimate could be made using the computing time of the infrared radiance calculation as a guideline.For example, in the European Centre for Medium-range Weather Forecasts (ECMWF) data assimilation system, the infrared radiance observation operator takes around 0.1 ms per profile and channel in a highly parallelised system and roughly three times more including the tangent linear and adjoint code calculations.Taking into account the current infrastructure of the Copernicus Atmospheric Monitoring Service (CAMS) at ECMWF, a first estimate of the maximum computing time for shortwave radiance calculation would be approximately 10 ms per profile and channel, but ideally of the order of 1 ms (or about ten times slower than the current RTTOV infrared calculations) in this particular ECMWF/CAMS infrastructure.Likewise, there is a lack of knowledge of the required accuracy of the shortwave radiance calculations needed.A first order estimate would be an accuracy similar to the measurement or retrieval error, but this could depend on the data assimilation configuration and the measurement characteristics (in terms of multi-spectral and multi-angle viewing capability, quality of the prior including that of the surface reflectance, etc.).There may also be a variety of RTMs for different atmospheric conditions (e.g., clear-or cloudy-sky) or wavelengths (ultraviolet, visible or near-infrared).
How computational speed is achieved depends on the desired application.Some methods, such as the discrete ordinate method or the Monte Carlo method (e.g., Emde et al., 2015Emde et al., , 2018)), can compute radiances for a set of viewing geometries (given one sun geometry) in a single simulation.Other methods, such as the successive orders of scattering (e.g., Lenoble et al., 2007), require one simulation for each viewing geometry (again given one sun geometry).The latter is not necessarily a disadvantage for applications where a different atmospheric profile of optical properties is associated with every viewing geometry.When the problem to be solved requires a lot of tasks to be processed in parallel and frequent thread synchronisation, as is the case in many three-dimensional atmospheric models, it is the slowest tasks that matter.Therefore : , the tail of the runtime distribution matters as much, if not more, than the average or median runtime across atmospheric conditions and viewing geometries.In the specific case of aerosol radiance assimilation in NWP models, it may nevertheless be possible to leave out the conditions and geometries that take too long, either by eliminating them in advance or by stopping them on the fly during the first iteration if the corresponding tasks exceed a threshold time.In the end, there is likely to be some trade off to be accepted between accuracy and computational cost of the radiance calculations.
Keeping this introduction in mind, the objective of this study is to provide a benchmarking tool for evaluating shortwave radiances in the clear-sky aerosol-laden atmosphere under a wide range of aerosol conditions.The benchmarking is designed to assess both the accuracy and computational cost of radiative transfer models.Although data assimilation of aerosol radiances is the key motivation for this work, we report computational cost both for single and multiple geometries to make the scope more general.We provide all necessary input and output data for this benchmarking as Supplementary Material in NetCDF and ACSII format so that it becomes an available resource for model developers and potential model users.The protocol is described in the next section.It is followed by a test of the protocol in different RTM ::::: RTMs : and some conclusions.

Benchmark protocol
The detailed benchmark protocol is provided as Supplementary Material.Only the main features are summarised here.We con- of conditions because molecular scattering and the impact of polarisation are more important at the shorter wavelengths.

Vertical profiles
The molecular and aerosol profiles are described below.It should be noted that we assume the atmosphere to be plane-parallel, i.e. : , there is no correction applied for sphericity of the Earth's atmosphere.

Molecular profile
We use a midlatitude summer atmospheric profile (McClatchey et al., 1971) and provide the layer altitude (km), pressure (hPa), temperature (K), relative humidity (%), water vapour mixing ratio (ppmv), ozone mixing ratio (ppmv), humid and dry air density (g/cm 3 ) on 50 levels ranging from z = 0 to z max = 120 km.The molecular scattering and absorption optical depths are provided for the 49 corresponding layers from the surface to the top-of-atmosphere using the radiative transfer code GAME (Dubuisson et al., 2004(Dubuisson et al., , 2006)).In particular, the absorption optical depths have been calculated with GAME for the five selected wavelengths by averaging the gaseous absorption (for H 2 O and O 3 ) in each layer from the k-distribution parameterisation used in the GAME code.Note that gaseous absorption is weak at the selected wavelengths so this approximation remains valid.
The molecular depolarisation ratios are taken from Bodhaine et al. (1999), with values of 0.0288567, 0.0283241, 0.0279199, 0.0275716, and 0.0274591 for the five selected wavelengths.There is no obligation to use the 49 layers provided in the protocol.
Indeed : , it could be interesting to perform the RTM calculations on a reduced vertical grid to decrease the computational cost.
In case a different vertical grid is used, we ask that the vertically-integrated molecular scattering and absorption optical depths to be conserved.

Aerosol profile
We define the vertical profile of the aerosol extinction coefficient (σ aer ) as an exponential function with a height scale H of 2 km, normalized to unity when integrated between z = 0 and z = z max = 120 km, that is: The aerosol optical depth of a layer i, τ i , is computed as: where :::: with : ∆z i is the layer thickness, that is, the difference between the altitudes of the layer top (z i ) and the layer bottom (z i ).The average aerosol extinction coefficient for layer i then satisfies: We provide σ aer i and τ i for the 49 layers defined in our atmospheric profile, but these quantities can be easily recomputed for any vertical grid.These values are normalised to an AOD of 1, so they have to be multiplied by the actual AOD of each case.

Aerosol types
Four aerosol types have been defined.Aerosol size distributions are assumed to be log-normal with the parameters shown in Table 1.Refractive indices are taken from the literature (see Table 1) and the aerosols are assumed to be internally mixed, with a volume-weighting rule.The four aerosol types defined in this benchmark have been chosen to represent a variety of size and optical properties observed in the atmosphere.Two fine-mode aerosols are defined: industrial scattering (aer_is) is sulphate aerosol with a relative humidity of 50%, while industrial absorbing (aer_ia) is a mixture of the OPAC (Hess et al., 1998) aerosol type INSO (insoluble), SOOT (soot), : and WASO (water soluble, at 50% of relative humidity).The industrial absorbing aerosol is similar to an organic matter aerosol type.We also consider a monomodal dust aerosol and a bimodal sea-salt aerosol (aer_ss).

Aerosol optical properties
Aerosol optical properties were computed by the authors using their own Mie routines and results were cross-checked for consistency.The Mie routine used to generate the input data was rewritten in Fortran based on the implementation of Toon and Ackerman (1981) and was compiled with the Intel Fortran compiler in quadruple precision.The calculations are monochromatic (i.e., no integration was performed on the wavelength).The aerosol size distribution is integrated over the size range reported for each type in Table 1 with a Gaussian quadrature of 10,000 points in logarithm of the radius.The number of terms used to compute the sums of the a n and b n coefficients is approximated as a function of the Mie parameter (x = 2 π r/λ) following Wiscombe (1980).
For each aerosol type, we provide the aerosol optical depths at the 440, 670, 865 : , and 1024 nm wavelengths corresponding to the selected AOD at 550 nm.The aerosol :::: mass extinction coefficient (although not necessary to conduct the RTM calculations), single scattering albedo, : and asymmetry parameter are also given.
The phase matrix is computed and provided at 50,000 points evenly spaced in scattering angle from 0 to 180 • .As the particles are assumed to be spherical, only the S 11 , S 12 , S 33 , : and S 34 elements of the phase matrix are non-zero and provided.
We also decompose the phase matrix in generalised spherical functions (de Rooij and van der Stap, 1984;de Haan et al., 1987) as needed for the vector version of the discrete ordinate method.For this, the phase matrix is integrated with a 8,000 points Gaussian quadrature over the cosine of the scattering angle and we provide the first 750 moments of that decomposition for the six commonly-used elements (α 1 , α 2 , α 3 , α 4 , β 1 , β 2 ).For completeness : , we also provide the corresponding Legendre coefficients (by definition the l th Legendre coefficient equals the l th Legendre moment multiplied by 2l + 1).Thus, : we provide all necessary information on aerosol optical properties for usual RTM ::::: RTMs so that no further processing of the aerosol optical properties should be required.Of course, the RTM ::::: RTMs that neglect polarisation should only consider the S 11 term of the phase matrix or the α 1 Legendre moments.
The accurate reconstruction of the original phase function from the Legendre decomposition (S 11 ) cannot be ensured if a too low number of Legendre moments is used to approximate it.As for the vertical resolution, there is no obligation for this benchmarking exercise to represent the exact phase function and it may be interesting to approximate it to strike a balance between accuracy and computational cost ::: (see ::: for :::::::: example ::::::::::::::: Ding et al. ( 2009)).The minimum number of Legendre moments for which the approximated phase function is positive depends on the aerosol type and the wavelength (basically : , on how large the forward peak is).For the industrial scattering and industrial absorbing aerosols, a minimum of 4 to 8 moments is required (depending on the wavelength).For dust aerosol, the minimum is between 14 moments (at 1024 nm) and 40 moments (at 470 nm), while for sea salt, the minimum is between 144 moments (at 1024 nm) and 348 moments (at 470 nm).
Figure 1 shows the phase function and the linear depolarisation ratio for the four aerosol types considered in this benchmark.

Surface reflectance
Two surface reflectance models are selected, a Lambertian and an oceanic bidirectional reflectance distribution function (BRDF) model.For the Lambertian model, three surface albedos were chosen, namely 0, 0.05, : and 0.1.The spectral dependence of the surface reflectance is ignored, i.e., the albedo is assumed the same for the five wavelengths considered.The second surface reflectance model is the oceanic glitter model from Mishchenko and Travis (1997).This model is suited for both full vector radiative transfer models (i.e., including polarisation) and for the scalar approximations.
Wind speed is set to 10 m s −1 but the Mishchenko and Travis (1997) code assumes an isotropic Gaussian distribution of (oceanic) surface slopes, so the wind orientation is not taken into account.The salinity is set to 34.3‰ and water refractive indices are interpolated, for each wavelength, from the data provided in Hale and Querry (1973) as inputs to the Mishchenko and Travis (1997) routine (available at https://www.giss.nasa.gov/staff/mmishchenko/brf/).The effect of shadowing is also taken into account.The convention for the azimuthal angle used in the Mishchenko and Travis (1997) routine is the opposite of the convention that we use below to define geometries, so the coupling between the atmospheric RTM and the oceanic code has to be done carefully.

Geometries and configurations
We have selected 8 cases for aerosol loadings in the atmosphere, with AOD at 550 nm ranging between 0 (no aerosol) to 2 (high aerosol loading).These values refer to the 550 nm wavelength and the corresponding AODs can be estimated for the other wavelengths depending on the aerosol types.A table with the equivalences in AOD, for each aerosol type and wavelength is also provided in the benchmark protocol.
We have further selected four solar and four viewing zenith angles, θ s and θ v , from 0 • (zenith) up to 60 • .Given the provided surface reflectance models , and the assumption of spherical (hence randomly-oriented) aerosols in the atmosphere, the radiance at the top of the atmosphere is only dependent on the difference between the incident and reflected azimuthal angle , and not on each value taken independently.We have chosen five azimuthal angles between φ r = 0 • (backscatter when solar and viewing zenith angles are equal) and φ r = 180 • (forward scattering when solar and viewing zenith angles are equal).The different cases are summarised in Table 2. Removing the redundancies in aerosol types for AOD = : = : 0 and azimuthal angle for this results in 11,020 cases for the BRDF surface and 33,060 cases for the Lambertian surfaces (see Table 3), which gives a grand total of 44,080 cases.

Measured variables
We aim to provide a standardised dataset to evaluate the possibility of assimilating satellite radiances in operational forecasts of atmospheric composition.Thus, two variables are important to estimate for each case: the simulated reflectance (or radiance) that would be observed by the satellite , and a "sound" estimate of the radiative transfer code runtime.
We will describe the accuracy of the models in terms of the reflectance, ρ I :: ρ I , of the intensity (i.e., the first Stoke component), which is related to the radiance, I, through the relationship: where I s is the incident solar flux at the top of the atmosphere (conveniently taken equal to one for reflectance calculations), and θ s is the solar zenith angle.
The runtime may be difficult to estimate and compare between different models.We aim to measure the runtime of the computations excluding input/output operations, as typical applications would embed the RTM code in a larger operational system.The runtime of the RTM should be measured in a single-thread configuration.Ideally, the simulations should be performed on the same core of the computer and with the same system load conditions (except for the RTM) as other threads on the operating system (in particular with a high input/output load) could hamper the correct timing of the model.Multiple repetitions of the computations can be performed to get better estimates.
Given a set of atmospheric and aerosol conditions, some models can compute, almost without extra cost, the radiances for several viewing geometries, while other models require multiple simulations to achieve the same.For this protocol, this would introduce a difference of a factor of circa 80 (the number of viewing geometries) in the reported runtimes.Depending on the application, this feature of some models may or may not be relevant.For this reason, we present here the runtimes of the tested models in two configurations: one taking advantage of the simultaneous output for multiple viewing geometries, and another one without this feature.In the second configuration (called ind in Section 3), each simulation outputs only one geometry while in the first configuration (called mult in Section 3), it outputs as many values as possible within the specifications of the protocol.The number of multiple outputs per simulation will be detailed in the next Section.

DISORT
The DISORT model (Stamnes et al., 1988) is a flexible and widely used RTM.It solves the scalar approximation of the radiative transfer equation in a plane parallel atmosphere by using the discrete ordinate method.In this study, we use the DISORT code, version 4, available from http://lllab.phy.stevens.edu/disort/, and we have coupled the oceanic BRDF following the code developer instructions.We use the model with 32 streams and the TMS correction (Nakajima and Tanaka, 1988) is activated.It should be noted that most of this code is written in single precision, but important routines (as the computation of the Gaussian nodes) are written in double precision.We use 32 streams in our main configuration but sensitivity results for 4, 8, 16, 32, 64, 96 : , and 128 streams are also presented below.The protocol vertical resolution with 49 layers is used.Similar to VLIDORT, the DISORT code has numerical instabilities which manifest themselves when the solar and viewing zenith angles (θ s and θ v ) are equal.In such cases, we perturb the cosine of the viewing zenith angle by 8 × 10 −5 .
Given a solar zenith angle, the DISORT model can provide outputs for multiple viewing geometries.In this study, the number of geometries for the mult timing configuration is the number of φ r cases multiplied by the number of θ v cases (i.e., 4×5 = 20).
Although we know the DISORT model to be too slow for online data assimilation of aerosol radiances, we find it useful to document here the trade-off between accuracy and computing time as a function of the number of streams used in the calculations.Figure 2 shows both the Mean Fractional Error and the average computing time for 4, 8, 16, 32, 64, 96 : , and 128 streams.The Mean Fractional Error is shown relative to our reference VLIDORT (vector) and is of the order of 1 or 1.5% for the Lambertian and oceanic BRDF surface cases, respectively, as soon as the number of streams exceeds 8.Most of that discrepancy is due to DISORT neglecting polarisation as the Mean Fractional Error against VLIDORT (scalar) is significantly smaller: <0.1% for the Lambertian surface case and a <1% for the surface BRDF case.
There is a big gain in accuracy when increasing the number of streams from 4 to 8 and little further gain in accuracy beyond 16 streams.A deterioration in accuracy is observed beyond 96 streams in the case of the BRDF surface, which may be due to the DISORT model being coded in single precision float.While the dependence of model accuracy on the number of streams is well known (Stamnes and Swanson, 1981), we find it useful to confront the loss of accuracy with the gain in computational cost.Computing time increases rapidly with the number of streams (faster than the cube of the number of streams beyond 16 streams).Using 16 streams might, therefore, be an interesting trade-off between accuracy and computational cost.

6SV2
The 6SV2 model is a radiative transfer model based on the successive order of scattering (SOS) method (Vermote et al., 1997).
It solves the radiative transfer equation by using the three elements approximation (3 × 3) of the Stokes vector.The accuracy of the code against previous benchmark and Monte Carlo codes has been reported to be better than 1% (Kotchenova et al., 2006;Kotchenova and Vermote, 2007).We have used version 2.1 of the model.Two configurations are available: a standard accuracy configuration that consists of 83 Gaussian points for the phase function, 30 vertical levels, 25 zenith angles and 181 azimuth angles, and a high accuracy configuration that consists of 121 Gaussian points for the phase function, 50 vertical levels, 75 zenith angles : , and 181 azimuth angles.We will present results only from the standard accuracy, which is 2 to 70 faster than the high accuracy configuration, but will comment briefly on the high accuracy configuration in Section 4.2.
Molecular scattering and absorption vertical profiles are embedded in the code.Both are internally computed according to the atmospheric profile as part of "core" SOS routines.It is therefore not appropriate to prescribe our own molecular scattering and absorption profiles, so we use the 6SV2 defaults instead.The difference in the total molecular scattering optical depth is less than 0.26% with respect to the files provided in this protocol.We have coupled the BRDF calculations to the 6SV2 model but it should be noted that the atmosphere-surface coupling of the oceanic BRDF model in 6SV2 does not use the full surface reflection matrix but only the first column of it (i.e., the column that relates reflected I with incoming I, Q : , and U ).
The 6SV2 model can only compute one viewing geometry in each model call (i.e., the ind configuration is exactly the same as the mult configuration).10

FLOTSAM
We also test in this study a preliminary version of the Forward-Lobe Two-Stream Radiance Model (FLOTSAM), which is being developed at the European Centre for Medium-Range Weather Forecasts (ECMWF).Unlike look-up-table-based solar RTM, FLOTSAM permits an arbitrary layering of different types of scatterers, including clouds, aerosols : , and molecules.The model has been designed to be fast enough to be used in iterative assimilation and retrieval schemes.It exploits the fact that particle phase functions typically contain a "narrow forward lobe" associated with scattering angles of less than a few degrees, and a "wide forward lobe" associated with scattering angles of the order of 15 • .This enables coupled ordinary differential equations to be written down for radiation in the quasi-direct solar beam (including the narrow forward lobe) and a wider beam of radiation propagating away from the Sun due to scattering by the wide forward lobe.Additionally, the upwelling and downwelling diffuse fluxes are computed using the two-stream approximation.FLOTSAM may, therefore, be thought of as the solar analogue of the infrared Two-Stream Source Function technique of Toon et al. (1989).To facilitate the use of FLOTSAM in variational assimilation and retrieval schemes, it has been coded in C++ using version 2.0 of the Adept library (Hogan, 2014(Hogan, , 2017)), which enables the Jacobian to be easily computed via Adept's fast automatic differentiation capability.
FLOTSAM has been incorporated into the Cloud, Aerosol and Precipitation from Multiple Instruments using a Variational Technique (CAPTIVATE) retrieval scheme, which will provide operational products from the forthcoming EarthCARE satellite (Illingworth et al., 2015).
FLOTSAM is designed to be used in iterative schemes in which multiple calculations are performed with different profiles of optical properties but for the same viewing geometry.This is implemented by allowing the user to perform the set-up calculations once for a particular solar zenith angle, viewing zenith angle and viewing azimuth angle, and then to re-use the information in subsequent radiance calculations for different optical profiles.We assess the computational cost for both the ind and the mult configurations, with the latter consisting of 4 × 4 × 5 × 8 = 640 cases involving 80 geometry set-up calculations and each separate geometry being used with 8 different optical depths.

Methods and statistics
The protocol has been tested for the 5 model versions described above: VLIDORT (vector), VLIDORT (scalar), DISORT, 6SV2, : and FLOTSAM.As explained in Section 2.6, the accuracy is quantified in terms of monochromatic reflectances at the top of the atmosphere and the accuracy is computed against the vector version of the VLIDORT model unless stated otherwise.
With as many as 44,080 cases considered in this study, we have to find original ways to visualize the results as explained below.Along with the average reflectance computed for each model, the following statistical measures have also been computed: where N is the number of experiments, m i is the reflectance of the tested model and r i is the reflectance of the reference model for case i.By definition, the Mean Fractional Bias can range between -2 ::: −2 and 2, with a best score of zero, while the Mean Fractional Errors can range between 0 and 2, with a best score of zero.Both values are multiplied by 100 with respect to the definitions above in the Tables presented below.We also state the percentages of cases where the relative error of the reflectances is above defined relative error thresholds.These diagnostics, which could be detailed for particular subsets of our 44,080 cases, can help to decide if and under which conditions a particular RTM could be useful, according to the required accuracy of the reflectance calculations.
Runtimes of the models are presented in an absolute measure (seconds), and no reference is needed.Finally, we will comment on possible trade-offs between accuracy and computing time.

Accuracy
Figure 3 shows histograms of the simulated reflectances of the VLIDORT (vector) model at the top of the atmosphere for all or various subsets of the 44,080 cases.These reflectances form the reference simulations for this study.The width of the histograms (or violins) in Figure 3 are proportional to the number of cases in each bin of the reflectance (y-axis) discretization.
The multiple panels (and multiple violins) are computed with all the simulations, but filtered by the value indicated on the x-axis.Therefore, they can be interpreted as an analogous to marginal density probability functions of the set of simulations.
Figure 3 shows a wide range of reflectances (from near 0 to almost 0.5) and known dependencies of the reflectance with respect to some of the input variables such as the surface albedo (Lambertian case), the AOD : , or the wavelength in this range of the spectrum.
Larger relative errors are shown for the dark surface case (Lambertian surface albedo equal to zero have MFE ≈ 1.2%) because the reference reflectances are smaller than for their 0.05 and 0.1 Lambertian counterparts (with MFE ≈ 0.7% and MFE ≈ 0.5%, respectively).The error statistics of the VLIDORT (scalar) and DISORT models are very similar, with noticeable differences only for the BRDF case (Figure 4 and Table 3).The scalar approximation may thus be appropriate for some combinations of wavelengths, :::::::: aerosols :::: types : and surface types but not for others.We further note, that for both types of surfaces, the scalar VLIDORT simulations outperform DISORT simulations when they are compared against the vector version of VLIDORT (which is not surprising), but in both cases the overall Mean Fractional Error is larger than 0.7 % and the Mean Fractional Bias is negative.
Figure 5 and Table 4 show the relative errors of the other two models used in this study, 6SV2 and FLOTSAM, against VLIDORT (vector).As the FLOTSAM model does not include polarisation, it is not expected to outperform the scalar approximations exemplified in the DISORT and VLIDORT (scalar) comparisons.However, this is the case for the 470 nm cases, as can be seen from the corresponding violin plot of Figure 5.For 6SV2, the relative error shows a mode around zero, while for FLOTSAM, the shapes of the violin plots are more similar to those of the scalar approximations shown in Figure 4.The reason for the unexpected dependence of the 6SV2 model errors with the wavelength is not completely clear, but it could be related to the technical difficulties in prescribing user-specific molecular scattering and absorption profiles in the 6SV2 code (see Sections 2.1.1 and 3.3).In contrast to Figure 4, Figure 5 shows larger relative errors for sea salt aerosols, for both models.
While larger errors in Figure 4 were related to the polarisation/depolarisation effects from aerosol particles (as shown in the right column of Figure 1), errors of 6SV2 and FLOTSAM could be related to the more pronounced forward scattering peak of the sea salt aerosols . of ::: the ::: sea ::: salt ::::::: forward :::::::: scattering ::::: peak.There is no clear dependence of the relative errors with respect to the viewing and solar geometries but, as for Figure 4, larger errors are observed for larger AOD and for the surface BRDF case.
Synthetic error statistics are shown in Table 4.Both the 6SV2 and FLOTSAM models have comparable Mean Fractional Errors, but these are larger than those of DISORT and VLIDORT (scalar).Some difference concerning the type of surface can be appreciated with 6SV2 performing better in the case of Lambertian surfaces.It should be noted that using the high accuracy configuration of 6SV2 instead of the standard accuracy configuration reduces the Mean Fractional Error from 1.5 to 1.1% for the Lambertian surface case, but does not improve this score for the BRDF surface case.The Mean Fractional Bias is negative, and it is of similar magnitude for both models.However, 6SV2 presents larger biases over the two types of surfaces (and the sign of the bias differs for the BRDF and Lambertian surfaces).In contrast, FLOTSAM shows virtually no bias for the Lambertian surface cases and a smaller bias than 6SV2 for the BRDF cases.An important difference with the scalar models is the presence of a heavy tail in the relative error distribution, which can be inferred from the last rows of  3. Statistical measures of DISORT and scalar VLIDORT reflectances against vector VLIDORT (first row).Errors of the last rows are defined as the absolute value of the relative error, using the VLIDORT (vector) reflectance as the reference.VLIDORT is the configuration with 32 streams and the TMS correction activated.N refers to the number of cases.

Timing
In this Section, we present statistics of computing times for three of the models used in this study.All the computing times presented below were estimated on an Intel i5-4690, 3.5 GHz workstation running Linux (kernel 2.6.32),GNU C++ and Fortran compiler version 4.4.7, on a single processor and with exclusive use of the workstation.We have compiled the models using their default compilation flags, ie, "-O" for 6SV2; "-O3" and "-march=native" for FLOTSAM and "-O3" for DISORT.
We start this section with the 6SV2 model which can only be used in ind configuration.Figure 6 shows the violin plots of the computing time of the 6SV2 model that spans as much as two orders of magnitude.Clearly, the BRDF cases are more expensive than the Lambertian surface cases, with most computing times ranging between ca.0.7 and 4 s for the former.This appears to 10 be due to multiple calls to the BRDF routine.It may be possible to optimise this routine or recode it in a more efficient way, but we have not attempted to do so.We also note an increase in computing time with increasing solar zenith angle (θ s ) and AOD, and with decreasing wavelength.This is consistent with a larger number of scattering orders being necessary in such cases.
Cases for coarse mode aerosols (especially sea salt aerosols) are more expensive than cases for accumulation mode aerosols, with some calculations reaching 5 s for sea salt aerosols.We do not have an explanation as to why the cases for industrial absorbing aerosols are more expensive than those for industrial scattering aerosols.
As discussed above, the DISORT and FLOTSAM models offer efficiency savings when multiple profiles are computed together.In the case of DISORT, multiple viewing geometries can be computed for one solar zenith angle.In the case of FLOTSAM, a smaller saving is that the geometry set-up calculations may be computed once for a given solar/viewing geometry and then re-used for multiple atmospheric profiles with different optical properties.The runtime of these models is, in general, not sensitive to the viewing geometry, wavelength, aerosol type or AOD.However, there are some differences between the two types of surface reflectance.Table 5 shows a summary of the measured runtimes for the three tested models.Computing times are indicated for the two measurement procedures (ind and mult) and for the two types of surface reflectance (BRDF and Lambertian).In the case of FLOTSAM, we also report the "one-off set-up" cost, which in the case of the BRDF involves creating a four-dimensional look-up table.We stress that this is invoked only once per call to the executable, so in an operational data assimilation system performing many millions of radiance calculations it would increase the executable runtime by only a very small fraction.The one-off set-up costs are not shown for the 6SV2 and DISORT models as they are not isolated from the rest of the computations.
For the 6SV2 model, values of Table 5 are in concordance with Figure 6 with runtimes of the order of seconds.The DISORT model average runtime is smaller than that of 6SV2.Similarly to 6SV2, there is an important dependence of the DISORT runtime on the type of surface considered.The extra cost of computing 20 profiles, instead of one profile, can be estimated as the difference between the mult and ind cases.For DISORT, this additional cost is about 0.07 s for the Lambertian case and about 0.22 s for the BRDF case, which correspond to 25% and 18% of the total mult runtime.This would benefit an application that requires simultaneous calculation of many viewing geometries.The spread of DISORT runtimes indicated in Table 5 arises because DISORT efficiently decreases the number of internal computations for symmetric geometries (solar zenith angle equal to zero) :::: zero, ::::: 21% :: of ::: the ::::: cases) :::: and :: it :: is ::::: faster ::: for ::::::: AOD=0 ::::: (3.4% ::::::: percent :: of ::: the :::::: cases).For the remaining cases of Table 5 : , there is almost no spread in the DISORT runtime.
Table 5 shows the runtimes of the FLOTSAM model.Firstly, it is noted that the standard deviation of the reported values is small (around one order of magnitude less than the values).This is because FLOTSAM performs the same number of operations for every case, thus : , there is no added value of showing histograms of runtimes.For this model, it can be seen that the one-off set-up cost of the BRDF is a little under 3 s, but in the ind case, subsequent radiance calculations are 3-4 orders of magnitude faster than the other two models tested, and the BRDF calculation is only slightly slower.If many profiles have to be computed with the same observation geometry, then : , the geometry set-up cost is invoked only once and the average cost per profile is reduced.This is illustrated in the mult case in which 640 profiles are computed, which involves 80 different geometries.Excluding the model set-up time, the computation of a single profile takes around 0.3 to 0.4 ms in the ind case, and around (99 ms)/640 = 0.16 ms in the mult case.  5. Average runtimes in seconds.The standard deviation is shown in parenthesis.The mult runtimes account for 640 profiles for FLOTSAM, 20 profiles one profile per model call for 6SV2.The one-off set-up costs of FLOTSAM are invoked only once per instance of the executable, no matter how many radiances are subsequently calculated, so are the same for the ind and mult experiments.

Summary and conclusions
We have presented in this study a comprehensive benchmarking methodology and protocol, with an unprecedented number of cases combining different viewing geometries, aerosol optical depths, aerosol types, wavelengths : , and surface types.The VLIDORT model in its vector configuration (i.e., considering polarisation) is used :: as : the reference model.Preliminary results in terms of accuracy and computing time are presented here for three models: the well-known models DISORT and 6SV2 on one side, and FLOTSAM, a new fast model under development by two of the authors of this study.We have chosen the VLIDORT model (in vector configuration) as the reference for this study.All models perform better when using Lambertian surface reflectance than when using an oceanic BRDF.All the models perform better under low AOD, and the scalar models have lower accuracies at shorter wavelengths.For the Lambertian surface cases, the Mean Fractional Errors of the models are 0.8, 1.5 : , and 1.9% for DISORT, 6SV2, : and FLOTSAM, respectively.The BRDF cases show a larger Mean Fractional Errors with 1.5, 6.6 : , and 4% for DISORT, 6SV2, : and FLOTSAM, respectively.
The DISORT and 6SV2 models show comparable computing time, between tenths of a second and seconds, but DISORT can provide the solution at multiple viewing geometries for each atmospheric condition, while 6SV2 cannot.FLOTSAM is very fast, with computing times of much less than a millisecond per profile.However : , it should be noted that this model is still under development and its accuracy could improve further.Moreover, all models present a tail of cases with larger errors and/or computing times.It could be interesting to characterise these better to investigate if screening out those cases can be an option in data assimilation applications.Also, a fast tangent linear and adjoint computation is required for the integration of a :: an RTM in a variational data assimilation system.
The protocol, input and output data are available as a potential resource for interested developers and users of RTM ::::: RTMs willing to benchmark their models.Please note the ±5% error range on the y-axis.The number of cases outside the range is indicated in Table 3.The width of the classes is ∆y = 4.98 × 10 −4 .Note that the scale used on the x-axis is different for each violin plot but is the same for the two models within a violin plot.

Figure 2 .
Figure 2. Accuracy (in black) and computing times (in red) for the DISORT model as a function of the number of streams used.The Lambertian and oceanic BRDF surface cases are shown with solid and dashed lines, respectively.The accuracy is shown in terms of Mean Fractional Error against VLIDORT (vector).The computing times are an average for 20 geometries and were estimated on a processor AMD Opteron 6378, 2.4 : GHz.Please note the logarithmic scales for the number of streams used in the DISORT calculations (x-axis) and the timings (right vertical axis).

Figure 4 .
Figure 4. Same as Fig. 3 : , but for the relative error of the DISORT and VLIDORT (scalar) models against the vector version of VLIDORT.

Figure 5 .
Figure 5. Same as Fig. 3, : but for the relative error of the FLOTSAM and 6SV2 models against the vector version of VLIDORT.Please note the ±10% relative error range of the y-axis.The width of the classes is ∆y = 9.95 × 10 −4 .The number of cases outside the range is indicated in Table4.Note that the scale used on the x-axis is different for each violin plot but is the same for the two models within a violin plot.

Figure A1 .
Figure A1.Two-dimensional density plots of the simulated reflectances for the 6SV2, FLOTSAM, DISORT, and VLIDORT (scalar) models versus those of the VLIDORT (vector) model.The plots on the left column mix all 44,080 cases, while those in the middle and right columns are for the Lambertian and BRDF surface cases, respectively.

Table 1 .
Aerosol Woodward (2001)998)l properties for the different aerosol types considered in the benchmark.σgisthegeometricstandard deviation.The refractive indices are linearly interpolated from the OPAC database(Hess et al., 1998).Dust refractive index are linearly interpolated fromWoodward (2001).

Table 4 .
Same as Table3but for the 6SV2 and FLOTSAM models.