Mitigating the impact of atmospheric effects on optical remote sensing data is critical for monitoring intrinsic land processes and developing Analysis Ready Data (ARD). This work develops an atmospheric correction approach, called Sensor Invariant Atmospheric Correction (SIAC), for the NERC NCEO medium resolution ARD Landsat 8 (L8) and Sentinel 2 (S2) products. The contribution of the work is to phrase and solve the atmospheric correction problem within a probabilistic (Bayesian) framework for the medium resolution multispectral sensors S2/MSI and L8/OLI and to provide per-pixel uncertainty estimates traceable from assumed top-of-atmosphere (TOA) measurement uncertainty, making progress towards an important aspect of CEOS ARD target requirements.

A set of observational and a priori constraints are developed in SIAC to constrain an estimate of coarse resolution (500 m) aerosol optical thickness (AOT) and total column water vapour (TCWV), along with associated uncertainty. This is then used to estimate the medium resolution (10–60 m) surface reflectance and uncertainty, given an assumed uncertainty of 5 % in TOA reflectance. The coarse resolution a priori constraints used are the MODIS MCD43 BRDF/Albedo product, giving a constraint on 500 m surface reflectance, and the Copernicus Atmosphere Monitoring Service (CAMS) operational forecasts of AOT and TCWV, providing estimates of atmospheric state at core 40 km spatial resolution, with an associated 500 m resolution spatial correlation model. The mapping in spatial scale between medium resolution observations and the coarser resolution constraints is achieved using a calibrated effective point spread function for MCD43. Efficient approximations (emulators) to the outputs of the 6S atmospheric radiative transfer code are used to estimate the state parameters in the atmospheric correction stage.

SIAC is demonstrated for a set of global S2 and L8 images covering AERONET and RadCalNet sites. AOT retrievals show a very high correlation to AERONET estimates (correlation coefficient around 0.86, RMSE of 0.07 for both sensors), although with a small bias in AOT. TCWV is accurately retrieved from both sensors (correlation coefficient over 0.96, RMSE

Land surface monitoring at optical wavelengths from medium resolution Earth observation (EO) requires an accurate and consistent description of the bottom-of-atmosphere (BOA) spectral bidirectional reflectance function (BRF)

Threshold uncertainty specifications for aerosol optical thickness (AOT), total column of water vapour (TCWV), and BOA BRF (

An important capability highlighted in target requirements is that per-pixel uncertainty estimates should be supplied

In this paper, we describe the approach used for the UK NERC NCEO BOA BRF (

Main symbols used in the paper.

We wish to estimate the probability distribution function (PDF) of BOA spectral BRF

Schematic diagram of the SIAC processing chain.

Our approach uses a priori constraints in the form of a coarse resolution (500 m) spectral BRDF dataset from MODIS

Simulation of TOA reflectance

Simulation of BOA reflectance

Development of atmospheric composition prior estimates of AOT and TCWV in

MAP estimate of the atmospheric parameters

Application of

We can express the observational negative log likelihood as

The main data controlling the estimation of

MODIS, S2, and L8 bands used in SIAC for the atmospheric parameters retrieval.

From the top to bottom are the MODIS, L8, and S2 relative spectral response functions for each band, and the background is the atmospheric transmittance processed by 6S with US62 atmosphere profile and continental aerosol model with an AOT value of 0.2 at 550 nm.

We need TOA observational constraints

We need an estimate of TOA reflectance

Direct calculation of TOA reflectance

In SIAC, we avoid the sampling limitations of DDV and take advantage of these other ideas of providing a dynamic and globally applicable expectation of surface reflectance. We use the MODIS MCD43A1 BRDF/albedo (collection 6) product

We can express the negative log of the prior pdf (up to a proportional constant) as

The role of

We expect the AOT and TCWV fields to have long correlation lengths

The TOA reflectance observation, modelled TOA reflectance, and prior information on the atmospheric parameters are processed to

We obtain the MAP estimate of

We assume a Lambertian surface in the atmospheric correction process. The relative error in surface reflectance caused by this Lambertian assumption is 3 %–12 % in the visible bands and 0.7 %–5.0 % in the near-infrared bands. Its effect on NDVI is around 1 %, and it is less than 1 % for albedo
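As an illustration of what the Lambertian assumption implies, the sketch below uses the standard Lambertian coupling between surface and TOA reflectance found in 6S-type radiative transfer codes, rho_toa = rho_path + T rho_s / (1 - S rho_s), with gaseous absorption assumed to be folded into the terms. The function names and the numerical values of path reflectance, total transmittance, and spherical albedo are illustrative assumptions and are not taken from SIAC.

```python
def toa_from_boa(rho_s, rho_path, trans, sph_alb):
    """Forward Lambertian model: TOA reflectance from surface reflectance."""
    return rho_path + trans * rho_s / (1.0 - sph_alb * rho_s)


def boa_from_toa(rho_toa, rho_path, trans, sph_alb):
    """Invert the Lambertian model for surface reflectance."""
    # Remove the path reflectance and the two-way transmittance
    y = (rho_toa - rho_path) / trans
    # Account for multiple surface-atmosphere bounces (spherical albedo)
    return y / (1.0 + sph_alb * y)


# Assumed atmospheric terms for a single band (hypothetical values)
rho_path, trans, sph_alb = 0.05, 0.80, 0.10
rho_toa = toa_from_boa(0.20, rho_path, trans, sph_alb)
rho_boa = boa_from_toa(rho_toa, rho_path, trans, sph_alb)
```

Under these assumptions, the inversion recovers the surface reflectance used in the forward call, which is a useful consistency check when swapping in terms from an emulator.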

The mapping from

In SIAC, the atmospheric composition at 500 m is inferred by combining three sources of constraints:

an a priori constraint on land surface reflectance (at 500 m) derived from the MODIS MCD43 product

an a priori constraint on atmospheric composition (AOT and TCWV) derived from CAMS near-real-time predictions

an expectation of spatial smoothness (correlation) in atmospheric composition parameters at the 500 m scale.
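The way these three constraints combine can be illustrated with a linear-Gaussian toy problem. The sketch below is not the SIAC solver, which uses nonlinear emulators of 6S as the observation operator. It assumes a 1-D AOT field in which the surface reflectance constraint has been collapsed into direct noisy observations of AOT at some pixels, and the uncertainties `sigma_obs` and `sigma_prior` and the smoothness weight `gamma` are hypothetical. It shows that each constraint contributes an inverse-covariance term and that the posterior uncertainty follows from the inverse of their sum.

```python
import numpy as np


def map_estimate(y, obs_mask, x_prior, sigma_obs, sigma_prior, gamma):
    """MAP estimate and posterior standard deviation for a linear-Gaussian toy."""
    n = x_prior.size
    # First-difference operator: (D x)[i] = x[i + 1] - x[i]
    D = np.diff(np.eye(n), axis=0)
    # Inverse-covariance contribution of each constraint
    A_obs = np.diag(obs_mask.astype(float)) / sigma_obs**2
    A_prior = np.eye(n) / sigma_prior**2
    A_smooth = gamma**2 * D.T @ D
    hessian = A_obs + A_prior + A_smooth
    rhs = A_obs @ np.where(obs_mask, y, 0.0) + A_prior @ x_prior
    # Posterior covariance is the inverse Hessian of the quadratic cost
    cov_post = np.linalg.inv(hessian)
    return cov_post @ rhs, np.sqrt(np.diag(cov_post))


# Hypothetical example: prior AOT of 0.2, observations of 0.3 at both ends
x_prior = np.full(6, 0.20)
y = np.array([0.30, 0.0, 0.0, 0.0, 0.0, 0.30])
obs_mask = np.array([True, False, False, False, False, True])
x_map, sigma_post = map_estimate(y, obs_mask, x_prior, 0.05, 0.10, 20.0)
```

In this toy, the smoothness term spreads information from observed pixels into unobserved ones, so the posterior uncertainty falls below the prior uncertainty even where no observation is available.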

Datasets used in SIAC.

Globally distributed AERONET sites (dot markers) used in this study for the validation of retrieved atmospheric parameters.

We validate using SIAC-derived atmospheric composition to estimate surface reflectance over globally representative sites for the years 2017–2019. We use S2 and L8 granules over more than 400 AERONET sites, seen in Fig.

The AERONET (AErosol RObotic NETwork) (

The Working Group on Calibration and Validation (WGCV) of the Committee on Earth Observation Satellites (CEOS) has been providing ground surface reflectance data through the Radiometric Calibration Network portal RadCalNet (

Here, we compare the SIAC-corrected data with measurements from three RadCalNet sites: the ESA/CNES site in Gobabeb (Namibia), the CNES site in La Crau (France), and the University of Arizona's site at Railroad Playa Valley (Nevada, United States), as these three sites measure over the entire solar reflective spectrum. Railroad Playa Valley is a high-desert playa surrounded by mountains to the east and west; La Crau has a thin, pebbly soil with sparse vegetation cover; and Gobabeb is over gravel plains. The area of interest (AOI) of the radiometric measurements for the sites is taken to be

Sentinel 2A (S2A) and Sentinel 2B (S2B) were launched on 23 June 2015 and 7 March 2017, respectively. A single satellite revisits the Equator every 10 d, while a constellation of two satellites achieves an equatorial revisit time of 5 d, decreasing to 2–3 d at mid-latitudes. Each S2 has a 10, 20, and 60 m spatial resolution Multi-Spectral Instrument (MSI), with 13 spectral bands ranging from 443 to 2 190 nm. Identical to S2A and S2B, Sentinel 2C (S2C) is expected to be launched at the beginning of 2024, in which case S2A will be retired

The Landsat project has provided the longest temporal record of moderate resolution, multi-spectral data over the Earth's surface. Landsat 8 was launched on 11 February 2013 and has a global revisit time of 16 d, with an 8 d offset from Landsat 7 to give 8 d repeat coverage. Two push-broom sensors – the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS) – are mounted on the platform to provide multi-spectral and thermal observations of the Earth's surface at 30 and 100 m resolution, respectively. OLI has nine spectral bands, among which band 8 is panchromatic and has a spatial resolution of 15 m. At the time of writing, Landsat 9

Both products provide projected and calibrated TOA reflectance datasets. Sentinel 2 products were obtained from the Copernicus Open Access Hub (

We process all near-simultaneous (maximum 1 h apart) scenes and tiles from S2 and L8 over the years 2017 to 2019 over the AERONET sites illustrated in Fig.

We want to evaluate how well SIAC estimates mean BOA BRF and associated uncertainty over S2 and L8 wavebands. We can validate mean reflectance against measurements for some conditions using RadCalNet data. However, we can also gain confidence in the results by validating interim products (atmospheric parameters), testing uncertainty via the discrepancy principle

We define residuals between values estimated from SIAC and measurements as follows:

We define standardised residuals as follows:

We compare the 3 472 S2 and L8 samples over AERONET sites with independent in situ measurements of AOT and TCWV. Examples of the retrieved scene atmospheric parameters are given in Figs.

Over all AOT values (Fig.

AOT validation against AERONET measurements from CAMS

The accuracy (

TCWV (g cm

The accuracy (

Comparisons of CAMS and SIAC TCWV with AERONET measurements are shown in Figs.

The S2 and L8 scenes over AERONET sites described above were atmospherically corrected to surface reflectance using SIAC. Since the overpass times between sensors are within 1 h, we expect the surface reflectance in overlapping spectral regions from both sensors to be highly correlated and can use this to test consistency between sensors. Differences in spatial coverage, acquisition geometry, spectral sampling, and other sensor characteristics may impact this, but we will assume them to be small.

Pixels within a

2D histogram of surface reflectance after the atmospheric correction from 3 466 S2 and L8 near-simultaneous observations; each subplot shows the results for the closest S2 and L8 bands. The colour bar is shown using a logarithmic scale. The error bars are 3

The results are shown in Fig.

We need to verify that uncertainty values

Part of the a priori constraint in SIAC is the imposition of a degree of smoothness on the atmospheric parameters through the parameter

Standardised residual

Standardised residual

The results show that, for

The distributions of

We validate mean SIAC reflectance by comparison with ground measurements over RadCalNet sites. We compare mean S2 and L8 BOA reflectances from SIAC averaged over defined RadCalNet AOI boundaries with the RadCalNet estimates of BOA reflectance in Figs.

Comparison between the S2

Same as Fig.

Same as Fig.

The ratio of SIAC-corrected S2

The agreement between the SIAC-retrieved surface reflectance and the reference measurements is very strong for all sites, with RMSE values for the BOA products of around 3 %–5 % of RadCalNet ground measurement reflectance over all wavebands. The correlation coefficient

The best performance is found over Gobabeb, but the results are only slightly poorer for La Crau, which has more variation in the pattern of spectral reflectance. The broader spread of results for the BOA analysis for Railroad Playa Valley is mimicked in the TOA data. A per-band analysis of the ratio of SIAC BOA reflectance to measured ground data over each RadCalNet site is given in Fig.

Although the assessment of

SIAC surface reflectance uncertainty validation. The red histograms show the standardised error distributions from the data (Error distribution), and the blue curves show the Gaussian distributions fitted to them (Fitted Gaussian).

The histograms are quite noisy, particularly for L8, suggesting that the results might be impacted by a low sample number. For S2, the mean is very close to 0 for around half of the bands but can show a positive bias of up to 0.54 (B11). The standard deviation

The work of

For low AOT and longer wavelengths, the sensitivity to uncertainty in AOT is low, and the term

SIAC S2 uncertainty for different bands for low AOT

The turning point feature mainly arises from the AOT component of uncertainty in Eq. (

The black dashed line in the subplots shows the lower boundary of 5 % uncertainty that would be expected for a TOA uncertainty of 5 %. All values appear on or above this line, providing some confidence in the calculations within SIAC. For low AOT, the longer the wavelength, the closer the behaviour directly mimics the TOA relative uncertainty. This arises from the decreasing magnitude of

Current approaches to atmospheric correction of S2 and L8 data over land use readily available and well-tested atmospheric RT codes, such as 6S, considered adequate for the task at hand

SIAC follows these same steps and broad assumptions, but a major feature of the approach is its use of the reliable external operational data streams available nowadays in support of its estimations. It has other novel features, including accounting for PSF impacts in the scaling from L8 and S2 to MODIS. It is further distinguished by applying a Bayesian framework that is able to weigh up these contributions according to their uncertainty and directly estimate the resultant uncertainty in

Rather than the more limited extent available for DDV approaches, SIAC uses a direct expectation of surface reflectance from an external coarse resolution source (MCD43), meaning that the keystone observational constraints can supply information to the solution over a wide range of conditions. This means that the solution is, to some extent, reliant on the accuracy of these MODIS reflectance predictions, so we go to some lengths to filter out samples that may not be reliable predictors. In this first implementation of SIAC, we ignore snow and water pixels as well as some other conditions (Appendix

SIAC aims to be CARD4L compliant and to meet threshold requirements for uncertainty. Validation results show that the method gives accurate (within threshold specifications) retrievals of uncertainty-quantified land surface reflectance, both for S2 and L8, for the most part. Moreover, the surface reflectances for the two sensors are compatible, an important step in using these sensors together for land monitoring applications.

A data assimilation system relies on having well-quantified uncertainties on the constraints used. This is vital in a relative sense for balancing contributions from different sources but also in an absolute sense for quantifying uncertainty. Unfortunately, none of the constraints we use have a per-pixel estimate of uncertainty to drive the analyses, so instead (Appendix

The validation of atmospheric parameter uncertainty in Sect.

The results of comparison between SIAC BOA reflectance and RadCalNet measurements (Sect.

Most of the time, at least over the conditions represented by RadCalNet, we can expect SIAC to estimate surface reflectance

Surface reflectances produced by SIAC are of high accuracy and are consistent for S2 and L8, as shown in Sect.

In this paper, the atmospheric composition is set by a model (6S in this case) and by a choice of aerosol optical properties (continental aerosol model). The use of emulators of the RT model makes it easy to change the RT model entirely in the code or to use a different configuration of the model used. We can also extend the scheme to retrieve independent aerosol species concentrations by both modifying the RT model (and thus extending the number of parameters that go in the inference) and by using data on species distribution available from CAMS and extending the prior to cover these. A similar approach has been implemented in the MAJA processor

SIAC relies on the surface reflectance constraint from the MODIS MCD43 product, but the current MODIS satellites are approaching the end of their mission

We need an estimate of BOA BRF at

A data assimilation system combines evidence from different streams by weighting them by their inverse uncertainties. In SIAC, the statistics of the uncertainties are assumed to be zero-mean Gaussian and thus only characterised by an associated covariance matrix. We review here the sources and values of these uncertainties. The observational and a priori constraints for the estimation of

The observational uncertainty in Eq. (

Following

We take the CAMS estimates of atmospheric state at 40 km

The inverse covariance function given in Eq. (

The effect of applying a smoothness constraint in this way is similar to the combined prior and smoothness constraint used in EO-LDAS

From the results, for a cross-validation study (reported in Appendix

Under the assumption that the log posterior is Gaussian, the mean of the a posteriori state

Define an augmented state vector

The emphasis in this first version of SIAC is on mapping the land surface, so the L1C TOA S2 and L8 data need to have masks applied for areas of cloud, shadow, snow, and large water bodies. We describe the approach to this in this section.

Recall from Eq. (

For the cloud and cloud shadow mask, we trained a U-NET convolutional neural network (CNN) with TensorFlow following

We also need to consider that some estimates of

In this version of SIAC then, large water bodies are masked out using the ESA global 150 m water products

To avoid erroneous spatial features over large water bodies introduced by the excessive extrapolation from distant land pixels, a conservative estimate of atmospheric parameters over water bodies is used, this being the median value of the retrieval from the rest of the image. This means that SIAC retrievals over water bodies may not be as accurate as over the land surface.
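The conservative treatment of water bodies described above can be sketched as follows. This is a minimal illustration and not the SIAC implementation: the function name `fill_water_with_median`, the array layout, and the handling of non-finite values are assumptions.

```python
import numpy as np


def fill_water_with_median(param, water_mask):
    """Replace values over water with the median of the valid land retrievals."""
    filled = param.copy()
    land_values = param[~water_mask]
    # Ignore failed (non-finite) retrievals when computing the median
    land_values = land_values[np.isfinite(land_values)]
    if land_values.size:
        filled[water_mask] = np.median(land_values)
    return filled


# Hypothetical 2 x 2 AOT map with one water pixel carrying a spurious value
aot = np.array([[0.1, 0.2], [0.3, 9.9]])
water = np.array([[False, False], [False, True]])
aot_filled = fill_water_with_median(aot, water)
```

A single scene-wide value removes the risk of extrapolation artefacts at the cost of ignoring any real spatial variation over the water body, which is the trade-off noted in the text.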

In the retrieval process, if a MODIS pixel on grid

The previous masking removes most of the pixels likely to be unreliable for estimates of

This estimate is obtained by deriving a coarse per-pixel (on the

A coarse look-up table in AOT (AOT 0–2.5 in steps of 0.05) is used to compute

The smooth AOT estimate then forms part of a rough estimate of atmospheric state

Although we expect the multiple constraints used in SIAC to provide some degree of robustness to any biases in

Spectral correlation over most natural surfaces suggests that transformations between different spectral domains are possible. In

Example of spectra selection and comparison between the MCD43-simulated reflectance and mean spectra-simulated reflectance.

The dataset used to define these transformations is derived from merging multiple spectral libraries covering the target spectral range, re-sampled to 1 nm resolution. To emphasise the importance of vegetation and soils, simulated vegetation spectra using the PROSAIL model

List of spectral libraries used to define spectral transformations.

N/A: not applicable.

S2

Given that the MODIS land bands are designed to capture most of the land surface properties, spectra selected with MODIS bands should be able to predict other optical sensors' reflectances with similar bands. Although there is a large number of spectra in the spectra library introduced in Table

A set of the five closest spectra from the spectral library is used to compute a weighted mean using inverse distance weighting. Using several samples rather than one introduces robustness to errors in both the MODIS surface reflectance input and the spectral library. Other numbers of selected spectra were tested, but the tests show that spectra vastly different from the target spectrum are likely to be included if more than 10 spectra are used. Once the mean spectrum is calculated, it is then convolved with the target sensor relative spectral responses (RSRs) to obtain the simulated surface reflectance at

To test the effectiveness of the proposed spectral mapping method, the MODIS, S2, and L8 reflectances are simulated with individual spectra from the spectral library. Then the MODIS-simulated reflectance is used to get

The spectral mapping results for S2 and L8 show that, in most cases, our spectral mapping can simulate both sensors well, with high correlation (over 0.99 for all bands), low RMSE (lower than 0.03 for all bands and 0.015 for the first five bands), and no bias introduced. The SWIR band around 2 200 nm shows the largest dispersion, which is attributed to the large variation in reflectance in this spectral region and the large difference in the band pass functions between MODIS and S2 and L8, as shown in Fig.

The standard error estimated from MODIS reflectance provides a reasonably good estimation of the mean spectral estimated reflectance for S2 and L8, since a large discrepancy between simulations and observations implies that the input reflectance is not well represented in the spectral library.
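The nearest-spectra weighting described in this section can be sketched as below. This is a hedged illustration on synthetic data rather than the SIAC code: the Euclidean distance in MODIS band space, the small `eps` guard against division by zero, the boxcar RSRs, and the name `map_spectrum` are all assumptions.

```python
import numpy as np


def map_spectrum(modis_refl, lib_modis, lib_spectra, target_rsr, k=5, eps=1e-6):
    """Predict target-sensor band reflectance from MODIS band reflectance."""
    # Distance of every library entry to the input in MODIS band space
    dist = np.linalg.norm(lib_modis - modis_refl, axis=1)
    nearest = np.argsort(dist)[:k]
    # Inverse distance weights, normalised to sum to one
    weights = 1.0 / (dist[nearest] + eps)
    weights /= weights.sum()
    mean_spectrum = weights @ lib_spectra[nearest]
    # Band-average the mean spectrum with unit-sum RSRs
    rsr = target_rsr / target_rsr.sum(axis=1, keepdims=True)
    return rsr @ mean_spectrum


# Synthetic library: 50 spectra on 40 wavelengths (hypothetical values)
rng = np.random.default_rng(0)
lib_spectra = rng.uniform(0.0, 0.6, size=(50, 40))
modis_rsr = np.kron(np.eye(4), np.ones((1, 10)))   # 4 broad boxcar bands
target_rsr = np.kron(np.eye(8), np.ones((1, 5)))   # 8 narrower boxcar bands
lib_modis = lib_spectra @ (modis_rsr / modis_rsr.sum(axis=1, keepdims=True)).T
pred = map_spectrum(lib_modis[3], lib_modis, lib_spectra, target_rsr)
```

Because the weights form a convex combination, the predicted reflectance always stays within the range spanned by the selected library spectra, which limits the damage a single poorly matched spectrum can do.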

Results of the spectral mapping approach are shown in Fig.

Due to the large differences in spatial resolution between MODIS (500 m) and S2 and L8 (10, 20, and/or 30 m), their measured reflectance values cannot be directly compared. We model the effective PSF of the MODIS data and use it to convolve the high resolution data so that they are comparable with the MODIS products. Ideally, the MODIS PSF is triangular in the cross-track direction and rectangular in the along-track direction

A typical MODIS ePSF on the spatial resolution of S2, i.e. a unit of 1 represents 10 m on the

The spatial convolution is calculated in the frequency domain for efficiency.
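A frequency-domain convolution of this kind can be sketched as below, assuming an axis-aligned, unit-sum Gaussian kernel and periodic image boundaries. The function name and the kernel widths `sigma_x` and `sigma_y` (in pixels) are hypothetical, and any rotation or offset of the ePSF, as well as the subsequent resampling to the MODIS grid, are omitted.

```python
import numpy as np


def gaussian_psf_fft_convolve(image, sigma_x, sigma_y):
    """Convolve `image` with a unit-sum Gaussian using FFTs (periodic edges)."""
    ny, nx = image.shape
    # Pixel offsets in FFT order (0, 1, ..., -2, -1), so the kernel peak
    # sits at index (0, 0) and the output is not shifted
    y = np.fft.fftfreq(ny) * ny
    x = np.fft.fftfreq(nx) * nx
    yy, xx = np.meshgrid(y, x, indexing="ij")
    kernel = np.exp(-0.5 * ((xx / sigma_x) ** 2 + (yy / sigma_y) ** 2))
    kernel /= kernel.sum()
    # Multiplication in the frequency domain is circular convolution in space
    spectrum = np.fft.rfft2(image) * np.fft.rfft2(kernel)
    return np.fft.irfft2(spectrum, s=image.shape)


# Hypothetical point source blurred by the kernel
img = np.zeros((64, 48))
img[10, 20] = 1.0
blurred = gaussian_psf_fft_convolve(img, sigma_x=3.0, sigma_y=2.0)
```

For kernels that span many pixels, as is the case when a 500 m footprint is represented on a 10 m grid, the FFT route scales far better than a direct spatial convolution. In practice the image would need padding to avoid the wrap-around at the edges that periodic boundaries imply.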

An effective point spread function (ePSF) of the MODIS MCD43 BRDF product is simulated with a two-dimensional Gaussian function, with

Comparison between MCD43-simulated surface reflectance after spectral mapping

We show an example of the PSF modelling in Fig.

Per-band comparison between the MCD43-simulated surface reflectance

We have assumed that, for a given scene, a single Gaussian PSF is required, in line with the findings of

The correlation map between

The density scatter plots of solved PSF parameters,

After solving for the ePSF parameters over a large number of S2 and L8 scenes globally, we note that some simplifications in the processing are possible. First, we see that the cost function is very flat around the minimum. Figure

The points made above suggest that a fixed value of 260 m for

In this section, we illustrate cases of SIAC being used to infer atmospheric composition parameters. Due to S2 and L8 having bands outside of the strong O

The prior and posterior AOT over S2 50SMH on 10 February 2016 and their shared colour bar are in the first column. Figures in the second column are band 1 TOA and surface reflectance, while the TOA and BOA RGB images are shown in the third column for the same tile over the same time.

Example of retrieval on S2 data over

Example of retrieval on S2 data over

Some artefacts are also apparent. In the bottom right corner of the scene, the AOT map reverts to the prior value from CAMS, which results in a poorer correction of the atmospheric effects. This is caused by a lack of high quality MCD43 retrievals in this area at this time, which results in the AOT estimate being strongly driven by the prior from CAMS as well as by some spatially diffusive effects from areas where the algorithm performs well. A second artefact is the presence of stripes (visible in the middle top and bottom panels). These are caused by the combination of observations from different detectors

We note that the scene shown in Fig.

As a further illustration of the approach on Sentinel 2 data, we show similar visualisations of AOT and TCWV priors, the associated posteriors, TOA and BOA blue band reflectance, and TOA and BOA true-colour composites for a number of different sites spanning the globe in Fig.

To estimate atmospheric parameters, an estimate of surface reflectance is needed. This estimate is different from the actual surface reflectance and is likely to contain spatial artefacts, which will result in an unrealistic and noisy estimation of atmospheric parameters if an independent pixel-level retrieval strategy is used. To counter this, most practical approaches average the estimation of surface reflectance over a spatial window of fixed size. Within this window or block, atmospheric composition is assumed to be constant and inferred

Within SIAC, the broad-scale (40 km) variations of atmospheric parameters are estimated from CAMS. But there are often finer-scale features that may impact our interpretation of surface reflectance, and we wish to be able to resolve these. To this end, we assume an effective resolution of

We performed a cross validation study using 20 scenes in L8 and in S2, selected to cover a good dynamic range in AOT (0–2). We sampled over

There is mostly more variation in

We show the

Surface reflectance normalised error distribution as a function of TOA reflectance uncertainty. The red dot indicates the point by changing the TOA uncertainty to reach the optimal uncertainty (

We show the correction done by only using the prior values from CAMS predictions in Figs.

Comparison between the S2

Same as Fig.

Same as Fig.

Surface reflectance uncertainty comparison between CAMS prior corrections and SIAC corrections. The ratio is calculated as CAMS prior-corrected surface reflectance uncertainty divided by SIAC surface reflectance uncertainty for different values of TOA uncertainty. The red dot indicates the uncertainty ratio when a TOA reflectance uncertainty of 5 % is used, while the red square indicates the uncertainty ratio when a TOA reflectance uncertainty of 3 % is used.

The code for SIAC is written in Python and is released under the GPLv3 open source licence. The code is available through Zenodo from

The input data are publicly available through references listed in Table

FY and PEL conceptualised the study. FY developed the code, prepared the datasets, and analysed the results. FY and PEL wrote the manuscript. JLGD suggested experiments and contributed to the drafting of the manuscript. All authors contributed to editing the paper.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors would like to acknowledge financial support from the European Union Horizon 2020 research and innovation programme under grant agreement no. 687320 MULTIPLY (MULTIscale SENTINEL land surface information retrieval Platform) and from the European Space Agency (ESA) under contract no. 4000112388/14/I-NB SEOM SY-4Sci Synergy. Feng Yin, Philip E. Lewis, and Jose L. Gómez-Dans were supported by the Natural Environment Research Council's (NERC) National Centre for Earth Observation (NCEO) (project no. 525861). Feng Yin and Philip E. Lewis were supported by Science and Technology Facilities Council (STFC) of UK-Newton Agritech Programme (project no. 533651).

This research has been supported by the European Commission, Horizon 2020 (MULTIPLY (grant no. 687320)), the European Space Agency (grant no. 4000112388/14/I-NB SEOM SY-4Sci Synergy), the National Centre for Earth Observation (grant no. 525861), and the Science and Technology Facilities Council (STFC) of UK-Newton Agritech Programme (project no. 533651).

This paper was edited by Le Yu and reviewed by three anonymous referees.