Analysis of the MODIS above-cloud aerosol retrieval algorithm using MCARS

Abstract. The Multi-sensor Cloud and Aerosol Retrieval Simulator (MCARS) presently produces synthetic radiance data from Goddard Earth Observing System version 5 (GEOS-5) model output as if the Moderate Resolution Imaging Spectroradiometer (MODIS) were viewing a combination of an atmospheric column (including clouds, aerosols, and a variety of gases) and a land-ocean surface at a specific location. In this paper we use MCARS to study the MODIS Above-Cloud AEROsol retrieval algorithm (MOD06ACAERO). MOD06ACAERO is presently a regional research algorithm able to retrieve aerosol optical thickness over clouds, in particular absorbing biomass-burning aerosols overlying marine boundary layer clouds in the southeastern Atlantic Ocean. The algorithm's ability to provide aerosol information in cloudy conditions makes it a valuable source of information for modeling and climate studies in an area where current clear-sky-only operational MODIS aerosol retrievals effectively have a data gap between the months of June and October. We use MCARS for a verification and closure study of the MOD06ACAERO algorithm. The purpose of this study is to develop a set of constraints a model developer might use during assimilation of MOD06ACAERO data.
Our simulations indicate that the MOD06ACAERO algorithm performs well for marine boundary layer clouds in the SE Atlantic provided some specific screening rules are observed. For the present study, five simulated MODIS data granules were combined into a dataset of 13.5 million samples with known input conditions. When pixel retrieval uncertainty was less than 30 %, optical thickness of the underlying cloud layer was greater than 4, and the scattering angle range within the cloud bow was excluded, MOD06ACAERO retrievals agreed with the underlying ground truth (GEOS-5 cloud and aerosol profiles used to generate the synthetic radiances) with a slope of 0.913, offset of 0.06, and RMSE = 0.107. When only near-nadir pixels were considered (view zenith angle within ±20°) the agreement with source data further improved (0.977, 0.051, and 0.096 respectively). Algorithm closure was examined using a single case out of the five used for verification. For closure, the MOD06ACAERO code was modified to use GEOS-5 temperature and moisture profiles as ancillary data. Agreement of MOD06ACAERO retrievals with source data for the closure study had a slope of 0.996 with an offset of −0.007 and RMSE of 0.097 at a pixel uncertainty level of less than 40 %, illustrating the benefits of high-quality ancillary atmospheric data for such retrievals.

Introduction
The MODerate resolution Imaging Spectroradiometer (MODIS) (Barnes et al., 1998) has proven to be an important sensor for aerosol data assimilation purposes for models such as the Goddard Earth Observing System Model, Version 5 (GEOS-5; Rienecker et al., 2008; Molod et al., 2012). There are two MODIS instruments on board NASA's Earth Observing System (EOS) Terra and Aqua spacecraft. There is a wide variety of data products available from these instruments for land, ocean, and atmosphere disciplines. Atmosphere discipline products include cloud mask, cloud top properties, cloud optical and microphysical properties, and atmospheric aerosol properties. The MODIS data product files use a designation of MOD for Terra MODIS and MYD for Aqua MODIS. In this paper for brevity we will use "MOD" to refer to both instruments.
The largest contributor of biomass-burning aerosols is southern Africa (Reid et al., 2009; van der Werf et al., 2010; Chang et al., 2021). Biomass burning occurring from June through October creates thick smoke plumes that extend over the adjacent Atlantic Ocean. Prevailing winds in the area transport the smoke over the southeastern Atlantic Ocean (SEAO) and then as far as the Americas (Swap et al., 1996). The same time period coincides with a near-persistent layer of marine boundary-layer (MBL) stratus cloud that extends for several hundred kilometers westward from the Namibian coast (Devasthale and Thomas, 2011). The MODIS Dark Target aerosol retrieval algorithm (MOD04) that is used for ocean retrievals operates in clear-sky conditions only. MOD04_DT retrievals are not provided at the individual MODIS pixel level but rather are performed over a 3 × 3 or 10 × 10 set of pixels. Moreover, aerosol properties are not retrieved over sun glint regions (Kaufman et al., 1997; Levy et al., 2009, 2013). The SEAO region has both extensive seasonal cloud cover and a significant portion of MODIS granules containing sun glint, leading to an equally extensive loss of continuous observations from the area. Figure 1 illustrates these conditions using Terra MODIS data from 2006 through 2013. Figure 1a shows the percentage of ocean grid boxes in the SEAO area that had daily mean cloud fraction greater than 50 % in the MODIS Daily Level-3 gridded product (Hubanks et al., 2019) stored at 1° × 1° resolution. Here, the SEAO area is defined the same way as in Meyer et al. (2015), specifically between −20 and +20° longitude and +4 to −20° latitude. As much as 60 % of all ocean grid boxes have cloud fraction greater than 50 % in June (day 152), and this percentage only increases toward the end of September (day 304). A 1°-resolution grid box will contain some clear sky, and thus at least some aerosol retrievals are possible. As shown in Fig. 1b, in June between 70 % and 80 % of all ocean grid boxes contain some aerosol retrievals, though by September that number drops to between 30 % and 50 % year over year.
Due to the aforementioned limitations of the standard Dark Target MODIS aerosol algorithm, a model that assimilates aerosol data from SEAO would have very few aerosol retrievals over the ocean available to it. Most of the transport in the model would thus be governed by the model's physical processes (e.g., advection, sedimentation, wet removal, and vertical transport) instead of being constrained by observations. The MOD06ACAERO algorithm (Meyer et al., 2015) fills in the aerosol data gap in SEAO as it is able to perform retrievals of aerosol properties above MBL clouds. The algorithm has been evaluated against observations from the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) (Winker et al., 2009), but CALIPSO only provides data at nadir and with very limited spatial coverage. Recent improvements in CALIPSO version 4 aerosol products (Kim et al., 2018) indicate that the comparisons of the MOD06ACAERO algorithm with CALIPSO shown in Meyer et al. (2015) would improve somewhat, as significant work has been done to remedy the low bias in CALIPSO retrievals. However, Kim et al. (2018) state that the remaining SEA low bias in CALIPSO retrievals of aerosol optical depth (AOD) with respect to AERONET and MODIS makes CALIPSO retrievals somewhat problematic as a means of aerosol algorithm evaluation for the SEAO area (e.g., Meyer et al., 2013, 2015; Jethva et al., 2014). Observations collected during the ObseRvations of Aerosols above CLouds and their IntEractionS (ORACLES) campaign (Redemann et al., 2021) are currently being used to evaluate the MOD06ACAERO algorithm. Additional descriptions of ORACLES aerosol data can be found in LeBlanc et al. (2020) and Pistone et al. (2019).
In this study we applied an Observing System Simulation Experiment (OSSE) framework to gain insight into the performance of the MOD06ACAERO algorithm. Rather than using the classic analysis-forecast error metric common in numerical weather prediction OSSE studies (e.g., Hoffman and Atlas, 2016), we adopt here a "retrieval OSSE" perspective where the quality of the retrieval is used as the verification metric (Wind et al., 2013, 2016). A radiative transfer code is applied to the model quantities combined with sensor geometry to simulate how a model scene appears to a specific instrument. A retrieval algorithm designed for that instrument can be executed on the simulated measurements. Physical quantities retrieved by the algorithm can be compared to the known simulation input. The algorithm can be examined for closure over a large spatial domain, and thus any areas or conditions that may be problematic for the algorithm can be examined, and the strengths and limitations of the algorithm can be extensively documented. The Multi-sensor Cloud and Aerosol Retrieval Simulator (MCARS) is a tool that combines model output with a radiative transfer code in order to simulate radiances that may be measured by a remote-sensing instrument if it were passing over the model fields (Wind et al., 2013, 2016). In this paper, MCARS continues to use the combination of the GEOS-5 model, correlated-k models of atmospheric transmittance due to various gaseous absorbers for MODIS channels as per Kratz (1995), inline Rayleigh scattering, and the Discrete Ordinate Radiative Transfer (DISORT) code (Stamnes et al., 1988) to simulate MODIS radiances. Two improvements have been made to the MCARS code since the last publication. The computational resolution has been increased to 32 streams, up from 16.
Additionally, for this study the higher-resolution 7 km GEOS-5 Nature Run (G5NR) was used in place of the standard 25 km resolution GEOS-5 output (Gelaro et al., 2015; Putman et al., 2015). The model run is performed at a horizontal resolution of 7 km using a cubed-sphere horizontal grid with 72 vertical levels, extending up to 0.01 hPa (∼ 80 km). In addition to standard meteorological parameters (wind, temperature, moisture, surface pressure), this GCM includes 15 aerosol tracers (dust, sea salt, sulfate, black and organic carbon), O3, and CO2. The GEOS-5 NR is driven by prescribed sea-surface temperature and sea ice, daily volcanic and biomass-burning emissions, as well as high-resolution inventories of anthropogenic sources. A description of the GEOS-5 model configuration used for the Nature Run can be found in , while results from a validation exercise appear in Gelaro et al. (2015) and Castellanos et al. (2019).
In a previous study of the MOD04_DT code (Wind et al., 2016), we had the advantage of having simultaneous in situ aerosol property measurements from the AErosol RObotic NETwork (AERONET) (Holben et al., 1998). AERONET has very limited data available over ocean, mainly from islands and ship transits. Even in places where AERONET is established, no measurements can be obtained in the presence of clouds. Therefore, no ground-based in situ measurements can be included in our analysis of the MOD06ACAERO product, and so the analysis is necessarily limited to verification and closure.
In sections that follow we will describe the application of MCARS to study the MOD06ACAERO algorithm. Section 2 very briefly describes the MCARS code and the experiment setup. Section 3 describes the MODIS MOD06ACAERO product of Meyer et al. (2015). Section 4 shows the details of the study and study conclusions. Finally, Sect. 5 discusses the next steps in MCARS development.

MCARS description
The MCARS code was previously described in detail in Wind et al. (2013, 2016). Therefore, only a brief description will be given here. Global aerosol, cloud, surface, and atmospheric column fields from the G5NR simulation as described above serve as the starting point for radiance simulations. The GOCART bulk aerosol scheme currently used in the G5NR is used for the simulations reported in this paper, with corresponding optical properties as described in Randles et al. (2017), Hess et al. (1998), and references within. The simulation input data were produced in accordance with the methods outlined in Wind et al. (2016). The G5NR model output was split into 1 km subcolumns (MODIS pixel resolution) using the independent column approximation method as described in detail in Wind et al. (2013). Here a brief summary of the model data preparation methodology is given.
MODIS pixels for each GEOS-5 grid box were collected, and the same number of pixel-like subcolumns was generated using a statistical model of subgrid column moisture variability. The subcolumn generation used a parameterized probability density function (PDF) of total water content for each model layer and a Gaussian copula to correlate these PDFs in the vertical (Norris et al., 2008;Norris and da Silva, 2016a, b).
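The subcolumn generation step can be illustrated with a minimal sketch. It is not the MCARS implementation: the uniform total-water PDF, the AR(1) correlation between adjacent layers (`rho`), and the names `generate_subcolumns`, `mean_qt`, `half_width`, and `q_sat` are all assumptions chosen to keep the example short. It only shows the general mechanics of a Gaussian copula: draw vertically correlated standard normals, convert them to ranks, and map each rank through the layer's own PDF.

```python
import math
import random
from statistics import NormalDist


def generate_subcolumns(mean_qt, half_width, q_sat, n_sub, rho=0.9, seed=0):
    """Draw n_sub subcolumns of total water for one grid box.

    mean_qt, half_width, q_sat are per-layer lists of equal length.
    Assumption: each layer's total water is uniform on
    [mean - half_width, mean + half_width], standing in for the
    parameterized PDF. rho is an assumed correlation between the
    Gaussian variables of vertically adjacent layers.
    Returns a list of columns; each column is a list of
    (total_water, is_cloudy) tuples, one per layer.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    rng = random.Random(seed)
    std_normal = NormalDist()
    step = math.sqrt(1.0 - rho * rho)
    columns = []
    for _ in range(n_sub):
        z = rng.gauss(0.0, 1.0)
        column = []
        for k in range(len(mean_qt)):
            if k > 0:
                # AR(1) update keeps z standard normal while correlating layers.
                z = rho * z + step * rng.gauss(0.0, 1.0)
            u = std_normal.cdf(z)  # copula rank in (0, 1)
            # Inverse CDF of the assumed uniform PDF for this layer.
            qt = mean_qt[k] + half_width[k] * (2.0 * u - 1.0)
            column.append((qt, qt > q_sat[k]))  # cloudy where supersaturated
        columns.append(column)
    return columns
```

With `rho` close to 1 the cloudy layers line up vertically (maximum overlap), while `rho = 0` gives random overlap; the real scheme derives both the PDFs and the correlations from model fields rather than from fixed parameters.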
The subcolumns generated in this way were subsequently rearranged, to give horizontal spatial coherence, by using a horizontal Gaussian copula applied to a condensed water path. This arrangement had to be applied in order to create spatially coherent cloud-like structures. The subcolumns themselves were not altered in any way during this process. If this step is skipped and the subcolumns are placed randomly within each grid box, the MODIS Cloud Optical and Microphysical Properties (MOD06) product (Platnick et al., 2017) would restore many of the pixels to clear sky unless the initial grid box had close to 100 % cloud fraction (Zhang and Platnick, 2011; Pincus et al., 2012). The MOD06 product is a necessary input for MOD06ACAERO and must be produced prior to MOD06ACAERO execution. The need for this subcolumn rearrangement is significantly lessened when G5NR is used because the smaller grid boxes are often close to 100 % cloudy, especially in MBL regimes, but removing the method from the model preparation step was not practical due to its small impact on execution time and the possibility of introducing errors.
The layer aerosol properties were obtained using the independent column approximation with the same PDF of total water content as used for clouds. A GEOS-5 aerosol species output file was used in conjunction with aerosol optical properties as in Randles et al. (2017). The aerosol phase functions for each of the 15 species output by GEOS-5 were produced and combined on the fly to create a single bulk set of scattering properties and Legendre coefficients (Wind et al., 2016).
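A minimal sketch of how per-species optical properties can be combined into one bulk set is given below. It uses the standard external-mixing rules (optical depths add, single-scattering albedo is extinction weighted, and Legendre coefficients are scattering weighted); whether MCARS applies exactly this form is an assumption here. The function name `mix_species`, the dictionary keys, and the convention that coefficient 0 equals 1 and coefficient 1 equals the asymmetry parameter are illustrative choices.

```python
def mix_species(species):
    """Combine per-species layer properties into bulk properties.

    species: list of dicts with keys
      "tau"      - extinction optical depth of the species in the layer
      "ssa"      - single-scattering albedo
      "legendre" - phase-function Legendre coefficients (assumed
                   normalized so coefficient 0 is 1 and coefficient 1
                   is the asymmetry parameter)
    Returns (bulk_tau, bulk_ssa, bulk_legendre).
    """
    tau_ext = sum(s["tau"] for s in species)
    tau_sca = sum(s["tau"] * s["ssa"] for s in species)
    if tau_sca == 0.0:
        # Nothing scatters, so the phase function is irrelevant;
        # return an isotropic placeholder.
        return tau_ext, 0.0, [1.0]
    n_terms = max(len(s["legendre"]) for s in species)
    legendre = []
    for order in range(n_terms):
        weighted = 0.0
        for s in species:
            # Species with fewer terms contribute zero at higher orders.
            coeff = s["legendre"][order] if order < len(s["legendre"]) else 0.0
            weighted += s["tau"] * s["ssa"] * coeff
        legendre.append(weighted / tau_sca)
    return tau_ext, tau_sca / tau_ext, legendre
```

The scattering weighting matters for absorbing smoke: a strongly absorbing species adds to the bulk optical depth but contributes comparatively little to the shape of the bulk phase function.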
Model parameters such as profiles of temperature, pressure, ozone, and water vapor together with layer information about clouds and aerosols are combined with solar and view geometry of the MODIS instrument. Surface information is also a combination of GEOS-5 information of surface temperature, snow and sea ice cover and MODIS-derived spectral surface albedo (Moody et al., 2007, 2008). All of these parameters are transferred to the DISORT-5 radiative transfer code, and reflectances and radiances in 22 MODIS channels between 470 nm and 14.2 µm are produced. The default computational resolution of DISORT-5 has also been increased to 32 streams, up from the 16 used in the two previous studies. Additionally, some of the simulations in this study were executed at 64 streams. Final MCARS output is packaged in a format identical to the standard MODIS Level-1B radiometric files and is thus completely transparent to any operational or research-level retrieval algorithm code. These simulations were produced at the NASA Center for Climate Simulations (NCCS) supercomputer. Each complete simulation of a MODIS-like granule requires 5.5 h of wallclock time on 300 processors. Computational throughput can be increased by limiting the scope of the simulation to fit a particular investigation. For this study, however, we retain the full set of channels needed for both cloud and aerosol research.

MODIS above-cloud aerosol property product
The MODIS above-cloud aerosol property product (MOD06ACAERO) (Meyer et al., 2015) is a regional algorithm able to simultaneously retrieve MBL cloud optical thickness (COT), cloud effective radius, and above-cloud aerosol optical depth (AOD) in the SEAO region. It uses six MODIS channels (bands 1-5 and 7) having central wavelengths of 0.47, 0.55, 0.66, 0.86, 1.24, and 2.1 µm. The MOD06ACAERO algorithm takes advantage of the strong biomass-burning aerosol absorption gradient in the visible (VIS) to near-infrared (NIR) spectrum that, when the aerosol layer overlies a bright cloud, yields differential attenuation (stronger at shorter wavelengths) of the otherwise nearly spectrally invariant top-of-atmosphere cloud reflectance across the VIS-NIR. Sensitivity to cloud optical thickness is localized in the spectral range between 0.47 and 1.24 µm and is directly related to the magnitude of reflectance, while sensitivity to above-cloud aerosol optical depth is related to the spectral slope of the reflectance. The MOD06ACAERO algorithm uses the 2.1 µm channel for cloud effective radius information. That is also consistent with the principal retrieval contained in the MOD06 product (Platnick et al., 2017).
The MOD06ACAERO retrieval inversion uses an optimal estimation-like approach (Rodgers, 1976) that attempts to minimize the difference (cost function) between the six MODIS reflectance observations and forward-modeled reflectance that is a function of cloud optical thickness, effective radius, and above-cloud AOD. However, rather than inline radiative transfer calculations, MOD06ACAERO relies on a set of pre-computed lookup tables (LUTs) of coupled cloud and above-cloud aerosol reflectance. These LUTs are generated using the same cloud microphysics models used by MOD06 (Platnick et al., 2017) and the absorbing aerosol model used by MOD04_DT over land surfaces (Levy et al., 2013). Retrievals using a second aerosol property model, one based on field campaign data from SAFARI 2000 (Haywood et al., 2003), are also available in MOD06ACAERO output. While these Haywood et al. model retrievals were recommended in Meyer et al. (2015), evaluation during the ORACLES campaign revealed deficiencies at certain scattering angle ranges (Kerry Meyer, personal communication, 2016). Thus, for this study we use the MOD06ACAERO results based on the MOD04_DT aerosol models.
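The idea of inverting spectral reflectance against a pre-computed LUT can be sketched with a deliberately simplified toy problem. Nothing below reproduces the MOD06ACAERO forward model or its LUTs: the two-stream-like cloud reflectance, the power-law aerosol absorption, the constants, and the names `toy_reflectance`, `build_lut`, and `retrieve` are all assumptions, and effective radius is omitted so that only COT and AOD are retrieved. The sketch shows only how reflectance magnitude constrains COT while the spectral slope constrains AOD, with a brute-force search standing in for the cost-function minimization.

```python
import math

# Toy constants (assumptions, not MOD06ACAERO values).
WAVELENGTHS_UM = [0.47, 0.55, 0.66, 0.86, 1.24]
ASYMMETRY = 0.85            # assumed cloud asymmetry parameter
ABSORPTION_AT_055 = 0.15    # assumed aerosol absorption fraction at 0.55 um
ABSORPTION_EXPONENT = 1.5   # assumed spectral falloff of absorption


def toy_reflectance(cot, aod):
    """Spectrally flat cloud reflectance attenuated by an absorbing layer."""
    scaled = (1.0 - ASYMMETRY) * cot
    cloud = scaled / (2.0 + scaled)  # two-stream-like saturation with COT
    spectrum = []
    for wl in WAVELENGTHS_UM:
        tau_abs = aod * ABSORPTION_AT_055 * (wl / 0.55) ** (-ABSORPTION_EXPONENT)
        # Factor 2: light crosses the aerosol layer on the way down and up.
        spectrum.append(cloud * math.exp(-2.0 * tau_abs))
    return spectrum


def build_lut(cot_grid, aod_grid):
    """Pre-compute reflectance spectra on a (COT, AOD) grid."""
    return {(cot, aod): toy_reflectance(cot, aod)
            for cot in cot_grid for aod in aod_grid}


def retrieve(observed, lut, sigma=0.01):
    """Return the LUT node minimizing a chi-square-like cost, and that cost."""
    best_state, best_cost = None, float("inf")
    for state, modeled in lut.items():
        cost = sum(((o - m) / sigma) ** 2 for o, m in zip(observed, modeled))
        if cost < best_cost:
            best_state, best_cost = state, cost
    return best_state, best_cost
```

A quick closure check in the spirit of this study is to simulate a spectrum with `toy_reflectance` at a known grid node and confirm that `retrieve` returns the same node. An operational retrieval would interpolate within the LUT and weight channels by their individual uncertainties instead of using a single `sigma`.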
The MOD06ACAERO retrieval operates at 1 km resolution, compared to the 10 and 3 km MOD04_DT resolutions, and simultaneously provides pixel-level estimates of retrieval uncertainty accounting for known and quantifiable error sources (e.g., radiometry, atmospheric profile errors, cloud and aerosol forward model errors) consistent with the MOD06 cloud product methodology (Platnick et al., 2020). Figure 2 shows an example retrieval result from MOD06ACAERO compared to MOD04_DT standard 10 km output. The Terra MODIS granule shown here, from 2006 day 224 at 10:05 UTC, has extensive cloud cover over the ocean, typical for this season. MOD04_DT provides a very limited amount of data, localized to the few areas of clear sky, while MOD06ACAERO fills in the above-cloud area. Shinozuka et al. (2020) suggest that above-cloud aerosol retrievals are similar to adjacent clear-sky retrievals, and so clear-sky retrievals could be used as an above-cloud proxy. However, conditions shown in Fig. 2 are common during the SEAO burning season. There are no clear-sky retrievals of aerosol over most of the area due to near-uniform coverage by marine stratus, with cloud fraction approaching 80 %. The nearest successful clear-sky retrievals are hundreds of kilometers away. Therefore, an above-cloud aerosol retrieval algorithm such as MOD06ACAERO is clearly necessary.
MOD06ACAERO uses National Center for Environmental Prediction (NCEP) atmospheric profile products (Derber et al., 1991) for atmospheric correction. As part of our investigation we examine the impact of discrepancies between NCEP and G5NR on retrieved aerosol properties.

Analysis
To create the data used for the MOD06ACAERO verification study, we examined the G5NR dataset for cases that were similar to conditions commonly encountered during the burning season over SEAO. August 2006 was selected because it was a very active smoke season and a significant number of MBL clouds were present in the model output. Models often have difficulties forming MBL clouds as higher-than-usual grid and vertical resolution is needed in order to accurately represent the processes that lead to MBL formation in nature.
As real Terra and Aqua overpasses are needed in order to define the sun-satellite geometry for the MCARS simulations, satellite orbital tracks had to be considered. Because orbital gaps are prominent in the MODIS data over the SEAO MBL region, care must be taken in selecting specific days and times having adequate sensor geometry. Technically, because MCARS is a simulation, orbital gaps have no meaning. But because of the need for actual sensor geometry to start the simulation, it is most expedient to simply browse available MODIS data for a suitable track. Even though G5NR does not perform any data assimilation, the model code is identical to the standard GEOS-5 model. MCARS normally runs on standard GEOS-5 output. In Wind et al. (2013) we showed MCARS as a model output verification tool. It is always very desirable to match date-time-orbit when model performance may be compared to real concurrent sensor measurements. Even though no orbital match is required in this study, a decision was made to not alter the standard MCARS operation in order to avoid accidental introduction of software issues. Five cases were selected under these considerations. Three came from Terra MODIS overpasses and two from Aqua MODIS. The times and dates were as follows:
This simulated radiance dataset comprises 13.5 million points where the atmospheric column and surface conditions are explicitly known. MOD06ACAERO retrievals were attempted over those points, but of course that does not mean that each attempt produced a successful aerosol retrieval. Figure 3a shows simulated RGB images for the five MCARS MODIS granules listed above. Also shown in Fig. 3b are the same simulated granules where the aerosols have been removed from the radiative transfer simulations. This ability to remove clouds, aerosols, or gases from the simulation offers extensive control when evaluating the performance of retrieval algorithms and diagnosing algorithm deficiencies.
There is a significant similarity between the real Terra MODIS granule of Fig. 2 and the simulated granule for the same date and time. The G5NR is a free-running model and does not perform any data assimilation, and therefore it is not synoptically locked to the particular day depicted in Fig. 2. The apparent similarities between Figs. 2 and 3 merely reflect the persistent patterns of MBL clouds and smoke in the region. There is no expectation of a match with any real data in this study. The similarity is not a statement about G5NR performance, as in other cases the cloud amount and distribution had no match to any real data. It is merely an interesting coincidence. Some granules were selected to include a significant portion of land surface for a later examination of the MOD04_DT retrievals, repeating the study in Wind et al. (2016) in a different region (not reported here).
This dataset, both the complete and the clean (aerosol-free) versions, was fed through the standard operational MODIS Data Collection 6 cloud product processing chain to produce cloud mask, MOD06 cloud top and optical properties, and finally the MOD06ACAERO output for each case. Results from all granules were then combined, and only retrievals for cloudy pixels were examined. The MOD06ACAERO aerosol retrievals were compared to source aerosol optical depth provided by GEOS-5 (Wind et al., 2016). Figure 4 shows results of this comparison. The only constraint on this comparison was that the algorithm-reported pixel-level retrieval uncertainty had to be less than 40 % for Fig. 4a and less than 30 % for Fig. 4b. One of the motivations of this study was to characterize errors in the MOD06ACAERO algorithm for subsequent aerosol data assimilation into GEOS-5. Pixels with higher uncertainties could be considered in the analysis, but assimilating data where the retrieval error is 50 % or greater could negatively impact the assimilated fields. As depicted in Fig. 4, filtering retrievals at a reported algorithm uncertainty of 40 % is very effective in producing a good match between MOD06ACAERO and the G5NR output variables, with the exception of very low AODs. G5NR uses aerosol models described in detail in Randles et al. (2017). They comprise a set of 15 absorbers, the properties of which are a function of column relative humidity. MOD06ACAERO in this study uses the MOD04_DT aerosol models, which are distinct in composition and additionally computed at a constant 80 % column relative humidity (Levy et al., 2013). Because G5NR mixes aerosols on the fly to create bulk layer properties and MOD06ACAERO has a constant regional mixture, there is a natural source of uncertainty in any comparison of MOD06ACAERO retrievals with G5NR (Chin et al., 2002). However, the regional mixture of MOD04_DT had been used extensively to train the GOCART model used by both GEOS-5 and G5NR.
Thus we expect the uncertainty due to aerosol model mismatch to be fairly minimal. The same exact situation of aerosol mixture mismatch exists in real data and is most likely greater than the one existing in this simulation. Detailed comparison of GOCART and MOD04_DT aerosol models for biomass-burning aerosols has been performed in Wind et al. (2016). Meyer et al. (2015) suggest that additionally MOD06ACAERO retrievals should be screened by retrieved cloud optical thickness and that they should be discarded if COT is less than 4.0. We applied this additional constraint onto the retrieval comparison, and the result is shown in Fig. 5. Discarding the AOD retrievals when cloud is thin improved the matchup against GEOS-5, but there still appears to be an issue when GEOS-5 AOD is very close to zero.
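The agreement statistics quoted for these comparisons (slope, offset, and RMSE) can be computed with a short routine such as the sketch below. It assumes an ordinary least-squares fit of retrieved AOD against source AOD and an RMSE taken directly between the two quantities; the exact fitting procedure behind the figures is not specified in the text, and the function name `comparison_stats` is illustrative.

```python
import math


def comparison_stats(source_aod, retrieved_aod):
    """Return (slope, offset, rmse) of retrieved versus source AOD.

    Slope and offset come from an ordinary least-squares fit
    retrieved = slope * source + offset. RMSE is computed between the
    retrieved and source values themselves (an assumption about the
    definition used for the reported numbers).
    """
    n = len(source_aod)
    if n < 2 or n != len(retrieved_aod):
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(source_aod) / n
    mean_y = sum(retrieved_aod) / n
    sxx = sum((x - mean_x) ** 2 for x in source_aod)
    if sxx == 0.0:
        raise ValueError("source AOD has no variance; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(source_aod, retrieved_aod))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    rmse = math.sqrt(sum((y - x) ** 2
                         for x, y in zip(source_aod, retrieved_aod)) / n)
    return slope, offset, rmse
```

A perfect retrieval would give a slope of 1, an offset of 0, and an RMSE of 0, so a cluster of positive retrievals where the source AOD is near zero shows up mainly as a positive offset and a reduced slope.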
The power of MCARS lies in being able to tightly control simulation parameters. The MOD06ACAERO algorithm appears to run into a difficulty at low source AOD. In order to examine the causes for this discrepancy in more detail, we turn our attention to the clean MCARS case shown in Fig. 3b by setting the AOD precisely to zero and examining the retrieval performance in such a situation. Ideally MOD06ACAERO should retrieve a zero AOD throughout.
With the exception of a narrow range of scattering angles between 135 and 145°, which corresponds to the cloud bow direction, the algorithm indeed retrieved AOD that was extremely close to zero. Figure 6 depicts the difference between retrieval and source as a function of scattering angle. Retrievals where MOD06ACAERO matched GEOS-5 precisely were discarded for clarity. Within the cloud bow MOD06ACAERO tends to return a small positive AOD of about 0.15.
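For reference, the scattering angle can be computed from the solar and view geometry as in the sketch below. The relative-azimuth convention is an assumption that must be checked against the geometry files in use: here a relative azimuth of 0° means that the directions from the pixel toward the Sun and toward the sensor share the same azimuth, so that a sensor looking from the Sun's direction sees pure backscatter (180°). The function names and the default 135 to 145° bounds (taken from the range quoted above) are illustrative.

```python
import math


def scattering_angle_deg(solar_zenith_deg, view_zenith_deg, relative_azimuth_deg):
    """Scattering angle in degrees for a given sun-sensor geometry.

    Assumed convention: relative azimuth is the difference between the
    azimuths of the pixel-to-Sun and pixel-to-sensor directions.
    """
    sz = math.radians(solar_zenith_deg)
    vz = math.radians(view_zenith_deg)
    ra = math.radians(relative_azimuth_deg)
    cos_theta = (-math.cos(sz) * math.cos(vz)
                 - math.sin(sz) * math.sin(vz) * math.cos(ra))
    # Guard against round-off pushing the cosine outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


def in_cloud_bow(scattering_deg, lower_deg=135.0, upper_deg=145.0):
    """True when the scattering angle falls inside the cloud bow range."""
    return lower_deg <= scattering_deg <= upper_deg
```

Some data products define relative azimuth with the opposite sense, in which case the sign of the second term flips; a quick check with overhead Sun and nadir view, which must give 180°, does not distinguish the two conventions, so an off-nadir test case is also needed.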
The liquid water phase function is very complex in the cloud bow region and is very difficult to model accurately. That particular region has consistently caused difficulties for the standard MOD06 product retrievals of MBL clouds. Both MOD06 and MOD06ACAERO LUTs are computed at 64 DISORT streams. We performed some investigation of this area by running a special simulation for a single case from Terra 2006 day 224 10:05 UTC. This case was selected because the cloud bow is especially noticeable in both real and simulated data. The simulation was also executed using 64 DISORT streams in order to reduce uncertainties associated with the simulation being performed at half the resolution. In the cloud bow region more streams would potentially lead to a better model. Unfortunately the cloud bow persisted. It thus may be the case that 64 streams are not sufficient to properly resolve the cloud bow in either simulation or retrieval. Even higher resolution may be advisable. Increasing the computational resolution of MOD06 LUTs is presently being considered for the upcoming MODIS Data Collection 7. Depending on the results, the same increase may occur for MOD06ACAERO. At this time, for the purpose of establishing assimilation constraints, which is the focus of this study, one might simply exclude the cloud bow scattering angle range from consideration until more is known. Figure 7 shows the results of MOD06ACAERO retrievals from Fig. 5, where retrievals within the cloud bow have been discarded. The comparison with source data is further improved, and the cluster of MOD06ACAERO retrievals present in Fig. 5 when GEOS-5 AOD was near zero has disappeared.
Often better retrievals can be obtained when less oblique view geometry is considered in real data. Pixel size, longer optical path length, and 3D effects from clouds can all make retrievals performed at oblique view angles less optimal.
In the case of this study, another consideration for imposition of a view zenith limit is that presently MCARS does not account for pixel size growth at oblique view angles. The number of subcolumns generated does not change with view zenith angle. Therefore, MCARS results when view angle is oblique may not be an accurate measure of algorithm performance as only the effects of optical path length are simulated.
The MOD06 cloud product outputs cloud top pressure, temperature, and height limited to near nadir in addition to full swath products. The "near nadir" is defined as viewing zenith angle less than 32° (Menzel et al., 2008). Figure 8 shows the MOD06ACAERO retrievals of Fig. 7 further limited by view zenith angle of less than 32°. When view zenith angle is limited to 32° the comparison with GEOS-5 source data is again improved. We can now show a slope of 0.866 for retrievals with less than 40 % error and 0.913 for retrievals with error of less than 30 %. Note that even though the data extent had been limited, there are still over 600 000 data points left to be ingested into a model if data assimilation were to be attempted in an area where previously the number of such data points was close to 0.
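The screening rules developed in this section can be collected into a single filter, sketched below under stated assumptions. The thresholds follow the text (pixel uncertainty below 30 % or 40 %, COT of at least 4, cloud bow scattering angles between 135 and 145° excluded, and a view zenith limit), but the per-pixel dictionary layout and the names `passes_screening`, `screen`, `aod_uncertainty_pct`, `cot`, `scattering_angle_deg`, and `view_zenith_deg` are hypothetical and do not correspond to actual MOD06ACAERO dataset names.

```python
CLOUD_BOW_DEG = (135.0, 145.0)  # scattering angle range excluded in the text


def passes_screening(pixel, max_uncertainty_pct=30.0, min_cot=4.0,
                     max_view_zenith_deg=32.0):
    """Return True if a retrieval satisfies all screening rules.

    pixel is a dict with hypothetical keys: aod_uncertainty_pct, cot,
    scattering_angle_deg, and view_zenith_deg.
    """
    bow_low, bow_high = CLOUD_BOW_DEG
    in_bow = bow_low <= pixel["scattering_angle_deg"] <= bow_high
    return (pixel["aod_uncertainty_pct"] < max_uncertainty_pct
            and pixel["cot"] >= min_cot          # discard thin clouds
            and not in_bow                       # discard cloud bow geometry
            and abs(pixel["view_zenith_deg"]) < max_view_zenith_deg)


def screen(pixels, **thresholds):
    """Keep only the pixels that pass; thresholds override the defaults."""
    return [p for p in pixels if passes_screening(p, **thresholds)]
```

Keeping the thresholds as parameters makes it easy to count how many retrievals survive each choice, for example by comparing `len(screen(pixels))` with `len(screen(pixels, max_view_zenith_deg=20.0))`, which is the trade-off between agreement and data volume that an assimilation system has to weigh.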
We can constrain the view zenith angle range even further, as shown in Fig. 9, reducing the threshold to 20°. Whereas the comparison shows all-around improvement, with slopes of 0.931 and 0.977 for retrieval error of less than 40 % and 30 % respectively, the number of points suitable for assimilation shrinks by half. It is not clear if this dataset size reduction can be justified by the improvement in alignment with the source data.
Figure 6. MOD06ACAERO retrieval results from the combined dataset of Fig. 3b, where aerosols had been removed. The results are displayed as difference from GEOS-5 AOD, which in this case was zero, as a function of scattering angle as a normalized density plot. All retrievals where MOD06ACAERO result was also zero had been removed for clarity. All non-zero MOD06ACAERO retrievals appear to be concentrated in a narrow angle range between 135 and 145°, which corresponds to the cloud bow. Panel (a) shows MOD06ACAERO retrievals with uncertainty of less than 40 %, and panel (b) shows the same with uncertainty less than 30 %.
Figure 7. MOD06ACAERO retrieval results from the combined dataset of Fig. 3a compared to source GEOS-5 aerosol optical depth as a normalized density plot. AOD retrievals where COT was less than 4 are now discarded. Additionally, retrievals in the cloud bow region are also removed. It appears they were indeed the source of a cluster of higher MOD06ACAERO retrievals when GEOS-5 AOD was near zero, and the matchup with GEOS-5 source AOD is further improved. Panel (a) shows MOD06ACAERO retrievals with uncertainty of less than 40 %, and panel (b) shows the same with uncertainty less than 30 %.
With the 20° view angle constraint the algorithm results are very close to source data and we could potentially state that we have closure against source GEOS-5 data even though both MOD06 and MOD06ACAERO, run under operational conditions, used NCEP GDAS data for atmospheric correction (implying a likely overestimation of the error in these profiles). In order to assess the impact of using these GDAS-based profiles, we consider a final experiment where we use MCARS pixel-level input profiles for atmospheric correction. The result is shown in Fig. 10. When atmospheric profiles are removed as a source of inconsistency, the agreement with source data improves to a slope of 0.996 with intercept of −0.007 and RMSE of 0.097 for retrievals with less than 40 % error and slope of 0.989, intercept of 0.03, and RMSE of 0.085 for retrievals with less than 30 % error. Small sample size for retrievals with lower uncertainty is the reason for somewhat less agreement with source data for this closure experiment.
The remaining source of potential disagreement of MOD06ACAERO retrieval with input GEOS-5 data is the difference between aerosol models used by MCARS and MOD06ACAERO. Cloud models between MOD06ACAERO and MCARS are identical in this study. The MOD06ACAERO model is fixed for the region, while the GEOS-5 aerosols are fully dynamic as per Randles et al. (2017). However, it is not practical to change either MCARS or MOD06ACAERO code to use a different aerosol model set, particularly with the agreement being as good as it presently is. A question might be asked as to whether the difference between aerosol models used by MCARS and MOD06ACAERO would be an additional source of disagreement, especially in the light of results in Wind et al. (2016). MCARS has the ability to switch between the GEOS-5 aerosols and those used by MOD06ACAERO and MOD04DT. We tested part of the dataset with identical aerosol models between retrieval and simulation and found there to be no significant impact. One reason for that is that the simulations in Wind et al. (2016) dealt with aerosols located near sources. The aerosols considered here, even though they are of the same basic type, have traveled a significant distance from the source and have had a chance to absorb water. Once that happens, there is no difference in the scattering properties between the aerosol model used by MOD04DT and GEOS-5. Part of the reason for this specific dataset selection is to also have cloud-free land present so that we could repeat the experiment in Wind et al. (2016) on a different continent. We expect that over land, and thus near sources, we would absolutely see the impact of differences in single-scattering albedo.
Figure 8. MOD06ACAERO retrieval results from the combined dataset of Fig. 3a compared to source GEOS-5 aerosol optical depth as a normalized density plot. AOD retrievals where COT was less than 4 and where the scattering angle was in the cloud bow are now discarded. Additionally, the data extent had been limited to only include pixels with view zenith angle of less than 32°. Retrieval comparison shows further improvement. Panel (a) shows MOD06ACAERO retrievals with uncertainty of less than 40 %, and panel (b) shows the same with uncertainty less than 30 %.
Figure 9. MOD06ACAERO retrieval results from the combined dataset of Fig. 3a compared to source GEOS-5 aerosol optical depth as a normalized density plot. AOD retrievals where COT was less than 4 and where the scattering angle was in the cloud bow are now discarded. Additionally, the data extent had been limited to only include pixels with view zenith angle of less than 20°. Retrieval comparison shows further improvement. However, it is not clear if the reduction in dataset size is worth the gain in accuracy. Panel (a) shows MOD06ACAERO retrievals with uncertainty of less than 40 %, and panel (b) shows the same with uncertainty less than 30 %.
Figure 10. MOD06ACAERO retrieval results from simulated MCARS granule based on Terra MODIS 2006 day 224 10:05 UTC compared to source GEOS-5 aerosol optical depth as a normalized density plot. In this experiment both MOD06 and MOD06ACAERO were modified to use MCARS pixel-level atmospheric profiles to perform atmospheric correction. AOD retrievals where COT was less than 4 and where the scattering angle was in the cloud bow are now discarded. Additionally, the data extent had been limited to only include pixels with view zenith angle of less than 20°. This experiment shows excellent agreement with source data. Panel (a) shows MOD06ACAERO retrievals with uncertainty of less than 40 %, and panel (b) shows the same with uncertainty less than 30 %. The small dataset size in panel (b) is the reason for slightly lower agreement with source compared to panel (a).

Conclusions and future directions
This paper is a direct evolution of work started in Wind et al. (2013) and continued in Wind et al. (2016). The Multi-sensor Cloud and Aerosol Retrieval Simulator (MCARS) has now been applied as a verification tool for a research-level algorithm. The algorithm studied was the MODIS above-cloud aerosol properties retrieval algorithm of Meyer et al. (2015). MCARS computational resolution has been doubled, and for this study the high-resolution (7 km) GEOS-5 Nature Run model was utilized. The MCARS code produces radiances and reflectances in a standard MODIS Level-1B format after sending the GEOS-5 data through the DISORT-5 radiative transfer code. The output can be directly ingested by any retrieval or analysis code that reads data from the MODIS instrument.
We used the MCARS code to perform a verification and closure study on the MOD06ACAERO algorithm. In this study we generated a set of five MODIS granules located in the southeastern Atlantic Ocean off the coast of Namibia. We executed the MOD06ACAERO code on this case set. In the verification part of the study the algorithm performed very well. When pixels with less than 30 % uncertainty were considered with underlying cloud layer having optical thickness greater than 4, the algorithm matched the source GEOS-5 aerosol optical depth with slope of 0.774, offset of 0.076, and RMSE = 0.131. On further examination, executing the algorithm on the same case set with aerosols removed showed that retrievals around the scattering angle of 140°, the cloud bow direction, might be less useful. When the cloud bow pixels were excluded the slope improved to 0.913. Limiting retrievals to near nadir, with a view zenith angle limit of 20°, improved the slope further to 0.977, with RMSE = 0.096.
To look at closure, one of the five cases was selected. For closure both MOD06 and MOD06ACAERO codes were modified to use MCARS input profiles as ancillary data instead of the NCEP analysis used in operations (Platnick et al., 2017). When the results were compared to source GEOS-5 data, a slope of 0.996 with offset of −0.007 and RMSE = 0.097 was reached for pixels with less than 40 % uncertainty. The agreement was slightly worse for uncertainties less than 30 % (slope 0.989, offset 0.03, and RMSE = 0.085), but that was mainly due to having a smaller number of pixels in the set, only 130 000.
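The slope, offset, and RMSE figures quoted above summarize how closely retrieved AOD tracks the source AOD. The following is a minimal sketch of how such statistics might be computed. It assumes an ordinary least-squares fit of retrieved against source AOD and an RMSE taken as the root-mean-square difference between the two; the text does not state which fitting method or RMSE definition was used, and the function name is hypothetical.

```python
import numpy as np

def comparison_stats(source_aod, retrieved_aod):
    """Return (slope, offset, rmse) for retrieved AOD against source AOD.

    Assumptions (not confirmed by the text): ordinary least squares with the
    source AOD as the independent variable, and RMSE measured directly
    between retrieved and source values rather than about the fit line.
    """
    x = np.asarray(source_aod, dtype=float)
    y = np.asarray(retrieved_aod, dtype=float)
    # Drop pairs where either value is missing (NaN fill values assumed).
    valid = np.isfinite(x) & np.isfinite(y)
    x, y = x[valid], y[valid]
    # Degree-1 polynomial fit returns the slope first, then the intercept.
    slope, offset = np.polyfit(x, y, 1)
    rmse = np.sqrt(np.mean((y - x) ** 2))
    return float(slope), float(offset), float(rmse)
```

Applied separately to the subsets with less than 40 % and less than 30 % retrieval uncertainty, a routine of this kind yields one triplet of statistics per panel of the density plots.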
The results of this study suggest that retrievals produced by MOD06ACAERO are of good initial quality and would be a valuable addition to model data assimilation streams with the following constraints. MOD06ACAERO pixels should be assimilated if retrieval uncertainty is less than 40 %, if optical thickness of the underlying cloud layer is greater than 4.0, and if the pixel scattering angle is outside the cloud bow. Additionally, an even tighter constraint can be added to only take pixels that are near nadir.
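The constraints above can be sketched as a per-pixel boolean mask. This is an illustrative sketch only: the function and argument names are hypothetical and do not correspond to MOD06ACAERO dataset names, the cloud bow bounds of 135 to 145° are assumed from the Fig. 6 caption, and the optional near-nadir limit is left to the user (the text examines 32° and 20°).

```python
import numpy as np

def assimilation_mask(aod_uncertainty_pct, cloud_optical_thickness,
                      scattering_angle_deg, vza_deg=None,
                      max_uncertainty_pct=40.0, min_cot=4.0,
                      cloud_bow_deg=(135.0, 145.0), max_vza_deg=None):
    """Boolean mask of pixels passing the suggested screening rules.

    All inputs are arrays of the same shape. Names and defaults are
    illustrative assumptions, not the product's actual field names.
    """
    unc = np.asarray(aod_uncertainty_pct, dtype=float)
    cot = np.asarray(cloud_optical_thickness, dtype=float)
    scat = np.asarray(scattering_angle_deg, dtype=float)

    # Retrieval uncertainty below the threshold and a sufficiently thick cloud.
    keep = (unc < max_uncertainty_pct) & (cot > min_cot)
    # Exclude the assumed cloud bow scattering angle range.
    keep &= ~((scat >= cloud_bow_deg[0]) & (scat <= cloud_bow_deg[1]))
    # Optional tighter constraint: near-nadir pixels only.
    if max_vza_deg is not None:
        if vza_deg is None:
            raise ValueError("vza_deg is required when max_vza_deg is set")
        keep &= np.asarray(vza_deg, dtype=float) < max_vza_deg
    return keep
```

Keeping the near-nadir limit optional reflects the trade-off noted in the text: the 20° limit improves agreement with the source data but roughly halves the number of pixels available for assimilation.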

This study is yet another example of the capabilities of the MCARS framework. There are many other potential applications of the MCARS code, including extending the simulator to other sensors and examining the performance of fast retrieval simulators used in climate modeling.
Code and data availability. The MCARS code is free of charge and can be downloaded at https://doi.org/10.5281/zenodo.5224964.
Author contributions. GW is the development and experiment design lead on the MCARS project. She maintained the code, carried out the experiments, and performed most of the analysis of experimental data. AMdS and PMN assisted with preparation, interpretation, and integration of the GEOS-5 model data. KGM is the author of the MODIS above-cloud aerosol retrieval algorithm, the subject of this simulation experiment. He assisted with interpretation of retrieval results and development of assimilation constraints for the above-cloud aerosol product. SP assisted with analysis, evaluation, and interpretation of all experimental data.
Competing interests. The contact author has declared that neither they nor their co-authors have any competing interests.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.