Systematic bias in evaluating chemical transport models with maximum daily 8-hour average (MDA8) surface ozone for air quality applications

Chemical transport models typically evaluate their simulation of surface ozone with observations of the maximum daily 8-hour average (MDA8) concentration, which is the standard air quality policy metric. This requires successful simulation of the surface ozone diurnal cycle including nighttime depletion, but models are generally biased high at night because of difficulty in resolving the stratified conditions near the surface. We quantify the problem with the GEOS-Chem model for the Southeast US during the NASA SEAC4RS aircraft campaign in August-September 2013. The model is unbiased relative to the daytime mixed layer aircraft observations but has a +5 ppb bias relative to MDA8 surface ozone observations. The model also does not capture observed occurrences of <20 ppb MDA8 surface ozone on rainy days. Restricting the evaluation to afternoon hours and dry days removes the bias. Better understanding of surface layer stratification and ozone depletion under nighttime and rainy conditions is needed. Resolving the timing of the day-night transition in atmospheric stability and its correlation with plant stomata closure is critical.


Introduction
Ground-level ozone is harmful to human health and vegetation. It is produced when volatile organic compounds (VOCs) and carbon monoxide (CO) are photochemically oxidized in the presence of nitrogen oxide radicals (NOx ≡ NO + NO2). Ozone air quality standards in different countries are generally formulated using the maximum daily 8-hour average concentration (MDA8) as a metric. In the US, the current ozone National Ambient Air Quality Standard (NAAQS) set by the Environmental Protection Agency (EPA) is 70 ppb as the fourth-highest MDA8 concentration per year, averaged over three years (EPA, 2015). Exceedances of the standard generally occur during daytime due to photochemical production and to entrainment of elevated ozone from aloft (Kleinman et al., 1994). Ozone is depleted at night due to deposition and chemical loss in a shallow surface layer capped by a stratified atmosphere.
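As a minimal sketch of the two quantities named above, the Python functions below compute a daily MDA8 value from 24 hourly concentrations and a NAAQS-style statistic (the fourth-highest MDA8 per year, averaged over three years). The function names are hypothetical, and the sketch assumes complete data with 8-hour windows confined to the calendar day; EPA's regulatory procedure adds rules (for example on data completeness and window start hours) that are not reproduced here.

```python
from statistics import mean


def mda8(hourly_ppb):
    """Maximum daily 8-hour average from one day of hourly ozone (ppb).

    Simplifying assumption: exactly 24 valid hourly values, and 8-hour
    windows that stay within the day (start hours 0-16).
    """
    if len(hourly_ppb) != 24:
        raise ValueError("expected 24 hourly values")
    running_means = [mean(hourly_ppb[h:h + 8]) for h in range(24 - 8 + 1)]
    return max(running_means)


def fourth_highest(values):
    """Fourth-highest value of one year of daily MDA8 concentrations."""
    return sorted(values, reverse=True)[3]


def design_value(mda8_by_year):
    """Three-year mean of the annual fourth-highest MDA8 (ppb)."""
    if len(mda8_by_year) != 3:
        raise ValueError("expected three years of daily MDA8 values")
    return mean(fourth_highest(year) for year in mda8_by_year)
```

For example, a day at 20 ppb except for 60 ppb from 10 to 17 LT has an MDA8 of 60 ppb, because one 8-hour window covers exactly the elevated hours.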
Air quality agencies rely on chemical transport models (CTMs) to identify the most effective emission reduction strategies for ozone pollution. CTMs predict surface ozone concentrations on the basis of NOx, VOC, and CO emissions, accounting for chemistry and meteorological conditions. CTMs tend to overestimate surface ozone, particularly in the Southeast United States (Fiore et al., 2009), and a variety of reasons for this overestimate are examined in Travis et al. (2016). MDA8 ozone is commonly used as the metric for evaluating models with observations and making predictions relevant to air quality standards (Fiore et al., 2009; Mueller and Mallard, 2011; Emery et al., 2012; Lin et al., 2012; Rieder et al., 2015). Use of this metric implicitly requires successful simulation of the diurnal cycle in surface ozone, but models are generally too high at night, apparently because they cannot resolve the local stratification and associated depletion from surface deposition. This is a problem not only in global models with coarse vertical resolution (Lin and McElroy, 2010; Schnell et al., 2015; Strode et al., 2015) but also in regional air quality models (Herwehe et al., 2011; Solazzo et al., 2012; Solazzo and Galmarini, 2016). A recent evaluation of the CMAQ regional model shows little bias in the diurnal cycle averaged over all monitoring sites in the contiguous US (Appel et al., 2017), but such averaging may smooth the diurnal cycle across different regions (Bowdalo et al., 2016) and across urban, rural, and background sites.
Here we evaluate the use of the MDA8 ozone metric in the GEOS-Chem CTM, a global model frequently used in studies of regional ozone air quality and evaluated for this purpose with MDA8 ozone (Racherla and Adams, 2008; Lam et al., 2011; Zhang et al., 2011; Zoogman et al., 2011; Emery et al., 2012; Zhang et al., 2014). In our previous application of the model to the Southeast US during the NASA SEAC4RS aircraft campaign in August-September 2013 (Travis et al., 2016), we found that the model had no significant bias relative to aircraft ozone observations below 1 km altitude but overestimated MDA8 surface ozone by +6 ppb on average. As we show here, this may largely be explained by the poor representation of surface layer stratification. The ultimate solution of this problem will require improved representation of boundary layer physics, but in the meantime we propose some simple corrective measures.

Comparing simulations of mixed layer and MDA8 surface ozone
The GEOS-Chem simulation used here was previously applied by Travis et al. (2016) to interpret observations from the SEAC4RS aircraft campaign in August-September 2013 (Toon et al., 2016). It is based on GEOS-Chem version 9.02 with detailed oxidant-aerosol chemistry (www.geos-chem.org) and is driven by assimilated meteorological data from the Goddard Earth Observing System Forward Processing (GEOS-FP) product of the NASA Global Modeling and Assimilation Office (GMAO), using the GEOS-5.11.0 general circulation model (Molod et al., 2012). The GEOS-FP data have a native horizontal resolution of 0.25° latitude by 0.3125° longitude, with 72 levels in the vertical on a hybrid sigma-pressure grid and a temporal resolution of one hour for surface variables and mixing depths. This native resolution is used in GEOS-Chem over North America and adjacent oceans (130-60° W, 9.75-60° N), with boundary conditions from a global simulation at 4°×5° horizontal resolution. The lowest levels are centered at about 65 m, 130 m, 200 m, and 270 m above ground level (AGL). Boundary layer turbulence follows the clear-sky non-local parameterization of Holtslag and Boville (1993), as implemented in GEOS-Chem by Lin and McElroy (2010). Detailed evaluations of GEOS-Chem with observations over the Southeast US for the SEAC4RS period are presented in other papers (Kim et al., 2015; Marais et al., 2016; Yu et al., 2016; Zhu et al., 2016; Miller et al., 2017), and specific evaluation for ozone is presented in Travis et al. (2016).

Travis et al. (2016) found that despite successful simulation of ozone observations from the SEAC4RS aircraft in the mixed layer below 1 km altitude, MDA8 surface ozone was biased high by +6 ppb on average. The left panel of Fig. 1 compares observed and simulated afternoon ozone from the aircraft in the mixed layer (0.4-1.0 km altitude); the bias between the model and observations is small (+2 ppb) and not statistically significant (p = 0.07). The center panel of Fig. 1 shows the observed and simulated pdfs of daily MDA8 surface ozone in August-September 2013 at the thirteen rural CASTNET sites in the Southeast US (EPA, 2018). The Southeast US is a relatively coherent region for surface ozone, with different sites showing similar behavior (Bowdalo et al., 2016). The model is biased high by +8 ppb on average, and this bias is highly significant (p < 0.01). All tests of significance are performed with the Welch two-sample t-test in R. The bias differs slightly from the +6 ppb in Travis et al. (2016), who showed a comparison for June-August. Comparison of the mean aircraft and MDA8 surface concentrations in Figure 1 indicates a vertical difference of 9 ppb in the observations but only 3 ppb in GEOS-Chem.
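The significance tests above use R's Welch two-sample t-test. For readers working outside R, an equivalent calculation is sketched below in plain Python: the t statistic with unequal variances, the Welch-Satterthwaite degrees of freedom, and a two-sided p-value obtained by integrating the Student t density with Simpson's rule. This is an illustrative re-implementation, not the code used for this study; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test.

```python
import math
from statistics import mean, variance


def welch_t_test(a, b, n_steps=20000):
    """Two-sided Welch two-sample t-test; returns (t, df, p).

    a, b: samples with at least two values each and nonzero variance.
    n_steps: even number of Simpson intervals for the p-value integral.
    """
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite effective degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))

    # Log of the Student t normalizing constant for df degrees of freedom
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def density(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    # Simpson's rule for P(0 < T < |t|); two-sided p = 1 - 2 * P(0 < T < |t|)
    upper = abs(t)
    h = upper / n_steps
    s = density(0.0) + density(upper)
    for i in range(1, n_steps):
        s += (4 if i % 2 else 2) * density(i * h)
    central = s * h / 3
    return t, df, max(0.0, 1.0 - 2.0 * central)
```

Two samples of five values each, offset by 2.306 and with equal variance, give df = 8 and p close to 0.05, the familiar two-sided critical value of the t distribution.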

Correcting for surface layer gradients
A first problem in comparing the model to the CASTNET surface air observations is the mismatch between the lowest model level midpoint (zm = 65 m above ground) and the level at which the observations are made (z1 = 10 m). The model in fact implicitly simulates an ozone concentration at z1 through the aerodynamic resistance Ra(z1, zm) to turbulent vertical transfer in the resistance-in-series parameterization of dry deposition (Brasseur and Jacob, 2017). The model calculates a local ozone deposition velocity vd(zm) at altitude zm assuming uniformity of the vertical flux down to the surface. We can then infer the implicit model ozone concentration C(z1) at 10 m from the explicit concentration C(zm) at 65 m (Zhang et al., 2012):

$$C(z_1) = C(z_m)\left[1 - R_a(z_1, z_m)\, v_d(z_m)\right] \tag{1}$$

Ra(z1, zm) is calculated in GEOS-Chem by similarity with momentum for a neutral atmosphere (friction velocity u*) with a heat-based stability correction ϕ0(z/L), where L is the Monin-Obukhov length and k is the von Karman constant:

$$R_a(z_1, z_m) = \frac{1}{k\,u_*}\int_{z_1}^{z_m} \frac{\phi_0(z/L)}{z}\,dz \tag{2}$$

Equations (3a-c) describe ϕ0, from Dyer (1974) for unstable and moderately stable conditions (z/L < 1) and from Holtslag et al. (1990) for stable conditions (z/L > 1):

$$\phi_0 = 5 + z/L, \qquad z/L > 1 \tag{3a}$$

$$\phi_0 = 1 + 5\,z/L, \qquad 0 \le z/L \le 1 \tag{3b}$$

$$\phi_0 = \left(1 - 16\,z/L\right)^{-1/2}, \qquad z/L < 0 \tag{3c}$$

The model deposition velocity vd(zm) over the Southeast US during SEAC4RS averages 0.7 ± 0.3 cm s-1 in daytime, consistent with observations (Travis et al., 2016). Applying the correction from equation (1) at the CASTNET sites, we find a mean MDA8 model concentration at 10 m altitude of 45 ± 8 ppb, as compared to 48 ± 9 ppb at 65 m. This correction is purely diagnostic: it does not actually reduce ozone in the model and therefore does not influence ozone in subsequent hours. Correcting the model to 10 m altitude thus decreases the model bias relative to observations by 3 ppb, but a bias of +5 ppb remains. Model MDA8 ozone at 65 m has ten exceedances of the 70 ppb NAAQS for the CASTNET data in Figure 1, compared with one in the observations, and sampling the model at 10 m decreases the number of exceedances to four.
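The diagnostic surface-layer correction of equations (1)-(3) can be sketched in Python as follows. The sketch assumes SI units (heights in m, u* and vd in m s-1, so 0.7 cm s-1 becomes 0.007), a von Karman constant of 0.4, and the standard Dyer (1974) heat form (1 - 16 z/L)^(-1/2) for unstable conditions; these numerical choices are assumptions and may differ in detail from the GEOS-Chem code. The integral of equation (2) is evaluated numerically in ln z, and the function names are hypothetical.

```python
import math

K_VON_KARMAN = 0.4  # assumed value of the von Karman constant


def phi_heat(zeta):
    """Stability correction phi_0(z/L) for heat, as in equations (3a-c)."""
    if zeta > 1.0:
        return 5.0 + zeta               # very stable (Holtslag et al., 1990)
    if zeta >= 0.0:
        return 1.0 + 5.0 * zeta         # neutral to moderately stable (Dyer, 1974)
    return (1.0 - 16.0 * zeta) ** -0.5  # unstable (Dyer, 1974; coefficient assumed)


def aerodynamic_resistance(z1, zm, ustar, obukhov_length, n=2000):
    """Ra(z1, zm) in s m-1: (1/(k u*)) * integral of phi_0(z/L) dz/z from z1 to zm.

    Midpoint rule in s = ln z (since dz/z = ds). Pass math.inf for neutral.
    """
    s1, s2 = math.log(z1), math.log(zm)
    ds = (s2 - s1) / n
    total = 0.0
    for i in range(n):
        z = math.exp(s1 + (i + 0.5) * ds)
        total += phi_heat(z / obukhov_length) * ds
    return total / (K_VON_KARMAN * ustar)


def surface_concentration(c_zm, vd_zm, z1, zm, ustar, obukhov_length):
    """Diagnostic C(z1) = C(zm) * [1 - Ra(z1, zm) * vd(zm)], equation (1).

    vd_zm in m s-1. The factor is clipped at zero only as a numerical guard.
    """
    ra = aerodynamic_resistance(z1, zm, ustar, obukhov_length)
    return c_zm * max(0.0, 1.0 - ra * vd_zm)
```

In neutral conditions the resistance reduces to ln(zm/z1)/(k u*), about 11.7 s m-1 for u* = 0.4 m s-1, so in this illustrative case a 65-m concentration of 48 ppb with vd = 0.7 cm s-1 maps to roughly 44 ppb at 10 m. Stable conditions (L > 0) raise Ra and widen the gap; unstable conditions shrink it.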

Segregating rainy conditions
The most severe bias in comparing the model MDA8 ozone to the CASTNET observations in Figure 1 is in the low tail of the distribution (below 25 ppb): 7 % of observed MDA8 ozone values are below 25 ppb, but the model has only one value below 25 ppb at either 65 or 10 m. This low-tail bias cannot simply be explained by inflow of low-ozone tropical air from the Gulf of Mexico (Fiore et al., 2002; McDonald-Buller et al., 2011), because the model simulation is unbiased over the Gulf of Mexico relative to the SEAC4RS aircraft observations (Travis et al., 2016).
We find instead that the low MDA8 ozone values in the CASTNET observations are associated with rainy conditions, and that rain has less effect on ozone in the model (Figure 2).
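The rain segregation used for Figure 2 classifies each day by its 24-h total rainfall: rainy above 6 mm, dry below 1 mm, with marginal days (1-6 mm) left out of both groups. A minimal sketch of that bookkeeping follows, taking lists of daily MDA8 ozone and matching daily rainfall; the function names are hypothetical, and the `dry_in_both` filter is one possible reading of "dry days" for the afternoon comparison (dry in both model and observations), which is an assumption.

```python
from statistics import mean

RAINY_THRESHOLD_MM = 6.0  # 24-h total above this: rainy day
DRY_THRESHOLD_MM = 1.0    # 24-h total below this: dry day


def classify_day(rain_24h_mm):
    """Classify a day as 'rainy', 'dry', or 'marginal' (1-6 mm, excluded)."""
    if rain_24h_mm > RAINY_THRESHOLD_MM:
        return "rainy"
    if rain_24h_mm < DRY_THRESHOLD_MM:
        return "dry"
    return "marginal"


def mean_by_condition(mda8_ppb, rain_24h_mm):
    """Mean MDA8 ozone for each rain class present in the data."""
    groups = {}
    for ozone, rain in zip(mda8_ppb, rain_24h_mm):
        groups.setdefault(classify_day(rain), []).append(ozone)
    return {label: mean(values) for label, values in groups.items()}


def dry_in_both(obs_rain_mm, model_rain_mm):
    """Hypothetical filter: keep a day only if observations and model are both dry."""
    return classify_day(obs_rain_mm) == "dry" and classify_day(model_rain_mm) == "dry"
```

Keeping the marginal class explicit, rather than folding it into one of the other two, is why the dry and rainy frequencies need not add to 100 %.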

Accounting for diurnal bias
Yet another factor in the model overestimate of MDA8 surface ozone is the poor simulation of the diurnal cycle. Figure 3 shows the average ozone diurnal cycle for dry days in the model and observations at the CASTNET sites from Fig. 1. The observations show a typical diurnal cycle with maximum values in early afternoon (14-16 LT) and a gradual decrease at night to a mean minimum of 17 ppb at 7 LT. The nighttime depletion cannot be due to chemical titration by anthropogenic NO emissions, since the selected CASTNET sites are rural and not located near major roadways. It must instead be due to deposition, including possible titration by short-lived biogenic VOCs (Goldstein et al., 2004; Ruuskanen et al., 2011; Rossabi et al., 2018), under stratified surface layer conditions. The model diurnal cycle at 65 m altitude (lowest model level) has the correct phase, but its amplitude is much too weak. Correcting the model to 10 m altitude increases the amplitude, but nighttime depletion is still too weak. The difference between 65 and 10 m grows rapidly between 16 and 18 LT as the atmosphere becomes stable (L > 0) and the mixed layer collapses, while ozone deposition is still fast because the stomata are open. After the stomata close at night the gradient weakens. We find negligible difference in the model diurnal cycle shown in Figure 3 between August and September. Silva and Heald (2018) show that the low nighttime ozone deposition velocities in the model are consistent with observations, which would include the effect of titration by nighttime emissions of short-lived biogenic VOCs. Lack of a diurnal cycle in modeled anthropogenic emissions has been suggested as a cause of the general underestimate among models of the summertime diurnal amplitude of ozone concentrations (Schnell et al., 2015), but the emissions used here have hourly resolution based on the National Emission Inventory of the US Environmental Protection Agency. We conclude that the insufficient nighttime depletion in the model must be due to insufficient vertical stratification of the surface layer, combined with poor resolution of the correlated timing between the day-night transition to stable conditions and stomatal closure.
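A mean diurnal cycle such as that in Figure 3 is a composite of hourly values by local hour. The sketch below assumes hourly records of (UTC hour, longitude, ozone), converts to local solar time at 1 hour per 15° longitude (the convention this paper applies to the aircraft comparison), rounds to the nearest hour, and averages by hour; the amplitude is taken as the difference between the largest and smallest hourly means. The record layout and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def local_solar_hour(utc_hour, lon_deg):
    """Local solar hour: UTC shifted by 1 h per 15 degrees longitude (east positive),
    rounded to the nearest hour and wrapped into 0-23."""
    return int(round(utc_hour + lon_deg / 15.0)) % 24


def mean_diurnal_cycle(records):
    """Composite mean by local solar hour from (utc_hour, lon_deg, ozone_ppb) records."""
    by_hour = defaultdict(list)
    for utc_hour, lon_deg, ozone in records:
        by_hour[local_solar_hour(utc_hour, lon_deg)].append(ozone)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


def diurnal_amplitude(cycle):
    """Difference between the highest and lowest hourly means (ppb)."""
    return max(cycle.values()) - min(cycle.values())
```

Comparing `diurnal_amplitude` for observations, for the model at 65 m, and for the model at 10 m gives a one-number summary of the weak model nighttime depletion described above, though it hides the timing information that Figure 3 shows.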
A consequence of the insufficient model depletion of ozone at night is that the model may err in the diurnal timing of MDA8 ozone. Fig. 4 shows the pdf of the beginning of the 8-hour interval for MDA8 ozone at the CASTNET sites on dry days, comparing the observations and the model. In the observations the pdf peaks sharply at 11 LT (MDA8 window of 11-18 LT), consistent with the mean diurnal cycle of Figure 3. The model sampled at 65 m also has a maximum probability of MDA8 ozone starting at 11 LT, but also a secondary maximum at 19 LT that is absent from the observations. The latter conditions occur in the model when the atmosphere becomes stable as early as 16 LT, decoupling 65 m from the surface and the associated deposition. Under these conditions the model concentration at 65 m remains high in the evening and at night. Correcting the model calculation of MDA8 to use the 10-m ozone largely removes this secondary maximum (Figure 4) but shifts the peak occurrence of MDA8 two hours earlier (starting at 9 LT), because of the exaggerated model drop at 17 LT when the model atmosphere becomes stable but stomatal deposition of ozone is still active (Fig. 3). The transition from a convective mixed layer to stable nighttime conditions is difficult for models to capture and is an active area of research (Lothon et al., 2014). The correlated timing with stomatal closure further complicates the simulation of the day-night transition in surface ozone. Model error in the simulation of the ozone diurnal cycle due to insufficient nighttime depletion thus induces a representation error when comparing to MDA8 observations, as the MDA8 periods in the model do not correspond to the same times of day as in the observations. This causes a positive bias in the comparison.

Another approach in model evaluation is to focus instead on afternoon conditions, recognizing that the model is inadequate to simulate ozone depletion in the shallow surface layer at night (e.g., Fiore et al., 2002). The right panel of Figure 1 compares simulated and observed pdfs of surface ozone at the CASTNET sites at 12-17 LT on dry days, sampling the model at 10 m altitude. The +8 ppb bias in the original model comparison (center panel) is reduced to an insignificant +1 ppb. Focusing evaluation on afternoon hours can be adequate for understanding general properties of the model ozone budget, such as the response to changes in NOx emissions (Strode et al., 2015), because the stratified surface layer represents only a small volume of atmosphere. However, the problem of simulating the policy-relevant MDA8 surface ozone remains.
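The timing diagnostic of Figure 4 and the afternoon metric of Figure 1 (right panel) can be sketched as follows. To allow evening start hours like the 19 LT model maximum discussed above, this version lets 8-hour windows run into the following day, so it takes 48 hourly values in local time. Ties go to the earliest start hour, and the afternoon mean treats 12-17 LT as inclusive; these conventions and all names are assumptions for illustration, not the paper's processing code.

```python
from collections import Counter
from statistics import mean


def mda8_start_hour(two_days_ppb):
    """Return (start hour, value) of the MDA8 window for the first day.

    Input: 48 hourly values in local time (the day of interest followed by
    the next day), so windows starting late in the day can run past midnight.
    """
    if len(two_days_ppb) != 48:
        raise ValueError("expected 48 hourly values")
    averages = [mean(two_days_ppb[h:h + 8]) for h in range(24)]
    start = max(range(24), key=averages.__getitem__)  # earliest hour wins ties
    return start, averages[start]


def start_hour_pdf(days):
    """Fraction of days whose MDA8 window begins at each local hour."""
    counts = Counter(mda8_start_hour(day)[0] for day in days)
    total = sum(counts.values())
    return {hour: counts[hour] / total for hour in sorted(counts)}


def afternoon_mean(hourly_ppb, first_hour=12, last_hour=17):
    """Mean ozone over first_hour..last_hour LT inclusive (assumed reading of 12-17 LT)."""
    return mean(hourly_ppb[first_hour:last_hour + 1])
```

A day that peaks between 11 and 18 LT yields a start hour of 11, while a day that stays elevated from 19 LT into the early morning yields 19, the kind of evening window the model produces at 65 m when the surface layer decouples early.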

Implications
We identified three modeling problems biasing the comparison to observed maximum daily 8-h average (MDA8) ozone for air quality applications: (1) vertical mismatch between the lowest model level and the altitude of the observations, (2) insufficient vertical stratification and/or ozone loss under rainy conditions, and (3) inadequate representation of the day-night transition to stable conditions, leading to error in the timing of the 8-hour MDA8 window. Problem (1) is readily solved by using the parameterization of surface layer turbulence implicit in the model simulation of dry deposition. Problems (2) and (3) suggest the need for more research in the dynamics of stable boundary layers. Focusing model evaluation on dry afternoon conditions avoids these problems and may be adequate for general testing of the model chemistry. Models should seek a consistent approach for surface fluxes of heat, water vapor, and chemical tracers to improve modeling of the day-night transition. As models improve, better representation of surface ozone under stratified conditions may be achieved so that MDA8 ozone can be predicted with confidence. If so, evaluation of the model diurnal cycle of surface ozone at individual or coherent sites should be an essential step to building that confidence.

Author Contributions
KRT and DJJ designed this study and prepared the manuscript. KRT performed the simulations and analyses.

Fig. 1 (left panel) shows the probability density functions (pdfs) of ozone concentrations measured by the aircraft (12-17 local solar time, LT) and simulated by the model along the flight tracks. Model values are adjusted to local solar time by 1 hour per 15° longitude. The data have been filtered for biomass burning (CH3CN > 200 ppt) and urban plumes (NO2 > 4 ppb).

Figure 2 segregates the frequency distribution of MDA8 ozone at CASTNET sites between rainy days and dry days. Rainy days are defined by 24-h total rainfall exceeding 6 mm and dry days by 24-h total rainfall less than 1 mm. Rainy and dry days are diagnosed in the observations with the high-resolution data from the Parameter-elevation Regressions on Independent Slopes Model (PRISM) climate group (PRISM, 2016), regridded to the model resolution of 0.25° × 0.3125°. Rainy and dry days in the model are diagnosed from the GEOS-FP data. Observed ozone on rainy days averages 9 ppb lower than on dry days (33 vs. 42 ppb). Model ozone is also lower on rainy days, but not by as much (41 vs. 46 ppb). Rainy conditions can cause MDA8 ozone to drop below 20 ppb in the observations but not in the model. Depletion of surface ozone under rainy conditions is not due to wet scavenging, given the low solubility of ozone in water, but likely reflects vertical stratification from surface evaporative cooling. Rainfall or dew may also enhance the non-stomatal component of ozone dry deposition (Finkelstein et al., 2000; Altimir and Kolari, 2006; Potier et al., 2017), but the mechanism for this enhancement is uncertain. Comparing the 10-m model MDA8 concentration to observations excluding rainy days decreases the model mean bias modestly from +5 ppb to +4 ppb, but more importantly it excludes the low tail of the observed distribution that the model cannot capture.

Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2019-78. Manuscript under review for journal Geosci. Model Dev. Discussion started: 4 April 2019. © Author(s) 2019. CC BY 4.0 License.

Figure 1: Probability density functions (pdfs) of ozone concentrations in the Southeast US (94.5-80° W, 29.5-38° N) in August-September 2013, sampled at the blue locations in the maps inset. Observations are compared to GEOS-Chem model values sampled at the same locations and times. Means and standard deviations are given inset for each pdf. The left panel shows afternoon (12-17 local solar time) mixed layer values from the SEAC4RS DC-8 aircraft at 0.4-1.0 km altitude. Ozone measurements are from the NOAA NOyO3 four-channel chemiluminescence (CL) instrument (Ryerson et al., 1998). The center panel shows MDA8 surface ozone at the CASTNET network of 13 rural sites, compared to the model sampled at 65 m above ground (lowest model gridpoint; dashed line) and the inferred model value at 10 m (solid line), as described in the text. The right panel shows afternoon ozone at the CASTNET sites, excluding days with rain in either the model or the observations.

Figure 2: Probability density functions (pdfs) of MDA8 ozone at CASTNET sites in the Southeast US in August-September 2013, segregating rainy and dry days as described in the text. The model is sampled at 10 m altitude to match observations, as described in Section 3. For each sky condition, the mean ozone and its standard deviation are given inset, with the frequency of that sky condition in parentheses. The probabilities of dry and rainy conditions do not add to 100 % because we do not include marginal days where rainfall is between 1 and 6 mm.

Figure 3: Mean diurnal cycle of ozone and related surface variables at the 13 Southeast US CASTNET sites in Figure 1 for August-September 2013. Ozone observations in the top left panel are compared to GEOS-Chem values sampled at 65 m altitude (lowest model level) and at 10 m altitude (where the observations are sampled). Other panels show the mean 10-m ozone deposition velocity in GEOS-Chem, the median Monin-Obukhov length L in the GEOS-FP data used to drive GEOS-Chem, and the mean mixed layer depth in the GEOS-FP data.

Figure 4: Timing of MDA8 ozone at the Southeast US CASTNET sites in August-September 2013. The figure shows the probability density functions (pdfs) of the beginning hour of the 8-hour period defining the MDA8 ozone value for each day. Only dry days (24-h precipitation less than 1 mm) are included.