A database and tool for boundary conditions for regional air quality modeling: description and evaluation

Transported air pollutants are receiving increasing attention as regulations tighten and global concentrations increase. The need to represent international transport in regional air quality assessments requires improved representation of boundary concentrations. Currently available observations are too sparse vertically to provide boundary information, particularly for ozone precursors, but global simulations can be used to generate spatially and temporally varying lateral boundary conditions (LBC). This study presents a public database of global simulations designed and evaluated for use as LBC for air quality models (AQMs). The database covers the contiguous United States (CONUS) for the years 2001–2010 and contains hourly varying concentrations of ozone, aerosols, and their precursors. The database is complemented by a tool for configuring the global results as inputs to regional-scale models (e.g., the Community Multiscale Air Quality model or the Comprehensive Air Quality Model with Extensions). This study also presents an example application based on the CONUS domain, which is evaluated against satellite-retrieved ozone and carbon monoxide vertical profiles. The results show that performance is largely within uncertainty estimates for ozone from the Ozone Monitoring Instrument and carbon monoxide from the Measurements Of Pollution In The Troposphere (MOPITT), but there are some notable biases compared with Tropospheric Emission Spectrometer (TES) ozone. Compared with TES, our ozone predictions are high-biased in the upper troposphere, particularly in the south during January. This publication documents the global simulation database, the tool for conversion to LBC, and the evaluation of concentrations on the boundaries. This documentation is intended to support applications that require representation of long-range transport of air pollutants.


Introduction
The role of hemispheric transport of air pollutants is increasingly a focus of regional pollution studies (Lin et al., 2000, 2012; Reidmiller et al., 2009). The growing emphasis reflects three factors: (1) air quality regulations are tightening, (2) concentrations of hemispherically transported pollutants are increasing (Cooper et al., 2010; Fiore et al., 2009; Oltmans et al., 2006, 2010), and (3) long-range transport can have a strong episodic influence (Fiore et al., 2002). Thus, model attainment demonstrations must achieve lower pollutant concentration fields with a higher uncontrollable fraction. Under these conditions, it is imperative for the model to include long-range transported air pollution and to accurately represent its variability in time and space. Long-range transported air pollutants are primarily communicated to air quality models (AQM) through the lateral boundary conditions (LBC). This paper documents the development and availability of a resource that provides LBC for the air quality modeling community. Surface-level ozone concentrations have a 10-15 ppb sensitivity to LBC values even in locations relatively far from the boundary (Napelenok et al., 2011). Much of this sensitivity can be attributed to high mixing ratios (O3 = 100-1000 ppb) in the upper troposphere/lower stratosphere (Krueger and Minzner, 1976; Lacis et al., 1990; Warneck and Williams, 2012). The high concentrations aloft are influenced by local emissions, international transport (Dentener et al., 2010; Lin et al., 2012), and stratosphere-troposphere exchange (Bourqui et al., 2012; Cui et al., 2009; Lefohn et al., 2011). The LBC, particularly at high altitude, are the mechanism for communicating each of these sources to the contiguous domains often used in regional air quality simulations.
Previously, LBC have come from a variety of sources and have been evaluated only indirectly. The Community Multiscale Air Quality (CMAQ; Foley et al., 2010) model originally used "clean air" estimates or observations averaged over space and time, preserving the vertical dimension where possible (e.g., ozone based on Logan et al., 1999). These vertical profile lateral boundary conditions (PLBC) have obvious limitations. The observations used to construct them are sparse in space and time, so interpolation and extrapolation are unavoidable, and variability in space and time is lost. Although "clean air" estimates are still commonly used (Gégo et al., 2008; Godowitch et al., 2008; Smyth et al., 2009; Zhang et al., 2004), publications increasingly recognize these limitations and take advantage of the growing availability of global simulations, which provide estimates of air pollution concentrations with time resolution ranging from hourly to seasonal means (Appel and Gilliland, 2006; Barna and Knipping, 2006; Fu et al., 2009; Hogrefe et al., 2008; Jiménez et al., 2007; Lam and Fu, 2009; Nghiem and Oanh, 2008; Schichtel et al., 2005; Valari et al., 2011). By themselves, these global simulations are too coarse for regional/urban air quality standard attainment demonstrations, but they offer a potential source of LBC for regional/urban AQM (Appel and Gilliland, 2006; Lam and Fu, 2009; Song et al., 2008). The importance of evaluating LBC is evident in sensitivity analyses (Barna and Knipping, 2006; Jiménez et al., 2007; Napelenok et al., 2008), but most LBC evaluations are indirect. When modeling the contiguous United States (CONUS), most of the LBC lie over water, where, as mentioned above, observational data are scarce. As a result, the accuracy of the LBC inputs is evaluated at alternate locations. For example, Lam and Fu (2009) first evaluated model predictions at three ozonesonde sites over the CONUS (Trinidad Head, CA; Boulder, CO; Huntsville, AL). They then indirectly evaluated the fitness of the LBC based on model performance at surface locations. Although the many degrees of freedom in air quality models make it difficult to isolate the influence of the LBC, this type of indirect evaluation has been necessary. Even these indirect evaluations concluded that GEOS-Chem LBC (GLBC) outperformed clean air profiles and climatological averages (Appel and Gilliland, 2006; Lam and Fu, 2009; Song et al., 2008). This conclusion gives some credence to the GLBC values, but in this report we further evaluate the GLBC using space/time-coincident measurements from satellite retrievals.
This document is structured according to the process of creating and evaluating LBC. The first section describes the GEOS-Chem simulations used to create a database of global concentration fields for LBC. The second section documents the design, components, and functionality of the tool that creates GLBC from GEOS-Chem output. The third section details the methods and results of evaluating GLBC using satellite observations. The conclusions review the usability of the tool and the fitness of the database results. Finally, we discuss the availability of the LBC tool and global simulation database for the community.

GLBC simulation database
While LBC may be improved by global atmospheric modeling, the development and testing of global models is beyond the resources and scope of many air quality modeling studies. To provide users of regional AQM with global model information for boundary conditions in regional domains, a series of GEOS-Chem simulations has been conducted, and the results are available for download and conversion to regional-model-ready boundary files. GEOS-Chem is a research-grade atmospheric model with scientific groups across the world continuously improving the model code, chemistry formulation, and input information. (Details of the ongoing work on GEOS-Chem can be found at the model wiki page: http://wiki.seas.harvard.edu/geos-chem/.) Continual improvements to the model and the variety of chemistry, meteorology, and emission options within GEOS-Chem pose a challenge for regional air quality modelers in choosing the optimal model setup for generating LBC.
To address this, we have conducted a series of GEOS-Chem simulations at 2° × 2.5° horizontal resolution spanning multiple model release versions and input options.
Hourly concentrations for North America from all of these simulations are archived and available for download. Because of data storage considerations, only the hourly values for grid cells containing and surrounding the contiguous United States are archived (Fig. 1). Plans are underway to expand availability to global coverage. To reduce computational burden, GEOS-Chem combines many chemical species into "tracer" groups for advection. These tracer groups are then converted back into chemical species ("cspec") during the chemical calculations. Since some chemical species are important when mapped to regional models (Pye and Napelenok, 2013), both the GEOS-Chem tracer and cspec arrays are included in the LBC archive. All simulations are based on GEOS-Chem's NOx-Ox-hydrocarbon-aerosol application with the optional Secondary Organic Carbon Aerosol module enabled. An update to the chemistry mechanism between versions v8-02-01 and v8-02-04 contained a bug fix that led to a decrease in simulated ozone chemical loss. Because modeled ozone concentrations already have high biases in North America (Mao et al., 2013), this bug fix may lead to increased ozone biases in regional models. Improvements to halogen and heterogeneous aerosol chemistry have shown promise in reducing this high bias (Mao et al., 2013), but because these updates are still underway, they are not included in the production simulations used here. Our v8-02-03 GEOS-Chem simulations follow the recommended settings. A Sparse Matrix Vectorized Gear-based solver (SMVGEAR; Jacobson and Turco, 1994) is employed to solve the stiff system of ordinary differential equations representing emissions and chemistry. Vertical transport uses non-local planetary boundary layer mixing, and cloud convection is enabled.

… version 9 of the GEOS-Chem model. For simulations after 2004, GEOS-Chem version 8 with the GEOS-5 assimilated meteorology dataset (Molod et al., 2012) is recommended. See the GEOS-Chem documentation (http://acmg.seas.harvard.edu/geos/) for a description of changes between model releases. Overall, a mature version 8 of GEOS-Chem with the GEOS-5 meteorology dataset is recommended, because some of the changes in the latest released version 9 have not been as thoroughly evaluated. Additional details of the model setup for each available simulation are listed in Table 1.

GLBC tool description
Model compound translation (GEOS-Chem to regional model compounds) and spatial mapping of the global output to LBC are handled by two distinct components of the GLBC tool. Model compound translation is performed by a Python (python.org) preprocessor, and a Fortran program handles spatial mapping. A flowchart of the overall program is shown in Fig. 2, and each component is described below.

Python pre-processing
The Python pre-processor interprets model configurations and user inputs to apply appropriate scaling. Both GEOS-Chem and CMAQ have several chemistry/aerosol configurations that continue to evolve. The pre-processor interprets configuration files, provides failsafe measures to prevent mapping of incorrect model versions, and highlights potential errors. In addition, the pre-processor applies unit conversions where needed.

… processor requires the tracer_info.dat file. The final input is a user configuration file, described further below.
Mapping between GEOS-Chem and CMAQ species requires human interpretation. Each model has its own definition of gas-phase and aerosol-phase speciation, and even common compounds are named inconsistently (e.g., formaldehyde = FORM = HCHO = CH2O). The default compound-mapping file (shown as a csv file in green in Fig. 2) is described in detail below to facilitate user creation of new mapping files. For the most common configurations of GEOS-Chem and CMAQ, species mappings are already provided for several chemical mechanisms (e.g., Carbon Bond '05 and SAPRC07T; provided in supplemental Tables A1 and A2), so no manual interpretation is necessary for these mechanisms. Ideally, any new mapping configuration files will be submitted back to the software package for subsequent distribution to other users. The mapping file contains one or more lines for each output boundary species. Each line represents an algebraic transformation, excluding unit conversion, which is mostly automatic. The numbered lines below are example lines from the species-mapping file, with the regional model (e.g., CMAQ) species listed first, followed by the global model (GEOS-Chem) formula. Mapping assumes that the formula is based on GEOS-Chem tracers; if a name is not found in the tracer file, the species (cspec) file is searched. Line 1 is currently configured for the GEOS-Chem tracer file. The GEOS-Chem version 8 tracer file does not include ozone explicitly, but rather Ox (odd oxygen). The "cspec" file includes ozone explicitly as "O3", so if line 1 is replaced with "O3, O3", the mapping tool would first try to find O3 in the tracer file, fail, and then find "O3" in the "cspec" file.
Caution is advised when using values contained in the "cspec" file.For example, in the stratosphere, the "cspec" file does not contain meaningful values.These values are generally not updated or accessed by the GEOS-Chem simulation, and should not be used for LBC if information is available in the tracer file.
Line 2 illustrates a difference between the quantities stored in CMAQ LBC files and GEOS-Chem tracer files.ALD2, or acetaldehyde, is stored as parts per billion of carbon (ppbC) in GEOS-Chem and ppb in CMAQ.Since acetaldehyde has two carbons, the GEOS-Chem value must be halved for use by CMAQ.
Lines 4-8 demonstrate that additional lines are additive. Aerosol species in GEOS-Chem, such as wind-blown mineral dust and sea-salt, are speciated into individual CMAQ aerosol constituents (Appel et al., 2013), and lines 3-7 demonstrate how GEOS-Chem aerosols such as SALC and DST2 are mapped, based on CMAQ emission profiles, to coarse-mode sulfate. Because the lines are additive, lines 3-7 could be rewritten as the single line 8, but lines 3-7 and line 8 should not both be included, or the contribution would be counted twice. The mapping expressions can include all standard Python operators (+, -, *, /, **, %, etc.), but math functions (e.g., sin) are not currently available.
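The mapping rules above (regional species first, a GEOS-Chem expression second, tracer lookup with a cspec fallback, additive repeated lines, operators but no functions) can be sketched as follows. This is a minimal illustration, not the GLBC preprocessor itself: the function names `names_in`, `lookup`, and `apply_mapping` are hypothetical, the concentrations are plain numbers standing in for gridded arrays, and the numeric factors in the usage example are made up.

```python
import ast
from collections import defaultdict


def names_in(expr):
    """Return the variable names referenced by a mapping expression."""
    tree = ast.parse(expr, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}


def lookup(name, tracers, cspec):
    """Resolve a GEOS-Chem name: tracer data first, then cspec data."""
    if name in tracers:
        return tracers[name]
    if name in cspec:
        return cspec[name]
    raise KeyError(f"{name!r} not found in tracer or cspec data")


def apply_mapping(lines, tracers, cspec):
    """Evaluate 'REGIONAL, expression' lines; repeated species are summed."""
    out = defaultdict(float)
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        species, expr = (part.strip() for part in raw.split(",", 1))
        # Every name must resolve to data, so a call such as sin(Ox) fails.
        env = {name: lookup(name, tracers, cspec) for name in names_in(expr)}
        # Empty __builtins__: arithmetic operators work, built-ins do not.
        out[species] += eval(expr, {"__builtins__": {}}, env)
    return dict(out)


# Hypothetical values (ppb or ppbC) and hypothetical mapping factors.
tracers = {"Ox": 40.0, "ALD2": 0.4, "SALC": 10.0, "DST2": 5.0}
cspec = {"O3": 38.0}
mapping = ["O3, Ox", "ALD2, ALD2 / 2", "ASO4K, 0.1 * SALC", "ASO4K, 0.02 * DST2"]
print(apply_mapping(mapping, tracers, cspec))
```

Replacing `"O3, Ox"` with `"O3, O3"` makes the sketch fall back to the cspec value (38.0), mirroring the tracer-then-cspec search described above; the two `ASO4K` lines are summed, which is why a combined line and its component lines should not coexist.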
There are five types of factors that might be applied:

2. Conversion of real species to CB05/SAPRC mechanism species (e.g., multiplying ACETONE by 3 for PAR).
3. Conversion of tracers in ppbC to ppb (e.g., dividing benzene by 6).
5. Conversion to regional model units.
Types 1 and 2 require algebraic expressions in the mapping file. Type 3 does not require expressions because the Python preprocessor automatically converts ppbC to ppb. Type 4 is a special case of type 3 in which the regional model's conversion to ppb must be overridden in the mapping file. Type 5 conversions are applied automatically, converting ppb to µg m−3 for aerosols and ppb to ppm for gas-phase species.
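The automatic conversions of types 3 and 5 reduce to simple arithmetic, sketched below. This is an illustration only: the function names are hypothetical, and using air density for the aerosol conversion is an assumption motivated by the fact that the METBDY3D input carries air density; the tool may compute the air number density differently.

```python
M_AIR = 0.028964  # molar mass of dry air, kg mol-1


def ppbc_to_ppb(ppbc, n_carbon):
    """Type 3: carbon-weighted mixing ratio to molecular mixing ratio."""
    return ppbc / n_carbon


def ppb_to_ppm(ppb):
    """Type 5, gas phase: ppb to ppm."""
    return ppb / 1000.0


def ppb_to_ugm3(ppb, mw_g_per_mol, air_density_kg_m3):
    """Type 5, aerosol: ppb to µg m-3 using an assumed air density input."""
    mol_air_per_m3 = air_density_kg_m3 / M_AIR
    mol_species_per_m3 = ppb * 1e-9 * mol_air_per_m3
    return mol_species_per_m3 * mw_g_per_mol * 1e6  # g -> µg


# Acetaldehyde has 2 carbons, benzene 6; sulfate molar mass ~96.06 g/mol.
print(ppbc_to_ppb(0.4, 2), ppb_to_ppm(40.0), ppb_to_ugm3(1.0, 96.06, 1.2))
```

At an air density of 1.2 kg m−3, 1 ppb of sulfate corresponds to roughly 4 µg m−3; because the conversion scales with density, the same mixing ratio maps to smaller mass concentrations at upper boundary layers.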

Fortran spatial mapping
The Fortran-based spatial mapping program uses 3 required inputs and 2 optional inputs.The software first requires the output from the species mapping Python preprocessor described above.The species mapping is simply applied in concert with the spatial mapping.
The software also requires a GEOS-Chem tracer output file and, optionally, a chemical species output file. The GEOS-Chem files have sufficient metadata to identify each file's spatial location and extent based on the well-documented GEOS-Chem domains (Yantosca et al., 2012). The vertical coordinate is specified in the GEOS_DOMAIN.INC file, which rewrites the GEOS-Chem hybrid-eta coordinates as a sigma-P coordinate.
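One way to express hybrid-eta levels as sigma-P values is sketched below. Hybrid level pressures follow p = Ap + Bp·ps, and a sigma-P coordinate is sigma = (p − ptop)/(ps − ptop). Because a hybrid level's sigma depends on surface pressure, this sketch assumes a single reference surface pressure (1013.25 hPa) and takes the lowest pressure as the model top; both are assumptions, and the contents of GEOS_DOMAIN.INC may be derived differently. The Ap/Bp values in the example are hypothetical, not an actual GEOS-Chem level table.

```python
import numpy as np


def hybrid_to_sigma(ap_hpa, bp, ps_ref_hpa=1013.25):
    """Express hybrid-eta level pressures as fixed sigma-P values.

    Assumes p = Ap + Bp * ps evaluated at one reference surface pressure,
    and that the lowest resulting pressure is the model top (ptop).
    """
    ap = np.asarray(ap_hpa, dtype=float)
    bp = np.asarray(bp, dtype=float)
    p = ap + bp * ps_ref_hpa
    ptop = p.min()
    return (p - ptop) / (ps_ref_hpa - ptop)


# Hypothetical level edges, surface (Bp = 1) to model top (Bp = 0).
sigma = hybrid_to_sigma([0.0, 20.0, 50.0, 10.0, 0.01], [1.0, 0.9, 0.5, 0.05, 0.0])
print(sigma)
```

The surface edge maps to sigma = 1 and the model top to sigma = 0, the same convention used by the sigma-P coordinate carried in METBDY3D, which is what makes the two vertical grids directly comparable.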
Finally, the software requires a meteorological input file, METBDY3D, produced by a CMAQ utility (the Meteorology-Chemistry Interface Processor, MCIP; Otte and Pleim, 2010), which contains sufficient information to describe the centroid location of each boundary cell, its vertical location on a sigma-P coordinate, and air density. Using the centroid locations from METBDY3D, the software uses a nearest-neighbor approach to identify the corresponding GEOS-Chem row and column. Figure 1 shows the intersection of an example boundary and the GEOS-Chem outputs. Using the sigma-P coordinates (native and derived), the method then linearly interpolates concentrations from GEOS-Chem layer centers to CMAQ layer centers. Simulations using coarse vertical resolution may need to reduce the influence of ozone LBC aloft. For example, previous work has shown that coarse vertical resolution can cause bias due to high ozone near the tropopause (Lam and Fu, 2009). We include tools for excluding stratospheric air from LBC, but do not recommend their use unless specifically desired.
Exclusion of stratospheric air has been suggested on the basis that AQM do not explicitly treat the stratosphere (Lam and Fu, 2009).Since then, there has been more work identifying the importance of stratospheric air in air quality (e.g., Lefohn et al., 2011).Air quality models have increased their vertical extent and now often include stratospheric influence, if not stratospheric air (e.g., Carlton et al., 2010).To account for the stratosphere, efforts have been made to scale the upper layer concentrations based on stratospheric indicators (Lin et al., 2008).As such, LBCs that specifically exclude stratospheric air are not consistent with the need to include stratospheric influence in air quality models.Further, reports show that vertically coarse models, like that used in Lam and Fu (2009), previously transported too much aloft air to the surface.This suggests that, while stratospheric air is an important contributor to variability, previous models would have optimal solutions that minimized aloft LBC values.The use of indirect evaluation, like interior domain surface concentrations, is inherently subject to canceling errors (e.g., Oreskes et al., 1994).

GLBC evaluation
This section describes the evaluation of GLBC using satellite retrievals. While ozonesondes are often considered the gold standard for evaluating satellite products (Nassar et al., 2008; Worden et al., 2007), they are not available at the boundary locations. In this analysis, we evaluate the LBC ozone values using Tropospheric Emission Spectrometer (TES) satellite retrievals. The TES instrument, on board the Aura satellite, uses nadir-viewing infrared Fourier transform spectroscopy to retrieve ozone vertical profiles (Bowman et al., 2011). We use version 4 (V004), which has improved performance compared with the V001 product evaluated by Worden et al. (2007) but has a 5-15 % high bias, consistent with Nassar et al. (2008). Although the evaluation below is performed in an absolute sense, the interpretation of these results must account for this unresolved TES high bias.
To evaluate the model, we pair TES observations with GEOS-Chem grid cells from two years, 2006 and 2008. January results are selected to represent winter, and August results to represent the traditional ozone season. The GEOS-Chem grid cells are filtered to only those used in creating CONUS boundary conditions (see Fig. 1). Grid cells are paired with TES retrievals when the swath centroid is contained within the cell. After pairs have been identified, the GEOS-Chem prediction is processed using Eq. (1), which is adapted from Bowman et al. (2011, Eqs. 5-8).
where all $y$ values are the natural log of the ozone mixing ratio in ppb, and $\hat{y}^{i,m}_t$ is the model retrieval that can be directly compared with the TES retrieval. In the evaluation shown here, the results have all been converted back to ozone mixing ratios. Although the absolute value of $\hat{y}^{i,m}_t$ depends on the prior ($y^{i}_{t,c}$), a comparison between $\hat{y}^{i,m}_t$ and the TES retrieval ($\hat{y}^{i}_t$) does not (Bowman et al., 2011). This independence is shown mathematically in the TES User Guide.
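For orientation, the standard TES observation operator applies the averaging kernel $A$ to the model profile in log space, $\hat{y}^{i,m}_t = y^{i}_{t,c} + A^{i}_{t}\,(y^{i,m} - y^{i}_{t,c})$, where $y^{i,m}$ is the log of the model ozone profile. The sketch below assumes this form for Eq. (1) and that the model profile has already been mapped to the TES pressure levels (Bowman et al. include that mapping step, which is omitted here). The function name and inputs are hypothetical.

```python
import numpy as np


def tes_operator(model_ppb, prior_ppb, avg_kernel):
    """Apply a TES-style observation operator to a model ozone profile.

    Works in ln(ppb): y_hat = y_c + A (y_model - y_c), then converts back
    to ppb. Assumes the model profile is already on the TES levels.
    """
    y_model = np.log(np.asarray(model_ppb, dtype=float))
    y_prior = np.log(np.asarray(prior_ppb, dtype=float))
    A = np.asarray(avg_kernel, dtype=float)
    y_hat = y_prior + A @ (y_model - y_prior)
    return np.exp(y_hat)


# With A = 0.5 * I the result is the geometric mean of model and prior.
print(tes_operator([60.0, 80.0], [40.0, 20.0], 0.5 * np.eye(2)))
```

The limiting cases show why the comparison depends on instrument sensitivity: with $A = I$ the model profile passes through unchanged, and with $A = 0$ only the prior is returned, so levels where TES has little sensitivity carry little model information.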
The evaluation has been performed on groups of individual retrievals with similar bias features. Based on TES swath centroid locations, there are a total of 2139 GEOS-Chem/TES pairs during the comparison period. For these comparisons, biases were initially reviewed for 24 categories (3 yr × 2 months × 4 perimeter cardinal edges). From those categories, four emerged as distinct cases. The difference between years was minor and is not highlighted here, but is included in the Appendix.

…-A8). The difference between perimeter edges was most pronounced between north and south. As a result, the west and east perimeters have been bisected equally and allocated to either north or south. The west and east boundaries, with their northern and southern halves, are shown in the Appendix. As discussed further below, the north/south divide dominates the bias in a somewhat offsetting manner. As a result, the overall performance on the west and east boundaries (Figs. A1 and A2) is nominally better than on either the south or north boundary.
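The grouping of paired retrievals into north/south by month categories, with the east and west edges bisected, can be sketched as below. The bisecting latitude `lat_split` is a hypothetical input (for example, the latitude midpoint of the east and west edges); the text does not state the exact value, and the example pairs are invented.

```python
from collections import Counter


def category(edge, lat, month, lat_split):
    """Assign one paired retrieval to a '<north|south>-<month>' category.

    East and west edges are bisected at lat_split (hypothetical input)
    and folded into the north or south group.
    """
    if edge in ("north", "south"):
        half = edge
    elif edge in ("east", "west"):
        half = "north" if lat >= lat_split else "south"
    else:
        raise ValueError(f"unknown edge: {edge!r}")
    return f"{half}-{month}"


def count_categories(pairs, lat_split):
    """Count pairs per category; pairs are (edge, lat, month) tuples."""
    return Counter(category(e, lat, m, lat_split) for e, lat, m in pairs)


pairs = [("north", 49.0, "January"), ("west", 45.0, "January"),
         ("east", 30.0, "August"), ("south", 25.0, "August")]
print(count_categories(pairs, lat_split=37.0))
```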
Figure 3 shows ozone (ppb) for the remaining four categories (northern-January, southern-January, northern-August, southern-August) from GEOS-Chem, GEOS-Chem retrievals, and TES retrievals. To aid interpretation, GEOS-Chem biases larger than the TES observation uncertainty are highlighted with triangles on the y axis (red = high; blue = low). The mean and range of the profiles show good correspondence most of the time. In January, mixing ratios and variability increase with altitude. In August, mixing ratios increase with altitude, while variability decreases. In the north, the model and TES have lower mixing ratios than the prior above 700 hPa, which demonstrates the sensitivity of this retrieval to the model/TES signal.
In Fig. 3, the two main areas of bias can be seen in more detail. The first is evident along the northern boundary in both January and August, where GEOS-Chem shows a low bias relative to TES in the upper troposphere (350-125 hPa). The second is on the southern boundary in January, which is high-biased in the upper troposphere (above the 350 hPa level, i.e., at pressures below 350 hPa). This high bias is found only in winter.
To further explore these aggregate biases, Fig. 4 shows the distribution of individual retrieval biases, expressed as the ratio of retrieval mixing ratios (i.e., in ppb). To reiterate, this type of comparison does not depend on the prior, only on the sensitivity of the instrument. Figure 4 shows the range of biases rather than the range of concentrations. As in Fig. 3, the southern boundary in August has the best performance: 53 % of the southern-August results are within ±10 % (80 % within ±20 %). The northern-August results show slightly …
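The "within ±10 %" and "within ±20 %" statistics correspond to counting model/TES ratios whose distance from 1 is at most the tolerance. A minimal sketch with invented mixing ratios (not the evaluation data) follows; the function name is hypothetical.

```python
import numpy as np


def fraction_within(model_ppb, tes_ppb, tolerance):
    """Fraction of paired retrievals whose model/TES ratio is within 1 +/- tolerance."""
    ratio = np.asarray(model_ppb, dtype=float) / np.asarray(tes_ppb, dtype=float)
    return float(np.mean(np.abs(ratio - 1.0) <= tolerance))


model = [52.0, 60.0, 57.0, 70.0, 80.0]  # invented model retrievals (ppb)
tes = [50.0, 60.0, 60.0, 60.0, 60.0]    # invented TES retrievals (ppb)
print(fraction_within(model, tes, 0.10), fraction_within(model, tes, 0.20))
```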


Conclusions
We describe and evaluate a tool for using global simulations to produce LBC for regional air quality models. In general, the LBC have biases that are acceptable given the uncertainties of the TES retrievals. Our evaluation showed better performance in August than in January. A high bias was found in the upper troposphere on the southern boundary in January, when ozone lifetimes are longer. The altitude and season of the bias suggest an overestimation of long-distance transport. Whether the error comes from simulated transport or emissions is not known, but Southeast Asian emissions are suspect given their high uncertainty.
We propose that the presented tool provides a resource to better represent global transport through boundary conditions in regional air quality studies. The evaluation demonstrates the fitness of the LBC produced by the tool.


A2 Species mapping for gas-phase
See Tables A1 and A2.

A3 Species mapping for CMAQ aerosols
The CMAQ AERO6 aerosol module generally contains more detailed information on aerosol speciation and size than standard GEOS-Chem output. As a result, factors are applied to GEOS-Chem aerosols to convert them appropriately to CMAQ-ready boundary conditions. The conversions we recommend are shown in Table A3 and discussed below.
Both sea-salt and dust in GEOS-Chem contain size information. Accumulation-mode (SALA) and coarse-mode (SALC) sea-salt from GEOS-Chem are matched with the accumulation (J) and coarse (K) modes in CMAQ. Based on the particle sizes of the four GEOS-Chem dust size bins, the smallest dust bin (DST1) is mapped to the accumulation mode, while all other bins (DST2-4) are mapped to the coarse mode. Speciation of sea-salt into trace metals and other aerosol constituents is based on the same speciation profile that CMAQ uses for sea-salt emissions diagnosed within the model. The speciation of wind-blown mineral dust also follows a CMAQ speciation profile, based on a composite of four desert dust profiles (Appel et al., 2013).
Sulfate, nitrate, and ammonium aerosol in GEOS-Chem (Park et al., 2004; Pye et al., 2009) do not explicitly contain size information, but are generally assumed to be representative of the accumulation mode. As a result, 99 % of sulfate, nitrate, and ammonium is assigned to the accumulation (J) mode, while 1 % is attributed to the Aitken (I) mode. Sulfate formed on sea-salt (SO4s) and nitrate formed on sea-salt (NO3s) (Alexander, 2005) are mapped to the CMAQ coarse mode. Of the primary carbonaceous aerosols from GEOS-Chem, 99.9 % are attributed to the accumulation mode and 0.1 % to the Aitken mode, consistent with CMAQ emissions processing. Both hydrophobic (BCPO) and hydrophilic (BCPI) forms of black carbon in GEOS-Chem are summed and mapped to elemental carbon (EC). Similarly, hydrophobic and hydrophilic organic carbon are summed and mapped to primary organic carbon. The non-carbon organic matter (NCOM) associated with primary organic aerosols is not calculated by GEOS-Chem, so an OM/OC ratio of 1.4 is assumed for boundary condition purposes (Park, 2003).
Although CMAQ and GEOS-Chem both treat secondary organic aerosol from the same set of parent hydrocarbons, the species lumping schemes differ.In CMAQ, lumping is based on precursor hydrocarbon identity as well as volatility while the GEOS-Chem SOA lumping scheme (Chung, 2002;Henze et al., 2008;Liao et al., 2007) generally does not separate based on volatility.The mapping of SOA as well as gas-phase semivolatiles is based on identifying the equivalent parent hydrocarbon in each model.Speciation to the different volatility species within CMAQ is based on the expected relative amounts of each species in outflow of the Eastern US as predicted by a typical CMAQ simulation.
The particle number and surface area for the boundary conditions are calculated in the Fortran code based on the mass mapped into each mode.
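As an illustration of how number and surface area can follow from mode mass, the sketch below uses the standard lognormal moment relations: the third moment is $M_3 = 6m/(\pi\rho)$, number is $N = M_3 / (D_g^3 e^{4.5\ln^2\sigma_g})$, and surface area is $\pi N D_g^2 e^{2\ln^2\sigma_g}$. The particle density, geometric mean diameter $D_g$, and geometric standard deviation $\sigma_g$ are hypothetical inputs; the values and procedure used in the Fortran code may differ.

```python
import math


def mode_number_and_surface(mass_ugm3, density_kgm3, dg_m, sigma_g):
    """Number (m-3) and surface area (m2 m-3) of a lognormal mode holding a given mass."""
    mass_kgm3 = mass_ugm3 * 1e-9                          # µg -> kg
    m3 = 6.0 * mass_kgm3 / (math.pi * density_kgm3)      # third moment, m3 m-3
    ln2 = math.log(sigma_g) ** 2
    number = m3 / (dg_m ** 3 * math.exp(4.5 * ln2))       # zeroth moment
    m2 = number * dg_m ** 2 * math.exp(2.0 * ln2)         # second moment
    return number, math.pi * m2


# Hypothetical accumulation-mode parameters.
print(mode_number_and_surface(2.0, 1500.0, 3e-7, 1.7))
```

For a monodisperse mode ($\sigma_g = 1$) these reduce to the familiar sphere relations, which is a quick sanity check on the exponents.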
The following CMAQ aerosol species boundary conditions are not mapped because there is no analogous GEOS-Chem model species: AOLGBJ, AOLGAJ, AALKJ, SV_ALK, ACORS. Aerosol water is also not mapped, as it is readily computed within CMAQ and does not need to be transported.

… Vertical gray lines delineate the ±10 % (fine) and ±20 % (heavy) bias ranges.

Figure S2 (Fig. A2): Individual retrieval bias for 2006-2008 for east and west boundaries shown as boxplots for each altitude bin in the TES product. Whiskers indicate min/max, the box represents the interquartile range, the blue line in the box is the median, and the red cross is the mean. Vertical gray lines delineate the ±10 % (fine) and ±20 % (heavy) bias ranges.

Figure S3: Same as Figure S1 for 2006 northern and southern boundaries.

Table 1. GEOS-Chem annual simulations for CMAQ boundaries (recommended in bold).

Table A2. SAPRC07 species mapping in the form: SAPRC07 species, GEOS-Chem expression.