Twelve-month, 12 km resolution North American WRF-Chem v3.4 air quality simulation: performance evaluation

We present results from, and evaluate the performance of, a 12-month, 12 km horizontal resolution year 2005 air pollution simulation for the contiguous United States using the WRF-Chem (Weather Research and Forecasting with Chemistry) meteorology and chemical transport model (CTM). We employ the 2005 US National Emissions Inventory, the Regional Atmospheric Chemistry Mechanism (RACM), and the Modal Aerosol Dynamics Model for Europe (MADE) with a volatility basis set (VBS) secondary organic aerosol module. Overall, model performance is comparable to contemporary modeling efforts used for regulatory and health-effects analysis, with an annual average daytime ozone (O3) mean fractional bias (MFB) of 12 % and an annual average fine particulate matter (PM2.5) MFB of −1 %. WRF-Chem, as configured here, tends to overpredict total PM2.5 at some high-concentration locations and generally overpredicts average 24 h O3 concentrations. Performance is better at predicting daytime-average and daily peak O3 concentrations, which are more relevant than annual average values for regulatory and health-effects analyses. Predictive performance for PM2.5 subspecies is mixed: the model overpredicts particulate sulfate (MFB = 36 %), underpredicts particulate nitrate (MFB = −110 %) and organic carbon (MFB = −29 %), and relatively accurately predicts particulate ammonium (MFB = 3 %) and elemental carbon (MFB = 3 %), so that the accuracy of total PM2.5 predictions is to some extent a function of offsetting over- and underpredictions of PM2.5 subspecies. Model predictive performance for PM2.5 and its subspecies is in general worse in winter and in the western US than in other seasons and regions, suggesting spatial and temporal opportunities for future WRF-Chem model development and evaluation.


Introduction
Epidemiological studies have established the importance of health effects from acute and chronic exposure to fine particulate matter (PM2.5) and ground-level ozone (O3) (Jerrett et al., 2009; Krewski et al., 2009; Pope III and Dockery, 2006). The accuracy of health-impact predictions for future air pollutant emissions (e.g., Tessum et al., 2012, 2014) depends in part on the performance of air quality models over long timescales and in all seasons. Accurate health-impact predictions often depend on model simulations that cover large geographic areas, such as the contiguous US, so as to capture the full impacts of the long-range transport of pollutants (Levy et al., 2003). Whereas chemical transport model (CTM) simulations for a full year for the contiguous US often use 36 km horizontal grids (e.g., Tesche et al., 2006; Yahya et al., 2014), increasing horizontal grid resolution to 12 km can result in more accurate prediction of pollutant concentrations (Fountoukis et al., 2013) and population exposure. However, increasing horizontal resolution from 36 to 12 km in a CTM typically results in a ∼ 27-fold increase in computational intensity (the number of grid cells increases ninefold and the number of time steps increases threefold).
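The factor of ∼ 27 follows from simple arithmetic, under the standard assumption (implied above) that the model time step shrinks in proportion to the horizontal grid spacing:

\left(\frac{36\ \mathrm{km}}{12\ \mathrm{km}}\right)^{2} \times \frac{36\ \mathrm{km}}{12\ \mathrm{km}} = 9 \times 3 = 27.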
Although recent CTM evaluation efforts have focused on 12-month, contiguous US model evaluations (Galmarini et al., 2012), CTM performance at 12 km or finer horizontal grid size for an entire year for the contiguous US is largely unexplored in the peer-reviewed literature. We know of only one such study: Appel et al. (2012) evaluated the performance of the Community Multiscale Air Quality (CMAQ) model (Foley et al., 2010) in reproducing year 2006 concentrations of PM2.5 and O3 for the contiguous US. In a second study (not peer reviewed), the US EPA (2012) describes model evaluation for PM2.5 concentrations for year 2007, also for the contiguous US and using CMAQ. Our study contributes to this literature by evaluating a different model with different parameterizations over a different time period. We also provide greater investigation of how model performance varies in space, in time, and by chemical species.
We employ and evaluate the performance of WRF-Chem (the Weather Research and Forecasting model with Chemistry) (Grell et al., 2005) for year 2005 for a North American domain. WRF-Chem is functionally similar to CMAQ but differs from the version used by Appel et al. (2012) in that WRF-Chem predicts meteorological quantities and air pollutant concentrations simultaneously, allowing meteorological quantities to be updated more frequently as the model runs and allowing representation of interactions between meteorology and air pollution. WRF-Chem users can follow a simplified modeling workflow that does not require running a separate meteorological model. Combined meteorology and chemical transport models can be more computationally demanding than standalone CTMs; however, for the domain and settings used here, meteorological modeling accounts for only ∼ 10 % of the total computational expense.
Table A1 summarizes spatial and temporal aspects of recent chemical transport model evaluation efforts, with a focus on WRF-Chem evaluations in the US. WRF-Chem performance in predicting air quality observations has been extensively quantified for simulations of individual regions of the US, with simulation periods of several weeks or months (Ahmadov et al., 2012; Chuang et al., 2011; Fast et al., 2006; Grell et al., 2005; McKeen et al., 2007; Misenis and Zhang, 2010; Zhang et al., 2010, 2012). One study evaluated WRF-Chem performance for a full year for the contiguous US with a 36 km grid (Yahya et al., 2014). We present here WRF-Chem results from a full-year, 12 km resolution simulation for the contiguous US, evaluate the performance of the model against ambient measurements, and compare WRF-Chem performance to published goals and criteria (Boylan and Russell, 2006) and to recent CMAQ results for a similar simulation (Appel et al., 2012).

Model setup
We run the WRF-Chem model version 3.4 using a 12 km resolution grid with 444 rows, 336 columns, and 28 vertical layers. The modeling domain (see Fig. 1) covers the contiguous US, southern Canada, and northern Mexico. Previous studies (e.g., Appel et al., 2012; Yahya et al., 2014) have used 34 vertical layers; our choice of 28 vertical layers represents a tradeoff between vertical grid resolution and computational expense.
Within WRF-Chem, we use the Regional Atmospheric Chemistry Mechanism (RACM) (Stockwell et al., 1997) for gas-phase reactions and the Modal Aerosol Dynamics Model for Europe (MADE) (Ackermann et al., 1998) module for aerosol chemistry and physics. RACM and MADE were selected because of their relatively modest computational expense; at the time of this study, alternatives to RACM/MADE were impractical for large-scale simulations such as ours. We use the volatility basis set (VBS) (Ahmadov et al., 2012) to simulate formation and evaporation of secondary organic aerosol (SOA). The VBS approach differs from other SOA parameterizations in that it assumes that primary organic aerosol (POA) is semi-volatile. Meteorology options are set as recommended by the WRF user manual (Wang et al., 2012) and the WRF-Chem user manual (Peckham et al., 2012) for situations similar to those studied here. Table 1 summarizes the model options and inputs used. See the Supplement for additional details.
We use results from the MOZART global chemical transport model (Emmons et al., 2010), as processed by the MOZBC file format converter (available at: http://web3.acd.ucar.edu/wrf-chem), to provide initial and boundary conditions for chemical species. Because the MOZBC boundary conditions for unclassified PM2.5 are unrealistic at the southeastern edges of the modeling domain (their use results in substantial PM2.5 overpredictions in the southeastern US), we set all initial and boundary concentrations of unclassified PM2.5 to zero. As in Ahmadov et al. (2012), owing to uncertainty in secondary organic aerosol (SOA) concentrations over the open ocean, we assume that initial and boundary concentrations of SOA are zero. Data from the National Centers for Environmental Prediction (NCEP) Eta model (UCAR, 2005) provide meteorological inputs, boundary conditions, and, for the four-dimensional data assimilation (FDDA) employed here, observational "nudging" values.
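As an illustration of how such initial- and boundary-condition adjustments can be made, the following is a minimal sketch (not the code used in this study) that zeroes selected species in WRF-Chem input files using the netCDF4 Python library; the species name prefixes below are placeholders, since the actual variable names depend on the MADE/VBS configuration.

```python
# Minimal sketch (assumed variable names): zero selected chemical species in
# WRF-Chem initial-condition (wrfinput) and boundary-condition (wrfbdy) files.
from netCDF4 import Dataset

SPECIES_PREFIXES = ("p25", "soa")  # hypothetical prefixes for unclassified PM2.5 and SOA


def zero_species(path, prefixes=SPECIES_PREFIXES):
    """Set every variable whose name starts with one of `prefixes` to zero.

    Boundary files store species with suffixes such as _BXS, _BXE, _BTXS, ...,
    so prefix matching catches those as well.
    """
    with Dataset(path, "r+") as nc:
        for name, var in nc.variables.items():
            if name.lower().startswith(prefixes):
                var[:] = 0.0


zero_species("wrfinput_d01")
zero_species("wrfbdy_d01")
```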
We use the 2005 National Emissions Inventory (NEI) (US EPA, 2009) to estimate pollutant emissions. The NEI includes emissions from area, point, and mobile sources for year 2005 in the US, year 2006 in Canada, and year 1999 in Mexico. We process emissions with the Sparse Matrix Operator Kernel Emissions (SMOKE) model (Houyoux and Vukovich, 1999), version 2.6, as bundled with the NEI data (available at: http://www.epa.gov/ttn/chief/emch/index.html); we then convert the emission files output by SMOKE to WRF-Chem format and apply a plume-rise algorithm (ASME, 1973, as cited in Seinfeld and Pandis, 2006) to estimate the mixing height of elevated emission sources and wildfires. Source code for the file format conversion and plume-rise program is available at https://bitbucket.org/ctessum/emcnv.
We simulate atmospheric pollutant concentrations for the period from 1 January through 31 December 2005. We choose the year 2005 because, at the time this study was performed, it was the most recent year for which emissions data were available. For logistical expediency, we separate the year into eight independent model runs, each approximately 1.5 months in length plus a discarded 5-day model spin-up period. We run the simulations on a high-performance computing system consisting of 2.8 GHz Intel Xeon X5560 "Nehalem EP" processors with a 40 Gbit QDR InfiniBand (IB) interconnect and a Lustre parallel file system. Using 768 processors, each 1.5-month model run takes ∼ 19 h to complete (∼ 13 processor-years for each annual model run).
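As a rough check on the stated cost (assuming all eight runs use the full 768 processors):

768\ \text{processors} \times 19\ \mathrm{h} \times 8\ \text{runs} \approx 1.2 \times 10^{5}\ \text{processor-hours} \approx 13\ \text{processor-years}.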

Comparison with observations
We compare WRF-Chem wind speed, air temperature, relative humidity, and precipitation predictions to observations from the US Environmental Protection Agency (EPA) Clean Air Status and Trends Network (CASTNET). We compare modeled ground-level concentrations of total PM2.5 to EPA Air Quality System (AQS) observations (US EPA, 2005), using 24 h average data (EPA parameter code 88101) and the less extensive hourly measurement network (EPA parameter code 88502), which allows us to compare modeled vs. measured diurnal profiles. We compare WRF-Chem predictions of O3 to measurements from the AQS (EPA parameter code 44201) and CASTNET networks. We compare predictions of PM2.5 subspecies to observation data from the EPA's Chemical Speciation Network (CSN) (US EPA, 2005) (formerly the Speciation Trends Network, STN) for organic carbon (OC, parameter code 88305), elemental carbon (EC, code 88307), particulate sulfate (SO4, code 88403), particulate nitrate (NO3, code 88306), and particulate ammonium (NH4, code 88301). We additionally compare predictions to data from the Interagency Monitoring of Protected Visual Environments (IMPROVE) network (University of California Davis, 1995) for particulate OC (code 88320), EC (code 88321), sulfur (code 88169), and NO3 (code 88306), and to CASTNET observations for particulate SO4, NH4, and NO3. WRF-Chem outputs organic aerosol (OA) concentrations, whereas the available measurement methods quantify only OC. OC comprises a variable fraction of OA, but it is common to assume an OA:OC ratio of 1.4 (Aiken et al., 2008); we therefore divide WRF-Chem OA predictions by a factor of 1.4 for comparison with OC measurements. Finally, we compare WRF-Chem predictions of gas-phase sulfur dioxide (SO2) and nitrogen dioxide (NO2) to AQS observations. We remove from consideration those stations with ≥ 25 % missing data relative to the number of scheduled measurements during the simulation period. The fractions of excluded data for each type of comparison are given in the Supplement.
WRF-Chem, as configured here, outputs instantaneous concentrations at the start of each hour, whereas the observation data are reported as hourly or daily averages. WRF-Chem calculates grid-cell-average concentrations, whereas observations generally represent concentrations at specific locations.
We compare measured and modeled values pair-wise at each time of measurement in the grid cell containing each measurement station. The 24 h average measurements are compared to the average of the modeled (hourly instantaneous) values within the same period. Comparisons are only made with observations that occur within the first (nearest to ground) model layer (height: ∼ 50-60 m). The source code for the program used to extract and pair model and measurement data is available at https://bitbucket.org/ctessum/aqmcompare.
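The pairing procedure can be summarized with a short sketch (a simplified illustration, not the released aqmcompare code; the data layout and the nearest-cell lookup are assumptions):

```python
# Minimal sketch: pair each observation with the modeled value from the grid
# cell containing the monitor, averaging instantaneous hourly model output
# over each observation's averaging period.
import numpy as np
import pandas as pd


def grid_index(lon, lat, grid_lons, grid_lats):
    """Return (row, col) of the grid cell whose center is nearest the monitor.
    A full implementation would use the model's map projection instead."""
    dist2 = (grid_lons - lon) ** 2 + (grid_lats - lat) ** 2
    return np.unravel_index(np.argmin(dist2), dist2.shape)


def pair(obs, model, grid_lons, grid_lats):
    """obs: DataFrame with columns [site, lon, lat, start, end, value].
    model: dict mapping hourly timestamps to 2-D ground-level concentration arrays."""
    rows = []
    for rec in obs.itertuples():
        i, j = grid_index(rec.lon, rec.lat, grid_lons, grid_lats)
        hours = pd.date_range(rec.start, rec.end, freq="1h", inclusive="left")
        vals = [model[t][i, j] for t in hours if t in model]
        if vals:
            rows.append((rec.site, rec.start, rec.value, float(np.mean(vals))))
    return pd.DataFrame(rows, columns=["site", "time", "observed", "modeled"])
```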

Aggregation of results
In addition to reporting annual average model performance for the entire model domain, we also disaggregate results spatially and temporally. We evaluate performance using two spatial approaches. First, we use four regional subdomains: Midwest, Northeast, South, and West (basis: US Census regions (US Census Bureau, 2013); see Fig. 2). Second, we evaluate urban vs. rural (i.e., not urban) locations, also as defined by the US Census (US Census Bureau, 2014). CSN monitors tend to be placed in urban areas (85 % of 186 monitors are urban), whereas IMPROVE monitors tend to be placed in protected rural areas (10 % of 122 monitors are urban). All 67 monitors in the CASTNET network are in rural locations. We also split the analysis into four seasons: winter (January-March), spring (April-June), summer (July-September), and fall (October-December). Employing these time periods allows us to compare against previously published results (Appel et al., 2012).
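For concreteness, the temporal grouping can be expressed as a small helper (a sketch only; the monitor-to-region assignment is assumed to come from a separate lookup built from the Census region definitions):

```python
# Minimal sketch of the seasonal grouping used here:
# winter = Jan-Mar, spring = Apr-Jun, summer = Jul-Sep, fall = Oct-Dec.
def season(month):
    return ("winter", "spring", "summer", "fall")[(month - 1) // 3]

# Example use with the paired model/observation records from the sketch above:
# paired["season"] = paired["time"].dt.month.map(season)
# paired["region"] = paired["site"].map(site_to_census_region)  # hypothetical lookup table
# seasonal_stats = paired.groupby(["season", "region"]).apply(compute_metrics)
```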

Performance metrics
After matching all measured values with their corresponding modeled values, and averaging modeled and measured values across the appropriate time period, we calculate the metrics shown in Eqs. (1)-(8), where i corresponds to one of n measurement locations; M and O are time-averaged modeled and observed values, respectively; MB is mean bias; ME is mean error; NMB is normalized mean bias; NME is normalized mean error; MFB is mean fractional bias; MFE is mean fractional error; MR is model ratio; and RMSE is root-mean-square error. We additionally calculate the squared correlation coefficient (R2). Each metric provides a useful and distinct evaluation of model performance. In general, metrics with "bias" in the name evaluate the accuracy of the model, whereas metrics with "error" in the name incorporate both precision and accuracy. Metrics in normalized or fractional form tend to emphasize errors where modeled and observed values are relatively small, whereas non-normalized metrics tend to emphasize errors where modeled and observed values are relatively large. We mainly focus here on MFB and R2 to evaluate performance, as they facilitate direct comparisons among pollutants. Results for all combinations of time periods, measurement networks, spatial subdomains, and metrics are in the Supplement.
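In standard form (the exact expression for MR, e.g. a mean of pairwise ratios versus a ratio of means, is an assumption here; MFB and MFE are typically reported as percentages), Eqs. (1)-(8) are:

\mathrm{MB} = \frac{1}{n}\sum_{i=1}^{n}(M_i - O_i), \qquad \mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left|M_i - O_i\right|

\mathrm{NMB} = \frac{\sum_{i}(M_i - O_i)}{\sum_{i} O_i}, \qquad \mathrm{NME} = \frac{\sum_{i}\left|M_i - O_i\right|}{\sum_{i} O_i}

\mathrm{MFB} = \frac{2}{n}\sum_{i=1}^{n}\frac{M_i - O_i}{M_i + O_i}, \qquad \mathrm{MFE} = \frac{2}{n}\sum_{i=1}^{n}\frac{\left|M_i - O_i\right|}{M_i + O_i}

\mathrm{MR} = \frac{1}{n}\sum_{i=1}^{n}\frac{M_i}{O_i}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(M_i - O_i)^{2}}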
Model performance goals and criteria have been published for PM2.5 (Boylan and Russell, 2006). Goals reflect performance that models should strive to achieve; criteria reflect performance that models should achieve to be used for regulatory purposes. The goals and criteria suggested by Boylan and Russell (2006) vary with concentration: they are MFB less than ±30 and ±60 % and MFE less than 50 and 75 %, respectively, for most concentrations, but the allowable values increase exponentially as concentration decreases below ∼ 3 µg m-3. To incorporate this aspect of performance evaluation, we calculate the fraction of observation stations for which our PM2.5 model results meet both the MFB and MFE performance goals (fG) and criteria (fC).
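A minimal sketch of the fG and fC calculation, using only the flat portions of the Boylan and Russell (2006) thresholds (the concentration-dependent relaxation below ∼ 3 µg m-3 is omitted here for simplicity), is:

```python
# Minimal sketch: fraction of monitoring stations meeting the flat PM2.5
# performance goals (|MFB| <= 30 %, MFE <= 50 %) and criteria
# (|MFB| <= 60 %, MFE <= 75 %) of Boylan and Russell (2006).
def meets(mfb, mfe, mfb_limit, mfe_limit):
    return abs(mfb) <= mfb_limit and mfe <= mfe_limit


def goal_criteria_fractions(station_stats):
    """station_stats: sequence of (MFB %, MFE %) tuples, one per monitor."""
    n = len(station_stats)
    f_goal = sum(meets(b, e, 30.0, 50.0) for b, e in station_stats) / n
    f_criteria = sum(meets(b, e, 60.0, 75.0) for b, e in station_stats) / n
    return f_goal, f_criteria
```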

Results
Figure 1 shows modeled annual average concentrations of PM2.5 and O3, where the edges of the maps represent the edges of the modeling domain. An animated version of Fig. 1, showing pollutant concentration as a function of time, is available in the Supplement. Maps of additional pollutants, as well as monthly, weekly, and diurnal maps and profiles of population-weighted average concentrations, are also available in the Supplement. Modeled O3 concentrations over water in the Gulf of Mexico and along the Atlantic coast tend to be higher than concentrations over the adjacent land areas. As only areas over water appear to be affected (as Fig. 2b shows, O3 overpredictions along the Gulf of Mexico and Atlantic coasts are not greater than overpredictions further inland), this over-water anomaly in the Gulf of Mexico should not adversely impact estimates of population-weighted concentrations.
Figure 2 shows monitor locations for total PM2.5 and for O3, as well as annual average mean fractional bias (MFB) values at each monitor. Results in Fig. 2a (PM2.5) display high spatial variability, with no obvious spatial patterns in model performance; large overpredictions are sometimes adjacent to large underpredictions (e.g., in southern Louisiana and Florida). WRF-Chem generally overpredicts daytime O3 concentrations relative to observations (Fig. 2b). Monitor locations for meteorological variables, PM2.5 subspecies, and other gas-phase species are shown in Fig. A1.

Meteorological performance
Figure 3 contains scatterplots comparing annual average observed and predicted values for meteorological variables and pollutant concentrations. The model tends to overpredict near-ground wind speed (Fig. 3a) and precipitation (Fig. 3d) relative to observations, whereas temperature (Fig. 3b) and relative humidity (Fig. 3c) predictions agree well with observations. Figures A2-A5 in Appendix A disaggregate model performance for meteorological variables by region (region boundaries are shown in Fig. 2) and by season; meteorological performance is relatively consistent among seasons and regions. Model-measurement comparisons provide important evidence on model performance but might overestimate model robustness for meteorological parameters because FDDA "nudges" model meteorological estimates toward observed values.

PM2.5 and O3 performance
The annual average model-measurement agreement is good for total PM2.5 concentration (Fig. 3e; 94 % of measurements meet performance criteria), although the model tends to overpredict PM2.5 concentration at relatively high-concentration monitors (Fig. 3e). The model generally tends to overpredict O3 concentrations, with worse overpredictions for 24 h average concentrations (Fig. 3f) than for daily peak (Fig. 3g) and daytime average (Fig. 3h) concentrations.
Figure 4 shows the median and interquartile range for modeled and measured PM2.5 and O3 concentrations by hour of day (measurements of PM2.5 subspecies are only available as 24 h averages). For PM2.5, the model generally agrees with measurements, although on average it underpredicts concentrations at night and overpredicts during the day (Fig. 4a). For O3, on average the model overpredicts at all times of day, but with a much lower fractional error during the day than during the night. For both pollutants, the model accurately captures the timing of diurnal trends, including the afternoon peak for O3 and the morning and evening peaks for PM2.5. As a result, when comparing the three averaging-time metrics for O3, we observe better model performance for the annual average of daily peak concentration (MFB = 11 %) and of average daytime concentration (MFB = 12 %) than for the overall annual average (MFB = 23 %). For O3, the first two metrics may offer greater relevance than the third. For example, the annual average of daily peak concentrations is more strongly correlated with health effects than are annual average concentrations (Jerrett et al., 2009); and, for comparisons to the 8 h peak concentration National Ambient Air Quality Standard (NAAQS), model performance is more important during daytime than at night.
Figures 5 and 6 disaggregate results by season and by location for total PM2.5 and daytime O3, respectively; analogous results are in Figs. 7-11 for PM2.5 subspecies, in Figs. A2-A5 in Appendix A for meteorological properties, in Figs. A6 and A7 for other O3 temporal summaries, in Fig. A8 for SO2, and in Fig. A9 for NO2. Daytime and peak O3 predictive performance does not exhibit obvious patterns among seasons or regions; MFB values range from −7 to 48 % (daytime; Fig. 6) and from −12 to 29 % (peak; Fig. A7). The overprediction of PM2.5 concentrations at high-concentration monitors is more prevalent in the South and in urban areas, and is less prevalent in summer than in other seasons (Fig. 5). Model-measurement correlation for total PM2.5 is higher in summer (AQS R2 = 0.64) than in fall and winter (AQS R2 = 0.20 and 0.24, respectively), but overall PM2.5 concentrations are not higher in summer. Previous research has suggested that poor PM predictive performance in winter is common among CTMs and may be attributable to difficulty in reproducing the strongly stable meteorological conditions that are responsible for high winter PM concentrations (Solazzo et al., 2012). PM2.5 predictive performance in the West (AQS R2: 0.45 in summer, 0.13 in winter) is worse than performance in the Northeast (AQS R2: 0.70 in summer, 0.37 in winter). In the Northeast, performance is better in summer (R2 = 0.69) than in other seasons (R2 = 0.30-0.40). Taken together, these findings suggest an opportunity for future PM2.5 model development to focus on winter or full-year simulations rather than summer-only simulations, and on the western US or the full contiguous US rather than just the Northeast.

PM2.5 subspecies performance
Figure 3i-m illustrates model performance for annual average concentrations of PM2.5 component species. In all cases, > 65 % of locations meet performance criteria for at least one of the three observation networks. The model overpredicts particulate SO4 (CSN MFB = 34 %, IMPROVE MFB = 40 %, CASTNET MFB = 36 %) (Fig. 3i) and SO2 (MFB = 51 %) (Fig. 3n). This finding (overprediction of total sulfur) agrees with prior research for multiple CTMs (McKeen et al., 2007). Particulate SO4 prediction performance does not vary much by region; as with total PM2.5, performance is worse in winter (CSN MFB = 59 %) than in summer (CSN MFB = 10 %) (Fig. 7).
WRF-Chem as configured here performs well in predicting observed particulate NH4 concentrations, with 99 % of locations meeting performance criteria (Fig. 3j). Similar to total PM2.5, performance for particulate NH4 is worst in urban areas in the West region (Fig. 8), where a number of monitors report relatively high measured concentrations but modeled concentrations are relatively low.
Particulate NO3 concentrations are consistently underpredicted (annual average MFB = −110 %) (Fig. 3k). Figure 9 shows that these underpredictions are more severe in some seasons and regions than in others. The best predictive performance is for the Midwest in summer (MFB = −39 %), followed by the Northeast in summer (MFB = −47 %). NO3 predictions in the West region are poor in all seasons (MFB = −148 %). As with other PM2.5 species, there is an opportunity for future development and evaluation of models for particulate NO3 prediction to focus on seasons and regions other than summer in the Northeast. Predictions of gas-phase NO2 (Fig. 3o) agree relatively well with observations (MFB = 4 %) but, as with other species, the model tends to overpredict NO2 concentrations in areas where measured concentrations are relatively high. This effect is especially prominent in the West and in urban areas (Fig. A9).
Model-measurement agreement for EC concentrations is relatively good (Fig. 3l), with 96 % of monitor locations meeting performance criteria. As with other comparisons, for EC the model tends to overpredict concentrations at monitors with relatively high concentrations, especially in urban areas (Fig. 10).
Model predictions of OC concentrations (Fig. 3m) are biased low compared to CSN (MFB = −55 %) but agree relatively well with IMPROVE (MFB = 15 %). Mean bias values given here are within the range of values reported by a previous publication using the VBS SOA formation mechanism (Ahmadov et al., 2012). As shown in Fig. 11, the differences in model-measurement agreement between the two networks do not appear to depend on urban vs. rural monitor location. Instead, they may reflect between-network differences in sampling or analysis; different analysis techniques are known to produce widely varying OC concentrations (Cavalli et al., 2010).

Comparison with other studies
Table 2 compares the performance of WRF-Chem as configured here to that of the CMAQ model in a similar modeling effort by Appel et al. (2012). CMAQ as configured by Appel et al. (2012) in most cases predicts O3 observations with greater accuracy and precision than does WRF-Chem as configured here, while WRF-Chem in most cases does a better job of predicting PM2.5. However, given the many differences in physical and chemical parameterizations and input data (including a difference in simulation year), the observed differences may or may not be generalizable. Instead, our conclusion from Table 2 is that the two models are generally comparable in performance.
Table A2 compares WRF-Chem results from this study to results from Yahya et al. (2014) for a 12-month, contiguous US WRF-Chem simulation with a 36 km horizontal resolution spatial grid. NME results from the simulation performed here are lower (i.e., better) than those reported by Yahya et al. (2014) for most pollutants and measurement networks, but NMB results are more mixed. Because horizontal grid resolution, input data, and model parameters all differ between the two studies, we are not able to determine the cause of the differences in results.

Discussion
We simulated and evaluated PM2.5 and O3 concentrations based on 12-month (year 2005) WRF-Chem modeling for the United States. The spatial and temporal extent investigated, and the horizontal spatial resolution (12 km) employed, are nearly unprecedented; to our knowledge, only one prior peer-reviewed CTM evaluation has used a comparable extent and resolution (Appel et al., 2012). We find that WRF-Chem performance as configured here is generally comparable to that of other models used in regulatory and health impact assessments, in that model performance is similar to that reported by Appel et al. (2012) and, in most cases, meets the criteria for air quality model performance suggested by Boylan and Russell (2006).
There is potential for further improvement in model accuracy, especially in these cases: PM2.5 concentrations in winter and in the western US, ground-level O3 at night and in the summer, and particulate nitrate. The good agreement between total PM2.5 predictions and observations in some cases reflects offsetting over- and underpredictions, including by species (Fig. 3) and by time of day (Fig. 4a). Performance in predicting concentrations of PM2.5 and its subspecies tends to be worst in winter and in the western US. Overall, WRF-Chem as configured here meets the performance criteria described above for total PM2.5 concentrations at 94 % of monitor locations.
The WRF-Chem meteorological and chemical settings employed here are reasonable and justified, but different settings may also be reasonable. Improved understanding of how alternative parameterizations might impact model performance in large-scale applications such as ours is an area for continued research. Another area for future research is identifying opportunities to evaluate model performance in terms of how changes in emissions cause changes in outdoor concentrations.

Table A1. Temporal and spatial aspects of recent model evaluations, focusing on WRF-Chem and North America.

Figure 1. Modeled annual average ground-level (a) PM2.5 and (b) O3 concentrations. For ease of viewing, the color scales contain a break at the 99th percentile of concentrations.

Figure 2. AQS, AQS hourly, and CASTNET monitor locations and annual average fractional bias for (a) total PM2.5 and (b) daytime average O3 concentrations. Corresponding information for other pollutants and variables is in Fig. A1.

Figure 3. Annual average modeled and measured ground-level (a-d) meteorological variables and (e-o) pollutant concentrations. Colored lines show linear least-squares fits of the data for the measurement networks with corresponding colors. Grey lines show model to measurement ratios of 2:1, 1:1, and 1:2. Annual average performance statistics are listed to the right of each plot; acronyms are defined in the methods section.

Figure 4. Median values (lines) and interquartile ranges (shaded areas) of annual average modeled values, observed values, and fractional error by hour of day for (a) PM2.5 and (b) O3.

Figure 5. Comparison of measured and modeled PM2.5 concentrations, disaggregated by season and region. Region boundaries are shown in Fig. 2.

Figure 6. Comparison of measured and modeled annual average daytime O3 concentrations, disaggregated by season and region. Region boundaries are shown in Fig. 2.

Figure 7. Comparison of modeled and measured particulate SO4 concentrations, disaggregated by region and season.

Figure 8. Comparison of modeled and measured particulate NH4 concentrations, disaggregated by region and season.

Figure 9. Comparison of modeled and measured particulate NO3 concentrations, disaggregated by region and season.

Figure 10. Comparison of modeled and measured particulate EC concentrations, disaggregated by region and season.

Figure 11. Comparison of modeled and measured particulate OC concentrations, disaggregated by region and season.

Figure A4. Comparison of modeled and measured relative humidity, disaggregated by region and season.

Figure A5. Comparison of modeled and measured precipitation, disaggregated by region and season.

Figure A7. Comparison of modeled and measured average daily peak O3 concentrations, disaggregated by region and season.

Figure A8. Comparison of modeled and measured SO2 concentrations, disaggregated by region and season.

Figure A9. Comparison of modeled and measured NO2 concentrations, disaggregated by region and season.

Table 1. Selected WRF-Chem v3.4 settings and parameters employed in this study.

Table 2. WRF-Chem and CMAQ seasonal O3 and PM2.5 prediction performance.