Evaluation of the United States National Air Quality Forecast Capability experimental real-time predictions in 2010 using Air Quality System ozone and NO2 measurements

1 NOAA Air Resources Laboratory (ARL), NOAA Center for Weather and Climate Prediction, 5830 University Research Court, College Park, MD 20740, USA
2 Cooperative Institute for Climate and Satellites, University of Maryland, College Park, MD 20740, USA
3 Center for Spatial Information Science and Systems (CSISS), George Mason University, Fairfax, VA 22030, USA
4 NOAA/NWS/NCEP/EMC, NOAA Center for Weather and Climate Prediction, 5830 University Research Court, College Park, MD 20740, USA
5 I.M. Systems Group, Rockville, MD 20852
6 NOAA/NWS/OST, Silver Spring, MD 20910, USA


T. Chai et al.: 2010 NAQFC ozone and NO2 evaluation

Introduction
The US National Air Quality Forecast Capability (NAQFC) started as a joint effort between the National Oceanic and Atmospheric Administration (NOAA) and the US Environmental Protection Agency (EPA) to provide advance notice of air pollution events with potential adverse health effects. By linking the National Centers for Environmental Prediction (NCEP) Eta model with the Community Multi-scale Air Quality (CMAQ) modeling system, the NAQFC began providing next-day predictions of ground-level ozone concentrations at a 12 km horizontal grid resolution for the northeastern US in 2004 (Otte et al., 2005; Ryan et al., 2005). In 2005, the CMAQ coverage was expanded to include the states east of the Rocky Mountains (Pleim and Mathur, 2005; Davidson et al., 2008; Eder et al., 2009). The next NAQFC phase, operationally deployed in 2007, expanded coverage to the contiguous United States (CONUS) and replaced the hydrostatic Eta model with the non-hydrostatic mesoscale model (NMM) within the Weather Research and Forecasting (WRF) framework (Eder et al., 2009). A prediction system that includes the aerosol module version 4 (AERO-4) and the Carbon Bond CB05 gas-phase chemical mechanism (Sarwar et al., 2008) was initially tested in 2006 (Gorline and Lee, 2009a) and has been producing experimental ozone predictions for several years. Since 2007, both operational and experimental prediction systems have been continuously updated (Stajner et al., 2012).
The real-time operational NAQFC predictions, which rely on the Carbon Bond Mechanism version IV (CBMIV) gas-phase chemical mechanism (Gery et al., 1989), are accessible through NOAA's website at http://airquality.weather.gov/. These operational ozone predictions are used by state and local environmental agencies as a basis for the air quality forecasts that they issue in terms of the Air Quality Index (AQI) to protect public health from impending poor air quality. The public also obtains operational hour-by-hour predictions directly from this website. Vulnerable members of the public use NAQFC predictions to protect their health by adjusting their daily activities or medications.
The experimental NAQFC ozone predictions, accessible at http://airquality.weather.gov/expr/, are produced using the newer CB05 chemical mechanism. Because the experimental predictions showed higher ozone biases than the operational predictions through 2011 (Saylor and Stein, 2012), they have not yet been transitioned to operations. Our study provides a detailed evaluation of the experimental predictions of ozone and of a precursor species, nitrogen dioxide (NO2), in order to understand and improve the performance of the experimental prediction system, with a view towards its potential transition to operations.
The large amount of information created by continuous predictions is amenable to the study of chemical transport model (CTM) performance. A careful evaluation of the model predictions over CONUS may help researchers better understand, assess, and improve chemical mechanisms, coupling methods between the meteorological model and the CTM, and emission inventories along with their processing algorithms.
The NAQFC ozone predictions up to 2009 have been extensively evaluated. Eder et al. (2006) compared the daily maximum eight-hour ozone predictions for the northeastern US with AIRNow observations (http://www.epa.gov/airnow) from 1 June to 30 September 2004. They found that the NAQFC system overpredicted ozone, with a domain-averaged mean bias (MB) of +10.2 ppbv and a root mean square error (RMSE) of 15.7 ppbv. The NAQFC predictions in the expanded eastern US domain during the warm season from 2004 to 2007 were evaluated using AIRNow observations (Eder et al., 2009). The operational NAQFC predictions were found to improve steadily from year to year, as demonstrated by decreases in MB and RMSE. The four-month MBs in the eastern US are +11.4, +10.9, +10.5, and +7.9 ppbv in 2004, 2005, 2006, and 2007, respectively. Correspondingly, the RMSEs are 16.8, 16.3, 15.6, and 14.5 ppbv. They also showed that the MB and RMSE for the whole CONUS domain in the summer months (June, July, and August) of 2007 are +4.3 and 13.0 ppbv, respectively. The CONUS categorical statistical metrics for the same three-month period in 2007 were presented using both the 84 and the 75 ppbv daily maximum eight-hour ozone standards. With the 75 ppbv standard, the proportion of correct (POC), critical success index (CSI) or threat score (TS), hit rate (HIT), and false alarm rate (FAR) are 0.924, 0.232, 0.425, and 0.663, respectively. More recently, the NAQFC ozone predictions during the summers of 2007, 2008, and 2009 were compared with the AIRNow measurements by Gorline and Lee (2009b). In their study, the 2007 operational ozone predictions with the CBMIV chemical mechanism were evaluated, while the 2008 and 2009 predictions were obtained from the experimental predictions using the CB05 chemical mechanism. They found that the MB in August 2009 was about 2 ppbv higher than that in August 2008, and about 5 ppbv higher than the MB in August 2007. The unusually cool summer of 2009 was suggested as a contributing factor to the deteriorating predictions in 2009. Saylor and Stein (2012) presented the NAQFC predictions in 2009 from both operational and experimental versions. They showed that the use of CB05 in the experimental version systematically increased ground-level ozone overpredictions. The primary causes of the differences between the CBMIV and CB05 systems were identified as two sets of reactions in the CB05 mechanism that are absent from the CBMIV mechanism.
Many operational air quality forecasting systems using 3-D CTMs exist worldwide. In Europe, atmospheric composition forecast products have been delivered under the Monitoring Atmospheric Composition and Climate-Interim Implementation project as part of the pre-operational GMES Atmosphere Service (http://www.gmes-atmosphere.eu, see also Menut and Bessagnet, 2010). Similar forecasts are also available in Japan (Maki, 2012), Taiwan (http://taqm.epa.gov.tw/taqm/en/b0204.aspx), and Canada (Talbot et al., 2008). Zhang et al. (2012) summarized some recent real-time air quality forecasting evaluation results. Among all evaluation statistics for hourly ozone, the median positive MB, negative MB, and RMSE are +4.5, and 16.8.
The main goal of the paper is to continue the NAQFC evaluations as a reference for real-time regional air quality forecasts and future model developments. All the previous NAQFC evaluations have utilized near-real-time AIRNow measurements instead of the quality-controlled and quality-assured Air Quality System (AQS) data. The AQS is the US EPA's repository of ambient air quality data and is available through the agency's Technology Transfer Network (http://www.epa.gov/ttn/airs/airsaqs/). Rather than requiring near-real-time reporting as the AIRNow network does, the AQS only requires monitoring stations to report quarterly. In addition to the ozone and particulate matter (PM2.5 and PM10) observations available through AIRNow, a suite of other measurements such as nitrogen dioxide (NO2), carbon monoxide (CO), and sulfur dioxide (SO2) is also available from the AQS. As pointed out by Sillman (1999), the model uncertainty can be greatly reduced if observations of additional species besides ozone can be utilized in model evaluation and diagnosis. In this study, the AQS NO2 measurements along with the AQS ozone observations are used for the NAQFC evaluations. NO2 is not only an important ozone precursor; it is also one of the criteria air pollutants regulated through the National Ambient Air Quality Standards in the US, with its annual and hourly limits set at 53 and 100 ppbv, respectively. In the current evaluation, the NAQFC model predictions are the original predictions without any post-processing.
The remainder of the paper is organized as follows. A brief description of the NAQFC model setup is given in Sect. 2. Section 3 presents the AQS observations, including a comparison between the AQS and AIRNow ozone measurements. Detailed comparisons between the model results and observations are provided in Sect. 4, followed by a summary and discussion in Sect. 5. A list of abbreviations and acronyms can be found in Appendix A.

Description of the NAQFC prediction system
The real-time NAQFC air quality prediction system during the year 2010 comprised the CMAQ modeling system (Byun and Schere, 2006) driven by the NCEP's North American Mesoscale (NAM) meteorological predictions with the WRF-NMM core (Janjic, 2003), similar to that described by Eder et al. (2009). A pre-processor to CMAQ, PREMAQ, prepares the CMAQ input files after taking WRF-NMM postprocessor outputs (Otte et al., 2005).
Figure 1 shows the computational domain, which is covered by a grid with 442 columns and 265 rows in the longitudinal and latitudinal directions, respectively. The grid has a 12 km horizontal resolution and follows the Lambert conformal conic projection. There are 22 hybrid pressure/sigma layers extending from the surface to 100 hPa, which combine those of the WRF-NMM model (see Lee and Ngan, 2011, for details). At lateral boundaries, fixed profiles based on climatological averages are used at inflow grid cells, and a zero-flux-gradient condition is imposed at the outflow locations. However, Tang et al. (2009) showed that the NAQFC surface ozone predictions can be improved with the use of the MOZART global model predictions to better account for long-range transport, especially over the US west coast. A zero-flux assumption at the top boundary is made in the CMAQ computation. Not considering stratospheric ozone intrusion may cause ozone underestimations at high latitudes (Browell et al., 2003). Note that the real-time air quality predictions for the Alaska and Hawaii domains were tested and designated operational in September of 2010, but they are not included in the evaluation presented here.
Gaseous and particulate emissions from anthropogenic and natural sources were divided into four sectors (area, mobile, point, and biogenic) and were processed using data provided by various agencies. Area emissions, including off-road engine emissions, are based on the US EPA 2005 National Emission Inventory version 1 (NEI05v1) for CONUS, the province-level 2000 Canadian Emissions Inventory for Canada, and the 1999 Mexico National Emission Inventories for Mexico (http://www.epa.gov/ttn/chief/eiinformation.html, also see US EPA, 2011, for details). These inventory data were processed using Sparse Matrix Operator Kernel Emissions (SMOKE) version 2.6 to represent monthly, weekly, daily, and holiday/non-holiday variations that are specific for each year (Houyoux et al., 2000). Emissions from wildfires, prescribed burning, and residential wood burning are based on a multi-year average inventory for the years from 1996 to 2002. Ignoring the temporal and spatial variability of the emission sources could cause large ozone and NOx biases (McKeen et al., 2002; Duncan et al., 2003; Martin et al., 2006). The current operational NOAA Smoke Forecasting System (SFS) establishes the locations and extents of the fires by utilizing fire and smoke data from seven polar and geostationary satellites brought together by the Hazard Mapping System (Rolph et al., 2009; Ruminski et al., 2008). Incorporating the SFS to provide the CMAQ model with near-real-time emissions from large wildfires and agricultural burning is being explored. The EPA's Office of Transportation and Air Quality 2005 on-road emissions inventory was used to generate mobile emissions over the US. Both the electric generating unit (EGU) and the non-EGU point sources were based on the NEI05v1 data. Oxides of nitrogen (NOx) and SO2 emissions from the US EGU sources rely on 2008 continuous emission monitoring data. The Annual Energy Outlook (AEO) from the Department of Energy released in April 2010 (US EIA, 2010) was used to project the EGU emissions to 2010 and was implemented on 6 July 2010. Before that date, a similar projection was made based on 2009 AEO data. Biogenic emissions were calculated dynamically using the Biogenic Emissions Inventory System version 3.13 (Schwede et al., 2005), which considers variability in temperature and solar radiation to estimate NOx and volatile organic compound (VOC) emissions from forests and grasslands.

1 When a range is presented, the midrange value is used in calculating the median value.

AIRNow and AQS observations
Real-time ozone and PM2.5 measurement data across the US, Canada, and parts of Mexico are provided by the US EPA through the AIRNow Gateway (http://www.airnowgateway.org). Because of their easy accessibility, AIRNow observations are widely used. Although the AIRNow data are only preliminary and not fully verified, they serve the purpose of real-time AQI reporting and forecasting. Observational data that have been subjected to additional quality control are available from the EPA's AQS, which is designed to meet the needs of regulatory, academic, and public health research communities. Without the requirement to disseminate data in real time, the AQS system includes monitors from many other surface networks, and its measured species extend from ozone and particulate matter (PM2.5 and PM10) to multiple atmospheric chemistry components, such as NO2, CO, SO2, and many VOC species. The AQS measurement data were downloaded from http://www.epa.gov/ttn/airs/airsaqs/detaildata/downloadaqsdata.htm. Figure 2 displays the daily count of hourly observations in 2010 for both the AQS (version 5/16/12) and the AIRNow systems. For both systems, there are almost twice as many ozone measurements available in warm seasons as in cold seasons since some monitors do not operate during the winter. The number of ozone measurements in both the AQS and the AIRNow data sets typically exceeds 10 000 per day. It should be noted that some AIRNow measurements are not available from the AQS system. This could be caused by delays in reporting to the AQS system or by the elimination of poor-quality data during the validation period. The data are considered to "overlap" if the measurements are reported from the same monitor at the same time, even if the measurement values differ. The daily counts of "overlapped" measurement pairs are also plotted in Fig. 2. A snapshot of the differences between "overlapped" data is displayed in Fig. 3a, which shows the paired data between AQS and AIRNow at the same sites and hours on 31 May 2010. While most data agree, some differences are seen, probably due to the quality control work carried out after AIRNow reporting.

Consistency check of ozone observations
An examination of the consistency between the AQS and AIRNow data sets suggests potential problems with the reported measurement times at several isolated sites. Additional quality control is applied to remove these questionable sites. In this process, hourly AQS and AIRNow ozone observations from each monitor are separated into daily files that run from 00:00 EDT to 23:00 EDT, or 00:00 EST to 23:00 EST, following the US daylight saving time schedule.
Consecutive hourly measurements at one location over one day form a 24-dimensional vector. At each location and for each day, the L2-norm $\varepsilon(\Delta t)$ is calculated for the difference vector between AQS and AIRNow ozone observations at the matching hours ($\Delta t = 0$), as well as for the lower-dimensional difference vectors obtained by shifting the AQS vector forward or backward by 1 or 2 h ($\Delta t = \pm 1, \pm 2$), as given in Eq. (1).

$$\varepsilon(\Delta t) = \sqrt{\sum_{t} \left[ O_3^{\mathrm{AQS}}(t + \Delta t) - O_3^{\mathrm{AIRNow}}(t) \right]^2} \qquad (1)$$

In addition to the shifting, missing data in either the AQS or the AIRNow ozone observations reduce the dimension of the difference vector. To account for variations in the dimension ($N$) of the difference vectors, $\bar{\varepsilon}$ is calculated in Eq. (2).

$$\bar{\varepsilon}(\Delta t) = \frac{\varepsilon(\Delta t)}{\sqrt{N}} \qquad (2)$$

Note that $\bar{\varepsilon}(\Delta t)$ is calculated only when there are at least 12 pairs of observations to form the difference vector. A monitor is flagged if $\bar{\varepsilon}(\Delta t) < \bar{\varepsilon}(0)/10$ for any $\Delta t = \pm 1, \pm 2$ h. This condition indicates a closer match between the AQS and AIRNow data sets after the measurement time is adjusted by $\Delta t$ of −2, −1, 1, or 2 h for this monitor on the particular day, implying a possible inconsistency between the measurement times reported in the two data sets. A total of 74 sites were flagged after checking the whole year. Observations from those flagged sites over the entire year were then removed. Figure 3b shows the comparison between AQS and AIRNow after removing the questionable sites. The agreement between AQS and AIRNow observations improves after eliminating the measurements from the flagged sites, with the coefficient of determination $R^2$ increasing from 0.995 to 0.997. Ozone measurements from the monitor sites that are unique to AQS cannot be examined in this fashion, and they are not included in the following evaluation either. Figure 2 shows the data counts after these two exclusion criteria are applied. Overall, more than 80 % of the AQS ozone data are retained for the evaluation. The observations are from 1124 AQS ozone monitors.
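The time-shift check described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the direction of the shift (AQS hour t + Δt paired with AIRNow hour t) is an assumption, and so are the normalization by the square root of N and the threshold of one tenth of the unshifted value.

```python
import math

def normalized_l2(aqs, airnow, shift, min_pairs=12):
    """Normalized L2-norm of the AQS/AIRNow difference for one monitor-day.

    aqs, airnow: lists of 24 hourly ozone values, with None for missing hours.
    shift: time offset in hours applied to the AQS vector (0, +/-1, +/-2).
    Returns None when fewer than min_pairs valid pairs are available.
    """
    sum_sq = 0.0
    n_pairs = 0
    for t in range(24):
        s = t + shift
        if 0 <= s < 24 and aqs[s] is not None and airnow[t] is not None:
            sum_sq += (aqs[s] - airnow[t]) ** 2
            n_pairs += 1
    if n_pairs < min_pairs:
        return None
    # Assumed normalization: divide the L2-norm by sqrt(N).
    return math.sqrt(sum_sq) / math.sqrt(n_pairs)

def is_flagged(aqs, airnow, ratio=10.0):
    """Flag the monitor-day if any shifted match is far closer than the unshifted one."""
    base = normalized_l2(aqs, airnow, 0)
    if base is None:
        return False
    for shift in (-2, -1, 1, 2):
        shifted = normalized_l2(aqs, airnow, shift)
        if shifted is not None and shifted < base / ratio:
            return True
    return False

# Synthetic example: the AQS record is the AIRNow record delayed by one hour.
airnow_day = [30.0 + 20.0 * math.sin(math.pi * t / 24.0) for t in range(24)]
aqs_day = [airnow_day[0]] + airnow_day[:-1]
print(is_flagged(aqs_day, airnow_day))     # shifted record
print(is_flagged(airnow_day, airnow_day))  # identical records
```

In the paper's procedure a site is dropped for the whole year once any of its days is flagged, so a driver loop over all monitor-days would collect the flagged site identifiers into a set.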

AQS NO2 observations
There are no AIRNow NO2 data available with which to perform a consistency check between AIRNow and AQS similar to that done for ozone in Sect. 3.2. All AQS NO2 measurements are used in the following evaluation. It should be noted that most of the AQS NO2 measurements were from chemiluminescence monitors equipped with molybdenum converters, which systematically overestimate NO2 concentrations (Dunlea et al., 2007; Steinbacher et al., 2007). Using the Mexico City Metropolitan Area (MCMA) field campaign data during April of 2003, Dunlea et al. (2007) reported that the chemiluminescence monitor interference resulted in an average concentration up to 22 % greater than that from co-located spectroscopic measurements. In this study, the AQS NO2 measurements were used without any correction to account for this issue. NO2 hourly measurements are also shown in Fig. 2. Unlike the ozone measurement count, which varies seasonally, the daily NO2 measurement count from 408 sites is almost constant throughout the year.

NAQFC evaluation results
When comparing model predictions with AQS observations, model-predicted concentration counterparts are taken from the monitor-residing grid cells. With such direct matching, there is no interpolation applied, which is consistent with previous NAQFC evaluation studies (Eder et al., 2006, 2009; Gorline and Lee, 2009b). However, there is a slight difference from what is described in Eder et al. (2009), where multiple observations inside a single grid cell are averaged as the representative measurement for the grid cell. In this paper, each measurement is compared against the model prediction independently when there are two or more monitors located within one grid cell. In the following evaluations, the urbanization characteristics of each monitor site are utilized to filter observations into urban, suburban, and rural categories. Among the 1124 ozone sites, there are 200 urban, 455 suburban, 462 rural, and 7 unknown stations. The numbers of NO2 sites in urban, suburban, rural, and unknown settings are 130, 148, 126, and 4, respectively.
In addition, separate evaluations in the six predefined regions shown in Fig. 1 are performed to investigate regional variability in model performance.
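The direct matching of monitors to grid cells can be illustrated with a short Python sketch. It assumes that monitor locations have already been projected into the model's Lambert conformal coordinates. The lower-left corner coordinates `x_orig` and `y_orig` are hypothetical inputs that the text does not give, and the function names are illustrative only.

```python
import math

# Grid facts stated in the text: 442 columns x 265 rows at 12 km spacing.
NCOLS, NROWS, DX_M = 442, 265, 12000.0

def cell_index(x, y, x_orig, y_orig, ncols=NCOLS, nrows=NROWS, dx=DX_M):
    """Return (row, col) of the grid cell containing the projected point (x, y).

    x_orig, y_orig: projected coordinates (m) of the grid's lower-left corner
    (hypothetical inputs; not provided in the paper).
    """
    col = math.floor((x - x_orig) / dx)
    row = math.floor((y - y_orig) / dx)
    if 0 <= col < ncols and 0 <= row < nrows:
        return row, col
    return None  # the monitor lies outside the domain

def pair_with_model(monitors, model, x_orig, y_orig):
    """Pair each monitor with the prediction of its own grid cell.

    monitors: iterable of (site_id, x, y, observation).
    model: dict mapping (row, col) to the predicted concentration.
    No interpolation and no averaging: monitors sharing a cell are kept
    as separate pairs, as described in the text.
    """
    pairs = []
    for site_id, x, y, obs in monitors:
        idx = cell_index(x, y, x_orig, y_orig)
        if idx is not None and idx in model:
            pairs.append((site_id, obs, model[idx]))
    return pairs

# Two monitors in the same cell and one in a different cell.
model = {(0, 0): 50.0, (1, 2): 60.0}
monitors = [("A", 1000.0, 2000.0, 45.0),
            ("B", 11000.0, 500.0, 48.0),
            ("C", 30000.0, 13000.0, 55.0)]
print(pair_with_model(monitors, model, 0.0, 0.0))
```

Keeping one pair per monitor, rather than one per grid cell, gives densely monitored cells more weight in the domain statistics, which is the difference from Eder et al. (2009) noted above.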

Annual performance
Figure 4 shows the daily and domain-wide average ozone and NO2 concentrations from AQS and CMAQ. Similar to the 2009 NAQFC prediction results (Saylor and Stein, 2012), the model significantly overestimates ozone during the summer. Until the end of May, there is very good agreement between model predictions and AQS observations for ozone. This first comparison of the NAQFC NO2 predictions with the AQS observations shows that the model overestimates NO2 in all four seasons. The NO2 overestimation is more severe in the summer than during the other seasons. The normalized monthly mean NO2 biases are 74.6, 79.8, and 76.1 % for June, July, and August, respectively. January has the lowest normalized monthly mean NO2 bias of 34.6 %.
Figure 5 shows the annual performance in different local settings for both ozone and NO2. The urban and suburban sites mostly resemble what is shown in Fig. 4. In rural areas, NO2 concentrations are more than 50 % lower than those at urban sites, as shown by both the model and the observations. However, NAQFC still significantly overestimates NO2 at the rural sites. For ozone, the model overestimation in rural areas during the summer is more pronounced than that in urban and suburban areas. As rural areas are mostly in NOx-sensitive chemical regimes (Choi et al., 2012), the overestimated NO2 in these areas, especially in the forest-dominant Southeast region, can produce ozone much more efficiently than in the urban and suburban areas. Figure 5 also shows that the average ozone concentrations are slightly larger at rural sites than those at urban sites. The lower ozone concentrations in urban areas may be due to NOx titration at nighttime. This also indicates that, due to its long lifetime, ozone pollution has non-local impacts.
The time series of daily and regionally averaged ozone and NO2 are shown in Figs. 6 and 7. The ozone overestimation in summer is seen in all the regions, but it is the most pronounced in the Southeast region. NO2 is also overestimated in all the regions during the summer, ranging from the highest biases in the Pacific Coast and Lower Middle regions to minimal overestimation in the Rocky Mountain and Northeast regions.
The detailed monthly and annual average ozone biases and RMSEs in different regions are listed in Tables 1 and 2. Similar results for NO2 are listed in Tables 3 and 4. Ozone biases in the Lower Middle and Pacific Coast are the lowest, with the annual averages being +3.7 and +4.0 ppbv, respectively. The most pronounced negative biases are seen in February in the Upper Middle and Northeast regions, with monthly average biases of −8.5 and −5.6 ppbv, respectively. The largest positive monthly average bias of +17.6 ppbv is seen in the Southeast region in August. Table 2 also shows that the Southeast region has the largest annual RMSE of 17.6 ppbv. The highest monthly RMSE of 22.5 ppbv is seen in the Southeast in August. In agreement with Fig. 7, Table 3 also points to the Lower Middle and Pacific Coast as the worst regions for NO2 predictions, with annual average biases of +8.1 and +7.1 ppbv, respectively. When normalized by the observation mean, the relative biases show more than 100 % overestimation in the Lower Middle from April to August, and in the Pacific Coast in June and July. In July, the normalized monthly mean NO2 bias reaches its peak (167.2 %). The Rocky Mountain region has the smallest annual NO2 model bias of 0.4 ppbv (4.2 %) among all regions, and its monthly average biases range from −0.9 ppbv (−7.1 %) in January to 1.7 ppbv (33.0 %) in July. All other regions show consistent positive biases throughout the year. The CONUS RMSEs for NAQFC NO2 predictions, listed in Table 4, range from 11.7 ppbv in May to 15.4 ppbv in September. In September, the Pacific Coast and Lower Middle have the highest monthly NO2 RMSEs of 19.6 and 19.1 ppbv, respectively.
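For reference, the three summary statistics used in Tables 1-4 can be written as a short Python sketch. The definitions below are the conventional ones (bias as prediction minus observation, and normalized bias as the mean bias divided by the observation mean, as described above). The function names and the sample values are illustrative, not taken from the paper.

```python
import math

def mean_bias(pred, obs):
    """Mean bias (MB): average of prediction minus observation."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)

def rmse(pred, obs):
    """Root mean square error (RMSE) of the predictions."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def normalized_mean_bias(pred, obs):
    """Mean bias normalized by the observation mean, in percent."""
    return 100.0 * sum(p - o for p, o in zip(pred, obs)) / sum(obs)

# Made-up paired values in ppbv, for illustration only.
obs = [40.0, 55.0, 60.0, 45.0]
pred = [50.0, 60.0, 75.0, 45.0]
print(mean_bias(pred, obs), rmse(pred, obs), normalized_mean_bias(pred, obs))
```

Because the normalization uses the observation mean, a small absolute bias can correspond to a large relative bias where observed concentrations are low, which is why the Rocky Mountain bias of 1.7 ppbv in July corresponds to 33.0 %.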

Spatial patterns
The spatial distributions of the monthly average ozone and NO2 AQS concentrations, model biases, and RMSEs at monitoring sites in August are shown in Fig. 8. They are similar to those of the other summer months such as July (not shown here). Higher monthly average ozone measurements are mostly located in California, the Rocky Mountain region, and the mid-Atlantic (the area bordering the Northeast and Southeast regions). Multiple sites in Los Angeles and an isolated one in Denver, Colorado, show very high NO2 observations. The spatial distribution of ozone biases in Fig. 8 shows a broad spread of high positive ozone biases in the Southeast region. This is consistent with Fig. 6, which identifies the Southeast as the region with the most severe ozone overestimation in summer. As this region is mostly covered with forest, the abundance of biogenic VOCs during the growing season helps to translate NO2 overestimations into high ozone biases under the NOx-sensitive regime.
Negative ozone biases are found around Los Angeles and New Orleans, where high positive NO2 biases are shown in Fig. 8. [...] (2006) showed that increasing NOx emissions actually reduced ozone in central Atlanta in their sensitivity studies to assess ozone impacts from NOx emissions. Figure 8 shows that most of the higher ozone RMSEs are seen in the Southeast region and around Los Angeles. The Los Angeles and New Orleans areas also have the highest NO2 RMSEs, as shown in Fig. 8.

Daily maximum eight-hour average ozone and its categorical statistics
Eight-hour running averages are calculated for both the model and the AQS hourly concentrations. A minimum of six hourly observations in any eight-hour time window is required for the calculation. Otherwise, the eight-hour ozone observation is flagged as missing. The primary ozone standard in the US, defined in terms of the daily maximum eight-hour average concentration, is currently 75 ppbv; it was revised in March 2008 from the previous 0.08 ppm (effectively 84 ppbv due to rounding) (Environmental Protection Agency, 2008). Using the standard as a threshold for the daily maximum eight-hour average ozone, there are four possible scenarios:
a. prediction is above, but observation is below the threshold (false alarm);
b. prediction and observation are above the threshold;
c. prediction and observation are below the threshold;
d. prediction is below, but observation is above the threshold.
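The eight-hour averaging rule can be sketched as follows. The six-of-eight completeness requirement comes from the text. Restricting the windows to those lying fully inside the supplied series is a simplifying assumption of this sketch, and the function names are hypothetical.

```python
def eight_hour_averages(hourly, window=8, min_valid=6):
    """Running eight-hour averages of an hourly series.

    hourly: list of concentrations, with None for missing hours.
    Returns one value per window start hour; a window with fewer than
    min_valid valid hours yields None (flagged as missing).
    """
    averages = []
    for start in range(len(hourly) - window + 1):
        valid = [v for v in hourly[start:start + window] if v is not None]
        if len(valid) >= min_valid:
            averages.append(sum(valid) / len(valid))
        else:
            averages.append(None)
    return averages

def daily_max_8h(hourly):
    """Daily maximum eight-hour average, or None if no window is valid."""
    valid = [a for a in eight_hour_averages(hourly) if a is not None]
    return max(valid) if valid else None

# A 24 h series with an eight-hour afternoon plateau at 80 ppbv.
day = [40.0] * 8 + [80.0] * 8 + [40.0] * 8
print(daily_max_8h(day))
```

The same function is applied to the model and to the observed series, and the two daily maxima are then compared against the threshold to assign each site-day to one of the scenarios a-d.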
In Fig. 9, a scatter plot of one day's observations in the CONUS and collocated NAQFC predictions is presented, and the four quadrants are marked according to the scenarios a-d to which they correspond. Hit rate (HIT), critical success index (CSI) or threat score (TS), false alarm rate (FAR), equitable threat score (ETS), and proportion of correct (POC), which is referred to as Accuracy in Eder et al. (2006), are calculated for the NAQFC predictions for the entire year. The definitions are shown in Eqs. (3)-(7), where $N_a$, $N_b$, $N_c$, and $N_d$ represent the number of incidences in scenarios a, b, c, and d, respectively, as shown in Fig. 9.

$$\mathrm{HIT} = \frac{N_b}{N_b + N_d} \qquad (3)$$

$$\mathrm{CSI} = \frac{N_b}{N_a + N_b + N_d} \qquad (4)$$

$$\mathrm{FAR} = \frac{N_a}{N_a + N_b} \qquad (5)$$

$$\mathrm{ETS} = \frac{N_b - N_r}{N_a + N_b + N_d - N_r}, \quad \text{where} \quad N_r = \frac{(N_a + N_b)(N_b + N_d)}{N_a + N_b + N_c + N_d} \qquad (6)$$

$$\mathrm{POC} = \frac{N_b + N_c}{N_a + N_b + N_c + N_d} \qquad (7)$$

The HIT, CSI (or TS), FAR, and POC for the NAQFC predictions in previous years have been reported (Eder et al., 2006, 2009). HIT measures the fraction of observed above-threshold events that are predicted correctly. It is also referred to as the probability of detection. FAR is the fraction of predicted above-threshold events that are wrong. CSI measures the fraction of correctly predicted above-threshold events after removing the correctly predicted below-threshold incidences. ETS measures the prediction skill more critically by discounting the correct predictions that occur by chance. While "ETS = 1" means a perfect prediction, positive ETS values indicate skillful predictions relative to a random forecast (Schaefer, 1990). ETS ≤ 0 denotes no skill for the forecast. POC is the fraction of predictions that match the observations in terms of being above or below the threshold.
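The categorical metrics can be computed from the four scenario counts with a few lines of Python. The sketch below uses the standard contingency-table definitions of these scores, with scenario b as the hits, a as the false alarms, d as the misses, and c as the correct negatives. The function name and the example counts are illustrative, and the name `chance_hits` for the random-forecast term in ETS is a label chosen here.

```python
def categorical_stats(n_a, n_b, n_c, n_d):
    """Categorical scores from the four scenario counts.

    n_a: false alarms, n_b: hits, n_c: correct negatives, n_d: misses.
    Zero denominators (e.g. no observed exceedances) are not handled here.
    """
    total = n_a + n_b + n_c + n_d
    # Number of hits expected from a random forecast with the same marginals.
    chance_hits = (n_a + n_b) * (n_b + n_d) / total
    return {
        "HIT": n_b / (n_b + n_d),
        "CSI": n_b / (n_a + n_b + n_d),
        "FAR": n_a / (n_a + n_b),
        "ETS": (n_b - chance_hits) / (n_a + n_b + n_d - chance_hits),
        "POC": (n_b + n_c) / total,
    }

# Illustrative counts only (not from the paper's tables).
for name, value in categorical_stats(n_a=20, n_b=30, n_c=940, n_d=10).items():
    print(name, round(value, 3))
```

As a consistency check against the text, the CONUS annual counts quoted later in this section for the 75 ppbv threshold (2616 hits and 1449 misses) give HIT = 2616/4065, or about 0.64, which matches the reported value.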
Using the AQS observations and NAQFC predictions for the entire year and the summer months (June-August), the categorical statistics for the daily maximum of eight-hour ozone exceeding two thresholds are listed in Tables 5 and 6. Overall, the HIT values calculated for summer are better than those calculated for the entire year, but the CSI, FAR, ETS, and POC values for summer are worse. The Rocky Mountain region is an exception in that the CSI and FAR values for summer are slightly better than those calculated for the entire year. Using the current 75 ppbv standard as the threshold, out of the total 4065 (N_b + N_d = 2616 + 1449) observed cases exceeding this threshold in AQS measurements, 2541 (N_b + N_d = 1812 + 729) cases happened during the summer months. HIT, CSI, FAR, ETS, and POC over CONUS for the entire year are 0.64, 0.17, 0.81, 0.16, and 0.96, respectively, while the same statistics calculated over CONUS for the summer are 0.71, 0.17, 0.82, 0.15, and 0.91. The summer HIT value is much better than the HIT = 0.43 reported by Eder et al. (2009) for the 2007 summer months with the same standard. However, the CSI, FAR, and POC values during the summer are worse, with the current 0.17, 0.82, and 0.91 compared with 0.23, 0.66, and 0.92. The ETS values of 0.15 and 0.16 indicate some skill in the NAQFC predictions. In all regions, the ETS scores are positive, showing that the predictions are better than predictions by chance. The highest ETS scores are 0.24 and 0.23 for the Pacific Coast and Northeast regions. In the Rocky Mountain region, ETS = 0.06 reflects little skill of the model, mostly caused by the high FAR values (0.93 for summer and 0.92 for the entire year). The annual POC values are greater than or equal to 0.95 in all regions, but the summer values drop to as low as 0.87 in the Pacific Coast region.
The categorical statistics are sensitive to the threshold used to define the exceedance events, as shown by Eder et al. (2009) using both the 85 and the 75 ppbv standards. Similar metrics calculated using a 70 ppbv threshold for daily maximum eight-hour ozone are also listed for CONUS in Tables 5 and 6. With the new threshold, the exceedances increase to 8577 (N_b + N_d = 5753 + 2824) from 4065 with the 75 ppbv standard for the year. The annual POC value drops from 0.96 to 0.93 and all other metrics improve for CONUS. It should be noted that the large model biases greatly affect the categorical statistics. By implementing a bias-adjustment technique, Kang et al. (2010) showed significant improvement in the categorical evaluation metrics, with increased HIT and

Weekly patterns of NAQFC performance
CTM predictions are highly sensitive to the model-ready emissions inputs, which are generated using a large number of month-of-year, day-of-week, and hour-of-day temporal profiles. Section 4.1 already showed that the NAQFC performance for ozone and NO2 predictions varies significantly by month. These monthly variations in model performance are influenced by differences in the meteorological conditions, specifically the temperature change from month to month. It is difficult to separate the emissions-induced effects caused by the month-of-year profile from the meteorological impacts. However, it has been well documented that the ozone concentrations in urban areas peak at weekends, while nitrogen oxides and VOC emissions are generally lower at weekends than on weekdays (Marr and Harley, 2002; Murphy et al., 2007; Pierce et al., 2010). Instead of focusing on the "weekend ozone effect", here we study the weekly patterns of NAQFC performance in order to investigate possible systematic errors in the weekly profiles that are used in emissions processing. In this section, the NAQFC predictions during the warm months, i.e., from June to September, are grouped into days of the week. Strong weekly patterns are shown in the ozone biases for different days of the week listed in Table 7. Over CONUS and most regions, O3 biases are higher on weekends than on weekdays. The RMSEs calculated for the different days of the week do not show a clear weekly pattern. This indicates that the variability in prediction errors is influenced by interactions among the emissions, chemistry, and meteorology, rather than stemming from the emissions alone.
Similarly, the day-of-week biases for NAQFC NO 2 predictions are listed in Table 8.Contrary to ozone, the NO 2 biases over CONUS are lower on weekends than on weekdays.The lowest biases in NO 2 predictions occur on Saturdays in all regions except the Northeast.The weekday-weekend contrast is especially evident in the Pacific Coast, where the average model biases are no less than 9.1 ppbv on weekdays and no greater than 7.5 ppbv at weekends.
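The day-of-week grouping described above can be illustrated with a minimal Python sketch. It is not the processing code used for Tables 7 and 8. The record layout (date, observed, predicted), the function name `weekday_stats`, and the definition of bias as prediction minus observation (so that positive values mean overestimation) are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date
from math import sqrt

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def weekday_stats(records, months=(6, 7, 8, 9)):
    """Group model-observation pairs by day of week.

    records: iterable of (date, observed, predicted) tuples (hypothetical layout).
    months:  calendar months to keep; June-September mirrors the warm season in the text.
    Returns {day_name: (mean_bias, rmse)} for days that have at least one pair.
    """
    errors = defaultdict(list)
    for day, obs, pred in records:
        if day.month in months:
            # Assumed sign convention: bias = prediction - observation.
            errors[DAY_NAMES[day.weekday()]].append(pred - obs)

    stats = {}
    for name in DAY_NAMES:
        errs = errors.get(name)
        if errs:
            bias = sum(errs) / len(errs)
            rmse = sqrt(sum(e * e for e in errs) / len(errs))
            stats[name] = (bias, rmse)
    return stats


if __name__ == "__main__":
    sample = [
        (date(2010, 8, 2), 40.0, 46.0),   # Monday
        (date(2010, 8, 9), 50.0, 52.0),   # Monday
        (date(2010, 8, 7), 40.0, 50.0),   # Saturday
        (date(2010, 5, 3), 40.0, 100.0),  # outside June-September, ignored
    ]
    for name, (bias, rmse) in weekday_stats(sample).items():
        print(f"{name}: bias={bias:.2f} ppbv, rmse={rmse:.2f} ppbv")
```

Reporting the mean bias and the RMSE side by side mirrors the comparison in the text, where the biases show a weekly pattern while the RMSEs do not.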

Diurnal cycles
Ozone and its precursors have distinctive diurnal cycles. Examination of corresponding cycles in a CTM may help identify and correct shortcomings in the model and thus improve model predictions. van Loon et al. (2007) showed large diurnal cycle variations among seven different regional air quality models. The diurnal patterns of the NAQFC prediction biases are also studied here. Unlike the weekly patterns that mainly exhibit the emission signals, the diurnal patterns of model performance are greatly affected by many diurnal characteristics coming from the meteorological inputs. Diurnal profiles are obtained by averaging model-observation pairs by their local time (LT). Note that LT here is based on the official time zone of each AQS site and the daylight saving regime is not considered. In order to remove the impact of monthly variations in meteorological conditions, the diurnal patterns are studied separately for each month.
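The averaging by local time can be sketched as follows. This is only an illustration under stated assumptions. Each record is assumed to carry a UTC timestamp and the site's fixed standard-time UTC offset in hours, which is one way of ignoring daylight saving as described above. The function name `diurnal_profile` and the record layout are hypothetical, and the caller is expected to pass in one month of data at a time.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def diurnal_profile(records):
    """Average model-observation pairs by local standard hour.

    records: iterable of (utc_time, utc_offset_hours, observed, predicted),
             where utc_offset_hours is the site's fixed standard-time offset
             (for example -5 for US Eastern); daylight saving is ignored.
    Returns {local_hour: (mean_observed, mean_predicted, mean_bias)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # [sum_obs, sum_pred, count]
    for utc_time, offset, obs, pred in records:
        local = utc_time + timedelta(hours=offset)
        acc = sums[local.hour]
        acc[0] += obs
        acc[1] += pred
        acc[2] += 1

    profile = {}
    for hour in sorted(sums):
        sum_obs, sum_pred, n = sums[hour]
        profile[hour] = (sum_obs / n, sum_pred / n, (sum_pred - sum_obs) / n)
    return profile


if __name__ == "__main__":
    sample = [
        (datetime(2010, 8, 1, 12, 0), -5, 30.0, 36.0),  # 07:00 LT, Eastern
        (datetime(2010, 8, 2, 15, 0), -8, 20.0, 30.0),  # 07:00 LT, Pacific
        (datetime(2010, 8, 1, 3, 0), -5, 10.0, 25.0),   # 22:00 LT, previous day
    ]
    for hour, (obs, pred, bias) in diurnal_profile(sample).items():
        print(f"{hour:02d}:00 LT obs={obs:.1f} pred={pred:.1f} bias={bias:.1f}")
```

Because sites in different time zones are binned by their own local hour, a domain-wide profile compares like with like (for instance, the morning rush hour at every site).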
The diurnal profiles of ozone and NO2 for August, stratified by the degree of urbanization, are shown in Fig. 10. Ozone is overestimated for all hours, except at 19:00 LT for suburban sites and 18:00-20:00 LT for urban sites. The domain-averaged ozone predictions at rural sites have positive biases throughout the day. Ozone model biases peak in the early morning, from 07:00 to 10:00 LT in all three urbanization settings. NO2 biases are positive for all hours at urban and suburban sites, dipping to lowest levels between 08:00 and 13:00 LT. For the same time period, there are slight underestimations at rural sites. The NO2 overestimation is most pronounced at night, from 18:00 to 06:00 LT, by around 100 % for all urbanization settings. The standard deviations of model predictions exceed those of the observations at almost all hours for NO2. Meanwhile, the ozone variations in the model and observations are comparable.
Figures 11 and 12 show the regional diurnal profiles in August for ozone and NO2, respectively. Ozone biases in the Southeast region are positive for all 24 h. The other regions display large positive ozone biases from morning until noon and minimal positive to slight negative biases between 18:00 and 20:00 LT, similar to the urban and suburban ozone diurnal profiles in Fig. 10. Note the close agreement between predicted and observed ozone with respect to the average values and the variability during nighttime in the Pacific Coast and Rocky Mountain regions. The regional diurnal profiles of NO2 in the Pacific Coast, Lower Middle, Southeast, and Upper Middle exhibit good agreement between the model and the observations from early morning until early afternoon, but show large biases at nighttime, resembling the urban and suburban NO2 diurnal profiles in Fig. 10. In the Northeast, the diurnal profile is similar, but NO2 biases at night are much smaller. Good agreement between average NAQFC NO2 and AQS observations for most hours of the day is found in the Rocky Mountain region. However, NO2 is still overestimated at 19:00 and 20:00 LT by more than 100 % in this region.

Summary and discussion
In this paper, the NAQFC experimental ozone predictions and real-time testing of prediction of precursor species NO2 in 2010 are evaluated against quality-assured AQS observations of ozone and NO2. It is found that the CONUS- and daily-averaged predictions for both ozone and NO2 are overestimated throughout the year, with peak overestimation in the summer. This seasonal pattern persists when sites are stratified by the degree of urbanization into urban, suburban, and rural sites. In August, overprediction is more pronounced for rural than for urban and suburban sites. The highest regional ozone biases were found in the Southeast during the summer. NO2 overprediction is pronounced in the Pacific Coast and Lower Middle regions. The spatial distributions during the summer show the largest positive NO2 biases in Los Angeles and New Orleans, where ozone levels were underestimated. This suggests that VOC-sensitive regimes prevailed during those months in 2010 for these two areas.
The ozone categorical statistics using the current US ambient air quality standard (75 ppbv) for daily maximum eight-hour average ozone show mixed results when comparing the 2010 experimental ozone predictions generated using the CB05 mechanism with the operational ozone predictions for earlier years that rely on the CBMIV mechanism. For a lower threshold of 70 ppbv, HIT, CSI, FAR, and ETS evaluated over the CONUS for the 2010 experimental predictions improve, but POC deteriorates in comparison to the same statistics evaluated for the 75 ppbv threshold.
The ozone and NO2 biases show distinct weekly patterns in summer. While ozone biases are larger during the weekends than they are on weekdays, NO2 biases show the opposite pattern in most regions. Diurnal patterns show that ozone overestimation is most severe in the morning, from 07:00 to 10:00 LT, lower overnight, and lowest in the evening hours, around 19:00 LT. For NO2, the morning predictions are in close agreement with the AQS observations, but nighttime concentrations are overpredicted by around 100 %. Comparisons on regional or domain-wide scales together with monthly or annual evaluations aim to eliminate the influence of dynamical meteorological and chemical conditions, which vary significantly from site to site and from day to day. The averaging avoids large uncertainties associated with each individual site and time, thus exposing systematic model errors, which could be reduced in the future to improve NAQFC predictions. For example, NO2 overestimation throughout the year in almost all regions may have contributed to the overall ozone overestimation for the entire year. This is especially true during the growing season in the Southeast region where forests are predominant. Under the NOx-sensitive chemical regime with abundant biogenic VOCs, the NO2 overestimations likely caused the severe positive ozone biases from May to September. Higher NO2 biases were found in the summer, and they are believed to contribute to the larger ozone overestimations seen in the summertime in all regions. The clear weekly signals shown by both ozone and NO2 model biases suggest that weekly profiles resulting from emissions processing may need adjustments. It should be noted that other factors, such as the chemical mechanism and the neglect of long-range transport at lateral boundaries or of ozone intrusion from the stratosphere at the domain top, all contribute to the current model errors.
However, drawing conclusions on the exact causes for the current model problems requires further studies. There are several limitations in our current evaluation study. For instance, the AQS stations are quite sparse in some regions, especially for NO2 monitoring. Uncertainties in emission rates, photochemical reaction rates, and meteorological inputs such as surface temperature, wind speed and direction, and cloud cover all contribute to uncertainties in NAQFC ozone and NO2 predictions. Further analyses would benefit from meteorological measurements, observations of VOC species, and vertical profiles of most parameters in order to fully explain the evaluation results.
The type of analysis presented here has guided recent updates to the NAQFC system that produces experimental ozone predictions. Concurrently with the updates to the NCEP NAM model and the land use and land cover data for emissions in October 2011, three additional updates were made with the goal of reducing the ozone biases discussed here. Previous constant lateral boundary condition profiles for most chemical species were replaced with monthly mean profiles from GEOS-CHEM global model simulations for 2006 that follow the methodology of Bey et al. (2001). Dry deposition was modified based on the Monin-Obukhov similarity theory (Wu et al., 2003) as well as by including canopy height and density based on recent Moderate Resolution Imaging Spectroradiometer and Geoscience Laser Altimeter System satellite observations (Lefsky, 2010). Planetary boundary layer (PBL) height was constrained to be at least 50 m. This mitigated the previous high ozone bias problem due to low PBLs at areas close to large water bodies. It also allowed dilution of the mobile emissions near urban centers and lessened the severity of ozone titration at nighttime. Testing during the summer of 2011 showed positive impacts of these changes, and they were all incorporated into the experimental ozone predictions for 2012. The emission data sets were updated in June 2012, with about a 35 % decrease in total mobile NOx emissions. Preliminary evaluation of the latest experimental predictions shows improvements from this combination of updates. Examples of additional modifications that may prove beneficial for ozone predictions include the assimilation of observed chemical composition data, increase of the model resolution, inclusion of newer versions of chemical and meteorological models, as well as a closer coupling among system components.

Fig. 3. Comparison between AIRNow and AQS data on 31 May 2010 before (a) and after (b) removing the 74 questionable sites using density plots, in which color represents the count of observation pairs at each pixel. The data between 00:00 EDT and 23:00 EDT are included here.

Fig. 5. Daily domain-wide average ozone and NO2 concentrations at urban, suburban, and rural sites in 2010.

Fig. 6. Daily average ozone concentrations in 2010 for each of the six regions listed in Fig. 1.

Fig. 7. Daily average AQS (black) and NAQFC NO2 concentrations in 2010 for each of the six regions listed in Fig. 1.
Fig. 8.
It is possible that the emission inventories do not fully account for the actual emissions reduction due to the long-lasting economic aftermath of Hurricane Katrina on New
www.geosci-model-dev.net/6/1831/2013/ Geosci. Model Dev., 6, 1831-1850, 2013

Fig. 9. Diagram of categorical statistics calculation. A scatter plot with AQS observed and NAQFC predicted daily maximum eight-hour average ozone on 17 August 2010 is shown as an example. The US standard for daily maximum eight-hour average ozone of 75 ppbv is used as the threshold to delimit the scatter plots into four regions: (a) prediction is above, but observation is below the threshold; (b) prediction and observation are above the threshold; (c) prediction and observation are below the threshold; (d) prediction is below, but observation is above the threshold.
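The four regions of Fig. 9 correspond to the cells of a 2 x 2 contingency table, and the categorical metrics named in the text (HIT, FAR, CSI, POC, ETS) follow from the four counts. The Python sketch below uses the formulas commonly given for these metrics in forecast verification. They are stated here as assumptions, since the paper's own definitions lie outside this section. Treating "above the threshold" as strictly greater than the threshold is likewise an assumption, and the function names are hypothetical.

```python
def categorical_counts(pairs, threshold=75.0):
    """Count the four Fig. 9 regions from (observed, predicted) pairs in ppbv.

    a: prediction above, observation not above (false alarm)
    b: both above (hit)
    c: neither above (correct negative)
    d: prediction not above, observation above (miss)
    Assumption: "above" means strictly greater than the threshold.
    """
    a = b = c = d = 0
    for obs, pred in pairs:
        if pred > threshold:
            if obs > threshold:
                b += 1
            else:
                a += 1
        else:
            if obs > threshold:
                d += 1
            else:
                c += 1
    return a, b, c, d


def categorical_metrics(a, b, c, d):
    """Commonly used definitions (assumed, not taken from this section)."""
    n = a + b + c + d

    def ratio(num, den):
        return num / den if den else float("nan")

    # Hits expected by chance, used by the equitable threat score.
    b_random = ratio((a + b) * (b + d), n)
    return {
        "HIT": ratio(b, b + d),        # fraction of observed exceedances predicted
        "FAR": ratio(a, a + b),        # fraction of predicted exceedances not observed
        "CSI": ratio(b, a + b + d),    # critical success index
        "POC": ratio(b + c, n),        # proportion of correct predictions
        "ETS": ratio(b - b_random, a + b + d - b_random),
    }


if __name__ == "__main__":
    sample = [(80, 82), (60, 78), (50, 55), (79, 70), (90, 95)]
    counts = categorical_counts(sample, threshold=75.0)
    print(counts)
    print(categorical_metrics(*counts))
```

With these definitions the number of observed exceedances is b + d, which matches the way the text counts exceedances as N_b + N_d.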

Fig. 10. Diurnal profiles of ozone and NO2 at urban, suburban, and rural sites in August 2010. Average concentrations of the AQS observations and their NAQFC counterparts are shown with their standard deviations as a function of local hours.

Fig. 11. Diurnal profiles of ozone in August 2010 for each of the six regions listed in Fig. 1. Average concentrations of the AQS observations and their NAQFC counterparts are shown with their standard deviations as a function of local hours.

Fig. 12. Diurnal profiles of NO2 in August 2010 for each of the six regions listed in Fig. 1. Average concentrations of the AQS observations and their NAQFC counterparts are shown with their standard deviations as a function of local hours.

Table 1. Monthly and annual average ozone biases in different regions and CONUS in 2010. Unit: ppbv.

Table 2. Monthly and annual average ozone RMSEs in different regions and CONUS in 2010. Unit: ppbv.

Orleans, thus resulting in the overestimation in that area. The combination of high positive NO2 biases with negative ozone biases suggests Los Angeles and New Orleans are probably under a VOC-sensitive regime, in which the increased NO2 may lead to ozone reductions. Such model behavior in NOx-rich urban regions is common. For instance, Tong et al.

Table 3. Monthly and annual averaged NO2 biases in different regions and CONUS in 2010. Unit: ppbv.

Table 4. Monthly and annual averaged NO2 RMSEs in different regions and CONUS in 2010. Unit: ppbv.

Table 5. Daily maximum eight-hour ozone categorical statistics for 2010, with the 75 and 70 ppbv thresholds. See text for details.

Table 6. Daily maximum eight-hour ozone categorical statistics for summer months (June-August) in 2010. See text for details.

Table 7. Ozone biases for the different days of the week in the six predefined regions and CONUS. June-September 2010. Unit: ppbv.

Table 8. NO2 biases for the different days of the week in the six predefined regions and CONUS. June-September 2010. Unit: ppbv.