Turbulent transport, emissions and the role of compensating errors in chemical transport models

The balance between turbulent transport and emissions is a key issue in understanding the formation of O3 and particulate matter with diameters less than 2.5 μm (PM2.5). Discrepancies between observed and simulated concentrations for these species have, in the past, been ascribed to insufficient turbulent mixing, particularly for atmospherically stable environments. This assumption may be simplistic: turbulent mixing deficiencies may explain only part of these discrepancies, and as turbulence parameterizations are improved, the timing of primary PM2.5 emissions may play a much more significant role in the further reduction of model error. In a study of these issues, two regional air-quality models, the Community Multi-scale Air Quality model (CMAQ, version 4.6) and A Unified Regional Air-quality Modelling System (AURAMS, version 1.4.2), were compared to observations for a domain in north-western North America. The air-quality models made use of the same emissions inventory, emissions processing system, meteorological driving model, model domain, map projection and horizontal grid, eliminating these factors as potential sources of discrepancies between model predictions. The initial statistical comparison between the models and monitoring network data showed that AURAMS’ O3 simulations outperformed those of CMAQ4.6, while CMAQ4.6 outperformed AURAMS for most PM2.5 statistical measures. A process analysis of the models revealed that many of the differences between the models’ results could be attributed to the strength of turbulent diffusion, via the choice of an a priori lower limit in the magnitude of vertical diffusion coefficients, with AURAMS using 0.1 m2 s−1 and CMAQ4.6 using 1.0 m2 s−1. The use of the larger CMAQ4.6 value for the lower limit of vertical diffusivity within AURAMS resulted in a similar performance for the two models (with AURAMS also showing improved PM2.5, yet degraded O3, and a time series similar to that of CMAQ4.6).
The differences between model results were most noticeable at night, when the higher minimum turbulent diffusivity resulted in an erroneous secondary peak in predicted night-time O3. A spatially invariant and relatively high lower limit in diffusivity could not reduce errors in both O3 and PM2.5 fields, implying that other factors aside from the strength of turbulence might be responsible for the PM2.5 over-predictions. Further investigation showed that the magnitude, timing and spatial allocation of area source emissions could result in improvements to PM2.5 performance with minimal O3 performance degradation. AURAMS was then used to investigate a land-use-dependent lower limit in diffusivity of 1.0 m2 s−1 in urban regions, linearly scaling to 0.01 m2 s−1 in rural areas, as employed in CMAQ5.0.1. This strategy was found to significantly improve mean statistics for PM2.5 throughout the day and mean O3 statistics at night, while significantly degrading (halving) midday PM2.5 correlation coefficients and slope of observed to model simulations. Time series of domain-wide model error statistics aggregated by local hour were shown to be a useful tool for performance analysis, with significant variations in performance occurring at different hours of the day. The use of the land-use-dependent lower limit in diffusivity was also shown to reduce the model’s sensitivity to the temporal allocation of its emissions inputs. The modelling scenarios suggest that while turbulence plays a key role in O3 and PM2.5 formation in urban regions, and in their downwind transport, the spatial and temporal allocation of primary PM2.5 emissions also has a potentially significant impact on PM2.5 concentration levels. The results show the complex nature of the interactions between turbulence and emissions, and the potential of the strength of the former to mask the impact of changes in the latter.
Published by Copernicus Publications on behalf of the European Geosciences Union.

Introduction
Several studies within the last decade have shown the value of the comparison of multiple air-quality models to a common suite of observations. Two studies made use of data collected during the International Consortium for Atmospheric Research on Transport and Transformation/New England Air-Quality Study (ICARTT/NEAQS). McKeen et al. (2005) used seven air-quality forecast models to show that ensemble O3 forecasts based on the seven-member mean and the seven-member median had better temporal correlation to the observed daily maximum 1 h average and maximum 8 h average than any individual model, for a domain covering the eastern USA and south-eastern Canada. The usefulness of uncorrected ensembles was shown to be limited by positive biases in O3 inherent to all seven ensemble members. The best method of bias correction was found to be model dependent. In a subsequent examination for the same region and period using the same models, McKeen et al. (2007) found that forecasts of particulate matter with diameters less than 2.5 µm (PM2.5) had similar correlation, lower bias and better skill compared to the ozone forecasts. A feature of this work was an analysis of diurnal variability: most models failed to reproduce the observed diurnal cycle of PM2.5 concentrations at urban and suburban monitor locations. This error in the predicted diurnal cycle was most pronounced in the transition period between night and early morning. Four of the models showed greater diurnal PM2.5 variability than was observed, with differences in emissions inventories, the planetary boundary layer (PBL) height parameterizations employed and the timing of the predicted morning growth of the PBL all postulated as factors affecting the model performance. The work also identified insufficient model nocturnal mixing as a key factor in low surface sulfate predictions (due to insufficient vertical turbulent transport of sulfate aloft to the surface) and in excessively high surface elemental carbon and NOx predictions (due to insufficient turbulent transport of these species emitted at the surface to higher model levels). These effects were noted for the on-line Weather Research and Forecasting-Chemistry (WRF-CHEM) model, which de facto makes use of the turbulence parameterizations inherent in the driving meteorology. Subsequent model ensemble work for the second Texas Air Quality Study (TexAQS II; McKeen et al., 2009) showed that the relationship between model emissions levels and concentration difference ratios was approximately linear (to within 25 %), with improvements to emissions inventories through the use of continuous emissions monitors and updated mobile emissions resulting in better agreement with observations. The study also noted that despite ratios of PM2.5 to NOy matching observations, underpredictions of PM2.5 organic carbon suggested that this might be a result of compensating errors, with excessive model primary PM2.5 making up for the absence of sufficient model secondary organic aerosol formation.
Multiple model intercomparisons were expanded to include both North American and European domains in the Air-Quality Model Evaluation International Initiative (AQMEII; Galmarini and Rao, 2011), with 23 modelling groups providing annual simulations on either or both of these domains. In a further refinement of ensemble forecasting techniques, researchers participating in this study found that full ensemble mean predictions could be outperformed by subset ensembles of model members selected for an optimal set of error characteristics (Solazzo et al., 2012a). Predictions of PM were also investigated; Solazzo et al. (2012b) showed that all of the models employed underestimated PM10, with better estimates for PM2.5, though "no model was found to consistently match the observations for all locations throughout the entire year" (p. 76), with North American correlation coefficients for PM2.5 ranging from 0.34 at a 99 % confidence level to less than 0.10 at a 10 % confidence level. While anthropogenic emissions were prescribed as part of the intercomparison, differences in natural emissions of some PM components such as sea salt were shown to have a significant impact on some model results. The member of the ensemble which made use of a different emissions inventory from that prescribed in the study protocol was shown to have significantly different (factor of four lower) PM10 emissions than the other models, showing the potential importance of emissions inventory accuracy on PM10 predictions. Large differences in particulate deposition rates despite similar theoretical approaches to deposition were attributed to differences in the characterization of surface properties and near-surface meteorology, with the fractional bias of the PM10 seasonal concentration varying by up to 60 % depending on which deposition module was used within a single model (Nopmongcol et al., 2012). Models with the highest deposition rates of PM2.5 were also found to have the most significant negative biases in PM2.5 concentrations. Model performance for PM10 was better in the summer months than in winter, with difficulties in the accurate simulation of very stable boundary layers in the winter being a possible cause of model prediction errors. Most models underestimated the amplitude of the diurnal cycle of PM10 as well as being biased low. The inorganic PM2.5 components were better simulated than the organics, demonstrating the ongoing problem with accurate simulations of organic aerosol. While the study did not examine or compare the models' individual chemical process parameterizations in detail, a conclusion of the work was that the details of those parameterizations play a pivotal role in model performance, despite similarities in the overall schemes employed.
A two-model intercomparison attempted to eliminate some of the sources of model prediction variability by prescribing additional model inputs aside from the meteorology. Smyth et al. (2009) used the same emissions inventory, emissions processing system, meteorological driver, North American domain and map projection to eliminate these factors as sources of possible differences between the two models compared (the Community Multiscale Air-Quality model, CMAQ4.6, and A Unified Regional Air-quality Modelling System, AURAMS1.4.2). Despite these similarities, some significant differences in model performance were noted. AURAMS had a normalized mean bias (NMB) for hourly O3 that was less than half of that for CMAQ (21 % versus 46 %, respectively), while both models had similar normalized mean errors (NME; 47 % versus 54 %). The larger NMB errors for CMAQ were shown to be related to its inability to predict the observed night-time O3 minima. Both models' PM2.5 predictions were biased low (AURAMS: −10 %; CMAQ: −65 %), though both had similar NME PM2.5 scores, with much of the reduced PM2.5 bias in AURAMS being the result of high sea salt predictions in this model. Both models underpredicted the organic fraction of PM2.5. The study noted the potential difficulties in the systematic assessment of individual chemical and physical processes on the model results, due to the complexity and interconnected nature of those processes.
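As an illustration of the two measures quoted above, the following minimal Python sketch computes NMB and NME for paired model and observed values. It assumes the commonly used definitions (the sum of model-minus-observation differences, or of their absolute values, divided by the sum of the observations, in percent); the function name and the sample values are hypothetical and are not taken from Smyth et al. (2009).

```python
def nmb_nme(model, obs):
    """Return (NMB, NME) in percent for paired model/observation values.

    Assumed definitions (not quoted from the paper):
      NMB = 100 * sum(M - O) / sum(O)
      NME = 100 * sum(|M - O|) / sum(O)
    """
    if len(model) != len(obs):
        raise ValueError("model and obs must be paired")
    diffs = [m - o for m, o in zip(model, obs)]
    total_obs = sum(obs)
    nmb = 100.0 * sum(diffs) / total_obs
    nme = 100.0 * sum(abs(d) for d in diffs) / total_obs
    return nmb, nme


# Made-up hourly values for illustration only
obs = [20.0, 30.0, 10.0, 40.0]
model = [25.0, 27.0, 16.0, 42.0]
nmb, nme = nmb_nme(model, obs)
```

Because positive and negative differences cancel in NMB but not in NME, two models can have similar NME values while differing strongly in NMB, as in the comparison described above.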
The above body of work demonstrates both the value of model evaluation and intercomparisons and the corresponding difficulties. Conducting multi-model studies requires a considerable investment in the preparation and evaluation of model fields. A common finding of studies such as those described above is that differences in the performance between the different models lie in their process parameterizations, yet process-level studies comprise an additional level of complexity, and are consequently not always part of large-scale multi-model comparisons with observations. However, when significant differences are found between models employing harmonized input fields, process-level evaluations may provide valuable information on the reasons underlying model performance. In the work that follows, we describe a process-level comparison of the CMAQ and AURAMS models on a more limited regional domain. The emissions inventories, emissions processing system, model domain, map projection and the driving meteorology were held in common for the two models, allowing two key factors in model performance to be identified: the accuracy of inputs used to create model emissions, and the models' parameterizations and assumptions regarding turbulent diffusion.

Model description
A detailed description of the two models may be found in Smyth et al. (2009), and updates to the AURAMS model (Gong et al., 2006) subsequent to that time may be found in Kelly et al. (2012). CMAQ v4.6 is also described in Pleim et al. (2006). Here, we note some of the main features of the two models and the framework used for comparison, with reference to Table 1.
Both models made use of meteorology from the Global Environmental Multiscale weather forecast model (GEM, v3.2.2; Côté et al., 1998); GEM simulations were carried out on a rotated latitude/longitude grid, for the period 15 July through 15 August 2005, in a series of overlapping 30 h simulations for a North American domain, with 0.1375 degree horizontal grid spacing (approximately 15 km), starting from model analysis files at 16:00 local standard time (00:00 UTC) on each day. The first six hours of each of these simulations were discarded as "spin-up" in order to allow the model's cloud variables to reach a steady state. The remaining hours (6 through 30) were retained for use as a continuous sequence of air-quality model meteorological input. Two days of spin-up time were employed in the meteorological models, this being sufficient for air parcels starting at the upwind boundary to cross the downwind boundary of this relatively small simulation domain.
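The construction of a continuous meteorological sequence from the overlapping runs can be sketched as follows. This is a minimal illustration under stated assumptions rather than the processing code used in the study: the function name is hypothetical, runs are assumed to start every 24 h, and at the hour where two runs overlap the newer run's hour 6 is assumed to be used in preference to the older run's hour 30.

```python
SPINUP_H = 6       # hours discarded at the start of each run (from the text)
RUN_LENGTH_H = 30  # length of each GEM run in hours (from the text)
CYCLE_H = 24       # assumed: one new run per day, at 00:00 UTC

# Retained hours of consecutive runs must meet for the sequence to be gap-free
assert SPINUP_H + CYCLE_H <= RUN_LENGTH_H


def source_run(valid_hour):
    """Map hours since the first run's start to (run_index, forecast_hour).

    Assumption: at a boundary hour covered by two runs, the newer run's
    hour 6 is chosen rather than the older run's hour 30.
    """
    if valid_hour < SPINUP_H:
        raise ValueError("no retained forecast covers this hour")
    run_index = (valid_hour - SPINUP_H) // CYCLE_H
    forecast_hour = valid_hour - run_index * CYCLE_H
    return run_index, forecast_hour
```

For example, `source_run(29)` selects hour 29 of the first run, while `source_run(30)` switches to hour 6 of the second run.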
The meteorological files were interpolated to the 12 km grid spacing air-quality model domain (Fig. 1a, inset white region, and Fig. 1b, which also shows the air-quality model grid along with observation network stations). The domain encompasses the coastal north-western USA and coastal south-western Canada. Several unique features of this domain should be noted, as described in more detail in previous work by Steyn et al. (2013) and Ainslie et al. (2013): (1) unlike many locations in North America, the upwind boundary condition of the domain consists of relatively "clean" air associated with trans-Pacific transport; (2) the terrain is mountainous, and previous work (Brook et al., 2004) suggests recirculation events in which aged air carried aloft with upslope flow is returned to the surface over the ocean, allowing accumulation of pollutants; and (3) the boundaries between NOx and volatile organic compound sensitivity in the region have been changing over time, indicating that the region contains markedly different chemical regimes, depending on location (Ainslie et al., 2013).
While the models employ the same horizontal domain map projection and grid, they differ in their vertical coordinate and number of levels. AURAMS uses a Gal-Chen coordinate system with 27 layers and a model top at 30 km, while CMAQ uses a sigma coordinate system with 15 layers and a model top at approximately 15 km. The thickness of the model layers differs, but tests in which the AURAMS layer thicknesses were imported into CMAQ had a negligible impact on CMAQ performance. Both models make use of their default boundary conditions; for AURAMS1.4.2, these vary with season for some species, and for O3 an adjustment of climatological boundary conditions in response to the local tropopause height is employed (Makar et al., 2010). While the upwind boundary conditions of the models differ, they both describe relatively clean conditions, as is appropriate for the upwind condition of the domain. A comparison of the PM2.5 and O3 predictions of the models over the Pacific (upwind boundary condition) relative to the urban regions shows that the changes associated with urban local chemistry and dynamics are an order of magnitude greater than the variations that may be observed in the upwind boundary region of the model. The effects described below are thus the result of local changes in the models' respective responses to the emissions, rather than to upwind boundary conditions.
Gas-phase dry deposition velocities for CMAQ4.6 are computed using an electrical resistance analog model (M3Dry, version 1.8; Pleim et al., 2001) in a pre-processing step in CMAQ's Meteorology-Chemistry Interface Processor (MCIP) (Otte and Pleim, 2010), using the MCIP-based stomatal conductance parameterization. The use of this parameterization may degrade results relative to the use of WRF's stomatal parameterization (Otte and Pleim, 2010). A modified version of the MCIP preprocessor (Smyth et al., 2005, 2006a) was used to convert GEM meteorological files for CMAQ4.6 input. AURAMS gas-phase dry deposition also uses an electrical resistance analog, and is calculated according to Zhang et al. (2002). CMAQ uses the particle dry deposition scheme of Giorgi (1986); AURAMS uses the scheme of Zhang et al. (2001), which is also based on Giorgi (1986). CMAQ4.6 incorporates particle deposition as a boundary condition on vertical diffusion (Binkowski and Roselle, 2003; Binkowski and Shankar, 1995), with the species mass in each of three modes being deposited separately. As noted in Binkowski and Roselle (2003), "the impaction term is omitted for coarse mode particles in the moment dry deposition velocities" (page AAC 3-6), though this has been corrected in more recent CMAQ implementations (J. E. Pleim, personal communication, 2014). CMAQ does not incorporate particle settling between particle layers, though a settling term is included in the lowest layer deposition velocity calculation (J. E. Pleim, personal communication, 2014). AURAMS1.4.2 calculates particle settling for each of the 12 size bins of the model. In our AURAMS1 (base case) simulations the settling and deposition velocities are used to calculate mass fluxes between layers, with the settling (or deposition) velocity being used to determine the destination layer of the falling particles. In our remaining AURAMS simulations, a one-dimensional semi-Lagrangian advection approach is taken, with the settling and deposition velocities being used to determine the mass transport and new vertical distribution of the particle mass. The latter modification had a minor impact on model results.
Both models made use of the same land-use parameters provided by the GEM model, and both models made use of the same emissions inventories. Environment Canada 2006 and US Environmental Protection Agency (EPA) 2005 anthropogenic inventories were combined for this work, and both models made use of the Biogenic Emissions Inventory System (BEIS3.0.9) biogenic emissions algorithms and Biogenic Emissions Landuse Database version 3 (BELD3) landuse data (US EPA, 2007). The gas-phase chemical mechanism employed in AURAMS is the Acid Deposition and Oxidant Model, version 2 (ADOM-II) mechanism (Stockwell and Lurmann, 1989), while CMAQ4.6 made use of the 1999 version of the Statewide Air Pollution Research Center (SAPRC-99) mechanism (Carter, 2000a, b). The models have a similar particulate-matter chemical speciation; however, the particle size distribution in AURAMS makes use of a 12-bin sectional approach while CMAQ uses a three-mode modal approach. The emissions in both models thus had to be speciated for that model's chemical mechanism and particle size distribution.
Emissions for both models were generated using the Sparse Matrix Operator Kernel Emissions processing system (SMOKE; Houyoux et al., 2000; CEP, 2003). Emissions processing systems such as SMOKE make use of input emissions inventories which usually comprise annual emissions totals for different sources over a geopolitical region such as state/province/county/municipality. These annual values are distributed within the geopolitical region using spatial disaggregation data: gridded maps of the expected spatial distribution of pollutants, derived from surrogate fields believed to reflect the distribution of the emitting activities. The emissions are also required by the models on an hourly basis; hence the annual emissions must also be distributed over time. Temporal allocations are required to split the annual emissions into month-of-year, day-of-week within each month and hour-of-day within each day. The accuracy of the gridded emissions used as model input will depend on the extent to which these assigned spatial and temporal fields accurately reflect the true temporal and spatial distributions, as well as on the annual total geopolitically distributed emissions. Unfortunately, the available spatial surrogate fields are severely outnumbered by the number of emitting activities, with thousands of source types typically being represented by a few hundred surrogates (here, a total of 170 surrogates were used). Similarly, the temporal profiles used for emitting activity are often best-guess approximations which are not based on observed monthly/day-of-week/hour-of-day emissions for any given emitting activity to which they are assigned. The assignments for spatial and temporal disaggregation of annual emissions are of crucial importance in determining the resulting model accuracy for circumstances when the spatial and temporal distribution of emissions have a significant impact on local concentrations (i.e. close to the sources as opposed to further downwind). The impact of the choice of spatial and temporal disaggregation data is examined in several scenario simulations in Sect. 4.2.
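The month-of-year, day-of-week and hour-of-day splitting described above can be illustrated with a short Python sketch. It is a simplified stand-in and not SMOKE's algorithm or file format: the function name, the way the weekday weights are normalized within a month, and all profile values are assumptions made for illustration.

```python
import calendar


def hourly_emissions(annual_total, month_profile, weekday_profile,
                     hour_profile, year, month, day):
    """Return 24 hourly emissions for one date from an annual total.

    month_profile:   12 fractions summing to 1 (January first)
    weekday_profile: 7 relative weights (Monday first)
    hour_profile:    24 fractions summing to 1
    """
    month_total = annual_total * month_profile[month - 1]
    n_days = calendar.monthrange(year, month)[1]
    # Weight of every day in the month, so daily totals sum to month_total
    weights = [weekday_profile[calendar.weekday(year, month, d)]
               for d in range(1, n_days + 1)]
    day_total = month_total * weights[day - 1] / sum(weights)
    return [day_total * fraction for fraction in hour_profile]


# Made-up profiles for illustration only
flat_months = [1.0 / 12.0] * 12
weekday_weights = [1.0, 1.0, 1.0, 1.0, 1.0, 0.8, 0.6]
flat_hours = [1.0 / 24.0] * 24
july_total = sum(
    sum(hourly_emissions(1200.0, flat_months, weekday_weights,
                         flat_hours, 2005, 7, day))
    for day in range(1, 32)
)
# july_total recovers the July share of the annual total (1200 / 12),
# up to floating-point rounding
```

The sketch makes the dependence explicit: the annual total fixes only the integral, so any error in the assumed profiles moves emissions between hours without changing the inventory total.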
The models differ in the approach taken for vertical diffusion. AURAMS uses diffusion coefficients for heat and moisture from the driving meteorological model along with a fully implicit Laasonen approach for the discretization of the diffusion equation (cf. Richtmyer, 1994). CMAQ4.6 calculates diffusion coefficients based on the driving meteorological model's values for the temperature, wind speed, total liquid water content, specific humidity, surface pressure, friction velocity and height of the boundary layer (Pleim, 2007a, b). The underlying boundary layer model (Asymmetric Convective Model, version 2; ACM2; Xiu and Pleim, 2001) includes a nonlocal transport component for unstable conditions (J. E. Pleim, personal communication, 2014). Numerical solution of the diffusion equation is carried out in CMAQ using the Crank-Nicolson discretization (cf. Richtmyer, 1994). AURAMS also includes a Crank-Nicolson algorithm option; its use did not significantly affect the AURAMS results. Both models subsequently employ a lower limit to their diffusion coefficients, with this "floor" in diffusion in AURAMS in our initial simulations being set to 0.1 m2 s−1, and to 1.0 m2 s−1 in CMAQ4.6. The choice of a specific lower limit has a significant impact on [...] 2006). Other options available for the use of this version of CMAQ include using higher values of the diffusion coefficient lower limit over urban areas (2.0 m2 s−1) and lower values over rural areas (0.5 m2 s−1). More recent versions of CMAQ (5.0.1) use a linear function of urban land-use area fraction for the lower limit of eddy diffusivity, with 0.01 m2 s−1 employed for entirely rural areas and 1.0 m2 s−1 for entirely urban areas. These recent changes to CMAQ were implemented in recognition of the fact that the rural minimum should not exceed that of the driving meteorological model, whereas the use of a higher urban minimum may be necessary if the driving meteorological model is not capable of accounting for the turbulence-enhancing effects of the urban environment.
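The two strategies for the diffusivity floor can be written compactly. The sketch below is illustrative only and is not taken from either model's source code; the function names are hypothetical, and the linear form simply interpolates between the rural and urban end values quoted above.

```python
RURAL_KZ_MIN = 0.01  # m2 s-1, entirely rural grid cell (value quoted in the text)
URBAN_KZ_MIN = 1.0   # m2 s-1, entirely urban grid cell (value quoted in the text)


def land_use_kz_floor(urban_fraction):
    """Lower limit of eddy diffusivity, linear in the urban area fraction."""
    if not 0.0 <= urban_fraction <= 1.0:
        raise ValueError("urban_fraction must lie between 0 and 1")
    return RURAL_KZ_MIN + (URBAN_KZ_MIN - RURAL_KZ_MIN) * urban_fraction


def apply_floor(kz, floor):
    """Return the diffusivity after imposing the lower limit."""
    return max(kz, floor)


# A weakly mixed night-time value (made up) under the different floors
kz_met = 0.05                                        # m2 s-1
kz_base_aurams = apply_floor(kz_met, 0.1)            # constant floor, AURAMS base case
kz_base_cmaq = apply_floor(kz_met, 1.0)              # constant floor, CMAQ4.6
kz_rural = apply_floor(kz_met, land_use_kz_floor(0.0))
kz_urban = apply_floor(kz_met, land_use_kz_floor(1.0))
```

With a constant floor every stable grid cell is raised to the same value, whereas the land-use-dependent floor leaves rural cells at the meteorological model's value whenever that value exceeds 0.01 m2 s−1.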
Here, the impact of these strategies was investigated in a set of scenario simulations. The model results were evaluated using hourly O3 and PM2.5 data from four monitoring networks (Air Quality System, AQS; Canadian Air and Precipitation Monitoring Network, CAPMoN; Clean Air Status and Trends Network, CASTNET; and National Air Pollution Surveillance program, NAPS). CMAQ model values are hourly averages, while AURAMS output, available on a 15 min time step, was averaged to hour-ending hourly values for comparison to the observations. Station locations are shown in Fig. 1b, with five stations in the Lower Fraser Valley in Fig. 1c. The Lower Fraser Valley contains a large proportion of the population of the Canadian province of British Columbia; portions of our analysis examine model performance in this sub-region in detail. An analysis package using the R programming language (R Development Core Team, 2010) was created for model evaluation, making use of the "openair" R package (Carslaw and Ropkins, 2011). The output package of AURAMS1.4.2 includes output at station locations during model run time, while CMAQ output was derived from output netCDF files using the work of Pierce (2010). Visualization packages utilized in creating the graphical display of analysed fields included hexbin (Carr et al., 2010) and Lattice (Deepayan, 2008).
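The averaging of 15 min model output to hour-ending values can be sketched as below. The evaluation in the study was carried out in R; this Python fragment is only an illustration, and it assumes that the input series begins with the first 15 min value of an hour and that a trailing incomplete hour is dropped.

```python
def hour_ending_means(values_15min):
    """Average consecutive groups of four 15 min values into hourly means.

    Assumptions: values_15min[0] is the first 15 min value of an hour,
    and any trailing incomplete hour is dropped.
    """
    n_hours = len(values_15min) // 4
    return [sum(values_15min[4 * h:4 * h + 4]) / 4.0 for h in range(n_hours)]


# Made-up concentrations: two complete hours plus one stray value
hourly = hour_ending_means([1.0, 2.0, 3.0, 4.0, 10.0, 10.0, 10.0, 10.0, 5.0])
```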

Model simulations
Eleven model simulations were carried out, in order to evaluate the impact of improvements to model algorithms, improvements in and sensitivity to emissions inputs, and the impact of changes to the value of the lower limit for eddy diffusivity (Table 2). The first two of these are unmodified CMAQ4.6 and AURAMS1.4.2 simulations: the "base case" scenarios (CMAQ1 and AURAMS1). As will be noted below, these scenarios showed a marked difference between the models with regard to their performance for PM2.5 and O3. The base case scenarios are followed by several process and emissions input related scenarios: AURAMS1b, a scenario in which several process improvements were added to the AURAMS model and evaluated as a package; CMAQ2 and AURAMS2, in which the impact of improved emissions data was evaluated using both models; and six subsequent AURAMS simulations (AURAMS3 through AURAMS8), which investigated the AURAMS model sensitivity to further emissions changes and different strategies for the use of a lower limit in diffusivity than was used in the base case model. These scenarios and the rationale for their execution will be described below.

Initial comparison and analysis
The statistical measures used in our analysis are presented in Table 3. The resulting analyses of the base case O3 and PM2.5 simulations from each model are summarized in the first two columns of Table 4. Table 4a, b show the statistical scores for the entire grid, and Table 5 shows the PM2.5 scores for the five stations in the Lower Fraser Valley. The initial results showed a substantial difference in model performance: AURAMS1.4.2 outperformed CMAQ4.6 for hourly ozone for the entire-grid statistics (Table 4a), for all Canadian stations aside from tying with CMAQ4.6 for correlation coefficient (not shown), and for the majority of the statistical metrics for the Lower Fraser Valley (not shown). This is in contrast to the earlier North American domain comparison by Smyth et al. (2009), where AURAMS outperformed CMAQ for O3 mean bias, normalized mean bias, mean error and normalized mean error, but CMAQ outperformed AURAMS for correlation coefficient. Previous work with CMAQ for simulations in the Lower Fraser Valley region for a 12-day period in August 2001 had significantly better O3 performance for normalized mean bias (NMB) and normalized mean error (NME) than found here (Smyth et al., 2006b: 13 % and 51 %, respectively, versus 75 % and 82 % in the current work). CMAQ simulations by Steyn et al. (2013) for the region for specific short episodes in 2006, 2001, 1995 and 1985 reported NME values ranging from 43 to 79 % (compare to 53 to 81 % in the different simulations of the current work) and NMB from −12 to 64 % (compare to 31 to 75 %). Different meteorological drivers, emissions inventories and domains were used for these studies compared to the more recent work, and these may account for some of the differences in statistics, as may the shorter time periods used in these earlier studies.
PM2.5 scores in the current work were mixed, with CMAQ4.6 outperforming AURAMS1.4.2 across the grid (Table 4b) for minimum, y intercept, correlation coefficient, mean absolute error, mean squared error, root mean squared error and normalized mean error, and AURAMS outperforming CMAQ for mean, maximum, slope, mean bias and normalized mean bias. CMAQ outperformed AURAMS for PM2.5 at Canadian stations for all scores aside from maximum and slope (not shown), while the Lower Fraser Valley performance (Table 5) was mixed, with scores split between the models.
An examination of time series of O3 and PM2.5 at the Vancouver International Airport station (Fig. 2 depicts a portion of the total time series for clarity; the depicted model behaviour occurs throughout the simulation period) shows the marked differences between the models in comparison to observations, as well as providing a potential physical and chemical explanation for the differences. CMAQ4.6 tended to overpredict daytime O3 maxima and invariably created a night-time secondary maximum in O3 that is absent in the observations (Fig. 2a). AURAMS' O3 time series more closely followed observations than those of CMAQ, though night-time minima were sometimes lower in the model than in the observations. The relative performance of the models is clearly reversed for PM2.5 (Fig. 2b), with both models usually capturing the timing of the night-time peak PM2.5 levels, but with AURAMS greatly overestimating their magnitude relative to CMAQ.
The timing of the two models' respective positive biases in ozone and particulate matter helps explain these results. Both the CMAQ secondary ozone maxima and the AURAMS PM2.5 over-predictions occur at night. In an urban region at night, the dominant ozone chemical process is usually the destruction of ozone through titration by NO. The predicted surface concentrations of NO were higher in AURAMS than in CMAQ. The composition of PM2.5 at night in an urban location can be expected to be dominated by the primary components of particulate matter, given that the oxidation processes that lead to secondary aerosol formation dominate during the day. This was confirmed via a check of the time series for AURAMS' speciated PM2.5 for the same period as Fig. 2 (not shown): the primary PM2.5 species dominated the PM2.5 mass during these periods of high positive PM2.5 bias. Given that emissions levels of primary PM2.5 and NO were the same for both models, these results in turn implied that a difference in transport was the cause of the model differences.
While both models make use of the same wind fields, they differ significantly in their approach to vertical diffusion. Different numerical methods and lower limits for diffusivity are used, as noted above. The diffusion coefficients used by the models prior to the application of the lower limits in diffusivity also differed: AURAMS1.4.2 makes use of the diffusion coefficients provided by the driving meteorological model GEM, while CMAQ4.6 recalculates diffusion coefficients internally using other fields from the driving meteorology. In other work, the diffusion coefficients generated by the CMAQ algorithm prior to the lower-limit truncation were found to be similar in magnitude to those of the GEM weather forecast model (Kelly et al., 2012). The main remaining difference between the two base models was thus the magnitude of the assumed lower limit for the diffusivity coefficients and the strategy used for assigning those lower limits.
An AURAMS sensitivity test was conducted for one selected day during the study period to determine the impact of the magnitude of the lower limit in diffusivity on the model results, with a value of 1.0 m2 s−1 being used in AURAMS. The results of this test were dramatic and are shown in Fig. 3. The use of the higher diffusion coefficient cut-off halved the AURAMS NOx and PM2.5 maxima and resulted in higher night-time O3 levels. The test thus confirmed that the main cause of the differences between the models was the use of a higher value for the minimum diffusion coefficient in CMAQ.
The use of a higher level of diffusion than predicted by meteorological models is intended to compensate for specific inadequately modelled aspects of turbulence and transport, such as subgrid-scale flows through complex urban topography and turbulence induced by urban heat islands. Recent work with CMAQv5.0.1 using a linearly interpolated cut-off (between 0.01 m2 s−1 in rural areas and 1.0 m2 s−1 in urban areas) showed marginally worse performance when the high cut-off values in urban areas were removed (Pleim and Gilliam, 2012). Our above analysis suggests that the use of a higher-than-realistic diffusivity lower limit to describe subgrid-scale turbulent mixing may result in degraded and unrealistic ozone performance at night and may influence positive biases in the ozone concentration on the following day. The use of a relatively high value for the diffusivity lower limit allows greater vertical mixing to occur at night, allowing the emitted NO to be distributed over a larger vertical volume, reducing O3 titration and allowing more O3 to be mixed downwards into the lower part of the model. These effects allow the morning ozone production on the subsequent day to start from a higher concentration than would otherwise be the case, which may in turn allow O3 to reach higher concentrations by the late afternoon. While this change in initial morning O3 levels may contribute to the difference in the model results for O3, it should be noted that this is not always the most significant factor, in that Fig. 2a shows that AURAMS and CMAQ sometimes have similar daytime O3 peak levels despite having very different O3 morning minima.
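The effect of an a priori lower limit can be illustrated with a minimal sketch. The function name and the night-time profile values below are hypothetical and are not taken from either model's code; only the two lower limits (0.1 and 1.0 m2 s−1) come from the text.

```python
def apply_diffusivity_floor(kz_profile, kz_min):
    """Return a copy of kz_profile (m2 s-1) with values below kz_min raised to kz_min."""
    return [max(kz, kz_min) for kz in kz_profile]


# Hypothetical stable night-time profile, listed from the surface upwards (m2 s-1).
kz_night = [0.02, 0.05, 0.3, 2.0, 5.0]

kz_aurams = apply_diffusivity_floor(kz_night, 0.1)  # AURAMS1.4.2 lower limit
kz_cmaq = apply_diffusivity_floor(kz_night, 1.0)    # CMAQ4.6 lower limit

print(kz_aurams)
print(kz_cmaq)
```

In this illustrative profile the larger limit replaces the three lowest values, which is the near-surface region where night-time NO and primary PM2.5 emissions are released.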
Given the difficulty in achieving good performance for both O3 and PM2.5 via the use of a large lower limit in diffusivity, our focus for the next stage of our analysis became the emissions. Most of the night-time PM2.5 predicted by the models is primary in origin (i.e. directly emitted), hence potential errors in emissions magnitude, timing or spatial distribution may also play a critical role in setting night-time PM2.5 concentrations. Consequently, below, we examine the emissions for our domain in some detail and conduct several tests to determine the impact of improvements to the emissions and of model sensitivity to emissions changes, in addition to the use of a lower limit in diffusion coefficient values. We then compare the results of the above tests to additional simulations making use of more recent methodologies for diffusivity, applying the same procedure for lower limits on diffusivity in AURAMS as is applied in CMAQ5.0.1.

Scenarios
The above work led to three levels of analysis and revisions to the emissions, with a focus on the Canadian emissions data with which the authors have the greatest familiarity. The first level ("Emissions 1") identified the top 20 emitting sources for PM2.5 and NO on the Canadian side of the domain. The temporal and spatial surrogate assignments for these sources were reviewed in detail to identify possible sources of PM2.5 positive biases (the sensitivity to the annual totals in the emissions inventories was not directly examined here). This identified errors in both spatial and temporal fields, described below, which were consequently corrected. The second level ("Emissions 2") repeated the above analysis, but for the top 50 emitters in the four grid squares comprising the urban core of the city of Vancouver. The reasoning underlying this second analysis was that many large sources of PM2.5 occur outside the urban core, hence the analysis of Emissions 1 may miss spatial and temporal allocation errors important for the urban regions where the errors have the greatest impact on the model positive biases. The third level ("Emissions 3") was to examine the impact of improving stack parameter information for primary PM2.5 emissions, for the specific sources in the four urban Vancouver grid squares. The details of these three stages of analysis are described below.

First-level emissions analysis: totals on the Canadian portion of the grid
Upon examining the top 20 annual sources of Canadian emissions, several deficiencies in temporal and spatial allocation were identified.

Temporal allocation
The links to these sources' monthly, weekly and diurnal temporal allocation fields were used to construct grid-total time series of emissions of PM2.5 and NO for the summer period simulated here, allowing the relative importance of the different sources on the Canadian side of the domain during the day to be determined. The temporal profiles for on-road mobile emissions were updated based on new measurement data (Zhang et al., 2011). The temporal profiles of 21 other activities were found to be inappropriate upon review. For example, charcoal grilling (residential and commercial) was assumed to have a "flat" profile, unchanging with month, day of week or hour, despite the seasonality of the residential portion of this activity and the absence of this activity in late night and early morning hours. Owing to this flat profile, this source was the second to fourth largest source of primary PM2.5 at night (depending on the hour). Wood stoves and furnace boilers, and fireplaces, were found to have a time-independent monthly profile (despite reduced heating energy needs in the summer, the time of the simulations of interest). Several activities (e.g. fertilizer application, land-spreading of manure, agricultural tractors, agriculture production) made use of simple sinusoidal diurnal temporal profiles with a positive offset from zero; hence late-night and early morning emissions of these daytime activities were non-zero. The temporal profile for fugitive dust emissions from paved and unpaved roads did not follow the known activity levels associated with mobile emissions (and the profile used for the former resulted in higher night-time emissions levels than that used for the latter). Marine vessel emissions were assumed to follow the temporal profile for railways. These inappropriate temporal allocation links were corrected:
1. The diurnal profile for charcoal grilling was revised to take commercial and residential activity levels into account, with zero emissions late at night and in the early morning hours.
2. The monthly profile for woodstoves/furnace boilers and for fireplaces was modified to take seasonal energy use into account.
3. The agricultural temporal profile was modified from a sinusoid with trough value greater than zero to a sinusoid which reached zero levels in the late evening/early morning.
4. Fugitive dust emissions from paved and unpaved roads were assumed to follow the same diurnal profile as mobile emissions activities.
5. Marine vessels were assumed to have a constant diurnal profile (this was a relatively minor change; the railway profile used earlier having been almost constant as well).
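The difference between an offset sinusoid and one whose trough reaches zero (correction 3 above) can be sketched as follows. The functional form, the peak hour, the offset value and the definition of "night" hours are all illustrative assumptions; the actual profiles used in the emissions processing are not reproduced here.

```python
import math


def diurnal_fractions(offset, peak_hour=13):
    """Hourly emission fractions (summing to 1) for a sinusoidal diurnal profile.

    offset > 0 keeps the night-time trough above zero (as in the original profiles);
    offset = 0 lets the trough reach zero (as in the corrected agricultural profile).
    peak_hour is a hypothetical choice, not a value taken from the inventory.
    """
    weights = [offset + 1.0 + math.cos(2.0 * math.pi * (h - peak_hour) / 24.0)
               for h in range(24)]
    weights = [max(w, 0.0) for w in weights]  # guard against round-off below zero
    total = sum(weights)
    return [w / total for w in weights]


# Hours treated as "night" for this illustration only.
NIGHT_HOURS = [22, 23, 0, 1, 2, 3, 4]

original = diurnal_fractions(offset=0.5)  # trough above zero
revised = diurnal_fractions(offset=0.0)   # trough reaches zero

night_share_original = sum(original[h] for h in NIGHT_HOURS)
night_share_revised = sum(revised[h] for h in NIGHT_HOURS)
print(f"night-time share of daily emissions: "
      f"{night_share_original:.3f} -> {night_share_revised:.3f}")
```

Because both profiles are normalized to the same daily total, removing the offset moves mass from the night-time hours to the daytime hours without changing the daily emissions.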

Spatial allocation
Six new spatial surrogates were generated for mobile emissions (Zhang et al., 2011). Four activities associated with the mining industry were found to be linked to spatial surrogates that had maxima in urban regions; these linkages were switched to an existing "total mining" surrogate which better reflected the location of actual mining activities in the domain (the original surrogate included mining head offices as "mining activities", resulting in emissions being allocated in urban Vancouver instead of the actual mining locations; see Fig. 4). Twenty-five spatial surrogates were improved through the incorporation of new Geographic Information System (GIS) fields for the Lower Fraser Valley.
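A minimal sketch of how a spatial surrogate distributes an inventory total over grid cells shows why the choice of surrogate matters. The cell names, surrogate weights and emission total are hypothetical, and the function is an illustration, not the emissions processing system used in this work.

```python
def allocate_by_surrogate(total_emissions, surrogate):
    """Distribute a regional emissions total over grid cells in proportion to surrogate weights."""
    total_weight = sum(surrogate.values())
    if total_weight <= 0.0:
        raise ValueError("surrogate weights must sum to a positive value")
    return {cell: total_emissions * weight / total_weight
            for cell, weight in surrogate.items()}


# Hypothetical surrogates: one that counts head offices as mining activity,
# and one based on the location of the mines themselves.
head_office_surrogate = {"urban_core": 8.0, "mine_a": 1.0, "mine_b": 1.0}
total_mining_surrogate = {"urban_core": 0.0, "mine_a": 3.0, "mine_b": 2.0}

mining_pm25 = 100.0  # hypothetical annual total, arbitrary units

before = allocate_by_surrogate(mining_pm25, head_office_surrogate)
after = allocate_by_surrogate(mining_pm25, total_mining_surrogate)
print(before)
print(after)
```

The regional total is unchanged in both cases; only its placement on the grid differs, which is what removes the spurious urban maximum.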

Temporal allocation
In a manner similar to the first-level analysis, a list of the top 50 annual emitters corresponding to four downtown Vancouver grid cells was generated. These were linked to monthly, weekly and diurnal temporal profiles and the resulting time series examined for accuracy with respect to the emitting activities. The resulting total emission time series for the nine largest of these sources is shown in Fig. 5. Four activities were found to be linked to profiles with no or minimal expected diurnal variation. Emissions from "other industry" were assumed to be time-invariant, despite the diurnal nature of most human activities. Asphalt emissions). "Concrete/gypsum/plaster products" and "bulk materials storage; all storage types; cement" were assumed to make use of the sinusoidal profile offset from zero mentioned above. All four of these sources were linked to a new diurnal profile which zeroed emissions during the night between 22:00 and 05:00 local standard time.

Spatial allocation
Two spatial allocation fields, "coal industry - coal cleaning" and "mining industry crawler/tractors", were found to have maxima in urban regions; a revised linkage to the new total mining surrogate was used to take into account the actual location of mining industries.

Third level of emissions investigation, specific point sources
For one of the grid squares in urban Vancouver, minor point sources dominate as a group for PM2.5 emissions, compared to major point, non-mobile area sources and mobile area sources. Only the operators of point sources with stack heights greater than or equal to 50 m are required under Canadian legislation to report stack parameters (height, diameter, exit temperature, exit velocity) associated with emissions to the Canadian National Pollutant Release Inventory (NPRI). Consequently, all stacks with elevations less than 50 m are treated as surface area sources within the Canadian portion of the domain, and the absence of plume rise in the subsequent vertical distribution of emissions may result in surface-level over-predictions of particulate matter. Point sources in the USA are available at lower heights, but a cut-off of 30 m is usually used to reduce the number of sources for which plume rise calculations are required.
Municipal-level reporting of stack parameters is, however, required for all sources in the Metro Vancouver jurisdiction.
For the four largest of these facilities, the original PM 2.5 emissions totals (NPRI, treated as area sources) were replaced with Metro Vancouver data that included stack parameters, allowing vertical redistribution of emissions to take place, as a sensitivity test on the predicted local PM 2.5 levels.

Sensitivity simulations: temporal emissions allocation versus two strategies for setting a minimum in diffusivity
A further analysis of PM2.5 emissions subsequent to the above changes examined urban diurnal profiles on the basis of four main emissions categories: major point sources, minor point sources, mobile area sources and non-mobile area sources. Non-mobile area sources dominated primary PM2.5 emissions (particularly in US cities, where the above Canadian emissions changes were not applied). While the temporal and spatial allocations of the largest of these sources were reviewed and improved in the above analysis, this was only carried out for the Canadian side of the grid. Also, many other area sources not included in the above analysis contribute to total emissions, and the non-mobile area sources in the USA were unaffected. The default diurnal profiles of total area source emissions from the processed emissions data typically showed either a time-invariant or an offset sinusoidal shape (i.e. temporal profiles consisting of a diurnal sinusoid with a positive offset contributed the bulk of the non-mobile area source emissions). In order to examine the relative importance of the diurnal variation in emissions from these sources at night, a sensitivity simulation was carried out: the PM2.5 emissions from all non-mobile area sources were modified using a smoothed square-wave function which reduced the emissions during the night and increased them during the day. The total mass emitted was preserved, yet proportionately less was emitted at night and more during the day. This scenario investigates the possibility that the emissions totals are correct, but that the timing of the emissions may overestimate the night-time component. Figure 6 compares the time series of grid total emissions of PM from these sources before and after this change. The intent of the simulation is to investigate the extent to which the diurnal emissions behaviour of non-mobile area sources may impact the resulting concentration predictions. This in turn highlights the relative importance of accurate temporal allocation information to model accuracy.
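A mass-preserving temporal renormalization of this kind can be sketched as below. The tanh-based square wave, its transition hours, its width and its night-time level are assumptions made purely for illustration; the exact function used in the sensitivity simulation is not given in the text.

```python
import math


def smoothed_square_wave(hour, night_level=0.2, day_start=6.0, day_end=20.0, width=1.0):
    """Weight near 1 during the day and near night_level at night, with smooth transitions.

    All parameter values are hypothetical.
    """
    rise = math.tanh((hour - day_start) / width)
    fall = math.tanh((hour - day_end) / width)
    return night_level + (1.0 - night_level) * 0.5 * (rise - fall)


def renormalize_diurnal(hourly_emissions):
    """Reshape 24 hourly emission values with the square wave while preserving the daily total."""
    weighted = [e * smoothed_square_wave(h) for h, e in enumerate(hourly_emissions)]
    weighted_total = sum(weighted)
    if weighted_total <= 0.0:
        raise ValueError("emissions must have a positive daily total")
    scale = sum(hourly_emissions) / weighted_total  # restores the original daily mass
    return [w * scale for w in weighted]


flat = [1.0] * 24  # a time-invariant default profile, arbitrary units
shifted = renormalize_diurnal(flat)
print([round(v, 2) for v in shifted])
```

The rescaling step is what distinguishes this test from a simple emissions reduction: any night-time decrease is balanced by a daytime increase.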
The impact of two different strategies for setting a lower limit for diffusivity was also examined. In the first of these, a lower limit for diffusivity of 0.6 m2 s−1 was applied throughout the AURAMS domain. In the second, the CMAQ5.0.1 strategy was employed, wherein the lower limit was set using a linear interpolation in land-use fraction, with values ranging between 1.0 m2 s−1 for completely urban grid squares and 0.01 m2 s−1 for completely rural grid squares. The land fractions were derived from the same BELD3 database used for biogenic emissions data (US EPA, 2007).
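The second strategy amounts to a linear interpolation between the rural and urban end-member values. A minimal sketch follows; the function name and the example urban fractions are hypothetical, while the end-member values are those quoted above.

```python
def land_use_kz_min(urban_fraction, kz_rural=0.01, kz_urban=1.0):
    """Lower limit for diffusivity (m2 s-1), linearly interpolated in urban land-use fraction."""
    if not 0.0 <= urban_fraction <= 1.0:
        raise ValueError("urban_fraction must lie between 0 and 1")
    return (1.0 - urban_fraction) * kz_rural + urban_fraction * kz_urban


# Hypothetical grid squares: fully rural, mixed, fully urban.
for fraction in (0.0, 0.5, 1.0):
    print(fraction, land_use_kz_min(fraction))
```

Under this scheme a grid square that is half urban receives a limit close to 0.5 m2 s−1, comparable to the spatially invariant 0.6 m2 s−1 of the first strategy, while rural squares retain a much smaller limit.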
In a final simulation, the changes in emissions temporal allocation described at the start of this subsection were combined with the CMAQ5.0.1 diffusivity minimum strategy. All of these sensitivity studies made use of the third level of emissions improvements described above as their starting point (AURAMS4).

Upgrades to AURAMS
Ongoing improvements to AURAMS during the course of this study included changing from the AURAMS default operator splitting setup (one step forward operator splitting) to centred operator splitting, eliminating an additional source of differences between CMAQ and AURAMS. This was found to have a significant impact on sea salt aerosol production, significantly reducing levels offshore. In addition, the particle dry deposition algorithm was upgraded to treat particle settling and deposition in a semi-Lagrangian approach, and conservation of column mass was enforced in the vertical diffusion algorithm through separation of the area emissions, diffusion and gaseous deposition into three different operators. A separate test of this suite of changes was conducted in order to determine their impact on model performance (AURAMS1b in the subsequent discussion).

Quantitative comparison of the impacts of the changes to the model and emissions
The above analysis led to nine model simulations in addition to the original base case. These scenarios are outlined in Table 2, with statistical results in Tables 4 and 5.
The hourly O3 and PM2.5 predictions from the above simulations were compared to observations as described above; summary tables of the statistical results for the entire grid are shown in Table 4a (O3) and b (PM2.5). The second column of the table shows observed mean, maximum and minimum values. The third and fourth columns show the results of the initial base case comparison with observations, with normal font showing the model with the lower score and bold font showing the model with the higher score. In the subsequent columns, the model results are compared to their respective base case simulation. Figures 7 and 8 show binned scatterplots of the model simulations of O3 and PM2.5 versus observations for the runs analysed in Table 4a and b.
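For reference, the statistical measures named in this section can be computed from paired observed and modelled hourly values as sketched below. The formulas are the commonly used definitions (normalized measures expressed as fractions of the summed observations, slope and intercept from a least-squares fit of model against observations) and are assumed here, since the exact formulations are not restated in this section; the sample values are hypothetical.

```python
import math


def evaluation_statistics(obs, mod):
    """Common model-evaluation statistics for paired observed and modelled values."""
    n = len(obs)
    diffs = [m - o for o, m in zip(obs, mod)]
    obs_mean = sum(obs) / n
    mod_mean = sum(mod) / n
    sxx = sum((o - obs_mean) ** 2 for o in obs)
    syy = sum((m - mod_mean) ** 2 for m in mod)
    sxy = sum((o - obs_mean) * (m - mod_mean) for o, m in zip(obs, mod))
    slope = sxy / sxx  # least-squares fit of model (y) against observations (x)
    return {
        "mean_bias": sum(diffs) / n,
        "mean_absolute_error": sum(abs(d) for d in diffs) / n,
        "root_mean_square_error": math.sqrt(sum(d * d for d in diffs) / n),
        "normalized_mean_bias": sum(diffs) / sum(obs),
        "normalized_mean_error": sum(abs(d) for d in diffs) / sum(obs),
        "slope": slope,
        "intercept": mod_mean - slope * obs_mean,
        "correlation": sxy / math.sqrt(sxx * syy),
    }


# Hypothetical paired values (e.g. ppbv), for illustration only.
obs_example = [10.0, 20.0, 30.0, 40.0]
mod_example = [12.0, 18.0, 33.0, 41.0]
stats = evaluation_statistics(obs_example, mod_example)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```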

Impact of AURAMS code improvements
The changes to AURAMS' code improved statistical scores for all O3 measures (Table 4a) except for the maximum and minimum O3, which saw a slight decrease, and the correlation coefficient, which was unchanged. Comparison of Fig. 7b and d shows a relatively minor impact on the overall scatter between observations and model values for these changes, with a more pronounced difference visible between the two models (e.g. Fig. 7a vs. b). Conversely, PM2.5 scores became worse with the exception of maximum PM2.5 and the slope: Fig. 8b and d suggest a slight increase in PM2.5 values. Despite the statistical differences noted, the impact of the model improvements on the visual appearance of the scatterplots was minor.

Impact of first-level emissions improvements
For CMAQ4.6, the improvements to the emissions had a mixed effect on the model results. Ozone scores for the mean, mean bias, mean absolute error, mean squared error, root mean square error, normalized mean bias and normalized mean error all improved relative to the base case, while performance was degraded for maximum, minimum, y intercept, slope and correlation coefficient. Figure 7a and c show the lower slope and increased y intercept noted in the table. CMAQ4.6 tended to underpredict the maximum O3 values (lower values on the y axis in Fig. 7c compared to a). All CMAQ4.6 PM2.5 scores were degraded with the use of the improved emissions, with the exception of the y intercept.
Comparing Fig. 8a and c suggests that one impact of the stage 1 emissions change was to decrease CMAQ4.6's ability to simulate PM2.5 maxima, which is reflected in the statistics. For AURAMS, the use of the first level of emissions improvements resulted in improvements for all O3 statistics except maximum and minimum. Figure 7d and e are broadly similar: the improvements to AURAMS' O3 predictions do not result in a substantially different scatter distribution. The statistical measures for AURAMS' PM2.5 with the stage 1 emissions improved relative to the base case with the exception of the minimum, mean absolute error and normalized mean error, all of which showed a slight degradation of performance. Differences in PM2.5 scatter for the stage 1 emissions are minor: a slight shift of the distribution to the right (compare Fig. 8b, d, e). Comparing the columns in Table 4a for AURAMS simulations to isolate the impact of the emissions improvements alone on that model, it can be seen that the O3 scores for slope and correlation coefficient have improved, while the other scores have degraded, but for PM2.5 all statistics with the exception of the minimum PM2.5 have improved.
The relative success of the first level of improved emissions data thus appears to be species and model dependent. The revised emissions had a mixed impact on CMAQ4.6's O3 performance, and degraded CMAQ4.6's PM2.5 performance over most statistics. For AURAMS (considering the impact of emissions alone), O3 performance was degraded slightly, while PM2.5 performance generally improved.

Impact of second-and third-level emissions improvements
The second level of emissions improvements (applied only to AURAMS; "AURAMS3" columns of Table 4a and b) results in further improvements to most O3 statistics, despite a reduction in performance for PM2.5 for statistics other than maximum, slope and correlation coefficient. The differences relative to the first level of emissions changes are difficult to distinguish visually (Figs. 7 and 8e, f). The third level of emissions improvements (applied only to AURAMS; "AURAMS4") showed no impact on O3 (as expected, since the final level of improvements was a sensitivity test applied only to primary PM2.5 emissions; hence Fig. 7f and g are identical). Changes to the PM2.5 statistics across the grid were relatively minor due to this test (as might be expected given that the emissions were modified in only four grid squares in urban Vancouver). However, differences in the outer envelope of the corresponding scatterplot (Fig. 8f and g) can be observed: the third-level emissions scenario changes the distribution for cases of high model overprediction.

Sensitivity simulation 1: impact of a domain-wide diffusivity cut-off
The application of a diffusion cut-off of 0.6 m2 s−1 ("AURAMS5") resulted in a degradation of AURAMS' O3 performance for all scores except for the correlation coefficient, while improving AURAMS' PM2.5 performance for all scores except for maximum, minimum, slope and correlation coefficient. The scatterplots for this simulation, Figs. 7 and 8h, are significantly different from the other scatterplots for AURAMS. For O3 (Fig. 7h), more of the points are clustered in the centre of the distribution, reflecting the improvement in statistics such as the RMSE. However, there are also many points along the y axis which are now in the hotter colours in Fig. 7h, indicating instances where the observed O3 was close to zero, while the modelled O3 was sometimes as high as 30 ppbv. These points correspond to cases of night-time underprediction of NO titration of O3, described earlier. The scatter for PM2.5 improved significantly, with the removal of many of the high values and a better distribution about the one-to-one line than any of the other simulations. As before, PM2.5 improvements via this approach came at the cost of O3 performance degradation.

Sensitivity simulation 2: impact of temporal renormalization of non-mobile area source emissions
Renormalizing the PM2.5 non-mobile area sources so that a smaller share of these emissions occurs at night ("AURAMS6") maintained O3 performance for all scores (similar to AURAMS4), while improving all scores for PM2.5 aside from the minimum and the slope (which was unchanged). The corresponding scatterplots (Figs. 7 and 8i) show some of the same behaviour as the previous run ("AURAMS5", Figs. 7h and 8h): the number of PM2.5 points with very high over-predictions has decreased and the distribution about the one-to-one line has improved, though not to the same extent as in the diffusion cut-off simulation. The two simulations above are compared relative to the base case AURAMS1 simulation in Fig. 9. One impact of using a higher diffusion cut-off for O3 (Fig. 9a) is an increase in the number of counts close to the y axis (i.e. O3 minima are increasing), while the temporal redistribution of emissions (Fig. 9b) results in both increases and decreases in low-level O3 predictions. The higher value for the lower limit in diffusivity causes PM2.5 to trend downward relative to the base case (Fig. 9c), while the redistribution of emissions has a more uniform distribution across the one-to-one line, with slightly greater counts below the line (Fig. 9d).

Sensitivity simulation 3: impact of a land-use-dependent diffusivity lower limit
The adoption of the land-use-dependent lower limit in diffusivity strategy within AURAMS ("AURAMS7") resulted in improvement to all O3 statistics, with the exceptions of the model maximum and mean (see Table 4a), relative to both the original model runs (AURAMS1) and the simulations with improved emissions data (AURAMS4). Similar improvements in PM2.5 performance were also seen for the mean, mean bias, mean absolute error, root mean square error, normalized mean bias, normalized mean error and y intercept. Model performance was degraded for the PM2.5 slope and correlation coefficient. Figures 7j and 8j show the corresponding scatterplots: the ozone positive bias at low concentrations has been reduced, relative to the spatially invariant lower limit in diffusivity simulations (Fig. 7h). The use of this strategy reduces the positive biases associated with the previous runs for PM2.5 (Fig. 8j), though the correlation along the one-to-one line has not improved.

Sensitivity simulation 4: combined emissions and land-use-dependent diffusivity lower limits
The land-use-dependent lower limit in diffusivity was combined with the renormalized non-mobile area source emissions scaling scenario in this simulation. The performance for this last simulation was found to be almost identical to that with the land-use-dependent lower limit in diffusivity alone; the performance improvements from the two scenarios were not additive. The scenario results imply that, while the temporal allocation of emissions can have a significant impact on model results (AURAMS6), that impact is decreased with increasing turbulence strength (AURAMS7 very similar to AURAMS8).

Model performance in the Lower Fraser Valley
The performance of the models for PM2.5 across the five Lower Fraser Valley stations is shown in Table 5. Here, the base case performance of the two models was mixed, with each model outscoring the other for 7 out of 14 statistical measures. The first level of emissions upgrades degraded CMAQ's performance, as was seen in the across-grid statistics of Table 4. The introduction of the 0.6 m2 s−1 diffusivity lower limit and the renormalizing of non-mobile area source emissions have a similar impact on model results as noted above, while the spatially invariant diffusivity lower limit degrades O3 performance for all measures except maximum and correlation coefficient (not shown). Example model time series for O3 and PM2.5 at a station in the Lower Fraser Valley are compared to observations in Figs. 10 through 12. The degradation in CMAQ4.6's O3 performance with the use of the first level of emissions upgrades is noticeable as increases in night-time O3 levels (compare, e.g., Fig. 2, minima on the night of 30 July). AURAMS' O3 maxima increase with the use of the first-level emissions change, while AURAMS' PM2.5 levels decrease, sometimes substantially (cf. night of 26 July, Fig. 10b). The subsequent levels of emissions changes have relatively little impact on O3 (Fig. 11a), though local reductions in PM2.5 continue (Fig. 11b).
Figure 12 compares the local impact of a cut-off in diffusion of 0.6 m2 s−1 (AURAMS5) with that of a reduction in non-mobile area source emissions at night. With the cut-off, night-time O3 levels are erroneously increased, and night-time PM2.5 levels are decreased. The shift in the timing of non-mobile area source emissions of PM2.5 (AURAMS6) has a minimal effect on O3, while night-time levels of PM2.5 decrease slightly (compare to AURAMS4). The use of a spatially varying diffusivity lower limit (AURAMS7) improves ozone performance in this urban location relative to a spatially invariant lower limit of 0.6 m2 s−1 (AURAMS5) and results in reductions in PM2.5 levels intermediate between the other scenarios in this figure and the simulation employing a spatially invariant lower limit in diffusivity (AURAMS4). As noted in the above statistical analysis, the AURAMS8 simulations were very similar to those of AURAMS7 and have not been plotted here.

Time series of model statistics by hour
The last five scenarios all made use of the stage 3 emissions as their starting point and are examined here in more detail. For these simulations, the hourly grid statistics at each local hour (across all days of the simulation) were calculated and plotted as time series (Figs. 13 and 14). From Fig. 13, the use of a spatially invariant lower limit in diffusivity of 0.6 m2 s−1 (AURAMS5, grey line) reduces the O3 correlation coefficient during the night while increasing it during the day (Fig. 13a); furthermore, it increases the O3 intercept at all hours (particularly at night, Fig. 13b), decreases the O3 slope at night (Fig. 13c) and increases the night-time O3 mean bias (Fig. 14a), mean absolute error (Fig. 14c), normalized mean bias (Fig. 14e), normalized mean error (Fig. 14g) and root mean square error (Fig. 14i). The PM2.5 correlation coefficient for this simulation has also decreased (sometimes halved, Fig. 13b), while the PM2.5 intercept improves in the early evening hours (Fig. 13d), and the slope has decreased by about 0.25 for most of the day (Fig. 13e). The use of the spatially invariant lower limit in diffusivity does, however, improve all the mean PM2.5 statistics throughout the day (Fig. 14b, d, h, j). Despite the improvements to mean PM2.5 statistics, the use of a larger spatially invariant diffusivity lower limit reduces O3 performance significantly, particularly at night, as well as reducing PM2.5 correlation coefficient and slope performance.
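The grouping by local hour described at the start of this subsection can be sketched as follows for a single statistic (mean bias). The record layout and the sample values are hypothetical; the same grouping would apply to any of the other measures.

```python
from collections import defaultdict


def mean_bias_by_hour(records):
    """Mean bias (model minus observation) at each local hour, pooled across all days.

    records is an iterable of (local_hour, observed, modelled) tuples.
    """
    bias_sums = defaultdict(float)
    counts = defaultdict(int)
    for hour, observed, modelled in records:
        bias_sums[hour] += modelled - observed
        counts[hour] += 1
    return {hour: bias_sums[hour] / counts[hour] for hour in sorted(bias_sums)}


# Hypothetical O3 pairs (ppbv) for two days at 02:00 and 14:00 local time.
sample_records = [
    (2, 5.0, 20.0),
    (2, 3.0, 10.0),
    (14, 40.0, 38.0),
    (14, 50.0, 48.0),
]
hourly_bias = mean_bias_by_hour(sample_records)
print(hourly_bias)
```

Pooling by hour in this way separates night-time behaviour from daytime behaviour, which a single grid-wide statistic would average together.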
Temporally renormalizing non-mobile area source emissions of PM2.5 (AURAMS6, black line) has a relatively small impact on O3 performance (AURAMS4 and AURAMS6 overlap). PM2.5 correlation coefficients decrease very slightly (Fig. 13b), the value of the intercept improves (Fig. 13f) and the PM2.5 slope decreases slightly, relative to the AURAMS4 or AURAMS7 simulations. The AURAMS6 results also show decreases in PM2.5 mean bias, mean absolute error, normalized mean bias, normalized mean error and root mean square error. The shift of primary PM2.5 emissions from night to day has resulted in an improvement of many of the statistics throughout the day, decreasing slope values slightly, while improving intercepts, and leaving correlation coefficients unchanged.
The AURAMS7 and AURAMS8 lines overlap to the extent that they are indistinguishable for most of the time series; AURAMS7 is shown in these figures (pink line). The night-time O3 correlation coefficient has increased slightly, while the daytime correlation coefficient decreases (Fig. 13a); the night-time O3 intercept improves slightly while the daytime values become worse (Fig. 13c), and a slight decrease in the O3 slope occurs in the early evening (Fig. 13e). The O3 night-time mean bias, mean absolute error, normalized mean bias, normalized mean error and root mean square error all improve relative to the AURAMS4 case, with a slight degradation of daytime O3 performance for mean bias. The PM2.5 results are similar to the AURAMS5 (spatially invariant diffusivity minimum) case, with correlation coefficient and slope decreasing and mean statistics improving. However, compared to the spatially invariant and higher magnitude diffusivity lower limit, the ozone performance is sometimes improved rather than degraded.

Discussion
The work described above suggests the following:
1. The choice of a larger magnitude and spatially invariant minimum cut-off in diffusivity may sometimes lead to insufficient titration of ozone at night, and/or mixing of higher-level ozone downwards, creating erroneously high O3 predictions at night and potentially resulting in higher O3 predictions during the day. When a higher cut-off in diffusivity was tested within AURAMS, PM2.5 scores were improved, but at the expense of degrading O3 scores, particularly at night. If model PM2.5 emissions are erroneously high, the use of a high diffusivity cut-off may compensate for these errors, lowering PM2.5. This suggests that hourly ozone performance should be used as another means of ensuring that compensating errors of this nature are not taking place.
2. The hypothesis that at least some of the PM2.5 prediction errors may result from errors in the emissions inputs has some merit. A series of tests on the model emissions, in which temporal and spatial allocation errors were corrected and changes in diurnal profiles were investigated, showed an improvement similar to that of a spatially invariant diffusion cut-off approach, without degrading O3 performance and in some cases improving it. This indicates that model performance may under some circumstances be as sensitive to the accuracy of the magnitude and the spatial and temporal allocation of the driving emissions data as to the parameterization of vertical mixing. The sensitivity of the model to the temporal allocation of emissions will also depend on the strength of vertical diffusivity, with the sensitivity decreasing with increasing diffusivity strength.
3. The use of a land-use-dependent lower limit in diffusivity similar to that employed by CMAQ5.0.1 resulted in improvements to night-time O3 and many PM2.5 statistics, though the slope of model PM2.5 versus observations was decreased, as was the correlation coefficient (which was sometimes half of its previous value, depending on the time of day). One concern about this approach was that the lower limit in diffusivity was being applied in AURAMS throughout the atmospheric column, hence possibly resulting in excessive diffusive mixing in the free troposphere and upper atmosphere. Two further sensitivity runs were carried out in which the upper extent of the region of enhanced "urban" diffusivity was limited to 2.17 km and 285 m, respectively, the former based on urban boundary layer simulations for New York City in Makar et al. (2006), and the latter based on Vancouver measurements from the same reference and on more recent observations and 250 m resolution simulations of the mixing height in Vancouver (Leroyer et al., 2014). Above those heights, the diffusivity minimum was set to 0.01 m² s⁻¹ in these additional simulations. The statistical performance of the model at the surface in these simulations was identical, to two or more decimal places, to that of the AURAMS7 run, similar to the comparison between AURAMS7 and AURAMS8. This suggests that the diffusivity minimum need only be applied up to the typical height of the mixed layer above urban regions; a typical maximum altitude can be employed without degrading surface performance, while avoiding increasing diffusivity in the free troposphere and above.
4. There are other factors which may act to reduce PM2.5 concentrations aside from temporal and spatial allocation. For example, fugitive emissions of PM2.5 are subject to land-use-dependent reduction factors to account for the very local-scale uptake of PM2.5 to vegetation, sometimes resulting in significant reductions from the inventory emissions levels for fugitive sources (cf. Pace, 2005). Similar local reduction/local availability factors may be worth considering for other PM2.5 sources.
5. We note that the accuracy of the relative magnitude of the emissions of different species is also important. For example, if the NOx emissions alone are currently underestimated, then the negative impact of a high value for the minimum diffusivity on O3 performance would be decreased.
6. At least some of the AURAMS PM2.5 over-predictions may still reside in vertical mixing, emissions or other issues: model values were still biased positively over all emissions improvement and sensitivity runs performed here, indicating that other processes are required to reduce PM2.5 levels.
7. It should be noted that the current work is limited, in that only emissions and diffusivity approaches were examined in detail as causes of the differences between the two models' results. For example, the models make use of different deposition parameterizations, and Nopmongcol et al. (2012) found that models with relatively high deposition rates for PM2.5 were biased low for their overall performance. While changes to the timing of primary emissions of PM2.5 were shown to potentially account for much of the difference between the two models, changes to the particle deposition velocity algorithms may account for the remaining positive bias in AURAMS and negative bias in CMAQ for PM2.5. This should be examined in future work. The model errors in general may also be reduced through adopting a higher resolution to better simulate the complex topography and urban turbulence in the region: Leroyer et al. (2014) found that circulation over the Vancouver urban area was best simulated at resolutions of 250 m, this resolution allowing the model to resolve urban up- and downdraughts. Also, while we have focussed on the Lower Fraser Valley in some of our analyses, the relative importance of the different processes may differ in other parts of the model domain.
8. Our work has focussed on the differences between the two models, but has important implications for the broader issue of explaining the causes for the formation of O3 and PM2.5 in urban and downwind environments and the relative importance of turbulence and emissions.
Our results suggest that the temporal allocation of emissions may be more important in stable atmospheres than previously expected, but also that this sensitivity is reduced with increasing turbulence in urban regions. Our results also suggest that discrepancies between simulated and observed night-time chemistry cannot be explained via increases in turbulence alone, in that PM2.5 correlation coefficients and slopes are still degraded with the best of the diffusivity lower-limit procedures tested here.
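The height-limited, land-use-dependent lower limit described in point 3 can be illustrated with a short sketch. The values 0.1, 1.0 and 0.01 m² s⁻¹ and the 285 m cap are quoted in the text, but their use here as rural and urban end points, the linear interpolation on urban fraction, and all function and parameter names are assumptions made for illustration only; this is not the AURAMS or CMAQ code.

```python
def kz_lower_limit(urban_fraction, height_m, kz_rural=0.1, kz_urban=1.0,
                   cap_height_m=285.0, kz_aloft=0.01):
    """Lower limit on vertical diffusivity (m2 s-1) for one grid cell and level.

    Assumptions (hypothetical, for illustration):
      - below cap_height_m the limit is interpolated linearly between
        kz_rural (urban_fraction = 0) and kz_urban (urban_fraction = 1);
      - above cap_height_m a single small value kz_aloft is used.
    """
    if not 0.0 <= urban_fraction <= 1.0:
        raise ValueError("urban_fraction must lie between 0 and 1")
    if height_m > cap_height_m:
        return kz_aloft
    return kz_rural + (kz_urban - kz_rural) * urban_fraction


def apply_lower_limit(kz_profile, heights_m, urban_fraction, **limits):
    """Raise each diffusivity in a column to at least its local lower limit."""
    return [
        max(kz, kz_lower_limit(urban_fraction, z, **limits))
        for kz, z in zip(kz_profile, heights_m)
    ]
```

For a fully urban column, `apply_lower_limit([0.02, 0.5, 3.0, 0.001], [20.0, 150.0, 250.0, 1000.0], 1.0)` raises the two weakest near-surface values to the urban limit, leaves the already large value untouched, and applies only the small aloft limit at 1000 m, which is the behaviour intended by capping the enhanced "urban" diffusivity at a typical mixed-layer height.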

Conclusions
The CMAQ (version 4.6) and AURAMS (version 1.4.2) models were compared, using a common horizontal map projection and grid spacing, a common set of meteorological inputs, and a common emissions inventory and emissions processing system, for a domain on the north-west coast of North America, for a one-month simulation for the summer of 2005. The initial model results were markedly different, with AURAMS having significantly better performance for O3 than CMAQv4.6, while CMAQ's performance for PM2.5 was better than that of AURAMS. One of the main factors leading to the differences was found to be the magnitude of the assumed lower limit in the coefficient of vertical diffusivity employed in each model, with the adoption of a higher value in AURAMS resulting in performance more like that of CMAQv4.6. Improvements in PM2.5 performance associated with a larger value of a spatially invariant minimum in eddy diffusivity were also associated with significantly degraded performance for O3. A subsequent investigation of emissions through improvements to spatial and temporal allocations and sensitivity tests showed that PM2.5 performance could be improved through emissions improvements, without degrading O3 performance. The use of a land-use-dependent lower limit in vertical diffusivity (similar to that used in CMAQ5.0.1) was found to improve night-time O3 performance and also PM2.5 performance for statistics other than the correlation coefficient and the slope (both of which were sometimes halved when this approach was adopted, depending on the time of day). The model results were shown to have a similar level of sensitivity to emissions' spatial and temporal allocation as to lower limits in vertical mixing for lower levels of turbulence. However, when urban vertical diffusivity was modified using the spatially varying lower limit, the model's sensitivity to emissions temporal allocation was greatly reduced.
The findings have important implications for our understanding of O3 and PM2.5 in urban environments. A spatially invariant lower limit in diffusivity was shown to be insufficient to explain the discrepancies between observations and simulations for these species. However, the choice of a lower limit on diffusivity must be made with care. A spatially varying lower limit in diffusivity improved several statistical scores, implying that accurate portrayal of urban turbulence is critical for model performance. At the same time, higher levels of minimum diffusivity in urban areas also resulted in decreases in PM2.5 correlation coefficients and slopes and may also mask errors in spatial and temporal allocation of PM2.5 primary emissions. We have found that the heretofore inadequately resolved timing and spatial allocation of PM2.5 primary emissions, particularly from the non-mobile area source sector, may have a considerable influence on PM2.5 concentrations. We therefore recommend improvements to both area source primary PM2.5 emissions data and urban turbulence parameterizations as foci for future measurement and modelling work.
These results should not be taken to imply that improvements to the model representations of turbulent mixing and/or other factors should be ruled out as a line of investigation for achieving improved model performance. Both emissions (timing, spatial distribution and magnitude) and the magnitude of turbulent diffusion were shown to be of potential importance here. Our results suggest that both processes are complementary routes for further model improvements. Model performance for both O3 and PM2.5 should be simultaneously evaluated in future work to ensure that improvements in one predicted species are not offset by degraded model performance in the other.
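The role of the temporal allocation of emissions can be made concrete with a small sketch of how a daily emissions total is distributed over 24 hours by a diurnal profile, and how scaling the night-time weights changes the hourly values. The sketch assumes, purely for illustration, that the daily total is conserved when the profile is altered; the scaling actually applied in the scenarios of this study is described earlier in the text and is not reproduced here. The function names, the flat example profile and the scaling factor are hypothetical; the night window (6 p.m. to 6 a.m. local standard time) follows the definition used in the figures.

```python
NIGHT_HOURS = tuple(range(18, 24)) + tuple(range(0, 6))  # 6 p.m. to 6 a.m. LST


def allocate_daily_emissions(daily_total, hourly_weights):
    """Split a daily emissions total into 24 hourly values.

    The weights are normalized, so the hourly values always sum to daily_total.
    """
    if len(hourly_weights) != 24:
        raise ValueError("expected 24 hourly weights")
    if any(w < 0 for w in hourly_weights):
        raise ValueError("weights must be non-negative")
    total_weight = sum(hourly_weights)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    return [daily_total * w / total_weight for w in hourly_weights]


def scale_night_weights(hourly_weights, night_scale, night_hours=NIGHT_HOURS):
    """Return a copy of the profile with night-time weights multiplied by night_scale."""
    return [
        w * night_scale if hour in night_hours else w
        for hour, w in enumerate(hourly_weights)
    ]
```

With a flat profile and a daily total of 240 units, every hour receives 10 units. Halving the night-time weights and re-allocating moves mass from night to day while leaving the daily total unchanged, which is the kind of change whose effect on night-time PM2.5 depends on the strength of vertical mixing.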

Figure 1. (a) GEM 15 km domain with boundary of CMAQ and AURAMS 12 km domain shown as inset; (b) 12 km Pacific and Yukon Region Domain, observation stations shown as green dots, background contours elevation; (c) four stations (out of 20 total) in the Lower Fraser Valley, elevation contours. A = Vancouver International Airport; B = Pitt Meadows; C = Abbotsford Airport; D = Chilliwack; E = Hope Airport.

Figure 2. Comparison between observations, CMAQ and AURAMS for (a) O3 and (b) PM2.5 at Vancouver Airport (station A in Fig. 1). Nights (6 p.m. to 6 a.m., local standard time (Pacific Standard Time)) shown as shaded regions.

Figure 4. Comparison of spatial surrogates (a) 212 (used previously for mining activities) and (b) 221 (used in Emissions 1, 2, 3 scenarios). Note high values of mining activity assumed in urban Vancouver in (a), absent in (b).

Figure 5. Temporal allocation of primary PM2.5 from top nine sources at night in downtown Vancouver.

Figure 6. Comparison of total PM emissions across model domain, original versus scaled (AURAMS6 scenario, see text).

Figure 10. Revised stage 1 emissions and model code compared to observations for (a) O3 and (b) PM2.5 at Vancouver Airport. Compare to Fig. 2.

Figure 12. Revised stage 3 emissions, diffusion cut-off of 0.6 m² s⁻¹ and temporally scaled non-mobile area source emissions compared to observations at Vancouver Airport for (a) O3 and (b) PM2.5. Compare to Figs. 2, 8 and 9.

Figure 14. As for Fig. 13, for mean bias, mean absolute error, normalized mean bias, normalized mean error and root mean square error.

Table 1. Comparison of the main features of the CMAQ and AURAMS models.

Table 2. Description of model scenarios.

Table 3. Statistical measures of model performance. N is the number of paired observed-model values, O is the mean observed value, M is the mean model value.

Table 4a. O3 statistics, entire grid (ppbv). Third and fourth columns: regular and bold fonts correspond to model with worse and better performance, respectively. Subsequent columns: regular font, italics and bold italics correspond to unchanged, worse and better performance, respectively, than the same model in the original comparison.

Table 5. PM2.5 statistics, Lower Fraser Valley stations. Font description as in Table 4a.