Reply on RC1

This article provides an incremental step forward compared to the Campbell et al. 2022 paper. Campbell et al. provides a huge step forward for the regional air quality modeling field. A major limitation for regional models has been coupling to existing available meteorology. Campbell et al. 2022 is a great addition that includes a similar NACC-CMAQ and GFS-CMAQ comparison, and this paper helps to strengthen evidence that the resulting modeling is credible. If I am correct, the previous Campbell et al. comparison did not isolate meteorology differences. This paper uses the NEIC 2016v1 emission inventory for both models (as well as GBBEPx and BEIS), which allows for a clearer isolation of meteorology. The meteorology still includes physics-parameterization, scale, input, and interpolation differences. The isolation of met is definitely a strength. The paper uses validation against FIREX-AQ and one month against surface observations. The weakness in this comparison is the focus on only summer months, which highlights ozone performance more than PM2.5.

Overall this is a good paper that characterizes model performance of an important configuration (GFS-CMAQ) and compares it to a more common application (WRF-CMAQ). Perhaps my one disappointment was that the time period for AirNow evaluation was very short and may not highlight issues under the variety of conditions where the model will be applied. I support the publication of this manuscript. Hopefully, the minor notes below can be incorporated.

Response:
Thank you for your encouraging comments. This manuscript aims to extend our previous paper by comparing this method with the prevailing WRF-CMAQ system. The overall results show that the two systems are similar, and their differences are mainly related to the meteorological models' dynamics/physics, not to the coupler. So the interpolation-based coupler is useful when the driving meteorological variables are sufficiently available. It is true that GFS has many more vertical layers than WRF; however, GFS extends to much greater heights, and the two have similar numbers of layers below 1 km. The GFS meteorology was also collapsed into the 35 layers used to drive CMAQ, so they are comparable to a certain extent, as shown in Tables 2 and 3 for altitudes below 3 km. Our other paper (Campbell et al., 2022) included a comparison with the previous version, GFSv15-CMAQ, showing that GFSv16-CMAQ and GFSv15-CMAQ can differ significantly over certain regions, mainly due to their different physics. We added some words about that.
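The layer collapsing mentioned above can be illustrated as a pressure-thickness-weighted average of fine source layers onto coarser target layers. This is only a minimal sketch, not the operational NACC code; the function name, interfaces, and the pressure values in the test below are invented for illustration:

```python
import numpy as np

def collapse_layers(var_src, dp_src, src_edges, tgt_edges):
    """Pressure-thickness-weighted collapse of a vertical profile.

    var_src:   values on fine source layers (e.g. GFS's native layers)
    dp_src:    pressure thickness of each source layer (Pa)
    src_edges: source layer-interface pressures, surface first (Pa)
    tgt_edges: target (e.g. 35-layer CMAQ) interface pressures (Pa)
    Assumes target interfaces coincide with a subset of source interfaces.
    """
    var_tgt = []
    for k in range(len(tgt_edges) - 1):
        # Source layers whose bottom/top interfaces fall inside target layer k
        # (pressure decreases upward, so "inside" means bottom <= target bottom
        # and top >= target top).
        mask = (src_edges[:-1] <= tgt_edges[k]) & (src_edges[1:] >= tgt_edges[k + 1])
        w = dp_src[mask]
        var_tgt.append(np.sum(var_src[mask] * w) / np.sum(w))
    return np.array(var_tgt)
```

Mass-mixing-ratio-like quantities are conserved under this kind of thickness weighting, which is why collapsed GFS meteorology remains comparable to WRF output on the shared 35-layer grid.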

Pg 7, Ln 37: for[ ]10
Changed.

Figure 2: Ideally, these would be based on local times. The 6UTC is about 1am on the east coast while the west coast is more like 10pm, and 18UTC is about 1pm on the east coast and 10am on the west. I know the authors are aware and likely have already considered the implications based on the PBL rise, but the reader won't be fully aware of the differences in model PBL development rates. The sharp rise and collapse noted by the authors raise questions. For example, the geographic differences could have something to do with the rates of rise and drop rather than the ultimate depths.
Yes, it is true that the selected times may not represent around-noon and around-midnight situations across the CONUS domain, since it is hard to find a one-size-fits-all time for the four time zones. Figure 2 only shows the normal daytime (nighttime) monthly-mean situation after sunrise (sunset), and 18UTC/06UTC are not within the transition windows of the sharp PBL rise and collapse around sunrise/sunset, so the selected times avoid the PBL's fast-changing periods. It is true that the PBL spatial variations are related to regional geographic differences. We added some explanation on page 7 about these issues. As mentioned in the manuscript, we use one-minute averaged flight data, and the models' hourly 12-km outputs are spatiotemporally interpolated to the flight paths for comparison. Yes, the NACC output was at hourly resolution.
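The spatiotemporal interpolation of hourly gridded output to the 1-minute flight samples can be sketched as trilinear interpolation in (time, lat, lon). This is only an illustration under invented dimensions and a random field; the actual evaluation uses the models' native 12-km projected grid:

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical hourly model output on a small regular lat/lon grid.
times = np.arange(0, 4, dtype=float)       # model output hours
lats = np.linspace(35.0, 45.0, 11)         # invented 1-degree spacing
lons = np.linspace(-110.0, -100.0, 11)
field = np.random.rand(len(times), len(lats), len(lons))  # e.g. ozone (ppbv)

# Trilinear interpolation in (time, lat, lon) to each 1-min flight sample.
interp = RegularGridInterpolator((times, lats, lons), field)
flight = np.column_stack([
    np.linspace(0.5, 2.5, 121),            # decimal hours along the flight
    np.linspace(36.0, 41.0, 121),          # aircraft latitude
    np.linspace(-108.0, -102.0, 121),      # aircraft longitude
])
model_on_track = interp(flight)            # one model value per observation
```

Because the interpolation is linear, the extracted values stay within the range of the surrounding grid cells, so no artificial extrema are introduced along the track.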

Pg 11, ln 2-4: Can you expand a bit here? Is the CMAQ-diagnosed W just as appropriate for the NACC interpolated values?
Expanded by adding Figure S2 and corresponding discussions.

Pg 11, ln 15-17: Is the ozone here VOC-limited? Or, why do you think this when the NOx is underestimated from 20:30Z to 23Z and ethene (a highly reactive VOC) is overestimated? Underestimated CO can actually increase the yield of HO2 per OH reaction.
Over some segments of the 07/22 flight with high NOx, ozone tends to be VOC-limited. The NOx underestimation for 20:30-23UTC can be seen in Figure 9, while ethene was slightly overestimated by about 0.2 ppbv and ethane was underestimated by about 1 ppbv. We removed CO from that sentence.
Pg 13, ln 29: nitpicky, but 25UTC should be 01UTC Aug 7
Added the 01UTC.

Pg 14, ln 31: I am interpreting non-fire events as all non-fire times. So, these aren't really events. In that way, I would avoid calling these "2019 events" since you are talking about overall statistics. Similarly, the concentrations of SO2 you are seeing are quite low; the mean biases are fractions of a microgram or ppb. Based on these magnitudes indicating ambient values, it seems plausible that this could indicate chemical lifetime or deposition errors too? BTW, most power plants have continuous emission monitoring, which leads to more certain emissions.
Changed "non-fire events" to "flight segments without fire influences". For the SO2 underestimation, you are right that power plant emissions could be the issue. To emulate forecast behavior, we did not use continuous emission monitoring data, which only becomes available after the events, but just the original NEIC 2016 point-source inventory. Some sources assumed to have shut down in the original inventory might still have been emitting pollutants during the flight observations, leading to the disagreement.
pg 15, ln 1-23: The observed NOz is about 1.63 ppb while the sum of measured HNO3 and PAN is 0.69 ppb. True, the PAN bias is very low (-0.22 or -0.25 ppb), but clearly this is disproportionate to the NOz low-bias (-0.96 ppb). Per Ryerson et al. (1998, https://agupubs.onlinelibrary.wiley.com/doi/10.1029), in this NOy instrument, "aerosol transmission is not characterized, but inlet design and orientation probably discriminates against the majority of aerosol by mass".
The particulate nitrate ion was also underestimated, but its precursor HNO3 was overestimated. As discussed in the next paragraph, this issue should be related to the underestimation of cations, like NH4+, which caused the shift of the gas-aerosol equilibrium partitioning of nitrate.

Pg 15, ln 25-30: Can you discuss where the errors may originate like you did with NOz, NH4+, and nitrate?
We expanded some. Please check the revised manuscript.
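For reference, the NOz budget arithmetic quoted in the comment above can be checked with a short sketch, using only the ppb values already stated there:

```python
# NOz budget check using the ppb values quoted in the reviewer comment.
noz_obs = 1.63           # observed NOz (NOy - NOx)
hno3_plus_pan = 0.69     # sum of measured HNO3 and PAN
noz_bias = -0.96         # model NOz low-bias
pan_bias = -0.22         # model PAN bias (one of the two quoted values)

# Fraction of observed NOz accounted for by HNO3 + PAN alone:
accounted = hno3_plus_pan / noz_obs      # ~0.42
# PAN's share of the NOz low-bias:
pan_share = pan_bias / noz_bias          # ~0.23
print(f"HNO3+PAN explain {accounted:.2f} of NOz; PAN is {pan_share:.2f} of the bias")
```

These ratios quantify the reviewer's point: more than half of the observed NOz, and most of its low-bias, lie in species other than PAN, which is consistent with instrument-sampling and aerosol-partitioning explanations.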
Thank you again for your comments.