This work describes the lightning nitric oxide (LNO) production schemes in the Community Multiscale Air Quality (CMAQ) model. We first document the existing LNO production scheme and vertical distribution algorithm. We then describe updates that were made to the scheme originally based on monthly National Lightning Detection Network (mNLDN) observations. The updated scheme uses hourly NLDN (hNLDN) observations. These NLDN-based schemes are suited to retrospective model applications for which historical lightning data are available. For applications in which observed data are not available (e.g., air quality forecasts and climate studies that assume similar climate conditions), we have developed a scheme (pNLDN) that is based on linear and log-linear parameters derived from regression of multiyear historical NLDN observations and meteorological model simulations. A preliminary assessment of total column LNO production reveals that the mNLDN scheme overestimates LNO by about 40 % during warm months (May–September) compared with the updated hNLDN scheme, which reflects the observed lightning activity more faithfully in time and space. The pNLDN performance varies with year, but it generally produced LNO columns that are comparable to hNLDN and mNLDN, and in most cases it outperformed mNLDN. Thus, when no observed lightning data are available, pNLDN can provide reasonable estimates of LNO emissions over time and space for this important natural NO source that influences air quality regulations.

Lightning nitrogen oxide (LNO) is produced by the intense heating of air
molecules during a lightning discharge and subsequent rapid cooling of the
hot lightning channel (Chameides, 1986). Since NO and

To simulate the amount of LNO production in space and time in a chemical transport model (CTM), it is important to know the following: (1) where and when lightning flashes occur, (2) the amount of LNO produced per flash and (3) how LNO is vertically distributed. Historically, the lightning flash rates are derived with the aid of parameterizations in CTMs (Price and Rind, 1992; Allen et al., 2000, 2010, 2012; Barthe et al., 2007; Miyazaki et al., 2014). Various schemes have been developed for determining LNO production per flash based on assumptions regarding LNO production efficiency per flash or the energy ratio of cloud-to-ground (CG) flashes to intra-cloud (IC) flashes (Schumann and Huntrieser, 2007). The derived parameterizations based on theoretical analysis (e.g., Price et al., 1997), laboratory studies (Wang et al., 1998), limited aircraft or satellite observations, or a combination of these methods are generally too simplified, have large uncertainties (Miyazaki et al., 2014) and cannot adequately represent the regional and temporal variability of lightning activity (Boccippio et al., 2000; Medici et al., 2017). Over the past decades, our understanding of the production and distribution of LNO has been greatly improved with the aid of ground-based lightning detection networks (e.g., Nag et al., 2014; Rodger et al., 2006), aircraft measurements for specific storms (e.g., Huntrieser et al., 2011), satellite observations (Pickering et al., 2016; Medici et al., 2017; Boersma et al., 2005), and modeling studies (e.g., Zoghzoghy et al., 2015; Cummings et al., 2013). Even though there are still substantial sources of uncertainty, the LNO production rate per flash is now more robust than earlier literature estimates (Bucsela et al., 2010; Huntrieser et al., 2009, 2011; Pickering et al., 2016; Ott et al., 2010).

An LNO production module, based on the lightning flash rate and LNO
parameterizations of Allen et al. (2012), was first introduced in the
Community Multiscale Air Quality (CMAQ) (Byun and Schere, 2006) model
Version 5.0 (CMAQv5.0) that was released in 2012. That scheme, like the
schemes used in previous work (Kaynak et al., 2008; Smith and Mueller,
2010; Koo et al., 2010), uses flash rates from the National Lightning
Detection Network (NLDN) (Orville et al., 2002) to constrain LNO.
Specifically, LNO production is proportional to convective precipitation and
is scaled locally so that the monthly average convective-precipitation-based
flash rate in each grid cell matches the average of the monthly total NLDN flash
rate, where the latter is obtained by multiplying the detection-efficiency-adjusted CG flash rate by

In this paper, we present the updates and development of the LNO module that was released in CMAQ version 5.2 in June 2017 and present a preliminary assessment of the spatial and temporal distribution of LNO columns in the existing (mNLDN), updated (hNLDN), and newly developed (pNLDN) schemes. In a follow-on paper, a comprehensive evaluation of model performance with the various schemes will be presented.

Section 2 of this paper provides descriptions of the data and model configurations. Section 3 describes the existing and updated LNO schemes in CMAQ that are based on the NLDN data. Section 4 presents an analysis of the historical relationship between NLDN lightning flashes and model-predicted convective precipitation. Section 5 provides the derivation of the parameterization scheme based on the analysis in Sect. 4. Section 6 assesses the total LNO columns produced by the mNLDN, hNLDN, and pNLDN schemes. We conclude with a discussion in Sect. 7.

The observed lightning activity data were obtained from the National
Lightning Detection Network (NLDN) (Orville, 2008). The raw CG flashes were
gridded onto the model horizontal grid cells hourly for use in the hNLDN
scheme and then aggregated into monthly mean values for use in the mNLDN
scheme. The NLDN CG flashes have a detection efficiency of 90 %–95 % and
a location accuracy of approximately 500 m. The detection efficiency for
NLDN IC flashes is lower and more variable (Zhu et al., 2016), so the
climatological IC

The meteorological fields used in developing the LNO schemes are provided by WRF model simulations (Skamarock and Klemp, 2008). The WRF output fields were processed using the Meteorology-Chemistry Interface Processor (MCIP) to provide input for the CMAQ modeling system (Otte and Pleim, 2010). We used the archived WRF simulations from 2002 to 2014 to derive the regression-based scheme (pNLDN). The archived meteorological outputs were generated from three WRF versions: version 3.4 for 2002 to 2005, version 3.7 for 2006 to 2013, and version 3.8 for 2014.

NO is the direct product of lightning flashes, and after release a large
portion of it can be quickly turned into

Beginning with CMAQv5.0, the LNO module contains two options for in-line
(based on model simulated parameters at the run time) LNO production. The
first option is an over-simplified parameterization that assumes that 1 mm h

Flowchart of data preprocessing for LNO production in CMAQ for the mNLDN scheme.

The second option in CMAQv5.0 was developed by Allen et al. (2010, 2012)
and utilized monthly National Lightning Detection Network (hereafter
referred to as mNLDN) flash data. In this scheme, flashes are assumed to be
proportional to CP with the relationship varying locally with a two-step
adjustment so that monthly average CP-based flash rates match the NLDN
observations. First, a global factor (lightning yield) is applied at each
grid cell to produce lightning flashes from model CP. Then, a local
adjustment (LTratio) is applied at each grid cell to ensure that the local
CP- and NLDN-based flash rates match. Figure 1 shows the data preprocessing
for LNO production using mNLDN data in CMAQ. First, CG flashes are gridded
onto the modeling grid that is specified in the model input meteorological
file using the Fortran program, NLDN_2D. The output (GRIDDED
NLDN) is the monthly mean lightning flash density (LFD) over the model
domain in IOAPI format. Ocean_factor, Strike_factor, and ICCG are R scripts that are used to convert NLDN CG flashes to
quantities that are proportional to LNO production. The Ocean_factor script ingests the land–ocean mask and indicates values of 1 for grid
cells that contain land and 0.2 for grid cells that only contain ocean. A
value of 0.2 is used for oceanic-grid cells because the amount of lightning
produced per unit of convective rain is approximately 5 times less for
marine convection than for continental convection (Christian et al., 2003).
The Strike_factor script ingests the gridded NLDN CG
lightning flash data and the CP values predicted by the upstream
meteorological model WRF to calculate the Ratio_NLDN2CP
according to the following equation:

The moles of LNO are then distributed vertically using the two-peak
algorithm described in Allen et al. (2012), which is a preliminary version
of the segment altitude distributions (SADs) of flash channel segments
derived from Northern Alabama Lightning Mapping Array data by Koshak et al. (2014) convolved with pressure. A two-peak distribution is used because NO
produced by IC flashes is centered at a higher layer of the atmosphere (350 hPa) than NO produced by CG flashes (600 hPa). Accordingly, LNO is
distributed with two Gaussian normal distributions: the upper distribution
has a mean pressure of 350 hPa and a standard deviation of 200 hPa, and the
lower distribution has a mean pressure of 600 hPa and a standard deviation
of 50 hPa. For each CMAQ layer, the pressure (

At each pressure level (

Then the fraction of the column emissions at the pressure (

At each model layer, the weighted contribution is

Finally, the LNO at each layer is
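
The two-peak vertical allocation described above can be illustrated with a minimal Python sketch. It uses the means and standard deviations stated in the text (350 and 200 hPa for the upper peak, 600 and 50 hPa for the lower peak). The relative weight of the two peaks (upper_weight) and the evaluation of the density at a single representative pressure per layer are assumptions made for illustration only; the function names are hypothetical, and the sketch is not the CMAQ implementation.

```python
import numpy as np

# Means and standard deviations (hPa) of the two peaks, as stated in the text.
UPPER_MEAN, UPPER_SD = 350.0, 200.0   # peak associated with IC flashes
LOWER_MEAN, LOWER_SD = 600.0, 50.0    # peak associated with CG flashes


def gaussian(p, mean, sd):
    """Normal probability density evaluated at pressure p (hPa)."""
    return np.exp(-0.5 * ((p - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def layer_fractions(layer_pressures_hpa, upper_weight=0.5):
    """Fraction of column LNO assigned to each layer.

    upper_weight is a hypothetical mixing weight between the two peaks;
    the value 0.5 is an assumption, not a value from the paper.
    """
    p = np.asarray(layer_pressures_hpa, dtype=float)
    density = (upper_weight * gaussian(p, UPPER_MEAN, UPPER_SD)
               + (1.0 - upper_weight) * gaussian(p, LOWER_MEAN, LOWER_SD))
    # Normalize so that the layer fractions sum to one (column is conserved).
    return density / density.sum()


def distribute_column(column_lno_moles, layer_pressures_hpa, upper_weight=0.5):
    """Split a column total (moles) across layers using the two-peak shape."""
    return column_lno_moles * layer_fractions(layer_pressures_hpa, upper_weight)
```

For example, distribute_column(10.0, [900.0, 600.0, 350.0, 150.0]) returns four layer values that sum to 10 mol. Normalizing the fractions is a design choice that guarantees the column total is preserved regardless of the layer structure.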

As described above, the LNO production scheme, mNLDN, calculates CLNO using
scaled values of the convective precipitation. To simplify the procedure to
generate LNO, in CMAQv5.2 we used the gridded hourly NLDN (hNLDN) flash data
in the lightning module, which reduces Eq. (3) to

Since the hNLDN scheme directly injects LNO into the modeling grid cells
based on observed lightning flashes, it is possible that desynchronization
exists between LNO and other convectively transported precursor species for

The existing LNO production schemes in CMAQ depend heavily on CP amounts
predicted by WRF. We analyzed meteorological fields generated by the WRF
model simulations from 2002 to 2014 over the continental United States to
examine the relationship between the observed lightning flashes and the
predicted CP. Though the WRF model has evolved over a few versions (from
version 3.4 to 3.8), the Kain–Fritsch (KF) convective scheme (Kain and
Fritsch, 1990) was used consistently in simulations for all years. We first
examined the relationship between lightning flashes, which were aggregated
into hourly flash counts and gridded onto the modeling grid cells, and the
modeled hourly CP from WRF over the continental United States (12 km horizontal grid
spacing). The results (not shown) showed little to no correlation between
the observed lightning flashes and the predicted CP, regardless of the time
period examined. However, when the lightning flashes and CP were each
aggregated to mean values over geographical regions (the entire modeling
domain as the extreme) for each month in the time series, as shown in Fig. 2, the correlation between the two quantities was obvious. This suggests
that although the model-predicted CP is not a good predictor of lightning
events in space and time, it does show its skill in predicting cumulative
lightning activity across geographic regions for a given month. Further
analysis of the relationship indicates unique distribution patterns in space
over the contiguous United States through the years. As shown in Fig. 3a
and b, lightning yields per unit CP are smaller in the eastern US than in
other areas, confirming that the lightning yield varies regionally. The
original scheme used a universal lightning yield for the entire modeling
domain, while Allen et al. (2012) allowed the yield to vary locally. This
analysis indicates that the yield is lowest in the east (Region 1) but
similar in regions 2–5, which could be combined. Figure 4a shows the
scatter plots and the corresponding linear regression equations as well as
the correlation coefficients (

Correlation coefficients with error bars indicating the 95 % confidence interval between 12 monthly mean NLDN lightning flash density and mean convective precipitation from 2002 to 2014 over the model domain. All is the correlation coefficient for all the years.

Comparison of monthly mean NLDN lightning flash density (km

Statistically, the relationship between CP rate and NLDN lightning flash
rate over large regions suggests similar yields within each region. But
considerable scatter still exists within each region, and the overall
statistics may be dictated by certain large values. As an estimate, the most
direct approach would be to use regression equations to determine LNO from
CP for western US grid cells and regression equations for eastern US
grid cells as shown in Fig. 4a and b. However, in addition to the
concern associated with variations within a region, this direct application
would also cause some practical problems: (1) the analysis regions are
arbitrary, and (2) the LNO production would be spatially inconsistent with
abrupt changes along the bordering grid cells separating regions. Therefore,
instead of deriving regression equations from the regional data, linear
and log-linear regression equations are derived using data averaged over small
areas of adjacent grid cells (analogous to subdividing the regions into
small patches, each covering a few adjacent model grid cells). In areas
that lack enough data points to perform the regression, data are filled
using the inverse distance weighting (IDW) spatial interpolation technique
(Lu and Wong, 2008). Figure 5 shows the spatial–linear (upper panel) and
log-linear (lower panel) regression parameters and the correlation
coefficients over patches of 3
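
The gap-filling step can be illustrated with a short inverse distance weighting sketch in Python. It assumes that patches without a valid regression are marked as NaN in a two-dimensional array, that distances are measured in patch-index units, and that the weighting exponent is 2; none of these choices is specified in the text, and the function name idw_fill is hypothetical.

```python
import numpy as np


def idw_fill(values, power=2.0):
    """Fill NaN cells of a 2-D array by inverse distance weighting.

    Each missing cell receives the weighted mean of all valid cells, with
    weights 1 / distance**power. The exponent and the use of index-space
    distances are assumptions made for this illustration.
    """
    values = np.asarray(values, dtype=float)
    filled = values.copy()
    valid = ~np.isnan(values)
    if not valid.any():
        raise ValueError("at least one valid cell is required")
    vy, vx = np.nonzero(valid)      # coordinates of valid cells
    vvals = values[valid]           # their values, in the same row-major order
    for y, x in zip(*np.nonzero(~valid)):
        dist = np.hypot(vy - y, vx - x)   # never zero: (y, x) is not a valid cell
        weights = 1.0 / dist ** power
        filled[y, x] = np.sum(weights * vvals) / np.sum(weights)
    return filled
```

Applied to a one-row array with values 1, NaN, and 3, the missing cell is equidistant from both neighbors and is filled with their mean, 2.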

Parameters of linear

A robust parameterization scheme should be relatively insensitive to the training time period. In order to test this, the lightning yield (slope of the linear and log-linear regression) was re-calculated using data from 2002 to 2012 (P02-12), 2002 to 2014 but excluding 2011 and 2013 (P02-14sb2), and 2009 to 2014 (P09-14). The results are shown in Fig. 6. As indicated in Fig. 6, the spatial patterns of slopes generated using data from different time periods for both linear (upper panel) and log-linear regressions (lower panel) are similar except that larger values are created over the Great Plains east of the mountains when the most recent years' data (2009–2014) were used to perform the linear regression. This difference may be attributable to the evolution of the WRF model and the NLDN data (Nag et al., 2014) through the years, and it also indicates that the parameters need to be updated to include the most recent data available.
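
The sensitivity test above can be sketched in Python for a single patch. The sketch assumes that the monthly mean CP and flash density for the patch are held in one-dimensional arrays alongside an array of years, and that the log-linear form regresses ln(flash density) on ln(CP); the exact functional form is not given in the text, so this is an assumption. All function names are hypothetical.

```python
import numpy as np


def fit_linear(cp, flashes):
    """Least-squares fit flashes = slope * cp + intercept."""
    slope, intercept = np.polyfit(cp, flashes, 1)
    return slope, intercept


def fit_log_linear(cp, flashes):
    """Least-squares fit ln(flashes) = slope * ln(cp) + intercept.

    The log-log form is an assumption; non-positive values are dropped
    because their logarithm is undefined.
    """
    keep = (cp > 0) & (flashes > 0)
    slope, intercept = np.polyfit(np.log(cp[keep]), np.log(flashes[keep]), 1)
    return slope, intercept


def slopes_by_period(years, cp, flashes, periods):
    """Linear-regression slope (lightning yield) for each training period."""
    results = {}
    for name, selected in periods.items():
        mask = np.isin(years, list(selected))
        results[name] = fit_linear(cp[mask], flashes[mask])[0]
    return results


# Training periods named in the text.
PERIODS = {
    "P02-12": range(2002, 2013),
    "P02-14sb2": [y for y in range(2002, 2015) if y not in (2011, 2013)],
    "P09-14": range(2009, 2015),
}
```

Calling slopes_by_period(years, cp, flashes, PERIODS) returns one slope per training period; similar values across the periods would indicate that the yield is insensitive to the training window for that patch.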

The slope maps from linear

Total monthly column LNO over the model domain using parameters
derived from different time periods for

To test the sensitivity of LNO to the parameters derived from different time periods, Fig. 7 shows the total monthly column LNO for 2011 and 2013 generated using different sets of parameters derived by linear regression from different time periods; for comparison, the LNO produced by the updated NLDN-based scheme, hNLDN, described in Sect. 3, is also included. As shown in Fig. 7a, in 2011 the parameter schemes (pNLDN), except for P09-14, tend to underestimate LNO during summer months (June, July and August; JJA) compared with the hNLDN scheme, but for 2013 (Fig. 7b) the pNLDN schemes produce both over- and underestimates of LNO during the summer months. In both years, the pNLDN results differ very little among parameter sets from different time periods, except for P09-14. The P09-14 parameters produce the most LNO during summer months in both years, which makes them the best match to the LNO produced by the hNLDN scheme in 2011 but yields larger overestimates in June and July of 2013.

As discussed earlier, the log-linear regression between NLDN lightning flashes and CP produced better correlation coefficients than the simple linear regression. We also noticed, however, that if the log-scale parameters are applied to all the data, too much LNO is produced relative to the hNLDN scheme, especially during winter months when both lightning activity and convective precipitation occur less frequently. This high bias exists because the log scale tends to inflate contributions from small values when linear regression is performed after the log transformation. To test the impact of the log scale on the production of LNO, we chose the summer months (JJA) in 2011 and specified a series of cutoff values for CP (cm); that is, linear regression parameters are applied if CP is smaller than a specific cutoff value, and log-linear regression parameters are applied otherwise. Figure 8 shows the monthly total column LNO produced with CP cutoff values from 0.1 (P01) to 0.6 (P06) cm. As indicated in Fig. 8, the smaller the cutoff value is, the more LNO is produced. When the cutoff value of 0.2 is applied, LNO production best matches that produced by hNLDN; however, the summer months in 2011 differ from those of other years in that significantly more lightning flashes and convective precipitation were observed in the continental United States, especially in the east and southeast US. When the same cutoff value (0.2) is applied to other years, LNO is overestimated compared with that produced by the hNLDN scheme. For generalized application to all years, dynamic cutoff values are used with this scheme (the result is also shown in Fig. 8). Specifically, if CP is greater than the intercept value at a location from linear regression, the log-linear regression parameters are used; otherwise, the linear regression parameters are applied. This technique produces acceptable results for all of the years studied.
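
The dynamic cutoff rule can be expressed as a small Python function. This is a minimal sketch for one location: it assumes the log-linear relationship has the form ln(flash density) = log_intercept + log_slope * ln(CP), returns zero when there is no convective precipitation, and clips negative linear estimates to zero. These three choices are assumptions for illustration, and the function and parameter names are hypothetical.

```python
import math


def flash_density(cp, lin_slope, lin_intercept, log_slope, log_intercept):
    """Estimate flash density from CP using the dynamic cutoff.

    As described in the text, the log-linear parameters are used when CP
    exceeds the intercept of the local linear regression; otherwise the
    linear parameters are used.
    """
    if cp <= 0.0:
        # Assumption: no convective precipitation implies no lightning.
        return 0.0
    if cp > lin_intercept:
        # Assumed log-log form of the log-linear regression.
        return math.exp(log_intercept + log_slope * math.log(cp))
    # Linear branch; negative estimates are clipped to zero (assumption).
    return max(0.0, lin_intercept + lin_slope * cp)
```

With a linear intercept of 0.1, a CP of 0.05 falls on the linear branch, whereas a CP of 1.0 falls on the log-linear branch. Because the cutoff is taken from the local regression rather than fixed, it varies from location to location.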

Total monthly column LNO over the model domain using different CP cutoff values during summer months in 2011. hNLDN: LNO produced by the hNLDN scheme; P01–P06: CP (cm) cutoff values from 0.01 (P01), 0.02 (P02), to 0.06 (P06). Linear regression parameters are applied when CP is less than the cutoff value, and log-linear regression parameters are used otherwise. Dym: dynamic cutoff values are used (see text).

As a preliminary assessment of these LNO production schemes, we only investigate the distribution of column LNO in time and space; a more detailed evaluation of the impact of these schemes on air quality will be presented in a subsequent study.

Total monthly column LNO over the model domain with different LNO
production schemes for

Total daily column LNO over the model domain with different LNO
production schemes for

Figure 9 shows the monthly total column LNO produced by the different schemes for the years 2011 and 2013. For both years, the mNLDN scheme tends to generate significantly more LNO during warm months (May–September) than the hNLDN and pNLDN schemes. Collectively during May to September, mNLDN produced about 40 % (39 % in 2011 and 42 % in 2013) more LNO than hNLDN. The regression parameter-based scheme, pNLDN, underestimated LNO during summer months (JJA) in 2011 compared to hNLDN, but the two schemes generally agreed well in 2013. As mentioned earlier, the significant underestimation of LNO by pNLDN in 2011 may be attributed to underestimated convective precipitation in WRF, which reduced the count of lightning flashes during this period. There were about 17 % more lightning flashes during JJA in 2011 than during the same period in 2013 over the continental United States. The relatively poor correlation between NLDN flashes and model-predicted CP values in 2011 is also evident in Fig. 2, where the 2011 correlation coefficient is the second smallest among the 13 years studied. The daily total column LNO produced by these schemes for July 2011 and July 2013 is presented in Fig. 10. Among the schemes, mNLDN produced the most LNO on most days in July for both years. Except for a few days, pNLDN underestimated LNO in 2011 relative to the other approaches, but in 2013 it produced results comparable to hNLDN, except that it overestimated LNO on the first few days of the month. In addition, the day-to-day variance generated by pNLDN appears smaller than that of hNLDN for both years.

Spatial distribution of monthly column LNO with different LNO
production schemes for July 2011

The spatial distributions of monthly total column LNO produced by each of the three schemes over the contiguous United States for July 2011 and July 2013 are presented in Fig. 11. Overall, the spatial patterns generally agree with each other for both years, with pNLDN producing relatively smaller values, especially along the edges or over locations where LNO amounts are relatively small. Note that both hNLDN and mNLDN are based on the same monthly observed data, so they produce similar spatial patterns. The pNLDN scheme is derived from linear and log-linear regression parameters based on multiple years of historical observed data and model simulations with different versions, and it is applied to a specific period without including observations. Nevertheless, because pNLDN is intended for applications in which no observed lightning data are available (such as air quality forecasts and past or future climate simulations with similar climate conditions), it can provide a reasonable estimate of LNO comparable to that estimated by hNLDN and mNLDN.

In this study, we described the LNO production schemes in the CMAQ model's lightning module and updated the existing monthly NLDN observation-based scheme with the current understanding and resources. For retrospective model applications, the hourly NLDN observation-based scheme, hNLDN, is expected to provide the highest-fidelity spatial–temporal LNO. If observations are not available, such as in air quality forecasts and future climate studies, the linear and log-linear regression parameter-based scheme, pNLDN, provides a spatial–temporal estimate of LNO. Note that even though the pNLDN scheme can provide LNO estimates for past or future climate studies, the spatial dependency of the relationship presented here may not hold under changing climate conditions.

Large uncertainties are still associated with each of these schemes
resulting from the various assumptions common to all the LNO production
schemes, e.g., the uniform NO production rate per flash, the IC

Lightning and LNO will remain an active research area in atmospheric
sciences for the foreseeable future. For example, lightning data from
Geostationary Lightning Mapper (GLM) instruments on the Geostationary
Operational Environmental Satellite (GOES) 16 and 17 (Goodman et al., 2013;
Rudlosky et al., 2019) are now publicly available. With more observations
(both at surface and in space) available, the assumptions associated with
the LNO schemes will be updated to reflect the evolving understanding of LNO
production in time and space. For example, Medici et al. (2017) recently
updated IC

CMAQ model documentation and released versions of the source code, including all model code used in this study, are available at

The raw lightning flash observation data used are not available to the
public but can be purchased through Vaisala Inc. (

DK was involved in data collection, algorithm design, model simulation, analysis and paper writing. KEP and DJA contributed to algorithm formation and paper writing. KMF was involved in algorithm formation, data analysis and paper writing. DCW performed code updates. RM and SJR were involved in the paper writing.

The authors declare that they have no conflict of interest.

The views expressed in this paper are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency (EPA).

The authors thank Brian Eder, Golam Sarwar and Tanya Spero (U.S. EPA) for their constructive comments and suggestions during the internal review process.

This paper was edited by Gerd A. Folberth and reviewed by two anonymous referees.