A Hybrid Single-Particle Lagrangian Integrated Trajectory version 4 (HYSPLIT-4) inverse system that is based on variational data assimilation and a Lagrangian dispersion transfer coefficient matrix (TCM) is evaluated using the Cross-Appalachian Tracer Experiment (CAPTEX) data collected from six controlled releases. For simplicity, the initial tests are applied to release 2, for which HYSPLIT performs best. Before model uncertainty terms that change with the source estimates are introduced, the tests using concentration differences in the cost function result in severe underestimation of the release rate, while those using logarithm concentration differences result in overestimation. Adding model uncertainty terms improves results for both choices of the metric variables in the cost function. A cost function normalization scheme is later introduced to avoid spurious minimal source term solutions when using logarithm concentration differences. The scheme is effective in eliminating the spurious solutions, and it also helps to improve the release estimates for both choices of the metric variables. The tests also show that calculating logarithm concentration differences generally yields better results than calculating concentration differences, and the estimates are more robust for a reasonable range of model uncertainty parameters. This is further confirmed with nine ensemble HYSPLIT runs in which meteorological fields were generated with varying planetary boundary layer (PBL) schemes. In addition, it is found that the emission estimate using a combined TCM, obtained by taking the average or median values of the nine TCMs, is similar to the median of the nine estimates using each of the TCMs individually. The inverse system is then applied to the other CAPTEX releases with a fixed set of observational and model uncertainty parameters, and the largest relative error among the six releases is 53.3 %.
Finally, the system is tested for its capability to find a single source location as well as its source strength. In these tests, the location and strength that yield the best match between the predicted and the observed concentrations are taken as the inverse modeling results. The estimated release rates are mostly less accurate than in the cases where the exact release locations are assumed known, but they are within a factor of 3 for all six releases. The estimated locations, however, may have large errors.

The transport and dispersion of gaseous and particulate pollutants are often
simulated to generate pollution forecasts for emergency response or to produce
comprehensive retrospective analyses for a better understanding of particular
events. Lagrangian particle dispersion models are particularly suited to
providing plume products for emergency response scenarios. While
accurate air pollutant source terms are crucial for quantitative
predictions, they are rarely available in most applications and have to be
approximated using many assumptions. For instance, the smoke forecasts
over the continental US operated by the National Oceanic and Atmospheric
Administration (NOAA) using the Hybrid Single-Particle Lagrangian Integrated Trajectory
(HYSPLIT) model

Observed concentration, deposition, or other functions of the atmospheric
pollutants such as aerosol optical thickness measured by satellite
instruments can be used to estimate some combination of source location,
strength, and temporal evolution using various source term estimation (STE)
methods

While many STE methods have been applied to reconstruct emission terms,
source term estimation remains an evolving field. Two popular advanced inverse modeling approaches
are cost-function-based optimization methods and those based on Bayesian
inference. However, STE results are difficult to evaluate because the
actual sources are unknown in most applications.

Several tracer experiments with controlled releases have been conducted to
study atmospheric transport and dispersion. In these experiments, the
source terms were well quantified and comprehensive measurements were
subsequently made over an extended area

A HYSPLIT inverse system based on 4D-Var data assimilation and a transfer
coefficient matrix (TCM) was developed and applied to estimate a cesium-137
source from the Fukushima nuclear accident using air concentration
measurements

The CAPTEX experiment consisted of seven near-surface releases of the inert
tracer perfluoro-monomethyl-cyclohexane (PMCH) from Dayton, Ohio, USA, and
Sudbury, Ontario, Canada, during September and October 1983

The locations, time, amounts, and measurement counts (

Distribution of the 84 measurement sites and two CAPTEX source locations (Dayton, Ohio, USA, shown as a red diamond, and Sudbury, Ontario, Canada, shown as a green cross).

In this study, the tracer transport and dispersion are modeled using the
HYSPLIT model (version 4,

To avoid running HYSPLIT repeatedly, a TCM is generated, similar
to the previous HYSPLIT inverse modeling studies

Similar to

As an initial test, the exact release location and time are both assumed
known and the only unknown variable left to be determined is the release
rate or the total release amount. For this type of one-dimensional problem,
an optimal emission strength can be easily found without having to use
sophisticated minimization routines. For instance, the

Both

Firstly, no model uncertainties are considered to contribute to

Comparison between the predicted and measured concentrations for
release 2 during the CAPTEX experiment. In the HYSPLIT simulation, at the
exact release location, an emission rate of 67 kg h

As stated in

Emission strength of release 2 that minimizes

Emission strength of release 2 that minimizes

While using logarithm concentration as the metric variable yields better
emission estimates than using concentration as the metric variable, the
results in Table

To consider the model uncertainties in a simplified way,

With logarithm concentration as the metric variable,

Since the predicted concentrations

When concentration is used as the metric variable, the emission strength
estimates with model uncertainties considered are improved over those without
model uncertainties. The estimates of emission strength generally increase
with the model uncertainty, either through

With logarithm concentration as the metric variable, larger

Emission strength of release 2 that minimizes

Emission strength of release 2 that minimizes

Without model uncertainties, the weighting terms for each model–observation
pair do not change with emission estimates. When

Figure

Cost function as a function of source strength when

Figure

When having concentration as the metric variable and with

Emission strength of release 2 that minimizes normalized

Emission strength of release 2 that minimizes normalized

An individual TCM is generated using each of the nine simulations. The nine TCMs can be used to estimate
the emission strengths independently following the same procedure described previously.
Tables

Instead of using each individual TCM generated from nine simulations
independently, the nine TCMs can be combined into one matrix by taking the
median or average values. The combined TCM can then be used to estimate the
source terms. The results for concentration and logarithm concentration
metric variables are listed in Tables
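
The combination step described above can be sketched in a few lines. This is a minimal illustration only: the function name `combine_tcms`, the array layout (ensemble member, receptor, source), and the toy values are assumptions made for the example, not the HYSPLIT file format or the authors' code.

```python
import numpy as np


def combine_tcms(tcms, method="median"):
    """Collapse an ensemble of TCMs into a single matrix.

    tcms: array-like of shape (n_members, n_receptors, n_sources); this
    layout is an assumption made for the sketch.
    """
    stack = np.asarray(tcms, dtype=float)
    if stack.ndim != 3:
        raise ValueError("expected shape (n_members, n_receptors, n_sources)")
    if method == "median":
        return np.median(stack, axis=0)  # element-wise median over members
    if method == "average":
        return np.mean(stack, axis=0)    # element-wise mean over members
    raise ValueError(f"unknown method: {method!r}")


# Toy ensemble with 3 members, 2 receptors, and 1 source (made-up values).
toy_tcms = [
    [[1.0], [4.0]],
    [[2.0], [5.0]],
    [[6.0], [9.0]],
]
tcm_median = combine_tcms(toy_tcms, "median")
tcm_average = combine_tcms(toy_tcms, "average")
```

In this toy case the third member is an outlier, so the averaged matrix is pulled toward it while the median matrix is not; either combined matrix can then be passed to the same estimation procedure used for an individual TCM.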

Similar to what was found in earlier sections and also in

The third (25th percentile), fifth (median), and seventh (75th percentile)
emission strengths of nine simulations of release 2 that minimize the
normalized

The third (25th percentile), fifth (median), and seventh (75th percentile)
emission strengths of nine simulations of release 2 that minimize normalized

Emission strength estimates by using the average and median value of
nine simulations for release 2. The cost function is normalized

Emission strength estimates by using the average and median value of
nine simulations for release 2. The cost function is normalized

In addition to the source strength, the source location and its temporal
variation can be retrieved with adequate accuracy using the HYSPLIT inverse
system described here if there are sufficient measurements available. For
instance,

In the following tests, a

Table

The release rates obtained along with the likely source locations are
underestimated by a factor of 3 for release 1, and overestimated by a factor
of 3 for releases 4 and 7, while the estimates for releases 2, 3, and 5 are
much better, with relative errors of

An assumption made in this inverse modeling algorithm is that the differences
between model and observation have a normal distribution with a zero mean.
Figure

The meteorological field and the observations are the two major inputs to the
current inverse modeling. As discussed above, the better model performance for
release 2 leads to better inverse results than for the other releases.
However, it is impossible to eliminate the model uncertainties. In practice,
ensemble runs can be used to quantify the uncertainties and to reduce the model
errors by taking the average or median values of the ensemble runs. In
addition, increasing the number of observations is an effective way to improve the
inverse modeling results and reduce their uncertainty. In principle,
when the release strength is the only value to be determined, each
measurement within the predicted plume can provide an independent estimate.
However, relying on a single observation to estimate the strength is
problematic since a particular model output can differ greatly from the
observation and thus lead to an erroneous estimate of the source
strength when used in isolation. For instance, although the HYSPLIT
predictions of release 2 with exact source terms are very good overall, when compared with
individual measurements they show severe underestimation (e.g., 0.77 pg m

The source location (latitude, longitude) and release rate

Distribution of 121 candidate source locations for release 2. The
minimal cost function at each location associated with an optimal release
strength is indicated by color. The cost function defined in
Eq. (

Probability density function (pdf) of

A HYSPLIT inverse system developed to estimate the source term parameters has been evaluated using the CAPTEX data collected from six controlled releases. In the HYSPLIT inverse system, a cost function is used to measure the differences between model predictions and observations weighted by the observational uncertainties. Inverse modeling tests with various observational uncertainties show that calculating concentration differences results in severe underestimation, while calculating logarithm concentration differences results in overestimation.
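
To make the two metric choices concrete, the sketch below solves a simplified single-unknown problem in which the predicted concentration at each receptor is the release rate times a unit-release prediction. It is a generic weighted least-squares illustration under stated assumptions, not the cost function used in this study: it omits any first-guess term and any model uncertainty term, it requires strictly positive observations and predictions for the logarithm, and the function names and toy values are hypothetical.

```python
import numpy as np


def rate_from_concentration(unit_pred, obs, sigma):
    """Release rate q minimizing sum(((q * h_i - o_i) / sigma_i) ** 2)."""
    h = np.asarray(unit_pred, dtype=float)
    o = np.asarray(obs, dtype=float)
    w = np.broadcast_to(1.0 / np.asarray(sigma, dtype=float) ** 2, h.shape)
    # Setting dF/dq = 0 gives q = sum(w h o) / sum(w h^2).
    return float(np.sum(w * h * o) / np.sum(w * h * h))


def rate_from_log_concentration(unit_pred, obs, sigma_ln):
    """Release rate q minimizing sum(((ln(q * h_i) - ln(o_i)) / s_i) ** 2)."""
    h = np.asarray(unit_pred, dtype=float)
    o = np.asarray(obs, dtype=float)
    w = np.broadcast_to(1.0 / np.asarray(sigma_ln, dtype=float) ** 2, h.shape)
    # Setting dF/d(ln q) = 0 gives ln q = weighted mean of ln(o_i / h_i).
    return float(np.exp(np.sum(w * np.log(o / h)) / np.sum(w)))


# Made-up unit-release predictions and observations (arbitrary units).
unit_pred = [1.0, 2.0, 4.0]
obs = [1.0, 4.0, 16.0]

q_conc = rate_from_concentration(unit_pred, obs, sigma=1.0)
q_log = rate_from_log_concentration(unit_pred, obs, sigma_ln=1.0)
```

With equal weights, the concentration metric is dominated by the pairs with the largest values, whereas the logarithm metric returns the geometric mean of the observation-to-prediction ratios, so the two estimates generally differ when the model error varies from receptor to receptor.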

Unlike other STE applications where model uncertainties are either ignored or assumed static, we introduce model uncertainty terms that depend on the source term estimates. These terms improve inverse results for both choices of the metric variables in the cost function. It is also found that cost function normalization can avoid spurious minimal source terms when using logarithm concentration as the metric variable. The inverse tests show that using logarithm concentration as the metric variable generally yields better results than using concentration, and that the resulting estimates are robust for a reasonable range of model uncertainty parameters. These conclusions are further confirmed with nine ensemble runs where meteorological fields were generated using a different version of the WRF meteorological model with varying PBL schemes.

With a fixed set of observational and model uncertainty parameters, the
inverse method with logarithm concentration as the metric variable is then
applied to all six releases. The emission rates are well recovered, with
the largest relative error as

The HYSPLIT model is publicly available at

TC designed and conducted the inverse tests. AS provided guidance on the HYSPLIT modeling and suggestions on the inverse tests. FN performed the HYSPLIT ensemble simulations using different PBL schemes.

The authors declare that they have no conflict of interest.

This study was supported by NOAA grant NA09NES4400006 (Cooperative Institute for Climate and Satellites – CICS) at the NOAA Air Resources Laboratory in collaboration with the University of Maryland.

Edited by: Slimane Bekki
Reviewed by: two anonymous referees