This paper investigates how to refine the ground meteorological observation network to greatly improve the PM

Air pollution has become a serious environmental issue in many Asian
countries in recent decades. The Beijing–Tianjin–Hebei region (BTH region),
being one of the most prosperous and populated regions in China, has suffered from successive heavy haze events during the past several decades (Xiao
et al., 2020). Despite large reductions in primary pollutant emissions due
to the recent strict pollution control policies in China, heavy haze events have still occurred in recent years, even during the COVID-19 lockdown period
(Huang et al., 2021). Particulate matter with an aerodynamic diameter that is smaller than 2.5 (PM

To accurately predict the PM

Data assimilation has been recognized as being one of the most effective ways to improve the accuracy of initial conditions (Talagrand, 1997). High-quality
meteorological initial fields could be obtained by assimilating the observations from an observation network for atmospheric conditions (Snyder,
1996). Among the various meteorological observation sources, the observations from the ground meteorological stations are often assimilated to predict the meteorology fields (Hu et al., 2019; Devers et al., 2020; Yao et al., 2021). Yang et al. (2022) studied the uncertainties in the meteorological initial fields with respect to PM

In the past few years, a large number of meteorological stations have been constructed around the world to study atmospheric motions and weather and climate variability. In China alone, more than 2000 stations were operated by the China Meteorological Administration (CMA) in 2020, and the station locations are generally selected based on administrative districts and resident populations (

The dominant meteorological stations to be identified, as mentioned above,
would provide the meteorological observations that will have the largest
impact on the PM

The remainder of the paper is organized as follows. In Sect. 2, we introduce
the model, data, and method. In Sect. 3, we reproduce the eight heavy haze
events that occurred in the BTH during 2016–2018 and identify the sensitive areas of the surface meteorological conditions for the PM

In this study, we use the Weather Research and Forecasting Model (WRF) and
its adjoint model, and the nested air quality prediction modeling system
(NAQPMS), to identify the sensitive areas of surface meteorological
conditions associated with the regional PM

The NAQPMS model is a 3-D regional Eulerian chemical transport model, which
contains emissions, advection/convection, diffusion, dry and wet deposition, and gas/aqueous chemical modules (Wang et al., 1997, 2006). It has been widely used in scientific studies and practical forecasts for air quality in China. The anthropogenic emissions are obtained from Multi-resolution Emission Inventory model for Climate and air pollution research (MEIC;

The NAQPMS model is driven by the meteorological fields generated through
the WRF (

There are eight typical heavy haze events that occurred in the BTH region during the wintertime (OND or October–November–December) in the years of 2016–2018 (Table 1), and all eight events and their associated forecasts are
investigated in the study. The observed surface PM

The root mean square error (RMSE;

The maps of

To produce the initial and boundary conditions for WRF simulation, the fifth
generation ECMWF reanalysis for the global climate and weather (ERA5;

The CNOP represents the initial perturbation (or initial error) that results
in the largest forecast error in the verification area at the verification
time and is the most sensitive initial perturbation. The dynamical equation
in the nonlinear model can be written as Eq. (1).

In our study, since we focused on the uncertainties in the meteorological
initial condition associated with the PM

The spectral projected gradient 2 (SPG2) method is used to solve the
optimization problem in Eq. (3). Note that the SPG2 algorithm is
generally designed to find the minimum of a nonlinear function (cost
function) subject to an initial constraint condition, and the gradient of the cost
function with respect to the initial perturbation is used to determine the descent
direction in the search for the minimum of the cost function. Therefore, in
this study, we have to rewrite the cost function in Eq. (3) as

The flow chart of a CNOP calculation.
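
As a rough illustration of the sign change described above, the sketch below maximizes a forecast-error cost under a norm constraint by minimizing its negative. It is a toy under stated assumptions, not the configuration used in this study: the two-variable model `propagate`, the matrix `A`, the constraint radius `beta`, and all function names are hypothetical; a plain projected gradient iteration with step halving stands in for SPG2; and finite differences stand in for the adjoint model.

```python
import numpy as np

# Hypothetical toy dynamics standing in for the nonlinear forecast model.
A = np.array([[0.5, 1.0], [0.0, -0.2]])

def propagate(x, steps=40, dt=0.05):
    """Integrate dx/dt = A x - x**3 with forward Euler (toy model)."""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(steps):
        x = x + dt * (A @ x - x**3)
    return x

def neg_cost(delta, x0):
    """Negative squared forecast departure caused by the initial perturbation."""
    diff = propagate(x0 + delta) - propagate(x0)
    return -float(diff @ diff)

def num_grad(f, delta, eps=1e-6):
    """Central finite-difference gradient (an adjoint model would replace this)."""
    g = np.zeros_like(delta)
    for i in range(delta.size):
        step = np.zeros_like(delta)
        step[i] = eps
        g[i] = (f(delta + step) - f(delta - step)) / (2 * eps)
    return g

def project(delta, beta):
    """Project onto the ball ||delta|| <= beta."""
    norm = np.linalg.norm(delta)
    return delta if norm <= beta else delta * (beta / norm)

def toy_cnop(x0, beta, first_guess, iters=200):
    """Minimize the negated cost with projected gradient steps and step halving."""
    f = lambda d: neg_cost(d, x0)
    delta = project(np.asarray(first_guess, dtype=float), beta)
    best = f(delta)
    step = 1.0
    for _ in range(iters):
        trial = project(delta - step * num_grad(f, delta), beta)
        val = f(trial)
        if val < best:      # accept only steps that lower the negated cost
            delta, best = trial, val
        else:               # otherwise shrink the step and retry
            step *= 0.5
    return delta, -best     # perturbation and the (positive) forecast-error cost

x0 = np.array([1.0, 0.5])
delta_opt, cost_opt = toy_cnop(x0, beta=0.1, first_guess=np.array([0.05, 0.0]))
print(delta_opt, cost_opt)
```

Because only improving steps are accepted, the returned perturbation never yields a smaller forecast departure than the first guess, and the projection keeps it inside the constraint ball.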

In this section, we first simulate the PM

For each of the eight heavy haze events, after the 10 d spin-up of WRF-NAQPMS, the ERA5 and the GFS data are separately used to initialize the
WRF model, and then two forecasted meteorological fields can be obtained,
which force the NAQPMS to output two kinds of simulations of PM

Time series of the PM

The surface wind (vector; m s

To quantify the different sensitivities of the two simulations to the initial
meteorological conditions, the root mean square errors (RMSEs) and correlation
coefficients (CCs) between the simulations and observations of the eight events
are calculated. For all eight events, the ERA5 simulations show smaller RMSEs and higher CCs with respect to the observations (see Table 1). Averaged over the eight events and the whole simulation period (see Table 1 and Fig. 3), the RMSEs of the ERA5 and GFS simulations are 41.16 and 59.83
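
For reference, the two skill measures used here can be computed as in the following minimal sketch. It assumes the simulated and observed concentrations are already paired in time as equal-length arrays; the function names and the sample values are hypothetical and are not data from this study.

```python
import numpy as np

def rmse(sim, obs):
    """Root mean square error between paired simulated and observed series."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def correlation(sim, obs):
    """Pearson correlation coefficient between the two series."""
    return float(np.corrcoef(sim, obs)[0, 1])

# Hypothetical hourly values, for illustration only.
obs = [80.0, 120.0, 200.0, 150.0]
sim = [70.0, 130.0, 180.0, 160.0]
print(rmse(sim, obs), correlation(sim, obs))
```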

Figure 3 shows that once the haze started to develop, it usually
took more than 2 d to accumulate but dissipated rapidly within a few hours. For example, for the event that occurred during the period from 00:00 BJT
(Beijing time; UTC

To do so, we consider the forecasts with the fixed lead time of 12 h but with different start times. For each event, we analyze four cycle forecasts
every 12 h from its start time (see Table 2) over the accumulation process (hereafter AFs) and two forecasts over the dissipation process
(hereafter DFs). As a result, a total of 32 AFs and 16 DFs was obtained
for the eight events under investigation. To identify the sensitive areas of
the ground meteorological field in each forecast, we adopt the idea of
Lorenz (1965); that is, when the effect of initial error growth is explored, a perfect model is assumed. In reality, however, any model initial field, and likewise any emission inventory, certainly contains uncertainties. To make our findings realistic, and because we cannot obtain relevant observations from the monitoring center for assimilation, we take the better simulation, initialized by ERA5, as the “truth” run and use the poorer simulation, initialized by the GFS forecast data, as the control forecast (or control run). The differences between them reflect the sensitivities of the forecast uncertainties in the PM

Start times of the cycling AFs and DFs for the eight heavy haze events.

Now we determine the sensitive areas of the ground meteorological field associated with PM

The horizontal distribution of the wind (1) and temperature (2) components of the CNOP-type errors for the AFs that started from 02:00 BJT on 2 November

The horizontal distribution of the TME (units in J kg

From the sensitive areas above, it can be seen that, even for the same
event, the specific distributions of the sensitive areas depend on
the start times of the forecasts. It is therefore conceivable that the 48 forecasts for the eight events will exhibit sensitive areas with diverse structures and locations. Given this situation, one
naturally asks how a cost-effective observation network that does the PM

In this section, we will construct a cost-effective meteorological observation network based on the sensitive areas identified by the CNOP-type
errors in the 48 forecasts for the eight heavy haze events. Then, a series of
OSSEs (see Sect. 3.2) are conducted to show the advantage of the additional observations from this observational network in improving the PM

For the 48 CNOP-type errors, we use a quantitative frequency method (see
Duan et al., 2018) to identify the spatial grids that are often covered by
large values of the TME. Specifically, for each CNOP-type error, we sort its
spatial grid points in decreasing order of
TME amplitude and choose the first 3 % of the grid points in the model domain. A total of 424 grid points are then obtained, which bear larger TME values than the other grid points and contribute more to the meteorological forcing errors associated with the relevant PM
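
The selection step described above can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes each CNOP-type error is available as a 2-D array of TME values on the model grid, and the function names, the tiny 10 × 10 grid, and the random fields are hypothetical. How the resulting counts are thresholded is not shown.

```python
import numpy as np

def top_fraction_mask(tme, fraction=0.03):
    """Boolean mask of the grid points holding the largest `fraction` of TME values."""
    flat = np.asarray(tme, dtype=float).ravel()
    k = max(1, int(round(fraction * flat.size)))
    idx = np.argsort(flat)[::-1][:k]      # indices sorted by decreasing TME
    mask = np.zeros(flat.size, dtype=bool)
    mask[idx] = True
    return mask.reshape(np.shape(tme))

def selection_frequency(tme_fields, fraction=0.03):
    """Count, per grid point, how many error fields place it in their top fraction."""
    return sum(top_fraction_mask(f, fraction).astype(int) for f in tme_fields)

# Hypothetical example: 48 random TME fields on a small grid.
rng = np.random.default_rng(0)
fields = [rng.random((10, 10)) for _ in range(48)]
freq = selection_frequency(fields)
print(int((freq > 0).sum()), "grid points selected at least once")
```

Grid points with a count above zero form the union of the per-forecast sensitive points, and the counts themselves indicate how often each point is covered by large TME values.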

The spatial distributions of the 174 sensitive grid points (red squares) for AFs

Now we investigate how much this essential network can explain the skill
improvement of the PM

For the 32 AFs, when assimilating the 99 simulated observations, the overall
improvements are 12.03 % and 13.59 %, as measured by

The box plot of the (1) AE

The mean and maxima of the improvements measured by

The essential network has been shown to play the dominant role in the improvement of the PM

The simulated observations (i.e., the ERA5 data) taken from the new observation networks are assimilated into the control run to show the improvements achieved by assimilating the additional observations. Note
that since the essential stations responsible for DFs alone are not
sensitive to the AFs, these stations are also scattered with corresponding
distances, according to the

Now we take the observation network constructed by the combination of
essential stations and the scattered stations with a distance of 60 km as the newly refined observation network (see Fig. 7d) and compare it with all of the constructed ground stations by performing the assimilation runs. We find that the improvements obtained by assimilating the newly refined station observations (15.02 % for AFs and 23.62 % for DFs; see the paragraph above) account for 97 % and 99 % of the improvements achieved by assimilating all of the constructed station observations (15.48 % for AFs and 23.87 % for DFs), respectively. In particular, among the individual forecasts, 9 of the 32 AFs and 5 of the 16 DFs show much better forecasting skill at the forecast times when the newly refined observations are assimilated than when all the constructed ground observations are assimilated. This demonstrates that assimilating the simulated observations on the refined network can result in comparable, and sometimes even higher, improvements in the PM

In this section, we interpret why assimilating the cost-effective station
observations results in comparable improvements and sometimes even higher
improvements in PM

For all the AFs and DFs concerned in the study, we compare their
meteorological conditions before and after the assimilations of the
cost-effective station observations and all the constructed station
observations, respectively. We find that the assimilation, as expected,
adjusts the thermodynamic and dynamic meteorological conditions at the
initial state in the control run and brings the forecast meteorological conditions
closer to those of the truth run, which further improves the PM

For the AFs, the PM

Time series of the PM

The differences in the wind (vector; m s

The differences in the boundary layer height (contour line; m; blue line means reduction and red line means increase) and the PM

For the DFs, the mechanism is different from the AFs, where both thermodynamical and dynamical conditions have critical impacts on the PM

The differences in the ground wind (vector; m s

So far, we have numerically verified the validity of the cost-effective
ground meteorological station network in improving the PM

The PM

Numerically, a series of OSSEs is conducted to verify the effectiveness of the newly refined 287 station observations in terms of improving the PM

It is clear that assimilating fewer sensitive observations can lead to
better PM

In this study, we focus on the effect of surface meteorological uncertainties in the PM

Version 3.6.1 of the WRF and its adjoint model are used in this study, and both are available from

YL and DW conceived the research, designed the experiments, performed the simulations, and analyzed the results. All authors contributed to drafting the paper.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors highly appreciate the two anonymous reviewers and Jie Feng, who provided constructive comments that greatly improved the overall quality of the paper.

This research has been supported by the National Natural Science Foundation of China (grant no. 42105061).

This paper was edited by Havala Pye and reviewed by two anonymous referees.