Development of 3D Variational Assimilation System Based on GRAPES-CUACE Adjoint Model (GRAPES-CUACE-3D-Var V1.0) and Its Application in Emission Inversion

. The adjoint method is known for its efficient calculation of sensitive information. After decades of development, assimilation technology based on adjoint method has gradually become an important tool for emission inversion. On the 10 basis of GRAPES-CUACE aerosol adjoint model, and combined with the optimization algorithm and pollutants observations, the GRAPES-CUACE 3D variational (GRAPES-CUACE-3D-Var) assimilation system was further developed, and was used in the inversion of BC emissions in Beijing-Tianjin-Hebei region. The results show that the newly constructed GRAPES-CUACE-3D-Var assimilation system is reasonable and reliable, and can be applied to the emission inversion in Beijing-Tianjin-Hebei region. Compared to the simulations using the a priori BC emissions, the model simulations driven by the a 15 posterior BC emissions in two inversion schemes are in better agreement with measurements. The correlation coefficient between the simulations and the observations is increased from 0.2 before the inversion to 0.7 and 0.64, respectively, and the NMSE is reduced from 0.38 to 0.22 and 0.24, respectively, and the NMB is decreased from 51.53% to 43.37% and 40.90%, respectively, in the two inversion schemes. The spatial distributions of the a posterior BC emissions in the two inversion schemes are consistent with the distributions of the a priori BC emissions. The high-value areas are mainly located in the 20 south of Beijing, Tianjin, central and southern Hebei, and northern Shandong. On the whole, the inversion scheme with a large observation ratio has better optimization effect. The observation information of the target time has a great influence on the a posterior BC emissions in a short period before the target time, and the influence decreases with the reverse time sequence. etc. The 50 optimized source list covers the spatio-temporal scales from the region to the world and from the monthly average to the daily average (Hakami et al., 2005; Müller and Stavrakou, 2005; Pan et al., 2007; Kurokawa et al., 2009; Henze et al., 2009; Wang et al., 2012; Mao et al., 2015; Zhang et al., 2015; Jeong and Park, 2018). Hakami et al. (2005) used the adjoint model of atmospheric chemical transport model STEM-2k1, based on aerosol observation data in Asia Pacific region, combined with optimization algorithm, to inverse artificial sources of BC aerosol in East Asia. The results show that the simulation 55 results based on posterior sources are closer to the observation data. Müller and Stavrakou(2005) optimized the global CO and NOx emissions from man-made and natural sources in 1997 based on IMAGES adjoint model, using ground and satellite observation data of CO and NO2.Kurokawa et al. (2009) developed a four-dimensional variational assimilation system RC4-NOx, which is specially used to optimize NOx emissions, and used this system to retrieve NOx emissions from eastern China in 1996, 1999 and 2002. Wang et al. (2012) combined MODIS satellite data and used GEOS-Chem adjoint 60 model to inverse the dust source flux in Taklimakan desert. Mao et al. (2015) discussed the effect of model resolution on BC inversion, and believed that the finer the model resolution, the smaller the inversion error. Zhang et al. (2015) shows that there are obvious differences in the distribution of posterior sources based on inversion of different prior source inventories. region GRAPES-CUACE-3D-Var


Introduction 25
Adjoint method is an efficient sensitivity analysis tool developed based on adjoint operator theory and numerical model.
Based on the idea of inverse simulation, by establishing the adjoint model of atmospheric chemistry model, the sensitivity of the objective function to any control variable (such as emission source) at any time and any spatial position can be obtained only by running the adjoint model once in inverse time sequence, and then various optimization control problems can be https://doi.org/10.5194/gmd-2019-313 Preprint. Discussion started: 3 February 2020 c Author(s) 2020. CC BY 4.0 License. cloud interaction, etc (Gong and Zhang, 2008;Wang et al., 2010). In addition, the secondary formation process of natural 95 sources, anthropogenic sources and aerosols has been added to the module (Gong and .

Adjoint theory and methods
Let be a continuous linear operator defined on Hilbert space, if there is a continuous linear operator * satisfying: ∀ , ∈ , ( , ) = ( , * ) (1) Then * is called the adjoint operator of (Ye and Shen, 2006). Where (. , . ) denotes inner product. If , are continuous 100 functions on domain Ω, the inner product is defined as (x, y) = ∫ Ω x • ydΩ; if , is a discrete vector: = [x 1 , x 2 , … , x N ]， = [y 1 , y 2 , … , y N ]，then there are ( , ) = ∑ x i • y i N i=1 ., When , is a vector and is a matrix independent of , , we can get ( , ) = = = ( , ), in other words, for matrix linear operators, the adjoint operator is its transposition, i.e. L * = L T ; If the component of the matrix is complex, the adjoint operator is the conjugation of its transpose (Liu, 2005) .
The atmospheric chemistry model can be divided into several small programs, and each small program is abstracted into a 105 vector function : → , which can be expressed as: Where ， are n-dimensional independent variables and m-dimensional dependent variables respectively, corresponding to input variables and output variables in the forward model. Assuming that is continuously differentiable, the tangent linear program of the original model can be expressed as: 110 Where ∇ is the Jacobian matrix of the forward model operator. According to the property of adjoint operator (formula (1)), the transform tangent linear mode (3) can be obtained: calculation process of six kinds of particles (sulfate, nitrate, sea salt, black carbon, organic carbon and sand dust), the adjoint of aerosol transmission process and the adjoint of GRAPES-Meso and CUACE interface program (Jin, 2012;Zhai, 2015;An et al., 2016). The constructed GRAPES-CUACE aerosol adjoint model has passed the correctness test (Jin, 2012;Zhai, 2015;An et al., 2016), and has been well applied in BC (Zhai, 2015;An et al., 2016) and PM2.5 source tracking (Wang et al., 2017;Zhai et al., 2018;Wang et al., 2018aWang et al., , 2018bWang et al., , 2019 in North China. 130 Fig. 1 shows the operation flow of the GRAPES-CUACE atmospheric chemistry model and its adjoint model, where J is the objective function, and different objective functions can be defined according to the concerned problems. c is the model state variable (such as BC concentration); s is the model control variable (such as emission sources, mainly including VOCs, NOx, NH3,SO2 and PPM2.5,etc.). Firstly, the atmospheric chemistry model of GRAPES-CUACE is integrated forward, and the Basic-state values of each integration step are stored. Subsequently, the gradient ∇ of the objective function with respect to 135 the state variable is calculated and used as an adjoint forcing term, the adjoint model is driven in inverse time sequence, and the corresponding ground state value information is read at each integration step, and finally the sensitivity ∇ of the objective function J with respect to the control variable s at any time and at any spatial position is obtained.

Gradient descent method
The Gradient descent method is the most classical method in the unconstrained optimization algorithm. It uses the negative 140 gradient direction as the descending direction for iterative search, which has the advantages of low storage cost, simple and easy implementation (Song et al., 2012). For unconstrained optimization problems: Where f(x) has a continuous first-order partial derivative and has a unique minimum value point. The calculation process of the gradient descent method is as follows: ① given the first guess value x0 of variable x, setting 145 tolerance ε > 0, making k = 0; ② calculate ∇ ( ), if ||∇ ( )|| < ，stop and output ；otherwise, let = −∇ ( ); ③ The step λk takes a fixed value or is obtained by one-dimensional search method, so that x k+1 = x k +λkdk, k=k+1, turn ②.

Source inversion principle based on adjoint model
The adjoint inversion method is essentially based on the Bayesian theory framework, that is, the prior distribution of 150 variables is corrected by using observation information. The objective function of source inversion is generally expressed as: Where and are the priori source and the posterior source vector defined in the n-dimensional control space , and y is the observation vector defined in the m-dimensional observation space O, and is the forward model operator (observation operator) , and B and R are a priori error covariance matrix of n×n and an observation error covariance matrix of m×m 155 respectively, and γ is a weighting factor, which is used to adjust the relative proportion of the observation term and the prior https://doi.org/10.5194/gmd-2019-313 Preprint. Discussion started: 3 February 2020 c Author(s) 2020. CC BY 4.0 License. term in the objective function so as to fully retain the observation data and effective information in the priori estimation in the posteriori estimation (Henze et al., 2009;Cao et al., 2018).
The gradient of the objective function with respect to the variable is: Where T is the adjoint of the forward model. Based on the gradient calculated by the adjoint model, combined with the optimization algorithm, the gradient is used to iteratively update , so that the objective function ( ) takes a minimum value, and at this time is the reasonable source distribution.

Data
Adequate observation data is the basis for pollution source inversion, and better inversion results can be obtained by using

Construction of 3D variation assimilation system based on GRAPES-CUACE adjoint model
Based on the GRAPES-CUACE atmospheric chemistry model and its adjoint model, combined with gradient descent method and observation data, the GRAPES-CUACE-3D-Var assimilation system ( Fig. 3) was constructed, mainly including 175 GRAPES-CUACE atmospheric chemical model calculation and ground state value storage, observation data and adjoint forcing term processing, adjoint model calculation, gradient extraction, target function calculation and model parameter optimization iteration, etc. Taking the source inversion problem as an example, it is expected to minimize the difference between the simulated concentration and the observed concentration at 18 stations in Beijing-Tianjin-Hebei region (Fig. 2).
Firstly, the emission source parameter is defined as independent variable x, and the objective function is established; 180 Operating the GRAPES-CUACE atmospheric chemical model to tp time and storing the ground state value of each integration step; Taking the difference between the simulated and observed values of 18 stations at tp time as the forcing term, the adjoint model is driven in inverse time sequence, and the sensitivity of the objective function to the emission source parameters at each time in [tp,t0] time period is obtained. Updating emission source parameters by using an optimization module; Cycle the GRAPES-CUACE-3D-Var assimilation system until the objective function achieves a minimum value, i.e. 185 a more reasonable posterior source distribution is obtained.

Test design
The simulation scope of this study is in the central and eastern region of China (105°E-125°E, 32.25°N-43.25°N) (Fig. 2), 190 and the horizontal grid resolution is 0.5°×0.5°, including 41×23 grids, vertically divided into 31 layers. The integration step is 300s, and the meteorological field uses FNL data at 6h intervals. The emission sources uses the INTEX-B2006 list of 0.5°×0.5° (Zhang et al., 2009) Due to the large difference between the prior sources and the emissions in 2015 (Zheng et al., 2018), this study mainly considers to fully retain the effective information of observation data in the posterior estimation, and defines the objective function as follows: Among them, C sim,i and C obs,i are respectively the simulated concentration and observed concentration of BC at each station at the target time (07:00 on December 1, 2015). It is difficult to quantify the covariance matrix R of observation errors, which includes observation instrument errors, physical and chemical process errors involved in numerical model simulation and simulation accuracy errors (Cao et al., 2018). Considering that the calculation of BC concentration by proportional method may cause large errors, so the observation error covariance matrix R takes 500%. 210 In the source inversion problem, directly taking the emission source intensity as the inverse evolution quantity may lead to negative emission source intensity, which is inconsistent with the actual situation. Therefore, the emission source factor is usually taken as the independent variable for inversion (Mao et al., 2015). Henze et al. (2009) andJiang et al. (2015) used the logarithmic method to define the emission source factor x: Where s b is the a priori source emission intensity and s is the source variable that needs to be inverted. In fact, Equation (9) contains the influence of a priori source information on the objective function. In order to compare the influence of the proportion of prior information in (9)  0.1ln (s/s b ) and x = 0.01ln (s/s b ), respectively recorded as C_10 and C_100. In the C_10 and C_100 tests, the gradient of the objective function J with respect to the emission factor x is: 220 Among them, ∇ is obtained by the adjoint model calculation. Comparing equations (10) and (11), it can be seen that the proportion of prior information in the C_100 test is relatively large. In contrast, the C_10 test more fully retains the effective information in the observation data. 225 According to the above test design, the GRAPES-CUACE-3D-Var assimilation system (Fig. 3) is driven by a priori emission, and the objective function is reduced step by step through cyclic iteration until the difference between the objective function of the k+1 iteration and the k iteration is less than 1% to reach the convergence condition and the iteration is stopped. The emission adjustment factor x k+1 of the k+1 iteration process is selected as the optimal estimation, and the BC posterior source strength is calculated by using it. 230

Adjoint sensitivity analysis
The sensitivity of the objective function to BC priori source at any time in the inversion period can be obtained by running the GRAPES-CUACE-3D-Var assimilation system. Figure 4 shows the hourly sensitivity distribution from 20: 00 November 30 to 07:00 December 1, 2015. The sensitivity in the southern part of Beijing is negative, indicating that increasing the BC emission source intensity in this area will reduce the difference between the simulated concentration and the observed 235 concentration. The sensitivity of Tianjin, southern Hebei and northwestern Shandong is positive, indicating that reducing BC emission source intensity in these areas can improve the consistency of simulation and observation.
At the target time (07:00 on December 1st), the sensitivity value (absolute value) in the inversion area is relatively large, and the sensitivity value (absolute value) gradually decreases along with the inverse time sequence. At the same time, the sensitivity coverage continues to expand to 11 hours before the target time (20:00 on November 30), and the sensitivity 240 distribution has been extended to Shanxi, Inner Mongolia and Shandong-Henan borders outside the inversion area. This phenomenon shows that in a short period before the target time (December 1, 05:00-07:00), BC emission source intensity in the inversion area has the greatest influence on the target function, and its influence gradually decreases with the inverse time sequence, while BC emission source intensity outside the inversion area starts to have some influence on the target function. 245

BC concentration comparison before and after inversion
The objective function values of the C_10 and C_100 tests decreased with the increase of iteration times, and the objective function values of the two tests decreased from the initial 2288.5 (the 0th) to 1296.0 (the 18th) and 1295.5 (the 19th) ( fig. 5a)  Judging from the difference, the posterior source intensity of BC in Tianjin, central and southern Hebei and northern Shandong is lower than the prior source intensity ( fig. 7c,f), which is consistent with the actual situation, that is, the BC emissions of each province show a downward trend year by year (Zheng et al., 2018). However, the BC posterior source 270 intensity in southern Beijing has increased compared with the prior source intensity, which is inconsistent with the actual situation (Zheng et al., 2018). Because Beijing area is in the process of high PM2.5 concentration at the target time (07:00 on December 1, 2015). Zhang et al. (2017) showed that the BC ratio decreased with the increase of PM2.5 concentration.

Comparison of BC emission sources before and after inversion
Therefore, the average monthly ratio of BC/PM2.5 used in this research will lead to the "observed concentration" of BC at site 5 (Figure 2) being too high (relatively, the simulated value is low), thus causing the emission of posterior sources in 275 Beijing area to increase. Therefore, this phenomenon reflects that the source inversion test based on the GRAPES-CUACE-3D-Var assimilation system can make full use of the observation information to correct the prior source distribution. In addition, the posterior sources retrieved by the C_10 and C_100 tests are slightly different. Compared with the C_10 inversion source, the C_100 test has a low inversion value for the BC source intensity in southern Beijing (Fig. 7c, f). This is https://doi.org/10.5194/gmd-2019-313 Preprint. Discussion started: 3 February 2020 c Author(s) 2020. CC BY 4.0 License.
because the proportion of observations in the C_10 test is larger than that of the C_100 test (Formula (10) and (11)), and the C_10 test makes full use of the BC observation concentration of the site 5 at the target to correct the BC prior source in Beijing. When the difference between the simulated concentration and the observed concentration of BC at Site 5 is reduced, 285 the BC source intensity in southern Beijing will naturally increase. Figure 8 shows the timely varying difference distribution of the C_10 inversion source and the C_100 inversion source. It can be seen that in the shorter period before the target time (03:00-07:00 on December 1st), the difference of BC source intensity in the south of Beijing is large, and the difference is small before that. This phenomenon indicates that the observation data of the target moment has a great influence on the posterior source in the short period before the target time, and its influence gradually decreases with the inverse time 290 sequence. The results show that: The newly constructed GRAPES-CUACE-3D-Var assimilation system is reasonable and reliable, and can be initially applied to BC source inversion in Beijing-Tianjin-Hebei region. Both C_10 and C_100 experimental inversions of BC posterior source simulation results are better than BC prior source simulation results. The correlation coefficient between BC simulated concentration and observed concentration is increased to 0.7 and 0.64 respectively from 0.2 before inversion, NMSE is decreased to 0.22 and 0.24 respectively from 0.38 before inversion, NMB is decreased to 305 43.37% and 40.90% from 51.53%. The BC posterior sources retrieved by C_10 and C_100 tests are consistent with the distribution of prior sources. The high value areas are mainly located in the south of Beijing, Tianjin, central and southern Hebei and northern Shandong. On the whole, the observation proportion in the C_10 test is larger, and its optimization effect is better than that in the C_100 test. The observation information at the target time has a great influence on the posterior source in a short period before the target time, and its influence gradually decreases with the inverse time sequence. The 310 research ideas and experimental results in this paper have laid a foundation for the development of a complete GRAPES-CUACE four-dimensional variational assimilation system. However, although BC/PM2.5 ratio method can be used in this study to obtain BC concentration at various stations in Beijing-Tianjin-Hebei region, there is still a certain difference between the actual BC observation concentration and the inversion result, resulting in a certain deviation from the real https://doi.org/10.5194/gmd-2019-313 Preprint. Discussion started: 3 February 2020 c Author(s) 2020. CC BY 4.0 License.