Towards the closure of momentum budget analyses in the WRF (v3.8.1) model

. Budget analysis of a tendency equation is widely utilized in numerical studies to quantify different physical processes in a simulated system. While such analysis is often post-processed when the output is made available, it is well acknowledged that the closure of a budget is difﬁcult 5 to achieve without temporal and/or spatial averaging. Nevertheless, the development of errors in such calculations has not been systematically investigated. In this study, an inline budget retrieval method is ﬁrst developed in the WRF v3.8.1 model and tested on a 2D idealized slantwise convection case 10 with a focus on the momentum equations. This method extracts all the budget terms following the model solver, which gives a high accuracy, with a residual term always less than 0.1 % of the tendency term. Then, taking the inline values as truth, several ofﬂine budget analyses with different com-15 monly used simpliﬁcations are performed to investigate how they may affect the accuracy of the estimation of individual terms and the resultant residual. These assumptions include using a lower-order advection operator than the one used in the model, neglecting grid staggering, or following a mathe-20 matically equivalent but transformed format of the governing equations. Errors in these post-processed analyses are found mostly over the area where the dynamics are the most active, thus impairing the subsequent physical interpretation. A maximum 99th percentile residual can reach > 50 % of 25

to achieve without temporal and/or spatial averaging. Nevertheless, the development of errors in such calculations has not been systematically investigated. In this study, an inline budget retrieval method is first developed in the WRF v3.8.1 model and tested on a 2D idealized slantwise convection case 10 with a focus on the momentum equations. This method extracts all the budget terms following the model solver, which gives a high accuracy, with a residual term always less than 0.1 % of the tendency term. Then, taking the inline values as truth, several offline budget analyses with different com-15 monly used simplifications are performed to investigate how they may affect the accuracy of the estimation of individual terms and the resultant residual. These assumptions include using a lower-order advection operator than the one used in the model, neglecting grid staggering, or following a mathe-20 matically equivalent but transformed format of the governing equations. Errors in these post-processed analyses are found mostly over the area where the dynamics are the most active, thus impairing the subsequent physical interpretation. A maximum 99th percentile residual can reach > 50 % of the 25 concurrent tendency term, indicating the danger of neglecting the residual term as done in many budget studies. This work provides general guidance not only for budget diagnoses with the WRF model but also for minimizing the errors in post-processed budget calculations. 30

Introduction
The atmosphere is a complex system with different scales of motion. Its dynamics are governed by a set of fluid equations based on the fundamental laws of physics. Although the equation set cannot be solved analytically, numerical mod-35 els can be used to simulate the observed weather and climate systems to improve our understanding of the atmosphere. Due to the complexity and nonlinearity of the numerical models, budget analysis is often employed to interpret the results by quantifying the contribution of each term 40 (i.e., physical process) in a tendency equation that governs the evolution of a certain quantity in the simulated system. The accuracy of a given budget analysis can be estimated from the residual term, defined as the difference between the tendency term on the left-hand side (lhs) of the equation and 45 the summation of all the forcing terms on its right-hand side (rhs). Budget analysis has been performed on diverse properties (e.g., momentum, temperature, water vapor, vorticity) of many systems on various scales, including the Madden-Julian oscillation (MJO; e.g., Kiranmayi and Maloney, 2011;50 Andersen and Kuang, 2012), tropical cyclones (e.g., Zhang et al., 2000;Rios-Berrios et al., 2016;Huang et al., 2018), squall lines (e.g., Sanders and Emanuel, 1977;Gallus and Johnson, 1992;Trier et al., 1998), supercell thunderstorms (e.g., Lilly and Jewett, 1990), and so on. 55 Despite the popularity of the budget analysis, it is generally acknowledged that, in model post-processing analysis, obtaining a closed budget with a negligible residual is difficult (e.g., Kanamitsu and Saha, 1996) and has been accomplished mostly in time-or domain-averaged budget calcula-60 tions (e.g., Lilly and Jewett, 1990;Balasubramanian and Yau, 2 T.-C. Chen et al.: Momentum budget analyses in the WRF (v3.8.1) model residual term that contains non-explicitly diagnosed physics can be larger than the tendency term (e.g., Liu et al., 2016), and many studies simply do not display the residual, making the proper interpretation of the budget analysis difficult.
The "residual analysis method" is sometimes utilized to 5 obtain an indirect estimation of the physical processes that are hard to diagnose or are unresolved in a set of analysis or observational data. In such cases, a non-negligible residual is sometimes used to gain insight into such processes. However, as just discussed, the residual term also contains 10 the inaccuracies associated with the calculations within the budget analysis (e.g., Kornegay and Vincent, 1976;Abarca and Montgomery, 2013). It is thus unclear whether the unresolved physics in such data sets do indeed comprise the main component of the residual without considering the con-15 tributions of other sources of errors in the budget calculation (Kuo and Anthes, 1984). Whereas it is almost impossible to separate the subgrid-scale, unresolved processes from other errors in reanalysis or observational data (e.g., Hodur and Fein, 1977;Lee, 1984), the focus of this study is on numer- 20 ical model data where the local tendency and all the associated resolved and parameterized physics can be obtained from the model. Thus, the residual term in this study specifically refers to errors in the budget calculation. To reduce the residual, an inline budget analysis that ex-25 tracts all the terms of a prognostic equation directly from the model during its integration is generally the most accurate. However, the procedure has been reported only in a few studies (e.g., Zhang et al., 2000;Lehner, 2012;Moisseeva, 2014;Moisseeva and Steyn, 2014;Potter et al., 2018; see 30 Appendix A for a summary and comparison among these works). Most other studies still conduct the offline or postprocessing budget analysis when the output is made available after the model integration. Some specific suggestions have been given in the past regarding how to reduce the er-35 ror of post-processed budget analysis. For example, Lilly and Jewett (1990) emphasized the importance of evaluating terms using the same differencing scheme, grid stretching, and grid staggering as that used in the simulation model. However, it is uncertain whether these rules have been widely followed, 40 and how much of a reduction in residual can be obtained with this approach. In some post-processed budget analyses, transformed equations with different assumptions from those in the model are used and naturally lead to errors in the budget results. On the other hand, even when the same form of the equations is followed, errors can still arise from multiple sources during the post-processing. Some errors are inherent in the time discretization scheme of the model, some are traced to the numerical methods in solving the temporal or spatial deriva-50 tives with finite differencing (e.g., Kuo and Anthes, 1984), and others might emerge during the interpolation or extrapolation from model grids to analysis grids (e.g., Lilly and Jewett, 1990). While the tendency term is often the result of a few cancelations among competing forcing terms, the seemingly 55 non-dominant terms may be as important as the large forcing terms in determining the sign and the value of the tendency. Thus, an incorrect estimation of even a small term may result in a residual with magnitude comparable to the tendency term, hindering the subsequent physical interpretation. 60 A few models, such as the Cloud Model 1 (CM1; Bryan and Fritsch, 2002) and the High Resolution Limited Area Model (HIRLAM; Undén et al., 2002), include inline budget diagnoses that users can choose to include in the model output. However, many other commonly used models 65 (e.g., Fifth-Generation NCAR/Penn State Mesoscale Model (MM5; Grell et al., 1994), Weather Research and Forecasting Model (WRF; Skamarock et al., 2008), the Advanced Regional Prediction System (ARPS; Xue et al., 2000Xue et al., , 2001, and the Regional Atmospheric Modeling System (RAMS; 70 Pielke et al., 1992)) do not have this capability. In this study, we develop an inline momentum budget retrieval tool in the Advanced Research WRF model, one of the most widely used numerical weather prediction models. During the period 2011-2015, there were on average 510 peer-reviewed 75 journal publications involving WRF per year (Powers et al., 2017). Given the widespread use of WRF for both real-case and idealized modeling, such a budget tool may prove useful in numerous applications. In our budget diagnosis, each contributing term is extracted during the model integration 80 and stored as a standard output. In so doing, we essentially solve the prognostic variables as done in the model so that the two sides of the tendency equation are always in balance regardless of the output time interval. By taking the results from the inline budget analysis as truth, we then perform 85 several different post-processing budget analyses with commonly made simplifications or a different format of equation. Comparisons between the post-processed budgets and the inline/true values are made to investigate the potentially large errors in each forcing term and the resultant residuals. The WRF configuration used in this study is a twodimensional [(y, z); no variation in the x direction], fully compressible, non-hydrostatic, and idealized version of the 95 Advanced Research WRF model, version 3.8.1 (Skamarock et al., 2008). Here we briefly revisit the parts that are relevant to the momentum budget analysis. The governing equations in the WRF model are cast on a terrain-following dryhydrostatic pressure coordinate. This vertical coordinate, η, 100 is defined as where p dh is the hydrostatic pressure of the dry air and µ d represents the mass of the dry air per unit area in the column; µ d = p dh_sfc − p dh_top , where p dh_sfc and p dh_top indicate the 105 values of p dh at the surface and the top of the dry atmosphere, respectively.
To ensure conservation properties, the model equations are formulated in flux form, with the prognostic variables coupled with µ d . The flux-form momentum components are de-5 fined as where u, v, and w are the two horizontal and vertical velocities, respectively. Note that the dry-mass-coupled velocities (U , V , W ) on coordinates (x, y, z) have units of pascal meter per second, and the dry-mass-coupled vertical velocity on η coordinate, , has a unit of pascal per second. For the idealized 2D case on an f plane as in this study, the momentum equations in the WRF model are written as where is the flux-form advection, p is the full pressure with in-20 clusion of vapor, φ is the geopotential, f is the Coriolis parameter, r e is the mean earth radius, and α and α d are the full and dry-air specific volume, respectively. In our selected microphysics scheme (Thompson et al., 2008), six hydrometeors are included, and thus α = 25 α d 1 + q v + q c + q r + q i + q s + q g −1 , where q v , q c , q r , q i , q s and q g are the mixing ratios for water vapor, cloud, rain, ice, snow, and graupel, respectively. The rhs forcing terms for the V tendency include the flux-form advection (ADV), horizontal pressure gradient force (PGF), Coriolis force (COR), 30 vertical (earth-surface) curvature (CUV), and the remaining physics (P V ). For the W tendency, the rhs forcings contain the flux-form advection (ADV), net force between the vertical pressure gradient and buoyancy (PGFBUOY), curvature effect (CUV), and the remaining physics (P W ). The remain-35 ing physics may include diffusion, vertical velocity damping processes, and other parameterized physics, depending on the model setup. Note that for closing the budget analysis, all the known physics processes that come into play should be explicitly written in the equation and be diagnosed or di-40 rectly retrieved from the model. The residual (res) is added on the last rhs term of Eqs. (1) and (2) to represent the imbalance between the two sides of the equation during budget analysis, but it is not part of the original equations solved in the model. 45 To develop an inline budget retrieval tool, it is important to understand how these prognostic variables are advanced in the WRF model. Governing equations are first recast to perturbation forms with respect to a dry hydrostatically balanced reference state that is a function of height only (de-50 fined at initialization) to reduce truncation errors and machine rounding errors. Specifically, variables of p, φ, α d , and µ d are separated into reference and perturbation components, e.g., p(x, y, η, t) = p(z) + p (x, y, η, t). The introduction of these perturbation variables only changes the expressions for 55 the rhs terms PGF and PGFBUOY in Eqs. (1) and (2), which will not be shown here for simplicity. Readers can refer to Skamarock et al. (2008, chap. 2.5) for more details.
Based on Skamarock et al. (2008), Fig. 1 summarizes the WRF integration strategy. The integration is wrapped by a 60 third-order Runge-Kutta (RK3) scheme, in which the prognostic variables (generalized as here) are advanced from t to t + t given their corresponding partial differential equations, ∂ ∂t = F ( ), following a three-step strategy: 65 where t is the model integration time step and F , the largestep forcing, represents the summation of all the rhs terms of Eqs.
(1) and (2) excluding the residual. Although the parameterized forcings stay fixed from step one to three as most of the parameterization schemes are called only once at the first 70 RK3 step, the rest of the non-parameterized forcings and thus the total F are changed with the updated * and * * at the second and third RK3 step. Within each RK3 step, a subset of time integration with a relatively smaller time step is embed-

4
T.-C. Chen et al.: Momentum budget analyses in the WRF (v3.8.1) model ded to accommodate high-frequency modes for numerical stability (Wicker and Skamarock, 2002;Klemp et al., 2007;Skamarock et al., 2008). A maximum number of small steps in one model integration step can be specified by the user. To improve accuracy in the temporal solver, the variables be-5 ing advanced in this small-step integration are the temporal perturbation fields, defined by the deviation from their more recent RK3 predictors: = − t * , where t * = t , * and * * for the first, second, and third RK3 step, respectively. Thus, the perturbation momentum equations to be 10 solved are driven by the large-step forcings and the smallstep (sometimes referred as "acoustic-step" although it deals with both acoustic and gravity wave modes (e.g., Klemp et al., 2007;Skamarock et al., 2008)) corrections: where τ indicates the time in the small-step integration, and C as well as c 2 s are sound-wave-related terms (Skamarock et al., 2008, chap. 3.1.2). Here we leave out the details re-20 garding the small-step terms that are irrelevant to the inline budget retrieval. Note that the overbar in Eq. (6) indicates a forward-in-time averaging operator for the small-step modes to damp instabilities associated with vertically propagating sound waves (see Eq. 3.19 in Skamarock et al., 2008). Equa-25 tions (5) and (6) are the ones used to integrate the prognostic momentum fields in the WRF model. For each RK3 step, after the total large-step forcing F is determined, V and W are defined and advanced within the small-step scheme by a loop that adds F multiplied by a time interval, τ (varies 30 with different RK3 steps; see Fig. 1), and the small-step forcing (ACOUS). After the small-step integration loop ends, V and W are then recovered from their temporal perturbation fields and moved forward to the next RK3 step. While it is not relevant to the momentum equations discussed here, 35 for some variables directly contributed by the microphysics scheme, the associated contribution should be considered after the RK3 integration loop ends as the microphysics are integrated externally using an additive time splitting ( Fig. 1) (Skamarock et al., 2008, chap. 3.1.4).

Experimental setup
The main discussion of this study will focus on a 2D (y, z) idealized simulation of slantwise convection. This process releases conditional symmetric instability (CSI), which can be idealized by assuming no flow variations along the 45 direction of thermal winds (denoted as the x direction in our setup; Markowski and Richardson, 2010, chap. 3.4). The initial field consists of a thermally balanced uniform westerly wind shear in x. This baroclinic environment contains no conditional (gravitational) instability, no inertial stability, 50 and no dry symmetric instability but does contain some CSI. A two-dimensional bubble containing perturbations of potential temperature and zonal wind is added to initiate convection and a slanted secondary circulation (v, w). See Appendix B for more details about the experimental setup. The 55 domain size is 1600 and 16 km in the y and z direction, respectively, with a horizontal grid length of 10 km and 128 vertical layers. The model integration time step is 1 min. For simplicity, the only parameterization used is the Thompson microphysics scheme (Thompson et al., 2008). In addition, 60 the upper-level implicit Rayleigh vertical velocity damping (damp_opt = 3) is also activated (Skarmarock et al., 2008, chap. 4.4.2). The former does not directly contribute to the momentum fields (although it can affect the momentum field indirectly through density and pressure variations), and the 65 latter, contained in P W in Eqs.
(2) and (6), affects only the W momentum budget. No subgrid turbulence scheme is used (diff_opt = 0). The WRF model offers different orders of advection operators, and the default third-and fifth-order operators are selected for the vertical and horizontal in this case, 70 respectively. Most of the subsequent analyses and discussion are based on this slantwise convection case with a 10 km grid Figure 1. The time integration strategy for advancing a state variable (generalized as ) in the WRF model based on Skamarock et al. (2008). In this given example, four acoustic steps are specified for one integration time.
length unless specified otherwise. Two other simulations, one of which uses the same setup but with an increased horizontal resolution of 2 km, will be discussed in Sect. 4. Figure 2 shows the 48 h evolution of the 99th percentiles of v and w (hereafter the lowercase indicates that the cal-5 culation uses the uncoupled momentum field) and their tendencies. For the 10 km case, the horizontal velocity reaches its peak in about 20 h, a few hours after the vertical velocity reaches its maximum, and then undergoes a weakening.
Both v and w tendencies are maximized at around 15 h. To understand the evolution of the associated flow dynamics, a momentum budget analysis serves as a natural choice. However, as a preliminary step prior to carrying out such analysis, we focus only on the technical discussion of the budget analysis methodology. The physical interpretation of the motion 15 is beyond the current scope and will be presented in a subsequent paper.

Inline momentum budget analysis
For the inline budget analysis, all the terms are retrieved di-20 rectly from the model for all the integration time steps, and therefore they represent the "instantaneous" terms that act over the specified short integration time window. For the large-step forcing, the WRF model accumulates all forcing terms at the beginning of each RK3 step. To separate them, 25 we simply take the difference before and after WRF calls the subroutine for each large-step forcing, store their values sep-

6
T.-C. Chen et al.: Momentum budget analyses in the WRF (v3.8.1) model Figure 2. Evolutions of the 99 percentiles of (a) horizontal velocity, v (black; axis on the left), and vertical velocity, w (gray; axis on the right) in the simulation of slantwise convection. Panel (b) is the same as (a) but for their tendencies (black and gray lines for v and w tendencies, respectively). Solid lines are for the 10 km simulation, while the dotted ones are for the 2 km case.
arately, and output only the values at the third RK3 step (the total forcing is F ( * * ) t as shown in Fig. 1). As for the contribution of the small-step modes, they are obtained by accumulating over all the small steps in the third RK3 step (ACOUS sum shown in Fig. 1). It is worth noting that Eq. (6) 5 is a vertically implicit equation that couples with the geopotential tendency equation (Skarmarock et al., 2008;Klemp et al., 2007). A tri-diagonal equation for the vector W (involving three grid points in the vertical direction) is thus solved (Satoh, 2002). This means that W (the scalar at a given grid point) is not advanced by linear additions in the small-step or acoustic scheme. To ensure the closure of the inline retrieval budget, we simply take the total changes that are contributed by the implicit solver in the acoustic scheme as small-step modes of W in the third RK3 step. Note that this way does 15 not violate the original W equation in Eq. (6). The contribution from these accumulated small-step modes in the V and W tendency budgets are combined with their large-step PGF and PGFBUOY, respectively, as they share the same mathematical expressions. Finally, we add the inline calculation for 20 the tendency term outside of the RK3 integration loop, after the microphysics scheme: where t is the model integration time step and represents V or W (coupled momentum; hereafter the momen-25 tum tendency with capital V or W refers to the lhs term derived for the budget analysis). The values of at times t and t + t, the latter denoted by superscripts, are termed the current and predicted states, respectively. Note that while variables of momentum tendencies (specifically named "ru_tend, 30 rv_tend and rw_tend") can be directly outputted from the WRF model by modifying the registry file, these variables do not necessarily represent the actual momentum changes that consider all the physical (e.g., microphysics, small-step modes) and non-physical processes (e.g., damping) but only 35 the summation of all the large-step forcings.
Figures 3 and 4 present the results of the inline budget analysis for horizontal momentum and vertical momentum, respectively, at three selected times (6, 12, and 16 h). To demonstrate the momentum changes in a common physical 40 unit (velocities; meter per second), every term of the fluxform budget equation shown in this paper is divided by the dry-air mass, µ d t+ t (so that, for example, the V tendency has a unit of meter per second squared). The magnitude of the V tendency intensifies during this period with local max-45 ima on the order of 10 −4 to 10 −3 m s −2 (Fig. 3). Two forcing terms, PGF and COR, are a few times larger than the ADV term but generally offset each other, making the ADV term of comparable importance in determining the tendency. The CUV term for V tendency is generally small and thus 50 not shown in Fig. 3. The residual, obtained from Eq. (1) with P V equal to 0, is always smaller than 10 −7 m s −2 during the entire 48 h simulation (not shown). To understand how the peak error evolves with time and to avoid reaching misleading conclusions based on one or more outlying 55 values, the evolution of the 99th percentile magnitude of the residual term is shown. Figure 5 shows that it reaches a value of about 7 × 10 −9 m s −2 at around 15 h. Recall that the 99th percentile magnitude of the simulated v tendency has a peak of 7 × 10 −4 m s −2 (Fig. 2b). Thus, the relative 60 magnitude of the 99th percentile residual is about 0.001 % of the 99th percentile tendency term during the peak intensifying stage. Compared to the V tendency, the W tendency exhibits narrower features in the horizontal direction ( Fig. 4) with an overall smaller magnitude in every term. The two 65 largest forcings, PGFBUOY and CUV, usually have opposite signs, so their combined effect is on the same order as the ADV and the W tendency term. While the contribution from the upper-layer vertical velocity damping is not shown in Fig. 4 as it is generally small in the low layers TS1 , it is 70 included as part of the rhs (P W ) of Eq. (2) when calculating the residual for the inline budget analysis. The residual in the inline W budget is generally 4 orders of magnitude smaller than its tendency term. The 99th percentile residual for W budget is about 2 × 10 −10 m s −2 , around 0.0003 % of 75 the 99th percentile w tendency during the peak intensifying stage of the convection (not shown).

.1 Key features and methodologies
In contrast to extracting terms directly from the model during its integration, most of the studies in which the momentum budget analysis is conducted use the model output files after 5 the completion of the integration. Note that since the suboutput time-step information is not available between successive outputs, only the large-step forcing terms can be estimated in these post-processed budget analyses. Generally, the neglect of the acoustic or small-step modes is expected 10 to have little impact on the results as the high-frequency modes are often considered meteorologically insignificant. However, it is mentioned in Klemp et al. (2007) and Skamarock et al. (2008) that the WRF small-step integration scheme includes not only the acoustic-wave but also some 15 gravity-wave modes, which may not be insignificant. These gravity-wave modes form during the small-step integration due to the designated terms that are required for acousticwave propagation and "Consequently, in this vertical coordinate (i.e., terrain-following hydrostatic pressure coordinate), 20 the terms governing the acoustic and gravity wave modes are intermingled to the extent that it does not appear feasible to evaluate any of the gravity wave terms on the large time steps, even if one desired to do so" (Klemp et al., 2007).
Most of the studies did not reveal the complete details 25 about how their analysis was done, so we cannot presume their methodologies and the possible errors. However, a few simplifications commonly made in the post-processed budget analyses may introduce errors that result in deviations from the simulated results and thus a significant residual. Below 30 we revisit the relevant features of the WRF model that should be considered and discuss how they might affect the postprocessed budget if they are ignored. Then, the results are shown for different post-processed budget analyses with different simplifications (Table 1). The aim herein is to identify 35 these potential errors hidden in the budget calculation and show how severely they affect the resulting interpretation.

(a) Diagnosed tendency
In a post-processed budget analysis, the tendency term of a given variable is approximated by the difference between the 40 value of this variable at two successive output times divided by the output time interval. Thus, the accuracy may be sensitive to the output time interval. The value at the predicted state has a form of 45 If the output interval is longer than the model integration time step, the diagnosed tendency would deviate from the model prediction of the instantaneous tendency. To increase the ac-curacy, the output time interval t output needs to be similar to the integration time step t.

50
(b) Spatial discretization on the C staggered grid For computational efficiency and accuracy, WRF utilizes a C-grid staggering system (Arakawa and Lamb, 1977). This staggering system is pertinent to the numerical solution for spatial derivatives. For most of the spatial derivatives other 55 than advection (e.g., the pressure gradient force), the secondorder finite difference operator is used in the WRF model. For example, the y derivative of variable is calculated using the discrete operator: The index (i, j, k) corresponds to a location with (x, y, η) = (i x, j y, k η), where x, y and η are the grid lengths in the two horizontal and vertical directions (can be vertically stretched), respectively. The same expression applies for the x or the η derivatives. Grid staggering implies that different 65 variables may be located on different grids, i.e., shifted by a half-grid point from the others as illustrated in Fig. 6. Depending on what variable the spatial derivatives are intended for, Eq. (9) should be carried out on the corresponding grid, which is not necessarily the same as the grid. For exam-70 ple, for the V tendency, all the associated forcing terms involving the spatial derivatives should be performed on the V grid. More specifically, to calculate the PGF term for the V tendency equation, the term ∂p ∂y and the term ∂p ∂η in Eq. (1) should be calculated on the V grid but not the pressure grid 75 (p grid). Applying Eq. (9) for ∂p ∂y , the V grid with location indices of (i, j − 1 2 , k) and (i, j + 1 2 , k) falls exactly on the p grid and hence no interpolation is required (red arrows in Fig. 6a). However, for ∂p ∂η , the pressures on the V grid with indices of (i, j, k − 1 2 ) and (i, j, k + 1 2 ) must be obtained (red arrows in 80 Fig. 6b) through linear interpolation using their surrounding closest four pressure values, e.g., which is weighted by the irregular (stretched) vertical gridlengths (Fig. 6b).

85
If the C-grid staggering is not considered during the postprocessing analysis, i.e., all the variables have been interpolated on the universal grids before carrying out the budget calculation, in addition to the potential errors brought on by the interpolation method, the term ∂p ∂y , for example, would es-90 sentially involve pressure differences over a larger grid inter-  val of 2× y instead of y, with larger associated truncation errors.

(c) Advection operators
For advection, higher-order operators for finite differencing are provided as the default WRF setup. Taking the y compo- The red arrows indicate the grids that would be used to calculate the second-order spatial derivative term for the V momentum at the V grid (i, j , k).

present simulation, it is written as
where V and v are the mass-coupled and mass-uncoupled velocities, respectively: The odd-order advection operators include a spatially centered even-order operator and an upwind diffusion term. A detailed discussion on the advection scheme in the WRF 10 model with different-order operators can be found in Wicker and Skamarock (2002) and Skamarock et al. (2008). Simplifying the advection estimation using an operator with an order that differs from the numerical setup would contribute to errors in the ADV estimation.

(d) Forward or backward Euler method
Conceptually, the WRF model can be considered more of a forward scheme, i.e., using the known variables from the current state to calculate the forcing and then advancing the variables forward until reaching the prediction time. How-20 ever, there are a few implicit components during the integration. For example, as discussed in Sect. 2.1, the large-step forcings are updated using a predictor-corrector method in the second and third RK3 steps. In addition, the W equation is coupled with the geopotential tendency equation and 25 includes a forward-in-time weighting that utilizes predicted states of the geopotential and temperature in solving the W (Eq. 3.11, 3.12, and 3.19 in Skamarock et al., 2008). In numerical analysis for solving ordinary differential equations, the (explicit) forward Euler method approximates 30 the change of a system from t to t + t using the current states (t), while the (implicit) backward Euler method finds the solution using the predicted states (t + t): ∂ ∂t t+ t ≈ F t forward Euler method, Consistent with this concept, the rhs forcing terms of a budget equation can be estimated using two different instantaneous states in analogous ways. However, we emphasize that the post-processed budget analysis does not solve the tendency equation per se but only diagnoses the relation-40 ship between the two sides of the equation. Note that for post-processing analyses, the availability of the data depends on the output time interval ( t output ), which is often much larger than the integration time step ( t). Thus, for the tendency at a given time t + t, when applying the forward Eu-45 ler method to estimate the associated rhs forcings, the "current states" one can use are the most recent prior output at t + t − t output (see Fig. 7): ∂ ∂t t+ t ≈ F t+ t− t output forward Euler method for post-processing.
If t output is the same as t, Eq. (14) reverts to Eq. (12). If 50 t output is much larger than t, the backward Euler method using predicted states at t + t may better estimate the true model forcing terms as they are calculated using variables at a closer time to the real integration window in the model (Fig. 7).
The above two diagnostic methods estimate the forcing terms using instantaneous states. However, as mentioned in Sect. 3.2.1(a), the diagnosed lhs tendency depends on two 5 successive model output times. Thus, an average between forcings diagnosed explicitly and implicitly are often considered. For a post-processed analysis, this translates into estimating the forcings using both predicted states and the most recent prior available current states: first multiplied by a factor of 1 µ d and V is rewritten as µ d v: Then, by adding the mass continuity equation in WRF (mul-20 tiplied by a factor of v µ d ): to the rhs of Eq. (16), we obtain Moving the first term on the rhs of Eq. (17) to the lhs, the sec-25 ond rhs term can be combined with the flux-form advection using the vector identity ∇ · (µ d v) = µ d (∇ · v) + v · (∇µ d ).

12
T.  Table 1 summarizes all the post-processed budget analyses 5 tested in this study. In the present section, we first present the results one by one, and then a qualitative intercomparison among them and the inline retrieval method is discussed. The first post-processed method (POST10min-E) for V budget follows all the approaches in the model as closely as possible using the 10 min output data. The flux-form equation, C staggering grids, and the same orders of advection operators as the experimental setup are used. The diagnosis of the large-step forcing is applied directly to the model outputs on η levels using the explicit or forward Euler method 15 as shown in Eq. (14). The diagnosed forcing terms are compared with their corresponding true values from the inline retrieval (Fig. 8). Errors smaller than, but on the same order of 10 −4 m s −2 as the V tendency, are observed in all terms including the diagnosed tendency term. These errors grow in 20 magnitude and areal coverage with the growth of the disturbance. Aside from COR, the absolute errors in the tendency, ADV and PGF can exceed 6 × 10 −4 m s −2 , the former two of which are more than 50 % of the magnitude of their true (instantaneous) values locally.

25
The second post-processed analysis (POST1min-E) is done following the same approach but applied to the 1 min (same as the integration time step for this simulation) output data, and the results show strongly reduced errors in all terms (Fig. 9). The errors that remain are mostly in the PGF 30 term and likely stem from the fact that the small-step modes and the RK3 integration scheme are not considered in the post-processed budget. These inherent errors result in a small residual term with a general order of 10 −5 m s −2 , 1 to 2 order(s) smaller than the maximum V tendency. In terms of 35 local maxima, the 99th percentile magnitude of the residual obtained in POST1min-E gives a relative magnitude of about 7 % of the 99th percentile v tendency during the peak intensifying stage of the convection at around 15 h (Figs. 2b and 5). Although reducing the model output interval to be close to 40 the integration time step helps to balance the budget without the need for inline diagnoses, it is computationally expensive especially for large, data-intensive simulations.
Given that computational cost is often a major consideration, we also test whether the implicit or backward Euler 45 method (POST10min-I) can improve the estimation of instantaneous forcing terms relative to the explicit method for the same 10 min output data (POST10min-E). POST10min-I follows the same strategy as POST10min-E except that all the rhs terms, following Eq. (13), are diagnosed with the 50 predicted states instead of the previous output states. As depicted in Fig. 10, POST10min-I does indeed better capture the true model estimated forcing values as errors in all the rhs forcing terms diminish greatly to an accuracy similar to POST1min-E. However, as these forcings are calculated at 55 a given instant, the imbalance of the budget would remain if the diagnosed tendency term is not calculated instantaneously (the second column from the right in Fig. 10). Therefore, if budget analysis at an instant of time is desired, we recommend adding the tendency calculation within the model as 60 a standard output and diagnosing the forcing terms implicitly, which yields a residual term on a similar order to the one obtained in POST1min-E (the rightmost column in Fig. 10 and Fig. 5a).
For the more common situation, the post-processed anal-65 yses diagnose rhs terms using two successive outputs over an output time interval, i.e., taking the averages of the explicitly and implicitly calculated forcings using Eq. (15) on the 10 min output (POST10min-(E+I)/2). Comparing the averaged rhs forcings with the analogously diagnosed lhs mo-70 mentum tendency (Eq. 8) gives a small residual to a similar accuracy level as POST1min-E and POST10min-I (the rightmost column in Figs. 11 and 5b).
We now investigate the impact of other common simplifications on top of the reference experiment, POST10min-75 (E+I)/2. The first such simplification is to approximate the flux-form advection term using the second-order operator (Eq. 9) for both vertical and horizontal components (POST2oadv-(E+I)/2) instead of the third-and fifth-order operators as used in the model setup. In our simulation, such 80 inconsistency of advection operators introduced errors in the ADV term with a maximum value > 3 × 10 −4 m s −2 , more than 50 % of its true magnitude along the slantwise convective band (Fig. 12). Next, we repeated POST10min-(E+I)/2 but the calculation is applied after all the model output vari-85 Geosci. Model Dev., 13, 1-25, 2020 www.geosci-model-dev.net/13/1/2020/  ables have been interpolated to the universal or un-staggered grid (pressure grid) (POSTnonstag-(E+I)/2). This is a common way to post-process model output data for plotting purposes. As mentioned earlier, this approach would reduce the accuracy when solving the spatial differential terms, and in-5 deed, the results do indicate significant errors over a large area in both ADV and PGF (Fig. 13). Their combined errors result in widespread residual values > 3 × 10 −5 m s −2 even over the area where the tendency term is smaller than 1 × 10 −4 m s −2 (error is at least of 30 % magnitude of the 10 tendency term over a wide area and is reaching 100 % over the band head). Finally, a different format of the V equation, the advective form, is used for post-processed analysis (POSTadvF-(E+I)/2). Mathematically, the flux-from momentum equation 15 can be rewritten in the advective form without making any  additional approximation, only with the aid of the conservation law of dry-air mass in the WRF model as shown in Eqs. (16)-(18). However, during the interchange of the expression for the tendency and advection terms, truncation errors may be introduced. We reiterate that the tendency term 5 in the advective form is not equivalent to the one in the flux form divided by µ d ; however, calculation suggests that they are approximately equal, i.e., with a maximum error that is on the order of 10 −7 ∼ 10 10 −8 m s −2 (3 orders of magnitude smaller than the simulated v tendency) in our study. The summation of the tendency term and advection term in these two forms of the momentum equation should be mathematically identical, so we would expect to see a small difference in the advection 15 term as in the tendency term. However, we find that the advection term in the advective form has a strong positive bias compared to that in the flux form (Fig. 14). The residual term in the POSTadvF-(E+I)/2 is thus negatively biased Figure 12. Same as Fig. 11, but the post-processed analysis uses a second-order operator for advection calculation (POST2oadv-(E+I)/2). over the entire convective band with a magnitude exceeding 1.2 × 10 −4 m s −2 (reaching 100 % error near the upper half of the convective band). If the residual is neglected or not shown, authors and/or readers may falsely consider the advection process to be the dominant term governing the evo-5 lution of the slantwise updraft.
A quantitative comparison of the 99th percentile of the magnitude of the residual term in the domain (excluding the boundaries) among different analysis methods is shown in Fig. 5. The residuals between the instantaneously diagnosed 10 forcings and the true model tendency term (calculated inline) are shown in Fig. 5a while the ones between the averaged forcings of two consecutive outputs and the diagnosed tendency term are shown in Fig. 5b. The evolution of the 99th percentile residual shows generally larger magnitudes 15 when the momentum tendency is larger (Fig. 2b), suggesting that these errors may amplify in stronger convection cases. While the post-processed budget analysis in POST1min-E, POST10min-I, and POST10min-(E+I)/2 can achieve a relatively small 99th percentile residual (peak at ∼ 5 × 20 10 −5 m s −2 , or about 7 % of the concurrent 99th percentile v tendency), the inline budget analysis always gives a much smaller magnitude (< 10 −8 m s −2 , or 0.001 % of the tendency, during the entire simulation). Figure 5 also shows that any simplification that is inconsistent with the model solver 25 can severely degrade the accuracy of the post-processed budget analysis. Both POSTnonstag-(E+I)/2 and POSTadvF-(E+I)/2 can lead to a 99th percentile of the residual magnitude peaking at around 4 × 10 −4 m s −2 or more, which corresponds to > 50 % of their concurrent 99th percentile sim- 30 ulated v tendency, respectively. Generally, a higher relative magnitude of residual to v tendency is reached if the maximum instead of the 99th percentile is examined (despite larger fluctuation with time). We also examined the 95th percentile of the residual magnitude and obtained qualitatively 35 similar results although the relative magnitudes of such chosen residuals among the three post-processing methods with simplifications (POST2oadv-(E+I)/2, POSTnonstag-(E+I)/2, and POSTadvF-(E+I)/2) vary due to their different error distributions.

Results of vertical momentum budget
For the W equation, the closure of the post-processed budget appears not to be practicable even when the output time interval is reduced to the integration time step (Fig. 15). One partial reason is that the spatially noisy small-step modes, ne-45 glected in the offline budget analysis, are surprisingly large with a general order of 10 −4 m s −2 over the growing band, which is 1 order of magnitude larger than the W tendency (see the blue and red contours overlapped on the residual subplots in Fig. 4). These high-frequency modes not only in-50 clude vertically propagating sound waves but also some gravity wave modes (Klemp et al., 2007). Furthermore, as indicated in Eq. (6) and mentioned in Sect. 3.1, the W equation solved in the WRF model is implicit, coupled with geopotential tendency equation and includes a forward-in-time aver-55 aging operator that is applied to the small-step modes: where β is a user-specified parameter and τ indicates the small time step in the acoustic scheme. This means that the small-step modes at a current small step, (ACOUS) τ , are cal-60 culated using information (e.g., geopotential, potential temperature and density) at the forecast time τ + τ (see Eq. 3.11

16
T.-C. Chen et al.: Momentum budget analyses in the WRF (v3.8.1) model Figure 13. Same as Fig. 11, but the post-processed analysis does not consider C staggering grids (POSTnonstag-(E+I)/2). Figure 14. Same as Fig. 11, but the post-processed analysis is applied using the advective-form equation (POSTadvF-(E+I)/2). and 3.12 in Skamarock et al., 2008). All these components are not feasible for an offline budget calculation. The application of POST1min-E for the W tendency shows that this method accurately estimates most of the processes, but large errors > 2 × 10 −3 m s −2 remain in the PGF-5 BUOY term resulting in a widespread residual that reaches the same magnitude of the peak W tendency term (Fig. 15). The fact that these errors exceed the small-step modes (contributing to PGFBUOY) suggests that such imbalance does not solely come from the neglect of the small-step modes.

10
A close comparison of the post-processed and the inline PGFBUOY shows that our estimation is close to the inline value to an accuracy of at least three significant figures at the first RK3 step before the acoustic contribution is considered (not shown). However, this large-step forcing term adjusts 15 rapidly, sometime even with a sign change, from step to step within the RK3 integration. Although it is feasible to estimate F ( t ) via post-processing, it is however impossible to retrieve F ( * * ) in Eq. (4), leading to the poor estimation of vertical pressure gradient and buoyancy force in the W bud-20 get. This result also suggests that the budget closure for vertical velocity is difficult by nature due to its rapid variation on small scales. Figure 15. The difference between the post-processed (POST1min-E) and the inline budget analysis for vertical momentum W . All terms have been divided by µ d and thus have a uniform unit of meters per second squared. In each row, the subplots from left to right indicate the difference of true W tendency, ADV, PGFBUOY, and CUV. The rightmost subplot indicates the residual term obtained in the post-processed budget analysis. Each row from top to bottom shows the results at 6, 12, and 16 h, respectively.

Tests on different cases or with different horizontal resolutions
The growth of the residual as the convection intensifies (Fig. 5) , 2008, chap. 4.2.4). The simulated v tendency in this case is 2 orders of magnitude stronger than the one in the slantwise convection case. The inline retrieval budget tool works well with 99th percentile residuals generally 2 orders smaller than the tendency terms in the domain during 15 this simulation. However, as compared to the slantwise convection case, this case features a larger relative magnitude of 99th percentile residual to the 99th percentile tendency term of about 0.1 %. Furthermore, the post-processed budget analysis applied to the output data with an output interval the 20 same as the integration time step (analogous to POST1min-E but in this case, it is termed POST3sec-E; Fig. 16), with no simplification made, does not work as well as in the slantwise convection case. POST3sec-E shows that the largest error appears in the PGF term with a magnitude of 50 % of its true 25 value at a given instant. The error in diffusion only accounts for about 10 % of the error at the same time. One possible reason is that unlike the case of slantwise convection where the PGF exhibits a horizontally rather uniform structure with almost the same sign (Fig. 3), the PGF term in this case has 30 a more complex spatial structure with several sign changes over a horizontal distance of 10-15 km. Thus, large errors appear at the edge of these positive or negative patches where the sign changes. Despite the small spatial scales of these errors, the large error magnitude would render accurate in-35 terpretation of the physical process difficult based on such post-processed budget analysis. This result suggests that the post-processed budgets, even when done with care, do not always work well, and that the associated residual or errors might be sensitive to the intensity of the simulated system, 40 the spatial or temporal resolution, and the nature of the physical processes governing the different systems. While an increase in spatial resolution often requires a shorter integration time step for numerical stability and may result in stronger simulated convection, it is almost impos-45 sible to separate all these factors. We can, however, conduct the same slantwise convection simulation with a higher resolution of 2 km (and a shorter integration time step of 10 s) to exclude the effect of different physical processes in different systems and discuss the changes in the accuracy of the bud-50 get analysis when spatial resolution is increased from 10 km. As shown in Fig. 2b, in the 2 km simulation the maximum of the simulated 99th percentile v tendency is 1.2×10 −3 m s −2 , almost twice the magnitude in the 10 km run. The magnitude of the residual from the inline budget analysis also be-55 comes larger with the 99th percentile value almost 1 order larger than that in the 10 km simulation (Fig. 5). However, its relative magnitude is still small and amounts to about 0.005 % of the tendency in the 2 km case. For the post- processed budget analysis applied to the 2 km simulation, the 99th percentile residual with the instantaneous calculation of POST10min-I-2km appears only slightly larger yet sometimes smaller than those in its 10 km case (Fig. 5a). For the method using two model outputs for both diagnosed ten-5 dency and forcing terms, the peak 99th percentile residual in POST10min-(E+I)/2-2km is about 4 times larger than that in its 10 km counterpart (POST10min-(E+I)/2). This is likely due to the larger deviation caused by the longer diagnosed window (10 min) with respect to the integration time step 10 (10 s) in the 2 km case. In addition, it appears that the simulated fields adjust more rapidly with more complex structures on smaller scales in the 2 km simulation as compared to the 10 km simulation (not shown). If the same analysis is performed using the 1 min output (POST1min-(E+I)/2-2km) as 15 opposed to the 10 min output, the residual can be greatly reduced to be similar to that obtained in POST10min-(E+I)/2 (Fig. 5b).
The results presented above suggest that the relative magnitude of errors in budget analysis vary with different sys-20 tems or cases. Furthermore, while the absolute errors in the inline momentum budget analyses do indeed increase with increasing horizontal resolution, the relative magnitude with respect to the simulated tendency does not increase substantially. The accuracy of the post-processed budget analysis us-25 ing the averages of two consecutive model outputs is highly dependent on the ratio of the output interval and the integration time step. A ratio of 10 as used in the POST10min-(E+I)/2 results in an acceptable accuracy (99th percentile residual of about 7 % of the tendency), while a higherTS2 30 value of 6 is required for high-resolution simulations (e.g., the 2 km case) to reach a similar accuracy. For cases with a more complex physical process like the squall line test case, the inline budget retrieval appears necessary for adequate budget closure.

Discussion and summary
Budget analysis is a commonly employed tool in numerical studies to understand the underlying mechanisms for certain simulated features of interest. However, many studies still have difficulties in achieving a balanced or closed bud-40 get especially when a full-physics model is used and when the budget is calculated instantaneously over a local area. Aside from the complexity of various (some implicit) parameterization schemes, the main challenge in closing the budget involves the analysis of post-processed data using al-45 gorithms that are inconsistent with the model solver. In this study, an inline momentum budget retrieval tool is developed for the WRF model, and its advantages for momentum budget analysis are demonstrated. The 99th percentile residual obtained from this inline retrieval is always smaller than 50 or about 0.1 % of the actual tendency term in all the tested cases, which include idealized, 2D simulations of slantwise convection and squall lines. Taking the results from the in-line retrieval as "truth", we investigate the potential errors in each term and the resultant residual for post-processed budget analyses under different assumptions.
The comparison among different post-processed diagnoses is focused on the horizontal momentum (V ) budget.

5
The reason is that post-processed vertical momentum (W ) budget analysis fails to produce reasonably accurate results due to the noisy vertical pressure gradient and buoyancy forces that are tied closely to the small time-step modes and the implicit scheme used for the vertical momentum integration. Thus, inline retrieval is necessary for an accurate W budget analysis. The errors in the post-processed V budget arise from both the left-hand-side tendency term and the right-hand-side (rhs) forcing terms. To improve the accuracy of the diagnosed momentum tendency estimation, one can 15 reduce the output interval to the model integration time step, which incurs a large computational cost and consumes a large amount of disk space. An alternative and cheaper solution is to add the tendency calculation within the model as a standard output. Our test case of slantwise convection shows that 20 the diagnosed tendency using two successive model outputs with a 10 min interval to approximate the instantaneous true tendency (with an integration time step of 1 min) could create an error exceeding 50 %, which greatly limits the effectiveness of such a budget for physical interpretation.

25
For the rhs forcing terms in the V equation, errors can be limited if the post-diagnosis is done with care using the same form of the model equation, the same spatial discretization, and the same order of the advection operators and performing the calculation on the original (e.g., C staggering and 30 vertically stretched) model grids. However, these steps are necessary but not necessarily sufficient for the closure of the budget, as the forcing term diagnosis also largely depends on the selected input states. If the budget at an instant of time is desired, the explicit or forward Euler method using 35 the previous states might result in large and widespread errors in the advection and horizontal pressure gradient terms (local peak errors are about 50 % and 25 % of their true values in our simulation, POST10min-E) unless the output interval is reduced to the integration time step. In the latter case 40 (POST1min-E), an error < 5 % for each individual term and a residual generally 1 to 2 order(s) smaller than the maximum tendency can be achieved (the 99th percentile residual is about 7 % of the 99th percentile v tendency). An alternative way to reach a similar level of accuracy for instantaneous 45 fields without compromising the computational cost is to diagnose the rhs forcings using the implicit or backward Euler method (POST1min-I). This method diagnoses the forcings using the predicted states and thus can better capture the true model forcings by using inputs at a closer time to the model 50 integration window.
Instead of performing the calculation using model output at one given instant, a more general post-processed budget analysis can use two successive model outputs (POST10min-(E+I)/2). This method seems to work well with the 99th per-55 centile residual being about 7 % of the 99th percentile v tendency in our 10 km slantwise convection case with 10 min output intervals. However, the accuracy of this method varies among the test cases of different systems and is sensitive to the ratio of the output interval to the integration time step. 60 Among the tests conducted in this study, an upper limit of 10 for this ratio is suggested, and it should be even smaller for high-resolution simulations of high-amplitude weather systems, as rapid adjustments occur on the small scales.
Three other common assumptions in post-processing anal-65 ysis are made on top of the POST10min-(E+I)/2 to examine their potential impacts on the accuracy of the horizontal momentum budget analysis. First, utilizing an advection operator with a lower order than the one used in the model setup degrades the accuracy of the advection term 70 with up to 50 % error over the area where the advection is the strongest (POST2oadv-(E+I)/2). Second, the neglect of the staggering grids would negatively impact the estimation of all the spatial differential terms, leading to a widespread residual of at least 30 % of the local tendency 75 (POSTnonstag-ITS3). Last, when the advective form of the momentum equation is used for post-diagnosis rather than the flux form, although it is mathematically equivalent to the flux form solved in the model solver, a strong negatively biased residual results (POSTadvF-ITS4). Both POSTnonstag-I 80 and POSTadvF-I TS5 TS6 give a peak 99th percentile residual of about > 50 % of the concurrent 99th percentile of the v tendency. All the above errors do not just appear randomly; rather, they are spread over the area where the dynamics are the most active, thus undermining the physical interpretation 85 of the dynamics of the simulated system. We thus emphasize the importance of revealing the magnitude of the residual (relative to the tendency term) in publications on budget analysis, to enable readers to gauge the validity of the results.
While the post-processed V budget analysis can reach an 90 acceptable accuracy in some cases, the resultant residual may vary from case to case even when the same analysis method is adopted. Our test of an idealized squall line case with strong momentum tendencies shows that the application of the post-processed budget analysis method without any sim-95 plification using the 3 s (same as the model integration time step) output data nevertheless results in a large relative error magnitude (∼ 50 %) in the horizontal pressure gradient force, with very small-scale error structures.
In summary, different assumptions or simplifications made 100 in a post-processed budget analysis may severely impact the estimation of each forcing term and result in a large imbalance of the budget. Based on our experiments, we conclude that the inline retrieval method like that developed herein is the most reliable one for budget analysis in numerical stud-105 ies. While the budget analyses shown in this study are only for V and W momentum under the 2D idealized configurations, this newly developed budget tool also retrieves budget terms for U momentum and potential temperature. It can be applied to 3D idealized and real cases as the map projection 110 is also considered, following the original governing equations as shown in Skamarock et al.'s (2008) Eq. (2.23)-(2.25) with map factors, which are equal to 1 for an idealized setup on the Cartesian coordinate. We also stress that in some budget studies where a coordinate transformation is necessary 5 (e.g., from Cartesian to polar), some errors are unavoidable. In such cases, it is best to perform the budget calculation using the inline retrieval method on the model grid and then transform the budget to a new coordinate (e.g., Zhang et al., 2000). Finally, in situations where the inline coding cannot 10 be done, this study also provides general guidance to minimize the error in the budget. Thus, our results are beneficial to budget analyses in numerical studies in general and are not limited to the WRF model. Lehner (2012) provides a very detailed instruction of how an inline budget retrieval is done for the WRF model. The method or code was utilized in Lehner and Whiteman (2014) to study the mechanisms of the thermally driven cross-basin circulation. However, the code 10 was never made publicly available. From the document, it appears that Lehner's (2012) general procedure of retrieving the rhs budget terms during the model integration is essentially the same as our approach, which considers both the large time-step and the small or acous-15 tic time-step contributions. Furthermore, the individual contribution from different parameterization schemes that are activated in her study was also separately retrieved. While the general method appears highly similar to our code, the momentum budget retrieval in 20 Lehner (2012) only applies to the horizontal momentum (U and V ) whereas our tool includes the budget retrieval for the vertical momentum (W ) as well. (2014)  The code is publicly available. This budget retrieval uses the code adapted from Moisseeva (2014), taking references from Lehner (2012), and is applied to the same version of the WRF model as used in this study (v3.8.1). More components are added from the ver-40 sion used in Moisseeva (2014), including the potential temperature budget, vertical velocity budget, the sixthorder diffusion term, the parameterized physics term decomposed to boundary layer, and radiation schemes. A major difference from our retrieval tools exists in that 45 the small-step components are neglected in Potter et al. (2018). Comparing the budget analysis results using our retrieval tool with those using theirs for the same idealized test case of the 2D squall line, the largest differences appear in terms that involve the small-step con-50 tributions (e.g., PGF and PGFBUOY), which result in larger residual terms with Potter et al.'s (2018) retrieval method (not shown). While the relative magnitudes of such residuals to the tendency term still appear small for the horizontal momentum budget, they become larger 55 for the vertical momentum budget. This is consistent with our result that the small-step modes are more important in the W budget equation than in the V budget equation, and thus ignoring them results in larger errors.

Moisseeva (2014) and Moisseeva and Steyn
Furthermore, calculations of the lhs tendency terms are 60 added as new variables in our tool while the tendency terms used in the above studies are the model variables ru_tend, rv_tend, rw_tend, etc., which only represent the summation of all the large-step forcings to their corresponding fields (can be directly outputted via changing the WRF registry file) 65 instead of their true local changes with time. y and z are the horizontal distance from the southern boundary and height, respectively, with units of kilometers. The constructed initial profile has a maximum RH of 98.82 % over an elliptical area centered at y = 410 km and z = 1 km. 15 5. A constant surface pressure is specified: p sfc = 1000 hPa.
6. We then iteratively solve for the hydrostatically balanced pressure, water vapor mixing ratio, potential temperature, dry and full (moist) air density, and 20 geostrophic zonal wind for the entire domain.
The constructed initial environment contains some CSI, which is identified by the presence of negative saturated geostrophic potential vorticity (Chen et al., 2018). In this test case, CSI only exists over the southern half of the domain 25 and never extends higher than 5 km. To initiate convection, a 2D bubble of potential temperature and zonal wind perturbations is inserted in the area where RH is maximized and where the saturated geostrophic potential vorticity has a value of about −0.2 ∼ −0.1 pvu. The 30 center of the bubble, located at y c = 400 km and z c = 1.5 km, has a maximum potential temperature perturbation θ max = 0.5 K and zonal wind perturbation u max = −6 m s −1 . Both perturbation fields decrease to 0 with radius, following θ = θ max cos 2 (0.5π r) and u = u max cos 2 (0.5π r), for r <= 35 1, where r = ( y−y c R ) 2 + ( z−z c H ) 2 TS7 , R = 50 km is the horizontal radius, and H = 1.5 km is the vertical radius.
Code availability. The standard version of WRF v3.8.1 is publicly available at http://www2.mmm.ucar.edu/wrf/users/download/ get_sources.html (last access: 6 June 2019; National Center for Atmospheric Research, 2016). The inline budget retrieval tool in the WRF v3.8.1 described in this study can be found 5 at https://doi.org/10.5281/zenodo.3373872 (Chen, 2019). In this repository, all the files that remain unchanged from the defaults are tagged as "Initial commit". The modified files for the budget retrieval include the Registry.EM_COMMON within the directory registry; module_diag_misc.F, module_diagnostic_driver.F, and module_physics_addtendc.F within the directory phys; mod-ule_after_all_rk_steps.F, module_big_step_utilities_em.F, mod-ule_em.F, module_first_rk_step_par2.F, module_small_step_em.F, and solve_em.F within the directory dyn_em. The current version includes retrieval for terms of local tendency, advection, horizontal pressure gradient force, net force resulting from vertical pressure gradient and buoyancy, Coriolis force, curvature, upper damping (damp_opt = 2 and 3), turbulence or diffusion (diff_opt = 2), vertical-velocity damping (w_damping = 1) and parameterized physics from the planetary boundary layer 20 scheme (bl_pbl_physics), the radiation scheme (ra_lw_physics and ra_sw_physics), the cumulus scheme (cu_physics), and the shallow cumulus scheme (shcu_physics). Remarks from the typesetter TS1 Editor: your approval is necessary: "This correction was asked for as we realized that the original sentence might be misleading because upper-layer vertical damping is actually zero in the low levels. To avoid confusion, we thus ask to delete this sentence as suggested. Note that this does not change the original content".

TS2
Editor: your approval is necessary: "It was a mistake in the original argument, which mentions "A ratio of 10" first and then "a higher value of 6". The latter should be "a lower value of 6"." TS3 Editor: your approval is necessary: "The correct experiment names should be "POSTnonstag-(E+I)/2" for "POSTnonstag-I" and "POSTadvF-(E+I)/2" for "POSTadvF-I" (check Table 1). Note that the experiment names "POSTnonstag-I" and "POSTadvF-I" shown in the original manuscript do not exist." TS4 Editor: your approval is necessary.

TS5
Editor: your approval is necessary.

TS6
Editor: your approval is necessary.

TS7
Editor: your approval is necessary: "The original formula doesn't describe the true formula used in our experimental setup. The change corrects this mistake."