Study of the Jacobian of an Extended Kalman Filter for soil analysis in SURFEXv 5

Introduction Conclusions References Tables Figures


Introduction
Externalising surface schemes from upper-air atmospheric models has many advantages.If the interface between the different parts is defined in a flexible manner (see Best et al., 2004, for an example), then it provides the possibility to plug one scheme into different models, even targeting different applications, ranging from climate to high-impact weather.Another major advantage is that the scheme can also be used in an offline mode, allowing for cheap solutions in specific applications.An example of this is studied in the present paper: the implementation of an extended Kalman filter (EKF) for surface assimilation (Mahfouf et al., 2009), where cheap offline runs with the SURFEX external land surface model (Masson et al., 2013;Hamdi et al., 2014a) allow one to numerically estimate the observation operator Jacobian.
Surface assimilation techniques, like this EKF, can improve the boundary layer forecasts of a numerical weather prediction (NWP) model considerably (Douville et al., 2000;Hess, 2001;Drusch and Viterbo, 2007).The surface serves as a lower boundary condition for the NWP model and has an important impact on the lower atmosphere.Land surface models (LSMs) determine the partitioning of the energy into latent and sensible heat fluxes (e.g. by means of evapotranspiration processes) and these fluxes provide the main link between the surface and the atmosphere.In the past two decades LSMs have been improved considerably.Still, there are a lot of uncertainties and errors in model parameterisations, model resolution and observation measurements of soil variables.In order to provide an optimal initial surface state for an NWP forecast, the assimilation of surface observations into the land surface model is necessary.The amount and fre-A.Duerinckx et al.: Study of the Jacobian of an extended Kalman filter for soil analysis in SURFEX quency of direct soil observations, like root zone soil moisture content and root zone soil temperature, is too limited for soil analysis.Therefore, Douville et al. (2000) suggest using screen-level temperature and screen-level relative humidity as indirect observations for soil moisture content and soil temperature.These screen-level observations are more frequently and numerously available and in most situations they contain a lot of information about the soil moisture content and soil temperature.In the past, optimum interpolation (OI) (Giard and Bazile, 2000;Mahfouf et al., 2000) was the most commonly used soil analysis technique.A local OI algorithm to assimilate screen-level temperature and screen-level relative humidity has been tested within SURFEX (Mahfouf et al., 2009) and is used operationally in various NWP centres.
The screen-level temperature and relative humidity forecast errors are not always caused by errors in the soil variables (Draper et al., 2011).When the local soil moistureatmospheric boundary layer feedback is weak, for example in situations of weak radiative forcing or strong advection, the screen-level observations do not provide any information about errors in the soil.Therefore, it would be useful to also include other soil observation types in the soil analysis, for example remotely sensed soil moisture (Draper et al., 2009(Draper et al., , 2011)).OI uses analytically derived coefficients, making it difficult to include new observation types in this technique.To overcome this difficulty, a new surface assimilation technique has been recently developed for SURFEX: an extended Kalman filter (EKF) (Masson et al., 2013;Hamdi et al., 2014a).The advantages of the EKF over OI are the dynamically calculated gain coefficients.They make it easier to include new observation types.Another advantage is that those dynamical gain coefficients automatically take into account the situations in which there is only a weak link or even no link between the soil variables and the atmospheric boundary layer.Hence no hardcoded switches are needed to diminish or turn off the assimilation in such cases.
An EKF has been developed for SURFEX by Mahfouf et al. (2009), assimilating screen-level temperature and relative humidity to correct soil moisture and soil temperature.Results indicate that OI and the EKF have similar gain coefficients and increments.The EKF has been extended to include other observation types, like AMSR-E soil moisture retrievals (Draper et al., 2009), radar precipitation information (Mahfouf and Bliznak, 2011), and ASCAT surface soil moisture (Mahfouf, 2010;de Rosnay et al., 2012).
The cornerstone of the EKF is the Jacobian of the observation operator.The Jacobian describes the sensitivity of the screen-level observations to changes in the soil prognostic variables.Mahfouf et al. (2009) suggest calculating the Jacobian with a finite differences approach, using a reference run and one perturbed run for each of the soil prognostic variables (i.e. a run with an initial surface where one of the prognostic variables has been perturbed).These reference and perturbed runs can either be calculated using SURFEX cou-pled to a full atmospheric forecast or using SURFEX offline.The latter is computationally much cheaper.The calculation of this Jacobian with finite differences assumes a linear response of the land-surface evaporation to a small soil moisture variation.Balsamo et al. (2004) show that, even though this hypothesis is well satisfied, some noise may still enter the Jacobian matrix under certain meteorological conditions.For example, under rainy conditions, small perturbations in soil moisture content can have non-linear threshold effects on the cloudiness and precipitation.This leads to oscillatory model trajectories for the screen-level variables and introduces noise in the Jacobian matrix for the rainy areas.Balsamo et al. (2004) propose switching off the soil-moisture analysis under these circumstances.They also show the importance of using a good perturbation size that best satisfies this linearity hypothesis.Balsamo et al. (2007) compare the information content and the gain components for the offline and coupled Jacobian approaches of the EKF.They use a set of simulated observations in a 1-day assimilation experiment to verify the impact of the coupling assumption.They use the Interaction between the Soil, Biosphere and Atmosphere (ISBA, Noilhan and Planton, 1989;Noilhan and Mahfouf, 1996) surface scheme within the Global Environmental Multiscale Model (GEM) regional model (Mailhot et al., 2006;Côté et al., 1998) for the coupled runs.For the offline runs they use the Land Data Assimilation System (LDAS, Mitchell et al., 2004), with a 3 hourly forcing from GEM's lowest vertical level output (at 50 m height) and a vertical interpolation according to Delage (1997).They conclude that the gain values are smaller for offline runs, but they have the same spatial patterns as the values calculated with the fully coupled runs.The lack of coupling with the full planetary boundary layer in the case of the offline runs reduces the influence of the soil variables on the surface boundary layer (Mahfouf et al., 2009).Overall the Jacobians calculated with offline runs seem to be a good and computationally more feasible alternative to the use of the Jacobians calculated with the fully coupled model.In de Rosnay et al. (2012) fully coupled forecasts are used to calculate the Jacobian, because the ECMWF does not yet have an externalised version of their LSM (i.e.HTESSEL) at their disposal.They use the EKF operationally in combination with a four-dimensional variational (4DVAR) atmospheric assimilation, replacing the old OI soil analysis of the global ECMWF Integrated Forecasting System (IFS) since November 2010.In their current setup the EKF only corrects the soil moisture content, not the soil temperature.
The numerical approach to calculating the Jacobian makes the EKF scheme more flexible for surface analysis than the OI scheme.The EKF does not require analytical recomputation of the observation operator and gain coefficients each time new observation types are included.Having an externalised surface scheme that can be run in offline mode, like SURFEX, is essential to a computationally efficient calculation of the Jacobians.In this paper the difference between the offline and coupled Jacobian calculations is studied more in depth, correcting for both soil moisture content and soil temperature.The comparisons are made with SURFEX in offline mode and coupled to the ALARO model (Bubnová et al., 1993;Gerard et al., 2009), following the study of Balsamo et al. (2007).We document a case where spurious 2 t oscillations occur in some parts of the domain for the coupled as well as offline runs.The oscillations are too small to have a detrimental effect on the performance of the model runs and remain thus unnoticed in coupled model runs.However, in an EKF application, the magnitude of the numerical perturbations used to estimate the Jacobians may acquire the same order of magnitude as these oscillations, and this may induce noise in the affected increments of the data assimilation.In the present paper we provide a workaround for these oscillations by applying a numerical filter with the EKF formulation.We provide some evidence that these oscillations are due to a decouplling between the surface and the atmosphere.In Sect. 2 the ALARO model, the SURFEX scheme and the EKF technique are described and in Sect. 3 the experimental set-up is given.Section 4 shows the origin and effects of noisy Jacobians as well as the proposed filtering workaround.In Sect. 5 the results are presented and a comparison is made between the offline and coupled approaches for the EKF.Finally, the conclusions and perspectives are discussed in Sect.6.

Methodology
In this paper the atmospheric ALARO limited area model (LAM) has been used in combination with externalised surface model SURFEX (Hamdi et al., 2014a).When SURFEX is coupled to the atmospheric model, they exchange fluxes and forcing at every time step.SURFEX can also be used in offline mode, i.e. without coupling to an atmospheric run.In offline mode ALARO provides hourly forcing for SURFEX, but there is no feedback from SURFEX to ALARO.The difference between the coupled and offline approaches is shown in Fig. 1.An EKF is used to provide an initial state for the surface.The following subsections will discuss in more detail the ALARO model, the SURFEX scheme and the EKF data assimilation technique.

The ALARO atmospheric model
The ALADIN model is the LAM version of the Action de Recherche Peite Echelle Grande Echelle Integrated Forecast System (ARPEGE-IFS) (Bubnová et al., 1995), developed by Météo France and the ECMWF.In the ALARO model, ALADIN is updated with the ALARO-0 physics package.This parameterisation has been designed to run at resolutions from the mesoscale to the cloud-resolving scales in a scaleaware manner, based on the modelling approach of the Modular Multiscale Microphysics and Transport (3MT) cloud and precipitation scheme of Gerard andGeleyn (2005), Gerard (2007), and Gerard et al. (2009), and has been validated up to a spatial resolution of 4 km for NWP (Gerard et al., 2009;De Meutter et al., 2015) and climate (Hamdi et al., 2012(Hamdi et al., , 2014b;;De Troch et al., 2013).The ALARO-0 physics package is coupled to the dynamics via a physics-dynamics interface based on a flux-conservative formulation of the equations proposed by Catry et al. (2007).The ALARO model is running operationally at the Royal Meteorological Institute (RMI) of Belgium as well as in a number of other countries of the ALADIN and HIRLAM consortia.

2.2
The SURFEX land surface model SURFEX (SURFace EXternalisée) (Masson et al., 2013) is an external land surface scheme that originates from the meso-NH mesoscale model (Lafore et al., 1998).The coupling of SURFEX to the atmosphere follows the approach of Polcher et al. (1998) and Best et al. (2004).At every time step SURFEX receives forcing for every grid box from the atmospheric model and provides fluxes to the atmospheric model.The forcing includes low-level atmosphere temperature, specific humidity, horizontal wind components, surface pressure, total precipitation, long-wave radiation, and shortwave direct and diffuse radiations.The fluxes calculated by SURFEX are averaged fluxes for momentum, sensible and latent heats and radiative properties like surface temperature, surface direct and diffuse albedo and surface emissivity.SURFEX has a modular structure that can include new parameterisations.In SURFEX, a grid box is built up from four different tiles: sea, lakes, nature and town.The nature tiles can include up to 12 patches, representing the different vegetation types.The fluxes for each grid box are averaged according to the weight of each of the tiles for that grid box.For sea and ocean tiles, two options are available: a simple formulation with constant sea surface temperature (SST)  (Lebeaupin, 2007).The FLAKE model (Mironov et al., 2010) can be used in the case of a lake tile.Town tiles use the TEB (Town Energy Balance developed by Masson, 2000) scheme and nature tiles use the ISBA (Interaction between the Soil, Biosphere and Atmosphere, developed by Noilhan andPlanton, 1989 andNoilhan andMahfouf, 1996) scheme.SURFEX also includes the CANOPY parameterisation (Masson and Seity, 2009;Hamdi and Masson, 2008), a multilayer parameterisation for the natural and urban canopy.
In the set-up used here, surface assimilation is only performed on the nature tiles.For these tiles, the two-layer version of the ISBA scheme is used with one vegetation patch.It describes the heat, moisture and momentum exchanges between the surface and the atmospheric boundary layer, based on the force-restore method proposed by Deardorff (1977Deardorff ( , 1978)).The two-layer version of ISBA has four prognostic variables: surface and deep soil temperature (T s and T 2 ) and the corresponding soil water contents (W g and W 2 ).In offline mode the atmospheric forcing is applied at the first atmospheric model layer (∼ 17 m).Mahfouf et al. (2009) describe the EKF that has been developed within SURFEX.The equation for the model state analysis of the EKF is

The extended Kalman filter for soil analysis
where subscripts a, b, and o indicate the analysis, background and observations, such that the analysis model state x a is equal to the sum of the background model state x b and an increment based on the observation departure b )] and the Kalman gain matrix BH T (HBH T + R) −1 .t is the time step indicator, B is the covariance matrix of background errors, R is the covariance matrix of observation errors, and y is the observation vector.H is the observation operator projecting the model state onto the observation space.In the particular case of this study, the observation operator H is the product of the model state evolution from time t 0 = t − t to time t (the observation time), and the conversion of the model state into an observation equivalent, as is done in Mahfouf et al. (2009): The increments are thus applied at the end of the assimilation window instead of at the beginning (like in Balsamo et al., 2004).This saves a model integration starting from the analysis state.Furthermore, the B matrix is implicitly evolved by the linearised model, because H includes a model propagation.
H is the Jacobian of the observation operator, i.e. the linearised model observation operator.The use of this Jacobian allows the EKF to create dynamical coefficients that depend on the specific conditions of each grid point and leads to a relatively easy integration of new observation types into the EKF.Since the observation operator includes a model propagation from time t 0 to time t, the Jacobian of the observation operator reads as The numerical computation of the Jacobian uses a finite differences approach in the following way: A small perturbation δx j is added to one of the soil prognostic variables x j at time t 0 .Then the perturbed model state is evolved from time t 0 = t − t to time t and at time t the evolved perturbed model stated is projected into observation space to obtain the corresponding observation value y i (x + δx j ).The value of the Jacobian is determined by the difference between this perturbed observation value y i (x + δx j ) and the reference observation value y i (x).The value of the Jacobian thus depends on how the observation value changes after a t run, when the soil prognostic variable is perturbed at the initial time.The value δx j must be small enough to accurately approximate the derivative, but not too small to avoid round-off errors.
There are two possibilities for calculating the perturbed and reference y i : by means of a surface scheme coupled to an atmospheric scheme (coupled) or with a surface scheme decoupled from the atmospheric scheme (offline).In the former case, feedback from the surface to the upper-air atmosphere is possible.In the latter case, the atmospheric forcing is imposed from the lowest model level.

Experimental set-up
The EKF for soil analysis has been tested using the same setup and covariance values as in Mahfouf et al. (2009), with two soil layers and four prognostic variables: superficial soil water content (W g ), root zone soil water content (W 2 ), surface temperature (T s ) and deep soil temperature (T 2 ).Observations of T 2 m and RH 2 m are assimilated to correct errors in soil moisture and soil temperature.The observation error covariance matrix R is a diagonal matrix with elements set to 1 K for 2 m temperature and 10 % for 2 m relative humidity.The background error covariance matrix B is also a diagonal matrix, with 2 K for the background errors of T s and T 2 and 0.1×(w fc −w wilt ) for W g and W 2 , with W fc and W wilt respectively the volumetric water content at field capacity and at permanent wilting point.The B matrix is kept constant.Mahfouf et al. (2009) explain that the increase in the background error during the forecast step is balanced by the decrease in the background error during the analysis step.In accordance with that, Draper et al. (2009) found that using a constant B matrix instead of evolving the B matrix produces similar results for the analysis of near-surface soil moisture.Because of the constant B matrix the EKF is in fact a simplified EKF.
For the upper air, no data assimilation is performed.The initial upper-air conditions and the lateral boundary conditions are interpolated from an ARPEGE run, the global Metéo France model.Lateral boundary conditions are provided every 3 h from the ARPEGE model.The atmospheric model set-up has 46 vertical levels.All experiments were run over a 1-month period during July 2010, with a 6 h assimilation cycle for the surface.The operational ALARO Belgium domain was used, which has a 4 km resolution (181 × 181 grid points, see Fig. 2).
For the perturbed runs of the EKF Jacobian calculation, two methods were tested.The offline mode utilises offline SURFEX runs with hourly atmospheric forcing files calculated during the fully coupled forecast from the previous assimilation cycle (REFofl).In the coupled mode, the perturbed runs are calculated using SURFEX fully coupled to ALARO (REFcpl).
4 Oscillations in the boundary layer Balsamo et al. (2004) mention oscillatory trajectories of the screen-level variables that can introduce noise in the Jacobian matrix of the EKF.They show that these oscillatory trajectories occur under cloudy and rainy conditions and can be linked to evapotranspiration thresholds.In this section we document another kind of oscillation, a 2 t oscillation that can be linked to the stability parameters and the forma-tion of a stable boundary layer in the late afternoon.We will show how this oscillation influences the Jacobians and propose a method for filtering the oscillation before calculating the Jacobian.
Figure 3 shows the evolution of the Richardson number (top) and corresponding T 2 m (bottom) at location B and location C indicated in Fig. 2 for a coupled run.In Fig. 3a and b, the Richardson number for the lowest level is shown as it is calculated in SURFEX (black) and as it would be calculated for the same level in ALARO (red).As long as the Richardson number is negative (i.e.unstable conditions) the Richardson numbers calculated in SURFEX and ALARO correspond to each other.But when the Richardson number becomes positive (i.e. a stable boundary layer starts to form), there is a small divergence between SURFEX and the atmosphere.In some cases, as in Fig. 3a, an oscillation sets in when the Richardson number becomes positive.
These oscillations were found in the coupled as well as offline SURFEX runs from 12:00 to 18:00 UTC.The oscillations can be found in all surface variables that are related to the fluxes between the soil and the lower atmosphere.The oscillations occur only during the late afternoon when the surface cools down again.In those cases a stable boundary layer starts to form and the atmosphere decouples from the surface.
Figure 4 shows the evolution of T 2 m (black) and RH 2 m (red) from 12:00 to 18:00 UTC on 2 July 2010 for different settings at location A indicated in Fig. 2.An oscillation sets in as soon as the Richardson number becomes positive.This oscillation is clearly visible in the evolution of T 2 m (black) and RH 2 m (red).Figure 4a shows the evolution of these two variables for an offline SURFEX run with a time step of 300 s.Small oscillations are visible near the end of the run, with an average size of 2 % for RH 2 m and 0.2 K for T 2 m .In Fig. 4b the time step is 60 s instead of 300 s.The size and time interval of the oscillations is the same as in Fig. 4a, but the frequency of the oscillations increases with the time step.This means that the oscillations are 2 t oscillations, and hence they do not represent a physical process.The oscillations are also present in a coupled run for the same location and period.Figure 4c shows the evolution of T 2 m (black) and RH 2 m (red) for a coupled run with a time step of 180 s.The oscillation starts somewhat later than for the offline runs because the Richardson number remains negative for a longer period in this coupled run.The order of magnitude of the oscillations is the same as for the offline runs.Figure 4d shows the same evolution for a coupled run with a time step of 60 s instead of 180 s, and also here we can see that the 2 t oscillations do not diminish when the time step is increased.
The oscillations present in RH 2 m and T 2 m will also be present and even amplified in the Jacobian.Figure 5 shows the evolution of the Jacobian values during the 6 h forecast run for the offline case at the same grid point A as Fig. 4 for three different time frames.The Jacobian value in Fig. 5 is plotted at every time step (300 s).The red dots represent the  Jacobian values for a perturbation in the superficial soil layer (W g or T s ), while the black dots represent the Jacobian values for a perturbation in the deep soil layer (W 2 or T 2 ).For the Jacobians with a run from 12:00 to 18:00 UTC (bottom fig-ures) an oscillation sets in near the end of the 6 h window, introducing a noisy signal into the Jacobian values that can become of the same order of magnitude as the signal itself.This is the case for δT 2 m /δW g (red) and δT 2 m /δW 2 (black) in The perturbation size for the initial perturbed states is 10 −4 .In the upper left corner δT 2 m /δT s (red) and δT 2 m /δT 2 (black) are shown from 18:00 to 00:00 UTC, in the upper right corner δRH 2 m /δW g (red) and δRH 2 m /δW 2 (black) from 00:00 to 06:00 UTC, in the lower left corner δT 2 m /δW g (red) and δT 2 m /δW 2 (black) from 12:00 to 18:00 UTC, and in the lower right corner δRH 2 m /δW g (red) and δRH 2 m /δW 2 (black) from 12:00 to 18:00 UTC. Figure 5 also clearly shows the short time memory of the superficial soil layer (red dots).Any change in the superficial soil layer is quickly lost, causing the Jacobian value to return to zero, while changes in the deep soil layer (black dots) have a more lasting influence, resulting in non-zero Jacobian values at the end of the 6 h interval.Some Jacobian values converge once the initial disturbance has been taken up by the system, e.g.δT 2 m /δT s (red) and δT 2 m /δT 2 (black) in Fig. 5a.For others the value keeps rising until the end of the time window, eg.δRH 2 m /δW 2 (black) in Fig. 5b.
Figure 6a shows the spatial distribution of the oscillations for δRH 2 m /δW 2 on 2 July 2010 for the offline run from 12:00 to 18:00 UTC.The number of oscillations is shown at every grid point.This number is calculated by counting the number of consecutive time steps in which the gradient of the Jacobian evolution curve changes sign.Oscillations (i.e. the gradient changes sign in more than two consecutive time steps) occur in almost all parts of the domain.In some parts of the domain, there is a resemblance between the occurrence of oscillations and a soil wetness index (SWI) that is close to 0 (cf.Fig. 6b) where SWI is defined in the following way: The effect of non-linearities for SWI values close to 0 on the Jacobian values has already been pointed out by Balsamo et al. (2004Balsamo et al. ( , 2007)), and in Hamdi et al. (2014a) it was shown that for SWI values below 0 the Jacobians and increments are also 0. When looking at Fig. 6a there are also regions with oscillations that do not correspond to SWI values close to 0. This indicates that there are also other non-linearities that can trigger these oscillations.The regime shift of the Richardson number turning from negative to positive is one of them.As shown before, this change in sign of the Richardson number can cause spurious 2 t oscillations that also have a detrimental effect on the Jacobian values.In Table 1 the percentage of grid points is listed in which an oscillation occurs at the end of the run, thus influencing the Jacobian value, and in total, i.e. including those oscillations during the run that end before 18:00 UTC and hence do not influence the Jacobian value.For the offline approach only a small portion of the Ja-  cobian values is influenced by these oscillations, i.e. between 2.4 and 5.2 %.For the coupled run this percentage is somewhat higher, between 11 and 13 %.The higher number of oscillations in the coupled run could be explained by feedback processes of the atmosphere that are triggered when making small perturbations to the soil variables (Balsamo et al., 2004).In the case of the offline run, the atmosphere is forced, and hence no feedback processes are possible.
In conclusion one can say that due to non-linearities, like SWI values close to 0 or a change in sign of the Richardson number, oscillations may occur in some surface-related variables like RH 2 m and T 2 m .They are 2 t oscillations, indicating that the oscillations are artificial.These oscillations do not diminish when the time step is decreased, hence they are not fibrillations, but rather they originate from a decoupling between the surface and the atmosphere when a stable boundary layer starts to form in the evening or when the amount of soil moisture is too low.The oscillations occur at a small number of points, widespread over the domain.The oscillations occur for various lengths of the time step and perturbation sizes (not shown).They disappear again after a while and are harmless for a normal run, but are amplified in the calculation of the Jacobian.The oscillations can lead to spurious values in a limited number of grid points for the Jacobian, gain and increments of the EKF.
The oscillations occur at critical values of the Richardson number and are not merely a numerical effect.This suggests that they could be induced by a feedback in competing fluxes between the surface and its upper-air forcing, when changing from an unstable to a stable boundary layer.Such feedbacks are difficult to diagnose.Here we limit ourselves to documenting them, but demonstrate that the impact of these oscillations can easily be cured with a simple numerical temporal filter.
-We propose a workaround for these oscillations by filtering the reference and perturbed values of T 2 m and RH 2 m .The temporal filter works according to the following equation: with x the T 2 m or RH 2 m value to be filtered, t indicating the time step and w the weight attributed to the different parts of the filter.A number of values for w have been tested and a value of 0.5, the most optimal choice for filtering the 2 t mode, appeared to filter out the oscillation best.Since this filter uses the reference and perturbed observation values at times t, t − 1 and t + 1, two additional output files must be provided for every run.In order to change as little as possible to the original set-up of the EKF, we chose to work with time steps t, t − 1 and t − 2 instead, i.e. calculating the Jacobian for time step t − 1 instead of the time step at time t.In one time step the value of the Jacobian will change very little and this way we avoid the need for output at time step t + 1, which would require the offline runs to be extended for one additional time step and thus would also require the atmospheric forcing to be provided beyond the 6 h interval.The filter does not differentiate between oscillations initiated by different mechanisms.Therefore it will filter oscillations due to the critical RI values and SWI values close to 0, but also, for example, oscillations due to rainy conditions for the coupled approach as described by Balsamo et al. (2004).

Results and discussion
In the following part, the filtering approach (FIL) is compared to the reference approach without filtering (REFofl for the offline mode, REFcpl for the coupled mode).Comparisons are made with regards to the optimal perturbation size, the spatial distribution of the Jacobian values, the corresponding increments in the soil prognostic variables and the screen-level forecast scores.The offline and coupled approaches for the EKF are also compared to each other.

Impact of the filtering
Figure 7 shows the evolution of T 2 m (left) and RH 2 m (right) at location A (cf. Fig. 2), where an oscillation is present in the reference SURFEX run (black).Figure 7a and b (top) show the evolution in an offline SURFEX run, while Fig. 7c  and d (bottom) show the result from a coupled SURFEX run.The oscillation disappears when the result is filtered (FIL, red) and the values of the filtered result coincide with the reference values as long as there is no oscillation.

Optimal perturbation size and the linearity assumption
The Jacobians of the EKF are estimated by means of a finite differences approximation.This approximation is exact when the function is linear in the surroundings of the point.In that case neither the size nor the sign of the perturbation has any influence on the resulting value of the Jacobian.The difference between a Jacobian calculated with a positive (H + ) and with a negative (H − ) perturbation of the same size provides an indication of how linear the surroundings of the point are and how valid the finite differences approximation is.If the perturbation is too large, the perturbed value lies outside the linear regime around the point and the difference between H + and H − will be large.If the perturbation is too small, the Jacobian value will deteriorate because of numerical accuracy.The optimal perturbation size is the minimal perturbation size for which the Jacobian value is independent of the sign (i.e. for which the difference between H + and H − is as small as possible) (Balsamo et al., 2007).
Finding the optimal perturbation size is very important.In order to find it and to examine the differences between the approaches, experiments were run with perturbation sizes between 10 −11 and 10 −1 for each of the eight components of the Jacobian.Results are shown in Fig. 8, which shows the difference between H + and H − (black lines) and the average value ((H + − H − )/2) (red lines) for δRH 2 m /δW 2 and δT 2 m /δW 2 on 2 July 2010, averaged over the whole domain for all the perturbation sizes.For the Jacobian calculated with coupled perturbation runs, perturbation sizes smaller than 10 −4 caused a lot of noise, resulting in extremely high values There are a number of differences between the offline and coupled approaches.First, the optimal perturbation size is larger for the coupled approach (between 10 −2 and 10 −1 ) than for the offline approach (between 10 −9 and 10 −7 ).This is in accordance with Balsamo et al. (2007).For a coupled approach with an overly small perturbation size, non-linear feedbacks between the atmosphere and the soil can occur.These non-linearities cause the Jacobian to be noisy and inaccurate.Since in the offline approach the atmosphere is forced, these non-linear feedbacks cannot occur and the perturbation size can be a lot smaller.This optimal perturbation size for the coupled approach is similar to the optimal values of 15-20 % of the SWI value found in Balsamo et al. (2004) and the value of 0.01 m 3 m −3 used by de Rosnay et al. ( 2012) and Drusch et al. (2009).For the offline approach, the optimal perturbation size found here is somewhat smaller than the values used in Mahfouf et al. (2009), where 10 −4 m 3 m −3 is used for W g , and W 2 and 10 −5 K for T s and T 2 .
The differences between Jacobians from positive and negative perturbations (|H + − H − |) are a lot smaller for the offline approach than for the coupled approach, indicating that the linearity assumption is better approximated for the offline approach.This is a logical consequence of the fact that the coupled approach requires a larger perturbation size in order to avoid a noisy H matrix.If the perturbation size is larger, the perturbed value will more easily fall outside of the linear regime around the point at which the Jacobian is calculated.
The optimal perturbation size has also been studied for the filtering solution (FIL) (results not shown here).For FIL, the values of |H + − H − | and (H + − H − )/2 averaged over the domain are very similar to those of the REF run, and hence the optimal perturbation size remains the same.One thing that can be noted is that in FIL the non-linearities (measured by high values for |H + − H − |) are less extreme for the very high or low perturbation sizes.
Another way to verify the linear regime of the finite differences approximation is by plotting the Jacobian values from positive perturbations against those of negative perturbations.If all points are along the diagonal, the Jacobians are in the linear regime of the observation operator.Figure 9 shows such plots for the offline EKF (Fig. 9a and c) and the coupled EKF (Fig. 9b and d) for two different perturbation sizes.The offline EKF has much lower Jacobian values than the coupled EKF and the linear regime is better approximated for the offline approach.For a perturbation size of 10 −4 the points of the offline EKF are nicely aligned along the diagonal, indicating that the perturbation size is within the linear regime.The points of the coupled EKF follow slightly the opposite diagonal.It cannot be excluded that some non-linear feedback effects between the surface and the atmosphere are triggered here, but this is out of the scope of the present paper.If the perturbation size increases to 10 −2 for the offline EKF, more points deviate from the diagonal compared to the 10 −4 offline case.The horizontal line represents points that are sensitive to the positive perturbation (i.e. have a Jacobian value different from 0) but not sensitive to the negative perturbation (i.e. have a Jacobian value equal to 0).These points are in an area with negative SWI values.The negative perturbation decreases the SWI value even further, resulting in a Jacobian value of 0. The positive perturbation on the other hand is large enough to increase this SWI value above 0, and hence the Jacobian from this positive perturbation will not be 0.This is in accordance with what has been found by Mahfouf et al. (2009).
For the coupled EKF, increasing the perturbation size to 10 −2 causes the points to become more aligned with the correct diagonal line.However, when comparing them to the offline EKF, they deviate more from that diagonal and the values of the Jacobians are larger for the coupled EKF.The results for filtering solution FIL are very similar to those of the reference described here (not shown).

Diurnal cycle
Figure 10 shows the Jacobian and gain values for δRH 2 m /δW s and δRH 2 m /δW 2 averaged over the whole domain on 2 July 2010 for REFofl, FILofl, REFcpl and FILcpl for the different run times (00:00, 06:00, 12:00 and 18:00 UTC).The average values of REF and FIL lie close together for all components, indicating that on average the proposed solutions do not cause any major changes in the values of the Jacobians and gain coefficients.The sensitivity of T 2 m to soil moisture is mainly negative, while the link between RH 2 m and soil moisture is positive.The Jacobian values with respect to initial soil temperature perturbations correspond very well to the values shown by Mahfouf et al. (2009).A diurnal cycle can be seen where the sensitivity of RH 2 m to changes in soil moisture and soil temperature is largest during daytime (12:00 and 18:00 UTC), whereas the sensitivity of T 2 m to changes in the soil temperature is largest during nighttime (00:00 and 06:00 UTC).The link between the soil and the screen-level atmosphere is provided through turbulent surface fluxes, and these fluxes have a strong diurnal cycle (Mahfouf et al., 2009).The gain values of the deep soil layer (W 2 and T 2 ) are a factor 10 larger than those of the superficial soil layer (W g and T s ).This is caused by the longer memory of the deep soil layer compared to the superficial soil layer.Any change made at time t 0 in the superficial soil layer will dissipate quickly and, at analysis time t (i.e 6 h later), this perturbation in the superficial soil has almost completely disappeared.A perturbation to the deep soil layer at time t 0 has a more lasting effect on the screen-level variables and will still be present at the analysis time t, causing larger Jacobian and gain values.Therefore it is especially important to make sure that the increments in the deep soil layer are good, since their effect will be more lasting than the effect of increments in the superficial soil layer.
The values and diurnal cycle of the coupled case are similar to the offline case.The most important differences are the larger values for the four Jacobians related to soil moisture.These W g and W 2 related Jacobian and gain values are 2 to 4 times larger for the coupled case.There is a larger sensitivity of T 2 m and RH 2 m to changes in soil moisture for the coupled case.For soil temperature (not shown here) the average Jacobian and gain values are very similar to those of the offline case.The differences between FILcpl and REFcpl are somewhat larger, while in the offline case, the values of FILofl and REFofl were almost exactly the same.Thus, in the coupled case, the filter is more often needed to remove oscillations.

Spatial structure of gain and Jacobians
Figure 11 shows the spatial structure of the Jacobian values for δT 2 m /δW 2 on 6 July 2010 at 18:00 UTC for the reference calculation (REF) and the filtering solution (FIL).As expected, the Jacobian values are negative for δT 2 m /δW 2 , indicating that an increase in deep soil moisture (W 2 ) results in a decrease in screen-level temperature and vice versa.For the offline version (first row), there are some areas in which the Jacobian values are 0.These areas have a negative SWI value, indicating that the soil is too dry for the perturbation in W 2 to have any effect on T 2 m .At the right border in the middle of the REFofl figure, there are a few grid points with high positive Jacobian values, while their surroundings have the normal, negative values (cf. in the black circle).This is probably noise caused by non-linearities or oscillations in the Jacobian values during the runs.In FILofl, where the oscillations are filtered out, these spurious values disappear.The spatial structure of FILofl is almost identical to that of RE-Fofl.
The Jacobian values calculated with coupled runs (rows two and three) have a slightly different spatial structure than those of the offline runs (first row).The second row of Fig. 11 shows the Jacobian values calculated with positive perturba-tions of size 10 −2 .The areas where the offline version had 0 values are now characterised by very high negative values.This can be explained by the fact that the optimal perturbation size is much higher for the coupled version compared to the offline version (10 −2 vs. 10 −7 ).Due to this high, positive perturbation size, a relatively large amount of soil moisture is added in the perturbed run, which raises the slightly negative SWI value above 0 and, in doing so, re-enables the soil fluxes driven by evapotranspiration that were shut down when the SWI became negative.This results in a big difference between the reference run with a negative SWI value and the perturbed run with a positive SWI value, and hence in a large Jacobian value in these areas.The Jacobian values in these areas are the highest for REFcpl+, and somewhat lower for FILcpl+.This mechanism also becomes clear when we look at the Jacobian values of the third row.Here, the Jacobian values are calculated with coupled runs and negative perturbations of size 10 −2 , so the SWI value will only be decreased by the perturbations.In this case the areas with negative SWI values also have a Jacobian value of 0, like in the offline case.For the offline case there is no such difference between the Jacobians calculated with positive and negative perturbations (not shown here), because in the offline case the linearity assumption is much better approximated.In the presence of strong non-linearities, like around SWI values of 0, the validity of the linearity assumption breaks down and the EKF provides a suboptimal analysis.Balsamo et al. (2004) propose not doing any assimilation in these cases, using a masking function that checks for several thresholds like cloud cover and precipitation.Since it is not easy to list all possible sources of non-linearities, we propose filtering out the oscillations occurring in the case of non-linearities.
For the coupled runs in the north-eastern part of the domain, there are some spurious, positive Jacobian values (while it is expected that the link between T 2 m and W 2 will be negative).These are caused by non-linear feedback mech-anisms in the coupled runs that cannot occur in the offline runs.
The structure and values of the Jacobians calculated in coupled runs are similar to those of the Jacobians calculated in offline runs, which confirms the results of Balsamo et al. (2007).The offline runs are thus a valid and much cheaper alternative to the coupled runs.An added advantage of the offline runs is that they allow smaller perturbation sizes, and hence the linearity assumption has a much better validity.

Increments
Figure 12 shows the increments (i.e.analysis background) of W 2 and T 2 accumulated for 1 day, 6 July 2010, for the offline REF and FIL runs and the coupled REF run. Figure 13 shows the corresponding accumulated innovations (i.e.observation background) for T 2 m and RH 2 m .The region over Belgium is characterised by positive innovations for T 2 m up to 7 K and negative innovations for RH 2 m up to 40 %, indicating that the model is too cold and wet in this area.This can be seen in the increments.This area is characterised by negative increments for W 2 on this day up to 20 mm and positive increments for T 2 up to 3.3 K.The eastern side of the domain is characterised by positive W 2 increments corresponding to positive RH 2 m innovations and negative T 2 m innovations.The increments in W 2 are limited to the regions with a non-negative SWI value.In the areas with a negative SWI value, the Jacobian and hence also the increments are 0 (cf.Fig. 11).This causes the spatial structure of the W 2 increments to be somewhat irregular at those locations (Hamdi et al., 2014a).The differences between REF and FIL are very small.
The increments for W 2 are larger for REFcpl than for RE-Fofl, while the increments for T 2 are similar for the two runs.This corresponds to the findings about the Jacobian and gain values, which were also larger in the coupled case for the soil-moisture-related Jacobians.The spatial structure is very similar for the offline and coupled cases.

Evaluation for a single point
Figure 14a shows the increments for W 2 for July 2010 in Beitem (location indicated in Fig. 2) for REFofl (black) and FILofl (red).The increments of REF and FIL have the same sign and on most days are similar in size.The large increment for FIL on 14 July corresponds to a heavy precipitation event in the region.In the second half of the month the increments for FIL are often larger than those for REF.It is easily explained by the evolution of the SWI values for W 2 (not shown).On 9 July the negative increment of REF is much larger than that of FIL.In FIL the noise filtering in the Jacobian prevents the large negative increment.This results in a negative SWI value for REF, while the SWI value of FIL is just above 0.As a consequence FIL remains sensitive to increments, while in REF the increments for W 2 remain near 0 as long as the SWI value is negative.The heavy precipitation event of 14 July brings the SWI value of REF above 0 again, but on 16 and 19 July this results in a strong negative W 2 increment.After that the SWI value of REF remains below 0 most of the time, while the SWI value of FIL is positive and thus FIL has larger increments in this period.Figure 14b shows the evolution of the RH 2 m RMSE and BIAS forecast scores for a forecast range of 6 h during July 2010 in Beitem.Table 2 shows the T 2 m and RH 2 m forecast scores averaged over 13 stations in Belgium for REFofl, FILofl, REFcpl and FILcpl.This shows that, when averaging over 13 stations in Belgium, the filtered runs give a small improvement in scores for RH 2 m and similar scores for T 2 m .The scores of the offline and coupled runs are very similar to each other.In the coupled case the improvement in the filtering (FIL) RH 2 m scores compared to REF is larger than the improvement in the offline scores (Table 2 and Figs. 15 and 16).This is probably due to the fact that in the coupled case more oscillations are present due to feedback mechanisms between the soil and the atmosphere.Overall the scores of FILcpl are the lowest.For FILcpl, the coupling between the soil and the atmosphere allows a more correct Jacobian calculation, and the filtering succeeds in removing the more abundant oscillations.

Conclusions and perspectives
In this paper we have studied the Jacobians of an EKF using the SURFEX externalised version of land surface scheme ISBA.We tested this EKF with the assimilation of T 2 m and RH 2 m observations to correct errors in soil moisture and soil temperature.The experiments were run over the ALADIN Belgium 4 km domain for July 2010.The Jacobians of the EKF are calculated using finite differences approaches and require a perturbed run for each of the four soil prognostic variables.These perturbed runs can be done in coupled or offline SURFEX mode (i.e.coupled to an atmospheric run or with precalculated atmospheric forcing).We compared this offline and coupled approach for the calculation of the Jacobians.Results show that the offline approach allows smaller perturbations so that the linearity assumption for the calculation of the Jacobians with finite differences is better approximated.This is in accordance with Balsamo et al. (2007).The Jacobian and gain values are somewhat higher with the coupled approach for the soil-moisture-related Jacobians.The soil-temperature-related Jacobians have the same values in the coupled and offline approaches.The spatial structure of all Jacobians is similar between the two approaches.The offline approach is thus a good and computationally much cheaper alternative to the coupled approach for calculating the Jacobians.We identified 2 t oscillations during the late afternoon when a stable boundary layer starts to form and the Richardson number changes from negative to positive values.The oscillations occur in the surface variables related to surface fluxes and screen-level variables like T 2 m and RH 2 m that are interpolated between the surface and the lowest model level.These small oscillations are artificial and disappear again after a short time.They occur only in a limited number of grid points.They do not have a detrimental effect on the performance of the model runs, but can introduce noise locally into the Jacobian of the EKF.Nevertheless, as was shown in Fig. 14b, this noise turns out to have a substantial accumulated impact in a data assimilation cycle, and filtering it improves the forecast scores, specifically for relative humidity.We have proposed and tested a numerical filter to deal with these oscillations.The filter is applied to the simulated T 2 m and RH 2 m values before using them in the Jacobian calculation.Results show that the filter is successful in removing the oscillation.The advantage of the filter is that it is simple to implement and barely requires any additional computation.
The spatial structure and average value of the Jacobians and increments is very similar for the filtered run compared to the reference (i.e. with oscillations present).
The T 2 m and RH 2 m forecast scores for the offline and coupled approaches are very similar.In both approaches the filtering produces similar scores for T 2 m and a small improvement in the RH 2 m scores.This RH 2 m improvement is larger for the coupled approach and, in general, the coupled, filtered approach gives the best forecast scores.However, due to limited computational resources, we still prefer the offline filtered approach, which takes a lot less computing time.For example, on the Belgian computer the offline approach of the EKF takes 7 min on 6 cpus, while the coupled approach takes 52 min.
In conclusion we can say that the filter is effective in removing the oscillations and thus the noise in the Jacobian calculation.This is the case for the coupled as well as the offline approach, where the latter has the advantage of being computationally cheaper and better approximating the linearity assumption for the Jacobian calculation.
The results in this paper are specific to the choice of LSM, i.e. the two-layer IBSA scheme.For example, the dominance of the weights of the Jacobians of w 2 compared to w g is expected to change when a more realistic vertical discretisation of the soil layers is used, like in the ISBA-DIF scheme (Boone et al., 2000;Habets et al., 2003).The results also depend on the choice of the background and error covariance matrix values.In this paper we used the values proposed by Mahfouf et al. (2009).It could be interesting to compare the increments and forecast scores for different values of these covariance matrices.
The experiments in this paper were performed without atmospheric assimilation (i.e.no three-dimensional variational assimilation, 3DVAR), which could influence the results.In a next step the filtered offline approach of the EKF soil analysis for SURFEX will be combined with a 3DVAR assimilation for the upper air of the ALARO model.This will be an important step towards the operational use of the EKF which is planned for the future.

Figure 1 .
Figure 1.Schematic overview of the coupled and offline set-ups, used for the perturbed runs of the EKF.

Figure 3 .
Figure 3. Evolution of the Richardson number (RI, top) and T 2 m (bottom) during a 6 h coupled run for 2 July 2010 from 12:00 until 18:00 UTC at location B (left) and location C (right).In the top figures, the Richardson number for the lowest level is shown as it is calculated in SURFEX (black) and as it would be calculated in ALARO (red).

Figure 4 .
Figure 4. Evolution of T 2 m (black) and RH 2 m (red) during a 6 h SURFEX reference run for 2 July 2010 from 12:00 to 18:00 UTC at location A (output plotted every time step).The top left figure shows the results for an offline run with time step 300 s, and the top right figure an offline run with a time step of 60 s.The bottom left figure shows a coupled run with a time step of 180 s and the bottom right figure a coupled run with a time step of 60 s.

Fig
Fig.5cand for δRH 2 m /δW g (red) and δRH 2 m /δW 2 (black) in Fig.5d.Similar oscillations occur for the Jacobian values related to soil temperature for this case (not shown).These oscillations are found during the late afternoon of the runs from 12:00 to 18:00 UTC, and they correspond to the oscillations visible in RH 2 m , T 2 m and the Richardson number RI.The small oscillations of 2 % for RH 2 m and 0.2 K for T 2 m from Fig.4cause oscillations in the Jacobian values of up to 20 m 3 m −3 for δRH 2 m /δW 2 and up to 150 K m −3 m −3 for δT 2 m /δW 2 .Results of the coupled case (not shown) are similar to this offline case.Figure5also clearly shows the short time memory of the superficial soil layer (red dots).Any change in the superficial soil layer is quickly lost, causing the Jacobian value to return to zero, while changes in the deep soil layer (black dots) have a more lasting influence, resulting in non-zero Jacobian values at the end of the 6 h interval.Some Jacobian values converge once the initial disturbance has been taken up by the system, e.g.δT 2 m /δT s (red) and δT 2 m /δT 2 (black) in Fig.5a.For others the value keeps rising until the end of the time window, eg.δRH 2 m /δW 2 (black) in Fig.5b.Figure6ashows the spatial distribution of the oscillations for δRH 2 m /δW 2 on 2 July 2010 for the offline run from 12:00 to 18:00 UTC.The number of oscillations is shown at every grid point.This number is calculated by counting the number of consecutive time steps in which the gradient of

Figure 6 .
Figure 6.The number of the oscillations at every grid point for δRH 2 m /δW 2 (left) and the soil wetness index (SWI) of the deep soil layer (right) on 2 July 2010 for the offline reference run (REFofl) from 12:00 to 18:00 UTC.

Figure 7 .
Figure 7. Evolution of T 2 m (left) and RH 2 m (right) at location A for the offline (top) and coupled (bottom) reference runs (REF, black) and the filtered run (FIL, red).

Figure 9 .
Figure 9. Assessment of the linearity assumption for the calculation of the Jacobians by means of finite differences.Plot of the Jacobian values for δT 2 m /δW 2 on 2 July 2010, 12:00 UTC of the positive perturbations against the values of the negative perturbations.

Figure 10 .
Figure 10.Jacobian and gain values for δRH 2 m /δW s and δRH 2 m /δW 2 averaged over the whole domain on 2 July 2010 for REFofl, FILofl, REFcpl and FILcpl for 00:00, 06:00, 12:00 and 18:00 UTC.The solid lines represent the Jacobian values (values on the left vertical axis), and the dashed lines represent the gain values (values on the right vertical axis).

Figure 11 .
Figure 11.Map of the Jacobian and gain value for δT 2 m /δW 2 for 6 July 2010 at 18:00 UTC for REF (left) and FIL (right) of the offline (first row) and coupled (second and third rows) versions.The perturbation size for the offline runs was 10 −7 and, for the coupled runs, 10 0.01 (second row) and 10 −0.01 (third row).

Figure 12 .
Figure 12.Map of the increments (analysis background) for W 2 (in mm) and T 2 (in K) on 6 July 2010 for REFofl, FILofl and REFcpl.

Figure 13 .
Figure 13.Map of the innovations (observation background) for T 2 m (in K) and RH 2 m (in %) on 6 July 2010.

Figure 14 .
Figure 14.Evolution of the W 2 increments and RH 2 m forecast scores at a forecast range of 6 h (RMSE and BIAS) during July 2010 in Beitem (Belgium) for REFofl (black) and FILofl (red).

Figure 15 .
Figure 15.Forecast scores (BIAS and RMSE) for RH 2 m and T 2 m for all forecast ranges of the runs at 00:00 UTC averaged over July 2010 in Beitem (Belgium) for REFofl (black) and FIL1ofl (red).

Figure 16 .
Figure 16.Forecast scores (BIAS and RMSE) for RH 2 m and T 2 m for all forecast ranges of the runs at 00:00 UTC averaged over July 2010 in Beitem (Belgium) for REFcpl (black) and FILcpl (red).

Table 1 .
Percentage of grid points at which an oscillation occurs at the end of the run (thus influencing the Jacobian value) and in total (i.e.including those during the run that do not influence the Jacobian value).

Table 2 .
Overview of the RMSE and BIAS scores for T 2 m and RH 2 m averaged over the 13 stations and over July 2010.