Implementing the Water, HEat and Transport model in GEOframe WHETGEO-1D v.1.0: algorithms, informatics, design patterns, open science features, and 1D deployment

This paper presents WHETGEO and its 1D deployment, a new, physically based model simulating the water and energy budgets in a soil column. The purpose of this contribution is twofold. First, we discuss the mathematical and numerical issues involved in solving the Richardson-Richards equation, conventionally known as Richards’ equation, and the heat equation in heterogeneous soils. In particular, for the Richardson-Richards equation (R) we take advantage of the nested Newton-Casulli-Zanolli (NCZ) algorithm that ensures the convergence of the numerical solution in any condition. Second, 5 starting from numerical and modelling needs, we present the design of a software that is intended to be the first building block of a new, customisable, land-surface model that is integrated with process-based hydrology. WHETGEO is developed as an open-source code, adopting the Object-Oriented paradigm and a generic programming approach in order to improve its usability and expandability. WHETGEO is fully integrated in the GEOframe/OMS3 system allowing the use of the many ancillary tools it provides. Finally the paper presents the 1D deployment of WHETGEO, WHETGEO-1D, which has been tested against 10 the available analytical solutions presented in Appendix.


Introduction
The Earth's Critical Zone (CZ) is defined as the heterogeneous, near surface environment in which complex interactions involving rock, soil, water, air, and living organism regulate the natural habitat and determine the availability of life-sustaining 15 resources (National Research Council, 2001). Clear interest in studying the CZ is spurred on by ever-increasing pressure due to the growth in human population and climatic changes. Central to simulating the processes in the CZ is the study of soil moisture dynamics (Clark et al., 2015a). In the following we suggest that studying the CZ requires tools that are not yet readily available to researchers; then we propose one of our own. These tools should be flexible enough to allow the quick embedding of advancements in science. 20 as: where S s L −1 is the specific storage coefficient, defined as where c L −1 is the water retention capacity. Comparing the two terms in brackets, we can see that for ψ < 0, then c >> S s θ θs ; this means that under unsaturated conditions, the contribution of the specific storage is negligible. Whereas when the soil is saturated and ψ > 0, then c = 0 and therefore what counts is the specific storage. Because of this, it is possible to account for groundwater specific storage simply by modifying the SWRC as: Furthermore, switching from Richards to shallow water was made possible in the equation writing thanks to, for example, (Casulli, 2017;Gugole et al., 2018). Therefore, switching to a fully integrated, simultaneous treatment of the three domains is now possible.

The Necessary Coupling with the Energy Budget
As remarked in point 6 above, temperature affects water viscosity, which effectively doubles in passing from 5 to 20 degrees 100 Celsius (Eisenberg et al., 2005), with a positive feedback on the infiltration process. This has been clearly observed in natural systems (Ronan et al., 1998;Eisenberg et al., 2005;Engeler et al., 2011) where infiltration rates follow diurnal and seasonal temperature-cycles. In fact, according to Muskat and Meres (1936), the unsaturated hydraulic conductivity can be expressed as where κ r (θ) − is the relative permeability, κ L 2 is the intrinsic permeability, ρ L 3 M −1 is the liquid density, g is the acceleration of gravity, and ν L 2 T −1 is the kinematic viscosity of the liquid. Thus, for constant θ, variations in K(θ) due to temperature can be accounted as (Constantz and Murphy, 1991): rates and subsurface flows are significantly modified Walvoord et al. (2012).
-Requirement III -To account for thermodynamic effects, temperature should be at least present in the R 2 equation as a parameter, as in Eq. (8). However, for a more accurate approximation of the water dynamics, the option to solve the water and energy budgets simultaneously must be present.
Soil thermal properties are important physical parameters in modelling land surface processes (Dai et al., 2019) since they 115 control the partitioning of energy at the soil surface and its redistribution within the soil (Ochsner et al., 2001). For a multiphase material, like soil, their definition is always problematic since they depend on the physical properties of each phase and their variations (Dong et al., 2015;Dai et al., 2019;Nicolsky and Romanovsky, 2018). In literature different models have been proposed with such a scope, and further studies on it are recommended (Dai et al., 2019): nonetheless, when considering the phase change of water, the estimation of the unfrozen and frozen water fraction is still an unresolved issue for which different 120 models, usually referred to as SFCC, Soil Freezing Characteristics Curve, have been proposed (Kurylyk and Watanabe, 2013).
Thus, it is clear that the aspects related to the estimation of soil thermal properties fall fully within Requirement I too. Moreover, there are other reasons for which the equations of the water and energy budgets should be solved in a coupled manner (Rigon et al., 2006a): for instance, this makes it possible to include an appropriate treatment of evaporation and transpiration processes (Bonan, 2019;Bisht and Riley, 2019), as well as of the heat advection (Frampton et al., 2013;Walvoord and Kurylyk, 2016; 125 Wierenga et al., 1970;Ronan et al., 1998;Engeler et al., 2011;Zhang et al., 2019).
Finally, there is a great urge to model solutes transport according to the water movements. The range of applications for solute/tracer/pollutants spans from agriculture to industry to research itself. In fact, in recent years there has been a tumultuous growth of studies using tracers to asses the various pathways of water (Hrachowitz et al., 2016). However, so far these studies have mostly used lumped models with limited capability to investigate water age selection processes, processes that became 130 very important in the most recent literature e.g. (Penna et al., 2018). Using more complex modelling can benefit both the investigation of the processes and the construction of more refined water budget closures. Even though in this paper we do not detail the work on tracers, they must be kept in mind in software design so that the modules to be implemented eventually.

Heat Transport
Under the conditions defined above, the governing equation for the transport of energy in variably saturated porous media is 135 given by the following energy conservation equation: where h is the specific enthalpy of the medium L 2 T −2 , λ MLΘ −1 T −3 is the thermal conductivity of the soil, T Θ is the temperature, ρ w ML −3 is the water density, c w L 2 Θ −1 T −2 is the specific heat capacity of water, T ref Θ is a reference temperature used to define the enthalpy, and J w is the water flux LT −1 . The first term on the right-hand-side is the heat 140 conduction flux, described by the Fourier's law, and the second term is the sensible heat advection of liquid water. The specific enthalpy of a control volume of soil V c L 3 can be calculated as the sum of the enthalpy of the soil particles and liquid water: where ρ sp and ρ w are the densities of the soil particles and the water, c sp and c w are the specific heat capacities of the soil particles and the water. Equation (9) is the so-called conservative form.

Organisation and Scope
This paper describes the implementation and content of the WHETGEO-1D (Water, HEat and Transport in GEOframe) software, in observance of requirements I to III and aware of the hydrologic facts described in points 1 to 7 of Section 1.1. Further requirements derive from Numerics and Mathematics, as argued below. We do not treat transport here as its numerics is slightly different from those of the water and energy budgets;this topic will be covered in dedicated paper.

150
GEOframe (Formetta et al., 2014a;Bancheri, 2017;Bancheri et al., 2020) is a system of components, see Appendix A, built upon the Object Modelling System v3 (OMS3) framework  In the following section we discuss the mathematical issues and the numerics of our implementation. In the subsequent one, we describe the informatics and software engineering issues, and how we solved them. Finally, we discuss WHETGEO-1D implementation by testing it against some analytical solutions available for simplified parameterizations and boundary conditions, for details of these see Appendix C.

2 Mathematical Issues and Numerics
Translating these equations into numerical discretized codes implies to overcome some challenges, as we illustrate in the next sections by exploring, at first, the issues posed by each one of the equation.

General Issues of the R 2 Equation
Equation (2) is said to be written in "mixed form" because it is expressed in term of θ and ψ (and uses the SWRC to connect 170 the two variables). The "ψ-based form" is derived from Eq. (2) by applying the chain rule for derivatives: where with dimension L −1 , is the specific moisture capacity, also called hydraulic capacity. Even though Eq.
(2) and Eq. (11) are analytically equivalent under the assumption that the water content is a differentiable variable, this is not generally true in the discrete domain where the derivative chain rule is not always valid (Farthing and Ogden, 2017). Because of this the ψ-based form may suffer from large balance errors in the presence of big nonlinearities and strong gradients, as discussed in (Casulli and Zanolli, 2010;Farthing and Ogden, 2017;Zha et al., 2019) and literature therein. The specific moisture capacity c, which 180 appears in the storage term, itself depends on ψ and so is not constant over a discrete time interval during which ψ changes value. Let us discretise the time derivative in Eq. (11) by using the backward Euler scheme and obtain: wherec i is the discrete operator of the soil moisture capacity, c(ψ). In order to preserve the chain rule of derivatives at the discrete level, i.e. the equality c∂ψ/∂t = ∂θ(ψ)/∂t,c i has to satisfy the requirement (Roe, 1981) 185c As can be seen from the above equation, the right definition ofc i depends on the solution itself. To overcome this problem, in literature different techniques have been presented to improve the evaluation ofc i , but none ensures mass conservation (Farthing and Ogden, 2017).
There is a third form of Eq. (2), the so-called "θ-based form", that reads as where all is made explicit by inverting the SWRC and D(θ) := K(θ)∂ψ/∂θ is the soil-water diffusivity L 2 T −1 . The first term on the right-hand-side represents the water flow due to capillary forces, while the second term is the contribution due to gravity (Farthing and Ogden, 2017). The θ-based form is mass-conserving and it can be solved perfectly by mass conservative methods (Casulli and Zanolli, 2010). However, it applicability is limited to the unsaturated zone since water content varies between θ r 195 and θ s , whereas water suction is not bounded. This formulation is intrinsically not suited to fulfilling our requirement III.
Moreover, water content is discontinuous across layered soil since the SWRCs are soil specific, whereas water suction is continuous even in inhomogeneous soils (Farthing and Ogden, 2017;Bonan, 2019).
In WHETGEO, we directly use the conservative form of the R 2 equation, Eq.
(2), which seems the easiest way to deal with both the mass conservation issues and the extension of the equation to the saturated case. The implicit finite volume discretization of Eq.
(2) reads as: where ∆t is the time step size, is an optional source/sink term in volume, and θ i (ψ) is the ith water volume given by Equation (16) can be written in matrix form as where ψ = {ψ i } is the tuple of unknowns, θ(ψ) = θ i (ψ i ) is a tuple-function representing the discrete water volume, T is 210 the flux matrix, and b is the right-hand-side vector of Eq. (16), which is properly augmented by the known Dirichlet boundary condition when necessary. For a given initial condition ψ 0 i , at any time step n = 1, 2, . . ., Eq. (16) constitutes a nonlinear system for ψ n+1 i , with the nonlinearity affecting only the diagonal of the system and being represented by the water volume θ i (ψ n+1 i ).
This set of equations is a consistent and conservative discretization of Eq. (2). Therefore, regardless of the chosen spatial and temporal resolution, ψ n+1 i is a conservative approximation of the new water suction.

Surface Boundary Condition
The definition of the type of surface boundary condition (Neumann vs. Dirchlet) for the R 2 equation is a non trivial task since it can depend on the state of the system. In literature several approaches are used (Furman, 2008). These approaches are mainly based on a switch of the type of the boundary condition from a prescribed head to prescribed flux and viceversa. This switching often causes numerical difficulties that need to be addressed (Furman, 2008).

220
To overcome this problem we have included an additional computational node at the soil surface. As will be made clear in the following, for it we prescribe an "equation state" like that presented in (Casulli, 2009): where H L is the water depth, which also represents the pressure if the ponding is assumed to happen in hydrostatic conditions.
By doing so, it is possible to prescribe as the surface boundary condition the rainfall intensity (Neumann type) without resorting to any switching techniques to reproduce infiltration excess or saturation excess processes. In this case the system, Eq. (19), must to be modified to account for the additional computational node describing the state of the soil surface: where J n is the rainfall intensity and represents the Neumann boundary condition used to drive the system at the soil surface.
For any time step, Eq. (21) can be written in matrix form, similar to (Eq. (19)), as: where ψ = {ψ i } is the tuple of unknowns, V (ψ) = (θ i (ψ i )) for i = 1, 2, ..., N − 1 and V N (ψ) = H(ψ) is a tuple-function representing the discrete water volume, T is the flux matrix, and b is the right-hand-side vector of Eq. (16), which is properly augmented by the known Dirichlet boundary condition when necessary. For a given initial condition ψ 0 i , at any time step n = 1, 2, ... Eq. (22) constitutes a nonlinear system for ψ n+1 i , with the nonlinearity affecting only the diagonal of the system 235 and being represented by the water volume V i (ψ n+1 i ). Therefore, regardless of the chosen spatial and temporal resolution, ψ n+1 i is a conservative approximation of the new water suction.

Heat Transport Numerics Issues
Equation (9) is said to be written in conservative form and expresses an important property, which is the conservation of the scalar quantity, in this case the specific enthalpy. It is interesting to note that by making use of the mass conservation equation, 240 Eq. (2), Eq. (9) can be written in an analytically equivalent form, the so called non-conservative form (Sophocleous, 1979;Simunek et al., 2005): Equation (23) expresses another important property which is the maximum principle (Casulli and Zanolli, 2005), i.e. the analytical solution is always bounded, above and below, by the maximum and minimum of its initial and boundary values, as 245 shown in (Greenspan and Casulli, 1988, Chapter 7.3).
Although Eq. (9) and Eq. (23) are analytically equivalent, once they are discretised the corresponding numerical solution will, in general, either be conservative or satisfy a discrete max-min property (Casulli and Zanolli, 2005), but not both as would be required.
As in the case of the water flow, thermal budget issues can be subdivided into three aspects: the discretization of the equation, 250 the inclusion of the appropriate boundary conditions, and the implementation of some closure equation for the thermal capacity and conductivity.

The Discretization of the Heat Equation
The key feature (Casulli and Zanolli, 2005) to obtaining a numerical solution for the heat transport equation that is both conservative and possesses the max-min property is to solve the conservative form of the heat equation by using the velocity 255 field obtained in solving the continuity equation Eq. (2). By making use of the up-wind scheme for the advection part and the centered difference scheme for the diffusion part we have: where When heat equation does not consider water phase changes, it is decoupled from the R 2 equation and the finite volume discretisation leads to a linear algebraic system of equations. However, once freezing and thawing processes are considered, the heat 265 equation is fully coupled with the R 2 equation, as in (Dall'Amico et al., 2011) for instance, and the enthalpy function becomes nonlinear. At this point, since the enthalpy function is nonlinear the NCZ algorithm is required to linearise it, as shown in (Tubini et al., 2020). So far, we have not considered the problem of water flow in freezing soils, however being aware of this issue is important for the future developments and code design.

270
At the soil surface the heat equation is driven by the surface energy balance. The heat flux exchanged between the soil and the atmosphere, the surface heat flux, F MT −3 , is given as: where S in is the incoming short wave radiation, S out is the outgoing short wave radiation, L in is the incoming longwave radiation, L out is the outgoing longwave radiation, and H and LE are respectively the turbulent fluxes of sensible heat and Similarly to the definition of the surface boundary condition for the water flow, the surface boundary condition for the energy equation is also system dependent. In fact, in (26) the only fluxes that do not depend on the soil temperature and/or moisture are the incoming shortwave and longwave radiation fluxes, S in and L in . The outgoing shortwave radiation flux is usually 280 parameterized as: where the surface albedo α − can be assumed to vary with the soil moisture content (Saito et al., 2006) and radiation wavelength. The outgoing longwave surface radiation is: where T s Θ the temperature of the topmost layer of soil, is the soil emissivity, and σ is the Stefan-Boltzmann constant. The sensible heat flux H is taken as: where ρ a is the air density ML −3 , and c a is the thermal capacity of air per unit mass L 2 T −2 Θ −1 . Regarding the aerodynamic resistances r H TL −1 , it should be noted that it can be evaluated with different degrees of approximation and may require a 290 specific modelling solution. For instance, the aerodynamic resistance r H can be evaluated with models ranging from semiempirical models to the the Monin-Obukov similarity (Liu et al., 2007), or even by solving the turbulent dynamics with direct methods (Raupach and Thom, 1981;Mcdonough, 2004).
The latent heat flux is taken here as given by a formula of the type: where l ML 2 T −2 is the specific latent heat of vaporisation of water, E P is the potential evapotranspiration, r H and r v TL −1 are respectively the aerodynamic resistance and the soil surface resistance to water vapour flow. The latent heat flux it is the sum of two distinct processes evaporation and transpiration. Compared to the other fluxes, latent heat flux presents further complications because evaporation is both an energy and a water limited process, and transpiration depends also on the physiology of trees (as well as root distribution/growth and leaf cover). The latent heat flux is associated to the water flux that must Including the surface energy budget boundary condition requires the computation of additional quantities such as the incoming radiation fluxes, the shortwave radiation and the longwave radiation, and the potential evapotranspiration flux. These 305 quantities can be easily computed within the GEOframe system in which WHETGEO-1D is embedded. The proper estimation of the incoming radiation fluxes is far from being a simple task and it is often oversimplified in hydrological problems. Our approach is to use the tools already developed inside the system GEOframe which were tested independently and accurately (Formetta et al., 2013(Formetta et al., , 2014a. Similarly the evapotranspiration can be computed with other GEOframe components (Bottazzi, 2020;Bottazzi et al., 2021).

Algorithms
By using a numerical method, here the finite volume method, a partial differential equation is transformed into a system of nonlinear algebraic equations, as has already been shown. The system has to be solved with iterative methods and, at their core, these reduce the problem to using a linear systems solver. The solver can be of various types, according to the dimension of the problem. For instance in 1D, the final system of finite volume problems we present is tridiagonal and can be conveniently 315 solved with the Thomas algorithm (Quarteroni et al., 2010), which is a fast direct method. In 2D or 3D, the final matrix is not tridiagonal and a different solution method must be used, for instance the conjugate gradient (Shewchuk et al., 1994). These algorithms are well known and do not need to be explained here.
However, the reduction of a nonlinear system to a linear one is not trivial. We illustrate the issues by taking the R 2 equation as an example. As discussed in depth in Zha et al. (2019) and Farthing and Ogden (2017), and references therein, the linearisation 320 of the R 2 equation is challenging. Following the work of Celia et al. (1990), a lot of advancements have been made in this direction: Hydrus, CATHY, and ParFlow use variants of the Newton and Picard iteration methods (Zha et al., 2019;Paniconi and Putti, 1994), while GEOtop 2.0 implements a suitable globally convergent Newton method (Kelley, 2003). Although current algorithms are relatively stable, they may fail to converge or require a considerable computational cost (Zha et al., 2019). This has a significant impacts both on the reliability of the solution, which can have mass-balance errors, and on the 325 computational cost to produce it (Farthing and Ogden, 2017;Zha et al., 2019). Since Casulli and Zanolli (2010) and Brugnano and Casulli (2008), a new method was found, called nested Newton by the authors and NCZ in the following, that guarantees convergence in any situation, even with the use of large time steps and grid sizes.
As clearly pointed out by Casulli and Zanolli (2010), what makes the linearisation of the R 2 equation difficult is the nonmonothonic behaviour of the soil moisture capacity. A mathematical proof of convergence for NCZ exists 330 Casulli, 2008, 2009;Zanolli, 2010, 2012), which is not repeated here. However, we take the time to illustrate this new algorithm with care.
Let us start again from the nonlinear system (Casulli and Zanolli, 2012): defined for all ψ i ∈ R and can be expressed as: For all i = 1, 2, ..., N , the following assumptions are made on the functions a i (ψ) (we are here quite literally following (Casulli and Zanolli, 2010)): A1 : a i (ψ) is defined for all ψ ∈ R and is a nonnegative function with bounded variations; 340 A2 : There exists ψ * i ∈ R such that a i (ψ) is strictly positive and nondecreasing in (−∞, ψ * i ) and nonincreasing in (ψ * i , +∞).
T in eq (Eq. (20)) is the so-called matrix flux and it is a symmetric and (at least) positive semidefinite matrix satisfying one of the following properties: T1 : T is a Stieltjes matrix, i.e., a symmetric M-matrix, or T2 : T is irreducible, null(T) ≡ span(v), with v > 0 (componentwise), and T + D is a Stieltjes matrix for all diagonal 345 matrices D O, with O denoting the null matrix.
Finally b is the vector of the known terms. When T satisfies property T2, then for Eq. (20) to be physically and mathematically compatible, the following assumption about b is required: Having assumed that the a i (ψ) are non-negative functions of bounded variations, they are differentiable almost everywhere, admit only discontinuities of the first kind, and can be expressed as the difference of two non-negative, bounded, and nondecreasing functions, say p i (ψ) and q i (ψ), such that:  For ψ = ψ * , c(ψ) presents a maximum: for ψ < ψ * it is increasing, and for ψ > ψ * it is decreasing. This non monothonic behaviour causes problems when solving the nonlinear system. c(ψ) is thus replaced by p(ψ) (in green) and q(ψ), in blue, two monothonic functions whose difference is the original function c. Consequently, (b), θ(ψ) is replaced by θ1(ψ) and θ2(ψ), Eq. (35).
for all ψ ∈ R. When a(ψ) satisfy assumptions A1 and A2, the corresponding decomposition (known as the Jordan decomposition (Chistyakov, 1997) and presented in Figure 2), is given by: where the i-th component of V 1 (ψ) and V 2 (ψ) are defined as By making use of Eq. (35) the algebraic system in Eq. (20) can be written as when the water depth function is used to properly define the surface boundary condition. In the first case, i.e. when Neumann or Dirichlet boundary conditions are used, the vectorial function is defined as V (ψ) = (θ i (ψ i )) for i = 1, 2, ..., N .
Instead, when we consider the water depth function to describe the computational node at the soil surface, the vectorial function is defined as V (ψ) = (θ i (ψ i )) for i = 1, 2, ..., N − 1 and V N (ψ) = H(ψ). Therefore, the nonlinear system in Eq. (37) 370 is valid to describe both the subsurface and surface waters when the symbols are appropriately understood.
This aspect, the use of two different equation states, and the fact that the NCZ algorithm can be successfully reused to solve other problems (Casulli and Zanolli, 2012;Tubini et al., 2020), requires a careful design of its implementation, as discussed in the following sections.

375
The concepts and requirements previously illustrated must be cast into a software whose usability, expandability and inspectability are demanded by good software design, which adds further requirements. As discussed in Serafin (2019), codes were usually developed as monolithic code with severe drawbacks for maintainability and developments to improve the description of environmental processes, as has been proven by our own experience with the model GEOtop (Rigon et al., 2006a;Endrizzi et al., 2014), and by the experiences of other modelling frameworks (Clark et al., 2015b(Clark et al., , 2021Bisht and Riley, 2019).

380
Based on these experiences, the WHETGEO-1D code has been developed by adopting an Object-Oriented-Programming (OOP) approach and it has been integrated into OMS3. Information on OMS3 is provided in Appendix B. Furthermore, WHETGEO-1D is part of the system of interoperable components called GEOframe, a short description of which is given in Appendix A. The utility of GEOframe has been partially discussed previously, when treating the surface energy budget.

385
One of the major difficulties encountered by a research group concerns the development and reuse of scientific software (Berti, 2000) and the writing of structurally clean code, i.e. a code that is easily readable and understandable, with objects that have a specified, and possibly unique, responsibility (Martin, 2009).
An Object-Oriented-Programming approach, with the adoption of standard design patterns (DPs) (Gamma et al., 1995;Freeman et al., 2008) and the creation of new ones, has been adopted for the internal classes design and hierarchy.

390
The design principles followed by the WHETGEO-1D software can be summarised as follows: A. The software should be open source to allow inspection and improvements by third parties; B. For the same reason it should be organised into parts, each with a clear functional meaning and possibly a single responsibility.
C. The software can be extended with minimal effort and without modifications (according to the "open to extensions, 395 closed to modifications" principle Freeman et al. (2008)). In particular, the parts to be modified are those that, according to the discussion of the previous sections, could be changed to try new closures, i.e. the SWRC and the hydraulic conductivities in the case of the R 2 equation, and the thermal capacity and thermal conductivity in the case of the energy budget. Adding a new SWRC type or a new conductivity functions should be easy.
D. The largest set of boundary conditions should be smoothly manageable 400 E. The implementation of equations should be abstract, according to the principle of "programming to interfaces and not to concrete classes", which is the core of contemporary OOP (Gamma et al., 1995). The different equations describing the processes should be implemented within the set of classes by implementing a common interface.
F. The implementation of algorithms should not depend on the data formats of inputs and outputs.
Another requirement has to do with the user experience. In fact, solvers of PDEs (Menard et al., 2020) tend to be complex to 405 understand and run when features are added. In particular, the number of inputs grows exponentially when features are added, and the user has to overcome a steep learning curve before being able to use these software packages to appreciate all the cases implemented and their physics.
G. To simplify this situation, WHETGEO-1D has to be implemented is such a way that any of the alternative implementations must come only with their own parameters and variables, and appear to the user as simple as possible, though not 410 too simple.
There is finally a last requirement to consider: H. For computational and research purposes, there will be one,two and three dimensional (1D, 2D, 3D) implementations of the aforementioned equations. Therefore, as much as possible of the code should be shared across these. In particular, the NCZ and Newton algorithms should be shareable across the various applications.

415
This requirement implies that the geometry of the domain, as well as the topology, be specified in an abstract manner to cope with the specifics of each dimensionality.
The rest of this section is organised to respond to points from A to H. A is actually responded to in the next section describing where the software can be downloaded and with which open license. Points B and C are accomplished by studying an appropriate design of classes and the use of design patterns (Gamma et al., 1995;Freeman et al., 2008). For D, generic 420 programming is used and specific classes are implemented. E is resolved by deploying a set of classes that implement a common interface with an extension that allows it to obtain the required functionalities.
To respond to the issues raised in F and G, WHETGEO-1D is implemented as various components of OMS3, as shown below, each one with its own inputs. Therefore, the number of flags to check and the number of unused inputs are reduced to the minimum required by the solvers and parameterizations that the user chooses. If users want to solve the R 2 equation 425 alone, for instance, they can pick out the appropriate component and do not need to know about the inputs and details of the energy budget. The separation into components has two other advantages. First, it eases the testing of a single process against available analytical solutions and against other models results (Bisht and Riley, 2019). Second, it improves the model structure, facilitating the representation of new processes (Clark et al., 2021).
Point H is solved by deploying new components for the 1D, 2D, and 3D cases. In the following section we mainly deal with 430 points B to E. Before discussing details of some classes, a few general choices have to be reported. Data of any type are stored internally in vectors of doubles, in turn encapsulated in appropriate Java objects. OOP good practice would suggest that an object should be immutable (Bloch, 2001), but we decided that the main classes have to be mutable and allocated once forever as singletons (Gamma et al., 1995;Freeman et al., 2008). This potentially exposes the software to side effects but frees it to allocate new objects at any time step and decreases the computational burden and memory occupancy generated by deallocating 435 unused obsolete objects at runtime. This approach may be considered a specific design pattern for partial differential equation solvers.

The Software Organisation
The more visible effect of our choices is that we have built various OMS3 components: Internally, the classes are assembled by using some interfaces and abstract classes, since WHETGEO-1D is coded using the 445 Java language.
In order to improve the re-usability of the Java code we adopted a generic programming approach (Berti, 2000) that consists in decoupling of algorithm implementations from the concrete data representation while preserving efficiency. The generic approach has been balanced with domain-specific ones that can improve the computational efficiency of the software, as is the case of the previously mentioned Thomas algorithm used in 1-D implementations.

450
Another requirement regards the division of software classes into three main groups, as the lack of a proper separation between the parameterisation of physical processes and their numerical solutions has been recognised as one of the weak points of existing land surface models (Clark et al., 2015b(Clark et al., , 2021. One group describes the mathematical-physical problem, the second one implements the numerical solution (Berti, 2000), and the third one contains the growing group of concrete classes.
The first group contains the SWRC, and hydraulic and thermal conductivity, and it forms a stand-alone library since its content it can be implicitly contained in the vector representation: each element of the vector corresponds to a control volume of the grid and it is only connected with the elements preceding and following it. It is worth noting that this approach is peculiar to 1D problems and cannot be adopted for the 2D and 3D domains, where, especially when unstructured grids are used, the grid topology requires a smart implementation of the incidence and adjacency matrices.
For each control volume it is necessary to store its geometrical quantities, their position and dimension, its variables, its 470 parameter set, and the form of the equation to be solved there, referred to in the following as "equation state". The appropriate arrangement of information together with the internal design of the classes allows us to create a generic finite volume solver.

Closure equations, i.e. the ClosureEquation abstract class
The ClosureEquation class is shown in the UML diagram of Fig. (3). As explained in the Introduction, one of the core concepts of modelling water and heat transport in soils is the SWRC. Soil is a multi-phase material, thus knowledge of its 475 composition is of crucial importance in defining its unsaturated hydraulic conductivity and its thermal properties, specific internal energy and thermal conductivity.
An abstract class ClosureEquation is defined to contain only abstract methods that would be overwritten by the concrete classes implementing it. The ClosureEquation class essentially defines a new data-type. A closer inspection of The Simple Factory pattern SoilWaterRetentionCurvesFactory accomplishes the task of implementing the concrete classes. By preferring polymorphism to inheritance and using the Factory pattern (Gamma et al., 1995;Freeman et al., 485 2008), the developers can easily include and extend existing code or new formulations or parametrisations of SWRC. Besides, the Simple Factory fulfils the dependency inversion principle (Eckel, 2003)  Notably, the solver of a PDE problem can refer to the abstract class and to its abstract object without implying the specific concrete equation to be solved or its concrete parameterisations. Moreover, the compositionality of the EquationState 495 allows the creation of new solvers from existing closures without the need to add new subclasses. As shown in the UML of Figure 4, the EquationState class defines methods used to linearise the PDE when it is nonlinear. For instance, it computes the first and second derivatives, and the functions necessary to define the Jordan decomposition as required by the NCZ algorithm. Specifically, these methods are p, q, pIntegral, qIntegral, and computeXStar.
In our code design the ClosureEquation class is limited to computing a physical parameterisation, whereas the EquationState 500 class is used to discretise the equation state of the PDE, and whenever required to properly linearise it. Any new concrete EquationState subclass can either have the same physics of another with a different solver or a different physics with the same solver.

Generic Programming at work
As explained in Sect.  The EquationState contains a reference to a ClosureEquation object.
Let us consider, for example, the case of the R 2 equation with water depth as the surface boundary condition. Figure  Adopting the OOP and generic programming approach, Fig. (6), it is possible to implement the NCZ algorithm in such a way that enhances its reusability. The key feature is the decoupling of the computational grid from the algorithm(data) (Berti,520 2000). This is achieved through two elements. The first consists in creating a container of the objects that deal with the equation states of the problem, equationState, eS in Fig. (6). Second, we use a label equationStateID, eSID in Fig. (6), to specify the behaviour of each control volume. So, the behaviour of each control volume is determined by this label and not by the position of the element in the grid. Specifically, when we traverse the grid we use the equationStateID to determine which object inside the container equationState to use.

525
The NCZ algorithm has been implemented in the NestedNewtonThomas class. The NestedNewtonThomas contains a reference to the Thomas object, whose task is to solve a linear system, and to a list of EquationState objects, Fig. (7).  Considering the ubiquity of nonlinear problems in Hydrology and the robustness of the NCZ algorithm, the NCZ algorithm has been encapsulated in a stand-alone library.
The example has been illustrated in 1D but it becomes even more effective when working on 2D or 3D, especially with an 530 unstructured grid.  NestedNewtonThomas contains a reference each to the Thomas object, whose task it is to solve a linear system, and to a list of EquationState objects.

Information for Users and Developers
While most of the what written so far is of general application, the deployment shown here is 1D. Information on WHETGEO-1D for users and developers is provided in the supplemental material, where there is a Jupyter Notebook that contains the guidelines for executing the codes for any of the components. Its name starts with "00_" and we call it "Notebook Zero" of the 535 components. The latest executable code can be downloaded from https://github.com/geoframecomponents/WHETGEO-1D and can be compiled by following the instructions therein. The version of the OMS3 compiled project can be found here https://github.com/GEOframeOMSProjects/OMS_Project_WHETGEO1D. The code can be executed in the OMS3 console, which can be downloaded and installed according to the instructions given at:

540
https://geoframe.blogspot.com/2020/01/the-winter-school-on-geoframe-system-is.html Some brief information about GEOframe can be found in Appendix B, and more comprehensive information is at: https://abouthydrology.blogspot.com/2015/03/jgrass-newage-essentials.html https://geoframe.blogspot.com/2020/01/gsw2020-photos-and-material.html To run the tests, please follow the instructions on the Github repository of the GEOframe components. If a user wants to 545 compile the code themselves, they can use the appropriate Gradle script that guarantees independence from any IDE. For further information about input and output formats for WHETGEO-1D, please see the Notebook 00_WHETGEO1D_Richards.ipynb in the folder Documentation of the Zenodo distribution.

Workflow for Users
Examples of uses of WHETGEO-1D can be found in the form of Python Notebooks in the directory Notebooks/Jupyter.

550
Documentation can be found in form of Python Notebooks in the directory Documentation. Simulations with WHETGEO-1D are run as OMS3 simulations. Therefore, the first operation to accomplish is to prepare the appropriate .sim files. For new users, many simulation files are available in the directory simulation of the Zenodo distribution. As shown in Fig. (8), in the modelling solutions that involve WHETGEO-1D, there is always a "Main" component that is in charge of running the core code for solving the PDE. The inputs and the outputs are treated by other OMS3 components. They are tied together by a 555 Domain Specific Language (DSL) based on Groovy. This allows for great flexibility in using various input and output formats.

Inputs and Outputs
Input data can be broadly classified into time series, computational grid data, and simulation parameters. Time series are used to specify the boundary conditions of the problem. Time series are contained in .csv files with a specific format that is OMS3 compliant (David). With computational grid data we refer to the domain discretisation, initial condition, and soil parameters.

560
All these data are stored in a netCDF file. Time series and computational grid data are elaborated with dedicated Python modules distributed under the gf-group package (Tubini and Rigon, 2021). The simulation parameters, such as the start date and end date of the simulation, time step size, and file paths, are specified by the user in the OMS3 .sim file.
For the design of the output workflow we took advantage of the OMS3 system that allows the user to connect stand-alone components. Figure (8  WHETGEO-1D can be integrated with the built-in calibration component LUCA (Hay et al., 2006;Formetta et al., 2014a) and the Verification component, as shown in Fig. (9). The former is used to calibrate optimal parameters, the latter to com-

595
Here, as an example, we present how to add the Brooks-Corey (Brooks and Corey, 1964) model as an extension of the code base. The constitutive relationships are given by: where θ s and θ r are, respectively, the saturated and residual values of the volumetric water content, ψ d is the air-entry water suction value, n is the pore size distribution index, and K s is the saturated hydraulic conductivity at saturation. -equationState calculates the water volume using the Brook-Corey model for a given water suction value and set of parameters, Eq. (18).
-dEquationState calculates the first derivative of the equationState function, in this the moisture capacity 615 function. This method is used within the linearisation algorithm.
-ddEquationState calculates the second derivative of the equationState function. This method is relevant for those models where the ψ * cannot be computed analytically but requires the application of a root finding method such as the bisection method. An example is the soil internal energy function when considering the phase change of water Tubini et al. (2020).

620
p calculates the p function of the Jordan decomposition.
-pIntegral calculates the V 1 function of the Jordan decomposition.
-computeXStar calculates the ψ * value to properly define the functions p and V 1 .
-initialGuess calculates the initial guess for the linearisation algorithm.

625
In this paper we discussed the issues raised by implementing a new expandable system to model the Earth's CZ. The issues faced in doing so were grouped in various concerns: physical-hydrological; mathematical; software engineering.

630
The implementation has been shown to solve the issues presented in 7 observations, 3 requirements, and A to H design specifications. Each of these was analysed and informed the choice of algorithms and code implementation. The first deployment of the concepts was the 1D stand-alone water budget and the coupled water and energy budgets WHETGEO versions.
The water budget was tested against analytical solutions presented in (Srivastava and Yeh, 1991) and (Vanderborght et al., 2005). Some behavioural simulations were also performed to show some features of the code, such as the ability to deal with an OMS3 component . Components can be joined together to obtain multiple modelling solutions that can accomplish from simple to very complicated tasks. GEOframe has proved great flexibility and robustness in several applications (Bancheri et al., 2020;Abera et al., 2017a, b). There are more than 50 components available that can be grouped into the following categories: -Geomorphic and DEM analyses;

660
-Spatial extrapolation/interpolation of meteorological variables; -Estimation of the radiation budget; -Estimation of evapotranspiration; -Estimation of runoff production with integral distributed models; -Channel routing; Using the components for geomorphic and DEM analyses Rigon et al. (2006b), the basin can be discretised into Hydrological Response Units (HRUs), i.e., hydrologically similar parts, such as a catchment or a hillslope or one of its parts. The meteorological forcing data can be spatially interpolated using a geostatistical approach, such as the Kriging technique (Bancheri et al.,670 2018). Both shortwave and longwave radiation components are available for the estimation of the radiation budget (Formetta et al., 2013(Formetta et al., , 2016. Evapotranspiration can be estimated using three different formulations: the FAO Evapotranspiration model (Allen et al., 1998), the Priestley-Taylor model (Priestley and Taylor, 1972), and the Prospero model, (Bottazzi, 2020;Bottazzi et al., 2021). Snow melting and the snow water equivalent can also be simulated with three models, as described in (Formetta et al., 2014b). Runoff production is performed by using the Embedded Reservoir Model (ERM) or a combination of 675 its reservoirs (Bancheri et al., 2020). The discharge generated at each hillslope is routed to the outlet using the Muskingum-Cunge method (Bancheri et al., 2020). Travel time analysis of a generic pollutant within the catchment can be done using the approach proposed in (Rigon et al., 2016b, a). Model parameters can be calibrated using two algorithms, and several objective functions: Let Us CAlibrate (LUCA) (Hay et al., 2006) and Particle Swarm Optimization (PSO) (Kennedy and Eberhart, 1995). A graph-based structure, called NET3 (Serafin, 2019), is employed for the management of process simulations. NET3 680 is designed using a river network/graph structure analogy, where each HRU is a node of the graph, and the channel links are the connections between the nodes. In any NET3 node, a different modelling solution can be implemented and nodes (HRUs or channels) can be connected or disconnected at run-time through scripting. GEOframe is open source and helps the reproducibility and replicability of research (Bancheri, 2017). Developers and users can easily collaborate, share documentation, and archive examples and data within the GEOframe community.

685
Appendix B: OMS3 The Object Modeling System v.3 (OMS3) is a component-based environmental modelling framework that provides a consistent and efficient way to: 1) create science simulation components; 2) develop, parameterise, and evaluate environmental models, and modify and adjust them as science advances; and 3) re-purpose environmental models for emerging customer requirements .

690
In OMS3 the term component refers to self-contained, separate software units that implement independent functions in a context-independent manner . This means that developers and researches can build their model as composition of stand-alone components, moving away from the monolithic approach. The entire GEOframe system, and therefore WHETGEO-1D, is built upon the OMS3 framework.
Compared to other Environmental Modelling Frameworks (EMF), OMS3 is characterised by being a non-invasive and 695 lightweight framework (Lloyd et al., 2011). That is to say, the model code is not tightly coupled with the underlying framework -OMS3 -, i.e. the environmental modeller does not need a deep knowledge of the API, and the modelling components can still function and continue to evolve outside the framework . In fact, OMS3 relies on specific annotations to provide meta data for Java code. These annotations describe elements such as classes, fields, and methods, and are used by the framework to interpret the component as a building block of the modelling solution (MS), hence controlling its connectivity 700 and data flow . It is worth noting that, being meta data, these annotations do not directly affect the execution of the source code outside the OMS3 -non-invasive and lightweight framework.
Besides the technical aspects, the adoption of a software framework has a positive effect on "non-functional" quality attributes, such as maintainability, portability, reusability and understandability . MSs, thanks to the plug-in system of model components Peckham et al., 2013;Serafin, 2019 David et al., 2013;Formetta et al., 2014a;Bancheri, 2017;Serafin, 2019). Moreover, it is interesting to note that the component-based approach encourages collective model development (Serafin, 2019) and also eases the attribution of authorship since any component is a stand-alone chunk of code and can be authored separately.
Besides, the adoption of an environmental modelling framework promotes the concept of reproducible research, easing third parties inspection and providing consistent and verifiable model results (Formetta et al., 2013;Bancheri, 2017;Serafin, 2019).

715
Another advantage of using OMS3 is represented by the opportunity to keep the code development transparent to the user.

Appendix C: R 2 test cases
In this section we test the solver of the R 2 equation against the analytical solutions presented by Srivastava and Yeh (1991) and by Vanderborght et al. (2005). Then we present and discuss two "behavioural" test cases to try out WHETGEO-1D in simulating both the infiltration excess and the saturation excess process.

720
C1 Analytical solution Srivastava and Yeh (1991) Srivastava and Yeh (1991) derived an analytical solution describing the one-dimensional transient infiltration in an homogeneous and layered soil. The hydraulic properties of the soil are described by the following constitutive relations: where Ks is the saturated hydraulic conductivity, θ r is the residual water content, θ s is the saturated water content, and α is the soil pore-size distribution parameter, representing the desaturation rate of the SWRC. The lower boundary condition Details on the analytical solution can be found in (Srivastava and Yeh, 1991).

C1.1 Homogeneous soil
We consider a one-dimension homogeneous soil layer of 1  (Romano et al., 1998). Comparison between the numerical and the analytical solution for water suction is shown in Fig. (C3). Vanderborght et al. (2005) The next test case was defined by Vanderborght et al. (2005)     (c) Figure C4. Comparison of relative water suction error δ for the test problem TP2 using different interface hydraulic conductivity algorithms.

C2 Analytical solution
Panel (a) is computed with max., panel (b) with harmonic mean, and panel (c) with geometric mean. As reported in (Romano et al., 1998), the harmonic mean offers the best agreement with the analytical solution. This is particularly evident at the interface between the two layers.

C3 Surface boundary condition
The definition of the surface boundary condition is a nontrivial task since it is a system-dependent boundary condition. The infiltration rate through the soil surface depends on precipitation, rainfall intensity J, and on the moisture condition of the soil. Because of this, the surface boundary condition may change from the Dirichlet type -prescribed water suction -to the Neumann type -prescribed flux -and vice-versa. The works by Horton (1933) and Dunne and Black (1970) establish the 760 conceptual framework to explain the runoff generation.
The infiltration excess or Horton runoff occurs when the rainfall intensity is larger than infiltration capacity of the soil: Infiltration excess is most commonly observed with short-duration, intense rainfall.    The saturation excess or Dunnian runoff occurs when the soil is saturated and additional water exfiltrates at the soil surface.

765
Saturation excess generally occurs with long-duration, moderate rainfall, or with a series of successive precipitation events.
In this case the soil depth or the presence of shallow fragipan are determining factors for saturation excess. Another possible cause is the rise of the water table up to the soil surface.

C3.1 Infiltration excess
In this numerical experiment we consider a homogeneous soil of 3 m depth. Soil hydraulic properties are described with the 770 Van Genuchten's model, Table (   for the dry initial condition, however, in the dry case the capillary gradient is larger and because of this the soil infiltration 780 capacity is higher, as in Fig. (C7) panel (a). With regards to the water ponding, the maximum value is almost the same in both cases, 1 mm higher in the wet case, but the time evolution is different: in the wet case the water only infiltrates completely 13 h later than the dry case. This delay may seem counter-intuitive since wetter conditions are associated with higher values of hydraulic conductivity, Fig. (C8), but in the wet soil the capillary gradients are smaller than in the dry soil, Fig. (C9).

C3.2 Saturation excess 785
In this section we present two numerical experiments to simulate the saturation excess process. Saturation excess is more critical, in terms of simulation stability, than the infiltration excess (Forums). We consider two cases: one in which the water table reaches the soil surface; and another in which the total rainfall amount is larger than the maximum water holding capacity but the rainfall intensity is less than the maximum infiltration rate.
Firstly, we consider a layered soil of 3 m depth. The thicknesses of the loamy layer and clay layer are, respectively, 0.5 m,  Figure C7. Panel (a) shows a comparison between the infiltration rate for two cases: wet and dry initial condition. In the dry case, soil infiltration is greater than the wet case even though the hydraulic conductivity is smaller. This is due to the higher capillary gradients that develop in the soil. Panel (b) shows the time evolution of the water ponding at the soil surface.  Figure C9. Panel (a) shows the capillary gradient for the case of wet soil, panel (b) shows the the capillary gradient for the case of dry soil.
As can be seen, in the dry soil the capillary gradient is two orders of magnitude larger than in the wet soil. Because of this higher gradient water infiltrates faster in the dry soil than in the wet soil. the soil. Initially water infiltrates in the soil but then the clay layer, which is characterised by a lower conductivity than the loam-sand layer, limits the deep infiltration causing the saturation of the loam-sand layer from below.
Repeating the above numerical experiment with a thicker loam-sandy layer, Fig. (C12), there is no water ponding at the soil surface. In this case all the rainfall can infiltrate into the loam-sandy layer thanks to the increased water storage capacity.  Here we present a behavioural test case on the pure heat conduction considering the surface energy balance. The purpose of this simulation is twofold. First, we show an application of WHETGEO-1D that exploits existing GEOframe components to model the external components of the surface energy budget, specifically: the incoming shortwave radiation (Formetta et al., 2013), the incoming longwave radiation (Formetta et al., 2016), and the latent heat flux. Second, we show how to easily include the phase change of water by adding a new closure equation that describes the SFC model presented by Dall'Amico et al. (2011) 810 The soil column is 30 m deep and the initial condition is a constant temperature profile T = 12 • C. Figure (D1) panel (a) shows the components of the surface energy fluxes and the thermal regime of the uppermost 2 m of the soil column. As can be seen in panel (b), the soil temperature falls below 0 • C; therefore it is not reasonable to neglect freezing and thawing processes. FreeThaw-1D and was obtained with a minimum effort thanks to the code design we adopted. The soil column is 30 m depth and the initial condition is a constant temperature profile T = 12 • C. The parameters of the SFC are presented in Tab Figure D3. Panel (a) shows a comparison of the soil temperature at 0.05 m considering the phase change of water and without. As can be seen, in the latter case the soil temperature reaches lower value and fluctuates more: when considering the phase change of water we include in the problem the latent heat of water that increase the thermal inertia of the soil. Panel (b) shows a comparison of the position of the zero-isotherm in the two simulation. When the phase of water is not included the zero-isotherm does deeper into the ground. Tubini, N. and Rigon, R.: https://pypi.org/project/gf-group/, last accessed: 1 April 2021, 2021.
Tubini, N., Gruber, S., and Rigon, R.: A method for solving heat transfer with phase change in ice or soil that allows for large time steps