Soil hydraulic properties are often derived indirectly,
i.e. computed from easily available soil properties with pedotransfer
functions (PTFs), when those are needed for catchment, regional or
continental scale applications. When predicted soil hydraulic parameters are
used for the modelling of the state and flux of water in soils, uncertainty
of the computed values can provide more detailed information when drawing
conclusions. The aim of this study was to update the previously published
European PTFs (Tóth et al., 2015, euptf v1.4.0) by providing prediction
uncertainty calculation built into the transfer functions. The new set of
algorithms was derived for point predictions of soil water content at
saturation (0 cm matric potential head), field capacity (both

Quantitative information on state and flux of water in the critical zone is important for a wide range of environmental process models and decision support systems related to land surface processes (Lin, 2010; Zhao et al., 2018). Performance of hydrologic, climate, crop and other models related to soil hydrological processes depends on the quality and resolution of soil hydraulic input parameters (Vereecken et al., 2015). Simulations of flow through variably saturated soil media either rely on simple modelling approaches which only require few directly measurable input variables such as porosity, field capacity, and wilting point, or on the Richards equation. While the former are simple and straightforward to obtain, the Richards equation requires knowledge about the soil hydraulic properties over the full moisture range. In practice, one of the most common approaches to describe the water retention and hydraulic conductivity curves required to solve the Richards equation is arguably (Weber et al., 2019) the Mualem–van Genuchten model (MVG) (van Genuchten, 1980; Mualem, 1976). Since soil hydraulic measurements in the laboratory or in the field are often time consuming, expensive and difficult, indirect methods for estimating soil hydraulic properties using widely available surrogate data have been developed (Schaap, 2006). To date, a large number of pedotransfer functions have become popular to predict soil hydraulic properties and MVG model parameters (Van Looy et al., 2017).

Information on the uncertainty of the predicted soil hydraulic properties is
important for modelling the state and flux of water in soil. The source of
prediction uncertainty can be threefold: it can stem from (i) the predictor
variables (e.g. measurement uncertainty, non-representativeness of a sample), (ii) the predicted variables (e.g. uncertainty in the estimated soil hydraulic model
parameters), and (iii) the algorithm which describes the relation between the
two. Information on the uncertainty of the predictor variables is commonly
not available in PTFs derived before the 2000s, but has become a more
intensively studied topic in the last decade. For example,
Weynants et al. (2009) quantified uncertainty of derived PTFs
related to experimental, model and fitting errors with the one-step
inversion method. Deng et al. (2009) differentiated and
quantified intrinsic and input uncertainty of PTFs.
Tranter et al. (2010) developed an
uncertainty estimation method using fuzzy

Machine learning methods can be more robust for constructing PTFs than previous approaches such as linear regression or simple decision trees if the relationship between the predictors and the response is highly non-linear (Araya and Ghezzehei, 2019). The random forest algorithm (Breiman, 2001) is able to outperform other machine learning methods (Olson et al., 2018), which was also shown for predicting soil properties (Hengl et al., 2018; Nussbaum et al., 2018). Improvements in computing power, statistical methods and statistical software make it easier to apply even complex models to large datasets. Therefore, the complexity of a prediction algorithm is no longer a barrier to selecting a suitable algorithm to develop and apply PTFs. Most of the recent machine learning algorithms have a built-in possibility to compute the uncertainty of the predicted variable, e.g. by quantile regression forest (Meinshausen, 2006) or generalized boosted regression (Ridgeway, 2017). If PTFs are derived with these algorithms, the uncertainty of the predicted soil property can be directly estimated when applying the PTF (Szabó et al., 2019a), although this could also be achieved by applying the above-mentioned uncertainty assessment methods without using machine learning methods (e.g. Kotlar et al., 2019; Tranter et al., 2010).

Despite the above-mentioned developments, the euptfv1 (Tóth et al., 2015) and the derived soil hydraulic property maps for Europe on 1 km and 250 m grids (Tóth et al., 2017) do not include uncertainties in the prediction. Hence, the aim of our study was to update the euptfv1 by deriving a new set of soil hydraulic PTFs (euptfv2) with uncertainty calculation built into the PTF model. For this, we rely heavily on the datasets used in the construction of the euptfv1. Methodologically, we constructed new soil hydraulic PTFs on the basis of the random forest method, which facilitates quantification of prediction uncertainties. The predicted variables of interest included soil water content at saturation, field capacity and wilting point, plant available water content, saturated hydraulic conductivity, and the MVG parameters of the moisture retention and hydraulic conductivity curves. The predictions are based on easily available soil properties. The predictor variables were similar to those of euptfv1, except for the topsoil and subsoil distinction, which was replaced by the mean soil depth of the sample, since this is typically known. Additionally, the performance of the euptfv2 was compared with that of the earlier version. Moreover, we determined the minimum sufficient predictor variables for 32 input variable combinations.

The construction of a pedotransfer function requires three elements: predictor variables, predicted variables as the property of interest, and a transfer method between the former two. The predicted variables are in this case directly measured soil hydraulic properties on samples contained in a large pan-European dataset, ensuring a representativeness of the PTF for Europe. Additionally, Tóth et al. (2015) had fitted MVG model parameters for each sample dataset individually by inverse modelling, the results of which we reused in this study.

The European Hydropedological Data Inventory (EU-HYDI) (Weynants et al., 2013) provided the basis for the preparation of the prediction algorithms. The dataset partitions for training and testing the prediction algorithms were almost identical to the ones used in Tóth et al. (2015), except that the samples had to have information on soil depth as well. Depending on the soil hydraulic property of interest, 76 %–99 % of the originally selected samples were used to derive the new PTFs. It enabled comparison of the performance between the EU-PTFs (Tóth et al., 2015) – built in the euptfv1 (Weynants and Tóth, 2014) – and their improved version (euptfv2). Table 1 shows the number of samples in the training and test sets.

Prediction algorithms were derived for each of the following soil hydraulic
properties:

water content at saturation (THS): water content at 0 cm matric potential head;

water content at field capacity at

water content at wilting point (WP): water content at

plant available water content (AWC) based on the following equations:

saturated hydraulic conductivity (KS): hydraulic conductivity at 0 cm matric potential head;

Mualem–van Genuchten model parameters (VG; for the water retention model only, MVG; for the water retention and hydraulic conductivity model).
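For orientation, the two curves that the VG and MVG parameters describe can be sketched as follows. This is a minimal illustration of the standard van Genuchten (1980) retention function with the Mualem (1976) constraint m = 1 − 1/n; the function names, argument names (`theta_r`, `theta_s`, `alpha`, `n`, `k0`, `l`) and the parameter values in the usage lines are assumptions chosen for illustration, not values from this study.

```python
def vg_water_content(h_cm, theta_r, theta_s, alpha, n):
    """Water content (cm3 cm-3) at matric potential head h_cm (cm).

    The sign of h_cm is ignored, so -100 and 100 give the same result.
    """
    m = 1.0 - 1.0 / n  # Mualem constraint linking m to n
    return theta_r + (theta_s - theta_r) / (1.0 + (alpha * abs(h_cm)) ** n) ** m


def mvg_conductivity(h_cm, k0, l, alpha, n):
    """Hydraulic conductivity (same unit as k0) at matric potential head h_cm (cm)."""
    m = 1.0 - 1.0 / n
    # Effective saturation implied by the retention function
    se = (1.0 + (alpha * abs(h_cm)) ** n) ** (-m)
    return k0 * se ** l * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2


# Illustrative (hypothetical) parameter set
params = dict(theta_r=0.05, theta_s=0.45, alpha=0.02, n=1.5)
theta_sat = vg_water_content(0.0, **params)       # equals theta_s at 0 cm head
theta_dry = vg_water_content(-15000.0, **params)  # much drier end of the curve
k_sat = mvg_conductivity(0.0, k0=10.0, l=0.5, alpha=0.02, n=1.5)  # equals k0
```

At 0 cm matric potential head the functions return `theta_s` and `k0`, respectively, which is why THS and KS can be read as the saturated end points of the two curves.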

FC_2 was not predicted in euptfv1 and
was determined in this study as follows. In the EU-HYDI, 8231 samples have
at least one water content observation in the matric potential head range

As predictors we used the following easily available soil properties: the
particle size distribution (PSD) characterised by the mass-percentages of clay
(

Replacing the topsoil/subsoil distinction with depth for the new PTFs was supported by the fact that this information is commonly available, too, or can be based on expert knowledge. Introducing more accurate information on depth might improve the performance without using machine learning algorithms for the prediction. However, we did not test this hypothesis, because our aim was to provide uncertainty of the predictions related to predictor variables of the PTFs. Tested predictor variables are shown in Table 1 with number of samples used to derive the PTFs and compute their performance.
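The number of predictor variable combinations can be illustrated with a short enumeration. The sketch assumes that the 32 combinations arise from always including PSD and DEPTH and optionally adding any subset of the five further predictors named in the text (OC, BD, CACO3, PH_H2O, CEC), which gives 2^5 = 32. The order produced here is arbitrary and does not correspond to the PTF numbering of Table 1.

```python
from itertools import combinations

REQUIRED = ("PSD", "DEPTH")                         # minimum input of every PTF
OPTIONAL = ("OC", "BD", "CACO3", "PH_H2O", "CEC")   # assumed optional predictors


def predictor_combinations():
    """Return every predictor set: the required inputs plus any subset of the optional ones."""
    combos = []
    for k in range(len(OPTIONAL) + 1):
        for extra in combinations(OPTIONAL, k):
            combos.append(REQUIRED + extra)
    return combos


all_combos = predictor_combinations()
n_combos = len(all_combos)
```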

Number of samples by predictor variable combinations used to derive the new European PTFs (euptfv2). Rows in italic font indicate PTFs with the same predictor variables as were tested in euptfv1 (Tóth et al., 2015).

We derived the PTFs adopting the random forest method (Breiman, 2001), implemented in the “ranger” R package (Wright and Ziegler, 2017). We selected this method, because (i) it is among the best performing prediction algorithms if there is a complex interaction structure in the dataset (Boulesteix et al., 2012), (ii) it computes quantiles of the predicted values, (iii) parallel processing is supported which saves significant computation time, and (iv) the initially black-box type algorithm can be interpreted based on computing variable importance and analysing partial dependence plots implemented in the “pdp” R package (Greenwell, 2017b).

In the case of a continuous response variable, a random forest is an ensemble of de-correlated regression trees (Breiman, 2001). The regression tree approach divides the predictor space into non-overlapping regions by minimizing the residual sum of squares. The aim of the method is to subset the data as homogeneously as possible at each split. The observations can be assigned to the defined regions, in which the mean of the response variable is the predicted value. Single trees of the forest are noisy and limited in performance, but if many unbiased trees are derived and averaged with bagging, the variance is reduced and the performance of the prediction improves (Hastie et al., 2009). Building of de-correlated trees is achieved by randomization at two levels. Firstly, each tree of the forest is grown on a randomly selected two thirds of the data with replacement, which is called the bootstrap sample or in-bag fraction. Secondly, at each node of a single tree, randomly selected sets of predictors are analysed to split the data. This feature of randomization allows correlation between the predictor variables (Ziegler and König, 2014), which is an important advantage in the case of pedotransfer functions, where predictors are often highly correlated.
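The splitting rule of a single regression tree can be sketched as follows. This is a minimal illustration for one predictor, assuming an exhaustive search over midpoints between sorted values; it is not the ranger implementation, and the bulk density and water content values are hypothetical.

```python
def rss(values):
    """Residual sum of squares of values around their mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_rss) of the split on x that minimises the RSS of y.

    The threshold is None if no split reduces the RSS of the unsplit data.
    """
    pairs = sorted(zip(x, y))
    best = (None, rss([v for _, v in pairs]))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = rss(left) + rss(right)
        if total < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, total)
    return best


# Hypothetical samples: bulk density (g cm-3) and saturated water content (cm3 cm-3)
bd = [1.1, 1.2, 1.3, 1.5, 1.6, 1.7]
ths = [0.55, 0.54, 0.52, 0.42, 0.40, 0.39]
threshold, split_rss = best_split(bd, ths)
```

A tree repeats this search within each resulting region, and a random forest additionally restricts each search to a random subset of the predictors and grows each tree on a bootstrap sample.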

Parameter tuning of the ranger was performed with the “caret” R package
(Kuhn et al., 2017, 2018). With the implemented train
function, a fivefold cross-validation was repeated 10 times to tune the
number of randomly selected predictor variables at each split (

We analysed the relevance of predictors and their influence on the response variable. The relevance of predictors was determined by computing the variable importance based on the mean decrease in impurity (Hastie et al., 2009) in the ranger function. The relative importance was assessed by dividing the variable importance of each predictor by the sum of the importance of all the predictors after Kotlar et al. (2019). The marginal effect of some selected predictors on the response – soil hydraulic parameters – was analysed with partial dependence plots (Greenwell, 2017a, b).
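The normalisation used for the relative importance can be sketched in a few lines. The function name and the impurity-based importance values are hypothetical; only the division by the sum of all importances follows the description above.

```python
def relative_importance(importance):
    """Divide each predictor's importance by the sum over all predictors."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("sum of variable importances must be positive")
    return {name: value / total for name, value in importance.items()}


# Hypothetical raw importances (mean decrease in impurity) for one PTF
raw = {"USSAND": 1.2, "USSILT": 0.6, "USCLAY": 0.9, "DEPTH": 0.3, "BD": 3.0}
rel = relative_importance(raw)
most_important = max(rel, key=rel.get)
```

Because the values sum to 1, they can be compared across PTFs whose raw importances are on different scales.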

The final prediction algorithm was built on the whole training set based on the result of the tuning. To quantify the prediction uncertainties, quantile regression was used (Meinshausen, 2006). In random forest, as implemented in ranger, it is called quantile regression forest. For each node in each tree, the quantile regression forest not only keeps the mean of the predicted target variable, but also all observations that belong to that node, from which the full conditional distribution of the predicted variable is estimated. The width of the prediction interval varies with the predictor variables. The narrower the prediction interval, the less uncertain the prediction is. We analysed the 90 % prediction interval for all predictions, but the derived algorithms (PTFs) also make it possible to compute the individual predictions of each tree.
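How a quantile regression forest turns the observations kept in the terminal nodes into a prediction interval can be sketched as follows. This is a minimal illustration of the weighted empirical distribution described by Meinshausen (2006), not the ranger implementation; the function name, the toy leaf memberships and the water content values are hypothetical.

```python
def qrf_quantiles(leaves, y, probs):
    """Quantiles of the conditional distribution for one query sample.

    leaves: one list per tree with the indices of the training samples that
            share the terminal node reached by the query sample.
    y:      observed response of every training sample.
    probs:  requested probabilities, e.g. (0.05, 0.5, 0.95).
    """
    n_trees = len(leaves)
    weights = {}
    for leaf in leaves:
        for i in leaf:
            # Each tree spreads a total weight of 1 / n_trees evenly over its leaf
            weights[i] = weights.get(i, 0.0) + 1.0 / (len(leaf) * n_trees)
    ordered = sorted(weights, key=lambda i: y[i])
    result = []
    for p in probs:
        cumulative = 0.0
        for i in ordered:
            cumulative += weights[i]
            if cumulative >= p - 1e-12:  # first value whose weighted CDF reaches p
                result.append(y[i])
                break
    return result


# Hypothetical training responses (cm3 cm-3) and leaf memberships in three trees
y_train = [0.30, 0.35, 0.40, 0.45, 0.50]
leaves = [[0, 1], [1, 2], [1, 2, 3]]
lower, median, upper = qrf_quantiles(leaves, y_train, (0.05, 0.5, 0.95))
interval_width = upper - lower  # width of the 90 % prediction interval
```

The median is the value used for the point prediction in the performance analysis, and the distance between the 5 % and 95 % quantiles is the width of the 90 % prediction interval.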

The performance of the PTFs was calculated using the median values predicted
by the random forests. It was described with the root mean square error
(RMSE) (Eq. 3) and the coefficient of determination (

Results of parameter tuning of the random forest:
optimization of

Scatter plot of the measured versus median predicted
water retention values of the worst and best performing PTF with 90 %
prediction interval on test datasets. THS: saturated water content (PTF01
vs. PTF03); FC_2: water content at

Additionally, the performance of the presented random forest based PTFs was compared to that of the euptfv1 (Tóth et al., 2015). For comparison, those PTFs from euptfv2 were selected which corresponded to the analysed input variable combination of the euptfv1.

The comparison of PTFs was done using a non-parametric Kruskal-Wallis test
at the 5 % significance level applied on the MSE values – computed on
TEST_BASIC and/or TEST_CHEM

All statistical analyses were performed in R (version 3.6.0; R Core Team, 2019).

In the process of tuning the random forest parameters, the number of trees
was found to be sufficient when set to 200 in all cases. The number of
candidate predictors was found to be higher than the recommended square root
of the number of available predictor variables (

The RMSE values were between 0.020 and 0.068 cm

In the case of the point estimations, Figs. 2 and S1 depict the scatterplots
of measured and predicted soil hydraulic properties/parameters with the 90 % prediction
interval computed on the test sets. The performance of the worst and best PTFs
is shown. The addition of predictors that significantly improve the
predictions also decreases the uncertainty. The largest reduction in the
width of the inner 90 % of the prediction interval is visible for THS.
Specifically this value decreased from 0.21 to 0.10 cm

Figures S2, S4, S6, S8, S10, S12, S14, S16, S19 show the squared error of
the derived PTFs computed on the TEST_BASIC and
TEST_CHEM

Performance of pedotransfer functions (PTFs) by input
combination on training and test datasets to predict water content at
saturation (THS).

Performance of pedotransfer functions (PTFs) by input
combination on training and test datasets to predict water content at

Performance of pedotransfer functions (PTFs) by input
combination on training and test datasets to predict water content at

Performance of pedotransfer functions (PTFs) by input
combination on training and test datasets to predict water content at
wilting point (WP).

This study underlines the importance of chemical soil properties in the prediction. CEC was found to be an important predictor by Pachepsky and Rawls (1999) for FC and WP, by Botula et al. (2013) for water retention at several matric potential head values, and by Hodnett and Tomasella (2002) for the VG parameters. Hodnett and Tomasella (2002) showed that pH influenced all four VG parameters. The role of CACO3 was shown to be not significant in the study of Khodaverdiloo et al. (2011). They highlight that a possible influence of CACO3 might already have been indirectly included by bulk density. The role of PSD, BD and OC has been studied extensively by various authors, e.g. Nemes et al. (2003), Rawls et al. (2003), Vereecken et al. (1989), Weynants et al. (2009) and Wösten et al. (1999), and their findings are in line with the general pattern of variable influence we see in this study.

Table S3 summarizes the recommended PTF for each combination of available predictor variables. The importance and influence of soil properties on the performance of hydraulic PTFs and results of partial dependence plots are reported below by predicted soil hydraulic properties.

The performance of the PTFs was computed for the training and test sets (Tables 2–8 and S1–2) indicating the presence of significant differences. For each predictor variable, the recommended PTF number is indicated and its predictor variables are highlighted in bold font in the respective tables. For easier comparison with euptfv1, the corresponding PTF number used in Tóth et al. (2015) is additionally provided in each table. In the following, detailed results of the constructed PTFs for the individual predicted variables are presented and discussed.

Table 2, Figs. S2 and S3 show the performance of the PTFs predicting THS.
The best performing random forest is PTF03. It is also the one trained on
the largest population. It uses PSD, DEPTH and BD as predictors. For the
prediction of THS, the most important variable by far is BD (Fig. 3). When
BD is not used for the computation of THS, values above 0.60 cm

The performance of the PTFs computed on training and test set are shown in Table 3, Figs. S4 and S5 for FC_2 and in Table 4, Figs. S6 and S7 for FC. The best performing PTF derived from the largest population is the one using (i) PSD, DEPTH, OC, BD and PH_H2O (PTF18) in the case of FC_2, and (ii) PSD, DEPTH, OC and BD (PTF07) for FC.

For FC_2, the two most important variables are USSAND and BD (Fig. 3). When BD and USSAND increase, FC_2 decreases (Fig. 4). Adding OC or BD to PSD and DEPTH significantly improves the prediction of FC_2. If either of CACO3, PH_H2O or CEC is added as a further predictor to PSD and DEPTH, the performance of the PTF does not significantly improve. If PSD, DEPTH and BD are available, adding OC or CACO3 or PH_H2O does not significantly improve the prediction. Including CEC as an additional predictor besides PSD, DEPTH and BD, significantly improves the estimation of FC_2.

USSAND and USCLAY are the two most important variables for the prediction of
FC (Fig. 3). Instead of analysing these two soil properties, both
characterizing the soil texture, we include OC next to USSAND in the partial
dependence plot analysis, because the amount of OC can be altered due to
change in climate, land use, soil and water management, cropping systems,
etc. (Wiesmeier et al., 2019). Within the
range of OC in the dataset FC increases with increasing OC regardless of
USSAND content by up to 0.08 cm

The performance of PTFs derived for WP prediction is shown in Table 5, Figs. S8 and S9. Among the best performing PTFs, PTF09 is derived on the largest training set. It uses PSD, DEPTH, OC and PH_H2O as predictors. Even though the most important variables for WP prediction were USCLAY and USSAND (Fig. 3), we included OC on the partial dependence plot (Fig. 4) as in the FC analysis. USCLAY had the strongest influence on WP. The influence of OC on WP can be detected for soils with OC less than 4 % and USCLAY less than 50 %. Below 10 % USCLAY, the WP slightly increases with increasing OC. When USCLAY is between 10 % and 50 % and OC is less than 4 %, increasing OC generally decreases WP.

OC significantly improves the prediction of WP if added to PSD and DEPTH. If BD or CACO3 or PH_H2O or CEC are added to PSD and DEPTH, the performance of the prediction does not improve significantly. Adding CACO3 and CEC to PSD, DEPTH and OC significantly improves the prediction.

Relative variable importance computed with the random
forest algorithm for the prediction of water content with PTF32 at
saturation (THS), at field capacity;

Partial dependence plot computed based on the random
forest algorithm (PTF07) for the prediction of water content at saturation
(THS), field capacity at

Tables S1, S2 and Figs. S1, S10–13 show the performance of AWC and AWC_2 predictions. Considering PSD, DEPTH, and BD as input, PTF03 is the best performing algorithm and in both cases had the largest training data set. For both AWC and AWC_2, BD is the most important predictor among the analysed variables (Fig. 3). The second most important variable is USCLAY in the case of AWC_2 and USSILT for AWC. Increasing BD and USCLAY decreases AWC_2. In the case of AWC, increasing BD and decreasing USSILT decreases the water content (Fig. 4).

OC and BD significantly improve the prediction of AWC_2 when added as input variables next to PSD and DEPTH. If either BD or OC is already included, adding the other does not significantly improve the prediction. Neither PH_H2O, CACO3 nor CEC improves the prediction.

For the prediction of AWC, further addition of only BD or OC or CACO3 or PH_H2O or CEC to PSD and DEPTH does not significantly improve the prediction. If both OC and BD are included as predictors next to PSD and DEPTH, the prediction significantly improves.

There is no significant difference between direct and indirect predictions,
neither for AWC nor for AWC_2. However, the size of the test
set used for the statistical analysis is limited. There were only 145
samples in the TEST_BASIC set and 64 samples in
TEST_CHEM

Performance of pedotransfer functions (PTFs) by input
combination on training and test datasets to predict saturated hydraulic
conductivity (KS).

Scatter plot of the measured versus median predicted
water retention values computed with the van Genuchten (VG) model (PTF01 vs.
PTF29, i.e. the worst versus best performing PTF). PSD: particle size
distribution (sand, 50–2000

Scatter plot of the measured versus median predicted
hydraulic conductivity values computed with the Mualem–van Genuchten (MVG)
model (PTF01 vs. PTF27, i.e. the worst versus best performing PTF). PSD:
particle size distribution (sand, 50–2000

The performance of KS prediction is shown in Table 6, Figs. S14 and S15.
The predictors of the best performing PTF derived on the largest training
set are PSD, DEPTH and OC (PTF02). The prediction of KS significantly
improves if OC is included among the predictor variables next to PSD and
DEPTH. No other predictors significantly improve the performance of the PTF.
On the training dataset, when OC is greater than 2.5 %, the influence of
clay content on KS is more dominant than that of OC (Fig. 4). In the case of
KS prediction, the simplest best performing PTF – which was derived on a
training dataset with KS ranging between

Relative variable importance computed with the random
forest algorithm for the prediction of parameters of the van Genuchten and
Mualem–van Genuchten models based on PTF32.

The performance of the parametric PTFs is shown in Tables 7 and 8 and Figs. 5, 6, S16–S21. Figure 7 illustrates the importance of variables for the prediction of VG and MVG parameters. The best performing PTF derived on the largest training set is PTF29 – with PSD, DEPTH, OC, BD, PH_H2O and CEC – for MRC and PTF27 – with PSD, DEPTH, OC, BD, CACO3, PH_H2O – for HCC.

For

Only a few studies have analysed the importance of CEC for MRC and HCC PTFs
(Botula et al.,
2013; Hodnett and Tomasella, 2002; Pachepsky and Rawls, 1999) which might be
linked to the fact that CEC is rarely available in soil hydraulic datasets.
It is noteworthy that all best performing MRC PTFs (PTF24,
PTF28, PTF29, PTF30, PTF31) include CEC among the predictors (Table 7). In
addition to that, Hodnett and Tomasella (2002) found that
CEC was important for the prediction of

Performance of pedotransfer functions (PTFs) by input
combination on training and test datasets to predict parameters of the van
Genuchten model to describe soil moisture retention curve (VG).

Performance of pedotransfer functions (PTFs) by input
combination on training and test datasets to predict parameters of the
Mualem–van Genuchten model to describe soil moisture retention and hydraulic
conductivity curve (MVG).

If BD or OC or CACO3 or CEC or PH_H2O is added as a
predictor to information on PSD and DEPTH, the performance of the PTF
significantly improves (Table 7, Fig. S16). Adding BD next to PSD and DEPTH
improves the predictions more than adding OC (Table 7, Fig. S17). BD and OC
together significantly improve the prediction compared to using PSD, DEPTH
together with either BD or OC. Adding OC next to PSD, DEPTH, BD and chemical
soil properties (CACO3 and/or CEC and/or PH_H2O) does not
significantly improve the prediction. If PSD, DEPTH, CACO3 and CEC are
available, further addition of PH_H2O does not improve the
prediction. The best performing PTF includes USSAND, USSILT, USCLAY, DEPTH,
BD, CACO3, CEC. Figure 5 shows a scatterplot of measured and predicted water
content values, including the performance of the worst and the best
performing PTF (PTF01 and PTF29). The importance of including chemical
properties and most importantly bulk density among the predictors is visible
when measured water contents are greater than 0.50 cm

OC, CACO3, PH_H2O and CEC significantly improve the
prediction of HCC when added to PSD and DEPTH. Adding BD next to PSD and
DEPTH does not improve the predictions (Table 8, Figs. S19, S20). If PSD,
DEPTH and OC are used as predictors, further addition of BD or CACO3 or
PH_H2O or CEC does not significantly improve the performance
of the PTFs. However, adding CACO3 and CEC or PH_H2O
significantly improves the prediction. The performance of the worst and the
best performing PTF is shown in Fig. 6. The PTF with only PSD and DEPTH
underestimates hydraulic conductivity values smaller than 0.01 cm d

When soil chemical properties are not used as predictors, hydraulic
conductivity is underestimated close to saturation and at matric potential
heads smaller than

Samples with measurements of the HCC at pressure heads

We compared the performance of the best point prediction methods (Tables 2–5) with the best parameter estimations (Table 7) on the test sets. In 5 out of 20 cases, point predictions are significantly more accurate, and in a further 8 cases the RMSE was smaller. In all other cases, there is no significant difference between point and parametric PTFs (Table 9). The reason for the higher RMSE in parameter estimation may be that the MVG model does not always adequately describe the measured MRC data (Weber et al., 2019). Therefore, when THS, FC, FC_2 and WP are computed with parameter estimation, they are affected not only by the uncertainty of the prediction of the VG parameters but also by the goodness of the VG model fit. We found similar results in the case of euptfv1 (Tóth et al., 2015). Tomasella et al. (2003) and Børgesen and Schaap (2005) had comparable findings regarding the performance of point and parametric PTFs. For THS, point estimation performed better than parameter estimation. When the moisture retention curve is not needed, but only THS and/or FC/FC_2 and/or WP, we recommend computing those with the point PTFs; a more detailed explanation is included in Tóth et al. (2015).

The results of comparing the performance of parametric and
point pedotransfer functions (PTFs) on the test sets of EU-HYDI to predict
saturated water content (THS), water content at

The results of comparing the performance of euptfv1 and euptfv2 on the test sets of EU-HYDI to predict soil hydraulic properties. Rows in italic indicate cases where there was no significant difference between the two PTFs.

The euptfv2 performs significantly better than euptfv1 in 14 out of 19 cases. In the remaining 5 cases,
there is no significant difference (Table 10). Predictions of FC and MRC
improve in all cases. The most likely reason is that the
interaction between the target variable and the predictors is more complex
when predicting FC or VG parameters, and this complexity can be untangled using random forest. This may explain why the
random forest algorithm performed significantly better than the PTFs derived
with linear regression or a simple regression tree. For THS, WP, KS, and MVG
only those PTFs did not improve significantly, for which comparisons on the
TEST_CHEM

We recommend the use of euptfv2 instead of euptfv1 if continuous soil properties are available. If only texture classes – i.e. no particle size distribution – are available, class PTFs of euptfv1 can be used, that is PTF18 for modified FAO texture classes and PTF19 for USDA texture classes.

List of recommended pedotransfer functions (PTFs) by predicted soil hydraulic property and available predictor variables.

The minimum input requirements for all PTFs are sand, silt and clay content, and soil depth. Soil depth is defined as the mean sampling depth, e.g. if PSD, BD and OC are provided for a soil sample from a depth of 0–20 cm, then the soil depth input (DEPTH) to the prediction algorithm is set to 10 cm.
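The DEPTH input described above can be computed from the sampling interval as in the following minimal sketch. The function name and the input checks are assumptions made for illustration.

```python
def mean_sampling_depth(top_cm, bottom_cm):
    """Mean sampling depth (cm) of a sample taken between top_cm and bottom_cm."""
    if top_cm < 0 or bottom_cm < top_cm:
        raise ValueError("expected 0 <= top_cm <= bottom_cm")
    return (top_cm + bottom_cm) / 2.0


depth_topsoil = mean_sampling_depth(0, 20)   # sample from 0-20 cm
depth_subsoil = mean_sampling_depth(30, 60)  # sample from 30-60 cm
```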

If only soil texture information is available for the predictions, the class PTFs from euptfv1 could be applied (Tóth et al., 2015).

We emphasize that:

the units of input soil properties (predictors) have to be the same as
indicated in the text and that the sand, silt, and clay are defined by the
following particle diameters: clay

when only specific water content values at saturation, field capacity or wilting point are required (i.e. THS, FC_2, FC, WP) it is recommended to use point PTFs. This is also true for the prediction of KS,

for AWC, the most accurate way is to predict FC and WP with the point predictions, first, and then compute AWC using Eq. (1), and similarly for AWC_2 using FC_2 and Eq. (2),

it is recommended to do the VG prediction if only moisture retention curve parameters are needed, and

the MVG prediction when both moisture retention and hydraulic conductivity parameters are required.
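The recommended route for plant available water content can be sketched as follows. Equations (1) and (2) are not reproduced in this text, so the sketch assumes the usual definitions AWC = FC − WP and AWC_2 = FC_2 − WP; the function name and the predicted values are hypothetical.

```python
def available_water_content(field_capacity, wilting_point):
    """Plant available water content (cm3 cm-3) as field capacity minus wilting point.

    Assumes the usual definition; both inputs are point predictions in cm3 cm-3.
    """
    return field_capacity - wilting_point


# Hypothetical point predictions for one sample (cm3 cm-3)
fc_pred, fc_2_pred, wp_pred = 0.30, 0.36, 0.12
awc = available_water_content(fc_pred, wp_pred)      # assumed form of Eq. (1)
awc_2 = available_water_content(fc_2_pred, wp_pred)  # assumed form of Eq. (2)
```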

Table 11 shows the recommended PTFs for each predicted soil hydraulic property and available predictor variables. Users need to check which basic soil properties are available for the predictions and then look up the recommended PTF in Table 11.
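The first step of this check, finding which PTFs can be applied with the available soil properties, can be sketched as follows. The dictionary contains only the PTFs whose predictor sets are stated in the text (PTF01 is inferred to use PSD and DEPTH only); it does not reproduce Table 11, and the recommendation for a given soil hydraulic property still has to be taken from that table. The names `PTF_PREDICTORS` and `applicable_ptfs` are hypothetical.

```python
# Predictor sets of the PTFs named in the text; the full list of 32 PTFs is in Table 1
PTF_PREDICTORS = {
    "PTF01": {"PSD", "DEPTH"},  # inferred from the text
    "PTF02": {"PSD", "DEPTH", "OC"},
    "PTF03": {"PSD", "DEPTH", "BD"},
    "PTF07": {"PSD", "DEPTH", "OC", "BD"},
    "PTF09": {"PSD", "DEPTH", "OC", "PH_H2O"},
    "PTF18": {"PSD", "DEPTH", "OC", "BD", "PH_H2O"},
    "PTF27": {"PSD", "DEPTH", "OC", "BD", "CACO3", "PH_H2O"},
    "PTF29": {"PSD", "DEPTH", "OC", "BD", "PH_H2O", "CEC"},
}


def applicable_ptfs(available):
    """Return the names of the listed PTFs whose predictors are all available."""
    available = set(available)
    return sorted(name for name, needed in PTF_PREDICTORS.items() if needed <= available)


usable = applicable_ptfs({"PSD", "DEPTH", "BD"})
```

Which of the applicable PTFs to use then depends on the predicted property, as summarised in Table 11.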

The algorithms have been implemented in a web interface to facilitate the use of the PTFs, where PTF selection is automated based on the soil properties available for the predictions and the required soil hydraulic property. The Code and data availability section provides information on how to access this resource.

The updated EU-PTFs – euptfv2 – perform significantly better than euptfv1
and are applicable for 32 predictor variable combinations. Uncertainties of
the predicted soil hydraulic properties and model parameters can be
computed. These uncertainties are, without further discrimination, related
to the considered input data, predictors and the applied algorithm. The
euptfv2 includes transfer functions to compute soil water content at
saturation (0 cm matric potential head), field capacity (both

The current version of euptfv2 is available from
a user friendly web interface:

The supplement related to this article is available online at:

BS, TKDW and MW conceptualized the study and designed the methodology. BS supervised the research. MW curated the EU-HYDI dataset. BS, TKDW and MW prepared scripts for the statistical analysis. BS carried out the formal analysis and visualization and coordinated building of the PTF web interface. TKDW and BS built the R package with contributions of MW. BS and TKDW performed the validation. BS and TKDW wrote the paper with considerable input from MW.

The authors declare that they have no conflict of interest.

We are grateful to the individual scientists and their institutions in Europe who contributed to the establishment of the EU-HYDI database. We thank Gergely Tóth and Luca Montanarella for supporting the initiative that led to the EU-HYDI database. On behalf of the János Bolyai Research Scholarship we are grateful for the use of the MTA Cloud (

This research has been supported by the Hungarian National Research, Development and Innovation Office (grant no. KH124765), the János Bolyai Research Scholarship of the Hungarian Academy of Sciences (grant no. BO/00088/18/4), and the German Research Foundation (grant no. SFB 1253/1 2017).

This paper was edited by Wolfgang Kurtz and reviewed by three anonymous referees.