Performance analysis of regional AquaCrop (v6.1) biomass and surface soil moisture simulations using satellite and in situ observations

The current intensive use of agricultural land is affecting land quality and contributes to climate change. Feeding the world’s growing population under changing climatic conditions demands a global transition to more sustainable agricultural systems. This requires good insight into land cultivation practices at the field to global scale. This study outlines a spatially distributed version of the field-scale crop model AquaCrop version 6.1, to simulate agricultural biomass production and soil moisture variability over Europe at a relatively fine resolution of 30 arcseconds (~1 km). A highly efficient parallel processing system is implemented to run the model regionally with global meteorological input data from the Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2), soil textural information from the Harmonized World Soil Database, version 1.2 (HWSDv1.2), and generic crop information. Daily crop biomass production is evaluated with the Copernicus Global Land Service dry matter productivity (CGLS-DMP) data. Surface soil moisture is compared against NASA Soil Moisture Active Passive surface soil moisture (SMAP-SSM) retrievals, the Copernicus Global Land Service surface soil moisture (CGLS-SSM) product derived from Sentinel-1, and in situ data from the International Soil Moisture Network (ISMN). Over central Europe, the regional AquaCrop model is able to capture the temporal variability in both biomass production and soil moisture, with spatial mean correlations of 0.8 (CGLS-DMP), 0.74 (SMAP-SSM) and 0.52 (CGLS-SSM), respectively. The higher performance when evaluating with SMAP-SSM compared to Sentinel-1 CGLS-SSM is largely due to the lower quality of CGLS-SSM satellite retrievals under growing vegetation. The regional model further captures the interannual variability, with a mean anomaly correlation of 0.46 for daily biomass, and mean anomaly correlations of 0.65 (SMAP-SSM) and 0.50 (CGLS-SSM) for soil moisture. It is shown that soil textural characteristics and irrigated areas influence the model performance. Overall, the regional AquaCrop model proves to be useful in assessing crop production and soil moisture at various scales and could serve as a bridge between point-based and global models.

https://doi.org/10.5194/gmd-2021-98 Preprint. Discussion started: 17 May 2021. © Author(s) 2021. CC BY 4.0 License.


Introduction
Over the past 60 years, global agricultural production has more than tripled (FAO, 2017). This became possible through productivity-enhancing technologies, industrialization and expansion of agricultural land. However, the current intensive use of cropland is resulting in reduced land quality and increased greenhouse gas emissions, which in turn impact agricultural systems (Foley et al., 2011; Kopittke et al., 2019). To meet the future crop demand of a rapidly growing population, while minimizing the ecological footprint and increasing crop resilience to changing climatic conditions, there is an urgent need to adopt more effective and sustainable land cultivation practices (Aznar-Sánchez et al., 2019; Pingali, 2012; Raes and Vanuytrecht, 2017).
To evaluate the effect of environmental conditions and different management practices on crop production, a variety of models exist that simulate the biophysiological growth of crops at the field scale. An overview of 70 such crop models is given by Di Paola et al. (2016). Some of these point-based crop models have more recently been upscaled and assessed at a regional to global level (Balkovic et al., 2013; Boogaard et al., 2013; De Wit and van Diepen, 2007; Liu et al., 2007; Müller et al., 2018; Nichols et al., 2011; Resop et al., 2012; Roerink et al., 2012; Stöckle et al., 2014). Large-scale crop models are a valuable asset in providing information to policy makers and for applications in climate scenario analyses (Asseng et al., 2013; Iizumi et al., 2018). A downside of upscaled crop models, especially at the global scale, is that they often suffer from the generalization of input data, resulting in a higher bias and loss of information at smaller scales. Some studies have attempted to reduce such errors by assimilating satellite observations in a crop modelling system (Mladenova et al., 2019). The AgMIP Global Gridded Crop Model Intercomparison (GGCMI) is a framework initiated to overcome this issue. It is built on a large group of crop modelling researchers who combine and intercompare a set of upscaled point models or global gridded crop models to assess and reduce the bias and uncertainties at a global level (Müller et al., 2017). However, the transition between plant, field, and regional scales in agricultural modelling remains a challenge, and more insight is needed into agricultural crop responses at various spatial and temporal scales with different levels of agricultural practices. To this end, a high-resolution regional crop model can serve as a bridge between the point and global level, combining information from various scales.
This study presents a methodology to apply the field-scale AquaCrop model efficiently over a large region at any spatial resolution. AquaCrop was developed by the FAO to estimate responses of herbaceous crops to water (Steduto et al., 2009). It differs from most other crop models by its low requirement of detailed input data, as it aims for a balance between simplicity, accuracy and robustness. The model has been applied in numerous studies for various crop types and environmental conditions and shows good results in simulating crop biomass and yield, especially when calibrated for local field conditions (Abedinpour et al., 2012; Geerts et al., 2009; Maniruzzaman et al., 2015; Razzaghi et al., 2017; Sandhu and Irmak, 2019). Earlier spatially distributed versions of AquaCrop were developed by e.g. Lorite et al. (2013), Sallah et al. (2019) and Huang et al. (2019), using a Geographic Information System or batch processing with remote sensing data input. Some challenges of existing distributed AquaCrop systems are related to the limited scalability and high computational cost when they are applied to any large domain at any resolution, the difficulty of adapting to new AquaCrop model versions, the limitations in the upscaling of crop parameters from the plant or field to the grid scale (Han et al., 2020), or the availability of other suitable spatially distributed parameters or input information. Applications of the AquaCrop model at a continental scale exist, but are very limited (Dale et al., 2017) and have so far only been used in combination with coarse spatial resolutions. To the best of our knowledge, no study has yet reported on high-resolution and large-scale (beyond country level) applications of AquaCrop.
The main objective of this research is to assess whether a high-resolution regional AquaCrop model can capture seasonal, inter-annual and short-term temporal variability, as well as the spatial variability, of biomass and soil moisture, when using global spatially distributed input data about soil texture and meteorology and assuming a generic crop. The model performance will be evaluated over Europe at a spatial resolution of 30 arcseconds (1/120°, ~1 km at the equator), using satellite products derived from both optical and microwave sensors and in situ measurements.
The structure of the paper is as follows: Sections 2 and 3 cover the methodology, with a description of the regional AquaCrop model, the evaluation datasets and performance metrics. In Section 4 the results are presented and discussed, followed by a conclusion in Section 5.

2 The regional gridded AquaCrop model

AquaCrop
AquaCrop is a crop-water productivity model that translates, on a daily basis, the simulated amount of crop transpiration into a proportional amount of biomass. The relation between transpiration and biomass production is defined by a Water Productivity (WP) factor:

$$B = WP^{*} \cdot \sum \frac{Tr}{ET_0}$$

where B (ton ha⁻¹) is the cumulative biomass produced, WP* (g m⁻²) is the WP factor normalized for atmospheric CO2 (369.41 ppm for the year 2000) and for climate, and Tr (mm day⁻¹) is the transpiration, also normalized for climate after division by the reference evapotranspiration, ET0 (mm day⁻¹). Because of this normalization, the WP* factor only significantly differs between C3 and C4 crops, where C4 crops have a higher WP* due to a more efficient carbon assimilation process. The calculation of Tr depends on ET0, the adjusted green canopy cover (CC*; -), the crop transpiration coefficient (Kc,tr; -) and the soil water stress coefficient (Ks; -).
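As a minimal sketch of how the WP* factor converts normalized transpiration into biomass, the function below sums the daily ratio Tr/ET0 and scales it by WP*. The function name, the WP* value of 17 g m⁻² and the three-day input series are illustrative assumptions and are not taken from the generic crop file used in this study.

```python
def cumulative_biomass(wp_star_g_m2, tr_mm_day, et0_mm_day):
    """Cumulative biomass (ton ha^-1) computed as WP* times the sum of Tr/ET0."""
    # Sum the daily transpiration normalized by reference evapotranspiration;
    # days with zero ET0 are skipped to avoid division by zero.
    normalized_tr = sum(
        tr / et0 for tr, et0 in zip(tr_mm_day, et0_mm_day) if et0 > 0
    )
    biomass_g_m2 = wp_star_g_m2 * normalized_tr
    # Unit conversion: 1 g m^-2 equals 0.01 ton ha^-1.
    return biomass_g_m2 * 0.01


# Illustrative three-day series (hypothetical values).
tr = [2.0, 3.0, 4.0]    # transpiration, mm day^-1
et0 = [4.0, 4.0, 5.0]   # reference evapotranspiration, mm day^-1
biomass = cumulative_biomass(17.0, tr, et0)
```

In this toy case the normalized transpiration sums to 2.05, so the cumulative biomass is 17 × 2.05 × 0.01 = 0.3485 ton ha⁻¹.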
In AquaCrop, the water reservoir in the root zone is balanced by incoming fluxes of precipitation (minus runoff) and potentially irrigation and capillary rise, and outgoing fluxes of evaporation and deep percolation. To calculate the soil water balance, AquaCrop divides the soil profile into multiple compartments over the total soil layer. Downward flow over these compartments is described by an exponential drainage function based on the volumetric water content in compartment i (θi) within the soil layer and the drainage characteristics of the soil layer (Raes et al., 2006):

$$\frac{\Delta\theta_i}{\Delta t} = \tau_i \, (\theta_{sat} - \theta_{FC}) \, \frac{e^{\theta_i - \theta_{FC}} - 1}{e^{\theta_{sat} - \theta_{FC}} - 1}$$

where ∆θi/∆t is the decrease in water content over time (m³ m⁻³ d⁻¹), θFC and θsat are the volumetric moisture content at field capacity and at saturation (i.e. the porosity) of the soil layer, and τi is the drainage coefficient derived from the saturated hydraulic conductivity (Ksat). Upward flow by capillary rise is estimated based on the depth of the groundwater table and hydraulic characteristics of the soil layers. Since no groundwater table is implemented in the regional version of the model in this paper, capillary rise is not included in the simulations. Soil evaporation is based on the soil wetness and crop cover (Ritchie, 1972) and water extraction by roots is described with the sink term from Feddes (1982). Because the root density for most crops is highest near the soil surface and decreases with increasing soil depth, the water extraction pattern by roots is simulated as follows: 40/30/20/10% for the upper quarter to the lowest quarter of the root zone. The estimated water retained in the root zone that is available to the plants (Wr) at each daily timestep is described by the fraction of total available water (TAW) after subtraction of depleted water (Dr). TAW is the difference in volumetric moisture content between field capacity (θFC) and wilting point (θWP) over the root zone and is therefore dependent on soil texture and depth.

Plant stresses, such as water excess or water limitation, cold/heat air temperature stress, soil fertility and salinity stresses, can affect biomass production during different steps of the calculation procedure, depending on the process that is affected (i.e. canopy expansion, crop transpiration, pollination). The inclusion of stress factors is done by assigning unique thresholds to each of these biological processes (Raes et al., 2018).
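The drainage and root-zone water descriptions above can be illustrated with a short sketch. It assumes the exponential drainage form from the AquaCrop reference documentation (no drainage at or below field capacity, maximum drainage τ(θsat − θFC) at saturation) and a uniform root zone for TAW. The function names and the numerical inputs are hypothetical.

```python
import math


def drainage_rate(theta_i, theta_fc, theta_sat, tau):
    """Decrease in water content (m3 m-3 d-1) of one soil compartment."""
    if theta_i <= theta_fc:
        return 0.0  # no gravitational drainage at or below field capacity
    if theta_i >= theta_sat:
        return tau * (theta_sat - theta_fc)  # maximum rate at saturation
    numerator = math.exp(theta_i - theta_fc) - 1.0
    denominator = math.exp(theta_sat - theta_fc) - 1.0
    return tau * (theta_sat - theta_fc) * numerator / denominator


def root_zone_water(theta_fc, theta_wp, root_depth_m, depletion_mm):
    """Return (TAW, Wr) in mm, assuming a uniform root zone."""
    # TAW: water held between field capacity and wilting point over the root depth.
    taw = 1000.0 * (theta_fc - theta_wp) * root_depth_m
    # Wr: what remains of TAW after subtracting the root-zone depletion Dr.
    wr = max(taw - depletion_mm, 0.0)
    return taw, wr


# Hypothetical loam-like values: FC = 0.30, WP = 0.15, sat = 0.45 (m3 m-3).
rate = drainage_rate(0.40, 0.30, 0.45, tau=0.5)
taw, wr = root_zone_water(0.30, 0.15, root_depth_m=1.0, depletion_mm=50.0)
```

With these inputs TAW is about 150 mm and Wr about 100 mm, and the drainage rate falls between zero and the saturation maximum of 0.075 m³ m⁻³ d⁻¹.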

Regional model structure
For the regional AquaCrop model, the simulation unit of a single field was replaced by a 30 arcsecond (~1-km) resolution pixel, and input and output were defined independently for each pixel. The system can easily be set up for any given resolution over any domain. The AquaCrop input data are categorized into several components, e.g. climate, soil, vegetation and management. For each component, parameters are described in a text file with a specific file extension that is recognized by the model. A Project Management (PRM) file oversees all the information for a single field (or pixel) and contains paths and names of these input files. This PRM file is read and executed by AquaCrop, after which an output file is created.
The original Delphi source code of AquaCrop v6.1 was minimally adjusted and compiled on the Linux-based High-Performance Computer (HPC) of the Vlaams Supercomputer Centrum (VSC), and the resulting executable was plugged into a Python wrapper to allow massively parallel simulations and thereby optimize the model efficiency over larger spatial domains. The current system allows for easy implementation of later versions of AquaCrop. The regional input files for the climate and soil are prepared before model execution. The Python wrapper then creates the PRM file for a pixel as a first step of the model run, after which the AquaCrop model is executed and time series output is stored in a new folder for each pixel. The PRM files are created right before the model execution so that changes to a project can be made quickly. With this set [...]

[...] (θsat), and the saturated hydraulic conductivity (Ksat). These parameters are available for a top layer (0-30 cm) and a deeper layer (30-100 cm). The total soil depth available for root development is taken from the 1-km resolution root-zone depth map of the European Soil Data Centre (ESDAC) (Hiederer, 2013). In case the root-zone depth was less than or equal to 30 cm, only the top layer parameters were considered. The soil properties of the deeper layer were included when the root-zone depth was deeper than 30 cm. Stoniness and soil salinity were not considered. The modelled soil moisture was initialized at θFC on 1 January (winter in Europe) for all simulations.
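The per-pixel workflow described above (write the PRM file, run the executable, store the output in a pixel-specific folder) could be organized roughly as in the sketch below. This is not the authors' wrapper: the function names, the folder layout, the file extensions and the contents of the placeholder project file are all hypothetical, the real PRM format is not reproduced, and a thread pool stands in for the HPC job distribution.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_prm(pixel_id, input_dir, run_dir):
    """Write a placeholder project file that lists the pixel's input files."""
    run_dir.mkdir(parents=True, exist_ok=True)
    prm_path = run_dir / f"{pixel_id}.PRM"
    # Placeholder layout only: this is NOT the real AquaCrop PRM format.
    lines = [str(input_dir / f"{pixel_id}.{ext}") for ext in ("CLI", "SOL")]
    prm_path.write_text("\n".join(lines) + "\n")
    return prm_path.resolve()


def run_pixel(pixel_id, input_dir, output_root, command):
    """Create the project file, run the model command, return the exit code."""
    run_dir = output_root / pixel_id
    prm_path = write_prm(pixel_id, input_dir, run_dir)
    # `command` is the (hypothetical) call to the compiled model executable.
    result = subprocess.run(
        command + [str(prm_path)], cwd=run_dir, capture_output=True, text=True
    )
    return pixel_id, result.returncode


def run_region(pixel_ids, input_dir, output_root, command, workers=4):
    """Run all pixels concurrently; each pixel is an independent simulation."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda pid: run_pixel(pid, input_dir, output_root, command),
            pixel_ids,
        )
        return dict(results)
```

A call such as `run_region(ids, Path("input"), Path("output"), ["./aquacrop"])` would then return a dictionary of exit codes per pixel, where `./aquacrop` is a placeholder for the compiled executable. Because pixels do not exchange information, the same pattern scales to separate HPC jobs.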
A soil fertility stress parameter in the field management file provides an indication of the overall soil quality. The default of this parameter is 0%, corresponding to unlimited soil fertility. Since this situation is very rare in real fields, even for well-maintained land, the value was set to 30% after initial model evaluation. With this reduction in soil fertility, a good to moderate crop production over the entire domain can be simulated in the absence of water stress.
A single crop file was created to simulate crop production over Europe. Since the focus of this research is on biomass and not on yield, a single generic reference crop was parameterized. Spatial and temporal gaps of information at the ~1-km resolution prevent the inclusion of a more detailed crop parameterization, and the results will confirm the adequacy of the generic crop file. This file was minimally tuned after visual model evaluation and quantitative comparisons against satellite-based dry matter productivity (DMP, see below; Smets et al., 2019). The generic reference crop was developed to simulate annual biomass development of C3 crops, which are predominantly found in temperate climates, as opposed to C4 crops that are more common in hot and dry climatological zones (Monfreda et al., 2008; Still et al., 2003). The crop was simulated as a transplant, assuming a small presence of vegetation even during winter, and with an annual cycle of 365 days, starting on the first of January.
Because of this fixed cycle, the canopy development had to be simulated in calendar days. This results in an error in the simulation of canopy expansion during cold periods, but because growing degree days are considered in the simulation of crop transpiration, the reduced biomass production in these periods is still captured. The generic crop file is mostly suitable to simulate canopy development during the spring and summer season. The main crop parameters are presented in Table 1.
The model was exclusively run for dominantly rainfed areas, based on the land use map of the CORINE Land Cover inventory (Büttner, 2014) for the year 2012. This dataset is available at 100-m resolution and was aggregated to 30 arcseconds. If 50 or more of the 100-m classes within the aggregated pixel were identified as non-irrigated agriculture, the pixel was included in the regional AquaCrop model simulations.
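The rainfed screening can be sketched as a block count over the 100-m land cover grid. The example assumes that a 10 × 10 block of 100-m cells approximates one 30-arcsecond pixel and uses class code 211 as the label for non-irrigated agriculture. Both are simplifying assumptions for illustration, and the function name is hypothetical.

```python
def rainfed_mask(classes_100m, rainfed_codes, block=10, min_count=50):
    """Flag coarse pixels with at least `min_count` rainfed 100-m cells.

    classes_100m  -- 2D list of land cover class codes at 100-m resolution
    rainfed_codes -- set of codes counted as non-irrigated agriculture
    block         -- number of 100-m cells per coarse pixel side (assumed 10)
    """
    n_rows = len(classes_100m) // block
    n_cols = len(classes_100m[0]) // block
    mask = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Count rainfed cells inside the block that forms one coarse pixel.
            count = sum(
                1
                for i in range(r * block, (r + 1) * block)
                for j in range(c * block, (c + 1) * block)
                if classes_100m[i][j] in rainfed_codes
            )
            row.append(count >= min_count)
        mask.append(row)
    return mask


# Toy grid of 10 x 20 cells: the left block holds 60 rainfed cells, the right 40.
grid = [
    [211 if (j < 10 and i < 6) or (j >= 10 and i < 4) else 311 for j in range(20)]
    for i in range(10)
]
mask = rainfed_mask(grid, {211})
```

Only the left coarse pixel passes the threshold in this toy grid, so it alone would be simulated.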

CGLS-DMP
To evaluate simulations of daily biomass production, the ~1-km dry matter productivity product from the Copernicus Global Land Service (CGLS-DMP; kg ha⁻¹ day⁻¹) was used (Smets et al., 2019). The CGLS-DMP is based on a simplified Monteith (1972) approach that makes use of the fraction of absorbed photosynthetically active radiation (fAPAR), which is derived from the optical satellite missions Satellite Pour l'Observation de la Terre (SPOT; 1999-2014) and Project for On-Board Autonomy - Vegetation (PROBA-V; 2014-present). The CGLS-DMP data are provided in 10-daily time steps from 1999 up to the present, where each value is representative of the past 10 days. To compare the data with the AquaCrop biomass, a nearest-neighbour function was used to spatially match the gridded simulations to the grid of CGLS-DMP, and the median of the modelled daily biomass production was computed for the corresponding 10-daily intervals of the CGLS-DMP products. Since the crop parameterization in AquaCrop is suited to simulate the main growing season, the months November through February were not included in the biomass evaluation.
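The temporal matching can be sketched as a grouping of daily model output into 10-daily medians. The sketch assumes that each month is split into three intervals (days 1-10, 11-20 and 21 to month end), which may differ from the exact product definition. The function names are hypothetical and the daily values are synthetic.

```python
from datetime import date, timedelta
from statistics import median


def dekad_key(day):
    """Return (year, month, dekad index 0-2) for a calendar date."""
    # Days 1-10 -> 0, 11-20 -> 1, 21 to month end -> 2 (assumed convention).
    return (day.year, day.month, min((day.day - 1) // 10, 2))


def dekadal_medians(daily, skip_months=(11, 12, 1, 2)):
    """Median of daily values per 10-daily interval, outside skipped months."""
    groups = {}
    for day, value in daily.items():
        if day.month in skip_months:
            continue  # November through February are excluded
        groups.setdefault(dekad_key(day), []).append(value)
    return {key: median(values) for key, values in sorted(groups.items())}


# Synthetic daily biomass production for May 2017 plus one winter day.
daily = {date(2017, 5, 1) + timedelta(days=i): float(i + 1) for i in range(31)}
daily[date(2017, 1, 15)] = 99.0
medians = dekadal_medians(daily)
```

For this synthetic series the three May intervals give medians of 5.5, 15.5 and 26.0, and the January value is dropped.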

CGLS-SSM
AquaCrop surface moisture content, i.e. the output of soil moisture in the top compartment of the soil profile, was evaluated with a second CGLS product, i.e. the relative surface soil moisture (CGLS-SSM). CGLS-SSM provides data for the top few centimetres of the soil, available at the same ~1-km resolution as CGLS-DMP. This product is derived from the C-band radar onboard Sentinel-1, processed by the TU Wien (Bauer-Marschallinger et al., 2018), and available from October 2014 onwards.
Processing steps included geo-correction, radiometric calibration and normalization of the incidence angle. No correction was included for dynamics in vegetation or surface roughness. The data are provided as relative soil moisture estimates (%), which have to be multiplied by the porosity (θsat) to convert to absolute volumetric soil moisture contents (m³ m⁻³). The Sentinel-1 satellite has varying overpass densities, resulting in a slightly different number of data points in various areas, but the temporal resolution is generally between 3 and 8 days. To exclude data points for days on which the soil was nearly frozen, the soil temperature variable from MERRA-2 was used to identify and remove all data for which the soil temperature was below 4 °C, following the data masking recommended by e.g. Gruber et al. (2020). The CGLS-SSM product contains masks for areas where it cannot be applied, i.e. a water mask for pixels containing only water, a sensitivity mask for pixels with a low sensitivity (urban, rivers, dense forests) and a slope mask, screening out pixels with a topographic slope higher than 17°. Since the simulation domain was restricted to agricultural areas, there is an implicit extra quality screening of trivially inferior data.
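The unit conversion and temperature screening amount to a few lines. In this sketch the function name is hypothetical, missing retrievals are represented by `None`, and the porosity and temperature values are illustrative.

```python
def to_volumetric(ssm_percent, porosity, soil_temp_c, min_temp_c=4.0):
    """Convert relative soil moisture (%) to m3 m-3, masking cold soils."""
    # Drop missing retrievals and retrievals over nearly frozen soil.
    if ssm_percent is None or soil_temp_c < min_temp_c:
        return None
    # Relative saturation (0-100 %) scaled by the porosity (theta_sat).
    return ssm_percent / 100.0 * porosity


warm = to_volumetric(50.0, porosity=0.45, soil_temp_c=10.0)
cold = to_volumetric(50.0, porosity=0.45, soil_temp_c=3.9)
```

A retrieval of 50% over a soil with a porosity of 0.45 m³ m⁻³ becomes 0.225 m³ m⁻³, while the same retrieval at a soil temperature of 3.9 °C is discarded.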

SMAP-SSM
Surface soil moisture simulations were further evaluated with retrievals from the NASA Soil Moisture Active Passive (SMAP) mission, from April 2015 onwards. More specifically, the enhanced level-2 radiometer half-orbit product, version 4, was used at 9-km resolution (O'Neill et al., 2020; Chaubell et al., 2020). The SMAP radiometer measures L-band brightness temperatures in vertical and horizontal polarization at an incidence angle of 40°. It scans the Earth's surface in a sun-synchronous orbit, with overpass times of 6:00 A.M. for the descending and 6:00 P.M. for the ascending mode, and with a temporal resolution of 2-3 days. The SMAP product provides three estimates of surface (~5 cm) soil moisture (m³ m⁻³), derived from different retrieval algorithms (O'Neill et al., 2020). The 'Single Channel Algorithm using vertical polarization' is the current baseline for SMAP soil moisture and was also used for AquaCrop evaluations.
SMAP data are projected onto the 9-km EASE grid version 2.0 (EASE2, Brodzik et al., 2012) and the AquaCrop soil moisture output was aggregated to this grid by simply averaging all ~1-km pixels belonging to the same EASE2 grid cell. Only cells that were at least 50% filled with AquaCrop output were included in the evaluation. The number of AquaCrop pixels per 9-km grid cell varies, depending on the location on the EASE2 grid. For SMAP-SSM, the recommended conservative quality control was applied, together with a MERRA-2-based temperature threshold of 4 °C to exclude nearly frozen soils.
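The aggregation to the coarser grid can be sketched as a grouped average with a fill-fraction check. The function name, the cell identifiers and the pixel counts below are hypothetical; in practice the mapping from each ~1-km pixel to its EASE2 cell would come from the grid definition.

```python
def aggregate_to_coarse(values, cell_ids, pixels_per_cell, min_fill=0.5):
    """Average fine-scale values per coarse cell, keeping well-filled cells.

    values          -- fine-pixel values, None where the model was not run
    cell_ids        -- coarse-cell identifier for each fine pixel
    pixels_per_cell -- total number of fine pixels located in each coarse cell
    """
    sums, counts = {}, {}
    for value, cell in zip(values, cell_ids):
        if value is None:
            continue
        sums[cell] = sums.get(cell, 0.0) + value
        counts[cell] = counts.get(cell, 0) + 1
    # Keep only cells where the share of pixels with model output is sufficient.
    return {
        cell: sums[cell] / counts[cell]
        for cell in sums
        if counts[cell] / pixels_per_cell[cell] >= min_fill
    }


# Toy example: cell "a" is half filled, cell "b" only a quarter filled.
values = [0.2, 0.4, None, None, 0.3, None, None, None]
cell_ids = ["a", "a", "a", "a", "b", "b", "b", "b"]
coarse = aggregate_to_coarse(values, cell_ids, {"a": 4, "b": 4})
```

Cell "a" is retained with a mean of about 0.3 m³ m⁻³ and cell "b" is excluded, mirroring the 50% fill criterion.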

In situ SSM
In situ soil moisture measurements up to 5 cm depth were taken from the International Soil Moisture Network (ISMN, Dorigo [...] were made available by partners of the SHui consortium and contributed data to three extra clustered pixels, resulting in a total of 45 evaluation points with CGLS-SSM.

Metrics
The regional model was run over a part of Europe (35°N-55°N, 10°E-20°E; and all ISMN sites), for the years 2011 through 2018. Daily simulated biomass and surface soil moisture (WC01 in AquaCrop) were evaluated. To assess the temporal performance of the AquaCrop model, the temporal Pearson correlation (R), anomaly correlation (anomR), bias and unbiased root mean square difference (ubRMSD) were calculated against satellite and in situ products. Comparing products with different spatial resolutions will always result in representativeness bias, which is especially acute when using in situ observations to evaluate pixel-scale estimates. Therefore, the focus of the evaluation will be on temporal variability, using the R, anomR and ubRMSD metrics. The anomR was computed to assess both the short-term and inter-annual variability of biomass and soil moisture compared to the satellite products only, for lack of sufficiently long records of in situ data. A multi-year climatology was computed and subtracted from the datasets to obtain anomalies. The climatology is built on 31-day moving window averages, requiring either a minimum of 3 10-daily CGLS-DMP estimates or a minimum of 10 instantaneous CGLS-SSM and SMAP-SSM observations within a 31-day window. The climatology of AquaCrop was computed using the same moving window and time period as the evaluation product. For surface soil moisture, only daily model output that matched the days of observations of the evaluation product was used, whereas for biomass evaluations, the data consisted of the matching 10-daily medians.
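The four metrics can be written down compactly. The sketch below is one plausible reading of the description above, assuming the climatology is a day-of-year mean over all years within a circular 31-day window and that model and observation series are already matched in time. All function names are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Temporal Pearson correlation (R) between two matched series."""
    mx, my = _mean(x), _mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var) if var > 0 else float("nan")


def bias(model, obs):
    """Mean difference, model minus observation."""
    return _mean([m - o for m, o in zip(model, obs)])


def ubrmsd(model, obs):
    """Root mean square difference after removing each series' mean."""
    mm, mo = _mean(model), _mean(obs)
    return math.sqrt(_mean([((m - mm) - (o - mo)) ** 2 for m, o in zip(model, obs)]))


def climatology(doys, values, window=31, min_count=10):
    """Multi-year mean per day of year within a circular moving window."""
    half = window // 2
    clim = {}
    for doy in range(1, 366):
        in_window = [
            v for d, v in zip(doys, values)
            if min(abs(d - doy), 365 - abs(d - doy)) <= half
        ]
        if len(in_window) >= min_count:  # skip sparsely sampled windows
            clim[doy] = _mean(in_window)
    return clim


def anomaly_correlation(model, obs, doys, window=31, min_count=10):
    """Correlation (anomR) of anomalies relative to each series' climatology."""
    clim_m = climatology(doys, model, window, min_count)
    clim_o = climatology(doys, obs, window, min_count)
    pairs = [
        (m - clim_m[d], o - clim_o[d])
        for m, o, d in zip(model, obs, doys)
        if d in clim_m and d in clim_o
    ]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])
```

Because the climatology is removed from each dataset separately, a constant offset between model and observations changes the bias but leaves R, anomR and ubRMSD unaffected.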
To further analyse possible influencing factors on the model performance, the FAO map 'Area Equipped for Irrigation' (AEI; Siebert et al., 2015) was used to identify areas that are occasionally irrigated and not necessarily captured by the irrigation class of the CORINE Land Cover inventory, which only considers regularly irrigated areas. The available 1-km and 10-km AEI map versions were used to stratify correlation values with CGLS-DMP and with SMAP-SSM, respectively.

Biomass
A visual comparison of simulated and satellite-based biomass at different days in the year 2017 is presented in Fig. 1 and gives an indication of the spatial performance of the regional AquaCrop model against the CGLS-DMP product. The figure shows that the model is able to capture large regional and temporal differences in biomass production, but the absolute values can differ between CGLS-DMP and the model. The coarser resolution MERRA-2 climate input is visible in the blocky pattern of the AquaCrop biomass maps. For the days in June and July, simulations over most of Italy ceased to produce biomass, whereas the CGLS-DMP product shows spatial variability in productivity. Water stress in the simulations brings crop production to a halt, which is not in agreement with the CGLS-DMP. This can be caused by an overestimation of water stress by the model, by unmodelled irrigation, or by the fact that the CGLS-DMP product does not account well for drought stresses.
Drought stress is indirectly included in the CGLS-DMP via the observed fAPAR, but could still lead to overestimations of DMP in drier periods (Smets et al., 2019).

The spatial variability in ubRMSD can be attributed to different factors that limit crop growth, which will be mostly cold temperatures in the North, and low soil water contents in the South. Across the domain the ubRMSD is 0.03 ton ha⁻¹ day⁻¹ and typically less than 20% of the amplitude in biomass production. The anomaly correlation is lower than the correlation, but still significant, with a mean of anomR = 0.46. The raw correlation includes the trivial agreement in the seasonal variability and is thus inevitably higher, whereas the anomaly correlation only evaluates short-term and interannual variability, as illustrated in [...]. Both the model and satellite data show anomalously high biomass production in June 2017, whereas anomalously low values are found in both datasets in June 2013. The short-term anomaly increments in biomass productivity also correspond well to the evaluation data, but for AquaCrop they are often more pronounced. Across the region, the lower anomaly correlation values can be partly associated with soil texture. In areas with typically sandy soils, defined by a low TAW and high Ksat, and a high rate of rainfall, such as northern Germany and Poland, soils can be quickly drained, resulting in water stress, thereby limiting crop growth, even during the main growth season. The effect of such stresses may not be observed in the CGLS-DMP, and will result in deviating interannual variabilities.

Surface moisture content
Surface soil moisture content was evaluated using three products at different scales: point measurements from ISMN and some additional sites in the HOAL catchment, 1-km CGLS-SSM and 9-km SMAP-SSM. Figure 3 shows the AquaCrop performance metrics against the satellite data. The spatial mean R and anomR values with SMAP retrievals are 0.74 and 0.65, respectively.
The anomR is especially high in the central part of Europe and decreases towards the North. Overall, AquaCrop is much better correlated with SMAP-SSM than with CGLS-SSM. The mean R and anomR values of AquaCrop with CGLS-SSM are 0.52 and 0.50, respectively. Several areas with higher elevations have lower correlation values (central Italy, eastern Alps).
When looking at the absolute values of the bias and ubRMSD, the evaluation of AquaCrop against CGLS-SSM is also far worse than that against SMAP-SSM, but the spatial pattern of the errors is similar for SMAP-SSM and CGLS-SSM. The spatial mean ubRMSD against SMAP-SSM is 0.05 m³ m⁻³, close to the global target product uncertainty of 0.04 m³ m⁻³ (Entekhabi et al., 2014). For both datasets, the larger errors correspond again to areas with sandy soils. ubRMSD values of 0.14 m³ m⁻³ and higher for CGLS-SSM correspond to outliers in the HWSDv1.2 classification of extremely sandy soils (93% sand). This soil class is characterised by very high Ksat and very low values for θWP and θFC, resulting in extremely low simulated available moisture content in the top layers. Because the low θWP is very close to the soil evaporation demand, the model is not able to simulate soil moisture correctly for the top layers for such short, daily timesteps. It is important to note that AquaCrop is a crop simulation model and this soil class is unrealistic for agricultural land. In future applications, it is recommended to limit the simulations to soils that are actually suitable to cultivate crops, or else to adapt the soil parameters.
Nonetheless, the poor performance against the 1-km Sentinel-1-based CGLS-SSM is in general not due to model shortcomings, but dominated by poor satellite retrievals, as will be discussed below. A comparison between in situ data, CGLS-SSM, SMAP-SSM and AquaCrop surface soil moisture at ISMN sites and 3 sites in the HOAL catchment is shown in Fig. 4. Across 45 in situ sites, the mean R value between AquaCrop and in situ soil moisture is 0.61 (Fig. 4a) and higher than the mean R value of 0.52 with CGLS-SSM (Fig. 4b). The mean ubRMSD between AquaCrop and in situ measurements is 0.06 m³ m⁻³, significantly lower than the mean between AquaCrop and CGLS-SSM (0.10 m³ m⁻³). The mean R between Sentinel-1 CGLS-SSM and in situ data is even lower, with a value of 0.42 and a mean ubRMSD of 0.11 m³ m⁻³ (Fig. 4c). The comparison with SMAP-SSM over 42 in situ sites shows that SMAP-SSM correlates significantly better with both AquaCrop (R = 0.81, ubRMSD = 0.05 m³ m⁻³) and in situ measurements (R = 0.69, ubRMSD = 0.05 m³ m⁻³), even though the product has a lower spatial resolution than CGLS-SSM. This is further illustrated in the time series at three locations presented in Fig. 5, where SMAP-SSM follows the pattern of in situ data well and slightly better than AquaCrop, whereas the [...]

For CGLS-SSM, lower observed soil moisture was often found for the months April, May and June, as can be seen in Fig. 5b and c. The poor correlation of CGLS-SSM during these months is most likely due to the fact that the Sentinel-1 backscatter signals are dynamically affected by changing vegetation during the growing season, but the soil moisture retrievals are only corrected with a static vegetation value for every day of the year. Furthermore, changes in surface soil roughness are not accounted for in the retrievals and could play an important role in the lower quality of the CGLS-SSM retrievals.
Figure 6 shows the spatial distribution of the R values of AquaCrop biomass and soil moisture with CGLS-DMP and SMAP-SSM, respectively, grouped into two percentage classes of AEI. In terms of biomass, higher R values between AquaCrop and CGLS-DMP (mean R = 0.81) are found for pixels where AEI < 10% than for areas where AEI ≥ 10% (mean R = 0.72). For soil moisture, the correlation with SMAP-SSM shows barely any difference between the AEI groups (AEI < 10%: mean R = 0.74; AEI ≥ 10%: mean R = 0.73). It should be noted that SMAP-SSM has a much smaller coverage than the CGLS-DMP, because SMAP-SSM is screened conservatively based on its quality flags. The results of this comparison suggest that, even if the simulations were limited to dominantly rainfed agricultural areas, a possible effect of irrigation on the correlation between the model and evaluation datasets should not be neglected.

Conclusions
In this paper, a spatially distributed version of the field-scale AquaCrop model v6.1 is presented and evaluated against various satellite data products and in situ data. The new regional AquaCrop infrastructure allows biomass and soil moisture to be simulated efficiently over large domains, owing to the massive parallelization of the gridded simulations. In this case study, the regional AquaCrop model is forced with meteorological input based on MERRA-2 re-analysis data, the soil information is extracted from the HWSDv1.2, and a generic crop is parameterized. Even when using coarse meteorological input data, the AquaCrop model is able to capture seasonal, interannual and short-term temporal dynamics of biomass over Europe at a fine [...]

[...] shows that both AquaCrop and SMAP-SSM agree better with in situ data (mean R = 0.61 and 0.69, respectively) than Sentinel-1 CGLS-SSM (mean R = 0.52). The lower performance of Sentinel-1 CGLS-SSM can be attributed to the static correction for vegetation, which causes soil moisture retrieval errors during the growing season, and the fact that there is no correction for surface roughness. For the evaluations with both SMAP and Sentinel-1 retrievals, soil characteristics influence the evaluation performance of the AquaCrop model. When certain soil characteristics are unsuitable for crop cultivation, such as a high Ksat, a very low θWP and a low TAW, soil moisture becomes inaccurately represented by the AquaCrop model, increasing the model error. At the same time, the errors in satellite retrievals might also differ for various soil textures.
Improvements to the regional AquaCrop model can be made in terms of higher resolution meteorological input data to better capture small-scale spatial differences, by revising the soil hydraulic parameters to better represent soil types used for agricultural land, and by introducing spatio-temporally varying crop parameters when such information becomes available. Overall, the current model is able to represent temporal and spatial differences at the field and regional scale well in both biomass production and surface soil moisture, requiring only easily accessible input data. The computationally efficient modelling system is ideal to foster future improvements in the spatial patterns in both soil moisture and biomass production via local parameter optimization based on historical records of satellite data, and improvements in the short-term and interannual temporal variations via sequential satellite data assimilation.

Code and data availability. The code and data needed to run the regional version of AquaCrop v6.1 on a Linux-based system are currently only available to the editors and reviewers. After acceptance, a repository will be made publicly available on Zenodo with a citable DOI. Apart from the code, this repository will include the generic crop file, the management file and ancillary soil data from De Lannoy et al. (2014). All other input data and evaluation datasets are freely available, except for the in situ measurements of the HOAL experiment site. Please visit the following links for data access. http://www.fao.org/aquastat/en/geospatial-information/global-maps-irrigated-areas.

Author contributions. SDR created the code to execute the regional version of the model, prepared the input data and conducted the model evaluation. GDL prioritised the main steps taken in the paper, provided supervision and scientific guidance throughout all research advances and manages HPC usage. DR provided scientific guidance regarding the use and interpretation of the AquaCrop model, developed the generic crop file, and provided the Delphi source code of AquaCrop v6.1. SDR wrote the paper and all authors contributed.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. The authors would like to thank the HPC VSC team, in particular Geert-Jan Bex and Martijn Oldenhof, for their help during the AquaCrop compilation on the VSC HPC. We would also like to thank Peter Strauss and Gerhard Rab from Vienna University of Technology (TU Wien) for sharing their data from the HOAL experiment site, and Stefan Siebert for providing a 1-km map with area equipped for irrigation.

Financial support. This research is conducted as part of the H2020 project SHui, which stands for "Soil Hydrology research platform underpinning innovation to manage water scarcity in European and Chinese cropping systems". SHui is co-funded by the European Union Project GA 773903 and the Chinese MOST.