Parallel gridded simulation framework for DSSAT-CSM (version 4.7.5.21) using MPI and NetCDF

The Decision Support System for Agrotechnology Transfer Cropping Systems Model (DSSAT-CSM) is a widely used crop modeling system that has been integrated into large-scale modeling frameworks. Existing frameworks generate spatially-explicit simulated outputs at grid points through an inefficient process of translation from binary, spatially-referenced inputs to point-specific text input files, followed by translation and aggregation back from point-specific text output files to binary, spatially-referenced outputs. The main objective of this paper was to document the design and implementation of a parallel gridded simulation framework for DSSAT-CSM. A secondary objective was to provide preliminary analysis of execution time and scaling of the new parallel gridded framework. The parallel gridded framework includes improved code for model-internal data transfer, gridded input/output with the Network Common Data Form (NetCDF) library, and parallelization of simulations using the Message Passing Interface (MPI). Validation simulations with the DSSAT-CSM-CROPSIM-CERES-Wheat model revealed subtle discrepancies in simulated yield due to the rounding of soil parameters in the input routines of the standard DSSAT-CSM. Utilizing NetCDF for direct input/output produced a 3.7- to 4-fold reduction in execution time compared to text-based input/output. Parallelization improved execution time for both versions, with a 12.2-fold (standard version) to 13.4-fold (parallel gridded version) speedup when comparing 1 to 16 compute cores. Estimates of the percentage of parallelized computation ranged from 97.3 (parallel gridded version) to 99.2 percent (standard version), indicating potential for scaling to higher numbers of compute cores.


Gridded Input and Output
Gridded input and output for the PG version of DSSAT-CSM relies on an interface to Network Common Data Form (NetCDF), a set of software libraries that define self-describing, portable data formats that support the creation, access, and sharing of array-oriented scientific data (Unidata, 2017). The DSSAT-CSM NetCDF interface is defined in two Fortran modules: nf90_file_module and dssat_netcdf. The nf90_file_module provides a derived type for NetCDF files that contains type-dependent low-level utility functions/subroutines for creating and manipulating NetCDF files, including reading and writing scalar and array variables of real, integer, and character types. The dssat_netcdf module provides extensions to this basic derived type and associated type-dependent utility functions/subroutines that are specific to the various types of input files required for DSSAT-CSM. Thus, the dssat_netcdf module operates as a higher-level interface that DSSAT-CSM model developers would interact with, while leaving the more mundane details of communicating with the NetCDF libraries to be handled by the nf90_file_module.
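To make this division of labor concrete, the following is a minimal sketch of the kind of low-level wrapper that nf90_file_module provides. The derived-type name, components, and procedure names here are illustrative assumptions, not the actual DSSAT-CSM code; only the underlying netcdf-fortran calls (nf90_open, nf90_inq_varid, nf90_get_var, nf90_close) are real library routines.

```fortran
! Illustrative sketch only: a cut-down wrapper in the spirit of
! nf90_file_module. Type and procedure names are assumptions.
module nf90_file_sketch
  use netcdf
  implicit none

  type :: nf90_file
     integer :: ncid = -1
     character(len=256) :: file_name = ''
   contains
     procedure :: open_file
     procedure :: read_real_scalar
     procedure :: close_file
  end type nf90_file

contains

  subroutine open_file(self, file_name)
    class(nf90_file), intent(inout) :: self
    character(len=*), intent(in) :: file_name
    self%file_name = file_name
    call check(nf90_open(file_name, NF90_NOWRITE, self%ncid))
  end subroutine open_file

  subroutine read_real_scalar(self, var_name, value)
    class(nf90_file), intent(in) :: self
    character(len=*), intent(in) :: var_name
    real, intent(out) :: value
    integer :: varid
    call check(nf90_inq_varid(self%ncid, var_name, varid))
    call check(nf90_get_var(self%ncid, varid, value))
  end subroutine read_real_scalar

  subroutine close_file(self)
    class(nf90_file), intent(inout) :: self
    call check(nf90_close(self%ncid))
  end subroutine close_file

  subroutine check(status)
    integer, intent(in) :: status
    if (status /= NF90_NOERR) then
       print *, trim(nf90_strerror(status))
       stop 1
    end if
  end subroutine check

end module nf90_file_sketch
```

A caller would then declare type(nf90_file) :: nc, call nc%open_file('soil.nc'), and read named variables without touching NetCDF file or variable identifiers directly, which is the convenience the higher-level dssat_netcdf module builds upon.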
An example program using the dssat_netcdf module to read DSSAT-CSM FileX, soil, weather, and genotype-specific parameter inputs from NetCDF files is given in Appendix B and is illustrated visually in Figure 2. This example program also makes use of several other Fortran modules, including one that implements a data structure (ordered_array) that stores its elements in increasing order and one that provides utility functions and subroutines for reading command line arguments (dssat_cmd_arg). Following the weather file, an example of reading the NetCDF genotype parameter file is provided (Appendix B). Setting the file name and opening the NetCDF file follows the same procedure as for the other file types. However, instead of setting latitude and longitude coordinates (as is done for soil and weather files), the cultivar code is set by calling the type-bound subroutine nc_gen%set_cul_eco(). Following this call, the cultivar parameter P1V is read from the file using the type-bound subroutine nc_gen%read_cul(). Similarly, the ecotype parameter PARUE is read from the file using the type-bound subroutine nc_gen%read_eco(), and the species parameter TRLFG is read using the type-bound subroutine nc_gen%read_spe(). More specifics on the NetCDF genotype parameter file structure are provided in Section 2.4.
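The calling sequence just described can be summarized schematically as below. The type-bound procedure names come from the text above, but the declared type name, the setter calls, the argument lists, and the cultivar code are assumptions; the full, authoritative program is in Appendix B.

```fortran
! Schematic of the genotype-parameter reading sequence described above.
! Type name, argument lists, and cultivar code are assumptions.
program read_gsp_sketch
  use dssat_netcdf   ! provides the NetCDF genotype-file derived type
  implicit none
  type(nc_gen_type) :: nc_gen        ! assumed type name
  real :: p1v, parue, trlfg(6)

  call nc_gen%set_file_name('genotype.nc')  ! assumed setter
  call nc_gen%open()
  call nc_gen%set_cul_eco('IB0488')    ! hypothetical cultivar code
  call nc_gen%read_cul('P1V', p1v)     ! cultivar parameter
  call nc_gen%read_eco('PARUE', parue) ! ecotype parameter
  call nc_gen%read_spe('TRLFG', trlfg) ! species parameter
  call nc_gen%close()
end program read_gsp_sketch
```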
In addition to the development of the DSSAT-CSM NetCDF interface, existing code, including several input subroutines, was modified to use the NetCDF interface and ensure compatibility with it.

Parallelization
Parallelization for the PG version of DSSAT-CSM relies on the Message Passing Interface (MPI), a widely used specification that follows a message-passing parallel programming model especially suitable for distributed memory systems (Message Passing Interface Forum, 2015). The MPI specification was selected primarily due to its widespread usage and suitability for distributed memory systems (e.g., HPC clusters). Specifically, MPI has been used successfully in other applications for parallelizing crop models (Nichols et al., 2011; Kang et al., 2015; Jang et al., 2019; Vital et al., 2013). The message-passing parallel programming model allows multiple independent DSSAT-CSM instances to run concurrently, each of which manages its own memory and most of its input and output. This approach reduces the number of required modifications to the core DSSAT-CSM model code and thereby eases the burden of maintaining synchronization between the PG version and the model code maintained by the DSSAT core developer team.
Overall, the DSSAT MPI interface consists of a Fortran module (dssat_mpi) and a control program ("parent" process) that spawns multiple instances of the DSSAT-CSM ("child" processes), assigns a range of treatment numbers to each instance, receives simulated results from each instance, and writes a NetCDF format output file. Several changes were made to the DSSAT-CSM main program to open an MPI connection with the parent process, store simulated output for subsequent transfer, and transfer simulated output to the parent process. All communication between the parent process and each spawned child process is mediated through two derived types defined in the dssat_mpi module, namely mpi_parent and mpi_child. An example program illustrating the parent process is given in Appendix C and a corresponding example child process program is given in Appendix D. A visual illustration of the parent and child processes and communication between them is provided in Figure 3. The parent program begins by defining a comma-delimited list of variables and storing it in the mpi_parent%varlist variable and defining an output file name. The program then calls the type-bound subroutine mpi_parent%init(), which initiates an MPI connection by calling MPI_Init() as defined in the MPI specification.
Thereafter, the parent program sets several local variables that are used to spawn the DSSAT-CSM child processes. In the PG version of DSSAT-CSM, each of these variables, mpi_parent%varlist, and the output file name are supplied to the parent program as command line arguments. The type-bound subroutine mpi_parent%spawn_dssat_children() is then called, which constructs working directory names, command line arguments, and commands for each of the child DSSAT-CSM processes and calls MPI_Comm_spawn_multiple(), the MPI specification for spawning multiple child processes. The command line flag --MPI is included in the argument list for each of the child DSSAT-CSM processes, which signals to the executable to initiate an MPI connection and connect to the parent program. The mpi_parent%spawn_dssat_children() subroutine also uses MPI_Send() (the MPI specification for performing a blocking send) to transfer the treatments assigned to each DSSAT-CSM child process, the output variable list, the run mode, and the crop code for simulation. These MPI data transfers are represented by a dashed arrow in Figure 3. After sending these data to each child process, the subroutine allocates an array of data structures (one element per child process) in which to store the simulated output from each child process. Each data structure (referred to as a variable registry) is then initialized using the comma-separated variable list contained in mpi_parent%varlist.
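The parent-side sequence can be sketched as follows. This is not the Appendix C program: the working-directory and argument-list construction is omitted, MPI_Comm_spawn() is used in place of MPI_Comm_spawn_multiple() for brevity, and the executable name, buffer layout, and treatment partitioning are assumptions; only the MPI routines themselves are standard.

```fortran
! Hedged sketch of the parent-side spawn-and-send sequence.
program parent_sketch
  use mpi
  implicit none
  integer, parameter :: n_children = 4
  integer :: ierr, i, intercomm
  integer :: trt_range(2)

  call MPI_Init(ierr)

  ! Spawn child copies of the DSSAT-CSM executable (name assumed); the
  ! real code passes per-child argv lists including the --MPI flag.
  call MPI_Comm_spawn('dscsm047', MPI_ARGV_NULL, n_children, MPI_INFO_NULL, &
                      0, MPI_COMM_SELF, intercomm, MPI_ERRCODES_IGNORE, ierr)

  ! Send each child its assigned range of treatment numbers (blocking send).
  do i = 1, n_children
     trt_range = [ (i-1)*100 + 1, i*100 ]   ! assumed partitioning
     call MPI_Send(trt_range, 2, MPI_INTEGER, i-1, 0, intercomm, ierr)
  end do

  ! ... receive variable registries from children, write NetCDF output ...

  call MPI_Finalize(ierr)
end program parent_sketch
```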
Appendix D provides an example child process that mimics the action of a DSSAT-CSM child process in the way in which it interacts with the parent process. The program begins by connecting to the parent process by calling the type-bound subroutine mpi_child%connect(), which calls MPI_Init(), MPI_Comm_rank() and MPI_Comm_get_parent() each once and MPI_Recv() multiple times to receive the elements of data sent from the parent process. The subroutine then uses the system() intrinsic subroutine to run the mkdir shell command with the path for the specific working directory assigned to that child process. The chdir() intrinsic subroutine is then called to move the child process into that working directory to begin running simulations.
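The child-side counterpart can be sketched as below. The actual logic lives in mpi_child%connect() within the dssat_mpi module; the buffer layout here is an assumption, and only the MPI routines are standard.

```fortran
! Hedged sketch of the child-side connection sequence described above.
program child_sketch
  use mpi
  implicit none
  integer :: ierr, rank, parent_comm
  integer :: trt_range(2)

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_get_parent(parent_comm, ierr)

  ! One of several receives in the real code, which also obtains the
  ! variable list, run mode, and crop code from the parent.
  call MPI_Recv(trt_range, 2, MPI_INTEGER, 0, 0, parent_comm, &
                MPI_STATUS_IGNORE, ierr)

  ! ... create and enter a rank-specific working directory (see the
  !     snippet at the end of this subsection), then run simulations
  !     for treatments trt_range(1) through trt_range(2) ...

  call MPI_Finalize(ierr)
end program child_sketch
```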
The reliance upon the mkdir shell command and the non-standard GNU gfortran extension chdir() subroutine makes the code less portable; thus, a few words on the matter are warranted here. Assigning child processes to their own specific working directories has been implemented to avoid competition between processes simultaneously attempting to write to identical output files. Although most direct writing of output to text files has been eliminated, occasional messages are still generated by various components across the DSSAT-CSM and written to standard files (e.g., WARNING.OUT). With all child processes running in the same directory, competition for these files can become problematic. The long-term intention is to eventually account for these sources of output and remove the need for invoking shell commands and non-standard subroutines.
Nevertheless, because most HPC clusters run some form of *nix operating system, the shell command strategy should not be problematic in most use cases. Likewise, the GNU compiler collection is available in most HPC environments. Users also have the option of customizing these two lines of code within the connect_to_parent() subroutine to fit their own compiler and operating system constraints.
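For reference, the two lines in question take roughly the following form; the directory name is illustrative, and both system() and chdir() are GNU gfortran extensions rather than standard Fortran.

```fortran
! Approximate form of the two non-portable lines discussed above.
program workdir_sketch
  implicit none
  character(len=32) :: workdir
  write(workdir, '(a,i0)') 'dssat_child_', 3  ! rank-specific name (assumed)
  call system('mkdir -p '//trim(workdir))     ! shell command: GNU extension
  call chdir(trim(workdir))                   ! GNU gfortran extension
end program workdir_sketch
```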

Once the child process has connected with its parent, the child initializes a variable registry to contain seasonal simulated output by calling seasonal_registry%csv_to_registry() with the mpi_child%varlist argument (Appendix D). This initializes a variable registry with an entry for each of the variables listed in mpi_child%varlist. The variable registry data structure is an instance of the derived type registry_type defined within the dssat_variable_registry module. Each registry entry contains pointers and storage arrays for both real and integer types: if the variable is of type real, then the real array is allocated; if the type is integer, then the integer array is allocated. To store values for a given model variable, one of the pointers (real or integer depending on the type of the variable) must also be associated with the address of the variable. In the example child program this is done by calling the seasonal_registry%set_target() subroutine first for a real variable rvar and then for an integer variable ivar (Appendix D). In the PG version of DSSAT-CSM, the seasonal_registry%set_target() subroutine is called in each subroutine where a variable of interest is defined. For example, within the CROPSIM-CERES-Wheat CSCER() subroutine there is a line that calls seasonal_registry%set_target('CWAM',CWAD), which associates the memory address of the CWAD state variable for aboveground biomass with the CWAM name within the variable registry. Once the addresses for all variables have been stored in the variable registry, the seasonal_registry%store() subroutine can be called at each iteration, and the current value at each of the associated memory addresses will be stored. An example of this is shown in Appendix D.

Once all simulations are complete, the simulated output must be gathered and written to the NetCDF output file. On the parent side, the grid coordinates for the output file are assembled into arrays ordered in increasing longitude and in latitude. Once this process is complete, the output NetCDF file is created by calling nf90_output%create() and calling nf90_output%add_dim(), nf90_output%add_var() and nf90_output%write_variable() for each coordinate variable (lat, lon, and season). These operations are summarized in the dashed box within the Parent Process box in Figure 3. Once the output file is set up, the parent program calls the mpi_parent%receive_registries() type-bound subroutine, which listens for transfer of simulated output from the DSSAT-CSM child processes in the form of a variable registry, as discussed above.
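The registry mechanism can be illustrated with a stripped-down analogue of the pattern (pointer association plus per-iteration copying). Everything below is an assumption-laden sketch, not the actual registry_type implementation; array sizing and error handling are omitted for brevity.

```fortran
! Stripped-down sketch of the variable-registry pattern described above.
! Names and internals are assumptions, not the actual registry_type code.
module registry_sketch
  implicit none

  type :: registry_entry
     character(len=16)    :: name = ''
     real,    pointer     :: rptr => null()  ! address of a real model variable
     integer, pointer     :: iptr => null()  ! or of an integer model variable
     real,    allocatable :: rvals(:)        ! stored values (real case)
     integer              :: n = 0
  end type registry_entry

contains

  ! Associate an entry with the memory address of a real model variable
  ! (analogous to seasonal_registry%set_target('CWAM', CWAD)). The actual
  ! argument passed as var must itself have the target attribute.
  subroutine set_target(entry, name, var)
    type(registry_entry), intent(inout) :: entry
    character(len=*), intent(in) :: name
    real, target :: var
    entry%name = name
    entry%rptr => var
    if (.not. allocated(entry%rvals)) allocate(entry%rvals(1000))
  end subroutine set_target

  ! Capture the current value at the associated address
  ! (analogous to seasonal_registry%store() called each season).
  subroutine store(entry)
    type(registry_entry), intent(inout) :: entry
    if (associated(entry%rptr)) then
       entry%n = entry%n + 1
       entry%rvals(entry%n) = entry%rptr
    end if
  end subroutine store

end module registry_sketch
```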
Once the DSSAT-CSM child processes complete all simulations and the parent program has received the simulated output, the parent program closes the MPI connection and invokes the rm shell command using the system() intrinsic subroutine to remove each of the directories previously created for the DSSAT-CSM child processes. The parent program then concludes by calling the type-bound subroutine nf90_output%write_netcdf() to write the values returned by the child processes to the NetCDF output file.

Input file conventions
The NetCDF interface described in Section 2.2 is unusable without properly formatted input files. Thus, this section will document the file structures required for use with the DSSAT-CSM NetCDF interface. Header information from the example FileX, soil, weather and genotype-specific parameter input NetCDF files is presented in Appendices E, F, G and H.

Weather inputs
The NetCDF weather data file (Appendix G) also has a relatively simple structure, with the latitude and longitude dimensions defined in the same way as for the NetCDF soil data file. The NetCDF weather file also contains a third dimension for time. The remaining variables follow the DSSAT standard nomenclature for weather data as described in the WEATHER.CDE file (available at https://github.com/palderman/dssat-csm-os/tree/gridded/Data/WEATHER.CDE).
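As an illustration of how such a file might be read, the sketch below pulls the daily series for one grid point with plain netcdf-fortran calls. The variable and dimension names (TMAX, DATE) and the file layout are assumptions based on the DSSAT nomenclature, not a documented schema; only the nf90_* routines are real library API.

```fortran
! Illustrative read of one grid point's daily maximum temperature from a
! gridded weather file with dimensions (lon, lat, time). Names assumed.
program read_weather_sketch
  use netcdf
  implicit none
  integer :: ncid, varid, dimid, ndays, i, j
  real, allocatable :: tmax(:)

  call check(nf90_open('weather.nc', NF90_NOWRITE, ncid))

  ! Length of the time dimension (name assumed)
  call check(nf90_inq_dimid(ncid, 'DATE', dimid))
  call check(nf90_inquire_dimension(ncid, dimid, len=ndays))
  allocate(tmax(ndays))

  i = 10; j = 20  ! longitude/latitude indices of the grid point of interest
  call check(nf90_inq_varid(ncid, 'TMAX', varid))
  call check(nf90_get_var(ncid, varid, tmax, start=[i, j, 1], &
                          count=[1, 1, ndays]))

  call check(nf90_close(ncid))

contains

  subroutine check(status)
    integer, intent(in) :: status
    if (status /= NF90_NOERR) then
       print *, trim(nf90_strerror(status))
       stop 1
    end if
  end subroutine check

end program read_weather_sketch
```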

Genotype-specific parameters
Finally, the genotype-specific parameter (GSP) NetCDF file structure is slightly more complex than the NetCDF soil and weather data files, although less so than the FileX. At present, the GSP NetCDF file is the only NetCDF input file that does not use latitude and longitude dimensions; instead, dimensions correspond to the lengths of the parameter variables. For example, a scalar species parameter is defined with dimension l1 (with length 1) because that parameter is a scalar value, and LAFS (leaf area senesced; also a species parameter) is defined with dimension l6 because it contains values for each of 6 growth stages. Cultivars and ecotypes in standard DSSAT format are specified using a unique 6-character code for each cultivar and ecotype. In the NetCDF GSP file these 6-character codes are prepended with either CUL (cultivar) or ECO (ecotype) to generate a unique name for a scalar integer that stores the index value for that particular cultivar or ecotype.

In order to compare performance of the text-based OS version to the PG version, a set of wrapper functions was written for the OS version in the R statistical programming language utilizing the ncdf4 (Pierce, 2019), Rmpi (Yu, 2002), and DSSAT (Alderman, 2020) R packages. The code for these wrapper functions is available at https://github.com/palderman/GridDSSAT. R was also used for processing simulated output and performing data analysis. The tidyverse R package (Wickham et al., 2019) was used for data manipulation, and the DiagrammeR (Iannone, 2020), ggplot2 (Wickham, 2016), and raster (Hijmans, 2020) R packages were used to generate figures. Curves for analyzing execution time were fit using the function nls() from the stats R package (R Core Team, 2020).

Input data sources
The grid used for simulations in this study was a 0.05° grid matching that of the Climate Hazards InfraRed Precipitation with Stations (CHIRPS) dataset (Funk et al., 2015) clipped to the boundaries of the state of Oklahoma, USA. The Oklahoma state boundary used for clipping was extracted from the TIGER/Line® database (United States Census Bureau, 2016).

Soil data
Gridded soil data were derived from the STATSGO2 soil database (Soil Survey Staff, 2017), the National Elevation Dataset (NED; Gesch et al., 2018), and the 2017 Wheat Frequency Layer from the Cropland Data Layer dataset (CDL; National Agricultural Statistics Service, 2017). The CDL data were used to construct a 30 m-resolution mask layer of potential wheat-growing areas by using the gdal_translate command-line utility (GDAL/OGR contributors, 2020) to convert the Wheat Frequency Layer from 8-bit integer to 32-bit integer and the gdalwarp utility to reproject the data into the Albers Equal-Area projection (the same projection as the STATSGO2 spatial data). The map unit key (MUKEY) for each grid point was extracted from the STATSGO2 database using the gdal_rasterize utility, and the gdal_translate utility was used to mark water land cover (STATSGO2 MUKEY 657964) as missing data. The gdal_calc.py utility was then used to apply the reprojected Wheat Frequency Layer as a mask to the rasterized map unit key data, producing a wheat-specific raster layer of map unit keys. The wheat-specific map unit keys were then reprojected to the World Geodetic System 84 (WGS84) coordinate system and spatially resampled to the 0.05° CHIRPS grid described above using gdalwarp. The spatial resampling was done using the mode resampling method, whereby the map unit key assigned to a given grid box was determined based on the most frequent map unit key within that grid box. The map unit key for each grid box was then used to extract the soil component and associated layer-specific data for each grid point from STATSGO2. Slope was calculated from the NED 1/3 arc-second resolution data using the terrain() function in the raster R package (Hijmans, 2020). The resulting slope data were then resampled to the 0.05° CHIRPS grid using the aggregate() function in the raster R package (Hijmans, 2020).

Weather data
The gridded weather data used for simulations were derived from data measured by the Oklahoma Mesonet (DOI: 10.15763/dbs.mesonet). The Oklahoma Mesonet, commissioned in 1994, is an automated network of 120 remote meteorological stations across Oklahoma (Brock et al., 1995; McPherson et al., 2007). Mesonet data are collected and transmitted to a central facility every 5 min, where they are quality controlled, distributed, and archived (Shafer et al., 2000). Daily summaries of near-surface cumulative solar radiation (MJ m⁻² d⁻¹) and rainfall (mm d⁻¹), average relative humidity (percent) and wind speed (km d⁻¹), and maximum and minimum temperature (°C) were calculated from 5-min data for each station. The daily summaries were merged with coordinates for each station as provided by the updatestn() function of the okmesonet R package (Allred et al., 2014), and the spatially-referenced daily data were interpolated by inverse distance weighting (IDW) to the 0.05° CHIRPS grid using the idw() function of the gstat R package (Pebesma, 2004; Gräler et al., 2016). Interpolation for a given grid point was performed using the nearest 5 Mesonet stations with an IDW power of 2. As mentioned above, header information for the final NetCDF weather file can be found in Appendix G.
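For clarity, IDW with a power of 2 computes a weighted mean in which each station's weight is the inverse square of its distance to the grid point. The following is a minimal sketch of that calculation, assuming the 5 nearest stations have already been selected (the study itself used gstat's idw(), not this code):

```fortran
! Minimal sketch of inverse-distance-weighted interpolation with power 2.
! z(i) is the value at station i; dist(i) is that station's distance to
! the grid point (assumed nonzero; a zero distance would mean the station
! coincides with the grid point and its value can be used directly).
module idw_sketch
  implicit none
contains
  real function idw_power2(z, dist) result(zhat)
    real, intent(in) :: z(:), dist(:)
    real :: w(size(z))
    w = 1.0 / dist**2          ! IDW weights, power of 2
    zhat = sum(w * z) / sum(w) ! weighted mean over the nearest stations
  end function idw_power2
end module idw_sketch
```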

Genotype-specific parameters
Execution time as a function of the number of compute cores was modeled as $T_t = T_p/N_c + T_s$, where $T_t$ is the total measured execution time, $T_p$ is the estimated time spent in parallelized code, $N_c$ is the specified number of compute cores, and $T_s$ is the estimated time spent in serial code.

The top panel of Figure 4 shows the difference in simulated winter wheat yield between the OS and PG versions of DSSAT-CSM for one simulated year. Although the majority of grid points had values at or near zero, deviations ranging from approximately -150 to 150 kg ha⁻¹ were readily evident in the simulated output. The grid points at which these deviations occurred varied depending on the season of simulation, but the deviations were of a similar order of magnitude across seasons. In investigating possible sources of this discrepancy, the PG-MI version of DSSAT-CSM was implemented and run for the same simulation set. The lower panel of Figure 4 shows the difference in simulated yield between the OS and PG-MI versions.
The fact that all PG-MI simulated yields were within 0.5 kg ha⁻¹ of the OS simulated yields indicates that the rounding of soil data was the primary cause of the differences in yield observed in the top panel. The DSSAT-CSM has several options for how soil profile data are handled for simulation. When the FileX variable MESOL is set to 1 (as was the case in this study), fixed depths for soil layers are used. If the depths for soil layers in the soil input file do not match these soil layers, the data from the input file are interpolated to the fixed soil layer depths, and the resulting soil parameter values are rounded by the input routines of the standard DSSAT-CSM. This subtle difference in values for soil variables does not generally result in large effects on the simulated output. However, in some seasons at some grid points when limited rainfall occurred, the differences in soil variables were large enough to cause detectable differences in simulated yield. Once modifications are made to DSSAT-CSM that avoid writing soil inputs to intermediate text files (such as those documented in Section 2.1), these differences will be resolved. However, at present, simulations run with the PG version in some study areas may differ from the OS version depending on input data and which MESOL option is selected.

As reported above, the estimated percentage of parallelized computation was 99.2 for the OS version and 97.3 for the PG version. Although a thorough evaluation of parallel efficiency is beyond the scope of this study, these numbers suggest a relatively high potential for further scaling to higher numbers of compute cores.
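To make the scaling claim concrete, the fitted timing model implies an upper bound on achievable speedup as $N_c$ grows; this is simply Amdahl's law applied to the fitted parameters. Writing $p = T_p/(T_p + T_s)$ for the parallelized fraction (the percentages quoted above), the speedup relative to one core is

```latex
% Speedup implied by the fitted model T_t(N_c) = T_p/N_c + T_s.
\[
  S(N_c) = \frac{T_t(1)}{T_t(N_c)}
         = \frac{T_p + T_s}{T_p/N_c + T_s}
         = \frac{1}{(1-p) + p/N_c},
  \qquad
  \lim_{N_c \to \infty} S(N_c) = \frac{1}{1-p}.
\]
```

Under this relation, parallelized fractions of 0.992 (OS) and 0.973 (PG) would bound the achievable speedup at roughly 125-fold and 37-fold, respectively, consistent with the statement that further scaling to higher core counts is possible.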

Summary and Conclusions
This article documented the design and implementation of a parallel simulation framework with gridded input and output for the DSSAT-CSM using MPI and NetCDF libraries. This framework was demonstrated with simulations of wheat yield across the state of Oklahoma, USA.