LARGE 0.2.0: 2D numerical modelling of geodynamic problems

We present here LARGE 0.2.0 (Lithosphere AsthenospheRe Geodynamic Evolution), a geodynamic modelling Python package that implements a flexible and user-friendly tool for the geodynamic modelling community. It simulates 2D large-scale geodynamic processes by solving the conservation equations of mass, momentum and energy with a finite difference method combined with the moving-tracers technique. LARGE uses advanced modern numerical libraries and algorithms but, unlike common simulation codes written in Fortran or C, it is written entirely in Python. Simulations are driven by configuration files that thoroughly define the lithologies and the parameters that distinguish the model. Documentation for these files and for all the modules is included in the package, together with a complete set of examples and utilities. The package can be used to reproduce results of published studies and models or to experiment with new simulations. LARGE can run in serial mode on desktop computers but can also take advantage of MPI to run in parallel on multi-node HPC systems.

LARGE solves the conservation equations of mass (eq. 1), momentum (eq. 2) and energy (eq. 3) in a 2-D continuum assuming material incompressibility:

\frac{\partial v_i}{\partial x_i} = 0 \quad (1)

\frac{\partial \sigma_{ij}}{\partial x_j} - \frac{\partial P}{\partial x_i} + \rho g_i = 0 \quad (2)

\rho C_p \frac{DT}{Dt} = \frac{\partial}{\partial x_i}\left(k \frac{\partial T}{\partial x_i}\right) + H \quad (3)

where the quantities are denoted in Table 1. DT/Dt is the material time derivative of T and the repeated indices i, j imply summation (Einstein notation). Eqs. (1, 2) are discretized on an Eulerian grid with a conservative finite difference scheme on a fully staggered mesh, which can be either regular or irregular (Duretz et al., 2011; Gerya and Yuen, 2007). Eq. (3) is discretized on the same grid. The tracer-in-cell (TIC) method handles advection and the movement of material properties in the velocity field calculated from the Stokes equations (1, 2). The advection uses a CorrMinMod algorithm (Jenny et al., 2001; Pusok et al., 2017) coupled with an explicit fourth-order Runge-Kutta scheme in time and space.
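The fourth-order Runge-Kutta tracer advection step can be sketched as follows. This is only a minimal illustration: the analytic rotational velocity field stands in for the velocity interpolated from the staggered grid, and the CorrMinMod velocity correction is omitted.

```python
import numpy as np

def velocity(x, y):
    # Analytic test field (rigid rotation); in the real code the velocity
    # is interpolated from the staggered-grid Stokes solution instead.
    return -y, x

def rk4_advect(x, y, dt):
    """One fourth-order Runge-Kutta step for a cloud of tracers."""
    k1x, k1y = velocity(x, y)
    k2x, k2y = velocity(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y)
    k3x, k3y = velocity(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y)
    k4x, k4y = velocity(x + dt * k3x, y + dt * k3y)
    x_new = x + dt / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
    y_new = y + dt / 6.0 * (k1y + 2 * k2y + 2 * k3y + k4y)
    return x_new, y_new

# One tracer starting at (1, 0); after a full revolution (t = 2*pi)
# it should return very close to its starting point.
x, y = np.array([1.0]), np.array([0.0])
dt, steps = 2 * np.pi / 1000, 1000
for _ in range(steps):
    x, y = rk4_advect(x, y, dt)
```

The fourth-order accuracy keeps the tracer trajectory error negligible over a full revolution even with a modest time step.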
The deviatoric stress components at the current time step, σ(t)_ij, in Eq. (2) are formulated from the rheological constitutive relationships. The rheological model uses a Maxwell linear stress-strain relationship:

\dot{\varepsilon}_{ij} = \frac{\sigma_{ij}}{2\eta_{\mathrm{eff}}} + \frac{1}{2G}\frac{D\sigma_{ij}}{Dt} \quad (4)

where:

\dot{\varepsilon}_{ij} = \frac{1}{2}\left(\frac{\partial v_i}{\partial x_j} + \frac{\partial v_j}{\partial x_i}\right) \quad (5)

and Dσ_ij/Dt is the objective (co-rotational) time derivative of the deviatoric stress (eq. 6). Following Moresi et al. (2003), the material derivative of eq. 6 can be discretized with a backward finite difference in time to:

\frac{D\sigma_{ij}}{Dt} \simeq \frac{\sigma^{(t)}_{ij} - \sigma^{(0)}_{ij}}{\Delta t} \quad (7)

where σ(0)_ij is the deviatoric stress of the previous time step. Substituting eqs. 5, 7 in eq. 4, the stress strain-rate relationship becomes:

\sigma^{(t)}_{ij} = 2\,\eta_{\mathrm{eff}}\,Z\,\dot{\varepsilon}_{ij} + (1 - Z)\,\sigma^{(0)}_{ij} \quad (8)

where Z, the visco-elastic factor, is defined as:

Z = \frac{G\,\Delta t}{G\,\Delta t + \eta_{\mathrm{eff}}} \quad (9)

Hence, the deviatoric stress in eq. 2 depends on the effective viscosity, the strain rate, the advection time interval and the deviatoric stress of the previous time step corrected for rotation and material diffusion. In this formulation the effective viscosity is modified by the elastic stress and modulated by Z. The effective viscosity is also modified by the rheological behaviour of rocks, such as non-Newtonian material creep and plasticity. The effective viscosity of rocks depends on rheological parameters measured in laboratory experiments, on temperature, pressure and stress, according to the following relationship (Ranalli, 1995):

\eta_{\mathrm{creep}} = F\,A^{-1/n}\,\dot{\varepsilon}_{II}^{(1-n)/n}\,\exp\!\left(\frac{E_a + P V_a}{n R T}\right) \quad (10)

where F is a dimensionless correction term that depends on the type of experiment used to measure the rock rheological parameters and allows the equation to be transformed into tensorial form (Ranalli, 1995; Gerya and Yuen, 2007). Furthermore, the effective viscosity can be reduced by plastic deformation according to the Drucker-Prager failure criterion:

\sigma_{\mathrm{yield}} = C + P\,\sin\varphi, \qquad \eta_{\mathrm{plastic}} = \frac{\sigma_{\mathrm{yield}}}{2\,\dot{\varepsilon}_{II}}, \qquad \eta_{\mathrm{eff}} = \min(\eta_{\mathrm{creep}}, \eta_{\mathrm{plastic}}) \quad (11)

Plasticity is crucial in the deformation of the lithosphere since it contributes to shear localization and the nucleation of several patterns of shear zones. The Stokes equations are non-linear and the solution may not converge (Spiegelman et al., 2016), since in eq. 2 the deviatoric stress depends on viscosity, and viscosity depends on the rheological equations and their parameters.
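The visco-elastic factor and the plastic cap on the effective viscosity described above can be sketched in a few lines. All function names and parameter values below are illustrative assumptions, not the ones shipped with LARGE:

```python
import numpy as np

def creep_viscosity(edot_II, T, P=0.0, A=1e-21, n=3.0, Ea=154e3,
                    Va=0.0, F=1.0, R=8.314):
    """Power-law creep viscosity (Ranalli-style law); the flow-law
    parameters here are placeholders, not calibrated values."""
    return F * A**(-1.0 / n) * edot_II**((1.0 - n) / n) \
        * np.exp((Ea + P * Va) / (n * R * T))

def viscoelastic_factor(eta, G, dt):
    """Z = G*dt / (G*dt + eta): weights viscous vs elastic response."""
    return G * dt / (G * dt + eta)

def plastic_cap(eta, edot_II, P, C=1e7, phi=np.radians(30.0)):
    """Drucker-Prager yield: cap the viscosity so the stress cannot
    exceed sigma_yield = C + P*sin(phi)."""
    sigma_yield = C + P * np.sin(phi)
    eta_plastic = sigma_yield / (2.0 * edot_II)
    return np.minimum(eta, eta_plastic)
```

With these pieces, the stress update of eq. 8 would read `sigma = 2 * eta * Z * edot + (1 - Z) * sigma_old`, with `eta` already capped by the plastic branch.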
LARGE implements an iterative algorithm, the visco-plastic iteration (VPI) (Gerya, 2018, 2019), to solve this issue. A chain of tracers (Van Keken et al., 1997) represents and mimics the evolution of the surface topography. The chain is advected at each evolution step according to the velocity field calculated by the Stokes equations. Additionally, the surface can be deformed by smoothing processes modelled by a standard diffusion equation (Pelletier, 2008):

\frac{\partial h}{\partial t} = \kappa \frac{\partial^2 h}{\partial x^2}

The equation is solved using the Crank-Nicolson scheme to avoid numerical diffusion. This method is unconditionally stable, therefore no time step iteration is needed.
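A minimal Crank-Nicolson step for the 1D topographic diffusion equation might look like this. Grid size, diffusivity and time step are arbitrary illustration values, and the dense solve stands in for whatever tridiagonal solver the real code uses:

```python
import numpy as np

def crank_nicolson_step(h, kappa, dx, dt):
    """One Crank-Nicolson step of dh/dt = kappa * d2h/dx2 with fixed
    (Dirichlet) end points. Unconditionally stable, so dt is not
    limited by the grid spacing."""
    n = h.size
    r = kappa * dt / (2.0 * dx**2)
    # Build (I - r*L) and (I + r*L) for the interior nodes,
    # where L is the 1D Laplacian stencil (1, -2, 1).
    A = np.eye(n)
    B = np.eye(n)
    for i in range(1, n - 1):
        A[i, i - 1], A[i, i], A[i, i + 1] = -r, 1 + 2 * r, -r
        B[i, i - 1], B[i, i], B[i, i + 1] = r, 1 - 2 * r, r
    return np.linalg.solve(A, B @ h)

# A sharp topographic peak diffuses and flattens; end points stay fixed.
h = np.zeros(51)
h[25] = 1000.0
for _ in range(100):
    h = crank_nicolson_step(h, kappa=1e-6, dx=100.0, dt=1e9)
```

Because the scheme is implicit and centred in time, large erosional time steps remain stable, which is why no sub-stepping is needed for the topography.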
In the TIC method, quantities carried by the tracers are interpolated from the tracer cloud to the staggered grid of nodes through bilinear interpolation, according to a 4-Cell or 1-Cell scheme (Duretz et al., 2011).
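The tracer-to-node bilinear interpolation can be sketched as follows, in the spirit of the 1-Cell scheme: each tracer contributes to the four nodes of its cell with bilinear weights, and each node value is the weighted average. The function name and details of the weighting are assumptions for illustration:

```python
import numpy as np

def tracers_to_nodes(xt, yt, qt, x0, y0, dx, dy, nx, ny):
    """Interpolate tracer quantity qt onto a regular node grid using
    bilinear weights (a sketch of the 1-Cell scheme)."""
    num = np.zeros((ny, nx))
    den = np.zeros((ny, nx))
    # Cell indices and normalized offsets of each tracer inside its cell
    i = np.clip(((xt - x0) / dx).astype(int), 0, nx - 2)
    j = np.clip(((yt - y0) / dy).astype(int), 0, ny - 2)
    tx = (xt - (x0 + i * dx)) / dx
    ty = (yt - (y0 + j * dy)) / dy
    # Scatter weighted contributions to the 4 surrounding nodes
    for dj, di, w in ((0, 0, (1 - tx) * (1 - ty)), (0, 1, tx * (1 - ty)),
                      (1, 0, (1 - tx) * ty),       (1, 1, tx * ty)):
        np.add.at(num, (j + dj, i + di), w * qt)
        np.add.at(den, (j + dj, i + di), w)
    return num / np.where(den > 0, den, 1.0)

# Uniform tracer values must map to the same value at every touched node
q = tracers_to_nodes(np.array([0.25, 0.6]), np.array([0.3, 0.7]),
                     np.array([7.0, 7.0]), x0=0.0, y0=0.0,
                     dx=0.5, dy=0.5, nx=3, ny=3)
```

`np.add.at` is used instead of fancy-indexed `+=` so that several tracers landing in the same cell all accumulate correctly.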

Code design
LARGE was written from scratch, with an object-oriented structure, to be readable and reusable. The computational complexity of numerical simulation has been abstracted into classes that can be subclassed to add more features. LARGE can run in two modes: serial (LARGE) or parallel (LARGEP). The parallel mode can be used to run models that cannot fit in a single computing node's memory, or to speed up the execution by taking advantage of the multi-core architecture of modern CPUs or of HPC systems.

Dependencies
LARGE makes use of the following packages, freely available from the Python ecosystem: -Numpy (Walt et al., 2011) implements an array data structure and fast vectorized basic arithmetic computations. It is a standard in the scientific Python community and is integrated in several scientific Python packages. Algorithms that process a single value at a time can be converted to operate on a set of values at the same time. Vectorization implies the substitution of for-loop constructs with vectorized methods provided by Numpy. Sometimes vectorized code loses readability, but the speed improvement can be substantial.
-Scipy (Virtanen et al., 2020) is based on Numpy and provides fast routines for numerical integration, sparse matrix representation, interpolation, statistics, optimization, linear algebra and more.
-QCOBJ (Vidmar and Creati, 2018) was created during the development of LARGE to allow the definition of physical quantities with units of measurement in configuration files, freeing users from possible unit-conversion errors.
-H5py (Collette, 2013) provides Python bindings to the High-performance data management and storage suite HDF5 (The HDF Group, 1997). It provides a set of I/O operations on a file format designed to store and organize large volumes of data.

-Numba (Lam et al., 2015) is a just-in-time compiler for Python that speeds up array operations. The most common way to use Numba is through its collection of decorators that can be applied to functions to instruct Numba to compile them. When a call is made to a Numba-decorated function, it is compiled to machine code "just-in-time" for execution, and the code can subsequently run at native machine code speed.
-colorAlphabet (Vidmar, R., 2018) adds color to text written on a terminal using Paul Green-Armytage's color alphabet (Green-Armytage, 2010). LARGE uses it to differentiate logging messages originating from different processes and their severity.
-PySide2 (The Qt Company) (or PyQt5 (Riverbank Computing Limited)) is not strictly required to run LARGE but is necessary to take advantage of the graphic utilities included in the package.
For LARGEP also the following packages are needed:

-MPI4py (Dalcín et al., 2005) provides Python bindings for the Message Passing Interface (MPI) standard. It allows a Python program to execute on a cluster of computing nodes using MPI to communicate.
-Petsc4py (Dalcin et al., 2011) allows the Portable Extensible Toolkit for Scientific Computation (PETSc) to be used in Python. PETSc is a suite of data structures and routines able to solve discrete algebraic equations, usually arising from the discretization of partial differential equations. It has many features, including scalable parallel linear and non-linear solvers, which are used and tested in numerous other models and scientific applications.
-miniga4py (Vidmar, R., 2020b) is a subset of the bindings to the Global Array Toolkit (GA) (Nieplocha et al., 2006) tailored to operate in LARGE. GA implements the partitioned global address space (PGAS) parallel programming model and allows the creation and management of arrays defined in the context of distributed data structures. GA arrays are global and can be used as if they were stored in a shared memory environment. All details of the data distribution are hidden behind operations including get, put, scatter and gather. These operations are truly one-sided/unilateral and will complete regardless of any action taken by the remote process(es) that own(s) the referenced data.
-h5pyp (Vidmar, R., 2020a) provides Python bindings for the parallel version of the High-performance data management and storage suite (The HDF Group, 1997).

The configuration file
LARGE numerical simulations are driven by a single configuration file. This text file has a hierarchical structure, with many parameters related to the model and its evolution in time, and five sections that define other features in detail.
-The Mesh section defines the geometry of the domain, the size and the layout of the grid.
-The Lithologies section defines the different types of rocks that make up the model, their spatial arrangement and their rheological behaviour, through twenty parameters for each of them. Each lithology is bounded by polygons that are defined in this section.
-The Geothermic Model section defines areas of the model, delimited by polygons, with different initial temperature.
LARGE provides some of the most common geothermal models: conductive, radiogenic, half-space, plate. These are used in the initialization of the model to calculate the initial temperature of the Lagrangian tracers. As the definition of the initial temperature distribution can be complex, LARGE allows users to define their own model through a Python module provided at run time.
-The Boundary Conditions section defines the behaviour of the Stokes and Heat equations at the domain boundaries.
Currently LARGE does not support every possible boundary condition, but only a few of the most commonly used in the geodynamic literature. The Stokes equations can manage free-slip and no-slip conditions. Moreover, a prescribed velocity can be set on free-slip boundaries, and the edges can also move (moving wall condition). The heat equation supports Dirichlet and Neumann conditions.
-The Topography section defines the parameters for the creation and evolution of a 1D chain of tracers with a fixed elevation. An external file can be specified to load a height profile if the topography is not constant.
Every physical quantity related to the physics of the simulation must be expressed with its unit of measurement. While parsing the configuration file, LARGE checks its dimensionality and converts it into the units required by the code.
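A hypothetical excerpt illustrates the idea. The section layout follows the description above, but every key name and value below is an invented example, not the actual LARGE syntax:

```
# Illustrative sketch only -- section and key names are hypothetical
[Mesh]
    xsize = 1500 km
    ysize = 600 km
    nx = 601
    ny = 241

[Boundary Conditions]
    left = free-slip
    right = free-slip
    top = moving wall
    bottom = no-slip

[Lithologies]
    [[upper_crust]]
        density = 2700 kg/m^3
        thermal_conductivity = 2.5 W/(m K)
```

Because each value carries its unit, the parser can convert, for instance, `1500 km` to metres before it reaches the solver, rather than trusting the user to do the conversion by hand.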

Program execution
LARGE starts its execution by creating a Model object instance, parsing the configuration file, and initializing some other objects that will be used during the evolution (Fig. 1):
-The GlobalArrayManager (GAM) is trivially an array container when running in serial mode, but allocates parallel dense bidimensional arrays among all available processors when running LARGEP. It provides a high-level interface to some arithmetic operations on parallel arrays and methods to modify only parts of the allocated arrays. Based on miniga4py, GAM hides all the complexity of managing the distributed arrays between the computing nodes, making them available as Numpy arrays. All details of the data distribution, addressing, and data access are encapsulated in global array objects.
-The LagrangianTracers object manages the tracers used to record quantity changes in the moving medium. Tracers are a sparse cloud of points, built with a Halton distribution (Halton, 1964), that in parallel mode is distributed among all the processors. This object takes care of all communication between the processors regarding tracer distribution, advection, deletion, repopulation and ghost handling, through algorithms that use Numpy and MPI.
-The Topography object, whose usage is activated in the configuration file, provides methods to evaluate how the top of a model deforms. This profile is represented as a 1D chain of tracers advected by the velocity calculated by solving the Stokes equations. It is furthermore deformed by an erosional process described by a diffusion equation.
Figure 2. Flowchart of the evolve and VPI algorithms used in LARGE.
-The ComputeEngine object groups operations on global arrays or tracers that do not involve any interpolation of quantities from tracers to nodes or nodes to tracers.
-The InterpolatorEngine object instead implements all the algorithms for the interpolation of quantities from tracers to nodes and back.
At this point all the quantities have been calculated and can be saved to an HDF5 file. This step file is a snapshot of the model evolution at a certain time from the beginning and is the output of the simulation.
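The Halton distribution used to build the tracer cloud above can be generated in a few lines. This sketch only illustrates the kind of low-discrepancy point set involved; it is not the LagrangianTracers implementation:

```python
def halton(index, base):
    """Return the index-th element of the Halton low-discrepancy
    sequence in the given (prime) base, by reversing the digits of
    the index in that base."""
    f, r = 1.0, 0.0
    while index > 0:
        f /= base
        r += f * (index % base)
        index //= base
    return r

# Quasi-random 2D cloud in the unit square: bases 2 and 3 give points
# spread much more evenly than pseudo-random sampling.
cloud = [(halton(i, 2), halton(i, 3)) for i in range(1, 1001)]
```

Unlike purely random points, a Halton cloud avoids clusters and holes, so every grid cell starts with a comparable number of tracers.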
Step files have a hierarchical structure (Fig. 3) with the following HDF5 groups: arrays of the physical quantities defined on the Eulerian grid (2D arrays), tracers (1D arrays), mesh and, optionally, topography. Each group contains the arrays as datasets.
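With h5py, a step-like file of this kind can be written and read back in a few lines. The file name and dataset names here are invented for illustration and do not reproduce LARGE's exact layout:

```python
import numpy as np
import h5py

# Write a tiny step-like file: one grid array and one tracer array,
# each stored as a dataset inside its own group.
with h5py.File("step_demo.h5", "w") as f:
    f.create_group("arrays").create_dataset("T", data=np.full((4, 4), 273.0))
    f.create_group("tracers").create_dataset("x", data=np.arange(10.0))

# Read it back: groups and datasets are addressed with a path-like syntax
with h5py.File("step_demo.h5", "r") as f:
    T = f["arrays/T"][...]
    xt = f["tracers/x"][...]
```

Keeping grid fields and tracer arrays in separate groups lets post-processing tools read only the part they need without loading the whole snapshot.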

As the size of these files can be large and the I/O operations slow down the model evolution, the configuration file provides options to choose the quantities to save and how often to save them. The quantities saved by default are those necessary to allow LARGE to continue a previously interrupted simulation, in append mode, restarting from the last valid step.
LARGE uses an MPI-compliant logging system that helped during the development stage, since it allows following the single phases of the evolution in every processor when running LARGEP.

4 Command line utilities
LARGE integrates some tools that can be executed from the command line.
cfg_gui is a graphical tool that helps with the creation and tuning of the configuration file, which is usually a long and error-prone procedure. It shows the hierarchical structure of the configuration file with a vertical tree representation. Each field of the tree can be edited to change its value. Moreover, it is possible to compare two or more configuration files. This proved to be useful when only a few parameters change between two simulations. It is based on the cfggui module provided by QCOBJ (Fig. 4a) and integrates a graphic editor designed to visualize and interactively edit the polygons defined in the configuration file (Fig. 4b).
large2largep This tool has been developed to automate the process of downloading, compiling and installing all the libraries and Python packages required by LARGEP.

largep is the launcher of LARGEP. It accepts all the options of LARGE and also: sets the number of MPI processes to use; kills hanging LARGEP processes of simulations that ended abnormally; executes each LARGE process in a separate terminal window, one for each process, to make debugging easier.
largel allows visualizing and searching through LARGE log files, which use ANSI escape codes to add color information to the text.

largeinfo is a GUI application created to display the content of the base output file of a LARGE simulation run. The command line used when the simulation was launched, the software versions of the most important packages used, the whole configuration file content, the last step done and the number of MPI processes used are some of the information provided.

5 Examples
The ability of LARGE to solve geodynamic problems can be tested with some examples inspired by well-known models. In the shear bands model (Fig. 5), during the early evolution steps shear bands appear at an angle of around 26°, which is close to the Coulomb angle of 45° ± θ/2, where θ is the angle of friction. At step 180 (Fig. 5b) some shear bands die, but if they reach a boundary they are reflected, as the boundary acts as a point where a new shear band initiates. This phenomenon happens because the VPI is enabled; if it is disabled, by setting just one iteration for the VPI in the configuration file, no shear bands appear.

https://doi.org/10.5194/gmd-2020-372 Preprint. Discussion started: 16 December 2020. © Author(s) 2020. CC BY 4.0 License.
The setup of plate subduction models can be tricky, since the lithological and the geothermal models may have many different subsections; but even if the overall geometry can be complex, large_cfg_gui can help to draw the model. We will describe here two models:
-the plates converge model, which is the subduction of an oceanic plate under a continental one, driven by the oceanic plate convergence velocity (Gorczyk et al., 2007). The configuration file for this model, subduction.cfg, is located in the cfg folder of the LARGE source code repository.
-the slab retreat model, which is the subduction and retreat of an oceanic plate under a younger oceanic one, driven by the buoyancy of the older plate. The configuration file for this model, slab_retreat.cfg, is located in the cfg folder of the LARGE source code repository.

Plates converge
The model (Fig. 6a) consists of a continental plate with a 35 km crust, composed of 20 km of upper crust wet quartzite (Ranalli, 1995) and 15 km of lower crust anorthosite (Ranalli, 1995), with the base of the continental lithosphere fixed at 80 km. The lid and the asthenosphere adopt a dry olivine rheology (Ranalli, 1995). The oceanic plate has a lithosphere thickness of 60 km. A weak zone, with a wet olivine rheology and no friction, is configured at the boundary between the oceanic and continental plates to promote the subduction start. The initial temperature distribution is different for the continental and oceanic zones. The initial continental lithosphere temperature is described by a conductive model, while the initial temperature of the oceanic lithosphere is calculated by a standard plate model (Parsons and Sclater, 1977). VPI is enabled to ensure the occurrence of plastic shearing in the brittle parts of the lithosphere.
The weak zone triggers the subduction, and after 9.4 My the tip of the bending slab reaches a depth of 200 km (Fig. 6b). The bending is favoured by plastic deformation along shear bands in the upper portion of the subducting slab.

Slab retreat
The slab retreat model (Fig. 7a) is an opportunity to show how to set up a user-provided geothermal model module. Here the subduction is driven by the buoyancy of the subducting plate. There are no prescribed velocities at the edges of the domain and the boundaries cannot move. The geometry is characterized by two oceanic plates of different ages, 10 My and 60 My, with the older one already subducting and penetrating the asthenosphere. A weak zone, approximated by a wet olivine rheology with no friction, is configured at the boundary between the two oceanic plates to favour plate decoupling. In this model the subducting plate is penetrating the asthenosphere and the temperature must be calculated in an inclined slab (Fig. 7b), but LARGE does not provide any function to calculate the initial temperature in complex geometries, so a custom Python module was created and loaded at run time to compute the right temperature. The LAB temperature has been fixed at 1330°C, and the heat equation is solved setting the Neumann condition at the left and right boundaries and the Dirichlet condition at the remaining ones. The model has 10 million Lagrangian tracers. VPI is enabled to ensure the occurrence of plastic shearing in the brittle parts of the lithosphere. The model evolution highlights the natural bending of the slab under its own weight, triggering the spontaneous retreat of the subducting plate (Fig. 7c). Mantle material upwells in the space opened between the two plates. As in the previous example, shear bands form in the upper portion of the subducting lithosphere, favoured by the bending.
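The shape of such a user-provided module can be sketched as follows. The hook name `initial_temperature` and its signature are hypothetical, since the interface LARGE expects is defined in its own documentation; the plate-model temperature itself follows the standard Parsons and Sclater (1977) form, with illustrative parameter values:

```python
import numpy as np

def plate_temperature(z, age, Ts=273.0, Tm=1603.0, L=95e3,
                      kappa=1e-6, nterms=50):
    """Plate cooling model T(z, t) (Parsons and Sclater, 1977 form).
    z: depth below the slab surface [m]; age: plate age [s]."""
    T = Ts + (Tm - Ts) * z / L
    for n in range(1, nterms + 1):
        T += (Tm - Ts) * (2.0 / (n * np.pi)) \
            * np.exp(-kappa * (n * np.pi / L)**2 * age) \
            * np.sin(n * np.pi * z / L)
    return T

def initial_temperature(x, y):
    """Hypothetical user hook: temperature of tracers in an inclined
    slab dipping 30 degrees through the origin, with depth measured
    perpendicular to the slab surface."""
    dip = np.radians(30.0)
    z = np.maximum(y * np.cos(dip) - x * np.sin(dip), 0.0)
    return plate_temperature(z, age=60e6 * 3.15e7)  # a 60 My old plate
```

Because the module is plain Python loaded at run time, arbitrary geometry (here an inclined slab) can be expressed with a few lines of Numpy instead of extending the core code.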

Lithosphere extension
The structural setting of a stretched lithosphere depends on how the extension is distributed in the lithosphere. Extension can be uniform or non-uniform, and the overall results can be seen in several geodynamic contexts around the world (Huismans and Beaumont, 2003; Huismans et al., 2005; Huismans and Beaumont, 2014).
An extensional model is straightforward to set up, since the lithological model consists of straight layers, the initial temperature can be described by a conductive geotherm (Fig. 8a), and a small seed of weak mantle in the lid helps to localize the deformation. A wet quartzite rheology is used for the crust and a dry mantle rheology for the mantle. The initial temperature consists of three conductive layers. The model evolves with the formation of paired symmetrical shear zones in the lithosphere that accommodate the thinning and stretching (Fig. 8b). Lid thinning is extreme, and at 4.5 My the mantle lithosphere is completely removed under the stretched crust. A non-uniform stretching model can be achieved by setting a low stretching velocity and enabling strain softening (Huismans and Beaumont, 2003). LARGE allows setting two values for the cohesion and friction angle parameters used to evaluate the plastic rheologies of rocks. The asymmetric model uses the same configuration as the symmetric one, with some minor changes: double values for friction and accumulated strain, and an extensional velocity of 0.6 cm y⁻¹. The configuration file for this model, rift_asym.cfg, is located in the cfg folder of the LARGE source code repository. Further localization of the extension at different depths can be achieved by separately scaling the viscosity (creep viscosity) of some layers (upper and lower crust, or mantle lithosphere). The model (Fig. 8c) develops an asymmetric mode of stretching where the deformation localizes on the right shear zone, which drives the overall crustal stretching. The lid and the crust show two different thinning patterns.

6 Parallel scalability
The LARGE source code has undergone several improvement phases during its development in order to make it as fast as possible.
The code has been profiled to find the bottlenecks and the algorithms have been tuned for performance, but the parallel features of LARGE depend mainly on third-party packages, hence much of the parallel scaling resides in them. We arranged a strong scaling test using the shear bands model described in Sect. 5.1, with 10 million Lagrangian tracers and a 600x600 node grid. The test was run on the GALILEO supercomputer at CINECA. In this test the heat solver and the VPI were deactivated, to reduce the CPU usage of each time step, and no HDF5 output file was generated, to make the test independent of I/O. Taking this into account, the scaling test results are promising (Fig. 9). The measured speedup has been fitted to Amdahl's law (Amdahl, 1967):

S = \frac{1}{\alpha + \frac{1 - \alpha}{P}}

where S is the speedup, P is the number of parallel processors and α is the serial fraction of the code. From our tests we calculated for this model an α value of 0.06, therefore only 6 percent of the code is serial. A further speedup could only be achieved by optimizing this part, while any further increase of the number of processors cannot reduce the computing time but, on the contrary, increases it, as we observed.
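The Amdahl fit can be reproduced with scipy. The measured speedups below are synthetic values generated for illustration, not the actual GALILEO measurements:

```python
import numpy as np
from scipy.optimize import curve_fit

def amdahl(P, alpha):
    """Amdahl's law: S(P) = 1 / (alpha + (1 - alpha) / P)."""
    return 1.0 / (alpha + (1.0 - alpha) / P)

# Synthetic speedups for an assumed 6 % serial fraction (illustration)
P = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0])
S = amdahl(P, 0.06)

# Least-squares fit of the serial fraction from the measured speedups
alpha_fit, _ = curve_fit(amdahl, P, S, p0=[0.5])
```

With α = 0.06 the asymptotic speedup is bounded by 1/α ≈ 16.7, which is why adding processors beyond a few tens stops paying off.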

7 Conclusions
We presented here LARGE, a finite difference/tracers modelling Python package for the geodynamic community. It has been designed to be flexible and user-friendly, and developed with advanced modern numerical libraries and algorithms and a structured input system for setting all simulation parameters. All the Python modules that make up LARGE, the command line utilities and the configuration file reference are thoroughly documented in HTML and included in the package, together with a complete set of examples. We have shown, in Sect. 5, the ability of LARGE to reproduce results of published studies and models.

On the wish list for upcoming developments are the extension of the supported boundary conditions for the Stokes equations, the use of a petrological model to compute rock density, thermal expansion and heat capacity, and the migration of fluids (water or magma melt) in the lithosphere and their effects.
Code availability. LARGE 0.2.0 code is freely available, under the MIT license, from https://bitbucket.org/ncreati/large. The serial version of LARGE depends on the following Python packages: Numpy, Scipy, Numba, QCObj, h5py, ColorAlphabet. The parallel version also depends on: mpi4py, petsc4py, h5pyp and miniga4py. All configuration files used to reproduce the example models of Sect. 5, as well as plotting utilities, are included in the repository and in the frozen archive attached to the paper.