the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
A fast and efficient MATLABbased MPM solver: fMPMMsolver v1.1
Yury Alkhimenkov
Michel Jaboyedoff
Yury Y. Podladchikov
We present an efficient MATLABbased implementation of the material point method (MPM) and its most recent variants. MPM has gained popularity over the last decade, especially for problems in solid mechanics in which large deformations are involved, such as cantilever beam problems, granular collapses and even largescale snow avalanches. Although its numerical accuracy is lower than that of the widely accepted finite element method (FEM), MPM has proven useful for overcoming some of the limitations of FEM, such as excessive mesh distortions. We demonstrate that MATLAB is an efficient highlevel language for MPM implementations that solve elastodynamic and elastoplastic problems. We accelerate the MATLABbased implementation of the MPM method by using the numerical techniques recently developed for FEM optimization in MATLAB. These techniques include vectorization, the use of native MATLAB functions and the maintenance of optimal RAMtocache communication, among others. We validate our inhouse code with classical MPM benchmarks including (i) the elastic collapse of a column under its own weight; (ii) the elastic cantilever beam problem; and (iii) existing experimental and numerical results, i.e. granular collapses and slumping mechanics respectively. We report an improvement in performance by a factor of 28 for a vectorized code compared with a classical iterative version. The computational performance of the solver is at least 2.8 times greater than those of previously reported MPM implementations in Julia under a similar computational architecture.
The material point method (MPM), developed in the 1990s (Sulsky et al., 1994), is an extension of a particleincell (PIC) method to solve solid mechanics problems involving massive deformations. It is an alternative to Lagrangian approaches (updated Lagrangian finite element method) that is well suited to problems with large deformations involved in geomechanics, granular mechanics or even snow avalanche mechanics. Vardon et al. (2017) and Wang et al. (2016c) investigated elastoplastic problems of the strain localization of slumping processes relying on an explicit or implicit MPM formulation. Similarly, Bandara et al. (2016), Bandara and Soga (2015), and Abe et al. (2014) proposed a poroelastoplastic MPM formulation to study levee failures induced by pore pressure increases. Additionally, Baumgarten and Kamrin (2019), Dunatunga and Kamrin (2017), Dunatunga and Kamrin (2015), and Więckowski (2004) proposed a general numerical framework of granular mechanics, i.e. silo discharge or granular collapses. More recently, Gaume et al. (2019, 2018) proposed a unified numerical model in the finite deformation framework to study the whole process (i.e. from failure to propagation) of slab avalanche releases.
The core idea of MPM is to discretize a continuum with material points carrying state variables (e.g. mass, stress and velocity). The latter are mapped (accumulated) to the nodes of a regular or irregular background finite element (FE) mesh, on which an Eulerian solution to the momentum balance equation is explicitly advanced forward in time. Nodal solutions are then mapped back to the material points, and the mesh can be discarded. The mapping from material points to nodes is ensured using the standard FE hat function that spans over an entire element (Bardenhagen and Kober, 2004). This avoids a common flaw of the finite element method (FEM), which is an excessive mesh distortion. We will refer to this first variant as the standard material point method (sMPM).
MATLAB© allows a rapid code prototyping, although it is at the expense of significantly lower computational performance than compiled language. An efficient MATLAB implementation of FEM called MILAMIN (Million a Minute) was proposed by Dabrowski et al. (2008) that was capable of solving twodimensional linear problems with 1 million unknowns in 1 min on a modern computer with a reasonable architecture. The efficiency of the algorithm lies in the combined use of vectorized calculations with a technique called blocking. MATLAB uses the Linear Algebra PACKage (LAPACK), written in Fortran, to perform mathematical operations by calling basic linear algebra subprograms (BLAS, Moler, 2000). The latter results in an overhead each time a BLAS call is made. Hence, mathematical operations over a large number of small matrices should be avoided, and operations on fewer and larger matrices should be preferred. This is a typical bottleneck in FEM when local stiffness matrices are assembled during the integration point loop within the global stiffness matrix. Dabrowski et al. (2008) proposed an algorithm in which a loop reordering is combined with operations on blocks of elements to address this bottleneck. However, data required for a calculation within a block should entirely reside in the CPU's cache; otherwise, additional time is spent on the RAMtocache communication, and the performance decreases. Therefore, an optimal block size exists, and it is solely defined by the CPU architecture. This technique of vectorization combined with blocking significantly increases the performance.
More recently, Bird et al. (2017) extended the vectorized and blocked algorithm presented by Dabrowski et al. (2008) to the calculation of the global stiffness matrix for discontinuous Galerkin FEM considering linear elastic problems using only native MATLAB functions. Indeed, the optimization strategy chosen by Dabrowski et al. (2008) also relied on nonnative MATLAB functions, e.g. sparse2
of the SuiteSparse package (Davis, 2013). In particular, Bird et al. (2017) showed the importance of storing vectors in a columnmajor form during calculation. Mathematical operations are performed in MATLAB by calling LAPACK, written in Fortran, in which arrays are stored in columnmajor order form. Hence, elementwise multiplication of arrays in columnmajor form is significantly faster; thus, vectors in columnmajor form are recommended whenever possible. Bird et al. (2017) concluded that vectorization alone results in a performance increase of between 13.7 and 23 times, whereas blocking only improved vectorization by an additional 1.8 times. O’Sullivan et al. (2019) recently extended the works of Bird et al. (2017) and Dabrowski et al. (2008) to optimized elastoplastic codes for continuous Galerkin (CG) or discontinuous Galerkin (DG) methods. In particular, they proposed an efficient native MATLAB function, accumarray()
, to efficiently assemble the internal force vector. This kind of function constructs an array by accumulation. More generally, O’Sullivan et al. (2019) reported a performance gain of 25.7 times when using an optimized CG code instead of an equivalent nonoptimized code.
As MPM and FEM are similar in their structure, we aim to improve the performance of MATLAB up to the level reported by Sinaie et al. (2017) using the Julia language environment. In principal, Julia is significantly faster than MATLAB for an MPM implementation. We combine the most recent and accurate versions of MPM: the explicit generalized interpolation material point method (GIMPM, Bardenhagen and Kober, 2004) and the explicit convected particle domain interpolation with secondorder quadrilateral domains (CPDI2q and CPDI; Sadeghirad et al., 2013, 2011) variants with some of the numerical techniques developed during the last decade of FEM optimization in MATLAB. These techniques include the use of accumarray()
, optimal RAMtocache communication, minimum BLAS calls and the use of native MATLAB functions. We did not consider the blocking technique initially proposed by Dabrowski et al. (2008), as an explicit formulation in MPM excludes the global stiffness matrix assembly procedure. The performance gain mainly comes from the vectorization of the algorithm, whereas blocking has a less significant impact over the performance gain, as stated by Bird et al. (2017). The vectorization of MATLAB functions is also crucial for a straight transpose of the solver to a more efficient language, such as the CCUDA language, which allows the parallel execution of computational kernels of graphics processing units (GPUs).
In this contribution, we present an implementation of an efficiently vectorized explicit MPM solver, fMPMMsolver (v1.1 is available for download from Bitbucket at https://bitbucket.org/ewyser/fmpmmsolver/src/master/, last access: 6 October 2020), taking advantage of the vectorization capabilities of MATLAB©. We extensively use native functions of MATLAB©, such as repmat( )
, reshape( )
, sum( )
or accumarray( )
. We validate our inhouse code with classical MPM benchmarks including (i) the elastic collapse of a column under its own weight; (ii) the elastic cantilever beam problem; and (iii) existing experimental and numerical results, i.e. granular collapses and slumping mechanics respectively.
We demonstrate the computational efficiency of a vectorized implementation over an iterative one for an elastoplastic collapse of a column. We compare the performance of the Julia and MATLAB language environments for the collision of two elastic discs problem.
2.1 A material point method implementation
The material point method (MPM), originally proposed by Sulsky et al. (1995, 1994) in an explicit formulation, is an extension of the particleincell (PIC) method. The key idea is to solve the weak form of the momentum balance equation on an FE mesh while state variables (e.g. stress, velocity or mass) are stored at Lagrangian points discretizing the continuum, i.e. the material points, which can move according to the deformation of the grid (Dunatunga and Kamrin, 2017). MPM could be regarded as a finite element solver in which integration points (material points) are allowed to move (Guilkey and Weiss, 2003) and are, thus, not always located at the Gauss–Legendre location within an element, resulting in higher quadrature errors and poorer integration estimates, especially when using loworder basis functions (Steffen et al., 2008a, b).
A typical calculation cycle (see Fig. 1) consists of the three following steps (Wang et al., 2016a):

a mapping phase, during which properties of the material point (mass, momentum or stress) are mapped to the nodes;

an updated Lagrangian FEM (ULFEM) phase, during which the momentum equations are solved on the nodes of the background mesh, and the solution is explicitly advanced forward in time;

a convection phase, during which (i) the nodal solutions are interpolated back to the material points, and (ii) the properties of the material point are updated.
Since the 1990s, several variants have been introduced to resolve a number of numerical issues. The generalized interpolation material point method (GIMPM) was first presented by Bardenhagen and Kober (2004). They proposed a generalization of the basis and gradient functions that were convoluted with a characteristic domain function of the material point. A major flaw in sMPM is the lack of continuity of the gradient basis function, resulting in spurious oscillations of internal forces as soon as a material point crosses an element boundary while entering into its neighbour. This is referred to as cellcrossing instabilities due to the C_{0} continuity of the gradient basis functions used in sMPM. This issue is minimized by the GIMPM variant (Acosta et al., 2020).
GIMPM is categorized as a domainbased material point method, unlike the later development of the Bspline material point method (BSMPM, e.g. de Koster et al., 2020; Gan et al., 2018; Gaume et al., 2018; Stomakhin et al., 2013) which cures cellcrossing instabilities using Bspline functions as basis functions. Whereas only nodes belonging to an element contribute to a given material point in sMPM, GIMPM requires an extended nodal connectivity, i.e. the nodes of the element enclosing the material point and the nodes belonging to the adjacent elements (see Fig. 2). More recently, the convected particle domain interpolation (CPDI and its most recent development CPDI2q) has been proposed by Sadeghirad et al. (2013, 2011).
We choose the explicit GIMPM variant with the modified update stress last scheme (MUSL; see Nairn, 2003, and Bardenhagen et al., 2000, for a detailed discussion), i.e. the stress of the material point is updated after the nodal solutions are obtained. The updated momentum of the material point is then mapped back to the nodes a second time in order to obtain an updated nodal velocity, which is further used to calculate derivative terms such as strains or the deformation gradient of the material point. The explicit formulation also implies the wellknown restriction on the time step, which is limited by the Courant–Friedrichs–Lewy (CFL) condition to ensure numerical stability.
Additionally, we implemented a CPDI/CPDI2q version (in an explicit and quasistatic implicit formulation) of the solver. However, in this paper, we do not present the theoretical background of the CPDI variant nor the implicit implementation of an MPMbased solver. Therefore, interested readers are referred to the original contributions of Sadeghirad et al. (2013, 2011) for the background of the CPDI variant and to the contributions of Acosta et al. (2020), Charlton et al. (2017), Iaconeta et al. (2017), Beuth et al. (2008), and Guilkey and Weiss (2003) for an implicit implementation of an MPMbased solver. Regarding the quasistatic implicit implementation, we strongly adapted our vectorization strategy to some aspects of the numerical implementation proposed by Coombs and Augarde (2020) in the MATLAB code AMPLE v1.0. However, we did not consider blocking, because our main concern regarding performance is on the explicit implementation.
2.2 Domainbased material point method variants
Domainbased material point method variants could be treated as two distinct groups:

the material point's domain is a square for which the deformation is always aligned with the mesh axis, i.e. a nondeforming domain uGIMPM (Bardenhagen and Kober, 2004), or it is a deforming domain cpGIMPM, (Wallstedt and Guilkey, 2008), where the latter is usually related to a measure of the deformation, e.g. the determinant of the deformation gradient;

the material point's domain is either a deforming parallelogram that has its dimensions specified by two vectors, i.e. CPDI (Sadeghirad et al., 2011), or it is a deforming quadrilateral solely defined by its corners, i.e. CPDI2q (Sadeghirad et al., 2013). However, the deformation is not necessarily aligned with the mesh anymore.
We first focus on the different domainupdating methods for GIMPM. Four domainupdating methods exists: (i) the domain is not updated, (ii) the deformation of the domain is proportional to the determinant of the deformation gradient det(F_{ij}) (Bardenhagen and Kober, 2004), (iii) the domain lengths l_{p} are updated according to the principal component of the deformation gradient F_{ii} (Sadeghirad et al., 2011) or (iv) the domain lengths l_{p} are updated with the principal component of the stretching part of the deformation gradient U_{ii} (Charlton et al., 2017). Coombs et al. (2020) highlighted the suitability of generalized interpolation domainupdating methods according to distinct deformation modes. Four different deformation modes were considered by Coombs et al. (2020): simple stretching, hydrostatic compression or extension, simple shear and pure rotation. Coombs et al. (2020) concluded the following:

Not updating the domain is not suitable for simple stretching and hydrostatic compression or extension.

A domain update based on det(F_{ij}) will results in an artificial contraction or expansion of the domain for simple stretching.

The domain will vanish with increasing rotation when using F_{ii}.

The domain volume will change under isochoric deformation when using U_{ii}.
Consequently, Coombs et al. (2020) proposed a hybrid domain update inspired by CPDI2q approaches: the corners of the material point domain are updated according to the nodal deformation, but the midpoints of the domain limits are used to update the domain lengths l_{p} to maintain a rectangular domain. Even though Coombs et al. (2020) reported an excellent numerical stability, the drawback is to compute specific basis functions between nodes and material point's corners, which has an additional computational cost. Hence, we did not selected this approach in this contribution.
Regarding the recent CPDI/CPDI2q version, Wang et al. (2019) investigated the numerical stability under stretching, shear and torsional deformation modes. CPDI2q was found to be erroneous in some case, especially when the torsion mode was involved, due to distortion of the domain. In contrast, CPDI and even sMPM showed better performance with respect to modelling torsional deformations. Although CPDI2q can exactly represent the deformed domain (Sadeghirad et al., 2013), care must be taken when dealing with very large distortion, especially when the material has yielded, which is common in geotechnical engineering (Wang et al., 2019).
Consequently, the domainbased method and the domainupdating method should be carefully chosen according to the deformation mode expected for a given case. The domainupdating method will be clearly mentioned for each case throughout the paper.
3.1 Rate formulation and elastoplasticity
The large deformation framework in a linear elastic continuum requires an appropriate stress–strain formulation. One approach is based on the finite deformation framework, which relies on a linear relationship between elastic logarithmic strains and Kirchhoff stresses (Coombs et al., 2020; Gaume et al., 2018; Charlton et al., 2017). In this study, we adopt another approach, namely a ratedependent formulation using the Jaumann stress rate (e.g. Huang et al., 2015; Bandara et al., 2016; Wang et al., 2016c, b). This formulation provides an objective (invariant by rotation or frameindifferent) stress rate measure (de Souza Neto et al., 2011) and is simple to implement. The Jaumann rate of the Cauchy stress is defined as
where C_{ijkl} is the fourth rank tangent stiffness tensor, and v_{k} is the velocity. Thus, the Jaumann stress derivative can be written as
where ${\mathit{\omega}}_{ij}=({\partial}_{i}{v}_{j}{\partial}_{j}{v}_{i})/\mathrm{2}$ is the vorticity tensor, and Dσ_{ij}∕Dt denotes the material derivative
Plastic deformation is modelled with a pressuredependent Mohr–Coulomb law with nonassociated plastic flow, i.e. both the dilatancy angle ψ and the volumetric plastic strain ${\mathit{\u03f5}}_{\mathrm{v}}^{\mathrm{p}}$ are null (Vermeer and De Borst, 1984). We have adopted the approach of Simpson (2017) for a twodimensional linear elastic, perfectly plastic (elastoplasticity) continuum because of its simplicity and its ease of implementation. The yield function is defined as
where c is the cohesion, ϕ the angle of internal friction,
and
The elastic state is defined when f<0; however, when f>0, plastic state is declared and stresses must be corrected (or scaled) to satisfy the condition f=0, as f>0 is an inadmissible state. Simpson (2017) proposed the following simple algorithm to return stresses to the yield surface:
where $\mathit{\beta}=(\mid c\mathrm{cos}\mathit{\varphi}\mathit{\sigma}\mathrm{sin}\mathit{\varphi}\mid )/\mathit{\tau}$, and ${\mathit{\sigma}}_{xx}^{\ast}$, ${\mathit{\sigma}}_{yy}^{\ast}$ and ${\mathit{\sigma}}_{xy}^{\ast}$ are the corrected stresses, i.e. f=0.
A similar approach is used to return stresses when considering a nonassociated Drucker–Prager plasticity (see Huang et al., 2015, for a detailed description of the procedure). In addition, their approach also allows one to model associated plastic flows, i.e. ψ>0 and ${\mathit{\u03f5}}_{\mathrm{v}}^{\mathrm{p}}\ne \mathrm{0}$.
3.2 Structure of the MPM solver
The solver procedure is shown in Fig. 3. In the main.m
script, the respective meSetup.m
and mpSetup.m
functions define the geometry and related quantities such as the nodal connectivity (or element topology) array, e.g. the e2N
array. The latter stores the nodes associated with a given element. As such, a material point p located in an element e can immediately identify which nodes n it is associated with.
After initialization, a while loop solves the elastodynamic (or elastoplastic) problem until a time criterion T is reached. This time criterion could be restricted to the time needed for the system to reach an equilibrium or if the global kinetic energy of the system has reached a threshold.
At the beginning of each cycle, a connectivity array p2e
between the material points and their respective element (a material point can only reside in a single element) is constructed. As (i) the nodes associated with the elements and (ii) the elements enclosing the material points are known, it is possible to obtain the connectivity array p2N
between the material points and their associated nodes, e.g. p2N=e2N(p2e,:)
in a MATLAB syntax (see Fig. 2 for an example of these connectivity arrays). This array is of dimension (n_{p},n_{n}). Here, n_{p} is the total number of material points; n_{n} is the total number of nodes associated with an element (16 in twodimensional problems); and n_{i,j} is the node number, where i corresponds to the material point and j corresponds to its jth associated nodes, which results in the following:
The following five functions are called successively during one calculation cycle:

SdS.m
calculates the basis functions, the derivatives and assembles the straindisplacement matrix for each material point. 
p2Nsolve.m
projects the quantities of the material point (e.g. mass and momentum) to the associated nodes, solves the equations of motion and sets boundary conditions. 
mapN2p.m
interpolates nodal solutions (acceleration and velocity) to the material points with a doublemapping procedure (see Zhang et al., 2016, or Nairn, 2003, for a clear discussion of update stress first, update stress last and MUSL algorithms). 
DefUpdate.m
updates incremental strains and deformationrelated quantities (e.g. the volume of the material point or the domain halflength) at the level of the material point based on the remapping of the updated material point momentum. 
constitutive.m
calls two functions to solve for the constitutive elastoplastic relation, namely
elastic.m
, which predicts an incremental objective stress assuming a purely elastic step, further corrected by 
plastic.m
, which corrects the trial stress by a plastic correction if the material has yielded.

When a time criterion is met, the calculation cycle stops and further postprocessing tasks (visualization, data exportation) can be performed.
The numerical simulations are conducted using MATLAB© R2018a within a Windows 7 64bit environment on an Intel Core i74790 (fourthgeneration CPU with four physical cores of base frequency at 3.60 GHz up to a maximum turbo frequency of 4.00 GHz) with 4×256 kB L2 cache and 16 GB DDR3 RAM (clock speed 800 MHz).
3.3 Vectorization
3.3.1 Basis functions and derivatives
The GIMPM basis function (Coombs et al., 2018; Steffen et al., 2008a; Bardenhagen and Kober, 2004) results from the convolution of a characteristic particle function χ_{p} (i.e. the material point spatial extent or domain) with the standard basis function N_{n}(x) of the mesh, which results in
Here, l_{p} is the length of the material point domain; h is the mesh spacing; and $x={x}_{p}{x}_{n}$, where x_{p} is the coordinate of a material point and x_{n} the coordinate of its associated node n. The basis function of a node n with its material point p is constructed for a twodimensional model, as follows:
Here, the derivative is defined as
Similar to the FEM, the straindisplacement matrix B consists of the derivatives of the basis function and is assigned to each material point, which results in the following:
where n_{n} is the total number of associated nodes to an element e, in which a material point p resides.
The algorithm outlined in Fig. 4 (the function [mpD] = SdS(meD,mpD,p2N)
called at the beginning of each cycle; see Fig. 4) represents the vectorized solution of the computation of basis functions and their derivatives.
Coordinates of the material points mpD.x(:,1:2)
are first replicated and then subtracted by their associated nodes' coordinates, e.g. meD.x(p2N)
and meD.y(p2N)
respectively (lines 3 or 5 in Fig. 4). This yields the array D
with the same dimension of p2N
. This array of distance between the points and their associated nodes is sent as an input to the nested function [N,dN] = NdN(D,h,lp)
, which computes the onedimensional basis function and its derivative through matrix elementwise operations (operator .*
) (either in line 4 for x coordinates or line 6 for y coordinates in Fig. 4).
Given the piecewise Eq. (11), three logical arrays (c1
, c2
and c3
) are defined (lines 21–24 in Fig. 4), whose elements are either one (the condition is true) or zero (the condition is false). Three arrays of basis functions are calculated (N1
, N2
and N3
, lines 26–28) according to Eq. (12). The array of basis functions N
is obtained through a summation of the elementwise multiplications of these temporary arrays with their corresponding logical arrays (line 29 in Fig. 4). The same holds true for the calculation of the gradient basis function (lines 31–34 in Fig. 4). It is faster to use logical arrays as multipliers of precomputed basis function arrays rather than using these in a conditional indexing statement, e.g. N(c2==1) = 1abs(dX(c2==1))./h
. The performance gain is significant between the two approaches, i.e. an intrinsic 30 % gain over the wallclock time of the basis functions and derivatives calculation. We observe an invariance of such gain with respect to the initial number of material points per element or to the mesh resolution.
3.3.2 Integration of internal forces
Another computationally expensive operation for MATLAB© is the mapping (or accumulation) of the material point contributions to their associated nodes. It is performed by the function p2Nsolve.m
in the workflow of the solver.
The standard calculations for the material point contributions to the lumped mass m_{n}, the momentum p_{n}, the external ${\mathit{f}}_{n}^{e}$ and internal ${\mathit{f}}_{n}^{i}$ forces are given by
where m_{p} is the material point mass, v_{p} is the material point velocity, b_{p} is the body force applied to the material point and σ_{p} is the material point Cauchy stress tensor in the Voigt notation.
Once the mapping phase is achieved, the equations of motion are explicitly solved forward in time on the mesh. Nodal accelerations a_{n} and velocities v_{n} are given by
Finally, boundary conditions are applied to the nodes that belong to the boundaries.
The vectorized solution comes from the use of the builtin function accumarray( )
of MATLAB© combined with reshape( )
and repmat( )
. The core of the vectorization is to use p2N
as a vector (i.e. flattening the array p2N(:)
results in a row vector) of subscripts with accumarray
, which accumulates material point contributions (e.g. mass or momentum) that share the same node.
In the function p2Nsolve
(code fragment 2, shown in Fig. 5), the first step is to initialize nodal vectors (e.g. mass, momentum and forces) to zero (lines 4–5 in Fig. 5). Then, temporary vectors (m
, p
, f
and fi
) of material point contributions (namely, mass, momentum, and external and internal forces) are generated (lines 10–17 in Fig. 5). The accumulation (nodal summation) is performed (lines 19–26 in Fig. 5) using either the flattened p2n(:)
or l2g(:)
(e.g. the global indices of nodes) as the vector of subscripts. Note that for the accumulation of material point contributions of internal forces, a short forloop iterates over the associated node (e.g. from 1 to meD.nNe
) of every material point to accumulate their respective contributions.
To calculate the temporary vector of internal forces (fi
at lines 15–17 in Fig. 5), the first step consists of the matrix multiplication of the straindisplacement matrix mpD.B
with the material point stress vector mpD.s
. The vectorized solution is given by (i) elementwise multiplications of mpD.B
with a replication of the transposed stress vector repmat(reshape(mpD.s,size(mpD.s,1),1,
mpD.n),1,meD.nDoF(1))
, whose result is then (ii) summed by means of the builtin function sum( )
along the columns and, finally, multiplied by a replicated transpose of the material point volume vector, e.g. repmat(mpD.V',meD.nDoF(1),1)
.
To illustrate the numerical efficiency of the vectorized multiplication between a matrix and a vector, we have developed an iterative and vectorized solution of B(x_{p})^{T}σ_{p} with an increasing n_{p} and considering single (4 bytes) and double (8 bytes) arithmetic precision. The wallclock time increases with n_{p} with a sharp transition for the vectorized solution around n_{p}≈1000, as shown in Fig. 6a. The mathematical operation requires more memory than available in the L2 cache (1024 kB under the CPU architecture used), which inhibits cache reuse (Dabrowski et al., 2008). A peak performance of at least 1000 Mflops (million floating point operations per second), shown in Fig. 6b, is achieved when n_{p}=1327 or n_{p}=2654 for simple or double arithmetic precision respectively, i.e. it corresponds exactly to 1024 kB for both precisions. Beyond this value, the performance drops dramatically to approximately half of the peak value. This drop is even more severe for a double arithmetic precision.
3.3.3 Update of material point properties
Finally, we propose a vectorization of the function mapN2p.m
that (i) interpolates updated nodal solutions to the material points (velocities and coordinates) and (ii) the doublemapping (DM or MUSL) procedure (see Fern et al., 2019).
The material point velocity v_{p} is defined as an interpolation of the solution of the updated nodal accelerations, which is given by
The material point updated momentum is found by ${\mathit{p}}_{p}^{t+\mathrm{\Delta}t}={m}_{p}{\mathit{v}}_{p}^{t+\mathrm{\Delta}t}$. The doublemapping procedure of the nodal velocity v_{n} consists of the remapping of the updated material point momentum on the mesh, divided by the nodal mass, given as
and for which boundary conditions are enforced. Finally, the material point coordinates are updated based on the following:
To solve for the interpolation of updated nodal solutions to the material points, we rely on a combination of elementwise matrix multiplication between the array of basis functions mpD.S
with the global vectors through a transform of the p2N
array, i.e. iDx=meD.DoF*p2N1
and iDy=iDx+1
(lines 3–4 in code fragment 3 in Fig. 7), which are used to access the x and y components of global vectors.
When accessing global nodal vectors by means of iDx
and iDy
, the resulting arrays are naturally of the same size as p2N
and are, therefore, dimensioncompatible with mpD.S
. For instance, a summation along the columns (e.g. the associated nodes of material points) of an elementwise multiplication of mpD.S
with meD.a(iDx)
results in an interpolation of the x component of the global acceleration vector to the material points.
This procedure is used for the velocity update (line 6 in Fig. 7) and for the material point coordinate update (line 22 in Fig. 7). A remapping of the nodal momentum is carried out (lines 11 to 14 in Fig. 7), which allows for the calculation of the updated nodal incremental displacements (line 15 in Fig. 7). Finally, boundary conditions of nodal incremental displacements are enforced (lines 19–20 in Fig. 7).
3.4 Initial settings and adaptive time step
Regarding the initial setting of the background mesh of the demonstration cases presented in the following, we select a uniform mesh and a regular distribution of material points within the initially populated elements of the mesh. Each element is evenly filled with four material points, e.g. n_{pe}=2^{2}, unless otherwise stated.
In this contribution, Dirichlet boundary conditions are resolved directly on the background mesh, as in the standard finite element method. This implies that boundary conditions are resolved only in contiguous regions between the mesh and the material points. Deviating from this contiguity or having the mesh not aligned with the coordinate system requires specific treatments for boundary conditions (Cortis et al., 2018). Furthermore, we ignore the external tractions as their implementation is complex.
As explicit time integration is only conditionally stable, any explicit formulation requires a small time step Δt to ensure numerical stability (Ni and Zhang, 2020), e.g. smaller than a critical value defined by the Courant–Friedrichs–Lewy (CFL) condition. Hence, we employ an adaptive time step (de Vaucorbeil et al., 2020), which considers the velocity of the material points. The first step is to compute the maximum wave speed of the material using (Zhang et al., 2016; Anderson Jr., 1987)
where the wave speed is $V=\left(\right(K+\mathrm{4}G/\mathrm{3})/\mathit{\rho}{)}^{\frac{\mathrm{1}}{\mathrm{2}}}$, K and G are the bulk and shear moduli respectively, ρ is the material density, and (v_{x})_{p} and (v_{y})_{p} are the material point velocity components. Δt is then restricted by the CFL condition as follows:
where $\mathit{\alpha}\in [\mathrm{0};\mathrm{1}]$ is the time step multiplier, and h_{x} and h_{y} are the mesh spacings.
In this section, we first demonstrate our MATLABbased MPM solver to be efficient at reproducing results from other studies, i.e. the compaction of an elastic column (Coombs et al., 2020; e.g. quasistatic analysis), the cantilever beam problem (Sadeghirad et al., 2011; e.g. large elastic deformation) and an application to landslide dynamics Huang et al., 2015; e.g. elastoplastic behaviour). Then, we present both the efficiency and the numerical performance for a selected case, e.g. the elastoplastic collapse. We offer conclusions on and compare the performance of the solver with respect to the specific case of an impact of two elastic discs previously implemented in a Julia language environment by Sinaie et al. (2017).
Regarding the performance analysis, we investigate the performance gain of the vectorized solver considering a double arithmetic precision with respect to the total number of material points for the following reasons: (i) the mesh resolution, i.e. the total number of elements n_{el}, influences the wallclock time of the solver by reducing the time step due to the CFL condition, thereby increasing the total number of iterations. In addition, (ii) the total number of material points n_{p} increases the number of operations per cycle due to an increase in the size of matrices, i.e. the size of the straindisplacement matrix depends on n_{p} and not on n_{el}. Hence, n_{p} consistently influences the performance of the solver, whereas n_{el} determines the wallclock time of the solver. The performance of the solver is addressed through both the number of floating point operations per second (flops) and via the average number of iteration per second (iterations s^{−1}). The number of floating point operations per second was manually estimated for each function of the solver.
4.1 Validation of the solver and numerical efficiency
4.1.1 Convergence: the elastic compaction of a column under its own weight
Following the convergence analysis proposed by Coombs and Augarde (2020), Wang et al. (2019) and Charlton et al. (2017), we analyse an elastic column of an initial height l_{0}=10 m subjected to an external load (e.g. the gravity). We selected the cpGIMPM variant with a domain update based on the diagonal components of the deformation gradient. Coombs et al. (2020) showed that such a domain update is well suited for hydrostatic compression problems. We also selected the CPDI2q variant as a reference, because of its superior convergence accuracy for such problems compared with GIMPM (Coombs et al., 2020).
The initial geometry is shown in Fig. 8. The background mesh is made of bilinear fournoded quadrilaterals, and roller boundary conditions are applied on the base and the sides of the column, initially populated by four material points per element. The column is 1 element wide and n elements tall, and the number of elements in the vertical direction is increased from 1 to a maximum of 1280 elements. The time step is adaptive, and we selected a time step multiplier of α=0.5, e.g. minimal and maximal time step values of $\mathrm{\Delta}{t}_{\text{min}}=\mathrm{3.1}\times {\mathrm{10}}^{\mathrm{4}}$ s and $\mathrm{\Delta}{t}_{\text{max}}=\mathrm{3.8}\times {\mathrm{10}}^{\mathrm{4}}$ s respectively for the finest mesh of 1280 elements.
To consistently apply the external load for the explicit solver, we follow the recommendation of Bardenhagen and Kober (2004), i.e. a quasistatic solution (given that an explicit integration scheme is chosen) is obtained if the total simulation time is equal to 40 elastic wave transit times. The material has a Young's modulus $E=\mathrm{1}\times {\mathrm{10}}^{\mathrm{4}}$ Pa and a Poisson's ratio ν=0 with a density ρ=80 kg m^{−3}. The gravity g is increased from zero to its final value, i.e. g=9.81 m s^{−2}. We performed additional implicit quasistatic simulations (named iCPDI2q) in order to consistently discuss the results with respect to what was reported in Coombs and Augarde (2020). The external force is consistently applied over 50 equal load steps. The vertical normal stress is given by the analytical solution (Coombs and Augarde, 2020) ${\mathit{\sigma}}_{yy}\left({y}_{\mathrm{0}}\right)=\mathit{\rho}g({l}_{\mathrm{0}}{y}_{\mathrm{0}})$, where l_{0} is the initial height of the column, and y_{0} is the initial position of a point within the column.
The error between the analytical and numerical solutions is as follows:
where (σ_{yy})_{p} is the stress along the y axis of a material point p (Fig. 8) of an initial volume (V_{0})_{p}, and V_{0} is the initial volume of the column, i.e. ${V}_{\mathrm{0}}={\sum}_{p=\mathrm{1}}^{{n}_{p}}({V}_{\mathrm{0}}{)}_{p}$.
The convergence toward a quasistatic solution is shown in Fig. 9a. It is quadratic for both cpGIMPM and CPDI2q; however, contrary to Coombs et al. (2020); Coombs and Augarde (2020), who reported a full convergence, it stops at $\text{error}\approx \mathrm{2}\times {\mathrm{10}}^{\mathrm{6}}$ for the explicit implementation. This has already been outlined by Bardenhagen and Kober (2004) as a saturation of the error caused by resolving the dynamic stress wave propagation, which is inherent to any explicit scheme. Hence, a static solution could never be achieved, because unlike quasistatic implicit methods, the elastic waves propagate indefinitely and the static equilibrium is never resolved. This is consistent when compared to the iCPDI2q solution we implemented, whose behaviour is still converging below the limit $\text{error}\approx \mathrm{2}\times {\mathrm{10}}^{\mathrm{6}}$ reached by the explicit solver. However, the convergence rate of the implicit algorithm decreases as the mesh resolution increases. We did not investigate this as our focus is on the explicit implementation. The vertical stresses of material points are in good agreement with the analytical solution (see Fig. 9b). Some oscillations are observed for a coarse mesh resolution, but these rapidly decrease as the mesh resolution increases.
We finally report the wallclock time for the cpGIMPM (iterative), cpGIMPM (vectorized) and the CPDI2q (vectorized) variants. As claimed by Sadeghirad et al. (2013, 2011), the CPDI2q variant induces no significant computational cost compared to the cpGIMPM variant. However, the absolute value between vectorized and iterative implementations is significant. For n_{p}=2560, the vectorized solution completed in 1161 s, whereas the iterative solution completed in 52 856 s. The vectorized implementation is roughly 50 times faster than the iterative implementation.
4.1.2 Large deformation: the elastic cantilever beam problem
The cantilever beam problem (Sinaie et al., 2017; Sadeghirad et al., 2011) is the second benchmark that demonstrates the robustness of the MPM solver. Two MPM variants are implemented, namely (i) the contiguous GIMPM (cpGIMPM), which relies on the stretching part of the deformation gradient (see Charlton et al., 2017) to update the particle domain as large rotations are expected during the deformation of the beam, and (ii) the convected particle domain interpolation (CPDI, Leavy et al., 2019; Sadeghirad et al., 2011). We selected the CPDI variant as it is more suitable to large torsional deformation modes (Coombs et al., 2020) than the CPDI2q variant. Two constitutive elastic models are selected, i.e. neoHookean (Guilkey and Weiss, 2003) or linear elastic (York et al., 1999) solids. For consistency, we use the same physical quantities as in Sadeghirad et al. (2011), i.e. an elastic modulus E=10^{6} Pa, a Poisson's ratio ν=0.3, a density ρ=1050 kg/m^{3}, the gravity g=10.0 m/s and a realtime simulation t=3 s with no damping forces introduced.
The beam geometry is depicted in Fig. 11 and is discretized by 64 fournoded quadrilaterals, each of them initially populated by nine material points (e.g. n_{p}=576) with a adaptive time step determined by the CFL condition, i.e. the time step multiplier is α=0.1, which yields minimal and maximal time step values of $\mathrm{\Delta}{t}_{\text{min}}=\mathrm{5.7}\times {\mathrm{10}}^{\mathrm{4}}$ s and $\mathrm{\Delta}{t}_{\text{max}}=\mathrm{6.9}\times {\mathrm{10}}^{\mathrm{4}}$ s respectively. The large deformation is initiated by suddenly applying the gravity at the beginning of the simulation, i.e. t=0 s.
As indicated in Sadeghirad et al. (2011), the cpGIMPM simulation failed when using the diagonal components of the deformation gradient to update the material point domain, i.e. the domain vanishes under large rotations as stated in Coombs et al. (2020). However, as expected, the cpGIMPM simulation succeeded when using the diagonal terms of the stretching part of the deformation gradient, as proposed by Coombs et al. (2020) and Charlton et al. (2017). The numerical solutions, obtained by the latter cpGIMPM and CPDI, to the vertical deflection Δu of the material point at the bottom free end of the beam (e.g. the red cross in Fig. 11) are shown in Fig. 12. Some comparative results reported by Sadeghirad et al. (2011) are depicted by black markers (squares for the FEM solution and circles for the CPDI solution), whereas the results of the solver are depicted by lines.
The local minimal and the minimal and maximal values (in timing and magnitude) are in agreement with the FEM solution of Sadeghirad et al. (2011). The elastic response is in agreement with the CPDI results reported by Sadeghirad et al. (2011), but it differs in timing with respect to the FEM solution. This confirms our numerical implementation of CPDI when compared to the one proposed by Sadeghirad et al. (2011). In addition, the elastic response does not substantially differ from a linear elastic solid to a neoHookean one. It demonstrates the incremental implementation of the MPM solver to be relevant in capturing large elastic deformations for the cantilever beam problem.
Figure 13 shows the finite deformation of the material point domain (panel a or c) and the vertical Cauchy stress field (panel b or d) for CPDI and cpGIMPM. The stress oscillations due to the cellcrossing error are partially cured when using a domainbased variant compared with the standard MPM. However, spurious vertical stresses are more developed in Fig. 13d than in Fig. 13b, where the vertical stress field appears even smoother. Both CPDI and cpGIMPM give a decent representation of the actual geometry of the deformed beam.
We also report quite a significant difference in execution time between the CPDI variant compared with the CPDI2q and cpGIMPM variants: CPDI executes in an average of 280.54 iteration per second, whereas both CPDI2q and cpGIMPM execute in an average of 301.42 iterations s^{−1} and an average of 299.33 iterations s^{−1} respectively.
4.1.3 Application: the elastoplastic slumping dynamics
We present an application of the MPM solver (vectorized and iterative version) to the case of landslide mechanics. We selected the domainbased CDPI variant as it performs better than the CPDI2q variant in modelling torsional and stretching deformation modes (Wang et al., 2019) coupled to an elastoplastic constitutive model based on a nonassociated Mohr–Coulomb (MC) plasticity (Simpson, 2017). We (i) analyse the geometrical features of the slump and (ii) compare the results (the geometry and the failure surface) to the numerical simulation of Huang et al. (2015), which is based on a Drucker–Prager model with tension cutoff (DP).
The geometry of the problem is shown in Fig. 14; the soil material is discretized by 110×35 elements with n_{pe}=9, resulting in n_{p}=21 840 material points. A uniform mesh spacing ${h}_{x,y}=\mathrm{1}$ m is used, and rollers are imposed at the left and right domain limits, while a noslip condition is enforced at the base of the material. We closely follow the numerical procedure proposed in Huang et al. (2015), i.e. no local damping is introduced in the equation of motion and the gravity is suddenly applied at the beginning of the simulation. As in Huang et al. (2015), we consider an elastoplastic cohesive material of density ρ=2100 kg m^{3}, with an elastic modulus E=70 MPa and a Poisson's ratio ν=0.3. The cohesion is c=10 Pa, and the internal friction angle is ϕ=20^{∘} with no dilatancy, i.e. the dilatancy angle is ψ=0. The total simulation time is 7.22 s, and we select a time step multiplier α=0.5. The adaptive time steps (considering the elastic properties and the mesh spacings ${h}_{x,y}=\mathrm{1}$ m) yield minimal and maximal values of $\mathrm{\Delta}{t}_{\text{min}}=\mathrm{2.3}\times {\mathrm{10}}^{\mathrm{3}}$ s and $\mathrm{\Delta}{t}_{\text{max}}=\mathrm{2.4}\times {\mathrm{10}}^{\mathrm{3}}$ s respectively.
The numerical solution to the elastoplastic problem is shown in Fig. 15. An intense shear zone, highlighted by the second invariant of the accumulated plastic strain ϵ_{II}, develops at the toe of the slope as soon as the material yields and propagates backwards to the top of the material. It results in a rotational slump. The failure surface is in good agreement with the solution reported by Huang et al. (2015) (continuous and discontinuous red lines in Fig. 15), but we also observe differences, i.e. the crest of the slope is lower compared with the original work of Huang et al. (2015). This may be explained by the problem of spurious material separation when using sMPM or GIMPM (Sadeghirad et al., 2011), with the latter being overcome with the CPDI variant, i.e. the crest of the slope experiences considerable stretching deformation modes. Despite some differences, our numerical results appear coherent with those reported by Huang et al. (2015).
The vectorized and iterative solutions are resolved within approximately 630 s (a wallclock time of ≈ 10 min and an average of 4.20 iterations s^{−1}) and 14 868 s (a wallclock time of ≈ 1 h and an average of 0.21 iterations s^{−1}) respectively. This corresponds to a performance gain of 23.6. The performance gain is significant between an iterative and a vectorized solver for this problem.
4.2 Computational performance
4.2.1 Iterative and vectorized elastoplastic collapses
We evaluate the computational performance of the solver, using the MATLAB version R2018a on an Intel Core i74790, with a benchmark based on the elastoplastic collapse of the aluminiumbar assemblage, for which numerical and experimental results were initially reported by Bui et al. (2008) and Huang et al. (2015) respectively.
We vary the number of elements of the background mesh, which results in a variety of different regular mesh spacings h_{x,y}. The number of elements along the x and y directions are ${n}_{\mathrm{el},x}=[\mathrm{10},\mathrm{20},\mathrm{40},\mathrm{80},\mathrm{160},\mathrm{320},\mathrm{640}]$ and ${n}_{\mathrm{el},y}=[\mathrm{1},\mathrm{2},\mathrm{5},\mathrm{11},\mathrm{23},\mathrm{47},\mathrm{95}]$ respectively. The number of material points per element is kept constant, i.e. n_{pe}=4, and this yields a total number of material points ${n}_{p}=[\mathrm{10},\mathrm{50},\mathrm{200},\mathrm{800},\mathrm{3200},\mathrm{12}\phantom{\rule{0.125em}{0ex}}\mathrm{800},\mathrm{51}\phantom{\rule{0.125em}{0ex}}\mathrm{200}]$. The initial geometry and boundary conditions used for this problem are depicted in Fig. 16. The total simulation time is 1.0 s, and the time step multiplier is α=0.5. According to Huang et al. (2015), the gravity g=9.81 m s^{−2} is applied to the assemblage, and no damping is introduced. We consider a noncohesive granular material (Huang et al., 2015) of density ρ=2650 kg m^{3}, with a bulk modulus K=0.7 MPa and a Poisson's ratio ν=0.3. The cohesion is c=0 Pa, the internal friction angle is ϕ=19.8^{∘} and there is no dilatancy, i.e. ψ=0.
We conducted preliminary investigations using either uGIMPM or cpGIMPM variants – the latter with a domain update based either on the determinant of the deformation gradient or on the diagonal components of the stretching part of the deformation gradient. We concluded that the uGIMPM was the most reliable, even though its suitability is restricted to both simple shear and pure rotation deformation modes (Coombs et al., 2020).
We observe a good agreement between the numerical simulation and the experiments (see Fig. 17), considering either the final surface (blue square dotted line) or the failure surface (blue circle dotted line). The repose angle in the numerical simulation is approximately 13^{∘}, which is in agreement with the experimental data reported by Bui et al. (2008), who reported a final angle of 14^{∘}.
The vectorized and iterative solutions (for a total of n_{p}=12 800 material points) are resolved within approximately 1595 s (a wallclock time of ≈ 0.5 h and an average of 10.98 iterations s^{−1}) and 43 861 s (a wallclock time of ≈ 12 h and an average of 0.38 iterations s^{−1}) respectively. This corresponds to a performance gain of 28.24 for a vectorized code over an iterative code to solve this elastoplastic problem.
The performance of the solver is demonstrated in Fig. 18. A peak performance of ≈ 900 Mflops is reached as soon as n_{p} exceeds 1000 material points, and a residual performance of ≈ 600 Mflops is further resolved (for n_{p}≈50 000 material points). Every function provides an even and fair contribution on the overall performance, except the function constitutive.m
for which the performance appears delayed or shifted. First of all, this function treats the elastoplastic constitutive relation, in which the dimensions of the matrices are smaller when compared with the other functions. Hence, the number of floating point operations per second is lower compared with other functions, e.g. p2Nsolve.m
. This results in lower performance for an equivalent number of material points. It also requires a greater number of material points to increase the dimensions of the matrices in order to exceed the L2 cache maximum capacity.
These considerations provide a better understanding of the performance gain of the vectorized solver shown in Fig. 19: the gain increases, reaches a plateau and then ultimately decreases to a residual gain. This is directly related to the peak and the residual performance of the solver showed in Fig. 18.
4.2.2 Comparison between Julia and MATLAB
We compare the computational efficiency of the vectorized CPDI2q MATLAB implementation and the computational efficiency reported by Sinaie et al. (2017) of a Juliabased implementation of the collision of two elastic discs problem. However, we note a difference between the actual implementation and the one used by Sinaie et al. (2017); the latter is based on a USL variant with a cutoff algorithm, whereas the present implementation relies on the MUSL (or doublemapping) procedure, which necessitates a doublemapping procedure. The initial geometry and parameters are the same as those used in Sinaie et al. (2017). However, the time step is adaptive, and we select a time step multiplier α=0.5. Given the variety of mesh resolution, we do not present minimal and maximal time step values.
Our CPDI2q implementation, in MATLAB R2018a, is at least 2.8 times faster than the Julia implementation proposed by Sinaie et al. (2017) for similar hardware (see Table 1). Sinaie et al. (2017) completed the analysis with an Intel Core i76700 (four cores with a base frequency of 3.40 GHz up to a turbo frequency of 4.00 GHz) with 16 GB RAM, whereas we used an Intel Core i74790 with similar specifications (see Sect. 2). However, the performance ratio between MATLAB and Julia seems to decrease as the mesh resolution increases.
In this contribution, a fast and efficient explicit MPM solver is proposed that considers two variants (e.g. the uGIMPM/cpGIMPM and the CPDI/CPDI2q variants).
Regarding the compression of the elastic column, we report good agreement of the numerical solver with previous explicit MPM implementations, such as Bardenhagen and Kober (2004). The same flaw of an explicit scheme is also experienced by the solver, i.e. a saturation of the error due to the specific usage of an explicit scheme that resolves the wave propagation, thereby preventing any static equilibrium from being reached. This confirms that our implementation is consistent with previous MPM implementations. However, the implicit implementation suffers from a decrease in the convergence rate for a fine mesh resolution. Further work would be needed to investigate this decrease in the convergence rate. This case also demonstrated that cpGIMPM and CPDI variants have a similar computational cost, and this confirms the suitability of cpGIMPM with respect to CPDI, as previously mentioned by Coombs et al. (2020) and Charlton et al. (2017).
For the cantilever beam, we report good agreement of the solver with the results of Sadeghirad et al. (2011), i.e. we report the vertical deflection of the beam to be very close in both magnitude and timing (for the CPDI variant) to the FEM solution. However, we also report a slower execution time for the CPDI variant when compared with both the cpGIMPM and CPDI2q variants.
The elastoplastic slump also demonstrates the solver to be efficient at capturing complex dynamics in the field of geomechanics. The CDPI solution showed that the algorithm proposed by Simpson (2017) to return stresses when the material yields is well suited to the slumping dynamics. However, as mentioned by Simpson (2017), such return mapping is only valid under the assumption of a nonassociated plasticity with no volumetric plastic strain. This particular case of isochoric plastic deformations raises the issue of volumetric locking. In the actual implementation, no regularization techniques are considered. As a result, the pressure field experiences severe locking for isochoric plastic deformations. One way to overcome locking phenomena would be to implement the regularization technique initially proposed by Coombs et al. (2018) for quasistatic sMPM and GIMPM implementations.
Regarding the elastoplastic collapse, the numerical results demonstrate the solver to be in agreement with both previous experimental and numerical results (Huang et al., 2015; Bui et al., 2008). This confirms the ability of the solver to address elastoplastic problems. However, the choice of whether to update the material point domain or not remains critical. This question remains open and would require a more thorough investigation of the suitability of each of these domainupdating variants. Nevertheless, the uGIMPM variant is a good candidate as (i) it is able to reproduce the experimental results of Bui et al. (2008), and (ii) it ensures numerical stability. However, one must consider its limited range of suitability regarding the deformation modes involved. If a cpGIMPM is selected, the splitting algorithm proposed in Gracia et al. (2019) and Homel et al. (2016) could be implemented to mitigate the amount of distortion experienced by the material point domains during deformation. We did not selected the domainupdating method based on the corners of the domain as suggested in Coombs et al. (2020). This is because a domainupdating method necessitates the calculation of additional shape functions between the corners of the domain of the material point with their associated nodes. This results in an additional computational cost. Nevertheless, such a variant is of interest and should also be addressed when computational performance is not the main concern.
The computational performance comes from the combined use of the connectivity array p2N
with the builtin function accumarray( )
to (i) accumulate material point contributions to their associated nodes or (ii) to interpolate the updated nodal solutions to the associated material points. When a residual performance is resolved, an overall performance gain (e.g. the number of iterations per second) of 28 is reported. As an example, the functions p2nsolve.m
and mapN2p.m
are 24 and 22 times faster than an iterative algorithm when the residual performance is achieved. The overall performance gain is in agreement with other vectorized FEM codes, i.e. O’Sullivan et al. (2019) reported an overall gain of 25.7 for a optimized continuous Galerkin finite element code.
An iterative implementation would require multiple nested forloops and a larger number of operations on smaller matrices, which increase the number of BLAS calls, thereby inducing significant BLAS overheads and decreasing the overall performance of the solver. This is limited by a vectorized code structure. However, as shown by the matrix multiplication problem, the L2 cache reuse is the limiting factor, and it ultimately affects the peak performance of the solver due to these numerous RAMtocache communications for larger matrices. This problem is serious, and its influence is demonstrated by the delayed response in terms of performance for the function constitutive.m
.
However, we also have to mention that the overall residual performance was resolved only for a limited total number of material points. The performance drop of the function constitutive.m
has never been achieved. Consequently, we suspect an additional decrease in the overall performance of the solver for larger problems.
The overall performance achieved by the solver is higher than expected, and it is even higher than what was reported by Sinaie et al. (2017). We demonstrate that MATLAB is even more efficient than Julia, i.e. a minimum 2.86 performance gain achieved compared with a similar Julia CPDI2q implementation. This confirms the efficiency of MATLAB for solid mechanics problems, provided that a reasonable amount of time is spent on the vectorization of the algorithm.
We have demonstrated the capability of MATLAB as an efficient language with regard to a material point method (MPM) implementation in an explicit formulation when bottleneck operations (e.g. calculations of the shape function or material point contributions) are properly vectorized. The computational performance of MATLAB is even higher than the performance previously reported for a similar CPDI2q implementation in Julia, provided that builtin functions such as accumarray( )
are used. However, the numerical efficiency naturally decreases with the level of complexity of the chosen MPM variant (sMPM, GIMPM or CPDI/CPDI2q).
The vectorization activities that we performed provide a fast and efficient MATLABbased MPM solver. Such vectorized code could be transposed to a more efficient language, such as the CCUDA language, which is known to efficiently take advantage of vectorized operations.
As a final word, a future implementation of a poroelastoplastic mechanical solver could be applied to complex geomechanical problems such as landslide dynamics while benefiting from a faster numerical implementation in CCUDA, thereby resolving high threedimensional resolutions in a decent and affordable amount a time.
PIC  Particle in cell 
FLIP  Fluid implicit particle 
FEM  Finite element method 
sMPM  Standard material point method 
GIMPM  Generalized material point method 
uGIMPM  Undeformed generalized material point 
method  
cpGIMPM  Contiguous particle generalized material 
point method  
CPDI  Convected particle domain interpolation 
CPDI2q  Convected particle domain interpolation 
secondorder quadrilateral 
The fMPMMsolver developed in this study is licensed under the GPLv3 free software licence. The latest version of the code is available for download from Bitbucket at https://bitbucket.org/ewyser/fmpmmsolver/src/master/ (last access: 6 October 2020; Wyser et al., 2020b). The fMPMMsolver archive (v1.0 and v1.1) is available from a permanent DOI repository (Zenodo) at https://doi.org/10.5281/zenodo.4068585 (Wyser et al., 2020a). The fMPMMsolver software includes the reproducible codes used for this study.
EW wrote the original draft of the paper. EW and YP developed the first version the solver (fMPMMsolver, v1.0). YA provided technical support, assisted EW with the revision of the latest version of the solver (v1.1) and corrected specific parts of the solver. EW and YA wrote and revised the final version of the paper. MJ and YP supervised the early stages of the study and provided guidance. All authors reviewed and approved the final version of the paper.
The authors declare that they have no conflict of interest.
Yury Alkhimenkov gratefully acknowledges support from the Swiss National Science Foundation (grant no. 172691). Yury Alkhimenkov and Yury Y. Podladchikov gratefully acknowledge support from the Russian Ministry of Science and Higher Education (project no. 0751520191890). The authors gratefully thank Johan Gaume for his comments that contributed to improving the overall quality of the article.
This research has been supported by the Swiss National Science Foundation (grant no. 172691). This research has been supported by the Russian Ministry of Science and Higher Education (project no. 0751520191890).
This paper was edited by Alexander Robel and reviewed by two anonymous referees.
Abe, K., Soga, K., and Bandara, S.: Material point method for coupled hydromechanical problems, J. Geotechn. Geoenviron. Eng., 140, 04013033, https://doi.org/10.1061/(ASCE)GT.19435606.0001011, 2014. a
Acosta, J. L. G., Vardon, P. J., Remmerswaal, G., and Hicks, M. A.: An investigation of stress inaccuracies and proposed solution in the material point method, Comput. Mechan., 65, 555–581, 2020. a, b
Anderson Jr., C. E.: An overview of the theory of hydrocodes, Int. J. Impact Eng., 5, 33–59, 1987. a
Bandara, S. and Soga, K.: Coupling of soil deformation and pore fluid flow using material point method, Comput. Geotech., 63, 199–214, 2015. a
Bandara, S., Ferrari, A., and Laloui, L.: Modelling landslides in unsaturated slopes subjected to rainfall infiltration using material point method, Int. J. Num. Anal. Method. Geomechan., 40, 1358–1380, 2016. a, b
Bardenhagen, S., Brackbill, J., and Sulsky, D.: The materialpoint method for granular materials, Comput. Method. Appl. M., 187, 529–541, 2000. a
Bardenhagen, S. G. and Kober, E. M.: The generalized interpolation material point method, Comp. Model. Eng., 5, 477–496, 2004. a, b, c, d, e, f, g, h, i, j
Baumgarten, A. S. and Kamrin, K.: A general fluid–sediment mixture model and constitutive theory validated in many flow regimes, J. Fluid Mechan., 861, 721–764, 2019. a
Beuth, L., Benz, T., Vermeer, P. A., and Więckowski, Z.: Large deformation analysis using a quasistatic material point method, J. Theor. Appl. Mechan., 38, 45–60, 2008. a
Bird, R. E., Coombs, W. M., and Giani, S.: Fast nativeMATLAB stiffness assembly for SIPG linear elasticity, Comput. Mathe. Appl., 74, 3209–3230, 2017. a, b, c, d, e
Bui, H. H., Fukagawa, R., Sako, K., and Ohno, S.: Lagrangian meshfree particles method (SPH) for large deformation and failure flows of geomaterial using elastic–plastic soil constitutive model, Int. J. Num. Anal. Method. Geomechan., 32, 1537–1570, 2008. a, b, c, d, e
Charlton, T., Coombs, W., and Augarde, C.: iGIMP: An implicit generalised interpolation material point method for large deformations, Comput. Struct., 190, 108–125, 2017. a, b, c, d, e, f, g
Coombs, W. M. and Augarde, C. E.: AMPLE: A Material Point Learning Environment, Adv. Eng. Softw., 139, 102748, https://doi.org/10.1016/j.advengsoft.2019.102748, 2020. a, b, c, d, e
Coombs, W. M., Charlton, T. J., Cortis, M., and Augarde, C. E.: Overcoming volumetric locking in material point methods, Comput. Method. Appl. Mechan., 333, 1–21, 2018. a, b
Coombs, W. M., Augarde, C. E., Brennan, A. J., Brown, M. J., Charlton, T. J., Knappett, J. A., Motlagh, Y. G., and Wang, L.: On Lagrangian mechanics and the implicit material point method for large deformation elastoplasticity, Comput. Method. Appl. Mechan., 358, 112622, https://doi.org/10.1016/j.cma.2019.112622, 2020. a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p
Cortis, M., Coombs, W., Augarde, C., Brown, M., Brennan, A., and Robinson, S.: Imposition of essential boundary conditions in the material point method, Int. J. Num. Method., 113, 130–152, 2018. a
Dabrowski, M., Krotkiewski, M., and Schmid, D.: MILAMIN: MATLABbased finite element method solver for large problems, Geochem. Geophys. Geosyst., 9, 4, https://doi.org/10.1029/2007GC001719, 2008. a, b, c, d, e, f, g
Davis, T. A.: Suite Sparse, available at: https://people.engr.tamu.edu/davis/research.html (last access: 6 October 2020), 2013. a
de Koster, P., Tielen, R., Wobbes, E., and Möller, M.: Extension of Bspline Material Point Method for unstructured triangular grids using Powell–Sabin splines, Comput. Part. Mechan., 1–16, https://doi.org/10.1007/s40571020003283, 2020. a
de Souza Neto, E. A., Peric, D., and Owen, D. R.: Computational methods for plasticity: theory and applications, John Wiley & Sons, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom, 2011. a
de Vaucorbeil, A., Nguyen, V., and Hutchinson, C.: A TotalLagrangian Material Point Method for solid mechanics problems involving large deformations, Computer Methods in Applied Mechanics and Engineering, 360, https://doi.org/10.1016/j.cma.2019.112783, 2020. a
Dunatunga, S. and Kamrin, K.: Continuum modelling and simulation of granular flows through their many phases, J. Fluid Mechan., 779, 483–513, 2015. a
Dunatunga, S. and Kamrin, K.: Continuum modeling of projectile impact and penetration in dry granular media, J. Mechan. Phys. Solids, 100, 45–60, 2017. a, b, c
Fern, J., Rohe, A., Soga, K., and Alonso, E.: The Material Point Method for Geotechnical Engineering. Boca Raton: CRC Press, https://doi.org/10.1201/9780429028090, 2019. a
Gan, Y., Sun, Z., Chen, Z., Zhang, X., and Liu, Y.: Enhancement of the material point method using Bspline basis functions, Int. J. Num. Method., 113, 411–431, 2018. a
Gaume, J., Gast, T., Teran, J., van Herwijnen, A., and Jiang, C.: Dynamic anticrack propagation in snow, Nat. Commun., 9, 1–10, 2018. a, b, c
Gaume, J., van Herwijnen, A., Gast, T., Teran, J., and Jiang, C.: Investigating the release and flow of snow avalanches at the slopescale using a unified model based on the material point method, Cold Reg. Sci. Technol., 168, 102847, https://doi.org/10.1016/j.coldregions.2019.102847, 2019. a
Gracia, F., Villard, P., and Richefeu, V.: Comparison of two numerical approaches (DEM and MPM) applied to unsteady flow, Comput. Part. Mechan., 6, 591–609, 2019. a
Guilkey, J. E. and Weiss, J. A.: Implicit time integration for the material point method: Quantitative and algorithmic comparisons with the finite element method, Int. J. Num. Method., 57, 1323–1338, 2003. a, b, c
Homel, M. A., Brannon, R. M., and Guilkey, J.: Controlling the onset of numerical fracture in parallelized implementations of the material point method (MPM) with convective particle domain interpolation (CPDI) domain scaling, Int. J. Num. Method., 107, 31–48, 2016. a
Huang, P., Li, S.l., Guo, H., and Hao, Z.m.: Large deformation failure analysis of the soil slope based on the material point method, Comput. Geosci., 19, 951–963, 2015. a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p
Iaconeta, I., Larese, A., Rossi, R., and Guo, Z.: Comparison of a material point method and a galerkin meshfree method for the simulation of cohesivefrictional materials, Materials, 10, 1150, 2017. a
Leavy, R., Guilkey, J., Phung, B., Spear, A., and Brannon, R.: A convectedparticle tetrahedron interpolation technique in the materialpoint method for the mesoscale modeling of ceramics, Comput. Mechan., 64, 563–583, 2019. a
Moler, C.: MATLAB Incorporates LAPACK, available at: https://ch.mathworks.com/de/company/newsletters/articles/matlabincorporateslapack.html?refresh=true (last access: 6 October 2020), 2000. a
Nairn, J. A.: Material point method calculations with explicit cracks, Comput. Model. Eng. Sci., 4, 649–664, 2003. a, b
Ni, R. and Zhang, X.: A precise critical time step formula for the explicit material point method, Int. J. Num. Method., 121, 4989–5016, 2020. a
O’Sullivan, S., Bird, R. E., Coombs, W. M., and Giani, S.: Rapid nonlinear finite element analysis of continuous and discontinuous galerkin methods in matlab, Comput. Mathe. Appl., 78, 3007–3026, 2019. a, b, c
Sadeghirad, A., Brannon, R. M., and Burghardt, J.: A convected particle domain interpolation technique to extend applicability of the material point method for problems involving massive deformations, Int. J. Num. Method., 86, 1435–1456, 2011. a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r
Sadeghirad, A., Brannon, R., and Guilkey, J.: Secondorder convected particle domain interpolation (CPDI2) with enrichment for weak discontinuities at material interfaces, Int. J. Num. Method., 95, 928–952, 2013. a, b, c, d, e, f
Simpson, G.: Practical finite element modeling in earth science using matlab, Wiley Online Library, 2017. a, b, c, d, e
Sinaie, S., Nguyen, V. P., Nguyen, C. T., and Bordas, S.: Programming the material point method in Julia, Adv. Eng. Softw., 105, 17–29, 2017. a, b, c, d, e, f, g, h, i, j
Steffen, M., Kirby, R. M., and Berzins, M.: Analysis and reduction of quadrature errors in the material point method (MPM), Int. J. Num. Method., 76, 922–948, 2008a. a, b
Steffen, M., Wallstedt, P., Guilkey, J., Kirby, R., and Berzins, M.: Examination and analysis of implementation choices within the material point method (MPM), Comput. Model. Eng. Sci., 31, 107–127, 2008b. a
Stomakhin, A., Schroeder, C., Chai, L., Teran, J., and Selle, A.: A material point method for snow simulation, ACM Transactions on Graphics (TOG), 32, 1–10, 2013. a
Sulsky, D., Chen, Z., and Schreyer, H. L.: A particle method for historydependent materials, Comput. Method. Appl. Mechan. Eng., 118, 179–196, 1994. a, b
Sulsky, D., Zhou, S.J., and Schreyer, H. L.: Application of a particleincell method to solid mechanics, Comput. Phys. Commun., 87, 236–252, 1995. a
Vardon, P. J., Wang, B., and Hicks, M. A.: Slope failure simulations with MPM, J. Hydrodynam., 29, 445–451, 2017. a
Vermeer, P. A. and De Borst, R.: Nonassociated plasticity for soils, concrete and rock, HERON, 29, 1984, 163–196, 1984. a
Wallstedt, P. C. and Guilkey, J.: An evaluation of explicit time integration schemes for use with the generalized interpolation material point method, J. Computat. Phys., 227, 9628–9642, 2008. a
Wang, B., Hicks, M., and Vardon, P.: Slope failure analysis using the random material point method, Géotech. Lett. 6, 113–118, 2016a. a
Wang, B., Vardon, P., and Hicks, M.: Investigation of retrogressive and progressive slope failure mechanisms using the material point method, Comput. Geotech., 78, 88–98, 2016b. a
Wang, B., Vardon, P. J., Hicks, M. A., and Chen, Z.: Development of an implicit material point method for geotechnical applications, Comput. Geotech., 71, 159–167, 2016c. a, b
Wang, L., Coombs, W. M., Augarde, C., Cortis, E. M., Charlton, T. J., Brown, M. J., Knappett, J., Brennan, A., Davidson, C., Richards, and Blake, D. A.: On the use of domainbased material point methods for problems involving large distortion, Comput. Method. Appl. Mechan. Eng., 355, 1003–1025, 2019. a, b, c, d
Więckowski, Z.: The material point method in large strain engineering problems, Comput. Method. Appl. Mechan. Eng., 193, 4417–4438, 2004. a
Wyser, E., Alkhimenkov, Y., Jayboyedoff, M., and Podladchikov, Y.: fMPMMsolver, Zenodo, https://doi.org/10.5281/zenodo.4068585, 2020a. a
Wyser, E., Alkhimenkov, Y., Jayboyedoff, M., and Podladchikov, Y.: fMPMM, available at: https://bitbucket.org/ewyser/fmpmmsolver/src/master/, last access: 6 October 2020. a
York, A. R., Sulsky, D., and Schreyer, H. L.: The material point method for simulation of thin membranes, Int. J. Num. Method., 44, 1429–1456, 1999. a
Zhang, X., Chen, Z., and Liu, Y.: The material point method: a continuumbased particle method for extreme loading cases, Academic Press, ©2017 Tsinghua University Press Limited, Elsevier Inc., 2016. a, b
 Abstract
 Introduction
 Overview of the material point method (MPM)
 MATLABbased MPM implementation
 Results
 Discussion
 Conclusions
 Appendix A: Acronyms used throughout the paper
 Appendix B: fMPMMsolver variables
 Code availability
 Author contributions
 Competing interests
 Acknowledgements
 Financial support
 Review statement
 References
 Abstract
 Introduction
 Overview of the material point method (MPM)
 MATLABbased MPM implementation
 Results
 Discussion
 Conclusions
 Appendix A: Acronyms used throughout the paper
 Appendix B: fMPMMsolver variables
 Code availability
 Author contributions
 Competing interests
 Acknowledgements
 Financial support
 Review statement
 References