Fast and efficient MATLAB-based MPM solver (fMPMM-solver v1.0)

In this contribution, we present an efficient MATLAB-based implementation of the material point method (MPM) and its most recent variants. MPM has gained popularity over the last decade, especially for problems in solid mechanics in which large deformations are involved, i.e., cantilever beam problems, granular collapses and even large-scale snow avalanches. Although its numerical accuracy is lower than that of the widely accepted finite element method (FEM), MPM has been proven useful in overcoming 5 some of the limitations of FEM, such as excessive mesh distortions. We demonstrate that MATLAB is an efficient high-level language for MPM implementations that solve elasto-dynamic and elasto-plastic problems, such as the cantilever beam and granular collapses, respectively. We report a computational efficiency factor of 20 for a vectorized code compared to a classical iterative version. In addition, the numerical efficiency of the solver surpassed those of previously reported MPM implementations in Julia, ad minima 2.5 times faster under a similar computational 10 architecture.

A. Points to Nodes Projection t t + ∆t C. Nodes to Points Interpolation B. Nodal solution Figure 1. Typical calculation cycle of an MPM solver for an homogeneous velocity field, inspired by Dunatunga and Kamrin (2017). The continuum (orange) is discretized into a set of Lagrangian material points (red dots), at which state variables or properties (e.g., mass, stress, and velocity) are defined. The latter are mapped to an Eulerian finite element mesh made of nodes (blue square).
Since the 1990's, several variants were introduced to resolve a number of numerical issues. The generalized interpolation material point method (GIMPM) was first presented by Bardenhagen and Kober (2004) and proposed a generalization of the basis and gradient functions that were convoluted with a characteristic domain function of the material point. A major flaw in sMPM is the lack of continuity of the gradient basis function, resulting in spurious oscillations of internal forces as soon as a material point crosses an element boundary while entering into its neighbour. This is referred to as cell-crossing instabilities 60 due to the C 0 continuity of the gradient basis functions, and it is minimized by the GIMPM variant. This variant allows the material point's domain to deform (cpGIMPM) or not (uGIMPM) according to its deformation gradient tensor (see Coombs et al. (2020)). Hence, GIMPM is categorized as a domain-based material point method, unlike the later development of the B-spline material point method (BSMPM, see Gaume et al. (2018); Stomakhin et al. (2013); Steffen et al. (2008b)) which cures cell-crossing instabilities using B-spline functions as basis functions. Whereas in sMPM only nodes belonging to an Nodes associated with the material point are denoted by filled blue squares, and the element number appears in green in the center of the element. The connectivity array between the material point and the element is p2e, the array between the element and its associated nodes is e2N and the array between the material point and its associated nodes is p2N.
condition to ensure numerical stability. The domain update is based on the diagonal components of the stretch tensor of the material point (Charlton et al. (2017)), which ensures optimal convergence properties according to Coombs et al. (2020).
Additionally, we implemented a CPDI/CPDI2q version (in an explicit and quasi-static implicit format) of the solver. However, in this paper, we do not present the theoretical background of the CPDI variant nor the implicit implementation of an MPM-based solver, and therefore, interested readers are referred to the original contributions of Sadeghirad et al. (2013Sadeghirad et al. ( , 2011 and Charlton et al. (2017); Iaconeta et al. (2017); Beuth et al. (2008); Guilkey and Wiess (2003), respectively.

Structure of the MPM solver
The solver procedure is shown in Fig. 3. A typical MPM calculation cycle consists of the three following steps Wang et al. In the main.m script, both functions meSetup.m and mpSetup.m, respectively, define the geometry and related quantities, particularly the nodal connectivity array, i.e., the e2N array (Simpson (2017)). The latter stores the nodes associated with a given element. As such, a material point p located in an element e can immediately identify which nodes N it is associated with. After initialization, a while loop solves the elasto-dynamic (or elasto-plastic) problem until a time criterion T is reached.

95
This time criterion could be restricted to the time needed for the system to reach an equilibrium, or if the global kinetic energy of the system has reached a threshold.
At the beginning of each cycle, a connectivity array p2e between the material points and their respective element (a material point can only reside in a single element) is created. Since i) the nodes associated with the elements and ii) the elements enclosing the material points are known, it is possible to obtain the connectivity array p2N between the material points and 100 their associated nodes, e.g., p2N=e2N(p2e,:) in a MATLAB syntax (see Fig. 2 for an example of these connectivity arrays).
This array is of dimension (n p , n N e ), with n p the total number of material points, n N e the total number of nodes associated with an element (16 in two-dimensional problems) and n i,j the node number where i corresponds to the material point and j corresponds to its j-th associated nodes, which results in the following: n 1,2 · · · n 1,n N e n 2,1 n 2,2 · · · n 2,n N e . . . . . . . . . . . .
The following functions are called successively during one cycle: 1. SdS.m calculates the basis functions and the gradient of the basis functions and assembles the strain-displacement matrix.
2. p2Nsolve.m projects the quantities of the material point (e.g., mass and momentum) to the associated nodes, solves the equations of motion and sets Dirichlet boundary conditions. 3. mapN2p.m interpolates nodal solutions (acceleration and velocity) to the material points with a double mapping procedure (see Zhang et al. (2017) or Nairn (2003) for a clear discussion of USF, USL and MUSL algorithms).

4.
DefUpdate.m updates incremental strains and deformation-related quantities (e.g., the volume of the material point or the domain half-length) at the level of the material point based on the remapping of the updated material point momentum.
115 5. constitutive.m calls two functions to solve for the constitutive elasto-plastic relation, i.e., (a) elastic.m, which predicts an incremental objective stress-rate (the Jaumann stress-rate is selected for its ease of implementation and its acknowledged accuracy) considering a purely elastic step, further corrected by (b) plastic.m, which corrects the trial stress by a plastic correction if the material has yielded.
When a time criterion is met, the calculation cycle stops and further post-processing tasks (visualization, data exportation) can 120 be performed.
The numerical simulations are conducted using MATLAB © R2018a within a Windows 7 64-bit environment on an Intel Core i7-4790 (4th generation CPU with 4 physical cores of base frequency at 3.60 GHz up to a maximum turbo frequency of 4.00 GHz) with 8 MB cache and 16 GB DDR3 RAM (clock speed 800 MHz).
function N I (x p ) of the mesh, which results in: with l p the length of the material point domain, h the mesh spacing, x = x p − x I where x p is the coordinate of a material point and x I the coordinate of its associated node I. The basis function of a node I with its material point p is constructed for a two-dimensional model, as follows: for which the derivative is defined as: Within the GIMPM variant, uGIMPM (undeformed GIMPM) and cpGIMPM (contiguous particle GIMPM) can be chosen to update the material point domain length l 0 p , i.e., l p = l 0 p for the undeformed GIMPM, l i,p = det(F jk )l 0 i,p or l i,p = F ii l 0 i,p where F ii is the diagonal component of the deformation gradient for the cpGIMPM. However, Charlton et al. (2017) recommend using the diagonal components U ii = (F kj F ki ) 0.5 of the stretch part of the deformation gradient instead of the deformation 140 gradient. Hence, the variant we use relies either on the determinant or on the stretch part of the deformation gradient.
Similar to the FEM, the strain-displacement matrix B consists of the derivatives of the basis function and is assigned to each material point, which results in the following: where n N e is the total number of associated nodes to an element e, in which a material point p resides.  Coordinates of the material points mpD.x(:,1:2) are first replicated and then subtracted by their associated nodes co-150 ordinates, e.g., meD.x(p2N) or meD.y(p2N) (Lines 3 or 4 in 4). This yields the array D with the same dimension of p2N. This array of distance between the points and their associated nodes is sent as an input to the local function [N,dN] = NdN(D,h,lp), which computes 1D basis functions and derivatives through matrix piecewise operations (operator . * ) (either in Line 4 for x coordinates or Line 6 for y coordinates in Fig. 4). Note that the use of one single array D avoids a redundant usage of memory.

155
Given the piecewise Eq. 2, three logical arrays (c1, c2 and c3) are defined (Lines 28-30 in Fig. 4), whose elements are either 1 (the condition is true) or 0 (the condition is false). Three arrays of basis functions are calculated (N1, N2 and N3, Lines 32-34) according to Eq. 3. The array of basis functions N is obtained through a summation of the elementwise multiplications of these temporary arrays with their corresponding logical arrays (Line 35 in Fig. 4). The same holds true for the calculation of the gradient basis function (Lines 37-40 in Fig. 4). Furthermore, it is faster to use logical arrays as multipliers of precomputed basis 160 function arrays rather than using these in a conditional indexing statement, e.g., N(c2==1) = 1-abs(dX (c2==1))./h.
The performance gain is significant, i.e., a 30 % gain over a calculation cycle.

Integration of internal forces
Another computationally expensive operation for MATLAB © is the mapping (or accumulation) of the material point contributions to their associated nodes. It is performed by the function p2Nsolve.m in the workflow of the solver.

165
The standard calculations for the material point contributions to the lumped mass m I , the momentum p I , the external f e I and internal f i I forces are given by: with m p the material point mass, v p the material point velocity, b p the body force applied to the material point and σ p the material point Cauchy stress tensor in the Voigt notation.
Once the mapping phase is achieved, the equations of motions are explicitly solved forward in time on the mesh. Nodal accelerations and velocities are given by: Finally, boundary conditions are applied to the nodes that belong to the boundaries. Such an operation corresponds to the summation operator in Eqs. 6-9. The core of the vectorization is to use p2N as the vector (i.e., flattening the array p2N(:) results in a row vector) of subscripts with accumarray, which accumulates material point contributions (e.g., mass or momentum) that share the same node.
In the function p2Nsolve (Code Fragment 2 shown in Fig. 5), the first step is to initialize nodal vectors (mass, momentum, To illustrate the efficiency of such a procedure, we have developed an iterative and vectorized solution of B(x p ) T σ p with an increasing n p and survey the computational time spent on solving the problem. We report a gain in computational time (close to an order of magnitude faster for one order of increase in magnitude of n p for the vectorized solution).  Finally, the material point coordinates are updated based on the following: This procedure is used for the velocity update (Lines 6-7 in Fig. 7) and for the material point coordinate update (Line 11 in Fig. 7). A remapping of the nodal momentum is carried out (Lines 17 to 20 in Fig. 7), which allows calculating the updated nodal incremental displacements (Line 22 in Fig. 7). Finally, boundary conditions of nodal incremental displacements are enforced (Lines 29-32 in Fig. 7). To consistently apply the external load for the explicit solver, we follow the recommendation of Bardenhagen and Kober (2004), i.e., a quasi-static solution (given an explicit integration scheme is chosen) is obtained if the total simulation time is equal to 40 elastic wave transit times. The material has an elastic modulus E = 1 · 10 6 Pa and a Poisson's ratio ν = 0 with 235 a density ρ = 80 kg m −3 . The gravity g is increased from 0 to its final value, i.e., g = 9.81 m s −2 . We performed additional implicit quasi-static simulations (named iCPDI) in order to consistently discuss the results with respect to what was reported in Coombs and Augarde (2020). The external force is consistently applied over 25 equal load steps. The vertical normal stress is given by the analytical solution (Coombs and Augarde (2020)) σ yy (y 0 ) = ρg(l 0 − y 0 ) where l 0 is the initial height of the column and y 0 is the initial position of a point within the column.

240
The error between the analytical and numerical solutions is as follows: where (σ yy ) p is the vertical stress of a material point p of an initial volume (V 0 ) p and V 0 is the initial volume of the column, i.e., V 0 = np p=1 (V 0 ) p . Figure 9. Linear convergence of the error: a limit is reached at error ≈ 2 · 10 −6 for the explicit solver, whereas the quasi-static solution still converges. This was already demonstrated in Bardenhagen and Kober (2004) as an error saturation due to the explicit scheme, i.e., the equilibrium is never resolved.
The convergence toward a quasi-static solution is shown in Fig. 9. As mentioned by Coombs et al. (2020), it is linear for both 245 cpGIMP and CPDI2q, but contrary to Coombs and Augarde (2020); Coombs et al. (2020) who reported a full convergence, it stops at error ≈ 2 · 10 −6 for the explicit implementation. This was already outlined by Bardenhagen and Kober (2004) as a sat-uration of the error caused by resolving the dynamic stress wave propagation, which is inherent to any explicit scheme. Hence, a static solution could never be achieved because, unlike quasi-static implicit methods, the elastic waves propagate indefinitely and the static equilibrium is never resolved. This is consistent when compared to the iCPDI solution we implemented (green 250 circles), whose behaviour is still converging below the limit error ≈ 2 · 10 −6 reached by the explicit solver. In addition, the convergence becomes quadratic below this limit. It confirms that the error saturation is due to the explicit format and not to our numerical implementation.

Large deformation: the elastic cantilever beam problem
The cantilever beam problem Sinaie et al. (2017) The beam geometry is depicted in Fig. 10 and is discretized by 64 bi-linear four-noded quadrilaterals, each of them initially populated by 3 2 material points (e.g., n p = 576) with a time step determined by the CFL condition, e.g., ∆t = 10 −3 s. The large deformation is initiated by suddenly applying the gravity at the beginning of the simulation, i.e., t = 0 s. Sadeghirad et al. (2011), the cpGIMP simulation failed when using the diagonal components of the defor-265 mation gradient to update the material point domain. However, as expected, the cpGIMP simulation succeeded when using the diagonal terms of the stretch part of the deformation tensor, as proposed by Charlton et al. (2017). The numerical solutions, obtained by the latter cpGIMP and CPDI, to the vertical deflection ∆u of the material point at the bottom free end of the beam (e.g., the red cross in Fig. 10) are shown in Fig. 11. Some comparative results reported by Sadeghirad et al. (2011) are depicted by black markers (squares for the FEM solution and circles for the CPDI solution), whereas the results of our MPM solver are

Elasto-plasticity: the column collapse
We compare our MPM solver with a non-associated plasticity based on a Drucker-Prager model with tension cutoff (Huang et al. (2015)) to the experimental results of an elasto-plastic collapse of a material (e.g., an aluminium-bar assemblage, see Bui 280 et al. (2008)). The numerical implementation is detailed in Huang et al. (2015), and we therefore suggest the interested reader to directly refer to their contribution since we do not describe the elasto-plastic model in this manuscript.
Our elasto-plastic MPM solver closely follows the implementation of Huang et al. (2015), except our solver relies on an MUSL procedure, whereas Huang et al. (2015) selected the USF procedure. The elasto-plastic problem is solved by i) an elastic trial, ii) corrected by a return mapping when the material yields either in shear or in tension or both, assuming a non-285 associated (the dilation angle ψ = 0) perfectly plastic behaviour of the material. Since the MPM variant in Huang et al. (2015) was not clearly stated, we use the uGIMP variant. The reason is the collapse results in extreme deformations, for which a domain update based on the deformation gradient systematically resulted in a failure during the simulation. We conducted preliminary investigations and concluded that the uGIMP variant was the most reliable MPM variant for such a problem. and is discretized by npe = 4 material points per element, and Table 2 summarizes the material properties.
The initial geometry and boundary conditions used for this problem are depicted in Fig. 12, while the parameters used are 290 summarized in Table 2 and represent the problem described in Bui et al. (2008). The aluminium assemblage is discretized by n p = 28 800 material points within a mesh made of 320 × 48 quadrilateral elements, resulting in a uniform spacing of h = 1.25 mm. The time step is given by the CFL condition, e.g., ∆t = 7.02 · 10 −5 s for a total simulation time of 1.25 s.  These numerical results demonstrate the solver to be in agreement with both previous experimental (Bui et al. (2008)) and numerical results (Huang et al. (2015)) and confirms its ability to solve elasto-plastic problems such as granular collapses using an appropriate constitutive model. However, it also demonstrates the inability of the MPM variants based on a domain update (GIMPM or CPDI) to resolve extremely large plastic deformations when relying on the normal components of the deformation gradient or its stretch part to update the material point domain (interested readers are referred to the contribution of Coombs et al. (2020) regarding the suitability of different domain update variants). Consecutively, we performed an additional simulation using a domain update based on the determinant of the deformation gradient. No significant differences were 305 observed with the experimental results, and the simulation succeeded.

Computational efficiency: loop-based code versus vectorized code
We evaluate the computational efficiency of the MATLAB-based MPM solver, using the MATLAB version R2018a on an Intel Core i7-4790, with a benchmark based on the collapse of an aluminium-bar assemblage.
We vary the number of elements of the background mesh, which results in a variety of different mesh spacings h. The number 310 of elements along the x-direction is n el,x = [20,40,80,160,320,640], and the resulting number of elements in the y-direction is n el,y = [2,5,11,23,47,95]. The number of material points per element is kept constant, i.e., n pe = 3 2 , and the total number of material points is n p = [128, 465, 1830, 7260, 28 920, 115 440].
We calculate the average number of iterations per second (it/s) which corresponds to the number of calculation cycles during 1 second for an increasing total number of elements. We use this metric since it provides a clearer insight regarding the 315 efficiency of the numerical solver, i.e., the more iterations there are, the more efficient the solver. Figure 14 shows an overall speed-up ratio of 20 reached by the vectorized solver compared to the iterative implementation. Such a gain is appreciable since In addition, we also monitor the average time spent on the different functions called during a single execution cyclet c 320 (Fig. 15). When comparing time spent on the functions p2Nsolve.m and mapN2p.m, we report a speed up of 24 and 22, respectively. This difference is expected since these functions originally necessitate an extensive use of two nested for-loops to calculate i) the material point contribution or ii) the interpolation of updated nodal solutions. between the actual implementation and the one used by Sinaie et al. (2017); the latter is based on a USL variant with a cutoff algorithm, whereas our implementation relies on the MUSL (or double mapping) procedure, which necessitates a double mapping procedure. The initial geometry and parameters are the same as those used in Sinaie et al. (2017).
Our CPDI2q implementation ,in MATLAB R2018a, is, ad minima, 2.8 times faster than the Julia implementation proposed by Sinaie et al. (2017) for similar hardware (see Table 3). Sinaie et al. (2017) completed the analysis with an Intel Core i7-6700

330
(4 cores with a base frequency of 3.40 GHz up to a turbo frequency of 4.00 GHz) with 16 GB RAM, whereas we used an Intel Core i7-4790 with similar specifications (see section 2). However, the performance ratio between MATLAB and Julia seems to decrease as the mesh resolution increases.

Elasto-plastic slumping
We present an application of the MPM solver to the case of landslide mechanics. Considering a CPDI2q variant coupled to an 335 elasto-plastic constitutive model based on a Mohr-Coulomb (M-C) non-associated plasticity Simpson (2017), we i) analyse the geometrical features of the slump and ii) compare the results (the geometry and the failure surface) to the numerical simulation of Huang et al. (2015), which is based on a Drucker-Prager model with tension cutoff (D-P).
Since an explicit scheme is used, we introduce a local damping in the equations of motion to resolve the equilibrium during a loading phase of 8 seconds, during which the gravity g is linearly ramped up to 9.81 m s −2 . The elasto-plastic 340 behaviour is activated once this loading procedure is terminated, and the simulation proceeds during 7 additional seconds for a total simulation time of 15 seconds. The local damping coefficient is set to 0.1 since the latter damps the oscillations while preserving the dynamics Wang et al. (2016b).
The geometry of the problem is depicted in Fig. 16, the soil material is discretized by 110 × 35 elements with n pe = 3 2 , resulting in n p = 21 840 material points; a mesh spacing h = 1 m and rollers are imposed at the left and right domain limits 345 while a no-slip condition is enforced at the base of the material. The parameters used for the solution are shown in Table 4.
The numerical solution to the elasto-plastic problem is shown in Fig. 17. An intense shear zone, highlighted by the second invariant of the plastic strain II , develops at the toe of the slope as soon as the material yields and it propagates backwards to the top of the material. It results in a rotational slump. The geometry and the failure surface reported by Huang et al. (2015) are highlighted by the continuous and the dotted lines respectively.

Discussion
In this contribution, an efficient MPM solver is proposed that considers two variants (e.g., the cpGIMP and the CPDI/CPDI2q approach that would require multiple nested for-loops. For the cantilever beam, the MATLAB-based solver is, ad minima, twice faster than a Julia implementation.

365
Regarding the elastic compaction of a one element-wide column, we report a good agreement of the numerical solver with previous explicit MPM implementations, such as Bardenhagen and Kober (2004). The same flaw of an explicit scheme is also experienced by the solver, i.e., a saturation of the error due to the specific used of an explicit scheme that resolves the wave propagation, thus preventing any static equilibrium to be reached. This confirms that our implementation is consistent with previous MPM implementations, especially for the implicit formulation where a quadratic convergence is resolved.

370
Regarding the cantilever beam problem, we report a good agreement of our solver with the results obtained by Sadeghirad et al. (2011). Furthermore, we report the vertical deflection of the beam to be very close in both magnitude and timing (for the CPDI2 variant) to the FEM solution. The beam almost recovered all the vertical deflection it experienced during the gravitational loading.
Since we compared both cpGIMPM and CPDI2 implementations for the cantilever beam problem, we observed a difference 375 regarding the computational efficiency; the cpGIMP (e.g., it/s = 252.40) implementation is 3.5 times faster than the CPDI2 (e.g., it/s = 75.78) implementation for this case. This is mainly due to the extra cost of calculation of the basis functions and their derivatives. Additional computational resources are required to calculate these quantities, i.e., the basis function and their derivatives weights.
The elasto-plastic collapse demonstrates the abilities of the solver, considering both the number of iterations per seconds and In addition, the computational efficiency reached by the solver is higher than expected and is even higher in the elastic 385 case with respect to what was reported by Sinaie et al. (2017). This confirms the efficiency of MATLAB for solid mechanics problems, provided a reasonable amount of time is spent on the vectorization of the MPM algorithm.
The elasto-plastic slump also demonstrates the solver to be efficient in capturing complex dynamics in the field of geomechanics. Moreover, the CDPI2q solution showed that the algorithm proposed by Simpson (2017) returns stresses when the material yields and is well suited to the slumping mechanics.

390
However, as mentioned by Simpson (2017), such return mapping is valid only under the assumption of a non-associated plasticity with no volumetric plastic strain.

Conclusions
Our numerical investigations revealed that a domain-based approach in MPM fails when extremely large plastic deformations are involved, i.e., the elasto-plastic collapse. This can be avoided when a domain update based on the determinant of the 395 deformation gradient is chosen.
We also have demonstrated the capability of MATLAB as an efficient language in regard to a material point method ( The vectorization tasks we performed provide a fast and efficient MATLAB-based solver; however, the algorithmic structure could be transposed to a more efficient language, such as the C-CUDA language, that is known to efficiently take advantage of vectorized operations.
As a final word, a future implementation of a poro-elasto-plastic mechanical solver could be applied to complex geomechan-405 ical problems such as landslide dynamics while benefiting from a faster numerical implementation in C-CUDA, thus resolving high three-dimensional resolutions in an affordable amount a time.