NorSand4AI: a comprehensive triaxial test simulation database for NorSand constitutive model materials

Ozelim, Luan Carlos de Sena Monteiro; Casagrande, Michéle Dal Toé; Cavalcante, André Luís Brasil

doi:https://doi.org/10.5194/gmd-17-3175-2024

Articles | Volume 17, issue 8

Model experiment description paper

23 Apr 2024

Luan Carlos de Sena Monteiro Ozelim, Michéle Dal Toé Casagrande, and André Luís Brasil Cavalcante

Abstract

In soil sciences, parametric models known as constitutive models (e.g., the Modified Cam Clay and the NorSand) are used to represent the behavior of natural and artificial materials. In contexts where liquefaction may occur, the NorSand constitutive model has been extensively applied by both industry and academia due to its relatively simple critical state formulation and low number of input parameters. Despite its suitability as a good modeling framework to assess static liquefaction, the NorSand model still is based on premises which may not perfectly represent the behavior of all soil types. In this context, the creation of data-driven and physically informed metamodels emerges. The literature suggests that data-driven models should initially be developed using synthetic datasets to establish a general framework, which can later be applied to experimental datasets to enhance the model's robustness and aid in discovering potential mechanisms of soil behavior. Therefore, creating large and reliable synthetic datasets is a crucial step in constructing data-driven constitutive models. In this context, the NorSand model comes in handy: by using NorSand simulations as the training dataset, data-driven constitutive metamodels can then be fine-tuned using real test results. The models created that way will combine the power of NorSand with the flexibility provided by data-driven approaches, enhancing the modeling capabilities for liquefaction. Therefore, for a material following the NorSand model, the present paper presents a first-of-its-kind database that addresses the size and complexity issues of creating synthetic datasets for nonlinear constitutive modeling of soils by simulating both drained and undrained triaxial tests. Two datasets are provided: the first one considers a nested Latin hypercube sampling of input parameters encompassing 2000 soil types, each subjected to 40 initial test configurations, resulting in a total of 160 000 triaxial test results. 
The second one considers nested quasi-Monte Carlo sampling techniques (Sobol and Halton) of input parameters encompassing 2048 soil types, each subjected to 42 initial test configurations, resulting in a total of 172 032 triaxial test results. By using the quasi-Monte Carlo dataset and 49 of its subsamples, it is shown that the dataset of 2000 soil types and 40 initial test configurations is sufficient to represent the general behavior of the NorSand model. In this process, four machine learning algorithms (Ridge Regressor, KNeighbors Regressor and two variants of the Ridge Regressor which incorporate nonlinear Nystroem kernel mappings of the input and output values) were trained to predict the constitutive and test parameters based solely on the triaxial test results. These algorithms achieved 13.91 % and 16.18 % mean absolute percentage errors among all 14 predicted parameters for undrained and drained triaxial test inputs, respectively. As a secondary outcome, this work introduces a Python script that links the established Visual Basic implementation of NorSand to the Python environment. This enables researchers to leverage the comprehensive capabilities of Python packages in their analyses related to this constitutive model.

Received: 22 Jul 2023 – Discussion started: 18 Sep 2023 – Revised: 09 Feb 2024 – Accepted: 19 Mar 2024 – Published: 23 Apr 2024

1 Introduction

In situations where liquefaction is a potential concern, geotechnical engineers and soil scientists seek suitable modeling frameworks to accurately evaluate and mitigate associated risks. One specific scenario highlighting this need is the case of filtered tailing piles. These piles pose significant geotechnical risks related to liquefaction, requiring thorough assessment through appropriate constitutive modeling. Factors such as the height and speed of stacking play crucial roles in creating vulnerable regions within the pile susceptible to liquefaction. The existence of a liquefaction trigger, particularly in undrained loading conditions, has the potential to result in the structural collapse of the pile.

In this scenario, the NorSand constitutive model emerges as a suitable option for liquefaction modeling due to its relatively simple critical state formulation and low number of input parameters. This model is a generalized critical state model based on the state parameter ψ, as defined by Jefferies (1993):

$$ ψ = e - e_{c}, \tag{1} $$

where e is the current void ratio and e_c is the void ratio at the critical state. The NorSand model emulates natural soil behavior by incorporating associated plasticity and limited hardening, which enables dilation similar to that observed in real soils. This limited hardening causes yielding during unloading conditions and provides second-order detail in replicating observed soil behavior (Silva et al., 2022; Jefferies and Been, 2015).
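As a minimal illustration of Eq. (1), the sketch below evaluates ψ for a loose and a dense state. It assumes the semi-logarithmic critical state line e_c = Γ − λ ln(p) commonly used with NorSand; the function names and the numerical values of Γ, λ, p and e are hypothetical and chosen only for illustration, not taken from the datasets.

```python
import math


def critical_void_ratio(p_mean, gamma, lam):
    """Critical void ratio from an ASSUMED semi-log CSL: e_c = gamma - lam * ln(p)."""
    return gamma - lam * math.log(p_mean)


def state_parameter(e, e_c):
    """State parameter of Eq. (1): psi = e - e_c."""
    return e - e_c


# Hypothetical illustration values (not from Table 1 or the datasets).
e_c = critical_void_ratio(p_mean=100.0, gamma=0.9, lam=0.03)
psi_loose = state_parameter(0.85, e_c)  # void ratio above the CSL
psi_dense = state_parameter(0.70, e_c)  # void ratio below the CSL

print(f"e_c = {e_c:.4f}, psi_loose = {psi_loose:+.4f}, psi_dense = {psi_dense:+.4f}")
```

A positive ψ indicates a state looser than critical (contractive on shearing), while a negative ψ indicates a state denser than critical (dilative).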

Despite its suitability as a modeling framework for assessing static liquefaction (Sternik, 2015), the NorSand model is still based on premises which may not perfectly represent the behavior of all soil types. Also, the NorSand model has only recently been implemented in commercial finite element software (Rocscience, 2022; Itasca Consulting Group, 2023; Bentley, 2022). Besides, regarding open-source distributions, only the Visual Basic (VBA) implementation presented by Jefferies and Been (2015) is available. It is precisely in this context that the creation of data-driven and physically informed metamodels emerges. These metamodels, when based on artificial intelligence techniques, especially machine learning (ML) and deep learning (DL), may be able to provide accurate and computationally cheap models, making them a natural link between complex computational models and real-time data collection and monitoring. Such methods need to be trained on large-scale datasets, and this is where the NorSand model comes in handy: by using NorSand simulations as the training dataset, data-driven constitutive metamodels can then be fine-tuned using real test results. These models will combine the power of NorSand with the flexibility provided by data-driven approaches, enhancing the modeling capabilities for liquefaction.

Thus, the current paper aims to address three main issues: the quantity and complexity of synthetic datasets for nonlinear constitutive modeling of soils and the availability of open-source implementations of the NorSand constitutive model. The first two aspects are addressed by simulating both drained and undrained triaxial tests. Two datasets are provided: the first one will be used to study how large a given dataset must be in order to accurately capture the behavior of a NorSand material, while the second one, completely different from the first dataset, will be a perfect out-of-sample testing dataset used to perform the sample size validations mentioned. A byproduct of such sample size validation will be the training of different machine learning algorithms to perform the following learning task: obtain the input parameters of the NorSand model solely from the results of triaxial tests. Different sampling techniques will be used to produce the datasets mentioned, such as nested Latin hypercube and quasi-Monte Carlo sampling of input parameters. Then, the third aspect is considered by presenting an implementation which connects the well-known VBA implementation to the Python environment. We will use the VBA code as the “processing kernel” of our Python implementation, taking advantage of the years of tests and validation of the algorithm provided by Jefferies and Been (2015). This new Python code allows other researchers to use the full power of Python packages during their analyses involving NorSand.

The paper is structured as follows: Sect. 2 presents the general concepts of data-driven metamodels, with special emphasis given to soil constitutive modeling. Then, Sect. 3 introduces the NorSand model. Section 4 presents the methods considered in this study. Section 5 describes the associated data records, while Sect. 6 presents technical validation of the results. Section 7 presents some usage notes and codes considered in the paper. Finally, Sect. 8 presents the conclusions.

2 Data-driven metamodels

Montáns et al. (2019) emphasize that human learning involves observing and experiencing the world, collecting data and identifying patterns through repeated experiments. Scientific discovery involves formalizing these patterns and relationships into laws and equations, transforming data into properties and variables, and converting observations into events. Although laws and equations aid learning, the classical learning process in science is often slow and expensive, requiring extensive observation and experimentation to understand the main variables and their impact on the phenomenon. Data-driven procedures, on the other hand, seek, if possible, an implicitly unbiased approach to our learning experience based on raw data from actual or synthetic observations. These procedures have the added advantage of testing correlations between different variables and observations, learning unanticipated patterns in nature and allowing us to discover new scientific laws or even make predictions without the availability of such laws.

The recent rapid increase in the availability of measurement data from physical systems as well as from massive numerical simulations has stimulated the development of many data-driven methods for modeling and predicting dynamics. At the forefront of data-driven methods are deep neural networks (DNNs). DNNs not only achieve superior performance for tasks such as image classification, but have also proven effective for future-state prediction of dynamical systems (Haghighat et al., 2021). A key limitation of DNNs and similar data-based methods is the lack of interpretability of the resulting model: they are focused on prediction and do not provide governing equations or clearly interpretable models in terms of the original set of variables. An alternative data-based approach uses symbolic regression to directly identify the structure of a nonlinear dynamical system from data (Schmidt and Lipson, 2009). This works remarkably well for discovering interpretable physical models, but symbolic regression is computationally expensive and can be difficult to scale to large problems (Montáns et al., 2019).

2.1 Data-driven constitutive modeling

Creating metamodels from neural networks (NNs) generally requires a priori calibration of the algorithms on data considered to be representative of material behavior (He et al., 2021). For example, NNs have been applied to model a variety of materials, including concrete materials (Ghaboussi et al., 1991), hyperelastic materials (Shen et al., 2005), viscoplastic steel material (Furukawa and Yagawa, 1998) and homogenized properties of mixed structures (Lefik and Schrefler, 2003). Once calibrated, NN-based constitutive models have been integrated into finite element codes to predict path- or rate-dependent material behaviors (Lefik and Schrefler, 2003; Hashash et al., 2004; Jung and Ghaboussi, 2006; Stoffel et al., 2019).

Recently, DNNs with special mechanistic architectures, such as recurrent neural networks (RNNs), have been applied to path-dependent materials (Wang and Sun, 2018; Mozaffar et al., 2019; Heider et al., 2020). It is clear that this type of approach has found significant application in a wide range of engineering fields, as reinforced by He et al. (2021) when they argue that data-driven computation with physical constraints is an emerging computational paradigm that allows the simulation of complex materials directly based on the materials database and disregards the classical constitutive model construction.

To develop a data-driven constitutive model, a substantial and reliable dataset is necessary. However, obtaining a sufficiently large dataset for soil science can be challenging since experimental data are often limited and inadequate for training ML and DL algorithms. Generating synthetic data using a theoretical function can be a useful alternative, as it allows for the creation of an unlimited supply of data (Zhang et al., 2021a).

The literature suggests that data-driven models should initially be developed using synthetic datasets to establish a general framework, which can later be applied to experimental datasets to enhance the model's robustness and aid in discovering potential mechanisms of soil behavior (Zhang et al., 2021a). By calibrating constitutive models on synthetic datasets, the impact of experimental and measurement errors on the mapping ability of machine learning algorithms can be eliminated (Zhang et al., 2020). Therefore, creating large and reliable synthetic datasets is a crucial step in constructing data-driven constitutive models.

2.2 Data-driven soil constitutive models

Currently, there is a lack of robust and high-volume datasets in the literature for soil modeling tasks. One effective method to generate synthetic datasets is through numerical simulations performed on digital soil models. Typically, these simulations involve selecting a parametric constitutive model, sampling some parameters and running simulations that mimic real-world test setups. In soil modeling, triaxial tests are commonly simulated using conventional physics-driven constitutive models, such as the simple monotonic Kondner expression (Basheer, 2000), or more advanced models like the Modified Cam Clay (MCC) (Fu et al., 2007; Zhang et al., 2023).

In particular, a simple sand shear constitutive model was used to generate synthetic datasets in the work of Zhang et al. (2021b). A total of 14 curves were generated to develop the ML-based constitutive model (9 curves for training and 5 curves for testing).

On the other hand, the MCC constitutive model was utilized to produce a benchmark stress–strain dataset of a virtual soil in the work of Zhang et al. (2023). In that study, a total of 250 soil types were considered, with 125 being part of the training dataset and the remaining 125 in the testing dataset. Considering all the initial states in the paper by Zhang et al. (2023), 1125 sets of stress–strain samples were employed as the training dataset, while 1250 sets of stress–strain samples constituted the testing dataset.

The MCC model has been a fundamental element in numerous complex models developed in recent times (Yao et al., 2008). However, this model and its variations are not well suited for depicting the behavior of actual sands due to their insufficient representation of key features such as yielding and dilation. This is because these models assume that soils denser than the critical state line are overconsolidated, resulting in unrealistically high stiffness and excessively exaggerated strength (Woudstra, 2021). As indicated in the Introduction section, the NorSand constitutive model presents clear advantages over the MCC model and, therefore, shall be described in detail in the next section.

Table 1. Input values for the NorSand model, also used as inputs for the NorSandTXL VBA routine (Jefferies and Been, 2015).

3 NorSand

The NorSand constitutive model is a comprehensive critical state model that effectively accounts for the impact of void ratio on soil behavior, providing a robust framework for modeling static liquefaction in engineering applications. A distinctive characteristic of soils is that their void ratios or relative densities influence their mechanical properties. In this regard, NorSand, as a constitutive model, aptly elucidates changes in soil behavior resulting from variations in void ratio (Jefferies and Been, 2015).

Within the Critical State Soil Mechanics (CSSM) framework, NorSand aligns with widely used models like the Original Cam Clay (OCC; Schofield and Wroth, 1968) and the MCC (Roscoe and Burland, 1968). In fact, the NorSand and OCC yield surfaces have the same shapes and the same flow rules. CSSM is founded on two principles: (1) the presence of a unique failure locus known as the critical state locus (CSL) and (2) the assertion that shear strain guides soil toward the CSL.

The primary limitation of MCC, especially when applied to sands, lies in its inability to capture the dilation behavior observed in dense sands. Moreover, it proves inadequate in predicting the behavior of loose sands and is unsuitable for addressing liquefaction-related issues. NorSand's key advantage lies in its incorporation of a state parameter, representing the difference between the current void ratio of the soil and its critical state. This approach uniquely relates soil dilation or compaction to the state parameter (Rocscience, 2022).

NorSand stands out for its ease of use, particularly for practical geotechnical engineers. It relies on a minimal set of material properties, conveniently measurable through standard laboratory tests. The model effectively captures a wide range of soil behaviors influenced by varying density and confining stress. The key additional parameter, beyond what is necessary for defining an MCC model, is the state parameter. In situations where precision in representing volume change is crucial, the added effort required for parameter determination is more than justified.

Developed initially for sands based on observations in large-scale hydraulic fills such as tailings dams, NorSand's applicability extends beyond sands, encompassing any soil where particle-to-particle interactions are controlled by contact forces and slips, rather than cohesive bonds. Present applications of NorSand span a range from well-graded tills to sands and clayey silts (Jefferies and Been, 2015).

The input parameters of the NorSand model are presented in Table 1, where the meaning of each parameter is also presented in the column “Description”. The sampling ranges presented will be discussed in the next section, as they are not intrinsic to the NorSand model.

4 Methods

4.1 Data generation

The NorSandTXL program is an Excel spreadsheet with all coding in the VBA environment and can be downloaded at http://www.crcpress.com/product/isbn/9781482213683 (last access: 8 February 2024), as indicated in the book by Jefferies and Been (2015). This particular spreadsheet simulates drained and undrained triaxial tests of materials governed by the NorSand constitutive model. The input features available in NorSandTXL are presented in Table 1, as well as their sampling ranges. The sampling ranges adopted come from literature results on the behavior of real granular materials. An initial version of such ranges was first presented by Jefferies and Shuttle (2002) and has been updated ever since. The ranges presented in Table 1 reflect the latest compilation available and reported by Jefferies and Been (2015). This way, practitioners will especially benefit from the datasets generated, since the parameters involved have been chosen so as to represent real granular materials.

In order to massively simulate triaxial test conditions for materials following the NorSand constitutive model, a Python routine has been developed. This routine performs two main steps: sampling and simulation. For the sampling process, all 14 input parameters are sampled in a nested manner, as there are two levels of hierarchy in the parameters: the higher level deals with the soil properties, which are unique for a given material, while the lower level considers the initial soil state during the triaxial tests. As a result, the sampling process needs to (a) account for different types of materials and (b) for each type of material, consider several testing conditions. Two datasets will be produced, as the next subsection will describe.

Thus, the following sampling procedure is considered to account for n_soils types of soils under n_conditions initial testing conditions:

1. Sample the soil properties (the first 10 parameters in Table 1), obtaining a vector of properties sp_i, $i = 1, \dots, n_{soils}$, such that sp_i ∈ ℝ¹⁰. The sampling is performed using the centered Latin hypercube sampling (LHS) algorithm implemented in the Chaospy package (Feinberg and Langtangen, 2015) with a maximin criterion (first dataset) or using a Sobol (Sobol, 1967) quasi-Monte Carlo sampling technique implemented in SciPy (Virtanen et al., 2020) (second dataset).
2. For each sp_i, the initial testing conditions (the last four parameters in Table 1) are sampled using the standard Latin hypercube sampling algorithm implemented in the Chaospy package (Feinberg and Langtangen, 2015) with a ratio criterion (first dataset) or a Halton (Halton, 1960) quasi-Monte Carlo sampling scheme (second dataset) implemented in SciPy (Virtanen et al., 2020). This way, the vectors ic_i,j ∈ ℝ⁴, $j = 1, \dots, n_{conditions}$, are obtained for each sp_i. The maximum value of ψ₀ is set to $ψ_{\max} / 5$ (as indicated in Table 1) for numerical stability. Additionally, to make the ic_i,j different for each sp_i, the random seed of the sampling algorithm is changed for each i.
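The two-level procedure for the second dataset can be sketched with SciPy's `scipy.stats.qmc` module as follows. This is an illustrative sketch rather than the routine used to build the database: the function name `nested_qmc_sample`, the seeding scheme and the unit-interval bounds are assumptions (the actual ranges are those of Table 1), and the ψ₀ cap of $ψ_{\max} / 5$ is not modeled here.

```python
import numpy as np
from scipy.stats import qmc


def nested_qmc_sample(n_soils_log2, n_conditions, sp_bounds, ic_bounds, base_seed=0):
    """Sobol sample of soil properties, then one Halton sample of test conditions per soil."""
    sp_lo, sp_hi = sp_bounds
    ic_lo, ic_hi = ic_bounds

    # Higher level: 2**n_soils_log2 soils from a scrambled Sobol sequence.
    sobol = qmc.Sobol(d=len(sp_lo), scramble=True, seed=base_seed)
    sp = qmc.scale(sobol.random_base2(m=n_soils_log2), sp_lo, sp_hi)

    # Lower level: a Halton sample per soil, with a different seed for each i
    # so that the initial conditions differ between soils.
    ic = np.empty((sp.shape[0], n_conditions, len(ic_lo)))
    for i in range(sp.shape[0]):
        halton = qmc.Halton(d=len(ic_lo), scramble=True, seed=base_seed + 1 + i)
        ic[i] = qmc.scale(halton.random(n_conditions), ic_lo, ic_hi)
    return sp, ic


# Placeholder bounds on [0, 1]; the real ranges would come from Table 1.
sp_bounds = (np.zeros(10), np.ones(10))
ic_bounds = (np.zeros(4), np.ones(4))

sp, ic = nested_qmc_sample(4, 6, sp_bounds, ic_bounds)
print(sp.shape, ic.shape)
```

With `n_soils_log2=11` and `n_conditions=42`, the same call would produce the 2048 soil types and 42 initial test configurations of the second dataset.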

From the procedure above, the matrix In of input parameters is obtained, whose rows are NorSandTXL input vectors obtained by concatenating each sp_i with all the ic_i,j, i.e., $[\mathrm{concat}(sp_{1}, ic_{1,1}), \mathrm{concat}(sp_{1}, ic_{1,2}), \dots, \mathrm{concat}(sp_{n_{soils}}, ic_{n_{soils}, n_{conditions}})]$, where “concat” denotes a concatenation operation between vectors. This implies that In is a (n_soils × n_conditions) by 14 matrix. The filling capabilities of the sampling schemes considered can be seen in Fig. 1.
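The concatenation that builds the input matrix can be illustrated with NumPy. In this sketch, the helper name `build_input_matrix` is hypothetical and the arrays are filled with random placeholder values instead of actual samples; only the shapes and the row ordering are of interest.

```python
import numpy as np


def build_input_matrix(sp, ic):
    """Stack concat(sp_i, ic_ij) row by row, soil by soil.

    sp has shape (n_soils, 10); ic has shape (n_soils, n_conditions, 4).
    """
    n_soils, n_conditions, _ = ic.shape
    # Repeat each soil-property vector once per test condition.
    sp_repeated = np.repeat(sp, n_conditions, axis=0)
    # Flatten the conditions in the same (i, j) order.
    ic_flat = ic.reshape(n_soils * n_conditions, -1)
    return np.hstack([sp_repeated, ic_flat])


# Placeholder values standing in for sampled parameters.
rng = np.random.default_rng(0)
sp = rng.random((3, 10))
ic = rng.random((3, 5, 4))

in_matrix = build_input_matrix(sp, ic)
print(in_matrix.shape)
```

Row `i * n_conditions + j` of the result holds the j-th test condition of the i-th soil, so each row can be passed as one 14-parameter simulation input.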

Figure 1. Scatter plot illustrating how each space-filling technique works for particular pairs of constitutive and test-related parameters.
