Particle swarm optimization for the estimation of surface complexation constants with the geochemical model PHREEQC-3 . 1 . 2

Sorption of metals on minerals is a key process in treatment water, natural aquatic environments, and other water-related technologies. Sorption processes are usually simulated with surface complexation models; however, identifying numeric values for the thermodynamic constants from batch experiments requires a robust parameter estimation technique that does not get trapped in local minima. Recently, particle swarm optimization (PSO) techniques have attracted many researchers as an efficient and effective optimization technique to find (near-)optimum model parameters in several fields of research. In this work, uranium at low concentrations was sorbed on quartz at different pH, and the hydroPSO R optimization package was used – the first time – to calibrate the PHREEQC geochemical model, version 3.1.2. Results show that thermodynamic parameter values identified with hydroPSO are more reliable than those identified with the well-known parameter estimation (PEST) software, when both parameter estimation software are coupled to PHREEQC using the same thermodynamic input data. In addition, post-processing tools included in hydroPSO were helpful for the correct interpretation of uncertainty in the obtained model parameters and simulated values. Thus, hydroPSO proved to be an efficient and versatile optimization tool for identifying reliable thermodynamic parameter values of the PHREEQC geochemical model.


Introduction and scope
Particle swarm optimization technique (PSO) is an evolutionary optimization technique proposed by Eberhart and Kennedy (1995) and was influenced by the activities of flocks of birds in search of corn (Kennedy and Eberhart, 1995;Eberhart and Kennedy, 1995).Both PSO and genetic algorithms (GAs) share a few similarities (Eberhart and Shi, 1998).GAs have evolutionary operators like crossover or selection, while PSO does not have it (Eberhart and Shi, 1998).Recently, PSO has been implemented in a wide range of applications, e.g., in the water resources (e.g., Bisselink et al. 2016;Zambrano-Bigiarini and Rojas, 2013;Abdelaziz and Zambrano-Bigiarini, 2014;Formetta et al., 2014), geothermal resources (Ma et al., 2013;Beck et al., 2010), finance and economics (Das, 2012), structural design (Kaveh and Talatahari, 2009;Schutte and Groenwold, 2003), and applications of video and image analysis (Donelli and Massa, 2005;Huang and Mohan, 2007;Abdelaziz et al., 2018).For example, the groundwater model MODFLOW2000/2005 was linked with PSO to estimate permeability coefficients (Sedki and Ouazar, 2010) and a multi-objective PSO code was used to derive rainfall-runoff model parameters by introducing the Pareto rank concept (Gill et al., 2006).Notwithstanding recent popularity, PSO has never been used to calculate the parameters of a surface complexation model (SCM) simulating sorption behavior of metal and metalloids on mineral surfaces.Hence, this paper attempts to examine the efficiency and effectiveness of PSO for parameter estimation of a sur-face complexation model as the PHREEQC (Parkhurst and Appelo, 1999).
Nowadays, a number of PSO software codes exist, such as MADS (Harp and Vesselinov, 2011;Vesselinov and Harp, 2012) and OSTRICH (Matott, 2005), with most of the codes using the basic PSO formulation developed in 1995.However, in this paper, we use the latest standard particle swarm optimization proposed in literature (Clerc, 2012;Zambrano-Bigiarini et al., 2013), named SPSO2011, as implemented in the hydroPSO R package (R Core team, 2016) version 0.3-3 (Zambrano-Bigiarini and Rojas, 2013Rojas, , 2014)).hydroPSO is an independent R package that includes the newest standard PSO (SPSO-2011), which was specifically developed to calibrate a wide range of environmental models.In addition, the plotting functions in hydroPSO are user-friendly and aid the numeric and visual interpretation of the optimization results.The source code, installation files, tutorial (vignette), and manual are available on https://cran.r-project.org/web/package=hydroPSO (last access: 12 June 2018).
hydroPSO is used in this article, for the first time, to estimate the parameters of a surface complexation for uranium(VI)-quartz system, to properly capture the nonlinear interactions between the model parameters.The aim of this article is to examine the suitability of hydroPSO as a global optimization tool for parameter estimation of geochemical models, in particular PHREEQC v3.1.2.To this end, surface/sorption reaction constants (log K) of the SCM obtained with hydroPSO will be compared to those previously obtained with parameter estimation (PEST) software (Doherty, 2010) by Nair et al. (2014).
PEST and PSO are both model-independent parameter optimizers; i.e., they do not require making any change to the model code.PEST uses the Gauss-Marquardt-Levenberg method to minimize, in the weighted least squares sense, the differences between observations and the corresponding model-simulated values (Abdelaziz and Bakr, 2012;Edet et al., 2014).PEST is a gradient-based algorithm that initially calculates the Jacobian matrix and is then used to build and upgrade parameter set, to enhance the searching ability to obtain a better objective function value (Doherty, 2010).The model then iterates adjusting the model parameters on the basis of a new Marquardt lambda value (Doherty, 2010), which drives the objective function for faster convergence.As a local optimizer, PEST is sensitive to the initial condition (see a complete description in Doherty, 2010).In contrast, PSO is global optimizer which randomly initializes a population of particles within the D-dimensional parameter space.PSO allows initializing the position of each particle using a random uniform distribution or Latin hypercube sampling (LHS), while velocities can be initialized at zero, with two different random distributions, or with two different LHS strategies (see Zambrano-Bigiarini and Rojas, 2013).Velocity and position of each particle in the parameter space are updated in successive iterations following equations specific to the selected PSO version (see a complete description in Zambrano-Bigiarini and Rojas, 2013, andAbdelaziz andZambrano-Bigiarini, 2014).As a state-of-the-art global optimizer, PSO is less subject get trapped in local minima compared to PEST.
2 Model description PHREEQC version 3.1.2(Parkhurst and Appelo, 1999) and the database of Nuclear Energy Agency thermodynamic (NEA_2007) (Grenthe et al., 2007), as well as the LLNL database (Lawrence Livermore National Laboratory) are used to model sorption of metal species.Both databases were modified by setting constant values for MUO 2 (CO 3 ) 2− 3 and M 2 UO 2 (CO 3 ) 0 3 species (M equals Ca, Mg, Sr) obtained from Geipel et al. (2008) and Dong andBrooks (2006, 2008).PHREEQC is a geochemical software which is able to simulate sorption, surface complexation, and other types of reactions.SCMs are considered to be able to describe the processes at liquid-solid interfaces (Huber and Lützenkirchen, 2009), and have been widely used to simulate the sorption of metal species from aqueous solution depending on its concentration, pH value, ionic strength, and redox conditions (Davis et al., 2004;Štamberg et al., 2003;Zheng et al., 2003).A different group of reactions has taken place between aqueous species in the bulk solution and the surface of the sorbent leads to the formation of surface complexes (Nair et al., 2014).The surface complexation constants for these reactions are indispensable and independent of changes in solution condition or solution complex (Dzombak and Morel, 1990;Hayes et al., 1991;Volesky, 2003).
There are different type of SCMs, such as the generalized two-layer model (GTLM), non-electrostatic model (NEM), constant capacitance model (CCM), diffuse-layer model (DLM), and modified triple-layer model (modified TLM).Here, a GTLM (Dzombak and Morel, 1990) was used to simulate the sorption behavior of U(VI) on quartz.The GTLM was used instead of other models because it is relatively simple and can be used in a wide range of chemical conditions.A comprehensive review of the GTLM is presented in Dzombak and Morel (1990).Quartz is a nonporous and non-layered mineral, and therefore the actual area of surface is supposed to be equal to the specific surface area.In this study, the surface of quartz is considered as a single binding site which takes the charge for every surface reaction.The sorption reactions and log K values are related to the aqueous species and thus depend on the thermodynamic database used.Uranyl carbonate complexes ((UO 2 ) 2 CO 3 (OH) 3 −, UO 2 (CO 3 ) 2 2 −, and UO 2 (CO 3 ) 4 3 −) are the dominant species under our experimental conditions.Therefore, the surface complexation reactions for quartz were calculated with respect to these species.
The sorption of U(VI) on quartz was investigated and discussed by Huber and Lützenkirchen (2009) without considering the formation of alkaline metal uranyl carbonates.The formation of Mg-, Ca-, and Sr-uranyl-carbonato complexes shows a significant impact on the sorption of uranium on quartz.This was studied by Nair and Merkel (2011) in batch experiments adding 10 g of powdered quartz to 0.1 L of water containing rather low U(VI) concentrations (0.126×10 −6 M) in the absence and presence of Mg, Sr, and Ca (1 mM) at a pH value between 9 and 6.5 in steps of 0.5.NaHCO 3 (1 × 10 −3 M) and NaCl (1.5 × 10 −3 M) were used as ionic strength buffers.The low U concentrations were used to avoid precipitation of Ca-U carbonates.In the absence of alkaline earth elements, the percentage of uranium was sorbed on quartz approximately 90 % independent from pH.In the existence of Mg, Sr, and Ca, the percentage of sorption of uranium on quartz decreased to 50 %, 30 %, and 10 % respectively (Nair and Merkel, 2011).In this paper, the surface was the generalized two-layer model and was taken from Dzombak and Morel (1990) with no explicit calculation of the diffuse-layer composition.
Table 1 displays the parameter ranges used to optimize the six parameters selected to calibrate PHREEQC, based on Nair et al. (2014).

Computational implementation
Inverse modeling is a complex issue for modelers as a result of the numerous uncertainties in model parameters and observations (e.g., Carrera et al., 2005;Beven, 2006).PSO is an evolutionary optimization algorithm originally developed by Kennedy and Eberhart (1995), which has proven to be highly efficient when solving a large collection of case studies from different disciplines (see, e.g., Poli, 2008).In PSO, each individual of the population searches for the global optimum in a multi-dimensional parameter space, considering the individual and collective past experiences.The canonical PSO algorithm starts with a random initialization of the particles' positions and velocities within the parameter space.Velocity and position of each particle in the parameter space are updated in successive iterations following equations specific to the selected PSO version, in order to find the minimum or (maximum) value of a user-defined objective function (see a complete description in Zambrano-Bigiarini and Rojas, 2013).In the last decades, several improvements have been proposed to the canonical PSO algorithm, and the selected optimization tool implements several of them in a single piece of software.The hydroPSO R package v0.3-3 (Rojas and Zambrano-Bigiarini, 2012;Zambrano-Bigiarini andRojas, 2013, 2014) is a model-independent optimization package, which implements a state-of-the-art PSO algorithm to carry out a global parameter optimization, and it has been successfully applied as calibration tool for both hydrogeological and hydrological models (Zambrano-Bigiarini and Rojas, 2013; Thiemig et al., 2013;Abdelaziz and Zambrano-Bigiarini, 2014;Bisselink et al., 2016).In particular, hy-droPSO implements six PSO variants (equations used to update particles' position and velocities), four topologies (ways in which particles are linked during each iteration), two initialization of particles' positions (random uniform distribution or Latin hypercube sampling), and five alternatives for initializing particles' velocities, among other fine-tuning options (see Zambrano-Bigiarini and Rojas, 2013).In the application of hydroPSO to PHREEQC, the following configuration was used: a swarm with 10 particles, 200 iterations, LH initialization of particle positions and velocities, random topology with five informants, acceleration coefficients c 1 and c 2 equal to 2.05, linearly decreasing clamping factor for V max in the range [1.0, 0.5], and use of Clerc's con- striction factor instead of the inertia weight.hydroPSO requires no instruction or template files as UCODE (Poeter et al., 2005;Abdelaziz and Merkel, 2015) and PEST software (Doherty, 2010(Doherty, , 2013) ) do.In order to couple hydroPSO with the PHREEQC geochemical model, three text files have to be prepared by the user to handle data transfer between the model code and the optimization engine.These files include (i) ParamFiles.txt, which describes the names of a set of parameters to be estimated and locations in the model input files to be utilized in the inverse modeling, (ii) ParamRanges.txt,which defines the minimum and maximum values that each selected parameter might have during the optimization, and (iii) PSO_OBS.txt, which contains the observations that will be compared against its simulated counterparts.In addition, a user-defined R script file (Read_output.R) should define the instructions to read model outputs.Finally, for coupling hy-droPSO with PHREEQC, an R script template provided by hydroPSO developers was slightly modified by the user in order to carry out the optimization.Figure 1a shows a flow chart that depicts how hydroPSO is coupled with PHREEQC to calibrate its parameters.Run-phreeqc.bat is a batch file to run PHREEQC-3.1.2 in the DOS environment, which reads *.phrq files to produce *.prn files as output (simulated data); *.ins files are instructions to read model outputs, by using the Read_output.R script.At each iteration, hydroPSO modifies model parameter values to minimize the value of the user-defined objective function.Finally, the new parameter values are updated following the locations provided in the "ParamFiles.txt"file.In contrast, to couple PEST with PHREEQC, four files are required.These include (i) template files (*.tpl), (ii) instruction files (*.ins), (iii) a main control file (*.pst), and (iv) a batch file to execute PHREEQC and PEST(*.bat).Template files are built to modify the input files for PHREEQC with other values while an instruction file is used to extract the simulated values from the output file for PHREEQC.The main control file includes the model application to be run, the observations, parameters to be es-timated, control data keywords, and others.Further information about PEST can be found in its manual (Doherty, 2010).Figure 1a, b show the key files used to couple PHREEQC with hydroPSO and explain the flowchart and files involved in the inverse modeling of the surface complexation constants for the U(VI) sorption model.
For numerical optimization, the residual sum of squares (RSS or SSR; see Eq. 1) was utilized to compute the goodness of fit (GoF) between the corresponding model outputs (C s j ) and observed U-carbonate concentration values (C o j ) at different pH values for every iteration step i; n is the number of observation points (measured sorption U(VI) onto quartz).Minimizing the residual sum of squares was chosen as the method for estimating the surface/sorption reaction constants in calibrations by Nair et al. (2014) when PEST was combined with PHREEQC.It was decided for consistency to select SSR as the criterion for goodness of fit when applying hydroPSO with PHREEQC.After some initial trials, the number of maximum iterations (T ) was set to 200 and the number of particles used to search for the minimum RSS in the parameter space was fixed at 10 (i.e., 2000 model runs).All the input files required for running PHREEQC and hydroPSO can be found at Zenodo (https: //zenodo.org/record/1044951#.WgVTbVuCzIU, last access: 10 November 2017), including all the optimization results. (1)

Results and discussion
One approach to evaluate model performance is by plotting the simulation against observed values (i.e., visualizing model outputs), as shown in Fig. 1.The coefficient of determination (R 2 ) for the relation between calculated and observed values is 0.89, indicating that the simulated values obtained with the best parameter set found by hydroPSO are able to explain a good portion of the variability of the response data (Fig. 2).
In hydroPSO, there are two types of criteria for convergence: i. "absolute", when the global optimum found in a given iteration is below/above a user-defined threshold (useful for minimization/maximization problems where the true minimum/maximum is known); and ii. "relative", when the absolute difference between the model performance in the current iteration and the model performance in the previous iteration for the best performing particle is less than or equal to a userdefined threshold (useful to prevent too many model runs without any improvement in the optimum found by the algorithm).
If none of the two previous criteria are met, then the algorithm stops when the user-defined number of iterations is finally achieved.Figure 3 shows the evolution of the best model performance (i.e., smallest RSS) found by all the particles in a given iteration, and the normalized swarm radius (δ norm , a measure of the spread of the population in the range of search-space) versus the iterations number.One may observe that both δ norm and the best model performance become smaller with an increasing iteration number, which indicates that the main particles are "flying" around a small region within the parameter space.Only 100 iterations (i.e., 100×10 = 1000 model runs) were enough to reach the region of the global optimum (i.e., RSS ca.2.52), and the remaining iterations were just used to refine the search, as shown in Fig. 3. Visual inspection of Fig. 5 shows a good exploratory capability of PSO because the particles are well spread over the entire range space.It is clearly visible that the parameter samples are denser around the optimum value (lowest SSR), showing a small uncertainty range around the optimum value.
Figures 6 and 7 give a graphical summary for optimized parameters.Empirical cumulative density functions (ECDFs) in Fig. 6 show the sampled frequencies for the six calibrated parameters.The horizontal gray dotted lines show the median of the distribution (cumulative probability equal to 0.5),    while the vertical gray dotted lines depict the corresponding parameter value displayed at the top of every panel in Fig. 6.The thin vertical red line in Fig. 7 points out the optimum value achieved for each parameter.Histograms in Fig. 7 show near-normal distributions for K 1 and K 2 , while K 4 and K 5 follow a skewed distribution with sampled values concentrated near the upper boundary of each parameter.Figure 8 illustrates the correlation matrix among K values and model performance (SSR), with horizontal and vertical axes displaying the ranges used for the calibration of each parameter.The figure shows that the highest correlation coefficient occurred among the measure of model performance (SSR) and K 4 , K 6 , and K 3 .In addition, a higher correlation coefficient was observed between K 4 and K 5 , K 3 and K 4 , and K 1 and K 6 .
Figure 9 shows the model output using hydroPSO fitted log K values and the monitored sorption ratio.It is worthwhile to mention that the surface complexation constants for Eqs. ( 1), (2), and (5) (Table 1) are more important, and the equations that are less important are Eqs.( 3), (4), and (6) in optimizing the log K values.It proves that UO 2 (CO 3 ) 4− 3 , UO 2 (CO 3 ) 2− 2 , and (UO 2 )CO 3 (OH) − 3 are the most dominant species for sorption on quartz.The surface complexation constants for Eqs. ( 2) and (4) were 21.18 and 3.229, respectively (Nair et al., 2014), which is higher than the electrostatic (ES) and non-electrostatic (NES) models, while the optimized value for Eq. ( 1) is 25.156, which is higher than the NES model and almost the same as the ES model (Nair et al., 2014).
The experimental conditions used to calibrate PHREEQC with hydroPSO were the same as during the PEST optimization carried out by Nair et al. (2014).In other words, PEST was applied for the same experiment and the same data, and Fig. 9 shows that the log K values obtained with hydroPSO are better than those obtained by PEST, except for pH = 7.The main reason is that PSO is a global optimization technique, which searches for optimum values in the whole parameter space using the parameter ranges given in Table 1, while PEST searches in a neighborhood of the initial solution.In particular, local optimization carried out by PEST minimize a weighted least squares objective function via the Gauss-Marquardt-Levenberg non-linear regression method (Marquardt, 1963).Actually, a major drawback of PEST, as of all gradient-based techniques, is the dependency of the quality of the optimization results upon the initial point used for the optimization, which might lead to a local optimum rather than the global one.Thus, PSO techniques offer promising possibilities for similar surface complexation and reactive transport applications in hydrogeology and hydrochemistry.

Conclusions
The coupling of hydroPSO and PHREEQC was successfully carried to estimate surface complexation constants for uranium (VI) species on quartz, based on a data set published by Nair andMerkel (2011), andNair et al. (2014).The open-source hydroPSO R package proved to be a useful tool for inverse modeling of surface complexation models with PHREEQC and allowed a prompt evaluation of the calibration results.Furthermore, thermodynamic values obtained with hydroPSO provided a better match to observation sorption rates in comparison to those obtained with PEST, using the same input data and experimental setup.

Figure 1 .
Figure 1.Flow chart used to couple (a) PHREEQC with hydroPSO and (b) PHREEQC with PEST for inverse modeling of surface complexation constants for uranium carbonate (U(VI)) species on quartz using the PHREEQC geochemical model.

Figure 2 .
Figure 2. Scatter plot with the experimentally observed and calculated values of uranium carbonate (sorption percentage).

Figure 3 .
Figure 3. Evolution of the normalized swarm radius (δ norm) and the global optimum (SSR) over 200 iterations.

Figure 4 .
Figure 4. Boxplots for the optimized parameters.The horizontal red lines indicate the optimum value for each parameter.Parameter names are defined in Table1.

Figure 7 .
Figure 7. Histograms of calibrated parameter values.Horizontal axis shows the sampled range for each parameter, and the vertical axis represents the amount of parameter sets in each of the classes used to divide the horizontal axis.

Figure 8 .
Figure 8. Correlation matrix between model performance (SSR) and calibrated parameters.Red lines represent lowest smoothing, using locally weighted polynomial regression, and numbers in the upper panel represents the Pearson-moment correlation coefficient between each pair of parameters.Vertical and horizontal axes illustrate the physical range utilized for parameter optimization.* * * stands for a p < 0.001; * * stands for p < 0.01, according to level of statistical significance

Figure 9 .
Figure 9. Observed and simulated sorption of uranium in quartz vs. pH with both PEST and hydroPSO calibrated log K values.

Table 1 .
Complexation reactions with their respective log K range values.