CompLaB v1.0: a scalable pore-scale model for flow, biogeochemistry, microbial metabolism, and biofilm dynamics

. Microbial activity and chemical reactions in porous media depend on the local conditions at the pore scale and can involve complex feedback with fluid flow and mass transport. We present a modeling framework that quantitatively accounts 10 for the interactions between the bio(geo)chemical and physical processes, and that can integrate genome-scale microbial metabolic information into a dynamically changing, spatially explicit representation of environmental conditions. The model couples a Lattice-Boltzmann implementation of Navier-Stokes (flow) and advection-diffusion-reaction (mass conservation) equations. Reaction formulations can include both kinetic rate expressions and flux balance analyses, thereby integrating reactive transport modeling and systems biology. We also show that the use of surrogate models such as neural network 15 representations of in silico cell models can speed up computations significantly, facilitating applications to complex environmental systems. Parallelization enables simulations that resolve heterogeneity at multiple scales, and a cellular automata module provides additional capabilities to simulate biofilm dynamics. The code thus constitutes a platform suitable for a range of environmental, engineering and – potentially - medical applications, in particular ones that involve the simulation of microbial dynamics.


Introduction
Biogeochemical turnover in Earth's near-surface environments is governed by the activity of microbes adapted to their surroundings to catalyze reactions and gain energy. In turn, these activities shape the environmental composition, which feeds back on metabolic activities and creates ecological niches. Such feedbacks can be captured by reactive transport models that compute the evolution of geochemical conditions as a function of time and space and simulate microbial activities in porous 25 media (Meile and Scheibe, 2019). Commonly used macroscopic reactive transport models simplify small scale features of natural porous media. For example, heterogeneous pore geometry and transport phenomena are represented by only a few macroscopic parameters such as porosity, permeability, and dispersivity (Steefel et al., 2015). However, such simplifications can lead to a disparity between model estimations and actual observations because these models do not resolve the physical and geochemical conditions at the scale that is relevant for microbial activity (e.g., Molins 2015, Oostrom et al. 2016. 30 Furthermore, microbial reaction rates are often formulated using Monod expressions, which describe a dependency of metabolic rates on nutrient availability, but substantially simplify the complex metabolic adaptation of microbes in changing environments. This recognition has prompted the development of constraint-based models including, for example, COMETS (Harcombe et al., 2014), BacArena (Bauer et al., 2017), and IndiMeSH (Borer et al., 2019), which have enabled detailed descriptions of complex microbial metabolisms and metabolic interactions (Dukovski et al. 2021). However, most constraint-35 based models are not designed to capture combined diffusive and advective transport of metabolites in heterogeneous subsurface environments and are not optimized to handle such settings in a computationally efficient way. Notably, computational efficiency and the integration of adequate formulations of microbial function has been identified as critical aspects in pore scale models of microbial activity (Golparvar et al., 2021).
To account for the feedback between environmental conditions, chemical processes, microbial metabolism, and structural 40 changes in the porous medium caused by these activities, we introduce a novel pore scale reactive transport modeling framework with spatially explicit descriptions of hydrological and biogeochemical processes. Our work complements existing efforts, encompassing both individual-and population-based spatially explicit microbial models reviewed by König et al. (2020), some of which take into consideration the structure of the porous medium. Our modular framework is developed to account for various chemical reactions and/or genome-scale metabolic models with advective and diffusive transports in 45 porous media at the pore scale. The Lattice Boltzmann (LB) method is used to compute fluid flow and solute transport in complex porous media, capable of simulating both advection and diffusion dominated settings. Microbial metabolism and chemical reactions are incorporated as source or sink terms in the LB method solving mass conservation equations. These sources or sinks can be described classically using approximations such as Monod kinetics , or can be derived from cell scale growth and metabolic fluxes simulated with flux balance analysis (Orth et al. 2010). Biomass dynamics 50 can be described by keeping track of cell densities (similar to chemical concentration fields) of different organisms or populations, with cell movement either based on an advection-diffusion formulation or using a cellular automata approach. In addition, we incorporate a surrogate modeling approach to make larger scale simulations possible. Thus, the framework provides options either to maximize computational efficiency via the use of surrogate models, or to directly utilize wellestablished metabolic modeling environments without losing the inherent parallel scalability of the LB method. The model is 55 validated by comparing model simulations to published simulation results. We demonstrate the flexibility of the new microbial reactive transport framework, its scalability, and the benefits of using surrogate models to circumvent computational bottlenecks posed by flux balance analysis. Our work therefore facilitates cross-disciplinary efforts that integrate bioinformatic approaches underlying cell models with descriptions suitable to resolve the dynamic nature of natural environments. This allows for the representation of microbial interactions, which is a major challenge to our current quantitative understanding of 60 microbially mediated elemental cycling (Sudhakar et al. 2021).
To establish a modeling framework that builds on the existing and future knowledge and know-how from multiple disciplines, our approach uses the open-source software Palabos and integrates it with an open-source linear programming solver GLPK, and the COBRApy (CPY) python package for genome-scale metabolic modeling. Palabos (Parallel Lattice Boltzmann Solver) 65 is a modeling platform that has established itself as a powerful approach in the field of computational fluid dynamics based on the Lattice Boltzmann (LB) method. The Palabos software is designed to be highly extensible to couple complex physics and other advanced algorithms without losing its inherent capability of massive parallelization (Latt et al., 2021). Palabos has been parallelized using the Message Passing Interface where computational domains are subdivided while minimizing the interprocess communication. It has been used for building modeling platforms to simulate deformable cell suspensions in relation 70 to blood flows (Kotsalos et al., 2019) and complex subsurface biogeochemical processes at the pore scale Meile, 2019, 2021). It is highly scalable and hence was chosen as high-performance modeling framework to be integrated with our representations of chemical and microbial dynamics. GLPK (GNU Linear Programing Kit) is an open-source library designed for solving linear programming (LP), mixed integer programming and other related problems (Makhorin, 2009). It contains the simplex method, a well-known efficient numerical approach to solve LP problems, and the interior-point method, which 75 solves large-scale LP problems faster than the simplex method. GLPK provides application programming interface (API) written in C language to interact with a client program. COBRApy is an object-oriented python implementation of COnstraint-Based Reconstruction and Analysis (COBRA) methods (Ebrahim et al. 2013) which is suitable to be integrated with other libraries without requiring commercial software. Through a simple python API, the fast evolving and expanding biological modeling capacity of COBRApy, which includes features such as flux balance analysis (FBA), flux variability analysis (FVA), 80 metabolic models (M-models), and metabolism and expression models (ME-models), can be employed.

Model description
CompLaB simulates a fully saturated 2D fluid flow and solute transport at the pore scale based on the LB method implemented in Meile (2019, 2021). These earlier efforts established some of the underlying model developments, such as the simulation of the flow field, mass transport, and biochemical processes including kinetic rate expressions and cellular automata 85 implementation of biofilm growth. This study expands on the previously established models to offer a much broader applicability by building the modular structure that makes the use of flux balance and surrogate models possible. The LB method is particularly useful for simulating subsurface processes because boundaries between solid and fluid can be handled by a simple bounce-back algorithm (Ziegler, 1993) in addition to its massive parallelization efficiency (Latt et al., 2021). For these reasons, the LB method has been applied to simulate a broad range of pore scale reactive transport processes (e.g., Huber 90 et al., 2014;Kang et al., 2007;. The simulations are guided by an input file, CompLaB.xml, that sets the scope of the simulation and capabilities utilized through command blocks that define the model domain, chemical state variables, microorganisms, and model input/output ( Figure 1). Below, the features of the model are presented, and we refer to the manual in the code repository for examples of the implementation.

The Lattice-Boltzmann flow and mass transport solvers
The LB method retrieves the numerical solutions of the Navier-Stokes (NS) for fluid flow and advection-diffusion-reaction equations (ADRE) for solute transport by solving the mesoscopic Boltzmann equation across a defined set of particles (Krüger et al., 2017). CompLaB obtains a steady state flow field by running a flow solver with a D2Q9 lattice BGK  Krook; Bhatnagar et al. 1954) model defined as where fi ( where wi are the lattice weights (w0 = 4/9, w1-4 = 1/9, w5-8 = 1/36), r is the macroscopic density (r = S fi), r0 is the rest state 115 constant, and u is the macroscopic velocity calculated from the momentum (ru = S ci fi). The steady state flow field is then imposed on a transport solver defined as where ! ( , ) represents the discrete particle set i of a transported entity j at position r and time t. + , is the relaxation time for 120 solute transport related to the diffusivity ( , . The equilibrium distribution function for transport ( ! ,,#$ ) using a D2Q5 lattice for numerical efficiency, which satisfying the isotropy requirement for a LB transport solver, is given by with the lattice weights w0 = 1/3 and w1-4 = 1/6, lattice velocities c0-4, and the solute concentration C j ( , = ∑ ! , ). In solving 125 an advection-diffusion problem, CompLaB adjusts the value of , which controls the length of a timestep, to obtain a userprovided Péclet number (Pe j = UL/D j ), for a given average flow velocity U and a user-provided characteristic length L. With a steady-state flow field obtained from the solution of Equation 1 and a reaction step Ω ! 123 = Δ ! , the LB transport solver recovers an ADRE with the following form Here, the transported entity j includes solute concentrations (C) and planktonic biomass densities, and R is a reaction term computed by the reaction solver of CompLaB as described below.

Reactions
The reaction step (Ω ! 123 ) computation is separated from a transport computation via the sequential non-iterative approach 135 (Alemani et al., 2005). A unique feature of CompLaB is that its reaction solver can compute biochemical reaction rates R through (1) kinetic rate expressions, (2) flux balance analysis, (3) a surrogate model such as a pre-trained artificial neural network, or combinations thereof, by summing their contributions to the net reaction rates of individual state variables.

Kinetic rate expressions
CompLaB provides a C++ template that users can adapt to formulate kinetic rate expressions using metabolite concentrations 140 and biomass densities (defineReactions.hh). This is designed to accommodate user-specific needs and to enable simulating various microbial dynamics including Monod kinetics, microbial attachment/detachment, and arbitrary rate expressions defined by the user. Reactions can be restricted to particular locations using material numbers (mask) differentiating fluid, biomass and grain surfaces.
Local biomass densities and concentrations calculated after the collision step of transport solvers are transferred to the function 145 as vectors B and C where the vector elements follow the order defined in the user interface (CompLaB.xml). The biomass density and metabolite concentrations are updated according to: where the cell specific biomass growth rate (gt), and for microbially mediated reactions, the rate ( , is expressed as the product of (user-defined, typically substrate concentration-dependent) metabolite uptake/release rates (Ft) per cell multiplied by the 150 cell density Bt ( ( = ( ( ), calculated every timestep for every pore and biomass grid cell.

Flux balance analysis
For genome-enabled metabolic modeling, CompLaB loads metabolic networks and calculates microbial growth rates as well as metabolite uptake/release rates through an FBA method (Orth et al., 2010). FBA investigates the metabolic capabilities by imposing several constraints on the metabolic flux distributions. Assuming that metabolic systems are at steady state, the 155 system dynamics for a metabolic network is described by the mass balance equation, Sv = 0. Here, S is a m ´ n matrix with m compounds and n reactions where the entries in each column are the stoichiometric coefficients of the metabolites composing a reaction, and v is a n ´ 1 flux vector representing metabolic reactions and uptake/release of chemicals by the cell. Most metabolic models have more reactions than compounds (n > m) meaning that there are more unknowns than equations. To solve such underdetermined systems, FBA confines the solutions to a feasible set by imposing constraints on metabolic fluxes 160 lb (lower bounds) ≤ v ≤ ub (upper bounds) and applies an objective function f(v) = c T v, where c is the vector of weights for the objective function to identify an optimal solution. Commonly used objective functions include maximization of biomass yield, maximization of ATP production, and minimization of nutrient uptake (Nikdel et al. 2018). CompLaB utilizes the stoichiometric matrix S from standard metabolic databases such as BiGG and KEGG which are widely used in FBA simulation environments (e.g., COBRA toolbox (Heirendt et al., 2019), COBRApy (Ebrahim et al., 2013), KBase (Arkin et al., 2018)). 165 Therefore, CompLaB can integrate many existing in silico cell models.
CompLaB computes the solution of the metabolic models at each point in space and time for each organism or microbial community (if the model represents multiple microorganisms) and updates biomass density and metabolite concentrations according to Equations 6 and 7. The metabolic uptake fluxes are set through imposing constraints by defining the lower bound (lb) of a chemical (uptake fluxes are negative) through one of the following approaches. The first is the parameter-based 170 method employed by Harcombe et al., (2014), setting the metabolic fluxes in analogy with Michaelis-Menten kinetics using a maximum uptake rate (Vmax; e.g., mmol/gDW/h, where gDW is gram dry weight) where C is a local metabolite concentration (e.g., mM), and Ks is a half-saturation constant (e.g., mM). The second is the semi-175 linear approach employed by Borer et al. (2019). This method replaces Vmax with C/BDt where B is a local biomass density (e.g., gDW/L) and Dt is the length of a time step (e.g., h) If Ks is set to 0, then the uptake flux estimation becomes a linear function to local concentrations. Note that the units in the 180 fluid flow ad mass conservations model simulation must match those of the FBA bounds, which in our case were mmol/gDW/h.
With lower bounds defined, the solution of an FBA problem outputs biomass growth rate (g; e.g., 1/h) and uptake/release rates of metabolites (F; e.g., mmol/gDW/h).

Surrogate model
CompLaB also provides a C++ template (surrogateModel.hh) where users can incorporate a pre-trained surrogate model 185 for calculating biogeochemical reactions, including artificial neural networks (ANN). This functionality can be used to replace FBA which requires solving many computationally expensive linear optimization problems (section 3.2.2). In the example shown in section 5, CompLaB provides local metabolite concentrations and biomass densities as inputs and the surrogate model outputs microbial growth rate (g) and uptake/production rates of metabolites (F). While our demonstration is based on ANN models, any pre-trained statistical surrogate model (e.g., De Lucia and Kühn, 2021) that describes the sources and sinks -or 190 their parameterization -can be used to enhance computational efficiency and accommodate various user-specific needs.

Biomass redistribution
To explicitly model the spatial biomass expansion, CompLaB utilizes a cellular automaton (CA) with a predefined maximum biomass density (Bmax) based on the CA algorithm developed by Jung and Meile (2021). After updating local biomass densities, the CA algorithm checks at every time steps if there is any grid cell exceeding Bmax and redistributes the excess biomass (B -195 Bmax) to a randomly selected neighboring grid cell. If the selected grid cell cannot hold the excess biomass, the first chosen grid cell is filled up to the maximum holding capacity (Bmax, a value defined by the user) and then the remaining excess biomass is allocated to a randomly chosen second neighbor cell. If all the neighboring grid cells are saturated with biomass (B ≥ Bmax) and hence the excess biomass cannot be placed, the Manhattan distances of biomass grid cells to the closest pore cells are evaluated. The remaining excess biomass is then placed in a neighboring grid cell that is closer to pores and this biomass 200 allocation process is repeated until all the excess biomass is redistributed. Figure 2 shows an example of the cellular automaton process for biomass redistribution. Note that this biomass redistribution method is a simple approximation for biomass density conservation with room for improvement (e.g., . amongst those with the shorter Manhattan distance. This allocation process is repeated until the excess biomass is moved to the grid cell with the Manhattan distance of 1 (color-coded white) where the excess biomass first encounters unsaturated neighbor cells. The biomass holding capacity (Bmax -B) of the randomly selected neighboring grid cell (outlined green) is first evaluated. If the excess biomass exceeds the holding capacity, the excess biomass is redistributed to the first chosen neighbor up to the maximum holding capacity and the remaining excess biomass is placed to the next available neighbor grid cell (outlined yellow), finalizing the cellular automaton algorithm. The horizontal 215 black dashed lines indicate the minimum biomass level required for a grid cell to be designated as biofilm.
When the sessile biomass reaches a threshold density (B ≥ yBmax, where y is a user-defined threshold biomass fraction; 0 ≤ y ≤ 1) the pore grid cell is designated as biomass; if the biomass density falls below the threshold due to microbial decay or detachment (B < yBmax), then a biomass grid cell is converted back to a pore. Pore grid cells allow for both advective and 220 diffusive transport, but in the biofilm, sessile biomass hinders (i.e., permeable biofilm) or prevents (i.e., impermeable biofilm) flow and the feedback between biomass growth/decay and advective flow conditions is accounted for by rerunning the flow solver to steady state after updating biomass distribution and corresponding pore geometry (Jung and Meile 2021). The reduced advective transport efficiency in permeable biomass grid cells is implemented by modifying local fluid viscosity in the biofilm (nbf) with nbf =nf/X, where X is a user-defined viscosity ratio (0 ≤ X ≤ 1), while for impermeable biomass, a bounce-back 225 condition is imposed (Pintelon et al., 2012). After imposing the new steady-state flow field, a streaming step of the transport solver is executed (Figure 1).
To verify the CompLaB implementation, the engineered metabolic interaction between two co-dependent mutant strains (E. coli K12 and S. enterica LT2) originally established by Harcombe (2010) and implemented in COMETS (Harcombe et al., 230 2014) and IndiMeSH (Borer et al., 2019) was chosen as a test case. E. coli K12 is deficient in producing methionine and relies on the release of methionine by the mutant S. enterica LT2. In turn, S. enterica LT2 requires acetate released by E. coli K12 because of its inability to metabolize lactose. As a result, these genetically engineered strains are obliged to engage in mutual interaction where neither species can grow in isolation. The ratio of the two strains converged to a stable relative composition after 48 hours in all the in vitro and in silico experiments in both initial ratios of 1:99 and 99:1. 235 Both COMETS (Harcombe et al., 2014) and IndiMeSH (Borer et al., 2019) integrate flux balance cell models of these two microorganisms into a two-dimensional environment in which metabolites are exchanged via diffusion. The initial and boundary conditions of these simulations were mirrored in CompLaB, with 100 grid cells containing 3´10 -7 g biomass each (total = 3´10 -5 g biomass) distributed randomly across a two-dimensional domain of 25´25 grid squares. The grid length was set to 500 µm, and the initial distributions of the two species were allowed to overlap. Three replicate simulations were carried 240 out for each initial microbial ratios of E. coli and S. enterica  the model results of COMETS and IndiMeSH, demonstrating the metabolic inter-dependence of two strains, and the convergence to a stable composition ratio (Appendix A, Figure A1).

255
Error bars represent 1 standard deviation.

Surrogate model integration
A major issue in fully coupling genome-scale metabolic networks to reactive transport models is the large computational demand due to the repeated calculation of the LP problems at every biomass grid cell and every time step. Previous studies have alleviated this issue by interpolating the solutions of LP problems from a look-up table generated in advance to a reactive 260 transport simulation (Scheibe et al., 2009), dynamically creating a solution pool of the LP problems during the reactive transport process (Fang et al., 2011), or systematically reducing the size of the matrix encoding the metabolic network (Ataman et al., 2017;Ataman and Hatzimanikatis, 2017). Here, we introduce a statistical surrogate modeling approach using a pretrained artificial neural network (ANN) following the approach presented in Song et al. (2022). A trained ANN model directly relates input parameters (i.e., uptake rates of substrates) to outputs (i.e., biomass and metabolite production rates) through a 265 set of nonlinear algebraic equations. As computing such input-output relationships from a pre-trained ANN model is several orders of magnitude faster than running FBA using a full-fledged metabolic model, the use of such surrogate models to achieve a significant speedup has attracted much attention recently (e.g., De Lucia and Kühn. 2021; Prasianakis et al., 2020). http://bigg.ucsd.edu/models/iAF987), a strict anaerobe capable of coupling the oxidation of organic compounds to the 270 reduction of metals such as iron and manganese, and using ammonium as nitrogen source to train a ANN model. The dataset used for training a surrogate ANN model was obtained by collecting FBA solutions of the base model obtained from cplex with the objective function chosen to maximize biomass production. The solution set was prepared by randomly sampling 5,000 combinations of two growth-limiting metabolites, acetate (CAc) and ammonium (CNH 4 ), within the concentration ranges of 0 ≤ CAc ≤ 0.5 mM and 0 ≤ CNH 4 ≤ 0.05 mM via Monte-Carlo simulations. Substrate concentrations were converted to uptake 275 fluxes via the parameter-based approach (Eq. 7) using the parameters from Fang et al. (2011) (Vmax = 10 mmol acetate/gDW/h and 0.5 mmol ammonium/gDW/h; Ks = 0.01 mM acetate and 0.1 mM ammonium). These fluxes were used as lower bounds and Fe 3+ was allowed to be consumed without limitation. The collected FBA solution dataset was split and used to train (70%), validate (15%), and test (15%) and we developed an ANN model using MATLAB's neural network toolbox. As key hyperparameters, the number of layers and the number of nodes in the ANN model were respectively determined to be four 280 and ten through grid search. Ensuring the accuracy of a surrogate model is of critical importance in reactive transport models because even small errors accumulating over successive timesteps can lead to a substantial error. In this simple example, the trained ANN estimates the biomass growth rate of the full FBA model almost perfectly (R > 0.999) against training, validation, and test datasets. This shows that the full-fledged metabolic model can be replaced by the surrogate ANN model without substantial loss of accuracy, boosting the simulation speed as shown next. 285

Model performance
CompLaB inherits the massive scalability of Palabos which decomposes a simulation domain into multiple subdomains and assigns them to individual computational nodes. In the following, the scalability of various components of CompLaB is assessed for a simplified microbial dynamics problem.

Test case 290
The simulation domain (Figure 4) was prepared by taking a subsection of 500 ´ 300 square elements from the porous medium of Souzy et al. (2020). The length of each element was 16.81 µm, and material numbers were assigned to solid (0), pore (1), solid-pore interfaces (solid side of interface = 2, pore side of interface = 3). Ten percent of the interface grid cells (pore side) were randomly assigned as sessile biomass grid cells (4) initially. Flow was induced from left to right by imposing a fixed pressure gradient between left-and right-side boundaries and no flow conditions were set on the top and bottom boundaries, 295 as well as on the grain surfaces. The steady-state flow field was then provided to the CompLaB transport solver for mass transport and reaction simulations (Péclet number = 1, for a characteristic length scale of 2 mm). Two growth limiting metabolites, acetate (CH3COO -) and ammonium (NH4 + ), were considered for the mass transport simulations. Acetate was injected at the inlet (left) boundary with the fixed concentration of 0.45 mM to the simulation domain initially filled with the same concentration. Ammonium concentration in the inflowing fluid and initially in the domain was 0 mM, but it was produced 300 at solid surfaces assuming a zeroth-order mineralization rate (Table 1). For both metabolites, no gradient conditions were imposed at the outlet, top/bottom, and grain boundaries. No external source and initial planktonic biomass were assumed, so that all planktonic biomass is detached sessile biomass. The biogeochemical problem is described by the following set of ADREs: 12 = ∇ ⋅ ( 12 ∇ 12 ) − ⋅ ∇ 12 − 12 0 (10)

310
CAc and CNH 4 are the concentrations of acetate and ammonium, respectively, BP is the planktonic biomass density, BS is the sessile biomass density, u denotes the flow field, d indicates the presence (d = 1) or absence (d = 0) of a grain surface (dS) and a biomass grid cell (dB), DAc, DNH 4 , and DB are the diffusion coefficients of acetate, ammonium, and planktonic biomass, respectively, which differ between pore, biomass and solids. For simplicity, the diffusivities of all the metabolites and attachment/detachment (katt and kdet), biomass decay (µB), and organic matter mineralization (kNH 4 ) were simulated using the reaction kinetics solver, with the corresponding rate constants summarized in Table 1. The simulation assumed that G.
metallireducens grows only on solid surfaces and planktonic biomass attaches only to existing surface-attached aggregates (Grinberg et al., 2019). 320 The computational efficiency and parallel performance of CompLaB was tested by executing four independent simulations 325 utilizing each reaction solver. The cell-specific metabolic fluxes (F, Eq. 9) and biomass growth rates (g, Eq. 12) were calculated either through FBA (CPY or GLPK), ANN, and/or a reaction kinetics (KNS). The KNS solver was combined with other solvers (CPY, GLPK, and ANN) or used as a stand-alone reaction solver with the cellular automaton algorithm invoked (CA) and Bmax set to 100 gDW/L. The model performance simulations with the FBA solvers (CPY and GLPK) were prepared with the same conditions used for training the ANN model (section 5). The pretrained ANN model from section 5 was used for the separate 330 simulation ANN. For the CA simulation, we create a situation similar to the above examples, but in which substantial biofilm growth over a short simulation time is artificially induced. To that end, F and g were computed as: where F denotes the metabolic fluxes (for simplicity assuming FAc = FNH4), g is the biomass growth rate, kkns = 2.5 ´ 10 -6 mM/s 335 (Marozava et al., 2014) is the maximum uptake rate, and Kac = 0.1 mM and KNH 4 = 0.01 mM are the half saturation constants for acetate and ammonium, respectively. The growth yield Y is set to 40,000 gDW/mmol_Ac, an arbitrarily large number used only to invoke the CA algorithm within 10,000 timesteps (440 seconds). The flow field was updated every 10 timesteps when the CA algorithm was invoked (as, e.g., Baveye 2008, Jung andMeile 2021).
node has an AMD EPYC 7702P 64-core processors with a 2.0 GHz clock cycle (AMD Rome), with 128 GB of RAM. The nodes are interconnected via an EDR InfiniBand network, with 100 GB/s effective throughput and run a 64-bit Linux operating system (CentOS 7.9 distribution). The elapsed (wall-clock) time for 10,000 timesteps was recorded.

Performance
Comparison of simulation times for flow, transport and reaction shows that most compute time is used for simulating the 345 reactions, in particular when integrating in silico cell models into a reactive transport framework ( Figure 5). This highlights the benefit of using efficient surrogate models. The surrogate ANN model substantially improves computational efficiency compared to CPY and GLPK (about 2 orders of magnitude in total elapsed time; Figure 5a) because calculating the pretrained ANN is much faster than solving the linear programming problem every time step in FBA (Figure 5c). For reaction calculations (Figure 5d), the ANN simulation (sky-blue solid line with grey square symbols) even exhibits comparable but slightly shorter 350 simulation times than the traditional reaction kinetics calculation (KNS; dashed lines with orange symbols) because of the ANN implementation only operates on biomass grid cells while KNS operates on both biomass and pore grid cells.
In addition to the computational efficiency, negligible errors introduced by the surrogate ANN model assure the use of a surrogate ANN model ( Figure B1). Although the errors in biomass calculation accumulates over simulation timesteps, it is kept to very low values throughout the simulation (the relative error is on the order of 10 -9 ; Appendix B) and has practically 355 no influence on metabolite concentration calculations. This observation illustrates that CompLaB can calculate microbial metabolic reactions in porous media with heterogeneous distribution of pore, biomass, organic matter, and minerals, based on the genome-scale metabolic model with the superior computational efficiency of a surrogate model without losing accuracy.
Applying Monte-Carlo simulations to generate FBA data to train a neural network model required solving the linear programming problem 5,000 times with a set of randomly chose uptake rates of acetate and ammonium, which does not add a 360 significant computational burden. In our application, we determined the number of layers and nodes in the neural network to be four and ten, respectively, because no further improvement in model performance was observed beyond those values.
The computational efficiency of ANN also works in favor of scalability. The reaction processes of CompLaB are inherently an embarrassing parallel task because calculating biogeochemical subprocesses is completely independent of the neighbors (except CA) and all model performance simulations show the reasonable scalability up to 64 cores ( Figure 5). However, the 365 scaling behaviors of all simulations except ANN illustrate suboptimal scalability with no or limited speed-up when using more compute resources. The loss of efficiency originates mostly from the calculation of reaction processes (Figure 5c) because the domain decomposition applied to the heterogeneous simulation domain (Figure 4) resulted in an uneven distribution of biofilm grid cells per subdomain and hence a variable size of the problem to be solved on each core. In fact, in our simple example problem (total 500 ´ 300 computational grid cells, constant random seeds), domain decomposition when using 64, 128, and 370 192 cores produced 6, 38, and 76 subdomains, respectively, that have no initial biomass grid cells. Such subdomains do not contribute to computing microbial metabolisms (FBA) and biomass redistribution (CA) that consumes most of the computational cost (e.g., GLPK calculation consumes ~ 99% of the total computational cost), preventing the ideal parallel performance (Figure 5d). While this is also true for ANN, computational efficiency of ANN reduces the time wasted by such subdomains. As a result, the suboptimality is not readily apparent in ANN (Figure 5c). 375 The CA algorithm implemented in CompLaB is a nonlocal process requiring information of neighboring grid cells, but the result exhibits still a good scalability up to 64 cores and suboptimal scalability with more cores used like the other simulations.
The CA simulation required less time than FBA simulations because the metabolic reactions were calculated using KNS. But CA spent extra time that was not used by other simulations in updating flow field (Figure 5b) and redistributing excess biomass (Figure 5c-d). This illustrates that the actual time required for the CA algorithm depends on the nature of the biomass expansion. 380 For example, more time will be required for a system with rapid biofilm growth excess because excess biomass has to travel a longer distance through a thick biofilm. Furthermore, flow fields will need to be updated more often to reflect the influence of rapid biofilm growth on flow.
385 Figure 5. The wall-clock time recorded in seconds for (a) total, (b) flow and transport, (c) all reactions, and (d) each reaction part of the CompLaB algorithm. Four different simulations were carried out deploying each reaction solver: Three simulations used COBRApy (CPY), GLPK, artificial neural network (ANN) solvers for microbial metabolism calculation and a kinetic solver (KNS) for the attachment/detachment of biomass (AT-DT). One simulation used only a kinetic solver for both microbial metabolism and AT-DT with the cellular automaton algorithm invoked (CA). The simulation time for CA only can be inferred by subtracting the time for AT-DT with CA (green dashed line with orange triangle symbols) from the time for KNS including CA (green solid line with grey triangle symbols) in (d). Each symbol represents the average of 2 simulations. The simulations exhibiting no or limited speed-up with 128 and 192 cores were excluded from drawing the regression lines except the ANN simulation. The negative numbers are the slopes of the solid regression lines. The average slope of all the dashed regression lines in panel (d) is -1.14.

Conclusions 395
The numerical modeling platform CompLaB for simulating 2D pore scale reactive transport processes is capable of utilizing quantitative implementation of microbial metabolism through the coupling of genome-scale metabolic models. The integration of in silico cell models with reactive transport simulations makes this framework broadly applicable and enables the integration of knowledge gained from the omics-based characterization of microbial systems. For example, the successful reproduction of experimentally observed convergences to a stable composition of a two species consortium (S. enterica and E. coli) 400 demonstrates the capability of CompLaB to investigate metabolic interactions between multiple microbial species.
Our novel numerical framework based on the Lattice-Boltzmann method allows simulating advection as well as diffusion of metabolites in complex porous media. A wide range of simulation domains can be used to represent soil structure and fractured rock images directly obtained from various imaging techniques (e.g., µ-CT, FIB-SEM, etc.) or numerically generated porous media with material numbers assigned to pore, solid, and source/sink grid cells for biogeochemical reactions which 405 include but are not limited to biofilms. The inherent parallel efficiency of CompLaB facilitates simulating dynamic flux balance analysis capturing the microbial feedback on flow and transport in porous media, as done previously using Monod-type representations of microbial activity (Jung and Meile 2021). Furthermore, the versatile simulation environment of CompLaB allows utilizing surrogate models, such as an artificial neural network. The resulting speed-up enables the investigation of complex biogeochemical processes in natural environments. 410

Appendix A. Convergence of the verification model to a stable ratio after 100 hours
The 6 simulation cases used in section 4 for model verification were run for 100 hours of simulation time to further evaluate if the observed convergence to an average composition ratio is stable ( Figure A1). The composition ratio observed after 48 hours (0.75) is largely maintained through the extended simulation period (increases only to 0.78 after 100 hours). 415 Figure A1. The evolution of the fraction of E. coli relative to the total cell density (E. coli + S. enterica) over 100 hours.
Dashed and dotted lines denote an initial abundance of 99 and 1% E. coli, respectively.

Appendix B. Surrogate model accuracy
Surrogate modeling approach inevitably introduces errors in model estimations. The errors should be maintained sufficiently 420 low throughout the surrogate simulation otherwise errors can propagate in successive iterations and result in unphysical results.
To quantify the errors in surrogate model estimations, the solutions of our artificial neural network (ANN) were compared to the reference simulation COBRApy (CPY) by calculating the arithmetic mean of the root mean squared errors: where j is the variable type (BS, BP, CAc, and CNH 4 ), m is the number of variables j used in calculating the error, nj is the number of grid cells for each variable j, and t is the timestep where the error is evaluated. The differences between the FBA simulations 425 using GLPK and CPY solvers are negligible, so only the CPY solution was chosen as the reference ( Figure B1).