A Mass- and Energy-Conserving Framework for Using Machine Learning to Speed Computations

Abstract. Large air quality models and large climate models simulate the physical and chemical properties of the ocean, land surface, and/or atmosphere to predict atmospheric composition, energy balance, and the future of our planet. All of these models employ some form of operator splitting, also called the method of fractional steps, in their structure, which enables each physical or chemical process to be simulated in a separate operator or module within the overall model. In this structure, each of the modules calculates property changes for a fixed period of time; that is, property values are passed into the module, which calculates how they change for a period of time and then returns the new property values, all in round-robin fashion among the various modules of the model. Some of these modules require the vast majority of the computer resources consumed by the entire model, so increasing their computational efficiency can either improve the model's computational performance, enable more realistic physical or chemical representations in the module, or a combination of the two. Recent efforts have attempted to replace these modules with ones that use machine learning tools to memorize the input-output relationships of the most time-consuming modules. One shortcoming of some of the original modules and their machine-learned replacements is lack of adherence to conservation principles that are essential to model performance. In this work, we derive a mathematical framework for machine-learned replacements that conserves properties, say mass, atoms, or energy, to machine precision. This framework can be used to develop machine-learned operator replacements in environmental models.


Introduction
Complex systems require large models that simulate the wide range of physical and chemical properties that govern their performance. In the air quality realm, models include CMAQ (Foley, Roselle et al. 2010), CAMx (Yarwood, Morris et al. 2007), WRF-Chem (Grell, Peckham et al. 2005), and GEOS-Chem (Eastham, Weisenstein et al. 2014). In the climate change arena, models include HadCM3 (Jones, Gregory et al. 2005), GFDL CM2 (Delworth, Rosati et al. 2012), ARPEGE-Climat (Somot, Sevault et al. 2008), CESM (Kay, Deser et al. 2015), and E3SM (Golaz, Caldwell et al. 2019). These models employ operator splitting, also called the method of fractional steps (Janenko 1971), in their structure, so that each module can be tasked with representing one or a small number of physical and/or chemical processes. This modular structure enhances model maintenance and sustainability while enabling diverse physical and chemical processes to interact. Each module is tasked with simulating its processes over a fixed period of time, each module called in turn until all have returned their results. Usually, the computational performance of these models is governed by one or two modules that consume the vast majority of the computer resources. In air quality models, these are usually the photochemistry and/or aerosol dynamics modules. In climate models, this is usually the radiative energy transport module.
(https://doi.org/10.5194/gmd-2020-83; preprint, discussion started 28 April 2020; © Author(s) 2020, CC BY 4.0 License)
Machine learning has been used to improve the computational efficiency of modules in atmospheric models for decades (Potukuchi and Wexler 1997). As machine learning algorithms have improved, these efforts have matured (Hsieh 2009, Kelp, Tessum et al. 2018, Rasp, Pritchard et al. 2018, Keller and Evans 2019, Pal, Mahajan et al. 2019). But the effort to replace physical and chemical operators with machine-learned modules is challenging because small systematic errors can accumulate. For instance, a 0.1% error over a 1 hour time step could lead to a 72% error after a month of simulation. This problem is compounded if the replacement module does not conserve quantities that are essential to model accuracy, such as atoms in a photochemical module, molecules and mass in an aerosol dynamics module, or energy in a radiative transfer module.
Recent efforts at developing and using machine-learned replacement modules have focused on memorizing how the quantities change. If instead we focus on how the fluxes between quantities change, we can guarantee adherence to conservation principles to machine precision. In photochemical modules, the fluxes are how atoms move between chemical species as reactions progress. In aerosol dynamics, the fluxes are the condensation/evaporation or coagulation processes that move material between the gas and particle phases or between particle sizes. In radiative transfer, the fluxes are the energy movements between spatial domains.
In this work, we derive a mathematical framework that enables the use of machine learning tools to memorize these fluxes.
We focus this work on atmospheric photochemistry and provide an example for a simple photochemical reaction mechanism, because the number of species and the complexity of the problem exercise many aspects of the framework.
Photochemical modules integrate a set of ordinary differential equations

dC/dt = f(C, T, RH)   (1)

where C is a vector containing the current concentrations of the chemical species, T is temperature, and RH is the relative humidity. The right-hand side can be written as

f(C, T, RH) = A R(C, T, RH)   (2)

where A is a matrix describing the stoichiometry and R is a vector of reaction rates. This form of the right-hand side assures mass balance because it is composed of reactions that destroy one species while creating one or more other ones, all in balance, described by A. The R terms take forms such as k C_i C_j, where k is the rate constant for a reaction between species i and j, or k_t C_i, where k_t is a diffusion or mass-transport rate constant between two spatial locations or between the gas and particle phases.
In the method of fractional steps, all modules integrate their equations forward over a fixed time step, Δt, that we call the operator splitting time step. Combining these two equations and integrating gives

ΔC = A ∫ R dt   (3a)

or in matrix form

ΔC = A S   (3b)

where S = ∫ R dt is the flux integral. For atmospheric photochemistry, it is the flux of atoms between molecules.
For aerosol dynamics, it is the flux of molecules condensing on or evaporating from particles, or the flux of small particles coagulating on large particles. For radiative transfer, S is the energy flux between spatial coordinates. We are able to pull A out of the integral if it is a constant, which is usually the case or can be approximated as such.
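To make the conservation property of Eq. (3b) concrete, here is a minimal sketch in Python using a hypothetical NOx/O3 toy system (for illustration only; not the mechanism used later in this paper). Because conservation is built into the stoichiometry matrix A, any flux integral S, whether exact or machine-learned, produces a ΔC that conserves nitrogen atoms:

```python
import numpy as np

# Hypothetical 4-species, 3-reaction NOx/O3 toy system (illustration only):
#   R1: NO2 + hv -> NO + O
#   R2: O + O2   -> O3        (O2 in excess, so it is omitted)
#   R3: O3 + NO  -> NO2 + O2
# Rows of A are species [NO2, NO, O, O3]; columns are reactions R1..R3.
A = np.array([
    [-1.0,  0.0,  1.0],   # NO2
    [ 1.0,  0.0, -1.0],   # NO
    [ 1.0, -1.0,  0.0],   # O
    [ 0.0,  1.0, -1.0],   # O3
])

rng = np.random.default_rng(0)
S = rng.random(3)      # arbitrary flux integrals, Eq. (3b)
dC = A @ S             # concentration changes over the split step

# Nitrogen atoms per species: NO2 and NO each carry one N.
n_atoms = np.array([1.0, 1.0, 0.0, 0.0])
print(n_atoms @ dC)    # zero to machine precision for ANY S
```

Because every column of A moves atoms between species in balance, errors in a learned S appear only as mis-partitioning among species, never as created or destroyed mass.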
Using machine learning tools to learn the relationship

S = g(C, T, RH, actinic flux, stability, etc.)   (4)

has advantages over memorizing a concentration-concentration relationship because:
a. The formulation in Eq. (3) conserves mass.
b. The R terms are simpler to memorize because they do not contain the complexity in A.
c. There are fewer concentrations directly influencing S than ΔC, so the machine learning algorithm should be simpler.
The difficulty resides in developing the training and testing sets needed to train and test the machine learning algorithm corresponding to Eq. (4). In principle, we can run a model many times, generate a data set, and then learn that data using machine learning techniques. That is, we can run many models that integrate Eq. (1) to find the relationship between concentrations at two time steps to develop our machine learning training set. But such models do not provide the value of S, and since the chemical system is stiff, the integrators make many complex calls to calculate the right-hand side of Eq. (1) to integrate it. Another way of saying this is that ΔC is easily available from the models but S is not.
If we have many sets of ΔC values, in principle we can invert Eq. (3b) to obtain the corresponding S values. The difficulty with this approach is that there are more elements of S than of ΔC, so a conventional inverse cannot be applied. Instead, we employ the generalized inverse of A to obtain S via the relationship

S = A⁺ ΔC   (5)

where A⁺ is the generalized inverse of A. Since in general A is not square, and even if it is square it may be singular, there are an infinite number of generalized inverses A⁺, which means that given values for ΔC, Eq. (5) will not give reliable values for S.
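The non-uniqueness can be seen in a small numerical sketch (hypothetical stoichiometry; NumPy's `pinv` is one particular generalized inverse, the Moore-Penrose pseudo-inverse):

```python
import numpy as np

# Hypothetical stoichiometry with more reactions (3) than species (2), so
# Eq. (3b) is underdetermined: many different S produce the same dC.
A = np.array([[-1.0,  0.0, 1.0],
              [ 1.0, -1.0, 0.0]])

S_true = np.array([0.3, 0.5, 0.2])
dC = A @ S_true

# The Moore-Penrose inverse returns the minimum-norm solution, which
# need not equal the S that actually produced dC.
S_pinv = np.linalg.pinv(A) @ dC

# Adding any null-space vector of A yields another exact solution of Eq. (3b).
null_vec = np.array([1.0, 1.0, 1.0])       # A @ null_vec == 0 for this A
S_other = S_pinv + 0.7 * null_vec
```

Both `S_pinv` and `S_other` reproduce ΔC exactly, yet neither equals `S_true`: without further constraints, the inversion cannot identify the physical fluxes.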

Given sufficient constraints, A⁺ will be unique and provide the desired values of S that are needed to develop a machine learning training set. Ben-Israel and Greville (Ben-Israel and Greville 2003) show that the inverse can be unique if the solutions S are restricted to lie in a subspace that defines the "legal" solutions and these restrictions are sufficiently constraining. The constrained generalized inverse of A that produces solutions S lying in the legal subspace defined by good examples of solutions is given by

S = A_P⁺ ΔC   (6)

where A_P⁺ is the generalized inverse of A restricted to the subspace of all possible solutions by the projection P, which in turn is defined by a set of basis vectors that define the subspace. Before we discuss obtaining the basis vectors, we first need to discuss how to obtain the projection P.
Assume for the moment that we have the basis vectors s_i. We concatenate them column-wise to form the matrix B:

B = [s_1 s_2 … s_n]   (7)

The projection onto the subspace defined by these basis vectors s_i and the matrix B is then (Mukhopadhyay 2014):

P = B (Bᵀ B)⁻¹ Bᵀ   (8)

where Bᵀ is the transpose of B.
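A quick numerical sketch of Eqs. (7) and (8) with two made-up basis vectors:

```python
import numpy as np

# Two made-up basis vectors s_i in R^4, concatenated column-wise into B, Eq. (7).
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [0.0, 2.0]])

# Projection onto the column space of B, Eq. (8).
P = B @ np.linalg.inv(B.T @ B) @ B.T

v_in = B @ np.array([2.0, -1.0])            # already in the subspace
v_out = np.array([1.0, 0.0, 0.0, 0.0])      # not in the subspace
```

P is idempotent (P P = P), leaves subspace members unchanged, and maps any other vector to its closest point in the subspace.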
Atmospheric chemistry problems are stiff, so Bᵀ B may be ill-conditioned. One source of this ill-conditioning, which can also hamper machine learning tools, is that the concentrations are often orders of magnitude apart. The modules use actual concentrations to make the mechanism easier to understand and debug. Normalizing the concentrations helps with both learning and stiffness/ill-conditioning. Since the s_i vectors describe the subspace where the solutions must reside, their magnitude does not matter, just their direction. So we normalize the s_i vectors by dividing by the average of the non-zero values. Mathematically, we form a diagonal square matrix, N, with the averages on the diagonal and calculate the normalized S with

S_norm = N⁻¹ S   (9)

Since N is diagonal, its inverse is simply the reciprocal of each diagonal element. The ΔC values are recovered from the S_norm values via ΔC = A N S_norm.

Atmospheric chemistry problems are also high-dimensional. Typical air quality models may have 100 to 200 chemical species, and since the vertical-column mixing time scale is similar to the slower time scales of the chemistry, some models solve the vertical transport and chemistry simultaneously. Since typical air quality models have 10 to 20 vertical cells, the dimension of the problem is 1,000 to 4,000. Even though the inverse (Bᵀ B)⁻¹ only has to be calculated once, this inversion may be intractable. Provided that the condition number of Bᵀ B is not too large, Gram-Schmidt orthonormalization can be performed on the s_i vectors before carrying out Eqs. (7) and (8), in which case they will describe the same subspace but the matrix Bᵀ B will be the identity matrix, which is its own inverse.

Now let us return to the question of how to find the basis vectors that define the "legal" subspace of S. These can be developed by solving Eq. (1) using Euler's method, in which case Eq. (3) becomes ΔC = A R(C, T, RH) Δt. That is,

s_i = R(C_i, T, RH) Δt   (10)

The value of Δt does not matter, since it just changes the length of s_i, not its direction, and therefore not its value in describing the subspace. The original module that calculates s_i can be run many times under many conditions to generate a set of s_i vectors that span the subspace. Then Locality Preserving Projections (LPP), Principal Component Analysis (PCA), or another similar algorithm can be used to find a minimum set of vectors that define the subspace.
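The basis-construction step can be sketched as follows. The paper uses LPP; as a stand-in, this sketch extracts a spanning set with an SVD truncation (PCA-like), and the rate law R is invented purely for illustration:

```python
import numpy as np

# Made-up rate law R for 4 reactions among 4 species; the fourth rate is a
# linear combination of two others, so sampled flux vectors span only a
# 3-D subspace of R^4.
def R(C):
    r1 = 1.0 * C[0]
    r2 = 0.5 * C[1] * C[2]
    r3 = 2.0 * C[3]
    return np.array([r1, r2, r3, r1 + r3])

# Sample s_i = R(C_i) * dt for many random states; dt only scales the
# length, not the direction, so it is dropped here.
rng = np.random.default_rng(1)
samples = np.stack([R(rng.random(4)) for _ in range(200)], axis=1)  # 4 x 200

# Normalize each sample by its mean value (direction is all that matters).
samples_norm = samples / samples.mean(axis=0)

# Extract a minimal spanning set via SVD truncation (PCA-like stand-in
# for the paper's LPP step).
U, sv, _ = np.linalg.svd(samples_norm, full_matrices=False)
rank = int(np.sum(sv > 1e-10 * sv[0]))
basis = U[:, :rank]     # columns are the basis vectors forming B
print(rank)             # 3: the dependent fourth rate adds no new direction
```

The truncation correctly discovers that the 200 samples span only a 3-dimensional subspace, so B needs only three columns.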
4 Solution Procedure for a Photochemical Module
1. Determine which species are active in the photochemical mechanism; that is, not the steady-state or build-up species.
2. From the mechanism, extract the A matrix for these species.
3. Using a representative set of atmospheric concentrations, T, RH, and actinic flux, use Eq. (10) and the photochemical module to generate matched values of ΔC and S for many values of C, T, RH, and actinic flux at the model's operator splitting time step.
4. Normalize the S vectors by dividing each by the average of its nonzero elements. Use these averages to form the N matrix, which relates S to S_norm via Eq. (9).
5. Use the S_norm vectors and Eq. (7) to form the B matrix and then the Bᵀ B matrix. What is the condition number of the Bᵀ B matrix? If the system is large and not ill-conditioned, use Gram-Schmidt orthonormalization on the s_i vectors before calculating B and Bᵀ B, in which case Bᵀ B should be an identity matrix or a subset of one.
6. Use Eq. (8) to calculate the projection P.
7. Use it to form the constrained generalized inverse A_P⁺ and Eq. (6) to calculate values of S from the values of ΔC.
8. Use Eq. (5) to calculate values of S from the values of ΔC.
9. Compare the values of S obtained from steps 3 and 7 to make sure they are very similar, using the dot product to calculate the angle between them. If they are, we have a good A_P⁺.
10. Use neural networks or another machine learning algorithm to memorize the S(C) relationship obtained using (a) Eq. (5), and (b) many runs of the mechanism for a wide range of C, T, RH, and actinic flux values.
11. Replace the mechanism with the neural network to calculate S(C) and Eq. (3b) to march forward.
12. Clock the speed improvement.
13. Calculate standard measures of performance such as mass balance, bias, and error compared to runs using the complete mechanism.
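The core of steps 2 through 9 can be sketched end to end on a made-up system. The construction of A_P⁺ below (pseudo-inverting A restricted to the subspace by the projection P) is one plausible realization for this sketch, not necessarily the paper's exact formula:

```python
import numpy as np

# Made-up 3-species, 4-reaction system for the end-to-end procedure.
A = np.array([[-1.0,  0.0, 1.0,  0.0],
              [ 1.0, -1.0, 0.0, -1.0],
              [ 0.0,  1.0,-1.0,  1.0]])

# Suppose all sampled flux vectors lie in a 2-D "legal" subspace with basis B.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.5, 0.0]])

P = B @ np.linalg.inv(B.T @ B) @ B.T       # projection, Eq. (8)

# One plausible realization of the constrained inverse A_P+ (an assumption
# of this sketch): pseudo-invert A restricted to the subspace. Its output
# then automatically lies in the subspace.
A_P = np.linalg.pinv(A @ P)

S_box = B @ np.array([0.4, 0.7])           # "box model" flux integrals
dC = A @ S_box                             # box-model output, Eq. (3b)

S_rec = A_P @ dC                           # recovered flux integrals, Eq. (6)

# Step 9's check: the angle between S_rec and S_box should be ~0.
cos_angle = S_rec @ S_box / (np.linalg.norm(S_rec) * np.linalg.norm(S_box))
```

Here the subspace restriction is tight enough that the recovered S equals the box model's S, so the dot-product check of step 9 passes.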

Photochemical Mechanism
We tested the methods described above on the following very simplified set of photochemical reactions used by Dr. Kleeman at the University of California, Davis, when teaching the modeling of atmospheric photochemistry. Although this mechanism is abbreviated, it contains the essential components of all atmospheric photochemical mechanisms related to ozone formation: NOx chemistry, VOC chemistry, and the formation of peroxy radicals from VOC chemistry that then react with NO to form NO2 and OH, both of which may react to terminate.
The 10 reactions are given in Table 1.The oxygen atom and hydroxyl radical are assumed to be in steady state so there are 6 active species, which are listed in Table 2.

The resulting A matrix represents the stoichiometry of the reactions, where the rows correspond to each species and the columns to each reaction. As in prior efforts (Kelp, Tessum et al. 2018, Keller and Evans 2019), we employed a box model in Julia to generate 60 independent days of output for both ΔC and S, recording data every 6 minutes. We are interested in the set of S vectors that form a basis describing the subspace that contains the desired S vectors. First, the transformation in Eq. (9) is performed to normalize the sample S vectors. In this example, we use Locality Preserving Projections (LPP) (He and Niyogi 2004), which is similar to Principal Component Analysis (PCA) but more robust for this application. Here LPP yields a basis set of 7 vectors, which form the columns of the B matrix. Since B and A_P⁺ have 7 and 6 independent columns, respectively, but 10 rows, and the row rank is equal to the column rank, there must be linearly dependent rows. One manifestation of this is that the first two rows of B and A_P⁺ are identical, or nearly so. The S values computed from A_P⁺ may not be identical to the original S corresponding to the ΔC values. However, all S values calculated from Eq. (5) using the above A_P⁺ are "legal": in other words, within the subspace defined by the basis set B.
Furthermore, the inverse A_P⁺ by definition satisfies A A_P⁺ ΔC = ΔC, so that even if a calculated S is not identical to the S from the original box-model output, it can be used in Eq. (3b) to return a ΔC identical to that of the box-model output.
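This reproduction property can be checked numerically. In the sketch below (made-up system; the same assumed construction of A_P⁺ as a subspace-restricted pseudo-inverse), the subspace is loose enough that the recovered S differs from the original, yet it returns the identical ΔC:

```python
import numpy as np

# Made-up example: 2 species, 4 reactions, and a 3-D "legal" subspace, so
# the restriction is not tight enough to pin S down uniquely.
A = np.array([[-1.0,  0.0, 1.0, 0.0],
              [ 1.0, -1.0, 0.0, 0.0]])
B = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])

P = B @ np.linalg.inv(B.T @ B) @ B.T
A_P = np.linalg.pinv(A @ P)      # assumed subspace-restricted inverse

S_orig = B @ np.array([1.0, 2.0, 3.0])   # "box model" flux integrals
dC = A @ S_orig

S_rec = A_P @ dC                 # differs from S_orig, stays in the subspace,
                                 # and still reproduces dC exactly
```

So even an imperfectly recovered S is safe to use in Eq. (3b): the concentration changes it implies match the box model's, and conservation is untouched.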