Development of a Grid-independent GEOS-Chem Chemical Transport Model (v9-02) as an Atmospheric Chemistry Module for Earth System Models

The GEOS-Chem global chemical transport model (CTM), used by a large atmospheric chemistry research community, has been re-engineered to also serve as an atmospheric chemistry module for Earth system models (ESMs). This was done using an Earth System Modeling Framework (ESMF) interface that operates independently of the GEOS-Chem scientific code, permitting the exact same GEOS-Chem code to be used as an ESM module or as a stand-alone CTM. In this manner, the continual stream of updates contributed by the CTM user community is automatically passed on to the ESM module, which remains state of science and referenced to the latest version of the standard GEOS-Chem CTM. A major step in this re-engineering was to make GEOS-Chem grid independent, i.e., capable of using any geophysical grid specified at run time. GEOS-Chem data sockets were also created for communication between modules and with external ESM code. The grid-independent, ESMF-compatible GEOS-Chem is now the standard version of the GEOS-Chem CTM. It has been implemented as an atmospheric chemistry module into the NASA GEOS-5 ESM. The coupled GEOS-5–GEOS-Chem system was tested for scalability and performance with a tropospheric oxidant-aerosol simulation (120 coupled species, 66 transported tracers) using 48–240 cores and message-passing interface (MPI) distributed-memory parallelization. Numerical experiments demonstrate that the GEOS-Chem chemistry module scales efficiently for the number of cores tested, with no degradation as the number of cores increases. Although inclusion of atmospheric chemistry in ESMs is computationally expensive, the excellent scalability of the chemistry module means that the relative cost goes down with increasing number of cores in a massively parallel environment.


Introduction
Global modeling of atmospheric chemistry involves the solution of 3-D continuity equations for the concentrations of chemical species including the effects of emissions, transport, chemistry, and deposition. This is commonly done with chemical transport models (CTMs) driven by input meteorological data and surface boundary conditions. CTMs are relatively simple computational tools because the chemical continuity equations are solved without coupling to atmospheric dynamics. They are adequate for many applications and play a central role in advancing knowledge of atmospheric chemistry. However, there is also increasing demand for atmospheric chemistry to be implemented as a coupled module in Earth system models (ESMs) that calculate the ensemble of processes affecting the Earth system prognostically. Here we describe a software framework through which the state-of-science GEOS-Chem CTM can be implemented seamlessly as a module in ESMs, so that the stand-alone CTM and the ESM module use exactly the same code. We describe the deployment of this capability in the NASA Goddard Earth Observing System (GEOS) developed at NASA's Global Modeling and Assimilation Office (GMAO).
GEOS-Chem (http://www.geos-chem.org) is a shared-memory parallel (OpenMP) global 3-D Eulerian CTM driven by assimilated meteorological data (Bey et al., 2001). It is used by over 100 research groups worldwide for a wide range of applications including simulation of tropospheric oxidants (Mao et al., 2013), aerosols (Fairlie et al., 2007; Jaeglé et al., 2011; Park et al., 2004; Trivitayanurak et al., 2008), carbon gases (Nassar et al., 2010; Wang et al., 2004; Wecht et al., 2014), mercury (Holmes et al., 2010; Selin et al., 2008), and stratospheric chemistry (Eastham et al., 2014; Murray et al., 2012). GEOS-Chem is based on core principles of open-source code development, modular structure, nimble approach to innovation, strong version control, rigorous quality assurance (QA), extensive documentation, and user support. The large user base permits extensive model diagnosis and generates a continual stream of new developments to maintain the model at the forefront of the science. Implementation of new developments in the standard GEOS-Chem code can be done quickly and efficiently because of the simplicity of the code and the common interests of the user community. Maintaining state-of-science capability is more challenging in ESMs because of the complexity of managing the central code and the need for dialogue across research communities to prioritize model development. On the other hand, CTMs such as GEOS-Chem have more difficulty staying abreast of high-performance computing (HPC) technology because of limited software engineering resources.
Here we present a re-engineered standard version of the GEOS-Chem CTM capable of serving as a flexible atmospheric chemistry module for ESMs. A key innovation is that GEOS-Chem is now grid independent, i.e., it can be used with any geophysical grid. The same standard GEOS-Chem code can be integrated into ESMs through the Earth System Modeling Framework (ESMF; Hill et al., 2004) interface, or used as before as a stand-alone CTM driven by assimilated meteorological data. The re-engineered grid-independent flexibility has been integrated into the standard open-code version of the GEOS-Chem CTM. The exact same scientific code in the GEOS-Chem CTM now serves as the atmospheric chemistry module in the GEOS-5 ESM of the NASA Global Modeling and Assimilation Office (GMAO) (Molod et al., 2012). Scientific updates to the GEOS-Chem CTM contributed by its user community and incorporated in the standard model following QA are automatically integrated into the GEOS-5 ESM, so that the ESM effortlessly remains state of science and traceable to the latest standard version of GEOS-Chem.

Grid-independent GEOS-Chem model description
The GEOS-Chem CTM consists of four modules executing operations for chemistry and dry deposition, emissions, wet deposition, and transport (Fig. 1). GEOS-Chem solves the general Eulerian form of the coupled continuity equations for m chemical species with number density vector $\mathbf{n} = (n_1, \ldots, n_m)^T$:
$$\frac{\partial n_i}{\partial t} = -\nabla \cdot (n_i \mathbf{U}) + P_i(\mathbf{n}) - L_i(\mathbf{n}), \quad i \in [1, m]. \tag{1}$$
Here $\mathbf{U}$ is the wind vector (including sub-grid components parameterized as turbulent diffusion and convection), and $P_i(\mathbf{n})$ and $L_i(\mathbf{n})$ are the local production and loss rates of species i including terms to describe chemical reactions, aerosol microphysics, emissions, precipitation scavenging, and dry deposition. In GEOS-Chem, as in all 3-D CTMs, Eq. (1) is solved by operator splitting to separately and successively apply concentration updates over finite time steps from a transport operator,
$$\frac{\partial n_i}{\partial t} = -\nabla \cdot (n_i \mathbf{U}), \quad i \in [1, m], \tag{2}$$
and a local operator (commonly called chemical operator),
$$\frac{d n_i}{d t} = P_i(\mathbf{n}) - L_i(\mathbf{n}), \quad i \in [1, m]. \tag{3}$$
The transport operator includes no coupling between species, while the chemical operator has no spatial coupling. The transport operator is further split into 1-D advection operators, a convection operator, and a boundary layer mixing operator. Operator splitting breaks down the multidimensionality of the coupled system (1) and enables numerical solution by finite differencing. The chemical operator in GEOS-Chem is further split into chemistry and dry deposition, emissions, and wet deposition modules for computational convenience. Gravitational settling of particles is treated as part of the chemical operator. Wet deposition from sub-grid convective precipitation cannot be decoupled from convective transport (Balkanski et al., 1993) and is treated as part of convection in the transport operator.
The transport operators in the standard GEOS-Chem CTM are applied on fixed latitude-longitude grids (e.g., Wu et al., 2007). When integrated into an ESM, GEOS-Chem does not need to calculate its own transport; this is done separately in the ESM as part of the simulation of atmospheric dynamics, where transport of chemical species is done concurrently with transport of meteorological variables. Thus, the ESM only uses GEOS-Chem to solve the chemical operator (Eq. 3) over specified time steps. The GEOS-Chem chemical operator must in turn be able to accommodate any ESM grid and return concentration updates on that grid.
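The operator splitting described above can be illustrated with a toy problem. The sketch below applies a first-order upwind transport step on a periodic 1-D grid, followed by a local first-order loss step. The function names, the upwind scheme, and the single-species decay are assumptions chosen for brevity; they are not the numerical schemes used in GEOS-Chem.

```python
import math

def transport_step(n, courant):
    """First-order upwind advection on a periodic 1-D grid (no species coupling).

    courant = u * dt / dx and must lie in [0, 1] for stability.
    """
    # n[i - 1] wraps around to the last cell when i == 0 (periodic boundary).
    return [n[i] - courant * (n[i] - n[i - 1]) for i in range(len(n))]

def chemistry_step(n, loss_rate, dt):
    """Local operator: first-order loss solved exactly in each cell (no spatial coupling)."""
    decay = math.exp(-loss_rate * dt)
    return [value * decay for value in n]

def split_step(n, courant, loss_rate, dt):
    """One operator-split time step: transport first, then the local operator."""
    return chemistry_step(transport_step(n, courant), loss_rate, dt)

# Hypothetical initial condition: all mass in one cell.
n0 = [0.0, 1.0, 0.0, 0.0]
n_transport_only = split_step(n0, courant=0.5, loss_rate=0.0, dt=1800.0)
n_with_loss = split_step(n0, courant=0.5, loss_rate=1.0e-4, dt=1800.0)
```

In a coupled configuration only the equivalent of `chemistry_step` would be supplied by the chemistry module, while the host model would provide the transport step.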
The chemical operator has no spatial dimensionality (0-D) and could in principle be solved independently for all grid points of the ESM. However, grouping the grid points by column is more efficient as it permits simultaneous calculation of radiative transfer, precipitation scavenging, gravitational settling, and vertically distributed emissions for all grid points within the column. Thus, we take a 1-D vertical column as the minimum set of grid points to be handled by a call to the chemical operator. Chemical operator updates for a given column can be completed without information from neighboring columns. Solving for the chemical operator column by column reduces memory overhead and facilitates scalable single program, multiple data (SPMD; Cotronis and Dongarra, 2001) parallelization in a distributed computing environment using the message-passing interface (MPI). It may sometimes be preferable to apply the chemical operator to ensembles of columns, grouped independent of geography, to balance the computational burden and achieve performance gains (Long et al., 2013).
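Because each column can be updated without information from its neighbors, the assignment of columns to processes does not affect the result. The following sketch emulates this serially: columns are dealt round-robin to a hypothetical number of ranks (a grouping independent of geography) and the outcome is compared with a plain loop over all columns. The implicit-Euler decay used as the "chemical operator" and all names are illustrative assumptions, and no MPI calls are made.

```python
def chemical_operator(column, loss_rate, dt):
    """Hypothetical local update for one vertical column (implicit Euler decay per level)."""
    return [c / (1.0 + loss_rate * dt) for c in column]

def run_columns(columns, loss_rate, dt):
    """Reference: update every column in a single loop."""
    return [chemical_operator(col, loss_rate, dt) for col in columns]

def run_decomposed(columns, n_ranks, loss_rate, dt):
    """Emulate a distributed run: each 'rank' updates only its own subset of columns."""
    result = [None] * len(columns)
    for rank in range(n_ranks):
        # Round-robin ownership: rank r handles columns r, r + n_ranks, ...
        for idx in range(rank, len(columns), n_ranks):
            result[idx] = chemical_operator(columns[idx], loss_rate, dt)
    return result

# Five hypothetical two-level columns.
columns = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
serial = run_columns(columns, 1.0e-4, 1800.0)
decomposed = run_decomposed(columns, 3, 1.0e-4, 1800.0)
```

No halo exchange or neighbor lookup appears anywhere in the per-column update, which is the property that makes the decomposition trivial.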
Prior to this work, the horizontal grid of GEOS-Chem was defined at compile time from a limited selection of fixed latitude-longitude grids (1/4 ) compatible with the advection module and offline meteorological fields. Our goal here was to re-engineer the existing GEOS-Chem code to accept any horizontal grid defined at run time. The horizontal grid would be able to span the entire global domain, represent a single column to be calculated on a single compute node, or represent any collection of columns defined by their location. This permits use of the same scientific code for stand-alone CTM and coupled ESM applications.

Code modularization and structure
In order for the GEOS-Chem code to permit run-time horizontal grid definition, much of the FORTRAN-77 code base was updated to Fortran-90. This included extensive conversion of static to dynamically allocatable arrays, and introduction of pointer-based derived data types. Data flow into, through, and out of GEOS-Chem's routines was reconfigured to use derived-type objects passed to routines as arguments in place of publicly declared global-scope variables.
This permitted the bundling of data structures with similar functionality into common interfaces (data sockets) that simplify module communication within GEOS-Chem and coupling to external components through the ESMF interface (see Sect. 2.2). Three sockets are defined: a meteorology and physics socket, a chemistry socket, and an input options socket. The meteorology and physics socket provides data defining geophysical state variables and arrays. This includes temperature, pressure, humidity, wind fields, and many others. The chemistry socket provides data structures for chemical species including indexing, species names, and concentrations. The input options socket provides run-time information such as calendar, grid dimensions, diagnostic definitions, and locations of offline information stored on disk. Together, these sockets incorporate all of the quantities and fields necessary for coupling to and driving modules within GEOS-Chem.
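The socket idea can be sketched with three small container types passed to a routine as arguments, in place of global variables. The class names, field names, and the `number_density` routine below are hypothetical and do not reproduce the Fortran derived types in GEOS-Chem; the routine uses the ideal gas law only to give the sockets something to do.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BOLTZMANN = 1.380649e-23  # J/K

@dataclass
class MetSocket:
    """Meteorology and physics socket (hypothetical subset of fields)."""
    temperature: List[float]  # K, one value per level
    pressure: List[float]     # Pa, one value per level

@dataclass
class ChemSocket:
    """Chemistry socket: species names and concentrations."""
    species_names: List[str]
    mixing_ratio: Dict[str, List[float]]  # mol/mol per level

@dataclass
class InputOptions:
    """Input options socket: run-time settings."""
    n_levels: int
    diagnostics: List[str] = field(default_factory=list)

def number_density(met, chem, opt, species):
    """Return molecules per m^3 for one species, using only the passed-in sockets."""
    return [
        chem.mixing_ratio[species][k] * met.pressure[k] / (BOLTZMANN * met.temperature[k])
        for k in range(opt.n_levels)
    ]

met = MetSocket(temperature=[288.15, 250.0], pressure=[101325.0, 50000.0])
chem = ChemSocket(species_names=["O3"], mixing_ratio={"O3": [40e-9, 60e-9]})
opt = InputOptions(n_levels=2)
n_o3 = number_density(met, chem, opt, "O3")
```

Because the routine touches nothing outside its arguments, the same call works whether the sockets were filled from files on disk or from a host model.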
The GEOS-Chem code includes specific conditional-compilation flags to accommodate the ESMF interface and permit coupling with external data streams. These flags do not interfere with GEOS-Chem's scientific operation and are used exclusively in grid, I/O, and utility operations. There are three flags invoked as C-preprocessor statements: ESMF_, EXTERNAL_GRID, and EXTERNAL_FORCING. Code bounded by these flags is neither compiled nor executed unless the specific flag is enabled at compile time. The ESMF_ flag bounds code specific to the ESMF. The EXTERNAL_GRID flag bounds code that allows GEOS-Chem to operate on an externally defined and initialized grid (e.g., by an ESM). The EXTERNAL_FORCING flag bypasses GEOS-Chem's internal, offline data I/O operations necessary for CTM operation, and replaces them with ESMF-based I/O. Users do not need to have the ESMF installed in order to run GEOS-Chem as a stand-alone CTM. The system reverts to the standard GEOS-Chem CTM code relying on the legacy module interface when compiled without these flags enabled.
The recently developed Harvard-NASA Emissions Component HEMCO (http://wiki.geos-chem.org/HEMCO/) is used for emission calculations (Keller et al., 2014). HEMCO is a Fortran-90-based, ESMF-compliant, highly customizable module that uses base emissions and scale factors from a library of emission inventories to construct time-dependent emission field arrays. Emission inventories and scale factors are selected by the user in a HEMCO-specific configuration file. Emission inventories for different species and source types need not be of the same grid dimensions or domain.
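The combination of base emissions with scale factors can be sketched as a simple product. The function below, its arguments, and the diurnal and day-of-week factors are hypothetical illustrations of the general idea; they do not reflect HEMCO's configuration format or internal data structures.

```python
def emission_field(base, diurnal_factors, weekday_factors, hour, weekday):
    """Scale a base emission field by time-dependent factors (all inputs hypothetical).

    base: emission per grid cell; diurnal_factors: 24 values indexed by hour;
    weekday_factors: 7 values indexed by day of week (0 = Monday).
    """
    scale = diurnal_factors[hour] * weekday_factors[weekday]
    return [cell * scale for cell in base]

base = [2.0, 4.0]                 # two grid cells, arbitrary units
diurnal = [1.0] * 24
diurnal[12] = 1.5                 # midday enhancement
weekday = [1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5]  # weekend reduction
noon_saturday = emission_field(base, diurnal, weekday, hour=12, weekday=5)
```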
The redesign of GEOS-Chem's data structures was meant to simplify coupling of GEOS-Chem with any ESM regardless of its ESMF compatibility. In the absence of an ESMF interface, users would be required to engineer a specific interface for their ESM. However, GEOS-Chem's data sockets and conditional-compilation flags facilitate this task by having all input and output data structures and associated methods conveniently located in a few specific modules.
As with all modifications to the publicly available GEOS-Chem source code, changes made for ESM coupling and grid independence were subject to rigorous QA by conducting prescribed 1-month and 1-year test simulations as benchmarks (http://acmg.seas.harvard.edu/geos/geos_benchmark.html), and comparing results to the benchmarks of the previous model version. Our changes were not expected to modify any aspect of the benchmark simulation results and we verified that they did not. Results from the benchmark simulations for version 9-02k can be found at http://wiki.seas.harvard.edu/geos-chem/index.php/GEOS-Chem_v9-02_benchmark_history#v9-02k.

ESMF interface
We made GEOS-Chem ESMF-compatible for interfacing with external ESMs. The ESMF is an open-source software application programming interface that provides a standardized high-performance software infrastructure for use in ESM design. It facilitates HPC, portability, and interoperability in Earth science applications (Collins et al., 2005).
GEOS-Chem is executed within the ESMF as a gridded component. The gridded component is the basic element of an ESMF-based program, and is defined as a set of discrete scientific and computational functions that operate on a geophysical grid. Likewise, other components of the Earth system are implemented as gridded components (e.g., atmospheric dynamics, ocean dynamics, or terrestrial biogeochemistry).
Each gridded component consists of a routine establishing ESMF-specific services, and initialize, run, and finalize methods for gridded component execution by the ESMF. The initialize method is executed once at the beginning of the simulation and initializes component-specific run-time parameters. The run method interfaces local data structures with ESMF states (see below) and executes the component code (GEOS-Chem in our case). The finalize method wraps up code execution, closes any remaining open files, finalizes I/O and profiling processes, and flushes local memory.
Gridded components exchange information with each other through states. A state is an ESMF derived type that can contain multiple types of gridded and non-gridded information (Collins et al., 2005; Suarez et al., 2013). An ESMF gridded component is associated with an import state and an export state. The import state provides access to data created by other gridded components. The export state contains data that a component generates and makes available to other components. In the ESMF-enabled GEOS-Chem, data are passed into and out of the GEOS-Chem gridded component via interfacing an appropriate state with a corresponding GEOS-Chem data socket (Fig. 1), making these data available within GEOS-Chem or to other ESM gridded components (see Sect. 2.1).
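The initialize, run, and finalize pattern with import and export states can be sketched in a few lines. This is a conceptual stand-in only: the class, the dictionary-based states, and the configuration key are hypothetical, no ESMF library calls are used, and the "chemistry" simply passes concentrations through.

```python
class ChemistryComponent:
    """Hypothetical gridded component with the three lifecycle methods."""

    def __init__(self):
        self.initialized = False
        self.dt = None

    def initialize(self, config):
        # Executed once: read component-specific run-time parameters.
        self.dt = config["chem_dt_seconds"]
        self.initialized = True

    def run(self, import_state, export_state):
        # Executed every coupling step: read imports, update, fill exports.
        if not self.initialized:
            raise RuntimeError("initialize() must be called before run()")
        ozone = import_state["O3"]
        export_state["O3"] = list(ozone)  # placeholder: concentrations unchanged
        export_state["steps_run"] = export_state.get("steps_run", 0) + 1

    def finalize(self):
        # Executed once: release resources.
        self.initialized = False

component = ChemistryComponent()
component.initialize({"chem_dt_seconds": 1800})
import_state = {"O3": [40e-9, 60e-9]}  # data provided by other components
export_state = {}                      # data made available to other components
for _ in range(3):
    component.run(import_state, export_state)
component.finalize()
```

The driver never reaches into the component's internals; everything it provides or receives goes through the two state containers.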
The ESMF was implemented within GEOS-Chem as an independent layer that operates on top of the CTM code. It includes code for interfacing with and executing GEOS-Chem as an ESMF gridded component. When coupling GEOS-Chem to an ESM, the GEOS-Chem transport modules are excluded and only those modules necessary to solve Eq. (3) are used. Coupling specifically to the GEOS-5 ESM required an adaptation of GEOS-Chem's ESMF interface for the GMAO's Modeling Analysis and Prediction Layer (MAPL) extension (Suarez et al., 2013). MAPL is otherwise not required for GEOS-Chem.

Implementation, performance, and scalability
The ESMF-enabled GEOS-Chem was embedded within the NASA GEOS-5 ESM (version Ganymed-4.0). The GEOS-5 ESM is the forward model of the GEOS-5 atmospheric data assimilation system (GEOS-DAS) (Ott et al., 2009; Rienecker et al., 2008). The system is built on the ESMF, and uses a combination of distributed memory (MPI) and, in some cases, hybrid distributed/shared-memory parallelization. The dynamical core used here is based on Lin (2004), and operates on horizontal grid resolutions ranging from 2° × 2.5° to 0.25° × 0.3125°, with 72 vertical layers up to 0.01 hPa. Ocean surface and sea-ice boundaries are prescribed. The land and snow interfaces are based on Koster et al. (2000) and Stieglitz et al. (2001), respectively. For the coupled simulations, GEOS-5 ESM native dynamics and moist physics are applied to the GEOS-Chem chemical tracers.
The coupled GEOS-5-GEOS-Chem system was tested on 2° × 2.5° and 0.5° × 0.625° grids with a standard oxidant-aerosol simulation using 120 chemical species of which 66 are transported (chemical tracers). Radical species with very short chemical lifetimes are not transported. The chemistry module used the RODAS-3 (4-stage, order 3(2), stiffly accurate) solver with self-adjusting internal time step (Hairer and Wanner, 1996) as part of the kinetics pre-processor (KPP; Eller et al., 2009; Sandu and Sander, 2006). KPP was implemented with its supplied linear algebra (BLAS Level-1) routines in place. The 2° × 2.5° simulation used a time step of 1800 s for all operations. For the 0.5° × 0.625° simulation, chemistry and system-operation time steps were both 450 s. Dynamics, physics, and radiation time steps were 900 s. For both simulations, the atmosphere used 72 vertical hybrid-sigma (pressure) levels. Simulations were run for 31 days initialized on 1 July 2006. All chemical tracers were initialized from output of a GEOS-Chem CTM (v9-02) simulation.
The 2° × 2.5° coupled simulations were used to test scalability of the coupled system and for comparison to the GEOS-Chem CTM. We conducted simulations with 48, 96, 144, 192, and 240 total MPI processes operating on 12 × 4, 12 × 8, 12 × 12, 16 × 12, and 16 × 15 (lat × long) contiguous grid point subdomains, respectively. This represents a set of five simulations j ∈ [1, 5]. For comparison, the offline GEOS-Chem CTM (v9-02) was run on eight shared-memory processes at 2° × 2.5° resolution using eight-core 2.6 GHz Intel Xeon processors, reflecting a typical CTM setup, using otherwise identical settings and initial chemical conditions as the coupled GEOS-5-GEOS-Chem simulations. Since GEOS-5 is a pure MPI application, each MPI process corresponds to a single processor core.
Figure 2 gives execution wall times for the total simulation and for the chemistry (GEOS-Chem) and dynamics gridded components. To analyze the performance and scalability results, we define the normalized scaling efficiency S for simulation j relative to simulation j - 1 in Eq. (4), where $W_{x,j}$ is the wall time for component x in simulation j, and $N_j$ is the number of cores allocated to the simulation. S measures how efficiently the addition of computational resources speeds up execution. For example, a value of 0.9 indicates that 90% of the resources added for computation contributed to increased performance. A value of zero means no speedup. A negative value means slowdown, as might result from increasing I/O. Results for 48 cores (j = 1) are given relative to the eight-process GEOS-Chem CTM simulation (j = 0), which uses different shared-memory processes and a different transport code for chemical tracers only. The two simulations are not strictly comparable but results serve to benchmark the performance of the GEOS-5-GEOS-Chem system against the GEOS-Chem CTM. We find that the scaling efficiency for the chemistry module (GEOS-Chem) in the GEOS-5-GEOS-Chem system is 0.78 ± 0.10 for the range of cores tested. This represents excellent performance, with no decline as the number of cores increases, reflecting the independent nature of the chemistry calculation for individual columns. For that reason, we expect the excellent scalability of the chemistry module to extend to any number of cores. Scaling efficiency of the dynamics component decreases with increasing number of cores and becomes negative above 192. This reflects the small number of grid points allocated to individual cores increasing the relative cost of communicating between processes versus operating within local memory, as well as greater internodal communication associated with additional chemical tracers. The results further suggest that the chemistry module would remain efficient for simulations beyond the range of values tested.
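A scaling efficiency with the properties described above can be computed as in the sketch below. The specific form used here, $S = (W_{j-1}/W_j - 1)/(N_j/N_{j-1} - 1)$, is an assumption chosen because it equals 1 for ideal scaling, 0 for no speedup, and is negative for a slowdown; it may differ from the authors' Eq. (4). The wall times in the example are invented for illustration and are not results from the paper.

```python
def scaling_efficiency(wall_prev, wall_curr, cores_prev, cores_curr):
    """Assumed form: fractional speedup gained per fractional increase in cores."""
    speedup_gain = wall_prev / wall_curr - 1.0
    resource_gain = cores_curr / cores_prev - 1.0
    return speedup_gain / resource_gain

# Invented wall times (seconds) for a doubling from 48 to 96 cores.
ideal = scaling_efficiency(100.0, 50.0, 48, 96)       # wall time halves
no_speedup = scaling_efficiency(100.0, 100.0, 48, 96)  # wall time unchanged
slowdown = scaling_efficiency(100.0, 125.0, 48, 96)    # wall time increases
```

Under this form, doubling the cores while reaching a 1.9-fold speedup gives S = 0.9, which matches the reading that 90% of the added resources contributed to performance.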
The 0.5° × 0.625° resolution simulation was used to examine the performance of the GEOS-5-GEOS-Chem system when operating on a finer grid resolution than permitted by the GEOS-Chem CTM using shared-memory OpenMP parallelization. The higher resolution also increases the problem size, permitting the efficient use of more computing power. For this simulation, the horizontal grid was decomposed into 24 × 25 latitude-longitude blocks over 600 cores. The 0.5° × 0.625° resolution simulation completed 0.35 simulation years per wall day.
About 20% of the wall time spent on chemistry in the GEOS-5-GEOS-Chem system was spent copying and flipping the vertical dimension of chemical tracer arrays between the GEOS-5 ESM and GEOS-Chem. This would be overcome to a large extent by linking GEOS-Chem tracer arrays to the ESMF using pointers, which access memory locations of pre-existing variables directly. This cannot be done within the GEOS-5 ESM for two reasons: (1) GEOS-Chem stores concentrations in double-precision arrays, while the GEOS-5 system generally uses single precision; (2) GEOS-Chem indexes concentration arrays vertically from the surface of the Earth upward while the GEOS-5 system does the reverse. Such limitations are not intrinsic to GEOS-Chem and depend on the specific ESM to which GEOS-Chem is coupled; other ESMs may use different data precision and indexing. Further software engineering in GEOS-Chem could add flexibility in array definitions to accommodate different ESM configurations.
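The two mismatches described above (precision and vertical ordering) can be illustrated with Python's standard `array` module. The function names and the three-level column are hypothetical; the point of the sketch is that each exchange requires a new, reordered, retyped copy of the data, so a pointer to the original memory cannot be shared.

```python
from array import array

def esm_to_chem(column_top_down_f32):
    """Copy a single-precision, top-down column into a double-precision, surface-up column."""
    return array("d", reversed(column_top_down_f32))

def chem_to_esm(column_surface_up_f64):
    """Copy a double-precision, surface-up column back to single precision, top-down."""
    return array("f", reversed(column_surface_up_f64))

# Hypothetical three-level column as held by the host model (top level first).
esm_column = array("f", [1.0, 2.0, 3.0])
chem_column = esm_to_chem(esm_column)
round_trip = chem_to_esm(chem_column)

# A value that is not exactly representable in single precision changes on conversion.
narrowed = chem_to_esm(array("d", [0.1]))[0]
```

The copies made in both directions at every coupling step are the overhead that the text attributes to the interface.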
Figure 3 illustrates model results with 500 hPa O3 mixing ratios at 12:00 UT on 15 July 2006 for GEOS-5-GEOS-Chem simulations at 2° × 2.5° and 0.5° × 0.625° resolutions, and for the GEOS-Chem CTM using GEOS-5 assimilated meteorological data at 2° × 2.5° resolution. All three simulations are initialized from the same GEOS-Chem CTM fields at 00:00 UT on 1 July 2006, but have different meteorology because of differences in resolution and also because the CTM uses assimilated meteorological data while the GEOS-5-GEOS-Chem system in this implementation does not. The figure demonstrates the fine structure of chemical transport that can be resolved with the 0.5° × 0.625° resolution. The general patterns are roughly consistent between simulations and are reasonable compared to satellite and sonde observations (Zhang et al., 2010). A scatterplot comparing output from the different simulations (Fig. 4) shows that they have comparable results. Figures 3 and 4 are intended to illustrate the GEOS-5-GEOS-Chem capability. A more thorough evaluation of GEOS-Chem's chemistry within the GEOS-5 system would require the use of the same meteorological data as the offline CTM, diagnosing the full ensemble of simulated chemical species, and investigating the effect of transport errors when using offline meteorological fields in the CTM. This will be documented in a separate publication.

Summary
We have presented a new grid-independent version of the GEOS-Chem chemical transport model (CTM) to serve as an atmospheric chemistry module within Earth system models (ESMs) using the Earth System Modeling Framework (ESMF). The new GEOS-Chem version uses any grid resolution or geometry specified at run time. The exact same standard GEOS-Chem code (freely available from http://geos-chem.org) supports both ESM and stand-alone CTM applications. This ensures that the continual stream of innovation from the worldwide community contributing to the stand-alone CTM is easily incorporated into the ESM version. The GEOS-Chem ESM module thus always remains state of science.
We implemented GEOS-Chem as an atmospheric chemistry module within the NASA GEOS-5 ESM and performed a tropospheric oxidant-aerosol simulation (120 coupled chemical species, 66 transported tracers) in that fully coupled environment. Analysis of scalability and performance for 48 to 240 cores shows that the GEOS-Chem atmospheric chemistry module scales efficiently with no degradation as the number of cores increases, reflecting the independent nature of the chemical computation for individual grid columns. Although the inclusion of detailed atmospheric chemistry in an ESM is a major computational expense, chemistry operations become relatively more efficient as the number of cores increases due to their efficient scalability.

Figure 1. Coupling between the GEOS-Chem global chemical transport model (CTM) (dashed beige box) and an Earth system model (ESM) (blue box). The schematic shows how the coupling is managed through the ESMF, and utilizes only the GEOS-Chem components bound by the ESM box: transport modules in the GEOS-Chem CTM are bypassed and replaced by the ESM transport modules through the atmospheric dynamics simulation.

Figure 2. Performance and scalability of the GEOS-5-GEOS-Chem system for a 1-month test simulation including detailed oxidant-aerosol tropospheric chemistry at 2° × 2.5° horizontal resolution. Top panel: total and stacked wall times for the chemical operator (GEOS-Chem), dynamics, and other routines versus number of processor cores. Bottom panel: scaling efficiency (Eq. 4) for chemistry, dynamics, and the full GEOS-5-GEOS-Chem system. Values shown for 48 cores are relative to the eight-process shared-memory GEOS-Chem CTM.