HERMESv3, a stand-alone multiscale atmospheric emission modelling framework - Part 1: global and regional module.

We present the High-Elective Resolution Modelling Emission System version 3 (HERMESv3), an open source, parallel and stand-alone multiscale atmospheric emission modelling framework that computes gaseous and aerosol emissions for use in atmospheric chemistry models. HERMESv3 is coded in Python and consists of a global_regional module and a bottom_up module that can be either combined or executed separately. In this contribution (Part 1) we describe the global_regional module, a highly customizable emission processing system that calculates emissions from different sources, regions and pollutants on a user-specified global or regional grid. The user can flexibly define combinations of existing up-to-date global and regional emission inventories and apply country-specific scaling factors and masks. Each emission inventory is individually processed using user-defined vertical, temporal and speciation profiles, so that the emission outputs are compatible with multiple chemical mechanisms (e.g. Carbon Bond 05). The selection and combination of emission inventories and databases is done through detailed configuration files, which provide the user with a widely applicable framework for designing, choosing and adjusting the emission modelling experiment without modifying the HERMESv3 source code. The generated emission fields have been successfully tested in different atmospheric chemistry models (i.e. CMAQ, WRF-Chem and NMMB-MONARCH) at multiple spatial and temporal resolutions. In a companion article (Part 2) we describe the bottom_up module, which estimates emissions at the source level (e.g. road link) by combining state-of-the-art bottom-up methods with local activity and emission factors.


Introduction
Emission inputs of trace gases and aerosols play a key role in the performance of atmospheric chemistry models for air quality research and forecasting applications. Depending on the purpose of the application, an atmospheric chemistry model may be applied at global, regional or local (urban) scales. Similarly, the level of coverage and detail required for the emission input data varies according to the type of study and the modelling scale (e.g. Borge et al., 2014). For global and regional modelling, emissions are typically estimated at the country level (combining national statistics and technology-dependent emission factors) and then disaggregated using spatial proxies such as population density and land use.
Different global and regional emission inventories are continuously being developed and made publicly available by research groups and international programs such as the Global Emissions Initiative (GEIA) (Frost et al., 2013). These inventories usually report annual totals per primary pollutant and source sector, distributed over a rectangular grid at resolutions ranging from 1º by 1º to 0.1º by 0.1º. The practical use of these inventories suffers from several problems. On the one hand, the reporting format is not directly compatible with the emission input requirements of atmospheric chemistry models, as these typically ingest hourly, chemical species-based emissions on other grid projections and resolutions, using specific file formats and conventions. On the other hand, there are substantial discrepancies between the available inventories in total emissions, sectoral emission shares, spatial distribution and pollutant sources considered, and therefore in their respective behaviour when used in atmospheric chemistry models (e.g. Granier et al., 2011; Trombetti et al., 2018; Saikawa et al., 2017). A potential remedy for the latter is to combine different inventories and apply adjustment factors in order to improve the representativeness of the emission data and the air quality modelling results (e.g. Rémy et al., 2017). All in all, incorporating emission data into atmospheric chemistry models usually implies laborious programming to combine, adjust and adapt the original inventories to the model requirements. Global and regional inventories are too imprecise for urban-scale modelling applications (e.g. Timmermans et al., 2013).
Emission and activity factors lack specificity for the local conditions of interest (e.g. Guevara et al., 2014), and the spatial proxies used to allocate the emissions are of poor quality and may not apply to certain emission processes (e.g. Lopez-Aparicio et al., 2017). These inventories are, for example, limited when it comes to predicting and assessing the impact of emission reduction measures on local air quality, such as changes in speed limits (e.g. Baldasano et al., 2010) or the penetration of new vehicle technologies (e.g. Soret et al., 2014). Consequently, working at the urban scale requires dedicated local emission inventories that combine activity data collected at a fine spatial scale (e.g. point sources, road links, households) with detailed bottom-up emission algorithms representing the different factors that influence the emission processes (e.g. vehicle speed, outdoor temperature).

Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2018-324 Manuscript under review for journal Geosci. Model Dev. Discussion started: 7 January 2019 © Author(s) 2019. CC BY 4.0 License.
Besides the aforementioned atmospheric chemistry models, the emission outputs of this module are also adapted for use with the R-LINE urban dispersion model (Snyder et al., 2013).
We conceive HERMESv3 as a flexible multiscale modelling framework that allows different emission estimation approaches to be integrated and combined, so that the emission outputs can be as detailed and specific as possible for the different domains (global, regional or local) involved in the corresponding application.
The development of HERMESv3 builds on the knowledge acquired from previous versions of HERMES for Spain (Baldasano et al., 2008; Guevara et al., 2013), Europe (Ferreira et al., 2013) and Mexico City (Guevara et al., 2017), which have been developed at the Earth Sciences Department of the Barcelona Supercomputing Center (BSC) during the last decade. Other existing emission software, such as HEMCO (Keller et al., 2014) and PREP-CHEM-SRC (Freitas et al., 2011), has also been taken as a reference for the development of HERMESv3.
In this paper (Part 1) we describe the global_regional module (hereafter referred to as HERMESv3_GR). The bottom_up module is described in the companion paper (Part 2; Guevara et al., in preparation). The paper is organized as follows. Section 2 describes the processing system and its main functionalities, together with some illustrative examples of the outputs that can be generated with this tool. Section 3 describes some of the current implementations of HERMESv3_GR for air quality modelling. Finally, Section 4 presents the main conclusions of this work.

2 Description of HERMESv3
Figure 1 shows a schematic representation of the structure of HERMESv3_GR along with the execution workflow.

Overview
HERMESv3_GR first defines the destination grid and selects the emission inventories (see Sect. 2.2) and the vertical, temporal and speciation profiles, based on the specifications defined by the user in the general and emission inventory configuration files (see Sect. 2.3 and 2.4, respectively). During the initialization process, HERMESv3_GR automatically creates a set of auxiliary files that are subsequently used during the emission calculation. These auxiliary files, including the output grid description, the time zones and the country mask, are specific to each new working domain and are stored by default after their creation so that they can be reused in subsequent executions. The emissions are calculated in four steps that are applied to each pollutant sector and species of the selected original emission inventories: (i) the spatial regridding from the source grid to the destination grid (see Sect. 2.5.1), (ii) the mass distribution over the model vertical layers (see Sect. 2.5.2), (iii) the temporal disaggregation (see Sect. 2.5.3) and (iv) the speciation mapping according to the selected gas-phase and aerosol chemical mechanisms (see Sect. 2.5.4). The emission calculation can combine inventories that cover different geographic domains and/or emission sectors. To prevent spatial overlapping between inventories, a masking functionality is included in the regridding phase: the user can define country-specific masks that restrict the applicability of the original inventory to a given region, as well as country-specific scaling factors. Once the emissions have been processed, HERMESv3_GR writes the output file following the requirements and conventions of the atmospheric chemistry model selected by the user in the general configuration file (see Sect. 2.5.5).
For each grid cell x and vertical layer l of the destination domain, and each requested output species e, HERMESv3_GR computes the output hourly emissions following Eq. (1):

$$
E^{\mathrm{out}}_{e}(x,l) = \sum_{i=1}^{I} \sum_{s=1}^{S} \sum_{\bar{x}=1}^{\bar{X}} E^{\mathrm{in}}_{\bar{e},s,i}(\bar{x})\, R_{\bar{e},s,i}(\bar{x})\, V_{\bar{e},s,i}(\bar{x},l)\, T_{\bar{e},s,i}\, \sigma_{\bar{e},s,i} \qquad (1)
$$

where $E^{\mathrm{in}}_{\bar{e},s,i}(\bar{x})$ is the input emission flux (kg m-2 s-1) of species $\bar{e}$ and pollutant sector s reported by inventory i on the source grid cell $\bar{x}$; $R_{\bar{e},s,i}(\bar{x})$ is the remapping weight from source grid cell $\bar{x}$ to destination grid cell x associated with species $\bar{e}$ and pollutant sector s of inventory i; $V_{\bar{e},s,i}(\bar{x},l)$ is the vertical weight factor (0 to 1) for layer l and source grid cell $\bar{x}$ assigned to species $\bar{e}$ and pollutant sector s of inventory i; $T_{\bar{e},s,i}$ is the temporal weight factor assigned to species $\bar{e}$ and pollutant sector s of inventory i; and $\sigma_{\bar{e},s,i}$ is the speciation factor assigned to species $\bar{e}$ and pollutant sector s of inventory i. The final $E^{\mathrm{out}}_{e}(x,l)$ is the hourly emission of output species e in destination grid cell x and layer l, obtained by summing over (i) all $\bar{X}$ source grid cells $\bar{x}$ that contribute to destination grid cell x, (ii) all S pollutant sectors s used and (iii) all I emission inventories i used. The units of the output emissions vary according to the atmospheric chemistry model selected by the user.

The remapping weight combines the interpolation, masking and scaling factors:

$$
R_{\bar{e},s,i}(\bar{x}) = W(\bar{x})\, M_{\bar{e},s,i}(\bar{x})\, F_{\bar{e},s,i}(\bar{x})
$$

where $W(\bar{x})$ is the interpolation weight (0 to 1) that describes how source grid cell $\bar{x}$ contributes to destination grid cell x; $M_{\bar{e},s,i}(\bar{x})$ is the masking factor (1 or 0) and $F_{\bar{e},s,i}(\bar{x})$ is the scaling factor, both assigned to species $\bar{e}$ and pollutant sector s of inventory i on the source grid cell $\bar{x}$.

The temporal weight factor is the product of the monthly, daily and hourly factors:

$$
T_{\bar{e},s,i} = f^{\mathrm{month}}_{\bar{e},s,i}(m)\, f^{\mathrm{day}}_{\bar{e},s,i}(d)\, f^{\mathrm{hour}}_{\bar{e},s,i}(h)
$$

where $f^{\mathrm{month}}_{\bar{e},s,i}(m)$ is the monthly factor for month m (0 to 12), $f^{\mathrm{day}}_{\bar{e},s,i}(d)$ is the daily factor for day d (0 to 28, 29, 30 or 31, depending on the number of days in month m) and $f^{\mathrm{hour}}_{\bar{e},s,i}(h)$ is the hourly factor for hour h (0 to 24), each assigned to species $\bar{e}$ and pollutant sector s of inventory i.
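To make the structure of Eq. (1) concrete, the following minimal Python sketch evaluates the triple sum for one output species, one destination cell and one layer. The `Term` class and its dictionary layout are hypothetical and are not part of HERMESv3_GR; each term simply bundles the precomputed factors of one inventory, sector and source species, and the sum over terms plays the role of the sums over inventories and sectors.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Term:
    """Hypothetical container for one (inventory, sector, source species) term."""
    flux: Dict[int, float]                  # input flux per source cell, kg m-2 s-1
    remap: Dict[Tuple[int, int], float]     # (source cell, destination cell) -> remapping weight
    vertical: Dict[Tuple[int, int], float]  # (source cell, layer) -> vertical fraction
    temporal: float                         # temporal weight for the current time step
    speciation: float                       # factor from source species to output species


def output_emission(terms: List[Term], dest_cell: int, layer: int) -> float:
    """Sum of Eq. (1) over all terms and all contributing source cells."""
    total = 0.0
    for t in terms:
        for (src, dst), r in t.remap.items():
            if dst != dest_cell:
                continue  # this source cell does not contribute to dest_cell
            total += (t.flux[src] * r * t.vertical.get((src, layer), 0.0)
                      * t.temporal * t.speciation)
    return total
```

Because every destination cell is computed from its own set of weights, no information has to be exchanged between destination cells, which is what makes the domain decomposition described in the technical implementation straightforward.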

2.2 Emission data library and preprocessing
Table 1 lists all the global and regional inventories currently included in the HERMESv3_GR emission data library. On […] Wiedinmyer et al. (2014) and Carn et al. (2017), respectively. Two European regional anthropogenic emission inventories are also included, namely TNO-MACC_III (Kuenen et al., 2014) and EMEP […]. The emission data library compiles gaseous (NOx, CO, NMVOC, SOx, NH3) and particulate (PM10, PM2.5, BC, OC) air pollutant emissions. Depending on the inventory, NMVOC emissions are reported as a single category (e.g. ECLIPSEv5.a), by individual species (e.g. GFASv1.2) or following the 25 species groups proposed within the Global Emission Inventory Activity (GEIA) (Olivier et al., 1996) (e.g. EDGARv4.3.2_VOC). Most of the inventories are reported at the monthly level and include time series with multiple base years (past, present and future).
For each inventory, a specific pre-processing function has been developed to rewrite the original datasets in a common format. […] volcanic degassing emissions) are stored in CSV files that include the name of each source (e.g. the name of the volcano), its geographic coordinates, the altitude of injection of the emissions (in meters) and the total amount of annual emissions (in kg s-1). For this type of inventory, no pre-processing function is needed, and the user is expected to provide the data directly in the required format.
HERMESv3_GR only includes anthropogenic, biomass burning and volcanic emission inventories. Natural emissions such as biogenic NMVOCs, mineral dust aerosols, oceanic DMS, or lightning and soil NO, which depend functionally on meteorological variables, are assumed to be calculated online during the execution of the corresponding atmospheric chemistry model (e.g. the NMMB-MONARCH dust module; Pérez et al., 2011) or using specific emission models (e.g. MEGANv2.1; Guenther et al., 2012).

General configuration file
The general configuration options (e.g. start and end dates, output file name, working domain description) can be passed to HERMESv3_GR via a configuration file, command-line arguments or a combination of both. Arguments passed on the command line take priority over those in the configuration file.
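The priority rule above (command line over configuration file) can be sketched with the Python standard library. This is an illustration only: the `[GENERAL]` section and the option names `start_date` and `output_name` are hypothetical and need not match HERMESv3_GR's actual option names or parsing code.

```python
import argparse
import configparser


def load_options(config_text, argv):
    """Read options from a configuration file, then let the command line override them."""
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)
    defaults = dict(cfg["GENERAL"]) if cfg.has_section("GENERAL") else {}

    parser = argparse.ArgumentParser()
    parser.add_argument("--start_date")   # hypothetical option name
    parser.add_argument("--output_name")  # hypothetical option name
    # Values from the file become defaults; anything given on the command line wins.
    parser.set_defaults(**defaults)
    return parser.parse_args(argv)


if __name__ == "__main__":
    text = "[GENERAL]\nstart_date = 2016/01/01\noutput_name = emis.nc\n"
    print(load_options(text, ["--start_date", "2016/06/01"]))
```

Feeding the file values through `set_defaults` keeps a single source of truth for the option list while giving command-line values precedence.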
The general configuration file is divided into four sections (see example in Appendix 1):
- General: this section defines the main paths of the processing system (i.e. input, output, data), the name of the output emission file and the time step configuration parameters, including start and end dates, temporal resolution (i.e. monthly, daily, hourly) and the number and frequency of time steps (e.g. 24 time steps every 3 hours).
- Domain selection: this section defines the working grid on which emissions will be calculated (e.g. spatial extent, horizontal and vertical description). Currently, HERMESv3_GR can calculate emissions on grids with the following map projections: regular lat-lon for global domains, and rotated lat-lon and Lambert conformal conic for regional domains. Other coordinate systems and combinations (e.g. regular lat-lon for regional domains) could be added upon request. In this section of the configuration file, the user also selects the format of the output emission file. Currently, HERMESv3_GR is able to write NetCDF emission output files following the CMAQ, WRF-Chem or NMMB-MONARCH conventions, and can easily be extended to other projections and atmospheric chemistry model conventions.
- Emission inventory configuration: this section defines the path to the file describing the configuration of the emission inventories (see Sect. 2.4).
- Profiles selection: this section defines the profile files that will be applied to perform the vertical distribution, temporal disaggregation and speciation of the original emission inventories (see Sect. 2.5.2 to 2.5.4).

Emission inventory configuration file
The emission inventory configuration file allows the user to select the base emission inventories, pollutant sectors and species to combine and overlay for their simulations, and to choose the corresponding temporal, vertical and speciation profiles and the optional scaling and masking factors that will be applied to the original emissions to adapt them to the requirements of the atmospheric chemistry model.
Each line of the emission inventory configuration file corresponds to a specific emission inventory, pollutant sector and pollutant species group, for which the user can define:
- Country-specific scaling factors that multiply the original emissions.
- Country-specific masks that restrict the applicability of the original inventory to a given region.
- A vertical profile to distribute the original emissions across the vertical layers of the working domain.
- Monthly, daily and hourly profiles to temporally disaggregate the original emissions.
- A speciation profile to map the original pollutant species to a specific gas-phase and aerosol chemical mechanism.

Figure 2 shows five examples of emission inventory configuration files and the resulting emission outputs calculated by HERMESv3_GR. The first column ("ei") indicates the name of the emission inventory, followed by the name of the pollutant sector ("sector"), the reference year of the emission inventory ("ref_year"), the requested pollutant species to be computed ("pollutants") and a field that indicates whether this sector is activated ("active", 0 or 1). HERMESv3_GR combines all this information to select the corresponding file from the emission data library. In the first example (Fig. 2a), we selected the 2010 HTAPv2.2 organic carbon (OC) transport emissions, while in the second one (Fig. 2b) this inventory is combined with OC biomass burning emissions from GFASv1.2. The resulting output shows an increase of emissions in areas typically affected by forest fires (e.g. Central Africa).
The following two columns of the configuration file are optional parameters that can be used to define country-specific scaling factors that multiply the original emissions ("factor_mask") and country-specific masks that restrict the applicability of the original emissions to the defined region ("regrid_mask"). Country-specific scaling factors are defined by combining the ISO 3166-1 alpha-3 code of the targeted country (https://unstats.un.org/unsd/tradekb/knowledgebase/country-code) with a numerical factor. Scaling factors for more than one country need to be separated by a comma. Our third example (Fig. 2c) shows the original 2010 HTAPv2.2 OC transport emissions scaled by a factor of 5 in China and 0.5 in India (CHN 5, IND 0.5). Country-specific masks, on the other hand, are defined using the ISO 3166-1 alpha-3 country code preceded by either a "+" sign, which restricts the applicability of the inventory to the targeted country only, or a "-" sign, which restricts the applicability of the inventory to all countries except the targeted one. The masks defined by the user can include more than one country. In the fourth example (Fig. 2d) […] Column "frequency" defines the temporal resolution of the inventory (i.e. annual, monthly, daily). Column "path" defines the root path of the emission files of each inventory. For all inventories, the root path consists of the common "<data_path>" defined in the general configuration file, followed by the name of the institution providing the inventory, the name of the inventory and the temporal frequency. As shown in the first example, the root path of the HTAPv2.2 emission files is "<data_path>/jrc/htapv22/monthly_mean".
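The "factor_mask" and "regrid_mask" syntax described above can be turned into the per-country scaling factor F and masking factor M with a few lines of Python. This is a sketch, not HERMESv3_GR's parser: the "factor_mask" format follows the example in the text ("CHN 5, IND 0.5"), while the multi-country "regrid_mask" layout (one leading sign followed by codes separated by spaces or commas, e.g. "+ CHN IND") is an assumption.

```python
def parse_factor_mask(text):
    """'CHN 5, IND 0.5' -> {'CHN': 5.0, 'IND': 0.5}; empty text means no scaling."""
    factors = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        code, value = item.split()
        factors[code.upper()] = float(value)
    return factors


def parse_regrid_mask(text):
    """Return (sign, codes); the multi-country layout here is an assumption."""
    text = text.strip()
    if not text:
        return None, set()
    sign, rest = text[0], text[1:]
    if sign not in "+-":
        raise ValueError(f"mask must start with '+' or '-': {text!r}")
    return sign, {c.upper() for c in rest.replace(",", " ").split()}


def mask_value(country, sign, codes):
    """Masking factor M (1 or 0) for a source cell located in `country`."""
    if sign is None:
        return 1
    inside = country in codes
    return 1 if (inside if sign == "+" else not inside) else 0


def scale_value(country, factors):
    """Scaling factor F; countries not listed keep their original emissions."""
    return factors.get(country, 1.0)
```

With "-CHN", for instance, every cell outside China keeps M = 1 and every cell inside China gets M = 0, which is how two inventories can be prevented from overlapping.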

The alphanumeric codes specified in columns "p_vertical", "p_month", "p_day", "p_hour" and "p_speciation" refer to the vertical, monthly, daily, hourly and speciation profile IDs assigned to process the original emissions. All codes are cross-referenced with text files in which the vertical, temporal and speciation numerical factors are defined. As shown in the first example, the "p_hour" field allows the user to define specific diurnal profiles for weekdays, Saturdays and Sundays, which may be relevant for certain pollutant sectors such as road transport (e.g. Mues et al., 2014). For the GFASv1.2 biomass burning emissions (second example), the "p_vertical" field is not filled with a vertical profile ID but with two parameters that define (i) the maximum altitude of the fire plume injection height ("method") and (ii) how the emissions are distributed across the layers below this maximum height ("approach") (see Sect. 2.5.2). Finally, the "comment" column is an optional field in which the user can add a remark.

2.5 Emission core module
The following sections describe the main functionalities of HERMESv3_GR, namely the spatial, vertical, temporal and speciation processing of the original emissions and the writing of the output file.

Spatial regridding
This function regrids the selected inventories from their original source grid to the user-defined destination grid. The regridding process consists of two steps. The first step uses the Earth System Modeling Framework (ESMF) regrid weight generation application (Hill et al., 2004) to calculate an interpolation weight matrix that describes how points in the source grid contribute to points in the destination grid. The interpolation method is first-order conservative, in which each weight is the ratio between the area of the source cell that overlaps the destination cell and the area of the destination cell. The second step multiplies the emissions on the source grid by the interpolation weight matrix and, if defined by the user in the emission inventory configuration file, by the corresponding scaling and/or masking factors, to produce emissions on the destination grid. Country-specific scaling and masking factors are generated from a gridded country mask created during the initialization process.
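The two regridding steps can be illustrated with a small, self-contained Python sketch of first-order conservative remapping. It is a simplification, not the ESMF implementation: cells are axis-aligned rectangles on a plane (ESMF computes overlaps on the sphere), and the sparse weight matrix is stored as a dictionary. Each weight is the overlap area divided by the destination cell area, and the optional per-source-cell mask and scaling lists stand in for the country-specific factors.

```python
def area(cell):
    """Area of a rectangle given as (x0, x1, y0, y1)."""
    x0, x1, y0, y1 = cell
    return (x1 - x0) * (y1 - y0)


def overlap(a, b):
    """Area of the intersection of two rectangles (x0, x1, y0, y1)."""
    dx = min(a[1], b[1]) - max(a[0], b[0])
    dy = min(a[3], b[3]) - max(a[2], b[2])
    return dx * dy if dx > 0 and dy > 0 else 0.0


def conservative_weights(src_cells, dst_cells):
    """Step 1: sparse weights {(dst_index, src_index): overlap / dst_area}."""
    weights = {}
    for k, d in enumerate(dst_cells):
        for j, s in enumerate(src_cells):
            a = overlap(s, d)
            if a > 0:
                weights[(k, j)] = a / area(d)
    return weights


def regrid(flux, weights, n_dst, mask=None, scale=None):
    """Step 2: destination flux = sum_j W[k, j] * M[j] * F[j] * flux[j]."""
    out = [0.0] * n_dst
    for (k, j), w in weights.items():
        m = 1 if mask is None else mask[j]
        f = 1.0 if scale is None else scale[j]
        out[k] += w * m * f * flux[j]
    return out
```

Computing the weights once and reusing them for every pollutant and sector is what makes it worthwhile to store them as auxiliary files; and when the destination cell is fully covered by source cells, flux times area is conserved.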
For point source inventories (e.g. volcanic degassing emissions) that are not reported on a regular grid but at specific lat-lon locations, the remapping is performed using a nearest destination to source approach. When multiple source points are mapped into the same grid cell, the destination value is the sum of the source emission values. For point source emissions, neither scaling nor masking options are available, as the user can directly modify and/or remove individual point sources in the corresponding inventory input file.
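A minimal Python sketch of this point-source treatment is shown below. The CSV column names are hypothetical, and the destination grid is assumed to be a regular lat-lon grid, in which case the nearest cell centre (in lat-lon coordinates) is simply the cell containing the point. Points falling in the same cell are summed, as described above.

```python
import csv
import io
import math


def load_points(csv_text):
    """Read (name, lat, lon, height_m, kg s-1) tuples; column names are hypothetical."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(r["name"], float(r["lat"]), float(r["lon"]),
             float(r["height_m"]), float(r["emission_kg_s"])) for r in rows]


def grid_points(points, lat0, lon0, dlat, dlon, nlat, nlon):
    """Sum point emissions per regular lat-lon cell: {(row, col): kg s-1}."""
    totals = {}
    for _name, lat, lon, _height, emission in points:
        row = math.floor((lat - lat0) / dlat)
        col = math.floor((lon - lon0) / dlon)
        if 0 <= row < nlat and 0 <= col < nlon:  # skip points outside the domain
            totals[(row, col)] = totals.get((row, col), 0.0) + emission
    return totals


if __name__ == "__main__":
    text = ("name,lat,lon,height_m,emission_kg_s\n"
            "A,10.2,20.3,3000,1.0\n"
            "B,10.7,20.9,2500,2.0\n"
            "C,-5.5,100.1,1500,0.5\n")
    # Points A and B share a 1 x 1 degree cell, so their emissions are added.
    print(grid_points(load_points(text), -90.0, -180.0, 1.0, 1.0, 180, 360))
```

The injection height is kept in each record because it is needed later for the vertical distribution of point sources.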
The regridding process allows the user to interpolate the original emissions to global or regional grids with flexible spatial resolutions and several map projections, including regular lat-lon, rotated lat-lon, Lambert conformal conic and Mercator. Other map projections (e.g. polar stereographic) could be added to the processing system in future releases. Figure 3 shows an example of the 0.1° × 0.1° HTAPv2.2 black carbon (BC) transport emissions interpolated to (a) a 1° × 1.4° global regular lat-lon domain, (b) a 0.1° × 0.1° regional rotated lat-lon domain, (c) a 50 km × 50 km regional Mercator grid and (d) a 4 km × 4 km regional Lambert conformal conic grid.

Vertical distribution
Once the emissions are allocated on the horizontal grid, the next step is to distribute them across the vertical layers of the destination domain. For this task, two input files are required: (i) a CSV file describing the domain's vertical layers (i.e. approximate heights above ground of the top of each vertical layer, in meters) and (ii) a CSV file describing the vertical profile ID previously assigned by the user in the emission inventory configuration file (i.e. fraction of emissions assigned to each vertical layer, between 0 and 1). Using this information, HERMESv3_GR interpolates the original emissions to the modelling domain layers. The user can define and assign any vertical profile to any emission inventory, pollutant sector and pollutant species. Some suggested vertical profiles for the energy and manufacturing industry (Bieser et al., 2011) and the air traffic sectors (Olsen et al., 2013) are included in the HERMESv3_GR database. For the GFASv1.2 biomass burning inventory, the vertical emission distribution is not performed with a fixed vertical profile but with two parameters that define (i) the maximum altitude of the fire plume injection height ("method") and (ii) how the emissions are distributed across the layers below this maximum height ("approach"). The fire plume injection height is directly provided by GFASv1.2 following two different methods. The first method ("sofiev") is based on a semi-empirical parameterisation detailed in Sofiev et al. (2013). The second method ("prm") consists of a plume rise model described by Paugam et al. (2015). Regarding the approach, two options exist as well. The first one ("uniform") consists of distributing all the emissions uniformly across the layers below the maximum injection height.
The second one ("50_top") allocates 50% of the emissions to the vertical layer that intersects the maximum injection height and distributes the other 50% uniformly across the layers below it. The user has to select both the method and the approach in the emission inventory configuration file. Similarly, for point source emission inventories (e.g. volcanic degassing), the vertical distribution is not defined using a fixed vertical profile but with the injection height field included in the input inventory file, which can be adjusted individually for each point source. Emissions are distributed homogeneously across all the layers below the defined injection height.
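The vertical options above can be sketched in Python as functions that return one fraction per domain layer. Several details here are assumptions rather than documented HERMESv3_GR behaviour. Layers are described by the heights of their tops (first layer starting at 0 m). "Uniform" is read as uniform in height, so each layer receives a share proportional to the part of its thickness below the injection height; the same function also covers the homogeneous distribution used for point sources. A fixed profile is remapped to the domain layers assuming emissions are uniform within each profile layer.

```python
def remap_profile(profile_tops, fractions, domain_tops):
    """Redistribute a fixed vertical profile onto the domain layers (by height overlap)."""
    out = [0.0] * len(domain_tops)
    p_bottom = 0.0
    for p_top, frac in zip(profile_tops, fractions):
        d_bottom = 0.0
        for i, d_top in enumerate(domain_tops):
            ov = min(p_top, d_top) - max(p_bottom, d_bottom)
            if ov > 0:
                out[i] += frac * ov / (p_top - p_bottom)
            d_bottom = d_top
        p_bottom = p_top
    return out  # in this sketch, mass above the domain top is dropped


def uniform_below(tops, height):
    """'uniform' (and point sources): spread evenly in height from 0 m to `height`."""
    if height <= 0:
        return [1.0] + [0.0] * (len(tops) - 1)
    shares, bottom = [], 0.0
    for top in tops:
        shares.append(max(0.0, min(top, height) - bottom))
        bottom = top
    total = sum(shares)
    return [s / total for s in shares]


def top_layer_index(tops, height):
    """Index of the layer containing `height` (last layer if above the domain)."""
    for i, top in enumerate(tops):
        if height <= top:
            return i
    return len(tops) - 1


def fifty_top(tops, height):
    """'50_top': 50% in the layer containing `height`, 50% uniformly in the layers below."""
    k = top_layer_index(tops, height)
    if k == 0:
        # Assumption: with no layer below, everything stays in the first layer.
        return [1.0] + [0.0] * (len(tops) - 1)
    below = uniform_below(tops[:k], tops[k - 1])
    return [0.5 * f for f in below] + [0.5] + [0.0] * (len(tops) - k - 1)
```

For layer tops at 50, 150, 300, 600 and 1000 m and an injection height of 400 m, `uniform_below` gives 0.125, 0.25, 0.375, 0.25 and 0, while `fifty_top` puts 0.5 in the 300 to 600 m layer.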

Temporal distribution
This process temporally distributes the emissions from their original resolution (e.g. annual) to the one defined by the user (monthly, daily or hourly). The emissions are multiplied by the user-defined monthly, weekly and hourly weight factors, which are specified in separate CSV files with the corresponding profile ID (i.e. "MXXX", "DXXX" and "HXXX" for monthly, weekly and hourly profiles, "XXX" being a three-digit numeric code starting at "001"). Alternatively, users can also provide the temporal profiles as gridded files, which contain specific weight factors for each grid cell.
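The temporal weight can be sketched as the product of a monthly, a daily and an hourly factor. The profile values and list layout below are hypothetical, and time-zone handling is omitted. Since the daily factor of a month should range between 0 and the number of days in that month, the sketch assumes that the weekly profile (Monday = 0 to Sunday = 6) is rescaled within each month so that the daily factors of that month add up to its number of days; this rescaling is an illustrative choice, not a documented HERMESv3_GR step.

```python
import calendar
from datetime import datetime


def daily_factor(when, weekly):
    """Weekly profile value for `when`, rescaled so the month's factors sum to its days."""
    n_days = calendar.monthrange(when.year, when.month)[1]
    month_sum = sum(weekly[calendar.weekday(when.year, when.month, d)]
                    for d in range(1, n_days + 1))
    return weekly[when.weekday()] * n_days / month_sum


def temporal_weight(when, monthly, weekly, hourly):
    """Monthly x daily x hourly factor applied to an annual-mean emission flux."""
    return (monthly[when.month - 1]
            * daily_factor(when, weekly)
            * hourly[when.hour])


if __name__ == "__main__":
    # Hypothetical profiles, each averaging 1: flat months, lower weekends, morning peak.
    monthly = [1.0] * 12
    weekly = [1.1] * 5 + [0.8, 0.7]
    hourly = [1.5 if 7 <= h <= 10 else 0.9 for h in range(24)]
    print(temporal_weight(datetime(2019, 1, 15, 8), monthly, weekly, hourly))
```

Because the input flux is already a rate (kg m-2 s-1), multiplying it by factors that average 1 over each period gives the flux for the requested hour.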
As in the case of the vertical profiles, the user is free to define and assign any temporal profile to each pollutant sector and […] The application of gridded profiles can be important for emission sectors whose temporal variation is not uniform in space due to local influences such as temperature (e.g. residential combustion emissions) or farming practices (e.g. agricultural emissions). Figure […] (Fig. 6b and 6d). The two results differ substantially, especially in China and India, the main emitting countries for this sector. According to Fig. 6e, in China the default profile allocates most of the emissions to March, whereas the updated temporal profile gives more weight to June and July. Similarly, the default profile presents a flat distribution over India, whereas the improved profile indicates a peak in May and June (Fig. 6f). In both cases, the updated monthly distribution is more in line with the seasonality of the NH3 volume mixing ratio derived from NASA's Atmospheric Infrared Sounder (AIRS) instrument (Warner et al., 2017).

Speciation mapping
This process maps the pollutants provided in the original emission inventories to the species needed by the atmospheric chemistry model of interest and its corresponding gas-phase and aerosol chemical mechanisms. The mapping is performed using a speciation CSV file, in which the user defines the mapping expressions between the source and destination species. Each line of the speciation file corresponds to a specific profile, which is cross-referenced with the profile ID previously defined in the emission inventory configuration file (i.e. "EXXX", "XXX" being a three-digit numeric code starting at "001"). The columns of the file correspond to the names of the destination species, which need to match the registry names of the emission variables in the atmospheric chemistry model. For gas-phase primary species (e.g. NOx, CO, NH3, SO2, NMVOCs), a conversion from mass to moles is performed before executing the speciation mapping.
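A speciation profile can be sketched as a table of mapping expressions applied after the mass-to-mole conversion of the gaseous pollutants. Everything specific below is hypothetical: the profile "E001", its NO/NO2 split, the pollutant keys, and the restriction of each mapping expression to a linear combination of source pollutants. NOx is assumed to be reported as NO2 mass, and aerosol species such as BC are left in mass units.

```python
# Molar masses (kg mol-1) of the gaseous source pollutants; NOx is assumed as NO2.
MOLAR_MASS = {"nox_no2": 46.0055e-3, "co": 28.0101e-3}

# Hypothetical profile "E001": destination species -> {source pollutant: coefficient}.
PROFILES = {
    "E001": {
        "NO": {"nox_no2": 0.9},
        "NO2": {"nox_no2": 0.1},
        "CO": {"co": 1.0},
        "PEC": {"bc": 1.0},  # aerosol species: stays in kg m-2 s-1
    },
}


def to_moles(emissions):
    """Convert gaseous pollutants from kg m-2 s-1 to mol m-2 s-1."""
    return {p: v / MOLAR_MASS[p] if p in MOLAR_MASS else v
            for p, v in emissions.items()}


def speciate(emissions, profile_id):
    """Apply the (linear) mapping expressions of one speciation profile."""
    source = to_moles(emissions)
    return {
        species: sum(coef * source.get(p, 0.0) for p, coef in terms.items())
        for species, terms in PROFILES[profile_id].items()
    }
```

Converting to moles first lets the gas-phase coefficients act directly as molar splits, which is how chemical mechanisms such as CB05 define their lumped species.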
The HERMESv3_GR database includes speciation profiles for the Carbon Bond 05 (CB05, CB05e51) (Whitten et al., 2010) and the Regional Acid Deposition Model, 2nd generation (RADM2) (Stockwell et al., 1990) gas-phase mechanisms, as well as for the fifth- and sixth-generation aerosol modules (AERO5, AERO6) (Roselle et al., 2008; Appel et al., 2017) and the Modal Aerosol Dynamics Model for Europe with the Secondary Organic Aerosol Model (MADE/SORGAM) aerosol mechanisms (Ackermann et al., 1998; Schell et al., 2001). For NMVOCs, most of the proposed speciation profiles are based on the […] (2015), as well as on previously reported profiles (Simpson et al., 2012). In the case of PM2.5, mappings are mostly based on the SPECIATE (Simon et al., 2010) and SPECIEUROPE (Pernigotti et al., 2016) databases and the works by Visschedijk et al. (2007) and Reff et al. (2009). As with the temporal and vertical weight factors, users can create their own speciation profiles using other sources of information.

Writing module
The calculated emissions are written to uncompressed NetCDF4 files following the conventions of the selected atmospheric chemistry model. During this process, two actions take place: (i) conversion of units and (ii) inclusion of the mandatory global attributes.
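The unit conversion step can be sketched as follows. The sketch assumes that emissions leave the speciation step as mol m-2 s-1 (gases) and kg m-2 s-1 (aerosols). The targets are the conventions commonly used by CMAQ (gases in mol s-1 and aerosols in g s-1 per grid cell) and WRF-Chem (gases in mol km-2 h-1, aerosols in µg m-2 s-1). They should be checked against the documentation of the model version in use, and the function names are hypothetical.

```python
def to_cmaq(value, cell_area_m2, is_gas):
    """mol m-2 s-1 -> mol s-1, or kg m-2 s-1 -> g s-1, per grid cell."""
    per_cell = value * cell_area_m2
    return per_cell if is_gas else per_cell * 1e3  # kg -> g for aerosols


def to_wrfchem(value, is_gas):
    """mol m-2 s-1 -> mol km-2 h-1, or kg m-2 s-1 -> ug m-2 s-1."""
    if is_gas:
        return value * 1e6 * 3600.0  # m-2 -> km-2 and s-1 -> h-1
    return value * 1e9  # kg -> ug


if __name__ == "__main__":
    # A 2 km x 2 km cell emitting 1e-6 mol m-2 s-1 of a gas.
    print(to_cmaq(1e-6, 4e6, True), to_wrfchem(1e-6, True))
```

CMAQ expects per-cell rates, so the cell area (which differs from cell to cell on most projections) enters the conversion. The WRF-Chem conversion is a pure unit change.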

Technical implementation
HERMESv3_GR is coded in Python 2.7.X and requires numpy (>= 1.9 […]). The emission core module of HERMESv3_GR is parallelized using a domain decomposition strategy. This approach is considered the most effective because emissions are computed independently for each destination grid cell and no communication between cells is needed during the calculation (see Eq. (1)). Moreover, domain decomposition also reduces the memory consumption per computational node.
Figure 7 shows a schematic representation of the domain decomposition strategy applied in HERMESv3_GR. During the spatial regridding, the destination working domain is divided into vertical sections, keeping each column undivided. The number of divisions is equal to the number of processors to be used (P_0, P_1, …), which is defined by the user. The emission regridding is performed independently in each processor and for each vertical section. The maximum number of cores that can be used is half the number of columns of the destination domain. This limitation is imposed by the ESMF software, which needs at least two complete columns to perform the spatial regridding. The 2D regridded emissions are kept in memory until the writing operation. During this task, the vertical (v0, v1, …) and temporal (t0, t1, …) weight factors previously estimated in the vertical and temporal distribution functions are applied to each emission subdomain in order to transform the 2D arrays (longitude, latitude) into 4D arrays (time, vertical layer, longitude, latitude). This strategy reduces the time during which memory consumption is highest.
Finally, each worker process simultaneously writes its result to a common NetCDF4 file, which gathers the different subsets of the working domain into a single output.
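The column-wise decomposition and its limit on the number of cores can be sketched in a few lines of Python. The even split of contiguous column blocks is an assumption made for illustration. The "at least two columns per process" rule follows the ESMF requirement stated above, which caps the number of processes at half the number of columns (510 for the 1021-column test domain described below).

```python
def split_columns(n_cols, n_procs):
    """Contiguous column blocks [(first_col, end_col_exclusive), ...], one per process."""
    if n_procs < 1 or n_procs > n_cols // 2:
        raise ValueError(f"use between 1 and {n_cols // 2} processes")
    base, extra = divmod(n_cols, n_procs)
    bounds, start = [], 0
    for rank in range(n_procs):
        width = base + (1 if rank < extra else 0)  # spread leftover columns
        bounds.append((start, start + width))
        start += width
    return bounds


if __name__ == "__main__":
    print(split_columns(10, 3))  # three blocks of 4, 3 and 3 columns
```

Each block keeps every column whole, so each process can regrid its block independently and hold its 2D result in memory until the write.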
A scalability test was performed on the MareNostrum4 supercomputer, hosted by the BSC, in order to determine the capability of HERMESv3_GR to scale up the emission calculation process. MareNostrum4 is a supercomputer based on Intel Xeon Platinum processors at 2.1 GHz from the Skylake generation. It is a Lenovo system composed of SD530 Compute Racks and an Intel Omni-Path high-performance network interconnect, running SuSE Linux Enterprise Server as its operating system. It consists of 48 racks housing 3456 nodes, each equipped with 48 cores and 96 GB of memory (2 GB per core) (www.bsc.es/marenostrum/marenostrum). HERMESv3_GR was executed using between 1 and 510 cores, doubling the number of cores in each successive test until all the cores of a node were used (i.e. 1, 2, 4, 8, …, 48), and then adding 48 cores (a whole node) at a time up to 510 (i.e. 96, 144, …, 510). All the tests were performed using a 0.1° × 0.1° rotated lat-lon destination grid with 701 rows, 1021 columns and 48 vertical layers covering North Africa, Europe and the Middle East (Fig. 3b). Hourly CB05 and AERO5 speciated emissions were estimated for 24 time steps using as input all the available emission pollutants and sectors of the TNO_MACC_III (Europe) and HTAPv2.2 (rest of countries) inventories. As shown in the stacked area chart of Fig. 7, increasing the number of cores used in the simulations speeds up the computations.
The total execution time decreases from 4,841.0 s (1 core) to 1,204 s (510 cores), with the lowest value observed when using 32 cores (863.4 s). The most time-demanding function changes according to the number of cores used. For 1 to 8 cores, most of the computational work is done during the spatial regridding (between 54% and 34% of the total time) and the temporal distribution (between 39% and 25%). For the other cases (16 to 510 cores), the writing process increasingly becomes the main time consumer, reaching up to 83% of the total time when using 510 cores. These results clearly indicate that the writing function does not scale properly.

This behaviour arises because the netCDF4 Python library writes the results in row-major order (C-style), whereas during the spatial regridding ESMF divides the domain into vertical sections (column-major order, FORTRAN-style). For each vertical section, netCDF4 Python has to call the writing function as many times as there are rows in the domain. Consequently, increasing the number of cores (i.e. the number of vertical sections) directly increases the execution time of the writing process. The low performance of the writing function will be addressed in future versions of HERMESv3_GR by integrating an I/O server that writes complete rows in row-major order. Despite this shortcoming, the current parallelization strategy reduces the HERMESv3_GR execution time to less than 15 minutes per run (32 cores), which can be considered acceptable in an operational environment.
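The timings quoted above can be turned into standard parallel-performance metrics, and numpy's memory-layout flags can illustrate the write pattern. The helper names are hypothetical. Speedup is taken relative to the 1-core run, and parallel efficiency is speedup divided by the number of cores. The write-call count simply restates the text (one call per row for each vertical section) and ignores the time and layer dimensions.

```python
import numpy as np


def speedup(t_serial, t_parallel):
    """Classic speedup relative to the single-core run."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, n_cores):
    """Speedup divided by the number of cores (1.0 = ideal scaling)."""
    return speedup(t_serial, t_parallel) / n_cores


def row_write_calls(n_rows, n_sections):
    """Row-wise write calls when each vertical section is written row by row
    into a row-major (C-order) file."""
    return n_rows * n_sections


# Reported timings (seconds) from the scalability test.
T1, T32, T510 = 4841.0, 863.4, 1204.0
print(f"32 cores:  speedup {speedup(T1, T32):.2f}, "
      f"efficiency {parallel_efficiency(T1, T32, 32):.2f}")
print(f"510 cores: speedup {speedup(T1, T510):.2f}, "
      f"efficiency {parallel_efficiency(T1, T510, 510):.3f}")

# Why column sections are costly to write in C order: a block of columns is
# not contiguous in memory, but each of its rows is.
field = np.zeros((701, 1021))
section = field[:, 0:32]
print(section.flags["C_CONTIGUOUS"], section[0].flags["C_CONTIGUOUS"])
print(row_write_calls(701, 32), row_write_calls(701, 510))
```

With the reported timings this gives a speedup of about 5.6 on 32 cores (efficiency about 0.18) and about 4.0 on 510 cores. Over the same range, the number of row writes for a single 2D field grows from 22,432 to 357,510.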

Implementations
HERMESv3_GR has been successfully tested in different atmospheric chemistry models. The system is currently implemented within NMMB-MONARCH, which contributes to the multi-model ensemble forecasts of the International Cooperative for Aerosol Prediction (ICAP) (www.nrlmry.navy.mil/aerosol/icap.1135.php). HERMESv3_GR has also been coupled with CMAQ in the framework of the AIRE-CDMX air quality forecasting system for Mexico City (http://www.aire.cdmx.gob.mx/pronostico-aire/). In the first case, HERMESv3_GR provides global primary aerosol emissions to the NMMB-MONARCH model, whereas in AIRE-CDMX it processes the biomass burning emissions reported by GFASv1.2. Besides these two implementations, HERMESv3_GR has also been used to perform simulations with the CALIOPE air quality forecasting system, which is based on CMAQ (http://www.bsc.es/caliope/en/forecasts?language=en), and in several tests with the WRF-Chem model.

Conclusions
This paper presents HERMESv3_GR, a stand-alone multiscale emission processing system that estimates gas and aerosol emissions for use in atmospheric chemistry models. HERMESv3_GR is designed to provide a flexible and simplified framework for generating emission input files for global and regional air quality modelling. During the execution, emissions from different inventories, sources and species are combined and regridded to the destination domain. They are then vertically and temporally disaggregated, speciated and converted to the format required by the atmospheric chemistry model of interest. HERMESv3_GR is driven by configuration files that provide a flexible and transparent platform for the design and implementation of intercomparison and sensitivity modelling experiments. HERMESv3_GR represents an effort to homogenize the currently available information on emission inventories. It uses this information in a transparent and flexible way to produce emission outputs that can be used directly by multiple atmospheric chemistry models. Several features make HERMESv3_GR a unique emission processing system, including:
-User-defined grid and choice between different map projections: Emissions can be computed on any global or regional domain with a regular lat-lon, rotated lat-lon, Mercator or Lambert conformal conic projection.
-Choice between different emission inventories: the emission data library of HERMESv3_GR includes current state-of-the-art global and regional inventories. These cover different sources (anthropogenic, biomass burning, volcanoes), pollutants (ozone precursor gases, acidifying gases and primary particulates) and base years (past, present and future). Moreover, user-defined country-specific scaling and masking factors can be applied to the base inventories in order to combine and adjust them.
-Choice between different vertical, temporal and speciation profiles: HERMESv3_GR includes a dataset of profiles reported in the literature, but it also allows users to add their own weighting factors for any pollutant sector and species. Additionally, the processing system can combine base inventories with gridded temporal profiles. This can be important for pollutant sectors whose temporal variation is not uniform across space (e.g. residential combustion emissions, which depend on temperature).
-Choice between different atmospheric chemistry models: The generated emission files can be used as input for the CMAQ, WRF-Chem and NMMB-MONARCH chemical transport models.
-Choice between different chemical mechanisms: base pollutants can be mapped to several gas-phase and aerosol chemical mechanisms, including CB05, CB05e51, RADM2, AERO5, AERO6 and MADE/SORGAM. All of these are widely used in the air quality modelling community.
-Parallel implementation: The emission core module of HERMESv3_GR is parallelized using a domain decomposition strategy, which decreases the execution time and memory consumption of the model. This feature can be important when the processing system is used in operational air quality forecasting systems, where the simulations need to be completed within strict time constraints.
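Two of the features listed above, country-specific scaling/masking and speciation, reduce to simple array operations. The sketch below is hypothetical. The country-ID grid, the function names and the split factors are illustrative and are not taken from the HERMESv3_GR profile files. The unit conversions (e.g. mass to moles) that a real speciation step requires are omitted.

```python
import numpy as np


def apply_country_factors(emis, country_id, factors, masked=()):
    """Multiply each cell by its country's scaling factor and zero out masked
    countries. Countries not listed keep a factor of 1."""
    size = max([int(country_id.max()), *factors, *masked]) + 1
    scale = np.ones(size)
    for cid, f in factors.items():
        scale[cid] = f
    for cid in masked:
        scale[cid] = 0.0
    return emis * scale[country_id]   # fancy indexing: one factor per cell


def speciate(base, profile):
    """Map base pollutants to mechanism species.

    profile: {species: [(base_pollutant, split_factor), ...]}
    """
    return {sp: sum(f * base[p] for p, f in terms)
            for sp, terms in profile.items()}


emis = np.ones((2, 2))
country_id = np.array([[0, 1], [1, 2]])           # hypothetical country codes
scaled = apply_country_factors(emis, country_id, {1: 0.5}, masked=(2,))

# Hypothetical split of a NOx field into NO and NO2 (illustrative factors).
profile = {"NO": [("nox", 0.9)], "NO2": [("nox", 0.1)]}
species = speciate({"nox": scaled}, profile)
```

Indexing a per-country factor vector with the country-ID grid applies all scaling and masking in a single vectorized multiplication, whatever the number of countries.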
Several emission outputs obtained with HERMESv3_GR are provided in this paper to illustrate its potential. The software is implemented within NMMB-MONARCH and CMAQ in the framework of the ICAP multi-model ensemble and the AIRE-CDMX air quality forecasting system for Mexico City, respectively. Future work will consider expanding the emission data library to include the following:
-regional inventories for areas such as Asia or America;
-emission datasets currently being developed in the framework of the Copernicus Atmosphere Monitoring Service (CAMS);
-datasets that report greenhouse gas emissions, so that HERMESv3_GR can also provide input for climate modelling.

Other efforts will focus on implementing a functionality to remap emissions onto unstructured destination grids (e.g. octahedral grids). Such grids are starting to be widely used in global models due to their computational efficiency and effective resolution. Work will also focus on improving the scalability of the writing function.

Code availability
The following materials are available at the GitLab repository https://earth.bsc.es/gitlab/es/hermesv3_gr: the HERMESv3_GR code package, pre-processing functions to homogenize the emission inventories (listed in Table 1), sample configuration and ancillary input files (vertical, temporal and speciation profiles), and test case data. The repository also includes a wiki with further instructions on the processing system. It also provides the links and references for downloading and citing the original gridded emission inventories that HERMESv3_GR can process. The required libraries must be installed by the user on the computer infrastructure where the processing system will be run.

Appendices
Appendix A: HERMESv3_GR general configuration file (hermes.conf)

Parameters and examples | Description and comments

[GENERAL]
log_level = 3 | Defines the logging level, which determines the amount of information written to the log file.

Parameters that define a regional Lambert conformal conic grid:
-lat_1 = Standard parallel 1 (in degrees).
-lat_2 = Standard parallel 2 (in degrees).
-lon_0 = Longitude of the central meridian (in degrees).
-lat_0 = Latitude of the origin of the projection (in degrees).
-nx = Number of grid columns.
-ny = Number of grid rows.
-inc_x = X-coordinate cell dimension (in meters).
-inc_y = Y-coordinate cell dimension (in meters).
-x_0 = X-coordinate origin of grid (in meters).
-y_0 = Y-coordinate origin of grid (in meters).

Parameters that define a regional Mercator grid:
-lat_ts = Latitude of true scale (in degrees).
-lon_0 = Longitude of projection center (in degrees).
-nx = Number of grid columns.
-ny = Number of grid rows.
-inc_x = X-coordinate cell dimension (in meters).
-inc_y = Y-coordinate cell dimension (in meters).
-x_0 = X-coordinate origin of grid (in meters).
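Because hermes.conf uses an INI-style layout, it can be read with Python's standard configparser. The fragment below is hypothetical. The [GENERAL] section and log_level come from the table above, and the grid keys follow the Lambert conformal conic parameters listed there. However, the [DOMAIN] section name, the numeric values and the assumption that x_0 and y_0 mark the lower-left corner of the grid are illustrative. The code targets Python 3 rather than the Python 2.7 used by HERMESv3_GR.

```python
import configparser

import numpy as np

# Hypothetical configuration fragment; section name and values are made up.
SAMPLE_CONF = """
[GENERAL]
log_level = 3

[DOMAIN]
lat_1 = 37.0
lat_2 = 43.0
lon_0 = -3.0
lat_0 = 40.0
nx = 4
ny = 3
inc_x = 12000
inc_y = 12000
x_0 = -24000
y_0 = -18000
"""


def lcc_cell_edges(cfg, section="DOMAIN"):
    """Return projected x and y cell edges (m) of a Lambert conformal conic
    grid, assuming (x_0, y_0) is the lower-left corner of the grid."""
    d = cfg[section]
    nx, ny = d.getint("nx"), d.getint("ny")
    x_edges = d.getfloat("x_0") + d.getfloat("inc_x") * np.arange(nx + 1)
    y_edges = d.getfloat("y_0") + d.getfloat("inc_y") * np.arange(ny + 1)
    return x_edges, y_edges


cfg = configparser.ConfigParser()
cfg.read_string(SAMPLE_CONF)
log_level = cfg["GENERAL"].getint("log_level")
proj_params = {k: cfg["DOMAIN"].getfloat(k)
               for k in ("lat_1", "lat_2", "lon_0", "lat_0")}
x_edges, y_edges = lcc_cell_edges(cfg)
```

Here the 4 × 3 grid of 12 km cells spans from -24 km to 24 km in x and from -18 km to 18 km in y.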
[…] HERMESv3_GR and its implementation within the NMMB-MONARCH model. Carlos Pérez García-Pando helped conceive HERMESv3_GR and supervised the work. Marc Guevara prepared the manuscript with contributions from all coauthors.

Acknowledgements
The research leading to these results has received funding from the Ministerio de Economía y Competitividad (…)