Reproducible and Relocatable Regional Ocean Modelling: Fundamentals and practices


blue carbon resources and understanding the potential marine impacts of climate change mitigation interventions, such as marine offshore renewable energy (Dorrell et al., 2022) and land use change (Felgate et al., 2021).
While global ocean modelling products and research activities are increasing in resolution and sophistication, they are still a long way from the scale and process representation required to deliver accurate information on the coastal ocean. There are a number of reasons why it is advantageous to configure bespoke regional models: though data catalogues like the Copernicus Marine Service (CMS, https://marine.copernicus.eu) and others are rich resources for regional and global marine data, these cannot always satisfy all user requirements. Motivations need not always be about spatial resolution. For example, missing processes can be an important driver for building a new configuration (perhaps addressing a lack of integrated physics and biogeochemistry or a lack of tidal processes in the "off the shelf" catalogue products). Alternatively, bespoke model outputs might be required (such as high frequency output for specific subregions or new metrics). So the key advantages of regional configurations over global ones include benefits associated with resolution enhancements, design flexibility and computational efficiency (to contain only the region and processes that matter and to not worry about degrading the solution in other regions). On the other hand, key disadvantages include the need for lateral boundary conditions (which can be hard to obtain and are a potential source of error); the human resource required to configure and maintain multiple regional domains; and the lack of a common experience across a global community of coastal ocean modellers.
The configuration of a regional ocean model has traditionally been a one-off event taking many months and requiring many, often subtle, expert decisions. Consequently, descriptions of the set-up (e.g. in the literature) are relatively limited or hard to reproduce in their entirety. The need to configure multiple regional models in many different seas around the world has led us to develop a systematic workflow where NEMO (the Nucleus for European Modelling of the Ocean, http://www.nemo-ocean.eu; Madec and Team) regional ocean configurations could be more efficiently built, deployed and reproduced. This reproducible workflow is not intended to be the sole authority on regional configuration setup, or to provide a turn-key or black-box solution; instead it is designed to provide a set of guidelines for modellers to follow in order to capitalize on the usability and interoperability of the resulting simulations.
Indeed other modelling systems, such as MITgcm (Marshall et al., 1997), ROMS (Shchepetkin and McWilliams, 2005) and FVCOM (Chen et al., 2006), can also be readily configured for new regional applications. Each has its own strengths, which are largely set according to the model system's development history and the user's familiarity with the system. For example, ROMS was specifically designed as a Regional Ocean Model, and MITgcm has been a model of choice for numerous idealised process studies. On the other hand, FVCOM's unstructured mesh has made it a popular research code choice for multiscale coastal hydrodynamics. The concepts presented here are intended to be broadly applicable to any modelling system, though the worked examples are implemented within NEMO. NEMO is an ocean modelling framework underpinned by a consortium of five large European research, climate and operational centres. It is well supported as an international community modelling code and consequently is employed as the ocean component for 9 of the 35 widely used CMIP6 models (Eyring et al., 2016) and is used in the CMS catalogue of freely available marine data products. As a regional model it benefits substantially from these investments of effort. However, its origins in large research and operational centres (where teams focus on specific configurations) have led, quite naturally, to barriers to NEMO being more widely used in regional applications.
In our experience, each research question addressed with a regional model configuration requires a subtly different workflow.
Sometimes this would be the requirement for different forcing, sometimes the use of ensemble simulations, or sometimes different domain files. The intention here, therefore, is not to provide a manual on how to build a configuration but instead to share the concepts that need to be considered and practices that can be utilised when building a new configuration. This is in part done with code examples. Note that the intention is not to create an automatic method for generating new configurations (e.g. Trotta et al., 2021); NEMO is a continually evolving code base with frequent releases and updates, so any turn-key solution would be quickly deprecated and less appropriate for cutting-edge scientific endeavours. Furthermore, since the process as a whole is complex and an unwelcome barrier to new starters, we have found it instructive to offer recipes that guide the user through the stages that need to be considered. Users are then encouraged to modify these recipes for their own bespoke purpose, gaining insight through doing so, whilst simultaneously preserving the reproducibility documentation.
On this journey we have developed methodologies that reinforce the principle of reproducibility, which is fundamental to the scientific method. In particular these practices are aimed at making the use of large modelling frameworks more accessible.
These concepts, and benefits, are discussed in Section 2. In Section 3 we step through the important considerations that can guide the construction of a new regional configuration. Model and configuration specific details are abstracted to worked examples that evolve with system development. However, this alone cannot deliver reproducibility, which arguably is minimally implemented within our community.
It is easy to understand how the additional time burden and potential loss of "intellectual property" might disincentivise an individual from making their science too easy to reproduce. Indeed, the strategy for how one chooses to make their work reproducible, or the level one must attain, is not prescribed, and perhaps nor should it be. However, this lack of prescription, and the non-trivial amount of expert knowledge required to generate or reproduce simulations, can raise the barrier to new adoption of modelling frameworks.
Beyond being a mandate, reproducibility offers clear benefits to the community. Reproducibility leads to enhanced efficiency with less time "reinventing the wheel" or consumed by software problems and more time dedicated to science discovery or project deliverables. Reproducibility can lead to a democratisation of skills across a user base and an upskilling for individuals.
It can accelerate debugging and therefore accelerate development. Enhanced levels of reproducibility in existing configurations make delivery of new working configurations a realistic prospect within smaller research projects, and furthermore make the process more accessible for new regional modellers.
Furthermore, with the progression towards increasingly automated and integrated systems (e.g. the UK NERC digital strategy) there will be an increasing demand for machine-capable reproducibility.
In recent years, and in response to increasing demand for new regional configurations, the authors sought to develop such practices. The concepts outlined here are intended to transcend code versions and even modelling frameworks. They are emergent rather than novel; born and distilled from experience and intending to borrow good working practices from the software development community. To be effective they must be memorable, even obvious.
There are three elements that have precipitated out of our work towards Reproducibility (and we are still on the journey).
Two activities: Reproducible Workflows and Standardised Assessment, and a third element: Value. For the endeavour to be sustainable, the additional activities must produce a recognisable Value that exceeds the effort. Schematically summarised in Fig. 1, these are each addressed in the following.

Reproducible configurations
The first activity within the enhanced Reproducibility principle is Reproducible Workflows (Fig. 1). Reproducible relocatable regional modelling workflows already exist. Indeed, established and alternative workflows should be reviewed and considered when choosing a template appropriate for a new project. Seminal examples include:
- A Structured and Unstructured grid Relocatable ocean platform for Forecasting (SURF) (Trotta et al., 2016, 2021). Also using NEMO (and unstructured modelling) to rapidly build and deploy configurations for real time maritime disaster response. The focus is on operational deployment and reliability. This necessitates a high level of automation and reliance on mature code versions.
- The NEMO nowcast framework. This is a well documented collection of Python modules that can be used to build a software system to run the NEMO ocean model in a daily nowcast/forecast mode. NEMO nowcast has different, though complementary, ambitions to this manuscript but is likely to be of interest to the reader.
- The Salish Sea MEOPAR Project Documentation. This includes extensive documentation for a regionally specific NEMO configuration of the Canadian Salish Sea, which is deployed in various research projects (e.g. Soontiens and Allen, 2017).
Guided by existing examples and our own experiences, in this section the focus is on workflows that can enhance reproducibility, whilst maintaining scientific flexibility.

Organised workflows
The key route to effective workflow reproducibility, and its benefits, is via systematic documentation.

Central to the structure advocated here is the use of version control repositories for modifications to the standard NEMO source code, together with the associated scripts and recipes; we arbitrarily choose git and GitHub.
It was found that the recipes, which make the whole process transparent from software installation through to assessment and analysis, were especially important in democratising the build process. Even though they took some time to document, the benefit was immediate since multiple scientists and students could independently work on projects without continual reliance on overburdened NEMO specialists.
Documenting the whole process in detail was important since the recipes form a template for subsequent configurations, for which the required modifications are hard to anticipate and could vary in nature from HPC architecture changes to alternative boundary forcing data.
We found the GitHub platform convenient for workflow management, since the code modifications, scripts and recipes (the latter being in the form of associated wikis) could be co-located under one repository. We also found that the design of an "optimal" template repository was elusive, since our various projects and experiments had subtly different requirements, making a universal template unwieldy. We found, therefore, that the most efficient approach to getting the benefits of an accelerated start on new projects was to clone and modify an existing project.

Containerisation
Containerisation presents a complementary route to reproducible workflows that addresses the challenge of code portability between machines. The container provides a reproducible environment in which code, an ocean model in this instance, can be developed and executed. This greatly simplifies issues associated with conflicts between software library versions and operating system versions. Through the use of containers, we can begin to construct an end-to-end scientific study, even including the pre/post processing tools used in a peer review publication. The use of container images has been gaining traction in academia over the last decade with several instances of their use in the geophysical sciences, e.g. Hacker et al. (2017), Melton et al. (2020) and Cheng et al. (2022). A container image bundles the operating system, languages and libraries required to run the application. This image file is used by the container software on the host system to construct a runtime environment from which to run the application, providing an attractive and lightweight method of virtualising scientific code. It provides a runtime environment that is independent of the host system, is highly configurable, and removes the setup and compilation issues potentially faced by the user.
Two container platforms that have gained a lot of traction within the scientific community over recent years are Docker (with a focus on cloud computing and local desktop deployment) and Singularity (in the realm of HPC systems). Though this is not yet a routine part of our workflow, it has been an essential part of several successful projects and its use continues to be explored.
Example: Docker. Docker is an established containerisation software package that effectively streamlines the process to build, launch and manage containers. It originally required root privileges and message passing was not trivial. However, installing Docker and running a demonstration NEMO container on new machines was so simple that we were able to deliver workshop training to run NEMO configurations on participants' consumer grade laptops.

Docker abstracted all the challenges of participants' subtly different Microsoft and Mac software libraries into a single controlled build within a container (with a Linux OS) that they could each install. This demonstration is made available through the Belize_workshop configuration.
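Though the workshop itself drove Docker from the command line, such a container can also be launched programmatically. The following minimal Python sketch uses the docker-py SDK; the image name, bind-mount path and run script are hypothetical placeholders standing in for the published workshop material.

    import docker  # docker-py SDK; assumes a local Docker daemon is running

    client = docker.from_env()
    # Hypothetical image name; substitute the image published with the workshop
    image = "example/nemo-demo:latest"
    client.images.pull(image)
    # Mount a local run directory into the container and launch the model run script
    logs = client.containers.run(
        image,
        command="./run_nemo.sh",  # hypothetical entry point inside the image
        volumes={"/home/user/rundir": {"bind": "/nemo/rundir", "mode": "rw"}},
        working_dir="/nemo/rundir",
        remove=True,
    )
    print(logs.decode())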
A more complex example was to build a Docker container with MPICH (a portable implementation of the Message Passing Interface (MPI) standard) to compile and run the NEMO ocean engine with parallel processing on Google Cloud.

In summary, there are a number of forcing datasets that are required, which need to be available if the configuration is to be reproducible. By way of example, for a regional operational type ocean model (e.g. Graham et al., 2018), forcings include: rivers; lateral boundary conditions (time-varying and tidal harmonics, as appropriate); and surface boundary conditions. Finally, in addition, the bathymetry and grid configuration are required. All these data require a level of preprocessing to prepare them for use. A pragmatic middle ground would be to include the preprocessing methods, from the point of externally available sources, in the configuration along with a small sample of processed model-ready files, to ensure that (i) the model can be run for a short period with demonstration files and (ii) forcings for long simulations can be replicated in the same way.

The second activity within the enhanced Reproducibility principle is Standardised Assessment. The accuracy of output from any model configuration should be assessed by comparing it to equivalent observations. In the case of idealised configurations an assessment can be made against expected outcomes from theory or laboratory experiments. This is done to quantify how closely the model is able to simulate the reality it attempts to replicate. For forecasting models it is clear why this is important: the accuracy of predictions can have significant impacts on the communities involved. For scientific applications it is equally important when simulating realistic regions, as the scientist must have confidence in analyses, inferences and conclusions about the physical processes of interest. Error compensation may mean that improving the model with new, realistic, processes degrades the comparison with observations. This is likely to be acceptable for scientific applications, but less so for operational cases. Note that this is a principle rather than a prescription, since the requirements will vary according to the modelling application. In this section we provide an outline of the key ideas behind the principle, highlighting the net benefits and advocating its importance. We consider there to be two elements to standardised assessment (see Fig. 1):

A standardised framework: The framework prescribes templates for how different (class) objects should be structured, and requires all ingested data to be of a defined class. For example, all data are transformed into xarray (Hoyer and Hamman, 2017) objects with standardised dimensions and variable names according to their data type. This means that an equivalent assessment may be applied to data from different models (e.g. NEMO, ROMS, FVCOM, etc.) and comparisons made to observations from any source (e.g. profile data from EN4 (Good et al., 2013) or World Ocean Database (Boyer et al., 2018a) sources).
Common diagnostics: By defining classes to be built upon xarray datasets, the powerful data handling capabilities are accessible to the framework. Furthermore, since all data loaded into the framework are of known type and properties, generic diagnostics can be written for each class, which avoids the requirement for hardwired details relating to the data origin. The data source specific details are abstracted to the framework.
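A minimal sketch of this philosophy is given below; it is an illustration rather than the actual COAsT API, and the dimension and variable names in the rename table are assumptions for the example. Model output is ingested into an xarray Dataset with standardised names, after which a generic diagnostic can be written once and applied to any data source.

    import xarray as xr

    # Per-source rename tables map native names onto the standard ones (names assumed).
    NEMO_NAMES = {"votemper": "temperature", "nav_lon": "longitude",
                  "nav_lat": "latitude", "deptht": "depth"}

    class Gridded:
        """Container holding a Dataset with standardised dimension/variable names."""
        def __init__(self, path, rename_table):
            ds = xr.open_dataset(path)
            self.dataset = ds.rename({k: v for k, v in rename_table.items() if k in ds})

    def surface_mean_temperature(gridded):
        # Generic diagnostic: written once, applicable to any ingested source
        return gridded.dataset["temperature"].isel(depth=0).mean()

    # model = Gridded("nemo_output.nc", NEMO_NAMES)
    # print(surface_mean_temperature(model))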
With this separation, contributors can more cleanly add to the diagnostics or the framework according to their interests and skills. This philosophy has been implemented, for example, in the COAsT framework. An analogy can be drawn with test-driven development from the software engineering community, in which the tests are written before the code they assess (see, e.g., Beck, 2002, for background). This extreme form may not be appropriate for ocean modelling, where the simulations are governed first by physical laws rather than skill, but the practice of having standardised assessment that can be easily executed (e.g. in a script form) could nonetheless accelerate refinement of tunable parameters that are otherwise unconstrained by the physical laws.
Progress in HPC performance produces simulations with increasingly larger volumes of data. These datasets increasingly require specialised tools to diagnose or manipulate. Standardised workflows for assessment can be built that abstract aspects of the high performance computing data manipulation into generalised software libraries, and thereby jointly free up science time whilst also increasing access to a broader range of scientific users. Furthermore, with community engagement in a standardised workflow, specialists with a broad range of skills can each contribute their expertise to the mutual benefit of all the users.

Benefits to the scientific knowledge base
Ideally, Standardised Assessment is bound with the principle of Reproducibility and reproducible workflows, so that any shared configuration would include the scripts to generate it and also the means to generate a verification report. This practice would appear similar to that of the Copernicus Marine Service, which provides Quality Information Documents with the catalogue (e.g. the Atlantic-European North West Shelf-Ocean Physics Reanalysis Quality Information Document), though these would be entirely automatically generated.
Standardised assessment makes it easier for the community to assess simulations of the same domain: the full battery of results could be expected to appear as part of a published configuration (not necessarily peer-reviewed). This would allow for the quick and easy identification of the resultant impact of changes to the code, or alterations to parameterizations or boundary conditions, for certain pre-defined cornerstone metrics.
This transparency would relieve problems associated with selectively presenting only the most favourable outcomes when publishing in the peer review literature (under page count constraints).

Examples
The concept of standard or shareable verification, or assessment, tools is not novel. Indeed the concept is born from demonstrated successes with increased productivity (in more rapidly developing cumbersome NEMO simulations) and increased user uptake. Examples include:
- ESMValTool: a community diagnostic and performance metrics tool for the evaluation of Earth System Models in CMIP.
- COAsT (Coastal Ocean Assessment Toolkit): a diagnostic and assessment toolbox targeted at kilometric scale regional models, leaning heavily on the xarray and Dask Python libraries (Hoyer and Hamman, 2017). The package brings simulations and observations into a common framework to facilitate assessment (see Byrne et al., 2022).

- Pangeo: a community project promoting open, reproducible, and scalable science by providing documentation, developing and maintaining software, and deploying computing infrastructure to make scientific research and programming easier.
- nctoolkit: a comprehensive and computationally efficient Python package for analysing and post-processing netCDF data.
Taking a lead from the open source Python user community, much of the scientific software written is underpinned by open source packages such as NumPy, for scientific computing (Harris et al., 2020); Matplotlib, for plotting (Hunter, 2007); and xarray, for manipulating multi-dimensional data, originally with a geoscientific focus (Hoyer and Hamman, 2017). Though these do not directly lead to standardised assessment tools, it is important to highlight: a) their fundamental importance in underpinning the development of open source Python scientific software; and b) that they also serve as successful templates for how to coordinate, develop and maintain standardised assessment tools.

Cautions
Finally, two notes of caution. The rise and fall of software stacks is strongly influenced by the ease with which they can be adapted (and kept up to date) and adopted (for new users, is the learning investment required offset by the expected gains?). A second caution concerns diagnostics computed from time-averaged variables: combining 5-day averaged fields can miss the bolus interaction if the fields are correlated at higher frequency (e.g. heat or volume transport calculations). To this end, the code would prioritise readability over efficiency and users would be encouraged to help improve the codebase.

Sustainability: a tension between cost and value
The above recommendations to enhance reproducibility have a fundamental problem: they require additional work from individuals.
Clearly a user can benefit from workflows and tools that others have created without adding to the knowledge base. This would be of no additional cost to themselves. A user could also choose to curate their code and notes in version controlled repositories. Except for some initial familiarisation with new tools and practices, this is not additional work. Furthermore, they would likely benefit from accelerated debugging, particularly if help is sought.
However, for the process to work, content must be created by people with appropriate expert knowledge and disseminated for the benefit of those seeking to upskill in this area. In some national laboratories or under large consortia programmes, roles can exist to support this community endeavour. For example, NumPy has direct sponsorship and NEMO itself is sustained by significant support across an international consortium. However, with limited resources, any activities not appropriately valued will suffer neglect. A crucial aspect, therefore, is to appropriately recognise and reward the contributors for the value realised in shareable model configurations and assessment approaches, for example through well thought-out career pathways that acknowledge non-traditional science outputs, where peer review is not appropriate. Output metrics could include repository page views, downloads or other esteem indicators. A practical recommendation towards this would be to normalise the inclusion of non-traditional contributions on CVs, which could be facilitated by the adoption of narrative format CVs. However, regardless of CV format, the change would fundamentally require a culture shift towards valuing community contributions alongside traditional peer reviewed publications.

Practical considerations and worked examples in building a regional configuration
This section seeks to distill important considerations when building a new configuration. These insights are synthesised from a wide range of experience and expertise across the ocean modelling community and can help prioritise elements in a sequential build plan.
The ordering is progressive, starting with prioritisation of the leading order processes for simulation, obtaining and building the code, proceeding through domain construction and then on to constructing external forcings. Each stage has a discussion highlighting the major options or considerations to be factored into the design. Each stage also has links to worked examples. (Note that NEMO is being actively developed; though the releases pointed to herein are static and valid, they may be improved upon.)

Build planning and process prioritisation
The first stage consists of planning the new configuration and how to build it sequentially. The concept is to systematically increase the complexity of the processes represented, with associated efficacy testing. This is done in order to verify that the simulated processes behave as expected, and also to assist with error trapping should unexpected behaviours manifest. The first step, therefore, is to prioritise processes that need careful attention, or that dominate the system. For example, many simulations in the South China Sea may be dominated by tides, whereas simulations in the Mediterranean, or other inland seas, may be dominated by winds. Alternatively, for simulations in the northern Bay of Bengal, special consideration may be required to accurately represent the freshwater input and its seasonal variation. Similarly, in the case of models of the Arctic or Antarctic shelves, special care should be paid to properly represent sea ice and ice shelf processes. In practice, these priorities cannot be set without consideration of the intended use of the model and will therefore be application specific. However, having formulated a priority list, experiments can be designed whereby separate processes can be tested to sequentially build complexity in the target configuration. This is particularly useful in light of the fact that the choice of the numerical techniques adopted to solve the governing equations will inevitably affect the realism and accuracy of many of the physical processes explicitly resolved or parameterised by the ocean model (e.g. Haidvogel, 1999; Griffies et al., 2000). Not all processes are separable, or easy to separate, and so a measure of pragmatism is required to get to the final configuration with a minimum of unnecessary complications.
As a guideline, a number of worked examples are set out in the associated repositories, which include adding tidal processes to configurations where the parent models did not have tides. For these simulations, proper representation of tides was considered fundamental. For this we chose to use terrain following coordinates, to better represent shallow water processes, and tidal forcing from FES2014 (Lyard et al., 2021). For the South Asian (SANH) and South East Asian (SEAsia) examples, this represents a significant departure from the parent model (without tides and computed on geopotential levels), though many other aspects of the parent and child configurations are similar. Another example, SEVERN-SWOT, details the workflow for (one-way) nesting of a 500 m resolution child configuration in a macrotidal coastal regime.

Obtain code, compile model executables and build tools
This step is to obtain the code and build the required NEMO supported tools. This process is generic for all configurations.
However, the majority of parameters that the user will edit, typically those which define details of and control the simulation, are set at run time via namelist files.

After planning the new configuration and successfully building the machinery needed to run it, the following stage is to identify the most appropriate model domain, horizontal and vertical resolutions, and discretization schemes needed to adequately resolve the spatial scales and the processes the new configuration is targeting. The final outcome of this crucial stage is the definition of the 3D geometry of the model domain, including the horizontal and vertical coordinates and the 3D grid spacings, which typically vary for variables defined on the different faces, corners and centre points of the grid cells.

Starting from version 4.0, NEMO loads at runtime an externally generated domain configuration file containing all the relevant grid geometry information. This separation of the grid generation process from the dynamical core permits a flexible approach to grid construction. If the planned configuration is based on an existing NEMO configuration (or idealised geometries), then the work to build a new domain configuration file can be done by tools and guidance that are supplied with NEMO.
However, if the proposed work is for a new regional configuration, as is the main theme of these workflows, then the guidance outlined below is indispensable for directing the process.
- Location of boundaries: For the worked example, these are chosen with consideration of the tidal harmonics. It was verified (using TPXO9 (Egbert and Erofeeva, 2002), FES2014 (Lyard et al., 2021) or other tidal products) that none of the 4 largest semi-diurnal or 4 largest diurnal constituents had amphidromes near the proposed boundary, since for fixed relative errors in tidal amplitudes, small absolute errors at the boundary could scale to large absolute errors in the interior.

A principle of regional ocean modelling is to nest with the parent model in the deep ocean. Ocean-shelf exchange processes are complex, fine-scale and exert a first-order control on shelf seas (Huthnance, 1995), so it is expected that they would be better represented by the child than the parent model. This choice comes with two penalties. The deeper water necessitates a shorter barotropic time step, and steep continental slopes can cause issues of horizontal pressure gradient error for terrain following coordinate models. So the alternative of nesting on-shelf is preferred for some studies (e.g. Holt and Proctor, 2008).
- Boundary bathymetry: In a regional model, the boundary interface with the parent model is a likely source of instability, especially if the grids are different. In light of this, the boundaries can be chosen to avoid highly irregular grid-scale bathymetry near the boundary (small islands or sea mounts and steep bathymetry) and to seek near orthogonal intersections between the boundary and bathymetric features, as done in the SEAsia configuration. An alternative is to precisely match the bathymetries at the boundary, with the child being interpolated to the target resolution, and then having a halo region inside the boundary where the bathymetry is smoothly transitioned to the target child bathymetry. However, this sophistication is not required for our examples.
- Bathymetry reference datum: An important aspect of processing bathymetry data for use in a numerical model is the vertical reference datum of the source data. Configurations with large tides and/or surges need to pay particular attention to the datums of the source bathymetry. Often the bathymetry is referenced to Lowest Astronomical Tide (LAT), e.g. EMODnet (Lear et al., 2020). While it is useful for chart data to be referenced to LAT for navigational purposes, the bathymetry needs to be referenced to Mean Sea Level (MSL) for modelling purposes. There are, however, fundamental limitations on the accuracy of this process since LAT is not accurately known in the absence of multiyear tidal records. Therefore the process of referencing and de-referencing the bathymetry to LAT is problematic. Nevertheless it may be achieved to first order from a long multi-decade integration of a tide-only model of the region of interest by using the lowest obtained sea level as a proxy for LAT (a minimal sketch of this re-referencing is given at the end of this list).
- Negative bathymetry and reference geopotential height: For configurations that involve inter-tidal zones, the bathymetry can be negative relative to MSL. To deal with negative bathymetry, NEMO can use a reference geopotential level defined at some height above MSL, so that all potentially wet points are below this reference level. Care is required when generating initial conditions and stretching of vertical coordinates to take into account the use of a non-zero reference depth.
-Critical depth for wetting and drying: In NEMO there is the option to allow for a grid cell to dry out as the tide ebbs.
This is implemented in practice by limiting the fluid flux out of the cell when a user defined minimum depth is reached.

The specification of this minimum depth will be application dependent (typically a few cm) and requires a compromise between maintaining numerical stability, for a given time-step, and the enhanced realism of thinner critical depths. (See, for example, the SEVERN-SWOT configuration.)
- Grid discretisation: When designing the computational mesh, the lateral extent of the domain and the 3D resolution are likely to be determined by the experiment requirements and the available HPC resources. In the horizontal direction, NEMO supports structured quasi-orthogonal curvilinear quadrilateral Arakawa C-grids. These types of grids can offer a good degree of compromise between flexibility and accuracy, allowing one to improve the representation of many coastal processes (e.g. the propagation of Kelvin waves and land-ocean interactions through the aligning of grid lines with the coast (Adcroft and Marshall, 1998; Greenberg et al., 2007; Griffiths, 2013)). In addition, they can be used to refine the resolution in a specific location of the domain, to improve for example shelf-open ocean exchanges (e.g. Bruciaferri et al., 2020). However, since they rely upon analytical coordinate transformations, they typically have limited multiscale capability in comparison to more versatile (e.g. triangular mesh) unstructured grids (Danilov, 2013). In the vertical direction, NEMO takes advantage of quasi-Eulerian generalised vertical coordinates s(x, y, z, t), where the time dependence allows model levels to "breathe" with the barotropic motion of the ocean. In domains with shallow seas and tidal dynamics, a form of terrain following coordinates is often adopted in order to provide vertical resolution to resolve highly dynamic tidal processes on the shelf (Fig. 2) as well as being able to resolve the open ocean forcing conditions and structured water mass properties (e.g. see Wise et al., 2022). In a linked example (e.g. SEAsia) we chose 75 levels of hybrid z-sigma vertical coordinates. These were configured so that below the 39th level (at around 400 m) the coordinates would transition to z-partial step so as to compare favourably with the parent z-partial step grid.

- Process oriented experiments: It is often useful to conduct simple numerical experiments to assess whether the chosen 3D model geometry is numerically stable and accurate enough for the target application. For example, steeply sloping model levels can introduce errors in the computation of the horizontal pressure force (e.g. Mellor et al., 1998).
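Referring back to the bathymetry reference datum point above, the following is a hedged sketch of the final re-referencing step, assuming a long tide-only simulation has already been run and its sea surface height written to a file. The file and variable names are assumptions for the illustration, and sign conventions should be checked against the source bathymetry.

    import xarray as xr

    ssh = xr.open_dataset("tide_only_ssh.nc")["ssh"]       # elevation relative to MSL (m)
    bathy_lat = xr.open_dataset("bathy_lat.nc")["bathy"]   # depth below LAT, positive down (m)

    # Proxy for LAT: the lowest sea level attained over the multi-year tide-only run
    lat_offset = ssh.min(dim="time")                        # negative values (below MSL)

    # LAT lies |lat_offset| below MSL, so depths referenced to MSL are deeper by that amount
    bathy_msl = bathy_lat - lat_offset                      # lat_offset is negative, so this adds |lat_offset|
    bathy_msl.to_netcdf("bathy_msl.nc")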

Initial conditions
Initial conditions can be idealised or realistic. Effective use of appropriate initial conditions can expedite the spin-up of a model in slowly evolving regions of the domain (e.g. deep water salinity). However the initial conditions are constructed, it is likely that they are imperfect and that at least some spin-up time is required for dynamical adjustment on the child grid. For example, the default initial condition machinery in NEMO uses only temperature and salinity, with the expectation that the velocity field can be spun up from rest.
An alternative is to pose a "soft-restart" state rather than initialising the model from rest. In the "soft-restart" method, salinity, temperature, current velocities and sea surface height from, for example, a coarser resolution model or a reanalysis product are interpolated onto the model grid and are then used to create a pseudo-restart file. Using this method requires a shorter spin-up to allow the model to adjust to any instabilities. The worked example in the SEAsia repository details how to generate a pseudo-restart from a reanalysis product.
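A minimal sketch of assembling such a pseudo-restart file is given below. It assumes the parent fields have already been interpolated onto the child grid, and the restart variable names (tn, sn, un, vn, sshn and their "before" counterparts) follow common NEMO conventions but should be checked against the target NEMO version; see the SEAsia recipe for the full workflow.

    import xarray as xr

    # Parent fields already horizontally and vertically interpolated onto the child grid
    src = xr.open_dataset("parent_on_child_grid.nc")   # assumed file and variable names

    restart = xr.Dataset({
        "tn":   src["temperature"],   # "now" temperature
        "sn":   src["salinity"],      # "now" salinity
        "un":   src["u"],             # "now" zonal velocity
        "vn":   src["v"],             # "now" meridional velocity
        "sshn": src["ssh"],           # "now" sea surface height
    })
    # NEMO also reads the "before" fields; for a pseudo-restart simply duplicate "now"
    for now, before in [("tn", "tb"), ("sn", "sb"), ("un", "ub"), ("vn", "vb"), ("sshn", "sshb")]:
        restart[before] = restart[now]

    restart.to_netcdf("pseudo_restart.nc")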
A principle is to match initial condition and lateral boundary condition data (for temperature and salinity). A mismatch here can generate density driven currents tangential to the open boundary, which may persist long into the simulation under geostrophy. The most common potential challenges can be grouped into issues arising from inconsistencies across products:

- Parent to child grid interpolation: The parent and child grids are likely to be different. Therefore the coastline and bathymetric features will likely have different representations, such that the child might have land points where the parent has wet points, or vice versa. Flood filling prior to interpolation (lateral filling of all land points with nearest neighbour wet point values; a minimal sketch is given after this list) and downfilling isolated canyons (using e.g. SOSIE tools) can address issues of bathymetric representation following interpolation. Additional smoothing of tracer fields may also be required if, for example, new straits are opened by the child grid. Furthermore, the representation of the ocean interior might be different between the two grids.
- Equation of state: With a non-linear equation of state, interpolating temperature and salinity onto a child grid will not ensure preservation of static stability. Alternatively, the equation of state that generated the parent data might be subtly different from the equation of state in the child model. Though these effects are likely to be small and to quickly dissipate, in practice they have been seen to trigger convection in marginally stable environments. So checking for static stability of the initial condition is recommended if stability issues arise in the first few timesteps.
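As an illustrative (not prescriptive) stability check, the sketch below uses the TEOS-10 gsw package to compute the buoyancy frequency of an interpolated profile and flags statically unstable interfaces; it assumes the profile is already expressed as Absolute Salinity and Conservative Temperature on pressure levels.

    import numpy as np
    import gsw  # TEOS-10 toolbox

    def check_static_stability(SA, CT, p, lat=0.0):
        """Flag statically unstable interfaces in a 1-D profile.

        SA (g/kg), CT (deg C) and p (dbar) are profiles on the child grid (assumed inputs).
        """
        N2, p_mid = gsw.Nsquared(SA, CT, p, lat=lat)
        unstable = N2 < 0
        if np.any(unstable):
            print(f"{unstable.sum()} statically unstable interfaces found")
        return unstable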
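The lateral flood fill referred to above can be implemented compactly as a nearest-neighbour fill based on a distance transform. The sketch below uses plain NumPy/SciPy with assumed array names, and replaces every land value with the value of the nearest wet point prior to interpolation.

    import numpy as np
    from scipy import ndimage

    def flood_fill(field, land_mask):
        """Replace land values with the value of the nearest wet point.

        field     : 2-D array (e.g. temperature) on the parent grid
        land_mask : 2-D boolean array, True over land
        """
        # For each land point, find the indices of the nearest wet (False) point
        nearest = ndimage.distance_transform_edt(
            land_mask, return_distances=False, return_indices=True)
        return field[tuple(nearest)]

    # filled = flood_fill(parent_temperature, parent_land_mask)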

Even if the target initial conditions are prescribed as being from a "realistic" source, it can be an instructive and time-saving route to a final configuration to start with idealised initial conditions. NEMO has the facility to compile user-defined initial conditions into the executable, which can be invoked by namelist parameter choices at run time. In the supporting SEAsia repository, two examples of idealised initial conditions are used: a) domain constant temperature and salinity; b) horizontally homogeneous temperature and salinity, constructed from the World Ocean Circulation Experiment (WOCE) climatology to be broadly representative of the region. The latter is used to assess the horizontal pressure gradient errors in an unforced run, thereby testing the limitations of the vertical discretisation.

Open boundary conditions
The lateral boundaries are the points that define the horizontal extent of the model domain. Information must be specified at these points to constrain the interior solutions, effectively providing a forcing to the model. When the regional model differs from the model that was used to generate the boundary data, which typically is the case, differences between the interior solutions will emerge. An open boundary is a way to specify the external forcing while allowing phenomena produced within the domain to propagate outwards. Nevertheless, care must be taken to assess instabilities and model inaccuracies attributable to the boundary conditions.
For example, a parent model that is eddy rich may result in data that appears noisy, leading to a mismatch in dynamics at the boundary.
NEMO offers a number of namelist options to specify different open boundary condition formulations as well as to set the frequency of the supplied data. These data typically come from an external parent model with a much lower frequency (typically daily, 5-daily or even monthly for global products). There is an option to interpolate in time. The namelist is organised so that boundary conditions are separated into the 2D depth mean velocities and sea-surface height, the 3D depth dependent velocities (perturbations from the depth mean) and the 3D tracer fields. Tidal harmonics can also be specified as part of the 2D fields. Following the principle of building up complexity, it is worth configuring the open boundaries for the depth mean velocities first. This can include tidal harmonic forcing.
The choice of boundary condition for the 2D velocities is primarily a choice over which radiation condition to use (e.g. Flather (1976), or an Orlanski (e.g. Marchesiello et al., 2001) scheme). For the 3D velocities and tracers, one can also choose to relax the child field to the external data over a buffer zone or apply a condition on the normal flux or normal gradient. For example, it is possible to apply a radiation condition to the 2D velocities, a flow relaxation scheme to the tracers and a zero gradient condition to the 3D velocities.
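For clarity, the separation into depth mean and depth dependent parts is sketched below with xarray, using thickness-weighted averaging; the variable names (u for a 3-D velocity component and e3 for the vertical cell thickness, both on the same grid points) are assumptions for the example.

    import xarray as xr

    def split_barotropic_baroclinic(u, e3):
        """Split a velocity component into depth mean and perturbation parts.

        u  : 3-D velocity component with dimensions ("depth", "y", "x")
        e3 : vertical cell thickness on the same points, same dimensions
        """
        ubar = (u * e3).sum("depth") / e3.sum("depth")   # 2-D depth mean (barotropic) part
        uprime = u - ubar                                # 3-D perturbation (baroclinic) part
        return ubar, uprime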
The key considerations are whether the open boundaries are affecting either stability or accuracy. Some specifics to consider include:
- Parent-to-child: The boundary data will likely be associated with bathymetry that is different from that used by the child model. This can result in differences between the parent and child in terms of transport across the boundary. It may therefore be beneficial to match the bathymetry along the boundary. Another consideration is whether to preprocess the velocities so that the transports in the child match those of the parent when regridded. Note that NEMO allows the user to provide files that contain the full velocities (2D + 3D), which it will then separate at run time.
The boundary data will also likely be associated with a vertical grid that is different from the child vertical grid. If they have not been regridded then NEMO provides an option to vertically interpolate onto the native grid at run time. For 3D velocities this could lead to inconsistency with interpolated tracer fields.
Changes in grid and bathymetry may also result in sections of boundary that are separated from land points by thin strips of wet grid points. This may result in spurious currents and a need to mask out certain grid points.
The boundary velocities may also need to be rotated. If the external velocities are specified on a regular rectangular grid, for example, they might require rotation to be correctly oriented on a spherical curvilinear grid (a minimal rotation sketch is given after this list).
There may also be temporal differences between the child and parent. Specifically, models can be set up to ignore leap years, which may result in the boundary data becoming out of synchronisation with the child model time.

Finally, even if the vertical grids are the same, mismatches can occur if different types of free surface are applied: many regional applications use non-linear free surfaces whereas global models often use fixed z-levels. These effects are strongest in the surface layers and could be mitigated by constructing boundary conditions from volume fluxes, if appropriate.
- Tides: As previously noted, tidal amphidromes should ideally be away from the boundary. Additionally, as previously noted, a mismatch in parent-child bathymetry can result in a mismatch in transports; this also affects transports due to tides. Relatedly, it should be ensured that tides are not present in the external boundary data if tides are also specified with harmonics.
- Volume conservation: Open boundaries can allow gain and loss of water through the boundaries, which may result in drift in mean sea level, as well as accumulating dynamical errors. NEMO provides an option to maintain constant volume via a correction. For a model including tides, however, this could be considered inappropriate.
- Spurious currents: Spurious currents can be generated at open boundaries that appear trapped but may affect the interior momentum over time. Areas where the boundary intersects the continental margin are of particular concern because the sloping bathymetry can act as a wave guide for spurious variability. A further consideration is the effect on non-physical aspects of the model, such as biogeochemistry (see Section 4.1). High vertical velocities at the boundary may not be apparent due to flow relaxation at the boundary. However, tracers that are not relaxed at the boundary will feel the effect of spurious vertical currents.
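A minimal sketch of the velocity rotation mentioned above is given here. The angle theta is the local angle (radians) between the model i-direction and geographic east, which would be derived from the child grid coordinates (an assumed, precomputed input).

    import numpy as np

    def rotate_to_grid(u_east, v_north, theta):
        """Rotate east/north velocity components onto the model (i, j) directions."""
        u_i =  u_east * np.cos(theta) + v_north * np.sin(theta)
        v_j = -u_east * np.sin(theta) + v_north * np.cos(theta)
        return u_i, v_j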
Following the above guidance to build sequentially, whereby complexity is incrementally introduced, for a worked example specifying open boundary conditions for the 2D and 3D velocities as well as temperature and salinity, see the SEVERN-SWOT repository. The documentation includes generating the boundary conditions and setting the namelist.

In this discussion we consider atmospherically forced ocean models and atmosphere-ocean coupled models.

One-way atmospheric coupling
In the one-way (forced ocean) setup, it can be helpful to consider that the meteorological processes can either affect the thermohaline properties (via heat and radiation fluxes, or precipitation, which can be applied as boundary conditions in the tracer equations), or they can affect the ocean momentum (via pressure and wind speed). In uncoupled configurations, these atmospheric boundary layer processes must be parameterised in order to compute fluxes with which the ocean can be forced. NEMO has a number of parameterisation options (or bulk formulations): i) NCAR (Large and Yeager, 2004), designed for the NCAR forcing, but also appropriate for the DRAKKAR Forcing Set (DFS) (Brodeau et al., 2010); ii) COARE 3.0 (see Fairall et al., 2003); iii) COARE 3.6 (see Edson et al., 2013); iv) ECMWF, appropriate for ERA5 data (Beljaars, 1995); v) ANDREAS (see Andreas et al., 2015). Alternatively, if all the atmospheric fluxes are known, these can be supplied directly as surface boundary conditions.
In addition to choosing the appropriate source and type of surface boundary conditions, there are a few additional considerations to be borne in mind when preparing the atmospheric forcing data set for a target child model that has different spatial or temporal discretisation from the parent: 1) calendar stretching (e.g. 3-hourly forcing on a 360-day calendar being mapped to a Gregorian calendar); 2) land-sea masks can differ between the parent atmospheric grid and the child ocean grid. This is especially problematic when a coarse parent grid is naively interpolated onto a finer child grid. The mismatch in coastlines results in misrepresentation of near-coast heat fluxes and sea breeze dynamics, but can be alleviated using flood-filling techniques whereby extrapolation and interpolation are applied separately to land points and sea points. This is particularly acute where the child grid has islands that simply do not exist in the parent grid. Finally, subtle differences in the atmospheric data expected by the bulk formulae are a common source of error (e.g. reference levels, specific versus relative humidity, net versus downward long and shortwave radiation, etc.). So, as is often the case when re-purposing a complex system by modification: "trust but verify".
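By way of a deliberately simplified illustration only (the NEMO bulk options listed above compute stability dependent transfer coefficients and additional heat and freshwater fluxes), the wind stress part of a bulk calculation with a constant drag coefficient might look like the following; u10 and v10 are assumed 10 m wind components and the constants are illustrative.

    import numpy as np

    rho_air = 1.22     # air density (kg m-3), illustrative constant
    C_D = 1.3e-3       # neutral drag coefficient, illustrative constant

    def wind_stress(u10, v10):
        wspd = np.hypot(u10, v10)             # 10 m wind speed (m s-1)
        taux = rho_air * C_D * wspd * u10     # zonal stress (N m-2)
        tauy = rho_air * C_D * wspd * v10     # meridional stress (N m-2)
        return taux, tauy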
The variables are exchanged between both models via a coupler such as OASIS3-MCT (Valcke et al., 2015), and interpolated between the two grids, typically using first-order conservative interpolation for scalars and bilinear interpolation for vector fields.
The coupling frequency can be optimised by considering the region over which the model is being run and the features and dynamics that dominate that area. However, it must be set to a value larger than the model timestep. Wang et al. (2015) use a 3-hourly coupling frequency for their climate atmosphere-ocean model located over the Baltic and North Sea, but Zhao and Nasuno (2020) found that a coupling frequency of hourly or sub-hourly better reproduced the SST and consequently stronger convection during the passage of the Madden-Julian Oscillation. In the regional coupled suite RCS-IND1 (Castillo et al., 2022), an hourly coupling frequency was used to capture the temperature diurnal cycle; however, options to move to a 10 minute coupling frequency are mentioned, as this might prove beneficial for modelling rapidly changing conditions such as squalls and tropical cyclones.

There is a risk that coupled atmosphere-ocean models may become unstable and drift when run over long periods of time due to the feedbacks between both models. To constrain the drifts, nudging or weakly coupled assimilation may be required.
However, not all models require corrections; the decadal-scale run carried out by Wang et al. (2015) and the 100-year simulation in Primo et al. (2019) are examples of this. Alternatively, mixed two-way and one-way forcing approaches can be applied if either coupling or direct forcing is not appropriate for the entire domain.

A worked example is given in the SEVERN-SWOT repository using ERA5 data. The documentation includes these global data being cut down and manipulated for use in forcing a regional configuration and then running the model with one-way meteorological forcing.
Having identified the data sources, and the corresponding model grid points for freshwater inputs (itself a heavily labour-intensive exercise that defies straightforward automation), there are further choices to be made regarding the implementation: i) How is the freshwater distributed horizontally? If the coastal outflow is a river delta, the freshwater load should be distributed between the tributary channels. Similarly, if the volume flux is large and baroclinic dispersal processes are not resolved then unrealistic freshwater lenses can accumulate at the coastline. This can also be remedied by redistributing the freshwater flux across neighbouring grid cells. ii) How is the freshwater distributed vertically? The freshwater can be vertically mixed to a specified depth, or enter as a surface plume. (In all our mid latitude and tropical applications with biogeochemistry we typically mixed the freshwater over the top 10 m, for numerical stability.) Increasing vertical diffusion at river points can be used to compensate for unresolved estuarine mixing. iii) What is the salinity and temperature of the river water? The default implementation is often to add freshwater (zero salinity) at the temperature of either the sea surface or the atmosphere; however, better results are likely if observed or river model values are used. These choices can be adjusted to achieve target temperature and salinity characteristics of the plume. To date there is no accommodation for groundwater fluxes, though these can be considerable (Zektser and Loaiciga, 1993). Finally, validation to ensure accurate estuary-mouth forcing is challenging.
Satellite salinity has a resolution of approximately 35-50 km for the Soil Moisture and Ocean Salinity (SMOS) mission and 100-150 km for the Aquarius satellite, whereas in situ measurements capture plume features (and freshwater intensifications) at scales O(10 m) that are much finer than the resolved scale but lack spatial coverage. Where possible, use should be made of near coastal buoy and survey data, but in the absence of this we must settle for far-field validation against hydrography, accepting that this conflates the effects of circulation and surface forcing.
The worked example in the SEAsia repository details how river fluxes were taken from the JRA-55 dataset (Tsujino et al., 2018), with particular attention to: i) the treatment of open boundaries near large river outflows, and ii) laterally spreading the coastal source points to represent deltas and also to avoid unrealistic numerical issues if outflow values are locally too large.
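The lateral spreading step can be sketched as follows, where a 2-D runoff field is smoothed over neighbouring wet cells and then rescaled so the domain total is unchanged; the array names and the simple area-unweighted rescaling are assumptions for the illustration.

    import numpy as np
    from scipy import ndimage

    def spread_runoff(rnf, wet, size=3):
        """Spread runoff over a (size x size) footprint of wet cells, conserving the total.

        rnf : 2-D runoff field (e.g. kg m-2 s-1); wet : 2-D boolean wet mask
        """
        smoothed = ndimage.uniform_filter(np.where(wet, rnf, 0.0), size=size)
        coverage = ndimage.uniform_filter(wet.astype(float), size=size)
        out = np.where(wet, smoothed / np.maximum(coverage, 1e-12), 0.0)
        # Rescale so the (area-unweighted) total flux matches the original field
        return out * rnf[wet].sum() / max(out.sum(), 1e-12)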

High Performance Computing: decomposition and optimisation
Modelling frameworks like NEMO are equipped to be run on High Performance Computing machines. This is facilitated by the optional abstraction of input-output procedures (XIOS) from the dynamical model (NEMO). These can then be separately optimised for the target machine and, crucially, most I/O to disk can be overlapped with the continuing computational tasks.
Most HPC platforms will have multiple nodes, each comprising a number of CPU cores and some shared memory. However, NEMO and XIOS have different computation and memory requirements. Making the best use of each HPC platform can be further complicated by the service's charging algorithm and possible non-linearities in performance scaling arising from subtleties in hardware design. Nevertheless, when configuring a simulation, there are only two sets of fundamental considerations: 1) How many compute cores will be allocated to the dynamical core and how many cores assigned to each node? Each core will have a roughly equal grid cell fraction of the surface map of the whole domain (parallelised subdomains contain the full water column).
2) How many cores should be dedicated to the XIOS processes and how many cores assigned to each node?
Some general guidance can be offered to address these questions:
- Computational resource splitting: The best ratio of XIOS to NEMO cores depends on the volume of data to be written.
Since this is configurable at run-time via the XML files, some care is needed to provide the XIOS processes with sufficient resources to cope with any expected variations. Though there are no easy answers as to how to optimise domain decomposition, a starting point would be to allocate cores between NEMO and XIOS in a ratio of n : √n. Often, fewer XIOS cores are required but each XIOS process is likely to require access to more memory than each ocean core.
Typically, this is achieved by running XIOS cores sparsely populated on dedicated XIOS nodes or by using options to control placement of processes to CPUs on nodes running a mix of XIOS and ocean cores. The best solution will be hardware-specific. For example, on a system with 4 NUMA regions per node, you may choose to place one XIOS process alone in the first NUMA region (thereby giving it access to the maximum amount of shared memory) and to fully populate the remaining regions with ocean cores.
- Domain decomposition: NEMO can choose the decomposition automatically; this happens when np_jpni and np_jpnj are < 1. The model will take the number of ocean cores provided to the parallel run command (e.g. mpirun) and will minimise the horizontal subdomain size and maximise the number of eliminated land-only subdomains. It is likely that your first guess at the ideal number of ocean domains is not optimal; pay attention to messages in ocean.output for advice on better choices. See section 7.3 of the NEMOv4.2 reference manual for more details. Experiment with a range of decompositions and use considerations of cost and "time to solution" to decide your best choice. As a rule of thumb, it is likely that performance will not scale with domains smaller than around 7 × 7 points, since communication will begin to dominate over computation with domains smaller than this. However, this is only a guide and NEMO developments towards exascale computing are continuing to challenge these limits. In older versions of NEMO, land suppression (to avoid computation over land points) had to be explicitly activated. This efficiency saving is automatically activated since NEMOv4.2 but is discussed in the worked examples (e.g. the SEVERN-SWOT unforced run example).
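The n : √n rule of thumb above can be expressed as a small helper for first-guess planning. The sketch simply finds the largest number of ocean cores n such that n plus roughly √n XIOS servers fits within a total allocation; it is only a starting point to be refined by the timing experiments described above.

    def split_cores(total_cores):
        """First-guess split of an allocation into (NEMO ocean cores, XIOS cores) ~ n : sqrt(n)."""
        n = total_cores
        while n + round(n ** 0.5) > total_cores:
            n -= 1
        return n, total_cores - n

    # e.g. split_cores(256) -> (240, 16)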

Invariably, during the course of developing a new configuration, a number of problems will be encountered where the model blows up or otherwise does not behave as expected. The general principle for more rapidly building a new configuration is to make changes slowly. By that we mean starting from something that works and sequentially testing incremental changes, rather than making multiple changes at once.
The most common problems new users encounter fall into two main categories: 1) compilation and start-up difficulties, and 2) run-time stability issues. Or, equivalently: "getting past the first time-step" and "getting beyond the first few inertial periods". Often, difficulties in the first category arise from inconsistencies in the build environment. It is essential, for example, to compile XIOS and NEMO executables in identical machine environments and this must include netCDF4 libraries which have been compiled with full parallel support if all the functionality of XIOS is to be used. This same environment must then be used at run-time. It is always a good idea to test the environment and batch system with one or more of the NEMO reference configurations before tackling your own, custom, configurations. This will separate system issues from any issues introduced by your own changes. One way to do this is to run all or parts of the SETTE testing suite. There is a full section in the NEMO user guide explaining how to do this. At a minimum, it is recommended to run the GYRE_PISCES configuration, which requires no external data files.
After this, any continuing start-up issues with custom configurations should be isolated to the user's code changes, errors in the input files (including namelists) or resource size issues (e.g. your model requires more memory than is available on the nodes you are assigning). Precise advice will be machine-specific and depends on how the architecture is configured, but both XIOS and NEMO will detect and report many issues. If failures are sudden, unlocated in the code and catastrophic, then recompile NEMO with compiler debug flags (at a bare minimum employing the -O0 -g flags) and try to obtain a traceback, which identifies where in the code the crash occurs.
Once past the first time-step, the next failures are usually due to numerical stability issues or I/O issues at a future checkpoint.
Even the best prepared initial conditions are likely to trigger fast-moving responses (e.g. coastal-trapped Kelvin waves) at startup. Often, any numerical stability issues associated with this initial adjustment can be avoided by running a short period with a reduced timestep. A more general section on model stability constraints is included below. I/O issues at a future checkpoint will either be errors in the input files (e.g. the model reaches a point when it requires the next month's surface forcing, but those files are not available) or output issues associated with specific output. Examples of the latter include XIOS/netCDF4 detecting 'not representable' values (usually a good indication that NaNs are occurring in the simulation) or memory limits being reached due to a sudden surge in output requests. If the latter occurs in an established simulation with no new demands on XIOS, then it may be indicative of external influences changing the usual rate of I/O to disk (faulty switches etc.).
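When 'not representable' errors are suspected to be caused by NaNs, a quick inspection of the most recent output (or abort) file can locate where and when they first appear. The following is a minimal sketch using xarray; the file name, field name and the NEMO-style "time_counter" dimension are placeholder assumptions to be adapted to the configuration at hand.

```python
import numpy as np
import xarray as xr

# Placeholder names: adjust to your configuration's output files and fields.
ds = xr.open_dataset("output.abort.nc")
field = ds["sst"]                                    # any suspect field

# NaN count per output step (collapse all dimensions except time).
nan_per_step = field.isnull().sum(dim=[d for d in field.dims if d != "time_counter"])

if int(nan_per_step.sum()) > 0:
    first_bad = int(np.argmax(nan_per_step.values > 0))   # first step with NaNs
    print(f"NaNs first appear at output step {first_bad}: "
          f"{int(nan_per_step[first_bad])} grid points affected")
else:
    print("No NaNs found")
```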
Here we touch on some general issues that can arise.

Syntax errors in XIOS:
The XML files which control the XIOS output are very sensitive and susceptible to syntax errors.
Commonly, syntax errors in the XML files manifest in a lack of output, or a model failure when writing output. To avoid these difficult-to-trace issues, it is strongly recommended that you edit the XML files independently of other changes being introduced to a working configuration and test each change in short test simulations.
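A cheap first check, independent of XIOS, is to confirm that edited XML files are at least well-formed before submitting a run. The sketch below uses Python's standard library for this; it only catches syntax errors (mismatched tags, stray characters), not invalid field or file definitions, and the file names in the usage comment are examples only.

```python
import sys
import xml.etree.ElementTree as ET

# Usage (example file names): python check_xml.py iodef.xml file_def_nemo-oce.xml
for fname in sys.argv[1:]:
    try:
        ET.parse(fname)                # raises ParseError on malformed XML
        print(f"{fname}: well-formed")
    except ET.ParseError as err:
        print(f"{fname}: syntax error ({err})")
```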
Model stability constraints: As with most explicitly formulated computational fluid dynamics codes, it is critical that the timestep is shorter than the time it takes for a substance or signal to cross a grid box, so that the flow of mass and tracer, and the propagation of waves, are resolved. In configurations with deep water and a free surface, the shallow-water wave speed will likely be the timestep-limiting process. Accordingly, the timestep must decrease linearly with refinement of the horizontal resolution. This can make a fine-scale coastal model with some deep-water grid points computationally expensive.
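To illustrate the shallow-water constraint described above, the sketch below estimates an upper bound on the timestep as dt ≤ C · dx / √(gH), where H is the deepest point in the domain. The safety factor C and the example values are assumptions; the true limit depends on the advection and time-stepping schemes in use, so this is only a rough sizing aid.

```python
import math

def barotropic_dt_limit(dx_m, max_depth_m, safety=0.5, g=9.81):
    """Rough upper bound on the timestep set by the shallow-water wave speed.

    dt <= safety * dx / sqrt(g * H); the safety factor is an assumption and
    the real limit depends on the numerical schemes chosen.
    """
    c = math.sqrt(g * max_depth_m)    # shallow-water wave speed (m/s)
    return safety * dx_m / c

# e.g. a 500 m grid with a 4000 m deep trench in the domain:
print(f"{barotropic_dt_limit(500.0, 4000.0):.1f} s")   # roughly 1.3 s
```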

Code errors: Despite best endeavours, the code base can have errors (especially, for example, on branches that are being actively developed). Also, user implementation errors (e.g. inappropriate coefficients, as discussed above) can be hard to distinguish from parameterisation failures; both typically have explanations that are attributable to physical characteristics of the domain (e.g. the simulation blows up in a strait / over a sea mount / at a coastal inlet / on a spring tide / following intense stratification events). On the other hand, code errors arising from incorrect indices or variables may appear to have a more random behaviour (e.g. the crash point moves with a changing number of cores, or there is otherwise "surprising" or non-Newtonian behaviour).
With any advanced technology, troubleshooting is an inevitable part of development. Therefore document your workings and revert to the last working setup (if it exists) when things do not go as expected. Finally, making sure your issues are reproducible will help if you seek assistance.

4 Additional modules
One of the strengths of the NEMO framework is the broad community user base that actively develops additional modules that augment and enhance the value of the physics-only ocean simulations. In the following, experiences are shared relating to biogeochemical, wave and ice modelling.

A significant challenge of regional biogeochemical modelling is in initialising all the necessary fields and supplying appropriate surface and lateral boundary conditions, along with river inputs. This is complicated by the fact that data for initialising and forcing biogeochemistry models can come from a variety of sources and usually require some pre-processing before they can be applied. In particular, special attention should be paid to the data units, which might need to be converted according to the model-specific requirements. Specifically, concentrations should not be confused with fluxes.
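As one common example of such unit pre-processing, observational products often report concentrations per unit mass while models expect concentrations per unit volume. The sketch below shows the conversion under an assumed representative seawater density; in practice the in-situ density should be used, and the function name and example value are illustrative only.

```python
# Convert a concentration from umol kg^-1 (common in observational products)
# to mmol m^-3 (a common model unit): multiply by the seawater density in
# kg m^-3 and divide by 1000 (umol -> mmol).
RHO_SW = 1025.0                      # assumed representative density, kg m^-3

def umol_per_kg_to_mmol_per_m3(conc_umol_kg, rho=RHO_SW):
    return conc_umol_kg * rho / 1000.0

# e.g. a dissolved inorganic carbon value of 2100 umol/kg -> ~2153 mmol/m3
print(umol_per_kg_to_mmol_per_m3(2100.0))
```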

A number of data sources are publicly available, both observational and modelled. Widely used observational global products include: the World Ocean Atlas 57 (Boyer et al., 2018b), containing monthly gridded inorganic nutrients and dissolved oxygen in the upper 800 m and annual values deeper; and GLODAP 58 (Olsen et al., 2016), containing annual gridded dissolved inorganic carbon and total alkalinity, as well as inorganic nutrients and dissolved oxygen. Modelled products, such as the CMS global models using PISCES, contain simulated nutrient and phytoplankton fields. For regionally specific products, data may be available from regional consortia and programmes (e.g. the North Sea Biogeochemical Climatology (NSBC 59) (Hinrichs et al., 2017) and the EMODnet portal 60 (Lear et al., 2020)). However, observational sources include a limited number of measured parameters, while modelled data are likely to be generated by a model with a different (usually simplified) trophic structure compared to the target biogeochemical model. Assumptions will therefore have to be made when generating initial and boundary conditions to infer the required fields from the available data; application of the Redfield ratio can be a valuable tool to preserve the stoichiometry when deriving the relative elemental composition of living organisms (see the sketch below). Ideally, boundary and initial conditions would come from the same source. When using data from different sources, care must also be taken to avoid, or manage, potential mismatch issues between, for example, the depth of the nutricline and the pycnocline. In these circumstances one would expect the simulations to require a spin-up period. The length of spin-up will depend on various aspects of the region's dynamics and the quality of the starting conditions of the key state variables. For example, a timescale of several months is typically sufficient for most pelagic variables, whilst certain benthic variables can take several years or even decades to reach equilibrium. As with physics-only regional models, attention needs to be applied to appropriately formulating the model forcings and inputs. Generalised considerations when initialising a biogeochemistry simulation are presented in the following subsections.
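The following minimal sketch illustrates the kind of Redfield-based inference mentioned above: deriving nitrogen and phosphorus content from a carbon field using the classical molar ratio C:N:P = 106:16:1. Whether this ratio is appropriate, and which field to start from, are assumptions that depend on the target model; the function name and example value are illustrative.

```python
# Classical Redfield molar ratios C:N:P = 106:16:1.
REDFIELD_C_TO_N = 106.0 / 16.0
REDFIELD_N_TO_P = 16.0

def infer_n_and_p_from_carbon(phyto_c_mmol_m3):
    """Infer phytoplankton N and P content (mmol m^-3) from a carbon field,
    assuming Redfield stoichiometry."""
    n = phyto_c_mmol_m3 / REDFIELD_C_TO_N
    p = n / REDFIELD_N_TO_P
    return n, p

print(infer_n_and_p_from_carbon(10.6))   # -> (1.6, 0.1)
```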

It is generally preferable to initialise the biogeochemistry in low biological productivity periods. This reduces model uncertainty as the phytoplankton and zooplankton are at their lowest values and nutrients are mainly present in their inorganic form. In the absence of appropriate data, distributions of phytoplankton functional groups can be estimated from measured chlorophyll.
Other pelagic variables for which detailed data are not available (e.g. concentrations of dissolved organic carbon (DOC), calcite and bacteria) are typically initialised to constant values, ideally derived from any available observational data in the region.

Distributions of benthic variables are often difficult to estimate as there is usually little data available. Climatological profiles can be used but may not be representative of the target region of interest. It can take years, even decades, for some components (such as organic matter and macrofauna) to adjust. Our experience suggests that underestimated organic matter content will increase more quickly in shallower, productive areas than overestimated content will decrease in deeper regions.

The biogeochemical fields required for surface boundary forcing will depend on the region of interest and the target application.
By way of example, nitrogen deposition is an important source of nutrients to the system and is commonly prescribed. Model products (e.g. EMEP for Europe, HTAP globally (Simpson et al., 2012; Tan et al., 2018)) are a useful data source. In shallow-water regions, light attenuation due to coloured dissolved organic matter (Gelbstoff absorption) can affect the radiation budget (e.g. Kara et al., 2005). It can be prescribed using data from ocean colour observations such as those provided by OC-CCI (Sathyendranath et al., 2019). Atmospheric pCO2 data should be provided for any model that includes a carbonate system.
Observational data can be converted from the fCO2 product available through SOCAT (Bakker et al., 2016). In some areas iron deposition may be important and can be included through modelled products such as those provided by the GESAMP model ensemble (Myriokefalitakis et al., 2018). It is important to note, though, that only a small fraction of the iron dust that lands on the ocean surface becomes bioavailable, so any surface boundary values should be adjusted accordingly.

River Input
Observational data are (usually) the most precise source, but are limited in availability. In general, it is better to use the same source for all rivers included in the model setup. Therefore modelled products such as GlobalNEWS (Mayorga et al., 2010) and HYPE (Arheimer et al., 2020) can be the best option, though they will require, or implicitly make, some assumptions regarding land use. In some cases excessive anoxic conditions can form around river mouths, typically with larger rivers in regions already subject to anoxia. Therefore, if riverine oxygen is needed but is unavailable, impose a concentration around the saturation value. Any data on river inputs will usually contain a limited number of variables compared to the requirements of the target model. Therefore, assumptions will have to be made, for example when partitioning total nitrogen inputs into various inorganic and organic fractions. A worked example generating river forcing (discharge and nutrients) from the GlobalNEWS2 model is given in the supporting repositories SANH 61 and Partridge (2022b).

Output testing
Before committing to production simulations, a prudent initial test can confirm the simulation conserves the mass of a tracer and that the boundary conditions are correctly applied. To this end we suggest the following procedure: create two passive tracers (TRC1, TRC2) with a uniform initial value (e.g. 1.0). Using Neumann boundary conditions for TRC1, confirm the tracer is conserved. Using the FRS boundary condition for TRC2, with a low constant boundary value (e.g. 1e-10), confirm the low values are not confined to the boundary after 10 time-steps.
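The conservation part of this test can be checked offline by integrating the tracer inventory over the domain through time. The sketch below does this with xarray; the file and variable names follow common NEMO output conventions (e1t/e2t horizontal scale factors, e3t vertical scale factor, time_counter dimension) but are placeholders to be adapted to the actual configuration.

```python
import xarray as xr

# Placeholder file and variable names following common NEMO conventions.
dom  = xr.open_dataset("domain_cfg.nc").squeeze()     # drop the size-1 time dim
ptrc = xr.open_dataset("model_ptrc_T.nc")

vol = dom["e1t"] * dom["e2t"] * ptrc["e3t"]            # cell volume (m^3)
inventory = (ptrc["TRC1"] * vol).sum(dim=["deptht", "y", "x"])

# Relative drift from the initial inventory; for a conserved (Neumann-bounded)
# tracer this should remain at machine precision.
drift = (inventory - inventory.isel(time_counter=0)) / inventory.isel(time_counter=0)
print(drift.values)
```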
When changing machine or domain decomposition it is advisable to check for bit-identical solutions in test simulations.
Finally, be vigilant for negative or unrealistic values in areas that typically could cause problems. Such regions include boundaries, river mouths, or coastlines that might 'trap' and accumulate tracers.
4.1.6 Biogeochemical models used in regional configurations
Here we highlight just three biogeochemical models, with which we have experience and that span a range of complexities in biogeochemical process representation.

ERSEM specific considerations
The European Regional Seas Ecosystem Model (ERSEM 62 [last accessed 20 Jul 22]) (Butenschön et al., 2016) was originally designed as a marine biogeochemistry and lower-trophic-level model for the temperate European shelf seas (Baretta et al., 1995; Wakelin et al., 2020). However, it has found application in global settings as well as in regional configurations in other parts of the world ocean, using a variety of hydrodynamic models, e.g. in the Mediterranean (Kay et al., 2020), Arabian (Sankar et al., 2018), and East China Seas (Ge et al., 2020).
ERSEM describes the biogeochemical cycling of carbon, major macronutrients (N, P, Si), and, optionally, iron.

PISCES specific considerations
PISCES 67 (Aumont et al., 2015; https://www.pisces-community.org) is the default biogeochemistry module for NEMO and is distributed with the code base. (It also ships with CROCO, the Coastal and Regional Ocean Community model 66, which is based on ROMS AGRIF.) PISCES is designed for regional and global applications. It has 24 tracers, including two phytoplankton compartments (diatoms and nanophytoplankton), two zooplankton size classes (microzooplankton and mesozooplankton) and a description of the carbonate chemistry. Optional parameterisations can be activated to control the complexity of the iron chemistry or the description of particulate organic material.
For many applications, MEDUSA and PISCES are broadly similar, but are developed by different communities and thereby offer a level of model diversity.
Since MEDUSA is a reduced-complexity biogeochemistry model, all the variables for the lateral boundary conditions can be supplied by parent data. This means the boundary conditions can be simplified by specifying a relaxation condition on all (physics and biogeochemistry) variables. MEDUSA coupling with NEMO is effectively one-way so, for example, phytoplankton blooms will not affect the absorption of shortwave radiation and influence upper ocean temperatures.

Though MEDUSA was conceived as an efficient biogeochemistry model for global applications in NEMO, it has also been used in a regional configuration. A worked example for setting up NEMO-MEDUSA in a western Indian Ocean domain is given in the linked repository 70 .

Waves
Surface wave processes mediate the transfer of energy and momentum between the atmosphere and the ocean. They are associated with surface-intensified fluxes of momentum and energy that give rise to a Stokes drift (Stokes, 1847), which makes them an essential component when modelling particle drift. Waves can also modify subgridscale mixing (e.g. by injecting turbulence into the surface layers through wave breaking (Craig and Banner, 1994), or by modifying the character of the turbulence into Langmuir turbulence (Belcher et al., 2012)); they can affect the horizontal momentum in the Ekman layer through the interaction with planetary rotation (Hasselmann, 1970; Polton et al., 2005). In near-coastal scenarios waves can give up their energy and momentum, accelerating a mean flow or inducing "wave setup" (a change in background sea level), and can exert forces on the bathymetry, known as "radiation stress" (Longuet-Higgins and Stewart, 1964). Studies demonstrate value in coupling waves in regional operational systems, for example in drift simulation (Bruciaferri et al., 2021), but also in modifying surface temperatures by changing the depth to which the warm surface waters are mixed (Lewis et al., 2019b). Coupling also impacts the momentum transfer at the air-sea interface via alteration of the surface roughness and stress, which affects the wave growth (Valiente et al., 2021).
When adding waves to a regional ocean setup, a number of considerations need to be borne in mind when designing the configuration for the target application.
Spectral wave models are relatively expensive to run. Though spatially only two-dimensional, the spectra have frequency and directional discretisation (and a different timestep), which typically doubles the cost of the ocean-only component (e.g. Lewis et al. (2019a) and Hashemi et al. (2015)).
This cost can be mitigated by the level of coupling used; if the waves and the ocean do not significantly affect each other then the components can be run independently, or, if adequate products exist and the ocean component does not significantly affect the waves, then wave data could be downloaded from, e.g., the CMS catalogue. Cost can also be mitigated in coupled ocean-wave configurations by decreasing the rate at which variables are exchanged between modules.

Having determined whether or not the wave field needs to be simulated, the level of coupling can then be chosen. There are three "zones" where coupling can be implemented:
1) At the air-sea interface:
a. Waves act as a buffer to momentum exchange between the atmosphere and the ocean, for example reducing the surface stress experienced by the ocean when waves are growing. Typically this coupling is modelled one-way, with a modification of the water-side surface stress in the ocean module by wave growth and dissipation.
b. The presence of waves facilitates the transfer of momentum into the ocean. This can be captured by a wave-height-dependent ocean surface roughness in the ocean module.
c. Breaking waves inject turbulence into the upper ocean. This can be captured through modification of the surface subgridscale turbulent kinetic energy in the ocean module.
d. Feedback on the atmosphere. Coupling waves impacts the sea surface roughness, and consequently affects wind speed (Gentile et al., 2021), which in turn can alter the sea surface temperature.
2) In the water column:
a. Though surface intensified, the Stokes drift has a deeply penetrating effect throughout the Ekman layer via its interaction with the Coriolis force. This Stokes-Coriolis term can be added as an additional force in the ocean module's momentum equations (a sketch estimating the magnitude and depth scale of the Stokes drift is given after this list).
b. Langmuir turbulence. This can be captured by analogy to convection, through an additional turbulent kinetic energy production term that scales with the size of the cells (Axell, 2002). Though this scheme exists in NEMO, Breivik et al. (2015) note that the mechanism is structurally different to that involving the vertical shear in the Stokes drift velocity (McWilliams et al., 1997; Polton and Belcher, 2007; Grant and Belcher, 2009) and so will have different effects on the ocean surface boundary layer.
3) At the bed:
a. In water much shallower than the wavelength, the wave field is modified through an interaction with the bathymetry whereby the waves exert a (radiation) stress on the bed. This bottom stress can be added to the ocean module's momentum equations and/or to the wave evolution equations in the wave module. This will often require revisiting the ocean model's bottom friction parameters if these have been tuned in the absence of explicit wave fields.
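To give a feel for the size of the effect in item 2a, the sketch below computes a deep-water, monochromatic estimate of the surface Stokes drift and its e-folding depth from significant wave height and peak period (u_s(0) = ω k a², with a = H_s/2, k = ω²/g and a decay scale of 1/(2k)). A coupled system would instead derive these from the full wave spectrum; the function name and example values are illustrative.

```python
import math

G = 9.81

def stokes_drift_estimate(hs_m, tp_s):
    """Deep-water, monochromatic estimate of the surface Stokes drift (m/s)
    and its e-folding depth (m) from significant wave height and peak period."""
    omega = 2.0 * math.pi / tp_s
    k = omega ** 2 / G             # deep-water dispersion relation
    a = hs_m / 2.0                 # wave amplitude
    u_s0 = omega * k * a ** 2      # surface Stokes drift
    depth_scale = 1.0 / (2.0 * k)  # Stokes drift decays as exp(2kz)
    return u_s0, depth_scale

# e.g. 2 m significant wave height, 8 s peak period:
print(stokes_drift_estimate(2.0, 8.0))   # ~0.05 m/s at the surface, ~8 m depth scale
```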
There are choices of wave models to consider. Three state-of-the-art models are widely used: WAM (WAveModel, Komen et al. (1996)), WW3 (WAVEWATCH III, Tolman et al. (2002)) and SWAN (Simulating WAves Nearshore, Booij et al. (1999)). As an example exploring first-order interactions between the wave and ocean model in a 1.5 km resolution wide-area configuration of the North West European shelf, the following modifications were made: 1a, 1b, 2b. Code modifications to couple WAVEWATCH III to NEMO using the OASIS coupler are reported in Lewis et al. (2019a), with associated linked repositories 71 (https://code.metoffice.gov.uk/trac/utils/browser/ukeputils/trunk/gmd-2018).

Ice processes
For regional modelling at high latitudes, additional processes to represent ice are required. These include: i) sea ice (frozen sea water), which is prevalent at high latitudes in winter and undergoes significant yearly cycling in horizontal extent and contribution to the planetary albedo.
ii) ice shelves (floating bodies of ice attached to grounded ice sheets), which regulate the flow of grounded ice towards the ocean, in turn modulating sea level rise (Scambos et al., 2004; Joughin et al., 2010) and the formation of Antarctic Bottom Water.
iii) icebergs (ice calved from ice sheets), which melt, representing an important component of the ice sheet mass balance, and influence the marine stratification.

Sea ice (SI3)
Sea ice is represented using the "Sea Ice modelling Integrated Initiative" (SI3) module. SI3 is a European collaboration to pool resources and develop a unified NEMO sea ice model for regional and global applications. This module specifically targets ice dynamics, thermodynamics, brine inclusions and subgrid-scale thickness variations, and is designed for an effective resolution of 10 km or coarser. SI3 is designed in close harmony with NEMO and ships with the NEMO codebase.
Three data categories are required: ice thickness, snow thickness and ice fraction (or ice "concentration"). These are, of course, also required for the initial conditions.

Sea ice at the boundaries depends on whether the ice flows into the regional domain or out of it. If the flow is strictly outward, then the ice at the boundary takes the characteristics of the closest grid point inside the domain. If the flow is inward, then boundary conditions are read from a BDY file.
Boundary data can be obtained from global simulations or reanalyses. Only three fields are required: ice fraction, and ice and snow thicknesses. Additional fields can be provided for better accuracy: ice and snow temperature, surface temperature, ice salinity, and melt pond concentration and thickness. If not provided, these optional fields are set to constant values determined in the namelist. The treatment of temperature is a bit more subtle: for instance, ice and snow temperatures are derived from the surface temperature if the latter is the only field provided.
Boundary data can be of single- or multiple-category type. The ice code distributes data into the number of categories defined in the namelist (jpl), assuming that the ice thickness distribution follows a Gamma function, as in Abraham et al. (2015). It can also distribute any number (N) of categories into jpl categories, the only restriction being that N and jpl can only be equal if the ice thickness distribution is the same between the parent and child models (set in the namelist block namitd).
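For intuition, the sketch below illustrates the idea of spreading a single mean ice thickness over jpl categories using a Gamma-shaped thickness distribution. The shape parameter, category bounds and per-category thickness estimate are assumptions for illustration only and do not reproduce the exact SI3 algorithm (which follows Abraham et al., 2015).

```python
import numpy as np
from scipy.stats import gamma

def distribute_into_categories(mean_thickness_m, bounds_m, shape=2.0):
    """Split a mean ice thickness into per-category area fractions and rough
    per-category thicknesses, assuming a Gamma-distributed thickness.
    'bounds_m' are the jpl+1 category boundaries; 'shape' is an assumption."""
    scale = mean_thickness_m / shape           # so the distribution mean matches
    dist = gamma(a=shape, scale=scale)
    lo, hi = np.asarray(bounds_m[:-1]), np.asarray(bounds_m[1:])
    area_frac = dist.cdf(hi) - dist.cdf(lo)    # fraction of ice area per category
    mid = 0.5 * (lo + hi)                      # crude per-category thickness
    return area_frac, mid

bounds = [0.0, 0.45, 1.13, 2.14, 3.67, 99.0]   # illustrative 5-category bounds
print(distribute_into_categories(1.2, bounds))
```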
On integration, the boundary fields can be set to the initial field or allowed to vary, if the data exist. Only the flow relaxation scheme (FRS) boundary condition is implemented for the tracer fields, but you can use "None" if there is no sea ice at the boundary.

At resolutions of order 10km and coarser, the continuum sea-ice model (like SI3) assumes that a grid cell contains representative samples of different ice floes and features. At finer resolutions this assumption is weakened, and though numerically stable, the solutions at the grid scale could be less realistic.

Ice shelves and cavities
Ice shelves, their cavities, and the interactions with the ocean can be represented with a range of configuration design options.

Important processes to be considered at the design stage include: i) the role of basal melt from the underside of the shelf on the ocean: whether this freshwater source needs to be resolved or whether it can be specified (coming from observations or another model); ii) circulation under the cavity: whether this needs to be resolved in order to improve watermass properties (Mathiot et al., 2017), tidal processes or melt rates; iii) evolution of cavity geometry: whether the ice sheet and cavity geometry need to evolve with time (e.g. Smith et al., 2021). These considerations influence the implementation complexity and potentially the model stability.
In particular we focus on the representation of the ice shelf cavities, which can be open (and explicitly represented) or closed (and therefore parameterized).
If the cavity is closed, an ice melt flux can be specified. Values can be sourced from an observational study (such as Rignot et al. (2013), Depoorter et al. (2013) or Adusumilli et al. (2020)) or from an ice shelf simulation. Ideally the basal melt flux is distributed from the base of the floating ice shelf to the grounding bathymetry. This depth range can be extracted from an ice shelf draft data set (e.g. BedMachine; Morlighem et al., 2020).
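As a minimal sketch of the distribution step just described, the following spreads a prescribed total basal melt uniformly over the model levels lying between the ice-shelf draft and the grounding depth. The uniform vertical distribution, the unit conversion and all names and values are illustrative assumptions; a real implementation would also spread the flux horizontally along the closed ice front.

```python
import numpy as np

def melt_flux_per_level(total_melt_gt_per_yr, level_depths_m, draft_m, grounding_m):
    """Spread a total basal melt (Gt/yr) uniformly across the model levels lying
    between the ice-shelf draft and the grounding depth; returns kg/s per level.
    The uniform vertical distribution is an assumption."""
    seconds_per_year = 365.25 * 24 * 3600
    total_kg_s = total_melt_gt_per_yr * 1e12 / seconds_per_year
    depths = np.asarray(level_depths_m, dtype=float)
    in_range = (depths >= draft_m) & (depths <= grounding_m)
    flux = np.zeros_like(depths)
    if in_range.any():
        flux[in_range] = total_kg_s / in_range.sum()
    return flux

levels = np.arange(25.0, 1500.0, 50.0)     # illustrative level depths (m)
# e.g. 150 Gt/yr of melt spread between 300 m draft and 1100 m grounding depth:
print(melt_flux_per_level(150.0, levels, 300.0, 1100.0).sum())   # ~4.8e6 kg/s
```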
Iceberg climatology: An iceberg climatology is easy to use and applicable to any regional configuration. It simply consists of a file that describes the iceberg melt for every model point. Usually, such a file is built by extracting climatological, seasonal or interannual outputs from a global simulation that uses a Lagrangian iceberg model, such as GO6 (Storkey et al., 2018).

Lagrangian iceberg model: Lagrangian icebergs are not compatible with regional configurations in general because they are not allowed to enter or exit across the boundaries. This being said, Lagrangian icebergs can be relevant for configurations that cover a large enough area that the icebergs melt completely before reaching the edge of the domain (e.g. a circum-Antarctic simulation). In order to force the Lagrangian iceberg model, the only forcing needed is a map of calving rates in km³ of ice per year (the ice density being fixed in the model namelist). To build this map, you need integrated calving rates per ice shelf, as provided by Rignot et al. (2013) for example, plus a pattern for calving. Multiple solutions are possible to inform the spatial map of calving: all at one point per ice shelf; randomly distributed along the ice shelf front; or using output from an ice sheet model. A tool to distribute icebergs randomly is available as part of the NEMO CDFTOOLS toolbox 72. Finally, the user can easily define the number of categories, the size of each iceberg category and how to distribute the calved mass across the categories in the model namelist. The sensitivity of results to these choices is described in Stern et al. (2016). In addition to this, it is worth keeping in mind the two major limitations of the Lagrangian iceberg model in the current NEMO version: icebergs are 'virtual', therefore they cannot serve as anchor points for land-fast ice (Li et al., 2020), and there is no fragmentation scheme, so the very large tabular icebergs do not break up and thus have too long a lifetime (Eng, 2020).
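The sketch below illustrates the "randomly distributed along the ice shelf front" option for building a calving map: a per-shelf integrated calving rate is spread over that shelf's ice-front grid points with random weights. The CDFTOOLS utility mentioned above is the maintained route for this; the function name, grid and values here are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def random_calving_map(total_calving_km3_yr, front_points_ij, grid_shape):
    """Distribute an ice shelf's integrated calving rate (km^3 ice / yr) randomly
    across its ice-front grid points; returns a 2D map on the model grid."""
    calving = np.zeros(grid_shape)
    weights = rng.random(len(front_points_ij))
    weights /= weights.sum()                    # weights sum to 1
    for (j, i), w in zip(front_points_ij, weights):
        calving[j, i] += w * total_calving_km3_yr
    return calving

# Illustrative: 10 ice-front points on a tiny grid, 50 km^3/yr of calving.
front = [(5, i) for i in range(10, 20)]
cmap = random_calving_map(50.0, front, (50, 100))
print(cmap.sum())   # 50.0 (the integrated rate is preserved)
```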

Ice processes demonstrator
To explore a NEMO configuration with a mix of parameterized and explicit ice shelves, tides, sea ice and an iceberg climatology, a Weddell Sea worked example 73 is given in the form of a demonstrator. This is adapted from the WED025 configuration of Bull et al. (2021). The "expected results" section illustrates the melt rate and barotropic streamfunction under the Filchner-Ronne Ice Shelf in a 1/4° simulation. In this demonstrator the iceberg melt climatology is specified as an additional runoff source. There is no demonstrator for the Lagrangian iceberg module. For reference, please see the description of the global GO6 simulation setup (Storkey et al., 2018).

Discussion and Conclusions
Following decades of groundwork development and HPC evolution, complex-system regional ocean modelling is in the ascendancy. This paper has two parts. In the first part, the principles set out are emergent from current working practices and are presented in order to make the current level of endeavour sustainable. There are three main challenges: 1) computational oceanography is increasingly looking towards computer science to advance its capability on diverse computer architectures (Porter et al., 2018), yet the mental bandwidth of an individual human is approximately constant; 2) ocean configurations are increasingly complex and hard to reproduce; 3) the research community does not have a mechanism for, or is not used to, valuing technical outputs that are contributions to the community.
We advocate a stronger implementation of reproducible practices and make the case that, through systematic workflows and standardised assessments, skills can be democratised, debugging can be accelerated and, in general, more time can be directed to scientific questions. Containerisation could be a means to achieve some of these goals.
In this paper we have made a distinction between making a configuration reproducible (e.g. by a machine) and making the workflow for generating a new configuration reproducible by a machine. This is because, in our scientific applications, we invariably are pushing frontiers in some aspect of the configuration building process, e.g. with new process representation, grid structure or machine architecture. So it is desirable to accelerate the process of getting to the point of departure from standard, but not the whole process of configuration building. On the other hand, the emerging concept of digital environmental science (purpose-driven systems of simulations targeted at societally relevant questions) is an additional motivation for documenting regional ocean configurations that are readily reproducible, ideally by a machine.
In this paper we have presented the case for standardised assessment, built on common tools and common diagnostics. This is motivated by the increase in data volume that now routinely requires remote, and potentially parallel, processing close to the data. In these situations the availability of proprietary software licences can not be assured, and so we recommend developing Python tools that are i) freely available and ii) able to abstract the computer science from the science by effective use of open-source packages like Dask and Xarray. We also recommend common diagnostics. Motivations here are: 1) full transparency of model performance (ideally these outputs would be made available with a configuration, whereas a paper might typically focus on more positive aspects); 2) accelerating configuration development by having a battery of targeted tests that can be run in script form; and 3) redirecting duplicated effort (though there is great value in writing and understanding your own code) into more powerful shared tools.
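As a minimal sketch of the style of analysis recommended here, the snippet below opens a multi-file model dataset lazily with xarray and Dask and computes a simple diagnostic without loading everything into memory. The file pattern, chunk sizes and variable names are placeholders, not a prescription.

```python
import xarray as xr

# Lazily open a multi-file dataset; Dask chunks mean nothing is read until needed.
ds = xr.open_mfdataset("EXP_*_grid_T.nc", chunks={"time_counter": 12})

# Example diagnostic: domain-mean SST time series, computed in parallel by Dask.
sst_mean = ds["sst"].mean(dim=["y", "x"])
sst_mean = sst_mean.compute()            # triggers the (parallel) computation
sst_mean.to_netcdf("sst_domain_mean.nc")
```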
In this paper we highlight the need to recognise and value non-traditional science outputs that are increasingly an essential part of the science we do. Esteem in the traditional academic structure is gained through peer-reviewed outputs and proposal successes, but we increasingly rely on open-source community contributions to achieve these (e.g. model source code bug fixes, enhancements and user guides or software stacks). It is our consideration that these contributions should be "cv-able" and, perhaps more challenging, that established scientists in positions of recruitment would expect to see such contributions on CVs.
In the second part of this paper, we amass collective wisdom from across a swathe of the community to offer insights on the various elements that go into building a new regional ocean configuration. These insights are targeted at new starters, with the aim of passing on understanding rather than just stating what is usually done. We have distilled the advice as much as possible and abstracted details to stand-alone repositories, thereby making the advice herein more general and long-lasting. In this part we go through the build process for a realistic-geometry physics configuration; then, in the additional modules, we introduce advice for biogeochemistry, waves and ice processes. We do not present guidance as an instruction manual but instead aim to share the fundamentals, which should equip future modellers to make strong design choices and accelerate debugging.
Code and data availability. This work was underpinned by experiences building a number of regional NEMO configurations. Those specifically cited in this work include: - Workshop material featuring containerisation of an idealised NEMO regional model of Belize, to demonstrate the principles of running an ocean model and diagnosing its output (https://doi.org/10.5281/zenodo.6417227, with recipes and code), Mayorga Adame et al. (2022).