Towards Multiscale Modeling of Ocean Surface Turbulent Mixing Using Coupled MPAS-Ocean v6.3 and PALM v5.0

A multiscale modeling approach for studying the ocean surface turbulent mixing is explored by coupling an ocean general circulation model (GCM), MPAS-Ocean, with the PArallel Large eddy simulation Model (PALM). The coupling approach is similar to the superparameterization approach that has been used mostly to represent the effects of deep convection in atmospheric GCMs. However, since the focus of this multiscale modeling approach is on the small-scale turbulent mixing processes and their interactions with the larger-scale processes in the ocean, a high-fidelity, three-dimensional large eddy simulation (LES) model is used, in contrast to the simplified process-resolving models with reduced physics or reduced dimension commonly used in the superparameterization literature, so that a more flexible coupling strategy is used. To reduce the computational cost, a customized version of PALM is ported to the general-purpose graphics processing unit (GPU) with OpenACC, achieving a 10-16 times overall speedup as compared to running on a single CPU. Even with the GPU-acceleration technique, a superparameterization-like approach to represent the ocean surface turbulent mixing in GCMs using embedded high-fidelity and three-dimensional LES over the global ocean is still computationally intensive and infeasible for long simulations. However, running PALM regionally on selected MPAS-Ocean grid cells is shown to be a promising approach moving forward.
The flexible coupling between MPAS-Ocean and PALM outlined here allows further exploration of the interactions between the ocean surface turbulent mixing and larger-scale processes, as well as future development and improvement of ocean surface turbulent mixing parameterizations in GCMs.

coarse-resolution ocean GCM has been explored by Campin et al. (2011). They show that the superparameterization approach can capture many of the important features in a non-hydrostatic high-resolution simulation of an idealized open-ocean deep convection with much less computational cost. For the open-ocean deep convection problem, the somewhat horizontally axisymmetric features and the relatively large vertical scale of the convective plumes justify the use of a plume-resolving model on a two-dimensional slice with the same vertical grid as the coarse-resolution GCM in Campin et al. (2011), similar to the application of superparameterization to the cloud problem by Grabowski and Smolarkiewicz (1999) and Khairoutdinov and Randall (2001). However, three-dimensional process-resolving models may be required to faithfully represent other small-scale processes in the OSBL. Essentially, a separation of scales is often assumed and the exploration of the parameter space is often restricted to a certain regime. In reality, processes at different scales interact with each other and the parameter space is vast. Li et al. (2019) demonstrate a way to consistently evaluate different OSBL vertical mixing schemes over a variety of realistic surface forcing and background stratification conditions. It can potentially also be used to guide a systematic exploration of the parameter space using LES. However, such an approach ignores the interactions between the OSBL turbulence and larger-scale processes, such as submesoscale fronts and eddies, which have been shown to be important (e.g., Hamlington et al., 2014; Bachman and Taylor, 2016; Fan et al., 2018; Verma et al., 2019; Sullivan and McWilliams, 2019).
To study the interactions between the OSBL turbulence and larger-scale processes, simulations that resolve all the important processes across multiple scales are necessary. But they can be extremely computationally expensive. For example, an LES that simultaneously resolves the OSBL turbulence and permits some submesoscale features requires a grid size of O(1 m) and a domain size of O(10 km) in the horizontal (e.g., Hamlington et al., 2014; Verma et al., 2019; Sullivan and McWilliams, 2019). Alternatively, the impact of large-scale processes on the small-scale OSBL turbulence can be studied by applying an externally determined large-scale lateral buoyancy gradient in a much smaller LES domain (e.g., Bachman and Taylor, 2016; Fan et al., 2018), though at the expense of missing the feedback to the large scales. In this sense, a superparameterized GCM may provide an intermediate approach to systematically examine the impact of large-scale processes (resolved in the GCM) on the small-scale OSBL turbulence (in LES) and vice versa, with a loose but controllable coupling between the two.
Note that so far we have not explicitly defined the scale of the large-scale processes, other than that it is larger than the OSBL turbulence. Therefore, we may use a high-resolution regional configuration or a low-resolution global configuration in the coupled framework depending on the problem to be addressed, allowing some flexibility in choosing the coupling strategy. The goal here is to better understand the processes that affect OSBL turbulent mixing in addition to the surface forcing and background stratification, which will promote further improvements of the conventional OSBL parameterizations.
Simulating processes across scales is a major goal of MPAS-Ocean, with its ability to regionally refine the resolution using an unstructured horizontal grid. This allows seamless, regionally focused high-resolution simulations, e.g., in coastal regions, in a large background environment without being nested inside a coarse-resolution simulation. Possible applications moving forward are discussed in Section 4. This paper ends with a brief summary and the main conclusions in Section 5.

2 Methods

2.1 Coupling MPAS-Ocean with PALM

As an illustration of the essential elements of coupling across scales, here we use a standard Reynolds decomposition to separate the dynamics into a large scale and a small scale. The large-scale and small-scale dynamics are then solved by different sets of equations. This is an implementation of the heterogeneous multiscale method (E et al., 2003, 2007). A broader review of multiscale modeling approaches in geophysical fluid applications can be found in Grooms and Julien (2018).
The equations governing the small-scale fluctuations are derived by subtracting the large-scale equations from the equations for the total fields.
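The subtraction described above can be written schematically. The manuscript's numbered equations are not reproduced in this excerpt, so the following is a generic sketch for a field φ (a momentum component or a tracer), with the overline denoting the Reynolds (large-scale) average and the prime the small-scale fluctuation:

```latex
% Reynolds decomposition of a generic field
\phi = \overline{\phi} + \phi'
% Large-scale equation: the eddy-flux convergence appears as a
% small-scale forcing term on the averaged dynamics
\partial_t \overline{\phi} + \overline{\mathbf{u}}\cdot\nabla\overline{\phi}
  = -\nabla\cdot\overline{\mathbf{u}'\phi'} + \overline{F}
% Subtracting it from the full equation yields the fluctuation equation,
% in which the large-scale gradient acts as a forcing on the small scale
\partial_t \phi' + \overline{\mathbf{u}}\cdot\nabla\phi'
  + \mathbf{u}'\cdot\nabla\overline{\phi}
  + \nabla\cdot\left(\mathbf{u}'\phi' - \overline{\mathbf{u}'\phi'}\right)
  = F'
```

The two coupling directions are visible here: the small scale forces the large scale through the eddy-flux convergence, and the large scale forces the small scale through its gradients.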

Different tightness in the coupling is achieved by different combinations of the forcing terms F^{SS}_{u_h}, F^{SS}_{θ}, F^{LS}_{u}, and F^{LS}_{θ}, as detailed in the following. The large-scale dynamics are usually solved by GCMs. At the scales that GCMs typically resolve, the hydrostatic approximation (e.g., ∂_t w ≪ b) is assumed, so that the w equation reduces to the hydrostatic balance. For F^{SS}_{u_h} and F^{SS}_{θ}, the horizontal and vertical components are often parameterized separately in GCMs due to the large aspect ratio of the grid cell (the horizontal extent of a cell over the vertical). Each component further consists of contributions from different processes. The traditional superparameterization approach estimates the vertical components of F^{SS}_{u} and F^{SS}_{θ} from a process-resolving model that solves for u and θ. It replaces the conventional vertical mixing parameterization, but essentially retains the same scale separation and conditional equilibration assumptions. The horizontal components of F^{SS}_{u} and F^{SS}_{θ} are assumed to result from large-scale processes that are decoupled from the small-scale processes solved by the embedded process-resolving model with a small, horizontally homogeneous domain. Such large-scale processes are represented by other parameterizations in GCMs. The point approximation approach proposed by Grooms and Majda (2013) allows estimating the horizontal components of F^{SS}_{u} and F^{SS}_{θ} from the embedded small-scale dynamics in addition to the vertical. However, the relevance of the horizontal components depends on the spatial and temporal scales of the dynamics governing the u and θ fields.
Separate parameterizations of different processes are commonly assumed in GCMs, such as for mesoscale eddies, which contribute mostly to the horizontal fluxes, and for submesoscale eddies and boundary layer turbulence, both of which contribute mostly to the vertical fluxes. Thus, scale separation of the dynamics among multiple scales may be necessary. For the purpose here of representing the ocean surface vertical turbulent mixing in GCMs, we only explicitly consider the vertical components of F^{SS}_{u} and F^{SS}_{θ}, which can be estimated from the vertical convergence of the momentum and tracer fluxes in the embedded domain. The large-scale forcing terms can then be written accordingly, where w = 0 is assumed in the embedded domain.
Alternatively, one can solve the small-scale dynamics by solving for the total momentum u and tracer θ in the embedded domain, and enforce the consistency of the horizontally averaged momentum and tracer with the MPAS-Ocean fields locally at each coupling stage, where the average is taken over the embedded domain. Writing both sets of equations on the local embedded domain and adding them together, noting that ∇_h φ̄ = 0, where φ represents either momentum or tracers, we get the equations for the total fields, where ∇ = (∇_h, ∂_z) and the hydrostatic balance is used. Ignoring the large-scale lateral gradient terms F^{LS}_{u} and F^{LS}_{θ}, these equations reduce to those in the traditional superparameterization approach. The constraints are enforced either by applying a nudging term (e.g., Khairoutdinov et al., 2005) or by mapping the horizontally averaged fields to the MPAS-Ocean fields to keep them consistent at the beginning of each coupling step (e.g., Campin et al., 2011). In the superparameterization approach, the embedded domain does not necessarily fill the large-scale GCM grid cell over which the spatial average replaces the Reynolds average. However, to allow the large-scale forcing terms F^{LS}_{u} and F^{LS}_{θ}, appropriate estimates of the lateral gradients have to be made. The relevance of these terms therefore depends on the ratio of the large and small scales of interest. See Section 4 for more discussion on this issue.
As a first step towards the multiscale modeling framework outlined above, MPAS-Ocean is coupled with PALM using a simple approach similar to the traditional superparameterization, with a few important exceptions for enhanced flexibility.
Although this remapping method is conservative, monotonic and highly accurate, loss of information is unavoidable, especially when remapping from high resolution to low resolution. When remapping the PALM fields to the MPAS-Ocean grid, the averaged effect of the high-resolution PALM is applied to the low-resolution MPAS-Ocean, which is what we want. But when applying the large-scale forcing from the low-resolution MPAS-Ocean to the high-resolution PALM fields, a simple relaxing to the MPAS-Ocean fields will cause loss of information, e.g., near the base of the mixed layer where the gradients are strong. To reduce such loss of information due to remapping, both the large-scale and small-scale forcings are evaluated on the original vertical grid and then remapped to the targeted vertical grid. In this way, the relatively sharp vertical gradients in the PALM fields that are not resolved in MPAS-Ocean are preserved.
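The order of operations above (evaluate the forcing on the native grid, then remap) can be sketched numerically. This is a minimal illustration, not the actual remapping code; the grid spacings and the forcing profile are hypothetical, and uniform cells are used so the conservative remap reduces to a block mean:

```python
import numpy as np

# Hypothetical vertical grids: PALM at 1 m spacing, MPAS-Ocean at 10 m.
nz_fine, ratio = 100, 10
z_fine = np.arange(nz_fine) + 0.5          # fine cell centers (m), dz = 1 m

# A tendency with a sharp peak near the mixed layer base (~50 m),
# representative of the strong gradients PALM resolves.
forcing_fine = np.exp(-0.5 * ((z_fine - 50.0) / 2.0) ** 2)

# Evaluate on the native (fine) grid first, then remap conservatively
# onto the coarse grid; with uniform cells this is a block average.
forcing_coarse = forcing_fine.reshape(-1, ratio).mean(axis=1)

# The column integral (the net effect on MPAS-Ocean) is preserved.
integral_fine = forcing_fine.sum() * 1.0       # dz = 1 m
integral_coarse = forcing_coarse.sum() * 10.0  # dz = 10 m
```

Averaging the sharp profile only at the final step keeps the column-integrated forcing intact while accepting that the coarse grid cannot represent the peak itself.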

Instead of initializing a PALM instance at each coupling step and running it to quasi-equilibrium, we initialize PALM at the beginning of the MPAS-Ocean simulation (by calling the initialization subroutine) and let it run throughout the simulation (by repeatedly calling the time-stepping subroutine to step forward). See Fig. 1 for an illustration of MPAS-Ocean and PALM running in parallel. This is intrinsically different from using LES as a replacement for the conventional parameterizations (thus the term superparameterization), in which an equilibration of the turbulence to the changing external forcing is often assumed.
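The control flow just described can be sketched as follows. This is a hypothetical skeleton, not the actual MPAS-Ocean or PALM interface: all class and function names are illustrative, and the "LES state" is a single number standing in for the full 3-D fields. The point is that each PALM instance is initialized once and stepped repeatedly, carrying its turbulence state between coupling steps:

```python
# Hypothetical sketch of the coupling loop: PALM instances persist
# across MPAS-Ocean time steps instead of being re-spun-up each time.

class PalmInstance:
    def __init__(self, cell_id):
        self.cell_id = cell_id
        self.state = 0.0          # stand-in for the 3-D LES state
        self.initialized = True   # init subroutine called exactly once

    def step(self, large_scale_forcing, dt):
        # Stand-in for repeatedly calling the time-stepping subroutine;
        # the state persists, allowing disequilibrium with the forcing.
        self.state += dt * (large_scale_forcing - 0.1 * self.state)
        return 0.1 * self.state   # small-scale forcing returned to the GCM

def run_coupled(n_steps, dt, palm_cells):
    instances = {c: PalmInstance(c) for c in palm_cells}  # initialize once
    small_scale = {}
    for _ in range(n_steps):                              # GCM time loop
        for c, palm in instances.items():
            forcing = 1.0  # would come from MPAS-Ocean fields at cell c
            small_scale[c] = palm.step(forcing, dt)
    return instances, small_scale

instances, small_scale = run_coupled(n_steps=5, dt=1.0, palm_cells=[3, 7])
```

Because the state is never reset, the turbulence in each instance can lag a rapidly changing forcing, which is exactly the disequilibrium behavior the persistent-instance design is meant to permit.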

The equilibrium assumption may be valid if the time step of the coarse-resolution model is long enough that the turbulence can adjust to the changes in the external forcing within it. However, as the time steps of ocean GCMs become shorter and shorter (half an hour for a common global MPAS-Ocean simulation, but much shorter for regional simulations), the equilibrium assumption may no longer be valid. The coupling approach here allows for disequilibrium of the turbulence when the forcing conditions change rapidly.

As in KPP, PALM is run at the center of an MPAS-Ocean grid cell. A mask variable is introduced to allow PALM to run only on selected MPAS-Ocean grid cells, while KPP is used for the other grid cells following Van Roekel et al. (2018). However, this also means that the PALM cells are essentially coupled with the neighboring KPP cells. When the momentum of MPAS-Ocean and PALM are tightly coupled, e.g., through a relaxing term with a short time scale, the solution of neighboring KPP cells will influence the solution in PALM, which is undesirable. Using a relatively longer relaxation time scale of a few hours alleviates this problem. We will return to this issue in Section 3 using an idealized diurnal heating and cooling case in the single column MPAS-Ocean setup.
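The role of the relaxation time scale can be illustrated with the scalar relaxation tendency dφ/dt = (φ_target − φ)/τ. The velocities, coupling step, and time scales below are hypothetical, chosen only to show why a short τ drags PALM toward the KPP-influenced MPAS-Ocean solution while a few-hour τ leaves PALM's own solution largely intact:

```python
def relax(phi, phi_target, tau, dt, n_steps):
    """Integrate the relaxation tendency d(phi)/dt = (phi_target - phi)/tau."""
    for _ in range(n_steps):
        phi = phi + dt * (phi_target - phi) / tau
    return phi

u_palm, u_mpas = 0.10, 0.20   # hypothetical PALM and MPAS-Ocean velocities (m/s)
dt = 30.0                     # hypothetical coupling step (s)
hour = 3600.0

# Short time scale (tight coupling): within an hour of coupling steps,
# PALM is pulled almost entirely onto the MPAS-Ocean (KPP-influenced) value.
u_tight = relax(u_palm, u_mpas, tau=0.1 * hour, dt=dt, n_steps=120)

# Few-hour time scale (loose coupling): PALM's own solution mostly survives.
u_loose = relax(u_palm, u_mpas, tau=5.0 * hour, dt=dt, n_steps=120)
```

After one simulated hour, `u_tight` has essentially converged to `u_mpas`, while `u_loose` has moved less than a fifth of the way there, which is the behavior exploited when choosing a relaxation time scale of a few hours.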
Running PALM on only selected MPAS-Ocean grid cells may cause a load imbalance, as the grid columns of MPAS-Ocean with PALM running will be much slower than those without. This problem can be alleviated by running PALM on GPU (see the next section) and carefully designing the mesh decomposition to balance the CPU jobs and GPU jobs on each computational node.
Significant modification and reduction of the PALM code (version 5.0) were conducted before porting it to GPU. In particular, all the special treatment of complex topography and surface types was removed, along with the atmospheric variables, all of which are irrelevant to our OSBL application here. PALM provides MPAS-Ocean the fluxes due to turbulent mixing, whose vertical divergences are used in MPAS-Ocean. MPAS-Ocean provides PALM the surface forcing and the large-scale forcing, which are relaxing terms here, but can be extended to include the lateral gradients. We also removed the option to use different numerical schemes and hard-coded it to use the 5th and 6th order advection schemes of Wicker and Skamarock (2002) and the 3rd order Runge-Kutta time-stepping scheme. We used an external fast Fourier transform library (FFTW) for the pressure solver (an option in PALM) in the CPU version for benchmarking. These steps significantly reduced the amount of effort required to port PALM to GPU.
This customized version of PALM was ported to GPU using the OpenACC directive-based parallel programming model (openacc.org) and the NVIDIA CUDA Fast Fourier Transform library (cuFFT, developer.nvidia.com/cufft). Porting PALM involved two iterative steps: (1) parallelizing the loops and (2) optimizing data locality. In the first step, we wrapped all the loops in the time integration subroutine with the OpenACC "parallel" directive, which allows for automatic parallelization of loops. Some loops, especially in the tridiagonal solver and the turbulence closure subroutines, were restructured to ensure independence between iterations.
A lower degree of parallelism was employed for loops that cannot easily be restructured to remove the dependency between iterations. This step alone generally slows down the code because a large amount of data exchange between the CPU and the GPU occurs. Therefore, optimizing data locality is required. This was done by enlarging the data regions of the parallelized loops on the GPU. Then we moved on to parallelize another segment of the code and iterated between the two steps.
In the process of porting the code, some loops are executed on the CPU and some on the GPU, so extreme caution is required to make sure the data are synchronized between the CPU and the GPU when necessary. Eventually, the data region on the GPU is large enough to cover the entire time integration subroutine. This reduces the data exchange between the CPU and the GPU to mostly the beginning and the end of the time integration, which occurs once per MPAS-Ocean time step.

The speedup of porting PALM to GPU was benchmarked by running the standalone PALM with and without GPU on two machines: (1) a Linux workstation with an Intel Xeon Silver 4112 CPU @ 2.60 GHz and an NVIDIA Quadro RTX 4000 GPU; and (2) the High Performance Computing system at Oak Ridge National Laboratory (Summit), with 2 IBM POWER9 CPUs and 6 NVIDIA Tesla V100 GPUs on each node. For the benchmarking on both machines, only 1 CPU and 1 GPU were used, all with 1 MPI task. Fig. 3 shows the speedup in total run time and in the three most time-consuming subroutines. The speedup factor is defined as the run time on 1 CPU divided by the run time on 1 CPU and 1 GPU. Overall, a speedup of 10-16 times is achieved.
3 Evaluation

Here we evaluate the coupled MPAS-Ocean and PALM using two idealized test cases. The first and simplest is a single column configuration, which was used throughout the development. The goal of this configuration is to test the one-way coupling from the small-scale to the large-scale dynamics by estimating the small-scale forcing terms from PALM. This also allows a test of the remapping between the vertical grids of MPAS-Ocean and PALM.
The second test case is a simulation of mixed layer eddies in MPAS-Ocean with PALM running on multiple MPAS-Ocean grid cells. The goal of this test case is twofold. First, this case allows us to test the capability of running multiple PALM instances within MPAS-Ocean and distributing different PALM instances on multiple GPUs. Second, this case provides a way to test the two-way coupling between MPAS-Ocean and PALM under more complex and realistic conditions.
In particular, the development of mixed layer eddies in this case introduces spatial heterogeneity and large-scale forcing from advection and lateral mixing, though baroclinic instability in PALM is excluded as a result of the missing lateral gradients.

3.1 Single column test

MPAS-Ocean allows for a minimum of 16 columns in a "single column" mode, in which the 16 columns are forced by the same surface forcing and are essentially identical to each other due to the lack of lateral processes. Therefore, ideally, running PALM in the single column MPAS-Ocean is essentially the same as running the standalone PALM, as there should be no large-scale forcing terms. In practice, this is true only if PALM is running in all 16 columns. If PALM is only running on one of these columns while KPP is used in the other columns, as illustrated in Fig. 2, different solutions between PALM and KPP under the same forcing conditions will result in some spurious large-scale forcing to PALM. This is especially the case for the momentum, since the momentum is solved on cell edges in MPAS-Ocean, i.e., affected by the momentum fluxes from both the PALM cell and the KPP cells. This 16-cell "single column" MPAS-Ocean configuration allows us to assess the impact of this issue and explore possible remedies. The surface forcing is an idealized diurnal heating and cooling with a constant wind stress of 0.1 N/m2. An idealized diurnal cycle of the solar radiation F_s is used,

where F_0 = 500 W/m2 is the maximum solar radiation at noon and t is the time of day in seconds. Absorption of solar radiation in the water column is computed using the two-band approximation of Paulson and Simpson (1977), assuming Jerlov water type IB. A constant surface cooling of -159.15 W/m2 is applied at the surface, balancing the solar heating when integrated over a day. The Coriolis parameter is set to 1.028×10^-4 s^-1, equivalent to the value at a latitude of 45°N. This test case yields an inertial oscillation in the velocities with a period of about 17 hours, in addition to the diurnal heating and cooling.
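The stated numbers can be cross-checked. The equation for F_s is not reproduced in this excerpt, so the half-cosine form below is an assumption; it is, however, consistent with the stated balance, since its daily mean is F_0/π ≈ 159.15 W/m2, matching the constant cooling. The inertial period follows directly from the Coriolis parameter:

```python
import numpy as np

F0 = 500.0                      # W/m^2, maximum solar radiation at noon
day = 86400.0                   # s

# Assumed functional form (the manuscript's equation is not shown here):
# a half-cosine diurnal cycle peaking at noon, zero at night.
t = np.arange(day)              # time of day, 1 s samples
Fs = np.maximum(F0 * np.cos(2.0 * np.pi * (t - day / 2.0) / day), 0.0)

# Daily mean is F0/pi ~= 159.15 W/m^2, balancing the -159.15 W/m^2 cooling.
daily_mean = Fs.mean()

# Inertial period 2*pi/f for f = 1.028e-4 s^-1 (~45 degrees N): ~17 hours.
f = 1.028e-4
inertial_period_hours = 2.0 * np.pi / f / 3600.0
```

Both checks close: the daily-mean heating balances the prescribed cooling, and 2π/f is just under 17 hours, consistent with the oscillation period quoted in the text.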

The coupled MPAS-Ocean and PALM is tested by running PALM only at the sixth cell c_6 (see Fig. 2). At the beginning of each coupling step, the PALM fields are updated following φ* = φ_f − ⟨φ_f⟩ + φ_c, where φ* is the updated PALM field, φ_f the original, ⟨φ_f⟩ the horizontal average over the PALM domain, and φ_c the initial condition from MPAS-Ocean (see Fig. 1).
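The update implied by the symbols above replaces the horizontal mean of the PALM field with the MPAS-Ocean value while keeping the turbulent fluctuations intact. A minimal sketch under that assumption, with hypothetical field values at a single vertical level:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical PALM temperature slab at one vertical level (K) and the
# corresponding MPAS-Ocean cell value at that level.
phi_f = 290.0 + 0.05 * rng.standard_normal((128, 128))  # original PALM field
phi_c = 290.2                                           # MPAS-Ocean value

# Replace the horizontal mean of the PALM field with the MPAS-Ocean value,
# preserving the turbulent fluctuations:
phi_star = phi_f - phi_f.mean() + phi_c
```

After the update, the horizontal mean of `phi_star` equals the MPAS-Ocean value, and the deviation field is unchanged, so the LES turbulence is not disturbed by the mapping.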
We first compare the solutions of this test case in the standalone PALM and KPP in Fig. 4. It is clearly seen that in this test case KPP gives quite different solutions than the standalone PALM. In particular, KPP yields slightly stronger vertical mixing, indicated by a boundary layer that is deeper by a few meters. As a result, the simulated temperature is cooler throughout most of the water column. The coupled runs with different relaxation time scales are then compared (Fig. 5). In particular, although the momentum profiles appear to be slightly closer to the standalone PALM as the relaxing is stronger, the temperature starts to be affected by this strong coupling with neighboring cells. The surface temperature (upper 10 meters) gets warmer due to the diurnal heating and the layer immediately below gets slightly cooler as compared to the standalone PALM. This is probably an artifact due to the mismatching tendencies of warmer surface temperature from PALM and weaker velocity shear from KPP, making the surface layer in the embedded PALM more stable. The momentum near the base of the boundary layer is also strongly influenced by neighboring cells running KPP. Similar behavior is also seen in a pure shear-driven entrainment case (not shown).
3.2 Mixed layer eddy test

The setup of the mixed layer eddy test case is guided by similar simulations in both GCMs (Fox-Kemper et al., 2008) and LES models (Hamlington et al., 2014). However, since our focus is not on the mixed layer eddies themselves, we are not comparing the simulation results here with those in the literature. Instead, we focus on the coupling between MPAS-Ocean and PALM in the presence of some mixed layer eddies.
Here we run MPAS-Ocean with a two-front, or warm filament, setup on a domain of 72 km × 62.4 km × 150 m with 14400 cells and 50 vertical levels, corresponding to horizontal (distance between cell centers) and vertical grid sizes of 600 m and 3 m, respectively. The initial temperature field T_0 is uniform in the x-direction with two fronts in the y-direction, where T_b = 16 °C is the temperature at the base of the mixed layer, H = 50 m the mixed layer depth, N2_ml = 1.96×10^-7 s^-2 the stratification within the mixed layer and N2_int = 1.96×10^-5 s^-2 below, corresponding to ∂_z T = 10^-4 °C/m and ∂_z T = 10^-2 °C/m, respectively, using a linear equation of state with the thermal expansion coefficient α_T = 2×10^-4 °C^-1 and the gravity g = 9.81 m/s^2. The front width is L_f = 10 km, with the horizontal buoyancy gradient M2_f = 3.92×10^-8 s^-2 (∂_y T = 2×10^-5 °C/m). The initial front locations are y_1 = 15.6 km and y_2 = 46.8 km. The salinity is constant at 35 psu.
The Coriolis parameter is f = 10^-4 s^-1. These parameters correspond to a Richardson number Ri = N2_ml f^2/M4_f = 1.28 in the frontal regions. The two fronts are initialized from rest (unbalanced with the horizontal temperature gradients) with small perturbations on the temperature fields to promote the development of mixed layer instability. The surface is forced by a constant wind stress of 0.1 N/m2 in the x-direction. Since the MPAS-Ocean simulation domain is doubly periodic in the horizontal, the two fronts move to the right of the wind direction due to Ekman transport. The result is a stable front, in which warmer surface water moves over the top of the cold water, and an unstable front, in which both Ekman-driven convective instability and baroclinic instability occur.
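The stated parameters are internally consistent and can be verified: with the linear equation of state, N2 = g α_T ∂_z T and M2 = g α_T ∂_y T, and the Richardson number follows from the formula above:

```python
g, alpha_T = 9.81, 2e-4          # m/s^2, 1/degC (linear equation of state)
f = 1e-4                         # s^-1, Coriolis parameter

# Stated temperature gradients in the mixed layer and across the front
dTdz_ml = 1e-4                   # degC/m
dTdy_f = 2e-5                    # degC/m

N2_ml = g * alpha_T * dTdz_ml    # buoyancy frequency squared, ~1.96e-7 s^-2
M2_f = g * alpha_T * dTdy_f      # lateral buoyancy gradient, ~3.92e-8 s^-2

# Richardson number in the frontal regions, Ri = N2_ml * f^2 / M2_f^2 ~ 1.28
Ri = N2_ml * f**2 / M2_f**2
```

The computed values reproduce the quoted N2_ml, M2_f, and Ri ≈ 1.28 to within rounding.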
The MPAS-Ocean simulation is first spun up with KPP for 15 days with a time step of 5 min, allowing the mixed layer eddies to develop. Then starting from day 16, three branch runs for 30 hours are conducted, which are the focus of the analysis here.
The first run continues to use KPP in all the cells. The second run uses PALM at 8 selected cells and KPP elsewhere, with a PALM domain of 160 m × 160 m × 120 m and 128 × 128 × 120 grid cells, and the same coupling strategy as in the single column test using relaxing time scales of τ_θ^LS = 0.5 h and τ_u^LS = 5 h. The third run is the same as the second (i.e., same initial condition, surface forcing and PALM cells), except there is no coupling between MPAS-Ocean and PALM (i.e., no exchange of large-scale and small-scale forcings). Comparing the former two runs shows the different responses in KPP and PALM to the same surface wind forcing and large-scale forcing due to mixed layer eddies, whereas comparing the latter two shows the impact of the large-scale forcing on the small-scale turbulence in PALM.

Fig. 7 shows a snapshot of the temperature and relative vorticity fields at the beginning of day 16. Note that the unstable front initially at y = 46.8 km has moved to around y = 25 km, whereas the stable front initially at y = 15.6 km has moved to around y = 45 km (out of and re-entering the domain) due to the Ekman transport in the negative y-direction and the doubly periodic domain. The locations of the eight MPAS-Ocean cells running PALM are marked and labeled for quick reference. For brevity, here we only present the results of four representative cells: cells 1 and 3, located on either side of the unstable front, where active mixed layer eddies are developing; and cells 7 and 8, located on either side of the stable front, with the latter strongly affected by the Ekman advection of surface warm water. Figs. 8-10 show the time evolution of the temperature and the zonal and meridional velocities at these four locations.
It is clearly seen that at cells 1 and 3, both the temperature and momentum are strongly affected by the large-scale forcing due to the mixed layer eddies (panels a, b, d and e in all three figures), which otherwise, under constant surface wind forcing and rotation, would develop strong inertial oscillations with a period of about 17 hours (panels c and f in Figs. 9 and 10). The similarity between KPP and PALM at these two locations suggests the dominance of large-scale forcing due to mixed layer eddies in the evolution of temperature and momentum, while small differences are still noticeable, suggesting the potential importance of boundary layer turbulence in feeding back to the evolution of mixed layer eddies. At cell 7, the large-scale forcing is relatively small.

Traditionally, an OSBL parameterization is often tuned against some forced LES without large-scale forcing, as in the uncoupled PALM case (right panels of Figs. 8-11), whereas a mixed layer eddy parameterization is tuned using GCMs with some OSBL parameterization, as in the KPP case (left panels). The results here suggest that we might need to explicitly consider the influence of the mixed layer eddies on the OSBL turbulence and vice versa when tuning the parameters of parameterizations. The coupled MPAS-Ocean and PALM provides a way to do this. It should be noted that here we are showing the differences, rather than quantifying the errors, between the runs with and without the coupling across scales. The latter would require a careful evaluation against an LES simulation of the interactions between mixed layer eddies and the OSBL turbulence (e.g., Hamlington et al., 2014).
We are also missing the effect of the large-scale lateral gradients on the dynamics of the small-scale turbulence in this test case, which will be discussed in Section 4.1.

4 Discussion
Traditionally, the primary goal of embedding a fine-resolution process-resolving model inside a coarse-resolution GCM is to improve the skill of the coarse-resolution GCM by improving the representation of small-scale processes (thus the term superparameterization). Alternatively, here we show that embedding a high-fidelity, three-dimensional LES in GCMs also provides a promising framework to systematically conduct process studies of turbulent mixing in the OSBL in the context of other larger-scale processes, such as submesoscale eddies and fronts. In the traditional case, the interest is in the large-scale GCM, so that the embedded fine-resolution model does not have to be accurate on its own merits, i.e., we only need to represent the most important effects of the small-scale processes on the large scales. This opens the door for various choices of the embedded small-scale model with reduced dimension or reduced physics, such as stochastic models (e.g., Grooms and Majda, 2013) or machine learning models (e.g., Brenowitz and Bretherton, 2018; O'Gorman and Dwyer, 2018). In our case, however, the interest is also in the small-scale processes. Therefore, a high-fidelity process-resolving model such as an LES is necessary. One advantage of applying the coupled MPAS-Ocean and PALM to focused process studies, as compared to the traditional approach of forced LES, is that the small-scale LES is essentially coupled, though rather loosely, with the large-scale GCM.
The effect of large-scale processes, such as advection and lateral mixing, is naturally accounted for at different levels of completeness depending on the coupling strategy. We can therefore use different coupling strategies to study the interactions between the large-scale processes and the small-scale turbulent mixing. Additionally, this approach is much more computationally efficient than an LES on a domain large enough to resolve all the important processes, which is often required to study the coupling across scales (e.g., Hamlington et al., 2014; Sullivan and McWilliams, 2019).
4.1 Remaining issues

The coupling between MPAS-Ocean and PALM presented in Section 2.1 currently excludes the effect of lateral gradients of large-scale quantities. Large-scale lateral gradients are needed in the small-scale dynamics to allow processes like baroclinic instability (e.g., Bachman and Taylor, 2016) and are essential for the energy transfer from the large scale to the small scale. In the superparameterization literature, as well as the conventional OSBL parameterization literature, it is often assumed that there is a scale separation between the large-scale processes simulated by the GCMs and the small-scale turbulent processes simulated by the embedded process-resolving model or represented by the parameterization schemes. As the horizontal resolution of the GCMs increases and the scales of the resolved processes and the small-scale turbulent processes get closer, this assumption may start to break down. The multiscale modeling approach presented here can be extended to incorporate such effects by allowing the large-scale lateral gradient forcing terms. A similar approach has been suggested in, e.g., Grooms and Julien (2018) for multiscale modeling, as well as applied in LES studies of frontal regions (e.g., Bachman and Taylor, 2016; Fan et al., 2018). The tricky part is estimating the lateral gradients of large-scale quantities, which strongly depend on the horizontal resolution of the GCMs. Therefore, great caution has to be exercised when interpreting the resolved lateral gradients in coarse-resolution GCMs. In addition, as the size of a grid cell in the GCM becomes similar to the size of the embedded LES domain, the exchange of lateral fluxes may also need to be explicitly considered.
These extensions to the current approach will be explored in a future study.
Even with the GPU-acceleration technique, running PALM in MPAS-Ocean is still computationally intensive. In the present setup of the single-column test (running PALM on one cell while using KPP in the others), the total run time on one CPU and one GPU is a few hundred times longer than using KPP in all cells. In the mixed layer eddy test case, running PALM on 8 cells increases the total run time by over 40 times on 16 CPUs and 8 GPUs. While the computational cost is well justified for applications studying the small-scale turbulent mixing, for applications aimed at improving the GCM performance, a superparameterization-like approach with high-fidelity and three-dimensional LES embedded in every grid cell of a GCM is still far from practical.

4.2 Moving forward

We have tested the coupled MPAS-Ocean and PALM using some very idealized test cases. The goal of these test cases is to verify the functionality of the coupled system, which provides the foundation for future extensions and applications to more realistic and complicated setups. Here we outline a few possible use scenarios that will take advantage of the development in this study.

Focused process studies
With the capability of MPAS-Ocean to use regionally refined meshes, it is possible to set up simulations for focused process studies with high resolution in the study region and low resolution in the surrounding regions. This strategy has already been adopted to conduct global MPAS-Ocean simulations with refined resolution along the US coast and regional simulations with a focus on estuaries (personal communication). The coupling with PALM can further extend the refined study region in an MPAS-Ocean simulation to include non-hydrostatic dynamics. In such applications one can choose to embed PALM inside the finest grid cells of MPAS-Ocean in the focus regions, where the domain size of PALM can be similar to the grid cell size of MPAS-Ocean. In this way, it is reasonable to pass the large-scale gradients estimated in MPAS-Ocean directly to PALM to allow the development of baroclinic instability and other processes in PALM, as discussed above. It would also be interesting to study the impact of turbulent mixing transitioning from an estuary environment to an open ocean environment using this coupled system. This is critical for the seamless coastal-to-global simulations that MPAS-Ocean targets.
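The cell selection implied here, embedding PALM only in the finest cells inside the focus region, can be sketched as follows. The function name, the area threshold, and the mask are hypothetical; they are not part of the MPAS-Ocean code:

```python
import numpy as np

def select_palm_cells(cell_areas, region_mask, max_area=1.0e8):
    """Pick grid-cell indices for PALM embedding: cells inside the focus
    region whose area is small enough that a PALM domain a few kilometers
    across is comparable to the cell size.

    cell_areas  : array of cell areas (m^2), one per GCM grid cell
    region_mask : boolean array, True inside the focus region
    max_area    : illustrative threshold (1e8 m^2, i.e. ~10 km cells)
    """
    cell_areas = np.asarray(cell_areas)
    region_mask = np.asarray(region_mask, dtype=bool)
    return np.where(region_mask & (cell_areas <= max_area))[0]

areas = np.array([5.0e7, 2.0e9, 8.0e7, 9.0e7])   # m^2
mask = np.array([True, True, False, True])       # focus region membership
idx = select_palm_cells(areas, mask)
```

On a regionally refined mesh the area threshold would naturally separate the refined focus region from the coarse background, so the two criteria largely coincide.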

Data assimilation
Taking advantage of the flexibility of running PALM only at selected locations in MPAS-Ocean, one may also explore the possibility of improving the simulation results of a GCM by having high-fidelity representations of the turbulent mixing at only a few locations, perhaps borrowing ideas from the data assimilation literature. The high-fidelity representations of the turbulent mixing can be easily drawn from the coupled LES as shown here. Results of such exploration may also be relevant to incorporating direct measurements of OSBL turbulent mixing from various platforms, such as research vessels and ocean stations, into GCMs.
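One of the simplest such borrowed ideas is Newtonian nudging: relaxing a GCM column toward the horizontally averaged LES profile at the locations where PALM runs. The sketch below illustrates this under assumed names and time scales; a real assimilation scheme would weight the update by error covariances rather than a fixed relaxation time:

```python
import numpy as np

def nudge_toward_les(theta_gcm, theta_les, dt, tau=3600.0):
    """Relax a GCM column profile toward the horizontally averaged LES
    profile with relaxation time scale tau (s).

    A minimal Newtonian-nudging analogue of assimilating high-fidelity
    mixing information at selected locations; illustrative only.
    """
    return theta_gcm + dt * (theta_les - theta_gcm) / tau

theta_gcm = np.array([20.0, 19.0, 18.0])   # GCM column temperature (deg C)
theta_les = np.array([20.2, 19.1, 18.0])   # horizontally averaged LES profile
updated = nudge_toward_les(theta_gcm, theta_les, dt=360.0)
```

The same update rule could ingest an observed profile from a research vessel or ocean station in place of the LES average.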

Conclusions
In this paper we have outlined the steps and the progress towards a multiscale modeling approach for studying the ocean surface turbulent mixing by coupling MPAS-Ocean with PALM, following some ideas of the superparameterization approach.
However, in contrast to the traditional superparameterization approach, which seeks a way to replace the conventional parameterizations, the goal here is to build towards a flexible framework to better understand the interactions among different processes in the OSBL, thereby providing a pathway to improve the conventional OSBL turbulent mixing parameterizations. For this reason, a high-fidelity and three-dimensional LES is used instead of a simplified model with reduced physics and/or reduced dimension.
Even with the GPU-acceleration technique, the computational cost prevents a superparameterization-like approach, with high-fidelity and three-dimensional LES embedded in every grid cell of MPAS-Ocean, from being practical for long simulations. However, the flexibility of running PALM only on selected grid cells of MPAS-Ocean, combined with the capability of MPAS-Ocean to use unstructured grids, provides a promising pathway to move forward in simulating the ocean surface turbulent mixing in a multiscale modeling approach. Here we have demonstrated the functionality and potential of this multiscale modeling approach using very simple test cases. As discussed in the previous section, progress can be made on various applications of this approach, from regionally focused process studies of the OSBL turbulence in the presence of large-scale phenomena, to a systematic exploration of better representations of the small-scale OSBL turbulent mixing in GCMs under broad and realistic forcing conditions, perhaps in combination with data assimilation and machine learning techniques.
Code availability. The source code of the coupled MPAS-Ocean and PALM is available at http://doi.org/10.5281/zenodo.4131134.
Author contributions. QL and LVR together designed and implemented the coupling between MPAS-Ocean and PALM, including customizing a version of PALM and porting it to the GPU, and prepared the manuscript. QL conducted the simulations and analyzed the results.