Improved double Fourier series on a sphere and its application to a semi-implicit semi-Lagrangian shallow water model and an Eulerian advection model

One way to reduce the computational cost of a spectral model using spherical harmonics (SH) is to use double Fourier series (DFS) instead of SH. The transform method using SH usually requires O N operations, where N is the truncation wavenumber, and the computational cost significantly increases at high resolution. On the other hand, the method using DFS requires only O N log N operations. This paper proposes a new DFS method that improves the numerical 10 stability of the model compared with the conventional DFS methods by adopting the following two improvements: a new expansion method that employs the least-squares method (or the Galerkin method) to calculate the expansion coefficients in order to minimize the error caused by wavenumber truncation, and new basis functions that satisfy the continuity of both scalar and vector variables at the poles. Partial differential equations such as the Poisson equation and the Helmholtz equation are solved by using the Galerkin method. In the semi-implicit semi-Lagrangian shallow water model using the new 15 DFS method, the Williamson test cases and the Galewsky test case give stable results without the appearance of highwavenumber noise near the poles, even without using horizontal diffusion and without a zonal Fourier filter. In the Eulerian advection model using the new DFS method, the Williamson test cases 1, which simulates a cosine-bell advection, also gives stable results without horizontal diffusion but with a zonal Fourier filter. The shallow water model using the new DFS method is faster than that using SH, especially at high resolutions, and gives almost the same results, except that small 20 oscillations near the truncation wavenumber in the kinetic energy spectrum appear only in the shallow water model using SH. This small oscillations in the SH model can probably be eliminated by using the vector harmonic transform which is similar to the vector transform using the least-squares method (or the Galerkin method) in the model using the new DFS method.

equation and the Poisson equation are easily solved in spectral space. The application of the semi-implicit semi-Lagrangian scheme allows for timesteps longer than the Courant-Friedrichs-Lewy (CFL) condition, which makes the model computationally efficient. In the spectral model using SH, the Legendre transform used in the latitudinal direction significantly increases the computational cost at high resolutions since the Legendre transform usually requires O operations and O memory usage (unless using the fast Legendre transform or on-the-fly computation of the associated 5 Legendre functions shown below), where is the truncation wavenumber. One way to reduce the operation count and the memory usage at high resolutions with large is to use the fast Legendre transform (Suda, 2005;Tygert, 2008;Wedi et al., 2013;Wedi 2014), which requires only O log operations and also effectively reduces the memory usage. In the fast Legendre transform, the threshold parameter affecting the accuracy-cost balance is chosen so that a loss of accuracy is sufficiently small. Dueben et al. (2020) presented global simulations of the atmosphere at 1.45 km grid-spacing in the SH 10 model using the fast Legendre transform. Another approach to improve the Legendre transform is on-the-fly computation of the associated Legendre functions (Schaeffer, 2013;Ishioka, 2018), which still requires O operations but requires only O memory usage. This small memory usage also contributes to speeding up calculations by taking advantage of the cache memory.
Alternatively, we can use double Fourier series (DFS) as basis functions to reduce the operation count and the memory 15 usage in the global spectral model. In the DFS model, the fast Fourier transform (FFT; Cooley and Tukey, 1965;Swarztrauber, 1982) is used not only in the longitudinal (zonal) direction but also in the latitudinal (meridional) direction.
The FFT requires only O log operations and O or O memory usage, and it is faster than the fast Legendre transform.
In DFS models (and also in SH models), the scalar variable , is zonally expanded as 20 where is longitude, is colatitude, and is the zonal truncation wavenumber. Several methods have been proposed for meridional expansion with DFS. Merilees (1973b), Boer and Steinberg (1975), and Spotz et al. (1998) performed the Fourier transform meridionally along a great circle. Spotz et al. (1998) showed that by using the spherical harmonic filter, the explicit DFS shallow water model using the pseudo-spectral method can produce results comparable with the SH model in 25 terms of accuracy and stability. However, the spherical harmonic filter consists of the forward SH transform (from grid space to spectral space) followed by the inverse SH transform (from spectral space to grid space), which increases the computational cost. Orszag (1974) and Boyd (1978)  where is the meridional truncation wavenumber. The coefficients , for odd are calculated by the forward Fourier cosine transform of sin ⁄ . Orszag (1974) imposed the following conditions at the poles: 0 0 and 0 for | | 2, 3 which can be expressed in terms of the expansion coefficients , as 5 , 0 and , 0 for | | 2. 4 Satisfying the above conditions ensures that the scalar variable , and its gradient ∇ are continuous at the poles. In Orszag (1974), only , and , were modified to satisfy Eq. (4), but this is not the best way to satisfy the same conditions as Eq. (3) or Eq. (4), as will be shown in Sect. 4. Yee (1981) and Layton and Spotz (2003)  The meridional basis functions sin sin for even 0 are different from Eq. (5). The coefficients , for even 0 are calculated by forward Fourier sine transform of sin ⁄ . The basis functions in Eq. (6) automatically satisfy the same conditions at the poles as Eq.
(3) for even , and guarantee the continuity of the scalar 20 variable at the poles, which is an advantage compared with the basis functions in Eq. (5). On the other hand, Eq. (6) does not automatically satisfy the conditions in Eq. (3) for odd , and does not guarantee the continuity of ∇ at the poles. The shallow water model and the vorticity equation model using a semi-implicit Eulerian scheme ran stably without the spherical harmonic filter by using high-order horizontal diffusion with O operations to smooth out the high-wavenumber components (Cheong, 2000b;Cheong et al., 2002;Kwon et al., 2004). The semi-implicit Eulerian hydrostatic atmospheric model also ran stably with high-order horizontal diffusion (Cheong, 2006;Koo and Hong, 2013;Park et al., 2013). However, 5 the computational results of these models appear to be a little different from (slightly worse than) the models using SH. One reason for this seems to be the appearance of high-wavenumber oscillation resulting from the meridional wavenumber truncation with ≅ 2 3 ⁄ or 2 ⁄ for even 0 (See Sect. 4), and the use of strong high-order horizontal diffusion to smooth out the oscillation, where is the number of grid points in the latitudinal direction. Yoshimura and Matsumura (2005) and Yoshimura (2012) stably ran the two-time-level semi-implicit semi-Lagrangian 10 hydrostatic and nonhydrostatic atmospheric models using the DFS basis functions of Cheong in Eq. (6). These models used the same fourth-order horizontal diffusion as the SH models, and did not require the spherical harmonics filter or the strong high-order horizontal diffusion for stability. The numerical stability of the models is improved due to the following reasons: 1. The semi-Lagrangian scheme is used, which avoids the numerical instability due to the nonlinear advection term.

The meridional truncation with
1 and is used, which enables to reconstruct accurately the given grid-data 15 with the expansion coefficients  and avoid the error due to the meridional truncation.

3.
sin and sin instead of sin ⁄ and sin ⁄ are transformed from grid space to spectral space, where is the zonal wind and is the meridional wind.
The results of these models were very similar to those of the SH models. However, we found the following two problems in these models: 20 1. High wavenumber noise appears near the poles.
2. The meridional truncation wavenumber needs to be equal to for even 0 because (e.g., ≅ 2 3 ⁄ ) for even 0 causes the high-wavenumber oscillation (See Sect. 4) and the numerical instability.
To solve these problems, we propose a new DFS method that adopts the following two improvements: 1. A new expansion method to calculate DFS expansion coefficients of scalar and vector variables, which adopts the least-25 squares method (or the Galerkin method) to minimize the error due to the meridional wavenumber truncation.
2. New DFS basis functions that automatically satisfy the pole conditions in Eq. (3), which guarantee continuity of not only scalar variables but also vector variables at the poles.
We also use the Galerkin method to solve partial differential equations such as the Poisson equation, the Helmholtz equation, and the shallow water equations. 30 Section 2 describes the arrangement of equally spaced latitudinal grid points used in the new DFS method. Section 3 describes the details of the new DFS method using the new DFS expansion method and the new DFS basis functions.
Section 4 examines the error due to the wavenumber truncation in the DFS method of Orszag (1974), the old DFS method (Cheong, 2000a(Cheong, , 2000bYoshimura and Matsumura, 2005), and the new DFS method. Section 5 examines the accuracy of the old and new DFS methods and the SH method for the Laplacian operator and the Helmholtz equation. Section 6 compares the results of the shallow water test cases between the model using the new DFS method, that using the old DFS method of Yoshimura and Matsumura (2005), and that using the SH method. Section 7 presents conclusions and perspectives.
In the new DFS method, the wind vector components and (instead of sin ⁄ and sin ⁄ or sin and sin ) are transformed from grid space to spectral space and vice versa, as shown in Sects. 3.5 and 3.6 below. This makes it possible to use Grid [1] that has grid points at the poles. 20 3 Improved double Fourier series on the sphere In Sect. 3, we describe the new basis functions for scalar and vector variables, and the new method to calculate expansion coefficients which minimizes the error due to wavenumber truncation. We compare the new DFS method with the SH method to see the difference between them. We also describe how to calculate the Laplacian operator, the Poisson equation, the Helmholtz equation, and horizontal diffusion in the new DFS method. 25

New basis functions for a scalar variable
We propose the following new DFS basis functions that automatically satisfy the continuity conditions at the poles in Eq.
(3). The scalar variable , is expanded zonally as Thus, it is guaranteed that ∇ , , , , is continuous at the poles.

New method to calculate expansion coefficients for a scalar variable
One way to calculate the coefficients , from in Eq. (10)  Here we propose a new method to calculate expansion coefficients using the least-squares method to minimize the error due to the meridional wavenumber truncation. This method also avoids dividing by sin or sin before the 20 forward cosine or sine transforms. The coefficients , in Eq. (10)  where the expansion coefficients , are calculated by the forward discrete cosine transform for even and the forward discrete sine transform for odd from the values of at the grid points (See Appendix B). 5 Next, , and , are calculated using the least-squares method to minimize the following error (the squared L norm of the residual): where the residual , is , ≡ , cos , sin , . 25 10 From Eqs. (24) and (25) Equations (26) and (27) 30c with the exception of the following underlined values: 10 3 , , 4 , 2 , 1 .

For odd
32 To solve , we solve with forward substitution first, and then solve with backward substitution.
There are also other methods to solve Eq. (31). For example, the method using LU decomposition considering pentadiagonal matrices as 2 2 block tri-diagonal matrices makes SIMD operations more effective. The method using cyclic reduction for block tri-diagonal matrices (e.g., Gander and Golub, 1997) is suitable for vectorization and parallelization. The 5 calculation with these methods for each requires O operations. The simultaneous equations with tri-diagonal matrices can be solved in a similar way. Therefore, the calculation of , for all and with Eq. (30) requires only O operations.

Comparison of new DFS with SH
Here we compare the new DFS method with the SH method to see the difference between them. In the SH method, . This triangular truncation used in the SH method gives a uniform resolution over the sphere.
From Eqs. (8) and (10), the number of the expansion coefficients , in the DFS method is about when . This 5 rectangular truncation used in the DFS model gives almost the same resolution as the grid spacing of the regular longitudelatitude grids. Therefore, the zonal Fourier filter (see Appendix F) is used in the DFS model to give a more uniform resolution.
We compare the method used to calculate the expansion coefficients in the new DFS method with that in the SH method. In Eqs. (37) and (38), the latitudinal weight sin appears, unlike in Eqs. (24) and (29), which is another difference between the SH and the new DFS methods. In the DFS method, the constant latitudinal weight is used in Eq. (24), although the latitudinal area weight described below in Appendix G is usually used as the latitudinal weight at the grid points, for example, for the calculation of the global mean.
When calculating the coefficients , in Eq. (10), we can also consider the least-squares method, not using in Eq. (24)  is the same as

New basis functions for a wind vector
The velocity potential and the stream function can be converted into the wind vector components and using the equations The equations for , are the same as Eqs. (53b-d), except that , , , , and , are replaced with , , , , and , , respectively. The equations for , are the same as Eqs. (53a-d), except that , , , , and , are replaced with , , , , and , , respectively. The equations for , are the same as Eqs. (53b-d), except that , , , , and , 15 are replaced with , , , , and , , respectively.
From Eqs. (52) and (53), it can be seen that , , , , , , and , at the poles are finite for 1 and zero for 1. Moreover, the following relations are satisfied for 1: Thus, it is guaranteed that the wind vector , , , in Eqs. (48) and (49) is continuous at the poles. 5

New method to calculate expansion coefficients for a wind vector
We propose a new method that calculates the expansion coefficients , , , , , and , in Eqs. (48) and (49) using the least-squares method to minimize the error of , , and where , and , are calculated from and by the forward discrete cosine or sine transform (See Appendix B).
Next, , , , , , , and , are calculated to minimize the following error (the squared L norm of the residual vector): Equations (60) and (61) From Eqs. (62a), the following equations for , and , are derived as shown in Appendix H. Eq. (63a) is easily solved. From Eqs. (63d), and from the same equations as Eqs. (63d), except that , , , and are replaced with , , , and , respectively, we derive the following linear simultaneous equations for 3: where the matrices are nine-diagonal. From Eqs. (63b,c), we derive the equations similar to Eq. (64) for 1 and even 2 with penta-diagonal matrices . The simultaneous equations with nine-diagonal or penta-diagonal matrices 5 can be solved in a similar way to Eq. (31), and the expansion coefficients , and , in Eq. (64) can be solved efficiently.
This method to calculate the DFS expansion coefficients of and from and using the least-squares method (or the Galerkin method with the DFS vector basis functions) is similar to the vector harmonic transform method (Browning et al., 1989;Temperton, 1991;Swarztrauber, 1993), where the SH expansion coefficients of the divergence ∇ and the 15 vorticity ∇ are calculated from the grid-point values of and using the Galerkin spectral method with the orthogonal vector SH basis functions.

Laplacian operator and Poisson equation
We define the truncated variables , and , as From Eqs. (65) and (68), we obtain is orthogonal to each of the new DFS basis functions , cos and , sin .
We can also use the least-squares method instead of the Galerkin method so that the following error (the squared L 10 norm of the residual) is minimized: When calculating by applying the Laplacian operator to a given , , and , can also be calculated from  Yee (1981) and Cheong (2000a) From Eq. (75), we obtain the following two linear simultaneous equations with tri-diagonal or penta-diagonal matrices: where _ , and _ , are the vectors whose components are , ( is even) and , ( is odd), respectively, and _ , and _ , are the vectors whose components are , ( is even) and , ( is odd), respectively; _ , , _ , , _ , and _ , are tri-diagonal or penta-diagonal matrices. In Eq. (65), the global mean of must be zero because the global mean of the right-hand side of Eq. (65) is zero. Before calculating from a given in the Poisson equation, we should subtract the global mean from (Cheong 2000b). See Eq.
(G1) about the calculation of the global mean.

The Helmholtz equation 20
The Helmholtz equation is From Eq. (84), is calculated from by 5 . 85

Horizontal diffusion
The horizontal diffusion is calculated in the similar way as in Cheong et al. (2004). Here we describe how to calculate fourth-order diffusion. Higher-order diffusion can be calculated similarly.
The equation for fourth-order hyperdiffusion is 10 where is calculated from . Equation (86)  Here, √ and √ are complex matrices and and are real column vectors. For efficient computation, two real column vectors can be combined into one complex column vector ; for example, and , where the superscript c indicates the zonal cosine component, and the superscript s indicates the zonal sine 25 component.

The error due to meridional wavenumber truncation in DFS expansion methods
Here we examine the error due to the meridional wavenumber truncation when the same continuity conditions at the poles as Eq. (3) are satisfied. In the DFS method of Orszag (1974) using Eq. (2), only , and , are modified to satisfy Eq. (4) equivalent to Eq. (3). In the old DFS method using Eq. (6), which is proposed in Cheong (2000aCheong ( . 2000b) and used in 30 Yoshimura and Matsumura (2005), the DFS meridional basis functions automatically satisfy the pole conditions in Eq. (3) for even , but not for odd . In the new DFS method using Eq. (10), the DFS meridional basis functions automatically satisfy the condition in Eq. (3) for both even and odd . We examine the error due to the wavenumber truncation in these DFS methods while comparing it with the SH method. In the DFS method of Orszag using Eq.
(2), a very large error occurs, especially for odd | | 3 (Fig. 2c), when , and , are modified to satisfy the pole conditions in Eq. (4). Dividing by sin before the forward Fourier cosine transform for odd also contributes to the large error.
In the old DFS method using Eq. (6), large high wavenumber oscillations appear for even 0 in Fig. 2a. In the new DFS method described in Sect. 3, the usual small oscillations from the Gibbs phenomenon appear in Fig. 2.
The error is small because the expansion coefficients are calculated using the least-squares method (or the Galerkin method) to minimize the error. Because of this, the truncation with arbitrary does not cause large oscillations in the new DFS 25 method. The values for even 2 and odd 3 in the new DFS method are similar to those for 2 and 3 in the SH method, respectively. In the SH method, when is large, the values become close to zero at high latitudes.
When using the basis functions of Orszag in Eq.
(2), we can also obtain results equivalent to the new DFS method by calculating the expansion coefficients using the least-squares method with Lagrange multipliers to minimize the error while satisfying the pole conditions in Eq. (4). 30 Figure 3a shows the same figure as Fig. 2a Yoshimura and Matsumura (2005) and Yoshimura (2012) set for even 2 to improve stability. However, there is a problem with the latitudinal derivative in the old DFS method even when for even 2 . Fig. 3b is the same as Fig. 3a except that it also shows the values between grid points calculated from the expansion coefficients by using Eq. (6) or Eq. (10). The large oscillations appear in the old DFS method with Grid [0], and it makes the latitudinal derivative at the grid points unrealistically large. In the new DFS method with the least-squares method, the large oscillations do not 5 appear.

Tests of the DFS methods with the Laplacian operator and the Helmholtz equation
We examine the accuracy of the old and new DFS methods for the Laplacian operator in Eq. (65)  where 0.01 , and and are given by Eqs. (92) and (94).
To examine the accuracy for the Laplacian operator, is given by (92), and ∇ is calculated from with the old DFS method (Cheong 2000a), the new DFS method (See Sect. 3.7) and the SH method. The calculated values are compared with 25 the exact values of ∇ in Eq. (94). Here, the exact values of ∇ are truncated by the forward transform followed by the inverse transform in order to see the error that does not include the error due to the wavenumber truncation. Table 1 shows the normalized L error between the calculated values and the exact values, which is normalized by the L norm of the exact values. The differences in error between the methods are small, but the results of the SH method are a little better than the old and new DFS methods. Table 2 shows the global mean of calculated ∇ . The exact value of the global mean of ∇ is zero. In Table 2, the global means calculated with each method are very close to zero. The global means of ∇ in the DFS methods using Grid [1] and Grid [-1] are not as close to zero as those in the DFS methods using Grid [0] and the SH method. This seems to be because the accuracy of the meridional discrete cosine and sine transforms in the DFS methods using Grid  Table 3 shows the normalized L error between the calculated values and the exact values. The differences in error between the methods are small, and which is better depends on the resolution and the arrangement of the grid points. 10

Evaluation of the DFS methods using shallow water test cases
We ran the Williamson test cases 1, 2, 5 and 6 (Williamson et al., 1992), and the Galewsky test case (Galewsky et al., 2004) in the model using the new DFS method described in Sect. 3, the model using the old DFS method of Yoshimura and Matsumura (2005), and the model using the SH method. By comparing the results of these model, we evaluated the old and new DFS methods. 15

Shallow water equations on a sphere
The prognostic equations of the shallow water model on a sphere are 2 ∇ℎ, 96 ℎ ℎ ℎ ℎ ∇ • , 97 where is time, is the horizontal wind vector, ℎ is the height, ℎ is the surface height, is the acceleration due to gravity, 20 is the 3-dimensional angular velocity of the earth's rotation, and the subscript H indicates the horizontal component.
Equation (96) is transformed for the advective treatment of the Coriolis term (Temperton, 1997) where is the 3-dimensional position vector from the Earth's center. Equation (97) is transformed for the spatially averaged Eulerian treatment of mountains (Ritchie and Tanguay, 1996) Equations (98) and (99) are integrated in time using a two-time-level semi-implicit semi-Lagrangian scheme (See Appendix I).

Models
We ran the shallow water test cases in the semi-implicit semi-Lagrangian shallow water model or the Eulerian advection model (See Sect. 6.3) using the new DFS method (hereafter the new DFS model). We also ran the same test cases in the model using the old DFS method of Yoshimura and Matsumura (2005) with the basis functions of Cheong (2000aCheong ( , 2000b  The zonal Fourier transforms in all the models and the meridional Fourier cosine and sine transforms in the DFS models are calculated using the Netlib BIHAR library, which includes a double precision version of the Netlib FFTPACK library (Swarztrauber, 1982). The meridional Legendre transform in the SH model is calculated using the ISPACK library (Ishioka, 2018), which adopts on-the-fly computation of the associated Legendre functions. We use the ISPACK library's 20 optimization option for Intel AVX512, which is highly optimized by using assembly language together with Fortran.

Williamson test case 1
The Williamson test case 1 simulates a cosine-bell advection. In the semi-Lagrangian models, the advection is calculated in the semi-Lagrangian scheme and the horizontal derivatives calculated from the expansion coefficients are not used for the advection calculation. Therefore, we also use the Eulerian scheme here to simulate the advection in the DFS and SH models 25 to test the expansion methods. The advection equation is ℎ ℎ • ∇ℎ. 100 In the Eulerian models, the advection term • ∇ℎ is evaluated using the spectral transform method. The advection equation is integrated by the leap-frog scheme with the Robert-Asselin time filter (Robert, 1969;Asselin, 1972) with a coefficient of 0.05. The horizontal diffusion is not used, but the zonal Fourier filter is used in the old and new DFS methods. In Eq. (F1), 30 the value 20 is used in the DFS shallow water models. However, the larger the value is, the higher the longitudinal resolution around the pole is. Because of this, when the Eulerian scheme is used and is large, a timestep must be very short due to the CFL condition. Therefore should be as small as possible. We have tested 0, but this degrades the result of the Williamson test case 1. We have also tested 1 and this result is good. Therefore, we use 1 in the Eulerian models. Figure 4 shows the predicted height after a 12-day integration in the Williamson test case 1 when using the Eulerian 5 advection models at the resolution 64. The meridional truncation wavenumber and the zonal truncation wavenumber are set as 42 ≅ 2 3 ⁄ because the 2/3 rule (Orszag, 1971) is used in order to avoid aliasing in the nonlinear advection term. The timestep is 30 minutes. The results for DFS [0], DFS [1], DFS [−1] and SH are very similar. Instability occurs in the old DFS model without horizontal diffusion. This is probably because of the appearance of high-wavenumber oscillations due to the wavenumber truncation with ≅ 2 3 ⁄ for even 0 in the old DFS method, as shown in Sect. 10 4. Table 4 shows the normalized L errors of the predicted height after a 12-day integration when using the Eulerian advection models. The timesteps are 30, 15, 7.5, and 2.5  The old DFS model is unstable even when the same fourth order horizontal diffusion is used. Higher-order horizontal diffusion, which effectively smooths out the high wavenumber components, is necessary to stabilize the Eulerian old DFS 20 model (Cheong, 2000b;Cheong et al., 2002). Table 5 shows the same as Table 4 except for using the semi-Lagrangian scheme. In the semi-Lagrangian models, the forward transform followed by the inverse transform are executed at every timestep, but the expansion coefficients are not used for the advection calculation. The timesteps are the same as described in Sect. 6.2. The errors are very close between the models. At the resolution 64, the errors in the semi-Lagrangian models are larger than those in the Eulerian models, 25 but at the resolutions 160, 320 and 960, the errors in the semi-Lagrangian models are smaller than those in the Eulerian models.
The conservation of mass in the Williamson test case 1 was also examined, and the results are shown in Sect. S2 in the supplement.

Williamson test case 2 30
The Williamson test case 2 simulates a steady state non-linear zonal geostrophic flow. In this test case, the angle between the solid body rotation and the polar axis is given, and the zonal and meridional components of 2 become 2 2Ω cos cos cos sin sin , 2Ω sin sin . 101 Figure 5 shows the time series of forecast errors of the height for a 5-day integration in the Williamson test case 2 with 2 ⁄ 0.05 in the models at the resolution 64 and 63 (DFS) or 62 (SH), using no horizontal diffusion.
The normalized L , L , and L ∞ errors are almost the same between the new DFS models using Grids [0], [1] and [−1], the old DFS model, and the SH model. Table 6 shows the normalized L errors of the predicted height after a 5-day integration. 5 The errors are almost the same between the old DFS, new DFS and SH models at each resolution.
The conservation of mass, energy and vorticity in the Williamson test cases 2, 5 and 6 was also examined, and the results are shown in Sect. S2 in the supplement.

Williamson test case 5
The Williamson test case 5 simulates zonal flow over an isolated mountain. Figure 6 shows the predicted height after a 15-  Table 7 shows the normalized L errors of the predicted height after a 15-day integration. The errors are almost the same between the old DFS, new DFS and SH models at each resolution. The errors do not decrease when the resolution increases, 15 which is different from the results in the other test cases. This may be because the mountain topography is not a differentiable function, and the mountain is added impulsively on to a initially balanced flow (Galewsky et al. 2004). wavenumber noise appears in the old DFS model at the same resolution. One possible reason is that the latitudinal derivative at the grid points can be unrealistically large in the old DFS method even when for even 2 as described in Sect. 4 (Fig. 3b). The new DFS expansion method with the least-squares method does not have this problem. By using the new expansion method with the least-squares method, the high zonal wavenumber noise does not appear even in the model that uses the same DFS basis functions as in Eq. (11) except that the basis function for odd 3 is sin cos instead 25 of sin sin . In the old DFS model at high resolution with 960 and 959, the high wavenumber noise is not seen in Fig. 7. The higher the resolution, the smaller the high wavenumber noise becomes. Figure 8 shows the kinetic energy spectra of the horizontal winds (Lambert, 1984) after a 15-day integration in Williamson test case 5. The kinetic energy spectra in the DFS models are calculated from the SH expansion coefficients, which are obtained by firstly calculating the Gaussian grid-point values from the DFS coefficients using Eq. (10) for the new DFS method and Eq. (6) for the old DFS 30 method, and secondly calculating the SH expansion coefficients from the Gaussian grid-point values by using a forward Legendre transform. In the old DFS model with 64 and 63, the high wavenumber components are larger than in the other models, which is related to the high wavenumber noise near the South Pole in Fig. 7. In the old DFS model with 960, the high wavenumber components are a little larger than in the other models, but the differences are slight. Figure 9 shows the predicted height after a 15-day integration in Williamson test case 5, which is the same as Fig. 6 Table  10 8 is the same as Table 7 except for ≅ 2 3 ⁄ . The results of the new DFS models and the SH model with ≅ 2 3 ⁄ in Table 8 are very similar to those with ≅ 1 in Table 7 when is the same. Figure 10 shows the kinetic energy spectrum of the horizontal winds after a 15-day integration in Williamson test case 5, which is the same as Fig. 8  high wavenumber region in the SH model, but not in the new DFS models. In the SH model, the wind components and divided by sin are transformed from grid space to spectral space (Ritchie, 1988;Temperton, 1991), which seems to be the cause of the small oscillations in the high wavenumber region. Another way to transform and from grid space to spectral space in the SH model is to use the vector harmonic transform (see Sect. 3.6), which avoids dividing and by sin and improves the stability of the model (Swarztrauber, 2004). This approach is similar to the expansion method for and using 20 the least-squares method in the new DFS method described in Sect. 3.6, and probably eliminates the small oscillations in the SH model. Alternatively, using and instead of and as prognostic variables may eliminate the small oscillations. Figure 11 shows the predicted height after a 14-day integration in Williamson test case 6. The error is similar between the 25 old and new DFS models using Grid [0] and the SH model. The error in the new DFS model using Grid [1] is the smallest. This is probably because Grid [1] has grid points at the poles, where the minimum height exists, and on the equator, where the maximum height exists. The error in the new DFS model using Grid [−1] is the second smallest. This is probably because Grid [−1] has grid points on the equator, where the maximum height exists. Table 9 shows the normalized L errors of the predicted height after a 14-day integration. The error in the new DFS model using Grid [1] is the smallest, and that in the 30 new DFS model using Grid [−1] is the second smallest, at each resolution. The errors in the old and new DFS models using Grid [0] and in the SH model are very close.

Galewsky test case
The Galewsky test case simulates a barotropically unstable mid-latitude jet. Figure 12 shows the predicted vorticity after a 6-day integration in the Galewsky test case for the models at 1.3 km resolution with 15360 and the quadratic truncation 10239, without horizontal diffusion. The result in the new DFS model using Grid [0] is almost the same as in the SH model. The old DFS model is unstable for the same reason as that shown in Sect. 6.5 (Fig. 9). Figure 13 shows the kinetic 5 energy spectrum of horizontal winds after a 6-day integration in the Galewsky test case. The results are almost the same for the DFS models using Grid [0], [1] and [−1], and the SH model, but small oscillations appear near the truncation wavenumber in the SH model. This is probably for the same reason as in Williamson test case 5 in Fig. 10.
The results of the Galewsky-like test case using the north-south symmetric initial conditions are shown in Sect. S3 in the supplement. 10 transform, which requires only log operation, is used instead of the usual Legendre transform in the SH model, the difference of the elapsed time between the models will be reduced at high resolutions. We have not tested the fast Legendre transform yet because we do not have subroutines for the fast Legendre transform.

Conclusions and perspectives
We have developed the new DFS method to improve the numerical stability of the DFS model, which has the following 25 two improvements: 1. A new expansion method with the least-squares method is used to calculate the expansion coefficients so that the error due to the meridional wavenumber truncation is minimized. The method also avoids dividing by sin before taking the forward Fourier cosine or sine transform.
2. New DFS basis functions that guarantee that not only scalar variables, but also vector variables and the gradient of scalar variables, are continuous at the poles.
The equations obtained with the least-squares method are equivalent to those obtained with the Galerkin method. We also use the Galerkin method to solve partial differential equations such as the Poisson equation, the Helmholtz equation, and the shallow water equations. 5 To test the new DFS method, we conducted experiments for the Williamson test cases 2, 5 and 6, and the Galewsky test case in semi-implicit semi-Lagrangian shallow water models using the new DFS method with the three types of equally spaced latitudinal grids with or without the poles. We also ran the Williamson test case 1, which simulates a cosine-bell advection, in the Eulerian and semi-Lagrangian advection models. We compared the results between the new DFS models using the new DFS method, the old DFS model using the method of Yoshimura and Matsumura (2005) with the basis 10 functions of Cheong (2000aCheong ( , 2000b, and the SH model. The high zonal wavenumber noise of the meridional wind appears near the poles in the old DFS model, but not in the new DFS models in the Williamson test case 5. One possible reason is that the latitudinal derivative at the grid points can be unrealistically large in the old DFS method even when the truncation wavenumber for even 0 is equal to the number of latitudinal grid points , while the new DFS expansion method with the least-squares method does not have this 15 problem. In the old DFS model, for even 0 causes numerical instability. In the new DFS model, an arbitrary meridional wavenumber truncation can be used without the stability problem because the error due to meridional wavenumber truncation is small when using the new DFS expansion method with the least-squares method. This is one of the merits of the new DFS method because the quadratic truncation ≅ 2 3 ⁄ or the cubic truncation ≅ 2 ⁄ is usually used in the Eulerian model and is also becoming to be used in the semi-Lagrangian model instead of the linear truncation 20 ≅ 1 for stability and efficiency at high resolutions (Hotta and Ujiie, 2018;Dueben et al., 2020). We have also confirmed that in the new DFS model, stable integration is possible in all test cases shown here even without using the zonal Fourier filter unlike in the old DFS model. Thus, the numerical stability of the semi-implicit semi-Lagrangian model using the new DFS method is very good. In the Williamson test cases 1, the Eulerian advection model using the new DFS method also gives stable results without horizontal diffusion but with a zonal Fourier filter. The Eulerian advection model using the 25 old DFS method is unstable without horizontal diffusion or with the weak fourth-order horizontal diffusion. In the old DFS model, the use of the semi-Lagrangian scheme is important for numerical stability. On the other hand, the advection model using the new DFS method is stable, even when the Eulerian scheme is used instead of the semi-Lagrangian scheme. The Eulerian shallow water model using the new DFS method without horizontal diffusion is also likely to be stable, although we have not tested it yet. 30 The results of the new DFS model are almost the same as the SH model. But in the SH shallow water model without horizontal diffusion, small oscillations appear in the high wavenumber region of the kinetic energy spectrum in some cases, unlike in the new DFS model. This seems to be because the wind components and divided by sin are transformed from grid space to spectral space in the SH model. The small oscillations with the SH model can probably be eliminated by using the vector harmonic transform, which is similar to the expansion method for and using the least-squares method in the new DFS model. Alternatively, using divergence and vorticity instead of and as prognostic variables may eliminate the small oscillations. We have executed our shallow water models on Intel CPUs. The execution on GPUs is one important topic, but we have not tested our models on GPUs because the execution on GPUs is not an easy task. MPI parallelization is another important topic. However, in our shallow water models, we use only OpenMP parallelization, not MPI parallelization for the simplicity of the source code.
We developed hydrostatic and nonhydrostatic global atmospheric models using the old DFS method (Yoshimura and 10 Matsumura, 2005;Yoshimura, 2012)  license. These models utilize the Netlib BIHAR library and the ISPACK library. The Netlib BIHAR library is available at https://www.netlib.org/bihar/ and is also included in the supplement. The ISPACK library is available at https://www.

Appendix A: Trigonometric identities
We list here the trigonometric identities used in transforming the expressions in this paper. 30 The following identities are satisfied:  (Yukimoto et al., 5 2011). Using and larger than 1.0 (e.g., 1.2) increases the effect of the semi-implicit scheme improving computational stability, but 1.0 is used here because ℎ larger than ℎ is enough for stable calculations in the shallow water model. This method using a provisional future value to calculate is similar to the method in Gospodinov et al., (2001). Since the value with the subscript D depends on , is calculated iteratively from Eq. (I6) (e.g., Ritchie, 1995;Temperton et al., 2001 using Eqs. (75) and (78). 6. , is calculated from and using Eq. (53) for , and the similar equations for , , , , and , .

Author Contributions.
HY developed a new DFS method and a shallow water model using the method, conducted model experiments, analysed data, and wrote the paper. 10        The truncation wavenumber 63. Solid, dashed, and dotted lines represent normalized L , L , and L ∞ errors, respectively. 5 The colors blue, green, red, purple, and orange represent the models using SH, old DFS with Grid [0], new DFS with Grid