This work is distributed under the Creative Commons Attribution 4.0 License.
GHOSH v1.0.0: a novel Gauss-Hermite High-Order Sampling Hybrid filter for computationally efficient data assimilation in geosciences
Abstract. Data assimilation is used in a number of geophysical applications to optimally integrate observations and model knowledge. Providing an estimation of both state and uncertainty, ensemble algorithms are one of the most successful data assimilation approaches. Since the estimation quality depends on the ensemble, the sampling method is a crucial step in ensemble data assimilation. Among other options to improve the capability of generating an effective ensemble, a sampling method featuring a higher polynomial order of approximation represents a novelty. Indeed, the order of the most widespread ensemble algorithms is usually equal to or lower than 2. We propose a novel hybrid ensemble algorithm, the Gauss-Hermite High-Order Sampling Hybrid (GHOSH) filter, version 1.0.0, which we apply in a twin experiment (based on Lorenz96) and in a realistic geophysical application. In most error components, the GHOSH sampling method can achieve a higher order of approximation than other ensemble-based filters. To evaluate the benefits of the higher approximation order, a set of thousands of twin experiments of Lorenz96 simulations has been carried out using the GHOSH filter and a second-order ensemble Kalman filter (SEIK; singular evolutive interpolated Kalman filter). The comparison between the GHOSH and the SEIK filter has been done by varying a number of data assimilation settings: ensemble size, inflation, assimilated observations, and initial conditions. The twin-experiment results show that GHOSH outperforms SEIK in most of the assimilation settings, with up to a 69 % reduction of the root mean square error on assimilated and non-assimilated variables. A parallel implementation of the GHOSH filter has been coupled with a realistic geophysical application: a physical-biogeochemical model of the Mediterranean Sea with assimilation of surface satellite chlorophyll.
The simulation results are validated using both semi-independent (satellite chlorophyll) and independent (nutrient concentrations from an in-situ climatology) observations. Results show the feasibility of the GHOSH implementation in a realistic three-dimensional application. The GHOSH assimilation algorithm improves the agreement between forecasts and observations without producing unrealistic effects on the non-assimilated variables. Furthermore, the sensitivity analysis on the GHOSH setup indicates that the use of a higher order of convergence substantially improves the performance of the assimilation with respect to nitrate (i.e., one of the non-assimilated variables). In view of a potential implementation of the GHOSH filter in operational applications, it should be noted that the GHOSH and SEIK filters have not shown significant differences in terms of time to solution, since, as in all ensemble-like Kalman filters, the model integration is by far more computationally expensive than the assimilation scheme.
Status: final response (author comments only)

RC1: 'Comment on gmd-2023-170', Anonymous Referee #1, 11 Dec 2023
Review of "GHOSH v1.0.0: a novel Gauss-Hermite High-Order Sampling Hybrid filter for computationally efficient data assimilation in geosciences"
Summary: The paper presents an interesting idea of using high-order sampling to improve the accuracy of higher statistical moments of the ensemble, thereby achieving better skill in ensemble prediction. I was expecting to learn some details about the new approach, but to my disappointment I am still quite confused after reading the paper. In my opinion, the paper tries to achieve too many things at once: introduce the method mathematically and show experiment results in two different models. No wonder that you don't have enough room to provide the details. In the methodology, many steps in the formulas are stated as "can be proven" but with no guidance for readers to actually understand the logic (skipping too many steps in a proof). Ideally, when introducing a new method, I would expect a paper focused on the theory and illustrations using simple models, followed by other paper(s) utilizing the method in bigger models and describing how any additional implementation issues were solved. The current paper does the introduction poorly; I suggest the authors give it a thorough revision and at least address the following major issues.
Major Issues:
1. It's unclear what the actual size of the state vector is now that you are employing a weighted ensemble approach. When you state that you use 2r members to achieve 3rd-order sampling, as in the Wan & van der Merwe 2000 paper, where the members are paired as mean + r perturbations, does it mean you only need to store r+1 numbers?
The weights are computed according to A3, which is a nonlinear system itself. What is the computational complexity of this step? You claimed that the GHOSH filter is no more costly than 2nd-order EnKFs with the same ensemble size, but can you provide some complexity arguments for each step in your filter? For example, given ensemble size m, obs count p and model state size n, the ETKF has complexity m^2 p + m^3 + m^3 n, according to Tippett 2003 (doi: 10.1175/1520-0493(2003)131<1485:ESRF>2.0.CO;2). Even with a fixed ensemble size, the filter is doing additional steps (e.g. the weight computation). I'm not convinced that they are comparable.
2. Fig. 1 provides some visual guide to understand the statement about "m sample points are required to represent a hth-order approximation of an error distribution in r-dimensional space". I can see that in 3D you need 4 points to span a volume (the covariance), and I can see that you need 4 degrees of freedom to provide a unique 3x3 covariance matrix. But from this point onward, I don't understand anymore.
First, the error distribution depicted is symmetric, meaning there is only the mean and identity covariance; there is not even correlation between variables in different dimensions. How can I visualize the higher-order moments (skewness)? Therefore, I don't see how the 4 points are giving an accurate 3rd moment in 2D; there is no skewness anyway in the picture.
Second, why does projection from a high-dimensional to a low-dimensional space relate to a higher-order sampling in the low-dimensional space? Somehow these 2 members project to the same member in the lower dimension, which means they are paired in a way that projection will cancel out the perturbations and only leave the mean? But why is that equivalent to a weighted member, and what is the weight in this 3-point case anyway? Linking back to the 4 points in 3D, are the positionings only allowed to be squares with no rotation? In the 3D case the tetrahedron is allowed to rotate arbitrarily, but here in 2D if the square is rotated they project to 4 points in 1D.
Finally, why do three weighted samples in 1D give a 5th-order approximation? I don't see any moments in the sample distribution higher than the 2nd.
At this point, I would rather you not show this figure at all, but replace Lines 89-108 with more step-by-step reasoning, or a formal proof, of statements such as "2r samples give a 3rd-order approximation of r-dimensional space".
3. In your Lorenz96 experiments (Fig. 3), based on the ensemble spread compared to the error, the SEIK filter has basically diverged at around t=5. The climatological error is about 4 for the Lorenz96 system, and you can see that after t=5 the SEIK filter solution is as bad as a random draw from climatology. I would argue that you need a more stable setting for both SEIK and GHOSH here to establish a comparison, especially when we know that SEIK should work for the Lorenz96 system. For some cases when SEIK is stable, can you still show that the new GHOSH filter will achieve higher accuracy or at least perform as well as SEIK?
4. In the verification of filter performance for both models, I mainly see the use of RMSE and ensemble spread as error metrics. This means you are still evaluating the first two moments of the error distributions. However, the GHOSH filter is motivated by its capability of capturing higher moments in the error. Why didn't you show any proof that it is behaving as expected? For example, in the Lorenz96 case you can compute the skewness and kurtosis of the error distribution and compare the two filters. This would truly be a proof of concept, rather than implying the skill from the forecast error in the ensemble mean (1st moment).
Minor Issues:
The use of nested parentheses in citations is a bit confusing. Try using (author year; author year)? Or is there a preferred format for GMD?
Line 42: "Monte Carlo"?
Line 51: Is there a reference for this? Why are samples that are closer together more suitable for accuracy using a second-order approximation?
Line 54: It's a bit unclear what you mean by "higher order of approximation" here. I think you refer to the higher moments of the distribution that the samples will exactly reproduce after model integration. If I'm not completely missing the point, this could be stated more clearly to help the readers follow.
Line 57: I can understand why you need r+1 samples to give the mean and covariance (2nd-order approximation), but it's a bit hard to see why 2r for a 3rd-order approximation. You say it can be proven; could you give a reference, or include a proof in an appendix?
Line 59: apart from the unscented Kalman filter, other methods that preserve the 3rd moment of the posterior pdf, such as the one by Hodyss 2011 (DOI: 10.1175/2011MWR3558.1), are also worth mentioning here.
Line 95: Pham 1996 is the introduction paper for SEIK; are you sure this is the right citation for the second-order exact sampling?
Lines 105-108: this paragraph is very vague; we achieve a fifth-order approximation when the variance is "bigger" on x than y. Can you be more precise about exactly when we achieve fifth order?
Line 112: h-order; shouldn't it be hth-order?
Line 128: putting indices as superscripts is very confusing; for this equation, is v_m^{2xi} raised to the 2xi power or just indexed by 2xi? Either put parentheses around them if they are indices, or just put the index as a subscript. It's also confusing to see a tuple j1,...,j_xi ending up being a single index from 1 to s. Please provide more context in (1) about the other equations; I guess the other equations in the set are matching a different moment indexed by j1,...,j_xi? How many equations are there in total?
Line 137: is there a formal relationship you can write down for r and s given h? First, you will need r>s to establish the principal components, otherwise there is rank deficiency. But what to expect given h? Previously you stated that for h=2 you need r>=s+1, and for h=3 you need r>=2s; in the 1D case you showed for h=5, you have r>=4s. I'm wildly guessing here that "r>=(h-1)*s" might be what you imply?
Line 179: why is the error covariance only approximately, and not exactly, given by the ensemble perturbation matrix?
Line 247: (26) is the same as (6); you can just remove (26) and say that C and Omega are computed using x^a in (4)-(7). By the way, what is the computational cost of this step? You have an eigenvalue decomposition here, so at least O(r^3)?
Line 296: what is l'?
Table 1 can be removed; your experiment names already encode the values of the parameters as listed.
Line 365: have you defined the "forgetting factor" anywhere yet? Is it the same one from Pham 1996?
Line 469: what range of values have you tested, and how many different experiments have you run for the tuning process beyond the 18 you showed? This will provide an idea of the cost of tuning as well.
Line 479: Would it help to show the results for Lorenz96 immediately after the description of its experiment design? You could have Section 4 focused on the Lorenz results and Section 5 on the BGC model results.
Citation: https://doi.org/10.5194/gmd-2023-170-RC1
AC1: 'Short reply on RC1', Simone Spada, 22 Dec 2023
We thank the Reviewer for the comments and suggestions, which highlight the need to state the goal and novelty of our manuscript more clearly and to improve its readability. We will surely take this into consideration in the revised version of the manuscript.
At the present stage, we would like to give a brief answer to the most important issues raised by the Reviewer, and leave further responses to his/her comments to a more detailed, point-by-point answer to be submitted once the second Reviewer has also provided her/his comments. Hereafter, the Reviewer's comments are in bold while our replies are formatted as normal text.
Review of "GHOSH v1.0.0: a novel Gauss-Hermite High-Order Sampling Hybrid filter for computationally efficient data assimilation in geosciences"
Summary: The paper presents an interesting idea of using high-order sampling to improve the accuracy of higher statistical moments of the ensemble, thereby achieving better skill in ensemble prediction. I was expecting to learn some details about the new approach, but to my disappointment I am still quite confused after reading the paper. In my opinion, the paper tries to achieve too many things at once: introduce the method mathematically and show experiment results in two different models. No wonder that you don't have enough room to provide the details. In the methodology, many steps in the formulas are stated as "can be proven" but with no guidance for readers to actually understand the logic (skipping too many steps in a proof). Ideally, when introducing a new method, I would expect a paper focused on the theory and illustrations using simple models, followed by other paper(s) utilizing the method in bigger models and describing how any additional implementation issues were solved. The current paper does the introduction poorly; I suggest the authors give it a thorough revision and at least address the following major issues.
Firstly, we would like to clarify the main objective of the work, since it appears this was not clear enough. We are introducing a new method (the GHOSH filter) that is focused on the use of a higher order of approximation to evaluate the ensemble mean in data assimilation applications. We are not evaluating the standard deviation, kurtosis, or higher statistical moments of the ensemble in the present work.
On the other hand, it is true that the equality of the statistical moments up to order h implies an approximation of the mean up to order h, as demonstrated in Appendix A and discussed later in the reply to the Reviewer’s comment 2.
Secondly, as highlighted by the Reviewer, we recognize that we have a lot of material, possibly too much for a single paper. However, we think that, in addition to the description of the novel method, two applications were necessary to highlight the novelties of the filter. In fact, the first application (Lorenz96) is a relatively simple case, commonly used when new DA methods are proposed, and it is used to compare GHOSH with a standard EnKF DA method (SEIK). The second application (a biogeochemical marine model of realistic complexity) is intended to demonstrate the feasibility of the new method in real geophysical applications.
Major Issues:
1. It's unclear what the actual size of the state vector is now that you are employing a weighted ensemble approach. When you state that you use 2r members to achieve 3rd-order sampling, as in the Wan & van der Merwe 2000 paper, where the members are paired as mean + r perturbations, does it mean you only need to store r+1 numbers?
The weights are computed according to A3, which is a nonlinear system itself. What is the computational complexity of this step? You claimed that the GHOSH filter is no more costly than 2nd-order EnKFs with the same ensemble size, but can you provide some complexity arguments for each step in your filter? For example, given ensemble size m, obs count p and model state size n, the ETKF has complexity m^2 p + m^3 + m^3 n, according to Tippett 2003 (doi: 10.1175/1520-0493(2003)131<1485:ESRF>2.0.CO;2). Even with a fixed ensemble size, the filter is doing additional steps (e.g. the weight computation). I'm not convinced that they are comparable.
As noted by the Reviewer, the GHOSH algorithm involves additional steps compared to other EnKFs (e.g. the ETKF). However, we want to point out that: (1) the nonlinear system that calculates the ensemble weights is solved only once (i.e., the weights do not change during the simulations that include the GHOSH assimilation filter); (2) the asymptotic computational complexity of each of these steps is less than m^3 n, which remains the largely dominating term in the asymptotic GHOSH computational cost (as in the ETKF or SEIK filters). In the Lorenz application we showed that GHOSH and SEIK have equivalent computational costs. In the later detailed answer we will address more specifically all the other aspects raised in the Reviewer's comment 1.
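The dominance of the m^3 n term can be checked with simple arithmetic. The sketch below (our own illustration with hypothetical problem sizes, not figures from the paper) evaluates the three ETKF cost terms quoted in the comment above:

```python
def etkf_cost_terms(m, p, n):
    """Leading asymptotic cost terms of an ETKF-style analysis step,
    following the m^2 p + m^3 + m^3 n scaling (Tippett et al., 2003).
    m: ensemble size, p: observation count, n: model state size."""
    return {"m^2 p": m**2 * p, "m^3": m**3, "m^3 n": m**3 * n}

# Hypothetical sizes loosely representative of a 3D ocean model
terms = etkf_cost_terms(m=24, p=100_000, n=10_000_000)
dominant = max(terms, key=terms.get)
print(dominant)  # -> m^3 n
```

Whenever n is several orders of magnitude larger than p/m (the usual situation for 3D geophysical state vectors), any extra per-analysis work that scales below m^3 n is asymptotically negligible.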
2. Fig. 1 provides some visual guide to understand the statement about "m sample points are required to represent a hth-order approximation of an error distribution in r-dimensional space". I can see that in 3D you need 4 points to span a volume (the covariance), and I can see that you need 4 degrees of freedom to provide a unique 3x3 covariance matrix. But from this point onward, I don't understand anymore.
First, the error distribution depicted is symmetric, meaning there is only the mean and identity covariance; there is not even correlation between variables in different dimensions. How can I visualize the higher-order moments (skewness)? Therefore, I don't see how the 4 points are giving an accurate 3rd moment in 2D; there is no skewness anyway in the picture.
As noted by the Reviewer, due to the symmetry of the error distribution, the third moment (skewness) is zero. The same holds for the discrete probability distribution represented by the 4 square-shaped points. Thus, as demonstrated in Appendix A, the 4 ensemble members approximate the mean of the error distribution up to order 3.
Second, why does projection from a high-dimensional to a low-dimensional space relate to a higher-order sampling in the low-dimensional space? Somehow these 2 members project to the same member in the lower dimension, which means they are paired in a way that projection will cancel out the perturbations and only leave the mean? But why is that equivalent to a weighted member, and what is the weight in this 3-point case anyway? Linking back to the 4 points in 3D, are the positionings only allowed to be squares with no rotation? In the 3D case the tetrahedron is allowed to rotate arbitrarily, but here in 2D if the square is rotated they project to 4 points in 1D.
We fully agree with the Reviewer: each rotation of the tetrahedron implies a different shadow (i.e., projection). Only a few possible rotations of the tetrahedron project a perfect square onto the xy plane, and even fewer rotations project a square that further projects into 3 points (instead of 4) on the x axis. The strength of GHOSH is exactly this: being able to find the right rotation to achieve the best possible projections. Or, equivalently, the GHOSH sampling chooses ensemble members matching high-order moments of the error distribution along some chosen directions (i.e., the principal components of the error), obtaining a better approximation of the mean along those directions.
Finally, why do three weighted samples in 1D give a 5th-order approximation? I don't see any moments in the sample distribution higher than the 2nd.
In the simple case of Fig. 1, the discrete distribution represented by the 3 points in one dimension has odd centered moments equal to zero for symmetry reasons (and the same holds for the standard normal error distribution), while the 2nd and 4th moments are different from zero (they are equal to 1 and 3, respectively, for the standard normal distribution). In order to ensure the matching of the moments up to order 5, symmetry is needed for the 3rd and 5th moments, while weights are necessary for matching the 2nd and 4th moments.
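This moment matching can be verified numerically. The three nodes and weights in question are those of the 3-point Gauss-Hermite quadrature rule in the probabilists' convention; the snippet below is our own minimal check, not code from the paper:

```python
import numpy as np

# 3-point Gauss-Hermite rule (probabilists' convention):
# nodes -sqrt(3), 0, sqrt(3); weights proportional to 1/6, 2/3, 1/6
nodes, weights = np.polynomial.hermite_e.hermegauss(3)
weights = weights / weights.sum()   # normalize to a probability distribution

# Standard normal moments E[x^k] for k = 0..5 are 1, 0, 1, 0, 3, 0
for k, mk in enumerate([1, 0, 1, 0, 3, 0]):
    assert abs(np.sum(weights * nodes**k) - mk) < 1e-12  # moments 0..5 match

# The 6th moment no longer matches (9 for the weighted points vs 15 for N(0,1)),
# so the approximation is exact up to order 5, as stated above.
assert abs(np.sum(weights * nodes**6) - 9) < 1e-12
print("moments 0-5 match the standard normal")
```

The symmetric node placement handles the odd moments, and the unequal weights handle the 2nd and 4th, exactly as described in the reply.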
At this point, I would rather you not show this figure at all, but replace Lines 89-108 with more step-by-step reasoning, or a formal proof, of statements such as "2r samples give a 3rd-order approximation of r-dimensional space".
The scope and contents of Fig. 1 become clearer to the reader considering that one of the novel aspects of GHOSH is a higher order of approximation of the mean, while it is not intended to provide higher-order moments. Figure 1 illustrates how the order of approximation of the mean is related to the subspace dimension thanks to the appropriate GHOSH sampling strategy.
In particular, in the top panel of Fig. 1, the square-shaped shadow (i.e., projection) of the 4 points chosen by the GHOSH sampling has, on the xy plane, statistical moments up to order 3 equal to those of the 3D standard normal distribution (shaded areas in Fig. 4). The equality of the statistical moments up to order 3 implies an approximation of the mean up to order 3, as demonstrated in Appendix A (this is taken for granted in Figure 1).
In other words, Fig. 1 shows, in the simple case of a 3D standard normal error distribution and 4 ensemble members, that the discrete probability distribution (represented by the ensemble members, yellow dots, suitably chosen by the GHOSH sampling) has some statistical moments in common with the 3D standard normal distribution. Indeed, as illustrated by moving from the top panel to the lower panels, the moments are the same up to order 2 in the xyz space (implying a second order of approximation of the mean, as in ETKF, SEIK and other second-order filters; Pham, 1996) and, thanks to the ensemble orientation provided by the GHOSH sampling strategy, they are the same up to order 3 (implying a third order of approximation of the mean) in the xy plane. Finally, the moments are the same up to order 5 (implying a fifth order of approximation of the mean) along the x direction (if appropriate weights are chosen, i.e. the GHOSH weights). The relation between the order of the matching statistical moments and the order of approximation of the mean is demonstrated in Appendix A.
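As a concrete counterpart to the square-shaped projection discussed above, the snippet below (our own illustration, not code from the paper) checks that 4 equally weighted points at the vertices of a square reproduce all the moments of the 2D standard normal up to order 3, but not beyond:

```python
import numpy as np
from itertools import product

pts = np.array(list(product([-1.0, 1.0], repeat=2)))  # square vertices (+-1, +-1)
w = np.full(4, 0.25)                                   # equal weights

def moment(i, j):
    """Moment E[x^i y^j] of the weighted 4-point distribution."""
    return float(np.sum(w * pts[:, 0]**i * pts[:, 1]**j))

# Order 1 (means) and all order-3 moments vanish by symmetry, matching N(0, I)
for i, j in [(1, 0), (0, 1), (3, 0), (2, 1), (1, 2), (0, 3)]:
    assert moment(i, j) == 0.0
# Order 2: identity covariance, also matching N(0, I)
assert moment(2, 0) == 1.0 and moment(0, 2) == 1.0 and moment(1, 1) == 0.0
# Order 4 fails (E[x^4] = 1 here vs 3 for the normal): the match stops at order 3
assert moment(4, 0) != 3.0
print("moments match up to order 3")
```

Any rotation of the square in the plane would match the same moments; the point of the GHOSH sampling, as described above, is to pick the orientation whose further projection also preserves higher-order matching along the principal directions.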
3. In your Lorenz96 experiments (Fig. 3), based on the ensemble spread compared to the error, the SEIK filter has basically diverged at around t=5. The climatological error is about 4 for the Lorenz96 system, and you can see that after t=5 the SEIK filter solution is as bad as a random draw from climatology. I would argue that you need a more stable setting for both SEIK and GHOSH here to establish a comparison, especially when we know that SEIK should work for the Lorenz96 system. For some cases when SEIK is stable, can you still show that the new GHOSH filter will achieve higher accuracy or at least perform as well as SEIK?
We thank the Reviewer for this comment, which can help us clarify some details of the Lorenz96 settings. Figure 3 provides one example of the 56,000 Lorenz96 experiments that have been carried out and that are summarized in Fig. 4. Moreover, Fig. 4 shows that SEIK was tested also in stable and convergent conditions, i.e., the green squares in the two top panels of Fig. 4. In addition, it is worth noting that the comparison between SEIK and GHOSH has also been tested considering the best tuning in terms of inflation for both filters (the "best" line in each panel of Fig. 4), always resulting in GHOSH working as well as SEIK or better.
4. In the verification of filter performance for both models, I mainly see the use of RMSE and ensemble spread as error metrics. This means you are still evaluating the first two moments of the error distributions. However, the GHOSH filter is motivated by its capability of capturing higher moments in the error. Why didn't you show any proof that it is behaving as expected? For example, in the Lorenz96 case you can compute the skewness and kurtosis of the error distribution and compare the two filters. This would truly be a proof of concept, rather than implying the skill from the forecast error in the ensemble mean (1st moment).
As discussed above, the main goal of the GHOSH filter is to achieve a higher approximation order on the ensemble mean, without a focus on higher-order moments. Thus, we think that RMSE is a sound metric to compare the GHOSH performance with respect to SEIK.
We would like to thank the Reviewer again, and we will go through more detailed answers to all his/her comments after receiving all the needed reviews in the GMD open discussion.
Best regards,
Simone Spada on behalf of all the coauthors
Citation: https://doi.org/10.5194/gmd-2023-170-AC1

RC2: 'Comment on gmd-2023-170', Anonymous Referee #2, 02 Feb 2024

RC3: 'Comment on gmd-2023-170', Anonymous Referee #3, 03 Feb 2024
Review of "GHOSH v1.0.0: a novel Gauss-Hermite High-Order Sampling Hybrid filter for computationally efficient data assimilation in geosciences" by S. Spada et al.
The manuscript introduces a higher-order ensemble sampling scheme and a corresponding ensemble Kalman filter analysis step. The sampling scheme utilizes ensemble weights to represent higher-order moments of the ensemble distribution without necessarily increasing the ensemble size. The analysis step is motivated by the error-subspace Kalman filter SEIK and utilizes the ensemble weights. Additional sampling steps are introduced during the forecast phase and after the analysis step in order to maintain the higher-order sampling to the prescribed order. After the introduction of the sampling and assimilation algorithm, the method is first assessed in comparison to the SEIK filter in twin experiments using the chaotic Lorenz96 toy model, which is frequently used to assess data assimilation methods. Subsequently, the new method is used in a realistic model application in which satellite chlorophyll observations are assimilated into the OGSTM-BFM biogeochemical model. The manuscript concludes that the method can result in better state estimates than the SEIK filter for the same ensemble size in the case of the Lorenz96 model. Further, it is concluded that using higher-order sampling is feasible in a realistic application and can result in better state estimates compared to using only second-order sampling in the biogeochemistry data assimilation application.
Before I come to the scientific assessment, I would like to point out that this manuscript does not seem to fit well into the scope of GMD. Presented is the development of a sampling and data assimilation method. This is neither a new model development, description, or assessment, nor the development of new assessment methods, which are the scope of GMD. The data assimilation is implemented and used with a toy model and a realistic ocean-biogeochemical model, but the focus is on the algorithm and no particular model-related developments are described. To this end, I recommend transferring the manuscript to a journal with a better-fitting scope. Within the journals of the EGU, this is Nonlinear Processes in Geophysics (NPG).
From the scientific viewpoint, the proposed algorithm is new, and the experiments show that it has the potential to improve the skill of ensemble data assimilation by accounting for higher-order moments in the ensemble distribution and hence improving the mean and covariance estimates used in the analysis step of the ensemble filter. However, the manuscript needs to be improved in several places. The description of the algorithm has to be refined and better explained. Further, the numerical experiments have to be organized in a way that the benefit of the new algorithm becomes clear in typical settings. The application to the 3D ocean-biogeochemical model seems to overload the manuscript, while it does not even compare the results to a standard algorithm. Overall, this requires a major revision of the manuscript.
Major comments:
On the derivation and mathematical treatment: At least for GMD, which is not a mathematical journal, the manuscript explains far too little of the algorithmic aspects. At several instances the authors just state 'can be proven', but omit any attempt to explain why some equation holds. For me this is clearly insufficient, and it is likely that readers cannot follow the descriptions. I do not recommend including proofs (given that GMD is not a mathematical journal), but the authors should sufficiently explain why some relationship or equation holds. This does not need to be long, but will help the readers tremendously, while currently the authors force the readers to 'just believe' what they have written. (This comment will likely also hold should the manuscript be transferred to NPG, since this is also not a strictly mathematical journal.)
Experiments with the Lorenz96 model: The first set of numerical experiments is conducted using the Lorenz96 model. The experiments show that the newly proposed GHOSH filter can yield smaller RMSEs by up to 66%. Given that this error reduction of the GHOSH filter relative to the SEIK filter is very large (I don't think that I have seen such a magnitude even in comparisons of fully nonlinear particle filters with EnKFs), I recommend being particularly careful in ensuring that the experiments are representative and comparable to previous studies. In particular, one has to ensure that this advantage of the GHOSH filter is not an artifact of the particular model and data assimilation configuration. In this respect I see several weaknesses, as the model configuration and implementation are rather untypical, the initialization of the ensemble is different from other studies even for the SEIK filter, and the experiments are far too short. I will comment in more detail on this further below.
Experiments with the realistic OGSTM-BFM model: In these experiments, in which real satellite chlorophyll data are assimilated, only different configurations of the GHOSH filter are assessed. Thus, a comparison to an EnKF like SEIK is missing. The result of the experiments is that using order 5 yields comparable RMSD for chlorophyll and phosphate but lower RMSD for nitrate, compared to using a lower order. The magnitude of this effect depends on the ensemble inflation, and for too strong an inflation the GHOSH filter also leads to a deterioration of nitrate compared to the control run. To this end, the higher-order sampling can yield a better result in the multivariate update for a particular choice of the forgetting factor. Unfortunately, the manuscript does not provide insights into why this happens.
Further, maps of chlorophyll and phosphate as well as Hovmoeller plots of chlorophyll and phosphate are discussed. However, these discussions do not provide any further insight into the GHOSH filter, but merely show that this filter can be successfully applied to this model. These results might be interesting for researchers interested in chlorophyll assimilation, but for a methodological study on this new filter method, the experiments provide very little insight. Only the first part including Table 2 seems to be relevant, but here a comparison to the SEIK filter is missing.
Given the complexity of the model and the expert knowledge a reader needs to appreciate the discussion of the effects on chlorophyll, nitrogen and phosphate, I recommend removing this experiment from the manuscript. In order to assess multivariate assimilation (which cannot be done with the Lorenz96 model), I rather recommend utilizing some other idealized multivariate model. Perhaps the two-scale Lorenz-2005 model could be a possible choice. For the methodological study it would be very important to actually assess the reasons why multivariate updates are better (if they are). Just showing that they are better in one particular model configuration, without understanding the reasons, has little value because one cannot generalize the result. Below I will not further comment on details in the sections on the OGSTM-BFM model.
Detailed comments:
Title: It is not clear why the authors give their algorithm a version number; this is very unusual. While I know that GMD requires version numbers when a certain model version is presented, this manuscript is not about a particular implementation of a model but about a numerical algorithm. This issue might again point to the fact that the manuscript is not well suited for GMD, as I commented before. Apart from this, I recommend adding the word 'ensemble' to the title to clarify that an ensemble filter is introduced.
Abstract: The abstract includes unnecessary details. E.g., the information on how many experiments were conducted with the Lorenz96 model ('two thousands', line 9) and which particular parameters were varied (line 12) is not suitable for the abstract. The abstract is also confusing in that 'GHOSH' is first mentioned as a sampling method while in line 14 it is then named a 'filter'. Perhaps one could better formulate this as introducing a filter and the corresponding sampling method.
The statement 'use of a higher order of convergence substantially improves the performance of the assimilation with respect to nitrate' (lines 20/21) is not valid in this form. What is valid is that the higher-order filter improved the nitrate estimate for the particular experiment, but this cannot be generalized in any way. The final sentence of the abstract on the computation time is superficial: the situation that most time is spent computing model forecasts does not need to hold in general. Instead, the execution time has to be compared for the actual filter analysis, including the resampling steps of the GHOSH filter. Please revise the abstract accordingly.
Introduction section:
 Unfortunately, this section contains various small issues and it is difficult to comment on all of them. I focus here on the most relevant:
The section includes invalid referencing. E.g., the list of references in lines 28-29 appears to be an arbitrary selection of studies that applied DA. Since there are many publications about applications of data assimilation, it would be a better choice to cite review papers that summarize the state of research instead of picking 'some' articles for which it is then unclear why the authors consider them particularly relevant.
 With regard to citing review papers, I strongly recommend citing them explicitly as reviews, e.g. using 'see, e.g.'. For example, Carrassi et al. (2018) is cited for 'modified implementations have been proposed for application to nonlinear ones' (line 35). This review paper did, however, not introduce the 'modified implementations' but reviews what has been published in other original studies. The same holds for the citation of Bannister (2017b), which is cited for 'EnVar' (line 38) but is a review of variational methods. (By the way, the second paragraph of the introduction does not seem to be relevant for this study on an ensemble filter method and could be omitted without loss of relevant information.) Further, for books it is common practice to cite the chapter where the particular information can be found (readers should not be required to read a whole book). Another case of invalid referencing is, e.g., citing Bocquet et al. (2010) for 'particle filter methods'. That paper discusses methods for non-Gaussian data assimilation, but it is not an original paper on particle filters. In any case, there are more recent review papers, like the cited review by van Leeuwen et al. (2019), which better cover the state of research.
 It is also important that the authors provide references for the claims they include in the text. E.g., in lines 30/31 it is written: 'the use of DA has been steadily increasing in the last decade and it is now ubiquitous in earth sciences, also thanks to the unprecedented and always increasing availability of both data and computational capability.' I am not sure where the authors got the insight that the increasing computational capability and data availability led to an increased use of DA; I would suppose that there should be some review paper examining this. Actually, most data types that are currently assimilated in the ocean and biogeochemistry (e.g. sea surface height, sea surface temperature, chlorophyll) have been available for more than two decades and were already being assimilated more than 10 years ago. Also, the computing power was sufficient more than 10 years ago even to perform ensemble data assimilation, as is evident from data assimilation studies published 15-20 years ago. The major change appears to be that model complexity and resolution have steadily increased. These facts obviously contradict the authors' statement.
 Related to this, the authors state that recently the use of ensemble algorithms has been proposed 'thanks to the scalability of parallel implementations' (line 37). Then, Evensen (1994) is cited for the EnKF. I cannot really see how a paper published 29 years ago is 'recent'. It is further not evident that these methods were proposed because of the scalability of parallel implementations: Evensen (1994) proposed the EnKF as a method to use a sampled covariance matrix as a dynamic estimate of the uncertainty, in order to advance the extended Kalman filter that Evensen had used before (see Evensen, 1992, 1993), not as a method solving a scalability issue. Please revise the text to avoid such potentially misleading statements.
 There are also further questionable statements.
 E.g., lines 49/50: 'second order sampling methods provide only a second order approximation of the model'. Actually, the ensemble in ensemble KFs is not meant to approximate the model; it is used to represent the model uncertainty, i.e., the probability distribution. One should be very clear about this.
 Then, in lines 50/51: 'the second order approximation is more effective the closer the ensemble members are to each other'. Here, I suppose the authors imply that the nonlinear effects are smaller if the perturbations of the ensemble members are small, and hence the second-order approximation should hold better. While this seems natural, Rainwater and Hunt (2013) found that a smaller ensemble spread did not improve the results of an EnKF. Thus, the statement does not seem to be valid in this form.
 Related to this is the statement "a relatively large ensemble spread is often required in data assimilation applications" (line 52). I am not aware that this is true, and the authors would need to provide a reference for this statement.
 Also the statement "second order methods use an ensemble of r + 1 members to span an r-dimensional error subspace" (line 57) is not supported by the literature. Actually, the perturbed-observations EnKF by Evensen (1994) is second order (because it uses the Kalman filter analysis step), but it is not aimed at spanning an r-dimensional subspace. Using a particular sampling with r+1 members was introduced by Pham with the SEEK and SEIK filters, but not widely adopted. Also, the ensemble initialization used with the (L)ETKF scheme is not explicitly second order. This also holds for the notion of the 'error subspace', which was further discussed by Nerger et al. (2005), while other filter methods, like the ETKF or newer variants of the EnKF (Evensen 2003/2004), were developed without this notion.
 Please note that the reference Pham (1996) is incomplete, and it is likely also not a valid reference according to citation standards. It is a technical report and was never officially published in a peer-reviewed journal. While I am aware of this report, I could no longer find it on the internet. To this end, I recommend replacing it by a valid reference to a peer-reviewed publication. The first real publication of the SEIK filter seems to be Pham et al. (1998b), different from what is cited in the manuscript. Since this article is mainly in French and only has a shortened text in English, a better reference would be Pham (2001).
 The introduction fails to explain what the authors mean by 'high order', or by 'order' at all. Given that GMD is not a mathematical journal, this should be explained.
 Please be careful when discussing sampling in contrast to filtering. E.g., lines 65/66 state: "we propose a novel weighted ensemble method based on a new highorder sampling, that provides ensemble mean estimates of order higher than 2". Actually, while the existing ensemble Kalman filter methods use the second-order analysis scheme (treating only means and covariances), they do not require that the ensemble be sampled with second-order accuracy (in fact, unless one uses second-order exact sampling, the ensemble generation does not consider the sampling order).
 Also related to lines 65/66, I am a bit irritated by the implied meaning of 'order'. I understand it as the accuracy in sampling the PDF. However, the mean is then the first moment of the PDF; how can this moment be accurate to 'order higher than 2'? I am likely missing something here, but many readers will likely wonder as well. To this end, I see the clear need to explain the terms 'high order' and 'higher order' so that the readers know how to understand the discussions throughout the manuscript.
 In this respect, please also check whether the distinction between 'high order' and 'higher order' is required and, if so, explain the difference.
 I will stop with details on the Introduction at this point. Overall, the authors should carefully revise the introduction. They should ensure that the statements are true and that they are sufficiently supported by the literature. This implies providing references for claims, as described before. It further implies being careful when citing review papers, so that it is clear that no original works are being cited (this also holds in other sections; e.g., the references to Carrassi et al. (2018) in line 90 and line 264 are also not valid: Carrassi is neither an original reference for deterministic square-root sampling methods nor for the forgetting factor).
Figure 1 and lines 93-104: The figure is difficult to understand, and I am not sure if it is really sufficient to visualize the 3rd-order sampling in 2 dimensions or the 5th-order sampling in 1 dimension.
 Please include a clearer explanation of the shaded circular regions (in the top panel) in the caption (the caption states 'isosurfaces', but of what?).
 In the 2D and 1D views I cannot see that a 3rd- or 5th-order approximation is shown. It is also not obvious that the 2D view is a projection of the tetrahedron. In the way this is plotted, the lower triangle looks like it is parallel to the x-y plane, so that the projection would be a triangle with one sample in its center.
 With regard to lines 102/103, I cannot follow why 3 weighted members should be able to represent an approximation of order 5. With weights one can obviously shift the mean, variance and skewness, but it is unclear how the kurtosis could be expressed with 3 members (perhaps even with 4 members). Perhaps this confusion is due to the fact that it is not clear how the moments would be computed in this case.
 It would be useful for the readers if the figure were replotted with a better perspective. Likewise, the text should better explain how the orders are expressed in the sampling. Instead of stating 'it can be proven' in line 98, it would be useful to explain why this is the case, and perhaps explain how a skewed (e.g. non-Gaussian) distribution would be represented (I do not think that this is a projection of the tetrahedron representing the distribution, but a sampling of the projection of the actual distribution). Since the example uses a Gaussian distribution whose higher moments are generally zero (for odd orders) or have a constant value (for even orders), it seems difficult to actually sketch the representation of higher moments. Showing how higher moments could be represented for a non-Gaussian case could be useful. It might also be useful to show the example for a Gaussian case that has different variances per direction.
 Linked to this is the sentence in lines 103/104 stating 'what we have obtained is a 3dimensional second order weighted ensemble which is a thirdorder approximation in the xy plane and a fifth order approximation along the x axis.' For me, this statement is not evident from the figure. It might be that the sketched samplings in the x-y plane and along the x-axis are of this higher order, perhaps because they represent vanishing higher orders, but this is not visible from the dots shown in the figure.
 For the non-mathematical readers of GMD it might also be useful to avoid the term 'standard' for the normal distribution and instead mention the variance explicitly.
Lines 110/111 mention that a higher-order sampling of a Gaussian distribution can be obtained by the Gauss-Hermite quadrature rule. At this point readers will likely wonder why a higher-order sampling of a Gaussian is required when one can fully represent it by its mean and covariance, and second-order exact sampling provides this. On the other hand, it might be obvious that a higher-order sampling helps for non-Gaussian distributions, but in this case the Gauss-Hermite quadrature does not seem to hold. It would be good if this aspect, which is closely linked to the motivation of the method, were better explained. Perhaps it becomes clearer once the authors explain 'high/er order' as commented on before.
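For reference, the quadrature property the manuscript presumably refers to can at least be checked numerically in one dimension: three Gauss-Hermite nodes and weights reproduce the moments of a standard normal exactly up to order 5. This is a minimal illustration of the quadrature order only, not of the full GHOSH construction:

```python
import numpy as np

# Probabilists' Gauss-Hermite rule: nodes/weights for the weight function exp(-x^2/2)
nodes, weights = np.polynomial.hermite_e.hermegauss(3)
weights = weights / weights.sum()  # normalize so the weights sum to 1

# Weighted moments; exact for polynomials up to degree 2*3 - 1 = 5
moments = [float(np.sum(weights * nodes**k)) for k in range(6)]
# Standard normal moments for k = 0..5: 1, 0, 1, 0, 3, 0
```

A check along these lines in the manuscript (or its supplement) would help readers see what 'order 5 with 3 weighted members' means in the 1-D Gaussian case, while the non-Gaussian case would still need explanation.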
Lines 112-114 describe that the higher-order sampling can be performed for a limited number of directions, while for the other directions second-order sampling can be used. Here I wonder whether this holds after the application of the model. If we have, e.g., a polynomial function that includes terms that 'mix' (e.g. by multiplication) the directions used for higher-order sampling with those with second-order sampling, one obtains mixing effects. With this, the clear separation into higher-order and second-order directions should no longer hold. Does the resampling during the forecast correct these effects? Please explain these effects in a clear way.
Equation (1): I am somewhat lost here. There are many indices which influence each line of the equation system, and I cannot really disentangle them. I think it would be useful to show some more lines, maybe the first 3, in addition to the general one. Is the subscript 'j_1,...,j_\xi' just one number or a list of indices? How many equations are in the system? (This relates to line 137, where it is stated that a large value of h requires a smaller s or larger r, which is qualitative but does not provide an indication of actual values for r/s/h.)
Line 133: Here the 'Gaussian case' is mentioned. Usually we consider a Gaussian to be a distribution that is fully described by the first two moments. Why should a higher-order sampling be relevant for this?
Line 130: It is stated: "Such probability distribution must be uncorrelated, normalized and have 0 mean". Please explain why this 'must be' the case.
Line 138: I cannot follow the explanation of how \Omega_h is defined. It is described as a "matrix with coordinates" — what does this mean? Is it possible to provide a proper definition, e.g. in the form of an equation?
Equation (4): This equation is also difficult. It combines \Omega_h with an orthogonal random matrix of size (r+1)x(s+1) and some other orthogonal (r+1)x(r-s) matrix. The construction is unclear to me. Since the second matrix has r-s columns, does one need to ensure that it is orthogonal to the first matrix? Further, Pham (1996) does not seem to provide a scheme to generate such a matrix, but only a matrix of size (r+1)x(r). The readers are left to speculate what the correct matrices might be and how they might be generated. If in doubt, it should be possible to provide a method to generate the matrices in the Appendix.
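One standard way to generate such a second block, assuming the intent is that its columns are orthonormal and orthogonal to the first block (an assumption on my side, since the manuscript does not state it), would be to draw a Gaussian random matrix, project out the span of the first block, and orthonormalize:

```python
import numpy as np

rng = np.random.default_rng(0)
r, s = 10, 3

# First block: (r+1) x (s+1) with orthonormal columns (stands in for the
# random orthogonal matrix in Eq. (4); the actual construction is unclear)
omega1 = np.linalg.qr(rng.standard_normal((r + 1, s + 1)))[0]

# Second block: (r+1) x (r-s); project a random draw onto the orthogonal
# complement of omega1's columns, then orthonormalize with QR
g = rng.standard_normal((r + 1, r - s))
g -= omega1 @ (omega1.T @ g)
omega2 = np.linalg.qr(g)[0]
```

If this is indeed what is meant, a short description like this (or pseudocode) in the Appendix would remove the ambiguity.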
Equation (6): This equation shows that the filter relies on the covariance information contained in the matrix L. This looks peculiar, since usually we consider higher orders to reach beyond the covariance matrix, which is the second moment of the PDF. For me it is unclear how higher orders can be taken into account if only the first two moments are used here. Please provide an explanation for this particular formulation of the algorithm.
Line 178 and following: Line 185 states that the forecast resampling is performed at each time. It is unclear whether this sampling is done only once after a whole forecast period or after each time step of the time-stepping scheme (the definition of 'time t_i' is not clear). Figure 2 might indicate that it is each time step, but this is not clear. Further, we know that resampling likely violates balance constraints that are contained in the model states of physical models. Does the resampling performed here also have such effects?
Line 224: The statement "other T can be explored without affecting the algorithm (e.g., see Nerger et al. (2012))." is incorrect. Actually, the cited study shows that the projections in the SEIK filter are inconsistent and that for consistency a different projection is required. This obviously does 'affect the algorithm'.
Equation (26): The GHOSH filter seems to use the same back-projection from the error subspace to the state space as the SEIK filter. Nerger et al. (2012) have shown that this is inconsistent. Given that the authors are aware of the results of Nerger et al. (2012), I wonder why they decided to develop a new filter scheme on the basis of a filter that was shown to be mathematically inconsistent. A consistent form could easily be built following the ESTKF introduced by Nerger et al. (2012), which is likewise an error-subspace formulation.
Equation (31): Using a model error covariance matrix Q in this form can be inconsistent for nonlinear models unless it is applied at each time step. This relates to my earlier comment requesting a clarification of whether the resampling is performed at each model time step or only at those time steps at which an analysis step is also computed. If it is not applied at each step, please provide an additional explanation of why this linearized form of applying model errors is used.
Lines 264/265, Eq. (32): It is stated "The last term in equation (32) is one of the novel element of the present work.". This statement is not fully true. Actually, the original papers about the SEIK and SEEK filters (Pham et al., 1998a,b) include the model error covariance matrix Q. There, a projection with L is also used. A difference is that Pham et al. use a projection on Q, while here a projection on the inverse of Q is used. (Given that Q is a large matrix in realistic applications, the projection by Pham et al. is likely computationally more efficient.)
Line 262: I cannot follow the argumentation that the addition of Q is a 'hybridization' (the manuscript states this already in the introduction and argues for it again around line 603 in the discussion section). Hybridization as explained, e.g., in the reviews by Bannister (2017a, 2008) is a combination of a time-dependent covariance matrix with a covariance matrix representing climatology; the combination of both is used to represent the state error covariance matrix P. This is different from adding a model error covariance matrix, which represents dynamic model uncertainties. Further, the addition of the model error covariance matrix is neither new nor specific to the GHOSH filter. It could likewise be applied in the SEIK filter, as the publications by Pham et al. (1998a,b) showed, even though this approach seems to be hardly used nowadays. To this end, I recommend avoiding the potentially misleading term 'hybrid'.
Lines 284/285: On the forecast resampling of the GHOSH filter it is stated "it takes into account the model error effects in equations (31) and (32), which are otherwise neglected". The claim that, without the addition of the model error covariance matrix Q in the sampling, the effect of model errors would be neglected is not fully true. In modern applications of ensemble Kalman filters, the model error is typically represented by stochastic perturbations during the model integration instead of adding a model error covariance matrix at the end of the forecast phase. (The perturbations are expected to allow for a better representation of the effect of model nonlinearity on the model errors compared to the linearized effect of adding a model error covariance matrix.)
Lines 314/315: "A reasonable choice is a polynomial weight function" — what makes this particular choice 'reasonable' and why was this form chosen? Actually, the proposed function in Eq. (41) seems to be uncommon (most common is the 5th-order polynomial of Gaspari and Cohn (1999), which is also a valid covariance function). Is the definition in Eq. (41) complete (it seems that f>0 for d>d_l, which should not happen)?
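For reference, the Gaspari-Cohn function is compactly supported (it vanishes beyond twice its half-width `c`), which avoids exactly the f>0 issue noted above. A sketch of the standard form:

```python
import numpy as np

def gaspari_cohn(d, c):
    """5th-order piecewise-rational localization function of Gaspari & Cohn
    (1999, Eq. 4.10); compactly supported, zero for d >= 2c."""
    x = np.atleast_1d(np.asarray(d, dtype=float)) / c
    f = np.zeros_like(x)
    m = x <= 1.0
    f[m] = (-0.25 * x[m]**5 + 0.5 * x[m]**4 + 0.625 * x[m]**3
            - (5.0 / 3.0) * x[m]**2 + 1.0)
    m = (x > 1.0) & (x < 2.0)
    f[m] = (x[m]**5 / 12.0 - 0.5 * x[m]**4 + 0.625 * x[m]**3
            + (5.0 / 3.0) * x[m]**2 - 5.0 * x[m] + 4.0 - 2.0 / (3.0 * x[m]))
    return f
```

Comparing Eq. (41) against this function (or simply adopting it) would clarify why the chosen weight function is 'reasonable'.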
Section 4.1 Lorenz96 experiments
As mentioned before, there are several weaknesses which should be corrected. In the current form, the configuration seems to be particularly problematic for the SEIK filter, e.g. due to too-short experiments and too little inflation, so that the strong improvement of the GHOSH filter relative to the SEIK filter is caused by the particular configuration.
 Perhaps it would be useful to include 'Lorenz96 model' in the section title. Given that this is a toy model, the experiments are necessarily twin experiments.
 The model is configured with state dimension 62. While this is a valid size, this value is very unusual and I am not aware of other publications using it. Can the authors give a reason for this particular choice, which makes it difficult, if not impossible, to compare the results to previous studies? The properties of different ensemble Kalman filters applied to the Lorenz96 model have been assessed before; thus they can be considered known. The SEIK filter was not often used, but it was applied to the Lorenz96 model, e.g., in Nerger et al. (2012). It would support the results of the manuscript if the authors could present results consistent with such previous studies.
 The duration of the experiment is only 20 time units. With observations available at intervals between 0.1 and 0.3 time units, this results in only 200 down to 66 analysis steps. As it is known that the filters sometimes converge slowly, the experiments likely mainly assess how fast the filters converge rather than how well they perform after convergence. In addition, the cases with longer observation intervals are expected to be quite unstable, and one needs to average over a sufficient number of analysis steps to get interpretable results. To this end, the length of the experiments is much too short. E.g., Sakov and Oke (2006) run over 11000 time units, while Nerger et al. (2012) run over 2500 time units. Using such longer experiments would yield comparability to previous studies with the Lorenz96 model. Actually, I experimented with the code provided on Zenodo, and the case EnsSize=31, delta_obs=0.1, forget=0.7 and t_span=[20.0, 100.0] yields the same RMSE level for both filters in the second half of the experiment. This also happens for delta_obs=0.15, forget=0.7, but the spin-up for the SEIK filter is slower. Thus, with longer experiments, the asymptotic difference in the filter results becomes very small also for the shorter forecast periods, in contrast to what is shown in Fig. 4. This seems to change for delta_obs>=0.2, where the SEIK filter yields larger RMSE and tends to diverge for EnsSize=31 and lower. For EnsSize=63, the RMSE seems to be nearly identical (this might be indicated by the pink color in Fig. 4, but the color scale only allows a qualitative estimate because it scales with 1/3, 1/2, 1, so that the values at the ticks for <1 are unclear).
 The initial ensemble is sampled using a diagonal error covariance matrix. This is very unusual, since typically second-order exact sampling is applied to a covariance matrix that is estimated from the model dynamics. I would like to point out that when we apply a toy model like Lorenz96, we usually attempt to make the experiments as realistic as possible, and a diagonal matrix would not be used in real applications. If a diagonal matrix is used to generate an ensemble of r+1 states, only r elements of the state vector will be perturbed. (Usually these would be the first r elements of the state vector, but in the provided Python code the order is changed due to the use of a sorting routine, which resorts the prescribed value 5 in all elements, so the perturbation is distributed differently; e.g., elements 16 and 62 were perturbed in the case of ensemble size 3. This distribution was not changed when repeating the experiments, which might also influence the results.) Using the common approach of a matrix sampled from a model trajectory would yield a matrix that is not diagonal. Thus, the eigenvectors will not be the unit vectors, and more elements of the state vector will be perturbed. This should lead to more realistic and more representative data assimilation experiments. Hence, I strongly recommend using this common approach.
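The common initialization could be sketched as follows (the random trajectory here is only a placeholder for an actual model run, and the deterministic Omega-like matrix is one simple choice for second-order exact sampling in the style of Pham's SEIK initialization):

```python
import numpy as np

rng = np.random.default_rng(42)
n, T, r = 62, 500, 8                   # state dim, trajectory length, rank

traj = rng.standard_normal((n, T))     # placeholder for a real model trajectory

xb = traj.mean(axis=1)
A = (traj - xb[:, None]) / np.sqrt(T - 1)
U, sv, _ = np.linalg.svd(A, full_matrices=False)
L = U[:, :r] * sv[:r]                  # leading modes of the sampled covariance

# Second-order exact sampling: columns of H are orthonormal and orthogonal
# to the vector of ones, so the r+1 members have mean xb and (with the
# 1/(r+1) normalization used in SEIK) covariance exactly L @ L.T
H = np.linalg.qr(np.eye(r + 1) - 1.0 / (r + 1))[0][:, :r]
ens = xb[:, None] + np.sqrt(r + 1) * L @ H.T
```

With a covariance estimated from a trajectory, the eigenvectors are dense, so all state elements receive initial spread, unlike with a diagonal matrix.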
 The model is run using the Python library solver 'solve_ivp'. As also described by the authors as a possible reason for the long run time and the instability with the SEIK filter, this solver does not use a fixed time step size. Given that the Lorenz96 model is deterministically chaotic, its behavior will vary when the time step size is varied. To this end, it would be better to use, e.g., a classical Runge-Kutta 4th-order implementation with a fixed time step size (for which 0.05 is a typical value). This is also the typical implementation, as used, e.g., by Nerger et al. (2012).
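A minimal fixed-step implementation of this kind (standard Lorenz96 with forcing F=8 and classical RK4 at dt=0.05) would be only a few lines:

```python
import numpy as np

def lorenz96(x, F=8.0):
    # dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F, with cyclic indices
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt=0.05, F=8.0):
    # Classical 4th-order Runge-Kutta with a fixed step size
    k1 = lorenz96(x, F)
    k2 = lorenz96(x + 0.5 * dt * k1, F)
    k3 = lorenz96(x + 0.5 * dt * k2, F)
    k4 = lorenz96(x + dt * k3, F)
    return x + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
```

This removes the adaptive-step variability of 'solve_ivp' and makes runs directly comparable to the earlier Lorenz96 literature.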
 For ensemble size 15 the filters likely diverge (from Fig. 4 the RMSE appears to be around 4, and Fig. 1 of Nerger et al. (2012) shows divergence for ensemble sizes <18, which is related to the value of the forcing parameter in the model). There is no point in comparing the performance of diverging filters: both filters fail, one a little less than the other. This likewise holds for the statement on ensemble size 7 in line 495. In fact, filter divergence might also happen for ensemble size 31 for the observation intervals 0.25 and 0.3 for both filters. The RMS error for the SEIK filter seems to be above 3, which might be high enough to indicate filter divergence. For the GHOSH filter, the errors seem to be between 2.5 and 3, which might also indicate divergence. (Unfortunately, the colorbar makes it difficult to see such details. Please consider using a colorbar that varies not only brightness but also hue.) In any case, it is easy to check for filter divergence: if the estimated error, i.e. the ensemble spread, is much smaller than the true error (RMSE), the filter has diverged. I recommend performing such a check on all results. Further, if the RMSE is not less than the long-term standard deviation of the model dynamics (which seems to be about 3.6), the data assimilation has obviously failed.
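Such a check could be sketched as follows (the factor-3 threshold is an illustrative choice, not a standard):

```python
import numpy as np

def diverged(ens, truth, factor=3.0):
    """Flag likely filter divergence by comparing the true error (RMSE of the
    ensemble mean) with the estimated error (mean ensemble spread).
    ens: (n, m) ensemble of m members; truth: (n,) true state."""
    mean = ens.mean(axis=1)
    rmse = np.sqrt(np.mean((mean - truth) ** 2))
    spread = np.sqrt(np.mean(ens.var(axis=1, ddof=1)))
    # Estimated error much smaller than actual error -> filter lost the truth
    return rmse > factor * spread
```

Applying such a diagnostic to every cell of Fig. 4 (and masking diverged cases) would make the comparison interpretable.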
 Particular cases seem to be the observation intervals 0.2 and 0.25. Here it might be that the SEIK filter diverges while the GHOSH filter still converges. This is a notable difference, which could be interesting to analyze. (However, it might be influenced by the effect described in the next comment.)
 In the cases with observation intervals of 0.2 and larger, the GHOSH filter shows smaller RMSEs which are still at a high level. Since the analysis errors are shown, this raises the question of whether the sampling in the GHOSH filter leads to a higher ensemble variance than in the SEIK filter. This would basically allow for a larger increment in the analysis step. I recommend checking whether this holds, since this effect could be unrelated to a representation of higher orders but mainly an effect of more randomness.
 For the observation intervals 0.1, 0.15 and 0.2, Fig. 4 leaves the impression that the forgetting factor of 0.7 is still too large for the SEIK filter. The optimal inflation is found where the RMSE is minimal and increases again for even smaller values of the forgetting factor; this point should be visible in the figure.
 There are also some important pieces of information missing: firstly, I suppose that the experiment is run without localization, but this is never stated in the manuscript. In addition, the choice of the actual solver in 'solve_ivp' is not mentioned. Also, the observation error variance should be explicitly stated.
 The numerical experiments use random numbers. I recommend re-initializing the seed for the random numbers at the beginning of each experiment. Only in this case can one obtain reproducible results. One can run with different seeds to exclude that a particular interpretation results from them.
Section 5.1, title: As for Section 4.1, I recommend mentioning the Lorenz96 model in the section title.
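Returning to the seeding point above, the recommended practice amounts to no more than the following (function and parameter names are hypothetical):

```python
import numpy as np

def run_experiment(seed, ens_size=31):
    # Re-initialize the generator at the start of each experiment so the run
    # is exactly reproducible; vary `seed` to check robustness of the results
    rng = np.random.default_rng(seed)
    init_perturbation = rng.standard_normal(ens_size)  # stands in for the real setup
    return init_perturbation
```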
Line 490: Please define 'best'. I suppose the result with the smallest RMSE is meant.
Line 494: The results for ensemble size 7 were not shown "since in this case SEIK and GHOSH behave very similarly, showing very poor performances". This is the known behavior that, without localization, the filters diverge if the ensemble is too small. However, this likewise holds for ensemble size 15, which should also be removed. There is no point in discussing RMSEs of diverged filters.
Fig. 3: The figure shows that the first two variables are very well estimated by both filters (and partly better by the SEIK and partly by the GHOSH filter) between times ~1 and 3.5 for the first variable, and even longer for variable 2. In contrast, the RMSE is still large, at around a value of 3. This indicates that most of the other variables of the model have a much larger error. This likely supports my earlier statement on the ensemble initialization: for an ensemble of size r+1, only r elements of the state are perturbed in the ensemble and hence cannot be corrected by the data assimilation. Only if the model dynamics act for a long enough time will ensemble spread build up, but by that time the ensemble spread might already have become too low to achieve convergence of the filter. On this point, experiments with the Python code show that a larger inflation (e.g. forget=0.6) helps to obtain convergence. The resampling performed by the GHOSH filter might distribute the spread faster due to its random effects. If this happens, the advantage of the GHOSH filter would mainly be induced by the particular initial sampling and might not be an effect of higher orders. Please check whether such an effect is present or can be excluded.
Lines 498-501, Fig. 4: It is described that the advantage of the GHOSH filter is particularly large if the observations are very frequent. This result looks surprising to me. We know that the nonlinearity of the DA problem increases if the forecast phase is longer, yet here the higher-order sampling, which should be relevant for non-Gaussian ensemble distributions, is particularly good if the forecast phase is short. Also, the experiments with the Lorenz96 model show a smaller improvement of the GHOSH filter relative to the SEIK filter for longer forecasts, despite the fact that these increase the nonlinearity of the DA problem. Further, it looks surprising that the 'best' result of the GHOSH filter in the case of the largest ensemble size of 63 is only slightly better than that of the SEIK filter, while for ensemble size 31 the difference is larger. Here, the statement in lines 500/501 that the largest improvement is obtained when "the filter can take into account a high dimensional error subspace (i.e., the ensemble size is large)" does not seem to hold. If so, why is the improvement of the GHOSH filter smaller for the largest ensemble? A larger ensemble reduces the sampling errors, and this should be the case for all orders. Here it looks like the reduced sampling errors in the second-order sampling have a stronger effect than the higher-order sampling. This effect should be carefully discussed. For the figure, it would be useful to clearly mark cases in which the filters diverge.
line 512: The RMSE of the GHOSH filter is mentioned to be slightly more reduced than that of the SEIK filter. Are these differences statistically significant?
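Significance could be checked, e.g., with a paired test over the repeated twin experiments. The RMSE series below are synthetic placeholders; only the procedure is meant:

```python
import numpy as np

# Paired comparison of per-experiment RMSEs (synthetic numbers for illustration):
rng = np.random.default_rng(0)
rmse_seik = 1.0 + 0.1 * rng.standard_normal(200)
rmse_ghosh = rmse_seik - 0.02 + 0.05 * rng.standard_normal(200)

# Paired t statistic on the per-experiment differences;
# |t| well above ~2 would indicate a significant difference.
diff = rmse_seik - rmse_ghosh
t = diff.mean() / (diff.std(ddof=1) / np.sqrt(diff.size))
print(round(t, 2))
```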
Computing cost: The computing cost is discussed briefly for the OGSTM-BFM model, but no actual numbers are provided. However, it should be possible to provide timings for the Lorenz-96 model case, where one can compare the time of the SEIK filter analysis step with the timing of the GHOSH analysis step and sampling. This cost should be considered separately from the cost of the model. (The model is likely cheaper than the filters in this case, since the forecasts are only between 2 and 6 time steps of a Runge–Kutta scheme.)
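Such timings could be obtained with a few lines around the existing calls. The two analysis functions below are stand-ins for the actual routines in the code; only the separate-timing procedure is the point:

```python
import time
import numpy as np

def time_section(fn, *args, repeats=100):
    """Return the mean wall-clock time of fn(*args) over several repeats."""
    t0 = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - t0) / repeats

# Hypothetical stand-ins for the filter routines; the real calls would be
# the SEIK analysis and the GHOSH analysis-plus-sampling, timed apart
# from the model forecast.
def seik_analysis(ens):
    return ens - 0.1 * ens.mean(axis=1, keepdims=True)

def ghosh_analysis_and_sampling(ens):
    return ens - 0.1 * ens.mean(axis=1, keepdims=True)

rng = np.random.default_rng(0)
ens = rng.standard_normal((40, 32))      # Lorenz-96 state x ensemble size
t_seik = time_section(seik_analysis, ens)
t_ghosh = time_section(ghosh_analysis_and_sampling, ens)
print(f"SEIK analysis: {t_seik:.2e} s, GHOSH analysis+sampling: {t_ghosh:.2e} s")
```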
Lines 613–618: This paragraph contains several overstatements. The error reduction that was achieved mainly shows that the filter method works successfully, but it does not show an advantage compared to, e.g., the SEIK filter (which was not applied here). The statement that non-assimilated variables are not degraded cannot be generalized. In addition, it is incorrect, as Table 2 shows: for daily assimilation, there is degradation if a forgetting factor of 0.8 is used. For weekly assimilation, there is degradation for h=2 and h=3 if a forgetting factor of 0.5 is used. (This comment is no longer relevant if the 3D experiments are removed as recommended.)
Line 623: "it has been shown that the GHOSH increased accuracy reduces instabilities and numerical divergence". This statement is not valid in this form. For the Lorenz96 model in the particular implementation used here, the statement holds. However, this finding might be specific for the time stepping methods used here and it cannot be generalized in any form.
Lines 656–659: Here, implications of the 'higher polynomial order' of the GHOSH sampling are discussed. Linked to the methods section 2: the relevance of the polynomial order was never explained, but only mentioned. As such, it is unlikely that many readers can follow the discussion. Given that neither GMD nor NPG are mathematical journals, I recommend explaining the relevant aspects already in Sec. 2.
Lines 663–667: "this strategy might lead to inaccurate estimations because the model error covariance matrix Q_i is not taken into account in the interpolation". This statement again does not take into account that there are other possibilities to apply model errors in the ensemble integration. When stochastic perturbations are applied, the model error is taken into account in the 'interpolation' that is done at the analysis time. This then also holds for the nonlinear observation operators mentioned in line 666.
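A minimal sketch of this alternative, with a hypothetical toy model step and an assumed diagonal model-error covariance Q = q_std^2 I (all names here are illustrative):

```python
import numpy as np

def forecast_with_model_error(ens, model_step, q_std, rng):
    # Advance each member, then add a stochastic perturbation drawn from
    # N(0, q_std^2 I); Q is thereby represented in the ensemble spread
    # that the analysis sees, without any explicit interpolation of Q.
    ens = np.stack([model_step(ens[:, j]) for j in range(ens.shape[1])], axis=1)
    return ens + q_std * rng.standard_normal(ens.shape)

rng = np.random.default_rng(0)
ens = rng.standard_normal((10, 8))
step = lambda x: 0.9 * x              # toy linear "model" for illustration
fc = forecast_with_model_error(ens, step, q_std=0.2, rng=rng)
print(fc.shape)  # (10, 8)
```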
Lines 671–672: Here, a relation of the constant ensemble weights in the GHOSH filter to the weights in particle filters is drawn. I cannot see any relation, because the weights have a different meaning: in the GHOSH filter, the weights are used to represent higher-order moments, while in the particle filter the weights represent likelihoods. I recommend removing this statement. (Please note that the reference to Bocquet (2010) is invalid, as was discussed for the Introduction.)
Lines 674–675, 698–699: Here it is argued about the computing cost, which was also discussed in the results section for the OGSTM-BFM model, where no actual numbers were provided. Actually, it should be possible to provide timings for the Lorenz-96 model case, where one can compare the time of the SEIK filter analysis step with the timing of the sampling plus analysis step of the GHOSH filter. This cost should be considered separately from the cost of the model. As mentioned before, simply arguing that the time for the model is much higher than that for the filter and sampling is superficial and should be avoided.
Lines 688689: "the GHOSH filter showed improved capacity to take into account nonlinearities by a lesser need of inflation with respect to SEIK". Here, I don't see how inflation should be related to nonlinearity. This was never assessed in the manuscript and I don't think this is a common relationship. As such, the statement should be removed.
Discussion section 6: It would be useful if the authors included a discussion relating the GHOSH filter to other existing filters that are aimed at nonlinear data assimilation, pointing out the differences. Partly this is done by mentioning particle filters in line 672. However, there are filters that are closer to the GHOSH filter: e.g., the Gaussian mixture filter (Hoteit et al., 2008; Stordal et al., 2011) seems to be related, but obviously different. Perhaps the nonlinear ensemble transform filter (Toedter and Ahrens, 2015) also shares some similarities, given that it is also a transform filter.
Code availability:
I was able to download the codes. Unfortunately, I was not able to find the GHOSH filter in the Fortran implementation. I found an option for the higher-order sampling, but neither the place where it is applied nor the analysis step of the GHOSH filter. Maybe it would also be useful to provide a cleaner code (e.g. in the Python code there are many commented-out lines) and also some inline documentation (there are essentially no comments in the codes, which makes reading them very difficult). In any case, this would just be 'nice'; paper acceptance would not depend on it.

Small things:
- line 162: 'identical' > 'identity'
- line 248: \Omega_i is not computed in Eq. (6), but only used there. Please rephrase.
- line 335: 'by' > 'from'
- line 492: 'lines' > 'rows' (in general, in all descriptions of Fig. 4)
- line 611: 'up to 3 times better RMSE': 'better' is not well defined. It is better to quantify this as, e.g., 'up to X % lower RMSE'.
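The suggested phrasing follows directly from the RMSE ratio; the numbers below are illustrative only:

```python
# Converting an RMSE ratio into a percentage reduction: a "3 times better"
# RMSE corresponds to roughly a 66.7 % lower RMSE.
rmse_ref = 3.0     # hypothetical reference (e.g. SEIK) RMSE
rmse_new = 1.0     # hypothetical "3 times better" RMSE
reduction = (1 - rmse_new / rmse_ref) * 100
print(f"{reduction:.1f}% lower RMSE")  # -> 66.7% lower RMSE
```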
References
Pham, D. T., Verron, J. and Gourdeau, L. 1998b. Singular evolutive Kalman filters for data assimilation in oceanography. C. R. Acad. Sci., Ser. II 326(4), 255–260
Pham, D. T. 2001. Stochastic methods for sequential data assimilation in strongly nonlinear systems. Mon. Wea. Rev. 129, 1194–1207.
Rainwater, S. and Hunt, B. 2013. Ensemble data assimilation with an adjusted forecast spread. Tellus A 65, 19929.
Hoteit, I., Pham, D. T., Triantafyllou, G. and Korres, G. 2008. A new approximate solution of the optimal nonlinear filter for data assimilation in meteorology and oceanography. Mon. Wea. Rev. 136, 317–334.
Stordal, A. S., Karlsen, H. A., Nævdal, G., Skaug, H. J. and Valles, B. 2011. Bridging the ensemble Kalman filter and particle filters: The adaptive Gaussian mixture filter. Comput. Geosci. 15, 293–305.
Toedter, J. and Ahrens, B. 2015. A second-order exact ensemble square root filter for nonlinear data assimilation. Mon. Wea. Rev. 143, 1347–1367.
Citation: https://doi.org/10.5194/gmd-2023-170-RC3
Model code and software
PythonDA Simone Spada https://doi.org/10.5281/zenodo.8219559
OGSTMBFMGHOSH Simone Spada https://doi.org/10.5281/zenodo.10019279