=== General comment
This study evaluates the applicability of the ensemble Kalman filter Python package, pyPDAF, to high-dimensional systems by comparing its computational performance with that of the Fortran-based PDAF. The authors applied both packages to a simple quasi-geostrophic coupled atmosphere-ocean model. However, while the study aims to demonstrate pyPDAF’s applicability to high-dimensional systems, the model's dimensionality (10^4–10^6) is far smaller than that of operational high-dimensional systems, which typically have 10^8–10^9 grid points (see Comment #2).
Additionally, the ensemble size used in this study is only 16, which is insufficient for a system with dimensions of 10^4–10^6. For instance, the 40-variable Lorenz-96 model requires an ensemble size of at least 40 with ETKF and at least 10 with LETKF for reliable results. The small ensemble size chosen by the authors undermines the validity of their conclusions regarding pyPDAF’s performance. It appears that the authors selected this ensemble size to avoid the computational expense of the eigenvalue decomposition in ETKF, which is applied to a square matrix whose dimension equals the ensemble size. However, operational systems commonly use ensemble sizes on the order of 10^2, making it crucial to assess the sensitivity to ensemble size in this study (see Comment #18).
The study also suffers from additional shortcomings, including:
- Inaccurate or misleading terminology (see Comment #1).
- Unrealistic experimental settings, particularly in the choice of observation errors and timescale definitions (see Comments #14–16).
- Inconsistent presentation and substandard English writing quality, which hinder readability and clarity (see Minor Comments).
Given these significant issues, this manuscript does not meet the standards required for publication in an international journal. To address these concerns, the authors need to:
1. Align the model dimensionality and ensemble size with the stated objectives.
2. Improve the experimental design, particularly in terms of observation errors, timescales, and methodological rigor.
3. Correct terminological inaccuracies and enhance the clarity of descriptions.
4. Substantially improve the quality of English writing and presentation.
Without these major revisions, the study's claims and findings cannot be considered robust or scientifically valid. The manuscript is not suitable for publication in its current form.
=== Major comment
#1) L10 and elsewhere: Please remove the term "model" from phrases like "model forecast" and "model analysis." While such phrasing may appear in other papers, it is likely to be an incorrect use of terminology.
#2) L15 and elsewhere: Since operational centers use data assimilation systems with a dimensionality as high as 10^8–10^9, the idealized data assimilation system employed in this study, with a dimensionality of 10^4–10^6, cannot be considered high-dimensional.
#3) L19–20: The description of data assimilation (DA) as “observations constraint forecast and results in analyses” is not accurate. DA combines simulations and observations using dynamical systems theory and statistical methods. This process provides optimal estimates (i.e., analyses), enables parameter estimation, and allows for the evaluation of observation networks. A description along these lines would be more appropriate.
#4) L34: The term “user-supplied functions” in Section 1 is unclear, although the authors may provide a detailed explanation later.
#5) First and second paragraphs in Section 2: The Kalman filter can be derived as a minimum variance estimator without assuming Gaussian distributions. However, in most cases, the ensemble Kalman filter (EnKF) assumes Gaussian distributions in the forecast fields, so that the analysis field also follows a Gaussian distribution. For non-Gaussian variables, such as rainfall, transformations like the logarithm function might be applied. The authors' discussion of ensemble data assimilation appears insufficient. Since detailed discussion of data assimilation methods is not the focus of this study, I suggest removing these descriptions.
#6–8) L92–94:
- Please cite two typical EnKFs: the Ensemble Adjustment Kalman Filter (EAKF; Anderson 2001) and the Ensemble Square Root Filter (EnSRF; Whitaker and Hamill 2002).
- Please remove the mention of the deterministic ensemble Kalman filter (DEnKF), as it neglects quadratic terms such as KRK^T in its derivation, and therefore is no longer considered an EnKF.
- Please specify which EnKF and data assimilation (DA) methods are included in PDAF and pyPDAF.
#9) L123–124: Even in twin experiments and Observing System Simulation Experiments (OSSEs), true values are never used to generate forecast ensembles.
#10) L148–149: Since the amount of work required to implement additional Fortran and Python code depends on the users' skill level, this sentence may not be appropriate.
#11) L301–303: Please clarify why the transformation errors decrease as the number of grid points increases.
#12) L306–307: The minimum required ensemble size depends on various factors, such as sampling errors and system characteristics. Please provide a detailed explanation of how the unstable subspace dimension influences the ensemble size.
#13) L318–321: The descriptions of the amplitude of the observation error variance are inconsistent.
#14) Subsection 4.2: Please clarify why different variables with different units can be directly compared. Non-dimensional temperature and stream function are typically normalized in a quasi-geostrophic equation.
#15–17) Figure 5:
- Generally, observation error variances are much smaller than forecast error variances from free runs. The observation error variances used in this study are quite large because they are dominated by the last 100 years of the free run, in which the forecast error variances are quite large. Since there are no model errors in the perfect twin experiments in this study, observation errors should be set much smaller than the current values. For reference, forecast errors approach around five over time in the Lorenz-63 and -96 models, while observation errors are prescribed to be one (i.e., about 20% of the forecast error).
- Please present temperature and stream function values using their respective units to enable comparison with practical systems.
- A model timestep of 0.1, corresponding to 16 minutes, is likely to be too short. Model drift continues even after 200 years, and the error doubling time appears to be around 100 years, which is much longer than the error doubling time of 2 days in the atmospheric system (Lorenz, 1996). Please clarify how the authors define the timescale.
#18) The minimum ensemble size is 10 in the LETKF-based Lorenz-96 system with 40 variables. However, the authors set the ensemble size to 16 members in a system with dimensions on the order of 10^4–10^6. Please demonstrate that an ensemble size of 16 members is sufficient for this system.
#19) In ETKF and LETKF, the most computationally expensive part is the eigendecomposition of a square matrix whose dimension equals the ensemble size. This study intentionally reduces the ensemble size to 16, but operational systems typically require an ensemble size on the order of 10^2 to achieve sufficient accuracy. Therefore, it is necessary to assess the sensitivity to ensemble size using a high-dimensional system in order to investigate the computational performance of PDAF and pyPDAF.
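For illustration only, the following is a minimal NumPy sketch of a global ETKF analysis step in which the eigendecomposition of the ensemble-size square matrix appears explicitly. It assumes a linear observation operator and a diagonal observation error covariance. The function names etkf_analysis and time_eigh are hypothetical, and the code is not the PDAF or pyPDAF implementation. Timing the eigendecomposition for several ensemble sizes is one simple way to gauge the sensitivity requested above.

```python
import time

import numpy as np


def etkf_analysis(Xf, y, H, R_diag):
    """Global ETKF analysis (sketch, assumed linear H and diagonal R).

    Xf: forecast ensemble, shape (n, N); y: observations, shape (p,);
    H: observation operator, shape (p, n); R_diag: obs error variances, shape (p,).
    Returns the analysis ensemble, shape (n, N).
    """
    n, N = Xf.shape
    xf_mean = Xf.mean(axis=1, keepdims=True)
    Xp = Xf - xf_mean                       # forecast perturbations (n, N)
    Yp = H @ Xp                             # observation-space perturbations (p, N)
    d = y - (H @ xf_mean).ravel()           # innovation of the ensemble mean
    C = Yp.T / R_diag                       # Yp^T R^{-1}, shape (N, p)
    A = (N - 1) * np.eye(N) + C @ Yp        # symmetric positive definite, (N, N)
    evals, evecs = np.linalg.eigh(A)        # the N-by-N eigendecomposition
    A_inv = (evecs / evals) @ evecs.T       # A^{-1} from the eigenpairs
    w_mean = A_inv @ (C @ d)                # weights for the mean update
    # Symmetric square root of (N - 1) A^{-1}: weights for the perturbations
    W = (evecs * np.sqrt((N - 1) / evals)) @ evecs.T
    return xf_mean + Xp @ (w_mean[:, None] + W)


def time_eigh(ensemble_size, repeats=5, seed=0):
    """Average wall-clock time of eigh on a random SPD matrix of the given size."""
    rng = np.random.default_rng(seed)
    B = rng.normal(size=(ensemble_size, ensemble_size))
    A = B @ B.T + ensemble_size * np.eye(ensemble_size)
    start = time.perf_counter()
    for _ in range(repeats):
        np.linalg.eigh(A)
    return (time.perf_counter() - start) / repeats
```

Calling time_eigh for, say, 16, 64, and 256 members gives a rough indication of how this step scales; the actual cost in PDAF and pyPDAF should be measured within the authors' own system.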
#20) L333–336: If the authors intend to maintain a dynamical balance in the initial conditions, it would be better to extract them from the free run rather than applying second-order exact sampling. Multiplying by a factor smaller than one may degrade the dynamical balance.
#21) L326: Please clarify why a daily assimilation interval is chosen in this study, given that a 6-hour interval is typically used in the atmospheric data assimilation community.
#22) L345 and elsewhere: The term "significant" should only be used if statistical tests are applied.
#23) Please clarify whether the authors calculate RMSEs and ensemble spread for forecasts or analyses throughout the manuscript (e.g., forecast RMSEs).
#24) L366–367: Please clarify how the authors control error growth and forecast errors.
#25) L368–369: The explanation of analysis accuracy at grid points without observations is not reasonable, as the authors have not considered the impact of ensemble correlation.
#26–27) Figure 6:
- Please show not only the RMSEs but also the ensemble spread for comparison at the same time.
- Please apply a paired t-test to compare the RMSEs between weak and strong coupled data assimilation.
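As a minimal sketch of the requested test, assuming the two experiments provide RMSE values at the same verification times, the paired t statistic could be computed as below. The function name paired_t_statistic is hypothetical. Because RMSE time series are usually autocorrelated, the samples may need to be subsampled or the degrees of freedom reduced before interpreting the result.

```python
import numpy as np


def paired_t_statistic(rmse_a, rmse_b):
    """Paired t statistic for two RMSE series verified at the same times.

    Returns (t, degrees_of_freedom). The p-value can then be taken from a
    Student t distribution, e.g. with scipy.stats.ttest_rel on the same inputs.
    """
    a = np.asarray(rmse_a, dtype=float)
    b = np.asarray(rmse_b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape or a.size < 2:
        raise ValueError("inputs must be 1-D series of equal length >= 2")
    diff = a - b                            # paired differences
    n = diff.size
    std_err = diff.std(ddof=1) / np.sqrt(n)  # standard error of the mean difference
    return diff.mean() / std_err, n - 1
```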
#28) The term "error" typically refers to an instantaneous error, whereas "RMSE" represents the statistical expectation of errors. Please use these terms appropriately to clarify the distinction between them.
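To make the distinction concrete, the sketch below separates the instantaneous error of the ensemble mean from the spatially averaged RMSE and the ensemble spread. It follows one common convention and assumes an ensemble array of shape (members, grid points). All function names are hypothetical, and the authors' definitions may differ.

```python
import numpy as np


def instantaneous_error(ensemble, truth):
    """Error of the ensemble mean at each grid point at a single time."""
    return ensemble.mean(axis=0) - truth


def rmse(ensemble, truth):
    """Root-mean-square of the ensemble-mean error over all grid points."""
    return float(np.sqrt(np.mean(instantaneous_error(ensemble, truth) ** 2)))


def ensemble_spread(ensemble):
    """Square root of the grid-averaged ensemble variance (unbiased estimate)."""
    return float(np.sqrt(np.mean(ensemble.var(axis=0, ddof=1))))
```

Plotting rmse and ensemble_spread together for both forecasts and analyses would also address Comments #23 and #26.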
#29) L389: Please clarify what is meant by "transient atmosphere processes."
#30) L390–394: Please demonstrate that the ocean exhibits a 60-year timescale in the stream function field, which results in minimum RMSEs at a 60-year smoothing window.
#31–32) Subsection 5.2:
- Figures 8 and 9 show computational times at the analysis timestep on a logarithmic scale, making it difficult to directly compare the differences between PDAF and pyPDAF. Please clarify these differences by providing the ratio.
- Please include a description of the total computation time for one assimilation cycle, including both the forecast and analysis steps.
#33) L431: The term "very" is subjective. Please describe this in a more objective manner.
#34) L497: This sentence is inconsistent because the computational times for pyPDAF and PDAF are different. Please revise it to reflect the results obtained in this study.
=== Minor comment
#35) L5: "exists" → "are"
#36) L5 and elsewhere: "need" → "demands"
#37) L15-16: Incorrect spelling of LEKTF (Local Ensemble Transform Kalman Filter).
#38) L19 and elsewhere: "weather and climate" → "atmosphere and ocean"
#39) L29: Please spell out "DAPPER".
#40) L38: Please specify which models are coupled.
#41) L39 and elsewhere: There are no URLs, although the authors mention the date of last access.
#42) L74: "initialization" → "initial"
#43) L78: Incorrect grammar.
#44) L97: "counter" → "mitigate"
#45) L119: "ensemble-based 3DVar" → "ensemble variational data assimilation"
#46) L137 and elsewhere: "observation vectors and error covariance matrix" → "observations and observation error covariance matrix"
#47) L139: Please specify the "direct" observation operator.
#48) L142: "would be" → "is"
#49) L179: Please add the last access date.
#50) L266: "coupled" → "implemented", "implemented" → "written", "that is coupled directly" → "and is implemented"
#51) When connecting two sentences, please insert a comma before "and" to enhance readability.
#52) The use of "respectively" seems incorrect. A comma should be added before "respectively". For example, insert "each" before "Fortran" and remove "respectively" in L268.
#53) Equation (2): Please add an explanation for "n".
#54) L315: Please specify the meaning of "ensure … model state".
#55) L329: Please specify a forgetting factor. Is this the same as the relaxation parameter in the relaxation-to-prior perturbation and spread methods?
#56) L338: Please remove "generally".
#57) L357–358: Please check the meaning of the sentence.
#58) L464: "respectively" should be added after "l2g state".
I quickly reviewed this paper and found that critical issues exist as follows: