A Python interface to the Fortran-based Parallel Data Assimilation Framework: pyPDAF v1.0.2

Chen, Yumeng; Nerger, Lars; Lawless, Amos S.

doi:https://doi.org/10.5194/gmd-18-8235-2025

Articles | Volume 18, issue 21

© Author(s) 2025. This work is distributed under
the Creative Commons Attribution 4.0 License.

Development and technical paper | 05 Nov 2025

Final revised paper (published on 05 Nov 2025)
Preprint (discussion started on 11 Jun 2024)

Interactive discussion

Status: closed

RC1: 'Comment on egusphere-2024-1078', Anonymous Referee #1, 12 Jun 2024

I quickly reviewed this paper and found that critical issues exist as follows:

- Incorrect terminology even in abstract (e.g., model analysis)
- No explanation of math symbols even in Eq. (1) (e.g., p_i)
- Larger forecast/analysis RMSEs than the prescribed observation errors in Fig. 4 (i.e., filter divergence occurs)

Therefore, there is a high possibility that this paper does not satisfy the standards of international journals.

Citation: https://doi.org/10.5194/egusphere-2024-1078-RC1

RC2: 'Comment on egusphere-2024-1078', Anonymous Referee #2, 03 Jul 2024

Summary:

The authors introduced a new Python interface for the PDAF software. The idea of implementing a Python wrapper for a sophisticated Fortran library is much welcomed, since coding in Python is much easier and it will allow more rapid development, especially for new models written in Python. However, I found the authors missed the opportunity to convince the readers in this paper that pyPDAF will provide them with tools to quickly implement DA in their Python models. Instead, a lot of focus is put on details such as second-order exact sampling (in SEIK) and the comparison of weakly and strongly coupled DA. As a user, I only see some hints of what I need to do to use pyPDAF to build a DA system, but I don't know exactly how after reading the paper. With the benchmarking results the authors seem to prefer the Fortran-based PDAF anyway. The test case dimension is very small compared to typical applications, so there is a question whether the Python-Fortran type conversion overhead is significant if pyPDAF is applied to large-dimensional real models. Overall, I feel that the authors didn't promote the new software with convincing enough results.

Major issues:

1. While Figs 1 and 2 provide the overview of software architectures, the readers will not understand exactly what's needed from their side to build a typical DA system fully based on Python. Can you provide an example with a typical cycling DA experiment setup (initial ensemble generation, ensemble forecasts, computing observation priors, collecting/distributing the state, the assimilation algorithm, etc.), and highlight the functions where users can either code their own version or use something readily available from the PDAF core? The OMI is also quite confusing: what functions does it provide? I feel that you have omitted these details because they are available either in the PDAF documentation or in previous papers, but listing the details here can convince readers more effectively that it is "easy" to build a system using pyPDAF.
2. There are excessive details on the second-order exact sampling, which I found not relevant for the topic of this paper. You can use SEIK to demonstrate that the software works, but there are many other options in PDAF that are also available. You should definitely save the page limit for something more important. The same thing applies to the comparison between the weakly and strongly coupled DA. The main theme of the paper should be 1. providing the details of how pyPDAF is designed, 2. what its advantage is over other Python DA prototypes, 3. how to use it to build a Python DA system, then 4. some results from a test case. Currently numbers 2 and 3 are largely missing.
3. The efficiency benchmark in Figure 8 is performed for a really small model (129x129x3) and ensemble size (16), and the number of observations is also really small (17x17?). What the figure is conveying is that coding in Python introduced additional overhead and the resulting software is much slower, so using the Fortran code is better? I would argue otherwise: for the test case chosen here, if it only takes ~0.02 seconds per cycle, I would much prefer coding in Python since it will save me days of work compared to coding in Fortran. It doesn't matter if it's 0.02 seconds or 0.005 seconds; both are trivial compared to the "human overhead".
You miss the opportunity here to demonstrate that pyPDAF is a good alternative to PDAF: it should be close to the PDAF efficiency despite the Python-Fortran conversion overhead, while greatly reducing the human overhead in coding. We all know how the ETKF scales with problem size: it is roughly O(Ne^3 + Ne^2 Nobs) per state variable. Arguably, if the problem size is much larger and the computation spent on the DA problem increases, the overhead of the Python-Fortran object conversion can become trivial in proportion to the whole cost. Isn't that a better story to tell?
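
The scaling argument in item 3 can be made concrete with a rough cost model. The sketch below is illustrative only: it takes the O(Ne^3 + Ne^2 Nobs) estimate quoted above at face value with all constants dropped, and it assumes a hypothetical fixed Python-Fortran conversion overhead (`OVERHEAD_OPS`) that does not grow with the problem size. Neither the function names nor the numbers come from the paper.

```python
def etkf_cost(n_ens: int, n_obs: int) -> int:
    """Leading-order operation count, O(Ne^3 + Ne^2 * Nobs), constants dropped."""
    return n_ens**3 + n_ens**2 * n_obs


def overhead_fraction(n_ens: int, n_obs: int, overhead_ops: float) -> float:
    """Share of the total cost spent on a fixed (assumed) conversion overhead."""
    return overhead_ops / (etkf_cost(n_ens, n_obs) + overhead_ops)


# Hypothetical overhead, expressed in the same arbitrary operation units.
OVERHEAD_OPS = 1.0e5

# Small case mirrors the sizes queried above; the large case is invented.
small = overhead_fraction(n_ens=16, n_obs=17 * 17, overhead_ops=OVERHEAD_OPS)
large = overhead_fraction(n_ens=100, n_obs=10_000, overhead_ops=OVERHEAD_OPS)
print(f"small case: {small:.3f}, large case: {large:.6f}")
```

Under these assumptions the overhead share drops as Ne and Nobs grow. If the conversion cost itself scales with the state dimension, the conclusion would need to be rechecked against measured timings.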
Minor issues:

Line 51, "The Python tool only has a Python interface to a few PDAF routines", do you refer to EAT? And what's the difference between this "Python interface" and pyPDAF?
Line 54, "...can facilitate code development thanks to ...", maybe you mean "easier code development", because it is still possible to write Fortran code, only a bit more time-consuming.
Line 56, "...written onto a disk", do you mean the model restart files?
Line 65, "can improve dynamically balanced analysis" needs a bit of rephrasing; how about "can improve dynamical balance in the analysis"?
Line 77, "This allows for an embarrassingly parallel... does not increase with the ensemble size", this statement is a bit too general. The embarrassingly parallel variant is the LETKF, where each state variable can be analyzed separately. Some other EnKF variants are trickier to parallelize. Also, in the ETKF the computational cost does increase with ensemble size, O(Ne^3); I guess you mean "does not increase with state dimension" at the end of the statement.
Line 81, this sentence is a repeat of the first sentence.
Line 86, "In practice computational resources limit the feasible ensemble size", are you referring to the fact that model forecasts cost a lot so that one cannot run huge ensemble forecasts?
Lines 105, 108, you've defined PDAF earlier, so why repeat the full name over and over?
Line 110, smoother, 3DVar and other non-linear filters: these are not introduced earlier; either mention them in the introduction or add some references here.
Lines 113-120, why is the detail on second-order exact sampling provided here? PDAF provides not only SEIK but also a lot of other filter types; the SEIK-specific details should be moved to later, maybe to the experiment design.
Line 131, the OMI is introduced here in one sentence; can you provide more details? How does the user utilize the functions provided in OMI? Do they import them through the pyPDAF interface or do they have to write their own Cython code to call their Fortran versions?
Line 134, "replaced by model restart files", this is not trivial to implement, how exactly can this be done? Since this paper only compared two online DA systems using PDAF and pyPDAF, I'm not sure implementing the offline DA is even relevant here.
Figure 2: There are several levels of "user supplied" functions, in both Python and the C interface. This is confusing. As a user using pyPDAF, do I need to code in Python and compile the package, or do I have to also write Cython code? Or, is it just two options?
Line 138, "Due to" -> "Thanks to"
Line 141, "...also allows for an efficient code development and modifications..." this sentence needs some work. I get the first half that pyPDAF allows you to build an online DA system for a Python model. But why does the second half relate to the first half?
Line 143, "...before performing an optimised implementation for high-dimensional Fortran-based models", there are Python models that are high-dimensional with well optimised numerics, I don't see the point here why using pyPDAF for a high-dimensional system is not possible now for a prototypical system.
Line 149, "pyPDAF fully supports the parallel features", can you provide more details on how the MPI features are utilized in the Python interface? In Fig. 2, is every function run with MPI, and is the Python code run with mpi4py?
Figure 4: the error time series seems not to reach a steady state; it keeps increasing and there is a sign of exponential growth towards the end. Is the filter actually stable in time?
Line 265, "16 members...ETKF without spatial localization", given the results in Fig. 4 maybe you want to add some localization to stabilize the filter.
Line 278, this is not true for the final 100 years, errors are larger than spread.
Figure 8: you didn't provide information on all the subroutines, what does "init dim obs" mean?
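
Several of the comments above compare the RMSE with the ensemble spread (Fig. 4, Line 278). As a reminder of what is being compared, here is a minimal sketch of both diagnostics for a single time step. The function names and toy numbers are made up for illustration and are not taken from pyPDAF or the manuscript; the spread is assumed to be the square root of the grid-averaged ensemble variance.

```python
import math
from statistics import fmean


def rmse(ens_mean: list[float], truth: list[float]) -> float:
    """Root-mean-square error of the ensemble mean over all grid points."""
    return math.sqrt(fmean((m - t) ** 2 for m, t in zip(ens_mean, truth)))


def spread(ensemble: list[list[float]]) -> float:
    """Square root of the grid-averaged ensemble variance (unbiased, 1/(Ne-1))."""
    n_ens = len(ensemble)
    variances = []
    for values in zip(*ensemble):  # one tuple of member values per grid point
        mean = fmean(values)
        variances.append(sum((v - mean) ** 2 for v in values) / (n_ens - 1))
    return math.sqrt(fmean(variances))


# Toy ensemble: 3 members, 2 grid points (hypothetical values).
ensemble = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
truth = [2.5, 3.0]
ens_mean = [fmean(values) for values in zip(*ensemble)]

print(rmse(ens_mean, truth), spread(ensemble))
```

An ensemble is usually regarded as well calibrated when the two numbers are of similar size; an RMSE that grows well beyond the spread, as queried for the final 100 years, would point to an under-dispersive ensemble.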

Citation: https://doi.org/10.5194/egusphere-2024-1078-RC2

AC1: 'Comment on egusphere-2024-1078', Yumeng Chen, 30 Sep 2024

Dear reviewers,
Thank you for the comments on the manuscript. Please find the point-by-point response letter attached.
Kind regards,
Yumeng Chen, Lars Nerger, Amos Lawless

Citation: https://doi.org/10.5194/egusphere-2024-1078-AC1

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

AR by Yumeng Chen on behalf of the Authors (03 Oct 2024): Author's response, Author's tracked changes, Manuscript

ED: Referee Nomination & Report Request started (22 Nov 2024) by Shu-Chih Yang

RR by Anonymous Referee #1 (04 Dec 2024)

Suggestions for revision or reasons for rejection

=== General comment
This study evaluates the availability of the ensemble Kalman filter Python package, pyPDAF, for high-dimensional systems by comparing its computational performance with the Fortran-based PDAF. The authors implemented both packages using a simple quasi-geostrophic atmosphere-ocean coupled model. However, while the study aims to demonstrate pyPDAF’s applicability to high-dimensional systems, the model's dimensionality (10^4–10^6) is far smaller than that of operational high-dimensional systems, which typically have 10^8–10^9 grid points (see Comment #2).
Additionally, the ensemble size used in this study is only 16, which is insufficient for a system with dimensions of 10^4–10^6. For instance, the 40-variable Lorenz-96 model with ETKF and LETKF requires an ensemble size of at least 40 and 10, respectively, for reliable results. The small ensemble size chosen by the authors undermines the validity of their conclusions regarding pyPDAF’s performance. It appears that the authors selected this ensemble size to avoid the computational expense of eigenvalue decomposition for the ensemble-size square matrix in ETKF. However, operational systems commonly use ensemble sizes of 10^2, making it crucial to assess the sensitivity to ensemble size in this study (see Comment #18).
The study also suffers from additional shortcomings, including:
- Inaccurate or misleading terminologies (see Comment #1).
- Unrealistic experimental settings, particularly in the choice of observation errors and timescale definitions (see Comments #14–16).
- Inconsistent presentation and substandard English writing quality, which hinder readability and clarity (see Minor Comments).

Given these significant issues, this manuscript does not meet the standards required for publication in an international journal. To address these concerns, the authors need to:

1. Align the model dimensionality and ensemble size with the stated objectives.
2. Improve the experimental design, particularly in terms of observation errors, timescales, and methodological rigor.
3. Correct terminological inaccuracies and enhance the clarity of descriptions.
4. Substantially improve the quality of English writing and presentation.

Without these major revisions, the study's claims and findings cannot be considered robust or scientifically valid. The manuscript is not suitable for publication in its current form.

=== Major comment
#1) L10 and elsewhere: Please remove the term "model" from phrases like "model forecast" and "model analysis." While such phrasing may appear in other papers, it is likely to be an incorrect use of terminology.

#2) L15 and elsewhere: Since operational centers use data assimilation systems with a dimensionality as high as 10^8–10^9, the idealized data assimilation system employed in this study, with a dimensionality of 10^4–10^6, cannot be considered high-dimensional.

#3) L19–20: In data assimilation (DA), the description that “observations constraint forecast and results in analyses” is not accurate. DA combines simulations and observations using dynamical systems theory and statistical methods. This process provides optimal estimates (i.e., analyses), enables parameter estimation, and allows for the evaluation of observation networks. These explanations would be more appropriate.

#4) L34: The term “user-supplied functions” in Section 1 is unclear, although the authors may provide a detailed explanation later.

#5) First and second paragraphs in Section 2: The Kalman filter can be derived as a minimum variance estimator without assuming Gaussian distributions. However, in most cases, the ensemble Kalman filter (EnKF) assumes Gaussian distributions in the forecast fields, leading to the analysis field following the Gaussian distribution. For non-Gaussian assimilation, such as rainfall, transformations like the logarithm function might be applied. The authors' discussion on ensemble data assimilation appears insufficient. Since detailed discussions on data assimilation methods are not the focus of this study, it is suggested to remove these relevant descriptions.

#6–8) L92–94:
- Please cite two typical EnKFs: the Ensemble Adjustment Kalman Filter (EAKF; Anderson 2001) and the Ensemble Square Root Filter (EnSRF; Whitaker and Hamill 2002).
- Please remove the mention of the deterministic ensemble Kalman filter (DEnKF), as it neglects quadratic terms such as KRK^T in its derivation, and therefore is no longer considered an EnKF.
- Please specify which EnKF and data assimilation (DA) methods are included in PDAF and pyPDAF.

#9) L123–124: Even in twin experiments and Observing System Simulation Experiments (OSSEs), true values are never used to generate forecast ensembles.

#10) L148–149: Since the amount of work required to implement additional Fortran and Python code depends on the users' skill level, this sentence may not be appropriate.

#11) L301–303: Please clarify why the transformation errors decrease as the number of grid points increases.

#12) L306–307: The minimum required ensemble size depends on various factors, such as sampling errors and system characteristics. Please provide a detailed explanation of how the unstable subspace dimension influences the ensemble size.

#13) L318–321: The descriptions of the amplitude of the observation error variance are inconsistent.

#14) Subsection 4.2: Please clarify why different variables with different units can be directly compared. Non-dimensional temperature and stream function are typically normalized in a quasi-geostrophic equation.

#15–17) Figure 5:
- Generally, observation error variances are much smaller than forecast error variances from free runs. The observation error variances used in this study are quite large, as they are dominated by the last 100 years with quite large forecast error variances in the free run. Since there are no model errors in the perfect twin experiments in this study, observation errors should be set much smaller than the current values. For reference, forecast errors approach around five over time in the Lorenz-63 and -96 models, while observation errors are prescribed to be one (i.e., about 20% of the forecast errors).
- Please present temperature and stream function values using their respective units to enable comparison with practical systems.
- A model timestep of 0.1, corresponding to 16 minutes, is likely to be too short. Model drift continues even after 200 years, and the error doubling time appears to be around 100 years, which is much longer than the error doubling time of 2 days in the atmospheric system (Lorenz, 1996). Please clarify how the authors define the timescale.

#18) The minimum ensemble size is 10 in the LETKF-based Lorenz-96 system with 40 variables. However, the authors set the ensemble size to 16 members in a system with dimensions on the order of 10^4–10^6. Please demonstrate that an ensemble size of 16 members is sufficient for this system.

#19) In ETKF and LETKF, the most computationally expensive part is applying eigen-decomposition to a square matrix of size equal to the ensemble size by ensemble size. This study intentionally reduces the ensemble size to 16, but operational systems typically require an ensemble size on the order of 10^2 to achieve sufficient accuracy. Therefore, it is necessary to assess the sensitivity to ensemble size using a high-dimensional system in order to investigate the computational performance of PDAF and pyPDAF.
#20) L333–336: If the authors intend to maintain a dynamical balance in the initial conditions, it would be better to extract them from the free run rather than applying second-order exact sampling. Multiplying by a factor smaller than one may degrade the dynamical balance.

#21) L326: Please clarify why a daily assimilation interval is chosen in this study, given that a 6-hour interval is typically used in the atmospheric data assimilation community.

#22) L345 and elsewhere: The term "significant" should only be used if statistical tests are applied.

#23) Please clarify whether the authors calculate RMSEs and ensemble spread for forecasts or analyses throughout the manuscript (e.g., forecast RMSEs).

#24) L366–367: Please clarify how the authors control error growth and forecast errors.

#25) L368–369: The explanation of analysis accuracy at grid points without observations is not reasonable, as the authors have not considered the impact of ensemble correlation.

#26–27) Figure 6:
- Please show not only the RMSEs but also the ensemble spread for comparison at the same time.
- Please apply a paired t-test to compare the RMSEs between weak and strong coupled data assimilation.
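
The paired t-test requested for Figure 6 could look like the sketch below, which uses only the Python standard library. The two RMSE series are invented placeholders, not results from the manuscript, and the pairs are assumed to be independent in time, which RMSE series from a cycling experiment generally are not unless they are subsampled.

```python
import math
from statistics import fmean, stdev


def paired_t_statistic(a: list[float], b: list[float]) -> tuple[float, int]:
    """Return the paired t statistic and degrees of freedom for samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = fmean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Placeholder RMSEs for weakly and strongly coupled DA (hypothetical numbers).
rmse_weak = [0.52, 0.48, 0.50, 0.55, 0.51]
rmse_strong = [0.47, 0.46, 0.44, 0.50, 0.48]

t_stat, dof = paired_t_statistic(rmse_weak, rmse_strong)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The statistic is then compared against a Student t distribution with `dof` degrees of freedom; `scipy.stats.ttest_rel` performs the same test and also returns the p-value.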

#28) The term "error" typically refers to an instantaneous error, whereas "RMSE" represents the statistical expectation of errors. Please use these terms appropriately to clarify the distinction between them.

#29) L389: Please clarify what is meant by "transient atmosphere processes."

#30) L390–394: Please demonstrate that the ocean exhibits a 60-year timescale in the stream function field, which results in minimum RMSEs at a 60-year smoothing window.

#31-32) Subsection 5.2:
- Figures 8 and 9 show computational times at the analysis timestep on a logarithmic scale, making it difficult to directly compare the differences between PDAF and pyPDAF. Please clarify these differences by providing the ratio.
- Please include a description of the total computation time for one assimilation cycle, including both the forecast and analysis steps.
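
The ratio requested in the first point is simple to tabulate alongside the raw timings. The sketch below uses invented wall-clock times and component labels loosely modelled on those mentioned in the review; none of the numbers are from Figs. 8 or 9.

```python
# Hypothetical wall-clock times in seconds per analysis step (not from the paper).
pdaf_times = {"distribute state": 1.0e-4, "collect state": 2.0e-4, "obs operator": 5.0e-4}
pypdaf_times = {"distribute state": 4.0e-4, "collect state": 1.0e-3, "obs operator": 6.0e-4}


def timing_ratios(reference: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Ratio candidate/reference for every component present in both dictionaries."""
    return {name: candidate[name] / reference[name] for name in reference if name in candidate}


ratios = timing_ratios(pdaf_times, pypdaf_times)
for name, ratio in ratios.items():
    print(f"{name:>18s}: pyPDAF/PDAF = {ratio:.2f}")
```

Reporting the ratio on a linear scale next to the absolute times avoids the visual compression that a logarithmic axis introduces.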

#33) L431: The term "very" is subjective. Please describe this in a more objective manner.

#34) L497: This sentence is inconsistent because the computational times for PyPDAF and PDAF are different. Please revise it to reflect the results obtained in this study.

=== Minor comment
#35) L5: "exists" -> "are"
#36) L5 and elsewhere: "need" → "demands"
#37) L15-16: Incorrect spelling of LEKTF (Local Ensemble Transform Kalman Filter).
#38) L19 and elsewhere: "weather and climate" → "atmosphere and ocean"
#39) L29: Please spell out "DAPPER".
#40) L38: Please specify which models are coupled.
#41) L39 and elsewhere: There are no URLs, although the authors mention the date of last access.
#42) L74: "initialization" -> "initial"
#43) L78: Incorrect grammar.
#44) L97: "counter" -> "mitigate"
#45) L119: "ensemble-based 3DVar" -> "ensemble variational data assimilation"
#46) L137 and elsewhere: "observation vectors and error covariance matrix" -> "observations and observation error covariance matrix"
#47) L139: Please specify the "direct" observation operator.
#48) L142: "would be" → "is"
#49) L179: Please add the last access date.
#50) L266: "coupled" → "implemented", "implemented" → "written", "that is coupled directly" → "and is implemented"
#51) When connecting two sentences, please insert a comma before "and" to enhance readability.
#52) The use of "respectively" seems incorrect. A comma should be added before "respectively". For example, insert "each" before "Fortran" and remove "respectively" in L268.
#53) Equation (2): Please add an explanation for "n".
#54) L315: Please specify the meaning of "ensure … model state".
#55) L329: Please specify a forgetting factor. Is this the same as the relaxation parameter in the relaxation-to-prior perturbation and spread methods?
#56) L338: Please remove "generally".
#57) L357–358: Please check the meaning of the sentence.
#58) L464: "respectively" should be added after "l2g state".


RR by Anonymous Referee #2 (04 Dec 2024)

RR by Anonymous Referee #3 (13 Feb 2025)

Suggestions for revision or reasons for rejection

This manuscript introduces a newly developed Python interface to the Fortran-based Parallel Data Assimilation Framework (PDAF) software, pyPDAF. This tool aims to ease the development of new data assimilation systems by utilizing the Python programming language while only sacrificing the computational speed to an acceptable extent. It demonstrates an example of coupled data assimilation (CDA) using the Modular Arbitrary-Order Ocean-Atmosphere Model (MAOOAM) and ensemble transform Kalman filter (ETKF)/local ensemble transform Kalman filter (LETKF) algorithms, showing that the CDA systems developed based on PDAF and pyPDAF both work correctly and produce identical results, with their computational speeds measured for comparison. The idea of developing a Python interface to an existing data assimilation software package targeting efficient parallel computing is very useful to the community and worth publication. However, I find that this manuscript may need to be improved for its readability and strategy to present this topic. Therefore, I recommend that the manuscript can only be considered for publication after a major revision.

Major comments:

Overall, the strategy to present the topic of this manuscript may be reconsidered. The primary purpose of this article should be to introduce this newly developed pyPDAF, describe its concept, strengths, and weaknesses, and thus promote it to the community. A use-case example certainly needs to be included in this article, but it does not need to be too complicated or advanced in its experimental design or the presentation of the results. Several parts of the contents regarding the data assimilation methods and experiments may be shortened or removed. On the other hand, some other information about the development of data assimilation systems using pyPDAF should be better detailed or presented, such as the comparison of implementation difficulties between using PDAF and pyPDAF. I find that the manuscript has been improved in response to the comments of Referee #2 in the previous round of the review, with which I share similar opinions, but room for improvement still exists.

1. I feel that the authors intended to provide much information and demonstrate some scientific findings regarding CDA in this manuscript; however, these may not be all necessary, considering that the focus of this manuscript should be introducing a new software tool for data assimilation development. Regarding the data assimilation experiment, I think its most important aspect should be to serve as an example of the data assimilation development using PDAF and pyPDAF, and the scientific insight may not be of the first concern. Therefore, I would suggest keeping the experimental design and the analysis and presentation of the results as simple as possible, so readers can easily understand the experiment results and focus on understanding the pyPDAF. Since the authors referred a lot to Tondeur et al. (2020), which used the same MAOOAM model, one option may be (only if the authors think this is appropriate) to repeat or mimic a few experiments in Tondeur et al. (2020) but using PDAF and pyPDAF so that the authors could save some words describing the experiments and interpreting the results.

In addition, a review of the ensemble-based data assimilation methods (Section 2) may not need to be too comprehensive as long as sufficient information relevant to this study is provided; for example, the particle filter method is not used in this study, so it may not be reviewed in too much detail. Besides, the ensemble generation method (i.e., "second-order exact sampling") is not very relevant to this study, either, as long as the approach is reasonable and a spin-up period is excluded from the analysis of the results (as in Lines 333-341).

2. The authors added Section 3.3 in the previous review process describing the things a developer needs to take care of from the aspect of several pyPDAF library interfaces, which was good. However, some of these contents appear too technical and too much like technical documentation of the software, but not easily understood by readers without having used the software. On the other hand, some critical information is still missing or not clearly presented: (1) to run the CDA experiments in the current study, what exactly are the programming tasks one needs to do by using PDAF and pyPDAF; how many "user-supplied functions" (simply listing them) are needed to be written by the users to fulfill the capability of running the current CDA experiments? (2) Were all the user-supplied functions written in Fortran and in Python, respectively, in the experiments using PDAF and pyPDAF? For example, for the spectral transformation calculation in Eqs. (1) and (2), were they coded separately in Fortran and Python in the two experiments? This information is important for readers to understand the relative implementation difficulties of using PDAF and pyPDAF.

3. An important result I expect to see is the identicality of the data assimilation experiment results using PDAF and pyPDAF. The author did describe it but only in a brief sentence: "The online DA systems using PDAF and pyPDAF produce quantitatively the same results in all experiments up to machine precision." (Lines 353-354) I feel that this important aspect may deserve a bit more detailed discussion. In particular, given that a lot of user-supplied functions are written in different programming languages, it seems unlikely to me that their results can be "the same up to machine precision." It would be helpful if the authors could provide precise numbers of the analysis RMSEs of the two experiments using PDAF and pyPDAF.

4. The comparison of the computational performance of PDAF and pyPDAF is undoubtedly an important part of this study. The authors attempted to state that the computational speed of pyPDAF is only slightly slower than PDAF, especially when they are used with high-dimensional systems. However, from the results presented, it seems to me that their difference is actually not very small, particularly noting that Figs. 8 and 9 are plotted on a logarithmic scale, which may visually underestimate the differences. In addition, the study shows that in the case of LETKF (filters with domain localization), the difference in computational speed can be even more significant if the additional "PDAFlocal" module is not developed. Although this issue can be satisfactorily mitigated by the additional development presented, it also implies that the degree of the computational speed loss of using pyPDAF compared to PDAF could be very different case by case (different filters, observation operators, … etc.). I feel that these results do not significantly detract from the value of pyPDAF, as enabling rapid development of data assimilation systems remains crucial. However, I suggest the authors moderate their claims about the advantage of pyPDAF in the computational aspect and clearly describe the limitations.
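
For major comment 3, one way to make "the same up to machine precision" quantitative is to report the largest relative difference between the two analyses in units of the double-precision machine epsilon. The sketch below is a generic check with made-up values; it is not the comparison the authors performed, and the function name is hypothetical.

```python
import sys


def max_relative_difference(a: list[float], b: list[float]) -> float:
    """Largest element-wise relative difference; pairs of exact zeros are skipped."""
    if len(a) != len(b):
        raise ValueError("state vectors must have the same length")
    worst = 0.0
    for x, y in zip(a, b):
        scale = max(abs(x), abs(y))
        if scale > 0.0:
            worst = max(worst, abs(x - y) / scale)
    return worst


EPS = sys.float_info.epsilon  # about 2.2e-16 for IEEE 754 double precision

# Hypothetical analysis states from the Fortran and Python implementations.
analysis_fortran = [1.0, -2.5, 3.25]
analysis_python = [1.0 + 2 * EPS, -2.5, 3.25]

diff = max_relative_difference(analysis_fortran, analysis_python)
print(f"max relative difference = {diff / EPS:.1f} machine epsilons")
```

A difference of a few machine epsilons would support the claim, whereas differences several orders of magnitude larger would suggest that the two sets of user-supplied functions are not numerically equivalent.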

Minor comments:

1. Lines 14 (in Abstract) and 34: These are the first appearances of the terminology "user-supplied function" in PDAF. In my understanding (after reading more contents in the manuscript and the PDAF documentation), it stands for the additional code users need to write to complete a data assimilation system based on PDAF, but this is not very straightforwardly understood in the beginning. I suggest that this term be better explained in its first appearance.

2. Section 4.2, experiment design: What is the length of the cycled data assimilation experiments? Is it ~300 years? This seems to be implied in Fig. 5 but is not explicitly provided.

3. Lines 329-330: What does the "forgetting factor" mean? Does it represent some parameters in a specific covariance inflation scheme?

4. Lines 384-385, "the time-averaged RMSE of fields that are smoothed in time by a moving average as a function of the averaging time-window": I find that the meaning of Fig. 7 is difficult to understand. Does it mean first applying a moving average (with variable time-window lengths in the x-axis) to the spatial RMSEs across the 300-year experiment period (related: Minor comment #2) and then computing the temporal average of the moving average results? If this is correct, the scientific meaning behind this figure remains difficult to me: Why do the authors want to do the "double temporal average" (average of moving average)? Is this meaningful? Following my Major comment #1, to keep the experiment results as simple as possible, this figure may be removed if it is not critical to the theme of this manuscript.

5. Lines 402-403: Why is the data assimilation calculation performed on a single processor instead of the 16 processors used for running ensemble model forecasts? Is there any practical restriction in PDAF on parallelizing the data assimilation calculation with an arbitrary number of processors?

6. Figure 8, Lines 423-424: The "MPI" communication time is long and accounts for a large portion of the total computation time (in both PDAF and pyPDAF). Given that the number of processors (16?) is not many, why does the MPI communication time take so long? Could the authors briefly explain where this MPI communication time is mostly spent?

7. Lines 444-445: What exactly is the localization length scale or cut-off radius used in this study? What do the authors mean by the "1 spatial unit"?

8. Figure 9: Why are there two missing bars in the "no. domains" part? Are they excessively small so it does not appear in this figure? This needs to be corrected or explained.
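
Minor comment 4 turns on the order of two operations: smoothing in time and computing the RMSE. The sketch below contrasts the two readings on a toy data set. The function names and the data are hypothetical, and which reading matches Fig. 7 is exactly the question put to the authors.

```python
import math
from statistics import fmean

Field = list[float]  # values on the spatial grid at one time


def spatial_rmse(a: Field, b: Field) -> float:
    """RMSE over the grid points of two fields at one time."""
    return math.sqrt(fmean((x - y) ** 2 for x, y in zip(a, b)))


def smooth_in_time(series: list[Field], window: int) -> list[Field]:
    """Moving average over `window` consecutive times, applied at each grid point."""
    return [
        [fmean(values) for values in zip(*series[start:start + window])]
        for start in range(len(series) - window + 1)
    ]


def rmse_of_smoothed_fields(analysis: list[Field], truth: list[Field], window: int) -> float:
    """Reading A: smooth both fields in time, then time-average the spatial RMSE."""
    smoothed = zip(smooth_in_time(analysis, window), smooth_in_time(truth, window))
    return fmean(spatial_rmse(a, t) for a, t in smoothed)


def smoothed_rmse(analysis: list[Field], truth: list[Field], window: int) -> float:
    """Reading B: compute the spatial RMSE first, then average its moving average."""
    rmses = [spatial_rmse(a, t) for a, t in zip(analysis, truth)]
    windows = [fmean(rmses[s:s + window]) for s in range(len(rmses) - window + 1)]
    return fmean(windows)


# Toy data: errors alternate in sign, so smoothing the fields cancels them.
truth = [[0.0, 0.0] for _ in range(4)]
analysis = [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]

reading_a = rmse_of_smoothed_fields(analysis, truth, window=2)
reading_b = smoothed_rmse(analysis, truth, window=2)
print(reading_a, reading_b)
```

With a window of 2, reading A returns 0 and reading B returns 1 for this data, so the two interpretations are not interchangeable and the figure caption should state which one is meant.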


ED: Reject (23 Dec 2024) by Shu-Chih Yang

ED: Reconsider after major revisions (13 Mar 2025) by Shu-Chih Yang

AR by Yumeng Chen on behalf of the Authors (22 May 2025): Author's response, Author's tracked changes, Manuscript

ED: Referee Nomination & Report Request started (06 Jun 2025) by Shu-Chih Yang

RR by Anonymous Referee #3 (03 Jul 2025)

Suggestions for revision or reasons for rejection

The authors have satisfactorily addressed all my previous comments. I only have a few minor suggestions and comments listed below. I recommend that the manuscript be accepted after resolving these minor points.

Minor comments:

1. Lines 121-123: Suggest rephrasing the sentence as “Other typical filtering algorithms, such as ensemble adjustment Kalman filter (EAKF, Anderson, 2001) and ensemble square root filters (EnSRF, Whitaker and Hamill, 2002), are not implemented in current releases but are planned to be included in future releases.”

2. Line 232: “with the OMI functionality, only three user-supplied functions need to be implemented.” It is not clear from the following text which three user-supplied functions are needed.

3. Lines 386-395 and 401-404: “In WCDA, each model component performs DA independently ... and define the local state vector as either the atmosphere or ocean variables.” and “Compared to the WCDA, ... and does not require special treatment.” These sentences do not fit well in the context of Section 4.1 “Skill of data assimilation.” They may be more appropriately placed in Section 3.2 “Experiment design” or somewhere else.

4. Line 437: “the ratio of total computational time” should be “the ratio of computational time for ‘pre-post’.”

5. Line 441: “... but the ratio is only 2.04 and 3.58 for ‘distribute state’ and ‘collect state’ ...” The ratio should be 3.58 and 8.60 according to Table 2. However, this correction makes the reduction of pyPDAF overhead much less impressive, so the sentence may need to be further revised.

6. Table 3: Please clarify why the wall clock times of all components do not sum to the “total” time shown in the last row of the table.

7. Line 562: I am not sure where the number “70%” comes from.


ED: Publish subject to minor revisions (review by editor) (14 Aug 2025) by Shu-Chih Yang

AR by Yumeng Chen on behalf of the Authors (19 Aug 2025): Author's response, Author's tracked changes, Manuscript

ED: Publish as is (02 Oct 2025) by Shu-Chih Yang

AR by Yumeng Chen on behalf of the Authors (06 Oct 2025): Manuscript