Massive-Parallel Trajectory Calculations version 2.2 (MPTRAC-2.2): Lagrangian transport simulations on graphics processing units (GPUs)
Lars Hoffmann
Paul F. Baumeister
Zhongyin Cai
Jan Clemens
Sabine Griessbach
Gebhard Günther
Yi Heng
Mingzhao Liu
Kaveh Haghighi Mood
Olaf Stein
Nicole Thomas
Bärbel Vogel
Download
- Final revised paper (published on 05 Apr 2022)
- Supplement to the final revised paper
- Preprint (discussion started on 01 Dec 2021)
- Supplement to the preprint
Interactive discussion
Status: closed
- RC1: 'Comment on gmd-2021-382', Anonymous Referee #1, 05 Feb 2022
The manuscript contains the matter of two articles. The first is a thorough description of the new version of the MPTRAC trajectory code, and the second is a description of the parallelization of this code using GPUs, which is an excellent example of the application of modern programming methods that is more general than MPTRAC. I accept the choice of the authors to group these two works, but it would have made sense to make two separate articles to reach a wider audience, at least for the second one.
The manuscript contains very useful material, in particular in the second part, where it demonstrates how a complex simulation code can be moved to a GPU system using the high-level library OpenACC with relatively small effort (compared to the full rewriting required by direct use of low-level CUDA libraries). This is an important and inspiring contribution.
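For readers less familiar with this approach, a minimal, generic sketch of what such an OpenACC port of a particle loop can look like (hypothetical code for illustration only, not taken from MPTRAC):

```c
#include <stddef.h>

/* Advect np particles with a simple forward-Euler step. The single
 * OpenACC directive is the only GPU-specific change; the serial loop
 * body is left untouched, and when compiled without OpenACC support
 * the pragma is simply ignored and the code runs on the CPU. */
void advect(double *x, double *y, double *p,
            const double *u, const double *v, const double *omega,
            size_t np, double dt)
{
#pragma acc parallel loop copy(x[0:np], y[0:np], p[0:np]) \
                          copyin(u[0:np], v[0:np], omega[0:np])
  for (size_t ip = 0; ip < np; ip++) {
    x[ip] += dt * u[ip];
    y[ip] += dt * v[ip];
    p[ip] += dt * omega[ip];
  }
}
```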
I only have a few comments to be accounted for in the revised version.
Given the sophistication of the rest of the code, the treatment of the vicinity of the poles appears very crude and inaccurate. This has possibly been of little concern in the applications of MPTRAC so far, but it is a point that should be corrected in the next version.
The convective parameterization is based on the assumption of CAPE relaxation, and an important parameter is the CAPE threshold, which deserves some discussion. The manuscript says that a global value is used, but CAPE accumulates much more over the continents than over the ocean, leading to much more intense storms in continental regions. Therefore, a single threshold value will probably produce excessive mixing over the continents and too little mixing over the oceans. More generally, it is recognized by all experts in convective parameterization that CAPE alone is a poor predictor of convective onset. As ERA5 archives the upward and downward convective fluxes resulting from its state-of-the-art parameterization of convection, why not use these data instead of a very crude representation of convection? Again, this might be considered in the next version.
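To make the concern concrete, a schematic of a single-threshold trigger (hypothetical parameter names, not MPTRAC's actual interface): the same global CAPE threshold decides everywhere whether a parcel is redistributed within the convective column, regardless of whether it sits over a continent or over the ocean.

```c
#include <stdlib.h>

/* Schematic single-threshold convection trigger: if the local CAPE
 * exceeds the one global threshold, the parcel pressure is randomly
 * reassigned within the column (uniform vertical mixing). */
void convect_parcel(double *p_parcel, double cape, double cape_thresh,
                    double p_surf, double p_top)
{
  if (cape >= cape_thresh)
    *p_parcel = p_top + (p_surf - p_top) * ((double) rand() / RAND_MAX);
}
```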
The manuscript fails to cite the work “Optimization of atmospheric transport models on HPC platforms” (de la Cruz et al., Computers & Geosciences, 2016, doi:10.1016/j.cageo.2016.08.019), which addresses very similar issues.
Figure 10, made from screen copies, is not readable, either in print or on screen.
Other minor comments
- 185: Pressure is not the best choice of vertical coordinate for Lagrangian transport in the stratosphere, where many models instead use potential temperature and heating rates rather than pressure and pressure tendencies.
- 205: I guess the authors meant linear in log pressure.
- 770: The results from OpenMP parallelization may vary a lot according to the scheduling strategy. This should be mentioned.
- 896: I do not see any fluctuations but a regular increase in fig. 11.
Citation: https://doi.org/10.5194/gmd-2021-382-RC1
- RC2: 'Comment on gmd-2021-382', Anonymous Referee #2, 08 Feb 2022
The manuscript by Hoffmann et al. presents an impressive piece of work. I can only congratulate the authors on the development of MPTRAC and its parallelization on GPUs, which is the main topic of the study. The manuscript is well written and structured, and the methods and results sections are easy to follow.
I thus only have a few minor comments, suggestions, and corrections that the authors should consider before publication:
The introduction is quite MPTRAC-centric. Since the focus is on code parallelization, it would be good to include references on parallelization approaches in other Lagrangian dispersion models, e.g. Brioude et al. (2013) for FLEXPART-WRF, Jones et al. (2007) for NAME, Pisso et al. (2019) for other versions of FLEXPART (with MPI or OpenMP parallelization and asynchronous I/O in case of FLEXPART-COSMO). There is actually also a GPU version of (parts of) FLEXPART developed many years ago (https://db.cger.nies.go.jp/metex/flexcpp.html), but unfortunately it was never published in peer-reviewed literature to my knowledge.
The introduction should also explain more clearly what the main areas of application of MPTRAC are. It seems to be designed primarily to study large-scale atmospheric transport in the free troposphere and stratosphere, but not transport and mixing in the atmospheric boundary layer (ABL). This is important to mention, because Lagrangian models are increasingly being used for inverse emission estimation, for which e.g. a proper representation of turbulent mixing in the ABL is critical.
The manuscript convinced me that MPTRAC is a technically carefully designed, flexible and computationally efficient model. However, I was less convinced that it is also doing a good job in terms of accurately representing atmospheric transport. A key criterion for Lagrangian particle dispersion models, for example, is the well-mixed condition of Thomson (1987): A tracer well-mixed in the atmosphere should not un-mix due to the simulated transport. This is challenging to achieve but is critical for simulating mixing in the ABL or inversely estimating emissions, for example. Simple mixing schemes (e.g. without density correction term) as implemented in the model lead to un-mixing. It would be good to know the magnitude of un-mixing generated by the model in long simulations (un-mixing likely saturates at some point). This could be studied in a simulation similar to those presented in Section 3.3, but where particles with uniform mass are initialized proportional to air density. Particle densities should ideally remain proportional to air density throughout the simulation.
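A minimal sketch of such an un-mixing diagnostic (assuming only the particle pressures are available; names are illustrative, not MPTRAC code): with equal-mass particles initialized uniformly in pressure, i.e. proportionally to air mass, the particle count per pressure bin should stay uniform, and its relative spread measures the degree of un-mixing.

```c
#include <math.h>
#include <stdlib.h>

/* Relative standard deviation of particle counts per equal-width
 * pressure bin. For a well-mixed tracer (equal-mass particles
 * distributed proportionally to air mass, i.e. uniformly in pressure)
 * this stays near the statistical noise level; growth over time
 * indicates un-mixing. */
double unmixing_index(const double *p, size_t np,
                      double p_top, double p_bot, int nbins)
{
  double *cnt = calloc((size_t) nbins, sizeof(double));
  for (size_t ip = 0; ip < np; ip++) {
    int ib = (int) (nbins * (p[ip] - p_top) / (p_bot - p_top));
    if (ib >= 0 && ib < nbins)
      cnt[ib] += 1.0;
  }
  double mean = (double) np / nbins, var = 0.0;
  for (int ib = 0; ib < nbins; ib++)
    var += (cnt[ib] - mean) * (cnt[ib] - mean) / nbins;
  free(cnt);
  return sqrt(var) / mean;
}
```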
The synthetic tracer simulations presented in Section 3.3 are suitable to study differences between the CPU and GPU versions, but they are not sufficient to demonstrate that transport is generally well represented in the model. A much more challenging diagnostic for stratospheric transport, for example, would be the age of air, which is known to be underestimated by many transport models.
I thus strongly encourage the authors to focus on such critical aspects in future studies to provide a thorough scientific benchmark for future applications of the model. This is more a comment than a suggestion for modifying the current publication.
Small points:
Page 7, line 184: What exactly do you mean by "pushed back"? The standard approach in Lagrangian models is that particles are reflected. "Pushing back" likely leads to accumulation of air parcels at the surface or upper boundary of the model.
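For clarity, the two treatments side by side (a generic sketch in pressure coordinates, not MPTRAC's actual code): clamping a parcel back onto the boundary lets parcels pile up exactly at that level, whereas reflection returns them to the interior.

```c
/* Clamping ("pushing back"): parcels that cross the surface are placed
 * exactly on it and can accumulate there. */
double clamp_at_surface(double p, double p_surf)
{
  return p > p_surf ? p_surf : p;
}

/* Reflection: parcels are mirrored back into the interior, the standard
 * boundary treatment in Lagrangian particle models. */
double reflect_at_surface(double p, double p_surf)
{
  return p > p_surf ? 2.0 * p_surf - p : p;
}
```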
Page 9, line 250: Shouldn't it be | phi | > phi_max? Same issue on the next line on page 10.
Page 15: Convection is parameterized in an overly simplified way, since e.g. deep convection does not at all lead to uniform vertical mixing. It would be good to mention (and to consider) more advanced approaches such as Forster et al. (2007, https://doi.org/10.1175/JAM2470.1).
Page 18: Dry deposition is also described in a highly simplified way. Dry deposition does not only depend on particle or gas properties but also on the state of the atmosphere (in addition to surface properties). Here, too, it should be mentioned that more advanced approaches for Lagrangian models exist, e.g. Webster and Thomson (2012, https://doi.org/10.1504/IJEP.2011.047322).
Page 30: Which number of compute cores of the GPU is the most relevant number for MPTRAC? Is it the number of FP32 or FP64 cores? Later it becomes clear that it is the latter. Is double precision really needed? Did you test MPTRAC with single precision?
Figure 7: The differences between GPU and CPU simulations presented in panels b), d) and f) are likely due to statistical noise. This could be shown by performing multiple CPU simulations with different random seeds and evaluating the differences in the same way as the differences between CPU and GPU.
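One way to quantify that noise floor (a sketch with illustrative names): compute the same difference metric between pairs of CPU runs started with different random seeds and compare it with the CPU-versus-GPU differences.

```c
#include <math.h>
#include <stddef.h>

/* Root-mean-square difference between two gridded tracer fields.
 * Applied to pairs of CPU runs with different random seeds, it gives
 * the statistical noise level against which the CPU-vs-GPU differences
 * could be judged. */
double rms_diff(const double *a, const double *b, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++) {
    double d = a[i] - b[i];
    sum += d * d;
  }
  return sqrt(sum / n);
}
```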
Section 3.7: I didn't quite understand this scaling test. Why does the runtime shown in Fig. 11 not decrease with the number of MPI tasks? What is the difference between a weak and a strong scaling test?
Small corrections and typos:
Page 11, Line 272: Change to "The following choices are made .."
Page 23, Line 500: shouldn't it be "interpreting" rather than "interpolating"?
Page 30, line 678: "MPTRAC was build" -> "MPTRAC was built"
Page 40, line 830: "33% if the overall runtime" -> "33% of the overall runtime"
Page 40, line 857: It should be Figs. 10a and b rather than 9a and b.
Citation: https://doi.org/10.5194/gmd-2021-382-RC2
- AC1: 'Comment on gmd-2021-382', Lars Hoffmann, 08 Mar 2022
The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2021-382/gmd-2021-382-AC1-supplement.pdf