the Creative Commons Attribution 4.0 License.
Climate Model Downscaling in Central Asia: A Dynamical and a Neural Network Approach
Abstract. To estimate future climate change impacts, high-resolution climate projections are usually necessary. Statistical and dynamical downscaling or a hybrid of both methods are mostly used to produce input datasets for impact modelers. In this study, we use the regional climate model (RCM) COSMO-CLM (CCLM) version 6.0 to identify the added value of dynamically downscaling a general circulation model (GCM) from the sixth phase of the Coupled Model Intercomparison Project (CMIP6) and its climate change projections' signal over Central Asia (CA). We use the MPI-ESM1-2-HR (at 1° spatial resolution) to drive the CCLM (at 0.22° horizontal resolution) for the historical period of 1985–2014 and the projection period of 2019–2100 under three different shared socioeconomic pathways (SSPs): SSP1-2.6, SSP3-7.0 and SSP5-8.5 scenarios. Using the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) gridded observation dataset, we evaluate the CCLM performance over the historical period using a simulation driven by ERA-Interim reanalysis. CCLM's added value, compared to its driving GCM, is significant over CA mountainous areas, which are at higher risk of extreme precipitation events. Furthermore, we downscale the CCLM for future climate projections. We present high-resolution maps of heavy precipitation changes based on CCLM and compare them with the CMIP6 GCM ensemble. Our analysis shows a significant increase in heavy precipitation intensity and frequency over CA areas that are already at risk of extreme climatic events in the present day. Finally, applying our single model high-resolution dynamical downscaling, we train a convolutional neural network (CNN) to map the low-resolution GCM simulations to the dynamically downscaled CCLM ones. We show that the applied CNN could emulate the GCM-CCLM model chain over large CA areas. However, this specific emulator has shortcomings when applied to a new GCM-CCLM model chain.
Our downscaling data and the pre-trained CNN model could be used by scientific communities interested in downscaling CMIP6 models and searching for a trade-off between the dynamical and statistical methods.
Status: final response (author comments only)
-
RC1: 'Comment on gmd-2023-227', Anonymous Referee #1, 20 Dec 2023
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
General comments: Overall, the authors present some interesting results that merit publication. They explored a traditional method of downscaling and contrasted it with a more novel machine-learning approach, which is interesting and a hot topic in the field of climate modeling.
The experiments and results only need minor modifications: the method needs more transparency and explanation of the authors’ choices. Their experiments are not reproducible from their text but could be with some additional information.
The text needs major rewriting and restructuring as the manuscript sent for review really lacks quality. The manuscript still contains several typos, grammar errors, inconsistencies, missing references, and things in the wrong place. This manuscript did not read as “review ready,” and it could have strongly benefitted from additional proofreading before sending it to reviewers. Furthermore, a sentence in the introduction was clearly copied from another website without credit. This is unacceptable and should not have happened, as it comes across as sloppy and unethical.
Therefore, I advise the authors to thoroughly rewrite and clean their manuscript before sending it back for review.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Specific comments:
Answers to reviewing questions from Copernicus:
- Does the paper address relevant scientific modeling questions within the scope of GMD? Does the paper present a model, advances in modeling science, or a modeling protocol that is suitable for addressing relevant scientific questions within the scope of EGU?
- Does the paper present novel concepts, ideas, tools, or data?
- Does the paper represent a sufficiently substantial advance in modeling science? A1+2+3 together: The paper presents the added value of dynamical downscaling precipitation from a GCM over Central Asia and compares it to emulating the RCM with a machine learning framework. Although statistical downscaling with machine learning has been done with precipitation (and other climate variables) before, it seems novel over Central Asia, and the authors make some interesting comparisons. I think the geoscience community can benefit from learning more about the benefits of these machine-learning models and how they compare to traditional techniques.
- Are the methods and assumptions valid and clearly outlined?
- Are the results sufficient to support the interpretations and conclusions?
- Is the description sufficiently complete and precise to allow their reproduction by fellow scientists (traceability of results)? In the case of model description papers, it should, in theory, be possible for an independent scientist to construct a model that, while not necessarily numerically identical, will produce scientifically equivalent results. Model development papers should be similarly reproducible. For MIP and benchmarking papers, it should be possible for the protocol to be precisely reproduced for an independent model. Descriptions of numerical advances should be precisely reproducible. A4+5+6 together: Not in the manuscript's current state. The methods are not clearly outlined, the results need clarification, and as it stands, the experiments could not be reproduced by fellow scientists. If the manuscript is rewritten with the proposed corrections and feedback, however, I believe they could be.
- Do the authors properly credit related work and clearly indicate their new/original contribution? A: No. In the manuscript's current state, a sentence in the introduction is copied from a website and needs crediting or paraphrasing.
- Does the title clearly reflect the contents of the paper? Yes.
- Does the abstract provide a concise and complete summary? Yes.
- Is the overall presentation well-structured and clear? A: No. The majority of sections need restructuring and rewriting to make it clear.
- Is the language fluent and precise? A: No. There are typos and grammar errors that need to be corrected.
- Are mathematical formulae, symbols, abbreviations, and units correctly defined and used? A: No, equation 1 needs verification. There is also a lack of consistency for acronyms across the manuscript.
- Should any parts of the paper (text, formulae, figures, tables) be clarified, reduced, combined, or eliminated? A: Yes, see the comments per section below.
- Are the number and quality of references appropriate? Yes.
- Is the amount and quality of supplementary material appropriate? For model description papers, authors are strongly encouraged to submit supplementary material containing the model code and a user manual. For development, technical, and benchmarking papers, the submission of code to perform calculations described in the text is strongly encouraged. A: Yes.
Section by section:
Abstract: Overall, you wrote a good abstract. It generally reads well and gives a good overview, but it needs some clarifications and minor changes.
Introduction:
- Acronyms: Your text has many acronyms; this weighs down the text and makes it hard to read. Some are used only once or twice, so I suggest you remove those and keep only essential acronyms. There is also the problem of using acronyms before they are defined (usually, only later in the text); that should be corrected.
- Structure: The introduction feels scattered and all over the place. Information is repeated, and some paragraphs talk about different subjects. Some paragraphs also contain sentences that feel out of place and should be in another place (highlighted in the annotated pdf). The introduction needs to be rewritten and restructured so that similar information is not repeated and appears in the same place in the text.
- Quality: the sentence on lines 55-57 was clearly copied from the Cordex website (see annotated pdf). The two sentences (from your text and the Cordex website) have a Jaccard similarity index of 0.69, but it was not picked up by Copernicus' similarity report. This is unacceptable and very sloppy. You should modify this immediately so that you paraphrase it in a way that is not just a copy.
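For readers unfamiliar with the Jaccard similarity index mentioned above, the following is a minimal sketch of one common way to compute it, on lowercase word sets. The referee does not state which tokenization was used, so the word-level tokenizer here is an assumption, and the two example sentences are invented placeholders rather than the manuscript or CORDEX text.

```python
import re


def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B| of the two texts' lowercase word sets."""
    # Assumption: words are runs of letters or digits; punctuation is ignored.
    tokens_a = set(re.findall(r"[a-z0-9]+", text_a.lower()))
    tokens_b = set(re.findall(r"[a-z0-9]+", text_b.lower()))
    if not tokens_a and not tokens_b:
        return 1.0  # convention chosen here: two empty texts count as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical sentences, for illustration only.
manuscript_sentence = "regional models add detail to global climate projections"
website_sentence = "regional models add fine detail to global projections"

score = jaccard_similarity(manuscript_sentence, website_sentence)
print(f"Jaccard similarity: {score:.2f}")
```

A value of 1 means the two word sets are identical and 0 means they share no words. Because word order is ignored, a lightly reordered copy still scores high.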
Data and methods:
- Overall, this section needs restructuring and rewriting. There are inconsistencies, and quite a few pieces of information need to be included or clarified. You might understand your setup very well, but it's hard to follow exactly what you've done for a new reader, creating a transparency problem.
- The CNN section particularly needs considerable rewriting and more information about your choice of the framework (i.e., what's the perfect and imperfect framework and why did you choose the imperfect over perfect one), the selection of training/validation/testing data (and why), and the different models (HCL, No-CL, SCL) you created (see annotated pdf for more details). Things from the appendix need to be in the CNN section, and your different CNN setups (NoCL, etc.) must be clearly defined there. I think it's also missing a figure for the architecture, and readers would benefit from seeing it in your manuscript instead of having to look it up in another paper.
- You should also consider what your target audience will be for this paper. Because you aim for a geoscience journal, its readers might not necessarily be familiar with many machine learning terms and might need more background information. I think your paper might benefit from more explanations like: why did you choose to use a CNN instead of another architecture? What are its advantages? Why did you choose to train it in this way?
Results:
- Overall, your results are interesting, and you conducted some good experiments for your RCM and CNN emulator. For example, I think it’s great that you also tested your CNN emulator on a new GCM. However, the way you present your results needs more transparency. Some figures would benefit from additional information to be clearer (see annotated pdf). Also, your interpretations rely greatly on seeing “by eye” how different the patterns are and how well the emulator reproduces the RCM. Some of them also come across as vague and unconvincing. Adding a quantitative value to this (for example, spatial means and std of AV or MAE) could add weight to your arguments.
- Quality: This section also needs rewriting. Some material needs to be moved to the right place, some passages lack transparency, a reference is missing (a "?" left by LaTeX), and the notation is inconsistent.
- Question to the authors: From what I understood, you ran your CNN over a dataset that combines the historical period and different SSPs. Assuming you randomly distributed your training and testing data, have you evaluated whether the CNN performs differently over the three SSPs? One way to test generalization, for example, would have been to train the CNN on historical + two SSPs and then test it on an unseen SSP.
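The two suggestions in the Results comments, quantitative summaries instead of visual comparison and a held-out SSP for testing, can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the fields are tiny invented 2-D grids, the statistics are unweighted (a real analysis would likely apply area weights appropriate to the model grid), and the function names and the `scenario` key are hypothetical.

```python
from statistics import mean, pstdev


def flatten(field):
    """Turn a 2-D grid (list of rows) into a flat list of cell values."""
    return [value for row in field for value in row]


def mae(pred, target):
    """Mean absolute error over all grid cells of two equally shaped fields."""
    return mean([abs(p - t) for p, t in zip(flatten(pred), flatten(target))])


def spatial_mean_std(field):
    """Unweighted spatial mean and population standard deviation of a field."""
    cells = flatten(field)
    return mean(cells), pstdev(cells)


def leave_one_scenario_out(samples, held_out):
    """Split samples so that one scenario is never seen during training."""
    train = [s for s in samples if s["scenario"] != held_out]
    test = [s for s in samples if s["scenario"] == held_out]
    return train, test


# Invented toy fields (e.g. precipitation in mm/day), for illustration only.
rcm = [[2.0, 4.0], [6.0, 8.0]]
emulator = [[2.5, 3.0], [6.0, 9.5]]
error = [[e - r for e, r in zip(erow, rrow)] for erow, rrow in zip(emulator, rcm)]

emulator_mae = mae(emulator, rcm)
bias_mean, bias_std = spatial_mean_std(error)
print(f"MAE={emulator_mae:.2f}, mean bias={bias_mean:.2f}, std={bias_std:.2f}")

# Hypothetical sample records tagged with their scenario.
samples = [
    {"id": 0, "scenario": "historical"},
    {"id": 1, "scenario": "ssp126"},
    {"id": 2, "scenario": "ssp370"},
    {"id": 3, "scenario": "ssp585"},
]
train, test = leave_one_scenario_out(samples, held_out="ssp585")
```

The same `spatial_mean_std` helper could be applied to an added-value map once it has been computed. Reporting the MAE separately for the held-out scenario would show whether the emulator degrades on forcing it has not seen.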
Discussion/conclusion:
- Theoretically, a reader should be able to read the discussion having read only the abstract and still mostly understand what you're talking about. In your discussion, however, some terms are not explained. Double-check that it contains all the information a reader needs to understand the discussion "almost in a vacuum".
- This section could benefit from a more thorough (and quantitative) discussion on the benefits of using a CNN instead of an RCM. For example, how much faster was your CNN compared to the RCM (also with pre-processing), and what are the downsides of ML (e.g., black-boxiness of the algorithm, overfitting, etc.)?
- I also can't entirely agree with your conclusion that you could use the pre-trained emulator to downscale paleoclimate. You've proven that it works for pretty recent climate data (1850s to now). But that does not mean it will work for climate data from 10'000 years ago. If that data differs from what you have trained the model on, then the CNN will suffer from the same generalization problem as when applied to a new GCM. For example, models that were trained on paleo climate like Jouvet et al. (https://www.cambridge.org/core/journals/journal-of-glaciology/article/iceflow-model-emulator-based-on-physicsinformed-deep-learning/8C4D103C0F34DA690D9B524DF1461C5C) struggle to generalize to recent climate. You make a strong claim, and it would need to be tested before. Otherwise, you should just write that it needs to be tested.
- Your text could also benefit from a conclusion paragraph summarising your work and an opening like in your abstract because it ends very abruptly as it is now.
Appendix: I don’t think the CNN run code needs to be there, seeing how it repeats what you've already said in the method.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Technical comments:
See the annotated pdf (supplement) for specific corrections.
- AC2: 'Reply on RC1', Bijan Fallah, 21 May 2024
-
CEC1: 'Comment on gmd-2023-227', Juan Antonio Añel, 20 Dec 2023
Dear authors,
Unfortunately, after checking your manuscript, it has come to our attention that it does not comply with our "Code and Data Policy".
https://www.geoscientific-model-development.net/policies/code_and_data_policy.html
You have archived your code on GitHub. However, GitHub is not a suitable repository for scientific publication. GitHub itself instructs authors to use other alternatives for long-term archival and publishing, such as Zenodo. Therefore, please publish your code in one of the appropriate repositories, and reply to this comment with the relevant information (link and DOI) as soon as possible, as it should be available before the Discussions stage. Also, please include the relevant primary input/output data. If you do not fix this problem, we will have to reject your manuscript for publication in our journal. I should note that, actually, your manuscript should not have been accepted in Discussions, given this lack of compliance with our policy. Therefore, the current situation with your manuscript is irregular.
Also, you must include in a potentially reviewed version of your manuscript the modified 'Code and Data Availability' section, the DOI of the code (and another DOI for the dataset if necessary). Moreover, the GitHub repository does not list a license. If you do not include a license, the code continues to be your property and nobody can use it. Therefore, when uploading the code to the new long-term repository, you may want to choose a free software/open-source (FLOSS) license. We recommend the GPLv3. You only need to include the file 'https://www.gnu.org/licenses/gpl-3.0.txt' as LICENSE.txt with your code. Also, you can choose other options: GPLv2, Apache License, MIT License, etc.
Juan A. Añel
Geosci. Model Dev. Executive Editor
Citation: https://doi.org/10.5194/gmd-2023-227-CEC1
-
AC1: 'Reply on CEC1', Bijan Fallah, 22 Dec 2023
Dear Juan A. Añel,
I wanted to provide you with the updated information regarding the code for "Physics-Constrained Deep Learning for Climate Downscaling," which is now available on Zenodo at the following DOI: https://zenodo.org/uploads/8150694.
As per your suggestion, we have created a corresponding repository on Zenodo (DOI: https://zenodo.org/records/10417111) with comprehensive details utilized in the paper. This includes:
- Input, output, trained model, and a snapshot of the code employed in the deep-learning downscaling process.
- COSMO-CLM model setups for all Regional Climate Model (RCM) simulations conducted.
- A list of CMIP6 model information used for comparative analysis.
- A Jupyter notebook for executing a test case of the "Physics-Constrained Deep Learning for Climate Downscaling" as described in the manuscript.
We are actively addressing the comments provided by reviewer 1 and plan to incorporate them into a potentially reviewed version of the manuscript. It's important to note that the RCM output data, totaling around 100 TBs, is currently not available in its entirety due to capacity constraints. However, we are committed to standardizing the output in accordance with CORDEX Central Asia standards and making it publicly accessible in the near future. In the meantime, we are open to sharing specific output upon request.
Thank you for your comment.
Yours sincerely,
Bijan
Citation: https://doi.org/10.5194/gmd-2023-227-AC1
-
RC2: 'Comment on gmd-2023-227', Anonymous Referee #2, 11 Apr 2024
The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2023-227/gmd-2023-227-RC2-supplement.pdf
- AC3: 'Reply on RC2', Bijan Fallah, 21 May 2024