Surrogate model-based precipitation tuning for CAM5

Wu, Xianwei; Hu, Liang; Wang, Lanning; Lu, Haitian; Zheng, Juepeng

doi:https://doi.org/10.5194/gmd-2023-164

Submitted as: development and technical paper

16 Aug 2023

Status: this preprint was under review for the journal GMD but the revision was not accepted.

Abstract. Uncertainty in physical parameters is a major cause of poor precipitation simulation in Earth system models (ESMs), especially over the tropical and Pacific regions. Although tuning the relevant parameters can reduce this uncertainty, repeated ESM runs incur large computational costs. Surrogate models can reduce these costs in many tuning scenarios, but building an effective surrogate for the Community Atmosphere Model (CAM), which integrates many complex processes, remains an unresolved challenge because of its strongly nonlinear behavior. In this study, we present a surrogate model-based parameter tuning framework for the CAM and apply it to improve CAM5 precipitation performance. We propose a multilevel surrogate model-based optimization method. First, a global-level surrogate model is constructed with a gradient boosting regression tree (GBRT), which cross-validation experiments show to perform better than other methods. The candidate point approach (CAND) is applied to balance exploration and exploitation and to obtain better points for establishing a local-level surrogate model. A local-level surrogate model is then constructed from a much smaller number of chosen points. We design a trust region approach to adjust the sampling region during the tuning process. The proposed method converges faster and achieves higher accuracy during tuning. We also apply a region-based optimization method to improve the CAM simulation results over some areas with large errors. The results show that the surrogate model-based optimization method can significantly improve the simulation performance of the CAM. The average improvement over the selected regions is 19 %. To integrate the optimization results of these regions, we design a nonuniform parameter parameterization scheme and integrate the parameters using a parameter smoothing scheme; the experimental results improve in four regions. These experimental results demonstrate that the proposed method improves the precipitation simulation of the CAM.

Received: 26 Jul 2023 – Discussion started: 16 Aug 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: closed

RC1:
'Comment on gmd-2023-164', Anonymous Referee #1, 26 Sep 2023
This study proposes an online surrogate updating strategy to tune parameters in climate models. By constructing a higher-accuracy surrogate model in a sub-domain of the parameter space, this method effectively reduces computational costs. This idea is particularly novel as it overcomes the low efficiency associated with the offline surrogate method, which cannot ensure optimization for the real model. Despite the significance of this work, the manuscript requires major revision to address the following issues.

Major issues:
The manuscript structure, particularly the method section, needs to be reorganized to make it more compact. Several areas require clarification. For instance, Algorithm 1 calculates the RMSE, but its definition is found in section 3.2.2; it would be more appropriate to move the definition to section 2. Additionally, in Line 4 of Algorithm 2, it is unclear whether the new parameters are obtained using CAND. Furthermore, it is not explained why the local-level surrogate uses a Gaussian process. In addition, the manuscript should describe the difference between the algorithm used in this work and ASMO. Typically, optimization algorithms require hundreds of steps to achieve convergence, but in this work only around 20 steps of local optimization are performed, so it is hard to say that the algorithm has converged. It appears that the ASMO method can achieve local optimization more quickly, and the conclusion is not convincing. The description of CAND is difficult to follow, particularly the calculation of v^s and v^d, which lacks detail. The description of the cross-validation could be moved from the results section to the method section.

The manuscript lacks a thorough mechanism analysis of how parameters affect precipitation on global and regional scales. While section 4 presents optimization results, it lacks organization and falls short of providing a detailed understanding of the underlying mechanisms. To enhance the manuscript, a deeper analysis is recommended. By investigating the cause-effect relationships between parameters and precipitation patterns, physical insights can be gained to improve the parameterization scheme.

In equations 10-11, the numerator can be very large while the denominator is very small. The value of sigma could then exceed 0.75 even though the fit is poor. If the fit is good, the value of sigma should be close to 1 rather than merely greater than 0.75.
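
The referee's concern can be illustrated with a minimal sketch. Equations 10-11 of the manuscript are not reproduced on this page, so the code below assumes the textbook trust-region ratio of actual to predicted reduction for a minimization problem; the function names, thresholds and scaling factors are illustrative assumptions, not the authors' implementation.

```python
def trust_region_ratio(f_old, f_new, s_old, s_new, eps=1e-12):
    """Ratio of actual to predicted reduction (textbook form, assumed here).

    f_* are values from the real model, s_* are surrogate predictions.
    """
    predicted = s_old - s_new
    actual = f_old - f_new
    if abs(predicted) < eps:
        return None  # surrogate predicts no change, so the ratio is uninformative
    return actual / predicted


def update_radius(radius, rho, shrink=0.5, grow=2.0, low=0.25, high=0.75,
                  r_min=1e-3, r_max=1.0):
    """Shrink the region on poor agreement, grow it on good agreement."""
    if rho is None or rho < low:
        return max(radius * shrink, r_min)
    if rho > high:
        return min(radius * grow, r_max)
    return radius


# Hypothetical numbers: the real model improves by 1.0, while the surrogate
# predicted an improvement of only about 0.1.
rho = trust_region_ratio(f_old=3.0, f_new=2.0, s_old=3.0, s_new=2.9)
print(rho > 0.75, abs(rho - 1.0))
```

In this example the ratio is far above 0.75, so a rule that only checks the upper threshold would expand the region even though the surrogate predicted the reduction badly. Checking how close the ratio is to 1 is one way to separate the two cases.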

The motivation for the nonuniform parameter parameterization scheme should be stated more clearly.

Line 55, while previous methods involved running the climate model, it is important to note that this work also requires running the climate model in each iteration. However, the manuscript does not provide a direct comparison of the efficiency of this method with other approaches. To enhance the evaluation of the proposed method, it would be beneficial to include an assessment of the computational cost compared to existing methods. This evaluation can provide valuable insights into the efficiency and computational advantages of the proposed approach, strengthening the manuscript's contribution in terms of computational performance.

Minor issues:
The title uses CAM5, but the text uses CAM. These should be consistent.

Line 11: “selected points..” to “selected points.”

Line 29: traditional tuning methods in climate modeling have certain limitations. However, they remain highly useful. The majority of climate models employ traditional tuning approaches due to their reliance on well-established physics knowledge. In fact, automatic tuning methods require a solid understanding of physics to enhance their efficiency.

Line 35, The statement that "WRF physics process is simple" is not accurate. In fact, it is known to be complex and intricate.

Line 37, The statement that "MVFSA may become infeasible for CAM tuning" may require further consideration. Fast simulated annealing, which is utilized in MVFSA, actually requires only one population to search for the next optimal parameters. The MVFSA requires thousands of steps to get a stable solution. But CAM requires a lot of computational cost for each optimization iteration. The authors should thoroughly discuss the challenges associated with MVFSA to provide a comprehensive understanding of its feasibility for CAM tuning.

Line 51, When the optimization process reaches convergence, further iterations do not lead to any improvement. Similarly, once the optimization algorithm has obtained a local solution, additional iterations do not result in further enhancements. The effectiveness of the algorithm is also a determining factor in this regard.

Line 58, It is confusing that ‘the mathematical expression is complex and time-consuming’. Could you explain it?

Line 59. Revise the sentence “Wang et al. … ; a SCM-SMA hydrologic model”

Line 85, the authors should carefully analyze the challenges of using ASMO in an atmospheric model. The method has been used successfully in WRF and CLM. What is the real challenge for an atmospheric model?

Line 91, the preceding sentences discuss the tuning algorithms, whereas the sentence “The precipitation process …” talks about the metrics. It would be beneficial to separate these statements into individual paragraphs.

Line 110, it is hard to say the nonlinearity and complexity of CAM5 are much higher than WRF.

Section 2.1, describe CAM5 in more detail, such as the horizontal resolution, the number of vertical levels, the length of the CAM5 runs, and whether the SST and sea ice are prescribed from a seasonal climatology.

Line 138: define the six main regions, giving a table including the range of latitude and longitude.

Line 143, why use ERA5 rather than GPCP to evaluate precipitation?

Line 147, “Makes” to “makes”

Line 163, is the “sampling method” Latin hypercube sampling? How many samples do you draw?

Line 280, use the correct ref for GP.

Line 308, it is confusing that the surrogate model is built as a quadratic function. Does it use a GP?

For Fig. 2, what is the y-axis? Is it the relative error? How is it calculated?

Line 343, ‘lower’ to ‘lowest’.

Section 4.1.3 should be merged into section 4.1.2.

In figure 4, it should include the obs pattern, or the difference between opt/default and observation.

Line 366, the sentence “Therefore, we need to further …” is confusing.
Citation: https://doi.org/10.5194/gmd-2023-164-RC1
- AC1: 'Reply on RC1', Xianwei Wu, 15 Nov 2023
  
  The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2023-164/gmd-2023-164-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/gmd-2023-164-AC1
- AC3: 'Reply on RC1', Xianwei Wu, 12 Dec 2023
  
  Dear Referee,
  Thank you very much for your valuable time and constructive comments! We have carefully replied to all the comments and organized our responses into a document as an attachment. Please find the updated responses in the attached document.
  Best regards,
  Xianwei Wu on behalf of all authors
  
  Citation: https://doi.org/10.5194/gmd-2023-164-AC3
RC2:
'Comment on gmd-2023-164', Anonymous Referee #2, 10 Oct 2023
The paper “Surrogate model-based precipitation tuning for CAM5” by Xianwei Wu et al. proposes a new technique for tuning a model's precipitation patterns. It uses a surrogate model, iteratively focuses on regions of parameter space, and iteratively updates the surrogate model, which should lead to convergence towards a region of parameter space, or parameter vectors, that allows the ESM to produce better precipitation patterns. I agree with the authors that model tuning has to be approached more systematically than in the past, rather than only through simple knowledge of the physics and experience with the model. However, the technique and its consequences need to be presented more carefully before the paper can be considered for publication.

The presentation of the algorithm and techniques could be more concise. It would further benefit from clear mathematical notation and equations, a clear nomenclature, and a clearer order. For instance, RMSE is used before it is properly introduced. The calculations of V^s and V^d are hard to follow and not immediately clear.
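
For reference, a common way to define the RMSE between simulated and reference precipitation is sketched below. The manuscript's own definition (its section 3.2.2) is not reproduced on this page and may differ; the symbols $N$, $w_i$, $P_i^{\mathrm{sim}}$ and $P_i^{\mathrm{ref}}$ are assumptions introduced here for illustration.

```latex
\mathrm{RMSE} =
\sqrt{\frac{\sum_{i=1}^{N} w_i \left(P_i^{\mathrm{sim}} - P_i^{\mathrm{ref}}\right)^{2}}
           {\sum_{i=1}^{N} w_i}}
```

Here $N$ is the number of grid cells in the evaluated region, $P_i^{\mathrm{sim}}$ and $P_i^{\mathrm{ref}}$ are the simulated and reference precipitation in cell $i$, and $w_i$ is a cell weight. Setting $w_i = 1$ gives the unweighted form; on a regular latitude-longitude grid, $w_i = \cos(\phi_i)$ with latitude $\phi_i$ is a typical area weight.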

The paper does not describe the links between the physics and the choice of parameters. Why do certain parameter combinations perform better (for instance, what do they affect, and how does that affect the general performance)?

The paper does not discuss that, in general, tuning for a single metric is not what is needed, as a climate model must satisfy many different metrics. It is therefore necessary to discuss how the precipitation tuning might degrade other fields. For instance, what is the effect of this tuning on the global mean temperature?

The presentation of introducing the non-uniform parameter values is not entirely clear. Why should that be? What is the physical explanation for using different parameter values in different places? Shouldn’t the physics be independent of the location particularly in regions which are relatively similar (South Pacific, Nino?)? Particularly you tune for different ocean regions; it is not clear why different ocean areas should have different parameter tunings. (I could understand a land vs ocean parameter change, however, different oceans or land masses requires more careful introduction and physical justification)

It is unclear why a GP is used only for the regional-level surrogate models and not for the global one.

Major comments:
LL.37-70: it would be worth discussing also the tuning approaches by Hourdin and Williamson in more detail (http://link.springer.com/10.1007/s00382-013-1896-4 ; http://link.springer.com/10.1007/s00382-014-2378-z ; https://gmd.copernicus.org/articles/10/1789/2017/; https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2020MS002423; https://onlinelibrary.wiley.com/doi/10.1029/2020MS002225; https://onlinelibrary.wiley.com/doi/10.1029/2020MS002217 )

Ll.135-138: How do you inform the parameter range of the parameters to tune for? How do you choose exactly those parameters? Generally, there are more parameters in the parameterizations; why exactly those 6?

LL. 140-141: Why do you choose particularly those regions? Why are they important?

Section 3: I would suggest sticking to terminology! Whenever you talk about an actual model simulation I would suggest using ESM/CAM5 or something like that. Otherwise you start mixing up terms such as global-model, global-level, complex model which does not make it easy to follow which of all the models you refer to or whether it is a new one.

L 213: Why do you use reanalysis data? Why not GPCP for instance?

Equation 1: Comparing CAM simulations with reanalysis requires regridding data. What technique do you use for that?

L. 249: What is the level of the fitness function? Equations for V^s and V^d would be very helpful!
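
As an illustration of what such equations usually look like, the sketch below follows the candidate point scoring commonly used in stochastic response surface methods: a scaled surrogate value V^s and a scaled distance V^d, combined with a weight. This is an assumption about the general technique, not the manuscript's equations, and the function names are hypothetical.

```python
import math


def scale01(values, invert=False):
    """Rescale values to [0, 1]; a constant input maps to all ones."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    if invert:
        return [(hi - v) / (hi - lo) for v in values]
    return [(v - lo) / (hi - lo) for v in values]


def select_candidate(candidates, evaluated, surrogate, weight):
    """Return the candidate with the lowest weighted score.

    v_s: scaled surrogate prediction (0 for the best predicted value).
    v_d: scaled, inverted distance to the nearest evaluated point
         (0 for the candidate farthest from existing samples).
    A weight near 1 favours exploitation; near 0 favours exploration.
    """
    preds = [surrogate(c) for c in candidates]
    dists = [min(math.dist(c, e) for e in evaluated) for c in candidates]
    v_s = scale01(preds)
    v_d = scale01(dists, invert=True)
    scores = [weight * s + (1.0 - weight) * d for s, d in zip(v_s, v_d)]
    return candidates[scores.index(min(scores))]


# Toy surrogate whose minimum lies at the origin (hypothetical).
toy = lambda x: x[0] + x[1]
cands = [(0.1, 0.1), (0.9, 0.9)]
seen = [(0.0, 0.0)]
print(select_candidate(cands, seen, toy, weight=1.0))  # exploit
print(select_candidate(cands, seen, toy, weight=0.0))  # explore
```

Cycling the weight through several values across iterations is one common way to alternate between exploration and exploitation.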

L. 267: “..., allowing us to estimate uncertainty from the weight parameter”: How do you estimate uncertainty of the weight parameter?

Ll. 274-275: “Because the surrogate model is constructed by a small number of real complex model samples, it cannot accurately simulate the actual situation of the complex model.”: I don't understand this step. Above you mention that the global-level surrogate model is updated until it converges. So, it can't have converged if it is not accurate… please clarify.

Figure 1: Could you put the whole algorithm into the flowchart, indicating which technique is used at which step? This would make the whole description much clearer.

Figure 3: Why is the relative error of the proposed method increasing with iterations? Shouldn’t the surrogate model get more accurate with iteration numbers?

L 256: “…we only run the real model once, and”: But don’t you add more than one sample in each iteration? So how can you run the model only once?

LL. 377-384: I think it is important to understand how each parameter influences the model simulations. Why did you pick only one here? This section requires more careful exploration of the parameter itself and the physical mechanisms. What are the physical reasons for the positive and negative correlations described in ll. 377-379?

Table3/4: Could you also discuss the RMSE of those regions from the optimized parameter set from the global-surrogate model? This would clarify what the gain is from the local-level surrogate models.

Minor comments:
L. 1: “The uncertainty of physical parameters is a major reason for a poor precipitation simulation performance in Earth system models (ESMs), especially over the tropical and Pacific regions.”: Is it not only the uncertainty of physical parameters but also the microphysics parameterizations themselves?

Ll. 14-15: “The results show that the surrogate model-based optimization method can significantly improve the simulation performance of the CAM model.”: I would rephrase it stating that the surrogate model-based optimization method allows for better identifying optimal parameter values.

L. 25: “...could lead to huge deviations in the simulations”: Deviations from what?

LL. 168-169: “The strategy leverages the information and knowledge obtained from the surrogate model to optimize the run time of the real complex model to fulfill the requirement of accuracy.”: How do you optimize for the run time of the real complex model (I guess the ESM)?

L 171: “.. to update the global-level surrogate model until global-model convergence.”: Which global-model do you refer to here, which global-model has to converge?

L. 172: “... high-quality CAM”: By high-quality, do you mean simulations closer to the target value?

L. 175-176: “In the parameter tuning process, each surrogate model can fully explore the parameter space to obtain better solutions, generating a large number of samples.”: I don't understand this sentence as earlier (ll 171-172) it is stated that local-level surrogates don't use the whole parameter space?

LL. 202-211: Why do you talk about 1-D LHS? LHS codes can usually handle several dimensions, and they usually maximize the minimal distance between all vectors in order to sample the whole space as uniformly as possible.

Ll. 222-225: The two sentences appear to contain very similar information and should be rewritten.

L. 347: Do you compare here the RMSEs of the final optimized parameter set? If so, do they converge to the same parameter set or different ones?
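
The multi-dimensional, maximin variant the referee refers to can be sketched as follows. This is a simple random-restart version written for illustration; the function names, the number of trials and the example bounds are assumptions and do not come from the manuscript or from any particular LHS library.

```python
import math
import random


def unit_lhs(n_samples, n_dims, rng):
    """One Latin hypercube design in the unit cube: every dimension is cut
    into n_samples equal strata and each stratum is used exactly once."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([(k + rng.random()) / n_samples for k in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points (needs >= 2)."""
    return min(math.dist(p, q)
               for i, p in enumerate(points) for q in points[i + 1:])


def maximin_lhs(n_samples, bounds, n_trials=50, seed=0):
    """Draw several designs, keep the one whose closest pair is farthest
    apart in the unit cube, then rescale it to the parameter bounds."""
    rng = random.Random(seed)
    designs = (unit_lhs(n_samples, len(bounds), rng) for _ in range(n_trials))
    best = max(designs, key=min_pairwise_distance)
    return [tuple(lo + u * (hi - lo) for u, (lo, hi) in zip(p, bounds))
            for p in best]


# Placeholder bounds for two hypothetical parameters.
for point in maximin_lhs(5, [(0.0, 1.0), (10.0, 20.0)]):
    print(point)
```

Distances are compared in the unit cube before rescaling, so parameters with very different ranges contribute equally to the spacing criterion.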

Figure 4: It would be good to see what you are tuning for. Can you also add the target value and not only the default and the biases to the target?

L 370: What do you mean by influence mode?

Technical issues:
Labels on contour plots are generally very small.

L. 63: I might have missed it but “ANNs” acronym was not introduced.

L. 78: I might have missed it but “SCA-SMA” acronym was not introduced.

L.132: “The compset used in this study is F_2000_CAM5, and the resolution is ne30_g16”: To the normal reader these abbreviations don’t mean anything. A little more explanation would be nice. What is F_2000_CAM5, for instance, or what does ne30_g16 mean in the physical world?

Could you use the mathematical notation of the original parameters in the parameterization instead of the CAM5 parameter names? For instance, in line 148, zmconv_tau is simply tau in the original Zhang and McFarlane paper.

L 275: “actual situation”: you probably mean behaviour or something like that.

Figure 2: y-axis label missing

L. 333: What is now X,Y in the S{X,Y} notation?

Figure 3: no x-axis label

L. 398: “precipitation change trend”: What trend do you mean here? Time trend?
Citation: https://doi.org/10.5194/gmd-2023-164-RC2
- AC2: 'Reply on RC2', Xianwei Wu, 15 Nov 2023
  
  The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2023-164/gmd-2023-164-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/gmd-2023-164-AC2
- AC4: 'Reply on RC2', Xianwei Wu, 12 Dec 2023
  
  Dear Referee,
  Thank you very much for your valuable time and constructive comments! We have carefully replied to all the comments and organized our responses into a document as an attachment. Please find the updated responses in the attached document.
  Best regards,
  Xianwei Wu on behalf of all authors
  
  Citation: https://doi.org/10.5194/gmd-2023-164-AC4

Status: closed

RC1:
'Comment on gmd-2023-164', Anonymous Referee #1, 26 Sep 2023
This study proposes the online surrogate updating strategy to tune parameters in climate models. By constructing a higher accuracy surrogate model in the sub-domain parameter space, this method effectively reduces computational costs. This idea is particularly novel as it overcomes the low efficiency associated with the offline surrogate method, which cannot ensure optimization for the real model. Despite the significance of this work, the manuscript requires major revision by addressing the following issues.

Major issues:
The manuscript structure, particularly the method section, needs to be reorganized to improve the compactness. There are several areas that require clarification. For instance, Algorithm 1 calculates the RMSE, but its definition is found in section 3.2.2. It would be more appropriate to move the definition to section 2. Additionally, in Line 4 of Algorithm 2, it is unclear whether the new parameters are obtained using CAND. Furthermore, it is not explained why the local-level surrogate utilizes Gaussian Process. In addition, it could describes the difference between the algorithm used in this work and the ASMO. Typically, optimization algorithms require hundreds of steps to achieve convergence, but in this work, only around 20 steps of local optimization are performed. It is hard to say the algorithms get convergence. It appears that the ASMO method can achieve local optimization more quickly. The conclusion is not convinced. The description of CAND is difficult to follow, particularly the calculation v^s and v^d, which is lack of calculation details. The cross validation describe can move from result section to the method section.

The manuscript lacks a thorough mechanism analysis of how parameters affect precipitation on a global and regional scale. While section 4 presents optimization results, it lacks organization and falls short in providing a detailed understanding of the underlying mechanisms. To enhance the manuscript, it is recommended to delve deeper into the analysis. By investigating the cause-effect relationships between parameters and precipitation patterns, physics insights can be gained to improve the parameterization scheme.

In equations 10-11, it could be possible for the numerator to be very large, and the denominator can be very small. This implies that the value of sigma could exceed 0.75, but the fitness is bad. If the fitness is good, the value of sigma could be close to 1 rather than just being greater than 0.75.

Improving the clarity of motivation for the nonuniform parameter parameterization scheme.

Line 55, while previous methods involved running the climate model, it is important to note that this work also requires running the climate model in each iteration. However, the manuscript does not provide a direct comparison of the efficiency of this method with other approaches. To enhance the evaluation of the proposed method, it would be beneficial to include an assessment of the computational cost compared to existing methods. This evaluation can provide valuable insights into the efficiency and computational advantages of the proposed approach, strengthening the manuscript's contribution in terms of computational performance.

Minor issues:
The title uses CAM5, but the contexts use CAM. They could be consistent.

Line 11: “selected points..” to “selected points.”

Line 29: traditional tuning methods in climate modeling have certain limitations. However, they remain highly useful. The majority of climate models employ traditional tuning approaches due to their reliance on well-established physics knowledge. In fact, automatic tuning methods require a solid understanding of physics to enhance their efficiency.

Line 35, The statement that "WRF physics process is simple" is not accurate. In fact, it is known to be complex and intricate.

Line 37, The statement that "MVFSA may become infeasible for CAM tuning" may require further consideration. Fast simulated annealing, which is utilized in MVFSA, actually requires only one population to search for the next optimal parameters. The MVFSA requires thousands of steps to get a stable solution. But CAM requires a lot of computational cost for each optimization iteration. The authors should thoroughly discuss the challenges associated with MVFSA to provide a comprehensive understanding of its feasibility for CAM tuning.

Line 51, When the optimization process reaches convergence, further iterations do not lead to any improvement. Similarly, once the optimization algorithm has obtained a local solution, additional iterations do not result in further enhancements. The effectiveness of the algorithm is also a determining factor in this regard.

Line 58, It is confusing that ‘the mathematical expression is complex and time-consuming’. Could you explain it?

Line 59. Revise the sentence “Wang et al. … ; a SCM-SMA hydrologic model”

Line 85, the authors could carefully analyze the challenge of ASMO used in atmospheric model. The method has been successfully used in WRF, CLM. what’s the real challenge for atmospheric model?

Line 91, the above sentences discuss the tuning algorithms. The sentence “The precipitation process …” talk about the metrics. It would be beneficial to separate these statements into individual paragraphs.

Line 110, it is hard to say the nonlinearity and complexity of CAM5 are much higher than WRF.

Section 2.1, describe more details of CAM5, such as horizontal resolution, vertical level, how long does CAM5 run, the sst and sea ice are used prescribed seasonal climatology.

Line 138: define the six main regions, giving a table including the range of latitude and longitude.

Line 143, why not use GPCP to estimate precipitation but use ERA5.

Line 147, “Makes” to “makes”

Line 163, is the “sampling method” is the latin hypercube sampling? How many samples do you conduct?

Line 280, use the correct ref for GP.

Line 308, It is confusing that the surrogate model is built as the quadratic function. Does it use GP?

For fig2, what is the y-axis? Is it the relative error? How calculate it?

Line 343, ‘lower’ to ‘lowest’.

Section 4.1.3 should be merged into section 4.1.2.

In figure 4, it should include the obs pattern, or the difference between opt/default and observation.

Line 366, it is confusing for this sentence “Therefore, we need to further …”
Citation: https://doi.org/10.5194/gmd-2023-164-RC1
- AC1: 'Reply on RC1', Xianwei Wu, 15 Nov 2023
  
  The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2023-164/gmd-2023-164-AC1-supplement.pdf
  
  Citation: https://doi.org/10.5194/gmd-2023-164-AC1
- AC3: 'Reply on RC1', Xianwei Wu, 12 Dec 2023
  
  Dear Referee,
  Thank you very much for your valuable time and constructive comments! We have carefully replied to all the comments and organized our responses into a document as an attachment. Please find the updated responses in the attached document.
  Best regards,
  Xianwei Wu on behalf of all authors
  
  Citation: https://doi.org/10.5194/gmd-2023-164-AC3
RC2:
'Comment on gmd-2023-164', Anonymous Referee #2, 10 Oct 2023
The paper “Surrogate model-based precipitation tuning for CAM5” by Xianwei Wu et al. proposed a new technique to tune a model for precipitation patterns through a surrogate model and iterative focusing on regions of parameter space and iterative updates to the surrogate model that should lead to converging towards a parameter-space or parameter-vectors that allows the ESM to have better precipitation patterns. I agree with the authors that model tuning has to be approached more systematically than in the past or through simple knowledge of the physics and experience with the model, however, the presentation of the technique and resulting consequences requires a more careful presentation before it can be considered for publication.

The presentation of the algorithm and techniques could be more concise. It would further benefit from clear mathematical notation and equations, a clear nomenclature, and a clearer order. For instance, RMSE is used before properly introduced. The calculation of V^s and V^d are hard to follow and not right away clear.

The paper lacks describing links between the physics and the choice of parameters. Why do certain parameter combinations perform better (for instance, what do they affect, how does that affect the general performance etc.)

The paper does not discuss that in general tuning for a single metric is not required as a climate model has many different metrics that need to be fulfilled. Thus, it is required to discuss how the precipitation tuning might degrade other fields. For instance, what is the effect on the global mean temperature from this tuning etc.

The presentation of introducing the non-uniform parameter values is not entirely clear. Why should that be? What is the physical explanation for using different parameter values in different places? Shouldn’t the physics be independent of the location particularly in regions which are relatively similar (South Pacific, Nino?)? Particularly you tune for different ocean regions; it is not clear why different ocean areas should have different parameter tunings. (I could understand a land vs ocean parameter change, however, different oceans or land masses requires more careful introduction and physical justification)

The presentation is unclear as to why is a GP only used for the regional-level surrogate models and not for the global?

Major comments:
LL.37-70: it would be worth discussing also the tuning approaches by Hourdin and Williamson in more detail (http://link.springer.com/10.1007/s00382-013-1896-4 ; http://link.springer.com/10.1007/s00382-014-2378-z ; https://gmd.copernicus.org/articles/10/1789/2017/; https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2020MS002423; https://onlinelibrary.wiley.com/doi/10.1029/2020MS002225; https://onlinelibrary.wiley.com/doi/10.1029/2020MS002217 )

Ll.135-138: How do you inform the parameter range of the parameters to tune for? How do you choose exactly those parameters? Generally, there are more parameters in the parameterizations; why exactly those 6?

LL. 140-141: Why do you choose particularly those regions? Why are they important?

Section 3: I would suggest sticking to terminology! Whenever you talk about an actual model simulation I would suggest using ESM/CAM5 or something like that. Otherwise you start mixing up terms such as global-model, global-level, complex model which does not make it easy to follow which of all the models you refer to or whether it is a new one.

L 213: Why do you use reanalysis data? Why not GPCP for instance?

Equation 1: Comparing CAM simulations with reanalysis requires regridding data. What technique do you use for that?

L. 249: What is the level of the fitness function? Equations for V^s and V^d would be very helpful!

L. 267: “..., allowing us to estimate uncertainty from the weight parameter”: How do you estimate uncertainty of the weight parameter?

Ll. 274-275: “ Because the surrogate model is constructed by a small number of real complex model samples, it cannot accurately simulate the actual situation of the complex model.”: I don't understand this step. Above you mention that the globale-level surrogate model is updated until it converges. So, it can't have converged if it is not accurate… please clarify

Figure 1: Could you put the whole algorithm into the flowchart which indicates which technique is used at which step which would make the whole description much clearer.

Figure 3: Why is the relative error of the proposed method increasing with iterations? Shouldn’t the surrogate model get more accurate with iteration numbers?

L 256: “…we only run the real model once, and”: But don’t you add more than one sample in each iteration? So, how do you need ot run the model only once?

LL. 377-384: “I think it is important to understand how each parameter influences the model simulations. Why did you pick only one here? This section requires more careful exploration of the parameter itself and the physical mechanisms. What are the physical reasons for the positive and negative correlations described in ll. 377-379

Table3/4: Could you also discuss the RMSE of those regions from the optimized parameter set from the global-surrogate model? This would clarify what the gain is from the local-level surrogate models.

Minor comments:
L. 1: “The uncertainty of physical parameters is a major reason for a poor precipitation simulation performance in Earth system models (ESMs), especially over the tropical and Pacific regions.”: Is it not only the uncertainty of the physical parameters but also the microphysics parameterizations themselves?

Ll. 14-15: “The results show that the surrogate model-based optimization method can significantly improve the simulation performance of the CAM model.”: I would rephrase it stating that the surrogate model-based optimization method allows for better identifying optimal parameter values.

L. 25: “...could lead to huge deviations in the simulations”: Deviations from what?

LL. 168-169: “The strategy leverages the information and knowledge obtained from the surrogate model to optimize the run time of the real complex model to fulfill the requirement of accuracy.”: How do you optimize for the run time of the real complex model (I guess the ESM)?

L 171: “.. to update the global-level surrogate model until global-model convergence.”: Which global-model do you refer to here, which global-model has to converge?

L. 172: “... high-quality CAM”: By high-quality, do you mean simulations closer to the target value?

L. 175-176: “In the parameter tuning process, each surrogate model can fully explore the parameter space to obtain better solutions, generating a large number of samples.”: I don't understand this sentence, as earlier (ll. 171-172) it is stated that local-level surrogates don't use the whole parameter space.

LL. 202-211: Why do you talk about 1-D LHS? Usually LHS code can handle several dimensions. LHS code usually makes sure to maximize the minimal distance between all vectors in order to sample the whole space as uniformly as possible.
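
To illustrate the point about multi-dimensional LHS, here is a minimal sketch in Python (standard library only). It is not the code used in the manuscript. The function names, the number of candidate designs, and the example bounds are all assumptions made for illustration. It draws several random Latin hypercube designs and keeps the one with the largest minimum pairwise distance, which is one simple way to approximate the maximin criterion.

```python
import math
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        # One random point inside each of the n_samples equal-width strata ...
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # ... then shuffle so strata are paired at random across dimensions.
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points of the design."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


def maximin_lhs(n_samples, n_dims, n_candidates=200, seed=0):
    """Among n_candidates random LHS designs, keep the one with the largest minimum distance."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(n_samples, n_dims, rng) for _ in range(n_candidates))
    return max(candidates, key=min_pairwise_distance)


def scale(points, bounds):
    """Map unit-cube points to parameter ranges given as (low, high) pairs."""
    return [
        tuple(lo + u * (hi - lo) for u, (lo, hi) in zip(p, bounds))
        for p in points
    ]


if __name__ == "__main__":
    # Hypothetical bounds for three tunable parameters (illustrative values only).
    bounds = [(0.0, 1.0), (10.0, 20.0), (-5.0, 5.0)]
    for sample in scale(maximin_lhs(8, 3), bounds):
        print(sample)
```

A random search over candidate designs is the simplest choice here; dedicated LHS libraries typically use more efficient optimization of the same or a related space-filling criterion.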

Ll. 222-225: The two sentences appear to contain very similar information and should be rewritten.

L. 347: Do you compare here the RMSEs of the final optimized parameter set? If so, do they converge to the same parameter set or different ones?

Figure 4: It would be good to see what you are tuning for. Can you also add the target value and not only the default and the biases to the target?

L 370: What do you mean by influence mode?

Technical issues:
Labels on contour plots are generally very small.

L. 63: I might have missed it but “ANNs” acronym was not introduced.

L. 78: I might have missed it but “SCA-SMA” acronym was not introduced.

L. 132: “The compset used in this study is F_2000_CAM5, and the resolution is ne30_g16”: To the normal reader these abbreviations don’t mean anything. A little more explanation would be nice. What is F_2000_CAM5, for instance, or what does ne30_g16 mean in the physical world?

Could you use the mathematical notation of the original parameters in the parameterization instead of the CAM5 parameter naming? For instance, in line 148, zmconv_tau is simply tau in the original Zhang-McFarlane paper.

L 275: “actual situation”: you probably mean behaviour or something like that.

Figure 2: y-axis label missing

L. 333: What is now X,Y in the S{X,Y} notation?

Figure 3: no x-axis label

L. 398: “precipitation change trend”: What trend do you mean here? Time trend?
Citation: https://doi.org/10.5194/gmd-2023-164-RC2
- AC2: 'Reply on RC2', Xianwei Wu, 15 Nov 2023
  
  The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2023-164/gmd-2023-164-AC2-supplement.pdf
  
  Citation: https://doi.org/10.5194/gmd-2023-164-AC2
- AC4: 'Reply on RC2', Xianwei Wu, 12 Dec 2023
  
  Dear Referee,
  Thank you very much for your valuable time and constructive comments! We have carefully replied to all the comments and organized our responses into a document as an attachment. Please find the updated responses in the attached document.
  Best regards,
  Xianwei Wu on behalf of all authors
  
  Citation: https://doi.org/10.5194/gmd-2023-164-AC4

Short summary

To build an effective surrogate model for the community atmospheric model (CAM), we present a surrogate model-based parameter tuning framework for the CAM and apply it to improve the CAM5 precipitation performance. We also propose a multilevel surrogate model-based optimization method. We design a nonuniform parameter parameterization scheme and integrate the parameters using a parameter smoothing scheme; the experimental results improve in four regions.

