Submitted as: model description paper 29 Mar 2021
fenics_ice 1.0: A framework for quantifying initialisation uncertainty for time-dependent ice-sheet models
 ^{1}School of GeoSciences, Univ. of Edinburgh, City of Edinburgh, United Kingdom
 ^{2}School of Mathematics, Univ. of Edinburgh, City of Edinburgh, United Kingdom
Abstract. Mass loss due to dynamic changes in ice sheets is a significant contributor to sea level rise, and this contribution is expected to increase in the future. Numerical codes simulating the evolution of ice sheets can potentially quantify this future contribution. However, the uncertainty inherent in these models propagates into projections of sea level rise, and hence is crucial to understand. Key variables of ice sheet models, such as basal drag or ice stiffness, are typically initialised using inversion methodologies to ensure that models match present observations. Such inversions often involve tens or hundreds of thousands of parameters, with unknown uncertainties and dependencies. The computationally intensive nature of inversions, along with their high number of parameters, means that traditional methods for uncertainty quantification, such as Monte Carlo sampling, are expensive. Here we develop a framework to estimate the posterior uncertainty of inversions and to project it onto sea level change projections over the decadal timescale. The framework treats parametric uncertainty as multivariate Gaussian and exploits the equivalence between the Hessian of the model and the inverse covariance of the parameter set. The former is computed efficiently via algorithmic differentiation, and the posterior covariance is propagated in time using a time-dependent model adjoint to produce projection error bars. This work represents an important step in quantifying the internal uncertainty of projections of ice-sheet models.
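The Hessian/inverse-covariance equivalence the abstract invokes is exact for a linear forward model with Gaussian noise and Gaussian prior. A minimal numpy sketch of that linear-Gaussian case (all names, dimensions, and noise levels hypothetical, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_param, n_obs = 4, 10
G = rng.standard_normal((n_obs, n_param))   # linear parameter-to-observation map
sigma_obs, sigma_prior = 0.1, 1.0

# Hessian of the negative log posterior = misfit Hessian + prior precision.
H = G.T @ G / sigma_obs**2 + np.eye(n_param) / sigma_prior**2
cov_post = np.linalg.inv(H)                  # posterior covariance = H^{-1}

# The data can only reduce uncertainty: posterior variance < prior variance.
assert np.all(np.diag(cov_post) < sigma_prior**2)
```

For a nonlinear model this same construction, evaluated at the MAP point, is the Laplace approximation used in the paper.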
Conrad P. Koziol et al.
Status: final response (author comments only)

RC1: 'Comment on gmd-2021-90', Anonymous Referee #1, 22 Apr 2021
The uncertainty in model data is propagated to a Quantity of Interest (QoI) in ice sheet simulations using a Bayesian approach. This kind of analysis has been published previously, but the authors extend it here to time-dependent QoIs. They show how to reduce the uncertainty in the prior by utilizing Bayes' formula for the posterior. The method is tested with the SSA ice sheet equations solving an ISMIP-HOM problem. Parameters are varied in numerical computations. Examples of possible future work are given in the final section. The code for the experiments is available for free.
The paper is suitable for GMD and is improved if the comments below are addressed somehow in a revised version.
General comments
Sect. 2.4: $\Gamma_{post}$ depends on how $r$ is chosen. If there is a gap in the eigenvalue distribution then $r$ can be chosen such that the gap is between $\lambda_r$ and $\lambda_{r+1}$. But in general there is no gap. How should $r$ be chosen in a general case? This question is related to the choice of $\gamma$, see Fig 4a. If we believe in prior data then $\gamma$ should be large but if we don't then give data lower weight (or should $\gamma$ be viewed only as a regularization parameter?).
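For context on the referee's question: in low-rank posterior approximations of this type, each retained eigenpair contributes a correction damped by $\lambda_i/(1+\lambda_i)$, so a common heuristic when no spectral gap exists is to truncate once the eigenvalues fall below a tolerance. A hedged sketch (tolerance and spectrum are made up, not from the paper):

```python
import numpy as np

def choose_rank(eigvals, tol=1e-2):
    """Keep eigenpairs whose eigenvalue exceeds tol (eigvals sorted descending)."""
    return int(np.sum(np.asarray(eigvals) > tol))

lams = np.array([50.0, 9.0, 1.2, 0.3, 0.04, 0.001])  # hypothetical spectrum
r = choose_rank(lams, tol=1e-2)

# Each truncated direction contributes a correction damped by lam/(1 + lam),
# so the neglected part of the posterior update is bounded by tol/(1 + tol).
discarded = lams[r:] / (1.0 + lams[r:])
assert np.all(discarded < 1e-2 / (1.0 + 1e-2))
```

Eigenvalues well below 1 correspond to directions where the data barely improve on the prior, which motivates a tolerance somewhat below 1 regardless of whether a gap is present.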
Sect. 4, (33): Why is this particular QoI chosen? A bit longer motivation would be welcome.
Sect. 4: How is the prior $c_0$ chosen? Maybe this is mentioned somewhere but it could be repeated here. A discussion of how to select $c_0$ and its impact on the posterior result would be interesting.
Sect. 5.3: One would expect that the linear approximation of QoI should work for sufficiently small perturbations. Maybe the perturbations are too large when the regularization is small ($\gamma_1$) and for smaller perturbations the approximation will work.
Sect. 6: The issues above with choice of $r, \gamma, c_0$ and QoI could also be discussed in the last section.
Specific comments
line 93: Euclidean inner product of $a$ with $\Gamma^{-1}_{obs}b$? Next line $\|a\|_{\Gamma^{-1}_{obs}}$?
105: Is there a weight missing in front of the prior term? Maybe $\gamma$?
110: Tell that you maximize over $\bar{c}$.
145: Define $\delta$. $\mathcal{L}$ is defined for a function in (9). In (10) $\mathcal{L}$ is applied to a vector $c$; at least $c$ is a vector after the last equality in (10). Should $J_{reg}^c$ depend on a weight too?
149: Should the definition of $L$ have only one $\mathcal{L}$ in the integral? What are the bars over $\phi$?
171: Mention that the eigenvalues are ordered such that $\lambda_i\ge\lambda_{i+1}$.
179: leading eigenmodes > leading eigenmodes with large eigenvalues
242: Specify parameters $B$ and $n$
251, 253: Specify $\delta$, which is different from $\delta$ in (9). Tell what "o.w." is.
321, 322: Is weight $\gamma$ missing here? Is $c$ a function in the integral after the first equality and a vector in the second term after the second equality?
515: The uncertainty in the QoI is not only lower than the uncertainty in the parameters due to the filtering but also due to the choice of QoI. With a different QoI it may be larger even with filtering.
Technical corrections
line 138: $J_c$ > $J^c$
176: $H$ > $\bar{H}_{mis}$ ?
223: (Section ?)
302, 304: $C^2$ or $C$ here?
385: say something about ...... intervals?
413: maybe move ....?
665, 677: missing journals

RC2: 'Comment on gmd-2021-90', Anonymous Referee #2, 03 Jun 2021
This paper presents a mathematical method and a software tool for quantifying uncertainty in glacier flow model projections. The method consists of (1) using automatic differentiation to compute the action of the full Hessian of the negative log posterior, (2) computing the leading few eigenvalue/eigenvector pairs of the Hessian in order to find out which directions in parameter space are most important to sample, and (3) computing the uncertainty in a given output quantity of interest by linearizing the model physics around the most probable state. The authors then test this method on a synthetic test problem, the ISMIP-HOM test case. The authors also address two key shortcomings of how data assimilation in glaciology is usually practiced: accounting for discrete (as opposed to dense) spatial observations and correlation between measurements. Overall, the method is described fairly well, albeit with a few points that could use clarifying. I have a few concerns about how generalizable the method is, but in any case the paper and the software are a valuable contribution that I recommend for publication with minor revisions.
The authors make excellent use of low-rank approximations to the Hessian. This trick has appeared in the literature before (although it's not as widely used as it should be). How does their approach compare to that of, say, Petra et al. 2014 or the hIPPYlib code?
I have two real concerns with the approach, although these don't change my overall opinion of this very good paper. First, the method relies on the assumption of quasi-linearity in several places. While the authors check that this assumption wasn't violated for their particular test case, it's difficult to assess whether this would generalize to other problems. By contrast, the stochastic Newton approach in Petra et al. 2014 uses an assumption of quasi-linearity only locally, to bootstrap a more sophisticated Monte Carlo sampling algorithm. Second, the authors make a big deal about using the full Hessian instead of the Gauss-Newton approximation while at the same time relying on quasi-linearity, and yet the Gauss-Newton approximation works best when the dynamics are almost linear. At the end of the paper, the authors state that they did not use a time-dependent control method because it's computationally expensive. Would a time-dependent method have been feasible if the authors had instead used the Gauss-Newton approximation? Establishing whether the full Hessian is really necessary is a very important point. Other researchers might want to emulate the techniques described in this paper and yet they might be using other modeling frameworks for which the Gauss-Newton approximation is feasible to implement while the full Hessian is not.
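To illustrate the distinction at stake here, a toy scalar example (not from the paper): for a misfit $\tfrac{1}{2}(y - f(c))^2/\sigma^2$, the full Hessian is $(f'^2 - r f'')/\sigma^2$ with residual $r = y - f(c)$, while Gauss-Newton keeps only $f'^2/\sigma^2$. The two coincide when the residual vanishes or $f$ is nearly linear, which is exactly the quasi-linearity regime:

```python
import numpy as np

def hessians(c, y_obs, sigma=1.0):
    """Full and Gauss-Newton Hessians of 0.5*((y_obs - f(c))/sigma)**2."""
    f = fp = fpp = np.exp(c)           # toy model f(c) = exp(c): f = f' = f''
    r = y_obs - f                      # residual
    full = (fp**2 - r * fpp) / sigma**2
    gn = fp**2 / sigma**2              # drops the residual * f'' term
    return full, gn

full, gn = hessians(0.0, y_obs=1.0)    # perfect fit: residual is zero
assert full == gn                      # Gauss-Newton is exact at zero residual
```

Away from the optimum (nonzero residual) the two Hessians differ by the residual-weighted curvature term, which is what the full-Hessian machinery buys.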
General comments
45, "gradient-based optimisation": The point of this sentence is to state that computing the MAP estimate doesn't give any information about parametric uncertainty. The fact that you used a gradient-based algorithm concerns more the "how" than the "what"; it isn't really important and you could just cut this phrase entirely. You could have used a derivative-free optimization algorithm; it's a dreadfully awful idea, but you could do it!
57-60: How much better is using the full Hessian than using the Gauss-Newton approximation? This isn't immediately apparent from the text or from the sources you cite here.
97: By taking the parameter-to-observation map f to be a map from R^n to R^m, you're assuming a "discretize, then optimize" mindset. It might clarify the points you make about mesh dependence later on if you instead define it as a map from some function space Q to R^m, the "optimize, then discretize" mindset. See Gunzburger 2002.
The parameter-to-observation map that you've written down encapsulates both the physics and the measurement process. This is just a suggestion, but it might help the exposition to instead define f as the composition of two maps. First, there's a function g that takes the parameters to the observable fields, like the ice velocity. This function g is basically just "solve the shallow shelf equations". Next, there's a function h that takes the observable fields to the actual observations. When doing the "wrong thing" that you point out later, h is the identity map and the norm is an L^2 norm. When doing the correct thing that you have actually implemented, h evaluates the observable fields at a bunch of discrete points and packs these observations into a vector, and the model-data misfit is a discrete sum of squared errors.
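The suggested decomposition f = h ∘ g can be sketched with a toy 1D "physics" (everything below is hypothetical, chosen only to show the structure): g maps parameters to a field on the model mesh, h restricts that field to the discrete observation locations, and the misfit is then a finite sum of squared errors rather than an L^2 norm.

```python
import numpy as np

x_mesh = np.linspace(0.0, 1.0, 101)        # model mesh

def g(params):
    """Toy 'physics': observable field is a polynomial in x (stand-in for SSA)."""
    return np.polyval(params, x_mesh)

def h(field, x_obs):
    """Observation operator: sample the field at discrete measurement points."""
    return np.interp(x_obs, x_mesh, field)

def misfit(params, x_obs, y_obs, sigma=1.0):
    """Discrete sum of squared errors -- independent of mesh resolution."""
    r = h(g(params), x_obs) - y_obs
    return 0.5 * np.sum((r / sigma) ** 2)

x_obs = np.array([0.1, 0.5, 0.9])                 # three point measurements
y_obs = h(g(np.array([1.0, 0.0, 0.0])), x_obs)    # data from a 'true' parameter
assert misfit(np.array([1.0, 0.0, 0.0]), x_obs, y_obs) == 0.0
```

Refining x_mesh changes g but leaves the number of terms in the misfit fixed at the number of observations, which is the point of the decomposition.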
135-139: Using a control method does not necessarily imply that you're writing the model-data misfit as a squared L^2 norm, it's just a sinful thing that many glaciologists (including me) have done because it's easier.
It's also worth noting that when you say "mesh dependence", certain readers are immediately going to think of something other than what you describe here. In the PDE-constrained optimization literature, mesh dependence refers to what happens when you use a bad optimization algorithm based on using the vector of coefficients of the derivative obtained from the adjoint method as a descent direction. This can give really obvious mesh imprinting artifacts in the results, especially with higher-order finite element bases. (The minimal right thing to do is to multiply by the inverse of the mass matrix. You mention using the BFGS method later, and taking the H_0 matrix to be the inverse mass matrix works there as well.) This problem is more a question of how you solve a particular optimization problem. The mesh dependence that you're talking about is much more serious: by neglecting the discreteness of the observations, going to a finer mesh implicitly assumes that you magically have more measurements than you did before. At a higher level, what you've done is tackle the fact that everyone has been solving the wrong problem, irrespective of how they were solving it. Since some readers will immediately associate with the first case, it might be good to either (1) clarify the distinction with a reference to, say, Schwedes et al. 2017, or, (2) if you don't feel like talking about that, use a different phrase besides "mesh dependence".
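The "multiply by the inverse of the mass matrix" fix described above amounts to a Riesz map: the adjoint method returns the derivative as a dual-space coefficient vector, and solving with the mass matrix turns it into a mesh-consistent descent direction. A purely illustrative 1D P1 assembly (not fenics_ice code):

```python
import numpy as np

def p1_mass_matrix(n_nodes, length=1.0):
    """Toy 1D piecewise-linear (P1) mass matrix on a uniform mesh."""
    h = length / (n_nodes - 1)
    M = np.zeros((n_nodes, n_nodes))
    for e in range(n_nodes - 1):     # loop over elements, add local mass matrix
        M[e:e + 2, e:e + 2] += (h / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])
    return M

M = p1_mass_matrix(11)
dJ_dc = np.ones(11)                  # derivative w.r.t. coefficients (dual vector)
grad = np.linalg.solve(M, dJ_dc)     # Riesz map: solve M @ grad = dJ_dc
```

Using the raw coefficient vector dJ_dc as a descent direction scales with the local element size; grad does not, which is what removes the mesh imprinting.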
140-146: Including the delta term is essentially adding the prior information that you think the mean of the parameters is zero. I don't think this is a good prior in all cases. Are there other ways to get a prior with bounded covariance that don't make this assumption? For example, you could use the Moreau-Yosida regularization of bounds constraints, which instead assumes that the parameters don't wander too far outside a preset interval but which provides no constraints within that interval.
260-264: It's easy to get the impression from this paragraph that you're doing time-dependent inversions, which is only dispelled in the conclusion on line 552. You should probably state this earlier in the text.
270-273: How do you know how many Picard iterations is adequate? Why not use another globalization strategy, like damping / line search or trust regions?
274: Why did you use BFGS when you can calculate a Hessian-vector product? Why not Newton-Krylov?
341: The L-curve is fine, but you might want to mention the discrepancy principle or other more statistically motivated approaches. See Habermann et al. 2013.
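The discrepancy principle mentioned here can be sketched in a few lines: choose the regularization weight so the noise-normalized data misfit matches its statistical expectation (roughly the number of observations), instead of reading a corner off the L-curve. All values below are made up; in practice each misfit would come from a full inversion run at that weight.

```python
import numpy as np

def discrepancy_gamma(gammas, misfits, m):
    """Pick the gamma whose noise-normalized misfit is closest to the target m.

    gammas, misfits: candidate regularization weights and the misfit each
    produced (hypothetical inputs). m: number of observations, the expected
    value of the normalized misfit when the noise model is correct.
    """
    i = int(np.argmin(np.abs(np.asarray(misfits) - m)))
    return gammas[i]

gammas = [0.01, 0.1, 1.0, 10.0]
misfits = [40.0, 75.0, 105.0, 400.0]   # made-up misfit values
assert discrepancy_gamma(gammas, misfits, m=100) == 1.0
```

A misfit well below m signals over-fitting the noise (too little regularization); well above m signals over-smoothing.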
484-490: This is really great, I haven't seen anyone address this issue before.
537-538: I don't think the technical hurdles are minor at all, because of exactly what you say in the next sentence. I would remove this statement from the text.
550-551: It might be worth citing some of the work that Karen Willcox and her group have done on multifidelity modeling and UQ.
Technical corrections
Several of the authors' "note to self" comments remain in the manuscript.
484: "isotroptically" > "isotropically"
References
https://doi.org/10.1137/130934805
https://hippylib.github.io/
https://doi.org/10.1137/1.9780898718720
https://doi.org/10.1007/978-3-319-59483-5
https://doi.org/10.5194/tc-7-1679-2013
Model code and software
fenics_ice: Joe A. Todd, Conrad P. Koziol, James R. Maddison, Daniel N. Goldberg, https://doi.org/10.5281/zenodo.4633106