Latent Linear Adjustment Autoencoders v1.0: A novel method for estimating and emulating dynamic precipitation at high resolution

A key challenge in climate science is to quantify the forced response in impact-relevant variables such as precipitation against the background of internal variability, both in models and observations. Dynamical adjustment techniques aim to remove unforced variability from a target variable by identifying patterns associated with circulation, thus effectively acting as a filter for dynamically induced variability. The forced contributions are interpreted as the variation that is unexplained by circulation. However, dynamical adjustment of precipitation at local scales remains challenging because of large natural variability and the complex, nonlinear relationship between precipitation and circulation, particularly in heterogeneous terrain. Building on variational autoencoders, we introduce a novel statistical model, the Latent Linear Adjustment Autoencoder (LLAAE), that enables estimation of the contribution of a coarse-scale atmospheric circulation proxy to daily precipitation at high resolution and in a spatially coherent manner. To predict circulation-induced precipitation, the Latent Linear Adjustment Autoencoder combines a linear component, which models the relationship between circulation and the latent space of an autoencoder, with the autoencoder's nonlinear decoder. The combination is achieved by imposing an additional penalty in the cost function that encourages linearity between the circulation field and the autoencoder's latent space, hence leveraging robustness advantages of linear models as well as the flexibility of deep neural networks. We show that our model predicts realistic daily winter precipitation fields at high resolution based on a 50-member ensemble of the Canadian Regional Climate Model at 12-km resolution over Europe, capturing for instance key orographic features and geographical gradients.
When the Latent Linear Adjustment Autoencoder is used to remove the dynamic component of precipitation variability, forced thermodynamic components are expected to remain in the residual, which enables forced patterns of precipitation change to be uncovered from just a few ensemble members. We extend this to quantify the forced pattern of change conditional on specific circulation regimes. Future applications could include, for instance, weather generators emulating climate model simulations of regional precipitation, detection and attribution at sub-continental scales, or statistical downscaling and transfer learning between models and observations to exploit the typically much larger sample size in models compared to observations.


Introduction
Precipitation is a key climate variable that is highly relevant for impacts such as floods or meteorological drought. Precipitation simulations at high resolution (e.g., Prein et al., 2017) are required for adaptation planning for local and regional precipitation change in a warming climate. However, precipitation shows large natural variability (Deser et al., 2012), and its relationship with atmospheric circulation is complex and nonlinear, in particular at local to regional scales and in heterogeneous terrain (e.g., Zorita et al., 1995). Moreover, projected changes in precipitation are unevenly distributed across the distribution of precipitation intensity (Allen and Ingram, 2002; Held and Soden, 2006; Pendergrass, 2018). Scaling rates depend on the return period, region, temperature and moisture availability (Prein et al., 2017), and changes in circulation during precipitation events (Shepherd, 2014; Fereday et al., 2018). Hence, it is a key challenge to identify, understand and interpret patterns of forced precipitation change in model simulations and observations. Dynamical adjustment techniques have been developed to separate forced and internal variability via a co-interpretation of target variables such as temperature or precipitation using circulation information: a circulation proxy (such as a sea level pressure pattern) is used to estimate the circulation-induced (dynamic) contribution to temperature or precipitation variability.
For example, dynamical adjustment of precipitation has revealed that the spatial pattern and amplitude of observed residual (predominantly thermodynamic) precipitation trends at the scale of the entire Northern Hemisphere mid- and high-latitude land area are in good agreement with the expected anthropogenically forced trends from model simulations (Guo et al., 2019).
Similarly, in Europe, Fereday et al. (2018) showed that thermodynamic forced changes in future winter precipitation are in relatively good agreement among models, while large uncertainties remain in simulated forced circulation changes that may affect precipitation. While internal and forced components of precipitation variability and change can be decomposed in large ensembles of model simulations (Deser et al., 2012; von Trentini et al., 2019; Leduc et al., 2019), high-resolution large ensembles are prohibitively expensive. It would be beneficial to be able to estimate and identify forced precipitation patterns from only a few ensemble members at impact-relevant regional spatial scales.
Techniques for dynamical adjustment have relied largely on linear regression (Wallace et al., 1995, 2012; Smoliak et al., 2015; Sippel et al., 2019) or circulation analogue techniques (Yiou et al., 2007; Deser et al., 2016). Because the fraction of [...]
Our proposed Latent Linear Adjustment Autoencoder extends the standard VAE model appropriately to enable the climate applications of interest.
Specifically, during training the Latent Linear Adjustment Autoencoder encodes daily precipitation fields into a low-dimensional latent space and subsequently decodes them for reconstruction. In addition, we formulate the objective function such that the latent space can be regressed linearly on the circulation proxy. For dynamical adjustment, we use the estimate of the latent space based on circulation, which is then decoded for predicting daily precipitation fields at high spatial resolution. In other words, the final model is nonlinear, consisting of a linear part and a nonlinear part, where the latter is a deep neural network.
It enables prediction of the portion of the precipitation field that can be explained by circulation (i.e., the dynamic component of precipitation). Moreover, several further climate science applications of the Latent Linear Adjustment Autoencoder are conceivable, such as weather generators emulating regional climate model simulations, detection and attribution at sub-continental scales, or statistical downscaling; these are discussed further below.
In summary, the objectives of this paper are the following:
1. Introduce a novel statistical model, the Latent Linear Adjustment Autoencoder, as a versatile technique for applications in climate science, particularly for making better use of high-resolution climate simulations by estimating circulation-induced (dynamic) precipitation at high resolution from coarse-scale circulation information.
2. Illustrate the Latent Linear Adjustment Autoencoder by applying it to dynamical adjustment of daily high-resolution precipitation from simulations over Central Europe. More specifically, the LLAAE will be used to separate forced precipitation trends from internal variability.

2 Dynamical adjustment using statistical learning
Following Smoliak et al. (2015) and Sippel et al. (2019), we frame dynamical adjustment as a statistical learning problem. Let $Y \in \mathbb{R}^{h \times w}$ be the climate variable of interest on a spatial field of size $h \times w$ and let $X \in \mathbb{R}^{p}$ be input features. In the following, we consider daily precipitation fields for $Y$ and empirical orthogonal function (EOF) time series of sea-level pressure (SLP) for $X$ as a proxy for circulation. The EOF time series are detrended (as described below) and scaled to unit variance; in the EOF computation, we do not weight by area. A variety of climate variables instead of precipitation could be taken as $Y$; results for daily temperature can be found in Appendix C.
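As a rough illustration of the input features described above, the following sketch computes EOF time series of an SLP field with a singular value decomposition, without area weighting, and scales them to unit variance. The function name `eof_time_series`, the array layout and the synthetic data are assumptions made for this example; the detrending step described later in this section is not included.

```python
import numpy as np

def eof_time_series(slp, n_components):
    """Return unit-variance EOF time series and EOF patterns of an SLP field.

    slp: array of shape (n_days, n_lat, n_lon); no area weighting is applied.
    """
    n_days = slp.shape[0]
    flat = slp.reshape(n_days, -1)
    anomalies = flat - flat.mean(axis=0)      # remove the time mean per grid cell
    # Thin SVD: rows of vt are EOF patterns, u * s are the associated time series.
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    pcs = u[:, :n_components] * s[:n_components]
    pcs = pcs / pcs.std(axis=0)               # scale each time series to unit variance
    eofs = vt[:n_components].reshape(n_components, *slp.shape[1:])
    return pcs, eofs

# Synthetic stand-in for a coarse-resolution SLP field (not real data).
rng = np.random.default_rng(0)
toy_slp = rng.normal(size=(200, 6, 8))
X, eofs = eof_time_series(toy_slp, n_components=5)
```

Each column of `X` then plays the role of one input feature, with one row per day.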

Let $\mathbf{X}$ be the $n \times p$ matrix where each column contains one input feature with $n$ data points. Each data point is in our case a simulation from a Regional Climate Model (RCM), and each column corresponds to one of $p$ EOF components of SLP, from the RCM but at coarsened resolution. Let $\mathbf{Y}$ be the $n \times h \times w$ tensor that represents the precipitation intensity for each data point in a spatial field. Below, we present our proposed statistical model, which estimates the circulation-induced component

$$\hat{\mathbf{Y}}_X = f(\mathbf{X}), \qquad (1)$$

where $f$ is a generic nonlinear function. Let $\hat{\mathbf{R}}$ denote the residuals $\hat{\mathbf{R}} = \mathbf{Y} - \hat{\mathbf{Y}}_X$ that remain. Since $\hat{\mathbf{Y}}_X$ is the precipitation explained primarily by variations in circulation, $\hat{\mathbf{R}}$ is the precipitation primarily unexplained by circulation. If SLP is unaffected by external forcing, then this residual contains the signal induced by the thermodynamic component of the external forcing, since the variability due to circulation has been removed. If instead external forcing does affect SLP, then the dependence between $X$ and $R$ that arises due to the common influence of the external forcing would bias the estimation of $f$.
To avoid potential forced trends in SLP projecting onto the thermodynamic component of the external forcing in $R$, we detrend the daily SLP EOF time series as follows. We ensure that they are orthogonal to the smoothed first EOF of the January ensemble mean by regressing the time series against the ensemble mean and using the corresponding residuals as input features $X$. The reasoning behind this step is that a forced trend in SLP will be approximately captured in the first EOF of the SLP ensemble mean. Here, we use the January ensemble mean as a proxy for December-February SLP. For simplicity, we refer to the detrended and scaled SLP EOF time series simply as the "SLP time series" in the following.
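The detrending step above amounts to regressing each EOF time series on a proxy of the forced SLP signal and keeping the residuals. A minimal sketch follows, assuming the forced proxy is available as a one-dimensional series aligned with the daily data; the name `remove_forced_component` and the linear ramp used as proxy are hypothetical stand-ins for the smoothed first EOF of the ensemble mean.

```python
import numpy as np

def remove_forced_component(pcs, forced_proxy):
    """Regress each column of pcs on forced_proxy (plus intercept); return residuals."""
    design = np.column_stack([np.ones_like(forced_proxy), forced_proxy])
    coef, *_ = np.linalg.lstsq(design, pcs, rcond=None)
    return pcs - design @ coef

rng = np.random.default_rng(1)
n_days, n_eofs = 500, 4
forced_proxy = np.linspace(0.0, 1.0, n_days)   # hypothetical smoothed forced SLP signal
pcs = rng.normal(size=(n_days, n_eofs)) + 2.0 * forced_proxy[:, None]

X = remove_forced_component(pcs, forced_proxy)
X = X / X.std(axis=0)                          # rescale residuals to unit variance
```

By construction, the residual series are orthogonal to the forced proxy, so a linear model fitted on `X` cannot pick up that particular trend signal.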

Latent Linear Adjustment Autoencoders: Proposed deep autoencoder model for dynamical adjustment
We build on variational autoencoders (VAEs; Kingma and Welling, 2014; Rezende et al., 2014), which can be understood as a (typically nonlinear) dimensionality reduction method. An autoencoder consists of an "encoder" $e$ that maps $Y$ to the low-dimensional latent space $L \in \mathbb{R}^{l}$, $L = e(Y)$, and a "decoder" $d$ which in turn maps $L$ to the reconstruction of $Y$, $\hat{Y} = d(L) = d(e(Y))$. This scheme is illustrated in Fig. 1, which depicts the reconstruction of precipitation fields. The VAE objective encourages the distribution of the latent space variables to be close to a chosen prior distribution, typically a standard multivariate Gaussian distribution, and also ensures that $\hat{Y} \approx Y$. The encoder and the decoder are parameterized as (deep) neural networks.
We extend the standard VAE model to make it suitable for dynamical adjustment by adding a linear component $h$ to the architecture. The linear component $h$ takes $X$ as input features and predicts the latent space variables $L$ of the VAE; thus we call the overall model the "Latent Linear Adjustment Autoencoder". Using an appropriate training objective (see Eq. (3)), we enforce that, when the latent space variables are predicted linearly from $X$, $\hat{L}_X = h(X)$, the resulting decoded prediction $\hat{Y}_X = d(\hat{L}_X) = d(h(X))$ is close to $Y$. The motivation behind this loss function is that the combined model, which consists of the combination of $h$ and $d$, should explain as much variance in $Y$ as possible, while using only the input $X$. In other words, it should capture the circulation-induced signal in $Y$. The advantage of combining the linear model $h$ with the nonlinear decoder $d$ of the VAE is that the overall model is very expressive while the estimation of $h$ remains relatively simple.
In more detail, we consider the following objective to train the encoder and decoder with associated parameters $\theta = (\theta_e, \theta_d)$ of our proposed Latent Linear Adjustment Autoencoder:

$$\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{VAE}}(\theta) + \lambda \, \mathcal{L}_{L}(\theta). \qquad (2)$$

$\mathcal{L}_{\mathrm{VAE}}$ is the standard VAE objective for real-valued input data, consisting of a reconstruction loss and the Kullback-Leibler divergence between the distribution of the encoded inputs and the prior distribution of the latent space, here chosen to be a standard multivariate Gaussian distribution (for details see Kingma and Welling, 2014; Rezende et al., 2014). $\mathcal{L}_{L}$ is the extension to the objective that we propose, and $\lambda$ is a tuning parameter that steers the relative importance of the two loss functions in the overall objective (Eq. (2)).
The autoencoder and the linear model are trained iteratively in an alternating fashion. In the first step, the objective $\mathcal{L}$ is optimized while $h$ is treated as fixed. In the second step, the linear component $h$ of the Latent Linear Adjustment Autoencoder is trained with squared error loss, treating the encoder and decoder parameters as fixed. The parameters of the encoder and decoder $(\theta_e, \theta_d)$ and those of the linear model $h$, $\theta_h$, are coupled, since the linear model aims to predict the latent space variables $L$, which are subject to change during the training of the encoder and the decoder. At the same time, the autoencoder should be trained such that a linear regression of $L$ on $X$ achieves a small error in Eq. (3), which is accomplished with this procedure for training the components. In practice, we train the model using the Adam optimizer (Kingma and Ba, 2015). All details related to training the model, such as architecture and hyperparameter choices, as well as code to reproduce our experimental results, can be found in Appendix A.
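The structure of this objective and of the second alternating step can be sketched in plain NumPy. The squared-error form used for the latent linear term, the intercept in the linear model and all function names are assumptions for illustration; the gradient-based update of the encoder and decoder in the first step (done with Adam in the paper) is not shown, and a toy linear map stands in for the decoder.

```python
import numpy as np

def vae_loss(y, y_rec, mu, log_var):
    """Squared-error reconstruction term plus KL divergence to a standard Gaussian prior."""
    rec = np.mean(np.sum((y - y_rec) ** 2, axis=1))
    kl = np.mean(-0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1))
    return rec + kl

def latent_linear_loss(y, y_pred_from_x):
    """Assumed form of the extra term: squared error between Y and d(h(X))."""
    return np.mean(np.sum((y - y_pred_from_x) ** 2, axis=1))

def total_loss(y, y_rec, mu, log_var, y_pred_from_x, lam):
    """Weighted sum of the VAE term and the latent linear term."""
    return vae_loss(y, y_rec, mu, log_var) + lam * latent_linear_loss(y, y_pred_from_x)

def fit_linear_latent_model(x, latents):
    """Second alternating step: least-squares fit of h (with intercept) from X to the latents."""
    design = np.column_stack([np.ones(len(x)), x])
    weights, *_ = np.linalg.lstsq(design, latents, rcond=None)
    return weights

def apply_linear_latent_model(x, weights):
    return np.column_stack([np.ones(len(x)), x]) @ weights

# Toy data: latents that depend almost linearly on x, and a linear stand-in decoder.
rng = np.random.default_rng(0)
n, p, n_latent, n_cells = 200, 3, 2, 12
x = rng.normal(size=(n, p))
latents = x @ rng.normal(size=(p, n_latent)) + 0.1 * rng.normal(size=(n, n_latent))
decoder_matrix = rng.normal(size=(n_latent, n_cells))
y = latents @ decoder_matrix

weights = fit_linear_latent_model(x, latents)
y_pred_from_x = apply_linear_latent_model(x, weights) @ decoder_matrix
loss = total_loss(y, y_rec=y, mu=latents, log_var=np.zeros_like(latents),
                  y_pred_from_x=y_pred_from_x, lam=1.0)
```

In a full implementation, the two steps would be repeated in turn, with the latents recomputed from the current encoder before each refit of the linear model.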
After training the components $e$, $d$ and $h$, we no longer need the encoder to perform dynamical adjustment on unseen test data. This is illustrated in Fig. 2. We predict the latent space variables with the linear model $h$, using the SLP time series as input $X$. The resulting predictions are fed to the decoder, which outputs predictions of the spatial field based only on $X$. In other words, we obtain the spatial field of precipitation which can be explained by circulation.
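The test-time procedure described above needs only the linear model and the decoder. The sketch below makes this explicit; the random nonlinear map used as decoder, the weights and the data are toy stand-ins (assumptions), not the trained model from the paper.

```python
import numpy as np

def dynamical_adjustment(y, x, linear_weights, decoder):
    """Return the circulation-induced prediction d(h(X)) and the residual Y - d(h(X))."""
    design = np.column_stack([np.ones(len(x)), x])
    latents_hat = design @ linear_weights     # h(X): linear prediction of the latents
    y_hat_x = decoder(latents_hat)            # d(h(X)): decoded precipitation fields
    return y_hat_x, y - y_hat_x

# Toy stand-ins: a fixed random nonlinear map plays the role of the trained decoder.
rng = np.random.default_rng(2)
n_days, n_eofs, n_latent, height, width = 50, 3, 4, 5, 6
decoder_matrix = rng.normal(size=(n_latent, height * width))

def toy_decoder(latents):
    return np.maximum(latents @ decoder_matrix, 0.0).reshape(len(latents), height, width)

linear_weights = rng.normal(size=(n_eofs + 1, n_latent))
x = rng.normal(size=(n_days, n_eofs))
y = rng.gamma(shape=1.0, scale=2.0, size=(n_days, height, width))

y_hat_x, residual = dynamical_adjustment(y, x, linear_weights, toy_decoder)
```

The residual fields are what the later sections analyse as a proxy for the forced, mainly thermodynamic, component.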
The spatial field is modelled jointly in our approach (the optimization is performed over the whole spatial field at once), in contrast to Sippel et al. (2019), where a separate model needed to be trained for each grid cell. The joint modeling of the daily high-resolution precipitation field as a function of coarse-scale circulation may enable several additional climate science applications, briefly discussed in Sect. 4.
This approach yields 50 approximately independent realizations of the climate system. Each of the global simulations has been downscaled using the Canadian Regional Climate Model version 5 (Martynov et al., 2013).

SLP is regridded to a spatial resolution of 1° × 1° before computing the EOFs as described in Sect. 2, so that the model predicts high-resolution precipitation from only a coarse-resolution proxy of atmospheric circulation. We aggregate the 3-hourly data to daily averages. SLP data is also taken from the original 280 × 280, 0.12° resolution EURO-CORDEX domain (WCRP, 2015) and regridded to a regular 1° × 1° grid that broadly covers the region of -15 to 35° E and 35-64° N (see Fig. 9 (top)).

We use RCM simulation data from 1955 to 2100 to allow for 5 years of spinup. We train our model using daily data from December-February from 9 ensemble members ("kba", "kbc", "kbe", "kbg", "kbi", "kbk", "kbm", "kbq", "kbs"). The results in the main text are based on training data that comprises the years 1955-2070. In Appendix B, we present results based on the shorter training time period from 1955-2020, using the same ensemble members. This corresponds to a reduction in the amount of training data points of approximately 43% and serves as a sensitivity test to the amount of training data. Furthermore, we evaluate our trained models on the remaining 41 ensemble members that were left out of training. We refer to these as "holdout ensemble members".
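The figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that both training periods are counted in inclusive calendar years and that each year contributes the same number of winter days, which is a simplification.

```python
# Inclusive year counts for the two training periods (assumed counting convention).
full_years = 2070 - 1955 + 1       # main-text training period
short_years = 2020 - 1955 + 1      # shorter period used in Appendix B
reduction = 1 - short_years / full_years

# Ensemble split: 50 members in total, 9 used for training.
n_holdout = 50 - 9

print(f"{full_years} vs {short_years} years, reduction = {reduction:.1%}")
print(f"holdout members: {n_holdout}")
```

Under these assumptions the reduction is 50 of 116 years, consistent with the stated value of approximately 43%.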

Evaluation of predictions
To illustrate the spatial coherence of our approach, we show 5 example target (daily) data points of $Y$, their reconstructions $\hat{Y}$, and the predictions $\hat{Y}_X$ from the holdout ensemble member "kbb" (Fig. 3). These examples are chosen for different percentiles of the distribution of the $R^2$ values (i.e., proportion of explained variance). As such, they show the range from data points where the LLAAE performs relatively poorly to cases where the LLAAE performs well (in terms of $R^2$). To highlight the precipitation features, these examples are displayed after the data has been square-root-transformed. We compute the gridcell-wise mean squared error (MSE) of predictions $\hat{Y}_X$ and the proportion of explained variance ($R^2$) based on the daily data from the holdout ensemble member "kbb". The results for the other holdout ensemble members are very similar. We further evaluate the predictions within a dynamical adjustment framework described in the next paragraph.
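The two gridcell-wise skill measures named above can be computed as follows. This is a minimal sketch on synthetic data; the function names and the use of the temporal mean at each grid cell as the reference for the explained variance are assumptions.

```python
import numpy as np

def gridcell_mse(y, y_hat):
    """Mean squared error over time for each grid cell; inputs have shape (n_days, ny, nx)."""
    return np.mean((y - y_hat) ** 2, axis=0)

def gridcell_r2(y, y_hat):
    """Proportion of temporal variance explained at each grid cell."""
    ss_res = np.sum((y - y_hat) ** 2, axis=0)
    ss_tot = np.sum((y - y.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot

# Synthetic "truth" and an imperfect prediction (not real model output).
rng = np.random.default_rng(6)
y = rng.gamma(1.0, 2.0, size=(90, 4, 5))
y_hat = y + rng.normal(scale=0.5, size=y.shape)

mse_map = gridcell_mse(y, y_hat)
r2_map = gridcell_r2(y, y_hat)
```

Both functions return one value per grid cell, so the results can be plotted as maps in the style of Figs. 4 and 5.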

Dynamical adjustment
We evaluate the extent to which the forced response of precipitation can be uncovered with a small number of ensemble members using dynamical adjustment (e.g., Deser et al., 2016). Specifically, we quantify how well the long-term "forced response" (i.e., the average across all 50 ensemble members) can be approximated by the residuals of our predictions (the difference between precipitation simulated by the RCM and the circulation-induced component of precipitation predicted by the Latent Linear Adjustment Autoencoder). In other words, dynamical adjustment acts as a filter for short-term "circulation-induced" precipitation variability. Recall that we expect the residuals to primarily contain the thermodynamic component of change (Deser et al., 2016). However, it is important to stress that the residual is not a perfect proxy for thermodynamical change, because it may contain effects from feedbacks, remaining internal variability and circulation components not directly captured by SLP. Moreover, since the SLP data is detrended prior to the application of the method, long-term dynamical changes may even be part of the residuals. In addition to analysing the forced response in seasonal precipitation totals, we evaluate the estimation of the forced precipitation response for two composites of atmospheric circulation based on EOF analysis of SLP.

Results and Discussion
4.1 Reconstructed and predicted spatial fields
We begin by showing a selection of reconstructed precipitation fields $\hat{Y}$ from the holdout ensemble member "kbb" (center, Fig. 3), which illustrates the skill of the encoder and decoder, and predictions $\hat{Y}_X$ (right, Fig. 3), which illustrate the skill of the linear latent model $h$, against the original RCM-simulated precipitation (left, Fig. 3). The reconstruction quality, i.e. the similarity between the left and the center column, is quite high, though not all fine details are reproduced (which is to be expected). The predictions $\hat{Y}_X$ (right column) are computed using the linear model $h$ and the decoder with SLP time series as inputs. For the worst example (in terms of $R^2$; first row), the original spatial field shows that this corresponds to a day with very low precipitation, while the LLAAE predicts larger precipitation in some regions (e.g. the south of France), resulting in a very low $R^2$ value. For the other rows (from the 25th percentile to the best example), the predictions resemble the original image fairly well. For dynamical adjustment we use the residuals $\hat{R}$, which are computed as the difference between the original fields (left column) and the predictions (right column).

The proposed model yields spatially coherent predictions and explains a large proportion of the variance of $Y$. Skill of precipitation predicted with the Latent Linear Adjustment Autoencoder is quantified in Fig. 4, which shows the gridcell-wise MSE over all December-February days for the holdout ensemble member "kbb". The spatial pattern of MSEs (Fig. 4) largely reflects the pattern of precipitation as simulated by the RCM. Prediction errors are high over heterogeneous terrain, likely linked to orographic precipitation, in particular on the western sides of mountain ranges such as the Alps in Central Europe, the Apennines in Italy, the Dinaric Alps in South-East Europe, and smaller ranges located in France and Germany. Prediction errors are also high at the west coast of the UK, whereas mean squared prediction errors appear relatively low over low-altitude regions (e.g., northern France, Benelux, and northern Germany). The spatial pattern of $R^2$ (the fraction of explained variance, Fig. 5) shows a more nuanced pattern dominated by a land-sea contrast. Over land, the circulation proxy generally explains a high proportion of variance (up to ≈ 90%), especially on the western slopes of mountain ranges, which receive a large fraction of their precipitation from large-scale circulation-induced events. In contrast, the fraction of variance explained by circulation-induced precipitation is smaller on the eastern sides of mountain ranges, and particularly low over oceanic regions (which we do not interpret in this study).

Extraction of forced precipitation trends at high spatial resolution
In this subsection, we evaluate our predictions of the circulation-induced component in the framework of dynamical adjustment (Deser et al., 2016). That is, we test the extent to which the forced response of regional high-resolution precipitation obtained from averaging across the full 50-member ensemble can be approximated by the residuals of our predictions from a single or from relatively few ensemble members. The thermodynamic component of the variation in precipitation, which is driven by temperature change and unrelated to dynamically induced variability, should remain in the residuals (e.g., Deser et al., 2016).
Dynamical adjustment, hence, acts to reduce short-timescale circulation-induced variability, thus increasing signal-to-noise.
The effect of dynamical adjustment can be seen in Fig. 6. It shows time series of domain-average (land only) December-February seasonal precipitation totals simulated by the high-resolution RCM, the predictions of the circulation-induced component for three holdout ensemble members, and the forced response. All three RCM ensemble members show an increasing trend in seasonal precipitation totals across the 21st century, over which large inter-annual variability is superimposed. In contrast, the predicted circulation-induced components capture the inter-annual precipitation variability well, but they do not show discernible trends. Consequently, the residuals have relatively smoothly increasing trends, which match the magnitude of the forced precipitation trend well (Fig. 6, right panels). This demonstrates successful dynamical adjustment of continental-scale, seasonal precipitation totals using the Latent Linear Adjustment Autoencoder.
It is more challenging, however, to identify and evaluate the forced precipitation response at the local scale of individual grid points. To this end, Fig. 7 shows the spatial pattern of forced 50-year (2020-2069) precipitation trends, the pattern of dynamically adjusted precipitation trends, and "raw" precipitation trends in RCM simulations for three holdout ensemble members. The forced response of winter precipitation change is dominated by a north-south contrast. The northern part of the domain is projected to experience a precipitation increase in the 21st century, while decreases in precipitation are projected for the southernmost part of the domain (mainly over the Mediterranean Sea). Increases in winter precipitation across most parts of the domain are largely due to thermodynamic and lapse rate changes (e.g., Brogli et al., 2019). Locally, forced precipitation trends are larger over heterogeneous terrain, which may be due to forced dynamic components that are independent of SLP (Shi and Durran, 2014). The latter paper presents idealized simulations of the forced response of orographic precipitation. This response is dynamic, but it is driven by changes in vertical velocity on the upslope side that enhance orographic precipitation, which could be separate from changes in upslope wind speed and thus SLP. Vertical velocity could increase because of the increasing moisture with warming.
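A gridcell-wise trend map of the kind discussed above can be obtained with an ordinary least-squares fit at every grid cell. The sketch below assumes seasonal totals stored as an array of shape (years, ny, nx); the function name and the synthetic data are hypothetical, and the paper's exact trend definition may differ.

```python
import numpy as np

def gridcell_trend(seasonal_totals, years):
    """Least-squares linear trend per grid cell, in units of the input per year."""
    years = np.asarray(years, dtype=float)
    centred = years - years.mean()            # centring improves numerical conditioning
    flat = seasonal_totals.reshape(len(years), -1)
    slope, _intercept = np.polyfit(centred, flat, deg=1)
    return slope.reshape(seasonal_totals.shape[1:])

# Synthetic seasonal totals with a known trend of 0.5 units per year at every cell.
years = np.arange(2020, 2070)
rng = np.random.default_rng(3)
base = rng.normal(size=(4, 5))
totals = base[None] + 0.5 * (years - 2020)[:, None, None]

trend = gridcell_trend(totals, years)
change_over_period = trend * len(years)       # total change implied over the 50 years
```

The same function could be applied to raw seasonal totals and to the residuals after dynamical adjustment to obtain the two kinds of trend maps.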
However, large variability in individual ensemble members is superimposed on the signal of forced change (Fig. 7, right), consistent with the large role of internal variability even on multi-decadal time scales. For example, ensemble member "kct" (second row) produces a relatively strong drying trend over northern Italy, which is entirely due to internal variability. The dynamically adjusted version of "kct" shows only very weak drying in northeast Italy, whereas it shows [...] considerably. Hence, dynamical adjustment is particularly useful when only few members are available, e.g. for small ensembles of up to five members. If only one member is available, the reconstruction RMSE of the forced 50-year precipitation trend is reduced by more than half via dynamical adjustment. Conversely, to achieve the same RMSE as a single dynamically adjusted ensemble member, an ensemble average of about four to six members would be required (Fig. 8, top panel). On the other hand, for ensembles with more than about 14 members, dynamical adjustment does not improve the ability to reconstruct the forced response. Moreover, dynamical adjustment reduces not only the reconstruction RMSE but also the spread of the distribution across ensemble members, as indicated by the boxes and whiskers in Fig. 8 (top panel). The overall reduction of the reconstruction RMSE also holds for specific circulation regimes (Fig. 8, middle and bottom panel), and is discussed in the next subsection.
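The comparison underlying Fig. 8 rests on the RMSE between a trend pattern estimated from a few members and the forced pattern from the full ensemble. A minimal sketch of that metric follows, using synthetic trend maps; the function name, the use of the first `n_members` members rather than random subsets, and the noise model are assumptions.

```python
import numpy as np

def reconstruction_rmse(member_trends, forced_trend, n_members):
    """RMSE between the mean trend map of the first n_members members and the forced trend."""
    estimate = member_trends[:n_members].mean(axis=0)
    return float(np.sqrt(np.mean((estimate - forced_trend) ** 2)))

# Synthetic ensemble: a common "forced" pattern plus independent noise per member.
rng = np.random.default_rng(4)
n_total, ny, nx = 50, 10, 10
signal = rng.normal(size=(ny, nx))
member_trends = signal + rng.normal(size=(n_total, ny, nx))
forced_trend = member_trends.mean(axis=0)     # forced response = full ensemble mean

rmse_by_size = {k: reconstruction_rmse(member_trends, forced_trend, k)
                for k in (1, 5, 10, 50)}
```

Passing trend maps computed from dynamically adjusted residuals instead of raw member trends would give the adjusted counterpart of the same curve.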

Elucidating forced precipitation trends for specific circulation composites at high spatial resolution
While dynamical adjustment of long-term trends of temperature and precipitation has become a standard tool for the detection of forced thermodynamic trends (Deser et al., 2016; Guo et al., 2019; Lehner et al., 2018), a bigger challenge is to assess forced trends in specific circulation regimes. One example would be summer heat waves related to specific circulation conditions (Jézéquel et al., 2018).
Thus, we assess to what extent the forced precipitation response can be uncovered under specific circulation conditions from a small number of ensemble members. We create composites of the dominant mode of atmospheric winter circulation over Europe as diagnosed by EOF analysis over the historical period in the RCM simulations. The first EOF of the coarse-resolution SLP field is shown in Fig. 9 (top). The dominant mode has a meridional gradient, with low pressure anomalies over northern Europe and high pressure anomalies over the Mediterranean. Although the domain includes only a small fraction of the North Atlantic, the dipole character of the EOF spatial pattern resembles the North Atlantic Oscillation.
We now generate composites of 'EOF1+' and 'EOF1-' regimes by isolating days that exceed the 75th percentile ('EOF1+') and those that fall below the 25th percentile ('EOF1-') in terms of the first principal component (Fig. 9, bottom). Note that the principal component time series associated with EOF1 does not show any discernible trend until the late 21st century, so we do not expect large forced changes in the SLP variability patterns over Europe. On winter days with strong positive EOF1 ('EOF1+', roughly analogous to NAO+), i.e. a pronounced north-south pressure gradient, increased westerly winds bring mild and moist air from the Atlantic into Central Europe (Fig. 9, bottom left). Conversely, the opposite regime suppresses westerlies, hence inducing drier conditions on average (Fig. 9, bottom right), which are also accompanied by colder temperatures.
Forced precipitation trends for 2020-2069 under 'EOF1-' conditions differ from 'EOF1+' conditions due to a change in the synoptic situation: the forced spatial pattern has generally weaker precipitation changes (due to overall drier conditions during 'EOF1-'), and precipitation increases are confined towards southeastern Europe (Fig. 11, left panel). Meanwhile, over large regions north and west of the Alps, precipitation changes only weakly under these circulation conditions. Spatial patterns of dynamically adjusted ensemble members (Fig. 11, middle panel) have a closer correspondence to the forced pattern than to the spatial patterns of 'raw' 50-year trends (Fig. 11, right panel). The reconstruction RMSE of the forced response is again substantially reduced (RMSE=0.052 for dynamically adjusted grid cells, RMSE=0.086 for raw trends).
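The construction of the 'EOF1+' and 'EOF1-' composites from the first principal component can be sketched with percentile thresholds. The function name, the synthetic principal component and the synthetic precipitation fields are assumptions for illustration only.

```python
import numpy as np

def eof1_composite_masks(pc1, lower=25, upper=75):
    """Boolean masks for days above the upper and below the lower percentile of PC1."""
    low_threshold, high_threshold = np.percentile(pc1, [lower, upper])
    return pc1 > high_threshold, pc1 < low_threshold

# Synthetic daily PC1 values and precipitation fields (not real model output).
rng = np.random.default_rng(5)
pc1 = rng.normal(size=1000)
precip = rng.gamma(1.0, 2.0, size=(1000, 4, 4))

plus_days, minus_days = eof1_composite_masks(pc1)
mean_precip_plus = precip[plus_days].mean(axis=0)     # 'EOF1+' composite mean field
mean_precip_minus = precip[minus_days].mean(axis=0)   # 'EOF1-' composite mean field
```

Trends under a given regime would then be computed from the days selected by the corresponding mask, for raw and for dynamically adjusted precipitation alike.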

The application of dynamical adjustment to composites of specific circulation regimes raises the question as to whether the Latent Linear Adjustment Autoencoder may be applicable to understanding the dynamical component in extreme precipitation events. While the LLAAE may be able to fill an important gap in reconstructing the dynamical component of daily precipitation fields, possibly including days with extreme precipitation (at least, the component proportional to surface pressure), it exhibits a tendency to smooth predicted precipitation fields (Fig. 3), which would presumably result in somewhat underpredicted extreme events. However, a detailed evaluation of the LLAAE in the context of extreme events will be the focus of future work.
Overall, we conclude that dynamical adjustment enables approximating the forced response from high-resolution simulations with only a few ensemble members. This is possible for both long-term trends in seasonal precipitation totals as well as for trends under more specific circulation regimes. The improvement for the 'EOF1+' and 'EOF1-' circulation regimes can be evaluated from Fig. 8 (middle and bottom panel), where reconstruction RMSEs of forced 50-year precipitation trends, based on one ensemble member, are reduced by up to about a factor of two by using dynamical adjustment based on the Latent Linear Adjustment Autoencoder. To achieve a similar forced response reconstruction RMSE, an ensemble average of about four to six members would be required (Fig. 8) both for seasonal trends (top panel) and the trends under specific circulation composites.

Dynamical adjustment uncertainties, computational costs and future applications
One of the main uncertainties in dynamical adjustment is the question of whether and how to detrend the climate data (circulation fields and/or precipitation) prior to dynamical adjustment. This is somewhat subjective, and often discussed as an [...] across models (Shepherd, 2014; Fereday et al., 2018). Therefore, it is critical to ensure that the statistical model does not fit a thermodynamical, forced signal and hence only models the dynamical variability. For the results presented in this work, we orthogonalized SLP EOF time series with respect to the ensemble-mean SLP change over time (i.e., a very simplistic but generic "detrending"). Our analysis shows that the residuals after dynamical adjustment match the ensemble mean very well (Fig. 6).
Hence, if a trend signal is included in the prediction of the precipitation field (e.g., due to hypothetical remaining trend artefacts in the pressure field), this effect is likely to be small because the residuals match the ensemble mean (forced) trend very well.

In Appendix B, we test an alternative simple detrending approach, where SLP is not detrended, but where we detrend precipitation using a simple method. However, we stress that our study is intended as a proof-of-concept study using Latent Linear Adjustment Autoencoders within a large ensemble context. Appropriate detrending choices for real-world applications (e.g., on observations), or an interpretation of forced changes into thermodynamical vs. circulation-induced components, remain for future work.

Another important question is how much training data is necessary to achieve the presented results. One may argue that it is computationally cheaper to estimate the forced response using a, say, 9-member ensemble mean, instead of training the LLAAE based on simulations from nine ensemble members (as done in this work). Indeed, as machine learning algorithms are known to require rather large amounts of training data, "proving" the case of LLAAE dynamical adjustment in a large ensemble may not be as straightforward. While we have shown that dynamical adjustment based on LLAAEs substantially reduces the number of ensemble members required to identify a proxy of the forced response for local-scale 50-year winter precipitation trends (Fig. 8), this approach evidently requires several ensemble members for training. While the results shown here are based on training the Latent Linear Adjustment Autoencoder with data from nine ensemble members (1955-2070 period), we have tested that the accuracy remains virtually identical if trained with about 43% less training data (nine ensemble members, but training only on 1955-2020, Appendix B), thus highlighting robustness of the method against reasonable changes to the amount of training data.
Beyond the computational aspects, however, we anticipate the ultimate applications of LLAAE-based dynamical adjustment not on a large ensemble (where the forced response is typically approximated with the ensemble average; Deser et al., 2020), but instead on simulations with models where only one or a few ensemble members may be available, such as projections with regional climate models (Jacob et al., 2014). Hence, the results presented in this work are intended as a proof-of-concept study within a large ensemble. As the next step, we envision the application to different climate models (e.g., training on a large ensemble or multiple large ensembles, and applying the dynamical adjustment to models for which only a few simulations exist), with ultimate application of the trained LLAAE to reanalysis SLP data. This would allow us to leverage the available data from climate model simulations while applying the method in a context where a direct calculation of the (member) ensemble mean is not possible. The transfer between climate models or towards reanalyses could explore adding other constraints or regularization to the linear model in the latent space, such as instrumental variables or anchor regression (Rothenhäusler et al., 2018) for distributional robustness (Meinshausen, 2018), which may benefit the robust applicability of LLAAEs (across different climate models or observations) without the need for re-training.

Alternative statistical and machine learning approaches
In principle, there are alternative approaches for statistical learning in the context of dynamical adjustment and also alternative options to employ deep neural networks. For instance, one could extend the method of Smoliak et al. (2015) or Sippel et al. (2019) by using a neural network instead of linear regression. In that case, however, one would have a separate fit for each grid point (i.e., not the 2D precipitation field as output). This would be computationally demanding, and it is also questionable whether the resulting predicted spatial field would be as coherent. In contrast, the spatial field is modelled jointly in our approach: the optimization is performed over the whole spatial field at once. The joint modeling of the daily high-resolution precipitation field as a function of coarse-scale circulation also enables additional climate applications, one of which is outlined in Sect. ??.
Furthermore, one may wonder why the autoencoder is needed in the architecture of the LLAAE if the encoder is discarded when predicting dynamic precipitation from SLP. Using the autoencoder for estimation allows us to link SLP EOFs as input with the 2D precipitation fields as output. Removing the intermediate stage of the autoencoder would constitute a challenging estimation problem, as the autoencoder helps to estimate the decoder. We are not aware of alternative ML algorithms for this input/output combination, and the LLAAE is novel in this regard.
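The prediction path described above (SLP EOFs, then the linear model, then the decoder, with the encoder discarded) can be illustrated with a minimal NumPy sketch. Everything in it is a hypothetical stand-in: the random "latents", the toy softplus decoder, and the post hoc least-squares fit are assumptions for illustration only, whereas in the LLAAE the linear model and the deep decoder are estimated together through the penalized loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_eofs, latent_dim, height, width = 200, 5, 8, 6, 7

# Hypothetical stand-ins for the SLP EOF time series and the encoder's latents.
slp_eofs = rng.normal(size=(n_days, n_eofs))
latents = slp_eofs @ rng.normal(size=(n_eofs, latent_dim)) \
    + 0.1 * rng.normal(size=(n_days, latent_dim))

# Toy decoder with fixed random weights (stands in for the trained deep decoder).
dec_weights = rng.normal(size=(latent_dim, height * width))

def decode(z):
    """Map latent codes to non-negative 2D fields via a softplus output."""
    return np.logaddexp(0.0, z @ dec_weights).reshape(-1, height, width)

# Linear model h: least-squares fit from SLP EOFs (with intercept) to latents.
# Assumption: fitted post hoc here, not jointly as in the actual model.
design = np.column_stack([np.ones(n_days), slp_eofs])
coef, *_ = np.linalg.lstsq(design, latents, rcond=None)

def predict_dynamic(slp):
    """Prediction path: SLP EOFs -> linear model h -> decoder (no encoder)."""
    z_hat = np.column_stack([np.ones(len(slp)), slp]) @ coef
    return decode(z_hat)

fields = decode(latents)                 # toy "original" fields
predictions = predict_dynamic(slp_eofs)  # dynamic (circulation-induced) part
residuals = fields - predictions         # input to dynamical adjustment
```

The sketch only shows why the encoder is not needed at prediction time: once the linear map into the latent space is available, the decoder alone produces the full 2D field.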

Conclusion and future work
In this work we have first introduced the Latent Linear Adjustment Autoencoder, which combines a linear model with the nonlinear decoder of a variational autoencoder. By combining a linear model, which takes a circulation proxy as input, with the expressive nonlinear (deep neural network) decoder, it can be easily trained, and allows for jointly modeling the dynamically-induced high-resolution spatial field of the climate variable of interest. The main methodological novelty is that we add a linear model to the variational autoencoder and include an additional penalty term in the loss function that encourages linearity between the circulation proxy and the latent space. This leverages the advantages of a linear relationship between circulation variables and latent space variables, hence enhancing robustness, while also benefiting from the advantages of deep neural networks (i.e., flexibility in modeling non-linearities, such as those that occur in high-resolution orographic precipitation).
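To make the role of the additional penalty term concrete, the following is a minimal sketch of a loss with three parts: a reconstruction term, the usual variational autoencoder KL term, and a penalty on the distance between the latent means and their linear prediction from the circulation proxy. The squared-error form of each term, the function name `llaae_style_loss`, and the weights `beta` and `lam` are assumptions for illustration; the exact loss is specified in the main text and accompanying code, not in this sketch.

```python
import numpy as np

def llaae_style_loss(x, x_recon, z_mean, z_logvar, z_linear, beta=1.0, lam=1.0):
    """Sketch of a VAE loss with an added linearity penalty (assumed form)."""
    # Reconstruction error between input fields and decoder output.
    recon = np.mean((x - x_recon) ** 2)
    # KL divergence between N(z_mean, exp(z_logvar)) and a standard normal.
    kl = -0.5 * np.mean(
        np.sum(1.0 + z_logvar - z_mean**2 - np.exp(z_logvar), axis=1)
    )
    # Penalty: latent means should be close to the linear prediction
    # obtained from the circulation proxy.
    penalty = np.mean(np.sum((z_mean - z_linear) ** 2, axis=1))
    return recon + beta * kl + lam * penalty, (recon, kl, penalty)

# Toy usage with hypothetical shapes: 4 samples, 10 pixels, 3 latent dims.
x = np.ones((4, 10))
z_mean = np.zeros((4, 3))
z_logvar = np.zeros((4, 3))
total_perfect, parts_perfect = llaae_style_loss(x, x, z_mean, z_logvar, z_mean)
total_offset, parts_offset = llaae_style_loss(x, x, z_mean, z_logvar, z_mean + 1.0)
```

In the toy usage, shifting the linear prediction away from the latent means changes only the penalty term, which is the mechanism that pulls the latent space towards a linear relationship with the circulation proxy.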
Future work targeting climate applications could explore the robust transfer of LLAAEs between different climate models, reanalysis data, or observations, by using ideas from transfer learning or distributional robustness (Meinshausen, 2018), for example through adding other constraints or regularization to the linear model in the latent space.
Second, as the main application, we have tested the applicability of the Latent Linear Adjustment Autoencoder to dynamical adjustment of high-resolution precipitation based on daily data at regional scales. Based on a circulation proxy, the Latent Linear Adjustment Autoencoder predicts dynamic (circulation-induced) precipitation at high resolution. An estimate of the forced precipitation response can then be separated from internal variability, leaving a higher signal-to-noise ratio compared to raw multidecadal trends. With only one or two ensemble members, root mean squared errors are roughly halved compared to raw trends when estimating the forced response (see Fig. 8), leading to dynamically-adjusted spatial trend patterns that closely resemble those of the approximated forced response (i.e., the ensemble average over 50 members), despite large internal variability. Moreover, we have used dynamical adjustment with the Latent Linear Adjustment Autoencoder to extend the framework to uncover estimates of the forced response conditioned on specific circulation regimes. We illustrated this aspect for composites of days with prevailing westerly conditions and hence wet conditions over Western Europe (similar to NAO+ regimes), and conversely, for days with suppressed westerlies (similar to NAO-) and hence generally drier conditions in Western Europe. In both cases the Latent Linear Adjustment Autoencoder was able to provide a better estimate of the forced response (i.e., with reduced error) compared to raw trends.
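The comparison between raw and dynamically-adjusted trends can be illustrated with an idealized toy calculation. In the sketch below, all data are synthetic, and the assumption that the predicted dynamic component captures 90% of the circulation-induced variability is purely illustrative (it is not a result of this study). Trends are estimated per grid cell by ordinary least squares and compared with the known forced slope via the root mean squared error.

```python
import numpy as np

rng = np.random.default_rng(1)
n_years, n_cells = 50, 20
years = np.arange(n_years, dtype=float)

# Synthetic forced signal: a different linear trend in every grid cell.
forced_slope = rng.uniform(-0.02, 0.05, size=n_cells)
forced = years[:, None] * forced_slope[None, :]

# Synthetic internal variability: one circulation index times a fixed pattern.
circulation = rng.normal(size=n_years)
pattern = rng.normal(scale=2.0, size=n_cells)
dynamic = circulation[:, None] * pattern[None, :]

raw = forced + dynamic
predicted_dynamic = 0.9 * dynamic   # assumption: 90% of dynamics captured
residual = raw - predicted_dynamic  # dynamically-adjusted series

def trend(y):
    """Least-squares linear trend per grid cell (slope per year)."""
    return np.polyfit(years, y, deg=1)[0]

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

rmse_raw = rmse(trend(raw), forced_slope)
rmse_adjusted = rmse(trend(residual), forced_slope)
```

Because the trend estimate is linear in the data, removing 90% of the dynamic component in this toy setup leaves one tenth of the raw trend error; real applications will differ, since the prediction error is neither a fixed fraction nor free of other noise.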
Other use cases of the Latent Linear Adjustment Autoencoder may include further applications of dynamical adjustment, including transfer learning across different high-resolution simulations such as EURO-CORDEX models (Jacob et al., 2014).
Eventual application to observations for regional-scale detection and attribution of precipitation changes is anticipated. More can be generated based on new samples of the SLP time series, obtained for instance via bootstrapping or from a coarse-resolution GCM, which would then make it possible to emulate daily precipitation dynamics at high spatial resolution. While modelling the spatial dependencies directly is challenging, this technique may leverage the trained models to represent the relationship between the SLP time series and the dynamic precipitation component. Thus, this approach has the advantage of avoiding the more complex and costly operations that depend on the high-dimensional spatial field for emulation.
Overall, the Latent Linear Adjustment Autoencoder may prove a versatile tool for climate and atmospheric science, specifically for modeling relationships between large-scale predictors and local and nonlinear precipitation at high resolution.

Appendix A: Experimental details
In this section, we detail the architecture used for the encoder and decoder of the proposed model. Additionally, we report the most important hyperparameters. All further details can be found in the accompanying code; see the "Code and data availability" section below for details. For the encoder and the decoder we use three convolutional layers and one residual layer (2) is set to 1.

Appendix B: Additional experimental results for dynamical adjustment of daily precipitation
As discussed in the main text, one of the main uncertainties in dynamical adjustment is how to ensure that the statistical model does not fit a thermodynamic, forced signal and hence only models the dynamic internal variability. Fitting a forced signal can potentially be mitigated by (i) an appropriate choice of the training period and (ii) suitable pre-processing of the data.
Furthermore, another important question is how much training data is necessary to achieve the presented results, even though we do not see the ultimate use case of LLAAEs in dynamical adjustment of large ensembles (also see the discussion in Sect. 4.4). To address these points and to further corroborate our results, we perform two additional analyses to understand the sensitivity of the LLAAE to (i) the choice of training period, (ii) the amount of training data, and (iii) different detrending approaches.

B0.1 Training on the time period 1955-2020
We train on the shorter period from 1955-2020 (as opposed to 1955-2070), using the same nine ensemble members as described in the main text. This corresponds approximately to a 43% reduction in training data, but more importantly, this restricts the training to a period with relatively modest precipitation change. We then reproduce the analysis with this model trained on (i) this shorter time period; and (ii) less data. We find that the MSE and R² performance measures indicate very robust results with respect to these changes in the input data (see Fig. A1 and A2). Furthermore, the dynamical adjustment analysis based on the shorter training period yields almost identical results compared to the longer 1955-2070 training period. That is, the residual variability is much closer to the ensemble mean forced response (see Fig. A3). This sensitivity analysis thus provides support that our method is robust to (i) a shorter time period and (ii) fewer training data points.
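The stated reduction of about 43% can be checked with a short calculation, under the assumption that the amount of training data scales with the number of calendar years in the training period (both end years counted):

```python
# Assumption: training data volume is proportional to the number of years.
full_years = 2070 - 1955 + 1    # 1955-2070, inclusive
short_years = 2020 - 1955 + 1   # 1955-2020, inclusive
reduction = 1.0 - short_years / full_years
print(f"{full_years} vs {short_years} years: {100 * reduction:.1f}% less data")
```

Counting winter seasons instead of calendar years shifts the exact figure slightly but leaves it at roughly 43%.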

B0.2 Detrending precipitation
The question of whether and how to detrend prior to dynamical adjustment is open, somewhat subjective, and often discussed as an inherently subjective choice and uncertainty in dynamical adjustment papers (see, e.g., Deser et al. (2016); Lehner et al. (2017) and Lehner et al. (2018) for a discussion about trend removal). Here, in addition to the results presented above, we test an alternative simple detrending approach: SLP is not detrended, but we detrend precipitation using a simple LOESS smoother, fitted on the ensemble (seasonal) means at every location and subtracted from every day individually. (Furthermore, we here use the shorter 1955-2020 period for training the model.) For the dynamical adjustment analysis, we then compute the residuals based on the non-detrended precipitation data and our predictions (from the model trained on the detrended precipitation data; see Fig. A4). This analysis suggests that this approach to detrend precipitation is too simplistic since the residuals of the dynamical adjustment analysis underestimate forced changes (the ensemble mean) to some extent (Fig. A4). There are several possible reasons for this:
1. Precipitation change cannot be modelled by a single additive mean change across the whole distribution. For instance, precipitation change is known to increase the variance of the precipitation distribution (Pendergrass et al., 2017). Hence, by subtracting the estimated, seasonally averaged precipitation trend we may not have fully removed the trend for wet days. Developing a more refined approach to remove the forced precipitation changes from daily data is non-trivial and beyond the scope of this work.
2. There may be some dynamically-induced changes in precipitation, but it would be hard to evaluate this without any additional simulations where dynamical effects and thermodynamical effects could be separated.
Overall, we conclude that our simple SLP detrending (without detrending precipitation) is a useful approach for introducing LLAAEs as a versatile tool for dynamical adjustment, as demonstrated by the fact that the residuals of individual ensemble members after dynamical adjustment match the ensemble mean trend of precipitation very well (e.g., Fig. 6). However, we acknowledge that considerations around whether and how to detrend the data prior to dynamical adjustment are crucial, especially for real-world applications (also see the discussion in Sect. 4.4).
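The alternative precipitation detrending tested in this subsection (a smoother fitted on the ensemble seasonal means at every location and subtracted from every day) can be sketched as follows. The function `loess_1d` is a simplified LOESS-type smoother written for this illustration (local linear fits with tricube weights); the span `frac`, the array layout, and the synthetic data are assumptions and do not reproduce the settings used in the study.

```python
import numpy as np

def loess_1d(x, y, frac=0.5):
    """Simplified LOESS: local linear fits with tricube weights (a sketch)."""
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))      # number of neighbours per fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[k - 1]            # bandwidth: k-th smallest distance
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3
        sw = np.sqrt(w)
        # Weighted least squares for a local line centred at x[i].
        A = np.column_stack([np.ones(n), x - x[i]]) * sw[:, None]
        beta, *_ = np.linalg.lstsq(A, y * sw, rcond=None)
        fitted[i] = beta[0]                 # local intercept = fit at x[i]
    return fitted

# Synthetic daily data: (members, years, days per season, grid cells).
rng = np.random.default_rng(2)
n_members, n_years, n_days, n_cells = 5, 30, 10, 3
years = np.arange(n_years, dtype=float)
precip = 0.03 * years[None, :, None, None] + rng.gamma(
    shape=2.0, scale=1.0, size=(n_members, n_years, n_days, n_cells)
)

# Ensemble seasonal mean per year and location, smoothed over years.
seasonal_ens_mean = precip.mean(axis=(0, 2))          # (n_years, n_cells)
smooth = np.column_stack(
    [loess_1d(years, seasonal_ens_mean[:, c]) for c in range(n_cells)]
)
# Subtract the smoothed seasonal mean from every day individually.
detrended = precip - smooth[None, :, None, :]
```

As noted in point 1 above, subtracting a single smoothed mean per season shifts the whole daily distribution by the same amount, so changes in variance or in wet-day intensity remain in the detrended data.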
Appendix C: Experimental results for dynamical adjustment of daily temperature
The following results are based on temperature anomalies. In Fig. A5, we show examples from the holdout ensemble "kbb" of (i) original temperature fields (left column), (ii) reconstructions (center column), and (iii) predictions (right column).
The reconstruction quality, i.e., the similarity between the left and the center column, is quite high, even though not all fine details are reproduced. For dynamical adjustment, we use the residuals, which are computed as the difference between the original fields (left column) and the predictions (right column). The predictions are computed using the linear model ℎ and the decoder with SLP time series as input. As can be seen, the proposed model yields spatially-coherent predictions and explains a large proportion of the variance of the original fields. Figure A6 shows the gridcell-wise MSE and Fig. A7 shows the R² statistics for the holdout ensemble "kbb".
Figure A7. Temperature anomalies. Proportion of variance explained (R²) for each grid cell for the temperature predictions.
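For reference, the two gridcell-wise measures reported in Figs. A6 and A7 can be computed as in the sketch below. It uses the standard definitions of the mean squared error and the coefficient of determination; the assumption that these match the exact definitions used for the figures, the function names, and the array layout (time along the first axis) are all illustrative.

```python
import numpy as np

def gridcell_mse(y, y_pred):
    """Mean squared error over time for each grid cell."""
    return np.mean((y - y_pred) ** 2, axis=0)

def gridcell_r2(y, y_pred):
    """Proportion of variance explained (standard R^2) for each grid cell."""
    ss_res = np.sum((y - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y - y.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot

# Toy fields with shape (time, lat, lon); values are synthetic.
rng = np.random.default_rng(3)
y = rng.normal(size=(100, 4, 5))
mean_pred = np.broadcast_to(y.mean(axis=0), y.shape)

r2_perfect = gridcell_r2(y, y)         # perfect prediction
r2_mean = gridcell_r2(y, mean_pred)    # predicting the temporal mean
mse_perfect = gridcell_mse(y, y)
```

A perfect prediction gives an R² of one in every grid cell, while predicting the temporal mean gives zero, which is a useful baseline when reading maps such as Fig. A7.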