EuLerian Identification of ascending Air Streams (ELIAS 2.0) in Numerical Weather Prediction and Climate Models. Part I: Development of deep learning model

Physical processes on the synoptic scale are important modulators of the large-scale extratropical circulation. In particular, rapidly ascending air streams in extratropical cyclones, so-called warm conveyor belts (WCBs), modulate the upper-tropospheric Rossby wave pattern and are sources and magnifiers of forecast uncertainty. Thus, from a process-oriented perspective, numerical weather prediction (NWP) and climate models should adequately represent WCBs. The identification of WCBs usually involves Lagrangian air parcel trajectories that ascend from the lower to the upper troposphere within two days. This requires numerical data with high spatial and temporal resolution, which is often not available from standard output and requires expensive computations. This study introduces a novel framework that aims to predict the footprints of the WCB inflow, ascent, and outflow stages over the Northern Hemisphere from instantaneous gridded fields using convolutional neural networks (CNNs). With its comparably low computational cost and its reliance on standard model output alone, the new diagnostic enables the systematic investigation of WCBs in large data sets such as ensemble reforecasts or climate model projections, which are mostly not suited for trajectory calculations. Building on the insights from a logistic regression approach of a previous study, the CNNs are trained using a combination of meteorological parameters as predictors and trajectory-based WCB footprints as predictands. Validation of the networks against the trajectory-based data set confirms that the CNN models reliably replicate the climatological frequency of WCBs as well as their footprints at instantaneous time steps. The CNN models significantly outperform previously developed logistic regression models. Including time-lagged information on the occurrence of WCB ascent as a predictor for the inflow and outflow stages further improves the models' skill considerably.
A companion study demonstrates versatile applications of the CNNs in different data sets including the verification of WCBs in ensemble forecasts. Overall, the diagnostic demonstrates how deep learning methods may be used to investigate the representation of weather systems and of their related processes in NWP and climate models in order to shed light on forecast uncertainty and systematic biases from a process-oriented perspective.

lag of several hours after the inflow, considers those WCB air parcels between 800 and 400 hPa. All WCB air parcels above 400 hPa define the WCB outflow stage which occurs with a time-lag after the ascent stage. In a final step, the parcel locations are gridded for each layer on a regular 1° × 1° latitude-longitude grid. Labeling grid points without/with WCB trajectory as 0/1 yields dichotomous dependent two-dimensional predictands for WCB inflow, ascent, and outflow, respectively.
Predictors are computed from nearly the same ERA-Interim data as used for the trajectory computation. The only difference is that the computation of predictors is based on data at the 1000, 925, 850, 700, 500, 300, and 200 hPa isobaric surfaces and not on all available model levels. This reflects the intention that the CNN models shall be applicable to climate projections or reforecast data, for example of the sub-seasonal to seasonal prediction project database (Vitart et al., 2017), which are only available on this limited number of vertical levels. The four most important predictors for WCB inflow, ascent, and outflow were identified in a stepwise forward selection approach by Quinting and Grams (2021) and are listed in Table 1. As an additional fifth predictor, we include the 30-day running mean climatological occurrence frequency of WCB inflow, ascent, and outflow centered on each calendar day, which is based on 6-hourly data from the gridded Lagrangian WCB data set for the period 1 January 1980 to 31 December 2016. The purpose of using this fifth predictor is to account for the seasonal variation in WCB occurrence frequency so that the same CNN models can be applied year-round. This avoids the need to develop one model per season. For each of the three WCB stages of inflow, ascent, and outflow a separate CNN model is developed for the Northern Hemisphere with the predictors listed in Table 1 serving as input maps. These CNN models are referred to as standard models.
3 UNet convolutional neural network

In this study, we use variants of the UNet CNN architecture (Ronneberger et al., 2015) which was originally designed to process biomedical images but has been successfully applied in meteorological applications (e.g., Lebedev et al., 2019; Ayzel et al., 2020; Weyn et al., 2020). The UNet is an encoder-decoder neural network architecture and consists mainly of two paths (Fig. 1): the contracting path (encoder), which down-scales the input map from its original resolution using convolutional layers and pooling, and the expanding path (decoder), which up-scales learned patterns back to the original resolution using up-sampling and convolutional layers. In the following, we provide information on the format of the input maps, the contracting path, and the expanding path.

Input map
In a first step, the data introduced in Section 2 are split into training, validation, and testing data sets. An essential requirement is that the training, validation, and testing data sets are statistically independent. A random sampling from the entire time period to create the three subsets would likely lead to highly correlated data sets. For example, a sample from 00 UTC on one day could fall into the training set and a sample from 12 UTC on the same day into the testing set. The 12 h time interval between the two samples would be considerably shorter than the synoptic timescale on which WCBs evolve. To avoid statistical dependence, we split the data into the three subsets as shown in Table 2. The training data, which comprise the period 01 January 1980 to 31 December 1999, are used to train the CNN models. Validation data are a comparably small subset of 5 years that allows comparing models with different settings on unseen data and identifying the best performing model. The testing data, which comprise the period 01 January 2005 to 31 December 2016, are used to evaluate the best performing models on unseen data (Section 3). Though predictors and predictands are available at 00, 06, 12, and 18 UTC, we train and validate the CNN models with 12-hourly data (00, 12 UTC) for computational reasons. The computationally less expensive testing of the models is performed on 6-hourly data (00, 06, 12, 18 UTC).

Each training sample consists of M × N × P input maps and an M × N × 1 output map. The variable M is the number of rows (latitudes), N is the number of columns (longitudes), and P is the number of channels (number of predictor variables; P = 5).
The CNN models of this study contain at least four so-called max-pooling layers (see Section 3.2), each downsampling a map by a factor of two. Therefore, M and N have to be a multiple of 2^(n+1) (Ayzel et al., 2020), where n is the number of max-pooling layers. With 1° × 1° horizontal grid spacing, M would be 91 for the entire Northern Hemisphere and thus not a multiple of 2^(n+1). Accordingly, we decided to select data from 6°S to 89°N (M = 96) in the latitudinal direction. The North Pole at 90°N is excluded due to infinite gradients when computing some of the predictors in Table 1 via finite differences. To account for the circular nature of the data in the longitudinal direction at the international date line, input padding is performed (Shi et al., 2015; Schubert et al., 2019): we pad 44 grid points east and west of the date line, which increases N from 360 to 448. As a result, the computing time needed for the model training increases. Still, it improves the results since without input padding the modelled probabilities would exhibit discontinuities along the date line.
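The cyclic padding step can be sketched in a few lines of numpy (function and array names are illustrative, not from the paper's code):

```python
import numpy as np

def pad_circular_lon(field, pad=44):
    """Pad a (lat, lon) field cyclically in the longitudinal direction.

    Copies `pad` columns from the eastern edge to the west and vice versa,
    so that convolutions see continuous data across the date line.
    """
    west = field[:, -pad:]   # columns wrapped around from the east
    east = field[:, :pad]    # columns wrapped around from the west
    return np.concatenate([west, field, east], axis=1)

field = np.random.rand(96, 360)   # 1° x 1° grid, 6°S to 89°N
padded = pad_circular_lon(field)  # shape (96, 448)
```

The same effect can be obtained with `np.pad(field, ((0, 0), (44, 44)), mode="wrap")`.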
Prior to input padding, each of the five predictor variables is normalized for each training sample to

x̃_{i,j} = (x_{i,j} − x̄) / σ,

where x_{i,j} is the original value, x̄ denotes the area-weighted mean, and σ is the area-weighted standard deviation. The reasoning behind the normalization is to prevent predictors with large values from causing large weight updates in the CNN during training (Section 3.4).
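As a numpy sketch of this per-sample normalization, assuming cosine-latitude area weights (the paper does not spell out the exact weighting, so this detail is an assumption):

```python
import numpy as np

def normalize_sample(field, lats_deg):
    """Normalize one predictor map with an area-weighted mean and std.

    Grid-point contributions are weighted by cos(latitude), an assumed
    stand-in for the area weighting on the latitude-longitude grid.
    """
    w = np.cos(np.deg2rad(lats_deg))[:, None] * np.ones_like(field)
    mean = np.average(field, weights=w)
    std = np.sqrt(np.average((field - mean) ** 2, weights=w))
    return (field - mean) / std

lats = np.arange(-6.0, 90.0)                   # 6°S to 89°N, 96 latitudes
field = 250.0 + 30.0 * np.random.rand(96, 360)  # e.g. a temperature-like field
norm = normalize_sample(field, lats)            # weighted mean 0, weighted std 1
```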

Contracting path
The default setting in this work is a contracting path with four blocks, each of which contains two convolutional layers (blue triangles in Fig. 1). These layers transform the input maps into so-called feature maps using convolutional filters. The convolutional filters are three-dimensional tensors of learnable weights with a certain spatial kernel size (kernel size = 3 × 3 in this study) and the third dimension equal to that of the input map. The filters convolve through the input maps grid point by grid point (stride = 1 in this study) and perform a convolution defined as (e.g., Lagerquist et al., 2019)

X_i^k = f( Σ_j W_i^{j,k} ∗ X_{i−1}^j + b_i^k ),

where X_i^k is the kth feature map in the ith layer, X_{i−1}^j denotes the jth feature map in the (i − 1)th layer, W_i^{j,k} is the corresponding convolutional filter, b_i^k is the bias of the kth filter in the ith layer, and f is the activation function. We use the rectified linear unit (ReLU; Nair and Hinton, 2010) as activation function in order to add non-linearities to the convolutional layer output. This non-linearity is required since otherwise the CNNs would only learn linear relationships. The third layer of each block is a 2 × 2 max-pooling layer (orange triangles in Fig. 1) which slides over each feature map with stride = 2 and takes the maximum of the four numbers in the filter region of 2 × 2 grid points.
Accordingly, the feature maps are downsampled by a factor of 2. For example, the original size of the input map is 96 × 448 and after the first block, which contains one max-pooling layer, it is reduced to 48 × 224. The process of convolution and max pooling is repeated for each block. With each block, the number of filters doubles so that the models are able to detect the meaningful features of the input maps effectively. Each max-pooling layer is followed by a dropout regularization layer which aims to prevent overfitting (Srivastava et al., 2014). During dropout regularization, input units are randomly set to 0 with a pre-defined dropout fraction at each step during training time. Though the effectiveness of dropout regularization in CNNs is still debated (e.g., Hinton et al., 2012; Ioffe and Szegedy, 2015), we decided to test the sensitivity of the results to the dropout fraction by varying it in the range from 0.0 to 0.3 at intervals of 0.05 (see Section 3.5). Dropout regularization is not used during validation and testing. Further, we apply batch normalization (Ioffe and Szegedy, 2015)^1 after each dropout layer, which effectively reduces overfitting in CNNs and reduces the number of training steps.
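The 2 × 2 max pooling with stride 2 can be illustrated with a short numpy sketch (the actual models of course use the pooling layers of a deep learning framework; names here are illustrative):

```python
import numpy as np

def max_pool_2x2(fmap):
    """2 x 2 max pooling with stride 2: each output value is the maximum
    of a non-overlapping 2 x 2 block, halving both spatial dimensions."""
    m, n = fmap.shape
    return fmap.reshape(m // 2, 2, n // 2, 2).max(axis=(1, 3))

fmap = np.random.rand(96, 448)  # one feature map at the input resolution
pooled = max_pool_2x2(fmap)     # shape (48, 224), as after the first block
```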

Expanding path
In line with the contracting path, the expanding path consists of four blocks, each of which contains three layers. The first layer is a transposed convolutional layer and serves to up-sample the feature maps from low to higher resolution. The kernel sizes are set to 3 × 3, and the stride is 2. The upsampling is followed by the second layer which first concatenates the feature maps from the contracting path to the expanding path (so-called skip connections), second applies a dropout function, and third includes a convolutional layer with a kernel size of 3 × 3 and stride = 1. By including skip connections (black dashed arrows in Fig. 1), high-resolution information from the contracting path can be used to reconstruct high-resolution feature maps in the expanding path. The third layer is a further convolutional layer with the same kernel size and stride as in the previous layer. In contrast to the contracting blocks, the number of filters halves after each expanding block. The spatial dimensions double with each expanding block so that the size of the feature map is 96 × 448 after four expansions, which is the same size as the original input map.
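A shape-level numpy sketch of one expanding step may help; here nearest-neighbour upsampling stands in for the learned transposed convolution (an assumption made purely for illustration), and the skip connection is a channel-wise concatenation:

```python
import numpy as np

def upsample_2x(fmap):
    """Nearest-neighbour upsampling by a factor of 2 in both spatial
    dimensions (a stand-in for the strided transposed convolution)."""
    return np.repeat(np.repeat(fmap, 2, axis=0), 2, axis=1)

def skip_concat(decoder_maps, encoder_maps):
    """Concatenate encoder feature maps onto the decoder ones along the
    channel axis (the UNet skip connection)."""
    return np.concatenate([decoder_maps, encoder_maps], axis=-1)

decoder = np.random.rand(48, 224, 32)   # low-resolution decoder feature maps
upsampled = np.stack([upsample_2x(decoder[..., c]) for c in range(32)], axis=-1)
encoder = np.random.rand(96, 448, 32)   # matching encoder-level feature maps
merged = skip_concat(upsampled, encoder)  # (96, 448, 64) channels
```

The subsequent 3 × 3 convolutions then reduce the doubled channel count again, which is why the number of filters halves per expanding block.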

The final output is generated with a convolutional layer (kernel size = 1 × 1; stride = 1; black triangle in Fig. 1) which reduces the number of feature maps from 16 to 1. In contrast to all previous convolutional layers, the activation function is a sigmoid function yielding values between 0 and 1 so that the output can be interpreted as a conditional probability.
1 During model training (Section 3.4), the training data set is divided into so-called batches. In brief, the batch size defines the number of training samples considered before updating the filter weights, and batch normalization describes the process of normalizing the input maps in one batch prior to proceeding with the training.

Model training
Random initialization of the convolutional filters ensures that they all have different initial weights. Accordingly, the different filters detect different features on the input map. The weights and biases of all convolutional filters are updated during iterative training via the Adam optimization algorithm (Kingma and Ba, 2015). The purpose is to minimize a loss function for classification, which in this study is the binary crossentropy loss. It is defined as

L = −(1/N) Σ_{i=1}^{N} [ y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ]

and is commonly used for binary classification tasks. N is the number of scalar values in the model output, ŷ_i denotes the probability that the ith example is a WCB, and y_i is the corresponding target value (WCB yes or no).
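The binary cross-entropy can be written as a small numpy function (a sketch for illustration; deep learning frameworks provide an equivalent built-in loss):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all scalar outputs.

    y_true: target labels (0 = no WCB, 1 = WCB);
    y_pred: modelled probabilities in [0, 1].
    """
    p = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.9, 0.1, 0.8, 0.3])
loss = binary_crossentropy(y_true, y_pred)  # ~0.198
```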
The weights and biases are optimized using at most 20 training iterations (called epochs in the context of machine learning) with batch sizes ranging from 8 to 64. The initial learning rate of the Adam optimizer is set to 1 × 10^−3 and is reduced by a factor of 0.1 when the binary crossentropy does not improve over the course of 5 consecutive iterations. Further, the training stops early if the binary crossentropy does not improve in 10 consecutive iterations.
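This schedule corresponds to the ReduceLROnPlateau and EarlyStopping callbacks in Keras; a dependency-free sketch of the logic (class name and the exact plateau bookkeeping are illustrative assumptions):

```python
class TrainingScheduler:
    """Sketch of the schedule above: reduce the learning rate by a factor
    of 0.1 after 5 epochs without improvement of the validation loss, and
    stop training after 10 epochs without improvement."""

    def __init__(self, lr=1e-3, patience_lr=5, patience_stop=10, factor=0.1):
        self.lr = lr
        self.best = float("inf")
        self.wait = 0
        self.patience_lr = patience_lr
        self.patience_stop = patience_stop
        self.factor = factor

    def update(self, val_loss):
        """Call once per epoch; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.wait = 0
            return False
        self.wait += 1
        if self.wait == self.patience_lr:
            self.lr *= self.factor  # reduce the learning rate once per plateau
        return self.wait >= self.patience_stop

sched = TrainingScheduler()
losses = [0.5, 0.4] + [0.45] * 10        # validation loss stalls after epoch 2
stopped = [sched.update(l) for l in losses]  # last entry triggers early stopping
```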

Model setting optimization
In this section, we evaluate the performance of different model setups for the validation period (1 January 2000 to 31 December 2004) in order to find the optimal setting of parameters. A particular focus is on the hyperparameters dropout fraction and batch size. Further, we evaluate the effect of omitting the WCB climatology as predictor (4block_16filters in Fig. 2), adding an additional fifth block to the UNet CNN (5block_WCBCLIM_16filters in Fig. 2), and increasing the number of initial filters from 16 to 32 (4block_WCBCLIM_32filters in Fig. 2). We try all 102 possible combinations of the parameters listed in Table 3.
In order to find the optimal configuration, we assess the model performance on the entire set of validation data in terms of the Matthews Correlation Coefficient (MCC; Matthews, 1975). The MCC is a balanced skill metric for binary verification tasks, even if the two classes are imbalanced, as is the case for WCBs which occur at some grid points in only 1% of the cases. It is defined as

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)),

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

To begin with, we only focus on the standard models and their median MCC values (black, gray, blue, and red lines in Fig. 2) as a function of the decision thresholds. For all three WCB stages, WCB inflow (Fig. 2a), ascent (Fig. 2b), and outflow (Fig. 2c), the median MCC values for the different numbers of filters and blocks exhibit sensitivities of less than 5%. Even when not using the running mean WCB climatology as a fifth predictor, the median MCC does not decrease markedly (black line in Fig. 2).

This result inspired us to test an additional CNN model configuration for the WCB stages of inflow and outflow: As outlined in the introduction, the inflow of a WCB precedes the ascent stage and the outflow lags the ascent stage. Accordingly, we decided to account for this relationship by replacing the fifth predictor of the standard models for inflow and outflow (30-d running mean WCB climatology) with the conditional WCB ascent probability predicted by the optimal WCB ascent model at a certain time-lag. Here, we decided for a time-lag of 24 hours because the model is to be applied to forecasts of the sub-seasonal to seasonal prediction project database (Vitart et al., 2017), which are available 24-hourly. Thus, the fifth predictor is the conditional WCB ascent probability 24 hours after (before) the valid time for the inflow (outflow) model. In the following, we evaluate the CNN-based models against the trajectory-based WCB occurrence frequency (Madonna et al., 2014). Further, we compare the reliability and skill of the CNN models and the logistic regression models of Quinting and Grams (2021).
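The MCC can be computed directly from the confusion-matrix counts; a minimal numpy sketch on a toy example (not the CNN output):

```python
import numpy as np

def matthews_cc(y_true, y_pred):
    """Matthews correlation coefficient for dichotomous predictions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

y_true = np.array([1, 1, 0, 0, 0, 0, 0, 0])  # rare positive class, as for WCBs
y_pred = np.array([1, 0, 0, 0, 0, 0, 1, 0])
mcc = matthews_cc(y_true, y_pred)            # 1/3 for this toy case
```

Unlike accuracy, the MCC stays near 0 for a model that always predicts the majority class, which is why it suits the strongly imbalanced WCB labels.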

Reliability
The average agreement between the observed WCB frequencies and the modelled WCB probabilities for DJF and JJA is shown as reliability diagrams in Fig. 3. For this purpose, the predicted probabilities are divided into 19 regular bins from 0.05 to 0.95 and plotted against the observed frequencies in these bins. The reliability curve of a perfect model would follow the solid diagonal line in Fig. 3. A model overestimates (underestimates) the observed WCB frequency when the model's curve lies below (above) the solid diagonal line.
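The construction of such a reliability curve can be sketched as follows; the exact bin edges of the paper are not reproduced, so the binning below is illustrative:

```python
import numpy as np

def reliability_curve(y_true, y_prob, n_bins=19):
    """Observed frequency per predicted-probability bin.

    Probabilities are grouped into `n_bins` regular bins; for each bin the
    mean predicted probability and the mean observed outcome are returned.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(y_prob, edges) - 1, 0, n_bins - 1)
    mean_prob = np.array([y_prob[idx == b].mean() if np.any(idx == b) else np.nan
                          for b in range(n_bins)])
    obs_freq = np.array([y_true[idx == b].mean() if np.any(idx == b) else np.nan
                         for b in range(n_bins)])
    return mean_prob, obs_freq

rng = np.random.default_rng(0)
probs = rng.random(100000)
outcomes = (rng.random(100000) < probs).astype(int)  # a perfectly calibrated toy model
mp, of = reliability_curve(outcomes, probs)          # curve close to the diagonal
```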
For DJF and JJA, the CNN models tend to slightly overestimate the observed WCB inflow frequency at modelled probabilities greater than 0.3 and 0.1, respectively (Fig. 3a). The overestimation is more pronounced in JJA. Still, the reliability curve lies inside the ±10% range of a perfect model. The CNN models clearly outperform the logistic regression models for modelled probabilities greater than 0.3 in JJA and greater than 0.5 in DJF. As for the logistic regression models in Quinting and Grams (2021), the CNN models perform best for WCB ascent (Fig. 3b). For DJF and JJA, the reliability curve nearly matches the solid diagonal line of the perfect model. For WCB outflow, the reliability curves lie within the ±10% range of the perfect model during DJF and JJA but slightly above the perfect model (Fig. 3c). This indicates that the CNN model underestimates the observed frequencies. Still, the CNN model is more reliable than the logistic regression models, which overestimate the observed frequencies considerably for modelled probabilities greater than 0.5 and 0.7 during JJA and DJF, respectively.

Model bias
The evaluation of the bias and skill (see Section 4.3) of the CNN models requires categorical/deterministic predictions (WCB yes or no). Therefore, the probabilistic CNN model predictions need to be categorized by applying a decision threshold above which a modelled probability is considered as WCB inflow, WCB ascent, or WCB outflow. Following Quinting and Grams (2021), the decision threshold is chosen to be gridpoint-dependent and to minimize the climatological bias of the models at each grid point. Here, bias is defined as the difference between the trajectory-based climatological WCB frequency and the CNN-based climatology. We then loop over a decision threshold 0 < p_WCB < 1 at intervals of 0.01 above which the conditional probability predicted by the CNN is set to 1. For each day of the year and each grid point, we determine the optimal p_WCB, which is the one that produces the lowest bias between the trajectory-based and CNN-based WCB climatology.
The purpose of calculating a decision threshold p_WCB for each day of the year is to account for seasonal variations of the modelled probabilities. For WCB inflow, ascent, and outflow, the decision threshold is highest in winter and lowest in summer (not shown).
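The grid-point-wise threshold selection can be sketched for a single grid point as follows (synthetic data; the real procedure additionally loops over calendar days and grid points):

```python
import numpy as np

def optimal_threshold(probs, obs, candidates=np.arange(0.01, 1.0, 0.01)):
    """Pick, for one grid point, the decision threshold whose implied WCB
    frequency has the smallest absolute bias against the trajectory-based
    (observed) climatological frequency."""
    target = obs.mean()  # trajectory-based climatological frequency
    biases = [abs((probs >= t).mean() - target) for t in candidates]
    return candidates[int(np.argmin(biases))]

rng = np.random.default_rng(1)
probs = rng.random(5000)                      # modelled probabilities at one grid point
obs = (rng.random(5000) < 0.05).astype(int)   # WCB occurs in ~5% of the time steps
p_wcb = optimal_threshold(probs, obs)         # high threshold for a rare event
```

By construction, applying `p_wcb` reproduces the observed climatological frequency almost exactly, which is why the bias in Fig. 4 is small by design.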
In the following, we analyse to what degree the climatological occurrence frequency of WCB inflow, ascent, and outflow based on gridded trajectories is represented by the CNN-based approach (Fig. 4) in the testing period. By design of the decision threshold above which a predicted probability is considered as WCB, the observed frequency and that of the CNN models coincide well. During DJF, the model bias for WCB inflow, ascent, and outflow is less than 2% at all grid points. Also in JJA, the biases are of similar magnitude except for the region related to the Asian monsoon over Northern India (Figs. 4b,d,f). Here, the frequency bias reaches up to 3%.

Model skill
The skill of the CNN models is quantified in terms of the MCC during the testing period. For WCB inflow and during DJF, ... (Fig. 5d), but smaller than for WCB inflow and outflow. Still, the mean MCC for WCB ascent is the second highest of the three WCB stages (cf. Fig. 2).
The MCC for WCB outflow is lower than for WCB inflow and WCB ascent. The highest MCC values of 0.6 to 0.7 are found over eastern North America, the Labrador Sea, and over the western to central North Pacific (Fig. 5e). Thus, as for WCB inflow and ascent, the regions of highest model skill are collocated with the regions exhibiting the highest climatological WCB outflow frequency. The relative improvement compared to the logistic regression models is particularly pronounced in the regions with the climatologically lowest WCB outflow occurrence frequency. The absolute value of the MCC exceeds that of the regression models by more than 0.25, which corresponds in some locations to a relative increase of more than 125% (Fig. 5f).

During boreal summer (JJA), the frequency of WCB inflow, ascent, and outflow is considerably lower than during DJF. WCB inflow occurs most frequently over central North America, the western North Atlantic, and over East Asia to the central North Pacific (Fig. 6a). A further maximum north of India is related to the Asian monsoon. On average, the MCC for WCB inflow tends to be lower during JJA than during DJF. Still, in regions with a climatological frequency of more than 2%, the MCC reaches values of 0.4 to 0.6, which corresponds to a relative improvement of 50% over the western North Pacific and more than 125% over central North America compared to the logistic regression models.
The MCC values for WCB inflow and WCB ascent are of similar magnitude. Although the climatological frequency is only about 2% in the storm track regions of the North Atlantic and the North Pacific, the MCC still exceeds 0.5 in these areas (Fig. 6c). The absolute value of the MCC improves by more than 0.15 compared to the regression models in most regions, which corresponds to a relative increase of 25-50% (Fig. 6d).
The MCC for WCB outflow is also lower in JJA than during DJF. In regions where the WCB outflow occurrence frequency exceeds 2%, the MCC mostly exceeds values of 0.4 (Fig. 6e). However, in regions of climatologically low WCB outflow occurrence frequency the MCC is on the order of 0.2 to 0.3, which is most likely related to the comparably small training sample size in those areas. Especially in high latitudes over the North Pacific and over the entire North Atlantic, the MCC improves by more than 100% compared to the logistic regression models at most grid points.

Identifying the most relevant dynamical footprint of WCBs by interpreting feature importance in the CNN models
Prior to the logistic regression model development, Quinting and Grams (2021) identified the best predictors for the three stages of WCB inflow, ascent, and outflow via stepwise forward selection. Here, we take the opposite approach and use the permutation feature importance to quantify the relevance of the predictors for the trained CNN models. To this end, the first of the five predictors (Table 1) for each WCB stage is sampled at a random date from the testing period. The remaining four predictors are sampled at the exact same date. This process is repeated for predictors 2 to 5 so that the skill of 5 different predictions in terms of the MCC can be compared. The larger the decrease in MCC, the higher the importance of the corresponding predictor. Though the normalization of the input data should reduce the effect of seasonal variations, we still take the random dates from a window of 30 days around the actual date.
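The permutation feature importance procedure can be sketched with a toy model; here shuffling a feature column in time stands in for sampling it at a random date, and all names are illustrative:

```python
import numpy as np

def permutation_importance(model, X, y, score, rng):
    """Skill drop when one predictor is decoupled from the target.

    X has shape (time, features); each feature column is shuffled in time
    (a stand-in for sampling it at a random date) and the decrease of the
    score relative to the unperturbed prediction is recorded.
    """
    base = score(y, model(X))
    drops = []
    for f in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, f] = rng.permutation(Xp[:, f])  # decouple feature f from the target
        drops.append(base - score(y, model(Xp)))
    return np.array(drops)

rng = np.random.default_rng(2)
X = rng.standard_normal((2000, 2))
y = (X[:, 0] > 0).astype(int)                # only feature 0 carries information
model = lambda X: (X[:, 0] > 0).astype(int)  # toy "trained" model
acc = lambda y, yhat: np.mean(y == yhat)
drops = permutation_importance(model, X, y, acc, rng)  # large drop for feature 0 only
```

The larger the drop, the more the model relies on that predictor, mirroring the MCC-based ranking described above.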

According to the permutation feature importance, the most important predictor variable for WCB inflow during DJF is the conditional probability of WCB ascent 24 hours later (referred to as MIDTROP in Figs. 7a,b). The average skill decrease in terms of the MCC is twice as high as the decrease when perturbing the 1000-hPa moisture flux convergence and 850-hPa meridional moisture flux (Fig. 7a). It is only at the edges of the climatologically most active WCB inflow regions where moisture flux convergence and the meridional moisture flux are identified as most important predictors. Also during JJA, the conditional probability of WCB ascent with a time-lag of 24 hours is the most important predictor for WCB inflow (Figs. 8a,b).
It is followed by the 1000-hPa moisture flux convergence and the 850-hPa meridional moisture flux. The 700-hPa thickness advection as well as the 500-hPa moist PV are of minor importance in both seasons. That the conditional probability of WCB ascent with a time lag of 24 hours is the most important predictor for WCB inflow is in line with the original trajectory-based definition where a temporal relation between the two stages is given by definition. The comparably high importance of variables related to moisture flux is in line with the findings for the logistic regression models but also with the general concept of WCB inflow which is typically characterized by strong moisture flux convergence and bands of high water vapor transport (Wernli and Davies, 1997; Dacre et al., 2019).
For WCB ascent, a permutation of the 850-hPa relative vorticity leads to the strongest decrease in model skill (Figs. 7c, 8c). In particular over the western North Pacific and the western North Atlantic, the MCC decreases to values near 0 when perturbing the relative vorticity field (Figs. 7d, 8d). These findings are the same for DJF and JJA. During WCB ascent, relative vorticity is redistributed via stretching so that cyclonic vorticity increases in the lower troposphere (Binder et al., 2016). Thus, the overall importance of relative vorticity for WCB ascent is in line with physical considerations. The decrease of the MCC for WCB ascent due to permutations of the 300-hPa thickness advection, 850-hPa meridional moisture flux, and 700-hPa relative humidity exhibits similar values. The seemingly least important predictor is the climatological WCB occurrence frequency with a median decrease in MCC close to zero. However, one should keep in mind that the random dates are taken only from a window of 30 days around the actual date. By doing so, the importance of the climatological WCB occurrence frequency for predicting the seasonal cycle of WCB activity is likely underestimated.
The almost equally most important predictor variables for WCB outflow during DJF are the conditional probability of WCB ascent 24 hours before and the 300-hPa relative vorticity (Figs. 7e,f). Interestingly, over the North Pacific the WCB ascent predictor is most important at nearly all grid points while over the North Atlantic the 300-hPa relative vorticity is the most important predictor at about half of all grid points. During the summer months, the 300-hPa relative vorticity becomes less important (Fig. 8e). At nearly all grid points the conditional probability of WCB ascent is the most important predictor (Fig. 8f). It is only in regions with the climatologically lowest WCB frequency where the 300-hPa relative vorticity and the 300-hPa irrotational wind speed are still the most important predictors. The importance of the conditional probability of WCB ascent with a time lag of −24 hours coincides with the trajectory-based WCB identification where this relation is given by definition. The importance of the 300-hPa relative vorticity is most likely related to the fact that WCB outflow is most often found in upper-tropospheric anticyclonic ridges (e.g., Pomroy and Thorpe, 2000; Grams et al., 2011).

Conclusions
In this study, we introduce a UNet CNN that aims to identify WCB footprints from Eulerian fields which are available from NWP and climate models. For each of the three WCB stages, inflow, ascent, and outflow, a separate CNN model is developed. The CNN-based framework is trained for the Northern Hemisphere on 20 years of gridded trajectory-based WCB data derived from ERA-Interim, using the same physical predictors as in Quinting and Grams (2021). The climatological occurrence frequency of WCB inflow, ascent, and outflow serves as an additional predictor for the respective WCB stage. With these predictors, the UNet standard models consisting of four blocks with an initial set of 32 filters yield the best results for WCB ascent.

Sensitivities to the hyperparameters dropout fraction and batch size are found to be small. Given that the CNN model performs best for the WCB ascent stage, we make use of the temporal succession of the three WCB stages to predict WCB inflow and outflow. For WCB inflow and outflow, the fifth predictor in the standard models is replaced by the conditional probability of WCB ascent predicted by the CNN model at a time-lag of 24 h and −24 h, respectively. With this approach, the improvement of the CNN models for inflow and outflow is considerably larger than that obtained from any variation of the hyperparameters, so that we consider these models as optimal. The importance of the time-lagged conditional probability of WCB ascent as a predictor for WCB inflow and outflow is confirmed by the model-agnostic permutation feature importance. Further important predictors, related to moisture flux for WCB inflow or relative vorticity for WCB ascent and outflow, are in line with previous trajectory-based studies and highlight the capability of the CNNs to identify WCBs based on dynamical features that are in agreement with the general concept of WCBs.

The CNN models for WCB inflow, ascent, and outflow are evaluated for an unseen testing period covering 1 January 2005 to 31 December 2016. For all three WCB stages, the models' reliability is within the 10% interval around the reliability of a perfect model. The models reach a similar reliability during boreal summer and winter. Most notably, the models outperform the logistic regression models of Quinting and Grams (2021), which tend to overestimate the frequency of WCBs at any of the three stages. The modelled probabilities are converted to dichotomous predictions by determining a decision threshold such that the climatological bias of the models is minimized. For all three stages, the models reach the highest skill in terms of the Matthews correlation coefficient in the midlatitude storm track regions, i.e., in regions where the climatological occurrence frequency of WCBs is highest. Compared to the logistic regression models, the relative skill improvement reaches up to 100%.
Our study demonstrates that deep learning allows transferring a sophisticated diagnostic, which relies on high-resolution data and considerable computing time, into a reliable and almost unbiased tool which works on coarser data with significantly less computing time. This opens promising pathways for using machine learning in process-oriented studies on big data sets, such as ensemble NWP reforecasts or climate model projections, that were so far inaccessible due to diagnostic constraints.
For example, the CNN-based WCB models can be used to investigate the representation of the climatological frequency of WCBs in these data sets but also of the link between WCB activity and midlatitude synoptic systems such as cyclones or blocking.

Figure 3 caption (fragment): Curves represent the reliability of the CNN models, and the red curves represent the reliability of the logistic regression models. Modelled probabilities (x-axis) and observed frequencies (y-axis) are binned into 19 bins based on the modelled probabilities. The perfect modelled probability and a ±10% interval about the perfect model are shown by the solid and dashed gray diagonals, respectively.