These authors contributed equally to this work.
We propose a new deep-learning architecture HIDRA2 for sea level and storm tide modeling, which is extremely fast to train and apply and outperforms both our previous network design HIDRA1 and two state-of-the-art numerical ocean models (a NEMO engine with sea level data assimilation and a SCHISM ocean modeling system), over all sea level bins and all forecast lead times. The architecture of HIDRA2 employs novel atmospheric, tidal and sea surface height (SSH) feature encoders as well as a novel feature fusion and SSH regression block. HIDRA2 was trained on surface wind and pressure fields from a single member of the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric ensemble and on Koper tide gauge observations. An extensive ablation study was performed to estimate the individual importance of input encoders and data streams. Compared to HIDRA1, the overall mean absolute forecast error is reduced by
Global mean sea level rise, related to anthropogenic climate change
The problem of sea level forecasting on the northern Adriatic shelf (see Fig.
Topography and bathymetry of the Adriatic region. Abbreviations used on the map are as follows: TS – Trieste, KP – Koper, GoT – Gulf of Trieste, VE – Venice, N Adr shelf – northern Adriatic shelf, S Adr Pit – southern Adriatic pit, OT – Otranto Strait. The direction of the scirocco is marked with the red arrow. The image was created by the authors based on EMODnet bathymetry data, available at
The two distribution tails, however, represent two dynamically separate problems. High sea levels always occur due to intense pressure lows and corresponding strong winds during cyclonic activity in the basin, while extremely low sea levels typically occur through a combination of prolonged periods of high atmospheric pressure and spring tides.
Equilibrium ocean response to slow changes in air pressure is captured by the inverse barometer effect, while the wind setup of the sea level occurs through the vertical momentum flux across the air–sea interface. The dominant winds in the Adriatic basin are the southeasterly scirocco, blowing along the major axis of the basin (see Fig.
In this paper we will adhere to the terminology proposed in
The key difficulty of sea level forecasting in the Adriatic basin arises from the high sensitivity of the total sea level to the phase lag between the gravitationally generated tides (independent of meteorological forcing) and meteorologically generated basin seiches (independent of gravitational forcing). This sensitivity can translate reliable atmospheric forecasts with very limited errors in the timing and trajectory of the cyclone into substantial errors in the sea level forecast.
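The sensitivity to phase lag can be illustrated with a toy superposition of a semi-diurnal tide (12 h period) and a decaying basin seiche (21.5 h period). This is only a sketch: the amplitudes, the decay time and the function name `peak_level` are hypothetical values chosen for illustration, not quantities taken from the Koper record or from HIDRA2.

```python
import math

TIDE_PERIOD_H = 12.0    # semi-diurnal tide period
SEICHE_PERIOD_H = 21.5  # fundamental along-axis basin seiche period


def peak_level(onset_h, tide_amp_cm=40.0, seiche_amp_cm=50.0,
               decay_h=30.0, window_h=72):
    """Peak of tide + seiche over an hourly window (toy model).

    onset_h is the hour at which the seiche is excited at its crest.
    All amplitudes and the decay time are hypothetical.
    """
    peak = -math.inf
    for t in range(window_h + 1):
        tide = tide_amp_cm * math.cos(2.0 * math.pi * t / TIDE_PERIOD_H)
        seiche = 0.0
        if t >= onset_h:
            dt = t - onset_h
            seiche = (seiche_amp_cm * math.exp(-dt / decay_h)
                      * math.cos(2.0 * math.pi * dt / SEICHE_PERIOD_H))
        peak = max(peak, tide + seiche)
    return peak


if __name__ == "__main__":
    # Compare the peak total level for several seiche onset times.
    for onset in (0, 3, 6):
        print(onset, round(peak_level(onset), 1))
```

With these default values, a seiche excited exactly at high tide (onset 0) peaks at the sum of both amplitudes, 90 cm, while a timing error of a few hours in the onset gives a markedly lower peak.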
Probabilistic ensemble forecasting of sea level envelopes with error variance estimation
Machine learning has thus been explored by several research groups for single-point sea level forecasting.
The early approaches
In this paper we propose HIDRA2, our latest attempt at sea level forecasting using deep learning. In contrast to the previous version, HIDRA2 presents a novel architecture with new atmospheric, tidal and sea surface height (SSH) feature encoders as well as a novel feature fusion and SSH regression block. An additional conceptual novelty is that HIDRA2 predicts the full SSH rather than the residual (i.e., the difference between SSH and astronomic tide), as is the case for HIDRA1. The new model extracts relevant information from different spatial locations in the atmosphere signal and predicts the SSH with a 3 d horizon at an unprecedented accuracy, outperforming HIDRA1 as well as two state-of-the-art ocean models.
The paper is organized as follows. Section
SSH observations during the period 2006–2018 were retrieved from Koper Mareographic Station (45
The tidal signal in the sea level is independent of atmospheric processes and can be computed by tidal analysis and prediction models. The tidal contribution to Koper SSH considered in this study is estimated from hourly instantaneous SSH values in 1-year segments using the UTIDE Tidal Analysis package for Python
HIDRA2 input domain and dataset. The leftmost panel depicts the ECMWF grid (white dots) and Koper tide gauge location (red circle). Three panels on the right depict snapshots of ECMWF atmospheric fields used during training.
Atmospheric input used for HIDRA2 training was retrieved from the European Centre for Medium-Range Weather Forecasts (ECMWF) Ensemble Prediction System
The evaluation input dataset for both HIDRAs and NEMO is disjoint from the training dataset (years 2006–2018) and consists of ECMWF atmospheric predictions and Koper sea levels between 1 June 2019 and 31 December 2020. This period was chosen due to challenging conditions and an unusually high incidence of floods. We use the ECMWF daily predictions, each containing 50 ensemble members with 3 d prediction lead time. The data are standardized, and the dimensionality of the atmospheric data is reduced in the same fashion as described in Sect.
Standard measures, i.e., the mean absolute error (MAE), the root mean square error (RMSE) and the model bias, are used to evaluate prediction performance. To reflect practical suitability, we additionally calculate the prediction accuracy as the fraction of all predictions that lie within 10 cm of the ground truth. This 10 cm threshold reflects an acceptable deviation from the ground truth and was determined through discussion with the operational forecasting service at ARSO. The metrics are calculated globally over all prediction points and separately over floods only, to reflect the prediction performance during these critical rare events.
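A minimal sketch of these four metrics follows. The function name `evaluate`, the sign convention for the bias (prediction minus observation) and the inclusive treatment of the 10 cm boundary are assumptions made for illustration; the text does not specify them.

```python
import math


def evaluate(pred_cm, obs_cm, tolerance_cm=10.0):
    """Return MAE, RMSE, bias and accuracy for paired SSH values in cm."""
    # Assumed sign convention: error = prediction - observation.
    errors = [p - o for p, o in zip(pred_cm, obs_cm)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    bias = sum(errors) / n
    # Accuracy: fraction of predictions within the tolerance
    # (boundary treated as inclusive, which is an assumption).
    accuracy = sum(1 for e in errors if abs(e) <= tolerance_cm) / n
    return {"mae": mae, "rmse": rmse, "bias": bias, "accuracy": accuracy}


if __name__ == "__main__":
    # Hypothetical values, not taken from the evaluation dataset.
    scores = evaluate([300.0, 310.0, 250.0, 280.0],
                      [295.0, 325.0, 250.0, 270.0])
    print(scores)
```

Restricting both input lists to the time points belonging to floods yields the flood-only variant of the same metrics.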
To further probe the flood event prediction capabilities, we use the standard performance measures from the detection literature: precision Pr, recall Re and the F1 measure F1. We first define the flood event and then what it means for an event to be detected. Both definitions were agreed upon in discussion with operational forecasters at ARSO. The anchor (i.e., temporal point) of a flood event is defined as the time of a local maximum in an SSH sequence above 300 cm. If a predicted flood event anchor lies within a 3 h margin (before or after) of the nearest ground truth flood event anchor, it is considered a true positive TP; otherwise, it is a false positive FP. A flood event in the ground truth is considered a false negative FN if there is no matching flood event anchor in the predicted SSH. As in the accuracy definition, a tolerance of 10 cm is applied, meaning that predictions below 300 cm are also considered to be TPs when they fall within the 10 cm margin and that false positives with ground truth within 10 cm are ignored.
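The precision and recall are then calculated as

$$\mathrm{Pr} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad \mathrm{Re} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}},$$

and the F1 measure is their harmonic mean, $\mathrm{F1} = 2\,\mathrm{Pr}\,\mathrm{Re} / (\mathrm{Pr} + \mathrm{Re})$.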
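The flood event matching can be sketched as follows. This is a simplified illustration, not the evaluation code used in the study: it implements the 300 cm threshold and the 3 h margin, but omits the additional 10 cm tolerance rule, and the function names and the handling of plateaus in the local-maximum test are assumptions.

```python
def flood_anchors(ssh_cm, threshold_cm=300.0):
    """Hourly indices of local SSH maxima above the flood threshold."""
    anchors = []
    for i in range(1, len(ssh_cm) - 1):
        is_local_max = ssh_cm[i] >= ssh_cm[i - 1] and ssh_cm[i] > ssh_cm[i + 1]
        if ssh_cm[i] > threshold_cm and is_local_max:
            anchors.append(i)
    return anchors


def detection_scores(pred_anchors, true_anchors, margin_h=3):
    """Precision, recall and F1 from anchor matching within margin_h hours."""
    # A predicted anchor is a TP if a ground truth anchor lies within the margin.
    tp = sum(1 for p in pred_anchors
             if any(abs(p - t) <= margin_h for t in true_anchors))
    fp = len(pred_anchors) - tp
    # A ground truth anchor with no predicted anchor nearby is an FN.
    fn = sum(1 for t in true_anchors
             if not any(abs(p - t) <= margin_h for p in pred_anchors))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2.0 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical hourly SSH values in cm.
    observed = [280, 290, 305, 310, 300, 285, 280, 290, 302, 295, 280]
    predicted = [280, 295, 308, 303, 290, 280, 275, 280, 290, 292, 285]
    print(detection_scores(flood_anchors(predicted), flood_anchors(observed)))
```

In this hypothetical example the prediction captures the first flood within the margin but misses the second, so precision is 1 and recall is 0.5.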
The proposed HIDRA2 is the second generation of a deep neural model for sea surface height prediction, with HIDRA1
The HIDRA2 architecture. The Atmospheric encoder embeds the wind and pressure sequences with learnable temporal subsampling and pattern prototype matching to extract relevant features from different geographic locations and fuse them temporally into a single feature embedding.
The Tidal and SSH encoders encode the future tide evolution and the past SSH and tide observations, respectively.
All features are re-calibrated, fused with the past 72 h SSH, and regressed into the final SSH predictions by the fusion-regression block. Notation
The atmospheric data for the Adriatic basin at a given time step are represented by a
The Atmospheric encoder is composed of two stages. In the first stage (shown in Fig. Note that the number of output channels is equal to the number of different kernels used in the layer.
The first stage of the Atmospheric encoder. The input consists of 4 consecutive hours of either the two wind channels (the case shown here) or pressure.
The second stage of the Atmospheric encoder (Fig. Note that the sizes of the kernels in the 1D convolutional layer are The fact that the number of output segments is equal to the 20 h timespan is coincidental.
The second stage of the Atmospheric encoder. Features from all time points and both wind and pressure are processed with a 1D convolution, followed by two blocks with residual connections. The last convolution reduces feature dimensionality. The variables
Both the tidal and SSH encoders use the same architecture, the only difference being the size of the encoders' input. Figure
The SSH encoder encodes a concatenation of the past SSH and tide by a 1D convolution, followed by two blocks with residual connections, max-pooling temporal reduction and convolution-based feature reduction. The variables
The Atmospheric, Tide and SSH encoders produce temporal features of different importance and size. To account for that, the features are re-calibrated by normalization with means and variances of the features calculated during training and then denormalized with learned weights and biases. The form of normalization follows the batch normalization layer
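The re-calibration step can be sketched per feature as a normalization with fixed statistics followed by a learned affine transform, in the manner of a batch normalization layer at inference time. The function name `recalibrate`, the argument names and the small stabilizing constant `eps` are assumptions for illustration and are not taken from the HIDRA2 code.

```python
import math


def recalibrate(features, mean, var, weight, bias, eps=1e-5):
    """Normalize each feature with training statistics, then rescale.

    mean and var stand for statistics gathered during training;
    weight and bias stand for the learned denormalization parameters.
    """
    return [w * (x - m) / math.sqrt(v + eps) + b
            for x, m, v, w, b in zip(features, mean, var, weight, bias)]


if __name__ == "__main__":
    # Hypothetical two-element feature vector.
    out = recalibrate([2.0, 10.0], mean=[1.0, 10.0], var=[4.0, 1.0],
                      weight=[1.0, 2.0], bias=[0.0, 0.5], eps=0.0)
    print(out)
```

Because the weights and biases are learned separately for each encoder output, the network can rescale features of different origin before they are fused.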
While the encoding and mixing operations extract the domain context, the explicit surface height information might not be well retained in the extracted feature vector. To re-inject this information, the obtained domain context feature vector is concatenated with the time series of past observed SSH before passing to the final regression block. The latter is composed of two fully connected layers with 584 units, SELU activations and residual connections, followed by a fully connected layer with 72 outputs for the 72 h prediction horizon (see Fig.
The fusion-regression block first re-calibrates the features (the C symbol); the concatenated features are then passed to a dense layer, which fuses them and reduces their dimensionality. The undistorted SSH is appended and processed with two residual blocks. The final dense layer outputs the predictions. The variable
HIDRA2 is trained end to end using mean squared error (MSE) loss between the predictions and the ground truth. We train the model using the AdamW optimizer
While there are many differences between HIDRA2 and HIDRA1, we summarize only the major conceptual ones for a clearer exposition of the contributions. HIDRA1 uses wind, pressure and 2 m temperature from ECMWF predictions, while our preliminary study showed that the new HIDRA2 architecture does not benefit from the temperature, and thus only wind and pressure are considered. HIDRA1 concatenates all atmospheric inputs at a time step and encodes them by Resnet
In this section we briefly describe two different numerical ocean modeling setups used for benchmarking HIDRA2. The two setups differ in several important respects. One is based on the NEMO ocean engine
The Copernicus Marine Environment Monitoring Service (CMEMS) product MEDSEA_ANALYSISFORECAST_PHY_006_013 (see
A barotropic setup of the SCHISM storm surge and wind–wave modeling environment
Both NEMO and SCHISM sea levels, denoted here jointly as
Each day, the value of
As noted in Sect.
A valid hypothesis can be made that predicting the residual (i.e., the difference between the full SSH and the tide) might be more beneficial than predicting the full SSH, since the network parameters might be better utilized by focusing only on the part of SSH not affected by the astronomic tide. In fact, HIDRA1
This somewhat surprising behavior may be related to nonlinear interactions between tides and storm surges: both tides and storm surges modify the local water depth, which affects their barotropic wave propagation speeds and topographic amplifications, which in turn determine the onset time and the amplitude of any coastal flood in Koper. Such interactions are negligible during calm conditions, but they do play a role during stormy periods
An ablation study was executed to evaluate the importance of individual encoders and input data types. To estimate encoder importance, we removed each of the encoders in a separate experiment (and withheld all of their input data; see Fig.
Performance of ablated HIDRA2 designs evaluated over all sea level bins (the
The following encoder ablations were performed.
Removal of the Atmospheric encoder (HIDRA2
Removal of the Tidal encoder (HIDRA2
Removal of the SSH encoder (HIDRA2
The results in Table
Two further ablations were then performed regarding the data types of the sea level input data (the SSH and the tide; see Fig.
Removal of the tidal input to the SSH encoder (HIDRA2
Removal of the SSH input (HIDRA2
The results in Table
We observe a similar situation when removing the atmospheric and SSH/tide feature re-calibration in the fusion-regression block (HIDRA2
Mean absolute error (MAE) of ablated HIDRA2 designs evaluated over all sea level bins. Vertical red line indicates the flooding threshold in Piran. Performances of all the models were evaluated on a 1 June 2019–31 December 2020 dataset, which is completely independent of the training data.
Figure
HIDRA2 is compared with HIDRA1
The overall prediction performance and the performance restricted to storm events are shown in Table
Performance of HIDRA1, HIDRA2, NEMO and SCHISM over all sea level bins (the
For detailed analysis, we visualize the MAE values of the tested methods with respect to the sea level heights in Fig.
HIDRA, NEMO and SCHISM performances with regard to sea level bins (grey histogram in the bottom layer). The coastal flood threshold is marked with a vertical red line. Performance of all the models was evaluated on a 1 June 2019–31 December 2020 dataset, which is completely independent of the training data.
We next analyzed how the prediction lead time affects the prediction errors.
Figure
MAE score of the HIDRA, NEMO, and SCHISM models with regard to prediction lead time (between 1 and 72 h). Performance of all the models was evaluated on a 1 June 2019–31 December 2020 dataset, which is completely independent of the training data.
To investigate the spectral properties of the modeled and observed SSH time series, we computed spectral densities of the HIDRA2, HIDRA1, NEMO and SCHISM predictions. Unless otherwise stated, all time series analyzed in this section were obtained by concatenating (in time) the first 24 h of each daily HIDRA2, HIDRA1 and NEMO 3 d forecast. Spectral densities (shown in Fig.
Spectral density of SSH time series from the Koper tide gauge, HIDRA2, and HIDRA1 compared with NEMO
Figure
It appears that HIDRA2 is capable of generating seiche-like behavior in its predictions. Spectral density, however, discards the temporal component of the signal, and adequate spectral density in the (21.5 h)
Historic Adriatic storm tide events are used to qualitatively compare the HIDRA2 performance with the state of the art. The storm tides in question occurred during November and December 2019 and were of historic proportions by any criterion. The Slovenian coast was flooded over 10 times in a single month, and sea levels in Venice were among the highest ever observed. Furthermore, the events in November 2019 turned out to be difficult to model due to the formation of a transient and very localized low pressure over the Gulf of Venice, which went unresolved in most models
Comparison of the HIDRA2 ensemble
Same as Fig.
Figure
The peak on 13 November is slightly better predicted by the maximum members of both HIDRAs than by NEMO or SCHISM, with HIDRA2 exhibiting a somewhat lower forecast spread than HIDRA1. Apart from this peak, all the models captured the sea level variability quite well, which is in itself an implicit testament to the high skill of ECMWF atmospheric products.
Comparison of the HIDRA2 ensemble spread on forecast days 1
Comparison of total Koper SSH observations and forecasts
The floods of December 2019 are another example of HIDRA2's superior performance over HIDRA1 and both ocean models in Koper. SSH observations and predictions in Koper during this period are depicted in Fig.
To inspect the behavior of the ensemble forecast spread, three time series were created from daily (72 h long) forecasts during the evaluation time window between 1 June 2019 and 31 December 2020. The first time series was constructed by concatenating each first day (i.e., 1–24 h of forecast) from each of the daily forecasts, thus containing predictions with lead times of 1–24 h on each respective day in the evaluation time window.
The second and third time series were constructed by concatenating 25–48 h (49–72 h) of forecast on each respective day in the evaluation time window. All three time series for the December 2020 floods are shown in Fig.
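The construction of the three lead-time series can be sketched as below, assuming each daily forecast is stored as a list of 72 hourly values issued one day apart. The function name `lead_time_series` and the data layout are hypothetical.

```python
def lead_time_series(daily_forecasts, first_h, last_h):
    """Concatenate forecast hours first_h..last_h (1-based, inclusive)
    from each daily 72 h forecast into one continuous time series."""
    series = []
    for forecast in daily_forecasts:
        series.extend(forecast[first_h - 1:last_h])
    return series


if __name__ == "__main__":
    # Three hypothetical daily forecasts; value = 100 * day + lead hour.
    forecasts = [[100 * day + h for h in range(1, 73)] for day in range(3)]
    day1 = lead_time_series(forecasts, 1, 24)   # lead times 1-24 h
    day2 = lead_time_series(forecasts, 25, 48)  # lead times 25-48 h
    day3 = lead_time_series(forecasts, 49, 72)  # lead times 49-72 h
    print(len(day1), len(day2), len(day3))
```

Since consecutive forecasts are issued one day apart, each 24 h slice continues where the previous day's slice ends, so every resulting series is continuous in time but has a fixed range of lead times.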
To investigate the performance in geophysically relevant energy bands, we band-pass-filtered the observed and predicted SSH signals in energy bands, centered around four important periods: semi-diurnal tide (12 h period), diurnal tide (24 h period), fundamental basin along-axis eigenmode (21.5 h period) and first excited along-axis eigenmode (10.9 h period).
Same as Fig.
Although incomplete, this SSH decomposition allows qualitative estimation of the excitation intensity of the basin eigenmodes during a particular storm and also helps to qualitatively assign forecasting errors to specific frequency bands. However, since the amplitudes of the filtered signals in Fig.
We applied a fifth-order Butterworth band-pass filter with a sampling rate of (1 h)
Identical analysis and related figures for the SCHISM model are available in the Supplement to this paper. They illustrate that SCHISM exhibits very solid performance in the seiche energy bands.
All the models exhibit an underestimation of the amplitude but are otherwise in phase with the observations in the
In any case, since both tidal bands and the ground state seiche are reliably predicted by all the models, the reason for the forecasting errors must lie in the higher-frequency bands with periods below 10.9 h. This seems consistent with the occurrence of highly transient and localized low pressure over Venice mentioned in
Similar remarks can be made regarding the December 2019 coastal flooding depicted in Fig.
Figure
In the
This study presents a deep-learning-based sea level model, HIDRA2, suitable for operational sea level ensemble modeling due to its speed and accuracy. This work is a conceptual continuation of our previous attempt at sea level forecasting
Performance is analyzed during several historic storms. Spectral decomposition of the total sea level signal into bands centered around tides and basin seiches is carried out to assign modeling errors to specific energy bands of the predicted sea levels. HIDRA2 consistently exhibits high skill in exciting the ground state Adriatic basin seiche at the appropriate time and with the appropriate phase and amplitude.
HIDRA2 is a good example of how combining deep learning and geophysics may lead to reliable and numerically cheap models that are able to mimic complex physical phenomena at the level of the best numerical physical models. Nevertheless, several extensions could be explored. One possible extension is the ingestion of data from several tide gauges along the Adriatic coast, to verify whether the prediction accuracy at individual locations improves in such a multi-point prediction setup. Another extension is the inclusion of real-time in situ measurements such as synoptic observations and satellite scatterometer and wind measurements. It would also be interesting to migrate HIDRA2 to other Mediterranean locations or to other semi-enclosed basins like the Baltic Sea, the Red Sea or the Chesapeake Bay to investigate its generalization properties. These will be the subjects of our future research.
The implementation of HIDRA2 and the code to train and evaluate the model are available in the GitHub repository:
The supplement related to this article is available online at:
MR was the main designer of HIDRA2 and reimplemented HIDRA1 in PyTorch. MK led the machine learning part of the research and contributed to the design of HIDRA2. ML provided the geophysical background relevant for the HIDRA2 design and led the geophysical part of the research. AF and ML prepared the atmospheric and sea level training and evaluation datasets. MR, MK and ML analyzed the results and wrote the paper. All the authors contributed to the final version of the manuscript.
The contact author has declared that none of the authors has any competing interests.
Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The authors would like to thank Tim Toomey and Alejandro Orfila for providing SCHISM sea level reanalysis time series for the Koper location, which were used for benchmarking HIDRA2 in this work. We further thank the reviewers for taking the time to review the manuscript and for their constructive remarks which led to an improved paper. Matjaž Ličer was financially supported by the Slovenian Research Agency (research core funding no. P1-0237). Matej Kristan was financially supported by the Slovenian Research Agency (research project no. J2-2506).
This research has been supported by the Slovenian Research Agency (research core funding no. P1-0237 and research project no. J2-2506).
This paper was edited by Rohitash Chandra and reviewed by three anonymous referees.