Data assimilation cycle length and observation impact in mesoscale ocean forecasting
- CSIRO Oceans and Atmosphere, Castray Esplanade, Battery Point TAS 7004, Australia
Correspondence: Paul Sandery (firstname.lastname@example.org)
A brief examination of the relationship between data assimilation cycle length and observation impact in a practical global mesoscale ocean forecasting setting is provided. Behind-real-time reanalyses and forecasts from two different cycle length systems are compared and skill is quantified using all observations typically available for ocean forecasting. A 1-day Ensemble Optimal Interpolation (EnOI) cycle is compared to a 3-day cycle. The mean analysis increments for the 1-day system are significantly smaller, suggesting a less biased system. Comparison of mean absolute increments identifies observations have greater impact in the 1-day system. Whilst smaller mean increments and greater observation impact do not guarantee a better forecast system, analysis of 7-day parallel forecasts show that the 1-day cycle system delivers improvement in predictability, particularly for the subsurface. This improvement appears to mainly come from less biased initial conditions and suggests greater retention of memory from observations and improved balance in the model.
Cycle length in sequential data assimilating forecasting systems is an important setting that relates to dynamical scales resolved by the numerical model and the observation system. Many ocean forecasting systems, for example those described in Cummings and Smedstad (2014), Martin et al. (2007), Chassignet et al. (2009), Ferry et al. (2010) and Bertino et al. (2008), make different choices around cycle length. Shorter cycle length implies more frequent analyses and initialization of the dynamical model. This may not necessarily lead to a better forecast system. In multivariate systems, observed variables project onto unobserved variables and systems tend to perform best when model error covariances are adequately sampled and there is reasonable coverage of multiple observation types. Longer cycles favour better coverage; however, they can introduce larger analysis increments, temporal representation errors and overfitting of observational data. Bias is a fundamental problem in atmospheric and ocean forecasting affecting system performance. Bias arises within an assimilation cycle shared by issues related to the assimilation system, the model and observations. Identifying the cause of bias can be almost impossible (Houtekamer and Zhang, 2016). Mean analysis increments are sometimes used to detect model bias (Houtekamer and Mitchell, 2005; Oke et al., 2013b), and some bias correction schemes are based on this (Zhang et al., 2016; Takacs et al., 2016; Ha and Snyder, 2014). Some care must be taken when using mean analysis increments as a proxy for model bias as they are dependent on the structure of the background covariances and also contain observation bias (Dee, 2005). Furthermore, they can be approximately zero and relatively meaningless in regions of few or no observations or when large errors of opposite sign cancel out over time. Provided observation coverage is sufficient, observation bias is minimal and background error covariances are physically meaningful, well-sampled mean analysis increments can be a reasonable indication of model bias. Dee and Da Silva (1998) illustrated that mean analysis increments tend to underestimate forecast bias. This is because they depend on the rate and period of growth of perturbations, i.e. model error growth, so they are forecast lead-time- and cycle-length-dependent. This questions the use of mean analysis increments to estimate and compare the bias of forecasting systems with different cycle lengths. It appears that the cycle length, however, should be based on that which is best for predictability. Aspects of this are touched on by running twin experiments with a global ocean forecasting system using cycle lengths of 1 and 3 days. The system used in this study is the current Bureau of Meteorology Ocean Model Analysis and Prediction System (OceanMAPS) version 3. Previous versions of this system are documented in Brassington (2013) and Brassington et al. (2007). OceanMAPS is global eddy resolving, forced by numerical weather prediction (NWP), runs on a 3-day data assimilation cycle and carries out 7-day forecasts. It is able to constrain aspects of the mesoscale variability to the available real-time observations. It produces forecasts of synoptic features of the ocean circulation, such as the locations of eddies and fronts, daily changes in sea surface temperature and mixed layer depth, wind-driven surface flows, and coastal trapped waves. As typical for ocean forecast systems like OceanMAPS, the largest errors tend to occur in regions of most rapidly growing dynamical instabilities (O'Kane et al., 2011), such as western boundary currents and along the Antarctic Circumpolar Current (ACC). Some of these features and the characteristic spatio-temporal scales resolved by the model are captured in Fig. 1, which presents a snapshot of sea-level anomaly (SLA) for 9 September 2013. The behind-real-time forecasted SLA is shown with unassimilated forward independent super-observations for the same day from the 1-day cycle system. Information regarding the use of forward super-observations for forecast verification, as used in this study, can be found in Sakov and Sandery (2015) and Sandery and Sakov (2017).
The Geophysical Fluid Dynamics Laboratory (GFDL) Modular Ocean Model version 4.1 (MOM4p1) (Griffies et al., 2009) is used. This is a Boussinesq three-dimensional primitive-equation volume-conserving ocean model. The OceanMAPS grid has 0.1∘ horizontal resolution and is the same as the Ocean Forecasting Australia Model version 3 (OFAM3) (Oke et al., 2013a), which is based on bathymetric data from Smith and Sandwell (1997). The grid has 51 vertical levels, and the top cell approximates quantities at 2.5 m depth with the average resolution in the upper 200 m being approximately 10 m. The physical model settings include the use of a fourth-order Sweby advection method and a scale-dependent isotropic Smagorinksy biharmonic horizontal mixing scheme as described in Griffies and Halberg (2000). The General Ocean Turbulence Model (GOTM) κ–ϵ scheme is used for vertical mixing. Note that tides are not explicitly modelled; rather, a parameterization of tidal mixing is implemented using the scheme of Lee et al. (2006).
a One-day system. b Three-day system. c Total number of super-observations used to verify the forty-four 7-day forecasts shown.
Initial conditions for both systems are the same and taken from the multi-year OFAM3 spin-up for 1 January 2012. The 1 and 3-day cycle systems are spun-up with data assimilation over a 1-year period to 1 January 2013. Hindcasts are continued throughout 2013, and a series of 7-day forecasts, 3 days apart, with identical base dates as illustrated in Fig. 2, are carried out from 3 January 2013. The forecast experiments were done behind real time; therefore, observations in the 12–24 h prior to forecast base time were available to both systems, whereas in practice they would not be available in this period in a real-time system. The model is forced by 3-hourly prescribed surface fluxes of momentum, heat and salt from the Bureau of Meteorology operational global NWP system version 1, which is known as ACCESS-G APS1 (Australian Community Climate and Earth System Simulator). For data assimilation the EnKF-C software (Sakov, 2015) is used in Ensemble Optimal Interpolation (EnOI) (Evensen, 2003) mode. The analysis equation and background error covariances can be written as
where xa and xf are analysis and forecast state vectors respectively; y is an observation vector; H is a linear observation operator, i.e. H=∇ℋ(x), where ℋ is a linear affine observation operator; B is background error covariance; R is observation error covariance; A represents ensemble anomalies; m is ensemble size and T denotes matrix transposition. xf is taken to be an instantaneous model state, whereas is a 1-day mean and 3-day mean in the respective systems. The system uses no nudging or incremental analysis updating (Ourmières et al., 2006); rather, the model is directly initialized to the analysis. This approach allows the model to run the complete length of each cycle as a dynamical forecast without being influenced by forcing from nudging terms in the model equations. It also includes any initialization shock from imbalance in the analysis in order to assess the impact of this on forecasts. B is based on a 144-member ensemble of intra-seasonal (1-day minus bimonthly mean) anomalies generated from an 18-year run of OFAM3. A source of time filtering is implicit in the innovation vector from the fact that the super-observations tend to represent averages over the time window, particularly for observations with relatively larger coverage, such as sea surface temperature (SST). An asynchronous 3-day cycle FGAT (first-guess appropriate time) system was not compared with the 1- or 3-day cycle systems as FGAT did not provide any improvements over the synchronous 3-day cycle. Mean increments and forecast errors from FGAT were comparable to the 3-day synchronous cycle (not shown).
Both the 1- and 3-day systems assimilate the same original observations only once. The following observations are converted to super-observations weighted by inverse error variance. Altimetric SLA is taken from the Radar Altimeter Database System (RADS) (Schrama et al., 2000) using tide, mean dynamic topography and inverse barometer corrections. SLA observations are limited to water depths greater than 200 m. SST retrievals from the NAVOCEANO (May et al., 1998) and WindSat (Gaiser et al., 2004) databases are used. All available in situ temperature and salinity observations on the Global Telecommunications System (GTS) are used. These include Argo profiles (Roemmich et al., 2009), conductivity–temperature–depth (CTD) and eXpendable BathyThermograph (XBT) profiles. The EnOI systems are run in a cycle scheme that centres the observation window as shown in Fig. 2. The amount of super-observations generated by the system from the original observations for the 1-day system is larger than the 3-day system. The total number of super-observations used in 2013 is shown in Table 1. In the data assimilation, a 250 km localization radius is used for all observation types. The mean sea level from OFAM3 (Oke et al., 2013a) is used for the model's mean dynamic topography to assimilate along track SLA observations.
Global forecast innovation errors for the 1-year behind-real-time period for 2013 are provided in Table 1. These are based on forward unassimilated observations, which can be regarded as independent. The 1-day cycle benefits statistically from a shorter forecast lead time. These errors suggest an improvement in performance in constraining SLA, SST, and subsurface temperature and salinity. In order to determine whether this result is only dependent on forecast lead time, a series of 44 parallel 7-day forecasts using identical base dates from 3 January 2013 are analysed. These forecasts are compared to unassimilated observations. The global 7-day mean forecast errors are shown in Table 2, and Table 3 repeats this for the Tasman Sea region. It is interesting to note that whilst mean absolute deviation (MAD) global forecast errors are marginally smaller in the 1-day system, mean forecast bias is more significantly reduced for SLA, SST and subsurface temperature. Figure 3 shows the global MAD forecast error growth as a function of lead time. Note that in order to ensure genuine forecasts are made, the model is propagated to the end of the respective observation window in both systems, which is the position of the star in Fig. 2. Daily mean forecast fields are saved, and these are compared to the observations. For day zero, statistics are included that represent the errors in the initial conditions, and the observation window partially overlaps half of this day in both systems, so the statistics for day zero cannot be regarded as independent. The results suggest the 1-day system is better overall as a forecast system with improvements in lead time of about 1 day in surface variables and up to 7 days in subsurface variables. The errors for salinity are relatively high for both systems as no restoring to salinity is used; however, the relative improvement is apparent.
The mean analysis increments for SST and SLA are shown in Fig. 4. Three key features emerge regarding this estimate of model bias in the mean increments. There is an equatorial eastern Pacific Ocean cold bias, a Southern Ocean high-latitude warm bias, and mesoscale warm and cold biases in the western boundary current and ACC regions. Without speculating on the source of these systematic model errors, it is noted that the first two aforementioned bias features have been detected in the CSIRO Climate Analysis Forecast Ensemble (CAFE) System, which is a configuration of the GFDL coupled model version 2.1 (CM2.1) run under an ensemble Kalman filter data assimilation framework. Figure 4 shows that mean increments are about one-third smaller in the 1-day than the 3-day system, which can be expected for approximate linear error growth. The spatial patterns are very similar, with the main difference being amplitude. Another way to compare increments over a period of time is to calculate the mean absolute increment (MAI) (Fig. 5). This is done in the following way. In each 3-day period the 1-day increments are summed and then the absolute values calculated. The mean of the absolute values over the 1-year period are then calculated. The MAI for the two systems is only directly comparable if the forecast error growth is linear. The difference in mean increments between the two systems suggests this; however, error growth in the two systems is largest on the first day (as seen in Fig. 3) and becomes mainly linear after this. Regardless, the differences in spatial distribution of MAI for SLA and SST, shown in Fig. 6, indicate the 1-day system has a generally larger MAI. It can bee seen there is a greater impact from the observing system. The 1-day system projects more information from observed variables into unobserved variables through the background error covariances due to the relatively smaller observation coverage per analysis. For instance, in situ observations from the Tropical Atmosphere Ocean–Triangle Trans-Ocean Buoy Network (TAO-TRITON) moored array in the equatorial Pacific Ocean produce a larger MAI on SLA and SST in the 1-day system. It is also evident that the 1-day system has a larger MAI in the western boundary currents and ACC. Figure 6 shows, as expected, that SST projects more into SLA in the 1-day system. SST observations in the 1-day system appear to be having a greater impact in the regions of fastest growing dynamical instabilities. Interestingly, the relatively smaller MAI for SST in the 1-day system in the Inter-Tropical Convergence Zone (ITCZ), in the tropical warm pool in the western Pacific Ocean and at high latitudes in the Southern Ocean indicates the observing system is having less impact in these areas in the 1-day system.
Data assimilation typically injects energy into a forecast model as the observed fronts can be sharper than what can be supported by the model. In each cycle we usually see a jump in total kinetic energy with subsequent diminishing until the end of the forecast. This can be caused by factors such as insufficient horizontal and vertical resolution and imbalances in the analysis. Figure 7 shows total kinetic energy at 6-hourly temporal resolution. Here it can be seen that data assimilation in the 1-day system renders the state at a higher kinetic energy level, with smaller-amplitude temporal fluctuations between cycles. The latter reflects the smaller increments per cycle in the 1-day system; however, the larger kinetic energy state indicates that more energy is retained in the mesoscale eddies, which indicates that the gradients in SLA are maintained closer to observations. The larger MAI for SLA and SST in the 1-day system in the western boundary current and ACC regions reflects that observations are having a larger impact in these regions. The total kinetic energy dissipation for both systems in 2013 was calculated by summing the dissipation within each cycle and removing the trend. The 1-day system total kinetic energy dissipation is 8.4×1018 J, and that for the 3-day system is 9.6×1018 J. The relative total kinetic energy dissipation, estimated by subtracting the mean dissipation from the respective systems, shows that the 1-day system has approximately 17 % less relative kinetic energy dissipation than the 3-day system, suggesting it is more effective at preserving SLA gradients and may be more dynamically balanced.
Global errors from a set of 44 parallel 7-day forecasts over a 1-year period in 2013 showed the 1-day cycle system delivered improvements in predicting sea surface temperature, sea-level anomaly, subsurface temperature and salinity. The difference in mean absolute increments between the two cycle length systems indicated that the same observations had a greater impact on the 1-day system, with a larger degree of observed variables projecting onto unobserved variables. Greater observation impact does not necessarily lead to an improved forecast system as overfitting observations can produce dynamical imbalances, which can have deleterious effects on forecasts. The results, however, indicate that the 1-day cycle takes greater advantage of the observations and, compared to the 3-day cycle, is less biased in initial conditions and forecasts. This also suggests that the background error covariances are a reasonable estimate of model error. With the shorter cycle length, data assimilation introduces a larger amount of kinetic energy from the observations into the state, bringing the model closer to a realistic representation of the ocean's kinetic energy. The 1-day cycle introduced a larger amount of information from the observations into the model with more frequent smaller adjustments at finer scales. The overall improvement in predictability, particularly in the subsurface, suggests greater retention of memory from observations and improved balance in the model. It is noted that, whilst an overall improvement in global performance was detected, in some regions the 1-day scheme may not perform better than the 3-day system. The results are a practical example of the influence of cycle length in global mesoscale ocean forecasting with the current observation network. The 1-day cycle is closer to asynchronous data assimilation and appears to be an improvement over the first-guess appropriate time (FGAT) approach (Cummings, 2005; Lee, 2005; Atlas et al., 2011) as our FGAT experiments did not yield as significant an improvement.
The ocean model is available at https://github.com/mom-ocean/MOM4p1 (Griffies et al., 2009), and the data assimilation code can be found at https://github.com/sakov/enkf-c (Sakov, 2015). The OceanMAPS3 system and observation processing scripts are the intellectual property of the Bureau of Meteorology.
The author declares that he has no conflict of interest.
This work was carried out within the Bluelink Project with financial support
from the Australian Bureau of Meteorology, the Commonwealth Scientific and
Industrial Research Organisation, and the Royal Australian Navy. Numerical
simulations were undertaken using the Raijin supercomputer at the National
Edited by: Paul Halloran
Reviewed by: two anonymous referees
Atlas, R., Hoffman, R. N., Ardizzone, J., Leidner, S. M., Jusem, J. C., Smith, D. K., and Gombos, D.: A cross-calibrated, multiplatform ocean surface wind velocity product for meteorological and oceanographic applications, B. Am. Meteorol. Soc., 92, Supplement, https://doi.org/10.1175/2010BAMS2946.2, 2011. a
Bertino, L., Lisæter, K., and Scient, S.: The TOPAZ monitoring and prediction system for the Atlantic and Arctic Oceans, J. Oper. Oceanogr., 1, 15–18, 2008. a
Brassington, G. B.: Multicycle ensemble forecasting of sea surface temperature, Geophys. Res. Lett., 40, 6191–6195, 2013. a
Brassington, G. B., Pugh, T., Spillman, C., Schulz, E., Beggs, H., Schiller, A., and Oke, P. R.: BLUElink> Development of Operational Oceanography and Servicing in Australia, J. Res. Pract. Inf. Tech., 39, 151–164, 2007. a
Chassignet, E. P., Hurlburt, H. E., Metzger, E. J., Smedstad, O. M., Cummings, J. A., Halliwell, G. R., Bleck, R., Baraille, R., Wallcraft, A. J., Lozano, C., et al.: US GODAE: global ocean prediction with the HYbrid Coordinate Ocean Model (HYCOM), Tech. rep., DTIC Document, 2009. a
Cummings, J. A.: Operational multivariate ocean data assimilation, Q. J. Roy. Meteor. Soc., 131, 3583–3604, 2005. a
Cummings, J. A. and Smedstad, O. M.: Ocean Data Impacts in Global HYCOM, J. Atmos. Ocean. Tech., 31, 1771–1791, 2014. a
Dee, D. P.: Bias and data assimilation, Q. J. Roy. Meteor. Soc., 131, 3323–3343, 2005. a
Dee, D. P. and Da Silva, A. M.: Data assimilation in the presence of forecast bias, Q. J. Roy. Meteor. Soc., 124, 269–295, 1998. a
Evensen, G.: The ensemble Kalman filter: Theoretical formulation and practical implementation, Ocean Dynam., 53, 343–367, 2003. a
Ferry, N., Parent, L., Garric, G., Barnier, B., and Jourdain, N. C.: Mercator global Eddy permitting ocean reanalysis GLORYS1V1: description and results, Mercator-Ocean Quarterly Newsletter, 36, 15–27, 2010. a
Gaiser, P. W., St Germain, K. M., Twarog, E. M., Poe, G. A., Purdy, W., Richardson, D., Grossman, W., Jones, W. L., Spencer, D., Golba, G., Cleveland, J., Choy, L., Bevilacqua, R. M., and Chang, P. S.: The WindSat spaceborne polarimetric microwave radiometer: Sensor description and early orbit performance, IEEE T. Geosci. Remote, 42, 2347–2361, 2004. a
Griffies, S. M. and Halberg, R. W.: Biharmonic Friction with a Smagorinsky-Like Viscosity for Use in Large-Scale Eddy-Permitting Ocean Models, Mon. Weather Rev., 128, 2935–2946, 2000. a
Ha, S.-Y. and Snyder, C.: Influence of surface observations in mesoscale data assimilation using an ensemble Kalman filter, Mon. Weather Rev., 142, 1489–1508, 2014. a
Houtekamer, P. and Zhang, F.: Review of the ensemble Kalman filter for atmospheric data assimilation, Mon. Weather Rev., 144, 4489–4532, 2016. a
Houtekamer, P. L. and Mitchell, H. L.: Ensemble kalman filtering, Q. J. Roy. Meteor. Soc., 131, 3269–3289, 2005. a
Lee, H.-C., Rosati, A., and Spelman, M. J.: Barotropic tidal mixing effects in a coupled climate model: Oceanic conditions in the Northern Atlantic, Ocean Model., 11, 464–477, https://doi.org/10.1016/j.ocemod.2005.03.003, 2006. a
Lee, M.-S.: Preliminary tests of first guess at appropriate time (FGAT) with WRF 3DVAR and WRF model, Asia-Pac. J. Atmos. Sci., 41, 495–505, 2005. a
Martin, M., Hines, A., and Bell, M.: Data assimilation in the FOAM operational short-range ocean forecasting system: A description of the scheme and its impact, Q. J. Roy. Meteor. Soc., 133, 981–995, 2007. a
May, D. A., Parmeter, M. M., Olszewski, D. S., and McKenzie, B. D.: Operational processing of satellite sea surface temperature retrievals at the Naval Oceanographic Office, B. Am. Meteorol. Soc., 79, 397–407, 1998. a
O'Kane, T. J., Oke, P. R., and Sandery, P. A.: Predicting the East Australian Current, Ocean Model., 38, 251–266, 2011. a
Oke, P. R., Griffin, D. A., Schiller, A., Matear, R. J., Fiedler, R., Mansbridge, J., Lenton, A., Cahill, M., Chamberlain, M. A., and Ridgway, K.: Evaluation of a near-global eddy-resolving ocean model, Geosci. Model Dev., 6, 591–615, https://doi.org/10.5194/gmd-6-591-2013, 2013a. a, b
Oke, P. R., Sakov, P., Cahill, M. L., Dunn, J. R., Fiedler, R., Griffin, D. A., Mansbridge, J. V., Ridgway, K. R., and Schiller, A.: Towards a dynamically balanced eddy-resolving ocean reanalysis: BRAN3, Ocean Model., 67, 52–70, 2013b. a
Ourmières, Y., Brankart, J.-M., Berline, L., Brasseur, P., and Verron, J.: Incremental analysis update implementation into a sequential ocean data assimilation system, J. Atmos. Ocean. Tech., 23, 1729–1744, 2006. a
Roemmich, D., Johnson, G. C., Riser, S., Davis, R., Gilson, J., Owens, W. B., Garzoli, S. L., Schmid, C., and Ignaszewski, M.: The Argo Program: Observing the global ocean with profiling floats, Oceanography, 22, https://doi.org/10.5670/oceanog.2009.36, 2009. a
Sakov, P. and Sandery, P.: Comparison of EnKF and EnOI regional ocean reanalysis systems, Ocean Model., 89, 45-60, 2015. a
Sandery, P. A. and Sakov, P.: Ocean forecasting of mesoscale features can deteriorate by increasing model resolution towards the submesoscale, Nat. Commun., 8, 1566, https://doi.org/10.1038/s41467-017-01595-0, 2017. a
Schrama, E. J., Scharroo, R., and Naeije, M.: Radar Altimeter Database System (RADS): Towards a generic multi-satellite altimeter database system, Netherlands Remote Sensing Board (BCRS), Programme Bureau, Rijkswaterstaat Survey Department, 2000. a
Smith, W. H. F. and Sandwell, D. T.: Global Sea Floor Topography from Satellite Altimetry and Ship Depth Soundings, Science, 277, 1956–1962, 1997. a
Takacs, L. L., Suárez, M. J., and Todling, R.: Maintaining atmospheric mass and water balance in reanalyses, Q. J. Roy. Meteor. Soc., 142, 1565–1573, 2016. a
Zhang, B., Tallapragada, V., Weng, F., Sippel, J., and Ma, Z.: Estimation and correction of model bias in the NASA/GMAO GEOS5 data assimilation system: Sequential implementation, Adv. Atmos. Sci., 33, 659–672, 2016. a