CranSLIK v1.0: stochastic prediction of oil spill transport and fate using approximation methods

. This paper investigates the development of a model, called CranSLIK, to predict the transport and transformations of a point mass oil spill via a stochastic approach. Initially the various effects on destination are considered and key parameters are chosen which are expected to dominate the displacement. The variables considered are: wind velocity, surface water velocity, spill size, and spill age. For a point mass oil spill, it is found that the centre of mass can be determined by the wind and current data only, and the spill size and age can then be used to reconstruct the surface of the spill. These variables are sampled and simulations are performed using an open-source Lagrangian approach-based code, MEDSLIK II. Regression modelling is applied to create two sets of polynomials: one for the centre of mass, and one for the spill size. Simulations performed for a real oil spill case show that a minimum of approximately 80 % of the oil is captured by CranSLIK. Finally, Monte Carlo simulation is implemented to allow for consideration of the most likely destination for the oil spill, when the distributions for the oceanographic


Introduction
Whilst the frequency of spills occurring has dropped significantly in the last few decades (Etkin, 2001), it does not diminish the inevitability of an oil spill occurring.Oil spills can cause large-scale destruction of the environment, they have significant economical effects, and can result in human life loss.They are inevitably the cause of environmental, economic and human disaster.The Deepwater Horizon spill, for example, has been analysed extensively by Graham et al. (2011), members of the US National Commission on the BP Deepwater Horizon Oil Spill and Offshore Drilling.There is therefore much interest in being able to accurately predict the destination, transport and transformation of an oil spill to minimise the resultant cost, both financial and environmental.
There are many complex phenomena affecting an oil spill, creating an advection-diffusion-transformation process.These consist of a large number of effects: the advection due to currents, wind and waves, the diffusion due to the turbulence and the transformation processes, such as evaporation, natural dispersion, spreading etc., which need to be considered for accurate fate and transport prediction.A schematic illustration of these effects can be seen in Fig. 1 ( MEDESS-4MS, 2013;ITOPF, 2013).Also, as the spill ages, different effects become more important -a speculative mass balance can be seen in Fig. 2 (Mackay and McAuliffe, 1989).There are numerous equations created to model these effects, based on both analytical and empirical approaches.However, the complexity of the underlying physics is not yet fully understood.Reed et al. (1999) provide a very good summary of early models.Since then significant progress has been made in acquiring a deeper understanding of the involved complex phenomena -for example biodegradation is studied by Mc-Genity et al. (2012).
Difficulty also arises as the result of uncertainty since exact quantities are not necessarily known beforehand due to the stochastic nature of certain variables, for example the sea surface velocity.The computational cost involved in running multiple cases, or Monte Carlo simulation, to consider the possible conditions is often far too great to be a viable approach.This becomes a severe impediment in cases of real accidents where a quick, or even real-time, prediction becomes necessary.

MEDSLIK II
Many models have been developed and used to predict the transport and transformation of an oil spill.These are either commercial, such as Li et al. (2013), or open-source, such as De Dominicis et al. (2013a).Regardless of the software tools employed, these models are not without their limitations.Often the computational cost involved in running a full simulation is too high.Alternatively, in order to be able to have a prediction in near real time, the model has to be simplified extensively, in terms of its physics, and therefore the simulation results are not of high accuracy.
One such code is MEDSLIK II.This solves the advectiondiffusion processes using a Lagrangian particle formalism, meaning that the oil slick is broken into a number of constituent particles, while the transformation processes act on the entire oil slick surface.It has been shown to provide accurate results in a number of real scenarios (De Dominicis et al., 2013a;Coppini et al., 2011).Results are produced reasonably quickly which is desirable since many simulations are necessary to apply the regression model.
There are four main inputs required: oil spill data, wind field, sea surface temperature, and structure of sea currents.The frequency of the oceanographic data is an important factor since these can change dramatically in a relatively short period of time.MEDSLIK II applies a linear interpolation in time between two subsequent current and wind fields to calculate the current and wind at the model time step.
The test case included with the program is for an oil spill in Algeria.This consisted of 680 tonnes of crude oil being spilled and validation was carried out to check the accuracy of the prediction over a 36 h period.The accuracy was found to be in good agreement with the observed results (De Dominicis et al., 2013a).This model has also been validated for the Lebanon crisis where the predicted oil slick at sea and coastal deposits were in agreement with observations (Coppini et al., 2011).
Additional details regarding the development and validation of MEDSLIK II can be seen in De Dominicis et al. (2013a, b).

Aims
This paper investigates the use of stochastic methods to map the response from different input variables to create a robust This provides an estimation of the destination and spread of an oil spill subject to uncertain oceanographic conditions.Also the minimal computational time required for the developed model allows for Monte Carlo simulation using nondeterministic values for current and wind velocities.This can then be used to calculate a region such that there is a high probability that said region will contain the oil spill.This aids significantly in reducing the resultant financial and environmental cost of oil spills, predicting their likely development.
Wind and current velocities are both continuous variables, and as such, it is impossible to investigate all possible values.Therefore, it is necessary to sample these variables.This involves creating a discrete set of values which is representative of the continuous variable.The sampled values then create the set of necessary simulations called a design hypercube (Myers et al., 2009).
The key steps in developing our methodology can be outlined as follows: 1. identify the key parameters and their relative distributions necessary for short-term oil spill prediction; 2. sample the considered parameters to create a design hypercube; 3. generate simulation data using the design hypercube; 4. fit regression models to map the inputs to the response; 5. use the aforementioned regression model to create a prediction code; 6. test the developed code against a real scenario and analyse the results.
In order to generate simulation data, we have used the MED-SLIK II model.This choice was based on a number of reasons, but predominantly due to its robustness and because it has been validated on multiple real spills as discussed in De Dominicis et al. (2013a, b).

Uncertainties and stochastic modelling
Another complexity in modelling arises from the uncertainty involved in prediction of oceanographic conditions and spill parameters.Many parameters, which are known to have an important role in the destination of an oil spill, are stochastic in nature and therefore difficult to accurately predict.
Wind forcing, i.e. the wind velocity components at 10 m above the sea surface, is provided by meteorological models, while currents and temperature are provided by oceanographic models.The atmospheric forcing is provided by the European Centre for Medium-Range Weather Forecasts (ECMWF), with 0.25 • space, and 6-hour temporal resolution.The current velocities used in this work come from the Mediterranean Forecasting System (MFS) described in Pinardi et al. (2003) and Pinardi and Coppini (2010).The MFS system is composed of an Ocean General Circulation Model (OGCM) at 6.5 km horizontal resolution and 72 vertical levels (Tonani et al., 2008;Oddo et al., 2009).Every day MFS produces forecasts of temperature, salinity, intensity and direction of currents for the next 10 days.Once a week, an assimilation scheme, as described in Dobricic and Pinardi (2008), corrects the model's initial guess with all the available in situ and satellite observations, producing analyses that are initial conditions for 10-day ocean current forecasts.The modelled currents and wind fields can be affected by uncertainties that arise from model initial conditions, boundaries, forcing fields, parameterisations, etc.In this paper the hourly mean analyses have been used to eliminate the additional uncertainty connected with forecasts for both atmospheric and oceanographic input data.
Whilst many of these parameters may be measurable at the initial time, prediction of the oil spill destination requires reasonable estimation of the conditions over the simulation period.There are numerous methods for circumventing this problem; usually the stochastic parameters are extrapolated from previous values -however, this can frequently cause gross errors.This hinders the accuracy of real time prediction.
In this problem, it is necessary to apply sampling to ensure that the considered points are representative of the domain.This problem cannot be approached deterministically due to the continuous nature of the parameters making the consideration of every possible quantity unfeasible.There are numerous methods of sampling available.Monte Carlo simulation is the simplest.However, the associated high computational cost is a constraint in the context of the model development.Another alternative could be importance sampling, which adopts a Monte Carlo style simulation, but biases the output to favour areas of greater interest, for example the tails of the distribution.This, however, is also inappropriate since the entire distribution is of interest, and it is still relatively expensive.Instead, a Latin hypercube (LHC) method will be used, where the distribution is separated into blocks of equal probability and then a random value is chosen from each block.This has the advantage of requiring a smaller amount of necessary simulations to create a good design and hence is relatively inexpensive.The main disadvantage is that it does not necessarily guarantee a well-stratified design (Myers et al., 2009).
A simple third-order polynomial regression model is used to map the responses.It was found that lower-order models are too sensitive to the fluctuating component present in the simulation data.This is the same reason which prevents the use of radius basis functions in place of a polynomial.
It is also possible that the input variables will possess cross-correlation.Therefore mixed variable terms, i.e. x 1 x 2 , have to be included in the model.
As previously stated, the underlying physics of an oil spill is very complex.Existing solvers require resolving many of the underlying phenomena.To perform direct simulations of all possible conditions would be far too computationally expensive.For example, MEDSLIK II requires several minutes per run; 1000 runs using different input parameters would therefore require many hours.Our approach avoids this problem by creating a polynomial which maps inputs to a response resulting in 1000 runs being possible in approximately 1 second.This allows for consideration of likely destinations of the oil spill using non-deterministic inputs.Note that the phenomena which can be accounted for in CranSLIK are limited to the phenomena modelled by MedSLIK II.The paper uses a non-intrusive method, whereby the regression model is developed using the results from the solver, and does not require being programmed into the solver itself.There are numerous benefits of this approach.Primarily it is performed to simplify the problem.However, it also means that the developed methodology can easily be applied to data from any source.

Choice of variables
Three variables have been chosen: wind velocity, current velocity and spill size.
It is necessary to express each variable as a distribution.Spill size will simply be assumed uniform and various sizes tested.However, velocities need to be separated into two components: speed and angle.
The angle can then be simplified by treating the current angle as an axis and only looking at the wind angle with respect to this axis.Also, symmetry can be applied, meaning that it is only necessary to consider angles between 0 and π, since other angles are a reflection in the current axis.A uniform distribution can then be assumed for this variable.
The distribution for wind speed is widely accepted to be reasonably well represented by a Weibull distribution with shape and scale parameters 2.26 and 9.02 respectively (de Prada Gil et al., 2012).Morgan et al. (2011) suggest that a log-normal distribution is better for extreme wind speeds.We are not looking at extreme speeds though, so the Weibull distribution is sufficient.It is somewhat more complicated to find a distribution for the current speed as this varies over the globe.Since the pattern is almost entirely that of wind-driven circulation, it is likely the same underlying distribution with varying coefficients based on location.Here, the current velocity for the test case has been analysed and a Weibull distribution superimposed, leading to the coefficients 1.9967 and 0.2132 for shape and scale respectively.The maximum velocity is limited by the highest sampled value.Performing the prediction for a value outside the sampled range is not recommended due to extrapolation errors.Therefore, if one wished to consider a value outside of the sampled range, additional simulations would have to be performed.

Sampling the variables
To develop the model, it is necessary to sample the chosen variables.In statistics and quantitative research methodology, a data sample is a set of data collected and/or selected from a statistical population by a defined procedure.This has been done using the LHC technique, which involves splitting the distribution into blocks of equal probability, then a random value is chosen from each block.A brief experiment was conducted and it was determined that a minimum of six samples are required to capture a reasonably complex shape, the Weibull distribution.Note that it is not possible to predict the shape of the resultant graph beforehand.However, it is expected to be more simple than the test shape.A zero point has also been considered for investigation of simulation noise generated by MEDSLIK II.The variables have also been decoupled by consideration of a point mass oil spill subject to oceanographic conditions.The result was that the destination can be determined by the current and wind velocities.It was also found that the size of the spill depends on the initial spill size as well as the spill age, that is time since initial spill.The sampled values for wind and current velocities, and the angles, can be seen in Table 1.Note that to simplify the required number of simulations, the developed model will displace the spill depending on the angle between the current and wind velocities, with the current velocity treated as an axis, and then translated to meaningful coordinates.Data have been generated from the stated input values for a simulation time of 36 h.

Response mapping
In order to map the responses, a third-order polynomial approximation was calculated using the method of least squares.It has been found that the zero-point fluctuations from the random walk procedure appear to skew the results disproportionally with lower-order models.For the spill size, a slightly different approach was taken, where the developed equation comes from r = g(θ), i.e. a radial function is developed, as opposed to a Cartesian.This assists in ensuring a periodic, or near-periodic model.Note that both a polynomial and sinusoidal functions were investigated and the polynomial appears to produce less skewed results in the central region and hence the polynomial function was chosen.But as outer rings are of greater interest, either choice could be acceptable.
Seven values have been sampled for the wind and current magnitudes, however only five angles have been considered.This is because the angle refers to the angle between the wind and current velocities, and since the current velocity is used as an axis, symmetry can be applied to reduce the number of necessary values in this parameter.Therefore five values over a semi-circle are chosen, which corresponds to eight values over a circle.

Developed methodology
Once the polynomials have been created, it is necessary to outline the developed methodology for prediction of an oil spill, using the calculated coefficients.The flow chart of the developed model can be seen in Fig. 3.The actual code has been written in the commercial software package MATLAB ® (2011) and the statistics toolbox is used for the Monte Carlo simulation to generate the random numbers.
An overview of the methodology is as follows: -Interpretation of oceanographic data.The key parameters in the prediction are the wind and current velocities.MEDSLIK II produces column-structured data for these from the raw NetCDF files.The prediction code is capable of reading these and converting them to block structure and converting from latitude and longitude to metres.A modified version of this code has also been written for the purposes of Monte Carlo simulation, where the user inputs a desired wind and current velocity directly.
-Centre of mass.To investigate the behaviour of the centre of mass, the wind, current and angle variables have been considered.A polynomial has been developed which links these variables to predict displacement in the x and y planes.The oceanic data are interpolated to find the parameters at the spill location and these are fed into the regression model to predict a new centre.
Since the current direction is treated as an axis, the displacement with respect to this is first calculated, and then translated into more meaningful coordinates.
-Reconstruction of surface.Now the centre of mass has been predicted, the surface reconstruction of the oil spill can be considered.Since this is not linked to the destination of the centre of mass, the rings are created around the origin and then displaced by the calculated displacement of the centre of mass.If desired, a contour can then be fitted according to these concentration rings.These are fourth-order polynomials and require the initial spill size (tonnes) and the spill age (hours).
-Set values for next iteration.For the next time step, it is necessary to set the new centre of mass for the oil spill.At this stage, the centre of mass can be corrected based on observation to produce more accurate results.

Case study
In order to validate CranSLIK, it is necessary to investigate its performance when applied to oceanographic conditions.and that over 99.5 % of the oil was captured for each case after 1 h of simulation.It was also found that the prediction becomes less accurate for extended periods.The wind and current velocities were both found to produce near-linear displacement with respect to time, when considered individually.The developed model works by hourly prediction which causes cumulative errors in extended simulation.Hindcast modelling, updating the centre of mass every hour, is therefore recommended to minimise error.The spill size prediction remains very accurate, above 99 %, over a 36 h period suggesting that hindcast modelling is not required to be applied to this part of the code.

Algeria test case
The case considered uses the oceanographic data for the Algeria spill on 6 August 2008 and a point mass oil spill is released from latitude 38.240 • and longitude 5.981 • .Current velocities were updated every hour and wind velocities every 6 h.It is found that the proportion of oil captured becomes poor when a full 36 h prediction is performed -the accuracy rapidly drops after the 4 h mark as shown in Fig. 4.However, under the application of hindcast modelling, where the centre of mass is updated every hour based on model data, the minimum accuracy is greatly improved.This is likely due to cumulative errors during prediction.These errors could be present in the developed model.However, since the model has been verified, it may be an error due to the oceanographic data.The model assumes that the oceanographic conditions at the start of the simulation period are representative of the conditions over the period.This however is not necessarily true and therefore the prediction is less accurate when these conditions change greatly over the simulation period.It is possible in this case to apply an interpolation since the quantities for the next time step are known.However, this would not be possible in a real scenario.Figure 5 shows the displacement error of the centre of mass, when this value is updated at different intervals.It is clear that the error is far smaller when the simulation is only predicting for an hour and then updating.
With regard to the spill size only, the accuracy appears to be very good, as seen in Fig. 6.Compared to the accuracy of the centre of mass prediction, this appears to be far more accurate, suggesting that the weakest component of this model is the centre of mass prediction; however, the overall accuracy appears to be reasonably good -a minimum of 80 % when hourly prediction is used as seen in Fig. 4.This also justifies the decoupling of variables.
The supplementary animation shows the predicted oil spill (black rings) and the MEDSLIK II result (background contour) for a 36 h simulation period for this test case.The centre of mass for the prediction is updated every hour.The lowest proportion of oil captured is approximately 80 % with the average being about 91 %.

Sensitivity analysis
It is also of interest to consider the sensitivity of CranSLIK with respect to the different input parameters.This is summarised in Table 2.
The most sensitive variables appear to be the current magnitude and angle.This is expected since the displacement due to current velocity is far greater than that due to the wind velocity, and since the majority of the oil is contained close to the centre, the dispersed elements do not skew the results significantly and hence there is some leeway with the spill size.This was expected since the current is more displacing than the wind, and it was concluded the wind is less important when the sensitivity of variables was investigated in MEDS-LIK II (De Dominicis et al., 2013a).

Monte Carlo simulation
CranSLIK assumes that the wind and current data at the start of the hour are representative of the full hour.This, however, is not necessarily true since oceanographic conditions may change.Therefore, more accurate prediction may be possible if an interpolation is applied to the data and expected fields are created.However, this is not relevant for prediction of real-time oil spills.In such scenarios it may be of interest to generate an expected region for the oil spill.
Due to the incredibly low computational cost required by CranSLIK, a Monte Carlo simulation can be performed in a very low time frame, approximately 1000 simulations per second on an AMD Phenom II X4 3.6 GHz processor.This can be used to generate an expected region for the oil spill and aid in clean-up and recovery operations.
For the Algeria test case, the Monte Carlo simulation was performed using input distributions developed from the available data.However, due to the 60 h period of data, there exists a bimodal peak in the simulation results representative of the alternating current forcing as shown in Fig. 7.This becomes clearer when the simulation is performed using 10 000 and 100 000 samples, as shown in Figs. 8 and 9 respectively.This result is not too helpful because of the bimodal peak.However, it does demonstrate the versatility and robustness of CranSLIK.If the distribution for a location is known, more meaningful results can be produced.

Discussion
Whilst CranSLIK appears to perform well for the tested scenarios, it is necessary to identify the assumptions made while modelling.Firstly, the displacement of the centre of mass is correlated to the wind and current velocities only, while the spill area is determined by the quantity of oil spilled and the age.Although these variables are considered dominant, in a fully robust model further simulations considering different variables should be performed.This would lead to an even more accurate prediction.However, it would require more complicated approximations to account for these variables and their correlations.Additional variables could be included to account for more complicated flow physics such as nonradial oil spill expansion.Secondly, rather than the MEDS-LIK II which was employed, other oil spill prediction codes and softwares may be used and compared to identify their performance in aspects of accuracy and computational effort and at the same time highlight efficiency of the proposed non-intrusive methodology.Finally, only one particular type of oil spill has been considered: point-mass.Since the developed model moves a centre of mass, and then reconstructs the surface, it is possible to mark several centres of masses and predict their destinations.The problem then becomes surface reconstruction which would require additional simulations.As with any stochastic problem, additional simulations could lead to a better regression fit and hence better prediction.

Conclusions
This paper describes the development of CranSLIK, a model for the prediction of the destination and spread of an oil spill via a stochastic approach.The key parameters were identified as wind velocity, current velocity, spill size and time, and a design square was created for the required samples.The simulations were then performed using MEDSLIK II and regression modelling was applied to create two equations: one to predict the centre of mass, and one to predict the spill size.The developed code has been presented and discussed.It was then validated against a real test case.Finally, the efficiency of the model is exploited using Monte Carlo simulation for the purposes of generating maximum likelihood regions.This has limited use when applied to the Algeria test case due to insufficient current and wind velocity data to more accurately fit a distribution.Note that CranSLIK is limited to the same physical phenomena which are modelled by MEDSLIK II.
The developed model appears to perform well when applied to the Algeria test case considered, with a minimum of 80 % of the oil captured when using hourly prediction.The major strength of the developed model is the efficiency and the minimal time required to perform Monte Carlo simulation and generate maximum likelihood regions.However, for this to provide useful results, it is necessary to have   a distribution or a reasonable estimate of expected oceanographic conditions.This paper serves as a demonstration of an alternative method for fast prediction of the advectiondiffusion-transformation of an oil spill.The assumptions have been discussed and areas for further work highlighted.
Whilst the key variables were considered, it has been identified that consideration of additional variables could result in improved accuracy.

Figure 3 .
Figure 3. Flowchart of the developed model.

Figure 4 .
Figure 4. Proportion of oil captured using different update frequencies.

Figure 5 .
Figure 5. Centre of mass prediction using different update frequencies.

Figure 6 .
Figure 6.Proportion of oil captured by the spill size prediction only, for the Algeria scenario.

Figure 7 .
Figure 7. Monte Carlo simulation of the Algeria test case contoured by cell frequency, 1000 iterations, 1 h simulation.

Figure 8 .
Figure 8. Monte Carlo simulation of the Algeria test case contoured by cell frequency, 10 000 iterations, 1 h simulation.

Figure 9 .
Figure 9. Monte Carlo simulation of the Algeria test case contoured by cell frequency, 100 000 iterations, 1 h simulation.

Table 1 .
Sampled values for centre of mass prediction.

Table 2 .
Sensitivity of variables for the first hour of the Algeria scenario.Upper and lower ranges for 90 % accuracy are given.