Ocean Plastic Assimilator v 0 . 1 : : . 2 : : Assimilation of Plastics Concentration Data Into Lagrangian Dispersion Models

A numerical scheme to perform data assimilation of concentration measurements in Lagrangian models is presented, along with its first implementation called Ocean Plastic Assimilator, which aims at improving predictions of plastics distributions over the oceans. This scheme uses an ensemble method over a set of particle dispersion simulations. At each step, concentration observations are assimilated across the ensemble members by switching back and forth between Eulerian and Lagrangian representations. We design two experiments to assess the scheme efficacy and efficiency when assimilating simu5 lated data in a simple double gyre model. Analysis convergence is observed with higher accuracy when lowering observation variance or using a more suitable circulation model ::::::::: circulation ::::: model ::::: closer :: to ::: the :::: real ::::::::: circulation. Results show that the distribution of plastic :::::: plastics : mass in an area can effectively be approached :::::::: improved with this simple assimilation scheme. ::::: Direct ::::::::: application :: to : a :::: real ::::: ocean :::::::: dispersion :::::: model :: of ::: the ::::: Great ::::: Pacific :::::::: Garbage :::: Patch :: is :::::::: presented :::: with :::::::: simulated ::::::::::: observations, :::::: which :::: gives :::::::: similarly :::::::::: encouraging :::::: results. : Thus, this method is considered a suitable candidate for creating a tool to assimilate plastic 10 :::::: plastics : concentration observations in real-world applications to forecast plastic :::::: estimate :::: and ::::::: forecast :::::: plastics : distributions in the oceans. Finally, several improvements that could further enhance the method efficiency are identified.

While the Lebreton et al. (2012) modeling framework has already produced valuable results, it is not able to assimilate observations and update forecasts accordingly yet. However, as the company prepares to release a number of systems to clean the Ocean, it will soon dispose of numerous sources of data collecting devices measuring plastics concentration in the oceans. Therefore, we believe it is timely to develop a method to assimilate incoming real-time observations. Methods to assimilate plastics concentration observations over a Lagrangian dispersion model are in the early development 30 stage (Lermusiaux et al. (2019)). However, earlier studies dealing with data assimilation applied to the atmospheric dispersion of particles around polluting facilities, such as Zheng et al. (2007), have been published. This paper introduces Ocean Plastic Assimilator v0.1 : .2, a numerical scheme developed to assimilate plastic :::::: plastics concentration data into 2D Lagrangian dispersion models. Section 2 formulates the method and section 3 then describes its initial implementation and application. For this proof of concept ::::::::::::: proof-of-concept : paper, we use a dispersion simulation generated 35 with the OpenDrift framework in a controlled environment based on a double gyre analytical flow field. The assimilation results are presented in section 4. Real-world application perspectives and future developments that could further improve the method are discussed in section 5. :::::: Finally, :: in :::::: section :: 6, ::: we :::::: present : a ::::: direct :::::::::: application :: of ::: the :::::: method :: to : a ::::::::: dispersion :::::: model :: of ::: the ::::: actual -A dispersion model along with the flow field used.
-Projections methods to go back and forth between Eulerian and Lagrangian representations.
-Prior estimates for model parameters and uncertainties.

60
This section presents our procedure on a set of N p particles drifting in a gridded domain. We divide this space , : with a grid of size (m, n), and use indices i, j to designate a grid cell. An Ensemble Kalman Filter works by running different simulations, or ensemble members, simultaneously with variations in model parameters (e.g., initial conditions). We use N e members in the following.

Projecting weights on densities 65
At each step t, we define: w f t the forecast weights vector, of size N p , in kg.
x fw a t the analysis weights vector computed by projecting on w f t the corrections computed on x a t , in kg.
-∆ i,j,t the set of particles present at step t in grid cell i, j.
To start, for grid cell i, j ::: with :::: area :::: A i,j , x f t is computed with the formula: p∈∆i,j,t (w f t ) p A i,j ::::::::::::: (1) 75 In the following, we omit sub-index t when all the operations are performed at the same time step t.
2.2.2 Assimilating with the :::::::: Ensemble ::::::: Kalman :::::: Filter : (EnKF : ) Our assimilation step relies on the use of Ensemble Kalman Filtering, as described in Evensen (2003). This method is derived from Kalman Filtering and notably suitable to situations in which the model is not an easily invertible matrix (used in Standard ::::::: standard Kalman Filtering), and one cannot efficiently compute an adjoint (used in Extended Kalman Filtering).

80
Standard Kalman Filtering allows computing the analysis state using a single equation. In standard Kalman Filtering, the forecast state vector x f (in this case, the densities) and the analysis vector x a are linked with: H is the observation matrix that maps the state x f to the observation space of y.
The Kalman gain matrix K is defined by the following equation: R is the observation error covariance matrix. P f is the forecast error covariance matrix. When using a Kalman Filter, P f is in principle meant to be computed from the previous state by application of the forward integration matrix operator, but this is generally too computationally expensive and impractical. Here, we use Ensemble Kalman Filtering, where the P f matrix computation is approximated by relying on an ensemble of simulations.

90
Ensemble members are different instances of our simulation with different initializations. For ensemble member k ∈ [|1, N e |], we write x f k the forecast state vector, and x f the ensemble average Accordingly, the computation of P f can be accomplished using the formula: Each ensemble member k is then updated using equation 2 with x k instead of x.

Projecting the density updates on particles
Several ways of projecting the density updates (step 3 in figure 1) can be thought of. In this initial study ::: the :::::: Ocean :::::: Plastic ::::::::: Assimilator :::: v0.2, we simply choose to update the weights by uniformly distributing the density correction ratio of a grid cell i, j among the particles in the same box using this formula: In this equation, (x f ) i,j cannot be null when a grid cell i, j contains particles (see equation 1), except if all particles have null weights. While extremely unlikely (we did not encounter this phenomenon during our numerous tests), particles with exactly null weights have to be taken out of the simulation.
This heuristic was chosen primarily for its simplicity and its computational efficiency. The multiplicative approach also tends 105 to prevent computing negative weights if the density analysis is lower than the density forecast.
Finally, for step 4 in figure 1, since the dispersion model changes particles positions but not their weights when integrating, the forecast weights at time t + 1 are:

110
As stated by Evensen (2003) the Ensemble Kalman Filter requires the initial ensemble to sample the uncertainty in variables that we want to update with data assimilation. In this article, we focus on our method's ability to compute the correct total mass of particles drifting. For this reason, we normally distribute the members' initial total masses with a mean µ e and standard deviation σ e . If we write M k the initial total mass for ensemble member k, we thus have:
This section presents the Python implementation of the aforementioned method, called Ocean Plastic Assimilator (v0.1 :: .2).
We then describe the Lagrangian dispersion model (OceanDrift) used to generate double gyre dispersion simulations and the experiments created with it to observe how our method performs in a controlled environment. 120

Python implementation of the Ocean Plastic Assimilator
This first implementation is coded in Python (see Peytavin (2021a) for the repository). It is meant as a standalone program, using as input a dispersal model output data, formatted as a netCDF4 dataset containing particle coordinates in a given space and time domain, along with their weights. It is assumed that the advection scheme of : in : the dispersion model does not depend on particle masses. In the more general case, one would have to regenerate the particle trajectories ::: run ::: the :::::: model :::: again : after 125 each assimilation time step, :: as :: a :::::: change :: of : a ::::::: particle :::: mass ::::: could :::::: change ::: its ::::: future :::::::: trajectory.
Once loaded, the input weights are duplicated in N e arrays, and the program runs the assimilation scheme presented in the previous section in a time loop, taking observations from an input data frame at each time step. The Assimilator can also take one additional dispersion simulation output from which it samples observations to assimilate at each time step. This is the approach used in the following test-case.

130
This implementation leverages the use of arrays and the fact that we only use one simulation for all ensemble members to perform vectorized computations for the computation of P f , equations 1 and 6. It also allows computing ∆ i,j,t only once for all ensemble members. Some parts of the algorithm are also executed with the just-in-time compiler numba (see Lam et al. (2015)) in order to run faster.
This implementation allows our algorithm to perform each following test-case, repeated assimilation of 2 observation points 135 during 2000 timesteps in a (60, 40) gridded domain, in less than a half-hour on a modern laptop, using about 3GB of storage and 2GB of RAM.
Running the Assimilator on a dispersion output and not inside a dispersion model allows it to work on outputs from different models, as long as the data is appropriately formatted. Future implementations could also offer the option of running online (i.e., embedded inside a dispersion model), which could allow more flexibility and possibilities, as discussed in section 6.2.3.

Double gyre plastic dispersion using the OceanDrift model
In order to create our test cases, we first need a dispersion model and a flow field. We chose the OceanDrift model from the Norwegian Lagrangian trajectory modeling framework OpenDrift (see Dagestad et al. (2018)). It was chosen mainly for its simplicity and the fact that OpenDrift embeds a module to generate a dispersion based on a 2D double gyre flow field.
This field consists of two gyres moving closer then farther away periodically in an enclosed area. It's a simple field but 145 complex enough to stir and disseminate particles and is regularly used as a standard case to study time-varying flows, for example in Guo et al. (2018). The evolving currents are generated using an analytical field 1 . The equations generating this 2D, time-varying, deterministic field are: The dimensionless domain size for these equations is Parameter A is the circulation amplitude, ω is the frequency of oscillation of the gyres, and is the amplitude of the gyres oscillation relative to the steady-state.
Particles are then generated and advected using the OceanDrift lagrangian model from the Norwegian trajectory modeling 155 framework OpenDrift (Dagestad et al. (2018)). Figure 2 shows such a dispersion and the associated concentration field.
Thus, we can generate different dispersion simulations by changing the initial particle positions seed, which changes the distribution of particles trajectories and the initial masses of the particles. We can also change the flow field parameters A, ω and .

165
In order to assess and quantify the efficacy of the Assimilator in different cases, we designed two experiments.
The first one aims at verifying that, when the forecast flow field reproduces the reference flow field accurately, our implemented scheme can correct an incorrect total mass guess. It also intends to check that the estimate gets better when the observation error gets lower, as one would generally expect.
The second experiment aims to assess the Assimilator's behavior and efficacy when the forecast flow field is slightly different 170 from the reference by changing the double gyre parameters A and .

175
In each test case, the Ocean Plastic Assimilator is executed over the course of 2000 timesteps. The double gyre size, which is [0, 2] × [0, 1] is dimensionless, which means that the timestep is dimensionless too. However, if the flow field was the size of the great pacific garbage patch, then with A = 0.1 the timestep would be of the order of a day.
Over the double gyre, we define a gridded domain of size (60, 40) and select two observation points, fixed, to run each assimilation test case. This sampling pattern can be thought of as representing a set of moorings that one may deploy in the 180 real Ocean. H is defined as a matrix that subsets (x) i,j to 2 points of observations.
For the i-th point, the measurement is simulated by adding a random error to x i such as : To compute matrix R, we choose to model the observation error as a sum of an additive error σ 0 and a multiplicative, relative error σ rel . As such, with y i the value measured at the i-th observation point: In the following, unless specified otherwise, we use N e = 10, σ e = 0.05, N p = 25000, σ 0 = 0.1 and σ rel = 1%. The 2 observation points coordinates are the following pairs: (12, 4), (55, 27).

190
In this first experiment, we want to assess the ability of our newly implemented scheme to estimate the total mass of plastics in the reference simulation correctly.  Figure 4 shows the evolution of the forecast total mass for each simulation. Forecasts starting with an initial ::: total : mass lower than approximately 0.82M ref have their total mass rise while those starting with higher ::: total : mass have their total mass fall.
We also compute the concentration field Root Mean Square Error RMSE f at the end of the simulation after assimilating, and RMSE ∅ at the end of a simulation with no assimilation. Values in Table 1 indicate a clear improvement of the RMSE when the initial :::: total mass was erroneous and a stable one compared to no assimilation when the initial :::: total mass was correct.

210
Overall, this points to an improvement in the forecast concentration field over time, thanks to data assimilation.
Finally, in order to assess the method accuracy depending on observation errors, we set µ e = 2 and run simulations with different values of σ rel . FTM and RMSE are then computed and presented in Table 2. We find that decreasing σ rel increase the final forecast mass :::::::: increases ::: the :::: final :::: total :::: mass :: of ::: the ::::::: forecast, getting it closer to 1 while the RMSE decreases. This demonstrates that the forecast bias can be reduced by decreasing the observation error, as 215 one would usually expect of a data assimilation method.

Impact of physical model errors
In this second experiment, we change the parameters used to generate the currents of the reference simulation double gyre. For example, the impact of a modification of on the generated flow field is illustrated in Figure 6. By assimilating observations from reference situations with different double-gyre parameters, we can observe the effects of having an erroneous physical 220 dispersion model when assimilating data.
We initiate the forecast with an erroneous initial :::: total mass of 2M ref and expect that the best total mass predictions will arise from assimilation simulations with the closest flow field.
The forecast simulation is generated using We find that data assimilation remains effective and that simulations run with values of and A closer to ref and A ref lead to better estimations of the total mass and concentration field after some time as one might expect (Figure 7 and Table 3).
This result illustrates that the assimilation method can be robust to unknown model errors.
Conveniently, we observed that the forecast total mass gets higher when the dispersion model is more accurate, thus acting, in a way, like a score. As a result, we might discriminate between dispersion models based on this method's output by selecting the ones that output the highest total mass.

Future Developments
Amongst the potential applications of the presented method, one might highlight the evaluation and design of real observational strategies. Here we considered one hypothetical, albeit plausible, scenario which might represent the deployment of a few relatively accurate moorings. In future studies it would be interesting to investigate how data coverage in space and time may affect forecast skill in more detail, for example, or use this data assimilation system as a benchmark for proposed field 275 campaigns. Several directions to further develop the method and make it more accurate also seem worth considering, as outlined below.

Improving the filter
Throughout the last two decades, the Ensemble Kalman Filter has been extensively developed and improved, with numerous variants published in the scientific literature. Using different ensemble sampling strategies or a square-root algorithm was 280 described as a way to improve accuracy in Evensen (2004). Other solutions include inflating the ensemble before assimilating (see Anderson (2007)), resampling the ensemble, or using a method to assimilate observations locally by adding a Schur product with a so-called correlation matrix in the computation of the Kalman gain in equation 3 (see Houtekamer and Mitchell (2002)). Assimilating locally around observation locations could also have the advantage of further improving the geography of the concentration field, which would translate in reduced values of RMSE f . 285
In section 2.2.3, we presented a simple way to update particles weights after assimilating density observations through the equation 6. Different possibilities of performing this step have been thought of, some of which we think may be worth investigating further. Another simple approach would be to apply an additive correction, instead of the multiplicative correction used in equation 6:

300
This approach was not favored in this first study, as it seemed more likely to generate negative weights more often.
Another alternative would be to generate new particles so that their weights sum up to the updated density, possibly fewer or more particles. This could be more technically challenging to implement and require implementing the assimilation scheme directly inside the dispersion model loop. However, it could also have the advantage of conveniently increasing resolution where there are high plastic :::::: plastics concentrations.

Conclusions
This paper has presented ::::::: presents a simple yet readily effective method to assimilate concentration observation :::::::::: observations :: of :::::: plastics ::::::::::: concentration : data into a Lagrangian dispersion model, and its first implementation called the Ocean Plastic Assimilator (v0.1 :: .2). We applied :::: apply it in a controlled environment to assess its efficacy. We studied :::: study : the impact of observation errors on the prediction accuracy and changed some of the dispersion parameters (A and ) to evaluate the impacts of model errors.
Thus, it :: the :::::: Ocean ::::: Plastic :::::::::: Assimilator : will be further developed at The Ocean Cleanup to assimilate plastic :::::: plastics concentration data from the oceans and improve our cleanup operations in oceanic gyres. This method will undergo more research to 315 develop its features and assess its efficacy on :::: when ::::: using : real-world observations. We expect it to be used to assess in real-time the cleanup operations of The Ocean Cleanup.