https://doi.org/10.5194/gmd-2021-68

Submitted as: methods for assessment of models | 30 Apr 2021

Review status: this preprint is currently under review for the journal GMD.

A method for assessment of the general circulation model quality using K-means clustering algorithm

Urmas Raudsepp and Ilja Maljutenko
  • Department of Marine Systems, Tallinn University of Technology, Tallinn, 19086, Estonia

Abstract. A model's ability to reproduce the state of the simulated object, or a particular feature or phenomenon, is always a subject of discussion. Multidimensional model quality assessment is usually customized to the specific focus of a study and often restricted to a limited number of locations. In this paper, we propose a method that provides information on the overall accuracy of the model while retaining all dimensional information for subsequent analysis of specific tasks. The main goal of the method is to cluster the multivariate model errors. The clustering is done using K-means, an unsupervised machine learning algorithm. In addition, we show the potential of K-means clustering of model errors for learning and prediction. The method is tested on the results of a 40-year simulation with a general circulation model of the Baltic Sea. The model results are evaluated against temperature and salinity measurements from more than one million casts by forming a two-dimensional error space and clustering the errors in it. The optimal number of clusters, four, was determined using the elbow criterion and an analysis of clusterings with different numbers of error clusters. For this particular model, the error cluster representing good model quality, with a bias of 0.4 °C (std = 0.8 °C) for temperature and 0.6 g kg−1 (std = 0.7 g kg−1) for salinity, made up 57 % of all comparison data pairs. Predicting the centroids from limited, randomly selected subsets of the data showed that the centroids became stable once the learning dataset contained at least 100 000 error pairs.
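The core of the method described above is K-means clustering of two-dimensional (temperature error, salinity error) pairs, with the number of clusters chosen by the elbow criterion and centroid stability checked on random subsets. The following is a minimal Python/NumPy sketch under stated assumptions, not the authors' code: errors are assumed to be model minus observation, K-means is Lloyd's algorithm with k-means++ seeding, inertia (within-cluster sum of squares) is the elbow metric, and "drift" is a simple stand-in stability measure. The function names (`kmeans`, `elbow_curve`, `summarize`, `centroid_drift`) are hypothetical and the data are synthetic.

```python
import numpy as np


def kmeanspp_init(X, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest chosen centroid."""
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        c = np.array(centres)
        d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centres)


def kmeans(X, k, n_init=10, max_iter=300, seed=0):
    """Lloyd's K-means with several k-means++ restarts.

    X has shape (n, 2) with columns (dT, dS), assumed model minus observation.
    Returns (centroids, labels, inertia) of the restart with lowest inertia.
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centroids = kmeanspp_init(X, k, rng)
        for _ in range(max_iter):
            # Assignment step: nearest centroid in squared Euclidean distance.
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Update step: centroid = mean error of its members (kept if empty).
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            converged = np.allclose(new, centroids)
            centroids = new
            if converged:
                break
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        inertia = d2[np.arange(len(X)), labels].sum()  # within-cluster SSE
        if best is None or inertia < best[2]:
            best = (centroids, labels, inertia)
    return best


def elbow_curve(X, k_max=8, seed=0):
    """Inertia for k = 1..k_max; the 'elbow' is where the decrease levels off."""
    return [kmeans(X, k, seed=seed)[2] for k in range(1, k_max + 1)]


def summarize(X, labels, k):
    """Per cluster: share of pairs, bias (mean error) and std of the errors."""
    out = []
    for j in range(k):
        m = X[labels == j]
        if len(m) == 0:
            out.append((0.0, np.full(2, np.nan), np.full(2, np.nan)))
        else:
            out.append((len(m) / len(X), m.mean(axis=0), m.std(axis=0)))
    return out


def centroid_drift(X, k, sizes, seed=0):
    """Refit on random subsets of each size; report the largest distance from
    a subset centroid to its nearest full-data centroid."""
    rng = np.random.default_rng(seed)
    ref = kmeans(X, k, seed=seed)[0]
    drift = {}
    for n in sizes:
        sub = X[rng.choice(len(X), size=n, replace=False)]
        c = kmeans(sub, k, seed=seed)[0]
        dist = np.sqrt(((c[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2))
        drift[n] = dist.min(axis=1).max()
    return drift


# Synthetic (dT, dS) error pairs drawn around four made-up centres.
rng = np.random.default_rng(42)
true_centres = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, -3.0], [-3.0, 2.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((2000, 2)) for c in true_centres])

inertias = elbow_curve(X, k_max=6)
centroids, labels, _ = kmeans(X, k=4)
summary = summarize(X, labels, 4)
drift = centroid_drift(X, 4, sizes=[200, 2000, 8000])

for share, bias, std in summary:
    print(f"share={share:.2f} bias={bias.round(2)} std={std.round(2)}")
print({n: round(float(d), 3) for n, d in drift.items()})
```

Plotting `inertias` against k and picking the bend by eye corresponds to the elbow selection; the per-cluster bias and std printed by `summarize` are the same kind of statistics quoted in the abstract, but computed here on synthetic data only.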

Status: open (until 21 Aug 2021)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • CEC1: 'Comment on gmd-2021-68', Astrid Kerkweg, 08 Jun 2021

Data sets

Data for "A method for assessment of the general circulation model quality using K-means clustering algorithm", Ilja Maljutenko, https://doi.org/10.5281/zenodo.4588510

Viewed

Total article views: 288 (including HTML, PDF, and XML)
  • HTML: 218
  • PDF: 65
  • XML: 5
  • Total: 288
  • BibTeX: 1
  • EndNote: 2

Viewed (geographical distribution)

Total article views: 257 (including HTML, PDF, and XML), of which 257 have a defined geographic origin and 0 an unknown origin.

Latest update: 31 Jul 2021
Short summary
A model's ability to reproduce the state of the simulated object is always a subject of discussion. A new method for the multivariate assessment of numerical model skill uses the K-means algorithm to cluster model errors. All available data that fall within the model domain and simulation period are incorporated into the skill assessment. The clustered errors are used for the spatial and temporal analysis of model accuracy. The method can be applied to different types of geoscientific models.
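Once every comparison pair carries a cluster label, the spatial and temporal analysis mentioned above reduces to grouping the labels by location or time. A minimal NumPy sketch, assuming the centroids are already known (the values below are hypothetical) and that each error pair has a calendar month attached: it assigns pairs to the nearest centroid and tabulates the share of each cluster per month. The names `assign` and `monthly_cluster_share` are illustrative, not from the paper.

```python
import numpy as np


def assign(errors, centroids):
    """Label each (dT, dS) error pair with the index of its nearest centroid."""
    d2 = ((errors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def monthly_cluster_share(labels, months, k):
    """(12, k) array: fraction of pairs in each cluster for each calendar month.

    Months without any comparison pairs are left as NaN.
    """
    share = np.full((12, k), np.nan)
    for m in range(1, 13):
        sel = labels[months == m]
        if sel.size:
            share[m - 1] = np.bincount(sel, minlength=k) / sel.size
    return share


# Hypothetical centroids (dT, dS) and a handful of made-up error pairs.
centroids = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, -2.0]])
errors = np.array([[0.1, -0.1], [1.9, 0.2], [0.0, -2.1], [0.2, 0.1]])
months = np.array([1, 1, 7, 7])

labels = assign(errors, centroids)
share = monthly_cluster_share(labels, months, k=len(centroids))
print(labels)         # [0 1 2 0]
print(share[[0, 6]])  # January and July rows
```

Grouping by station or grid-cell index instead of month would give the spatial counterpart of the same table.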