Preprints
https://doi.org/10.5194/gmd-2022-172
Submitted as: model description paper
07 Sep 2022
Status: this preprint is currently under review for the journal GMD.

Novel clustering framework using k-means (S k-means) for mining spatiotemporal structured climate data

Quang-Van Doan1, Toshiyuki Amagasa1, Thanh-Ha Pham2, Takuto Sato1, Fei Chen3, and Hiroyuki Kusaka1
  • 1Center for Computational Sciences, University of Tsukuba, Japan
  • 2Hanoi University of Sciences, National University Hanoi, Vietnam
  • 3Research Applications Laboratory, National Center for Atmospheric Research, USA

Abstract. The rapid growth of climate data is driving a gradual paradigm shift in knowledge acquisition, from physics-based models to data-driven mining techniques. k-Means is one of the most popular data clustering/mining techniques and has been used to detect hidden patterns in climate systems. Because k-means relies on distance metrics for pattern recognition, it is relatively ineffective for the “structured” data that dominate climate science, that is, data ordered in time and space. Here, we propose (i) a novel k-means algorithm based on structural similarity recognition, called structural k-means or S k-means, for climate data mining and (ii) a new framework for representing and evaluating clustering uncertainty based on the concept of information entropy. We demonstrate that S k-means provides higher-quality clustering outcomes in terms of general silhouette analysis, although it requires more computational resources than conventional algorithms. The results are consistent across demonstration problems that use different types of input data, including two-dimensional weather patterns, historical climate change expressed as time series, and tropical cyclone paths. Additionally, by quantifying the uncertainty underlying the clustering outcomes, we evaluate, for the first time, the “meaningfulness” of applying a given clustering algorithm to a given dataset. We expect this study to set a new standard for k-means clustering with “structural” input data and to provide a new framework for uncertainty representation/evaluation of clustering algorithms in (but not limited to) climate science.
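
To make the idea in the abstract concrete, the following is a minimal sketch of a k-means-style loop in which samples are assigned to the most structurally similar centroid rather than the nearest one in Euclidean distance. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the similarity measure here is a single-window, SSIM-like index, the constants `c1` and `c2` and the farthest-first initialisation are choices made for this example, and the names `ssim_global` and `structural_kmeans` are hypothetical.

```python
import numpy as np


def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """SSIM-like index computed over the whole array (one global window).

    Assumption: c1 and c2 are small stabilising constants chosen for
    data of order one; they are not taken from the preprint.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )


def structural_kmeans(data, k, n_iter=50, seed=0):
    """Cluster samples (first axis of `data`) by structural similarity.

    Returns (labels, centroids). Centroids are plain means of the members,
    as in conventional k-means; only the assignment step differs.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.shape[0]

    # Farthest-first initialisation (an assumption of this sketch):
    # start from a random sample, then repeatedly add the sample that is
    # least similar to the centroids chosen so far.
    chosen = [int(rng.integers(n))]
    while len(chosen) < k:
        closest = np.array(
            [max(ssim_global(s, data[c]) for c in chosen) for s in data]
        )
        closest[chosen] = np.inf  # never pick the same sample twice
        chosen.append(int(closest.argmin()))
    centroids = data[chosen].copy()

    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Assignment step: highest similarity wins.
        sim = np.array([[ssim_global(s, c) for c in centroids] for s in data])
        new_labels = sim.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        # Update step: mean of the members; empty clusters keep their centroid.
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids
```

Each sample can be a time series or a flattened two-dimensional field, since the similarity is computed over the raveled array. The pairwise similarity evaluation in the assignment step is also where the extra computational cost relative to distance-based k-means shows up in this sketch.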

Status: open (until 02 Nov 2022)

Model code and software

S k-means, Quang-Van Doan, https://github.com/doan-van/S-k-means

Short summary
This study proposes (i) a novel structural k-means (S k-means) clustering algorithm for spatiotemporally ordered climate data and (ii) a novel framework for quantifying the uncertainty in clustering problems using the concept of information entropy.
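
As a rough illustration of how information entropy can express clustering uncertainty, the sketch below computes a normalised Shannon entropy for a membership distribution over k clusters. The summary does not spell out the preprint's exact definition, so this is an assumption: the function name `normalized_entropy` is hypothetical, and how the membership weights are obtained (for example, from repeated clustering runs) is left open.

```python
import numpy as np


def normalized_entropy(p, axis=-1):
    """Shannon entropy of membership weights, scaled to the range [0, 1].

    0 means the sample belongs to one cluster with certainty;
    1 means it is spread evenly over all k clusters.
    Assumption: `p` holds non-negative weights along `axis`.
    """
    p = np.asarray(p, dtype=float)
    k = p.shape[axis]
    if k < 2:
        raise ValueError("need at least two clusters to define the scale")
    p = p / p.sum(axis=axis, keepdims=True)  # renormalise to probabilities
    # Treat 0 * log(0) as 0 by only taking the log where p > 0.
    logp = np.log(p, where=p > 0, out=np.zeros_like(p))
    return -(p * logp).sum(axis=axis) / np.log(k)
```

Under this definition, a dataset whose samples mostly score near 1 would suggest that the chosen clustering says little about the data, which is one way to read the “meaningfulness” question raised in the abstract.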