https://doi.org/10.5194/gmd-2023-17
Submitted as: methods for assessment of models | 13 Feb 2023
Status: this preprint is currently under review for the journal GMD.

A diffusion-based kernel density estimator (diffKDE, version 1) with optimal bandwidth approximation for the analysis of data in geoscience and ecological research

Maria-Theresia Pelz, Markus Schartau, Christopher J. Somes, Vanessa Lampe, and Thomas Slawig

Abstract. Probability density functions (PDFs) comprise basic information about the variability of observed or simulated variables within a system of interest. In geoscience, data distributions are often expressed by a parametric estimation of their PDF, such as a Gaussian distribution. Attention is currently growing towards non-parametric estimation of PDFs, which requires no prior assumptions about the type of PDF. A common tool for such non-parametric estimation is a kernel density estimator (KDE). Existing KDEs are valuable but incomplete because of the difficulty of specifying optimal bandwidths for the individual kernels. A diffusion-based KDE provides a useful approach to mitigate the difficulty in identifying bandwidths that resolve desired details of multi-modal data while being insensitive to noise. We therefore designed and developed a new implementation of a diffusion-based KDE as an open-source Python tool. We tested our implementation on artificial and real marine biogeochemical data, both individually and against other popular KDEs. Our estimator is able to detect relevant multiple modes and resolve data close to the boundary while suppressing details induced by noise and individual outliers. The convergence rate is comparable to that of the Gaussian estimator, but with a generally smaller error, most notably for small data sets with up to around 5000 data points. We exemplify and discuss the general applicability of such KDEs for data-model comparison in geoscience, in particular for sparse data. We also provide an example of how our approach can be efficiently utilized for the derivation of plankton size spectra in ecological research.
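
The bandwidth difficulty mentioned in the abstract can be illustrated with a minimal sketch of a plain Gaussian KDE with a fixed, hand-picked bandwidth. This is not the diffKDE implementation; the function names `gaussian_kde` and `count_modes`, the synthetic bimodal sample, and the three bandwidth values are assumptions chosen only for illustration.

```python
import numpy as np

def gaussian_kde(data, grid, bandwidth):
    """Average of Gaussian kernels of width `bandwidth`, one centred on each data point."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)

def count_modes(density):
    """Number of strict local maxima of a density evaluated on a grid."""
    inner = density[1:-1]
    return int(np.sum((inner > density[:-2]) & (inner > density[2:])))

# Hypothetical test data: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 150), rng.normal(2.0, 0.5, 150)])
grid = np.linspace(-6.0, 6.0, 601)

narrow = gaussian_kde(data, grid, bandwidth=0.02)  # follows individual points (noise)
medium = gaussian_kde(data, grid, bandwidth=0.5)   # moderate smoothing
wide = gaussian_kde(data, grid, bandwidth=3.0)     # merges the two clusters

for name, density in [("narrow", narrow), ("medium", medium), ("wide", wide)]:
    print(name, count_modes(density))
```

A very small bandwidth produces many spurious modes, while a very large one smooths the two clusters into a single mode; a bandwidth selection rule has to find a value between these extremes.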

Status: final response (author comments only)

Viewed

Total article views: 572 (including HTML, PDF, and XML)
  • HTML: 428
  • PDF: 129
  • XML: 15
  • Total: 572
  • BibTeX: 6
  • EndNote: 7

Viewed (geographical distribution)

Total article views: 558 (including HTML, PDF, and XML) Thereof 558 with geography defined and 0 with unknown origin.
Latest update: 18 Sep 2023
Short summary
Kernel density estimators (KDEs) approximate the probability density of a data set without assuming an underlying distribution. We used the solution of the diffusion equation, together with a new approximation of the optimal smoothing parameter built on two pilot estimation steps, to construct a KDE well suited to the typical characteristics of geoscientific data. The resulting KDE is insensitive to noise and resolves multi-modal data structures well, as well as data close to the boundary.
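
The idea of obtaining a KDE from the diffusion equation can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: it solves the constant-coefficient equation u_t = 0.5 u_xx with zero-flux boundaries by implicit Euler steps, starting from a normalised histogram of the data, and it uses a hand-picked final time in place of the optimal smoothing parameter and pilot estimation steps described in the summary. The function name `diffusion_kde` and all parameter values are hypothetical.

```python
import numpy as np

def diffusion_kde(data, lower, upper, final_time, n_cells=200, n_steps=50):
    """Smooth a normalised histogram by solving u_t = 0.5 * u_xx on [lower, upper]
    with zero-flux boundaries, using implicit Euler time steps."""
    edges = np.linspace(lower, upper, n_cells + 1)
    dx = edges[1] - edges[0]
    centres = 0.5 * (edges[:-1] + edges[1:])

    # Initial condition: every data point inside the domain adds equal mass to its cell.
    counts, _ = np.histogram(data, bins=edges)
    u = counts / (counts.sum() * dx)

    # Second-difference matrix; the modified corner entries encode zero flux
    # through the domain boundaries, so total probability mass is conserved.
    lap = np.zeros((n_cells, n_cells))
    idx = np.arange(n_cells)
    lap[idx, idx] = -2.0
    lap[idx[:-1], idx[:-1] + 1] = 1.0
    lap[idx[1:], idx[1:] - 1] = 1.0
    lap[0, 0] = lap[-1, -1] = -1.0
    lap /= dx**2

    # Implicit Euler: solve (I - 0.5 * dt * lap) u_new = u_old at each step.
    dt = final_time / n_steps
    system = np.eye(n_cells) - 0.5 * dt * lap
    for _ in range(n_steps):
        u = np.linalg.solve(system, u)
    return centres, u

# Hypothetical data piled up against the lower boundary at zero.
rng = np.random.default_rng(1)
data = rng.exponential(1.0, 500)

# For this equation the final time plays the role of a squared bandwidth.
centres, density = diffusion_kde(data, lower=0.0, upper=10.0, final_time=0.3**2)
```

Because no mass can leave through the boundaries in this sketch, the estimate does not leak probability below zero, which is one way a diffusion formulation can handle data close to a boundary.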