Geoscientific Model Development An interactive open-access journal of the European Geosciences Union
https://doi.org/10.5194/gmd-2020-239
© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.

Submitted as: development and technical paper | 30 Jul 2020

Review status
This preprint is currently under review for the journal GMD.

A note on precision-preserving compression of scientific data

Rostislav Kouznetsov (1, 2)
  • (1) Finnish Meteorological Institute, Helsinki, Finland
  • (2) Obukhov Institute for Atmospheric Physics, Moscow, Russia

Abstract. Lossy compression of scientific data arrays is a powerful tool for saving network bandwidth and storage space. Properly applied, lossy compression can reduce the size of a dataset by orders of magnitude while keeping all essential information, whereas a wrong choice of lossy compression parameters leads to the loss of valuable data. The paper considers the statistical properties of several lossy compression methods implemented in "NetCDF operators" (NCO), a popular tool for handling and transforming numerical data in NetCDF format. We compare the imprecision and artifacts that result from lossy compression of floating-point data arrays. In particular, we show that the popular Bit Grooming algorithm (the default in NCO) has sub-optimal accuracy and produces substantial artifacts in multipoint statistics. We suggest a simple implementation of two algorithms that are free from these artifacts and offer twice the precision. In addition, we suggest a way to rectify data already processed with Bit Grooming.

The algorithm has been contributed to NCO mainstream. The supplementary material contains the implementation of the algorithm in Python 3.
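
To make the terms in the abstract concrete, the following is a minimal, illustrative sketch of precision trimming on IEEE 754 single-precision values. It is not the code from the supplementary material, and all function names are hypothetical. Bit shaving (zeroing the trailing mantissa bits) and Bit Grooming (alternately zeroing and setting them for consecutive elements) follow their usual descriptions; the round-to-nearest variant is a generic mantissa rounding included only for contrast and should not be read as either of the two algorithms proposed in the paper. NaN and infinite values are not handled.

```python
import struct

MANTISSA_BITS = 23  # explicit mantissa bits in IEEE 754 binary32


def _to_bits(x: float) -> int:
    """Return the 32-bit pattern of x stored as a float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _from_bits(bits: int) -> float:
    """Return the float32 value encoded by a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def _drop_count(keep_bits: int) -> int:
    if not 0 <= keep_bits <= MANTISSA_BITS:
        raise ValueError("keep_bits must be between 0 and 23")
    return MANTISSA_BITS - keep_bits


def shave(x: float, keep_bits: int) -> float:
    """Zero the trailing mantissa bits (truncates toward zero)."""
    drop = _drop_count(keep_bits)
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
    return _from_bits(_to_bits(x) & mask)


def set_tail(x: float, keep_bits: int) -> float:
    """Set the trailing mantissa bits to one (pushes away from zero)."""
    drop = _drop_count(keep_bits)
    return _from_bits(_to_bits(x) | ((1 << drop) - 1))


def groom(values, keep_bits: int):
    """Alternate shaving and setting for consecutive elements."""
    return [
        shave(v, keep_bits) if i % 2 == 0 else set_tail(v, keep_bits)
        for i, v in enumerate(values)
    ]


def round_mantissa(x: float, keep_bits: int) -> float:
    """Generic round-to-nearest on the mantissa (finite inputs only)."""
    drop = _drop_count(keep_bits)
    if drop == 0:
        return _from_bits(_to_bits(x))
    half = 1 << (drop - 1)  # half of the last kept bit
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
    # Adding `half` may carry into the exponent, which is the desired result.
    return _from_bits((_to_bits(x) + half) & mask)


if __name__ == "__main__":
    x = 1.3
    print("shave :", shave(x, 4))
    print("set   :", set_tail(x, 4))
    print("round :", round_mantissa(x, 4))
    print("groom :", groom([x, x, x, x], 4))
```

With four kept mantissa bits, shaving always moves a value toward zero and setting always moves it away from zero, so alternating the two along an array imposes an element-to-element pattern on the data, whereas rounding keeps the error within half of the last kept bit.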

Interactive discussion

Status: final response (author comments only)

Latest update: 28 Oct 2020
Short summary
Resetting of non-significant figures (precision trimming) enables efficient data compression and helps to avoid excessive use of storage space and network bandwidth. The paper analyses the accuracy of trimmed data and the artifacts caused by trimming methods. It presents several methods with implementation, evaluation and illustrations. It extends and improves on the methods introduced in the GMD paper by Zender (2016), https://gmd.copernicus.org/articles/9/3199/2016/.
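
As a rough illustration of why resetting non-significant bits helps a lossless compressor, the sketch below trims a synthetic series and compares deflated sizes. The test signal, the choice of seven kept mantissa bits, the use of zlib and the helper names are all assumptions made for this example; it does not reproduce the paper's evaluation or the NCO implementation.

```python
import math
import struct
import zlib


def trim_float32(x: float, keep_bits: int) -> float:
    """Zero the trailing mantissa bits of x stored as a float32."""
    drop = 23 - keep_bits
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= (0xFFFFFFFF << drop) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def deflated_size(values) -> int:
    """Size in bytes of the values packed as float32 and deflated."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return len(zlib.compress(raw, 6))


# Synthetic, temperature-like series (an assumption, not real data).
data = [
    280.0 + 15.0 * math.sin(0.01 * i) + 0.5 * math.sin(1.7 * i)
    for i in range(20000)
]
trimmed = [trim_float32(v, 7) for v in data]

raw_size = deflated_size(data)
trimmed_size = deflated_size(trimmed)
max_error = max(abs(a - b) for a, b in zip(data, trimmed))

if __name__ == "__main__":
    print("deflated size, full precision:", raw_size)
    print("deflated size, trimmed       :", trimmed_size)
    print("largest absolute error       :", max_error)
```

The trimmed array still occupies four bytes per value before compression; the saving appears only once a lossless compressor exploits the runs of zero bits, and it comes at the cost of the absolute error reported in the last line.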