Articles | Volume 12, issue 9
https://doi.org/10.5194/gmd-12-4099-2019
Development and technical paper | 23 Sep 2019

Evaluation of lossless and lossy algorithms for the compression of scientific datasets in netCDF-4 or HDF5 files

Xavier Delaunay, Aurélie Courtois, and Flavien Gouillon

Short summary
This research aimed to find a compression method suitable for the ground processing of CFOSAT and SWOT satellite datasets. Lossless algorithms did not achieve sufficient compression, so we studied lossy alternatives. This work introduces the digit rounding algorithm, which reduces the volume of scientific datasets by keeping only the significant digits in each sample value. The number of digits kept is relative to each sample, so that both small and large values are similarly preserved.
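
To illustrate the general idea of keeping a fixed number of significant digits per sample, here is a minimal Python sketch. It is not the digit rounding algorithm evaluated in the paper. It is a simplified stand-in that quantizes the binary mantissa of each value so that the relative error stays below 10^-nsd whatever the magnitude of the sample. The function names `round_significant` and `compressed_size`, the mantissa quantization itself, and the choice of zlib as the lossless stage are all assumptions made for this example.

```python
import math
import struct
import zlib


def round_significant(x: float, nsd: int) -> float:
    """Keep roughly `nsd` significant decimal digits of x (illustrative only).

    Hypothetical helper: it quantizes the binary mantissa, so the relative
    error is bounded by 10**-nsd for small and large values alike.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    # Number of mantissa bits needed to represent nsd decimal digits.
    bits = math.ceil(nsd * math.log2(10))
    # x = mantissa * 2**exponent with 0.5 <= |mantissa| < 1.
    mantissa, exponent = math.frexp(x)
    scale = 2 ** bits
    # Round the mantissa to `bits` fractional bits, then rebuild the value.
    return math.ldexp(round(mantissa * scale) / scale, exponent)


def compressed_size(values) -> int:
    """Size in bytes of the values packed as float64 and deflated with zlib."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return len(zlib.compress(raw, 6))


if __name__ == "__main__":
    # Synthetic smooth signal, used only to make the example self-contained.
    samples = [1000.0 * math.sin(0.01 * i) for i in range(10000)]
    rounded = [round_significant(v, 3) for v in samples]
    print("lossless only :", compressed_size(samples), "bytes")
    print("rounded first :", compressed_size(rounded), "bytes")
```

Rounding sets the low-order mantissa bits of every sample to zero, which gives the lossless compressor long runs of identical bytes to exploit. The data are synthetic, so the sizes printed say nothing about the results reported in the paper.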