09 Sep 2024
The ESGF Virtual Aggregation (CMIP6 v20240125)

Ezequiel Cimadevilla, Bryan Lawrence, and Antonio Santiago Cofiño

Abstract. The Earth System Grid Federation (ESGF) holds several petabytes of climate data distributed across millions of files in data centers worldwide. Obtaining and manipulating the scientific information (climate variables) held in these files is non-trivial. The ESGF Virtual Aggregation is one of several solutions that provide an out-of-the-box, aggregated, analysis-ready view of those variables. Here we discuss the ESGF Virtual Aggregation in the context of the existing infrastructure and of some of the other solutions that provide analysis-ready data. We describe how it is constructed and how it can be used, and we provide some performance evaluation. We show that the ESGF Virtual Aggregation offers a sustainable solution to some of the problems encountered in producing analysis-ready data, without the cost of replicating data into different formats, albeit at the cost of more data movement during analysis than some alternatives. If heavily used, it may also require more ESGF data servers than are currently deployed at data nodes. The need for such data servers should be a component of ongoing discussions about the future of the ESGF and its constituent core services.

Short summary
The Earth System Grid Federation (ESGF) stores an enormous amount of climate data spread across millions of files in data centers all over the world. Accessing and working with this scientific information is complex. This work presents the ESGF Virtual Aggregation, an approach that combines data from different sources into an aggregated view that is ready for analysis straight away.