the Creative Commons Attribution 4.0 License.
The ESGF Virtual Aggregation (CMIP6 v20240125)
Abstract. The Earth System Grid Federation (ESGF) holds several petabytes of climate data distributed across millions of files held in data centers worldwide. Obtaining and manipulating the scientific information (climate variables) held in these files is non-trivial. The ESGF Virtual Aggregation is one of several solutions for providing an out-of-the-box, aggregated, analysis-ready view of those variables. Here we discuss the ESGF Virtual Aggregation in the context of the existing infrastructure and of some of the other solutions that provide analysis-ready data. We describe how it is constructed and how it can be used, and provide a performance evaluation. The ESGF Virtual Aggregation provides a sustainable solution to some of the problems encountered in producing analysis-ready data, without the cost of replicating data into different formats, albeit at the cost of more data movement within the analysis than some alternatives. If heavily used, it may also require more ESGF data servers than are currently deployed at data nodes. The need for such data servers should be a component of ongoing discussions about the future of the ESGF and its constituent core services.
Status: open (until 04 Nov 2024)
RC1: 'Comment on gmd-2024-120', Anonymous Referee #1, 01 Oct 2024
In the manuscript titled “The ESGF Virtual Aggregation (CMIP6 v20240125)”, the authors study the Earth System Grid Federation (ESGF) Virtual Aggregation, which provides a sustainable solution to some of the problems encountered in producing analysis-ready climate data. In case of an overload of access requests, the Virtual Aggregation requires more ESGF data servers than are currently deployed. The study addresses this critical issue by focusing on data access, improving delivery capabilities beyond conventional file search and download. This gap in the research makes the study timely and relevant, especially for enhancing the efficiency and productivity of climate data analysis.
While the current focus on database operations might make the paper more suitable for a data management journal, there is potential for it to fit within Geoscientific Model Development if the aspects relating to numerical models of the Earth system are emphasized. If the primary contribution remains database-oriented, the manuscript might not meet the scientific inquiry expectations of Geoscientific Model Development. Reframing the study to address Earth system model processes more directly could improve its suitability, but if that is not feasible, the authors might consider submitting to a journal more focused on database management and querying. Please refer to the aims and scope of GMD: https://www.geoscientific-model-development.net/about/aims_and_scope.html
Citation: https://doi.org/10.5194/gmd-2024-120-RC1
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
95 | 16 | 10 | 121 | 1 | 1 |