Quality assessment concept of the World Data Center for Climate and its application to CMIP5 data
Abstract. The preservation of data in a high state of quality which is suitable for interdisciplinary use is one of the most pressing and challenging current issues in long-term archiving. For high volume data such as climate model data, the data and data replica are no longer stored centrally but distributed over several local data repositories, e.g. the data of the Climate Model Intercomparison Project Phase 5 (CMIP5). The most important part of the data is to be archived, assigned a DOI, and published according to the World Data Center for Climate's (WDCC) application of the DataCite regulations. The integrated part of WDCC's data publication process, the data quality assessment, was adapted to the requirements of a federated data infrastructure. A concept of a distributed and federated quality assessment procedure was developed, in which the workload and responsibility for quality control is shared between the three primary CMIP5 data centers: Program for Climate Model Diagnosis and Intercomparison (PCMDI), British Atmospheric Data Centre (BADC), and WDCC. This distributed quality control concept, its pilot implementation for CMIP5, and first experiences are presented. The distributed quality control approach is capable of identifying data inconsistencies and to make quality results immediately available for data creators, data users and data infrastructure managers. Continuous publication of new data versions and slow data replication prevents the quality control from check completion. This together with ongoing developments of the data and metadata infrastructure requires adaptations in code and concept of the distributed quality control approach.