Using feature importance as an exploratory data analysis tool on earth system models
Abstract. Machine learning (ML) models are commonly used to generate predictions, but these models can also support the discovery of new science. Generating accurate predictions requires that a model capture the structure of the underlying data. If that structure is properly extracted, ML could be a useful exploratory and evidential tool. In this paper, we present a case study that demonstrates the use of ML for exploratory data analysis (EDA) in the climate domain. We apply the ML explainability method of spatio-temporal zeroed feature importance (stZFI) to understand how associations between climate variables evolve over space and time. Our analyses focus on data from ensembles of earth system models (ESMs), which provide data on different climate states and conditions. We elect to work with ESM ensembles because they allow us to compare feature importance across alternative scenarios that are not available with observed data. The ensembles also account for natural climate variability, so we can distinguish signal from noise when computing feature importance. For our analyses, we consider the 1991 volcanic eruption of Mount Pinatubo, a large stratospheric aerosol injection. We explore the climate pathway associated with the eruption, from aerosols to radiation to temperature, at both the near-surface and stratospheric levels. In addition to applying stZFI to data generated from two different ESMs, we apply it to reanalysis data and compare the associations it identifies across these sources. We show how stZFI tracks, over time, the importance of aerosol optical depth for forecasting temperatures. This case study illustrates the usefulness of an ML tool (stZFI) for EDA on a well-studied climate exemplar.