Satellite Earth observation has led to the creation of global climate data records of many important environmental and climatic variables. These come in the form of multivariate time series with different spatial and temporal resolutions. Data of this kind provide new means to further unravel the influence of climate on vegetation dynamics. However, as advocated in this article, commonly used statistical methods are often too simplistic to represent complex climate–vegetation relationships due to linearity assumptions. Therefore, as an extension of linear Granger-causality analysis, we present a novel non-linear framework consisting of several components, such as data collection from various databases, time series decomposition techniques, feature construction methods, and predictive modelling by means of random forests. Experimental results on global data sets indicate that, with this framework, it is possible to detect non-linear patterns that are much less visible with traditional Granger-causality methods. In addition, we discuss extensive experimental results that highlight the importance of considering non-linear aspects of climate–vegetation dynamics.

Vegetation dynamics and the distribution of ecosystems are largely driven by
the availability of light, temperature, and water; thus, they are mostly
sensitive to climate conditions (

The current wealth of Earth observation data can be used for this purpose.
Nowadays, independent sensors on different platforms collect optical,
thermal, microwave, altimetry, and gravimetry information, and are used to
monitor vegetation, soils, oceans, and atmosphere (e.g.

In this article, we show new experimental evidence that advocates the need
for non-linear methods to study climate–vegetation dynamics due to the
non-linear nature of these interactions

We start with a formal introduction to Granger causality for the case of two
time series, denoted as

In climate sciences, linear vector autoregressive (VAR) models are often
employed to make forecasts

Comparing the above two models,

In climate studies, the Granger-causal relationship between two time series

As above, we refer to the two models as the full and the baseline model,
respectively. Therefore, in the trivariate setting, Granger causality might
be tested using the following linear VAR model:

It is well known in the statistical literature that predictions made on
in-sample data, i.e. the same data that were used to fit the statistical
model, tend to be optimistic. This phenomenon is often referred to as
overfitting: by construction, the fitting process yields parameter
values that make the model mimic the observed data as closely as possible

To prevent overfitting, out-of-sample data should be used in evaluating the
predictive performance in Granger-causality studies
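
The out-of-sample comparison of a baseline and a full model can be illustrated with a minimal sketch. It is not the implementation used in this study: the synthetic series, the helper names `lagged_matrix` and `out_of_sample_r2`, the lag order `P`, and the simple chronological train/test split are all assumptions made for illustration.

```python
import numpy as np

def lagged_matrix(series, n_lags):
    """Columns are series[t-1], ..., series[t-n_lags] for t = n_lags..T-1."""
    T = len(series)
    return np.column_stack([series[n_lags - k : T - k] for k in range(1, n_lags + 1)])

def out_of_sample_r2(X, y, n_train):
    """Fit least squares on the first n_train rows, score R^2 on the remaining rows."""
    A = np.column_stack([np.ones(len(X)), X])  # intercept column
    coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
    resid = y[n_train:] - A[n_train:] @ coef
    total = y[n_train:] - y[n_train:].mean()  # reference: mean of held-out target
    return 1.0 - (resid @ resid) / (total @ total)

# Synthetic example (assumption): x drives y with a one-step delay.
rng = np.random.default_rng(0)
T, P = 500, 2
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.3 * rng.normal()

target = y[P:]
baseline_X = lagged_matrix(y, P)                        # past of y only
full_X = np.column_stack([baseline_X, lagged_matrix(x, P)])  # past of y and x
n_train = 350

r2_baseline = out_of_sample_r2(baseline_X, target, n_train)
r2_full = out_of_sample_r2(full_X, target, n_train)
print(f"baseline R2 = {r2_baseline:.3f}, full R2 = {r2_full:.3f}")
```

In this setting, a clearly higher held-out score for the full model than for the baseline model is the quantitative evidence that the past of `x` helps to predict `y`.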

Including a regularization term in the fitting process of over-parameterized
linear models helps to avoid overfitting. Typical regularizers that shrink the parameter
vectors of linear models towards 0 are L2 norms (as in ridge regression), L1 norms
(as in least absolute shrinkage and selection operator (LASSO) models), or a combination
of the two norms (as in elastic nets)
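
The shrinkage effect of these regularizers can be illustrated with a small sketch based on Scikit-learn. The synthetic data, the penalty strengths (`alpha`, `l1_ratio`), and the variable names are illustrative assumptions, not the settings used in our experiments.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

# Synthetic regression problem (assumption): many predictors, few samples,
# and only three predictors that truly matter.
rng = np.random.default_rng(1)
n_samples, n_features = 60, 40
X = rng.normal(size=(n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:3] = [1.5, -2.0, 1.0]
y = X @ true_coef + 0.5 * rng.normal(size=n_samples)

models = {
    "ols": LinearRegression(),                           # no regularization
    "ridge": Ridge(alpha=10.0),                          # L2 penalty
    "lasso": Lasso(alpha=0.1),                           # L1 penalty
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),  # mix of L1 and L2
}

coef_norms, n_zero = {}, {}
for name, model in models.items():
    model.fit(X, y)
    coef_norms[name] = float(np.linalg.norm(model.coef_))  # Euclidean norm
    n_zero[name] = int(np.sum(model.coef_ == 0.0))          # exact zeros
    print(f"{name:12s} ||coef|| = {coef_norms[name]:.3f}, zeros = {n_zero[name]}")
```

The L2 penalty shrinks all coefficients towards 0 without removing any of them, whereas the L1 penalty sets some coefficients exactly to 0 and therefore also acts as a variable selector.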

The methodology that we develop in this paper is closely connected to the
methods explained in the previous section. However, as we hypothesize that
the relationships between climate and vegetation can be highly non-linear

Compared to most application domains where random forests are applied, we
employ the algorithm in a slightly different way, namely as an autoregressive
non-linear method for time series forecasting. In practice, this means that
we replace the full and baseline linear model of Sect.

An illustrative example of the moving window approach considered in
the analysis of vegetation drivers at a given timestamp

In our experiments, we treat each continental pixel as a separate problem
and use the Scikit-learn library

Generally, the null hypothesis (

Unfortunately, all the tests mentioned above were designed to compare the
out-of-sample prediction errors of linear parametric models

Our non-linear Granger-causality framework is used to disentangle the effect
of past climate variability on global vegetation dynamics. To this end,
climate data sets of observational nature – mostly based on satellite and
in situ observations – have been assembled to construct time series (see
Sect.

Data sets used in our experiments. Basic data set characteristics are provided, including the native spatial and temporal resolutions.

For temperature, we consider seven different products based on in situ and
satellite data: Climate Research Unit (CRU-HR)

To conclude, as a proxy for the state and activity of vegetation, we use the
third-generation (3G) Global Inventory Modeling and Mapping Studies (GIMMS)
satellite-based NDVI

In climate studies, Granger causality has already been applied to time series
of seasonal anomalies

The three components of the NDVI time series decomposition of a
specific pixel of the Northern Hemisphere (lat: 53.5

Example of lagged and cumulative variables extracted from a
temperature time series. On top is part of a raw daily time series with its
monthly aggregation. In the middle is the 4-month lag-time monthly time series.
On the bottom is the corresponding 4-month cumulative variable. The pixel
corresponds to a location in Kentucky, USA (lat: 37.5
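
The lagged and cumulative variables described in this caption can be sketched as follows. This is a minimal illustration starting from an already aggregated monthly series. The function names are hypothetical, and the convention that the cumulative variable sums the months preceding the current one (current month excluded) is an assumption.

```python
import numpy as np

def lagged_variable(monthly, lag):
    """Value observed `lag` months earlier; NaN where no such month exists."""
    monthly = np.asarray(monthly, dtype=float)
    out = np.full(len(monthly), np.nan)
    out[lag:] = monthly[: len(monthly) - lag]
    return out

def cumulative_variable(monthly, window):
    """Sum over the `window` months preceding month t (month t itself excluded)."""
    monthly = np.asarray(monthly, dtype=float)
    n = len(monthly)
    out = np.full(n, np.nan)
    csum = np.concatenate([[0.0], np.cumsum(monthly)])  # csum[i] = sum of first i months
    # csum[t] - csum[t - window] = monthly[t - window] + ... + monthly[t - 1]
    out[window:] = csum[window:-1] - csum[: n - window]
    return out

# Toy monthly series with values 1..12 (assumption, for illustration only).
monthly = np.arange(1.0, 13.0)
lag4 = lagged_variable(monthly, 4)
cum4 = cumulative_variable(monthly, 4)
print(lag4)
print(cum4)
```

The first `lag` (or `window`) entries are left undefined because the required antecedent months are not available at the start of the record.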

We do not limit our approach to considering raw and anomaly time series of
the data sets in Table

Another type of higher-level predictor variable that can be constructed from
the data sets in Table

Combining the different climate and environmental predictor variables
described above, we obtain a database of 4571 predictor variables per
1

In a first experiment, we evaluate the extent to which climate variability
Granger causes the anomalies in vegetation using a standard Granger-causality
approach, in which only linear relationships between climate (predictors) and
vegetation (target variable) are considered. To this end, ridge regression is
used as a linear VAR model in the Granger-causality
approach (note that this ridge regression will be substituted by the non-linear
random forest approach in Sect.

Extreme indices considered as predictive variables. These indices
are derived from the raw (daily) data and the (daily) anomalies of the data
sets in Table

For further comparison, we analyse the predictive performance obtained when
(linear) Pearson correlation coefficients are calculated on the training data
sets, selecting the highest correlation to the target variable for any of the
4571 predictor variables at each pixel. Figure

Linear Granger causality of climate on vegetation.

Non-linear Granger causality of climate on vegetation.

To analyse the effect of climate on vegetation more thoroughly, we replace
the linear ridge regression model (VAR) with the non-linear random forest
model. Results in Fig.

For a better understanding of the results obtained by the two models, we
average the performance of each model regionally. More specifically, we use
the International Geosphere-Biosphere Program (IGBP)

Mean

Analysis of spatiotemporal aspects of our framework.

Comparison of model performance with

Environmental dynamics reveal their effect on vegetation at different timescales.
Since the adaptation of vegetation to environmental changes requires
some time, and because soil and atmosphere have a memory, a necessary aspect
to investigate is the potential lag-time response of vegetation to climate
dynamics, which relates to the resistance and resilience properties of ecosystems.
The idea of exploring lag times was introduced by several studies in the past
(see, e.g.

To disentangle the response of vegetation to past cumulative climate
anomalies and climatic extremes, Fig.

Because of uncertainties in the observational records used in our study to
represent climate and predict vegetation dynamics, and given that ecosystems
and regional climate conditions usually extend over areas that exceed the
spatial resolution of these records, one may expect that the predictive
performance of our models becomes more robust when including climate
information from neighbouring pixels. In addition, it is quite likely that
neighbouring areas have similar climatic conditions which, in turn,
affect vegetation dynamics in a similar manner. We therefore also consider an
extension of our framework to exploit spatial autocorrelations, inspired by

Figure

In Sect.

The results are visualized in Fig.

In this paper, we introduced a novel framework for studying Granger causality in climate–vegetation dynamics. We compiled a global database of observational records spanning a 30-year time frame, containing satellite, in situ, and reanalysis-based data sets. Our approach combines data fusion, feature construction, and non-linear predictive modelling. Random forest was chosen as the non-linear algorithm because of its excellent computational scalability with regard to extremely large data sets, but it could easily be replaced by any other non-linear machine learning technique, such as neural networks or kernel methods.
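
The core comparison behind this framework can be illustrated with a minimal sketch on synthetic data. It is not the code used in this study: the quadratic driver, the lag order `P`, the chronological split, the hyperparameters, and the helper names `lagged_matrix` and `holdout_r2` are assumptions chosen to make the non-linear effect visible.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

def lagged_matrix(series, n_lags):
    """Columns are series[t-1], ..., series[t-n_lags] for t = n_lags..T-1."""
    T = len(series)
    return np.column_stack([series[n_lags - k : T - k] for k in range(1, n_lags + 1)])

# Synthetic example (assumption): y depends on the square of the past of x,
# a relationship that a linear model cannot represent.
rng = np.random.default_rng(42)
T, P = 600, 2
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.3 * y[t - 1] + 2.0 * x[t - 1] ** 2 + 0.1 * rng.normal()

target = y[P:]
baseline_X = lagged_matrix(y, P)                             # past of y only
full_X = np.column_stack([baseline_X, lagged_matrix(x, P)])  # past of y and x
n_train = 400

def holdout_r2(model, X):
    """Fit on the first n_train rows and return R^2 on the held-out rows."""
    model.fit(X[:n_train], target[:n_train])
    return r2_score(target[n_train:], model.predict(X[n_train:]))

scores = {
    "rf_baseline": holdout_r2(RandomForestRegressor(n_estimators=200, random_state=0), baseline_X),
    "rf_full": holdout_r2(RandomForestRegressor(n_estimators=200, random_state=0), full_X),
    "ridge_full": holdout_r2(Ridge(alpha=1.0), full_X),
}
for name, score in scores.items():
    print(f"{name:12s} out-of-sample R2 = {score:.3f}")
```

In this toy setting, the full random forest should outperform both its own baseline and the full linear model on held-out data, which mirrors, in a very simplified form, the kind of evidence for non-linear Granger causality discussed in this paper.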

Our results highlight the non-linear nature of climate–vegetation
interactions and the need to move beyond the traditional application of
Granger causality within a linear framework. Comparisons to linear Granger-causality-based
approaches indicate that the random forest framework can
predict 14 % more variability of vegetation anomalies on average globally.
The predictive power of the model is especially high in water-limited regions
where a large part of the vegetation dynamics responds to the occurrence of
antecedent rainfall. Moreover, our results indicate the need to consider
multi-month antecedent periods to capture the effect of climate on
vegetation, in particular to account for the effects of climate extremes on
vegetation resilience. The reader is referred to

Our code (

Diego G. Miralles, Willem Waegeman, and Niko E. C. Verhoest conceived the study. Christina Papagiannopoulou conducted the analysis. Willem Waegeman, Diego G. Miralles, and Christina Papagiannopoulou led the writing. All co-authors contributed to the design of the experiments, discussion and interpretation of results, and editing of the manuscript.

The authors declare that they have no conflict of interest.

This work is funded by the Belgian Science Policy Office (BELSPO) in the framework of the STEREO III programme, project SAT-EX (SR/00/306). D. G. Miralles acknowledges support from the European Research Council (ERC) under grant agreement no. 715254 (DRY-2-DRY). W. Dorigo is supported by the “TU Wien Wissenschaftspreis 2015”, a personal grant awarded by the Vienna University of Technology. The authors thank Mathieu Depoorter and Julia Green for the fruitful discussions. Finally, the authors sincerely thank the individual developers of the wide range of global data sets used in this study.

Edited by: D. Lawrence
Reviewed by: two anonymous referees