Modes of climate variability strongly impact our climate and thus human society. Nevertheless, the statistical properties of these modes remain poorly known due to the short time frame of instrumental measurements. Reconstructing these modes further back in time using statistical learning methods applied to proxy records is useful for improving our understanding of their behaviour. Several statistical methods exist for this purpose, among which principal component regression is one of the most widely used in paleoclimatology. Here, we provide the software ClimIndRec to the climate community; it is based on four regression methods (principal component regression, PCR; partial least squares, PLS; elastic net, Enet; random forest, RF) and cross-validation (CV) algorithms, and enables the systematic reconstruction of a given climate index. A prerequisite is that there are proxy records in the database that overlap in time with its observed variations. The relative efficiency of the methods can vary according to the statistical properties of the mode and the proxy records used. Here, we assess the sensitivity to the reconstruction technique. ClimIndRec is modular, as it allows different inputs such as the proxy database or the regression method. As an example, it is here applied to the reconstruction of the North Atlantic Oscillation (NAO) by using the PAGES 2k database. In order to identify the most reliable reconstruction among those given by the different methods, we use the modularity of ClimIndRec to investigate the sensitivity of the methodological setup to other properties such as the number and the nature of the proxy records used as predictors or the targeted reconstruction period. We obtain the best reconstruction of the NAO using the random forest approach. It shows significant correlation with former reconstructions but exhibits higher validation scores.

The interdependent components of the climate system, such as the atmosphere and the ocean, vary at different timescales. The interactions between those components

An unequivocal synchronous rise in both the greenhouse gas concentration in the atmosphere and the global mean temperature has been observed in instrumental measurements

The physics driving the climate system induces large-scale variations, organized around recurring climate patterns with specific regional impacts and temporal properties. These variations are known as climate modes of variability. Their evolution is usually quantified by an index that can be calculated from a specific observed climate variable. These indices provide an evaluation of the corresponding climate variations and their regional impacts

The dynamics of these modes are still not fully understood due to the relatively short duration of the instrumental records, which prevents robust statistical evaluation of their properties (e.g. spectrum, stability of teleconnections, underlying mechanisms). To partly overcome this limitation, numerous studies have reconstructed climate beyond the period of direct measurements by combining appropriate statistical methods with information from proxy records. Proxy records provide indirect estimates of past local or regional climate, derived from natural archives such as sediment cores, speleothems, ice cores or tree rings. Depending on its nature, each proxy record has a specific temporal resolution, from years to millennia, and can cover a specific period, from hundreds to millions of years. New proxy records are continuously gathered, extending the available datasets and allowing paleoclimatologists to build increasingly consistent reconstructions

Based on the assumption that climate modes such as the NAO affect climate conditions in different locations, some studies have used regression-based methods on temperature and drought-sensitive proxy records to reconstruct the variability of these modes over the last thousand years.

Algorithms more recent than PCR provide alternative regression methods that can also be used to reconstruct climate modes and may further improve the quality and the robustness of these reconstructions. In this paper, we present the computer tool ClimIndRec (Climate Index Reconstruction) version 1.0, which includes multiple statistical approaches for reconstructing climate mode indices. It is based on four regression methods: PCR

Scheme summarizing the main features of ClimIndRec.

Section 2 describes how ClimIndRec works and its added value for climate time series reconstruction. Section 3 compares the four regression methods by reconstructing the NAO index over the last millennium and investigates the sensitivity of the reconstruction to methodological choices such as the method used, the learning period or the selection of proxy records for regression. Finally, Sect. 4 presents a discussion, including some outlooks for the next version of ClimIndRec, and the conclusions of this study.

We here compare four models that all consist of regression methods among which the PCR has been used in many paleoclimate studies

In the case of the reconstruction of climate indices, regression methods seek to establish for each common time step the relationships between the proxies and the climate index to be reconstructed over the period of instrumental measurements. This set of relationships constitutes a statistical model of the considered climate index. The paleo-variations of proxy records are then translated into a climate index in the past using the relationships previously established by the statistical model. Since all these methods rely on parameters that are not known a priori, these parameters must be optimized to make the reconstruction as robust as possible. In the case of PCR, for example, the number of principal components of the proxies used to regress the climate index directly affects the reconstruction since it modifies the set of predictors. The term “control parameter” is used to designate this ensemble of parameters inherent to each method. They are identified for each method in Sect. S1. Their tuning (or optimization) using cross-validation techniques

Reconstructions of the same climate index obtained from different regression methods may differ significantly. Thus, if the same index is reconstructed using different regression methods that each suggest different interpretations of the past, it may be difficult to compare them directly. A common approach to evaluating a statistical model is to split the observation years (called the learning period) in two. The first period, called the training (or calibration) period, is used to build the model using control parameter tuning, and thus to establish relationships between the climate index and proxies. The proxies of the second period, called the testing (or validation) sample, are then translated into a climate index over the years of observations of this period. The actual values of the climate index can then be compared with the reconstructed climate index over the testing period using a given metric, which will be defined in Sect. 2.3.2. It gives a score estimating the model's ability to reconstruct the climate index from proxy data it has not seen during training. This procedure is called the “hold-out” approach
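The hold-out procedure described above can be illustrated with a minimal Python sketch. It is not ClimIndRec code (the tool itself is written in bash and R): a one-predictor least-squares fit stands in for the four regression methods, the mean squared error stands in for the metric of Sect. 2.3.2, and the names `fit_line` and `hold_out` as well as the toy data are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x.

    Assumption: this simple fit is only a stand-in for PCR, PLS, Enet or RF.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def hold_out(proxy, index, test_years):
    """Train on the years outside test_years, reconstruct the test years,
    and return the mean squared error over the testing period."""
    test = set(test_years)
    train = [t for t in range(len(index)) if t not in test]
    a, b = fit_line([proxy[t] for t in train], [index[t] for t in train])
    errors = [(a + b * proxy[t] - index[t]) ** 2 for t in sorted(test)]
    return mean(errors)

# Toy learning period of 8 years in which the index is exactly 2 * proxy + 1.
proxy = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
index = [2.0 * p + 1.0 for p in proxy]

# Years 6 and 7 are held out for testing; years 0 to 5 are used for training.
mse = hold_out(proxy, index, test_years=[6, 7])
```

Because the toy index is an exact linear function of the proxy, the error over the held-out years is zero here; with real proxies the score measures how well the relationship learnt during training transfers to years the model has not seen.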

The scores obtained for different regression methods for a given training/testing sample might be affected by the specific sampling. This is overcome by repeating the hold-out approach several times, with the years of observations shuffled between the training and the testing samples. An ensemble of scores is obtained, yielding an evaluation of the methods' ability to reconstruct the climate index. The most robust regression model is the one that has the highest scores, as this means it is the most accurate at reconstructing the climate index from proxy data it has not seen during training. This most robust regression method is then applied to the whole learning period to build a final model and infer the paleo-variations of the climate index from proxy records. In our study, and by default in ClimIndRec, the determination of the testing samples is performed using a block-style approach over time. This means that the first testing period of a given size encompasses the first time steps of the learning period. This testing period is then shifted by one time step, which gives the second testing period of the same size, and so on until each time step of the learning period has been used at least once for testing. The reason is that autocorrelation is often large in climate time series, so that skill can be obtained from persistence alone. Block-style sampling is therefore the usual choice for climate time series.
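As a minimal sketch of the block-style sampling described above, the following Python snippet enumerates contiguous testing blocks that are shifted by one time step. The function names and the toy sizes are hypothetical, and time steps are represented by integer positions within the learning period.

```python
def block_test_periods(n_years, test_size):
    """Return contiguous testing blocks of length test_size, each shifted
    by one time step with respect to the previous one."""
    return [list(range(start, start + test_size))
            for start in range(n_years - test_size + 1)]

def split(n_years, test_block):
    """Return (training years, testing years) for one testing block."""
    test = set(test_block)
    train = [t for t in range(n_years) if t not in test]
    return train, sorted(test)

# Learning period of 6 time steps, testing blocks of 3 time steps.
blocks = block_test_periods(n_years=6, test_size=3)
# blocks == [[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5]]

# Training and testing years for the second block.
train, test = split(6, blocks[1])
# train == [0, 4, 5], test == [1, 2, 3]
```

Keeping each testing sample contiguous means that most testing years are not surrounded by training years, which limits the skill that could be gained from persistence alone.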

For a given reconstruction method, the reconstruction might also differ substantially depending on both the proxy records and the years of observations used. Here, the sources of uncertainty associated with the proxy selection and with the learning period can be reduced using the same hold-out approach, by evaluating and comparing the candidate setups through their scores.

The number of proxy records and the reconstruction period are thus fixed for the different training/testing period sections and the final model, in contrast with some previous studies which used nested approaches

It should be stressed that the approach of ClimIndRec implicitly assumes that the climate index to be reconstructed is a linear combination of the proxy records. It means assuming that the climate reacts to proxies, while the correct etiological relationship is the other way around

ClimIndRec has been developed using both bash and R scripts. It uses different R packages (presented in Table S5 in the Supplement) that can be used independently to blindly perform reconstructions of any climate index. The added value of ClimIndRec is to integrate the synchronous hold-out approach and cross-validation according to the user inputs (proxy records, regression method, targeted reconstruction period, proxy record pre-selection). It therefore allows several inputs to be tested and provides relevant metrics that can be used to determine the optimal regression model.

The general reconstruction and model evaluation procedure follows 12 steps (Fig. 1), applied sequentially as follows:

An observational time series representing modulations of the targeted mode of variability is chosen to be used as the predictand.

A target time period

The statistical reconstruction method to be applied is selected.

The proxy records that overlap with the selected reconstruction period are extracted to be used as predictors.

The common period

This common period is split in two, one for training the model (training period), and one for testing it (testing period). This is repeated

The proxy records that have a significant correlation at a given threshold with the climate index over the training period are selected to train the statistical model.

Each of the

The corresponding optimal setup is then applied to extend the reconstruction over the testing period for each member.

Validation scores are computed by comparing each of the observation-based testing series and each training sample-based individual reconstruction over the corresponding testing period.

The corresponding control parameters are tuned over the whole learning period

The final reconstruction is obtained by applying the final model to the proxies over the reconstruction period

Thus ClimIndRec provides the final reconstruction with associated uncertainties (Sect. S3) and a vector with
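Step 7 of the procedure above (keeping only the proxy records that are significantly correlated with the climate index over the training period) can be sketched in Python as follows. The source does not state which significance test ClimIndRec applies, so the Fisher z approximation used here is an assumption, and the function names, proxy names and toy series are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist, mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def is_significant(x, y, confidence=0.95):
    """Two-sided test of zero correlation.

    Assumption: Fisher z approximation, ignoring autocorrelation.
    """
    r = max(min(pearson(x, y), 0.999999), -0.999999)  # keep atanh finite
    z = abs(atanh(r)) * sqrt(len(x) - 3)
    p_value = 2.0 * (1.0 - NormalDist().cdf(z))
    return p_value < 1.0 - confidence

def select_proxies(proxies, index, train_years, confidence=0.95):
    """Names of the proxies significantly correlated with the index
    over the training years only."""
    y = [index[t] for t in train_years]
    return [name for name, series in proxies.items()
            if is_significant([series[t] for t in train_years], y, confidence)]

index = [0.3, -1.2, 0.8, 1.5, -0.4, -0.9, 1.1, 0.2, -1.6, 0.6]
proxies = {
    "tree_ring_A": [0.5 * v + 0.1 for v in index],  # tracks the index exactly
    "ice_core_B": [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0],
}

# The first 8 years form the training period; the last 2 are kept for testing.
selected = select_proxies(proxies, index, train_years=range(8))
# selected == ["tree_ring_A"]
```

The selection only looks at the training years, so the testing years play no part in choosing the predictors and the validation scores remain an honest estimate of skill.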

This section aims to clarify the technical details of the methodology presented in Sect. 2.1 and 2.2. It will thus call on the elements mentioned above.

To simplify the mathematical notation, we make the assumption that the proxy record selection and truncation to their common time window with the climate index have already been undertaken (see Sect. 2.2, steps 4 and 5). In this study, it is important that all proxy records are truncated to the same time window to make them mergeable in the same matrix. Each record has to cover at least the chosen reconstruction time window

Figure

Scheme of the initial data.

We denote the chosen reconstruction method by

Hence, the

We introduce

As mentioned above, the initial learning sample is split into

To estimate the optimal set of control parameters

Scheme of a

Using this approach, we retain the control parameter vector

Scheme of the whole procedure for score calculation for a given method

It should be stressed that

Once the model has been evaluated, it is launched over the whole learning set

The assessment of the proposed reconstruction techniques is investigated for the NAO index, as it is probably the mode of variability that has been observed for the longest time period. This index is indeed relatively simple to calculate from the SLP time series as it only requires two locations with instrumental records: one within the centre of action of the Azores anticyclone (typically Gibraltar) and one within the Icelandic Low (typically Reykjavik). The reference NAO index is then calculated as the normalized SLP difference between these two locations. We here use the
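A station-based NAO index of the kind described above can be sketched in a few lines of Python. The sketch assumes one common convention, in which each station series is standardized before the northern series is subtracted from the southern one; the SLP values and the function names are hypothetical and are not taken from the reference index used in this study.

```python
from statistics import mean, pstdev

def standardize(series):
    """Subtract the mean and divide by the (population) standard deviation."""
    m, s = mean(series), pstdev(series)
    return [(v - m) / s for v in series]

def station_nao(slp_south, slp_north):
    """Difference between standardized SLP at the southern station
    (Azores anticyclone) and at the northern station (Icelandic Low)."""
    south = standardize(slp_south)
    north = standardize(slp_north)
    return [s - n for s, n in zip(south, north)]

# Hypothetical seasonal-mean SLP values in hPa for four years.
gibraltar = [1018.0, 1022.0, 1016.0, 1020.0]
reykjavik = [1002.0, 994.0, 1006.0, 998.0]

nao = station_nao(gibraltar, reykjavik)
# Positive values: stronger than usual pressure gradient between the stations.
```

In the toy data the second year combines high pressure at Gibraltar with low pressure at Reykjavik, which gives the largest positive index value, while the third year shows the opposite configuration.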

In terms of proxies, we use the state-of-the-art PAGES 2k database

We apply ClimIndRec with the four methods presented above to the reconstruction of the NAO. In the following, each reconstruction is obtained by averaging

pre-selecting the most relevant proxy records,

selecting the best learning period.

Boxplot of NSCE scores obtained for the four methods and different groups of proxy records by reconstructing the NAO index of the period 1000–1970 with

Among the previous climate reconstruction studies,

Figure

Overall, RF gives the best NSCE scores. Nevertheless, it should be stressed that these results have been obtained for a particular learning period (1856–1970). The sensitivity to this is assessed in the next section.
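To make the NSCE scores more concrete, the following Python sketch computes the score for three toy reconstructions. It assumes that NSCE denotes the Nash–Sutcliffe coefficient of efficiency in its usual form (one minus the ratio of the squared reconstruction error to the variance of the observations); the exact definition used by ClimIndRec is the one given in Sect. 2.3.2, and the function name and series here are hypothetical.

```python
from statistics import mean

def nsce(observed, reconstructed):
    """Nash-Sutcliffe coefficient of efficiency (assumed standard form):
    1 - sum((obs - rec)^2) / sum((obs - mean(obs))^2)."""
    m = mean(observed)
    sse = sum((o - r) ** 2 for o, r in zip(observed, reconstructed))
    sst = sum((o - m) ** 2 for o in observed)
    return 1.0 - sse / sst

observed = [1.0, -1.0, 2.0, -2.0]
candidates = {
    "perfect": list(observed),
    "damped": [0.5 * v for v in observed],         # right phase, half amplitude
    "climatology": [mean(observed)] * len(observed),
}

scores = {name: nsce(observed, rec) for name, rec in candidates.items()}
# scores == {"perfect": 1.0, "damped": 0.75, "climatology": 0.0}
```

The damped series is perfectly correlated with the observations but still scores below 1, because the NSCE penalizes errors in amplitude as well as in phase. A reconstruction that is no better than the observed mean scores 0, and negative values indicate a reconstruction worse than that mean.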

In this section, we keep for each method the optimal selection of proxy records over the training periods (see Sect. 3.1.1). We explore the impact of the reconstruction period. This affects the final reconstruction in two different ways, both related to the final proxy selection, as explained in Sect. 2.1.

We run the reconstruction for 31 periods

Following the optimal setup for each method from Sect. 3.1.1, RF uses 47 records and the three others use 21 records (Fig.

Results show that the four methods are strongly affected by the choice of the reconstruction period. Thus, we recommend determining this period carefully by running simulations over different time windows, following the approach presented here, which is straightforward with ClimIndRec. Overall, this study shows that for each optimization, PCR and PLS are less reliable than RF and Enet for reconstructing the NAO (Sect. 3.1.1 and this section).

We compare and investigate the reconstruction with highest scores for each method following Sect. 3.1. The four optimized reconstructions are obtained by using the full set of proxy records for RF and only using the proxy records significantly correlated at the 95 % confidence level with the NAO index over the learning period for the other methods (see Sect. 3.1.1). RF and Enet reconstructions are performed for the period 1000–1972 while PCR and PLS reconstructions are performed for the period 1000–1970 (Sect. 3.1.2).

Figure

Red line: RF reconstruction for the period 1000–1972 (Sect. 3.1.2), using proxy records significantly correlated at the 80 % confidence level with the NAO over the training periods (Sect. 3.1.1). Blue line: Enet reconstruction for the period 1000–1971 (Sect. 3.1.2) by selecting the proxy records significantly correlated with the NAO index at the 95 % confidence level over the training periods (Sect. 3.1.1). Green line: PCR reconstruction for the period 1000–1971 (Sect. 3.1.2) by selecting the proxy records significantly correlated with the NAO index at the 95 % confidence level over the training periods (Sect. 3.1.1). Orange line: PLS reconstruction for the period 1000–1972 (Sect. 3.1.2) by selecting the proxy records significantly correlated with the NAO index at the 95 % confidence level over the training periods (Sect. 3.1.1). Thin black line: calibration-constrained reconstruction

Table of correlations between five reconstructions:

Comparison of reconstructions from this study with the original

Map of the 46 proxy records used for the reconstruction of the NAO index from

No significant correlation is found between the NAO reconstruction based on RF method and the total solar irradiance (TSI) reconstruction from

Superposed epoch analysis of the NAO response from 2 years (

The results presented above regarding the NAO have all been obtained using ClimIndRec. Obtaining them would otherwise require advanced programming and statistical knowledge to ensure a good estimate of the reliability of the reconstruction performed. ClimIndRec makes this accessible by offering an integrated package through which parameters and methods can be efficiently tested and compared, together with reliable validation metrics such as the NSCE. Nevertheless, the methodology proposed in ClimIndRec could be further improved in different ways.

ClimIndRec does not deal with missing data in proxy records. This implies selecting exclusively the proxy records that entirely cover the reconstruction period, which thus excludes some existing proxy records. Also, proxy records with gaps are not used in the present version of ClimIndRec as their use in an interpolated version would artificially increase their weight in the reconstruction and thus possibly induce spectral artefacts in the reconstruction
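The selection rule described above (keeping only the proxy records that cover the whole reconstruction period without gaps) can be sketched in Python as follows. The sketch assumes annually resolved records stored as year-to-value dictionaries, with `None` marking a missing value; the record names, values and the function `covers_period` are hypothetical and do not reflect how ClimIndRec stores its data.

```python
def covers_period(record, start_year, end_year):
    """True if the record has a non-missing value for every year
    of the window [start_year, end_year]."""
    return all(record.get(year) is not None
               for year in range(start_year, end_year + 1))

records = {
    # Covers 990-1980 without gaps.
    "complete": {y: 0.1 * (y % 7) for y in range(990, 1981)},
    # Starts too late for a reconstruction beginning in 1000.
    "late_start": {y: 0.2 for y in range(1200, 1981)},
    # Covers 1000-1972 but has one missing year.
    "gappy": {y: (None if y == 1500 else 0.3) for y in range(1000, 1973)},
}

usable = [name for name, rec in records.items()
          if covers_period(rec, 1000, 1972)]
# usable == ["complete"]
```

A filter of this kind makes the trade-off explicit: extending the reconstruction window further back in time or closer to the present reduces the number of records that pass the test.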

Another caveat is that the present version of ClimIndRec does not account for dating uncertainties in proxy records. Future developments of ClimIndRec may allow these uncertainties to be taken into account and estimated through time. Doing so requires deeper investigation of each proxy record, as these sources of uncertainty are not exhaustively provided in P2k2017. Also, we found that the reconstructions performed by ClimIndRec exhibit a clear loss of variance over both the learning period and the reconstructed period (before 1856; see Table S4). The RF method is the only one that adequately reproduces the NAO amplitude over the learning period, but it also shows a significant loss of variance over the reconstructed period. This indicates that the loss of variance over the reconstruction period could partly be due to the proxy records themselves and not only to the statistical approach.

A key aspect that has been found within this study is the sensitivity of the results to the validation metric used. Indeed, we also used correlation as the main score for the test period. It appears that this metric was mainly capturing the phasing of the modes in their reconstruction (not shown;

We have proposed and described here four statistical methods for reconstructing modes of climate variability and have compared them for a particular example: the reconstruction of the NAO. By identifying and minimizing the sources of reconstruction uncertainty due to the method used (Sect. 3.1.1, 3.1.2), the time frame considered (Sect. 3.1.2) and the proxy selection (Sect. 3.1.1), we found the optimal NAO reconstruction. It was obtained for the RF method over the time frame 1000–1972 using the 46 proxy records available for this time frame (Sect. 3.2.1). To our knowledge, this method has not yet been used for climate index reconstructions; it clearly outperforms the other methods (Sect. 3.1) and thus seems promising. The reconstruction we obtained is distinguishable from the

We have shown that for Enet, PLS and particularly PCR, which is frequently used in paleoclimatology, selecting proxy records with a strong correlation with the index to be reconstructed over the training periods is a good way to improve the NSCE scores and hence allows more reliable reconstructions (Sect. 3.1.1). In contrast, RF gives more reliable reconstructions using the proxy records significantly correlated at the 80 % confidence level with the NAO (Sect. 3.1.1). This may be due to the fact that it has been mainly developed for large datasets

ClimIndRec's code and the proxy records database are available at

The supplement related to this article is available online at:

SM coded ClimIndRec in its entirety and used it to produce the results of this study. SM was the main author of the manuscript, including figure production. DS contributed to developing the main features of ClimIndRec and supervised the manuscript writing throughout the process. PO, JM and MK contributed to writing the manuscript and discussing the results. MC contributed to writing the manuscript, with a particular focus on Sect. S1.

The authors declare that they have no conflict of interest.

To develop the statistical tool and analyse its outputs, this study benefited from the IPSL Prodiguer-Ciclad facility, supported by CNRS, UPMC Labex L-IPSL. Finally, this study used the PAGES 2k database version 2.0, available online and supported by the PAGES group.

This research has been partly funded by the Université de Bordeaux. It is also funded by the LEFE-IMAGO project VADEMECUM. Didier Swingedouw is supported by the European Commission, H2020 Research Infrastructures (Blue-Action (grant no. 727852) and EUCP (grant no. 776613)).

This paper was edited by Lauren Gregoire and reviewed by three anonymous referees.