Spy4Cast v1.0: a Python Tool for statistical seasonal forecast based on Maximum Covariance Analysis

Duran-Fonseca, Pablo; Rodríguez-Fonseca, Belén

doi:https://doi.org/10.5194/gmd-2024-164

Preprints

Submitted as: methods for assessment of models

11 Nov 2024

Status: this preprint was under review for the journal GMD but the revision was not accepted.

Abstract. Maximum Covariance Analysis (MCA) is a well-known discriminant analysis technique used for finding coupled patterns in climate data. It is a powerful tool that has been applied to the study of teleconnections by reducing all possible relationships between a predictor and a predictand field to a few modes of covariability. MCA can be used to provide statistical forecasts, which can complement predictions performed with dynamical models. Nevertheless, the power of this tool relies on applying it in a productive and easy way, so that it can be applied to the huge climate data sets available. Spy4Cast is an open-source interface (API), implemented in Python, that contains an MCA-based statistical model for seasonal forecasting. Its main goal is to increase automation and productivity. Spy4Cast enables the manipulation of large data sets and also performs basic tasks such as region slicing and plotting. The methodology consists of an initial configuration (data-set reading and slicing) and a preprocessing step that prepares the data to be fed into MCA, cross-validation and validation. It acts upon any kind of predictor and predictand variables, which can come from any source of data. Spy4Cast analyses the model sensitivity to particular years, including a diagnosis of the stability of the obtained modes with respect to particular outliers. Finally, the spatial and temporal skill, in terms of the anomaly correlation coefficient, is obtained and a hindcast is provided. The software is easily accessible through a Python package and is well documented for beginners and experienced programmers. Only a small number of third-party libraries are needed, all of them widely used in data science and physics.
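
As an illustration of the MCA step summarised in the abstract, the following is a minimal NumPy sketch, not the Spy4Cast API. It assumes MCA is computed as a singular value decomposition of the cross-covariance matrix between a predictor field and a predictand field, both arranged as (space, time) arrays. The function name `mca`, the variable names and the synthetic data are hypothetical and chosen only for this example.

```python
import numpy as np

def mca(y, z, n_modes=3):
    """Sketch of MCA via SVD of the cross-covariance matrix.

    y: predictor field, shape (space_y, time)
    z: predictand field, shape (space_z, time)
    (Hypothetical helper, not part of Spy4Cast.)
    """
    nt = y.shape[1]
    # Work with anomalies: remove the time mean at every grid point.
    y = y - y.mean(axis=1, keepdims=True)
    z = z - z.mean(axis=1, keepdims=True)
    # Cross-covariance matrix between the two fields, shape (space_y, space_z).
    c = y @ z.T / (nt - 1)
    u, s, vt = np.linalg.svd(c, full_matrices=False)
    # Squared covariance fraction explained by each mode.
    scf = s**2 / np.sum(s**2)
    u = u[:, :n_modes]          # predictor spatial patterns
    v = vt[:n_modes].T          # predictand spatial patterns
    us = u.T @ y                # predictor expansion coefficients (mode, time)
    vs = v.T @ z                # predictand expansion coefficients (mode, time)
    return u, v, us, vs, scf[:n_modes]

# Synthetic example: one shared time signal plus weak noise in both fields.
rng = np.random.default_rng(0)
nt = 40
signal = rng.standard_normal(nt)
y = np.outer(rng.standard_normal(30), signal) + 0.1 * rng.standard_normal((30, nt))
z = np.outer(rng.standard_normal(20), signal) + 0.1 * rng.standard_normal((20, nt))

u, v, us, vs, scf = mca(y, z)
print("squared covariance fraction of leading modes:", scf)
print("correlation of leading expansion coefficients:",
      np.corrcoef(us[0], vs[0])[0, 1])
```

In this synthetic case the leading mode should capture almost all of the squared covariance, because both fields share a single time signal.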

Received: 31 Aug 2024 – Discussion started: 11 Nov 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Status: closed

RC1:
'Comment on gmd-2024-164', Anonymous Referee #1, 29 Jan 2025

The comment was uploaded in the form of a supplement: https://gmd.copernicus.org/preprints/gmd-2024-164/gmd-2024-164-RC1-supplement.pdf

Citation: https://doi.org/10.5194/gmd-2024-164-RC1
- AC1: 'Reply on RC1', Pablo Duran-Fonseca, 01 Feb 2025
  
  Thank you for your comments.
  We will address them and get back to you as soon as possible with the corrected manuscript.
  Kind regards,
  Pablo Duran
  
  Citation: https://doi.org/10.5194/gmd-2024-164-AC1
- AC3: 'Reply on RC1', Pablo Duran-Fonseca, 26 Mar 2025
  
  Dear Anonymous Referee #1,
  Sorry for the late response. Here we include our answers to your comments.
  Thank you for your time,
  The authors.
  
  Citation: https://doi.org/10.5194/gmd-2024-164-AC3
RC2:
'Comment on gmd-2024-164', Anonymous Referee #2, 15 Mar 2025

Synopsis
As the authors note (L34-45), MCA is a well-developed and widely used method in geoclimate studies. A key strength of this work is that it provides accessible software for researchers to implement MCA in its full capacity. The model is useful for certain climate applications, such as the SST prediction demonstrated in the paper. However, its broader applicability appears limited—for instance, the current software only supports monthly data (L149). While I appreciate the effort in developing this tool for the community, I am uncertain about its potential for widespread adoption. Ultimately, the decision on its suitability for publication rests with the editor.
Major comments
Given the current scope of the software, I strongly recommend providing thorough documentation and a user manual. Additional examples of its application would further enhance its value to the community.
Furthermore, as a comprehensive tool for statistical seasonal forecasting, the software would benefit from referencing influential prior work that incorporates statistical analysis and cross-validation. This would provide a more complete methodological context. A quick Google search yields references such as 'Cross-Validation in Statistical Climate Forecast Models' by Barnston and van den Dool (1987). I would encourage the authors to conduct a more thorough search to identify additional relevant studies.
Since this software provides a ready-to-use implementation of an existing method rather than introducing a new approach, what are its future prospects? Do you have plans to expand its functionality or broaden its applicability?
Detail comments:
Line 1: Did you mean a dimension reduction technique?
Line 10: How do you test model sensitivity to particular years? Did you say this in the manuscript?
Line 11: How would you test the sensitivity of modes to particular outliers? Did you mention this? Or is this implied by testing different batches of years?
Line 14: Is the software fully documented? As it stands, it would benefit from additional work to develop a more comprehensive manual.
Line 20-25: SST and SLP patterns which are highly ‘correlated’
You need to clarify an important distinction. Your description of the coupling between SST and SLP refers to only one phase of the Southern Oscillation or ENSO. However, it applies to both El Niño and La Niña, not just El Niño. The current wording suggests that only El Niño is being considered.
Line 28: a baseline for seasonal forecasts? Why?
Line 30-33: Machine learning methods are rapidly evolving, and you should reference more recent studies to support this statement. For example, Toride et al. (2024, https://arxiv.org/abs/2404.15419) demonstrates the use of neural networks to identify physical relationships and find predictability.
Line 41-42: Instead of using the phrase 'a new paradigm,' it would be more accurate to reference earlier studies identifying the connection between ENSO and other tropical basins. For example, the connection between ENSO and the Indian Ocean Dipole (IOD) was first identified by Saji et al. (1999): A Dipole Mode in the Tropical Indian Ocean (Nature). This study demonstrated how the IOD can influence ENSO dynamics and has been foundational in the field.
Line 46-47: reliable? The citation at the end of the sentence is incomplete.
Line 56: ‘… not designed to assess stationarity’ seems to contradict Line 250: ‘Spy4Cast is able to perform a validation methodology to look for non-stationary relations.’ Line 260 as well.
Line 59: fix the reference
Line 69: unit tests?
Line 76: Section 4 is an example of using the Atlantic to predict Pacific SST. However, if you have the Sahelian rainfall example, I think it would be useful to include it in your manual/documentation to showcase different settings and functionalities of your work.
Line 149: only being able to take monthly data seems limited capability to me.
Line 190: How do you determine the sample size? Monthly data are likely highly correlated, i.e., each month is not an independent point, you need to use the effective sample size when you do statistical analysis.
Line 195, 197 and more: What table or listing are you referring to? In general, when referencing your previous paper on which this software is built, I suggest specifying the relevant tables, listings, or figures. This would make it easier for users to trace the code development and better understand the overall concept.
Line 223: It says 2010 in the listing.
Line 224: You mention non-stationarity multiple times in the manuscript, but it is unclear how you determine it. Since there are various methods to assess non-stationarity, I recommend specifying the approach you used to ensure clarity.
Line 225: What does ‘a hot spot’ in climate variability studies mean?
Line 241: I think ‘can be represented’ is a more accurate phrasing.
Line 246: Should Us be in math form?
Line 249: The rest of modes… this statement is misleading and not accurate. Fig 5 seems to say 68% instead of 76%?
Line 254: can you say what years? 94 and 91 for example?
Line 260: I am not sure you have explained how to use your software to determine stationarity
Line 265: This is not a ‘new’ approach but rather a ready-to-use software implementation of a well-established method for seasonal forecasting.
Line 270: What is OFF project again?
Figure 3 caption: You need to label what year this is. Is it 1997 based on Listing 4?
Listing 6: Do you need to ‘import Preprocess’ first in this script?
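
The effective sample size raised in the comment on Line 190 can be illustrated with a minimal sketch. It assumes both series behave approximately like first-order autoregressive processes and uses the common approximation N_eff = N (1 - r1 r2) / (1 + r1 r2), where r1 and r2 are the lag-1 autocorrelations of the two series. This is one of several possible adjustments and is not taken from the manuscript or from Spy4Cast; all function names are hypothetical.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

def effective_sample_size(x, y):
    """AR(1)-based effective sample size for the correlation of x and y."""
    n = len(x)
    r = lag1_autocorr(x) * lag1_autocorr(y)
    # Clip so that the estimate never exceeds n and never collapses to zero.
    r = min(max(r, 0.0), 0.99)
    return n * (1.0 - r) / (1.0 + r)

def ar1(phi, n, rng):
    """Generate an AR(1) series with coefficient phi (illustration only)."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

rng = np.random.default_rng(0)
n = 480  # e.g. 40 years of monthly values

# Strongly persistent series: far fewer independent samples than n.
n_eff_red = effective_sample_size(ar1(0.8, n, rng), ar1(0.8, n, rng))
# White noise: effective sample size stays close to n.
n_eff_white = effective_sample_size(rng.standard_normal(n), rng.standard_normal(n))

print(f"n = {n}, persistent series: {n_eff_red:.0f}, white noise: {n_eff_white:.0f}")
```

Using the reduced value instead of n when testing the significance of a correlation makes the test more conservative for persistent monthly data.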

Citation: https://doi.org/10.5194/gmd-2024-164-RC2
- AC2: 'Reply on RC2', Pablo Duran-Fonseca, 17 Mar 2025
  
  Thank you for your comments.
  We will address them appropriately and respond as soon as possible. We will integrate your observations with the ones provided by Anonymous Referee #1.
  Kind regards,
  Pablo Duran
  
  Citation: https://doi.org/10.5194/gmd-2024-164-AC2
- AC4: 'Reply on RC2', Pablo Duran-Fonseca, 08 Apr 2025
  
  Dear Anonymous Referee #1,
  
  Sorry for the late response. In this comment we attach our answers to your review.
  
  Thank you for your time,
  
  The authors.
  
  Citation: https://doi.org/10.5194/gmd-2024-164-AC4
- AC5: 'Reply on RC2', Pablo Duran-Fonseca, 08 Apr 2025
  
  Dear Anonymous Referee #2,
  The previous comment contains a typo, as it addresses Anonymous Referee #1. We intended to address Anonymous Referee #2.
  Thank you for your time,
  The authors.
  
  Citation: https://doi.org/10.5194/gmd-2024-164-AC5

Viewed

Total article views: 446 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
327	92	27	446	20	24

Views and downloads (calculated since 11 Nov 2024)

Month	HTML	PDF	XML	Total
Nov 2024	79	14	3	96
Dec 2024	32	7	2	41
Jan 2025	30	9	2	41
Feb 2025	31	3	2	36
Mar 2025	56	16	4	76
Apr 2025	47	23	4	74
May 2025	15	7	1	23
Jun 2025	33	13	9	55
Jul 2025	4	0	0	4

Viewed (geographical distribution)

Total article views: 441 (including HTML, PDF, and XML), thereof 441 with geography defined and 0 with unknown origin.

Latest update: 10 Jul 2025

Short summary

This paper describes the first release of Spy4Cast, a Python interface to run a maximum covariance analysis model that produces seasonal forecasts. This API allows the user to increase automation and productivity, including determination of modes, cross-validated hindcast and validation. It includes a visualisation module for the results as well as a preprocessing tool that can also be used for other climate variability studies.
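
The cross-validated hindcast and skill assessment mentioned in the summary can be sketched as follows. This is a simplified illustration under stated assumptions, not the Spy4Cast implementation: it uses leave-one-out cross-validation, a single MCA mode obtained by SVD, a least-squares regression of the predictand on the predictor expansion coefficient, and the temporal anomaly correlation coefficient (ACC) at each predictand grid point. The names `loo_mca_hindcast` and `acc` and the synthetic data are hypothetical.

```python
import numpy as np

def loo_mca_hindcast(y, z, n_modes=1):
    """Leave-one-out hindcast of z (space_z, time) from y (space_y, time)."""
    nt = y.shape[1]
    z_hat = np.empty_like(z, dtype=float)
    for k in range(nt):
        train = np.arange(nt) != k            # every year except year k
        y_mean = y[:, train].mean(axis=1, keepdims=True)
        z_mean = z[:, train].mean(axis=1, keepdims=True)
        ya = y[:, train] - y_mean             # training anomalies
        za = z[:, train] - z_mean
        # Leading predictor patterns from the SVD of the cross-covariance.
        u, _, _ = np.linalg.svd(ya @ za.T, full_matrices=False)
        u = u[:, :n_modes]
        us = u.T @ ya                         # expansion coefficients (mode, time)
        # Least-squares regression of predictand anomalies on us.
        beta = za @ us.T @ np.linalg.inv(us @ us.T)
        # Project the left-out year onto the patterns and predict.
        us_k = u.T @ (y[:, [k]] - y_mean)
        z_hat[:, [k]] = z_mean + beta @ us_k
    return z_hat

def acc(a, b, axis):
    """Anomaly correlation coefficient of a and b along the given axis."""
    a = a - a.mean(axis=axis, keepdims=True)
    b = b - b.mean(axis=axis, keepdims=True)
    return (a * b).sum(axis=axis) / np.sqrt(
        (a * a).sum(axis=axis) * (b * b).sum(axis=axis))

# Synthetic fields sharing one time signal, with weak noise added.
rng = np.random.default_rng(1)
nt = 40
signal = rng.standard_normal(nt)
y = np.outer(np.linspace(-1, 1, 30), signal) + 0.1 * rng.standard_normal((30, nt))
z = np.outer(np.linspace(1, 2, 20), signal) + 0.1 * rng.standard_normal((20, nt))

z_hat = loo_mca_hindcast(y, z)
skill_time = acc(z_hat, z, axis=1)            # one ACC value per grid point
print("minimum temporal ACC over the predictand domain:", skill_time.min())
```

Recomputing the patterns and the regression inside the loop keeps the left-out year fully independent of the model used to predict it, which is the point of cross-validation.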
