Carbon Monitor Power-Simulators (CMP-SIM v1.0) across countries: a data-driven approach to simulate daily power generation

The impact of climate change on power demand and power generation has become increasingly significant. Changes in temperature, relative humidity, and other climate variables affect cooling and heating demand for households and industries and, therefore, power generation. Accurately predicting power generation is crucial for energy system planning and management. It is also crucial to understand


S1.3 MARS
The Multivariate Adaptive Regression Splines (MARS) model is a flexible non-parametric regression technique that can capture complex nonlinear relationships between predictors and a response variable. MARS constructs a piecewise linear model by partitioning the predictor space into smaller subspaces and fitting a linear regression model to each subspace.
MARS builds upon linear regression by introducing nonlinear features and interactions between variables. It works by iteratively identifying breakpoints, or knots, in the predictor variables and fitting linear regression models to each segment between the breakpoints.
MARS is designed to handle both continuous and categorical variables and can automatically detect interactions between them. The model starts by creating simple linear terms for each predictor and then combines them to form a more complex model, using a forward selection approach to determine which variables to include and where to place the knots.
Max Degree: the maximum degree of the terms generated by the forward pass (for example, a maximum degree of 2 allows pairwise interactions between basis functions).
Penalty: the MARS algorithm constructs a model from basis functions that are combinations of simple functions such as linear, hinge, or threshold functions. Each basis function is a product of one or more simple functions, and the number of basis functions can grow quickly with the number of predictors and interactions considered. To avoid overfitting and improve generalization, MARS adds a regularization penalty to its objective function that penalizes model complexity. This penalty is typically a function of the sum of the absolute values of the coefficients of the basis functions; it encourages the algorithm to choose simpler models with fewer basis functions and smaller coefficients, thereby avoiding overfitting.
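The core of MARS described above — hinge basis functions placed at knots and combined linearly — can be illustrated with a minimal sketch. This is not the implementation used in the study; the knot placement (fixed quantiles rather than a forward/backward search) and the use of scikit-learn's `LinearRegression` are simplifying assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3, 3, 200))
# Piecewise-linear target with a kink at x = 0, plus noise
y = np.where(x < 0, 0.5 * x, 2.0 * x) + rng.normal(0, 0.1, 200)

def hinge_basis(x, knots):
    """Build the piecewise-linear MARS basis: for each knot t, the
    mirrored hinge pair max(0, x - t) and max(0, t - x)."""
    cols = [np.maximum(0, x - t) for t in knots] + \
           [np.maximum(0, t - x) for t in knots]
    return np.column_stack(cols)

# Fixed candidate knots at quantiles (a real MARS forward pass
# would search over knot locations instead)
knots = np.quantile(x, [0.25, 0.5, 0.75])
B = hinge_basis(x, knots)
model = LinearRegression().fit(B, y)
print(round(model.score(B, y), 3))  # R² close to 1 for this kinked target
```

Because one candidate knot falls near the true kink at zero, the linear combination of hinges recovers the piecewise-linear shape that an ordinary linear fit could not.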

S1.4 GAM
GAM stands for Generalized Additive Models. A GAM extends the linear model by allowing non-linear relationships between the dependent variable and one or more independent variables. The principle of GAM is that a complex relationship between the response variable and the predictor variables can be modeled as a sum of smooth functions of the predictors.
Each smooth function can be linear, non-linear, or a combination of both, and can be represented using a variety of techniques, such as cubic splines or smoothing splines.
The key principle of GAM is to use these smooth functions to capture the non-linear relationship between the dependent and independent variables without imposing any specific functional form on it. GAMs can be used for both regression and classification problems, and they are particularly useful for analyzing complex relationships that cannot be modeled with linear models.
Lambda: the smoothing parameter that controls the amount of smoothing applied to the smooth functions. A small value of lambda results in less smoothing and a more complex, wiggly fit to the data, whereas a large value results in more smoothing and a simpler, smoother fit.
N Splines: the number of spline basis functions used to model each smooth function. A spline basis function is a mathematical function that defines the shape of the spline; the fitted smooth is a linear combination of these basis functions.
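The roles of lambda and the spline basis can be sketched with penalized least squares on a hand-built cubic spline basis. This is a simplified illustration, not the study's GAM implementation: it uses a truncated-power basis and a plain ridge penalty on all coefficients, whereas production GAM libraries typically penalize the curvature of the fitted smooth.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 120)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

def spline_basis(x, n_knots):
    """Cubic truncated-power basis: 1, x, x^2, x^3, plus (x - k)^3_+ per knot.
    n_knots plays the role of the 'N Splines' hyperparameter."""
    knots = np.linspace(0, 1, n_knots + 2)[1:-1]
    poly = np.vander(x, 4, increasing=True)
    trunc = np.maximum(0, x[:, None] - knots[None, :]) ** 3
    return np.hstack([poly, trunc])

def fit_penalized(B, y, lam):
    """Ridge-penalized least squares: beta = (B'B + lam*I)^(-1) B'y."""
    n = B.shape[1]
    beta = np.linalg.solve(B.T @ B + lam * np.eye(n), B.T @ y)
    return B @ beta

B = spline_basis(x, n_knots=8)
rough = fit_penalized(B, y, lam=1e-8)   # small lambda: wiggly fit
smooth = fit_penalized(B, y, lam=10.0)  # large lambda: smoother fit

def roughness(f):
    """Summed squared second differences: a simple wiggliness measure."""
    return np.sum(np.diff(f, 2) ** 2)

print(roughness(rough) > roughness(smooth))  # True: large lambda smooths
```

Increasing lambda shrinks the basis coefficients and flattens the fitted curve, which is exactly the wiggly-versus-smooth trade-off described above.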

S2. Outputs of the models for all the countries considered in this study.

Figure S1. ALE plots depicting the effect of different predictive features on the target variable. The features are divided into two categories: (a) climate features and (b) human activity features. Each ALE plot shows the accumulated local effect of a single feature on the predicted target variable while accounting for the effects of all other features. The x-axis represents the range of values for each feature, and the y-axis represents the corresponding change in the predicted value of the target variable. The shaded areas represent the 95% confidence intervals for each ALE curve.

Figure S2. Validation curves for the hyperparameters of the MARS model, the best-performing model for the EU27 & UK. The curves demonstrate how changes in the hyperparameter values affect model performance, as measured by the R² score. The x-axis represents the range of values for the hyperparameter, and the y-axis shows the mean R² cross-validation score and R² training score, each averaged over five scores calculated through cross-validation. The shaded areas represent the 95% confidence intervals for each curve. The red dashed line indicates the hyperparameter value selected during the grid-search process.

Figure S3. Evolution of human activity predictive features and power demand over the model training and testing period. Shaded areas represent the training periods; blank areas, the testing periods.

Figure S4. Comparison of machine learning model performance: predicted power demand plotted against observed power demand (blue points). The red dashed line represents the 1:1 line of perfect agreement between predictions and observations.

Figure S5. Permutation feature importance scores for the five most important predictive features for four different machine learning models: Random Forest, Gradient Boosting, Multivariate Adaptive Regression Splines (MARS), and Generalized Additive Models (GAM). The x-axis represents the countries, and the y-axis the different predictive features used in the models.

Figure S7. Validation curves for the hyperparameters of the Gradient Boosting model, the best-performing model for Australia. The curves demonstrate how changes in the hyperparameter values affect model performance, as measured by the R² score. The x-axis represents the range of values for the hyperparameter, and the y-axis shows the mean R² cross-validation score and R² training score, each averaged over five scores calculated through cross-validation. The shaded areas represent the 95% confidence intervals for each curve. The red dashed line indicates the hyperparameter value selected during the grid-search process.

Figure S8. Evolution of human activity predictive features and power demand over the model training and testing period. Shaded areas represent the training periods; blank areas, the testing periods.

Figure S13. Evolution of human activity predictive features and power demand over the model training and testing period. Shaded areas represent the training periods; blank areas, the testing periods.

Figure S14. Comparison of machine learning model performance: predicted power demand plotted against observed power demand (blue points). The red dashed line represents the 1:1 line of perfect agreement between predictions and observations.

Figure S19. Comparison of machine learning model performance: predicted power demand plotted against observed power demand (blue points). The red dashed line represents the 1:1 line of perfect agreement between predictions and observations.

Figure S20. Permutation feature importance scores for the five most important predictive features for four different machine learning models: Random Forest, Gradient Boosting, Multivariate Adaptive Regression Splines (MARS), and Generalized Additive Models (GAM). The x-axis represents the countries, and the y-axis the different predictive features used in the models.

Figure S29. Comparison of machine learning model performance: predicted power demand plotted against observed power demand (blue points). The red dashed line represents the 1:1 line of perfect agreement between predictions and observations.

Figure S33. Evolution of human activity predictive features and power demand over the model training and testing period. Shaded areas represent the training periods; blank areas, the testing periods.