the Creative Commons Attribution 4.0 License.
An improved method of the Globally Resolved Energy Balance model by the Bayesian networks
Zhenxia Liu
Zengjie Wang
Jian Wang
Zhengfang Zhang
Dongshuang Li
Zhaoyuan Yu
Linwang Yuan
Accurate climate simulation is critically important and remains a challenge. This study introduces an improved method for the Globally Resolved Energy Balance (GREB) model that uses Bayesian networks and is based on the concept of a coarse–fine model. The method builds a coarse–fine structure that combines a dynamical model with a statistical model: the GREB model serves as the global framework, and a Bayesian network constructed from the interrelationships between the internal climate variables of the GREB model performs local optimization. To objectively validate the performance and generalization of the improved method, it is applied to the simulation of surface temperature and temperature of the atmosphere based on the 3.75° × 3.75° global data sets produced by the National Centers for Environmental Prediction (NCEP) and the National Center for Atmospheric Research (NCAR) from 1985 to 2014. The results demonstrate that the improved model exhibits higher average accuracy and lower spatial differentiation than the original GREB model and is robust in long-term simulations. This approach addresses the limited accuracy of the GREB model in local areas, which can be attributed to an overreliance on boundary and initial conditions and to the incomplete use of observed data. Additionally, the model overcomes the poor robustness of statistical models caused by ambiguous climate inclusions. Thus, the improved method provides a promising way to obtain a reliable and stable simulation of climate.
As global warming progresses, extreme events and meteorological disasters occur more frequently (Grant, 2017). Thus, the simulation and prediction of climate have become an important topic in current scientific research for the conceptual understanding and development of hypotheses for climate change studies (Dommenget and Flöter, 2011; Huang et al., 2019). Climate models are mathematical models that describe the temporal evolution of climate, oceans, atmosphere, ice, and land-use processes across a spatial domain via systems of partial differential equations (Berrocal et al., 2012). These systems can be solved on supercomputers, and the models are an important tool for simulating and predicting future climate change (Kay, 2020).
Generally, climate models fall into two categories: dynamic models and statistical models. A dynamic model can represent the dynamic process of climate well by modeling various complex climate processes or interactions, but it still faces two major problems: (i) the simulation process relies too heavily on initial conditions and boundary conditions (Alley et al., 2019; Zhang et al., 2019; Ludescher et al., 2021); (ii) the climate model is too complicated, and its internal characteristics cannot be fully expressed (Fan et al., 2021; Zou et al., 2019; Feng et al., 2020). The Globally Resolved Energy Balance (GREB) model is a simple but representative dynamic model based on energy balance theory (Dommenget and Flöter, 2011). Compared with other dynamic models, the GREB model is a relatively fast tool for the conceptual understanding and development of hypotheses for climate change studies because it computes about one model year per second on a standard personal computer, which allows sensitivity studies of external forcing to be conducted within minutes to hours (Dommenget and Flöter, 2011; Dommenget, 2016; Stassen et al., 2019). However, in addition to the two main problems of dynamic models, the GREB model does not respond well to anomalous climate change because its parameters are predetermined and observed data can hardly be used to correct them dynamically (Dommenget and Flöter, 2011; Dommenget, 2016). Solving these problems is an important research topic for improving the GREB model and for extending the approach to other dynamic models.
In contrast, a statistical model, the other type of climate model, can make good use of historical observation data to dynamically modify the model (Feng et al., 2020), which addresses the problem that dynamic climate models rely too much on initial and boundary conditions and underutilize observation data. Combining a dynamical model with a statistical model therefore provides a possible way to remedy the defects of the dynamical model (Chou, 1986, 2003). A Bayesian network is a statistical method that combines graph theory and probability (Cai et al., 2013, 2019; Jansen et al., 2003). The method uses a graph to express the structural relations among the variables of the model, and it structures and quantifies these relations through the causal relations among the parts of a probabilistic system (Pearl, 1986). In this way, logical reasoning over variables and predictive simulation can be realized, and a large amount of historical observation data can be used. Bayesian networks are therefore a possible way to improve the GREB model.
The concept of a coarse–fine model provides a joint modeling approach for a dynamical–statistical hybrid model that differs from the traditional use of a statistical model to optimize the empirical parameters of a dynamical model. It starts from the different coarse and fine granularities of the model (Akgul and Kambhamettu, 2003; Pal and Bhattacharya, 2010; Yibo et al., 2009), uses the dynamical model as a global framework and the statistical model for local optimization, and thereby realizes the unified modeling of both. Based on this idea, this paper introduces a method for improving the GREB model with Bayesian networks. The aim of the method is to solve the problem of low model accuracy caused by overreliance on boundary conditions and initial conditions and by an inability to fully utilize historical observation data. The following section presents the improved method. Section 3 presents the study case and data sets used to test the improved model. Finally, we give a discussion and conclusion of the results.
The improved method is developed according to the following procedure (Fig. 1). First, climate variables representing different climate processes in the GREB model are chosen as nodes in the Bayesian networks, and the structural relationships among the nodes are determined to establish an abstract model of the components and structural relationships of climate processes. Second, the selected climate variables are categorized into value ranges based on their numerical values, and each classification indicates a different climate state. Third, the climate state simulation method is reconstructed based on the Bayesian networks and the climate evolution process to simulate the climate state of the target variable. Finally, the climate state simulation results obtained from the Bayesian networks are compared with the simulation results of the original GREB model to identify the grids for local optimization, and the numerical results of the original GREB model are optimized based on the comparison.
2.1 Structural relationship among climate variables
Based on the energy balance, the GREB model can simulate the main characteristics and climate mean states of global warming, including seven climate processes (solar radiation, thermal radiation, hydrological cycle, sensible heat and atmospheric temperature, atmospheric circulation, sea ice, and deep ocean) and four main climatic variables (surface temperature, temperature of the atmosphere, temperature of the subsurface of the ocean, and humidity of the surface). Each of these processes is represented with strongly simplified equations. Therefore, we can abstract the structural relationship among different climate variables from the simplified equation, i.e., which climate variables control a given climate variable and which climate variables are influenced by it. This structural relationship provides the possibility to construct Bayesian networks.
2.2 Categorization of climate state
According to the theory of climate sensitivity (Annan and Hargreaves, 2006; Dommenget, 2016), a climate state, indicated by a range of numerical values, can replace specific numerical values to simulate climate change and characterize the long-term trend of climate change and extreme weather conditions. Compared with the numerical values of specific climate variables, a climate state is also better suited to capturing the similarity of a given climate variable across different spatial and temporal locations. It can therefore be used to assess the similarity between the simulated results of a model and the actual results, which indicates the accuracy of the simulation. This provides a simple and practical approach to evaluating how accurately simulation results reveal local abrupt changes. Moreover, simulating a state rather than a specific numerical value can significantly reduce computational effort and simulation response time. This is consistent with the primary objective of the GREB model, which is to provide a fast tool for the conceptual understanding and development of hypotheses for climate change studies (Dommenget and Flöter, 2011).
The natural breaks classification (Jenks) method is a commonly used classification method that aims to minimize intraclass variation and maximize interclass variation. When the numeric values of climate variables are categorized with this method to indicate the climate state, values within the same classification vary little and can therefore be regarded as sharing a similar climate state.
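As an illustration of this classification step, the sketch below partitions one-dimensional values into k classes by minimizing the total within-class sum of squared deviations with dynamic programming, which is the criterion that natural breaks optimizes. The function names `natural_breaks` and `climate_state` are hypothetical, and this is a minimal sketch for small samples, not the implementation used in this study.

```python
from bisect import bisect_left


def natural_breaks(values, k):
    """Return the k-1 upper bounds that split the sorted values into k classes
    with minimal total within-class sum of squared deviations."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums give the squared deviation of xs[i:j] in constant time.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for idx, x in enumerate(xs):
        s1[idx + 1] = s1[idx] + x
        s2[idx + 1] = s2[idx] + x * x

    def ssd(i, j):
        # Sum of squared deviations from the mean of xs[i:j], with j > i.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting xs[:j] into c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j] = cand
                    cut[c][j] = i

    # Walk back through the stored cut points to recover the class bounds.
    bounds = []
    j = n
    for c in range(k, 1, -1):
        i = cut[c][j]
        bounds.append(xs[i - 1])  # largest value of the class below the cut
        j = i
    return bounds[::-1]


def climate_state(value, bounds):
    """0-based index of the class whose range contains value."""
    return bisect_left(bounds, value)
```

With k set to five, seven, or nine, `natural_breaks` would yield the class limits of one climate variable, and `climate_state` would then map each grid value to its state index. The triple loop costs O(k n²), so a production run on full global fields would more likely rely on an optimized library routine.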
2.3 Climate state simulation
According to the characteristics of Bayesian networks, the climate state simulation of climate variables is realized by a climate evolution process based on Bayesian networks; i.e., the climate state of unknown climate variables is inferred from the climate state of known climate variables at the same spatial locations.
2.3.1 Bayesian networks
A Bayesian network is a probabilistic model that simulates the human reasoning process, which is a combination of graph theory and probability theory, and its network topology is a directed acyclic graph, where variables are nodes and correlations or causal relationships between variables are directed edges. The dynamic evolution of Bayesian network node probabilities is controlled by conditional probabilities, and each node covers a probability distribution table under the joint distribution of the parent nodes, indicating the strength of the relationship between the nodes (Sahin et al., 2019). When the Bayesian network is constructed, given the state of any node, the probability distribution of the states of the remaining nodes can be calculated.
In Bayesian networks, the probability of a node can be calculated from prior knowledge and statistical data, namely as the Bayes probability (Maher, 2010). The observed sample is defined as $G=\left\{X_{1}=x_{1},X_{2}=x_{2},\cdots,X_{n}=x_{n}\right\}$, where X is the event and x is the event value or state. When θ is the prior probability of event X=x, ζ is prior knowledge, and $P(\theta\mid\zeta)$ is the probability density function, the probability $P\left(X_{n+1}=x_{n+1}\mid\theta,\zeta\right)$ of the (n+1)th event $X_{n+1}=x_{n+1}$ can be obtained from the prior probability density $P(\theta\mid\zeta)$ and the sample G through the Bayes probability. It can be calculated by the total probability formula:
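In the notation of this section, the total probability formula takes the following standard form. This is a reconstruction from the surrounding definitions, and it assumes that the quantity on the left-hand side is the predictive probability of the next event given the sample G and the prior knowledge ζ:

$$P\left(X_{n+1}=x_{n+1}\mid G,\zeta\right)=\int P\left(X_{n+1}=x_{n+1}\mid\theta,\zeta\right)\,P\left(\theta\mid G,\zeta\right)\,\mathrm{d}\theta$$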
Based on the Bayes equation, the posterior probability $P\left(\theta\mid G,\zeta\right)$ is denoted as
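Written out with the symbols of this section, Bayes' theorem gives the following standard form. This is a reconstruction rather than a verbatim equation; the likelihood $P(G\mid\theta,\zeta)$ and the evidence $P(G\mid\zeta)$ are symbols introduced here for illustration:

$$P\left(\theta\mid G,\zeta\right)=\frac{P\left(\theta\mid\zeta\right)\,P\left(G\mid\theta,\zeta\right)}{P\left(G\mid\zeta\right)},\qquad P\left(G\mid\zeta\right)=\int P\left(G\mid\theta,\zeta\right)\,P\left(\theta\mid\zeta\right)\,\mathrm{d}\theta$$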
where G is the given sample, ζ is the prior knowledge, and $P\left(\theta\mid G,\zeta\right)$ is the posterior probability density of θ given G.
2.3.2 Climate evolution process based on Bayesian networks
In a climatic process composed of several climatic variables, there is an association relationship between climatic variables. These climatic variables are regarded as network nodes, and the association relations between climatic variables are taken as directed edges. The association relationship between nodes is represented by the graph model, and the action intensity of the association relationship is described quantitatively by the conditional probability table. Using the characteristics of Bayesian networks, the attribute feature state of nodes is inferred by probability to realize the expression and simulation of the attribute feature state of geographical variables.
A climate process $M_{t}=\left\{X\left(m_{1},m_{2},\dots,m_{i}\right)\mid m_{1t},m_{2t},\dots,m_{it}\right\}$ is composed of i climate variables, $m_{1},m_{2},\dots,m_{i}$, and $X\left(m_{1},m_{2},\dots,m_{i}\right)$ is the structural relationship among the variables. Suppose that the climate variable m_{i} has j states; then the state set of m_{i} is $\left\{W_{m_{i1}},W_{m_{i2}},\dots,W_{m_{ij}}\right\}$. The climate process is described by a Bayesian network $B=\left(S,X\right)$, where S is a directed acyclic graph composed of nodes and X is the node set of the graph, that is, the climate variables $m_{1},m_{2},\dots,m_{i}$. Nodes are connected by directed edges to represent the relationship between climate variables. Each node has an independent conditional probability table, which represents the probability distribution under the joint distribution of its parent nodes. Assume that a climate variable m_{i} has one or more parent nodes $m_{1},m_{2},\dots,m_{e}\ \left(e\le i-1\right)$ with states $d_{1},d_{2},\dots,d_{e}$, denoted as $m_{1},m_{2},\dots,m_{e}\to m_{i}$. Under all possible states of the parent nodes, the conditional probability table composed of the set of state probabilities of m_{i} is as follows:
where $B_{m_{i}}^{W_{m_{1r_{1}}},W_{m_{2r_{2}}},\dots,W_{m_{er_{e}}}}$ is the conditional probability table of climate variable m_{i}; $W_{m_{ij}}$ is the jth characteristic state of climate variable m_{i}; and $P_{W_{m_{ij}}}^{W_{m_{1r_{1}}},W_{m_{2r_{2}}},\dots,W_{m_{er_{e}}}}$ is the probability that climate variable m_{i} is in the jth state when the parent nodes $m_{1},m_{2},\dots,m_{e}$ are in the characteristic states $r_{1},r_{2},\dots,r_{e}$. The probability set of climate variable m_{i} at time t can be denoted as $C_{m_{it}}$:
The conditional probability table of each node can be calculated by Eq. (2) using training data.
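One simple way to obtain such a table from training data is to count how often each child state occurs under each combination of parent states. The sketch below does this with relative frequencies and therefore applies no prior, which is a simplifying assumption; `train_cpt`, `most_probable_state`, and the record layout are hypothetical and serve only as an illustration.

```python
from collections import Counter, defaultdict


def train_cpt(samples, child, parents, n_states):
    """Estimate P(child state | parent states) by relative frequency.

    samples  : list of dicts mapping variable name -> state index
    child    : name of the node whose table is estimated
    parents  : list of parent node names (their order defines the table key)
    n_states : number of states of the child variable
    """
    counts = defaultdict(Counter)
    for record in samples:
        key = tuple(record[p] for p in parents)
        counts[key][record[child]] += 1

    cpt = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # One probability per child state; unseen states get probability 0.
        cpt[key] = [counter[state] / total for state in range(n_states)]
    return cpt


def most_probable_state(cpt, parent_states):
    """Child state with the highest conditional probability, or None if this
    combination of parent states never occurred in the training data."""
    row = cpt.get(tuple(parent_states))
    if row is None:
        return None
    return max(range(len(row)), key=row.__getitem__)
```

Each row of the returned table sums to one. Parent combinations that never occur in training have no row, so a practical implementation would need a fallback, for example a smoothed prior, which this sketch deliberately leaves out.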
2.4 Local optimization
The numerical results simulated by the original GREB model are compared with the climate state results simulated by the Bayesian networks. Grids in which the numerical result of the original GREB model does not fall within the range of the climate state simulated by the Bayesian networks are selected as the grids to be optimized.
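The selection rule can be sketched as a range check per grid cell. In the sketch below, the nested-list layout and the name `grids_to_optimize` are assumptions made for illustration: `greb_values[r][c]` holds the GREB value of a cell and `state_ranges[r][c]` the (lower, upper) limits of the climate state simulated for that cell.

```python
def grids_to_optimize(greb_values, state_ranges):
    """Return (row, col) indices of grid cells whose GREB value lies outside
    the [lower, upper] range of the simulated climate state."""
    flagged = []
    for r, (value_row, range_row) in enumerate(zip(greb_values, state_ranges)):
        for c, (value, (lower, upper)) in enumerate(zip(value_row, range_row)):
            if not lower <= value <= upper:
                flagged.append((r, c))
    return flagged
```

Treating both limits as inclusive is a design choice of this sketch; a value that sits exactly on a class limit is regarded as consistent with the simulated state.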
According to the Third Law of Geography (Zhu et al., 2018), the more similar the geographic environment, the more similar the geographic target characteristics are. Therefore, an unknown climate variable at a certain spatial and temporal location can be inferred from the numerical values of other known climate variables at that location. Accordingly, we propose that for an unknown climate variable, the position of its specific value in the range of its classification is related to the position of the specific value of each known climate variable in the range of its classification at the same spatial and temporal location. For a climate variable containing n relevant control variables, the numerical results are calculated as follows:
where ${E}_{\mathrm{value}}^{x}$ represents the value of the unknown climate variable, ${S}_{\text{lower limit}}^{E}$ represents the lower limit of the classification range in which the unknown climate variable is simulated by the Bayesian networks, ${S}_{\text{upper limit}}^{E}$ represents the upper limit of that classification range, n represents the number of known climate variables associated with the unknown variable in the Bayesian networks, ${E}_{\mathrm{value}}^{i}$ represents the actual value of the ith known climate variable, and ${S}_{\text{lower limit}}^{i}$ and ${S}_{\text{upper limit}}^{i}$ represent the lower and upper limits of the classification range in which the ith known climate variable lies in the simulation process by the Bayesian networks.
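One plausible reading of this relationship is that the unknown value is placed at the same relative position within its simulated classification range as the mean relative position of the n known variables within their own ranges. The sketch below implements that reading; the averaging form and the name `recalculate_value` are assumptions made for illustration and should not be taken as the exact formula of the method.

```python
def recalculate_value(target_range, known):
    """Place the unknown variable inside its simulated classification range.

    target_range : (lower, upper) limits of the simulated state of the
                   unknown variable
    known        : list of (value, lower, upper), one entry per known
                   control variable and its classification range
    """
    if not known:
        raise ValueError("at least one known variable is required")
    lower, upper = target_range
    positions = []
    for value, lo, hi in known:
        if hi == lo:
            raise ValueError("classification range has zero width")
        # Relative position of the known value inside its own class range.
        positions.append((value - lo) / (hi - lo))
    # ASSUMPTION: the relative positions are combined by a plain average.
    mean_position = sum(positions) / len(positions)
    return lower + (upper - lower) * mean_position
```

For instance, with a target range of 270 to 280 and two known variables at relative positions 0.5 and 0.25 in their ranges, the recalculated value sits 37.5 % of the way into the target range.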
According to the above method, we can improve the accuracy of the model by comparing the climate state, identifying the grid to be optimized, and recalculating the values simulated by the original GREB model within the grid. In this way, the improved model with coarse–fine structure constructed with the GREB model as the global framework and the Bayesian networks as the local optimization can better reflect the localized abrupt changes in the climate process and achieve the purpose of improving the GREB model.
To demonstrate the accuracy of the improved model in simulating climate variables and to verify its reliability, surface temperature and temperature of the atmosphere from the GREB model were selected as the simulation objects. The simulation of these two climate variables includes most of the climate processes of the GREB model and can reflect the complex coupling process and climate change characteristics of the GREB model.
3.1 Data description
In this paper, data produced by the National Centers for Environmental Prediction (NCEP) and the National Center for Atmospheric Research (NCAR) are used as the experimental data to evaluate the improved model. The data sets include surface temperature (T_{surf}), temperature of the atmosphere (T_{atmos}), solar radiation (F_{solar}), total cloud cover (CLD), water vapor (q_{air}), temperature of the subsurface ocean (T_{ocean}), and wind speed (u), stored as 3.75° × 3.75° (latitude × longitude) gridded NC data from 1985 to 2014. To facilitate calculation and comparative analysis, all climate data are preprocessed. First, outliers are removed from the downloaded climate data so that excessively large or small values do not distort the results. Second, the grid data are resampled by bilinear interpolation, which not only fills null values but also unifies the scale of the data. Finally, considering that changes in climate variables are usually seasonal, the climate data from 1985 to 2014 were processed into quarterly averages, where January, February, and March form the first quarter; April, May, and June the second quarter; July, August, and September the third quarter; and October, November, and December the fourth quarter.
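The quarterly averaging step can be sketched for the monthly series of a single grid cell as follows. The function name `quarterly_means` is hypothetical, and the series is assumed to start in January and to cover whole years.

```python
def quarterly_means(monthly):
    """Average a monthly series into calendar quarters
    (Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec).

    monthly : list of monthly values starting in January; its length must be
              a multiple of 12 so that every quarter is complete.
    """
    if len(monthly) % 12 != 0:
        raise ValueError("expected whole years of monthly data")
    return [sum(monthly[i:i + 3]) / 3 for i in range(0, len(monthly), 3)]
```

Applied to the 30 years from 1985 to 2014, a series of 360 monthly values yields 120 quarterly averages per grid cell.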
3.2 Structural relationship among climate variables and climate state
The process of simulating the surface temperature includes solar radiation, thermal radiation, sensible heat and atmospheric temperature, and deep ocean (Dommenget and Flöter, 2011). The main heat source for the surface is solar radiation: part of it is absorbed by the surface, part of it is reflected by the surface, part of the heat at the surface is transferred to the atmosphere, and part is transferred to the ocean below the surface. Each climate variable in this scene can be expressed by a highly simplified equation, which follows the surface temperature tendency equation:

$$\gamma_{\mathrm{surf}}\frac{\mathrm{d}T_{\mathrm{surf}}}{\mathrm{d}t}=F_{\mathrm{solar}}+F_{\mathrm{thermal}}+F_{\mathrm{latent}}+F_{\mathrm{sense}}+F_{\mathrm{ocean}}$$
where T_{surf} is surface temperature, γ_{surf} is surface heat capacity, F_{solar} is the incoming solar radiation, F_{thermal} is the net thermal radiation, F_{latent} is the cooling by latent heat from surface evaporation of water, F_{sense} is the turbulent heat exchange with the atmosphere, and F_{ocean} is the heat exchange with the deeper subsurface ocean. The subprocesses of surface temperature are modeled as follows:
where F_{solar} is the incoming solar radiation, α_{clouds} is the fraction of the incoming solar radiation reflected by clouds, α_{surf} is the fraction of the incoming solar radiation reflected by the surface, S_{0} is the solar constant, r is the 24 h mean fraction reaching a normal surface area on top of the atmosphere, ϕ is the latitude, t_{julian} is the Julian day of the calendar year, F_{thermal} is the net thermal radiation, T_{surf} is surface temperature, ε_{atmos} is the effective emissivity, T_{atmos−rad} is the temperature defined in the context of the atmospheric temperature, CLD is the total cloud cover, ε_{0} is the emissivity without considering clouds, pe_{*} are parameters, F_{latent} is the cooling by latent heat from surface evaporation of water, L is the latent heat of evaporation and condensation of water (a constant), ρ_{air} is the density of air, C_{w} is the transfer coefficient, u_{*} is the wind speed, υ_{soil} is the surface wetness fraction by which the bulk formula is extended, q_{air} is the actual surface air layer humidity, q_{sat} is the saturation surface air layer specific humidity, F_{sense} is the turbulent heat exchange with the atmosphere, c_{atmos} is the coupling constant, T_{atmos} is temperature of the atmosphere, F_{ocean} is the heat exchange with the deeper subsurface ocean, Fo_{sense} is the turbulent mixing between the two ocean layers, ΔT_{entrain} is the heat exchange with the surface ocean layer due to the decrease in the mixed layer depth, c_{ocean} is the coupling constant, and T_{ocean} is the temperature of the subsurface ocean.
In the process of simulating the temperature of the atmosphere by the GREB model (Dommenget and Flöter, 2011), the temperature of the atmosphere is not only related to the thermal radiation reflected from the surface, but also related to the sensible heat exchange with the surface and latent heat release by condensation of atmospheric water vapor. Each climate variable in this process can be expressed by a highly simplified equation, which follows the temperature of the atmosphere tendency equation as follows:
where T_{atmos} is temperature of the atmosphere, γ_{atmos} is atmospheric heat capacity, F_{sense} is the sensible heat exchange with the surface, Fa_{thermal} is net thermal radiation of the atmosphere, Q_{latent} is the latent heat release by condensation of atmospheric water vapor, and u is the wind speed. The subprocesses of temperature of the atmosphere are modeled as follows:
where F_{sense} is the turbulent heat exchange with the atmosphere, c_{atmos} is the coupling constant, T_{atmos} is temperature of the atmosphere, Fa_{thermal} is net thermal radiation of the atmosphere, T_{surf} is surface temperature, ε_{atmos} is the effective emissivity, T_{atmos−rad} is the temperature defined in the context of the atmospheric temperature, Q_{latent} is the latent heat release by condensation of atmospheric water vapor, Δq_{precip} is the condensation or precipitation, and L is the latent heat of evaporation and condensation of water (a constant).
For different climate processes, the climate subprocesses and relationship structures are different. Therefore, the selection of nodes in each climate process will also be different. Not only is the selection of appropriate variables as nodes very important, but also the number of nodes will directly affect the simulation of the final climate average state. In order to simplify the complex climate evolution process and facilitate calculation, four to six climate variables are selected as key nodes in each climate process, and the variable climate state in each process is simulated by these nodes.
Through the trend equations (Eqs. 6 and 7) in the processes of surface temperature, the relation equation of climate variables can be simplified:
where T_{surf} is the surface temperature; F_{solar} is solar radiation; T_{ocean} is the temperature of the subsurface ocean; q_{air} is the actual surface air layer humidity, i.e., water vapor content; and CLD is the total cloud cover. That is, the surface temperature, solar radiation, temperature of the subsurface ocean, total cloud cover, and water vapor content can be selected as the key nodes of the surface temperature process.
Through the trend equations (Eqs. 8 and 9) in the processes of the temperature of the atmosphere, the relation equation of climate variables can be simplified:
where T_{atmos} is the temperature of the atmosphere, u is the wind speed, CLD is the total cloud cover, and q_{air} is the water vapor content. That is, the temperature of the atmosphere, wind speed, total cloud cover, and water vapor content can be selected as the key nodes of the temperature of the atmosphere process.
According to Eq. (10), in the surface temperature process, the surface temperature (T_{surf}) is controlled by solar radiation (F_{solar}), total cloud cover (CLD), water vapor (q_{air}), and temperature of the subsurface ocean (T_{ocean}). The temperature of the subsurface ocean (T_{ocean}) and water vapor (q_{air}) are in turn controlled by solar radiation (F_{solar}). From these relationships, the Bayesian network structure of the surface temperature process can be constructed (Fig. 2a). According to Eq. (11), in the temperature of the atmosphere process, the temperature of the atmosphere (T_{atmos}) is controlled by total cloud cover (CLD), water vapor (q_{air}), and wind speed (u). In addition, water vapor (q_{air}) is controlled by wind speed (u). From these relationships, the Bayesian network structure of the temperature of the atmosphere process can be constructed (Fig. 2b).
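The two structures described above can be written down as parent lists, and a topological ordering confirms that each is a directed acyclic graph. The dictionary names and the helper `topological_order` are hypothetical; the edges themselves are taken directly from the relationships stated in this section.

```python
# Each node maps to the list of nodes that control it (its parents).
SURFACE_PARENTS = {
    "F_solar": [],
    "CLD": [],
    "T_ocean": ["F_solar"],
    "q_air": ["F_solar"],
    "T_surf": ["F_solar", "CLD", "q_air", "T_ocean"],
}

ATMOS_PARENTS = {
    "u": [],
    "CLD": [],
    "q_air": ["u"],
    "T_atmos": ["CLD", "q_air", "u"],
}


def topological_order(parents):
    """Order nodes so that every parent precedes its children.
    Raises ValueError if the graph contains a cycle."""
    remaining = {node: set(ps) for node, ps in parents.items()}
    order = []
    while remaining:
        # Nodes whose parents have all been placed already.
        ready = sorted(n for n, ps in remaining.items() if not ps)
        if not ready:
            raise ValueError("graph contains a cycle")
        for node in ready:
            order.append(node)
            del remaining[node]
        for ps in remaining.values():
            ps.difference_update(ready)
    return order
```

In both orderings the simulated variable (T_surf or T_atmos) comes last, which reflects that its state is inferred only after the states of all controlling variables are known.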
The climate states of the variables in the above climate processes were classified using the natural breaks classification method. The climate variable data are categorized into five, seven, and nine classifications to indicate different climate states, both to test the improved model and to verify the effect of the number of classifications on the simulation results. Detailed schemes are shown in Appendix Tables B1, B2, and B3.
3.3 Climate state simulation
Surface temperature and temperature of the atmosphere are treated as the simulation objects and the other climate variables as known objects, and historical data are used to calculate the conditional probability table of each node through the Bayesian network structure with Eq. (4). The training data are the 10-year historical data from 1985 to 1994.
In each simulation process, there are two training methods for the simulated object. The first is to train a single conditional probability table using the data in all the grids and then to use this table to simulate the states of all grids. The conditional probability table obtained by this training method reflects the numerical relationship between climate variables over the whole region, but it cannot show the spatial distribution pattern of the simulated state. The second is to train on the data in each grid separately. Because the state grading data in each grid are different, the conditional probability table of the simulated object trained in each grid is also different, and a total of 96×48 conditional probability tables are obtained. These tables can accurately reflect the different numerical relationships between the simulated object and the known objects in different regions. However, because more conditional probability tables are trained, the running time of this training method is somewhat longer. Considering the great differences in the pattern of climate evolution in different regions, this paper uses the second training method in state simulation: it first divides the globe into 96×48 grids and then uses the data in each grid to train the conditional probability tables of that grid. After the training is completed, the data of the known climate variables are used to simulate the unknown climate variables from 1995 to 2014. The simulation results are shown in Figs. 3 and 4.
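The bookkeeping of the second training method can be sketched as follows: each grid cell keeps its own table, trained only on the quarters from 1985 to 1994, while the remaining quarters are reserved for simulation. The data layout, the function names, and the frequency-based estimate (no prior) are assumptions made for illustration.

```python
from collections import Counter, defaultdict

N_LAT, N_LON = 48, 96               # 3.75 degree grid: 180/3.75 by 360/3.75
FIRST_YEAR, TRAIN_END = 1985, 1994  # training years, both inclusive


def split_train_sim(quarterly):
    """Split one cell's quarterly series (starting in Q1 of FIRST_YEAR)
    into the training part and the simulation part."""
    n_train = (TRAIN_END - FIRST_YEAR + 1) * 4
    return quarterly[:n_train], quarterly[n_train:]


def train_all_cells(samples_by_cell):
    """Train one frequency-based conditional probability table per cell.

    samples_by_cell maps (lat_idx, lon_idx) to a list with one
    (parent_states_tuple, child_state) pair per quarter.
    """
    tables = {}
    for cell, samples in samples_by_cell.items():
        train, _ = split_train_sim(samples)
        counts = defaultdict(Counter)
        for parent_states, child_state in train:
            counts[parent_states][child_state] += 1
        tables[cell] = {
            ps: {state: n / sum(c.values()) for state, n in c.items()}
            for ps, c in counts.items()
        }
    return tables
```

With 120 quarters per cell, this split gives 40 training quarters and 80 simulation quarters, and a full global run produces 48 × 96 = 4608 tables.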
Figure 3 shows the climate state of the quarterly average surface temperature from 1995 to 2014. The simulation results under different classifications all clearly show the global quarterly average surface temperature distribution with latitudinal variations. The surface temperature is highest at the Equator and decreases with increasing latitude, so the temperature at the North and South poles is the lowest. The climate state distribution of surface temperature is basically in line with the real world. In contrast to the surface temperature, the quarterly average temperature of the atmosphere in Fig. 4 increases from the Equator with increasing latitude, which is also basically in line with the real world. The tropospheric height is lower at the poles and higher at the Equator, so the temperature of the troposphere at the same height is higher at the poles.
3.4 Local optimization
After the climate state simulation by the Bayesian networks, the numerical results simulated by the original GREB model are compared with the climate state results simulated by the Bayesian networks, and the grids in which the numerical results of the original GREB model are not in the range of the simulated climate state are selected as the grids to be optimized. The GREB model is implemented with the model code from the Monash Simple Climate Model (MSCM) laboratory repository, and the code is run in Fortran.
Based on the optimized area of surface temperature simulations and temperature of the atmosphere obtained from the climate state accuracy comparison (see Appendix A for details), the original GREB model simulation results of the grid to be optimized in the optimized area are recalculated according to Eq. (5).
In terms of surface temperature simulation, the original GREB model shows high state accuracy at low latitudes, so the local optimization scheme is used only for the middle and high latitudes. The optimization range of the surface temperature simulation was set empirically to 30–90° N and 30–90° S. The quarterly average surface temperature simulated by the improved model for the period 1995–2014 is presented in Fig. 5. In terms of the temperature of the atmosphere simulation, a spatially global optimization approach has been chosen, owing to the higher global state accuracy of the Bayesian networks. The quarterly average temperature of the atmosphere simulated by the improved model for the period 1995–2014 is presented in Fig. 6. The details in Figs. 5 and 6 show that the optimized results capture localized abrupt changes well, which means that the improved model is able to effectively address the inadequate response of the original GREB model to localized abrupt changes.
3.5 Evaluation of improved model
In order to evaluate the simulation accuracy of the improved model (the optimized GREB model based on Bayesian networks of climate state), the root mean square error (RMSE) between the simulated and actual values is used:
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(S_{i}-A_{i}\right)^{2}},
where S_{i} represents the simulated value and A_{i} represents the actual value; when analyzed spatially, n represents the length of time, whereas when analyzed temporally, n represents the number of grids in space. The accuracy of the original GREB model simulation result was used as a comparison.
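The two uses of the RMSE (per grid over time, and per time step over all grids) can be sketched as follows. This is a minimal illustration with made-up numbers; the nested-list layout `sim[t][g]` (time step t, grid cell g) and the function names are assumptions for this example.

```python
import math


def rmse(simulated, actual):
    """Root mean square error between two equally long sequences."""
    if len(simulated) != len(actual) or not simulated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((s - a) ** 2 for s, a in zip(simulated, actual))
    return math.sqrt(squared / len(simulated))


def rmse_per_grid(sim, obs):
    """Spatial analysis: one RMSE per grid cell, n = number of time steps."""
    n_grid = len(sim[0])
    return [
        rmse([row[g] for row in sim], [row[g] for row in obs])
        for g in range(n_grid)
    ]


def rmse_per_time(sim, obs):
    """Temporal analysis: one RMSE per time step, n = number of grid cells."""
    return [rmse(s_row, o_row) for s_row, o_row in zip(sim, obs)]


# Toy data: 2 time steps x 2 grid cells (illustrative values only).
sim = [[1.0, 2.0], [3.0, 6.0]]
obs = [[1.0, 4.0], [3.0, 2.0]]
print(rmse_per_grid(sim, obs))
print(rmse_per_time(sim, obs))
```

The per-grid values correspond to the kind of map shown in a spatial RMSE figure, and the per-time values to a time series of the RMSE.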
The mean values of the RMSE between the simulated results and the observed values for surface temperature were 13.26, 8.66, 8.85, and 9.81 for the original GREB model and the improved models based on five, seven, and nine classifications, respectively. For the temperature of the atmosphere, the corresponding mean values of the RMSE were 72.19, 22.77, 20.12, and 17.76, respectively. This result shows that the improved method significantly reduces the RMSE of the simulation; i.e., it improves the simulation accuracy. However, the two variables behave differently: there is no significant relationship between the RMSE and the number of classifications in the simulation of surface temperature, while the RMSE decreases as the number of classifications increases in the simulation of temperature of the atmosphere.
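To put the reported mean RMSE values in relative terms, the reduction with respect to the original GREB model can be computed directly. The sketch below uses only the eight numbers quoted above; the dictionary layout and the function name are illustrative.

```python
# Mean RMSE values as reported in the text.
REPORTED_MEAN_RMSE = {
    "surface temperature": {
        "GREB": 13.26, "5 classes": 8.66, "7 classes": 8.85, "9 classes": 9.81,
    },
    "temperature of the atmosphere": {
        "GREB": 72.19, "5 classes": 22.77, "7 classes": 20.12, "9 classes": 17.76,
    },
}


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


for variable, values in REPORTED_MEAN_RMSE.items():
    baseline = values["GREB"]
    for label, value in values.items():
        if label != "GREB":
            reduction = relative_reduction(baseline, value)
            print(f"{variable}, {label}: {reduction:.1f} % lower than GREB")
```

With these numbers the mean RMSE falls by roughly 26 % to 35 % for surface temperature and by roughly 68 % to 75 % for the temperature of the atmosphere.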
Figure 7 depicts the spatial distribution of the RMSE between the simulated surface temperature and the observed values for the original GREB model (Fig. 7a) and the improved models based on five, seven, and nine classifications (Fig. 7b–d). The comparison shows that the improved model significantly improves the simulation accuracy of the surface temperature in the polar regions. Figure 8 depicts the spatial distribution of the RMSE between the simulated temperature of the atmosphere and the observed values for the original GREB model (Fig. 8a) and the improved models based on five, seven, and nine classifications (Fig. 8b–d). The comparison shows that the improved model significantly improves the simulation accuracy of temperature of the atmosphere in mid- and low-latitude regions. This is also well verified by the variation curve of the RMSE along the latitude direction shown in Fig. 9.
Figure 10 shows the quarterly variability and trends of the RMSE between 1995 and 2014 for both the surface temperature and temperature of the atmosphere. The comparison demonstrates that the improved model significantly reduces the RMSE and exhibits temporal stability, indicating the robustness of the improved model. Moreover, the RMSE curves of the improved models exhibit the same seasonal cycle as the original GREB model, with the smallest RMSE occurring in the fourth quarter and the largest in the third quarter. This seasonal pattern can be attributed to the fact that the improved model is built on the relationships between climate variables within the GREB model and therefore exhibits temporal variation characteristics similar to those of the GREB model, which reflects the coarse–fine structure of the improved model with the original GREB model as the global framework. The RMSE trends over time show that the accuracy of the improved model does not drift, which makes the improved model suitable for simulating surface temperature and temperature of the atmosphere over long time series.
In this study, we introduced a coarse–fine structure to improve the GREB model based on Bayesian networks. The improved model uses the GREB model as the basis of the global simulation framework and uses the Bayesian networks for local optimization. By introducing Bayesian networks, the results of the original GREB model are quickly evaluated with the climate state as the evaluation index, the local optimization region is identified, and the simulation results of the GREB model within the optimization region are recalculated, which improves the model accuracy significantly.
The improved model was evaluated in two cases: surface temperature and temperature of the atmosphere. The simulation results show that the improved model has higher average accuracy and lower spatial variability than the original GREB model. This means that the improved model has better applicability and stability on a global scale. Meanwhile, on the timescale, the model maintains good robustness and does not suffer from the accuracy divergence of traditional statistical models because it uses the GREB model as the basic global framework. The results of the two study cases not only demonstrate that the improved method can also be used for the simulation of other climate variables within the GREB model, but they also reveal the construction of coarse–fine models through a combination of dynamical and statistical methods as a potential means of improving climate simulation and prediction. This approach can overcome the shortcoming of a single dynamical model, which cannot accurately describe many nonlinear processes in the climate system, and it can be applied to other dynamical models. In terms of development, improving climate dynamical models with statistical methods shows great potential for increasing the accuracy of climate predictions.
In addition to the improved accuracy of the model, the concept of evaluation through climate state introduced during the construction of the coarse–fine model can be applied in studies on climate sensitivity (Dommenget, 2016; Kutzbach et al., 2013), extreme weather (Bellprat and Doblas-Reyes, 2016; Chen et al., 2018), and climate thresholds (Mahlstein et al., 2015; Vogel et al., 2020). In particular, climate anomalies, as manifested by the climate state, can serve as effective indicators for tracking climate change trends. However, this improved method still has some shortcomings.

The scientific problem of categorization of climate variable attribute features. In this paper, the climate state of each variable is indicated by classes obtained with the natural breaks classification method according to the data characteristics and statistical regularities of the climate model. However, this classification changes with the data, and a data-based classification may not be consistent with the actual climate evolution pattern. Future studies can therefore address this issue and choose appropriate feature classification criteria to achieve a balance between different simulations.

Balance of accuracy and resolution. If the actual numerical values rather than states are used as the calculation parameters, a higher resolution is obtained, but the training data available for each case are reduced, which leads to a loss of accuracy. How to balance accuracy and resolution will be an important issue.

Applicability of climate evolution models based on Bayesian networks. Stable conditional probability tables can be trained with historical climate data to simulate climate state, but these tables do not change over time and therefore cannot be adapted to time-sensitive climate models. Future work can extend the applicability of the method by dynamically training Bayesian networks on climate data.
In order to verify the reliability of the climate state simulated by the Bayesian networks and to provide a basis for guiding the optimization of the GREB local simulation result, the state accuracy (dimensionless) was used to evaluate the reliability of the simulated climate state, which is expressed as
\text{state accuracy} = \frac{n}{N},
where n represents the number of time steps (in this case, seasons) at which the simulated state value of a grid is the same as the actual state value, and N represents the total number of time steps. State accuracy is thus the proportion of time steps at which the simulated and actual states of the same grid agree.
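The definition of state accuracy translates directly into a count of matching states. The following sketch is illustrative; the function name and the toy state sequences are not from the paper.

```python
def state_accuracy(simulated_states, actual_states):
    """Fraction n / N of time steps at which the simulated state of a grid
    equals the actual state (n = matches, N = total number of time steps)."""
    if len(simulated_states) != len(actual_states) or not simulated_states:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(1 for s, a in zip(simulated_states, actual_states) if s == a)
    return matches / len(simulated_states)


# Toy example: eight seasons for one grid cell, states given as class indices.
simulated = [2, 2, 3, 1, 0, 2, 3, 1]
actual = [2, 1, 3, 1, 0, 2, 2, 1]
print(state_accuracy(simulated, actual))  # 6 matches out of 8 -> 0.75
```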
The numerical results simulated by the original GREB model are also transformed into climate states by the natural breaks classification method for comparative evaluation. The state accuracy averaged over different processes from 1985 to 1994 is shown in Fig. A1, and the state accuracy of the surface temperature and the temperature of the atmosphere are shown in Figs. A2 and A3. Overall, the comparison (Fig. A1) shows that the Bayesian networks have a higher state accuracy for both the surface temperature and the temperature of the atmosphere. This indicates that the Bayesian networks simulate the climate state better than the GREB model, which provides a basis for evaluating the GREB model simulations against the Bayesian network results. The underlying reason is that more observations enter the Bayesian network simulation (through the construction of the conditional probability tables), which makes the Bayesian network response to localized abrupt changes in climate more pronounced.
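Once class boundaries are available, converting numerical model output into climate states is a lookup. The sketch below assumes that the boundaries have already been computed (for example by a natural breaks routine, which is not reproduced here); the function name, the half-open interval convention, the clamping of out-of-range values, and the numbers are all assumptions for illustration.

```python
from bisect import bisect_right


def to_state(value, breaks):
    """Map a numerical value to a class index 0 .. len(breaks) - 2.

    breaks: ascending class boundaries, assumed to be precomputed.
    Class k covers [breaks[k], breaks[k + 1]); values outside the outer
    boundaries are clamped to the first or last class.
    """
    n_classes = len(breaks) - 1
    k = bisect_right(breaks, value) - 1
    return min(max(k, 0), n_classes - 1)


# Illustrative boundaries for five classes (not taken from the paper).
breaks = [220.0, 250.0, 270.0, 285.0, 295.0, 310.0]
greb_series = [248.0, 251.0, 284.9, 296.0]
print([to_state(v, breaks) for v in greb_series])  # [0, 1, 2, 4]
```

The resulting state sequences of the GREB model can then be compared with the observed states in the same way as the Bayesian network states.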
Regarding the number of classifications, the total amount of data remains unchanged, so as the number of classifications increases, the amount of training data per classification decreases, which reduces the simulation accuracy of both methods. This implies that the accuracy of the simulation can be stabilized at a high level when there is enough training data in a long-period simulation.
Figures A2 and A3 show the spatial distribution of the state accuracy of the Bayesian network simulation and of the GREB model simulation, which provides a basis for the subsequent selection of regions in which the GREB simulation data are recalculated based on the Bayesian network simulation result. The state accuracy of the Bayesian network simulation result is relatively uniform in space and has no obvious spatial characteristics. In contrast, the state accuracy of the GREB model shows clear latitudinal differentiation. The state accuracy is therefore averaged along the latitudinal direction, as shown in Fig. A4. The variances of the state accuracy of the Bayesian network simulation result in the six cases are 0.016 (BN5T_{surf}), 0.014 (BN7T_{surf}), 0.014 (BN9T_{surf}), 0.017 (BN5T_{atmos}), 0.008 (BN7T_{atmos}), and 0.004 (BN9T_{atmos}), and the variances of the state accuracy of the GREB model simulation result in the six cases are 0.089 (GREB5T_{surf}), 0.070 (GREB7T_{surf}), 0.060 (GREB9T_{surf}), 0.077 (GREB5T_{atmos}), 0.054 (GREB7T_{atmos}), and 0.036 (GREB9T_{atmos}). The variances indicate that the state accuracy of the Bayesian networks fluctuates much less along the latitude direction than that of the GREB model. This means that Bayesian networks have a wide range of applications in global climate state simulation.
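The latitudinal averaging and the variance comparison can be sketched as follows. This is an illustration under stated assumptions: the paper does not say whether a population or sample variance was used (the population variance is used here), and the array layout `accuracy_grid[j][i]` (latitude row j, longitude column i), the function names, and the toy values are hypothetical.

```python
from statistics import mean, pvariance


def zonal_mean(accuracy_grid):
    """Average the state accuracy over longitude for each latitude row."""
    return [mean(row) for row in accuracy_grid]


def latitudinal_variance(accuracy_grid):
    """Variance of the zonal-mean state accuracy across latitude rows
    (population variance; an assumption, see the lead-in)."""
    return pvariance(zonal_mean(accuracy_grid))


# Toy grids with 3 latitude rows x 2 longitude columns (illustrative only).
uniform = [[0.80, 0.82], [0.79, 0.81], [0.80, 0.80]]  # spatially even accuracy
banded = [[0.30, 0.40], [0.90, 0.95], [0.35, 0.45]]   # strong latitude dependence

print(latitudinal_variance(uniform))
print(latitudinal_variance(banded))
```

A spatially uniform accuracy field yields a much smaller variance than a latitude-banded one, which is the contrast the reported variances express for the Bayesian networks and the GREB model.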
Although the Bayesian networks have a higher state accuracy in both simulations, we also found that, in the surface temperature simulation, the state accuracy of the GREB model in the range of 30^{∘} S to 30^{∘} N tends to be higher than that of the Bayesian networks for classification numbers of 5, 7, and 9. We therefore consider that the GREB model accurately represents the surface temperature in this range and that there are no abrupt regional changes that it cannot express. In the subsequent optimization, only the ranges of 30 to 90^{∘} N and 30 to 90^{∘} S are selected as the optimization region for the surface temperature simulation.
Based on the above comparative analysis of the state accuracy of Bayesian network simulations and the state accuracy of GREB model simulations, the range of 30 to 90^{∘} N and 30 to 90^{∘} S was selected as the empirical parameter for the range of subsequent data recalculation in surface temperature simulations, and the global range was selected as the range of data recalculation in temperature of the atmosphere simulations.
The improved method in this paper was implemented in MATLAB R2021a. The code of the improved method used in this paper is archived on Zenodo (https://doi.org/10.5281/zenodo.7886620; Liu, 2023). The original GREB model uses the model code from the Monash Simple Climate Model (MSCM) laboratory repository and is run in Fortran. The model code is available from https://doi.org/10.5281/zenodo.2232282 (Stassen, 2018).
The data used in this paper are archived on Zenodo (https://doi.org/10.5281/zenodo.7886620; Liu, 2023). The data used for the analysis in this paper have been preprocessed, and the original data are available from the National Centers for Environmental Prediction (NCEP) and the National Center for Atmospheric Research (NCAR) at https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.pressure.html (NOAA Physical Sciences Laboratory, 2021).
ZL, ZZ, and WL conceived the paper's ideas and designed the methods. ZL, ZW, and JW implemented the methods of the paper with the code. ZL, DL, and WL wrote the paper with considerable input from ZY and LY. ZW revised and checked the language of an earlier draft of the paper.
The contact author has declared that none of the authors has any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This research has been supported by the National Natural Science Foundation of China (grant nos. 41976186, 42230406, and 42130103), the Postdoctoral Science Foundation of China (grant no. 2021M702757), and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (grant no. KYCX22_1578).
This research has been supported by the National Natural Science Foundation of China (grant nos. 41976186, 42230406 and 42130103), the Postdoctoral Science Foundation of Jiangsu Province (grant no. 2021M702757), and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (grant no. KYCX22_1578).
This paper was edited by Rohitash Chandra and reviewed by two anonymous referees.
Akgul, Y. S. and Kambhamettu, C.: A coarse-to-fine deformable contour optimization framework, IEEE T. Pattern Anal., 25, 174–186, 2003.
Alley, R. B., Emanuel, K. A., and Zhang, F.: Advances in weather prediction, Science, 363, 342–344, https://doi.org/10.1126/science.aav7274, 2019.
Annan, J. D. and Hargreaves, J. C.: Using multiple observationally-based constraints to estimate climate sensitivity, Geophys. Res. Lett., 33, L06704, https://doi.org/10.1029/2005GL025259, 2006.
Bellprat, O. and Doblas-Reyes, F.: Attribution of extreme weather and climate events overestimated by unreliable climate simulations, Geophys. Res. Lett., 43, 2158–2164, https://doi.org/10.1002/2015GL067189, 2016.
Berrocal, V. J., Craigmile, P. F., and Guttorp, P.: Regional climate model assessment using statistical upscaling and downscaling techniques, Environmetrics, 23, 482–492, https://doi.org/10.1002/env.2145, 2012.
Cai, B., Liu, Y., Liu, Z., Tian, X., Zhang, Y., and Ji, R.: Application of Bayesian Networks in Quantitative Risk Assessment of Subsea Blowout Preventer Operations, Risk Anal., 33, 1293–1311, https://doi.org/10.1111/j.1539-6924.2012.01918.x, 2013.
Cai, B., Kong, X., Liu, Y., Lin, J., Yuan, X., Xu, H., and Ji, R.: Application of Bayesian Networks in Reliability Evaluation, IEEE T. Ind. Inf., 15, 2146–2157, https://doi.org/10.1109/TII.2018.2858281, 2019.
Chen, L., Zhang, H., Wu, Q., and Terzija, V.: A Numerical Approach for Hybrid Simulation of Power System Dynamics Considering Extreme Icing Events, IEEE T. Smart Grid, 9, 5038–5046, https://doi.org/10.1109/TSG.2017.2679109, 2018.
Chou, J.: Why do the dynamical models and statistical methods need to be combined? – Also on how to combine, Plateau Meteorol., 5, 77–82, 1986.
Chou, J.: Short term climatic prediction: Present condition, problems and way out, Bimon. Xinjiang Meteorol., 26, 1–4, 2003.
Stassen, C.: christianstassen/grebhydrodevelopgmd: A Hydrological Cycle Model for the Globally Resolved Energy Balance Model (GREB) v1.0 (v1.0), Zenodo [code], https://doi.org/10.5281/zenodo.2232282, 2018.
Dommenget, D.: A simple model perturbed physics study of the simulated climate sensitivity uncertainty and its relation to control climate biases, Clim. Dynam., 46, 427–447, https://doi.org/10.1007/s00382-015-2591-4, 2016.
Dommenget, D. and Flöter, J.: Conceptual understanding of climate change with a globally resolved energy balance model, Clim. Dynam., 37, 2143–2165, https://doi.org/10.1007/s00382-011-1026-0, 2011.
Fan, J., Meng, J., Ludescher, J., Chen, X., Ashkenazy, Y., Kurths, J., Havlin, S., and Schellnhuber, H. J.: Statistical physics approaches to the complex Earth system, Phys. Rep., 896, 1–84, https://doi.org/10.1016/j.physrep.2020.09.005, 2021.
Feng, G. L., Yang, J., Zhi, R., Zhao, J. H., and Sun, G. Q.: Improved prediction model for flood-season rainfall based on a nonlinear dynamics-statistic combined method, Chaos Solitons Fract., 140, 110160, https://doi.org/10.1016/j.chaos.2020.110160, 2020.
Grant, P. R.: Evolution, climate change, and extreme events, Science, 357, 451–452, https://doi.org/10.1126/science.aao2067, 2017.
Huang, J., Chen, W., Wen, Z., Zhang, G., Li, Z., Zuo, Z., and Zhao, Q.: Review of Chinese atmospheric science research over the past 70 years: Climate and climate change, Sci. China-Earth Sci., 62, 1514–1550, https://doi.org/10.1007/s11430-019-9483-5, 2019.
Jansen, R., Yu, H. Y., Greenbaum, D., Kluger, Y., Krogan, N. J., Chung, S. B., Emili, A., Snyder, M., Greenblatt, J. F., and Gerstein, M.: A Bayesian networks approach for predicting protein-protein interactions from genomic data, Science, 302, 449–453, https://doi.org/10.1126/science.1087361, 2003.
Kay, J. E.: Early climate models successfully predicted global warming, Nature, 578, 45–46, https://doi.org/10.1038/d41586-020-00243-w, 2020.
Kutzbach, J. E., He, F., Vavrus, S. J., and Ruddiman, W. F.: The dependence of equilibrium climate sensitivity on climate state: Applications to studies of climates colder than present, Geophys. Res. Lett., 40, 3721–3726, https://doi.org/10.1002/grl.50724, 2013.
Liu, Z.: An improved method of the Globally Resolved Energy Balance Model by the Bayes network, Zenodo [code and data set], https://doi.org/10.5281/zenodo.7886620, 2023.
Ludescher, J., Martin, M., Boers, N., Bunde, A., Ciemer, C., Fan, J., Havlin, S., Kretschmer, M., Kurths, J., Runge, J., Stolbova, V., Surovyatkina, E., and Schellnhuber, H. J.: Network-based forecasting of climate phenomena, P. Natl. Acad. Sci. USA, 118, e1922872118, https://doi.org/10.1073/pnas.1922872118, 2021.
Maher, P.: Bayesian probability, Synthese, 172, 119–127, https://doi.org/10.1007/s11229-009-9471-6, 2010.
Mahlstein, I., Spirig, C., Liniger, M. A., and Appenzeller, C.: Estimating daily climatologies for climate indices derived from climate model data and observations, J. Geophys. Res.-Atmos., 120, 2808–2818, https://doi.org/10.1002/2014JD022327, 2015.
NOAA Physical Sciences Laboratory: NCEP-NCAR Reanalysis 1, NOAA Physical Sciences Laboratory [data set], https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.html, last access: 13 January 2021.
Pal, R. and Bhattacharya, S.: Characterizing the effect of coarse-scale PBN modeling on dynamics and intervention performance of genetic regulatory networks represented by stochastic master equation models, IEEE T. Signal Process., 58, 3341–3351, 2010.
Pearl, J.: Fusion, propagation, and structuring in belief networks, Artificial intelligence, 29, 241–288, https://doi.org/10.1016/0004-3702(86)90072-x, 1986.
Sahin, O., Stewart, R. A., Faivre, G., Ware, D., Tomlinson, R., and Mackey, B.: Spatial Bayesian Network for predicting sea level rise induced coastal erosion in a small Pacific Island, J. Environ. Manage., 238, 341–351, https://doi.org/10.1016/j.jenvman.2019.03.008, 2019.
Stassen, C., Dommenget, D., and Loveday, N.: A hydrological cycle model for the Globally Resolved Energy Balance (GREB) model v1.0, Geosci. Model Dev., 12, 425–440, https://doi.org/10.5194/gmd-12-425-2019, 2019.
Vogel, M. M., Zscheischler, J., Fischer, E. M., and Seneviratne, S. I.: Development of Future Heatwaves for Different Hazard Thresholds, J. Geophys. Res.-Atmos., 125, e2019JD032070, https://doi.org/10.1029/2019JD032070, 2020.
Yibo, Y., Wensheng, Y., and Yu, Z.: Modeling and analysis of the coupling effect for one kind of coarse-fine stages, in: 2009 IEEE International Conference on Control and Automation, Christchurch, New Zealand, 2011–2014, IEEE, 2009.
Zhang, F., Sun, Y. Q., Magnusson, L., Buizza, R., Lin, S.-J., Chen, J.-H., and Emanuel, K.: What Is the Predictability Limit of Midlatitude Weather?, J. Atmos. Sci., 76, 1077–1091, https://doi.org/10.1175/JAS-D-18-0269.1, 2019.
Zhu, A.-X., Lu, G., Liu, J., Qin, C.-Z., and Zhou, C.: Spatial prediction based on Third Law of Geography, Ann. GIS, 24, 225–240, 2018.
Zou, Y., Donner, R. V., Marwan, N., Donges, J. F., and Kurths, J.: Complex network approaches to nonlinear time series analysis, Phys. Rep., 787, 1–97, https://doi.org/10.1016/j.physrep.2018.10.005, 2019.