Local spatiotemporal nonstationarity occurs in various natural
and socioeconomic processes. Many studies have attempted to introduce time
as a new dimension into a geographically weighted regression (GWR) model,
but the actual results are sometimes not satisfying or even worse than the
original GWR model. The core issue here is a mechanism for weighting the effects
of both temporal variation and spatial variation. In many geographical and
temporal weighted regression (GTWR) models, the concept of time distance has
been inappropriately treated as a time interval. Consequently, the combined
effect of temporal and spatial variation is often inaccurate in the
resulting spatiotemporal kernel function. This limitation restricts the
configuration and performance of spatiotemporal weights in many existing
GTWR models. To address this issue, we propose a new spatiotemporal weighted
regression (STWR) model and the calibration method for it. A highlight of
STWR is a new temporal kernel function, wherein the method for temporal
weighting is based on the degree of impact from each observed point to a
regression point. The degree of impact, in turn, is based on the rate of
value variation of the nearby observed point during the time interval. The
updated spatiotemporal kernel function is based on a weighted combination of
the temporal kernel with a commonly used spatial kernel (Gaussian or
bi-square) by specifying a linear function of spatial bandwidth versus time.
Three simulated datasets of spatiotemporal processes were used to test the
performance of GWR, GTWR, and STWR. Results show that STWR significantly
improves the quality of fit and accuracy. Similar results were obtained by
using real-world data for precipitation hydrogen isotopes (

Time, space, and attributes are three essential characteristics of geographic entities, and they are recorded to reflect the state and evolution of various real-world phenomena and processes. Because space and time frame all aspects of the discipline of geography (Goodchild, 2013), it is important to observe spatiotemporal variations and explore appropriate analytical methods to study their internal mechanisms and evolutionary laws. In recent years, new platforms and instruments have brought increasingly massive spatiotemporal data, such as time- and geo-tagged sensor monitoring records and remote sensing images. These big data create great opportunities for studying human and environmental dynamics from different perspectives, such as the patterns of human behavior (Chen et al., 2011), environmental risk assessment (Sun et al., 2015), and disease outbreaks (Takahashi et al., 2008). Nevertheless, although spatiotemporal modeling has been a long-term research focus in the field of geographical information science (GIScience) (Cressie, 199; Cressie and Wikle, 2015), the models are not yet mature and challenges remain (Fotheringham et al., 2015), which calls for further work.

In this paper, the technological development and discussion focus on modeling local spatiotemporal variations within the framework of geographically weighted regression (GWR). GWR is a method for modeling spatially heterogeneous processes (Brunsdon et al., 1996, 1998; Fotheringham et al., 2003). It has been applied in a variety of areas, such as climate science (Brown et al., 2012), geology (Atkinson et al., 2003), mineral exploration (Wang et al., 2015), transportation analysis (Cardozo et al., 2012), crime studies (Cahill and Mulligan, 2007; Wheeler and Waller, 2009), environmental science (Mennis and Jordan, 2005), and house price modeling (Fotheringham et al., 2015). GWR calibrates a separate regression model at each location through a data-borrowing scheme in which each local model draws on the observations neighboring the regression point and weights them by their distance from it (Fraser et al., 2012). This operation complies with Tobler's first law of geography – “Everything is related to everything else, but near things are more related than distant things” (Tobler, 1970).
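The data-borrowing scheme can be illustrated with a minimal sketch of one local calibration. It assumes a Gaussian distance-decay kernel with a fixed bandwidth and synthetic data; the function names (`gaussian_kernel`, `gwr_fit_at`) and all numerical values are hypothetical choices made for illustration and are not taken from the STWR source code.

```python
import numpy as np

def gaussian_kernel(dist, bandwidth):
    """Gaussian distance-decay weights in (0, 1]; closer points weigh more."""
    return np.exp(-0.5 * (dist / bandwidth) ** 2)

def gwr_fit_at(point, coords, X, y, bandwidth):
    """Weighted least squares at a single regression point (illustrative only)."""
    dist = np.linalg.norm(coords - point, axis=1)
    w = gaussian_kernel(dist, bandwidth)
    Xd = np.column_stack([np.ones(len(X)), X])   # design matrix with intercept
    XtW = Xd.T * w                               # equivalent to Xd.T @ diag(w)
    return np.linalg.solve(XtW @ Xd, XtW @ y)    # local coefficient vector

# Synthetic process whose slope increases from west to east (assumed for the demo).
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
slope = 1.0 + 0.3 * coords[:, 0]
y = 2.0 + slope * x + rng.normal(scale=0.1, size=200)

beta_west = gwr_fit_at(np.array([1.0, 5.0]), coords, x[:, None], y, bandwidth=1.5)
beta_east = gwr_fit_at(np.array([9.0, 5.0]), coords, x[:, None], y, bandwidth=1.5)
print("local slope (west):", beta_west[1])
print("local slope (east):", beta_east[1])
```

Because each regression point receives its own weight vector, the two local slopes differ even though a single global regression would return only one value.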

Numerous studies have been devoted to incorporating the temporal dimension
into spatial regression (Pace et al., 2000; Gelfand et al., 2004; Crespo
et al., 2007; Cressie and Wikle, 2015). However, most of these studies
assume that temporal effects are constant over space from a global
perspective of modeling (Fotheringham et al., 2015). To address
that issue, Crespo et al. (2007) extended GWR by developing
spatiotemporal bandwidths that account for varying local spatial effects
across time. Huang et al. (2010) and Wu et al. (2014) proposed a geographical and
temporal weighted regression (GTWR) model with a method of measuring the
spatiotemporal “closeness” and a parameter ratio

A spatiotemporal kernel function, which consists of mixed spatial and time decay bandwidths, was proposed by Fotheringham et al. (2015). Nevertheless, the stepwise strategy that this function applies for bandwidth optimization is not always reasonable. In practice, the function first finds and fixes an optimized spatial bandwidth and then finds the optimized temporal bandwidth; only after that is the spatiotemporal weight calculated. This stepwise search means that the function cannot optimize the temporal and spatial bandwidths at the same time. A more reasonable view is that the spatiotemporal bandwidth and its weight are simultaneously affected by both the spatial and temporal effects of a process. The spatiotemporal kernel function in Fotheringham et al. (2015) can therefore be improved further.
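The difference between a stepwise and a simultaneous search can be seen in a toy example. The score function below is purely hypothetical (a quadratic with a space-time interaction term standing in for a CV or AICc surface), and the starting temporal bandwidth is an arbitrary assumption; the sketch is not the procedure of Fotheringham et al. (2015), only an illustration of why fixing one bandwidth before searching the other can miss the joint optimum.

```python
import itertools

def toy_score(h_s, h_t):
    """Hypothetical model-selection score (lower is better)."""
    u, v = h_s - 3.0, h_t - 2.0
    return u ** 2 + v ** 2 + 1.2 * u * v   # interaction term couples the two bandwidths

grid = range(1, 7)     # candidate bandwidths 1..6 for both dimensions
h_t_start = 5          # assumed initial temporal bandwidth for the stepwise search

# Stepwise: optimize the spatial bandwidth first, fix it, then the temporal one.
h_s_step = min(grid, key=lambda h_s: toy_score(h_s, h_t_start))
h_t_step = min(grid, key=lambda h_t: toy_score(h_s_step, h_t))
stepwise_best = (h_s_step, h_t_step)

# Simultaneous: search all (spatial, temporal) pairs together.
joint_best = min(itertools.product(grid, grid), key=lambda p: toy_score(*p))

print("stepwise:", stepwise_best, toy_score(*stepwise_best))
print("joint:   ", joint_best, toy_score(*joint_best))
```

On the same grid, the joint search can never score worse than the stepwise one, because the stepwise result is itself one of the candidate pairs.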

The aim of this paper is to develop a better methodology for the spatiotemporal kernel function. Following Tobler's first law, we propose an algorithm called spatiotemporal weighted regression (STWR). In STWR, points that are closer in time and space are assumed to be more strongly related in their velocity of value change. Therefore, STWR can borrow data not only from nearby locations, but also from nearby value variation through time. The latter is what we call “time distance” in STWR. The time distance is not a time interval but the rate of value variation through time. It is a kind of value change that reflects the temporal effect of nearby points on the regression point. Accordingly, our local spatiotemporal regression analysis model can use the variation in data to identify temporal nonstationarity, which is an advantage over GWR and GTWR.

Before giving more details about STWR, we clarify the meaning of a few concepts. A common issue in existing GTWR models is that they use the concept of a time interval, instead of the abovementioned time distance, to calculate temporal and spatiotemporal weights. A time interval is the period between two observed time stages. A time distance, in the context of STWR, is the rate of value variation between an observed point and a regression point through a time interval. Consider the following scenario for a group of points. The values of some points do not change or change only slightly from time A to time B, while a few other points change greatly in that period. However, many GTWR models ignore the difference in the value changes of observed points during a period of time and regard all these points as having the same temporal effect on their neighboring regression point. It is hard to believe that unchanged observations constantly affect their nearby regression points during the observed time interval. Intuitively, different variations of the observed points have different temporal effects. For example, the faster the house price of a point changes, the stronger the temporal effect is on the house price at its nearby point. Moreover, the rate of value change at different observed points (temporal nonstationarity) may also show spatial heterogeneity. The data values observed at different points are results of mixed spatiotemporal effects and some other unknown factors (including errors). Therefore, using only the time interval in the calculation of temporal and spatiotemporal weights might imprecisely interpret the local spatiotemporal effect.
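The scenario above can be made concrete with a few numbers. In this sketch, three hypothetical observed points share the same time interval but change by different amounts; the Gaussian interval-based weight and its bandwidth are assumptions chosen only to show that an interval-only weight cannot tell the points apart. The rate computed here illustrates the idea of value variation through time and is not the STWR temporal kernel itself.

```python
import numpy as np

delta_t = 10.0                              # same time interval for every point
y_past = np.array([100.0, 100.0, 100.0])    # values at time A (hypothetical)
y_now = np.array([100.0, 102.0, 150.0])     # values at time B (hypothetical)

# Interval-only temporal weight: identical for all three points.
temporal_bandwidth = 20.0                   # assumed bandwidth, illustration only
w_interval = np.exp(-0.5 * (np.full(3, delta_t) / temporal_bandwidth) ** 2)

# Rate of value variation over the interval: differs from point to point.
rate = np.abs(y_now - y_past) / delta_t

print("interval-based weights:", w_interval)
print("rates of value change: ", rate)
```

The interval-based weights are equal, whereas the rates separate the unchanged point from the slowly and the rapidly changing ones.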

There are other issues in the temporal kernel functions and the multiplicative form of spatial and temporal kernels used by existing GTWR models (Huang et al., 2010; Wu et al., 2014; Fotheringham et al., 2015). When calculating the spatiotemporal effect, these models generally use time intervals and common kernel functions, such as a Gaussian or bi-square kernel, to calculate temporal weights. However, an appropriate temporal kernel function should not be the same as the spatial kernel function because space has two or three dimensions, while time has one dimension and one direction. Each regression point can borrow observed points from any direction in space but can only use points from the past rather than from the future. Moreover, the integrated spatiotemporal weights might be underestimated in these GTWR models because they multiply the spatial and temporal weights. Because both the spatial weights and the temporal weights range from 0 to 1, the product is never larger than the smaller of the two factors, which means that the composite spatiotemporal impact is never greater than either the spatial impact or the temporal impact alone. However, the real combined spatiotemporal impact may be higher than the spatial impact or the temporal impact alone, or at least higher than the smaller of the two. The multiplicative formulation of a spatiotemporal kernel in GTWR also makes the calculated weight decay faster.
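The bound on multiplied weights is easy to check numerically. The sketch below draws random spatial and temporal weights in [0, 1], multiplies them, and compares the product with the smaller factor; for contrast it also forms a simple convex combination with a hypothetical mixing parameter `alpha`. That combination stands in for the general idea of a weighted combination and is an assumption, not the exact STWR kernel.

```python
import numpy as np

rng = np.random.default_rng(42)
w_s = rng.uniform(0, 1, 1000)     # spatial weights in [0, 1]
w_t = rng.uniform(0, 1, 1000)     # temporal weights in [0, 1]

w_product = w_s * w_t             # multiplicative spatiotemporal weight
smaller = np.minimum(w_s, w_t)

alpha = 0.5                       # hypothetical mixing parameter
w_combined = (1 - alpha) * w_s + alpha * w_t

print("product exceeds smaller factor anywhere?", bool(np.any(w_product > smaller)))
print("combination below smaller factor anywhere?", bool(np.any(w_combined < smaller)))
print("example: w_s=0.6, w_t=0.5 -> product", 0.6 * 0.5,
      "combination", (1 - alpha) * 0.6 + alpha * 0.5)
```

The product never exceeds the smaller factor, while the weighted combination always lies between the two factors.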

The abovementioned limitations and issues in GWR and GTWR are the driving forces behind our development of STWR. The remainder of this article is organized as follows. Section 2 introduces the STWR model formulation, including temporal kernel and spatiotemporal kernel functions. Section 3 describes the methods for bandwidth selection and calibration when STWR is in operation. Section 4 presents results of applying GWR, GTWR, and STWR to three sets of simulated data. Section 5 presents experiment results with real-world precipitation hydrogen isotope data. In Sect. 6, we close the article with a summary of the key findings and a few thoughts for future research.

Since GWR is the background of our work, it is helpful to first give a brief
overview of the GWR framework. The basic formulation of GWR can be described
in the two equations below (Fotheringham et al., 2003).

Spatiotemporal impacts of observed points with different rates of
value change on a regression point at time stage

GWR has a strategy of spatial distance decay impact on a regression point
(Brunsdon et al., 1998; Fotheringham et al., 2003). A similar “time
distance decay” strategy was also discussed in several recent GTWR models
(Crespo et al., 2007; Huang et al., 2010; Wu et al., 2014; Fotheringham et
al., 2015). Yet, those models did not fully reflect the effect of time
distance decay. Sample points are observed at different time stages, and
those data points closer in time distance to a regression point have more
impact on the regression point than those farther away. The time distance
refers to the value variation rate between an observed point and a
regression point during a certain time interval. For example, in Fig. 1,
there are four time stages from old to new:

Compared to existing GTWR models, the time distance decay strategy of STWR
considers the effect of different variations of observed points through
time. For example, some data points may have a higher impact on the regression
point, though their spatial distance is farther than other points. Figure 1
illustrates the fact that the locations of some star-shaped points are farther away
from the regression point than some pentagon-shaped points at time stage

We assume that a set of observed points

The time distance, as mentioned above, is not the time interval but the rate
of value variation between an observed point and a regression point through
a time interval. Following the time distance decay strategy in STWR, we can
further derive the temporal kernel

To calibrate the weight value

Some goodness-of-fit diagnostics (Loader, 1999) are widely used in
general GWR-based models, such as the cross-validation (CV) score
(Cleveland, 1979; Bowman, 1984) and the Akaike information
criterion (AIC) (Akaike, 1973, 1998). For STWR, we use
cross-validation (CV) as the default search criterion, and we also
calculate the value of a corrected version of AIC (Hurvich et
al., 1998), the AICc, which is defined below.
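For orientation, the corrected AIC is often written for GWR-type models as AICc = 2n ln(sigma) + n ln(2 pi) + n (n + tr(S)) / (n - 2 - tr(S)), where n is the number of observations, sigma is the estimated standard deviation of the error term, and tr(S) is the trace of the hat matrix. The sketch below implements that commonly quoted form; the function name `gwr_aicc` is hypothetical, and the notation may differ in detail from the definition used in this paper.

```python
import numpy as np

def gwr_aicc(rss, n, trace_s):
    """AICc in the form commonly quoted for GWR-type models (lower is better).

    rss     : residual sum of squares of the fitted model
    n       : number of observations
    trace_s : trace of the hat matrix (effective number of parameters)
    """
    sigma_hat = np.sqrt(rss / n)   # maximum-likelihood estimate of the error s.d.
    return (2.0 * n * np.log(sigma_hat)
            + n * np.log(2.0 * np.pi)
            + n * (n + trace_s) / (n - 2.0 - trace_s))

# Same fit quality, different model complexity: the larger trace is penalized.
print(gwr_aicc(rss=50.0, n=100, trace_s=10.0))
print(gwr_aicc(rss=50.0, n=100, trace_s=20.0))
```

A smaller bandwidth usually lowers the residual sum of squares but raises tr(S), so the criterion trades goodness of fit against local model complexity.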

Although there is no need to optimize the spatial bandwidth

Weight matrix

Calibration of STWR models can be conducted by using weighted least
squares. The estimator for the coefficients at location

In order to obtain the optimized

The optimization procedure is to traverse the set

Three simulated initial surfaces for representing the spatial heterogeneity of parameters.

In this paper, STWR is used to predict the current values of regression
points with known coordinates. The prediction formulas of STWR are more
complicated than GWR because the spatial distance is calculated directly
from the regression point to each observed data point, while the time
distance between the regression point and the data points observed in the
past cannot be calculated directly. Therefore, we specify a few steps for prediction in STWR. First, we need to have the optimized initial spatial
bandwidth

To verify the performance of STWR and compare it with that of GWR and GTWR, several groups of simulated data were used in this study to represent different types of heterogeneity in space and time. All the data and code used in the experiments are shared on GitHub. Web links are provided at the end of this paper.

For GTWR, we compared only against the results generated by the algorithms of Huang
et al. (2010) and Wu et al. (2014) because we could not find a software
package for Fotheringham et al. (2015). The data-generating process (DGP) and
the spatial heterogeneity are introduced here. The basic DGP is a linear
model shown in Eq. (1), and the study area is a regular

Our goal for this experiment was to test model performance by using sample
data from the simulation process at different times. Three case studies were
designed for different situations. Besides the spatial heterogeneity trends,
in our simulation design we assumed that the mean values of two independent
variables

Three heterogeneity trend surfaces.

We compared the results of OLS, GWR, GTWR, and STWR. A total of 333 random
sample points for five time stages
(

Results of case study 1 at time stage

The time interval of observations in case study 1 was one unit, such as 1 s or 1 d. The value changes of

Results of case study 2 at time stage

The time interval of observations in case study 2 was 10 units. The value
change of

STWR utilized data from the latest three time stages to calibrate the model.
The initial spatial bandwidth

Comparing prediction results of STWR and GWR in case study 1. Images

Comparing prediction results of STWR and GWR in case study 2. Images

The time interval of observations in case study 3 was 200 units. In both
case studies 1 and 2, the coefficients in Eq. (1) were unchanged. In
contrast, in case study 3, three surfaces of coefficients changed over time,
which were generated by the trends

Dynamic process of three surfaces of coefficients and the y_true surface at five different time stages.

Results of these comparisons in case study 3 show that STWR outperforms both
GWR and GTWR in the accuracy of the model and the effectiveness of the simulation process
(Fig. 8a). Along with the change in the coefficients and the increase in

It may seem strange that GWR can outperform GTWR (Fig. 8), but that is reasonable for the process in case study 3. The change in this process is faster, and the time interval of observations is longer than in the previous case studies. STWR is able not only to deal with time intervals but also to make full use of the value variation of observed points for calibration. In contrast, GTWR uses only the time interval information and all the observed points for calibration, which may cause problems when the observed values differ significantly in spatial distribution or the time intervals are long. GTWR uses points from previous time stages without considering their variation, so if the actual values at the current time stage are quite different from previous observations, the point values used to calibrate GTWR are smoothed. Thus, GWR outperforms GTWR in this situation because GWR uses only the current data points for model calibration.

Comparing and evaluating the performance of GWR, GTWR, and STWR at
five time stages.

Comparing prediction results of STWR and GWR in case study 3. Images

STWR is better for estimation than GWR and GTWR because its sigma value is
much smaller. As shown in Fig. 8b, the sigma of STWR was half that of GWR at time
stage

At

Optimized bandwidths (or initial bandwidths) of GWR, GTWR, and STWR for the five time stages in case study 3.

As Fig. 10 shows, the optimized bandwidths are quite different among these
models, and the bandwidths of GWR and GTWR are larger than the initial
bandwidth of STWR at each time stage. The optimized bandwidth for each time
stage refers to an optimized number of the nearest neighbors (see Sect. 3.3). As GTWR considers all the nearest neighbors from different time
stages, the optimized numbers of the nearest neighbors (bandwidth) grow
fast and exceed the GWR model at time stage

To further test the performance of STWR, we used data on precipitation

In the experiments, we collected a total of 782 measurements from 116 sites
located in the northeastern United States during the 3 d period and
prepared the data on a daily average. The daily precipitation, mean
temperature, and elevation were used as explanatory variables. The model
derived from Eq. (1) is represented below.

Similar to the experiments on three simulation datasets, the result here
shows that STWR outperforms GTWR and GWR. In the experiment, the number of
optimized initial neighbors of STWR was smaller than that of GWR and GTWR.
The optimized

Results of model performance with real-world data.

LOOCV results of STWR and GWR.

Predicted

We adopted leave-one-out cross-validation (LOOCV) at D3 for the comparison between STWR and GWR. The squared errors (SEs) of prediction are shown in Fig. 11. The prediction results of STWR are better than those of GWR for most points, and the mean SE of STWR is smaller than that of GWR. Moreover, the SE of STWR shows a narrower regional trend, which indicates that STWR is more robust than GWR. In addition, the total SSEs of GWR and STWR are 50 216.510 and 39 724.995, respectively. Therefore, the result further supports the conclusion that the quality of prediction in STWR is better than in GWR.
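The LOOCV quantities reported here (per-point SE, mean SE, and total SSE) can be computed as in the following sketch. It uses a simple Gaussian-weighted local regression on synthetic data as a stand-in model; the function names, the bandwidth, and the data are all hypothetical and do not reproduce the STWR or GWR implementations or the isotope dataset.

```python
import numpy as np

def local_predict(target_xy, target_x, coords, X, y, bandwidth):
    """Predict at one location from a Gaussian-weighted local regression."""
    dist = np.linalg.norm(coords - target_xy, axis=1)
    w = np.exp(-0.5 * (dist / bandwidth) ** 2)
    Xd = np.column_stack([np.ones(len(X)), X])
    XtW = Xd.T * w
    beta = np.linalg.solve(XtW @ Xd, XtW @ y)
    return beta[0] + target_x @ beta[1:]

def loocv_squared_errors(coords, X, y, bandwidth):
    """Hold out each point in turn, predict it from the rest, return the SEs."""
    n = len(y)
    se = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i      # mask that drops the held-out point
        pred = local_predict(coords[i], X[i], coords[keep], X[keep], y[keep], bandwidth)
        se[i] = (y[i] - pred) ** 2
    return se

# Synthetic data with a spatially varying slope (assumed for the demo).
rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(60, 2))
X = rng.normal(size=(60, 1))
y = 1.0 + (0.5 + 0.2 * coords[:, 1]) * X[:, 0] + rng.normal(scale=0.1, size=60)

se = loocv_squared_errors(coords, X, y, bandwidth=2.0)
sse = se.sum()
mean_se = se.mean()
print("mean SE:", mean_se, "total SSE:", sse)
```

Running the same routine for two competing models on identical held-out points gives directly comparable mean SE and SSE values.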

In Fig. 12, the predicted

Spatiotemporal data analysis is important in many scientific studies. Due to the complexity of spatiotemporal models, the spatiotemporal effect may not be fully taken into account when the temporal and spatial information is manipulated simultaneously. In particular, models for the effect of spatial dynamics should not be simply adapted for modeling the effect of temporal dynamics. Although the GTWR model can borrow points from the recent past, without careful consideration of the temporal effect, the performance of GTWR may be even worse than that of GWR. Increasingly, scientific issues are not just about spatial nonstationarity but involve many spatiotemporal processes. It is necessary to review the limitations of current spatiotemporal models and make new extensions. The aim of the STWR model developed in this study is to advance work and discussion in that direction.

Based on a concept similar to GWR, a recently proposed model, called
geographically neural network weighted regression (GNNWR) (Du et al., 2020),
utilizes both OLS and neural networks to evaluate spatial nonstationarity.
It is characterized by a designed spatially weighted neural network (SWNN)
that can represent the spatial nonstationary weight matrix in spatial
processes. Additionally, a geographically and temporally neural network
weighted regression (GTNNWR) model (Wu et al., 2020), which is a temporal
extension of GNNWR, was also proposed by the same group for further
modeling spatiotemporal nonstationary relationships. GTNNWR can generate a
space–time distance by utilizing the so-called spatiotemporal proximity
neural network (STPNN), which may address complex nonlinear interactions
between time and space. Although both STWR and GTNNWR have the potential to
handle complex spatiotemporal nonstationarity in various natural and
socioeconomic processes, their principles and interpretability are
different.

The basic formulation of GNNWR is defined as Eq. (22) (Du et al.,
2020), which is different from Eq. (1) (Fotheringham et al., 2003). The

GTNNWR and GNNWR use the proposed ANN-based method (Eq. 23) (Du et
al., 2020) to calculate the weighted matrix, which is quite different from
the kernel functions used in GWR and STWR models. Although GTNNWR and GNNWR
use the idea of pointwise regression, they do not consider how to “borrow
points” from nearby neighbors and do not have the concept of bandwidth.
Without a spatial bandwidth, all observation points in the study area may have
impacts on the regression point, which might violate Tobler's first law
of geography (Tobler, 1970). It may be difficult to understand the
relationships between the influence weight and the spatial distances,
especially when the study area and the data amounts are large. STWR has
spatial bandwidths and follows Tobler's first law of geography, which
can help analyze the affected range of local regression points.
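The role of a spatial bandwidth in delimiting the affected range can be seen by comparing the two spatial kernels mentioned in this paper. The sketch below uses the usual textbook forms of the bi-square and Gaussian kernels with a fixed bandwidth; the function names and distances are hypothetical. With the bi-square kernel, observations at or beyond the bandwidth receive exactly zero weight, whereas Gaussian weights decay but never reach zero.

```python
import numpy as np

def bisquare_kernel(dist, bandwidth):
    """Bi-square kernel: weight is exactly 0 at and beyond the bandwidth."""
    d = np.asarray(dist, dtype=float)
    w = (1.0 - (d / bandwidth) ** 2) ** 2
    return np.where(d < bandwidth, w, 0.0)

def gaussian_kernel(dist, bandwidth):
    """Gaussian kernel: weight decays smoothly but stays positive."""
    d = np.asarray(dist, dtype=float)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

dist = np.array([0.0, 1.0, 2.0, 4.0])   # distances to the regression point
w_bisq = bisquare_kernel(dist, bandwidth=2.0)
w_gauss = gaussian_kernel(dist, bandwidth=2.0)

print("bi-square:", w_bisq)
print("Gaussian: ", w_gauss)
```

The set of observations with nonzero bi-square weight is the neighborhood that actually influences a local regression, which is what makes the affected range directly readable from the bandwidth.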

For GTNNWR and GNNWR, the data points are divided into a training set (including a validation set) and a test set, which might require more data points. These models may therefore not be appropriate when only a few data points are available (data acquisition in many geoscience processes is difficult and costly). STWR and GWR do not need such a split and thus require fewer data points than GNNWR and GTNNWR.

Although GTNNWR utilizes a method called spatiotemporal proximity neural
network (STPNN) (Wu et al., 2020) to calculate the spatiotemporal distance,
the obtained integrated spatiotemporal distance lacks explanation, and
it is also impossible to tell which parts of the calculated weight are
affected by time or space. There is also no concept of a temporal
bandwidth in GTNNWR. Therefore, it fails to provide information on the earliest time
(stage) at which the observed points start to exert an impact on the
determination of the regression point. But STWR has a temporal bandwidth, and
it can distinguish the strength of temporal weight and spatial weight.
Therefore, we can analyze the characteristics of the local interaction of
time and space according to the temporal bandwidth, spatial bandwidth, and
the adjustment parameter

Although STWR performed well in our experiments, the model can still be
further extended. A major topic is the time distance. In current
STWR, the time distance represents the rate of value variation between an
observed point and a regression point through a time interval. Nevertheless,
we can also use time distance to represent the rate of value variation at
each observed point object through time. Note that, from an object-oriented
perspective, here we differentiate the point objects from locations,
although the point objects have geospatial coordinates as part of their
attributes. Following that new definition of time distance, the

The location of an observed point object

The location of

There are other possibilities for the further improvement of STWR. The first
involves the optimization of

In short, the core contribution of STWR is the clarification of the “time distance” concept and the new temporal kernel and spatiotemporal kernel functions based on this concept. Our experiments show that STWR outperforms GWR and GTWR in analyzing and interpreting local spatiotemporal nonstationarity. We hope STWR can bring fresh ideas and new capabilities for spatiotemporal data analysis in many disciplines.

The Python source code of STWR v1.0, the data used in the experiments, and
all the case studies (written in Jupyter Notebook) were archived on Zenodo
and made freely accessible via

XQ, XM, and CM developed the algorithm. XQ implemented and coded the algorithm. XQ prepared the paper with contributions from all co-authors.

The authors declare that they have no conflict of interest.

The authors thank Stewart Fotheringham and other colleagues at the Spatial Analysis Research Center (SPARC) at Arizona State University for their insightful comments and suggestions during a seminar about the STWR model. The authors also thank two anonymous reviewers for their constructive comments and suggestions on the earlier versions of this paper.

The research presented in this paper was partially supported by the National Science Foundation under grant nos. 1835717 and 2019609, the China Scholarship Council under grant no. 201807870006, special projects for local science and technology development guided by the central government under grant no. 2020L3006, the Fujian Provincial Department of Education under grant no. KLA18025A, and the Digital Fujian Environmental Monitoring Internet of Things Laboratory open fund no. 202008.

This paper was edited by Wolfgang Kurtz and reviewed by two anonymous referees.