This paper introduces the Topographically InformEd Regression (TIER) model, which uses terrain attributes in a regression framework to distribute in situ observations of precipitation and temperature to a grid. The framework enables our understanding of complex atmospheric processes (e.g., orographic precipitation) to be encoded into a statistical model in an easy-to-understand manner. TIER is developed in a modular fashion with key model parameters exposed to the user. This enables the user community to easily explore the impacts of our methodological choices made to distribute sparse, irregularly spaced observations to a grid in a systematic fashion. The modular design allows incorporating new capabilities in TIER. Intermediate processing variables are also output to provide a more complete understanding of the algorithm and any algorithmic changes. The framework also provides uncertainty estimates. This paper presents a brief model evaluation and demonstrates that the TIER algorithm is functioning as expected. Several variations in model parameters and changes in the distributed variables are described. A key conclusion is that seemingly small changes in a model parameter result in large changes to the final distributed fields and their associated uncertainty estimates.

Gridded near-surface meteorological products (specifically precipitation and temperature) are foundational for many applications, including weather and climate model validation, hydrologic modeling, and climate model downscaling (Day, 1985; Franklin, 1995; USBR, 2012; Pierce et al., 2014; Liu et al., 2017). It is often challenging to develop realistic estimates of these variables, particularly when complex terrain or large spatial climate gradients are present in the domain of interest. Because of their widespread use and the challenges of generating them, a plethora of methods have been developed, including nearest neighbor, distance-weighted interpolation, Kriging, knowledge-based systems, climatologically aided interpolation, Gaussian filters, multiple linear regression, and others (Thiessen, 1911; Shepard, 1968, 1984; Chua and Bras, 1982; Daly et al., 1994; Willmott and Robeson, 1995; Thornton et al., 1997; Banerjee et al., 2003; Clark and Slater, 2006; Cressie and Wikle, 2011; Nychka et al., 2015; Cornes et al., 2018).

Across the methods, nearest neighbor and distance-weighted interpolations use spatial distance as the only predictor, a reasonable choice in areas with high station densities but much less so in sparsely gauged regions. The resultant field is also discontinuous between station areas of influence for nearest neighbor, while distance-weighted interpolation will increase precipitation occurrence unless explicit occurrence prediction is included (Thornton et al., 1997; Newman et al., 2015, 2019). Climatologically aided interpolation assumes the climatological field is better resolved by the available observations and has a strong relationship with the field of interest (e.g., daily precipitation) such that using the climatological field in the final interpolation increases the output information content (Willmott and Robeson, 1995). These assumptions are invalid when the climatological field is poorly resolved or has little correspondence to the field of interest, which happens when an event has a significantly different pattern than climatology (e.g., Lundquist et al., 2015; Newman et al., 2019). Kriging and linear regression frameworks may include multiple spatial predictors and uncertainty estimates. However, these methods may also produce unrealistic results with sparse station observations (Cornes et al., 2018). Knowledge-based systems impose a regularization on the input data through knowledge-based rules (Sect. 2). This allows for physically plausible interpolation fields in sparsely gauged regions but brings inflexibility similar to that of climatologically aided interpolation. Finally, in sparsely observed regions, we do not know the true error characteristics of any method.

The currently available climate products that use these methods have complex processing systems (Daly et al., 2008; Xia et al., 2012; Livneh et al., 2015; Newman et al., 2015; Thornton et al., 2018). The product workflow typically includes many processing steps, methodological choices, and model parameters, all interacting to influence the characteristics of the final product. Therefore, comparison studies of product performance (and even single product evaluations) are often not able to attribute differences at the final output level to specific methodological choices (Newman et al., 2019). To help alleviate these difficulties and improve our understanding of method performance across conditions, flexible modular software systems need to be developed that expose model parameters to the users and allow for new functionality to be easily added (Clark et al., 2011).

This paper focuses on approaches that incorporate knowledge of atmospheric physics into relatively simple underlying statistical models (e.g., orographic precipitation, temperature lapse rates into linear regression models) to improve the accuracy of the gridded field (e.g., Daly et al., 1994; Willmott and Matsuura, 1995). Daly et al. (1994), hereafter D94, develop a complex knowledge-based system consisting of (1) terrain pre-processing; (2) station selection; (3) development of a locally weighted meteorological variable-elevation linear regression; and (4) post-processing. Omitted here are the pre-processing steps to screen station data (including quality control) and filling missing or suspicious data values. Following D94, many studies have included new knowledge-based capabilities, new intermediate processing steps, and increased granularity in a given step (Daly et al., 2002, 2007, 2008). In recent papers, there are upwards of 15–20 model parameters that have varying degrees of influence on the final product.

The Topographically InformEd Regression (TIER) model implements the knowledge-based approach described in D94 and subsequent papers (Daly et al., 2000, 2002, 2007, 2008; hereafter D00, D02, D07, and D08). Note that TIER is not designed to be an exact replica, as it does not match any source code, nor does it implement all features described in D94, D00, D02, D07, and D08. The paper is organized as follows: we introduce the TIER conceptual model in Sect. 2.1, the pre-processing algorithms in Sect. 2.2, the regression model in Sect. 2.3, and post-processing routines in Sect. 2.4. Then, a brief model evaluation is included in Sect. 3 to verify that the TIER model is functioning as expected. Next, we explore model parameter variation experiments for three simple test cases to highlight how model parameter choices impact the final product in Sect. 3.1. Finally, Sect. 4 summarizes TIER, the lessons learned from the parameter experiments, and next steps; code and data availability are provided at the end of the paper.

Precipitation and temperature are unevenly distributed around the globe for myriad reasons including general circulation patterns and landscape effects. Following D94, TIER assumes that large-scale gradients are resolved by the input station data and incorporates direct knowledge of atmospheric physics to account for landscape effects (e.g., orographic precipitation, mesoscale circulations near the coast). Since the landscape influences the distribution of precipitation and temperature, particularly their climatology, many past studies have developed methods to use terrain attributes to estimate meteorological fields (e.g., Spreen, 1947; Phillips et al., 1992). For example, orography has a particularly strong influence on precipitation by enhancing uplift of air (e.g., Schermerhorn, 1967; Smith and Barstad, 2004).

D94 develop a method to use a high-resolution digital elevation model (DEM) to produce empirical estimates of the precipitation–elevation relationship. They demonstrate that using actual station elevations in the precipitation–elevation relationship leads to a weak or nonexistent relationship, while using a coarse-resolution DEM smooths out local variability and results in a stronger relationship between precipitation and elevation. Such stronger relationships occur because microscale terrain features have a much smaller impact on the atmosphere than the larger-scale terrain features of the order of 2–15 km (D94 and references therein). Of course, the optimal length scale varies across atmospheric conditions and for each precipitation event, but in general a coarse-resolution or smoothed high-resolution DEM provides a strong basis for developing robust climatological precipitation–elevation relationships. Additionally, the amount of precipitation varies according to aspect (e.g., windward or lee slope), suggesting the need for different relationships for different aspects (e.g., Alter, 1919; Houghton, 1979). D94 use the smoothed DEM and decompose the domain into directional “facets”, each with its own precipitation–elevation relationship. Facets are defined as continuous areas with similar aspects (slope orientation).

Daly and his colleagues have introduced several methodological enhancements since the seminal D94 paper. D00 expand on D94 to include maximum and minimum temperature, while D02 fully describe the knowledge-based system and the various physical processes included in it. Beyond elevation, the influence of large bodies of water on precipitation and temperature is incorporated by using coastal proximity or distance to the coastline. Finally, cold-air drainage down slopes and subsequent pooling in valleys is modeled as well. A conceptual two-layer atmosphere where layer 1 is the boundary layer containing temperature inversions and layer 2 is the free atmosphere is applied to the DEM. A simple two-layer atmospheric model for temperature is necessary to capture near-surface temperature inversions. This method identifies areas highly susceptible to inversions (e.g., valleys) and allows for the model to have temperature lapse rate reversals from increasing temperature with height below to decreasing above the inversion. Grid points that are identified to be within the boundary layer (layer 1) are allowed to have strong temperature inversions. These grid points are identified using the topographic position concept, which essentially computes the average nearby topographic relief to identify valleys and ridges. D94, D02, and D08 provide extensive details on the underlying theory of this knowledge-based approach.

The TIER pre-processing routines consist of the functions used to generate the required terrain attributes for the regression model. Currently, this consists of functions that perform NetCDF input/output (IO), process the DEM into topographic facets, compute the distance to the coast and the topographic position, and estimate the idealized two-layer atmosphere. A parameter and control file specifies model parameters, and IO directories and files; see Tables 1 and 2, respectively. A flowchart describing the general flow, order of operations, and data requirements is given in Fig. 1.

Terrain pre-processing model parameters.

Terrain pre-processing control file.

Flowchart describing the TIER pre-processing system. Processes are shaded gray, input files are orange, topographic inputs and outputs are shades of blue, and outputs are various shades of green.

The native resolution DEM is first smoothed using a user-defined filter
(Table 2). The “Daly” filter is defined as (D94)

The user defines a land–ocean mask field in the input grid file. This mask defines which grid points are large bodies of water (ocean points) and are used in the coastal proximity calculation. The distance to the coast is computed using the great circle distance assuming a spherical earth for every grid cell within a user-defined distance threshold.
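The coastal distance step can be illustrated with a minimal sketch. It assumes the haversine form of the great-circle distance, a mean Earth radius of 6371 km, and a brute-force search over ocean points; the function names and the choice to return `None` beyond the threshold are hypothetical and are not taken from the TIER code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance (km) between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def distance_to_coast(land_points, ocean_points, max_km):
    """Distance from each land point to its nearest ocean point.

    Points are (lat, lon) tuples in degrees. Land points farther than
    max_km from every ocean point get None (an assumption of this sketch).
    """
    result = []
    for lat, lon in land_points:
        nearest = min(
            (great_circle_km(lat, lon, olat, olon) for olat, olon in ocean_points),
            default=math.inf,
        )
        result.append(nearest if nearest <= max_km else None)
    return result
```

For a full grid, a vectorized or tree-based nearest-neighbor search would likely be preferable to this double loop; the sketch favors readability.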

To identify the atmospheric layer (Sect. 2.2.4) of a grid point, the local topographic position of a grid point is computed first. The topographic position calculation uses the high-resolution DEM. Following D02, for each grid point, the following steps are taken.

The minimum elevation within a user-defined local search radius (

The topographic position is then estimated as

Following the determination of the topographic position, each grid cell is
placed into the first (boundary or inversion layer) or second layer (free
atmosphere) of the idealized two-layer atmosphere. The height of the
inversion layer is defined by the user (Table 2) and added to the mean
elevation computed on the left-hand side of Eq. (3). This defines an
inversion height above sea level for all grid points. All grid points where

After the input domain grid file has been processed, the pre-processing routine generates station metadata files for all precipitation and temperature stations that will be used in the regression model. Each station is assigned the closest grid point value of the smoothed DEM, facet, topographic position, atmospheric layer, and coastal distance.
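A minimal sketch of the nearest-grid-point assignment described above is given here. It assumes 2-D latitude and longitude arrays for the grid and uses a squared-degree distance with a cosine-latitude scaling, which is adequate for picking the closest cell but is an assumption of this sketch; the function and field names are hypothetical.

```python
import numpy as np


def nearest_grid_attributes(st_lat, st_lon, grid_lat, grid_lon, fields):
    """Return the value of each gridded field at the grid point closest to a station.

    grid_lat, grid_lon: 2-D arrays of grid point coordinates (degrees).
    fields: dict mapping a field name (e.g., "smoothed_dem") to a 2-D array.
    """
    grid_lat = np.asarray(grid_lat, dtype=float)
    grid_lon = np.asarray(grid_lon, dtype=float)
    dlat = grid_lat - st_lat
    # Scale longitude differences so east-west and north-south offsets are comparable
    dlon = (grid_lon - st_lon) * np.cos(np.radians(st_lat))
    idx = np.unravel_index(np.argmin(dlat**2 + dlon**2), grid_lat.shape)
    return {name: np.asarray(field)[idx] for name, field in fields.items()}
```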

The regression model is applied to each land-masked grid cell. It consists of routines to compute the station weights, to estimate the meteorological–terrain relationships, and to estimate the variable value at each grid point. A parameter and control file specifies model parameters, and IO directories and files; see Tables 3 and 4, respectively. A flowchart describing the general flow, order of operations, and data requirements is given in Fig. 2. Figures 3 and 4 provide more detailed flowcharts for the specific processing flow for precipitation and temperature variables.

TIER model parameters. Default values are given for precipitation with values for temperature given in parentheses.

TIER model control file.

Flowchart describing the TIER processing algorithm including post-processing. Color shading is the same as that in Fig. 1.

Flowchart describing the precipitation grid point estimate algorithm. Color shading is the same as that in Fig. 1.

Flowchart describing the temperature grid point estimate algorithm. Color shading is the same as that in Fig. 1.

For each grid point, a set of stations is used to estimate the precipitation
and temperature values. First, all stations within the user-defined search
radius are found (nearby stations), up to the maximum number of stations
considered (Table 4). From that subset of stations, all stations on the same
facet as the current grid point are identified (facet stations). Then, a set
of distance-dependent weights and weights for each physical process
component described in Sect. 2.2.1–2.2.4 is generated for all nearby and
facet stations for each grid point. These component weights are then
combined to create the final station weight vector.
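The station selection logic can be sketched as follows. The sketch assumes planar station coordinates in kilometers and that the cap on the number of stations keeps the closest ones first; the text does not state the ordering, so that choice, the dictionary keys, and the function name are all hypothetical.

```python
import math


def select_stations(grid_xy, grid_facet, stations, search_radius_km, n_max_near):
    """Return (nearby, facet) station lists for one grid point.

    stations: list of dicts with hypothetical keys "id", "x", "y" (km), "facet".
    nearby: stations within the search radius, closest first, capped at n_max_near.
    facet: the subset of nearby stations on the same facet as the grid point.
    """
    gx, gy = grid_xy
    with_dist = [(math.hypot(s["x"] - gx, s["y"] - gy), s) for s in stations]
    in_radius = sorted(
        (pair for pair in with_dist if pair[0] <= search_radius_km),
        key=lambda pair: pair[0],
    )
    nearby = [s for _, s in in_radius[:n_max_near]]
    facet = [s for s in nearby if s["facet"] == grid_facet]
    return nearby, facet
```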

A station's relevance to the current grid point decreases as the station
distance increases (e.g., Shepard, 1968); thus, this component station
weighting decreases with increasing distance. Here, we generally follow the
synagraphic computer mapping (SYMAP) algorithm of Shepard (1968, 1984) and
develop inverse distance weights that are further modified by including
direction information. Direction information is used to downweight stations
that lie in a similar direction but farther away than other stations, as
their influence has been “shadowed” by the nearer station (Shepard, 1968).
The distance weighting function of Barnes (1964) is used:

Stations on the same facet type as the current grid point receive an initial facet weight of 1. D02 introduce a method to reduce the weight of stations on the same facet type but with intervening facets of different types between the station and grid point (D02 Eq. 5). This is not implemented here, and all stations on the same facet type as the current grid point receive the same weight; it could be explored in a future TIER release. The distance-dependent weights already account for this implicitly, but explicitly decreasing the weight of such stations would localize the TIER station weighting even further and sharpen the small-scale features of the TIER output.

The atmospheric layer weight function is defined as

The topographic position weights are computed following D07:

Using the computed distance to the coast, the coastal proximity weights are
computed as

Once nearby stations are selected and the final weight vector is computed
(Eq. 4), a base grid point estimate is developed using the weighted
average of all nearby stations:

Next, the variable-elevation linear regression coefficients are solved
for
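The two steps just described, a weighted-average base estimate and a weighted variable–elevation regression, can be sketched generically. The closed-form weighted least-squares solution used here is an assumption of this sketch, as are the function names; the symbols and exact formulation in the numbered equations of this paper may differ.

```python
def weighted_mean(values, weights):
    """Weighted average of station values."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * v for w, v in zip(weights, values)) / total


def weighted_linear_fit(elev, values, weights):
    """Weighted least-squares fit of values = intercept + slope * elev.

    Returns (intercept, slope).
    """
    x_bar = weighted_mean(elev, weights)
    y_bar = weighted_mean(values, weights)
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, elev))
    if sxx == 0:
        raise ValueError("need at least two distinct station elevations")
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, elev, values))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

In this sketch the slope would be estimated from the facet stations and the base estimate from all nearby stations, following the station sets defined earlier in the text.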

D02 define the method to adaptively adjust the station search radius until
the minimum number of needed stations is met. Here, we do not adjust the
search radius and instead attempt the regression and uncertainty estimation
when

Finally, D94 found that normalizing the precipitation lapse rate (km

Several post-processing steps are undertaken to reach the final gridded estimates after all grid points have an initial estimate, shown in Fig. 5. These include updating estimated slope values, applying spatial filters, and recomputing the final fields.

Flowcharts describing the

The initial precipitation normalized slope estimates are used to recompute the default slope if the user requests it (Table 4). In this case, all grid points with valid regression slopes are used to compute the domain mean normalized precipitation slope. This value is then substituted at all grid points with default slope estimates. Next, a 2-D Gaussian filter is applied to the normalized slopes to reduce noise and smooth the artificial numerical boundaries in slope values; the filtered field is taken as the final precipitation slope estimate (Fig. 5a). The parameters of this spatial filter (size and spread) are specified in the TIER model parameter file (Table 4).
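A minimal sketch of this slope update and smoothing is shown below. It assumes that "size" is an odd window width in grid cells, that "spread" is the Gaussian standard deviation in grid cells, and that edges are handled by replicating border values; these interpretations and the function names are assumptions, not the TIER implementation.

```python
import numpy as np


def gaussian_kernel(size, spread):
    """Normalized 2-D Gaussian kernel of odd width `size` and std dev `spread` (cells)."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    ax = np.arange(-half, half + 1)
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * spread**2))
    return kernel / kernel.sum()


def finalize_slopes(slopes, is_default, size=5, spread=1.0):
    """Replace default slopes with the mean of valid slopes, then Gaussian-smooth."""
    slopes = np.asarray(slopes, dtype=float)
    is_default = np.asarray(is_default, dtype=bool)
    filled = slopes.copy()
    # Domain mean of the valid (regression-derived) slopes
    filled[is_default] = slopes[~is_default].mean()
    kernel = gaussian_kernel(size, spread)
    half = size // 2
    padded = np.pad(filled, half, mode="edge")  # edge replication is an assumption
    out = np.empty_like(filled)
    for i in range(filled.shape[0]):
        for j in range(filled.shape[1]):
            out[i, j] = np.sum(padded[i:i + size, j:j + size] * kernel)
    return out
```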

After the slope estimates have been finalized, the precipitation field is recomputed using Eq. (15), and then a feathering process is applied to smooth any remaining very large gradients (e.g., D94; Fig. 5a). The feathering routine operates on the normalized precipitation slopes and searches for grid cell–grid cell gradients in the normalized slope larger than a user-specified value (Table 4). If a large gradient is found, the slope of the grid cell with less precipitation is increased until the gradient falls below the maximum allowable value. The feathering routine iterates over the grid until there are no remaining large gradients and is an additional smoothing step for precipitation in TIER. The routine runs only for grid cells with elevation changes larger than a user-specified minimum gradient (Table 4), which effectively ignores flat areas (D94), and in TIERv1.0 it operates only on grid cells above a user-specified minimum elevation.
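The iterative character of feathering can be illustrated with a simplified sketch. It assumes four-neighbor adjacency, treats the precipitation field as fixed while slopes are adjusted, raises the drier cell's slope exactly to the allowable limit, and omits the minimum-gradient and minimum-elevation checks; all of these simplifications and the function name are assumptions rather than the TIERv1.0 routine.

```python
import numpy as np


def feather_slopes(slope, precip, max_grad, max_iter=1000):
    """Raise the slope of the drier cell in any adjacent pair whose slope
    difference exceeds max_grad, repeating until no pair is adjusted."""
    s = np.array(slope, dtype=float)
    p = np.asarray(precip, dtype=float)
    nrow, ncol = s.shape
    for _ in range(max_iter):
        changed = False
        for i in range(nrow):
            for j in range(ncol):
                for di, dj in ((0, 1), (1, 0)):  # east and south neighbors
                    ni, nj = i + di, j + dj
                    if ni >= nrow or nj >= ncol:
                        continue
                    # Identify the drier and wetter cell of the pair
                    if p[i, j] < p[ni, nj]:
                        dry, wet = (i, j), (ni, nj)
                    else:
                        dry, wet = (ni, nj), (i, j)
                    target = s[wet] - max_grad
                    if s[dry] < target:
                        s[dry] = target
                        changed = True
        if not changed:
            break
    return s
```

Because slopes are only ever increased and never above the largest existing slope minus `max_grad`, the loop settles after a finite number of passes; `max_iter` is only a safety cap.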

Finally, uncertainty estimates are recomputed for the entire grid, first for
the base estimate and slope components, then for the total uncertainty of
Eq. (15) (Fig. 5a). For those grid points with no initial

Post-processing for temperature is simpler than that for precipitation because the temperature lapse rates are in physical units. The initial valid temperature slope estimates are used to recompute the default lapse rate if the user requests it (Table 4) and no spatially varying default temperature lapse rate information is provided (Table 3). Again, the mean of all valid regression slope estimates is used as the updated default temperature lapse rate for this case. As for precipitation, a 2-D Gaussian filter is then applied to the slopes to reduce noise and smooth the artificial numerical boundaries in slope values; the filtered field is taken as the final temperature slope estimate (Fig. 5b). Then, the final temperature estimate is computed using these updated lapse rate values and Eq. (15).

As for precipitation, the component and total uncertainty estimates are then
finalized for temperature. The base temperature estimate uncertainty and
slope uncertainty are smoothed using the 2-D Gaussian filter to estimate the
final component uncertainties,

An example use case over the western United States, focused primarily on the
Sierra Nevada mountains between roughly 35–43

The TIER test domain with

The gridded output fields are nearly unbiased for all three variables: 0.2 mm,

Calibration sample evaluation statistics for TIER using the default parameters with 90 % confidence intervals in parentheses.

Calibration sample evaluation scatter plots (TIER versus observations)
for

Figure 8 highlights the methodological choice made in Sect. 2.3.2 to
disassociate the intercept parameter from the regression estimated slope in
Eq. (15) for precipitation. We compare estimates using

Comparison of precipitation (mm) estimates using

These differences could be due to several reasons, including that TIERv1.0 parameters were subjectively tuned for the published methodology. Also, in-sample validation does not truly determine method performance; an out-of-sample verification exercise and further evaluations should be undertaken. The PRISM-similar method within the PRISM model performs extremely well and may be more appropriate for higher elevations given the tendency for these types of linear regression systems to underestimate precipitation above the highest observation when using smoothed DEM values (see Sect. 4d.4 and parameter B1EX in Table 1 of D94).

Here, we explore the impact of model parameter changes on the output values
and their associated uncertainty estimates. We modify TIER model parameters
only (no pre-processing parameters) and make three changes, each focused on
a different part of the interpolation model and applied to different
variables, to concisely highlight how model parameter
choices impact the final product. First, we modify the inverse distance
weighting exponent in the distance-dependent weighting function for

Calibration sample evaluation statistics for the three experiments.

In experiment 1, the parameter “distanceWeightExp” (Table 4), which is the
exponent in the distance-dependent weighting function (Eq. 5), is modified
from 2 (default) to 1.75 (modified) for a spatial simulation of

Spatial distribution of minimum temperature (

For experiment 2, we examine precipitation and modify the “coastalExp” (Table 4)
parameter from 0.75 to 1 to examine the influence of changes to the
coastal proximity weighting. Qualitatively, the two precipitation
distributions are identical, with the overall precipitation pattern remaining
essentially unchanged (Fig. 10a–b). The difference fields show
shifts in the precipitation placement throughout the domain, visible as
alternating positive/negative difference patterns, particularly across the
complex terrain (Fig. 10b), but there is essentially no net precipitation change, with
a total relative difference of 0.2 % between the two estimates. Absolute
differences can be as large as 46 mm in areas of large total accumulations;
however, the relative differences in those areas are generally less than
10 % (Fig. 10b). Correspondingly, dry areas have smaller absolute
differences but sometimes larger relative differences, as can be seen along
the eastern third of the domain (Fig. 10b). The mean absolute value of the
cell-to-cell precipitation gradient is 1.56 mm km

Spatial distribution of

The total uncertainty is generally increased by a few millimeters across the domain (0.55 mm on average), with a corresponding relative increase in uncertainty of around 5 %–10 % (Fig. 10c). This occurs because increasing this weight exponent decreases the weight of stations more dissimilar to the current grid point, effectively increasing the localization of the weights and increasing the variability of the leave-one-out estimates (Eq. 16).
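The idea of leave-one-out variability can be illustrated with a generic sketch. It withholds one station at a time, recomputes a weighted-mean estimate from the rest, and reports the population standard deviation of those estimates; this spread measure is an assumption chosen for illustration and is not claimed to be the form of Eq. (16). The function name is hypothetical.

```python
import statistics


def leave_one_out_spread(values, weights):
    """Return (spread, estimates): the weighted-mean estimate obtained with each
    station withheld in turn, and the population std dev of those estimates."""
    if len(values) < 2:
        raise ValueError("need at least two stations")
    estimates = []
    for skip in range(len(values)):
        kept = [(v, w) for idx, (v, w) in enumerate(zip(values, weights)) if idx != skip]
        total_w = sum(w for _, w in kept)
        if total_w <= 0:
            raise ValueError("remaining weights must sum to a positive value")
        estimates.append(sum(v * w for v, w in kept) / total_w)
    return statistics.pstdev(estimates), estimates
```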

Finally, we change the “nMaxNear” parameter (Table 4), which controls the
maximum number of stations used for each grid point in the
interpolation model, from 10 to 13. Again, the precipitation pattern is essentially
qualitatively unchanged between the two configurations with a 0.2 % domain
average change; also see Fig. 11a. However, the relative difference fields
highlight larger and more systematic changes to the precipitation
distribution than in experiment 2. Areas of the highest accumulation in the
base case have less precipitation in the modified case (compare Fig. 10a and
Fig. 11a) with 89 % (

Spatial distribution of differences for model parameter
sensitivity experiment 3, modified maximum number of stations parameter
(nMaxNear).

Increasing the number of stations considered reduces the estimated
uncertainty across nearly the entire domain (Fig. 11b–c). On average, there
is a 2.1 mm (16 %) reduction in the domain mean uncertainty, with some grid
cells having reductions

The Topographically InformEd Regression (TIER) software was developed for several reasons. First, the systems for spatial modeling of meteorological variables from in situ observations have matured to the point that they are complex systems with many methodological choices and model parameters. TIERv1.0 provides an initial implementation of a knowledge-based statistical modeling system based on D94, D02, D00, D07, and D08 with the capability to explore different methodological choices in a systematic fashion. The system is modular so that new knowledge-based ideas can be added to the regression model through including new weighting terms. Model parameters are also accessible to the user, allowing for parameter perturbation experiments. More broadly, this should be viewed as a first step towards development of flexible, open-source systems that include many of the commonly used spatial interpolation models so the community can more fully understand methodological choices in gridded meteorological product generation (e.g., Newman et al., 2019). Understanding how methods and model parameters interact and modify the final output is key to improving these systems.

The parameter experiments performed here provide three examples highlighting how minor changes to one model parameter impact the final spatial distribution. For example, modifying the coastal weight exponent results in a shift in placement of precipitation across the domain (Fig. 10) and systematic changes in the estimated uncertainty. Increasing the maximum number of stations considered for the interpolation results in systematic changes to the precipitation distribution and decreases the sharpness of the final field (Fig. 11). Also, the spatial gradients of precipitation and total uncertainty changes are of opposite sign for experiments 2 and 3. In general, parameter changes that act to increase localization will enhance gradients and uncertainty, while those that decrease localization or increase sample sizes will decrease gradients and uncertainty. This highlights that parameter interactions could play a role in the final result through positive or negative feedbacks. Finally, experiments 2 and 3 each result in differences from the base case that are not statistically significant, while the MAE values of the modified parameter sets in experiments 2 and 3 differ from each other by a statistically significant amount. Given the ability to perform parameter sensitivity experiments in TIER, we reemphasize the need for novel evaluation methods including out-of-sample station networks (e.g., Daly, 2006; Daly et al., 2017; Newman et al., 2019) that are as independent from the input networks as possible and integrated validation methods using ancillary observations such as streamflow and other modeling tools such as hydrologic models (Beck et al., 2017; Henn et al., 2018; Laiti et al., 2018).

Finally, TIER does not implement the exact system developed by Daly and colleagues and will not produce the same climate fields even with the same input data. TIER does not duplicate source code or every feature described in D94, D00, D02, D07, and D08, as it was developed as a knowledge-based system following these papers rather than replicating them or other unpublished details. Also, TIER version 1.0 does not contain station input data pre-processing routines. Instead, example input data are provided in the example cases' dataset (Sect. 5). Station pre-processing and quality control can encompass a vast number of methods (e.g., Serreze et al., 1999; Eischeid et al., 2000; Durre et al., 2008, 2010; Menne and Williams, 2009). These methods may be included in future releases or as separate community station quality control tools.

The TIERv1.0 code is available at

The input data for the example domain used here are available
at

AJN and MPC developed the TIER model concept. AJN implemented the model and developed the test case, model validation, and model sensitivity experiments. AJN and MPC contributed to the manuscript.

The authors declare that they have no conflict of interest.

Color tables used here are provided by Wikipedia (precipitation), GMT
(difference plots), and the GRID-Arendal project (

This research has been supported by the National Center for Atmospheric Research, which is a major facility sponsored by the National Science Foundation under cooperative agreement no. 1852977, and the US Army Corps of Engineers (USACE) Climate Preparedness and Resilience program.

This paper was edited by Richard Neale and reviewed by two anonymous referees.