A reduced-order modeling approach to represent subgrid-scale hydrological dynamics for land-surface simulations: application in a polygonal tundra landscape

Existing land surface models (LSMs) describe physical and biological processes that occur over a wide range of spatial and temporal scales. For example, biogeochemical and hydrological processes responsible for carbon (CO2, CH4) exchanges with the atmosphere range from the molecular scale (pore-scale O 2 consumption) to tens of kilometers (vegetation distribution, river networks). Additionally, many processes within LSMs are nonlinearly coupled (e.g., methane production and soil moisture dynamics), and therefore simple linear upscaling techniques can result in large prediction error. In this paper we applied a reducedorder modeling (ROM) technique known as “proper orthogonal decomposition mapping method” that reconstructs temporally resolved fine-resolution solutions based on coarseresolution solutions. We developed four different methods and applied them to four study sites in a polygonal tundra landscape near Barrow, Alaska. Coupled surface–subsurface isothermal simulations were performed for summer months (June–September) at fine (0.25 m) and coarse (8 m) horizontal resolutions. We used simulation results from three summer seasons (1998–2000) to build ROMs of the 4-D soil moisture field for the study sites individually (single-site) and aggregated (multi-site). The results indicate that the ROM produced a significant computational speedup (> 10 3) with very small relative approximation error (< 0.1 %) for 2 validation years not used in training the ROM. We also demonstrate that our approach: (1) efficiently corrects for coarseresolution model bias and (2) can be used for polygonal tundra sites not included in the training data set with relatively good accuracy (< 1.7 % relative error), thereby allowing for the possibility of applying these ROMs across a much larger landscape. By coupling the ROMs constructed at different scales together hierarchically, this method has the potential to efficiently increase the resolution of land models for coupled climate simulations to spatial scales consistent with mechaistic physical process representation.


Introduction
The terrestrial hydrological cycle strongly impacts, and is impacted by, atmospheric processes.Further, a primary control on terrestrial biogeochemical (BGC) dynamics and greenhouse gas (GHG) emissions from soils (e.g., CO 2 , CH 4 , N 2 O) across spatial scales is exerted by the system's hydrological state (Schuur et al., 2008).Soil moisture also impacts soil temperature, which is another important controller of GHG emissions (Torn and Chapin, 1993).Since climate change is predicted to change the amount and temporal distribution of precipitation globally, there is a critical need for models to not only accurately capture subgrid heterogeneity of terrestrial hydrological processes, but also the impacts of subgrid hydrological heterogeneity on BGC fluxes.
Terrestrial hydrological states are important for climate prediction across a wide range of spatial scales, from soil pores to continental.The critical spatial scale relevant to soil moisture state and subsurface and surface fluxes may be as small as ∼ 100 m (Wood et al., 2011), although there is vibrant disagreement about the relative increase in predictability when trying to explicitly simulate at such high resolutions with limited observational data to constrain parameter values (Beven and Cloke, 2012).However, the importance of representing fine-resolution spatial structure in G. S. H. Pau et al.: Hydrological dynamics for land-surface simulations hydrological states and fluxes has been demonstrated for surface evapotranspiration budgets (Vivoni et al., 2007;Wood, 1997), runoff and streamflow (Arrigo and Salvucci, 2005;Barrios and Francés, 2012;Vivoni et al., 2007), and atmospheric feedbacks (Nykanen and Foufoula-Georgiou, 2001).It remains unclear what the critical spatial scale is for biogeochemical dynamics, but it has been shown that "hot spot" formation is important for wetland biogeochemistry at scales ∼ O(10 cm) (Frei et al., 2012) and for nitrogen cycle variations at ∼ O(m) (McClain et al., 2003).In contrast, the current suite of land surface models applicable at watershed (e.g., PAWS, Riley and Shen, 2014;Shen, 2009;regional, Maxwell et al., 2012;or climate, Koven et al., 2013;Tang et al., 2013) scales typically represent hydrological or biogeochemical cycles at ∼ O(100 m-km) scales.
The methods to represent spatial heterogeneity in hydrological and biogeochemical dynamics differ between watershed and regional or climate-scale models.While many current watershed-scale models explicitly represent lateral inter-connectivity for subsurface and surface fluxes, regionaland climate-scale models currently rely on a non-spatiallyexplicit tiling approach.For example, CLM4.5 (Koven et al., 2013;Lawrence et al., 2012;Tang et al., 2013), the land model integrated in the Community Earth System Model (Hurrell et al., 2013), represents land-surface grid cells with the same horizontal extent as the atmospheric grid cells (which can range from ∼ 1 • × 1 • for climate change simulations to ∼ 0.25 • × 0.25 • for relatively short simulations; Bacmeister et al., 2014;Wehner et al., 2014).These grid cells are disaggregated into a subgrid hierarchy of nonspatially-explicit land units (e.g., vegetated, lakes, glacier, urban), columns (with variability in hydrological, snow, and crop management), and plant functional types (accounting for variations in broad categories of plants and bare ground).Therefore, we contend that representing the much smaller spatial scales now recognized to control hydrological and biogeochemical dynamics in regional and global-scale models will require a reformulation of the overall design of these models.
One potential approach to represent spatial heterogeneity in soil moisture fields at resolutions finer than represented in a particular modeling framework is to relate the statistical properties of the soil moisture field with the spatial scale.Hu et al. (1997) showed that the variance (σ 2 θ ) of the soil moisture (θ ) field at different spatial averaging areas (A) can be related to the ratio of those areas raised to a scaling exponent (γ ).They also showed that γ is related to the spatial correlation structure of the soil moisture field and that it decreases as soils dry.Observational studies have described a power law decay of variance as a function of the observation scale (Rodriguez-Iturbe et al., 1995;Wood, 1998), and several investigators have demonstrated that the relationship between σ 2 θ and spatial scale is not "simple" (i.e., not log-log linear across all spatial scales; e.g., Das and Mohanty, 2008;Famiglietti et al., 1999;Joshi and Mohanty, 2010;Mascaro et al., 2010Mascaro et al., , 2011;;Nykanen and Foufoula-Georgiou, 2001).
A second potential approach to account for spatial heterogeneity in soil moisture states is to relate its higher-order moments to the mean, and then apply these relationships within a model that predicts the transient coarse-resolution mean.In many observationally based studies, an upward convex relationship between the mean and variance has been reported (e.g., Brocca et al., 2010Brocca et al., , 2012;;Choi and Jacobs, 2011;Famiglietti et al., 2008;Lawrence and Hornberger, 2007;Li and Rodell, 2013;Pan and Peters-Lidard, 2008;Rosenbaum et al., 2012;Tague et al., 2010;Teuling et al., 2007;Teuling and Troch, 2005).Theoretical analyses have also indicated that an upward convex relationship is consistent with current understanding of soil moisture dynamics (e.g., Vereecken et al., 2007).However, as discussed in Brocca et al. (2007), the relationships between soil moisture mean and statistical moments have been reported to depend on many factors, including lateral redistribution, radiation, soil characteristics, vegetation characteristics, elevation above the drainage channel, downslope gradient, bedrock topography, and specific upslope area.These large number of observed controllers and the lack of an accepted set of dominant factors argue that substantial work remains before this type of information can be integrated with land models to represent subgrid spatial heterogeneity.
Modeling studies have also been performed to investigate spatial scaling properties of moisture and how these properties relate to ecosystem properties.For example, Ivanov et al. (2010) studied spatial heterogeneity in moisture on an idealized small hill slope, and found hysteretic patterns during the wetting-drying cycle and that the system response depends on precipitation magnitude.Riley and Shen (2014) used a distributed modeling framework to analyze relationships between mean and higher-order moments of soil moisture and ecosystem properties in a watershed in Michigan.They concluded that the strongest relationship between the observed declines in variance and increases in mean moisture (past a peak in this relationship) was with the gradient convolved with mean evapotranspiration.Other studies have focused on upscaling fine-resolution model parameters to effective coarser-resolution parameters.For example, Jana and Mohanty (2012) showed that power-law scaling of hydraulic parameters was able to capture subgrid topographic effects for four different hill slope configurations.
Theoretical work to explicitly include spatial heterogeneity in the hydrological governing equations has also been applied to this problem.Albertson and Montaldo (2003) and Montaldo and Albertson (2003) developed a relationship for the time rate of change of soil moisture variance based on the mean moisture and spatial covariances between soil moisture, infiltration, drainage, and ET.Teuling and Troch (2005) applied a similar approach to study the impacts of vegetation, soil properties, and topography on the controls of soil moisture variance.Kumar (2004) applied a Reynolds averaging approach, and ignoring second-and higher-order terms, derived a relationship for the time rate of change of the mean moisture field that depends on the moisture variance.Choi et al. (2007) applied the model to a ∼ 25 000 km 2 Appalachian Mountain region for the summer months of 1 year and found that subgrid variability significantly affected the prediction of mean soil moisture.
The approaches described above to capture fine-resolution spatial heterogeneity within a coarse-resolution modeling framework have some limitations.First, the soil moisture probability density function is often very non-normal (Ryu and Famiglietti, 2005), making the sole use of variance as a descriptor of moisture heterogeneity insufficient.A similar problem arises with the Reynolds averaging approach that does not include higher-order terms.This approach also requires a method to "close" the solution (i.e., relate the higherorder terms to the mean moisture), and there is no generally accepted method to perform this closure.Perhaps the largest constraint of these approaches in the context of climate change and atmospheric interactions is that they cannot account for the temporal memory in the system that impacts biogeochemical transformations.In particular, the biogeochemical dynamics at a particular point in time depend on the state and dynamics that occurred in the past, and just knowing the statistical distribution of moisture at a particular time may not maintain the continuity required for accurate prediction.Therefore, for applications related to regional-to global-scale interactions with the atmosphere, a method is required that allows for (1) computationally tractable simulations (i.e., relatively coarser resolution); (2) spatially explicit prediction of the temporal evolution of soil moisture at relatively finer resolutions; and (3) integration of the relatively finer resolution soil moisture predictions with representations of the relevant biogeochemical dynamics.
To that end, we describe a generally applicable reducedorder modeling technique to reconstruct a fine-resolution heterogeneous 4-D soil moisture solution from a coarseresolution simulation, thereby resulting in significant computational savings.In this study, we built ROMs based on the proper orthogonal decomposition mapping method (Robinson et al., 2012), which first involved training the ROMs using fine-and coarse-resolution simulations over multiple years.Hydrologic simulations of coupled surface and subsurface processes for an Alaska polygonal tundra system were performed using the PFLOTRAN model (Bisht and Riley, 2014;Hammond et al., 2012).Simulations were performed for four study sites in Alaska with distinct polygonal surface characteristics and individual ROMs were built for each site.The resulting ROMs were then applied over periods outside of the ROM training period.
In Sect. 2 we describe the polygonal tundra site used for our simulations, the PFLOTRAN hydrological simulations configuration, and the methods used to develop and evaluate the ROMs.In Sect.3, these methods are used under different scenarios to develop ROMs for the polygonal tundra site that increase in generality in the following order: single-site ROMs (limited to a single site), multi-site ROMs (limited to sites included in the training data) and site-independent ROMs (applicable even for sites not included in the training data).For each of the above scenarios, different ROMs can be developed using methods that we propose in Sect.2; the applicability of a method to a given scenario is discussed in Sect. 2. We then compare the accuracy of the different ROMs and end with a discussion of limitations of the approach, possible improvements, and methods to incorporate the proposed ROM approach within a global-scale hydrological and biogeochemical model.

Site description and hydrologic simulation setup
In this study, we developed ROMs for hydrological simulations performed at four sites in the Barrow Environmental Observatory (BEO) in Barrow, Alaska (71.3 • N, 156.5 • W).The BEO lies within the Alaskan Arctic costal plain, which is a relatively flat region, characterized by thaw lakes and drained basins (Hinkel et al., 2003;Sellmann et al., 1975) and polygonal ground features (Hinkel et al., 2001;Hubbard et al., 2013).The Department of Energy (DOE) Next-Generation Ecosystem Experiments (NGEE-Arctic) project has established four intensely monitored sites (A, B, C and D, shown in Fig. 1) within the BEO in 2012 to study the impact of climate change in high-latitude regions.The four NGEE-Arctic study sites have distinct micro-topographic features, which include low-centered (A), high-centered (B), and transitional polygons (C, and D).The mean annual air temperature for our study sites is approximately −13 • C (Walker et al., 2005) and the mean annual precipitation is 106 mm, with the majority of precipitation falling during the summer season (Wu et al., 2013).The study site is underlain with continuous permafrost and the seasonally active layer depth ranges between 30 and 90 cm (Hinkel et al., 2003).
We applied a version of the three-dimensional subsurface reactive transport simulator PFLOTRAN, which was modified to include surface water flows, for simulating surfacesubsurface hydrologic processes at the four NGEE-Arctic study sites.The subsurface flows in PFLOTRAN are solved with a finite volume and an implicit time integration scheme, and are sequentially coupled to a finite-volume-based surface flow solution that is solved explicitly in time.Simulations at the four study sites were conducted using meshes at horizontal resolutions of 0.25, 0.5, 1.0, 2.0, 4.0, and 8.0 m.A constant vertical resolution of 5 cm with a total depth of 50 cm was used for all simulations.The simulations were carried out for four summer months (July-September Community Land Model (CLM4.5;Oleson et al., 2013).Vertical heterogeneity in soil properties was prescribed using data from Hinzman et al. (1991).A static active layer depth of 50 cm, corresponding approximately to the maximum seasonal value, was assumed for all simulations.Details of the model setup are provided in Bisht and Riley (2014).In the current study, the ROM was trained on 3 years of data (1998)(1999)(2000), and the ROM predictions for 2002 and 2006 were compared against fine-resolution simulations.

Development of the reduced-order modeling approach
The multifidelity ROM approach used in this study is based on the gappy proper orthogonal decomposition (POD) mapping approach (Robinson et al., 2012).Let p be a set of parameters that defines a particular solution or observation.
The set of parameters could include system parameters (e.g., vegetation distribution, soil types, and topography), climate forcings, time, and other quantities that have an influence on the system response.In this paper, the parameters that vary in the simulations that we have performed for each site are time (days for summer seasons in a year) and the climate forcings (precipitation and evapotranspiration rates) prescribed at that particular time.Then, given a sample set S N = {q 1 , . .., q N }, where q i is a set of parameters p, and N is the number of samples, we can compute the corresponding solution {f (q 1 ), . .., f (q N )}.In this paper, f corresponded to a simulated fine-resolution three-dimensional soil moisture field, but in general, f can be any spatial quantity of interest (e.g., soil temperature or GHG emission).

POD method
The POD approximation of f , f POD , is given by where are the POD bases and M is the number of POD bases.
The POD bases are determined through a singular value decomposition (SVD) of the data matrix given by W POD = f (q 1 ) − f , . .., f (q N ) − f : where U ∈ R N ×N are the left eigenvectors, V ∈ R N ×N are the right eigenvectors, and thus given by W POD V, and λ i are the associated eigenvalues with each POD basis.The POD method is similar to the principal component analysis (Jolliffe, 2002) and the Karhunen-Loeve decomposition (Moore, 1981).We computed the POD bases based on the kernel eigenvalue approach (Everson and Sirovich, 1995).
The number of POD bases (denoted by M) used to reconstruct the approximate solution to a certain level of error (ε λ ) can be determined by finding M that satisfies where λ T = N i=1 λ i .As mentioned in Wilkinson (2011), the dimensional reduction afforded by the POD method depends on the extent to which the components of f are correlated.We note that Eq. ( 1) only states how f is represented in a linear space spanned by the POD bases, but there are multiple approaches of determining α(p) = {α 1 (p), . .., α M (p)} for a given p.One optimal solution of α that minimizes the least squares error between f (p) and f POD (p), denoted by α POD (p), is given by However, α POD (p), determined using Eq. ( 4), does not lead to any computational savings since f (p) is the quantity we would like to approximate.Determination of f (p) can be avoided by using the POD projection method (Willcox and Peraire, 2002), which discretizes the governing equations using the linear space spanned by ζ POD i and solves the resulting algebraic equations for α POD (p).However, the POD projection method requires extensive modification of the existing code of the simulator, and is thus not suitable for existing LSMs.To demonstrate the limit of accuracy of POD-related methods presented in subsequent subsections (Sects.2.2.2-2.2.5), we determine α POD (p) based on Eq. ( 4) by evaluating f (p) explicitly and present the results in Sect.3.
In subsequent sections, we describe four different methods of developing a ROM that reconstructs the fine-resolution solution based on the coarse-resolution solution.Each of the methods is a modification of the basic POD method, but uses a different reference basis, data matrix, or method to compute α(p).The differences among the various methods for developing a ROM are summarized in Table 1.

POD mean method (POD-mean)
To overcome the difficulties associated with calculating α POD (p), we propose a POD-mean method (POD-mean).We first determine α POD (q), ∀q ∈ S N using Eq.(4); this step requires negligible computational overhead since construction of ROM based on the POD method already requires the determination of f (q), ∀q ∈ S N .We then construct a polynomial fit between α POD (q) and the mean of f (q) (i.e., fineresolution mean soil moisture, µ f (q)), which we denote as Table 1.Summary of differences between various methods used for constructing ROM.In the table below, f is the fine-resolution solution; g is the coarse resolution solution; h is given by Eq. ( 10); f and ĝ are given by Eq. ( 15); p is any given parameter set; q i is the ith parameter set in S N ; µ f (p) and µ g (p) are spatially averaged f (p) and g(p); and ζ , where α fit is a polynomial fit between α(q) and µ f (q).
α fit (µ f ).Then, for any given p, we approximate f by where µ g (p) is the mean of g(p), a coarse-resolution solution simulated at resolution x g > x f .This particular approach works well if (1) the relationships between α fit i (µ f ) and µ f exist; and (2) µ g is a good approximation of µ f .For the Arctic tundra study sites, we will show that these conditions hold true for i = 1, and f POD-mean x g is a good approximation of f .

POD mapping method (POD-MM)
In the POD-mean method, we only used the mean of the coarse-resolution solution, g(p), to reconstruct the fineresolution solution.The POD mapping method (POD-MM) attempts to use all information in g(p) to efficiently and accurately reconstruct the fine-resolution solution.The POD-MM method is a modification of the gappy POD (Everson and Sirovich, 1995).For the same sample set S N , we determine {g(q 1 ), . .., g(q N )}, where q i ∈ S N .As in Robinson et al. (2012), the multifidelity POD bases, ζ POD-MM i , are then determined through a SVD of the data matrix W POD-MM : where f is as defined before and ḡ = 1 where are components associated with the fine-and coarse-resolution models.Given a coarse-resolution solution g(p), we first determine where • 2 is the L 2 norm.We note that α POD-MM (p) is not simply given by Eq. ( 4) since ζ g,POD-MM i are not mutually orthogonal.The approximate solution, f POD-MM , where x g is the resolution at which g(p) is computed.

Second alternative formulation of the POD mapping method (POD-MM2)
We also introduce an alternative formulation of the POD-MM method (POD-MM2) to determine whether the number of POD bases required could be reduced for a fixed approximation error threshold.Instead of applying Eq. ( 6), we perform a SVD of the data matrix W POD-MM2 : where and g is the solution obtained from a piecewise constant mapping of g from the coarse-resolution grid of g onto the fine-resolution grid of f .By using the deviation of f from the mapped coarse-resolution solution g, we remove the bias resulting from the mismatch between the mean of f and g.We note that this alternative POD mapping formulation is possible since our coarse-and fine-resolution grids are nested (which will always be the case for the types of applications we are developing here).For non-nested grids, a linear mapping is expected to work as well, although we do not analyze that approach here.We denote the resulting POD-based vector as where ζ h,POD-MM2 i are the components associated with h.Given a solution g(p), the approximate f POD-MM2 x g (p) is then given by where α POD-MM2 is determined analogously to α POD-MM based on Eq. ( 8) with ζ g,POD-MM i replaced by ζ g,POD-MM2 i .

Third alternative formulation of the POD mapping method (POD-MM3)
When a solution is spatially highly correlated with a spatially varying parameter w, such as the topography, we may use this information in our reconstruction of the fine-resolution solution.This third alternative formulation of the PODmapping method approximates f by where and α POD-MM3 is determined analogously to α POD-MM based on Eq. ( 8) with ḡ and ζ f,POD-MM i replaced by ĝ and ζ f,POD-MM3 i , respectively.In the above, the correlation between w and f (g) is used to construct f ( ĝ) based on the following: where µ f (µ g ) is f (g) averaged over the domain and all the snapshots used in constructing the ROM, w x f (w x g ) is the model parameter evaluated at resolution x f ( x g ), and µ w x f (µ w xg ) is the mean of w x f (w x g ).
The POD-MM3 approach is developed to improve the performance of POD-MM method when one of the parameters is heterogeneous and spatially varying.This method is only applicable to site-independent ROM since the surface elevation is included as a parameter in the site-independent ROM but not in the single and multi-site ROMs.

Error definitions
We define the relative error of the POD method with respect to the true fine-resolution solution as This error measure gives the maximum theoretical accuracy achievable using POD-related methods.We also define ēPOD as the mean of e POD evaluated over a specified number of days.
For POD-X methods, where POD-X stands for PODmean, POD-MM, POD-MM2, or POD-MM3, the error measures can be constructed for each x g , and are defined as Similarly, we define ēPOD-X x g as the mean of e POD-X x g evaluated over a specified number of days.

Results and discussion
As described in the Methods section, we developed the ROM models for the four NGEE-Arctic Barrow study sites chosen for detailed characterization.The four sites differ in their topographic characteristics and therefore each site has a different dynamic soil moisture response to the same meteorological forcings.In addition, since the parameters varied in this study are time and the magnitude of the forcing terms, historical data (prior-year simulations) can be used to construct the ROM.The resulting ROM is subsequently used to predict future responses.For more general cases involving system parameters, statistical or adaptive sampling techniques are needed to generate S N (Pau et al., 2013a, b).For all study years, domain average soil moisture decreased during the first half of the simulation time period due to losses associated with evapotranspiration, while soil moisture increased in the latter half due to increased rainfall.Sites A and B had the lowest mean soil moisture, followed by site C, and then the wettest site, D.

Application of POD method
We first constructed four separate ROMs, one for each site, using the POD method and the finest resolution ( x f =  1) and ( 4).The mean relative error of the POD method, ēPOD , over 120 days in years 2002 and 2006 decreases with increasing M (Fig. 2).The M values at which we evaluate ēPOD correspond to decreasing ε λ = 10 −1 , 10 −2 , . . ., 10 −8 in Eq. ( 3).There is no significant difference between the error budgets as a function of M for 2002 and 2006.The number of POD bases for a given ēPOD increases with sites in the following order: A, B, C, and D.
The above observation cannot be deduced based solely on the probability distribution functions (PDFs) of the DEM (digital elevation model) of the sites (Fig. 3) even though DEM is the only quantity that is different between the models for the four study sites.For example, site D requires the most M although its DEM has the smallest standard deviation.The larger number of POD bases required by site D can be attributed to particularly non-smooth soil moisture PDFs under relatively saturated conditions.The POD method is more efficient when the approximated solution has more smoothness in the parameter space (i.e., a solution at a particular point varies smoothly with the parameters).Site D is relatively flat, and at the end of the summer season it tends to get completely saturated, thereby resulting in a discontinuity in the parameter space and requiring larger M.

Application of POD-mean method
To determine whether we can use the POD-mean method, we first examine the relationship between α POD i (q) and µ f (q) for all q ∈ S N .For all four sites, we found α POD 1 to be linearly correlated to µ f (Fig. 4).For i > 1, a simple correlation between α POD i and µ f cannot be found.We can thus approximate α POD 1 by α fit 1 (µ f ) = a 1 (µ f ) + a 2 , where a 1 and a 2 are determined from a least-square fit of α POD 1 (q) and µ f (q).In addition, µ f is well approximated by µ g , allowing us to use the POD-mean method.For x g = 8 m, the maximum and mean of e POD-mean x g are, respectively, 0.013 and 0.0016 at site A, and 0.016 and 0.005 at site D (Fig. 5).A mean error that is < 1 % can thus be achieved using POD-mean method.

Application of POD-MM method
As with the previous analysis, ROMs based on POD-MM were constructed using only soil moisture data from 1998  to 2000 and daily prediction of soil moisture at 0.25 m were made for 2002 and 2006 using only the ROMs and coarseresolution solutions.We only present our analyses for site A and D for brevity but the results are consistent with those from the remaining sites; Fig. 2 shows that site B should yield similar results to site A and site C to site D.
The mean error for the POD mapping method ( ēPOD-MM x g ) decreases monotonically with M for all x g = x > 0.25 m up to M = M optimal , after which ēPOD-MM x g starts to increase and fluctuate (Fig. 6).This behavior is consistent with results from Everson and Sirovich (1995) in their development of ROMs for face reconstruction.Although larger M improves the least square fit, it leads to overfitting and increases the uncertainty in the computed α POD-MM .This increased uncertainty in α POD-MM introduces significant random noise into the reconstructed fine-resolution solution, leading to fluctuating ēPOD-MM For a given M ≤ M optimal , ēPOD-MM x g decreases with x g , which implies that increasing the number of bases leads to a more accurate reconstructed fine-resolution solution.For site D, M optimal also increases with decreasing  larger information content allows more α POD-MM for more POD bases to be determined accurately.For the year 2006, M optimal = 33 for x g = 0.5 m and M optimal = 28 for x g = 8 m.For site A however, M optimal = 10 for all x g .As such, when the underlying dynamics that we want to capture are mild (as indicated by the small M optimal ), the dependence of M optimal on x g is weaker.For the results shown, ēPOD-MM x g is less than 6×10 −5 when f POD-MM x g is evaluated at M optimal for site D. In addition, the mean of (f POD-MM x g −f ) is 1.15×10 −5 and 6.72 × 10 −6 for sites A and D, respectively, indicating that there is only a negligible bias in the ROM solution.
The above approach requires knowledge of the true fineresolution solutions to determine M optimal .Alternatively, we can determine M optimal by examining the amount of variance represented by the first M POD bases.For M < M optimal , there is a linear relationship between log( ēPOD-MM ) and log(e λ M ); the slope of the line is dependent on x g (Fig. 7).In addition, e λ M < 10 −6 appears to be a reasonable criterion for determining M optimal .Choosing this value leads to M = 10 and M = 25 at sites A and D, respectively, for x g = 8.0 m.These values are very close to the M optimal values identified based on Fig. 6.We next analyze the daily variation of e POD-MM x g for years 2002 and 2006 (Fig. 8).The error e POD-MM x g is typically larger in the wetter periods although its maximum is below 0.01.We further examined the relative pointwise error, given by for the days with the largest e POD-MM x g ; they corresponded to day 1 of 2002 for site A (Fig. 9) and day 106 of 2002 for site D (Fig. 10).For site A, the maximum ε POD-MM x g is 2.77 × 10 −3 and the locations of large errors are not discernable from Fig. 9, indicating that large errors are only localized to small regions of the domain, resulting in small average errors, e POD-MM x g .For site D, the maximum ε POD-MM x g is 1.17 × 10 −3 , but a larger region of the domain has a higher ε POD-MM x g compared to site A, resulting in a higher e POD-MM x g (Fig. 10).In addition, the saturated portion of the solution has small fluctuating errors, as evident from ε POD-MM can remove these fluctuations by simultaneously taking into account both water content and saturation.

Application of POD-MM2 method
With the POD-MM2 method, the resulting error, ēPOD-MM2 for small M (Fig. 11).For example, for M = 1 and x g = 0.5 m, ēPOD-MM2 x g is an order of magnitude smaller than ēPOD-MM x g .However, the convergence behavior of ēPOD-MM2 x g with M is less well behaved as compared to the POD-MM.As a result, the minimum achievable value of ēPOD-MM2 x g is larger than the minimum achievable value of ēPOD-MM x g , especially for larger x g .The POD-MM method is thus preferred since it allows the error to be reduced systematically by increasing M, especially when x g is large.

Multi-site ROM
To construct a multi-site ROM, we used daily snapshots from all four sites for 1998-2000 to construct a single ROM.This is a first step towards developing a ROM that is applicable to the entire NGEE-Arctic study region.Based on the analysis performed using the POD method, we conclude that the POD-related methods can theoretically perform very well even when all four sites are considered in aggregate (Fig. 12).However, the number of POD bases needed to achieve similar accuracy is greater than when separate ROMs are constructed for each site (compare Figs. 2 and 12).
With the POD-MM method, ēPOD-MM x g ≤ 10 −3 when only a relatively small number of POD bases are used (Fig. 13).For x g = 8 m, the error is minimum when M = 30.The magnitude of the error is only slightly larger than single-site ROMs.Although this approach is less efficient since M is generally larger than M for the single-site ROM, it is still significantly faster than performing simulations at the finest resolution.A multi-site ROM is a good alternative to multiple single-site ROMs when the number of sites becomes large.In addition, if the sites have some similar features, a smaller number of snapshots is required per site, leading to lower  for day 106 for year 2002 at site D and for three soil depths; the top, middle, and bottom rows correspond to layers 0-5, 20-25, and 45-50 cm, respectively, from the surface.Regions with homogeneous red color in the panels reflect the fact that large regions of the solutions are saturated.computational cost needed to construct a single multi-site ROM compared to multiple single-site ROMs.POD-MM2 method is not used to develop a multi-site ROM for reasons given in our analysis of single-site ROMs.

Site-independent ROM
Here, we include the spatially heterogeneous surface elevation, as described by the DEM, in the parameter space during the construction of the ROM.We trained the ROM using the soil moisture solutions at sites B, C, and D and evaluated the performance of the ROM for soil moisture prediction at site A. The resulting ROM is denoted as a site-independent ROM since it is applied on a site that was excluded from the training data set.For the POD method, the error ēPOD for the siteindependent ROM decreases with an increasing number of bases but not as rapidly as ēPOD of single-or multi-site ROMs (Fig. 14).For the POD mapping method, the error ( ēPOD-MM x g ) also decreases slowly with M when compared to single-or multi-site ROMs (Fig. 15a).For x g = 8 m, the minimum ēPOD-MM x g is 0.025, occurring at M = 10.For has a negligible decrease for M > 10.
The PDF of f POD-MM x g for 0.4 ≤ θ ≤ 0.6 is reasonably close to the PDF of f (Fig. 16, shown for day 20 for year 1998 for which e POD-MM ).However, for 0.6 ≤ θ ≤ 0.8, the fit is poorer, with the PDF of f POD-MM x g resembling a dual-mode Gaussian distribution centered on 0.69 and 0.74.These peaks also deviate slightly from that of f .The three modes in the PDF in Fig. 16 correspond to the three different soil material properties used to characterize the subsurface structure of the polygonal landscape.
The predicted pointwise soil moisture errors at site A have a maximum relative error of 0.15 and a mean of 0.02 (Fig. 17, shown for 20-25 cm layer solutions of day 20 for year 1998), and is substantially less accurate than for the site-dependent ROM (Fig. 8).To improve the accuracy of the reconstructed fine-resolution solution, we study the use of the POD-MM3 method.Since Bisht and Riley (2014) demonstrated that the soil moisture at each soil layer is inversely correlated to elevation, we define f and ĝ in Eq. ( 14) as ) is the average of the elevation over site A, and µ f, i (µ g, i ) is the average of all f (g) in the training data.Since the DEM is a two-dimensional data set, and f (g) is three-dimensional soil moisture fields, f ( ĝ) is constructed separately for each vertical layer of the three-dimensional domain of our discrete models of the sites via Eq.( 19).
At x g = 8 m, a minimum of 0.017 is obtained for ēPOD-MM3 x g when M = 21 (Fig. 15b), compared to 0.025 for ēPOD-MM x g . The PDF of f POD-MM3 x g is also a closer approximation of the PDF of f compared to the PDF of f POD-MM x g (Fig. 16), and the heterogeneous structure of f is approximately reproduced (Fig. 17).In addition, for the fifth soil layer, the mean and variance of ε POD-MM3 x g , defined analo- , with f POD-MM x g in Eq. ( 18) replaced by x g , are more uniformly smaller than ε POD-MM x g .Fine-resolution soil moisture fields retrieved using the siteindependent ROM are quite accurate (< 1.5 %) given the large topographic differences between site A and the remaining three sites.In other words, this approach led to an accurate fine-scale soil moisture prediction for a site that was excluded from the training data set, but did share some topographic features with sites that were part of the training data set.Our hypothesis is that the level of error from the siteindependent ROM is well below that required for an accurate prediction of soil moisture impacts on BGC dynamics.x g versus for different x g at site A site-independent ROM constructed using POD-MM and POD-MM3 methods, respectively.The means are taken over 1998, 1999, 2000, 2002, and 2006.For example, at a moisture content of 0.4 m 3 m −3 , a mean relative error of 0.017 corresponds to an error in moisture content of 0.007 m 3 m −3 , which will have negligible impacts on GHG emission predictions.
To improve the performance of the site-independent ROM, the topography of the subdomains must be more carefully parameterized and sampled, allowing the impact of topographic variations on soil moisture to be captured by the ROM.For the above example, a larger number of sites needs to be included in the training data.More generally, the inclusion of any spatially heterogeneous parameter requires proper parameterization and sampling of that parameter.Appropriate parameterization and sampling of a heterogeneous parameter is a research question that will be addressed in our future work.

Application to larger-scale hydrological simulations
The POD mapping method shows great promise in allowing prediction of fine-resolution soil moisture dynamics using coarse-resolution simulations.Here we applied a factor of 2 5 difference in resolution and achieved soil moisture simulation errors of < 0.06 % during 2 years that were not included in the training data set, with an effective decrease in computation time of more than a factor of 1000.If the above results hold for simulations that include more sources of heterogeneity in the subsurface (e.g., conductivity) and surface (vegetation) properties, integration of the relevant ROMs into a land model such as CLM will allow for a much finer representation of processes than is currently possible, without a drastic increase in computational cost.
The results indicate that the POD-MM is insensitive to fine-versus coarse-resolution simulation biases.This result is potentially useful in cases where we know coarseresolution solutions are biased.For example, Chen and Durlofsky (2006) showed that an adaptive upscaling technique for subsurface permeability was needed to correct for bias in coarse simulation of synthetic channelized reservoir.To demonstrate that POD-MM corrects bias in the coarse solution, let g bias = (1 + δ)g, where δ is a prescribed perturbation.We showed that α POD-MM is determined by solving for which the solution is equivalent to solving Eq. ( 8).Therefore, a constant bias will not affect the accuracy of our approximation.To further support the above analysis, we constructed and validated single-site ROMs constructed using g bias for 0.01, 0.05, 0.1, 0.2, and 0.3 and for all x g studied earlier.
The results agreed with our earlier analysis and the errors are the same as when there was no bias (Fig. 18, shown for x g = 8.0 m but similar behaviors were obtained for all other x g ).Small differences only emerge at large M due to overfitting, the same reason for which we observed fluctuations in Fig. 7.However, the above analysis does not apply to x g for the 20-25 cm layer computed over 1998, 1999, 2000, 2002, and 2006.
relies on the assumption that the coarse-and the fine-resolution means have negligible differences.Thus, any bias in the coarse-resolution mean will lead to a biased f POD-mean x g . Further study is needed to study the biases in site-independent ROMs and the effects of coarseresolution bias due to the upscaling of heterogeneous soil properties across scales.
The Arctic Tundra sites that we have studied have spatial extents that are smaller and landscapes that are less heterogeneous than domains studied in typical regional-and climatescale simulations.Although our results conceptually demonstrate that the POD mapping method can accurately reconstruct fine-resolution solutions from coarse-resolution solutions, further development is needed to generalize the technique to problems of larger extent and diversity.The development of a site-independent ROM is one of the first steps in achieving this goal.
For larger-scale simulations, the parameter space that we are interested in is expected to be significantly more diverse (e.g., larger variations in topography and multiple landscape types).A single ROM will typically be inefficient since a large number of bases would be needed to accurately approximate the response of a diverse parameter space.Partitioning of the parameter space will allow us to construct multiple ROMs that are tailored to each domain.Dividing the parameter space based on landscape types is one possible approach.Partitioning strategies, such as treed partitioning (Gramacy and Lee, 2008), can also help minimize the number of ROMs that we need to build.
Directly downscaling from a 10 km scale (climate-scale) to 0.01 m (BGC-scale) may not be possible, especially if simulation at the finest scale is infeasible on the spatial extent used to simulate the coarse-scale solution.The coarsescale solution may also have insufficient information to accurately reconstruct the finest scale solution.We propose a hierarchical approach that involves using POD-MM methods to develop ROMs at multiple scales; the scales at which these ROMs are built may critically depend on scales of the different processes we are modeling.The POD-MM reconstruction procedure is then recursively applied to reconstruct solutions at a progressively finer scale, starting from the coarsest scale solution.In addition, proper parameterization (such as parameterizing the topography) will allow finer-scale simulations to be performed on subsets of the original domain.
As with any sampling-based technique, the POD mapping method performs well only if the snapshots of the solution used to construct the ROM form an approximation space that can reasonably represent the solution.In the cases that we examined here, the annual cycle of the climate forcing does not change drastically from year to year and the response of soil moisture to climate forcing was relatively smooth.We thus obtained well-predicted solutions using only data from a period of 3 years to build the ROM.However, for a more diverse parameter space, relying solely on historical climate forcings is insufficient.Statistical or adaptive sampling techniques should be used to sample the parameter space to ensure that future conditions not represented by historical data are accounted for.Accurately defining the extent of the parameter space is crucial.In addition, just as with any data assimilation technique, the ROM must be updated when new information is available, or when the forcing moves outside of the phase space under which the ROM was developed.For example, if we are using the ROM at a parameter point far outside the convex hull of the parameter space used to construct the ROM, it is a clear indication that the ROM needs to be updated to reflect the change in the extent of the parameter space.
The current method can be efficiently deployed within the existing CESM framework.For the cases that we have examined, the ROMs for the subsurface processes can be developed without considering the full coupled system so that the fine-resolution solutions can be determined more efficiently.Once ROMs are constructed, coarse-resolution predictions of soil moisture can be mapped onto a fine grid to predict biogeochemical processes at higher spatial resolution, while the land-atmosphere interactions can still be modeled at a coarser grid.We will explore how such a ROM framework can be robustly implemented within the CLM model in future work.
Finally, while the computational costs of evaluating the ROM are typically low, the initial computational overhead required to construct the ROM can be large.High-performance computing resources are needed to simulate the potentially large number of simulations required.Storing and retrieving the simulated solutions will also require a good database management system and efficient parallel IO.

Conclusions
In this paper, we describe the construction of ROMs for land surface models based on POD-related methods.ROMs were built for soil moisture predictions from the PFLOTRAN model for the four NGEE-Arctic sites.An initial analysis based on the POD method is first used to determine whether POD-related methods can be used to accurately approximate the soil moisture.We then use four different methods that utilize coarse-resolution solutions to reconstruct fine-resolution solutions to construct single-site, multi-site, and site-independent ROMs.We evaluate their performance against fine-resolution simulations.Both the single-site and multi-site ROMs are very accurate (< 0.1 %) with a computational speedup greater than 10 3 .The site-independent ROM has a relative error < 1.5 % when it is used to assess a site that is not included in the ROM training.However, the overall error magnitude is still quite low given the large topographical differences across the sites, thereby giving creditability for using ROMs in larger-scale simulations.We provide several approaches by which we can generalize our methods to problems of larger extent and diversity in this paper.We thus conclude that the integration of ROMs into an Earth system modeling framework is practical and can provide an accurate approach to spatial scaling.
) of each year between 1998 and 2006.Evapotranspiration and effective precipitation boundary conditions for the PFLOTRAN simulations were obtained from offline simulations of the www.geosci-model-dev.net/7

Figure 1 .
Figure 1.DEM for site A, B, C, and D. The spatial extent of each site is 104 m × 104 m.

Figure 2 .
Figure 2. Variation of the mean POD error ( ēPOD ), with respect to number of bases (M), in year 2002 and 2006 for single-site ROM constructed using the POD method.

Figure 3 .
Figure 3. Elevation distributions of the DEM for sites A, B, C and D.
improved by utilizing more POD bases in the approximation.

Figure 4 .
Figure 4. Relation between mean soil moisture µ f (q) = µ θ (q) and α POD 1 (q) for sites A, B, C, and D. The lines are linear fits to the data (symbols).

Figure 5 .
Figure 5. POD-mean error (e POD-mean x g ) at sites A and D in years 2002 and 2006 for single-site ROM constructed using the PODmean method.

Figure 6 .Figure 7 .
Figure 6.The variation of mean POD-MM error ( ēPOD-MM x g ) with respect to M for different x g at sites A and D in 2006.Results are shown for single-site ROM constructed using POD-MM method.

Figure 8 .
Figure 8.The POD-MM error (e POD-MM x g ) in 2002 and 2006 at sites A and D for single-site ROM constructed using POD-MM method.

Figure 9 .
Figure 9. Solutions of f , g, and ε POD-MM x g for day 1 for year 2002 at site A and for three soil depths; the top, middle, and bottom rows correspond to layers 0-5, 20-25, and 45-50 cm, respectively, from the surface.

Figure 10 .
Figure 10.Solutions of f , g, and ε POD-MM x g

Figure 11 .Figure 12 .
Figure 11.The variation of mean POD-MM2 error ( ēPOD-MM2 x g ) with respect to M for different x g at sites A and D in 2006.Results are shown for single-site ROM constructed using POD-MM2 method.

Figure 13 .
Figure 13.The variation of mean POD-MM error ( ēPOD-MMx g ) with respect to M for different x g at sites A and D in 2006.Results are shown for multi-site ROM constructed using POD-MM method.

Figure 14 .Figure 15 .
Figure 14.The mean POD error ēPOD at site A based on a siteindependent ROM constructed using a POD method that only utilizes soil moisture solutions from sites B, C, and D.

Figure 16 .
Figure 16.The probability density function (PDF) of f , f POD-MM x g

Figure 17 .
Figure 17.The top row shows the 20-25 cm layer solutions of g, f , f POD-MM x g

Figure 18 .
Figure 18.The variation of mean POD-MM error ( ēPOD-MM x g ) with respect to M for different δ and for x g = 8 m; δ = 0 is the reference case where there is no bias.Results are shown for sites A and D for the year 2006.
MM, POD-MM2 and POD-MM3 methods, respectively.(Please refer to each method's subsection in Sect. 2 for more details on the above variables.)