Models of soil organic matter decomposition: the S OIL R package, version 1.0

. Soil organic matter decomposition is a very important process within the Earth system because it controls the rates of mineralization of carbon and other biogeochemical elements, determining their ﬂux to the atmosphere and the hydrosphere. S OIL R is a modeling framework that contains a library of functions and tools for modeling soil organic matter decomposition under the R environment for computing. It implements a variety of model structures and tools to represent carbon storage and release from soil organic matter. In S OIL R, organic matter decomposition is represented as a linear system of ordinary differential equations that generalizes the structure of most compartment-based decomposition models. A variety of functions is also available to represent environmental effects on decomposition rates. This document presents the conceptual basis for the functions implemented in the package. It is complementary to the help pages released with the software.


Introduction
Soil organic matter decomposition is a fundamental process within the Earth system (Swift et al., 1979;Schlesinger, 1997;Jacobson, 2000).Through this process, carbon and other biogeochemical elements fixed by plants in the process of photosynthesis are transferred to the atmosphere and the hydrosphere in mineral form.This release of biogeochemical elements is fundamental for other processes in the Earth system, such as the global energy balance, with important consequences for climate.Soil organic matter decomposition is also a basic process for the availability of biogeochemical elements necessary for plant growth, therefore it has important consequences for agriculture and humanity.
Given the importance of soil organic matter decomposition, many models have been developed describing its dynamics (Manzoni and Porporato, 2009), but only few attempts have been made to synthesize them (e.g.Paustian et al., 1997;Wu and McGechan, 1998;Manzoni and Porporato, 2009).Although more than 250 different models of soil organic matter decomposition have been proposed since the 1930s, most of these models share common mathematical structures (Manzoni and Porporato, 2009).This suggests that it is possible to develop models that can generalize most of the models already proposed.In fact, Ågren and Bosatta (1998) have made important contributions to a general theory of organic matter decomposition with the development of the continuous quality theory.
In most models, soil organic matter is usually characterized by compartments with homogeneous decomposition rates, which in the continuous quality theory are approximated by a continuous function between a rank variable denoted as quality and the decomposition rate.The continuous quality approach introduces a high level of generality, but it also introduces limitations in terms of finding analytical solutions for complex representations of organic matter heterogeneity (Sierra et al., 2011).For this reason, the continuous quality theory has only been implemented to describe the average dynamics of soil organic matter decomposition.Furthermore, the description of microbial dynamics in the continuous quality theory lacks the generality needed to encompass different mathematical representations of microbial-substrate interactions.
A general theory of soil organic matter decomposition can benefit greatly from a synthesis of the different modeling approaches already proposed.It would help to identify model

General information about SOILR and R
The modeling framework we describe in this document is implemented in the R environment for computing (R Development Core Team, 2011).However, numerical ecosystem models are frequently developed in low-level programing languages such as C or Fortran.There are many advantages of using these low-level languages, specially in terms of computational efficiency; however, they are difficult to learn for scientists not formally trained in programing.At a different side of the spectrum of programing languages is the R environment; a high-level language in which ease of use may compromise efficiency.For many applications in ecosystem modeling, the nature and size of the problems are usually not large enough for this to be an important issue.Ease of use however, has been a major constraint for a wide adoption of models in ecosystem science.
Another important issue for model development is accessibility.Models coded in licensed software impose limitations in accessibility and future developments of new tools and models.Open-source software is ideal for guaranteeing that code can be freely distributed, used, and modified by everyone.
SOILR was developed in the R environment for computing to provide simple access of soil organic matter decomposition models in an open-source platform where the code is freely accessible.To see source code and examples of the functions implemented in SOILR the user only needs to type the name of the function in the R command shell.To obtain more detailed documentation, the user can simply type a question mark (?) followed by a function name.
R allows the integration of concepts from functional and object oriented programing and can interface with many other low-level programing languages (Chambers, 2008).R also allows the development of software using concepts of literate programing (Knuth, 1984), allowing the production of code and documentation within the same environment.Models and data analyzes in R are therefore easily reproducible.
As an open-source tool, SOILR is also open for contributions by users interested in improving the existing code, make corrections on bugs in the code, add new functions, improve efficiency and functionality, etc.

Brief history of SOM modeling
Although early models of soil organic matter decomposition employed geometric series or difference equations (e.g.Nikiforoff, 1936;Jenny et al., 1949), the predominant mathematical formalism since the 1940s is that of ordinary differential equations (Manzoni and Porporato, 2009).Representing decomposition of chemical substances by differential equations was introduced much earlier than that (Van't Hoff, 1884).However, within the ecological disciplines, Olson (1963) presented the first comprehensive treatment of mathematical models of organic matter decomposition, popularizing the model where X is either oven-dry weight, organic carbon, or energy in organic matter; L is the income of organic matter; and k a decay constant.Equation (1) treats soil organic matter as one single compartment with an overall decomposition rate representative of all substances within the soil matrix.It has been commonly noted that soil organic matter is heterogeneous, and the single exponential model of decomposition fails to account for this heterogeneity (Minderman, 1968;Swift et al., 1979).Earlier, Henin et al. (1959) proposed a model to account for the different rates of decomposition of labile and stable material, also considering the process of humification, i.e. the transfer of material from the labile to the stable pool.This model can be expressed as where X 1 represents the labile pool and X 2 the stable pool.
The parameter α represents the humification or transfer rate.
A different version of a two-pool model has been widely used for studies of litter decomposition, in which the system of equations takes the form (Minderman, 1968;Means et al., 1985) In this case, the two pools decompose independently from one another and the amount of litter inputs L is partitioned between the pools according to the parameter γ .Different variations of these models can be found in the literature, with different number of pools and transfer among compartments.
Two numerical compartment models have become standard in representing organic matter decomposition, these are the RothC (Jenkinson and Rayner, 1977;Jenkinson et al., 1990) and the Century (Parton et al., 1987) models.These two models have been used successfully to represent soil carbon dynamics at different spatial and temporal scales (Paul and Clark, 1996;Paustian et al., 1997).Although these models were developed on the grounds of pragmatism rather than based on strict mathematical formalisms (Bolker et al., 1998), they can be easily translated into systems of differential equations with the general model (Bolker et al., 1998;Paustian et al., 1997) where θ is a parameter set modifying the decomposition rate k, and m the total number of compartments representing the system.
In general, the number of compartments in this type of models is less than 10 ( Manzoni and Porporato, 2009), and the decomposition rate constant may be a function of temperature, moisture, and/or other edaphic conditions.
We make use of this mathematical abstraction (Eq.4), to propose a general model of soil organic matter decomposition.

A general model of soil organic matter decomposition
Models of soil organic matter decomposition are, in their large majority, specific cases of linear dynamical systems (Bolker et al., 1998;Manzoni and Porporato, 2009;Luo and Weng, 2011).Making use of this property, we propose a model that generalizes the majority of all previously proposed compartment models.This general model is given by where C(t) is a m × 1 vector of carbon stores in m pools at a given time t; A(t) is a m × m square matrix containing time-dependent decomposition rates for each pool and transfer coefficients between pools; and I (t) is a time-dependent column vector describing the amount of inputs to each pool m.
The matrix A(t) is particularly important because it defines both the model structure and the extrinsic effects on decomposition and transfer rates.For this reason we rewrite Eq. ( 5) as where ξ(t) is a time-dependent scalar containing the extrinsic or environmental effects on decomposition rates.Notice that the matrix A contains now constant coefficients defining model structure.
From this general Eq. ( 6), it is possible to derive a large variety of structures for compartment models.

The matrix A and model structure
Organic matter decomposition can be represented with a large variety of model structures and levels of connectivity among compartments (Swift et al., 1979;Bruun et al., 2008;Manzoni and Porporato, 2009).Different model structures are determined by the matrix A in linear dynamical systems (Bolker et al., 1998;Manzoni et al., 2009).For instance, the parallel or pure decay structure (Fig. 1) in compartment models is defined by a diagonal matrix of the form , where the entries in the diagonal represent the decomposition rate k j for each compartment j .A required condition is that all k j ≥ 0.
Compartments connected in series (Fig. 1) can be represented with a matrix of the form where the entries a i,j are the transfer coefficients of material from pool j to pool i.A required condition is that all a i,j ≥ 0. Similarly, feedback between adjacent compartments (Fig. 1) is defined by a matrix of the form More complex model structures are created by replacing zero entries in the matrix A, representing transfers between different compartments, i and j .
An important characteristic of the entries a i,j is that they are proportional to the decomposition rate, i.e. a i,j = α i,j k i , where α i,j represents the proportion of the decomposition rate that is transferred to pool i from pool j .Furthermore, 0 ≤ α i,j ≤ 1, and the column sum i α i,j ≤ 1, with representing the proportion of the decomposed material that gets released from the system from pool j .

The environmental term ξ(t)
The majority of organic matter decomposition models include functions f (x) that modify decomposition rates according to a set of time-varying environmental conditions {x 1 (t) . . .x n (t)}, such as temperature, moisture, evapotranspiration, etc. (Burke et al., 2003;Adair et al., 2008).In our model (Eq.6), the representation of these environmental effects is done with the term ξ(t), which is the result of the evaluation of the function or set of functions f (x i (t)), yielding a scalar value that can be directly multiplied to the matrix A. In this case, The values of f (x i (t)) are determined by different functions that depend on temperature, precipitation, and other environmental variables.Time-dependence is, therefore, introduced with time series of these environmental variables as input in the model.SOILR contains a library of functions that calculate environmental effects on decomposition rates based on functions reported for different models (Table 1).
More complex functions are also introduced in SOILR but are not included in Table 1 due to space limitations.The documentation and help files of SOILR contain a more detailed description of all functions.

Initial conditions
The linear dynamical system represented by Eq. ( 6), has many different solutions, but we are only interested in the solution that satisfies where C 0 is a m×1 vector with the value of carbon content in the different compartments i. C 0 must be specified in SOILR to run any possible model structure.

The vector of inputs
Inputs to the system from above and belowground components are represented by the vector I (t).This vector can also  2001) be expressed as where I (t) is a time-dependent scalar representing the total amount of inputs and the coefficients γ i represent the partitioning among the different pools.In this representation 0 ≤ γ i ≤ 1, and m i=1 γ i = 1.

Carbon release
A variable of interest in modeling soil organic matter decomposition is the amount of carbon leaving the system over time either in the form of CO 2 gas or as dissolved organic carbon.
We represent this flux with the general term r, which is given by where r is a m×1 vector containing the instantaneous release of carbon for all pools, and R is a m×m diagonal matrix with the release coefficients r j in its diagonal calculated from (8).

Analytical solution
Analytical solutions to Eq. ( 5) are implemented in SOILR only with the purpose of testing the performance of the numerical methods.However, we can only test cases under certain simplifications of the general model of Eq. ( 5).In particular, for a homogeneous system with constant coefficients, which is analogous to the decomposition of a single cohort www.geosci-model-dev.net/5/1045/2012/Geosci.Model Dev., 5, 1045-1060, 2012 of organic matter ( Ågren and Bosatta, 1998); Eq. ( 5) simplifies to With initial conditions as in Eq. ( 10), the analytical solution to this problem is given by If I (t) is not identically zero, then the solution of the linear system with initial conditions as in Eq. ( 10), is given by

Numerical implementation
The solution to the dynamical system described by Eq. ( 6) is discretized over time, with h denoting the time step and n the number of steps.The time step h may or may not be constant.Initial conditions are given at time t 0 = 0.The solution to the system is then given by where D r [f (C n ), h] is an r order finite difference approximation to the system of ODEs of Eq. ( 6) for each time step h (LeVeque, 2007); in other words, an ODE solver.
The choice for the ODE solver is flexible in SOILR.Currently, we provide the option to use a simple Euler forward method or an interface to the deSolve package of Soetaert et al. (2010).The function deSolve.lsoda.wrapper in SOILR, is a wrapper to the function lsoda in package deSolve.

The Model class
The general numerical model described by Eq. ( 6) is represented in SOILR by an object oriented design centering around the class Model.Nearly all computations possible in SOILR are facilitated by Model itself or its building blocks.The class defines the information all Model objects contain and how those objects behave when (generic) functions are applied to them (Chambers, 2008).Any model constructed with SOILR provides then a stable and consistent interface that encapsulates details about its implementation.This interface provide a number of advantages to the user: 1.A level of abstraction close to the mathematical description of the model being implemented, focusing on the scientific content rather than on technical details.
2. Stability of code generated by the user based on SOILR output.We believe that the more essential and less redundant the interface, the less likely it is to change in future revisions.
3. Safeguards against unreasonable (and therefore probably unintentional) input, from simple dimensional checks to mathematical consistency of all arguments of the general model described in Sects.2.2 through 2.7.
4. Smaller programs that are easier to read.
Developers also benefit from this implementation design by: 1. Freedom to change the implementation behind the scenes without breaking user code.
2. Easier maintenance, testing, bug tracking, and exception handling by reduced duplication Objects of class Model are initialized in SOILR by various functions, which are listed with the command ?Model.Most of them provide shortcuts for the construction of standard models such as those in Fig. 1.These functions usually do not require the user to consider the real building blocks of a Model object, but rather infer the needed information from objects more common in R such as data frames.However, all these high-level functions call GeneralModel, which is the most general constructor.The function GeneralModel initializes the Model object after applying various validity checks, but does not run the model itself, which instead is done by specific functions (methods) applied to objects of class Model.
The function GeneralModel takes five arguments representing a specific case of Eq. ( 6). 1.A vector t which contains the time values where the solution to the ODE system is sought.It can be of any length but must be of class "numeric".
2. An object of class "TimeMap" which implements the matrix A of Eq. ( 5) as a function of time (including its domain) thus allowing the same level of abstraction as the most general ODE solvers (Soetaert et al., 2010).
The class TimeMap is native to SOILR and simply extends a common R function to a mathematical valid function definition that includes the time domain where the function is defined.Thus it contains the necessary information to prevent extrapolations beyond the range of input data.Details about TimeMap are discussed in Appendix A.
3. A vector of class "numeric" containing the initial values C 0 of the ODE system.
4. Again an object of class "TimeMap" containing the inputs I (t) to the system as a vector valued function of time.The length of this vector must be equal to the dimension of the matrix A, and is checked as well as the time domain.
Once a new object of class Model is intialized from a call to GeneralModel, or one of its various wrappers, the new object can be queried by several generic functions.For example, to obtain the amount of carbon over time solving the system of ODEs, any object of class Model (and therefore all its subclasses) can be used as argument of the function getC.The call to this function returns a n × m matrix with the amount of carbon for each pool m at each time step n.Similarly, to obtain the amount of carbon release over time, the function getReleaseFlux can be called with any object of class Model as argument.The result will be a n × m matrix with the amount of released carbon for each pool m at each time step n.It is also possible to use the typical operators [ ] and $ for our Model class given access to methods even easier.For example, getC(mod) is equivalent to mod$C and something like: df=as.data.frame(cbind(getTime(mod),getC(mod))) can also be expressed as This avoids the necessity of the somewhat dangerous use of the @ operator in user code, which we strongly discourage, since it attaches the code to implementational details that may change in the future.
The implementation of specific models as subclasses of Model with methods for the generic functions will allow the integration of new functionality without major modifications to our current implementation.For example, once nutrient cycling and isotope dynamics are incorporated into SOILR, new methods will be developed independently without major modifications to the current implementation of carbon stocks and release.
The vignette GeneralModel provided with SOILR presents some further insight in the implementation of specific model structures with the help of the general tool-set provided by the Model class.The examples therein may also serve as templates to implement new model structures as desired by the user.Examples on how to use these model structures as a function are presented in Sect. 4.

Version control system, unit testing, and automatic documentation
The development of SOILR is aided by a significant amount of existing open-source software.To solve the ordinary differential equations produced by our framework, we rely on the well tested and documented deSolve package developed by Soetaert et al. (2010).We also use the open-source symbolic python library SymPy (SymPy Development Team, 2008) to compute analytical solutions for the models for which this is possible.The analytical solutions obtained from SymPy are used to automatically create unit tests for SOILR.
To constantly run these tests, we use another open-source software, the Runit package (Burger et al., 2009).The tests are distributed with the release version of SOILR and thus add to the transparency of its development.In addition, we use Sweave (Leisch, 2002(Leisch, , 2003) ) and the inlinedocs package (Hocking et al., 2012) to produce documentation and encourage a literate programing style (Knuth, 1984).
As a version control system, we use Mercurial (O'Sullivan, 2009) and the Trac (Edgewall Software, 2011) online project management tool, which includes ticket system, wiki, and online access to our source code.

Documentation
There are different types of documentation for SOILR.The first source of information is this document, which introduces the science and some general technical details.A second source of information is the documentation to each function provided within the package itself.To view this documentation, the user only needs to open R and type help.start().This will open a help window on a webbrowser.There the user only needs to go to Packages/SoilR to view a list of all the functions implemented.Clicking on each function will show details about the arguments of each function and examples on how to use them.For specific functions, the user can also just type the name of the function preceded by the question mark on the R command shell.For example, typing ?TwopParallelModel in the R command shell will open a help window with the description of the function.
To view the source code of the function, the user only needs to type the name of the function (without the question mark) on the R command shell.
A third source of information are the so called Package Vignettes.These are short documents illustrating the use of the package for specific purposes.Currently, we provide one vignette with version 1.0 of SOILR.This vignette illustrates the implementation of any model structure within SOILR.For future versions, we will provide vigenettes about fitting specific model structures to data and how to use SOILR for modeling radiocarbon.

Installing and loading SOILR
SOILR can be obtained from the Comprehensive R Archive Network (CRAN), the official repository for R packages with mirrors in places all over the world.Packages stored in CRAN can be downloaded directly from an R session.It can also be obtained from R-Forge, a repository for package developers.To install SOILR from CRAN, the user simply needs to type in the R command shell install.packages("SoilR").To install from R-Forge, the statement is install.packages("SoilR",repos="http://R-Forge.R-project.org").After installing the package, simply type library(SoilR) and the package is loaded into your R session.

Examples
In this section we present examples on how to run some of the functions implemented in SOILR based on the theoretical framework presented previously.Additional details about the implementations of each function and instructions on how to implement new model structures are presented in the vignette 'Implementing Compartment Models in SOILR: the GeneralModel Function' provided with the package.To view this vignette, simply type vignette("GeneralModel", package="SoilR") in the R command shell.

Implementation of a two pool model with connection in series: the ICBM model
One of the first models ever proposed to represent soil organic matter dynamics was a two-pool model with connection in series (cf.Eq. 2, Henin et al., 1959).More than 50 yr later, Andren and Katterer (1997) proposed the ICBM model, which is practically the same model proposed earlier by Henin et al. (1959), but including a term for temperature and moisture dependence of decomposition rates.The set of differential equations of the ICBM model are given by where α is a humification or transfer coefficient and ξ a parameter representing external effects on decomposition rates.
In the ICBM model, C 1 represents a "young" pool and C 2 an "old" pool.This set of equations can be rewritten using our model formulation, which gives with initial conditions (C 1,0 , C 2,0 ) T , and where with a 2,1 = αk 1 , γ = 1, and ξ(t) = ξ .The function ICBMModel in SOILR implements this model structure requiring as its arguments: (1) a vector of any length with the points in time when we are interested in finding a solution, (2) a column vector of decomposition rates (k 1 , k 2 ) T , (3) the value of α, (4) the value of ξ , (5) a column vector with the initial amount of carbon at the beginning of simulation (C 1,0 , C 2,0 ) T , and (6) the mean annual carbon input to the soil I .Andren and Katterer (1997) provided values for these arguments from a 35-yr field experiment manipulating carbon and nitrogen inputs to an agricultural soil in Sweden.For the case of a treatment in which the soil was left as bare fallow without N or C inputs, the ICBM model can be parameterized as with initial conditions (0.3, 3.96) T .Assuming SOILR is already installed, it is only necessary to write the following lines of code to run the ICBM model library(SoilR) times=seq(0,20,by=0.1) Bare=ICBMModel(t=times, ks=c(k1=0.8,k2=0.00605),h=0.13, r=1.32, c0=c(C10=0.3,C20=3.96),In=0) The call to ICBMModel simply initialize, the model and checks for consistency on its arguments.In this example, the decomposition rates are given in units of year −1 and the initial amounts of carbon in units of kg C m −2 .
To obtain the amount of carbon over time it is necessary to invoke the function getC storing the output into an object.For example, to store the amount of carbon from the object Bare the user can type CtBare=getC(Bare).
We implemented the different N and C treatments reported in Andren and Katterer (1997) from the set of parameters reported by those authors.The code necessary to reproduce Fig. 2 in Andren and Katterer (1997) is provided as an example with the function ICBMModel and can be accessed by  (1997).This figure can be reproduced typing example(ICBMModel) or attr(ICBMModel,"ex") in SOILR.
typing example(ICBMModel), or from the html help in R.This code produces Fig. 2, which is identical to Fig. 2 in Andren and Katterer (1997).

Alternative two-pool models
The ICBM model described in the previous section, although useful, does not offer too much flexibility in terms of the input arguments.For example, the litter inputs to the system could vary over time as well as the temperature and moisture effects on decomposition rates.In addition, there are other possibilities to implement a two pool model depending on the type of connection between pools (Fig. 1).
The parallel pool model structure can be implemented with the function TwopParallelModel, while a more general version of a series model structure can be implemented with the function TwopSeriesModel.Similarly, the feedback model structure can be implemented with the function TwopFeedbackModel.The inputs of litter and the modification of decomposition rates by external factors can be either constant or a function of time.Furthermore, the functions presented in Table 1, and their combinations, can be used as arguments in the different model structures providing a large variety of options to model decomposition with just two pools.In fact, the same flexibility can be obtained with any number of pools with the application of the more general function GeneralModel.
As an example, we show the differences obtained by running three different versions of a two-pool model with the same amount of carbon at the beginning of the simulation, similar rates of litter inputs, and equal decomposition rates (Fig. 3).As an illustration, we also ran the simulations with different options for the time dependence of the litter inputs and the decomposition rates.In the first simulation, we ran a model with a structure of parallel compartments.In this simulation, the litter inputs and decomposition rates were constant, but the decomposition rates were modified by average values of temperature and moisture according to the functions proposed in the Daycent model (Table 1).For the second simulation, we ran a two-pool model with connection in series introducing temporal variability in the amount of inputs using a sine function that artificially represents an annual cycle.In the third simulation, we ran a two-pool model with connection in series among compartments.The amount of litter inputs over time were calculated using random numbers over time.In this simulation, we also produced random numbers of temperature and moisture and applied the functions to modify decomposition rates according to the Century and the Demeter models (Fig. 3).The code to reproduce Fig. 3 is provided in the example of the function TwopFeedbackModel.
These simulations, without being necessarily realistic, simply show that small differences in model structure can produce very different predictions, even when the main parameters of the model remain unchanged basic functions of SOILR to represent the process of organic matter decomposition over time.
To implement more sophisticated models with a higher degree of complexity, it is possible to specify a larger amount of pools with complex functions representing the dependence of litter inputs, decomposition rates, and transfer between pools with other external variables, such as temperature, moisture, soil texture, nutrient status, among many other.

Implementation of the RothC model
RothC is a popular and widely used model for predicting organic matter dynamics over time.Although earlier versions of the model included five active pools and one inert pool (Jenkinson and Rayner, 1977), more recent versions only include four active pools plus the inert pool (Jenkinson et al., 1990).RothC is implemented within SOILR and we provide details here about this implementation to illustrate the use and potential implementation of any other model.RothC can be described by the following set of differential equations where C 1 represents the decomposable plant material (DPM) pool, C 2 the resistant plant material (RPM) pool, C 3 the microbial biomass (BIO) pool, C 4 the humified organic matter (HUM) pool, and C 5 the inert organic matter pool (IOM).This set of equations can be rewritten in the form or as The values of the decomposition rates are constant and given by: k 1 = 10, k 2 = 0.3, k 3 = 0.66, and k 4 = 0.02 (Jenkinson et al., 1990;Coleman and Jenkinson, 1999).The value of the transfer coefficients is determined by a function of soil texture.For the microbial biomass pool, transfer coefficients are calculated as where x is a value that determines the proportion of decomposed material that is respired as CO 2 and is given by where pClay is percent clay in mineral soil (Jenkinson et al., 1990).Similarly, the transfer coefficients for the humified pool are given by a 4,j = k j 0.54 The partitioning of incoming plant material is determined by the ratio DPM/RPM, which in RothC is set as 1.44.Therefore, γ = 0.59.Now, the basic structure of the RothC model can be written as The annual amount of inputs is set in the RothC model as With this parameterization, it is possible to run the RothC model without varying environmental effects on decomposition rates and observe how the system approaches steadystate for the different pools (Fig. 4).Parameter values, initial conditions, and litter inputs can be changed easily within SOILR to compare different predictions from this model.

Applications for large-scale modeling
R can handle a large variety of data structures, which offers interesting possibilities for the application of the functions implemented in SOILR.One important type of data structure is spatial data with global coverage.There is a large number of R packages to import and manipulate spatial data, which facilitates the technical aspects for running SOILR functions at the global scale.
As an example, we show here a simple calculation of climate decomposition indexes (CDIs) (Adair et al., 2008) using a global dataset of temperature, precipitation, and potential evapotranspiration available at 0.5 degree resolution on NetCDF files from the WATCH dataset (Weedon et al., 2011).The input dataset contained monthly average temperature, precipitation and evapotranspiration for the period 1958 to 2001.We calculated the CDIs as with f (T ) implemented by the function fT.Century1, and f (W ) by the function fW.Century as described in Table 1.Results can be easily plotted on a map (Fig. 5) and exported again to NetCDF files.Different combinations of the functions described in Table 1 can be used to calculate CDIs (Adair et al., 2008) and evaluate different hypotheses about the response of decomposition rates to moisture, temperature, and other variables.

Discussion
Many models of soil organic matter decomposition have been proposed previously, and there even exist some open-source tools to implement some of these models (e.g.Easter et al., 2007).We have developed a tool for implementing and running a large variety of these models with the idea of facilitating comparison among multiple models in an easy to use interface.In this section, we discuss some of the advantages and disadvantages of our approach.

Parameter space and structural domain
Models of organic matter decomposition are basically hypothetical abstractions about the structure and dynamics of soil organic matter.The multitude of models previously developed suggests that there exists a large number of hypotheses about the structure and functioning of soil organic matter, but there is basically little consensus on whether one model structure (hypothesis) would have more support on observations than other model structures (Manzoni and Porporato, 2009).The majority of modeling studies have focused on finding the set of parameter values of a particular model that best fit some observed data.This approach is useful and has provided much insight on understanding the rates of soil organic matter decomposition.However, from the perspective of assessing different hypotheses about the structure and dynamics of soil organic matter, parameter estimation can only give a narrow view of the more complex spectrum resulting from www.geosci-model-dev.net/5/1045/2012/Geosci.Model Dev., 5, 1045-1060, 2012 the combination of structure domain and the parameter space of models.SOILR provides the possibility of assessing both model structure and parameter values broadening the spectrum of ideas that can be assessed within one single analytical framework.
We believe this approach complements well, and could be even more powerful than previous approaches to assess performance of model structure with inter-comparisons among different modeling groups (e.g.Melillo et al., 1995;Wu and McGechan, 1998;Cramer et al., 2001;Randerson et al., 2009).Multi-model inter-comparison projects do not necessarily cover the whole domain of model structures, and may be subject to important issues, such as independence of code, bias of the whole model ensemble, inappropriate metrics to define model performance, etc (Knutti et al., 2009;Knutti, 2010).SOILR can partially help to overcome some of these problems for assessing the performance of model structure through the option of using alternative functions to represent the same process.Philosophically, this approach is also similar to testing multiple working hypotheses.

Model hierarchies and functional programing
The gap between simulations and understanding described by Held (2005) is currently exacerbated by the continuous increase in detail and complexity of simulation models.Held (2005) suggests that a way forward to close the gap between simulations and understanding is by the development of model hierarchies in which large-scale complex models are particular cases of general models that are more amenable for understanding of system structure and behavior.
SOILR can also be viewed as a system for hierarchical modeling of soil processes.Consider for example, the environmental or external effects on decomposition rates, which here are denoted by the term ξ(t).In its more simple and general case, the external effects can be simply a constant (ξ(t) = c) that allows the understanding of model behavior without changes in environmental conditions.Simulations can then be run with a changing environment, for example with variable soil moisture (ξ(t) = f (W (t))).Soil moisture could depend on other variables, such as precipitation and potential evapotranspiration (W (t) = f (P (t), PET(t))), which in turn can be dependent on other functions.In this form, a hierarchy of models is build with the dependence of different functions on other functions.
In terms of programing, this concept of model hierarchies can be easily implemented in a functional programing style.A function that performs certain tasks can have as its arguments other functions that perform other tasks.These functions can be independent among each other so many different functions that perform the same task can be available within the same modeling environment.This is one of our goals with SOILR, to provide a modeling environment that serves as a repository of different functions that can perform the same task so their performance can be easily compared.It also allows building soil organic matter decomposition models in a hierarchical framework because functions can have a large number of dependences on other functions creating a bridge between simple general models and detailed modeling constructions under the same basic principles.

Scope and limitations
Our approach to soil organic matter decomposition modeling with SOILR may have some limitations.One potential limitation is the use of a high-level programing language that can have issues in terms of computational efficiency.This could be an important problem if specific models structures with a large number of computations per time step are applied to a large number of points such as a large spatial grid.Other programing languages such as C or Fortran may be more suitable for these tasks, although a good programing style can avoid many issues of computational efficiency in R. It is important to keep in mind that the scope of SOILR is more for the exploration of different model structures and hypotheses about soil processes rather than for large computational tasks; however, some parallelization tools within R could be used for this purpose.We recommend that once a specific model structure is identified as useful for a large computation, the entire model object is translated to other language or optimized for running in R. We are exploring these possibilities to include in future releases of SOILR.
Another potential limitation is the incompatibility of our conceptual framework to represent the decomposition process as a non-linear dynamical system.Non-linear dynamics are important for representing microbial processes such as priming, and are very relevant for simulations at short time scales (Wutzler and Reichstein, 2008;Manzoni andPorporato, 2007, 2009).Non-linear models however, can be easily implemented in R with the deSolve package.We are considering two possibilities to include non-linear dynamical systems within our theoretical framework.One possibility is to provide a framework to specify and linearize non-linear systems and solve the linear equations as presented here.Another possibility is to include a set of non-linear models solving them directly with deSolve.We aim for further releases to include this additional functionality.

Conclusions
We have developed a modeling environment for representing the process of soil organic matter decomposition.This tool, SOILR, is an open-source package for the implementation and testing of different representations of the process of soil organic matter decomposition.The main characteristic of SOILR is its hierarchical structure in which we describe a general model that can accommodate any possible model structure of a multi-pool model of decomposition.
More detailed models can be implemented to simulate specific controls on the decomposition process.This allows for testing of multiple working hypotheses about the structure and functioning of soils and their behavior over time.SOILR not only allows for exploring dynamics on the parameter space of a model but also on the structural domain.
This first version of SOILR allows simulations of organic matter decomposition and future versions will include representations of other biogeochemical elements such as nitrogen and phosphorus as well as their isotopic composition.A module for parameter fitting will also be included in future releases.

Appendix A Implementation details and design decisions regarding SOILR A1 The choice of the object-oriented programing (OOP) system
While in object oriented languages the implementation of object oriented design principles is usually hardwired, in R it is a choice between at least three different approaches, thus forcing the user to explain why a particular system was used.
In our case the choice of S4 is a compromise.On one hand there is very limited support in S4 for real encapsulation, expressed most notably by the possibility to read and even write the values of slots from outside the class by the slot function or the @ operator, respectively.This default behavior is actually not even trivially changed.On the other hand there are some important advantages: 1. Compatibility with the functional paradigm of R and its associated benefits such as lazy evaluation, concurrency and in the future maybe even automatic parallelization, which depends on the absence of side effects (unavoidable with a reference based system like R5).
2. The integration with R's standard library, which increases a. the probability of future maintenance, and b. acceptance by other R users.
3. The relative seamless integration to common R workflows without in depth knowledge of object oriented programing.
Considering that if R is capable to implement an OOP system by means of on board tools it will probably be flexible enough to change the default behavior of this OOP system with on board tools as well.SOILR is a rather small package and we think S4 is and appropriate choice for our needs.

A2 Comparison between SIMECOL and SOILR
SIMECOL (Petzoldt and Rinke, 2007) an R package for the implementation of ecological models, is an alternative to our own implementation.Here we compare the two approaches in terms of their design.

A2.1 Scope
SOILR is focused solely on soil organic matter decomposition, while SIMECOL aims to be a general framework for ecological modeling in R. SIMECOL represents an abstraction level somewhere between SOILR and R's S4 system.Hence, the obvious way for SOILR to make use of SIMECOL would be to implement our Model class as a subclass of SIMECOL 's simObj.One motivation to do so consists in the possibility to inherit functionality from the ancestor which is usually additionally shared by other subclasses thus enabling easier communication between them.Whether subclassing is the appropriate strategy thus depends on the amount of inheritable code and the number of subclasses.We found that within SOILR, Model would be the only one subclass of SIMECOL's model with nearly no code shared owing to differences in functionality and design goals (see below).

A2.2 Functionality
One major difference with SIMECOL's simObj is that SOILR's Model initialize method requires objects of class TimeMap or its subclasses, thus guaranteeing that the different objects conforming a model share a common support in the time domain; even if the creation of TimeMap objects usually takes place far away from the initialization of a Model object since this usually happens behind the scenes, e.g. by conversion of data.frames the user provides as input.This is a significant difference from SIMECOL's approach.Implementing this functionality in a subclass of SIMECOL's model would be misleading and would not conform well with our design goals.

A2.3 Design goals
The main aim of SOILR's Model class is to provide a fence between interface and implementation.While the interface should change as little as possible to protect code using the class, we want to have as much freedom as possible to change its implementation.We strive to achieve this by avoiding every possible syntactic distinction between references to object data and object methods from the outside.With R's S4 system, one (and maybe the only) way to do so is to wrap all access to object data with get and set methods.This is one aspect of a concept often referred to as encapsulation and regarded as one of the key components of OOP.In languages such as Smalltalk (Kay, 1993) or Ruby (Matsumoto and Ishituka, 2002), fully dedicated to OOP, this approach is mandatory: A method call (a message send to an object) is www.geosci-model-dev.net/5/1045/2012/Geosci.Model Dev., 5, 1045-1060, 2012 the one and only way for objects to communicate with each other.The S4 approach in R is much less strict and allows direct manipulation of the data belonging to an object by means of the operator @ (or the slot function respectively), thus revealing the implementation of an object to the outside code using this object.User code based on @ will break if the implementation changes and the desired slot disappears, while a get method may be able to retrieve the desired information from other data.The use of the slot operator from outside should therefore be avoided (Genolini, 2008;Gentleman, 2003).Imagine for instance a circle object c that stores its radius and computes the diameter when getDiameter(c) is called.If the radius is accessed solely by calls of getRadius(c) the implementation of the circle class can later be changed to store the diameter and compute the radius without any affect to outside code using the class.
Applications to SOILR's Model occur if one considers the increasing number of constructors with redundant parameter sets.The distinction between parameters and derived variables may even become difficult.Differing from SIMECOL we unify access to Model elements and avoid a syntactic difference of parameters and derived values as expressed by SIMECOL's functions params and out.In addition to the get... functions we provide for all data intended to be accessed from outside, access through the typical operators [] and $ for our Model class.This gives access to all data the model can provide, either during the initialization or computed by the methods of the class.In fact, [] and $ internally use the get... methods and may perform any number of checks and computations to provide a consistent output.This facility is powerful enough to avoid the use of @ completely for our class.This is exactly our intention, since the implementation of the class may change, accessing its elements via @ is strongly discouraged.While we will maintain the defined interface via [], $ and the get... functions we cannot guarantee that code relying on @ will work in future versions of SOILR.
Another difference with SIMECOL is also related to the [] operator.It enables the user to create the desired output on demand, thus avoiding duplication between output objects.If the user asks for "C" she/he gets "C" and not "time" and "C", which sometimes may not be intended if the user is concerned about issues of memory or duplication.However, it may be very useful to have both in other situations when the "times" argument to the constructor that created the model in the first place is not available anymore.With [] the user can decide which fields the output should contain.
An additional intended feature of our [] operator is to always produce data.framesand not a "Result" object that would (or at least should) be more protective of its data against alterations post simulation than a data.frame.If we provided code to post-process results we would certainly implement such a "Result" class to make sure that our post-processing is based on valid data.Since at the moment we don't, we also do not force a structured output object to the user.
At the moment we do not even provide output as deSolve objects, since every structured output is part of the interface, and may be relied on by user code.Although we are happy and thankful to use deSolve extensively within SOILR, making it a part of the interface would be a promise to the user that we will keep on supporting it, which is a much bigger commitment.
Until now the responsibility beyond the scope of class Model stays with the user.

Fig. 1 .
Fig. 1.Basic model structures implemented in SOILR.Squares represent the compartments, and arrows represent inputs and outputs to and from the compartments.These model structures are special cases of the matrix A.

Fig. 3 .
Fig. 3. Examples of three different representations of a two-pool model with different model structures and environmental effects on decomposition rates.The upper panel shows carbon stocks and the lower panel carbon release.Additional details about the implementation are given in the text.
. The simulations also serve to illustrate different possibilities in the use of the www.geosci-model-dev.net/5

Fig. 4 .
Fig. 4. Carbon accumulation for the different pools included in the RothC model.DPM: the decomposable plant material pool, RPM: resistant plant material, BIO: microbial biomass pool, HUM: humified organic matterpool, and IOM: inert organic matter pool.

Fig. 5 .
Fig. 5. decomposition index (CDI) calculated as the product of a function of temperature (fT.Century1) and a function of precipitation and potential evapotranspiration (fW.Century) using monthly data from the WATCH dataset(Weedon et al., 2011).

Table 1 .
Functions implemented in SOILR to represent the effects of temperature T , and moisture W on decomposition rates.