Flux towers measure ecosystem-scale surface–atmosphere exchanges of energy,
carbon dioxide and water vapour. The network of flux towers now encompasses
Land surface models (LSMs) provide the lower boundary condition for climate and weather forecast models, simulating the exchange of carbon, water and energy between the soil, vegetation and the atmosphere (Pitman, 2003). Flux towers measure ecosystem-scale exchanges of carbon dioxide, water vapour and energy (Baldocchi, 2014) and have proven invaluable for LSM evaluation and benchmarking (Abramowitz et al., 2008; Best et al., 2015; Blyth et al., 2010; Haughton et al., 2016; Luo et al., 2012; Williams et al., 2009). Flux towers are particularly useful for modelling applications as they provide simultaneous observations of the meteorological data needed for forcing offline models as well as the key ecosystem variables against which models may be evaluated (e.g. sensible and latent heat) at time intervals similar to those used by LSMs, often over multiple years. As such, they are ideal for characterising the interactions between climate and ecosystem processes, and allow the evaluation of LSMs over time periods ranging from subdaily through to seasonal and interannual timescales (e.g. Blyth et al., 2010; Bonan et al., 2011; Mahecha et al., 2010; Matheny et al., 2014; Powell et al., 2013; Ukkola et al., 2016; Wang et al., 2011; Whitley et al., 2016).
The investment in flux tower measurements is considerable and there are multiple benefits to these data being more widely used. First, the use of these data for LSM evaluation and benchmarking helps realise the value of existing investments. Second, where flux tower measurements identify biases in how LSMs represent processes, the potential exists to improve how well these models simulate the surface energy, water and carbon balances. Since LSMs are central to the simulation of key phenomena including droughts, water resource availability, carbon storage and feedbacks on heat waves, this has direct policy implications. Third, greater use of flux tower measurements by the LSM and climate science community could strengthen the case for ongoing resourcing of flux tower measurements. In short, the effective and widespread use of flux tower measurements is beneficial across the science and policy communities.
Before data from flux tower sites can be used in models, they commonly require significant preprocessing. In principle, flux towers provide near-continuous observations of ecosystem fluxes but, in practice, the measurements often include discontinuities due to instrument failure or unfavourable weather conditions (Reichstein et al., 2005). As LSMs must be provided with continuous meteorological forcing data, flux tower datasets require varying degrees of gap-filling of missing time steps. Data gaps also pose challenges for using these data for model evaluation and benchmarking. Ideally, models should be evaluated against high-quality observations. Due to data gaps, as well as measurement biases (e.g. Leuning et al., 2012), flux tower measurements do not provide reliable observations representative of the true ecosystem dynamics in all circumstances. Arguably, therefore, the full breadth of flux tower data available across the entire network is unlikely to be suitable for evaluating LSMs.
FLUXNET, an international network of flux tower sites, is comprised of
In an effort to resolve some of these problems and to connect the flux tower
researchers with the LSM researchers more strongly, we present the R package
“
The package offers a useful tool for post-processing eddy covariance datasets for modelling applications and simplifies rigorous documentation of data processing methods in LSM studies to enhance their reproducibility. Specifically, future studies using these data can explicitly document how the data were selected, gap-filled and quality controlled, so that other users can reproduce the processing. In the following sections, we describe the different functionalities of the package.
General workflow of the
The
The package has two processing streams: the collection of site metadata and the processing of high-frequency, time-varying variables. These are described in Sect. 2.3 and 2.4, respectively. The package outputs separate NetCDF files for meteorological and evaluation variables, with metadata stored in each file. Additionally, a log file is produced detailing output file names and any warnings and errors. The package also provides the option to produce diagnostic plots for further data exploration. Figure 1 illustrates the general workflow with each component described in detail below.
The
Input arguments to the main
The package is run by invoking a single R function called
Site metadata provided with the package. All attributes are provided for each Tier 1 site, with the exception of tower and canopy height.
The package collates metadata on the flux tower sites and stores these as
attributes in the output NetCDF files. These include information required
for modelling such as site coordinates, elevation and vegetation type. The
primary source for metadata is a site attribute file provided with the
package (stored in
Additionally, the code stores the dataset name and version (as set by the
This processing step connects key site metadata directly to each model's
forcing files. It can be extended to include additional metadata, such as
site soil or vegetation properties, with minimal code modifications. For
example, LSMs generally use plant functional types (PFTs) instead of the
International Geosphere-Biosphere Programme (IGBP;
Attributes required for each output variable (stored separately for
FLUXNET2015 and La Thuile data releases in
The package is supplied with a suggested list of output variables that are
processed for each site, where available. Separate lists
are provided for FLUXNET2015 FULLSET and SUBSET, and La Thuile data releases
due to different naming conventions and variables (stored in
The
meteorological variables include the data variables typically required to
force LSMs. The meteorological variables processed by the package by default
are detailed in Table S1 in the Supplement. The user can also nominate essential
meteorological variables that must be available and processed by modifying
the
The evaluation variables include the data variables typically predicted by
land surface models and used to evaluate model outputs. The default
evaluation variables processed by the package are provided in Table S2 in the
Supplement. The user can nominate preferred evaluation variables by modifying
the
In addition to common evaluation variables, the package also processes and
outputs uncertainty estimates provided with the FLUXNET2015 release by
default. These include uncertainty bounds for LE,
The code produces NetCDF files with whole years of data only to ensure LSM automated spin-up procedures remain relatively unbiased. It determines which years are included in its output according to user-defined thresholds for gap-filled and missing values as detailed below.
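As an illustration only, the whole-year screening can be sketched in a few lines of Python. This is not the package's R implementation: the function name `screen_years`, its arguments and the threshold values in the usage note are hypothetical, and the sketch assumes that a QC flag of 0 marks a measured value and any larger flag marks a gap-filled one.

```python
import math


def screen_years(values, qc_flags, years, max_missing_pct, max_gapfill_pct):
    """Return the sorted years whose missing and gap-filled percentages
    are both within the given thresholds (hypothetical helper)."""
    counts = {}  # year -> [n_total, n_missing, n_gapfilled]
    for value, flag, year in zip(values, qc_flags, years):
        tally = counts.setdefault(year, [0, 0, 0])
        tally[0] += 1
        if value is None or math.isnan(value):
            tally[1] += 1          # missing time step
        elif flag > 0:
            tally[2] += 1          # assumed: flag > 0 means gap-filled
    kept = []
    for year in sorted(counts):
        n_total, n_missing, n_gapfilled = counts[year]
        missing_pct = 100.0 * n_missing / n_total
        gapfill_pct = 100.0 * n_gapfilled / n_total
        # A year is kept only if neither percentage exceeds its threshold.
        if missing_pct <= max_missing_pct and gapfill_pct <= max_gapfill_pct:
            kept.append(year)
    return kept
```

For example, calling `screen_years(values, qc_flags, years, 25.0, 50.0)` on a multi-year series would return only those years with at most 25 % missing and at most 50 % gap-filled time steps; both numbers are arbitrary illustration values rather than package defaults.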
A threshold must be set for the maximum percentage of missing values per year
(argument
Additionally, thresholds can be set for the maximum percentage of all
gap-filling (default option; set by argument
If a threshold for gap-filling is set, the percentage of both gap-filled and
missing values must not exceed their respective thresholds for a year to be
processed. If no years fulfilling the criteria are found, or the time period
is shorter than the user-defined minimum number of consecutive years (set by
argument
Provided that at least one evaluation variable has fewer gaps than the
user-defined thresholds, all evaluation variables are written to the output
file by default, with the exception of any variables that only contain
missing values. An option is provided to discard any evaluation variables
with gaps exceeding the user-defined thresholds by setting the argument
LSMs require continuous forcing data, but a number of essential meteorological variables (rainfall, wind speed, incoming long-wave radiation and air pressure) are not fully gap-filled in the FLUXNET2015 FULLSET and/or La Thuile releases. The package provides two methods for gap-filling meteorological variables: statistical and ERA-Interim (Dee et al., 2011; Vuichard and Papale, 2015). Additionally, statistical methods are provided for gap-filling evaluation variables.
Downscaled ERA-Interim reanalysis estimates are provided as part of the
FLUXNET2015 dataset for gap-filling meteorological variables. These are
available only in the FULLSET version of the FLUXNET2015 release
(
This gap-filling option is chosen by setting the argument
Alternatively, meteorological and evaluation variables can be
gap-filled statistically, using a combination of methods that depends
on the length of the missing period. This gap-filling option can be chosen for
meteorological and evaluation variables by setting arguments
Surface air pressure and incoming long-wave radiation are synthesised using
empirical functions (Abramowitz et al., 2012). Air pressure is calculated
from air temperature and elevation using the barometric formula as detailed
in Sect. S1.1 in the Supplement. Three methods for synthesising long-wave
radiation are provided (“Abramowitz_2012”, “Swinbank_1963” and
“Brutsaert_1975”) and are set by the argument
For all other meteorological and evaluation variables, short data gaps (by
default up to 4 h, set by argument
For meteorological variables, longer gaps (by default up to 10 days, set by
argument
For evaluation variables, longer gaps (by default up to 30 days, set by
argument
After performing the gap-filling, the code checks for missing values (as per Sect. 2.4.2). If missing values remain in any essential meteorological variable or in all preferred evaluation variables in a given year, the year is removed from the outputs. If the remaining time period is shorter than the user-defined minimum number of consecutive years, the site is not processed.
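The consecutive-year check can be illustrated with a short Python sketch. It is not taken from the package: the function names are hypothetical, and the handling of several disjoint runs of years (here, every run that is long enough is returned) is an assumption, since the text only states that a site is skipped when the remaining period is too short.

```python
def consecutive_runs(good_years):
    """Split a collection of years into runs of consecutive years."""
    runs = []
    for year in sorted(set(good_years)):
        if runs and year == runs[-1][-1] + 1:
            runs[-1].append(year)   # extends the current run
        else:
            runs.append([year])     # starts a new run
    return runs


def years_to_process(good_years, min_consecutive_years):
    """Return the runs that meet the minimum length.

    An empty result means the site would not be processed.
    """
    return [run for run in consecutive_runs(good_years)
            if len(run) >= min_consecutive_years]
```

With years 2001, 2002, 2004, 2005 and 2006 passing the screening and a minimum of three consecutive years, only the 2004 to 2006 run would be retained under these assumptions.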
Available unit conversions.
The code retains and outputs the original FLUXNET quality control (QC) flags,
when these are included in the output variable list. These flags are set to 0
for measured data, and 1, 2 and 3 for good-, medium- and poor-quality
gap-filling, respectively, for La Thuile and FLUXNET2015 FULLSET data
(Reichstein et al., 2005;
Additionally, the code produces QC flags for meteorological variables when they are gap-filled using ERA-Interim data or statistical methods. The QC flag is set to 4 when a time step is gap-filled with ERA-Interim data and 5 for statistical gap-filling. If a QC flag does not exist for a given variable, the code creates a QC flag variable with measured time steps set to 0 and ERA-Interim or statistically gap-filled time steps set to 4 or 5, respectively. This flag is automatically stored as a variable in the meteorological data output file and is named as the output variable plus the extension “_qc” (e.g. Precip_qc). See below for QC flag conventions when aggregating data to coarser time steps.
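The flag convention described above can be sketched in Python as follows. This is an illustrative sketch rather than the package's R code: the function `gapfill_with_qc` and its arguments are hypothetical, and leaving the flag unchanged where no fill value is available is an assumption.

```python
import math

MEASURED = 0      # measured time step
ERA_FILLED = 4    # gap-filled with ERA-Interim data
STAT_FILLED = 5   # gap-filled with statistical methods


def _is_missing(value):
    return value is None or math.isnan(value)


def gapfill_with_qc(observed, fill_values, fill_flag, qc=None):
    """Fill missing time steps and return (filled_values, qc_flags).

    If no QC flag exists for the variable, one is created with all
    time steps initially marked as measured (0).
    """
    if qc is None:
        qc = [MEASURED] * len(observed)
    filled, flags = [], []
    for obs, fill, flag in zip(observed, fill_values, qc):
        if _is_missing(obs) and not _is_missing(fill):
            filled.append(fill)
            flags.append(fill_flag)   # 4 or 5, depending on the method
        else:
            # Measured values keep their original flag; steps that cannot
            # be filled stay missing with their flag unchanged (assumption).
            filled.append(obs)
            flags.append(flag)
    return filled, flags
```

Calling `gapfill_with_qc(precip, era_precip, ERA_FILLED)` would return the filled series together with the flags that would be written as `Precip_qc`; the input names `precip` and `era_precip` are placeholders.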
By default, the package outputs the data at their original time resolution.
However, a longer time step may be desired for some model applications. The
package can aggregate the data to coarser time steps, up to daily resolution. The
aggregated time step size (in hours) is set by the argument
The package uses ALMA convention units for outputs by default where possible
(as indicated in Tables S1 and S2). These differ from the
original FLUXNET units for a number of variables and a conversion is
performed in each case. Available conversions are detailed in Table 4. If a
conversion is not available for the specified units, the code will produce an
error and terminate. Additionally, the package provides functions for
converting (i) vapour pressure deficit to relative humidity, (ii) relative
humidity to specific humidity and (iii) photosynthetically active radiation
(PAR) to incoming short-wave radiation (SW
For these conversions, saturated vapour pressure (
Examples of output plots produced by the package. Mean annual cycle
by month is shown in panel
The package provides an option to visualise output variables. Three types of
plots can be produced: a mean annual cycle, a mean diurnal cycle by season
and a time series figure. This is controlled by the argument
The outputs are retrieved from the output NetCDF files and all data variables are plotted, with separate figures produced for meteorological and evaluation variables. Any missing values are ignored during plotting, but their presence is noted in the figure, when applicable. The data are plotted in their output units, with the exception of air temperature (converted from Kelvin to Celsius) and rainfall (converted from millimetres per second to millimetres per time step). It is envisaged that the plots will complement the automated quality control performed during data processing and enable further detection of unsuitable data periods or sites.
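The two plotting conversions are simple enough to write out explicitly. The Python sketch below is illustrative only; the function names are hypothetical and do not correspond to functions in the package.

```python
def kelvin_to_celsius(temp_kelvin):
    """Convert air temperature from Kelvin to degrees Celsius."""
    return temp_kelvin - 273.15


def rain_rate_to_depth(rate_mm_per_s, timestep_seconds):
    """Convert a rainfall rate (mm per second) to a depth per time step (mm)."""
    return rate_mm_per_s * timestep_seconds
```

For half-hourly data the time step is 1800 s, so a rainfall rate of 0.001 mm per second corresponds to 1.8 mm per time step.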
Half-hourly time series of
Here we present an example application using
Once the data have been processed and written to file, they can be visualised. Three types of plots are produced by default: mean annual and diurnal cycles and a time series plot. Figure 2 shows an example of each type of output plot produced by the package. These plots can be used for further quality control to detect any anomalous data periods not automatically excluded by the package.
Efforts to better utilise existing observational data provide multiple benefits, including bringing research communities together, evaluating models against broader data and providing further support to groups seeking to maintain primary observations. To maximise the use of observed data by communities other than those that collect the data, it is advantageous to make the data as accessible and easy to use as possible. In the case of the FLUXNET data, one major user community is the land surface modelling community. Land surface models are key components in climate modelling and are therefore critical to broader science and policy communities. It is important to take every available opportunity to improve the evaluation of land surface models; making FLUXNET datasets more reliably and easily available to the land surface modelling community removes a significant hurdle in that process.
To enhance transparency, to help reproducibility and as a platform for further community efforts, we have presented an R package that transforms FLUXNET data into a form directly usable by LSMs. As released, FLUXNET data cannot be directly employed in LSMs due to data gaps, incompatible units and a file format (CSV rather than NetCDF) that is not standard in the land surface community. The R package also collates metadata on data processing steps and the flux tower sites and stores these in the output files for easy access and to make modelling experiments more reliably reproducible. Finally, the package generates visualisations of outputs to facilitate further quality control of flux tower data and to help inform appropriate site selection, an important step in applying these data to modelling studies.
The package is open source, fully documented and simple to use, requiring minimal input from the user. It allows multiple sites to be processed into a form usable by LSMs in a short R script. At the same time, it provides optional settings for an advanced user to produce flux tower datasets suited for specific applications. For example, the user may wish to process the data differently if interested in evaluating models during short-term phenomena (such as heat waves) compared to longer seasonal to annual scales. Importantly, the package provides a tool for producing flux tower datasets for modelling applications in a fully citable and reproducible framework. The package is stored in a publicly available repository and is being actively developed with community contributions encouraged.
The
The authors declare that they have no conflict of interest.
We acknowledge the support of the Australian Research Council Centre of Excellence for Climate System Science (CE110001028). Martin G. De Kauwe was supported by Australian Research Council Linkage grant LP140100232. This work used eddy covariance data acquired and shared by the FLUXNET community, including these networks: AmeriFlux, AfriFlux, AsiaFlux, CarboAfrica, CarboEuropeIP, CarboItaly, CarboMont, ChinaFlux, Fluxnet-Canada, GreenGrass, ICOS, KoFlux, LBA, NECC, OzFlux-TERN, TCOS-Siberia and USCCC. The ERA-Interim reanalysis data are provided by ECMWF and processed by LSCE. The FLUXNET eddy covariance data processing and harmonisation was carried out by the European Fluxes Database Cluster, AmeriFlux Management Project and Fluxdata project of FLUXNET, with the support of CDIAC and ICOS Ecosystem Thematic Center, and the OzFlux, ChinaFlux and AsiaFlux offices.
Edited by: Carlos Sierra
Reviewed by: two anonymous referees