<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" dtd-version="3.0"><?xmltex \makeatother\@nolinetrue\makeatletter?>
  <front>
    <journal-meta>
<journal-id journal-id-type="publisher">GMD</journal-id>
<journal-title-group>
<journal-title>Geoscientific Model Development</journal-title>
<abbrev-journal-title abbrev-type="publisher">GMD</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">Geosci. Model Dev.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">1991-9603</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>

    <article-meta>
      <article-id pub-id-type="doi">10.5194/gmd-10-483-2017</article-id><title-group><article-title>BUMPER v1.0: a Bayesian user-friendly model for palaeo-environmental reconstruction</article-title>
      </title-group><?xmltex \runningtitle{BUMPER v1.0}?><?xmltex \runningauthor{P.~B. Holden et al.}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Holden</surname><given-names>Philip B.</given-names></name>
          <email>philip.holden@open.ac.uk</email>
        <ext-link>https://orcid.org/0000-0002-2369-0062</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2 aff3">
          <name><surname>Birks</surname><given-names>H. John B.</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff4">
          <name><surname>Brooks</surname><given-names>Stephen J.</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff5">
          <name><surname>Bush</surname><given-names>Mark B.</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff6">
          <name><surname>Hwang</surname><given-names>Grace M.</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff5">
          <name><surname>Matthews-Bird</surname><given-names>Frazer</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff5">
          <name><surname>Valencia</surname><given-names>Bryan G.</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff5">
          <name><surname>van Woesik</surname><given-names>Robert</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>Earth, Environment and Ecosystems, The Open University, Walton Hall,
Milton Keynes MK7 6AA, UK</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Department of Biology, University of Bergen, P.O. Box 7803, 5020
Bergen, Norway</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>Environmental Change Research Centre, University College
London, London WC1E 6BT, UK</institution>
        </aff>
        <aff id="aff4"><label>4</label><institution>Department of Entomology, Natural History Museum, Cromwell Road,
London SW7 5BD, UK</institution>
        </aff>
        <aff id="aff5"><label>5</label><institution>Department of Biological Sciences, Florida Institute of Technology,
150 West University Boulevard, <?xmltex \hack{\newline}?> Melbourne, FL 32901, USA</institution>
        </aff>
        <aff id="aff6"><label>6</label><institution>The Johns Hopkins University Applied Physics Laboratory, 11000 Johns
Hopkins Road, Laurel, MD 20723, USA</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Philip B. Holden (philip.holden@open.ac.uk)</corresp></author-notes><pub-date><day>1</day><month>February</month><year>2017</year></pub-date>
      
      <volume>10</volume>
      <issue>1</issue>
      <fpage>483</fpage><lpage>498</lpage>
      <history>
        <date date-type="received"><day>25</day><month>August</month><year>2016</year></date>
           <date date-type="rev-request"><day>21</day><month>September</month><year>2016</year></date>
           <date date-type="rev-recd"><day>11</day><month>January</month><year>2017</year></date>
           <date date-type="accepted"><day>11</day><month>January</month><year>2017</year></date>
      </history>
      <permissions>
<license license-type="open-access">
<license-p>This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/3.0/">http://creativecommons.org/licenses/by/3.0/</ext-link></license-p>
</license>
</permissions><self-uri xlink:href="https://gmd.copernicus.org/articles/10/483/2017/gmd-10-483-2017.html">This article is available from https://gmd.copernicus.org/articles/10/483/2017/gmd-10-483-2017.html</self-uri>
<self-uri xlink:href="https://gmd.copernicus.org/articles/10/483/2017/gmd-10-483-2017.pdf">The full text article is available as a PDF file from https://gmd.copernicus.org/articles/10/483/2017/gmd-10-483-2017.pdf</self-uri>


      <abstract>
    <p>We describe the Bayesian user-friendly model for
palaeo-environmental reconstruction (BUMPER), a Bayesian transfer function
for inferring past climate and other environmental variables from
microfossil assemblages. BUMPER is fully self-calibrating, straightforward
to apply, and computationally fast, requiring <inline-formula><mml:math id="M1" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 2 s to
build a 100-taxon model from a 100-site training set on a standard personal
computer. We apply the model's probabilistic framework to generate thousands
of artificial training sets under ideal assumptions. We then use these to
demonstrate the sensitivity of reconstructions to the characteristics of the
training set, considering assemblage richness, taxon tolerances, and the
number of training sites. We find that a useful guideline for the size of a
training set is to provide, on average, at least 10 samples of each taxon.
We demonstrate general applicability to real data, considering three
different organism types (chironomids, diatoms, pollen) and different
reconstructed variables. An identically configured model is used in each
application, the only change being the input files that provide the
training-set environment and taxon-count data. The performance of BUMPER is
shown to be comparable with weighted average partial least squares (WAPLS)
in each case. Additional artificial datasets are constructed with similar
characteristics to the real data, and these are used to explore the reasons
for the differing performances of the different training sets.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <title>Introduction</title>
      <p>Transfer functions are numerical tools that are widely used to infer past
environment from species (taxon<fn id="Ch1.Footn1"><p>A taxon is a biological entity of
any taxonomic rank (e.g. family, genus, species or group of morphologically
similar species)</p></fn>) assemblages of microfossils preserved in lake or ocean
sediments (Imbrie and Kipp, 1971; Birks and Seppä, 2004; Birks et al.,
2010). These taxon assemblages can provide a strong indicator of the
environmental characteristics of the habitat in which the microfossils were
found. Numerical relationships can be developed between present-day taxon
distributions and the environment and, under the principle of
uniformitarianism (Rymer, 1978), these relationships can be used to estimate
palaeo-environments from the presence and abundance of palaeo-taxa.
Here we take a Bayesian approach and develop a transfer-function model that
is tested on three different organism types (chironomids, diatoms, pollen).
Although Bayesian transfer functions have a moderately long history (Vasko
et al., 2000; Korhola et al., 2002; Haslett et al., 2006), they are still not
routinely employed by palaeo-ecologists, who generally apply frequentist
approaches, such as weighted averaging (Birks et al., 1990). The advantages
of frequentist approaches are that they are relatively straightforward to
understand and apply, and that they are computationally fast. In contrast,
Bayesian approaches tend to involve more complex mathematics and can be
computationally demanding. The advantages of Bayesian approaches, however,
are numerous and include the use of prior information (van Woesik, 2013).</p>
      <p>The field of Bayesian palaeo-environmental statistics is rapidly developing.
Recent work includes the development of a pollen-based multinomial
regression model that assumes a Gaussian species response, with joint
inference across core time slices (Ilvonen et al., 2016) and a
foraminifera-based multinomial non-parametric response model that allows for
multi-modal and non-Gaussian taxon response curves (Cahill et al., 2016).
Parnell et al. (2016) have published an open-source R package Bclim (Parnell
et al., 2015) that uses pollen response surfaces to generate a series of
equally probable joint multivariate climate trajectories.</p>
      <p>The principal motivation for a Bayesian approach is that the
palaeo-environment is treated probabilistically, and can be updated as
additional data become available. Bayesian approaches therefore provide a
reconstruction-specific quantification of the uncertainty in the data and in
the model parameters. Bayesian models are in principle ideally suited to
multi-proxy reconstructions because the posteriors that are derived from
independent proxies can be combined. However, to our knowledge, no Bayesian
approach has yet been applied to a multi-proxy reconstruction, at least in
part because of the aforementioned difficulties with application, though we
note that Cahill et al. (2016) incorporate a second proxy (<inline-formula><mml:math id="M2" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">δ</mml:mi><mml:mn>13</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>C)
through a prior, assuming a normal likelihood <inline-formula><mml:math id="M3" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mi>N</mml:mi><mml:mfenced close=")" open="("><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="italic">τ</mml:mi></mml:mfenced></mml:mrow></mml:math></inline-formula> with constant precision <inline-formula><mml:math id="M4" display="inline"><mml:mi mathvariant="italic">τ</mml:mi></mml:math></inline-formula>.</p>
      <p>Here we describe the “Bayesian user-friendly model for palaeo-environmental
reconstruction” (BUMPER) and demonstrate its general validity and ease of use
through applications to both artificial and real datasets. The model can be
regarded as a Bayesian analogue to weighted-averaging-type approaches (Birks
et al., 1990) because it assumes a unimodal response of each taxon to the
reconstructed environmental variable of interest. It therefore shares many
of the weaknesses of weighted-averaging-type approaches (Huntley, 2012;
Juggins, 2013), most notably by assuming that the responses to different
environmental variables are independent. It is the task of the ecologist,
assisted by multivariate techniques such as ordination (Hill and Gauch,
1980; Juggins and Birks, 2012; Juggins, 2013), to ascertain which
environmental variable is the dominant driver of change in the assemblage
through time. When strong interactions are suspected, approaches that
concurrently reconstruct the interacting variables may be more appropriate
(Huntley, 1993, 2012).</p>
      <p>The mathematics of BUMPER was developed and applied to the Surface Water
Acidification Programme (SWAP) diatom training set (Stevenson et al., 1991)
in Holden et al. (2008). The mathematics is unchanged here. Instead, the
primary motivation of the present study is to document the developments to
the model needed to make it user-friendly. Two important steps were to,
first, develop an approach to generalize and automate the priors for the model
parameters (we do not assume that the user has the necessary expertise to
make these decisions) and, second, demonstrate that the model can be
applied to other organism types and environmental variables. We apply the
model to both artificial and real data, considering a range of different
organism types to demonstrate the model's general applicability.</p>
</sec>
<sec id="Ch1.S2">
  <title>The model</title>
      <p>The mathematics of BUMPER has been described in detail (Holden et al.,
2008). In this section we summarize the underlying philosophy. The principal
assumption underlying the model is that a biological taxon exhibits a
unimodal response to an environmental variable, so that both the probability
of presence and the expected abundance of a given taxon are maximized at the
environmental optimum favoured by that taxon. Moreover, the taxon optima are
fixed, so that the response to the environmental variable of interest is
assumed to be independent of other environmental variables.</p>
<sec id="Ch1.S2.SS1">
  <title>Parameter estimation</title>
      <p>A species (or taxon) response curve (SRC) defines the probability of an
observed count of a species, or taxon, as a function of the environmental
variable of interest. An SRC is defined by five (initially unknown)
parameters. These parameters specify the environmental optimum and tolerance
of each taxon, together with the probability of its presence, and its
expected abundance under optimal conditions (see Sect. 2.3 for details).</p>
      <p>BUMPER considers 2560 possible SRCs for each taxon<fn id="Ch1.Footn2"><p>The model of
Holden et al. (2008) considered 8000 SRCs. We here reduce the SRC
resolution from (20, 4, 5, 5, 4) to (10, 4, 4, 4, 4) in order to benefit
from the <inline-formula><mml:math id="M5" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 3-fold increase in computational speed.</p></fn> and
applies Bayesian parameter estimation to quantify how well each of the SRCs
fits the observations of the training set. Bayes' equation states that the
probability that a model (here, one of the SRCs) is correct, in light of
observational data, is proportional to the probability that the data would
be observed if the model were correct, multiplied by the prior probability
of the model. Expressed more formally, for each of
the <inline-formula><mml:math id="M6" display="inline"><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mo>…</mml:mo><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2560</mml:mn></mml:mrow></mml:math></inline-formula> SRCs considered for each taxon <inline-formula><mml:math id="M7" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>,
probability weights are derived from the observed environmental variable
<inline-formula><mml:math id="M8" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and the taxon count <inline-formula><mml:math id="M9" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> in each training-set site <inline-formula><mml:math id="M10" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>:

                <disp-formula specific-use="align" content-type="numbered"><mml:math id="M11" display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mtext>prob</mml:mtext></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mfenced close=")" open="("><mml:msub><mml:mtext>SRC</mml:mtext><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>Y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mi>X</mml:mi></mml:mfenced><mml:mo>∝</mml:mo><mml:mtext>prob</mml:mtext><mml:mfenced open="(" close=")"><mml:msub><mml:mtext>SRC</mml:mtext><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mlabeledtr id="Ch1.E1"><mml:mtd/><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>×</mml:mo><mml:msub><mml:mo>∏</mml:mo><mml:mi>i</mml:mi></mml:msub><mml:mtext>prob</mml:mtext><mml:mfenced open="(" close=")"><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mtext>SRC</mml:mtext><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

            The first term on the right-hand side is the prior, the probability that the
model is correct before we consider the data. Because Eq. (1) is a
proportionality, the calculated probabilities must be normalized. This is
achieved by imposing the assumed constraint that, for each taxon <inline-formula><mml:math id="M12" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>, the following
applies:
            <disp-formula id="Ch1.E2" content-type="numbered"><mml:math id="M13" display="block"><mml:mrow><mml:munder><mml:mo movablelimits="false">∑</mml:mo><mml:mi>j</mml:mi></mml:munder><mml:mtext>prob</mml:mtext><mml:mfenced open="(" close=")"><mml:msub><mml:mtext>SRC</mml:mtext><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>Y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mi>X</mml:mi></mml:mfenced><mml:mo>=</mml:mo><mml:mn>1.</mml:mn></mml:mrow></mml:math></disp-formula>
          From now on prob<inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:mfenced open="(" close=")"><mml:msub><mml:mtext>SRC</mml:mtext><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mfenced></mml:mrow></mml:math></inline-formula> is the posterior probability,
the probability of the SRC given the training data <inline-formula><mml:math id="M15" display="inline"><mml:mi>Y</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M16" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula>.</p>
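      <p>The calibration step of Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration and not the BUMPER code: it assumes a presence–absence likelihood with a Gaussian-shaped presence probability, a flat prior, and only three parameters per candidate SRC. The function names <monospace>presence_prob</monospace> and <monospace>src_weights</monospace>, the toy training set, and the candidate grid are all hypothetical.</p>
<preformat>
```python
import math
from itertools import product


def presence_prob(x, u, t, p0):
    """Hypothetical unimodal presence probability: Gaussian in x, peak p0 at u."""
    p = p0 * math.exp(-((x - u) ** 2) / (2.0 * t ** 2))
    # Clamp away from 0 and 1 so that the logarithms below stay finite.
    return min(max(p, 1e-12), 1.0 - 1e-12)


def src_weights(x_sites, present, candidates, prior=None):
    """Posterior weight of each candidate SRC for one taxon (Eqs. 1 and 2)."""
    if prior is None:
        # Flat prior over candidates (an assumption of this sketch).
        prior = [1.0 / len(candidates)] * len(candidates)
    log_post = []
    for (u, t, p0), pr in zip(candidates, prior):
        total = math.log(pr)
        for x, seen in zip(x_sites, present):
            p = presence_prob(x, u, t, p0)
            total += math.log(p if seen else 1.0 - p)  # product over sites i
        log_post.append(total)
    peak = max(log_post)  # subtract the maximum before exponentiating
    unnorm = [math.exp(v - peak) for v in log_post]
    norm = sum(unnorm)
    return [v / norm for v in unnorm]  # weights sum to 1, as in Eq. (2)


# Toy training set: 11 sites along a gradient, taxon present only near x = 5.
x_sites = [float(i) for i in range(11)]
present = [x in (4.0, 5.0, 6.0) for x in x_sites]
# Coarse grid of candidates: (optimum, tolerance, peak presence probability).
candidates = list(product([2.0, 5.0, 8.0], [1.0, 2.0], [0.9]))
weights = src_weights(x_sites, present, candidates)
best = candidates[weights.index(max(weights))]
```
</preformat>
      <p>The sketch works with log-probabilities because a product over many training sites can underflow. In this toy example the highest-weighted candidate is the one with its optimum at 5 and the narrower tolerance.</p>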
</sec>
<sec id="Ch1.S2.SS2">
  <title>Environmental reconstruction</title>
      <p>To reconstruct the environment from an observed fossil count <inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> of
each taxon <inline-formula><mml:math id="M18" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> within the fossil assemblage, the probability-weighted SRCs
are used to derive, for each taxon, a likelihood function of the reconstructed
variable <inline-formula><mml:math id="M19" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>. Here we again use Bayes' equation, this time stating
that the probability that a reconstructed value is correct in the light of
an observed taxon count is proportional to the probability that the taxon
count would be observed in that environment, multiplied by the prior
probability of that value. Considering for now a single
observed taxon, Bayes' equation can be written as follows:
            <disp-formula id="Ch1.E3" content-type="numbered"><mml:math id="M20" display="block"><mml:mrow><mml:mtext>prob</mml:mtext><mml:mfenced close=")" open="("><mml:mi>x</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub></mml:mfenced><mml:mo>∝</mml:mo><mml:mtext>prob</mml:mtext><mml:mfenced open="(" close=")"><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:mi>x</mml:mi></mml:mfenced><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mtext>prob</mml:mtext><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
      <p>We derive a normalized likelihood function for the taxon, considering only
the first term on the right-hand side of Eq. (3). This allows us to treat the
function as a probability distribution of the environmental variable, given
no prior knowledge and a count of this single taxon in isolation. As we do
not know the true SRC of the taxon with certainty (in general, the
calibration will have resulted in many SRCs with non-zero probability), the
likelihood function is derived from all significant SRCs, combined using the
probability weights calculated in Sect. 2.1, applying the law of total
probability:

                <disp-formula specific-use="align" content-type="numbered"><mml:math id="M21" display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi>L</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mfenced open="(" close=")"><mml:mi>x</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub></mml:mfenced><mml:mo>∝</mml:mo><mml:mtext>prob</mml:mtext><mml:mfenced close=")" open="("><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:mi>x</mml:mi></mml:mfenced><mml:mo>=</mml:mo><mml:msub><mml:mo>∑</mml:mo><mml:mi>j</mml:mi></mml:msub><mml:mtext>prob</mml:mtext><mml:mfenced open="(" close=")"><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mtext>SRC</mml:mtext><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:mi>x</mml:mi></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mlabeledtr id="Ch1.E4"><mml:mtd/><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>=</mml:mo><mml:msub><mml:mo>∑</mml:mo><mml:mi>j</mml:mi></mml:msub><mml:mfenced open="{" close="}"><mml:mtext>prob</mml:mtext><mml:mfenced open="(" close=")"><mml:msub><mml:mtext>SRC</mml:mtext><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mfenced><mml:mo>×</mml:mo><mml:mtext>prob</mml:mtext><mml:mfenced close=")" open="("><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn 
mathvariant="normal">0</mml:mn></mml:mrow></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mtext>SRC</mml:mtext><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>x</mml:mi></mml:mfenced></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

            This expression is evaluated at 100 evenly spaced points across a range that
comfortably spans the training-set environmental range (see Sect. 2.4),
and is normalized to 1. (We note that this normalization step is required to
build the blended likelihood function in Eq. 6.)</p>
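      <p>As an illustration of Eq. (4), the following Python sketch mixes the count likelihoods of several candidate SRCs using their posterior weights, evaluates the result on a 100-point grid, and normalizes it to 1. It is a sketch under stated assumptions rather than the model's implementation: the Poisson count distribution is a stand-in chosen for brevity, and the function names, the two candidate SRCs, and their weights are hypothetical.</p>
<preformat>
```python
import math


def grid(lo, hi, n=100):
    """n evenly spaced points from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def poisson_pmf(y, mean):
    """Poisson probability of count y (stand-in count model, an assumption)."""
    return math.exp(-mean + y * math.log(mean) - math.lgamma(y + 1))


def count_likelihood(y0, srcs, weights, xs):
    """Eq. (4): sum over j of prob(SRC_j) * prob(y0 | SRC_j, x), normalized."""
    raw = []
    for x in xs:
        total = 0.0
        for (u, t, n_max), w in zip(srcs, weights):
            # Gaussian-shaped expected abundance about the optimum u.
            mean = n_max * math.exp(-((x - u) ** 2) / (2.0 * t ** 2))
            mean = max(mean, 1e-300)  # keep log(mean) finite far from u
            total += w * poisson_pmf(y0, mean)
        raw.append(total)
    norm = sum(raw)
    return [v / norm for v in raw]


# Two candidate SRCs (optimum, tolerance, peak abundance) and their weights.
srcs = [(4.0, 1.0, 30.0), (6.0, 1.0, 30.0)]
weights = [0.7, 0.3]
xs = grid(0.0, 10.0)
l_y = count_likelihood(25, srcs, weights, xs)
mode_x = xs[l_y.index(max(l_y))]
```
</preformat>
      <p>Because the first candidate carries most of the weight, the normalized likelihood is concentrated around its optimum, with a smaller contribution near the optimum of the second candidate.</p>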
      <p>An alternative and less-constrained presence–absence likelihood function can
be derived (and normalized in the same way as Eq. 4):

                <disp-formula specific-use="align" content-type="numbered"><mml:math id="M22" display="block"><mml:mtable displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msub><mml:mi>L</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mfenced open="(" close=")"><mml:mi>x</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub></mml:mfenced><mml:mo>∝</mml:mo><mml:mtext>prob</mml:mtext><mml:mfenced close=")" open="("><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mi mathvariant="normal">|</mml:mi><mml:mi>x</mml:mi></mml:mfenced><mml:mo>=</mml:mo><mml:msub><mml:mo>∑</mml:mo><mml:mi>j</mml:mi></mml:msub><mml:mfenced open="{" close=""><mml:mtext>prob</mml:mtext><mml:mfenced open="(" close=")"><mml:msub><mml:mtext>SRC</mml:mtext><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mfenced></mml:mfenced></mml:mrow></mml:mtd></mml:mtr><mml:mlabeledtr id="Ch1.E5"><mml:mtd/><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mfenced open="." 
close="}"><mml:mo>×</mml:mo><mml:mtext>prob</mml:mtext><mml:mfenced close=")" open="("><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mtext>SRC</mml:mtext><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>x</mml:mi></mml:mfenced></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

            BUMPER derives a linear combination of these two functions, as follows:
            <disp-formula id="Ch1.E6" content-type="numbered"><mml:math id="M23" display="block"><mml:mrow><mml:mi>L</mml:mi><mml:mfenced close=")" open="("><mml:mi>x</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub></mml:mfenced><mml:mo>=</mml:mo><mml:mfenced close=")" open="("><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi mathvariant="italic">η</mml:mi></mml:mfenced><mml:msub><mml:mi>L</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mfenced close=")" open="("><mml:mi>x</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub></mml:mfenced><mml:mo>+</mml:mo><mml:mi mathvariant="italic">η</mml:mi><mml:msub><mml:mi>L</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub><mml:mfenced close=")" open="("><mml:mi>x</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          The motivation for this blended likelihood function is that <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> provides
a tighter constraint on the posterior, but the wide tails contributed by
<inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> reduce the possibility of the correct solution being ruled out by an
outlying taxon, for instance resulting from misidentification or unusual
taphonomic processes. We assume <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula>, except when building a purely
binary, presence–absence model, when <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>1.0</mml:mn></mml:mrow></mml:math></inline-formula>. Hereafter, we refer to <inline-formula><mml:math id="M28" display="inline"><mml:mi>L</mml:mi></mml:math></inline-formula>
simply as the likelihood function of a taxon.</p>
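      <p>The effect of the blend in Eq. (6) can be seen in a small Python sketch. The five-point grid and the two input functions are invented for illustration, and the helper names <monospace>normalize</monospace> and <monospace>blend</monospace> are hypothetical. Both inputs are normalized first, so the blend also sums to 1.</p>
<preformat>
```python
def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def blend(l_y, l_p, eta=0.5):
    """Eq. (6): (1 - eta) * L_y + eta * L_p, evaluated on a shared grid."""
    if len(l_y) != len(l_p):
        raise ValueError("likelihoods must share the same grid")
    return [(1.0 - eta) * a + eta * b for a, b in zip(l_y, l_p)]


# A sharply peaked count likelihood and a broad presence-absence likelihood.
l_y = normalize([0.0, 0.0, 1.0, 0.0, 0.0])
l_p = normalize([1.0, 2.0, 3.0, 2.0, 1.0])

blended = blend(l_y, l_p)                # default eta = 0.5
presence_only = blend(l_y, l_p, eta=1.0)  # purely binary model
```
</preformat>
      <p>In this example the count likelihood is exactly zero away from its peak, which would veto those values once the taxon likelihoods are multiplied together. The blended function keeps non-zero tails there, so a single outlying taxon cannot rule out a solution on its own.</p>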
      <p>The posterior probability distribution for the reconstructed variable
prob<inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:mfenced close=")" open="("><mml:mi>x</mml:mi></mml:mfenced></mml:mrow></mml:math></inline-formula> is derived by combining any prior knowledge
prob<inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:mfenced close=")" open="("><mml:msup><mml:mi>x</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mfenced></mml:mrow></mml:math></inline-formula> with the product of likelihood functions of all
considered taxa in the assemblage, as follows:
            <disp-formula id="Ch1.E7" content-type="numbered"><mml:math id="M31" display="block"><mml:mrow><mml:mtext>prob</mml:mtext><mml:mfenced close=")" open="("><mml:mi>x</mml:mi></mml:mfenced><mml:mo>∝</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mtext>prob</mml:mtext><mml:mfenced open="(" close=")"><mml:msup><mml:mi>x</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mfenced><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msub><mml:mo>∏</mml:mo><mml:mi>k</mml:mi></mml:msub><mml:mi>L</mml:mi><mml:mfenced open="(" close=")"><mml:mi>x</mml:mi><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>k</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:msub></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          The reconstruction is evaluated at the same 100 points as the likelihood
function, and the probabilities are normalized to 1.</p>
      <p>A point estimate for the reconstructed variable is derived as follows:
            <disp-formula id="Ch1.E8" content-type="numbered"><mml:math id="M32" display="block"><mml:mrow><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mo movablelimits="false">∫</mml:mo><mml:mi>x</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mtext>prob</mml:mtext><mml:mfenced close=")" open="("><mml:mi>x</mml:mi></mml:mfenced><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="normal">d</mml:mi><mml:mi>x</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          A simple uncertainty metric for the reconstruction is derived as follows:
            <disp-formula id="Ch1.E9" content-type="numbered"><mml:math id="M33" display="block"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mo movablelimits="false">∫</mml:mo><mml:msup><mml:mfenced close=")" open="("><mml:mi>x</mml:mi><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mtext>prob</mml:mtext><mml:mfenced open="(" close=")"><mml:mi>x</mml:mi></mml:mfenced><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi mathvariant="normal">d</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msqrt><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
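      <p>Equations (7) to (9) can be sketched on a discrete grid, with the integrals replaced by sums over the grid points. The Python below is a minimal illustration with invented numbers and hypothetical function names (<monospace>posterior</monospace>, <monospace>summarize</monospace>); it assumes a flat prior and that the likelihoods of different taxa share the same grid.</p>
<preformat>
```python
import math


def posterior(prior, taxon_likelihoods):
    """Eq. (7): prior times the product of taxon likelihoods, normalized to 1."""
    post = list(prior)
    for lik in taxon_likelihoods:
        post = [p * v for p, v in zip(post, lik)]
        total = sum(post)
        post = [p / total for p in post]  # renormalize at each step
    return post


def summarize(xs, post):
    """Eqs. (8) and (9): posterior mean and standard deviation on the grid."""
    mean = sum(x * p for x, p in zip(xs, post))
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, post))
    return mean, math.sqrt(var)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
flat_prior = [0.2] * 5
taxon = [0.1, 0.2, 0.4, 0.2, 0.1]  # likelihood of one taxon on the grid

x_hat_1, delta_1 = summarize(xs, posterior(flat_prior, [taxon]))
x_hat_2, delta_2 = summarize(xs, posterior(flat_prior, [taxon, taxon]))
```
</preformat>
      <p>With two taxa that agree, the point estimate is unchanged and the uncertainty metric shrinks, which is the behaviour expected when independent taxon likelihoods are multiplied.</p>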
</sec>
<sec id="Ch1.S2.SS3">
  <title>Probability distribution</title>
      <p>To apply Bayes' equation we require an expression for the probability of
observing some count <inline-formula><mml:math id="M34" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> as a function of the environmental variable <inline-formula><mml:math id="M35" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>.
The assumption of a unimodal response leads us to assume that the expected
abundance (given presence) <inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> of taxon <inline-formula><mml:math id="M37" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> in site <inline-formula><mml:math id="M38" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> can be fitted
by a Gaussian response curve (ter Braak and Barendregt, 1986) about some
optimum value <inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> of the environmental variable <inline-formula><mml:math id="M40" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, written as follows:
            <disp-formula id="Ch1.E10" content-type="numbered"><mml:math id="M41" display="block"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:msup><mml:mfenced close=")" open="("><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mrow><mml:mfenced open="/" close=""><mml:mphantom style="vphantom"><mml:mpadded style="vphantom" width="0pt"><mml:mo>-</mml:mo><mml:msup><mml:mfenced open="(" close=")"><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mn mathvariant="normal">2</mml:mn><mml:msubsup><mml:mi>t</mml:mi><mml:mi>k</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mpadded></mml:mphantom></mml:mfenced></mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:msubsup><mml:mi>t</mml:mi><mml:mi>k</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          Here <inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the expected abundance (given presence) at the taxon
optimum, and the tolerance <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is a measure of how rapidly the expected
abundance falls off away from this optimum.</p>
      <p>The probability of presence (e.g. used to calculate the <inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> likelihood
function, Eq. 5) is also assumed to follow a Gaussian distribution
about the same optimum, though not necessarily of the same tolerance, and
can be written in terms of the expected abundance, as follows:
            <disp-formula id="Ch1.E11" content-type="numbered"><mml:math id="M45" display="block"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:msup><mml:mfenced close=")" open="("><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mfenced close="" open="/"><mml:mphantom style="vphantom"><mml:mpadded style="vphantom" width="0pt"><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>N</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mpadded></mml:mphantom></mml:mfenced></mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mfenced><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          Here the exponent <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is needed because we do not require the same tolerance for
<inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, although the two tolerances coincide when <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>. We
illustrate the role of <inline-formula><mml:math id="M50" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> by example: <inline-formula><mml:math id="M51" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>≪</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> describes a
taxon that is found at high abundance only in an environment that is
relatively close to its environmental optimum, but may also be present
(albeit only at low abundance) far away from this optimum.</p>
      <p>Taxon-count distributions are represented with a hurdle model (a zero-inflated distribution with truncation at zero of the distribution of
percentage counts). The probability of a non-zero count <inline-formula><mml:math id="M52" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> (used to
calculate the <inline-formula><mml:math id="M53" display="inline"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> likelihood function, Eq. 4) is assumed to follow
an exponential decay, with the decay constant defined by the expected count
<inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mfenced open="/" close=""><mml:mphantom style="vphantom"><mml:mpadded style="vphantom" width="0pt"><mml:mn mathvariant="normal">1</mml:mn><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mpadded></mml:mphantom></mml:mfenced></mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, normalized so that the total probability
of all (non-zero) percentage counts equals the probability of presence
<inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, and is expressed as a continuous distribution, truncated at 0 and
100:
            <disp-formula id="Ch1.E12" content-type="numbered"><mml:math id="M56" display="block"><mml:mrow><mml:mtext>prob</mml:mtext><mml:mfenced open="(" close=")"><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi mathvariant="normal">|</mml:mi><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mfenced><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mfenced open="(" close=")"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mfenced close="" open="/"><mml:mphantom style="vphantom"><mml:mpadded style="vphantom" width="0pt"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mpadded></mml:mphantom></mml:mfenced></mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mfenced><mml:mi>exp⁡</mml:mi><mml:mfenced open="(" close=")"><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mfenced open="/" close=""><mml:mphantom style="vphantom"><mml:mpadded style="vphantom" width="0pt"><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mpadded></mml:mphantom></mml:mfenced></mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mfenced></mml:mrow><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi>exp⁡</mml:mi><mml:mfenced open="(" close=")"><mml:mo>-</mml:mo><mml:mn>100</mml:mn><mml:mrow><mml:mfenced close="" open="/"><mml:mphantom style="vphantom"><mml:mpadded style="vphantom" 
width="0pt"><mml:mo>-</mml:mo><mml:mn>100</mml:mn><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mpadded></mml:mphantom></mml:mfenced></mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mfenced></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          The denominator normalizes the integral over the range <inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>&lt;</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>≤</mml:mo><mml:mn>100</mml:mn></mml:mrow></mml:math></inline-formula>. We
note that the denominator is missing from the description of Holden et al. (2008), although it was correctly implemented in the model itself. The
probability of absence (<inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is given by <inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
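      <p>The hurdle model of Eqs. 10 to 12 can be sketched in a few lines of Python. Substituting Eq. 10 into Eq. 11 shows that the probability of presence is itself a Gaussian curve whose tolerance is the abundance tolerance divided by the square root of the tolerance scaling, and the sketch uses this form directly. The function names and the parameter values in the example are hypothetical, and the code is an illustration under these assumptions rather than the BUMPER implementation.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def expected_abundance(x, u_k, t_k, N_k):
    """Eq. 10: Gaussian expected abundance (given presence) at environment x."""
    return N_k * np.exp(-((x - u_k) ** 2) / (2.0 * t_k ** 2))


def presence_probability(x, u_k, t_k, p_k, P_k):
    """Eq. 11: p_k * (N_ik / N_k) ** P_k, written without the ratio."""
    return p_k * np.exp(-P_k * (x - u_k) ** 2 / (2.0 * t_k ** 2))


def count_density(y, x, u_k, t_k, N_k, p_k, P_k):
    """Eq. 12: density of a non-zero percentage count y (0 to 100) given x."""
    N_ik = expected_abundance(x, u_k, t_k, N_k)
    p_ik = presence_probability(x, u_k, t_k, p_k, P_k)
    # The denominator renormalizes the exponential after truncation at 100.
    # Far from the optimum N_ik can underflow; this sketch does not guard that.
    truncation = 1.0 - np.exp(-100.0 / N_ik)
    return (p_ik / N_ik) * np.exp(-y / N_ik) / truncation


def absence_probability(x, u_k, t_k, p_k, P_k):
    """Probability of a zero count: 1 - p_ik."""
    return 1.0 - presence_probability(x, u_k, t_k, p_k, P_k)


# Hypothetical taxon: optimum 6.0, tolerance 0.8, N_k = 12 %, p_k = 0.6, P_k = 0.5.
density_at_5pc = count_density(5.0, 6.5, 6.0, 0.8, 12.0, 0.6, 0.5)
```
]]></preformat>
      <p>Integrating <monospace>count_density</monospace> over counts from 0 to 100 recovers the probability of presence, which is the normalization described for the denominator of Eq. 12.</p>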
</sec>
<sec id="Ch1.S2.SS4">
  <title>Species response curve (SRC) priors</title>
      <p>The SRCs are defined by five parameters: species or taxon optimum <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>,
tolerance <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, tolerance scaling <inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, probability of presence at
the environmental optimum <inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, and expected abundance (given presence) at
the environmental optimum <inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. In general, we have little a priori knowledge
of appropriate SRC values, and require their priors to be uninformative.
However, for computational tractability, we need to eliminate implausible
parameter values.</p>
      <p>Previous applications (Holden et al., 2008; Matthews-Bird et al., 2016) used
uniform priors for the SRC parameters over ranges that were chosen largely
on the basis of subjective judgement (and ascribing a probability of zero
outside of those ranges). However, reliance on subjective judgement is
undesirable given our aim to maximize the user-friendliness of the model, and
it can be rather time-consuming. To avoid subjectivity, BUMPER introduces a
simple automated process, based upon an indicative taxon tolerance <inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> that is a characteristic of the training set:
            <disp-formula id="Ch1.E13" content-type="numbered"><mml:math id="M66" display="block"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>m</mml:mi></mml:mfrac></mml:mstyle><mml:munder><mml:mo movablelimits="false">∑</mml:mo><mml:mi>k</mml:mi></mml:munder><mml:msqrt><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:munder><mml:mo movablelimits="false">∑</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:msup><mml:mfenced open="(" close=")"><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mi>k</mml:mi><mml:mtext>WA</mml:mtext></mml:msubsup></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          For each taxon <inline-formula><mml:math id="M67" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> we consider the environmental variable <inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> at each
site where the taxon is present, and derive a root mean square (RMS)
distance from the taxon's weighted-average optimum, <inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:msubsup><mml:mi>u</mml:mi><mml:mi>k</mml:mi><mml:mtext>WA</mml:mtext></mml:msubsup></mml:mrow></mml:math></inline-formula>. We average across
the <inline-formula><mml:math id="M70" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula> taxa that have at least two training-set observations (<inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>≥</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. The weighted-average optima are defined in the usual way:
            <disp-formula id="Ch1.E14" content-type="numbered"><mml:math id="M72" display="block"><mml:mrow><mml:msubsup><mml:mi>u</mml:mi><mml:mi>k</mml:mi><mml:mtext>WA</mml:mtext></mml:msubsup><mml:mo>=</mml:mo><mml:munder><mml:mo movablelimits="false">∑</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mfenced open="/" close=""><mml:mphantom style="vphantom"><mml:mpadded style="vphantom" width="0pt"><mml:munder><mml:mo movablelimits="false">∑</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:munder><mml:mo movablelimits="false">∑</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mpadded></mml:mphantom></mml:mfenced></mml:mrow><mml:munder><mml:mo movablelimits="false">∑</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>
          This metric <inline-formula><mml:math id="M73" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> is used to automate the definition of plausible
ranges for taxon optima and tolerance. The approach we take is analogous to
precalibrating the parameters of a physical model (Edwards et al., 2011). We
only attempt to rule out implausible parameter values and we ascribe equal
probabilities to all plausible parameter combinations. The precalibration
only seeks to eliminate SRCs with a probability close to zero, and is
thereby designed to avoid the trap of over-constraining the posterior by
using the training-set data twice.</p>
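      <p>A minimal Python sketch of the indicative tolerance (Eqs. 13 and 14) is given here for illustration. It assumes that the training set is held as an abundance matrix with sites as rows and taxa as columns, and that a taxon counts as present at a site when its abundance is greater than zero. The function and variable names are hypothetical and do not come from the BUMPER code.</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def weighted_average_optima(Y, x):
    """Eq. 14: abundance-weighted mean environment for each taxon (column)."""
    return (Y * x[:, None]).sum(axis=0) / Y.sum(axis=0)


def indicative_tolerance(Y, x):
    """Eq. 13: mean over taxa of the RMS distance from the WA optimum."""
    Y = np.asarray(Y, dtype=float)
    x = np.asarray(x, dtype=float)
    present = Y > 0                 # presence mask, sites by taxa
    n_k = present.sum(axis=0)       # number of sites where each taxon occurs
    keep = n_k >= 2                 # only taxa with at least two observations
    u_wa = weighted_average_optima(Y[:, keep], x)
    sq_dist = (x[:, None] - u_wa[None, :]) ** 2
    # The inner sum of Eq. 13 runs only over sites where the taxon is present.
    rms = np.sqrt((sq_dist * present[:, keep]).sum(axis=0) / n_k[keep])
    return float(rms.mean())


# Hypothetical training set: 3 sites, 3 taxa (the second taxon occurs only once).
x_sites = np.array([0.0, 2.0, 4.0])
Y_counts = np.array([[1.0, 0.0, 2.0],
                     [0.0, 3.0, 2.0],
                     [1.0, 0.0, 0.0]])
t_prime = indicative_tolerance(Y_counts, x_sites)
```
]]></preformat>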
      <p>We consider 10 possible values for the taxon optimum and 4 possible
values for each of the 4 other parameters, giving a total of 10 × 4<sup>4</sup> = 2560 SRC
combinations. The priors for the five SRC parameters are uniform within
specified ranges:
<list list-type="order"><list-item>
      <p>Optima <inline-formula><mml:math id="M74" display="inline"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are allowed to take values in the range <inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:mfenced open="(" close=")"><mml:msub><mml:mi>x</mml:mi><mml:mo>min⁡</mml:mo></mml:msub><mml:mo>-</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mfenced></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M76" display="inline"><mml:mrow><mml:mfenced open="(" close=")"><mml:msub><mml:mi>x</mml:mi><mml:mo>max⁡</mml:mo></mml:msub><mml:mo>+</mml:mo><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mfenced></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mo>max⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M78" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mo>min⁡</mml:mo></mml:msub></mml:mrow></mml:math></inline-formula> are
the extremes of the training-set environments. We allow optima beyond the
sampled environment range, but optima beyond this extended range are
considered unlikely. These ranges are approximately consistent with the
subjective priors used in Holden et al. (2008), being training set <inline-formula><mml:math id="M79" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula> pH units (<inline-formula><mml:math id="M80" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mn>0.63</mml:mn></mml:mrow></mml:math></inline-formula> pH units), and Matthews-Bird et al. (2016), being training set <inline-formula><mml:math id="M81" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula>5 <inline-formula><mml:math id="M82" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C (<inline-formula><mml:math id="M83" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mn>3.0</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M84" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C).</p></list-item><list-item>
      <p>Tolerances <inline-formula><mml:math id="M85" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are allowed to take values in the range <inline-formula><mml:math id="M86" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mrow><mml:mfenced open="/" close=""><mml:mphantom style="vphantom"><mml:mpadded width="0pt" style="vphantom"><mml:mn mathvariant="normal">2</mml:mn><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mn mathvariant="normal">3</mml:mn></mml:mpadded></mml:mphantom></mml:mfenced></mml:mrow><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:mn mathvariant="normal">3</mml:mn><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>. Tolerances at the upper limit are presumably unlikely, and
in any event would only weakly constrain the reconstruction. Although
tolerances below the lower limit may be reasonable, this choice represents a
conservative constraint that, for instance, prevents unrealistically narrow
tolerances when a taxon is poorly sampled within a training set. These
ranges are consistent with the subjective choices in Holden et al. (2008) of
0.4 to 1.7 pH units, equivalent to 0.63<inline-formula><mml:math id="M88" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:mn>2.7</mml:mn><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>, and
Matthews-Bird et al. (2016) of 2 to 10 <inline-formula><mml:math id="M90" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C, equivalent to 0.67<inline-formula><mml:math id="M91" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>
to <inline-formula><mml:math id="M92" display="inline"><mml:mrow><mml:mn>3.3</mml:mn><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>.</p></list-item><list-item>
      <p>The probability of presence at the environmental optimum <inline-formula><mml:math id="M93" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is allowed
to take values in the range <inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:msubsup><mml:mi>p</mml:mi><mml:mi>k</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:mn>2.5</mml:mn><mml:msubsup><mml:mi>p</mml:mi><mml:mi>k</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, where
<inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:msubsup><mml:mi>p</mml:mi><mml:mi>k</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> is the percentage of sites in which the taxon is present. A
value below <inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:msubsup><mml:mi>p</mml:mi><mml:mi>k</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> is clearly unlikely as the probability of presence
at the optimum must be at least as high as the probability of presence
across the training set. The upper limit is less straightforward to define;
this choice is discussed further below.</p></list-item><list-item>
      <p>The expected abundance (given presence) at the environmental optimum <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>
is allowed to take values in the range <inline-formula><mml:math id="M99" display="inline"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>k</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M100" display="inline"><mml:mrow><mml:mn>2.5</mml:mn><mml:msubsup><mml:mi>N</mml:mi><mml:mi>k</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>,
where <inline-formula><mml:math id="M101" display="inline"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>k</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> is the average count of all non-zero observations across
the training set. A value below <inline-formula><mml:math id="M102" display="inline"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>k</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> is clearly unlikely as the
expected abundance at the optimum must be at least as high as the average
abundance across the training set. Again, the choice of upper limit is
discussed below.</p></list-item><list-item>
      <p>The tolerance scaling <inline-formula><mml:math id="M103" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (Eq. 11) is allowed to take values in the
range 0.2 to 1.0. Low values (<inline-formula><mml:math id="M104" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>≪</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> are required to represent taxa
that need near-optimum conditions to flourish at high abundance, but can
survive even when conditions are far removed from this optimum. Values
<inline-formula><mml:math id="M105" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> can be ruled out because this would (nonsensically) describe a
taxon that can flourish at high abundance away from its optimum, but cannot
survive there. In Holden et al. (2008), <inline-formula><mml:math id="M106" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> was allowed to take values
in the range 0.4 to 1.0. Here we reduce the lower limit to 0.2, for reasons
discussed below.</p></list-item></list></p>
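      <p>The five prior ranges listed above can be assembled into the 2560 candidate SRCs for a single taxon as sketched here in Python. The text specifies the ranges and the number of values per parameter but not their spacing, so equally spaced values that include both end points are an assumption of this sketch. The function name and the example inputs are hypothetical, the proportion of sites with the taxon present is expressed as a fraction between 0 and 1, and no special treatment is applied if the upper limit for the probability of presence exceeds 1 because the text does not describe one.</p>
<preformat preformat-type="code"><![CDATA[
```python
import itertools

import numpy as np


def src_prior_grid(x_min, x_max, t_prime, p_prime, N_prime):
    """Return an array of shape (2560, 5) with columns (u_k, t_k, P_k, p_k, N_k)."""
    # Equal spacing including the end points is an assumption of this sketch.
    optima = np.linspace(x_min - t_prime, x_max + t_prime, 10)
    tolerances = np.linspace(2.0 * t_prime / 3.0, 3.0 * t_prime, 4)
    scalings = np.linspace(0.2, 1.0, 4)
    presences = np.linspace(p_prime, 2.5 * p_prime, 4)
    abundances = np.linspace(N_prime, 2.5 * N_prime, 4)
    combos = itertools.product(optima, tolerances, scalings, presences, abundances)
    return np.array(list(combos))


# Hypothetical taxon in a pH training set spanning 4.5 to 7.5 with t' = 0.63.
grid = src_prior_grid(x_min=4.5, x_max=7.5, t_prime=0.63, p_prime=0.2, N_prime=3.0)
# A uniform prior gives every candidate SRC the same probability.
prior = np.full(len(grid), 1.0 / len(grid))
```
]]></preformat>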
      <p>While aspects of the motivation for ranges 3–5 are discussed above, other
aspects are not clearly defined by logical considerations, these being the
upper limits for <inline-formula><mml:math id="M107" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M108" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, and the lower limit for <inline-formula><mml:math id="M109" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. These
ranges were selected empirically on the basis of application of the model to
three training sets, being chironomid-based temperature (Matthews-Bird et
al., 2016), diatom-based pH (Stevenson et al., 1991) and pollen-based
temperature (Bush et al., 2017). We define the posterior value for a
parameter to be the probability-weighted value of that parameter across the
2560 SRCs of the respective taxon. The posterior values of <inline-formula><mml:math id="M110" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>,
and <inline-formula><mml:math id="M112" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> for each taxon were sorted from smallest to largest and are
plotted sequentially in Fig. 1, for each of the three training sets. The
horizontal grid lines represent the four values that each parameter is
allowed to take. (Note that although each SRC parameter can only take one of
four discrete values, the probability-weighted average can take any value
within the prior range.) It is apparent that the full range of allowed
values is required to describe the responses of all taxa in all three
training sets (i.e. because the extremes of the probability-weighted
averages approach the extremes of the allowed parameter range). Furthermore,
it is apparent that a wider range is not required, although there is some
suggestion that a slightly higher upper limit for <inline-formula><mml:math id="M113" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> could have been
allowed for the SWAP diatom data (as evidenced by the fact that 7 of the 225
species take precisely the upper limit <inline-formula><mml:math id="M114" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>2.5</mml:mn><mml:msubsup><mml:mi>p</mml:mi><mml:mi>k</mml:mi><mml:mo>′</mml:mo></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
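      <p>The posterior value of a parameter, as defined in the preceding paragraph, is a probability-weighted average over the candidate SRCs of a taxon. A short Python sketch follows; the function name is hypothetical, the weights are assumed to be the posterior probabilities of the individual SRCs, and the two-SRC example is purely illustrative (the model uses 2560 SRCs per taxon).</p>
<preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def posterior_parameter_value(values, probabilities):
    """Probability-weighted average of one SRC parameter for one taxon."""
    values = np.asarray(values, dtype=float)
    probabilities = np.asarray(probabilities, dtype=float)
    # np.average divides by the sum of the weights, so they need not sum to 1.
    return float(np.average(values, weights=probabilities))


# Hypothetical example: two candidate SRCs with P_k = 0.2 and P_k = 1.0.
posterior_P_k = posterior_parameter_value([0.2, 1.0], [0.25, 0.75])
```
]]></preformat>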

      <?xmltex \floatpos{t}?><fig id="Ch1.F1"><caption><p>Probability-weighted SRC parameters <inline-formula><mml:math id="M115" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (left-hand axis),
<inline-formula><mml:math id="M116" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (left-hand axis) and <inline-formula><mml:math id="M117" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (right-hand axis) are plotted for all
taxa. Three training sets are considered: chironomid-based temperature
(Matthews-Bird et al., 2016), diatom-based pH (Stevenson et al., 1991), and
pollen-based temperature (Bush et al., 2017). The <inline-formula><mml:math id="M118" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axes represent the
distinct taxa in the training sets (59 chironomid taxa, 225 diatom taxa, and
553 pollen taxa). For each of the three SRC parameters, probability-weighted
values are derived for each taxon. These are ordered by increasing value and
are plotted sequentially. Horizontal grid lines represent the discrete values
allowed within individual SRCs.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/10/483/2017/gmd-10-483-2017-f01.png"/>

        </fig>

      <p>We note that the indicative tolerance is also used to define the environmental range considered in the reconstruction (Sect. 2.2), from <inline-formula><mml:math id="M119" display="inline"><mml:mrow><mml:mfenced open="(" close=")"><mml:msub><mml:mi>x</mml:mi><mml:mo>min⁡</mml:mo></mml:msub><mml:mo>-</mml:mo><mml:mn mathvariant="normal">6</mml:mn><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mfenced><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mtext>to</mml:mtext><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mfenced open="(" close=")"><mml:msub><mml:mi>x</mml:mi><mml:mo>max⁡</mml:mo></mml:msub><mml:mo>+</mml:mo><mml:mn mathvariant="normal">6</mml:mn><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mfenced></mml:mrow></mml:math></inline-formula>.
Significant probabilities beyond this range are unlikely given the
constraints imposed upon the optima and tolerances. In any event, as with
any transfer function, the model should not be applied under suspected
extrapolation far beyond the training-set environment.</p>
</sec>
</sec>
<sec id="Ch1.S3">
  <title>Artificial training-set data</title>
      <p>The probabilistic framework of BUMPER is well suited to the generation of
artificial training sets. In part, our motivation for artificial
training sets is to investigate how the characteristics of the assemblage
affect the performance of the transfer function. Additionally, an important
motivation is to apply BUMPER to a selection of training sets with different
characteristics to demonstrate that the model can be applied to an arbitrary
training set without tuning or user modification. In all of the analyses
that follow (in both Sects. 3 and 4) the identical model is applied
(although we consider the sensitivity of the model to two important
assumptions). This model internally generates precalibrated priors from the
characteristic tolerance <inline-formula><mml:math id="M120" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> and the environmental gradient as described
in Sect. 2.4.</p>
      <p>The artificial data are generated from the BUMPER probabilistic framework.
As a result of the generation procedure, the data are idealized and are
expected to overstate performance relative to a training set of real data
with otherwise similar characteristics. For instance, because the probabilistic framework is used to generate the
training set, taxon responses are unimodal by construction, with
optima that are independent of other environmental variables. Furthermore,
there are no misidentifications, nor any possibility of taphonomic error. However, the artificial training-set sites are
statistically independent; this eliminates the potentially over-optimistic
performance statistics that can arise from pseudo-replication in real
spatial data, where sites may have similar assemblage composition because of geographic
proximity and not necessarily because they experience the same environment
(Telford and Birks, 2005).</p>
<sec id="Ch1.S3.SS1">
  <title>Generating artificial training sets</title>
      <p>We generate an artificial training set for an arbitrary environmental
variable, under the following assumptions.
<list list-type="custom"><list-item><label>i.</label>
      <p>The training-set sites have an observed environment that is randomly sampled
from a uniform distribution over some range <inline-formula><mml:math id="M121" display="inline"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M122" display="inline"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. This choice
is arbitrary; we select <inline-formula><mml:math id="M123" display="inline"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>100</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M124" display="inline"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>200</mml:mn></mml:mrow></mml:math></inline-formula>. The SRC parameters and
performance statistics that follow are expressed as a percentage of the
environmental gradient <inline-formula><mml:math id="M125" display="inline"><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>100</mml:mn></mml:mrow></mml:math></inline-formula>.</p></list-item><list-item><label>ii.</label>
      <p>The taxa have optima <inline-formula><mml:math id="M126" display="inline"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> that are randomly sampled from a uniform
distribution in the range <inline-formula><mml:math id="M127" display="inline"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M128" display="inline"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. Although BUMPER allows
optima beyond the sampled gradient, we here simply assume the range of taxon
optima is equal to the environmental gradient (<inline-formula><mml:math id="M129" display="inline"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>,
<inline-formula><mml:math id="M130" display="inline"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>.</p></list-item><list-item><label>iii.</label>
      <p>The taxa have tolerances <inline-formula><mml:math id="M131" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> that are randomly sampled from a uniform
distribution in the range <inline-formula><mml:math id="M132" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M133" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. The characteristic tolerance
of a training set is likely to impact greatly upon the performance of the
transfer function because it drives the taxon-turnover rate across the
environmental gradient.</p></list-item><list-item><label>iv.</label>
      <p>The taxa have probabilities-of-presence-at-optimum <inline-formula><mml:math id="M134" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> that are randomly
sampled from a modified exponential with scale parameter <inline-formula><mml:math id="M135" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>,
defined for <inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>≤</mml:mo><mml:mi>p</mml:mi><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>.<disp-formula id="Ch1.E15" content-type="numbered"><mml:math id="M137" display="block"><mml:mrow><mml:mi>f</mml:mi><mml:mfenced close=")" open="("><mml:mi>p</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mfenced><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mfenced close=")" open="("><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mfenced open="/" close=""><mml:mphantom style="vphantom"><mml:mpadded style="vphantom" width="0pt"><mml:mn mathvariant="normal">1</mml:mn><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mpadded></mml:mphantom></mml:mfenced></mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mfenced><mml:mi>exp⁡</mml:mi><mml:mfenced open="(" close=")"><mml:mo>-</mml:mo><mml:mi>p</mml:mi><mml:mrow><mml:mfenced close="" open="/"><mml:mphantom style="vphantom"><mml:mpadded width="0pt" style="vphantom"><mml:mo>-</mml:mo><mml:mi>p</mml:mi><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mpadded></mml:mphantom></mml:mfenced></mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mfenced></mml:mrow><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi>exp⁡</mml:mi><mml:mfenced close=")" open="("><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mfenced close="" open="/"><mml:mphantom style="vphantom"><mml:mpadded width="0pt" style="vphantom"><mml:mo>-</mml:mo><mml:mn 
mathvariant="normal">1</mml:mn><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mpadded></mml:mphantom></mml:mfenced></mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mfenced></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:math></disp-formula>The denominator adjusts the exponential so that the cumulative probability
is 1 at <inline-formula><mml:math id="M138" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>. The scale parameter <inline-formula><mml:math id="M139" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is used to tune the
assemblage for taxon richness.</p></list-item><list-item><label>v.</label>
      <p>The taxa have tolerance scalings <inline-formula><mml:math id="M140" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (Eq. 11) that are randomly sampled from a
uniform distribution in the range 0.2 to 1.0.</p></list-item></list></p>
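      <p>Assumptions (ii) to (v) can be illustrated with a minimal sketch. This is not the BUMPER source code: the function names, the dictionary layout, and the choice to express the scale parameter of Eq. (15) as a fraction (10 % written as 0.10) are assumptions made for illustration. The modified exponential is sampled by inverting its cumulative distribution, which is available in closed form because the denominator of Eq. (15) is a constant.</p>
<preformat><![CDATA[
```python
import math
import random


def sample_modified_exponential(beta, rng):
    """Draw p in [0, 1] from Eq. (15) by inverting its cumulative distribution."""
    u = rng.random()                      # uniform on [0, 1)
    norm = 1.0 - math.exp(-1.0 / beta)    # denominator of Eq. (15)
    return -beta * math.log(1.0 - u * norm)


def generate_taxa(n_taxa, u_low, u_high, t_low, t_high, beta_p, seed=0):
    """Sample illustrative taxon parameters under assumptions (ii) to (v)."""
    rng = random.Random(seed)
    taxa = []
    for _ in range(n_taxa):
        taxa.append({
            "optimum": rng.uniform(u_low, u_high),            # assumption (ii)
            "tolerance": rng.uniform(t_low, t_high),          # assumption (iii)
            "p_at_optimum": sample_modified_exponential(beta_p, rng),  # (iv)
            "tolerance_scaling": rng.uniform(0.2, 1.0),       # assumption (v)
        })
    return taxa


# Gradient of 100 to 200, tolerances of 15 to 25 (i.e. 15 to 25 % of the
# gradient), and a 10 % scale parameter written as the fraction 0.10.
taxa = generate_taxa(100, 100.0, 200.0, 15.0, 25.0, beta_p=0.10)
```
]]></preformat>
      <p>Inverse-transform sampling is used here only because it needs no rejection step; any sampler that respects the truncation at 1 would serve equally well.</p>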
      <p>The values of the abundance parameter <inline-formula><mml:math id="M141" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are drawn from a modified
exponential distribution (cf. Eq. 15) with scale parameter <inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. The value of <inline-formula><mml:math id="M143" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is not an input because an arbitrary choice
will lead to an expected total number of counts (i.e. the sum of all taxon
counts within a site) that differs from 100 %. An appropriate value of
<inline-formula><mml:math id="M144" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is derived from a Monte Carlo approach; the expected total
count is derived from an exploratory 10 000-site training set with <inline-formula><mml:math id="M145" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi>N</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>10</mml:mn></mml:mrow></mml:math></inline-formula> %, and <inline-formula><mml:math id="M146" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is then rescaled accordingly, with the rescaled
value used to generate the actual training set. BUMPER does not model a
multinomial probability distribution, so percentage counts generated
under this approach are not constrained to sum precisely to 100 %, even
when the appropriate value of <inline-formula><mml:math id="M147" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is chosen. A pragmatic approach
is taken to address this issue by randomly drawing assemblages, accepting
only those with a total count of between 90 and 110 % until the
required number of sites has been generated. In accepted sites the counts
are scaled to total 100 %.</p>
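      <p>The accept-and-rescale step can be sketched as follows. The function name and the placeholder site generator are hypothetical; in particular, the toy generator below is not the BUMPER abundance model and serves only to make the example runnable.</p>
<preformat><![CDATA[
```python
import random


def accept_and_rescale(draw_site, n_sites, low=90.0, high=110.0):
    """Draw candidate sites until n_sites have total counts within [low, high] %.

    Accepted sites are rescaled so that their percentage counts total 100 %.
    """
    accepted = []
    while len(accepted) < n_sites:
        counts = draw_site()
        total = sum(counts)
        if low <= total <= high:
            accepted.append([100.0 * c / total for c in counts])
    return accepted


rng = random.Random(1)


def toy_draw_site():
    # Placeholder generator, NOT the BUMPER abundance model: 20 taxa with
    # exponentially distributed percentages whose expected total is 100 %.
    return [rng.expovariate(1.0 / 5.0) for _ in range(20)]


sites = accept_and_rescale(toy_draw_site, n_sites=25)
```
]]></preformat>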
</sec>
<sec id="Ch1.S3.SS2">
  <title>Generating transfer functions from the artificial data</title>
      <p>In this section, we cross-validate transfer functions derived from
artificial training sets with different characteristics to investigate how
these characteristics affect the reconstruction performance. The
characteristics we consider are assemblage richness and assemblage
tolerance. We define the characteristic assemblage richness as the average
number of taxa found in the training-set sites. This is varied in the
artificial ensemble through the exponential scale parameter <inline-formula><mml:math id="M148" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. A
low value of <inline-formula><mml:math id="M149" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> ascribes low probabilities of presence, even under
optimal conditions, and leads to taxon-poor assemblages. We define the
characteristic assemblage tolerance to be the indicative tolerance <inline-formula><mml:math id="M150" display="inline"><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> (Sect. 2.4). This is adjusted through the prior range for
tolerance, <inline-formula><mml:math id="M151" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M152" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. For each assemblage type considered,
training sets of 25, 100, 250, and 1000 sites are generated, each
composed of the same 100 randomly generated artificial taxa.</p>
      <p>We consider three assemblage types.
<list list-type="order"><list-item>
      <p>Low richness, high tolerance; <inline-formula><mml:math id="M153" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>10</mml:mn></mml:mrow></mml:math></inline-formula> %, <inline-formula><mml:math id="M154" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>15</mml:mn></mml:mrow></mml:math></inline-formula> %,
<inline-formula><mml:math id="M155" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>25</mml:mn></mml:mrow></mml:math></inline-formula> %, (solved <inline-formula><mml:math id="M156" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi>N</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>40</mml:mn></mml:mrow></mml:math></inline-formula> %). The assemblage richness that
results from these assumptions is 5.7, i.e. the training-set sites contain
on average 5.7 taxa. Tolerances are typically 20 % of the environmental
gradient.</p></list-item><list-item>
      <p>High richness, high tolerance; <inline-formula><mml:math id="M157" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>50</mml:mn></mml:mrow></mml:math></inline-formula> %,  <inline-formula><mml:math id="M158" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>15</mml:mn></mml:mrow></mml:math></inline-formula> %,
<inline-formula><mml:math id="M159" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>25</mml:mn></mml:mrow></mml:math></inline-formula> %, (solved <inline-formula><mml:math id="M160" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi>N</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">9</mml:mn></mml:mrow></mml:math></inline-formula> %). The prior for tolerance is
unchanged from assemblage type 1, but the expected probability of presence at the taxon
optima is increased, generating taxon-rich assemblages of average richness
18.3.</p></list-item><list-item>
      <p>Low richness, low tolerance; <inline-formula><mml:math id="M161" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>20</mml:mn></mml:mrow></mml:math></inline-formula> %, <inline-formula><mml:math id="M162" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula> %, <inline-formula><mml:math id="M163" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>15</mml:mn></mml:mrow></mml:math></inline-formula> %,
(solved <inline-formula><mml:math id="M164" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi>N</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>30</mml:mn></mml:mrow></mml:math></inline-formula> %). The tolerance is halved relative to assemblage type 1 and
the expected probability of presence at the taxon optima is approximately
doubled. These two changes together result in a taxon richness of 6.4,
similar to that of assemblage type 1, but now the assemblage is composed of taxa that are
more sensitive environmental indicators.</p></list-item></list></p>
<sec id="Ch1.S3.SS2.SSS1">
  <title>WAPLS performance: characterising the assemblages</title>
      <p>Our first objective is to investigate how the characteristics of a
training set affect transfer function performance. To achieve this objective
we apply the computationally fast weighted averaging partial least squares
(WAPLS) (ter Braak and Juggins, 1993) to large ensembles of training sets.
An ensemble approach is necessary because random variability can lead to
significantly different transfer function performance, especially for small
training sets. For simplicity of interpretation we consider only the first
PLS component, equivalent to standard two-way WA regression and calibration
with an inverse deshrinking (Birks et al., 1990, 2010). The WAPLS1 ensemble
results are plotted as the circles (with associated error bars) in Fig. 2.
These points are cross-validated (leave-one-out) RMSEP as a function of the
sampling density, defined as the average number of training-set observations
per taxon. Each plotted circle in Fig. 2 is derived from 10 000
training-set sites. The sites are randomly generated and assigned to a
collection of training sets of the desired size. For instance, to generate
an ensemble of 25-site training sets, the 10 000 sites are divided into 400
training sets. This enables us to produce robust statistics for each data
point, being the mean and standard deviation of the RMSEP, derived across an
ensemble of many training sets with similar characteristics.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F2"><caption><p>Application to artificial data. Reconstruction performance plotted against the sampling density
(the average number of observations per species in the training set). Colour
indicates assemblage type: low richness and high tolerance (red), high
richness and high tolerance (blue) or low richness and low tolerance (green).
Circles with error bars are the mean and standard deviation of the WAPLS1
leave-one-out RMSEP calculated across an ensemble of artificial
training sets. From each artificial ensemble, a member is selected that has
a WAPLS1 error equal to the ensemble mean. BUMPER is applied to this
training set; crosses are the BUMPER leave-one-out RMSEP, solid lines are
the BUMPER uncertainty. In summary (see Sect. 3.2), this plot demonstrates the following: (1) reconstruction
errors of WAPLS1 (circles) and BUMPER (crosses) are similar for all
assemblages. (2) Increasing the sampling density of a training set reduces
both the reconstruction errors (circles and crosses) and the BUMPER
reconstruction uncertainty (solid lines). However, continued benefits beyond
a sampling density of <inline-formula><mml:math id="M165" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 10 are modest. (3) Reconstructions from
assemblages that benefit neither from high richness nor from low tolerances
(high taxon turnover), plotted in red, are associated with significantly
greater error and reconstruction uncertainty. We note that the overstatement
of BUMPER uncertainty relative to the reconstruction error (solid lines
compared to crosses) is expected for this application to idealized data (see
Sect. 3.2.3).</p></caption>
            <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/10/483/2017/gmd-10-483-2017-f02.png"/>

          </fig>

      <p>Unsurprisingly, the low-richness, high-tolerance training sets (assemblage
type 1, Sect. 3.2) perform least well. As reconstructions are derived from
an average of only six taxa, they are not well constrained and are sensitive to
statistical outliers. As the taxa are typically characterized by broad
tolerance, they are relatively insensitive environmental indicators. In this
assemblage type the cross-validated RMSEP decreases from 16.8 <inline-formula><mml:math id="M166" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> 3.3 to
10.0 <inline-formula><mml:math id="M167" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> 0.2 as the training-set size increases from 25 to 1000 sites
(sampling density increases from 1.4 to 57). The improvement in performance
with training-set size is expected as the WA optima become more accurately
defined. However, there is only a modest improvement in performance as the
training-set size increases from 250 to 1000 (sampling density increasing
from 14 to 57). These diminishing returns are seen across all three of the
assemblage types, and together suggest that an average of
at least 10 observations per taxon is a useful indicator of a well-characterized training set, although we note that idealized data are
presumably easier to characterize than real (noisier) data.</p>
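      <p>The sampling densities quoted above can be reproduced with a one-line calculation, sketched below. The formula is an interpretation rather than a definition taken from the text: it assumes that each occurrence of a taxon at a site counts as one observation, so that the total number of observations is the number of sites multiplied by the average richness. With the type 1 richness of 5.7 and 100 taxa, this gives approximately 1.4, 14, and 57 for 25, 250, and 1000 sites, in agreement with the values quoted.</p>
<preformat><![CDATA[
```python
def sampling_density(n_sites, average_richness, n_taxa):
    """Average number of training-set observations per taxon.

    Assumes one observation per taxon occurrence at a site.
    """
    return n_sites * average_richness / n_taxa


# Assemblage type 1: average richness 5.7, 100 artificial taxa.
for n_sites in (25, 100, 250, 1000):
    density = sampling_density(n_sites, 5.7, 100)
    print(f"{n_sites:5d} sites: {density:.2f} observations per taxon")
```
]]></preformat>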
      <p>An increase in
assemblage richness (by a factor of approximately 3, assemblage type 2)
and a decrease in assemblage tolerance (by a factor of 2, assemblage type
3) each approximately halve the WAPLS1 prediction error (RMSEP reductions
of between 37 and 52 %). We note that while these improvements cannot
in general be controlled (they are characteristic of the assemblage),
low-richness assemblages are likely to benefit especially from high sample
counts (a high number of counts per training-set site), which tend to increase
the observed species richness.</p>
</sec>
<sec id="Ch1.S3.SS2.SSS2">
  <title>BUMPER performance and CPU demand</title>
      <p>We now consider the performance of BUMPER. The previous applications of
BUMPER (to real ecological data) made two pragmatic assumptions. First,
<inline-formula><mml:math id="M168" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula> (Eq. 6), blending the abundance likelihood function <inline-formula><mml:math id="M169" display="inline"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>
(Eq. 4) with the presence–absence likelihood function <inline-formula><mml:math id="M170" display="inline"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (Eq. 5) so that the wide tails contributed by <inline-formula><mml:math id="M171" display="inline"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> allow for the possibility
of outlying taxon counts. Second, only taxa that have abundances
<inline-formula><mml:math id="M172" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 2 % are included when performing a reconstruction; while very low counts
do contribute information, they are generally less informative than high
counts, they increase computational load, and they are likely to be less
robust. Neither of these conservative assumptions is necessary for
application to the idealized data considered here. However, we choose to
retain them for this analysis in order to avoid the risk of overstating the
performance statistics (relative to WAPLS) that are likely to be possible in
practice.</p>
      <p>The crosses in Fig. 2 plot the leave-one-out RMSEP of BUMPER. For each of
the 12 scenarios we select one training set, namely the first of the
randomly generated ensemble members that exhibits a WAPLS1 RMSEP within 0.1
of the ensemble mean, and we use this training set to build a BUMPER model.
The performance of this BUMPER model is generally also similar to the
ensemble mean WAPLS1 performance. Similar BUMPER and WAPLS performance
statistics were previously found with the SWAP diatom-pH training set
(Holden et al., 2008) and tropical-Andean chironomids (Matthews-Bird et al.,
2016). We note that individual reconstructions (as distinct from the summary
statistics of a training set) can differ significantly between the two
approaches (Matthews-Bird et al., 2016), in particular because of the
increased sensitivity of the reconstruction to “low-count” taxa (that are
only ever observed in low abundance).</p>
      <p>The CPU demands of BUMPER are dominated by the calculation of the SRC
probabilities (i.e. applying Eq. 1 across all training-set sites for each
taxon). This takes 0.022 s per taxon on a standard personal computer
(2.5 GHz Intel Core i5 processor, 4 GB 1600 MHz DDR3 memory) for a 100-site
training set. The CPU demand scales approximately linearly with number of
sites. To illustrate, deriving the leave-one-out cross-validated performance
of a training set of 500 sites with an average taxon richness of 10 would
require on average <inline-formula><mml:math id="M173" display="inline"><mml:mrow><mml:mo>∼</mml:mo><mml:mn>4.99</mml:mn><mml:mo>×</mml:mo><mml:mn>10</mml:mn><mml:mo>×</mml:mo><mml:mn>0.022</mml:mn><mml:mo>=</mml:mo><mml:mn>1.1</mml:mn></mml:mrow></mml:math></inline-formula> s per site
(i.e. to derive leave-one-out SRC probabilities for 10 taxa from 499
training-set sites), thereby taking <inline-formula><mml:math id="M174" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 9 min to
cross-validate the entire training set.</p>
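      <p>The arithmetic of this estimate can be written out as a short sketch. The function name and argument names are illustrative; the only inputs are the figures quoted above (0.022 s per taxon for a 100-site training set) together with the stated assumption that the cost scales linearly with the number of sites.</p>
<preformat><![CDATA[
```python
def loo_cpu_seconds(n_sites, average_richness,
                    seconds_per_taxon_100_sites=0.022):
    """Estimate leave-one-out cost, assuming linear scaling with site count.

    Returns (seconds per left-out site, seconds for the whole training set).
    """
    # Each left-out site is reconstructed from the remaining n_sites - 1 sites.
    scale = (n_sites - 1) / 100.0
    per_site = scale * average_richness * seconds_per_taxon_100_sites
    return per_site, per_site * n_sites


per_site, total = loo_cpu_seconds(500, 10)
print(f"{per_site:.2f} s per site, {total / 60.0:.1f} min in total")
```
]]></preformat>
      <p>For the 500-site example this gives about 1.1 s per site and roughly 9 min in total, matching the figures in the text.</p>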
</sec>
<sec id="Ch1.S3.SS2.SSS3">
  <title>BUMPER uncertainty</title>
      <p>The solid lines in Fig. 2 plot the BUMPER uncertainty metric <inline-formula><mml:math id="M175" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>
(Eq. 9) from the reconstructions described in Sect. 3.2.2. Although
the uncertainties broadly reflect the trends in the reconstruction error,
uncertainty is significantly overstated, being 151 <inline-formula><mml:math id="M176" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> 21 % of the
RMSEP across the 12 training sets. This overstatement is expected for
idealized data because of the blended presence–absence likelihood function
(<inline-formula><mml:math id="M177" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula>, Eq. 6). This intentionally broadens the likelihood
functions to prevent an outlier, such as a taxon misidentification, from
ruling out the correct solution. While this assumption has proved useful for
real data, it is unnecessarily conservative for idealized data and is
therefore expected to overstate uncertainty. Additional BUMPER models (not
illustrated) were built for each training set with <inline-formula><mml:math id="M178" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>0.0</mml:mn></mml:mrow></mml:math></inline-formula> and including
all taxa present. These models exhibited improved RMSEP relative to the
default, reduced by typically 10 %. Moreover, these models produced
posteriors that better reflect the reconstruction error, with an average
uncertainty of 112 <inline-formula><mml:math id="M179" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> 17 % of the RMSEP, demonstrating that the uncertainty
is meaningfully quantified when the likelihood function is not artificially
broadened. (Nevertheless, we retain the broadened likelihood functions for
applications to real ecological data.)</p>
</sec>
</sec>
</sec>
<sec id="Ch1.S4">
  <title>Applications to real data</title>
      <p>We consider three training sets of real data, using three different
biological proxies and having different assemblage characteristics,
summarized in Table 1. The tropical chironomid (mean annual temperature)
dataset of Matthews-Bird et al. (2016) is a 59-site taxon-poor training set
composed largely of taxa with narrow environmental tolerances (expressed as
a percentage of the training-set environmental gradient). The SWAP diatom
(pH) dataset (Stevenson et al., 1991) is a 167-site taxon-rich training set,
significantly larger than the Matthews-Bird dataset, but with broader
characteristic tolerances. The NIMBIOS tropical pollen (mean annual
temperature) dataset (Bush et al., 2017) is a 682-site species-rich
training set that is currently under development and characterized by narrow
tolerances. These pollen data are not (in their current state) expected to
have the high quality of SWAP, because some taxa are identified to family
level whereas other taxa are identified to genus level.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1"><caption><p>Training-set characteristics.</p></caption><oasis:table frame="topbot"><?xmltex \begin{scaleboxenv}{.85}[.85]?><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1"/>  
         <oasis:entry colname="col2">Chironomids</oasis:entry>  
         <oasis:entry colname="col3">Diatoms</oasis:entry>  
         <oasis:entry colname="col4">Pollen</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>  
         <oasis:entry colname="col1">Environmental gradient</oasis:entry>  
         <oasis:entry colname="col2">25.0 <inline-formula><mml:math id="M180" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C</oasis:entry>  
         <oasis:entry colname="col3">2.92 pH units</oasis:entry>  
         <oasis:entry colname="col4">24.9 <inline-formula><mml:math id="M181" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Number of sites</oasis:entry>  
         <oasis:entry colname="col2">59</oasis:entry>  
         <oasis:entry colname="col3">167</oasis:entry>  
         <oasis:entry colname="col4">682</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Number of taxa</oasis:entry>  
         <oasis:entry colname="col2">59</oasis:entry>  
         <oasis:entry colname="col3">225</oasis:entry>  
         <oasis:entry colname="col4">553</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Average richness</oasis:entry>  
         <oasis:entry colname="col2">7.0</oasis:entry>  
         <oasis:entry colname="col3">50</oasis:entry>  
         <oasis:entry colname="col4">27</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Indicative tolerance</oasis:entry>  
         <oasis:entry colname="col2">3.2 <inline-formula><mml:math id="M182" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/13 %</oasis:entry>  
         <oasis:entry colname="col3">0.63/22 %</oasis:entry>  
         <oasis:entry colname="col4">3.0 <inline-formula><mml:math id="M183" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/12 %</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Sampling density</oasis:entry>  
         <oasis:entry colname="col2">7.0</oasis:entry>  
         <oasis:entry colname="col3">37</oasis:entry>  
         <oasis:entry colname="col4">33</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup><?xmltex \end{scaleboxenv}?></oasis:table></table-wrap>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T2" specific-use="star"><caption><p>Four BUMPER models. For each of the three training sets
(Matthews-Bird et al., 2016; Stevenson et al., 1991; Bush et al., 2017) four
models are considered. We use either the standard model (<inline-formula><mml:math id="M184" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> or the
presence–absence model (<inline-formula><mml:math id="M185" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>1.0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and reconstruct with either all taxa
(<inline-formula><mml:math id="M186" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo></mml:mrow></mml:math></inline-formula> 0 %), or only including taxa with more than 2 % abundance
(<inline-formula><mml:math id="M187" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo></mml:mrow></mml:math></inline-formula> 2 %).</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">(1) <inline-formula><mml:math id="M188" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo></mml:mrow></mml:math></inline-formula> 0 %, <inline-formula><mml:math id="M189" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2">Chironomids</oasis:entry>  
         <oasis:entry colname="col3">Diatoms</oasis:entry>  
         <oasis:entry colname="col4">Pollen</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>  
         <oasis:entry colname="col1">RMSEP</oasis:entry>  
         <oasis:entry colname="col2">2.46 <inline-formula><mml:math id="M190" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/9.8 %</oasis:entry>  
         <oasis:entry colname="col3">0.321 pH/11.0 %</oasis:entry>  
         <oasis:entry colname="col4">2.51 <inline-formula><mml:math id="M191" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/10.1 %</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Posterior width</oasis:entry>  
         <oasis:entry colname="col2">2.74 <inline-formula><mml:math id="M192" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/11.0 %</oasis:entry>  
         <oasis:entry colname="col3">0.183 pH/6.2 %</oasis:entry>  
         <oasis:entry colname="col4">1.64 <inline-formula><mml:math id="M193" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/6.6 %</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">(2) <inline-formula><mml:math id="M194" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo></mml:mrow></mml:math></inline-formula> 0 %, <inline-formula><mml:math id="M195" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>1.0</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2">Chironomids</oasis:entry>  
         <oasis:entry colname="col3">Diatoms</oasis:entry>  
         <oasis:entry colname="col4">Pollen</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">RMSEP</oasis:entry>  
         <oasis:entry colname="col2">2.40 <inline-formula><mml:math id="M196" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/9.6 %</oasis:entry>  
         <oasis:entry colname="col3">0.357 pH/12.2 %</oasis:entry>  
         <oasis:entry colname="col4">2.73 <inline-formula><mml:math id="M197" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/11.0 %</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Posterior width</oasis:entry>  
         <oasis:entry colname="col2">2.95 <inline-formula><mml:math id="M198" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/11.8 %</oasis:entry>  
         <oasis:entry colname="col3">0.204 pH/7.0 %</oasis:entry>  
         <oasis:entry colname="col4">1.73 <inline-formula><mml:math id="M199" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/6.9 %</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">(3) <inline-formula><mml:math id="M200" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo></mml:mrow></mml:math></inline-formula> 2 %, <inline-formula><mml:math id="M201" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2">Chironomids</oasis:entry>  
         <oasis:entry colname="col3">Diatoms</oasis:entry>  
         <oasis:entry colname="col4">Pollen</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">RMSEP</oasis:entry>  
         <oasis:entry colname="col2">2.41 <inline-formula><mml:math id="M202" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/9.7 %</oasis:entry>  
         <oasis:entry colname="col3">0.369 pH/12.6 %</oasis:entry>  
         <oasis:entry colname="col4">2.88 <inline-formula><mml:math id="M203" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/11.6 %</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">Posterior width</oasis:entry>  
         <oasis:entry colname="col2">3.01 <inline-formula><mml:math id="M204" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/12.0 %</oasis:entry>  
         <oasis:entry colname="col3">0.330 pH/11.3 %</oasis:entry>  
         <oasis:entry colname="col4">2.79 <inline-formula><mml:math id="M205" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/11.2 %</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1">(4) <inline-formula><mml:math id="M206" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo></mml:mrow></mml:math></inline-formula> 2 %, <inline-formula><mml:math id="M207" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>1.0</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>  
         <oasis:entry colname="col2">Chironomids</oasis:entry>  
         <oasis:entry colname="col3">Diatoms</oasis:entry>  
         <oasis:entry colname="col4">Pollen</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">RMSEP</oasis:entry>  
         <oasis:entry colname="col2">2.27 <inline-formula><mml:math id="M208" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/9.1 %</oasis:entry>  
         <oasis:entry colname="col3">0.377 pH/12.9 %</oasis:entry>  
         <oasis:entry colname="col4">3.04 <inline-formula><mml:math id="M209" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/12.2 %</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">Posterior width</oasis:entry>  
         <oasis:entry colname="col2">3.33 <inline-formula><mml:math id="M210" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/13.3 %</oasis:entry>  
         <oasis:entry colname="col3">0.473 pH/16.1 %</oasis:entry>  
         <oasis:entry colname="col4">3.77 <inline-formula><mml:math id="M211" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C/15.1 %</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p>We consider four alternative BUMPER models. First, we test the requirement
for a 2 % abundance threshold (Sect. 3.2.2) by generating
reconstructions with and without this constraint. Removing this constraint
is expected to decrease the uncertainty of the reconstruction (all taxa
present are included, thereby narrowing the posterior). We wish to test
whether this decreased uncertainty is associated with a reduction in
reconstruction error when applied to real data. Second, we consider
sensitivity to the form of the likelihood function. The default model
applies <inline-formula><mml:math id="M212" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula> (Sect. 2.2). However, in Holden et al. (2008), RMSEP
was found to be only weakly dependent upon this parameter so that
presence–absence data alone (<inline-formula><mml:math id="M213" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>1.0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> contain sufficient information to
derive a useful predictive model. Compared with the full model, the
presence–absence model is expected to increase robustness at the expense of
increased uncertainty, with a broader posterior. There may be situations
where this conservative approach is preferable, for instance when SRCs are
poorly constrained in small training sets, or when misclassification is a
concern as the presence–absence model is less sensitive to outliers. The
cross-validated performances of the four models (<inline-formula><mml:math id="M214" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo></mml:mrow></mml:math></inline-formula> 2  or 0 %,
<inline-formula><mml:math id="M215" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula> or 1.0) are summarized for each of the three training sets in
Table 2, and are plotted for two models (<inline-formula><mml:math id="M216" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula> or 1.0, <inline-formula><mml:math id="M217" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula> %) in Fig. 3. Importantly, Fig. 3 illustrates the
reconstruction-specific uncertainty <inline-formula><mml:math id="M218" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula> (<inline-formula><mml:math id="M219" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula>2<inline-formula><mml:math id="M220" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula> is plotted),
which in general differs significantly from the training-set RMSEP that is
usually assumed to describe the uncertainty of WAPLS approaches.</p>
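<p>The summary statistics used in Table 2 and Fig. 3 can be illustrated with a minimal sketch. The function names and the toy values below are hypothetical and are not part of the BUMPER code. The sketch only assumes that RMSEP is the root mean square of the cross-validated errors, that percentages are taken relative to the environmental gradient of Table 1, and that coverage is the share of reconstructions lying within 2Δ of the observation.</p>
<preformat preformat-type="code">
```python
import math


def rmsep(observed, predicted):
    """Root mean square error of prediction over cross-validated sites."""
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def percent_of_gradient(value, gradient):
    """Express an absolute quantity as a percentage of the gradient length."""
    return 100.0 * value / gradient


def coverage_2delta(observed, predicted, delta):
    """Percentage of sites whose reconstruction lies within 2*delta of the observation."""
    hits = sum(
        1
        for o, p, d in zip(observed, predicted, delta)
        if 2.0 * d >= abs(p - o)
    )
    return 100.0 * hits / len(observed)


# Toy values for illustration only (not taken from any training set).
observed = [10.0, 15.0, 20.0, 25.0]
predicted = [11.0, 14.0, 23.0, 25.0]
delta = [1.0, 1.0, 1.0, 1.0]

print(round(rmsep(observed, predicted), 3))
print(coverage_2delta(observed, predicted, delta))

# Consistency check against Tables 1 and 2 (chironomids, model 1):
# an RMSEP of 2.46 degrees C on a 25.0 degrees C gradient.
print(round(percent_of_gradient(2.46, 25.0), 1))  # 9.8
```
</preformat>
<p>A coverage close to 95 % indicates that the posterior widths are a fair description of the reconstruction error, whereas a much lower value indicates that the posteriors understate the uncertainty.</p>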

      <?xmltex \floatpos{t}?><fig id="Ch1.F3" specific-use="star"><caption><p>Cross-validated reconstructions (Eq. 8) together with associated
uncertainty <inline-formula><mml:math id="M221" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula>2<inline-formula><mml:math id="M222" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula> (Eq. 9) are plotted against observed training-set
environments for each of the three training sets (Matthews-Bird et al., 2016;
Stevenson et al., 1991; Bush et al., 2017). We plot the default model
(<inline-formula><mml:math id="M223" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and the presence–absence model (<inline-formula><mml:math id="M224" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>/</mml:mo><mml:mi>a</mml:mi></mml:mrow></mml:math></inline-formula>) (<inline-formula><mml:math id="M225" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>1.0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. All
analyses impose the constraint that 2 % abundance is required for
inclusion.</p></caption>
        <?xmltex \igopts{width=412.564961pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/10/483/2017/gmd-10-483-2017-f03.png"/>

      </fig>

<sec id="Ch1.S4.SS1">
  <title>Chironomids (Matthews-Bird et al., 2016)</title>
      <p>The chironomid training set of Matthews-Bird et al. (2016) comprises 59
lakes across tropical South America. The prediction uncertainty associated
with this dataset (WAPLS RMSEP 2.4 <inline-formula><mml:math id="M226" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C) is approximately twice the
uncertainty that has been achieved for European chironomid transfer
functions, which typically approach <inline-formula><mml:math id="M227" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 1 <inline-formula><mml:math id="M228" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C (e.g.
Brooks et al., 2012). It has been postulated that this reduced performance
is caused by the small size of the training set and uneven sampling across
the environmental gradient (Matthews-Bird et al., 2016).</p>
      <p>The default model (<inline-formula><mml:math id="M229" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo></mml:mrow></mml:math></inline-formula> 2 %, <inline-formula><mml:math id="M230" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> exhibits an RMSEP of
2.41 <inline-formula><mml:math id="M231" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C (Fig. 3a), and 97 % of cross-validated reconstructions lie
within <inline-formula><mml:math id="M232" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula>2<inline-formula><mml:math id="M233" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula> of the observed temperature. However, the best model
(RMSEP 2.27 <inline-formula><mml:math id="M234" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C) is the presence–absence model that requires 2 %
abundance to include a species in a reconstruction (Fig. 3b). This model
version is the most conservative and might therefore be expected to perform
least well. However, the training set is small, so the SRCs are
not expected to be well defined, and furthermore the number of counts per
site is relatively low. These data may therefore favour a more conservative
model that does not over-constrain the reconstructions.</p>
      <p>When all taxa are included, the reconstruction uncertainty is reduced as
expected, from 12.0  to 11.0 % in the default model and from 13.3
to 11.8 % in the presence–absence model. However, this is not accompanied
by a reduction in RMSEP. The RMSEP increases slightly from 9.7  to
9.8 % in the default model and from 9.1  to 9.6 % in the
presence–absence model. These data support our choice to retain the 2 %
abundance threshold in the default model. Including very low abundance
counts increases the computational demand, does not in general improve model
performance, and is likely to understate the uncertainty associated with
the reconstruction. We note for completeness that the default BUMPER model
was also applied in Matthews-Bird et al. (2016), where it had an RMSEP of
2.37 <inline-formula><mml:math id="M235" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C. The slight difference in performance here results from
the different SRC priors used (see Sect. 2.4).</p>
      <p>The Matthews-Bird training set can be broadly classified as being low
taxon richness (7.0), narrow tolerance (13 %), and relatively poorly
sampled (sampling density 7.0). We generated an ensemble of artificial
datasets with approximately those characteristics (59 lakes, 59 taxa, <inline-formula><mml:math id="M236" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>40</mml:mn></mml:mrow></mml:math></inline-formula> %, <inline-formula><mml:math id="M237" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">8</mml:mn></mml:mrow></mml:math></inline-formula> %, <inline-formula><mml:math id="M238" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub><mml:mo>=</mml:mo></mml:mrow></mml:math></inline-formula> 18 %, yielding an ensemble-averaged
species richness of 6.7). The WAPLS1 transfer functions built from these
data exhibit a mean RMSEP of 6.3 <inline-formula><mml:math id="M239" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> 0.6 % of the environmental
gradient. Although a direct comparison is difficult, given that an
improvement in performance is expected under idealized assumptions, these
statistics are broadly consistent with those of the real transfer functions
(ranging from 9.1 to 9.8 %).</p>
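<p>The idea of an idealized analogue can be sketched as follows. This is not the generator or the WAPLS1 implementation used here. It is a simplified illustration that assumes noise-free Gaussian response curves, a gradient scaled to the interval from 0 to 1, optima and site environments drawn uniformly, tolerances drawn uniformly between two bounds, and plain weighted averaging without deshrinking. The parameter β<sub>p</sub> is not modelled, and all function names are hypothetical.</p>
<preformat preformat-type="code">
```python
import math
import random


def gaussian_response(x, optimum, tolerance):
    """Unimodal (Gaussian) taxon response at environment x."""
    return math.exp(-0.5 * ((x - optimum) / tolerance) ** 2)


def make_training_set(n_sites, n_taxa, t_low, t_high, rng):
    """Noise-free idealized training set on a gradient scaled to [0, 1].

    Tolerances are drawn uniformly between t_low and t_high (fractions of
    the gradient); optima and site environments are drawn uniformly.
    """
    env = [rng.random() for _ in range(n_sites)]
    taxa = [(rng.random(), rng.uniform(t_low, t_high)) for _ in range(n_taxa)]
    counts = [[gaussian_response(x, u, t) for (u, t) in taxa] for x in env]
    return env, counts


def loo_wa_rmsep(env, counts):
    """Leave-one-out RMSEP (percent of the nominal gradient) of plain
    weighted averaging, without deshrinking."""
    n_sites, n_taxa = len(env), len(counts[0])
    sq_err = 0.0
    for held in range(n_sites):
        # Weighted-average optima estimated without the held-out site.
        optima = []
        for k in range(n_taxa):
            w = sum(counts[i][k] for i in range(n_sites) if i != held)
            wx = sum(counts[i][k] * env[i] for i in range(n_sites) if i != held)
            optima.append(wx / w)
        y = counts[held]
        pred = sum(a * u for a, u in zip(y, optima)) / sum(y)
        sq_err += (pred - env[held]) ** 2
    return 100.0 * math.sqrt(sq_err / n_sites)


# 59 sites and 59 taxa with tolerances between 8 and 18 % of the gradient.
rng = random.Random(0)
env, counts = make_training_set(59, 59, 0.08, 0.18, rng)
print(round(loo_wa_rmsep(env, counts), 1))
```
</preformat>
<p>Because of these simplifications, the value printed by the sketch is not expected to reproduce the ensemble statistics reported here. Its purpose is only to show how training-set size, taxon number, and tolerance enter such an experiment as explicit inputs.</p>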
      <p>We consider the training-set characteristics in order to investigate whether
they might explain the larger uncertainties of the Matthews-Bird
transfer function when compared with European chironomid transfer functions.
First, we increase the idealized-analogue training-set size from 59 lakes to
118 lakes, thereby better defining the taxon characteristics. This improves
the WAPLS1 RMSEP from 6.3 <inline-formula><mml:math id="M240" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> 0.6 % to 5.9 <inline-formula><mml:math id="M241" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> 0.4 %. Second, we
increase the taxon richness from 6.7 to 13.5 by doubling the number of
species to 118. This improves the WAPLS1 performance to 5.1 <inline-formula><mml:math id="M242" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> 0.5 %.
Although both factors are likely to contribute to explaining the reduced
uncertainty of the European transfer functions, in isolation they do not
appear sufficient. We consider a specific example. The Norwegian-chironomid
training set of Brooks et al. (2012) comprises 140 taxa sampled from 157
lakes, and was used to construct a WAPLS2 transfer function for July
temperature with RMSEP of 1.06 <inline-formula><mml:math id="M243" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C when four outliers are deleted
(Brooks et al., 2012). We note that the Matthews-Bird training set
reconstructs the mean annual temperature, not the July temperature. The Brooks and Birks training set
has an average richness of 24 and an indicative tolerance of 16 %. An
idealized dataset was built with these characteristics (159 lakes, 140 taxa,
<inline-formula><mml:math id="M244" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>55</mml:mn></mml:mrow></mml:math></inline-formula> %,  <inline-formula><mml:math id="M245" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>11</mml:mn></mml:mrow></mml:math></inline-formula> %, <inline-formula><mml:math id="M246" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>21</mml:mn></mml:mrow></mml:math></inline-formula> %, yielding an ensemble-averaged taxon richness of 22) and was found to exhibit a WAPLS1 RMSEP of
5.1 <inline-formula><mml:math id="M247" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> 0.3 % (<inline-formula><mml:math id="M248" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.64 <inline-formula><mml:math id="M249" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C). When expressed as a percentage of
the environmental gradient, modest improvements in performance are apparent,
consistent with that expected from increased training-set size and species
richness. However, most of the improvement in predictive power is due to
narrower tolerance when expressed in <italic>absolute</italic> terms. The Matthews-Bird
indicative tolerance is 3.2 <inline-formula><mml:math id="M250" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C. The Brooks and Birks indicative
tolerance is 2.1 <inline-formula><mml:math id="M251" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C. These factors together (increased
training-set size, increased taxon richness and, most significantly,
narrower <italic>absolute</italic> tolerances) appear sufficient to explain the
improved performances of the Brooks and Birks transfer function over the
Matthews-Bird transfer function.</p>
</sec>
<sec id="Ch1.S4.SS2">
  <title>SWAP (Stevenson et al., 1991)</title>
      <p>The SWAP training set (Stevenson et al., 1991) was developed as part of a
substantial scientific effort directed at understanding the impacts of acid
rain. Taxonomic workshops resolved problems of nomenclature, splitting and
amalgamation of species, and identification criteria (Munro et al., 1990).
Approximately 500 counts per sample were made. The training set was
statistically pruned (Birks et al., 1990) to leave a high-quality dataset of
267 taxa in 167 sites. The best WAPLS component has RMSEP of 0.310 pH units
(Holden et al., 2008).</p>
      <p>The best-performing BUMPER model for SWAP is the model that includes all
taxa, with no threshold abundance. This model has an RMSEP <inline-formula><mml:math id="M252" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.321 pH
units. The improvement in performance relative to the default BUMPER
(<inline-formula><mml:math id="M253" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo></mml:mrow></mml:math></inline-formula> 2 %) threshold (RMSEP <inline-formula><mml:math id="M254" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.369 pH units, Fig. 3c) may reflect the
high quality of the SWAP training set, so that even very low counts are
unlikely to be erroneous, and therefore add value to the reconstruction.
However, a weakness of the <inline-formula><mml:math id="M255" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo></mml:mrow></mml:math></inline-formula> 0 % model is that the posterior widths
substantially understate the uncertainty; the low-count taxa narrow the
posterior widths by more than is justified, given the modest improvement in
performance. Using the default model (<inline-formula><mml:math id="M256" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo></mml:mrow></mml:math></inline-formula> 2 %, <inline-formula><mml:math id="M257" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, 92 %
of cross-validated reconstructions lie within <inline-formula><mml:math id="M258" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="normal">Δ</mml:mi></mml:mrow></mml:math></inline-formula> of the measured
pH. We note that the performance of the subjectively tuned model in Holden
et al. (2008) was RMSEP <inline-formula><mml:math id="M259" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.328 pH units.</p>
      <p>The SWAP training set can be broadly classified as having high taxon richness
(50) and broad tolerance (22 %) and being well sampled (sampling density 37). It is
interesting that the RMSEP of this training set is similar to that of the
Matthews-Bird set, both exhibiting an RMSEP of approximately 10 % of the
environmental gradient. The improvements due to the high quality, taxon
richness, and sampling density of SWAP are offset by the loss of achievable
precision caused by its broader tolerances.</p>
      <p>We generate an artificial training set to mimic the characteristics of SWAP
(167 lakes, 225 taxa, <inline-formula><mml:math id="M260" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>40</mml:mn></mml:mrow></mml:math></inline-formula> %, <inline-formula><mml:math id="M261" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>17</mml:mn></mml:mrow></mml:math></inline-formula> %, <inline-formula><mml:math id="M262" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>27</mml:mn></mml:mrow></mml:math></inline-formula> %,
yielding a characteristic taxon richness of 48). The performance statistics
of a model derived from these data (5.1 <inline-formula><mml:math id="M263" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> 0.3 %) are again broadly
consistent with, though significantly better than, those of the model derived from
the real data (11.0 to 12.9 %). The difference between real and artificial
data is greater than in the chironomid comparison. Diatoms have been applied
as indicators of a wide range of environmental variables including pH,
climate, water chemistry, and eutrophication (Smol and Stoermer, 2015), and
it may be that the assumption of a single environmental driver of
diatom-assemblage variability is less well satisfied than it is with
chironomids.</p>
</sec>
<sec id="Ch1.S4.SS3">
  <title>NIMBIOS (Bush et al., 2017)</title>
      <p>The NIMBIOS dataset (Bush et al., 2017) comprises 682 samples that range
from soil samples to mud–water interface samples from lakes. The dataset is
of variable quality as it represents a <inline-formula><mml:math id="M264" display="inline"><mml:mo>∼</mml:mo></mml:math></inline-formula> 30-year data
acquisition effort, in which time there have been substantial improvements in
the ability to accurately identify pollen because new pollen keys and
descriptions have become available (e.g. Roubik and Moreno, 1991; Bush and
Weng, 2007).</p>
      <p>The best BUMPER model for NIMBIOS is the full model including all taxa
(<inline-formula><mml:math id="M265" display="inline"><mml:mrow><mml:mi mathvariant="italic">η</mml:mi><mml:mo>=</mml:mo><mml:mn>0.5</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M266" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> %), exhibiting an RMSEP <inline-formula><mml:math id="M267" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula>
2.51 <inline-formula><mml:math id="M268" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C. This represents a significant improvement over the best
WAPLS model (2.92 <inline-formula><mml:math id="M269" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C, WAPLS component 2). As with SWAP, however, a
weakness of this model is that the posteriors substantially understate the
uncertainty. The default model also performs well (RMSEP <inline-formula><mml:math id="M270" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 2.88 <inline-formula><mml:math id="M271" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C) and exhibits posteriors that accurately reflect the uncertainty (Fig. 3e),
with 93 % of cross-validated reconstructions lying within <inline-formula><mml:math id="M272" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula>2<inline-formula><mml:math id="M273" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>
of the observed temperature.</p>
      <p>The NIMBIOS training set can be classified as having high taxon richness
(27) and low tolerance (12 %) and being well sampled (sampling density 33). The
training set benefits from optimal characteristics in all three respects. We
generate an artificial analogue of NIMBIOS (682 samples, 553 taxa, <inline-formula><mml:math id="M274" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mi mathvariant="normal">p</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>15</mml:mn></mml:mrow></mml:math></inline-formula> %,  <inline-formula><mml:math id="M275" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">7</mml:mn></mml:mrow></mml:math></inline-formula> %, <inline-formula><mml:math id="M276" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>17</mml:mn></mml:mrow></mml:math></inline-formula> %, yielding a taxon richness of
30). The BUMPER model built from this analogue exhibits an RMSEP of
3.8 <inline-formula><mml:math id="M277" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> 0.1 %. This is significantly better than the artificial
analogues of the SWAP and Matthews-Bird sets, as would be expected given the
optimal characteristics of the training set. However, the performance
statistics in this case are substantially better than those of the model
built from the real data (RMSEP <inline-formula><mml:math id="M278" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 10.1 to 12.2 %); the transfer
function built from real data does not demonstrate the high performance that
might be expected. The discrepancy between the idealized and real data
suggests that the quality of the NIMBIOS data is poorer than that of the
other training sets. A major reason for the apparent inaccuracy may be that,
unlike the other datasets, pollen is often not recognized to species
level<fn id="Ch1.Footn3"><p>We note that chironomids are identified to species
morphotype level, or genus in some cases.</p></fn> and so a pollen type could
contain numerous species that have different environmental preferences, and
hence may well have a multi-modal distribution rather than the unimodal
distribution assumed by BUMPER.</p>
</sec>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <title>Summary and conclusions</title>
      <p>We have developed BUMPER, a user-friendly implementation of a Bayesian
transfer function for palaeo-environmental reconstruction. BUMPER is fully
self-tuning, applying a precalibration approach to eliminate implausible
values for five model parameters before performing the full probabilistic
calibration. The model is straightforward to use, and the only requirement
for application to a new training set is to construct the input data files
(as described in Appendix A).</p>
      <p>We have applied BUMPER to generate transfer functions for chironomid-based
temperature (Matthews-Bird et al., 2016), diatom-based pH (Stevenson et al.,
1991), and pollen-based temperature (Bush et al., 2017). In each case,
cross-validated performance statistics are comparable to the best WAPLS
component. The motivation for the model is that it generates a
reconstruction-specific uncertainty that, for instance, renders it well
suited to multi-proxy reconstructions.</p>
      <p>Although the model calibration is performed using all observations,
including taxon absences, for reconstruction purposes we only consider taxa
found at abundances greater than 2 %. When taxa with very low counts are
included in a reconstruction, the additional narrowing of the posterior is
found to exceed the reduction in reconstruction error (with the
Matthews-Bird training set it even leads to slightly increased error),
suggesting that the information provided by very low counts is less
reliable. We retain the 2 % threshold as we regard the reliable
quantification of uncertainty to be more important than the modest
improvements in prediction accuracy that can be achieved from including very
low taxon counts.</p>
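      <p>The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name taxa_for_reconstruction and the dictionary layout are hypothetical and are not taken from the BUMPER source, and the fragment shows the 2 % selection step alone, not the Bayesian reconstruction.</p>
<preformat><![CDATA[
```python
# Illustrative only: names and data layout are assumptions, not BUMPER code.
THRESHOLD_PERCENT = 2.0  # abundance threshold quoted in the text


def taxa_for_reconstruction(sample, threshold=THRESHOLD_PERCENT):
    """Return the taxa whose percentage abundance exceeds the threshold.

    `sample` maps a taxon name to its percentage abundance in one sample.
    Taxa at or below the threshold are left out of the reconstruction.
    """
    return {taxon: pct for taxon, pct in sample.items() if pct > threshold}


if __name__ == "__main__":
    # Hypothetical percentages for a single core sample.
    sample = {"taxon_a": 45.0, "taxon_b": 1.5, "taxon_c": 2.0, "taxon_d": 10.5}
    print(sorted(taxa_for_reconstruction(sample)))
```
]]></preformat>
      <p>In this sketch a taxon at exactly 2 % is excluded, following the wording "greater than 2 %"; how the model itself treats that boundary case is not stated here.</p>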
      <p>The performance statistics of the three datasets are quite similar, with
each displaying a cross-validated error of prediction that is approximately
10 % of the environmental gradient. However, artificial training sets with
similar characteristics to the real data revealed that this similarity is
largely coincidental and arises for different reasons. The chironomid data
are hampered by taxon-poor assemblages, but benefit from narrow-tolerance
taxa. Conversely, the diatom data are composed of species with broader
tolerances (expressed as percentage of environmental gradient) but the
assemblages are substantially richer and taxa are well characterized by the
large dataset. The pollen data are large, taxon-rich, and characterized by
narrow tolerances. The pollen data were expected to significantly outperform
the other training sets; however, this is not found to be the case. The
likely explanation is that pollen, unlike the other organism types
considered, can be difficult to identify to a low taxonomic (e.g. species)
level.
<?xmltex \hack{\newpage}?></p>
</sec>
<sec id="Ch1.S6">
  <title>Code availability</title>
      <p>The source code BUMPER1p0.F90, example input data files (Matthews-Bird et al.,
2016), and an output visualizer BUMPERpp.R are all provided in the Supplement. Instructions are provided in the Appendices.</p><?xmltex \hack{\clearpage}?>
</sec>

      
      </body>
    <back><app-group>

<app id="App1.Ch1.S1">
  <title>USER MANUAL</title>
      <p>The authors are happy to provide whatever assistance may be needed to run
BUMPER. The model is written in FORTRAN 90, included in the Supplement
as BUMPER1P0.F90. The open-source GNU Fortran compiler is freely
available for MacOS, Linux, and Windows platforms. Once a compiler has been
installed, the model is compiled to create an executable BUMPER with the
following (Linux) command.<?xmltex \hack{\newline}?></p>
      <p>$ gfortran BUMPER1P0.f90 -o BUMPER<?xmltex \hack{\newline}?></p>
      <p>Before running BUMPER, four input data files must be produced and saved in
the same directory as the executable file. These files are NAME.lakes,
NAME.species, NAME.ts.counts and NAME.core.counts, where “NAME” should be
set to a string that identifies the dataset. The Matthews-Bird data have
been uploaded to the Supplement as an example, “MB16”.</p>
      <p>MB16.lakes: the first line of this file is the number of training-set lakes
and is followed by a list of site names together with the environmental
variable. All string inputs are limited to 20 characters. Do not include
spaces (use an underscore) in character strings. Avoid characters with
functionality in R, such as “/”, as these will upset the BUMPERpp plotting
software (Appendix B).</p>
      <p>MB16.species: the first line is the number of species in the training set,
and this is followed by a list of species names.</p>
      <p>MB16.ts.counts: a file providing the percentage counts of all species in all
training-set sites. The file consists of a row for each lake, each row
containing a list of the species counts. It is essential that the rows
are in the same order as the sites listed in NAME.lakes and that the columns
are in the same order as the species listed in NAME.species.</p>
      <p>MB16.core.counts has the same format as MB16.ts.counts, except the first
line is the number of core samples and the first column of the subsequent
data is the name of the sample (e.g. depth or age).</p>
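      <p>The file layouts described above could be produced with a short script such as the following Python sketch. It is not part of the BUMPER distribution: the function names are hypothetical, and the whitespace-separated layout with one name per line is an assumption that should be checked against the MB16 example files in the Supplement. The name checks implement only the rules stated above (at most 20 characters, no spaces, no “/”).</p>
<preformat><![CDATA[
```python
# Illustrative sketch only. The whitespace-separated layout is an assumption;
# compare with the MB16 example files in the Supplement before relying on it.
from pathlib import Path

MAX_NAME_LENGTH = 20  # all string inputs are limited to 20 characters


def check_name(name):
    """Return `name` unchanged, or raise ValueError if it breaks the stated rules."""
    if not name or len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"name must have 1 to {MAX_NAME_LENGTH} characters: {name!r}")
    if any(ch.isspace() for ch in name):
        raise ValueError(f"replace spaces with underscores: {name!r}")
    if "/" in name:
        raise ValueError(f"'/' may upset the R plotting software: {name!r}")
    return name


def _row(values):
    """Join one row of counts with single spaces."""
    return " ".join(str(v) for v in values)


def format_lakes(lakes):
    """NAME.lakes: number of lakes, then one 'site value' line per lake."""
    lines = [str(len(lakes))]
    lines += [f"{check_name(site)} {value}" for site, value in lakes]
    return "\n".join(lines) + "\n"


def format_species(species):
    """NAME.species: number of species, then one species name per line."""
    lines = [str(len(species))] + [check_name(s) for s in species]
    return "\n".join(lines) + "\n"


def format_ts_counts(lakes, species, counts):
    """NAME.ts.counts: one row per lake (order of `lakes`), one column per species."""
    if len(counts) != len(lakes):
        raise ValueError("need exactly one row of counts per lake")
    if any(len(row) != len(species) for row in counts):
        raise ValueError("each row needs exactly one count per species")
    return "\n".join(_row(row) for row in counts) + "\n"


def format_core_counts(species, samples):
    """NAME.core.counts: number of samples, then 'sample_name counts...' rows."""
    if any(len(row) != len(species) for _, row in samples):
        raise ValueError("each row needs exactly one count per species")
    lines = [str(len(samples))]
    lines += [f"{check_name(name)} {_row(row)}" for name, row in samples]
    return "\n".join(lines) + "\n"


def write_inputs(directory, dataset, lakes, species, counts, samples):
    """Write the four input files for `dataset` into `directory`."""
    out = Path(directory)
    (out / f"{dataset}.lakes").write_text(format_lakes(lakes))
    (out / f"{dataset}.species").write_text(format_species(species))
    (out / f"{dataset}.ts.counts").write_text(format_ts_counts(lakes, species, counts))
    (out / f"{dataset}.core.counts").write_text(format_core_counts(species, samples))
```
]]></preformat>
      <p>Building the training-set rows from the same lakes and species lists that are written to NAME.lakes and NAME.species is one way to guard against the row and column ordering errors warned about above.</p>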
      <p>The model can now be run using the following command.<?xmltex \hack{\newline}?></p>
      <p>$ ./BUMPER<?xmltex \hack{\newline}?></p>
      <p>The first thing that the model will ask for is the name of the
training set, in our example MB16. You will then be asked a series of
questions:
<list list-type="order"><list-item>
      <p>whether to calculate the apparent RMSE (i.e. the fitted model error);</p></list-item><list-item>
      <p>whether to calculate the jackknifed (leave-one-out) RMSEP;</p></list-item><list-item>
      <p>whether to reconstruct the core;</p></list-item><list-item>
      <p>whether to build a presence–absence model (0<inline-formula><mml:math id="M279" display="inline"><mml:mrow><mml:mo>=</mml:mo><mml:mo>=</mml:mo></mml:mrow></mml:math></inline-formula>NO<inline-formula><mml:math id="M280" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula>preferred
default);</p></list-item><list-item>
      <p>whether to apply the 2 % abundance threshold (1<inline-formula><mml:math id="M281" display="inline"><mml:mrow><mml:mo>=</mml:mo><mml:mo>=</mml:mo></mml:mrow></mml:math></inline-formula>YES<inline-formula><mml:math id="M282" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula>preferred
default).</p></list-item></list>
During the input phase, the model will provide the sampling density, mean
richness, and indicative tolerance as outputs to the screen. After the input
phase, the requested calculations will be performed and various summary
outputs will be provided to the screen as the calculations progress. Output
files will be created, depending upon the options chosen. These comprise the
following:
<list list-type="order"><list-item>
      <p>output.data: summary performance statistics, together with individual
site reconstructed values and uncertainties;</p></list-item><list-item>
      <p>ts_likelihood_app.data: likelihood
functions for all species and all training-set sites from the fitted model;</p></list-item><list-item>
      <p>ts_likelihood_jack.data: jackknifed
likelihood functions for all species and all training-set sites (i.e.
cross-validated likelihood functions);</p></list-item><list-item>
      <p>core_likelihood.data: likelihood functions for all species
and all core samples;</p></list-item><list-item>
      <p>reconstruct_ts_app.data: the posteriors of
the fitted model;</p></list-item><list-item>
      <p>reconstruct_ts_jack.data: the
(cross-validated) posteriors of the jackknifed models;</p></list-item><list-item>
      <p>reconstruct_core.data: the core sample posteriors.</p></list-item></list>
<?xmltex \hack{\newpage}?></p>
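      <p>The difference between the apparent RMSE (option 1) and the jackknifed RMSEP (option 2) can be illustrated with a short Python sketch. It is not the BUMPER algorithm: the predictor fit_mean is a hypothetical stand-in that ignores the counts and returns the training-set mean, chosen only so that the leave-one-out loop is easy to follow. All function names are assumptions made for this illustration.</p>
<preformat><![CDATA[
```python
# Illustrative sketch only: the mean-value predictor is a hypothetical
# stand-in and is NOT the BUMPER transfer function.
import math


def rmse(predicted, observed):
    """Root-mean-square difference between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def fit_mean(train_counts, train_env):
    """Stand-in model: ignores the counts and always predicts the training mean."""
    mean = sum(train_env) / len(train_env)
    return lambda counts: mean


def apparent_rmse(fit, counts, env):
    """Fit once on all sites, then predict those same sites."""
    model = fit(counts, env)
    return rmse([model(c) for c in counts], env)


def jackknifed_rmsep(fit, counts, env):
    """Leave each site out in turn, refit, and predict the omitted site."""
    predictions = []
    for i in range(len(env)):
        model = fit(counts[:i] + counts[i + 1:], env[:i] + env[i + 1:])
        predictions.append(model(counts[i]))
    return rmse(predictions, env)


if __name__ == "__main__":
    env = [1.0, 2.0, 3.0, 4.0, 5.0]       # hypothetical environmental values
    counts = [[100.0] for _ in env]       # placeholder counts, unused by fit_mean
    print("apparent RMSE   :", apparent_rmse(fit_mean, counts, env))
    print("jackknifed RMSEP:", jackknifed_rmsep(fit_mean, counts, env))
```
]]></preformat>
      <p>For this toy example the jackknifed RMSEP is larger than the apparent RMSE, because each site is predicted by a model that has not seen it.</p>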
</app>

<app id="App1.Ch1.S2">
  <title>Data visualization with BUMPERpp</title>
      <p>The self-documented tool BUMPERpp.R is a post-processing graphical tool
written in R, included in the Supplement. The code reads the
likelihood data files and generates plots of the reconstructions.<?xmltex \hack{\newline}?></p>
      <p><?xmltex \hack{\noindent}?>Load the model in R with the following command.<?xmltex \hack{\newline}?></p>
      <p><inline-formula><mml:math id="M283" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> source("BUMPERpp.R")<?xmltex \hack{\newline}?></p>
      <p><?xmltex \hack{\noindent}?>Generate a plot with the following (illustrative) command. <?xmltex \hack{\newline}?></p>
      <p><inline-formula><mml:math id="M284" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> BUMPERpp("ts_likelihood_jack.data",1)<?xmltex \hack{\newline}?></p>
      <p><?xmltex \hack{\noindent}?>This specifies the input file and sample number to be plotted. Figure B1
provides an illustrative plot. <?xmltex \hack{\newpage}?></p>

      <?xmltex \floatpos{th!}?><fig id="App1.Ch1.F1"><caption><p>Illustrative output of the BUMPERpp.R reconstruction visualization
software.</p></caption>
        <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/10/483/2017/gmd-10-483-2017-f04.png"/>

      </fig>

<?xmltex \hack{\newpage}?><?xmltex \hack{\clearpage}?><supplementary-material position="anchor"><p><bold>The Supplement related to this article is available online at <inline-supplementary-material xlink:href="http://dx.doi.org/10.5194/gmd-10-483-2017-supplement" xlink:title="zip">doi:10.5194/gmd-10-483-2017-supplement</inline-supplementary-material>.</bold></p></supplementary-material>
</app>
  </app-group><notes notes-type="competinginterests">

      <p>The authors declare that they have no conflict of interest.</p>
  </notes><ack><title>Acknowledgements</title><p>We are grateful for the careful reviews of Cajo ter Braak and Andrew
Parnell, which greatly helped to clarify the mathematical description of the
model. Philip B. Holden was funded through the EPSRC project “Research on Changes of
Variability and Environmental Risk” (ReCoVER), award RFFER003. Mark B. Bush, Grace M. Hwang,
Bryan G. Valencia, and Robert van Woesik were funded through the Climate Proxies Working Group at the
National Institute for Mathematical and Biological Synthesis, sponsored by
the National Science Foundation through NSF Award no. DBI-1300426, with
additional support from the University of Tennessee, Knoxville. Frazer Matthews-Bird was
funded by the National Aeronautics and Space Administration (NASA)
NNX14AD31G. H. John B. Birks was supported by the Research Council of Norway (projects
NoAClim and IGNEX).<?xmltex \hack{\newline}?><?xmltex \hack{\newline}?>Edited by: C. Sierra
<?xmltex \hack{\newline}?>
Reviewed by: A. Parnell, C. ter Braak, and one anonymous referee</p></ack><ref-list>
    <title>References</title>

      <ref id="bib1.bib1"><label>1</label><mixed-citation>
Birks, H. J. B. and Seppä, H.: Pollen-based reconstructions of
late-Quaternary climate in Europe – progress, problems and pitfalls, Acta
Palaeobot., 44, 317–334, 2004.</mixed-citation></ref>
      <ref id="bib1.bib2"><label>2</label><mixed-citation>Birks, H. J. B., Line, J. M., Juggins, S., Stevenson, A. C., and ter Braak,
C. J. F.: Diatoms and pH reconstruction, Philos. T. Roy. Soc. B, 327,
263–278, 1990.</mixed-citation></ref>
      <ref id="bib1.bib3"><label>3</label><mixed-citation>Birks, H. J. B., Heiri, O., Seppä, H., and Bjune, A. E.: Strengths and
weaknesses of quantitative climate reconstructions based on late-Quaternary
biological proxies, Open Ecol. J., 3, 68–110, 2010.</mixed-citation></ref>
      <ref id="bib1.bib4"><label>4</label><mixed-citation>Brooks, S. J., Matthews, I. P., Birks, H. H., and Birks, H. J. B.: High resolution
Lateglacial and early-Holocene summer air temperature records from Scotland
inferred from chironomid assemblages, Quaternary Sci. Rev., 41, 67–82,
2012.</mixed-citation></ref>
      <ref id="bib1.bib5"><label>5</label><mixed-citation>Bush, M. B. and Weng, C.: Introducing a new (freeware) tool for palynology,
J. Biogeogr., 34, 377–380, 2007.</mixed-citation></ref>
      <ref id="bib1.bib6"><label>6</label><mixed-citation>Bush, M. B., et al.: in preparation, 2017.</mixed-citation></ref>
      <ref id="bib1.bib7"><label>7</label><mixed-citation>Cahill, N., Kemp, A. C., Horton, B. P., and Parnell, A. C.: A Bayesian hierarchical model for reconstructing
relative sea level: from raw data to rates of change, Clim. Past, 12, 525–542, <ext-link xlink:href="http://dx.doi.org/10.5194/cp-12-525-2016" ext-link-type="DOI">10.5194/cp-12-525-2016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib8"><label>8</label><mixed-citation>Edwards, N. R., Cameron, D., and Rougier, J.: Precalibrating an intermediate
complexity climate model, Clim. Dynam., 37, 1469–1482,
<ext-link xlink:href="http://dx.doi.org/10.1007/s00382-010-0921-0" ext-link-type="DOI">10.1007/s00382-010-0921-0</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib9"><label>9</label><mixed-citation>Haslett, J., Whiley, M., Bhattacharya, S., Salter-Townshend, M., Wilson,
S. P., Allen, J. R. M., Huntley, B., and Mitchell, F. J. G.: Bayesian paleoclimate
reconstruction, J. Roy. Stat. Soc. A, 169, 395–438, 2006.</mixed-citation></ref>
      <ref id="bib1.bib10"><label>10</label><mixed-citation>Heiri, O. and Lotter, A. F.: Effect of low count sums on quantitative
environmental reconstructions: an example using subfossil chironomids, J.
Paleolimnol., 26, 343–350, 2001.</mixed-citation></ref>
      <ref id="bib1.bib11"><label>11</label><mixed-citation>Hill, M. O. and Gauch, H. G.: Detrended correspondence analysis: an improved
ordination technique, Vegetatio, 42, 47–58, 1980.</mixed-citation></ref>
      <ref id="bib1.bib12"><label>12</label><mixed-citation>Holden, P. B., Mackay, A. W., and Simpson, G. L.: A Bayesian paleoenvironmental
transfer function model for acidified lakes, J. Paleolimnol., 39, 551–566,
<ext-link xlink:href="http://dx.doi.org/10.1007/s10933-007-9129-7" ext-link-type="DOI">10.1007/s10933-007-9129-7</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bib13"><label>13</label><mixed-citation>Huntley, B.: The use of climate response surfaces to reconstruct
paleoclimate from Quaternary pollen and plant macrofossil data, Philos.
T. Roy. Soc. B, 341, 215–223, 1993.</mixed-citation></ref>
      <ref id="bib1.bib14"><label>14</label><mixed-citation>Huntley, B.: Reconstructing palaeoclimates from biological proxies: Some
often overlooked sources of uncertainty, Quaternary Sci. Rev., 31, 1–16, 2012.</mixed-citation></ref>
      <ref id="bib1.bib15"><label>15</label><mixed-citation>Ilvonen, L., Holmström, L., Seppä, H., and Veski, S.: A Bayesian
multinomial regression model for palaeoclimate reconstruction with time
uncertainty, Environmetrics, 27, 409–422, <ext-link xlink:href="http://dx.doi.org/10.1002/env.2393" ext-link-type="DOI">10.1002/env.2393</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib16"><label>16</label><mixed-citation>Imbrie, J. and Kipp, N. G.: A new micropaleontological method for
quantitative paleoclimatology: application to a Late Pleistocene Caribbean
core, in:  The Late Cenozoic Glacial Ages, edited by: Turekian, K. K., Yale
University Press, New Haven,  77–181, 1971.</mixed-citation></ref>
      <ref id="bib1.bib17"><label>17</label><mixed-citation>Juggins, S.: Quantitative reconstructions in palaeolimnology: new paradigm
or sick science?, Quaternary Sci. Rev., 64, 20–32,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.quascirev.2012.12.014" ext-link-type="DOI">10.1016/j.quascirev.2012.12.014</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib18"><label>18</label><mixed-citation>Juggins, S. and Birks, H. J. B.: Quantitative environmental reconstructions
from biological data,  in: Tracking Environmental Change Using Lake
Sediments, Volume 5: Data Handling and Numerical Techniques, edited by: Birks, H. J. B., Lotter, A. F., Juggins, S., and
Smol, J. P., Springer,
Dordrecht, 431–494, 2012.</mixed-citation></ref>
      <ref id="bib1.bib19"><label>19</label><mixed-citation>Korhola, A., Vasko, K., Toivonen, H. T. T., and Olander, H.: Holocene
temperature changes in northern Fennoscandia reconstructed from chironomids
using Bayesian modeling, Quaternary Sci. Rev., 21, 1841–1860, 2002.</mixed-citation></ref>
      <ref id="bib1.bib20"><label>20</label><mixed-citation>Matthews-Bird, F., Brooks, S. J., Holden, P. B., Montoya, E., and Gosling, W. D.: Inferring late-Holocene climate in the Ecuadorian Andes
using a chironomid-based temperature inference model, Clim. Past, 12, 1263–1280, <ext-link xlink:href="http://dx.doi.org/10.5194/cp-12-1263-2016" ext-link-type="DOI">10.5194/cp-12-1263-2016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib21"><label>21</label><mixed-citation>Munro, M. A. R., Kreiser, A. M., Battarbee, R. W., Juggins, S., Stevenson, A. C.,
Anderson, D. S., Anderson, N. J., Berge, F., Birks, H. J. B., Davis, R. B.,
Flower,
R. J., Fritz, S. C., Haworth, E. Y., Jones, V. J., Kingston, J. C., and Renberg, I.:
Diatom quality control and data handling, Philos. T. Roy. Soc. B,
327, 257–261, 1990.</mixed-citation></ref>
      <ref id="bib1.bib22"><label>22</label><mixed-citation>Parnell, A. C., Sweeney, J., Doan, T. K., Salter-Townshend, M., Allen, J. R. M.,
Huntley, B., and Haslett, J.: Bayesian inference for palaeoclimate with time
uncertainty and stochastic volatility, J. Roy. Stat. Soc. C-App., 64, 115–138, 2015.</mixed-citation></ref>
      <ref id="bib1.bib23"><label>23</label><mixed-citation>Parnell, A. C., Haslett, J., Sweeney, J., Doan, T. K., Allen, J. R. M., and
Huntley, B.: Joint palaeoclimate reconstruction from pollen data via forward
models and climate histories, Quaternary Sci. Rev., 151, 111–126, <ext-link xlink:href="http://dx.doi.org/10.1016/j.quascirev.2016.09.007" ext-link-type="DOI">10.1016/j.quascirev.2016.09.007</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib24"><label>24</label><mixed-citation>Roubik, D. W. and Moreno, P. J. E.: Pollen and spores of Barro Colorado Island, Monographs in Systematic Botany from the Missouri Botanical Garden, St. Louis, Missouri, 1991.</mixed-citation></ref>
      <ref id="bib1.bib25"><label>25</label><mixed-citation>Rymer, L.: The use of uniformitarianism and analogy in palaeoecology,
particularly pollen analysis, in:  Biology and
Quaternary environments, edited by:  Walker, D. and  Guppy, J. C.,  Australian Academy of Sciences, Canberra,
245–258, 1978.</mixed-citation></ref>
      <ref id="bib1.bib26"><label>26</label><mixed-citation>Smol, J. P. and Stoermer, E. F.: The diatoms: applications for the
environmental and Earth sciences, Cambridge University Press, 686 pp.,  2015.</mixed-citation></ref>
      <ref id="bib1.bib27"><label>27</label><mixed-citation>Stevenson, A. C., Juggins, S., Birks, H. J. B., Anderson, D. S., Anderson, N. J.,
Battarbee, R. W., Berge, F., Davis, R. B., Flower, R. J., Haworth, E. Y., Jones,
V. J., Kingston, J. C., Kreiser, A. M., Line, J. M., Munro, M. A. R., and Renberg, I.:
The Surface Waters Acidification Project Palaeolimnology Programme: modern
diatom/lake-water chemistry set, ENSIS, London, 86 pp., 1991.</mixed-citation></ref>
      <ref id="bib1.bib28"><label>28</label><mixed-citation>Telford, R. J. and Birks, H. J. B.: The secret assumption of transfer
functions: problems with spatial autocorrelation in evaluating model
performance, Quaternary Sci. Rev., 24, 2173–2179,
<ext-link xlink:href="http://dx.doi.org/10.1016/j.quascirev.2005.05.001" ext-link-type="DOI">10.1016/j.quascirev.2005.05.001</ext-link>, 2005.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bib29"><label>29</label><mixed-citation>ter Braak, C. J. F. and Barendregt, L. G.: Weighted averaging of species
indicator values: its efficiency in environmental calibration, Math.
Biosci., 78, 57–72, 1986.</mixed-citation></ref>
      <ref id="bib1.bib30"><label>30</label><mixed-citation>ter Braak, C. J. F. and Juggins, S.: Weighted averaging partial least squares
regression (WA-PLS): an improved method for reconstructing environmental
variables from species assemblages, Hydrobiologia, 269/270, 485–502, 1993.</mixed-citation></ref>
      <ref id="bib1.bib31"><label>31</label><mixed-citation>van Woesik, R.: Quantifying uncertainty and resilience on coral reefs using
a Bayesian approach, Environ. Res. Lett., 8, 044051, <ext-link xlink:href="http://dx.doi.org/10.1088/1748-9326/8/4/044051" ext-link-type="DOI">10.1088/1748-9326/8/4/044051</ext-link>,
2013.</mixed-citation></ref>
      <ref id="bib1.bib32"><label>32</label><mixed-citation>Vasko, K., Toivonen, H. T. T., and Korhola, A.: A Bayesian multinomial response
model for organism-based environmental reconstruction, J. Paleolimnol., 24,
243–250, 2000.</mixed-citation></ref>

  </ref-list><app-group content-type="float"><app><title/>

    </app></app-group></back>
    </article>
