PyCO2SYS v1.7: marine carbonate system calculations in Python

Oceanic dissolved inorganic carbon (TC) is the largest pool of carbon that interacts considerably with the atmosphere on human timescales. Oceanic TC is increasing through uptake of anthropogenic carbon dioxide (CO2), and seawater pH is decreasing as a consequence. Both the exchange of CO2 between ocean and atmosphere and the pH response are governed by a set of parameters that interact through chemical equilibria, collectively known as the marine carbonate system. To investigate these processes, at least two of the marine carbonate system’s parameters are typically measured — most commonly, two from 5 TC, total alkalinity (AT), pH, and seawater CO2 fugacity (fCO2 ; or its partial pressure, pCO2 , or its dry-air mole fraction, xCO2 ) — from which the remaining parameters can be calculated and the equilibrium state of seawater solved. Several software tools exist to carry out these calculations, but no fully functional and rigorously validated tool was previously available for Python, a popular scientific programming language. Here, we present PyCO2SYS, a Python package intended to fill this capability gap. We describe the elements of PyCO2SYS that have been inherited from the existing CO2SYS family of software and explain 10 subsequent adjustments and improvements. For example, PyCO2SYS uses automatic differentiation to solve the marine carbonate system and calculate chemical buffer factors, ensuring that the effect of every solute and reaction is accurately included in all its results. We validate PyCO2SYS with internal consistency tests and comparisons against other software, showing that PyCO2SYS produces results that are either virtually identical or different for known reasons, with the differences negligible for all practical purposes. We discuss new insights that arose during the development process, for example that the marine car15 bonate system cannot be unambiguously solved from the total alkalinity and carbonate ion parameter pair. Finally, we consider potential future developments to PyCO2SYS and discuss the outlook for this and other software for solving the marine carbonate system. The code for PyCO2SYS is distributed via GitHub (https://github.com/mvdh7/PyCO2SYS) under the GNU General Public License v3, archived on Zenodo (Humphreys et al., 2021), and documented online (https://PyCO2SYS.readthedocs.io). 1 20 https://doi.org/10.5194/gmd-2021-159 Preprint. Discussion started: 8 June 2021 c © Author(s) 2021. CC BY 4.0 License.


Introduction
The ocean absorbs about a quarter of the anthropogenic carbon dioxide (CO 2 ) currently being emitted (Friedlingstein et al., 2020). This absorption is a double-edged sword. Removing CO 2 from the atmosphere reduces the impact of our emissions on Earth's climate. However, CO 2 uptake causes seawater pH and calcium carbonate mineral saturation states (Ω) to decline through a process termed ocean acidification, which may have adverse effects on some marine species and ecosystems (Doney 25 et al., 2009).
Exchange of CO 2 between the atmosphere and ocean, and the biogeochemical consequences of this process, are governed by a series of equilibrium chemical reactions and parameters collectively known as the marine carbonate system (Millero, 2000).
The core parameters are: the substance contents of aqueous CO 2 , the bicarbonate and carbonate ions formed by its hydration and dissociation (HCO − 3 and CO 2− 3 ), and the sum of these three components (dissolved inorganic carbon, T C ); total alkalinity 30 (A T ; Dickson, 1981); the fugacity, partial pressure, or dry-air mole fraction of CO 2 in seawater (f CO2 , p CO2 , or x CO2 ; Weiss, 1974); and pH (Dickson et al., 2015). If any valid pair of these parameters is known, plus metadata including temperature, pressure, salinity and nutrient contents, then all the other parameters can be calculated (Park, 1969;Zeebe and Wolf-Gladrow, 2001).
Many research questions require solving the marine carbonate system from some measured or modelled pair of its pa-35 rameters. Several software tools have been developed for this purpose, such that most scientific software environments and programming languages have a widely accepted marine carbonate system solver . However, there is not yet an established and fully functional tool for the popular scientific programming language Python, although partial solutions exist (e.g. Branson, 2018). Here, we present PyCO2SYS, a Python package designed to fill this capability gap and provide a robust platform for future developments in calculating marine chemical speciation. Being free and open source, and working across 40 all major operating systems, a Python package is a highly accessible, desirable and useful tool.
As its name suggests, PyCO2SYS originates from the existing CO2SYS family of software. The original CO2SYS program for MS-DOS (Lewis and Wallace, 1998) has been further developed and 'translated', with implementations now available for Microsoft Excel (Pierrot et al., 2006;Orr et al., 2018;Pierrot et al., 2021) and MATLAB/GNU Octave (van Heuven et al., 2011;Xu et al., 2017;Orr et al., 2018;Sharp and Byrne, 2019;. PyCO2SYS was created as an as-close-as-possible 45 translation of CO2SYS-MATLAB v2.0.5 (Orr et al., 2018), but we have since made several additional developments to it.
Many of these developments involved reshaping the internal code into a more Pythonic style. These changes did not affect the calculations and so are not discussed further. Other developments added new functionality or made minor differences to the calculated results; these are documented and justified here.
As the original CO2SYS software is so well-established in the research field, we provide a relatively brief summary of the 50 components of PyCO2SYS that are identical to CO2SYS-MATLAB in Sect. 2, before describing the areas where PyCO2SYS differs in more detail in Sect. 3. Equations that were inherited from CO2SYS-MATLAB or taken from the literature are reported in appendices rather than being reproduced in these sections. We go on to validate PyCO2SYS in Sect. 4 by examining its internal consistency and by comparing its calculations with another CO2SYS implementation. In Sect. 5, we discuss some new Other internal settings are consistent across all cases (Table 3). These three cases have been present since the original 85 CO2SYS for MS-DOS (Lewis and Wallace, 1998). That program included only options 1-8 for the carbonic acid dissociation constants (Table 1), the others being published subsequent to its release. All subsequently added carbonic acid options follow the Standard case. While it is beyond the scope of this manuscript to judge the relative merits of the different options, in general we recommend that one of the Standard cases be used unless there is a specific reason for doing otherwise.
In addition to the carbonic acid equilibria, the user has multiple parameterisation options for each of: (i) the ratio between 90 total borate and salinity, (ii) the bisulfate dissociation constant K * SO4 , and (iii) the hydrogen fluoride dissociation constant K * HF (Tables 2 and 3). However, note that for (i), the user's choice is not respected in the GEOSECS cases, and neither (ii) nor (iii) are included at all in the Freshwater case (Table 2). It should also be noted that choices (ii) and (iii) affect pH scale conversions, including of equilibrium constants, which can have a small (but practically negligible) effect on the results.  Dickson and Millero (1987) a Standard 4 Dickson and Millero (1987) b Standard 5 Dickson and Millero (1987) Hansson (1973a, b) data. b Refit of Mehrbach et al. (1973) data. c Refit of Hansson (1973a, b) and Mehrbach et al. (1973) data. d Constants for zero-salinity freshwater. e Including the corrections of Waters et al. (2014). Equilibrium constants in PyCO2SYS are all stoichiometric rather than thermodynamic and thus denoted with K * . This 95 means that they represent the equilibrium balance of solute substance contents, not of their chemical activities. They are evaluated as follows: 1. Calculated on the pH scale reported in the literature, as a function of temperature and salinity, at zero in-water pressure; 2. Converted to the Seawater pH scale (Appendix A); H + activity Takahashi et al. (1982), except for GEOSECS-Peng, which uses Peng et al. (1987) Vapour pressure factor Weiss and Price (1980) CO2 solubility (K * 0 ) Weiss (1974) a As selected by the user. b Including the corrections of Waters et al. (2014). c This option was written into the code for CO2SYS-MATLAB v2.0.5 and other versions, but commented out and therefore not directly usable. It is available in CO2SYS-MATLAB v3.2.0.
3. Corrected to the in situ pressure; There are some exceptions to the evaluation steps listed above. First, the pH scale conversions (steps 2 and 4) are not applied to K * SO4 , K * HF , K * sp (calcite), K * sp (aragonite), or K * 0 . For K * SO4 and K * HF , this is because these constants always remain on the Free pH scale. The other constants in this list are for equilibria that do not directly involve H + and are therefore independent of the pH scale. Second, no pressure correction (step 3) is applied to the CO 2 solubility factor K * 0 (Weiss, 1974). This value, 105 and calculations of f CO2 , p CO2 and x CO2 , are thus valid only for the surface ocean (Sect. 5.3).

Input and output conditions
A useful feature of all CO2SYS software that nonetheless can cause confusion is calculations at 'input' and 'output' conditions, where 'conditions' refers to temperature and pressure. There is an unhelpful overlap of nomenclature, with 'input' and 'output' used firstly in a programming context to refer to arguments that are passed into functions and returned from them as results, 110 and secondly in a measurement context where they refer to the temperatures and pressures under which the known parameter pair are provided and at which results are to be calculated. For clarity, we therefore use the terms 'arguments' and 'results' in the programming context, while 'input' and 'output' always refer to the measurement context. Thus we provide values at both input and output conditions as arguments to PyCO2SYS and we receive calculations at both input and output conditions as results from the program.

115
Input and output conditions are used when measurements were conducted at a different temperature and/or pressure from what the sample would experience in situ, or to evaluate the effect of changing these conditions on the solution chemistry. All carbonate system parameters except for A T and T C are temperature-and pressure-sensitive, so the values of other measured arguments and calculated results may differ between the input and output conditions. For example, measurements might be conducted in a laboratory at 25°C on samples that were collected from several kilometres' depth in the ocean at sub-zero 120 temperatures. In this case, we would provide the measurement conditions (i.e. temperature and pressure in the laboratory) as input arguments, and the environmental conditions (i.e. temperature and pressure in the ocean) as output arguments. The corresponding output-condition results from PyCO2SYS then represent the true state of the sample in situ in its environment.
The input-condition results are of less environmental interest but may be useful for quality-control purposes. To calculate its results ( Fig. 1), PyCO2SYS first determines the unknown core parameters from whichever pair is provided by the user, under the input conditions (Appendix C). The parameter pairs that require an iterative solver to find pH (i.e. A T plus T C or one of its components) are solved using a scheme that has been updated from previous versions of CO2SYS (Sect. 3.1).

135
The A T and T C provided or determined under the input conditions are then used to solve the core marine carbonate system again under the output conditions, if these have been provided. This is possible because both A T and T C are unaffected by temperature and pressure changes.
Other properties of interest are subsequently calculated from whichever core parameters are most convenient under both input and (if provided) output conditions. These properties include all the individual components of alkalinity (Appendix B), 140 calcite and aragonite saturation states (Appendix D), and various chemical buffer factors (Sect. 3.3.4).  Other results (e.g. carbonate mineral saturation states, buffer factors) are then calculated from the results under input conditions ('Others').
If the user provides output-condition temperature and/or pressure values, then the dissociation constants are recalculated under these new conditions, the core MCS is solved again ('Output MCS results') from these updated constants ('K * values out'), the original 'Totals', and the now-known AT and TC, which are independent of temperature and pressure. Finally, other results are calculated again from the output-condition results ('Others out'). where . Unlike other implementations of CO2SYS, the equations that determine the relative abundances of different chemical species as functions of pH and their total contents (Appendix B) appear only once in PyCO2SYS, in what we term the 'master chemical 150 speciation function'. While this approach does not alter the calculated results, it does make the software more robust by reducing the opportunity for typographical errors when similar equations are repeated across the code.
The derivative term in Eq. (1) is evaluated from the master chemical speciation function using automatic differentiation, as implemented by the Python package Autograd (Maclaurin, 2016). Distinct from numerical or symbolic differentiation, the automatic approach breaks down the code to be differentiated into a sequence of individual arithmetic operations (addition, An advantage of our approach is that if the master chemical speciation function is modified in the future, for example to include additional components of alkalinity, then these changes are automatically incorporated into all the alkalinity-pH solvers 165 without needing to do any calculus and make changes to the solver functions by hand. Automatic differentiation is also used to evaluate other PyCO2SYS results (e.g. buffer factors; Sect. 3.3.4), so the same advantages apply to those calculations too.
In short, our approach ensures that PyCO2SYS calculations will remain internally consistent and reflect the influence of every solute and equilibrium modelled in the master chemical speciation function, even if this function is modified in the course of future development (Sect. 5.4).

Vectorised arguments and solver jumps
PyCO2SYS adjusts how to determine when the alkalinity-pH solver should stop solving for vectorised arguments. In CO2SYS-MATLAB v2.0.5, the solvers continue to iterate and update all values until the change in every element of the array satisfies the ∆pH tolerance threshold (10 −4 in CO2SYS-MATLAB, 10 −8 in PyCO2SYS). This means that a given set of arguments could return slightly different results depending on what data appears in the other, supposedly independent, elements of the argument arrays. Although negligible for all practical purposes, these differences are detectable in code validation exercises.
In PyCO2SYS (and in CO2SYS-MATLAB v3.2.0;  this process has been changed such that each element stops being updated once it has reached the tolerance threshold, independent of the other elements. The maximum solver jump -which constrains the greatest change in pH possible between solver iterations, thus helping to prevent overshoot -is implemented slightly differently in PyCO2SYS than in other CO2SYS programs. In CO2SYS-180 MATLAB, any ∆pH values with a magnitude greater than 1 are halved. In PyCO2SYS, the same applies, but any ∆pH values with a magnitude still greater than 1 after halving are decreased to 1 (while preserving the sign). This has negligible effect on calculations but it is sometimes detectable in intercomparisons.

pH scale conversions
PyCO2SYS fixes a simplification in earlier CO2SYS implementations regarding how pH scales are converted within the master 185 chemical speciation function. This simplification is noted in the programmer's comments in the relevant CO2SYS-MATLAB functions, carried through from the original MS-DOS implementation (Lewis and Wallace, 1998): "Though it is coded for H on the total pH scale, for the pH values occuring in seawater (pH > 6) it will be equally valid on any pH scale (H terms negligible) as long as the K Constants are on that scale." In short, pH and the equilibrium constants are provided to these functions on the same pH scale as each other -except for factor is used based on the user-selected pH scale.

Initial pH estimates
Like most iterative solvers, the Newton-Raphson method (Sect. 3.1) requires an initial pH value that is near to the true value in order to prevent overshoot and guarantee convergence to a root. Previous versions of CO2SYS used 8 as the initial pH estimate in every case. This works well for typical open-ocean seawater, but may be less appropriate in niche environments or when 205 modelling acidimetric titrations. Munhoven (2013) found a better initial pH estimate for solving from known A T and T C by considering only the contributions of carbonate and borate species to A T , simplifying the A T equation: Following Munhoven (2013) as implemented by  in mocsy, PyCO2SYS and CO2SYS-MATLAB v3.2.0 also take this approach (Appendix E). Furthermore, we have extended it to apply to the pH solvers that use one of the 210 components of T C as the second known parameter, as follows.

Solving from A T and f CO2
For clarity in the equations in this section, we abbreviate [CO 2 (aq)] as s, and [H + ] as h. As noted in Appendix C1.2, the approach described here is also used for known parameter pairs of A T plus any of p CO2 , x CO2 or [CO 2 (aq)].
First, f CO2 is converted to s using Eq. (C5). Carbonate-borate alkalinity (A CB ) as a function of s and h is: This can be rearranged into a third-order polynomial in h: Following an equivalent scheme to Munhoven (2013), the initial h value is determined by: When A T is positive, the square-rooted term g 2 2 − 3g 1 is always greater than zero, thus h min has a real value. For clarity in the equations in this section, we abbreviate [HCO − 3 ] as b, and [H + ] as h. Carbonate-borate alkalinity as a function of b is: This can be rearranged into a second-order polynomial in h: The initial h value is estimated following: When b < A T , the square-rooted term g 2 1 − 4g 0 g 2 is always positive and thus h 0 (b) has a real value. Otherwise, b can only be greater than A T if the negative components of A T such as [H + ] are dominant, as happens at low pH. The default initial pH estimate used by PyCO2SYS in that case is therefore 3.

Solving from A T and [CO 2− 3 ]
For clarity in the equations in this section, we abbreviate [CO 2− 3 ] here as c, and [H + ] as h. Carbonate-borate alkalinity as a function of c is: The initial h value is estimated following: When 2c + T B < A T , the square-rooted term g 2 1 − 4g 0 g 2 is always positive and thus h 0 (c) has a real value. Otherwise, 2c + T B can only be greater than A T if the negative components of A T such as [H + ] are dominant, as happens at low pH. The default initial pH estimate used by PyCO2SYS in that case is therefore 3.

Additional alkalinity components
The contributions of ammonia and bisulfide to alkalinity  plus the ability to solve from carbonate and/or bicarbonate ion content have been added in collaboration with  to ensure consistency between PyCO2SYS and CO2SYS-MATLAB v3.2.0.

265
The total substance contents and stoichiometric dissociation constants for up to two additional acid-base systems that contribute to total alkalinity can be provided as arguments to PyCO2SYS and are part of its speciation model. The effects of these extra components are automatically incorporated into all PyCO2SYS calculations, including the iterative pH solvers (Sect. 3.1), buffer factors (Sect. 3.3.4), and uncertainty propagation (Sect. 3.6). These extra components are modelled following Sharp and Byrne (2020), as described in Appendix B11. No corrections of any sort (e.g. for pressure or pH scale; Sect. 2.2) are made to 270 the dissociation constants for these user-defined additional components within PyCO2SYS; the user must ensure that they are already suitable for the conditions being analysed and on the user-indicated pH scale. balance between the availability of a substrate for calcification (i.e. HCO − 3 ) and the inhibition of calcification by H + (Eq. (D2)).

Buffer factors
A buffer factor quantifies the sensitivity of a certain marine carbonate system parameter to a change in another parameter. Best known is the Revelle factor, which is the ratio of the fractional change in p CO2 corresponding to a fractional change in T C at constant A T (Revelle and Suess, 1957). Frankignoulle (1994) derived a broader set of buffer factors for the marine carbonate 285 system, quantifying the responses of several different parameters to changes in T C and A T ; these were later rediscovered by Egleston et al. (2010). PyCO2SYS calculates these buffer factors using the nomenclature of Egleston et al. (2010).
Closely related to these buffer factors, Frankignoulle et al. (1994) introduced the factor ψ, which quantifies the change in T C required to return to the original seawater p CO2 after the action of calcification (which reduces A T and T C in a 2:1 ratio) or CaCO 3 dissolution (the reverse). Humphreys et al. (2018) introduced the 'isocapnic quotient' (Q), which is the ratio of 290 A T to T C change that does not affect seawater p CO2 , thus generalising the concept of ψ for application to all biogeochemical processes that affect A T and T C (denoted φ). PyCO2SYS calculates both ψ and Q, the latter of which can be used to calculate φ for any biogeochemical process .
PyCO2SYS offers two independent ways to evaluate the various buffer factors of the marine carbonate system: with explicit equations and by automatic differentation. The latter is used by default.

295
The 'explicit' approach follows equations reported in the literature Egleston et al., 2010;Humphreys et al., 2018), noting that the typographical errors in Egleston et al. (2010) identified by Richier et al. (2018) and Orr et al. (2018) have been corrected. In general, these equations do not include the effect of species beyond the carbonate, borate, and water contributions to total alkalinity.
The 'automatic' approach uses automatic differentiation to find the derivative necessary to evaluate each buffer factor. The 300 appropriate derivatives are taken from the functions that calculate a third carbonate system parameter from a known pair (Appendix C). All species modelled in the master chemical speciation function are therefore included, including any extra alkalinity components (Sect. 3.3.1). Typographical errors from the literature cannot influence these calculations.
Of the buffer factors, only the Revelle factor was included in previous versions of CO2SYS. It was evaluated using finite central-difference derivatives, which is replicated as the 'explicit' option in PyCO2SYS (with the corrections described in 305 Appendix F). However, as for all other buffer factors, the default calculation uses automatic differentiation.

No-solve modes
As well as solving from a pair of parameters, PyCO2SYS can be run with one or no marine carbonate system parameter arguments.
from temperature, pressure, and salinity (Sect. 2.2), without actually using these to do any further computations.
If one parameter is provided, then the results that can be computed with that parameter alone are returned. This applies to pH, p CO2 , f CO2 , x CO2 , and [CO 2 (aq)], as follows.
pH can be converted between the different scales without knowledge of a second carbonate system parameter. Therefore if pH alone is provided to PyCO2SYS, it is converted to every pH scale under the input conditions (Appendix A). Conversion to 315 a different temperature and/or pressure does require solving the carbonate system ( Fig. 1), so output-condition values are not calculated.
Seawater p CO2 , f CO2 , x CO2 , and [CO 2 (aq)] can also be interconverted without knowledge of a second carbonate system parameter (Appendix C1.2). Therefore if any of these parameters alone are provided to PyCO2SYS, all the others are calculated under the input conditions. If an output-condition temperature is provided, then p CO2 is also adjusted to the new temperature 320 following Takahashi et al. (2009), and all others in this set of parameters are calculated under output conditions from the new p CO2 value.

Multidimensional arguments
All arguments to PyCO2SYS, including settings, can be multidimensional. A combination of scalar and multidimensional arguments can be provided, with the latter formatted as NumPy ndarrays (Harris et al., 2020). Results that depend only on 325 scalar arguments are themselves scalar, while results depending on multidimensional inputs are 'broadcasted' into consistently shaped arrays (Fig. 2). The code is optimised to efficiently compute across multidimensional arrays following the approach of  Propagating the uncertainty in an argument through to a result requires knowing the derivative of the result with respect to the argument. This feature is available for a subset of the arguments in the original MS-DOS CO2SYS (Lewis and Wallace, 1998) and was added to the Excel and MATLAB implementations more recently (Orr et al., 2018). However, while much of the code to solve the marine carbonate system in PyCO2SYS has been directly inherited from CO2SYS-MATLAB, its implementation of uncertainty propagation is totally independent.

335
PyCO2SYS evaluates the derivatives using a finite forward-difference approach. We use finite differences rather than automatic differentiation here because the latter, while possible, is computationally inefficient to apply over the entire PyCO2SYS program. We use forward-rather than central-difference derivatives because the former can be safely evaluated at zero for variables where negative values are impossible (e.g. salinity). The derivative of a result r with respect to an an argument a is calculated thus: The size of ∆a is scaled relative to the absolute median of all values provided for each argument a, denoted |median ( This scaling is necessary because some arguments can differ by over 20 orders of magnitude from other arguments, so a constant absolute ∆a is not effective.

345
PyCO2SYS can conveniently obtain derivatives of all its results with respect to all of its arguments and also with respect to all parameters that are normally calculated internally from temperature, pressure and/or salinity, such as equilibrium constants and total salt contents.
The derivatives are calculated by a function that wraps the entire PyCO2SYS program, rather than by adding extra internal variables that keep track of the effects of differences in to the arguments, as has been implemented elsewhere (e.g. Orr et al., . The PyCO2SYS approach means that if the main program is producing valid results, then the derivatives can also be considered reliable without needing to verify some separate calculation mechanism.
To determine the overall uncertainty in each result, the uncertainty components from different arguments are combined using

355
where σ is the uncertainty as a standard deviation (thus σ 2 is a variance). However, Eq. (24) is only valid if the uncertainties in all arguments are independent from each other. Propagation of co-varying uncertainties can still be carried out with PyCO2SYS, because as noted above, the derivative of any result with respect to any argument can be calculated. The user can therefore build the Jacobian matrix of partial derivatives needed to propagate any arbitrary set of co-varying argument uncertainties through to any result (Appendix G). fail, an email report would be sent to us, the developers, and the failure displayed publicly in a badge on the GitHub repository's public web page (Fig. 3). Updates to PyCO2SYS are made in a developmental branch of the repository and the tests must all pass before these changes may be incorporated into the main branch and publicly released in a new version. All validation 370 tests described below were run with PyCO2SYS v1.7.0 (https://doi.org/10.5281/zenodo.4757055), but these protocols should ensure that the quantitative statements made here will hold true as the code continues to be developed. In a 'round-robin' test, we first determine all of the core carbonate system parameters from one pair, and then solve the system 375 again using every possible pair of determined parameters. Under typical seawater conditions, we find the same results for every parameter pair, to within better than the tolerance of the iterative pH solvers (i.e. 10 −8 in pH). The maximum absolute difference in each parameter across all possible input pair combinations is acceptably small (Table 4).

Buffer factors
If we include only the solution components that appear in the 'explicit' equations for the buffer factors (i.e. zero nutrients 380 and total salts, except for T B ) then we can compare these results with the 'automatic' values (Sect. 3.3.4). Under a range of typical seawater conditions, we find that the differences between these two calculation approaches are totally negligible: on the order of 10 −12 % for the Egleston et al. (2010) buffers; 10 −9 % for ψ and Q; and 10 −7 % for the Revelle factor. The Revelle factor is less well-matched because its 'explicit' value is computed using a finite difference scheme (for consistency with CO2SYS-MATLAB), which is inherently less accurate than than using a direct equation. Typically, one would not set the total salt contents to zero when computing buffer factors with the default automatic approach.
As a consequence, differences between the explicit and automatic buffer factors may be larger than described above, but still practically negligible: keeping nutrients at zero but using T SO4 and T F calculated from a salinity of 35, we find that the automatic buffer factors change such that their differences with the corresponding explicit buffer factors increase to the order of 0.01 %.

Uncertainty propagation simulations
The propagation of independent uncertainties using forward-difference derivatives (Sect. 3.6) is tested by comparison with Monte-Carlo simulations for all equilibrium constants and all known parameter pair combinations. In every case, the uncertainty determined from the simulations (n = 10 4 ) as a standard deviation is either within 3 % of the directly calculated value if the latter is non-zero, or negligibly small if not (absolute value less than 10 −10 ). However, these CO2SYS-MATLAB versions do not permit solving with either carbonate or bicarbonate ion content as a 405 known parameter, nor do they include ammonia or sulfide speciation. They also lack the parameterisations of Sulpis et al.
(2020) and Schockman and Byrne (2021) for the carbonic acid dissociation constants (options 16 and 17 in Table 1), and the parameterisation of  for bisulfate dissociation (Table 3). We therefore also tested PyCO2SYS against CO2SYS-MATLAB v3.2.0 , which does include all these options.

Temperature-salinity-pressure parameterisations 410
All equilibrium constants and total salt contents, calculated from salinity, temperature, and pressure, are virtually identical (absolute tolerance 10 −12 , relative tolerance 10 −16 , in pK * values or in µmol · kg −1 ) to those in both CO2SYS-MATLAB v2.0.5 and v3.2.0. These tests are run across a range of practical salinity from 0 to 50, temperature from −1 to 50°C, and pressure from 0 to 10 5 dbar, including values of exactly zero in every case. Every pH scale and parameterisation option is included (Tables 1 and 2).

Uncertainty propagation comparisons
PyCO2SYS reproduces all the derivatives reported by Orr et al. (2018) in their Tables 2 and 3 Table 4 to within 10 −4 %, again with the exception of [H + ]. We consider all these differences to be negligible.

Simulated seawater titration 460
PyCO2SYS successfully reproduces the closed-cell seawater titration datasets simulated by Dickson (1981). Each simulated dataset contains pH values for a seawater sample as it is titrated with incremental HCl additions across a pH range from approximately 8 to 3. Dickson (1981) specified exact values for all stoichiometric equilibrium constants. PyCO2SYS allows these to be provided, instead of them being calculated internally from temperature and salinity. The titration is then simulated by calculating how A T should change through the titration due to acid addition, accounting for dilution of A T , T C and all other dissolved solutes by acid addition, and then solving the carbonate system for pH from the so-determined A T and T C . On test here is the ability to solve for pH from known A T and T C across a wide range of pH and A T values, including negative A T .
The first titration dataset, without phosphate, is reproduced perfectly by PyCO2SYS to the number of decimal places reported by Dickson (1981). The second titration, with 10 µmol · kg −1 of total phosphate included, is reproduced perfectly by
The other 48 data points in this titration agree perfectly. The noted discrepancies occur in non-consecutive data points and 475 are therefore unlikely to all be associated with an error in a particular equilibrium. Coupled with the nature of the differences (underlined above), that is, one or two specific digits switched or replaced rather than the entire number being different, we conclude that these differences most likely represent minor typographical errors and therefore that PyCO2SYS does accurately reproduce these simulations in full.

Initial pH estimates
The aim of our revised scheme for initial pH estimates following Munhoven (2013) was to find values that were closer to the final solution across a wide range of pH, thus providing a more suitable starting point for the iterative solvers and thereby reducing the number of iterations required to converge at the solution.
We find that the initial pH estimates do follow a similar pattern to the final solutions across wide ranges of argument values,

485
including at the extremes where the initial-estimate equations become invalid and default pH values are used instead (Fig. 4).
The number of iterations required to fall beneath the solver's tolerance threshold (10 −8 in pH) is also reduced, compared with the original approach of always using an initial pH of 8. Indeed, for typical ocean conditions we find that the iterative solver often does not alter the initial estimate at all. Suitable starting points for the iterative solvers are clearly being found.

Total alkalinity-carbonate ion parameter pair 490
The iterative A T -pH solvers can be thought of as working by evaluating A T at a sequence of different possible pH values until the pH that returns the true A T is found. This pH is known as the 'root' of the A T -pH equation. The difference between the true A T and these estimates from pH is the 'residual' alkalinity, which is zero at the root. We find that the equations for initial pH estimates and final pH values have very similar roots and similar residuals in the region around these roots (Fig. 5). This similarity is why the initial pH estimates provide such suitable starting points for the final solvers.  (10) and (16), and (ii) total alkalinity (dashed lines; AT) from Eq. (B1), calculated across a range of pH, with a second known parameter of (a) dissolved inorganic carbon (2.15 mmol·kg −1 ), (b) CO2 fugacity (600 µatm), (c) bicarbonate ion content (2011 µmol · kg −1 ), and (d) carbonate ion content (116 µmol · kg −1 ), all at a salinity of 35 and temperature of 15°C. Each possible pH value returns a different residual alkalinity, and the true pH root is where the residual alkalinity is zero. Both the initial estimates and the final solutions find this zero-residual pH root, using the ACB and AT equations respectively (Sects. 3.1.1 and 3.2). The similarity between the ACB and AT residual curves, particularly around zero residual alkalinity, shows that the initial estimates provide excellent starting values for the subsequent iterative solvers. In (d), the final iterative solver has two possible roots, where residual alkalinity is zero. However, the initial estimate has only one root, corresponding to the lower-pH final root. This ensures that the final solver will always converge to the lower-pH root, which is usually appropriate for the seawater system.
For the A T -[CO 2− 3 ] parameter pair, there are often two real pH roots and thus two possible equilibrium states of the marine carbonate system (Fig. 5d). This contradicts the paradigm that the carbonate system can always be solved from any pair of its core parameters. Strictly speaking, the system cannot be uniquely solved from the known parameter pair of A T and [CO 2− 3 ]; it is also necessary to know which of the two possible pH roots is correct.
We conceptualise the two pH roots as follows. The lower-pH root corresponds to typical seawater: a relatively high-T C 500 system, where bicarbonate ions are the main component of T C , and carbonate alkalinity ([HCO − 3 ] + 2[CO 2− 3 ]) is the main component of A T . The higher-pH root corresponds to a low-T C system, where virtually all of T C is in the form of carbonate ion, and A T is dominated by non-carbonate species (Fig. 6).  Figure 6. Main components of (a) total alkalinity (AT) and (b) dissolved inorganic carbon (TC) at the two possible pH roots for a known parameter pair of AT (2300 µmol · kg −1 ) and carbonate ion content ([CO 2− 3 ]; 120 µmol · kg −1 ). The low-pH root (left) represents typical seawater, with relatively high TC (2143 µmol·kg −1 ), and both AT and TC dominated by bicarbonate ion (HCO − 3 ). The high-pH root (right) has the same AT and [CO 2− 3 ], but AT is dominated by hydroxide (OH − ), and TC is much lower (122 µmol · kg −1 ) and comprised almost entirely of CO 2− 3 . These calculations were carried out at 15°C, with a practical salinity of 35 and zero nutrients. If nutrients were present, then like borate (B(OH) − 4 ) they would have different contributions to AT at the different pH roots. pH is on the Total scale (Appendix A).
Which root the solver finds depends on the initial pH estimate and the residual alkalinity-pH slope at that point (Eq. (1)). This is an advantage of the improved initial pH estimates in PyCO2SYS: the initial-estimate equation has only a single real 505 root (Fig. 5d). Because the initial estimate is based on equations for a system that only includes carbonate and borate alkalinity (Sect. 3.2.3), the carbonate system contribution to total alkalinity will always dominate, so the single root of the initial estimate will coincide with the lower-pH true root, which is appropriate for seawater. The solver will thus more robustly find the correct root each time.
so a constant initial pH estimate of 8 would also return the correct root. But in more unusual environments, the new algorithm introduced here could help ensure that the solver identifies the correct root.

Pressure corrections for p CO2
In PyCO2SYS, p CO2 (and by extension, f CO2 and x CO2 ) is always evaluated at a total pressure of 1 atm, rather than being corrected for the pressure of the overlying water column (Sect. 2.2). This approach is consistent with all existing implementa-515 tions of CO2SYS. In practice, it means that these values represent the approximate p CO2 that seawater would have if it were brought to the surface ocean without changing the solution composition -'approximate' because this calculation should use potential temperature, rather than in situ temperature, to retrieve the true value expected after adiabatic decompression . PyCO2SYS does not calculate potential temperature, but this could be provided by the user in place of in situ temperature.

520
Although a pressure correction for p CO2 (i.e. a pressure correction for K * 0 and the fugacity factor; Appendix C1.2) is theoretically possible (Weiss, 1974;, it could be argued that this is unnecessary. First, the vast majority of p CO2 measurements are carried out only at the surface ocean (e.g. Bakker et al., 2016), in part due to practical constraints of the 'gold-standard' equilibrator-based methodology. Second, the concept of p CO2 has utility only in the context of air-sea CO 2 exchange, which takes place only at the surface ocean.

525
However, recent developments in sensor technology are beginning to enable direct measurements of in situ p CO2 at depth in the ocean (Clarke et al., 2017). There is also growing interest in calculating in situ p CO2 values at depth for intercomparison exercises in which the marine carbonate system has been overdetermined by measuring more than two of its core parameters (e.g. Raimondi et al., 2019). Therefore, we do anticipate an increasing need for pressure-corrected p CO2 values, and while we have kept the approach in PyCO2SYS consistent with other CO2SYS software for now, we consider a robust implementation 530 of these calculations to be an important target for future code development.

Outlook
The Autograd package that PyCO2SYS uses for automatic differentiation is still being maintained, and its most recent release (v1.3, July 2019) is stable, but it is no longer in active development. Its successor, JAX (Bradbury et al., 2018), has further benefits including 'just-in-time' code compilation and parallelisation. These features could speed up computation speed in 535 PyCO2SYS, especially the components involving automatic differentiation, potentially by several orders of magnitude. However, JAX cannot currently run natively on the Microsoft Windows operating system, which would greatly restrict the usability of PyCO2SYS for the oceanographic research community. This limitation is due to JAX's dependence on the separate XLA (Accelerated Linear Algebra) compiler, rather than being an intrinsic issue with JAX itself. Should this compatibility issue be resolved in the future, we envision updating PyCO2SYS to use JAX instead of Autograd. This should be relatively straight-540 foward thanks to the close similarities between the API (application programming interface) of these packages.
As future developments are made to PyCO2SYS, we will aim to maintain consistency with other CO2SYS-family tools, but cannot guarantee that all new features or updates will be added simultaneously across all implementations. In practice, the workload required to achieve this is not currently feasible, and we would not wish to hold back development because of the time required to replicate changes across multiple implementations. That said, the results should remain consistent enough that 545 users can select which implementation to use based on their preferred software environment, rather than the other way around.
This ambition could also extend beyond the CO2SYS family of software. Independently developed tools for solving the marine carbonate system exist in other languages, such as seacarb in R (Gattuso et al., 2021) and mocsy in Fortran . These give sufficiently consistent results with each other that the selection of which tool to use does not affect scientific interpretation , and we have shown that PyCO2SYS is, and will remain, no exception. Even PyCO2SYS. However, we still lack meaningful and statistically equivalent estimates for the actual uncertainties in the equilibrium constants. The software therefore stands ahead of our knowledge: as more work is done to robustly quantify these uncertainties, the tools are already in place to propagate them through to all marine carbonate system calculations.
As development of PyCO2SYS continues, we do not anticipate changing its fundamental approach to solving the marine carbonate system, but we will try to incorporate the latest research, including keeping up-to-date with new parameterisations, 565 for example of stoichiometric equilibrium constants (e.g. Sulpis et al., 2020;Schockman and Byrne, 2021). Integration with a speciation model that can determine the equilibrium constants based on chemical activities, rather than parameterising these based on salinity, is an area of interest (Turner et al., 2016), but would likely require such substantial changes as to constitute a separate software tool. We do envision further additions to the master chemical speciation function in PyCO2SYS, for example to better represent the impact of organic contributions to alkalinity (e.g. Cantrell et al., 1990;Muller and Bleie, 2008;Kuliński 570 et al., 2014;Abril et al., 2015;Ulfsbo et al., 2015) -noting that a simplified representation of such extra components can already be modelled in PyCO2SYS (Sect. 3.3.1).
Through all these efforts, we aim to ensure that PyCO2SYS remains a reliable and comprehensive tool for analysing seawater chemistry, from samples and experiments in the laboratory through to the changing marine carbonate system across the global ocean. Code availability. The current version of PyCO2SYS is freely available from its GitHub repository at https://github.com/mvdh7/PyCO2SYS under the GNU General Public License v3. Installation is recommended from the Python Package Index (PyPI) via pip and documentation is available online (https://PyCO2SYS.readthedocs.io). The exact version of PyCO2SYS used to produce the results discussed in this paper (v1.7.0), including input data and scripts to run the model and perform all validation tests described here, is archived on Zenodo (https://doi.org/10.5281/zenodo.4757055).

580
Appendix A: pH scales and conversions The pH scales in PyCO2SYS are Free (pH F ), Total (pH T ), Seawater (pH S ) and NBS (pH N ), defined following e.g. Zeebe and Wolf-Gladrow (2001) and Velo et al. (2010): where γ H + is the chemical activity coefficient for H + (Table 3). Note that in PyCO2SYS, [H + ] in all these definitions is a substance content (Sect. 2.1). pH values and stoichiometric equilibrium constants (K * ) are thus converted between these different pH scales using the following factors: where γ H + is the hydrogen ion activity, calculated from temperature and salinity following either Peng et al. (1987) or Takahashi et al. (1982) (see Table 3). The different scales are denoted by the subscript and superscript letters, with F for Free, T 595 for Total, S for Seawater and N for NBS. To convert from any pH scale A to any other pH scale B using these factors: Alternatively and equivalently: The equations above are used in the same way to convert K * values between pH scales.
Equations for all the individual alkalinity components (A C , A B , etc.) are given in the subsequent sections in terms of pHindependent total substance contents (T C , T B , etc.) and [H + ].

B4 Boric acid
Further deprotonation of HS − is considered negligible and thus not modelled (Schoonen and Barnes, 1988 Undissociated H 2 SO 4 is considered negligible and thus not modelled.

B10 Fluoride
The reactions and equations for the second additional component β and its alkalinity contribution A β are identical to those given for α above. PyCO2SYS automatically determines how to modify the alkalinity equation following Eq. (B24) based on 685 the user-provided K * α and K * β values, with a pZLP (zero-level of protons) of 4.5 (Wolf-Gladrow et al., 2007).
Appendix C: Solving the core marine carbonate system Here, we lay out all the equations that are used to convert between different carbonate system parameters in PyCO2SYS. These mostly follow Zeebe and Wolf-Gladrow (2001)  If one of p CO2 , x CO2 or [CO 2 (aq)] is in the known parameter pair, then its values are first converted to f CO2 as follows.
For known p CO2 : where G is the fugacity factor (Table 2).
For known x CO2 : where P v is the vapour pressure factor (Table 3): in which P is total atmospheric pressure (assumed to be 1 atm) and p w is the water vapour pressure (Weiss and Price, 1980).
For known [CO 2 (aq)]: where K * 0 is the solubility factor for CO 2 (Table 3). The calculation steps given below for f CO2 are then followed to solve the core marine carbonate system. Afterwards, p CO2 , 710 x CO2 and [CO 2 (aq)] are calculated where they were not in the original known parameter pair: p CO2 and x CO2 are calculated using Eqs. (C2) and (C3), while [CO 2 (aq)] is calculated by difference using the definition of T C in Eq. (B3).
C2 Solving routines C2.1 From A T and T C First, pH is determined by solving Eq. (B1) for A T as a function of T C and pH using Eq. (B5) for A C and the iterative approach 715 described in Sects. 3.1 and 3.2. The components of T C are then calculated from T C and pH:

C2.2 From A T and pH
First, we determine A C from known A T and pH by using Eq. (B1). T C is then calculated from A C : The components of T C are then calculated from T C and pH using Eqs. (C6), (C7) and (C8).

C2.3 From A T and f CO2
First, pH is determined by solving Eq. (B1) for A T as a function of f CO2 and pH using Eq. (B6) for A C and the iterative approach described in Sects. 3.1 and 3.2. T C is then calculated from A T and pH following Sect. C2.2, and its remaining unknown components with Eqs. (C7) and (C8).

C2.4 From
First, pH is determined by solving Eq. (B1) for A T as a function of [CO 2− 3 ] and pH using Eq. (B8) for A C and the iterative approach described in Sects. 3.1 and 3.2. The lower of the two pH roots is returned by default, as discussed in Sect. 5.2. T C is then calculated from A T and pH following Sect. C2.2, and its remaining unknown components with Eqs. (C6) and (C7).

C2.5 From A T and [HCO
First, pH is determined by solving Eq. (B1) for A T as a function of [HCO − 3 ] and pH using Eq. (B7) for A C and the iterative 735 approach described in Sects. 3.1 and 3.2. T C is then calculated from A T and pH following Sect. C2.2, and its remaining unknown components with Eqs. (C6) and (C8).

C2.6 From T C and pH
First, A T is calculated from T C and pH using Eq. (B1). The components of T C are then calculated from T C and pH using Eqs. (C6), (C7) and (C8).

C2.7 From T C and f CO2
First, pH is calculated from T C and f CO2 using A T and the remaining unknown components of T C are then calculated from T C and pH using Eqs. (B1), (C7) and (C8).

C2.8 From T C and [CO 2− 3 ]
First, pH is calculated from T C and [CO 2− 3 ] using A T and the remaining unknown components of T C are then calculated from T C and pH using Eqs. (B1), (C6) and (C7).

C2.9 From T C and [HCO − 3 ]
First, pH is calculated from T C and [HCO − 3 ] using A T and the remaining unknown components of T C are then calculated from T C and pH using Eqs. (B1), (C6) and (C8).
C2.10 From pH and f CO2

755
First, T C is calculated from pH and f CO2 using A T and the remaining unknown components of T C are then calculated from T C and pH using Eqs. (B1), (C7) and (C8).
A T and the remaining unknown components of T C are then calculated from T C and pH using Eqs. (B1), (C6) and (C8).
C2.13 From f CO2 and [CO 2− 3 ] First, pH is calculated from f CO2 and [CO 2− 3 ] using: pH is then calculated from f CO2 and [CO 2− 3 ] using Eq. (C17). Next, T C is calculated from pH and f CO2 using Eq. (C14). Finally, A T is calculated from T C and pH using Eq. (B1). pH is then calculated from f CO2 and [CO 2− 3 ] using Eq. (C17). Next, T C is calculated from pH and f CO2 using Eq. (C14). Finally, A T is calculated from T C and pH using Eq. (B1).

Appendix D: Other marine carbonate system variables
Calcite and aragonite saturation states (Ω) are calculated from the definition: where K * sp is the solubility product, a function of salinity, temperature and pressure that is different for each mineral ( Table 2). The 'substrate:inhibitor ratio' of Bach (2015) is calculated from the bicarbonate and free hydrogen ion contents: Note that in Eq. (D2), the [H + ] term is always calculated on the Free pH scale of Eq. (A1).
For clarity in the equations in this section, we abbreviate [H + ] as h.
(2) as a function of T C and h is: This can be rearranged into a third-order polynomial in h: P TC (h) = h 3 + h 2 g 2 (T C ) + hg 1 (T C ) + g 0 (T C ) = 0 (E2) The initial h value is determined by: where h min is defined in Eq. (9). Negative A CB is impossible because its equation contains only positive terms, so the approach of Munhoven (2013) cannot be applied if A T is indeed negative. The default h 0 of 10 −3 mol·kg −1 , corresponding to a pH of 3, is therefore used for that case (e.g. after the alkalinity end-point in an acidimetric titration), following .
The maximum possible A CB is 2T C +T B , where T C is entirely CO 2− 3 and T B is entirely B(OH) − 4 . Where A T is actually higher 805 than this limit of this simplified expression, we expect a high pH (given the dominance of CO 2− 3 within T C ) and therefore use an initial estimate pH of 10, again following . Otherwise, h min in Eq. (E6) is found using Eq. (9), following Munhoven (2013) and .

Appendix F: Revelle factor calculation errors in older versions of CO2SYS-MATLAB
Older versions of CO2SYS-MATLAB, including v2.0.5 (Orr et al., 2018) from which PyCO2SYS was originally converted, 810 have minor errors in how the Revelle factor is evaluated. These have been corrected in PyCO2SYS (also in CO2SYS-MATLAB v3.2.0 and CO2SYS-Excel v3), leading to small differences in the calculated values. These differences are notable from a computational perspective (i.e. many orders of magnitude greater than solver tolerance and floating point errors) but still mostly negligible in virtual all practical applications. of using automatic differentiation instead of finite-difference derivatives. The key errors in the original CO2SYS-MATLAB implementation of the finite-difference approach are: 1. An incorrect reference T C value is used in the final evaluation. Rather than using the 'central' T C value, the change in p CO2 is divided by the adjusted (T C − ∆T C ).
2. Under output conditions, the 'Peng correction' is not included in the evaluation of the Revelle factor (Sect. 2.2).

820
The lesser accuracy of the finite-difference method relative to automatic differentiation, particularly given the relatively large ∆T C used in the original finite-difference implementation (i.e. 1 µmol · kg −1 ), explains the differences between the two approaches that remains after the errors above have been corrected.

Appendix G: Propagation of co-varying uncertainties
The uncertainty in an argument a i can be expressed as a variance, denoted σ 2 (p i ), where σ(a i ) would be the same uncer-825 tainty expressed as a standard deviation. The covariance between the uncertainties in a pair of arguments a i and a j is denoted σ(a i , a j ). The variances and covariances of any arbitrary set of arguments (A = a 1 , a 2 , . . . a n ) can be assembled into a symmetric variance-covariance matrix (Σ A ): σ(a 1 , a 2 ) · · · σ(a 1 , a n ) σ(a 2 , a 1 ) σ 2 (a 2 ) · · · σ(a 2 , a n ) . . . . . . . . . . . . σ(a n , a 1 ) σ(a n , a 2 ) · · · σ 2 (a n ) To propagate these uncertainties through to a set of results (R = r 1 , r 2 , . . . r m ), we must first assemble the Jacobian matrix of 830 R with respect to A (J): The variance-covariance matrix for the propagated uncertainties in R (Σ R ) can then be evaluated by matrix multiplication (e.g. Orr et al., 2018):