Retrieving monthly and interannual total-scale pH (pHT) on the East China Sea shelf using an artificial neural network: ANN-pHT-v1

While our understanding of pH dynamics has strongly progressed for open-ocean regions, for marginal seas such as the East China Sea (ECS) shelf progress has been constrained by limited observations and complex interactions between biological, physical and chemical processes. Seawater pH is a very valuable oceanographic variable but not always measured using high-quality instrumentation and according to standard practices. In order to predict total-scale pH (pHT) and enhance our understanding of the seasonal variability of pHT on the ECS shelf, an artificial neural network (ANN) model was developed using 11 cruise datasets from 2013 to 2017 with coincident observations of pHT, temperature (T ), salinity (S), dissolved oxygen (DO), nitrate (N), phosphate (P) and silicate (Si) together with sampling position and time. The reliability of the ANN model was evaluated using independent observations from three cruises in 2018, and it showed a root mean square error accuracy of 0.04. The ANN model responded to T and DO errors in a positive way and S errors in a negative way, and the ANN model was most sensitive to S errors, followed by DO and T errors. Monthly water column pHT for the period 2000–2016 was retrieved using T , S, DO, N, P and Si from the Changjiang biology Finite-Volume Coastal Ocean Model (FVCOM). The agreement is good here in winter, while the reduced performance in summer can be attributed in large part to limitations of the Changjiang biology FVCOM in simulating summertime input variables.


Introduction
Atmospheric carbon dioxide (CO 2 ) levels have increased by nearly 46 %, from approximately 278 ppm (parts per million) in 1750 (Ciais et al., 2014) to 405 ppm in 2017 (Le Quéré et al., 2018). The oceans have absorbed approximately 48 % of the anthropogenic CO 2 emissions (Sabine et al., 2004), resulting in decreasing long-term pH trends of ∼ 0.02 decade −1 in open-ocean waters (e.g., Dore et al., 2009;González-Dávila et al., 2010;Bates et al., 2014;Lauvset et al., 2015). While a gradual decrease in pH is a predictable open-ocean response to elevated anthropogenic CO 2 emissions, the seasonal changes and long-term trends in pH in coastal seas have not been fully understood due to the lack of long-term pH data and complexity of coastal systems. In this context, the development of approaches to predict carbonate chemistry parameters in coastal regions may assist both the management of local water quality and our wider understanding of the ocean carbon cycle.
Many attempts have been made to predict seawater pH by developing empirical relationships between pH and environmental variables, such as temperature (T ) (Juranek et al., 2011), salinity (S) , dissolved oxygen (DO) (e.g., Juranek et al., 2011;Sauzède et al., 2017), nutrients (e.g., Williams et al., 2016;Carter et al., 2016Carter et al., , 2018, and longitude and latitude (Sauzède et al., 2017). Compared with traditional empirical methods, artificial neural networks (ANNs) have been proposed as powerful tools for modelling uncertain and complex systems such as ecosystems and environmental assessment (e.g., Olden and Jackson, 2002;Olden et al., 2004;Uusitalo, 2007;Raitsos et al.,  2008; Chen et al., 2017). Their main advantage compared with, for example, multiple linear regression (MLR) models may be a greater flexibility and versatility in modelling complex nonlinear relationships. ANNs have been used for the retrieval of the partial pressure of carbon dioxide (pCO 2 ) (e.g., Friedrich and Oschlies, 2009;Laruelle et al., 2017), total alkalinity (e.g., Velo et al., 2013;Bostock et al., 2013;Sasse et al., 2013), total dissolved inorganic carbon (e.g., Bostock et al., 2013;Sasse et al., 2013) and phytoplankton functional types (e.g., Raitsos et al., 2008;Palacz et al., 2013). However, these studies mainly focus on the open ocean; relatively few studies have focused on coastal seas, perhaps because of the complexity and heterogeneity of the continental shelves. Alin et al. (2012) developed an MLR model to reconstruct pH in the southern California Current System, while Moore-Maley et al. (2016) evaluated the interannual variability of near-surface pH using a one-dimensional biophysical mixing-layer model in the Strait of Georgia. To our knowledge, no empirical relationship for pH has yet been established for the East China Sea (ECS).
The ECS is the largest marginal sea in the western North Pacific Ocean and receives massive terrestrial inputs from the Changjiang (Yangtze River). The shelf shallower than 200 m covers more than 70 % of the entire ECS (e.g., Ichikawa and Beardsley, 2002;Lie and Cho, 2016), where the dominant currents present seasonal circulation patterns. The spatial and temporal distributions of the carbonate system have been investigated in the ECS (e.g., Chou et al., 2009;Cao et al., 2011;Qu et al., 2015) and were found to largely reflect the distributions of various water masses. The pattern of carbon sources and sinks exhibits substantial seasonal variation (Guo et al., 2015), and the ECS is generally considered as a sink of atmospheric CO 2 throughout the year except in fall (e.g., Shim et al., 2007;Zhai and Dai, 2009). A mechanistic semi-analytical algorithm (MeSAA) was developed to study pCO 2 variations in response to various controlling mechanisms during summertime . However, the seasonal variability of pH has been studied very little in the ECS, mainly due to the limited observational coverage and irregular variability caused by seasonal fluctuations of the Changjiang discharge and anthropogenic processes. Developing methods to extend the seasonal coverage of pH data may thus help to improve our understanding of the ocean carbon cycle in the ECS. This paper is structured as follows: Sect. 2 describes the cruise data and ANN model building; Sect. 3 shows the performance, sensitivity and application of the ANN model. A summary is given and conclusions are drawn in the last section. T and S profiles were obtained directly using conductivity, temperature and depth/pressure (CTD) recorders (SBE 25plus or 911plus). Measurement of DO followed the Winkler procedure, as described previously by Zhai et al. (2014). Nutrients samples were first filtered with a 0.45 µm Whatman GF/F membrane and then stored in 250 mL high-density polyethylene (HDPE) bottles until chemical analysis. Nitrate (N), phosphate (P) and silicate (Si) were determined using a segmented flow analyzer (model: Skalar SAN PLUS , Netherlands) with a precision < 5 % (Zhang et al., 2007); the detection limits are 0.14 µM for N, 0.06 µM for P and 0.07 µM for Si. pH samples were stored in 140 mL brown borosilicate glass bottles and sterilized by addition of 50 µL saturated HgCl 2 solution. Three traceable pH buffers were used including NIST (National Institute of Standards and Technology) buffers with pH = 4.00, 7.02 and 10.09. As described by Zhai et al. (2012Zhai et al. ( , 2014, we converted it into total-scale pH (pH T ) by subtracting 0.143, and the overall accuracy of the pH T dataset was estimated as 0.01.
Three cruises were carried out on the ECS shelf in 2018 ( Fig. 2) during the "National Natural Science Foundation Shared Voyage Plan" -from 10 to 19 March, 12 to 20 July and 12 to 21 October -and one cruise was carried out near the Changjiang Estuary during May 2017 (Fig. 1). The measurement methods of T , S, DO and nutrients are the same as that of the above 10 voyages. pH samples were stored in Figure 4. Comparison of the performance of one hidden layer vs. two hidden layers in predicting independent validation data. The number of neurons in the first hidden layer was the same in the onehidden-layer as in the two-hidden-layer model; numbers in parentheses show the number of neurons in the second hidden layer (for the two-hidden-layer model). Bars show the mean and standard deviation of the root mean square error over a 10-fold cross-validation, for different numbers of neurons in the first hidden layer. 500 mL high-quality borosilicate glass bottles without filtering and sterilized by addition of 200 µL saturated HgCl 2 solution until measurement in the lab. The pH T was measured at the temperature in the flow cell using an automated flowthrough system for embedded spectrophotometry (AFtes) with a precision of 0.0005 pH and uncertainty of < 0.003  (Reggiani et al., 2016). Water samples were collected at three or four different depths during all cruises.
We omitted data points where one or more other physical variables were missing. The three cruises during 2018 ( Fig. 2) were used to estimate model-predicted performance as an exploratory dataset, while the remaining 11 cruises ( Fig. 1) were used to train the model as a confirmatory dataset. The final number of observations in the confirmatory dataset was 1854 (see Table 1 for more detailed information on the field survey).

Artificial neural network development
The ANN we used is a feed-forward multilayer perceptron (Tamura and Tateishi, 1997) with two hidden layers. The neurons of each layer are connected with the neurons of the previous layer and the next layer by weights (Fig. 3a). The coefficients of the weight matrix are iteratively tuned in the training step. In order to avoid overfitting, a 10-fold cross-validation was used to assess model prediction accuracy (Fig. 3b). Here, the confirmatory dataset was randomly divided into 10 equal subsamples. One subsample was used as the independent validation data (10 % of the confirmatory dataset) and was always excluded from training; the remaining nine subsamples were used as training data (90 % of the confirmatory dataset). The training data were further divided randomly into a training set (70 % of the training data), validation set (15 % of the training data) and testing set (15 % of the training data) during the training process. The training set was used for computing the gradient and updating the network weights and biases, the validation set was used to monitor the error and control model stop, and the testing set was used to monitor whether the model was over-fitted (Palacz et al., 2013). We compared performances in predicting the inde-pendent validation data from the 10-fold cross-validation and selected the optimal model based on the lowest root mean square error (RMSE). Then we applied the optimal model to the exploratory dataset ( Fig. 2) and evaluated model performance by calculating error statistics. In our study, calculations were done in the MathWorks MATLAB environment, using the Deep Learning Toolbox.
First, we compared the performance of one hidden layer vs. two hidden layers in predicting independent validation data. The number of neurons varied from 2 2 to 2 8 for the first hidden layer and was fixed at four in the second hidden layer for the two-hidden-layer model; the number of neurons in the first layer was the same in the one-hiddenlayer vs. two-hidden-layer model (Fig. 4). The 10-fold crossvalidation showed that the model with two hidden layers performed better as the number of neurons increased. Second, in order to choose suitable training techniques and activation functions of the ANN model with two hidden layers, we tested three training functions (gradient descent backpropagation (trainGD), Levenberg-Marquardt backpropagation (trainLM) and scaled conjugate gradient backpropagation (trainSCG)), which differed in how the weights are modified, and three transfer functions (log-sigmoid transfer function (logsig), hyperbolic tangent sigmoid transfer function (tansig) and positive linear transfer function (poslin)) ( Fig. 5). The output values of logsig, tansig and poslin were compressed onto [0, 1], [−1, 1] and [0, +∞], respectively (Fig. S1). As the number of neurons increased, the performances of trainGD and tansig became poor. Although there was no obvious difference between trainLM and trainSCG, the training technique trainSCG was selected and the transfer function logsig was applied to two hidden layers considering the overall performance (Fig. 5). Third, in the training  phase of the ANN model, the number of neurons was tested, varying from 4 to 128 for two hidden layers (Table S1). The best performance for both training data and independent validation data was obtained with 40 neurons in the first hidden layer and 16 neurons in the second layer. Finally, different combinations of input variables were tested to choose the optimal architecture of the ANN model (Table 2); the best performance was obtained using longitude, latitude, month, T , S, DO, N, P and Si as input variables. The utility of these variables for predicting pH has a strong a priori basis: the carbonate system thermodynamic relationships depend on both T and S (Lueker et al., 2000); a positive correlation is expected between DO and pH (Wootton et al., 2012) because of the role of photosynthesis and respiration in removing or generating CO 2 in the water; and various nutrients influence phytoplankton growth and abundance, thereby increasing organic carbon fixation/uptake and increasing pH (Wootton et al., 2008(Wootton et al., , 2012. We found geographical information to be a powerful addition in improving the skill of the method (Table 2), allowing the network to learn spatiotemporal patterns that could not be explained by other input variables (Sasse et al., 2013). In order to avoid bias towards high-value inputs/outputs and to eliminate the dimensional influence of the data, all data used by the ANN model were normalized using the following equation (e.g., Sauzède et al., 2015Sauzède et al., , 2016: with σ being the standard deviation of the considered input variables or output variable pH T . Similar to the approach of Sauzède et al. (2015Sauzède et al. ( , 2016, the longitude and month input variables were transformed as follows to account for the periodicity: slongitude = sin Lon · π 180 , clongitude = cos Lon · π 180 ; (2) smonth = sin month · π 6 , cmonth = cos month · π 6 .
(3) . Box plots of the differences between retrieved pH T and the observations. (a) The differences vs. longitude (mean ± SE); (b) the differences vs. latitude (mean ± SE). The height of each box represents the mean value of the differences; the whiskers represent the standard error (SE) value of the differences.
The latitude variable was transformed into the range of the sigmoid function by dividing by 90 and then normalized using Eq. (1).

ANN model performance
To evaluate the performance of the ANN model, we compared model-simulated pH T (pH M T ) with corresponding ob-  servations (pH O T ) using several statistical indices, including the mean absolute error (MAE), the coefficient of determination (R 2 ) and the RMSE. The model simulated pH T with a RMSE of 0.04 and R 2 of 0.88 for the training data (90 % of confirmatory dataset, Fig. 6a), and predicted pH T with a RMSE of 0.03 and R 2 of 0.93 for the independent validation data (10 % of confirmatory dataset, Fig. 6b). The histogram of residuals in the confirmatory dataset (Fig. 6c) showed that 68 % of the residuals were within the RMSE of 0.04. In order to further explore where the ANN model may lead to large errors, we plotted distributions of differences (pH M T -pH O T ) with respect to the longitude and latitude (Fig. 7). Table 3. Model comparison between traditional empirical methods (MLR and MNR) and machine-learning-based empirical methods (decision tree, random forest and SVM). The statistics was derived from the confirmatory dataset (training data and independent validation data) using input variables: T , S, DO, N, P and Si. Note that R 2 statistics in our study were based on the calculation of the coefficient of determination; therefore negative R 2 could be derived when there was a strong bias. . The reduced performance of the ANN model here may be primarily due to the strong seasonal oscillations of the Changjiang discharge (Dai and Trenberth, 2002). As a reference, the performance of some other empirical approaches -including MLR, multi-variate nonlinear regression (MNR), decision tree, random forest and support vector machine (SVM) regression -is shown in Table 3. The selected ANN model (Table 2, model no. 10) showed better performance than the other tested approaches using the same input variables (Table 3).

ANN model validation using the exploratory dataset
To further assess the ability of the ANN model to estimate pH T on the ECS shelf, we applied the ANN model to an exploratory dataset not used in ANN model development and sampled during March, July and October 2018 (Fig. 2). Scatterplots of retrieved pH T vs. observations (Fig. 8a) showed a RMSE of 0.04, R 2 of 0.80 and MAE of 0.03, which is consistent with the performance of the training data (Fig. 6a).
Although the RMSE for pH T we obtained here was higher than obtained in some previous studies (e.g., Juranek et al., 2011;Williams et al., 2016;Sauzède et al., 2017) Sauzède et al. (2017) to our coastal exploratory dataset (Fig. 8b) and obtained a RMSE of 0.09 and MAE of 0.06. It is not surprising that the ANN model (developed here for the ECS shelf) outperforms the CANYON model (developed for the global ocean) for predicting pH T on the ECS shelf. The carbon chemistry parameters in this region are not only under the direct impact of the Taiwan Warm Current and remote control of the Kuroshio water intrusion into the shelf, but they are also significantly controlled by seasonal variations of the Changjiang discharge (e.g., Isobe and Matsuno, 2008;Chen et al., 2008;Chou et al., 2009). Taking into account the highly complex hydrographic, biological and chemical conditions, the accuracy of pH T presented is promising.

ANN model sensitivity to environmental input variables
To assess the ANN model sensitivity to different environmental input variables, we added 5 % perturbation for each environmental variable separately. Statistically, with 5 % T errors added, the ANN model showed slight overestimation in pH T , with a mean bias (MB) of 0.0059, RMSE of 0.0079 and R 2 of 0.9949 (Fig. 9a); with 5 % DO errors added, the ANN model also showed slight pH T overestimation, with a MB of 0.0050, RMSE of 0.0090 and R 2 of 0.9934 (Fig. 9c); and with 5 % S errors added, the ANN model showed an overestimation in pH T , with a MB of −0.0111, RMSE of 0.0162 and R 2 of 0.9789 (Fig. 9b). These results suggested that the ANN model responded to T and DO errors in a positive way and to S errors in a negative way. The positive response to increasing DO reflects a positive correlation between pH T and DO (Cai et al., 2011), which (c) station A6-7; (d) station A6-9; (e) station A7-5; (f) station A8-5. Figure 11. Comparison of water column pH T retrieved by the ANN model using Changjiang biology FVCOM output with corresponding observations at six sites with repeated sampling for 3 to 4 years. The 1 : 1 line is shown in the plot as a visual reference. Skill statistics include the mean absolute error (MAE), the coefficient of determination (R 2 ) and the root mean square error (RMSE). N represents the number of data points.
can be attributed to the processes of photosynthesis (generating DO and removing CO 2 , hence increasing pH) and aerobic respiration (consuming DO and generating CO 2 , hence lowering pH); the negative response to increasing S reflects the influence of the (lower-salinity) Changjiang discharge, carrying large amounts of nutrients that fuel increased primary production (uptake of nutrients and CO 2 , hence raising the pH) in surface waters during warm seasons . It was found that the ANN model was insensitive to nutrient errors ( Fig. 9d-f) and most sensitive to S errors (Fig. 9b), followed by DO and T errors.

Comparison
In order to retrieve monthly pH T on the ECS shelf, the monthly T , S, DO, N, P and Si from the Changjiang biology Finite-Volume Coastal Ocean Model (FVCOM) (http: //47.101.49.44/wms/demo, last access: 20 July 2018) were fed into the ANN model as input variables. The resolution of the Changjiang biology FVCOM output is 1-10 km in the horizontal, 10 depth levels in the vertical and day in the temporal (refer to Ge et al., 2013, for detail information). Comparisons of monthly average FVCOM variables with surface and bottom observations on the ECS shelf showed that simulated T was close to observed values (Fig. S2a), simulated S was also close to observed values except at the bottom in August 2013 and at the surface in July 2016 (Fig. S2b), simulated DO was higher than observed at the bottom (Fig. S2c), and simulated nutrients were higher than observed at the surface ( Fig. S2d-f). Comparisons of monthly average pH T from the FVCOM biogeochemical model with pH T retrieved by the ANN model suggested that the ANN model can potentially provide a more accurate pH T (Fig. S3). The possible reason was that the carbonate system from the Changjiang biology FVCOM was not optimized due to challenges obtaining sufficient boundary information.
Considering the discreteness and discontinuity of the sampling sites, we compared pH T retrieved by the ANN model using the Changjiang biology FVCOM output with the corresponding observations at some sites with repeated sampling for 3 to 4 years. These sites were A1-5 (32.2145 • N, 123.0140 • E), A1-6 (32.2679 • N, 123.2750 • E), A6-7 (30.7050 • N, 122.9880 • E), A6-9 (30.5723 • N, 123.4990 • E), A7-5 (30.2523 • N, 123.4990 • E) and A8-5 (29.9940 • N, 123.4930 • E). Overall, the retrieved pH T agrees well (within the ANN model accuracy: ANN ± RMSE) with the observed values at the surface, except for three samples in summer (Fig. 10). There are relatively large deviations (greater than the RMSE of 0.04) in August 2013 at stations A1-5 and A6-9, and in July 2016 at station A8-5. To illustrate the application performance in the water column, a scatterplot of retrieved pH T vs. observations at six sites with repeated sampling for 3 to 4 years (Fig. 11) showed that the ANN model predicted pH T with a RMSE of 0.05 and R 2 of 0.71.
We further compared monthly pH T retrieved by the ANN model using the Changjiang biology FVCOM output with in situ measured pH T values (Fig. 12). The agreement is good (within the ANN model accuracy: ANN ± RMSE) here in winter, but large deviations (greater than the RMSE of 0.04) appear in summer. The reduced performance in summer can be attributed in large part to a reduced performance of the Changjiang biology FVCOM in predicting summertime input variables S and DO, and nutrients (Fig. S2).

Spatial and temporal patterns of ANN-derived pH T
The temporal and spatial variations of monthly surface pH T from 2000 to 2016 based on Changjiang biology FVCOM output are shown in Fig. 13. During the dry season (November to March of the next year), pH T values vary from ∼ 7.62 to ∼ 8.24. Relatively higher pH T values are found in the southeast of the study area , whereas lower pH T values are found in the northeast of the study area. During the wet season (April to October), pH T values vary from ∼ 7.77 to ∼ 8.35, and water of higher pH T corresponds well to the seasonal dispersion of the Changjiang Diluted Water (Chou et al., 2009. Water of higher pH T is found in the center of the study area during April, spreads to the southwestern part of the study area (along the coast of China) during May and June, and shifts to the northeastern part of the study area during August. In September and October, water of higher pH T is found in the southeastern part of the study area, strongly influenced by the Taiwan Warm Current (Qu et al., 2015). A clear seasonality is that surface pH T gradually increases during spring (March to May), after which it gradually decreases during summer and fall (June to November) (Fig. 14). The surface pH T displays its maximum in May and minimum  in December, and the pH T varies seasonally by up to ∼ 0.3. Larger changes in pH were also discovered on the Washington shelf; the pH varied ∼ 1.0 over the seasons and ∼ 1.5 spanning 8 years (Wootton et al., 2008). Accordingly, seasonal dynamics of surface pH T can be mainly attributed to temperature changes and strong biological activities (production and respiration processes) over the season. From March to June, a rapid increase in surface pH T indicates that production increases faster than respiration, which can be reflected in the drop in surface phosphate (Fig. S5d) and apparent oxygen utilization (AOU) (Fig. S5c). It may be driven by the Changjiang discharge (Fig. S4), which carries large amount of nutrients, resulting in stronger primary production in warm seasons under the combined action of nutrients and suitable temperature . From July to October, although surface temperature remains at a high level (Fig. S5a), the rise in surface AOU (Fig. S5c) suggests a decrease in primary production or increase of respiration, which leads to a gradual drop in surface pH T (Wootton et al., 2012). It implies that respiration processes dominate relative to primary production during summer and fall.

Summary and conclusions
We have developed an artificial neural network (ANN) model, demonstrated its reliability and used it to retrieve monthly pH T for the period 2000-2016 on the East China Sea shelf. We trained this ANN model using 11 cruise datasets from 2013 to 2017. In order to choose the optimal architecture of the ANN model, we tested different training and transfer functions, the number of neurons in two hidden layers and different combinations of input variables. We also validated the reliability of the ANN model with a root mean square error accuracy of 0.04 using three cruises in 2018 as an exploratory dataset. The ANN model responded to temperature and dissolved oxygen errors in a positive way and to salinity errors in a negative way, and it was most sensitive to salinity errors, followed by dissolved oxygen and temperature errors. We also retrieved monthly average pH T using the ANN model in combination with input variables from the Changjiang biology Finite-Volume Coastal Ocean Model (FVCOM).
The approach has several potential applications. First, it can provide estimates of seawater pH T with known accuracies for the East China Sea shelf and the period 2013-2018. Within this region the model could be used as a cost-effective way to handle restrictions of marine observations conducted from ships, such as coarse resolution and undersampling of carbonate system variables. Second, while the ANN model is not a replacement for direct measurements of the carbonate system, it may be a valuable tool for understanding the seasonal variation of pH T in poorly observed regions. Third, this approach can be applied to other regions to predict pH by suitably adapting the input variables and network structure using a local dataset. The MATLAB code used in this study to develop and apply the ANN model is freely available and is accompanied by a README file providing detailed guidance on how to use and adapt the code.
Code and data availability. MATLAB code of the ANN model for pH T estimation and datasets are available at https://doi.org/10.5281/zenodo.3519219 (Li, 2019a). The monthly average input variables (T , S, DO, N, P, Si) from the Changjiang biology Finite-Volume Coastal Ocean Model, retrieved pH T values from 2000 to 2016 on the East China Sea shelf and the data from three cruises during 2018 used to evaluate the ANN model are available at https://doi.org/10.5281/zenodo.3519236 (Li, 2019b). Requests to access the raw data should be directed to Richard Bellerby: richard.bellerby@niva.no. Six stations with repeated sampling for 3 to 4 years and corresponding retrieved pH values from the Changjiang biology FVCOM output are available at https://doi.org/10.5281/zenodo.3491747 (Li, 2019c).
Author contributions. XL, PW, and RGJB contributed to the development of methodology and the design of the model. JG provided 10 cruise datasets from 2013 to 2017 and the input variables from the Changjiang biology Finite-Volume Coastal Ocean Model data.