Development of a semi-parametric PAR (Photosynthetically Active Radiation) partitioning model for the United States, version 1.0

A semi-parametric PAR diffuse radiation model was developed using commonly measured climatic variables from 108 site-years of data from 17 AmeriFlux sites. The model has a logistic form and improves upon previous efforts using a larger data set and physically viable climate variables as predictors, including relative humidity, clearness index, surface albedo and solar elevation angle. Model performance was evaluated by comparison with a simple cubic polynomial model developed for the PAR spectral range. The logistic model outperformed the polynomial model with an improved coefficient of determination and slope relative to measured data (logistic: R2 = 0.76; slope= 0.76; cubic: R2 = 0.73; slope= 0.72), making this the most robust PARpartitioning model for the United States currently available.


Introduction
Photosynthetically active radiation (PAR) is the 0.4-0.7 µm spectral range that is absorbed by plants and drives the process of photosynthesis (McCree, 1972).Photosynthetically active radiation at the ground surface has two primary incoming streams, diffuse and direct, which are significantly affected by the amount of clouds and aerosols in the atmosphere.These two radiant components differ in the way they transfer energy through plant canopies, thus affecting canopy photosynthesis processes differently than what would occur at the leaf scale (Misson et al., 2005).Increased diffuse PAR fraction (the ratio of diffuse or isotropic PAR to total PAR (diffuse + direct beam)) in the atmosphere has been correlated with higher light-use efficiency and increased CO 2 assimilation (e.g., Weiss and Norman, 1985;Gu et al., 1999Gu et al., , 2002Gu et al., , 2003;;Knohl and Baldocchi, 2008;Mercardo et al., 2009;Still et al., 2009).Many of these studies utilize models of diffuse radiation (usually in the 0.15 to 4.0 µm shortwave range) to estimate the diffuse fraction rather than direct measurements.
Diffuse PAR can be estimated from models that range in complexity from spectral parameterization schemes like SPCTRAL2 (Bird and Riodan, 1986) and SMARTS2 (Gueymard, 1995) to simple linear regression models relating diffuse radiation fraction to extra terrestrial PAR (Hassika and Berbigier, 1998;Tsubo and Walker, 2005).Jacovides et al. (2009) developed a third-order polynomial model after applying a 25-point moving average on clearness index (k tp ) (the ratio of global irradiance to extraterrestrial irradiance) data collected over a three-year period over Athens, Greece.Butt et al. (2010) used a proxy cloud fraction (ratio of calculated total solar irradiance at a surface to the measured) to estimate diffuse PAR fraction.
Most diffuse fraction models are developed for global solar irradiation and very few models are developed from PAR data sets.The models developed for global solar radiation have been used in studies to convert the diffuse global solar irradiance into diffuse PAR fractions (e.g., Gu et al., 2002).Regression-type models of diffuse shortwave radiation usually employ linear (e.g., Orgil and Hollands, 1977;Reindl et al., 1990), logistic (Boland et al., 2001;Ridley et al., 2010) or higher-order polynomial type (e.g., Erbs et al., 1982;Spitters et al., 1986;Chandrasekaran and Kumar, 1994;De Miguel et al., 2001;Oliveria et al., 2002;Jacovides et al., 2006) equations relating clearness index (k tp ) to estimate the diffuse fraction (k dp ).Reindl et al. (1990) used multiple regression analysis and identified air temperature, dew point and sine of the solar elevation angle as important parameters determining the partitioning of total irradiance into diffuse and direct components.Solar elevation angle and clearness index were used as inputs in models developed by Maxwell (1987) and Skartveit and Olseth (1987).Other parameters used in modeling the diffuse fraction include dew point temperature, albedo and hourly variability index (root mean square difference between the clearness index of an hour in question with respect to its preceding and succeeding hour; e.g., Perez et al. (1992) and Skartveit et al. (1998)).The Boland-Ridley-Lauret (BRL) model (Ridley et al., 2010) uses hourly clearness index, apparent solar time, solar elevation angle, daily clearness index and a persistence index similar to the variability index to calculate the diffuse fraction.Muneer and Munawwar (2006) used sunshine fraction, cloud fraction and air mass along with clearness index in predicting the diffuse fraction of global irradiance.
The objective of our study is to develop a simple semiparametric diffuse PAR model applicable for the US, employing the AmeriFlux (Hargrove et al., 2003) data set of above-canopy observations that have high spatiotemporal resolution.Development of such a model will aid future investigations of the effect of diffuse radiation on photosynthesis and light-use efficiency in response to climate.Although diffuse radiation is not regularly measured at all AmeriFlux sites, multiple-year records from 17 sites are available for model development.The model presented here is developed with a data set that is larger and more temporally and spatially diverse than any previous efforts, making it the most robust and broadly applicable diffuse PAR model developed to date.The model development is based on the BRL model as the logistic relationship used in this shortwave diffuse radiation model can be adopted for the PAR diffuse fraction but with more pertinent drivers.The model is primarily intended for aiding researchers in understanding ecosystem response in terms of carbon and energy exchange in relation to the diffuse PAR fraction with data recorded at the site.

Methodology and data analysis
The data set used for model development and testing consists of multiple-year records of the PAR and diffuse fraction obtained from the AmeriFlux network.A detailed description of the sites utilized in this study is presented in Table 1.The sites selected consist of forested ecosystems, grasslands, shrublands and croplands, covering a wide latitude range (35-70 • N).The geographical location of the sites is presented in Fig. 1 in the form of a map.Sites which are close to one another may appear as single points on the map due to the resolution of the map.The diffuse fraction data are mostly obtained using the BF3 sensor (Delta-T devices, Cambridge, UK) or a version of it.The BF3 sensor uses an array of photodiodes with a shading pattern that provides shade to some of the photodiodes while others remain exposed.This instrument has a resolution of 1 µmol m −2 s −1 and an accuracy of 15 %.The data from BF3 sunshine recorders have been used in other studies relating cloud fraction to diffuse fraction (Butt et al., 2001).
For our study, data collected when solar elevation angles were < 10 • were removed to avoid cosine response issues.Although the data set contained records in hourly and half-hourly formats, we averaged data to obtain hourly values for consistency.The hourly radiation values were checked against the quality controls proposed by the European Commission's Daylight project.This quality control eliminates data points based on the following criteria: where R d is the total diffuse radiation, R S is the total incoming solar irradiance, R E is the extra terrestrial irradiance and R b is the direct normal irradiance.
Data points were eliminated when hourly rainfall values were greater than 5 mm, relative humidity values were 100 % or when dew point exceeded air temperature as under these conditions, the measurement accuracy might be affected by water droplets formed on the sensor.Outliers were removed visually after the initial quality check so as to remove bad data which could occur due to electronic noise or instrument malfunction that could produce physically impossible values.After implementing the quality control check, the data set consisted of 293 725 hourly records from 108 site-years.
Extraterrestrial PAR (R EP ) was calculated with solar elevation angle at a location according to where R C is the solar constant (2776.4µmol m −2 s −1 , Spitters et al., 1986); sin β is the sine of the solar elevation angle; and t d is the day number since 1 January.
The logistical form of the model has been identified as more robust than previously published piecewise-linear or other nonlinear forms (Boland et al., 2001(Boland et al., , 2008)).The goal of our work is to develop a model that is constrained by more commonly measured micrometeorological variables rather than estimated variables like persistence index.The important factors considered in this study are PAR clearness index (k tp ), relative humidity (RH), albedo (α) and sine of solar elevation angle (sin β).Clearness index is widely used in one-predictor models for PAR partitioning (Jacovides et al., 2009) as it is directly related to cloud fraction.Relative humidity is positively related with cloud cover (Walcek, 1994) and a greater diffuse fraction is often associated with higher humidity values.The effect of relative humidity on the relationship between k tp and k dp observed in our data set is presented in Fig. 1a.The data are binned into linearly spaced bins of relative humidity classes and they indicate increased diffuse PAR fractions associated with higher relative humidity classes.Increased surface albedo resulting from changes in canopy reflectance or presence of snow can alter the diffuse fraction estimates.Skartveit et al. (1998) proposed a correction for clearness index estimation to account for the multiple reflections occurring between the surface and instrument dome when albedo is over 0.15.However, in this study we consider albedo as a contributing factor to the diffuse fraction as multiple reflections between the surface and clouds can enhance the diffuse fraction available for photosynthesis (Campbell and Norman, 2008;Knohl and Baldocchi, 2008;Winton, 2005).Albedo of most vegetated surfaces can reach up to 0.25 and can vary widely as a function of leaf area index, disturbance history and snow cover.The effect of surface albedo on the relationship between k tp and k dp is presented in Fig. 2b.The diffuse PAR fraction in general shows an increasing trend with increased albedo, but the trend shows some variations, probably due to the confounding effects of other factors.Increased albedo can result in the increased diffuse fraction for the same clearness index compared to lower albedo values.The PAR diffuse fraction model developed in this study takes the logistic form where z is given as and a, b, c, d and e are fitted empirical coefficients determined in our analysis.The empirical coefficients were obtained by fitting the model to the data set.The relationship presented in Eq. ( 3) tends to underestimate diffuse fraction under clear-sky conditions (Fig. 3a).As a correction, a second logistic fit is applied to the data for k tp > 0.78.The values of the coefficients for the logistic model along with their 95 % confidence intervals are presented in Table 2.The  model performance is compared with a one-predictor model developed by Jacovides et al. (2009).This model was selected for comparison as it was developed using data in the PAR spectral range and used a simple predictor (k tp ) that could be estimated for a large data set from multiple locations.This cubic polynomial model which relates diffuse PAR fraction as a function of smoothed PAR clearness index (moving average window size of 25) takes the following form after fitting to this data set: k dp = 0.8637 + 1.2699k tp − 5.6676k 2 tp + 3.8088k 3 tp . (5) The original cubic polynomial model had prescribed limits within which the model operated and constant values were assigned to k dp values for k tp values above and below a particular range.The modified cubic polynomial model presented in Eq. ( 5) is valid for 0.13 < k tp < 0.865, whereas for k tp ≤ 0.125, k dp = 0.9399 and k tp ≥ 0.862, k dp = 0.18675.These set points were chosen to provide a smooth transition from the inflection points in the model output.The model coefficients were estimated using a robust nonlinear regression method in MATLAB (Mathworks, Inc).The fit of data to the adjusted logistic model and the cubic model for the data set is presented in Fig. 3b and c.The percentage differences between the measured diffuse fraction k dp and modeled diffuse fraction k dpm is plotted in Fig. 4 as a function of each of the predictor variables in unequally spaced bins with an equal number of data points.
The model fits were assessed by randomly selecting onethird of the data as an evaluation data set for statistical analysis.The comparison between measured and modeled diffuse PAR for the logistic and cubical model for the evaluation data  set is provided in Fig. 5.The performance of both models was further compared by using a bootstrap regression between the measured and modeled diffuse fractions with a data re-sampling of 10 000 times to account for the errors in measuring the independent variable (measured diffuse fraction) from the evaluation data set.The results of the bootstrap regression comparison for the two models are presented in Table 3.The root mean square error percentage (RMSE %) (Jacovides et al., 2006) of the model fits to the evaluation data set is also presented in Table 3.The influence of seasonality on the logistic model accuracy was examined by plotting the RMSE (%) and R 2 of the regression between measured and modeled values as a function of the particular months (Fig. 6) for the entire data set.Since seasonality can influence the model fit, the logistic model was fit to the entire data set by classifying the data into the four different seasons.The seasons were classified as summer (20 June to 21 September), fall (22 September to 20 December), winter (21 December to 19 March) and spring (20 March to 19 June).This enabled the development of seasonal model empirical coefficients, which are presented in Table 4.The model fit for the different sites is also presented by plotting the RMSE (%) and R 2 of the regression between the measured and modeled values for the various sites (Fig. 7).The sites are arranged on the x axis on an increasing latitudinal gradient and the figure illustrates the model fit across the sites.

Discussion
The multi-parameter logistic model predicts different diffuse fractions for the same clearness index for different combinations of albedo, solar elevation angle and relative humidity.The percentage difference between the measured and modeled diffuse fraction generally indicate an underestimation by

Figure 1 .
Figure 1.Location of sites presented on the USA map.Many sites which are closer together may appear as a single point on the map.

Figure 2 .
Figure 2. Relative humidity and albedo effects on the k tp −k dp relationship.

Figure 3 .
Figure 3. Model fit for the proposed multi-parameter logistic model (a and b) and cubic model (c).Panel (a) represents the initial fit to the logistic form and panel (b) indicates the modification to the initial logistic fit with a second logistic fit.

Figure 4 .
Figure 4. Percentage differences between measured and modeled diffuse radiation as a function of predictor variables.

Table 3 .
Model performance comparison using regression analysis.The values given in the brackets are the standard error of the estimates obtained by resampling evaluation data 10 000 times.The root mean square error estimate from the measured and modeled values is also presented.The largest differences are associated with clearness index values around 0.67, albedo values of 0.24, moderate relative humidity (between 50-60 %) and solar elevation angles of 46 • (Fig.3).The logistic model thus produces the largest errors under moderately clear-sky conditions, during the late morning and afternoon periods and when the atmosphere has moderate humidity.The PAR clearness index values close to 0.67 represents a clear-sky condition above which the diffuse PAR fraction stays constant with increasing total PAR.The inability of the model to accurately capture this behavior results in large errors around this clearness index threshold.Further higher PAR clearness index values indicate low diffuse PAR fraction levels, which along with the above-mentioned PAR clearness index threshold can lead to uncertainties in the measurement of the diffuse PAR fraction by the sensor.The albedo value of 0.24 produced the largest errors as this is in the range of most vegetated surfaces and hence other confounding factors contribute to model errors around this albedo range.The cubic polynomial model evaluated in this study produces the largest errors during periods of high solar elevation angle, in contrast to the original model, which exhibited maximum error during the early morning/late evening hours(Jacovides et al., 2010).The cubic polynomial model percentage errors showed a similar behavior in relation to the clearness index and albedo as the logistic model, but produced the largest errors under low humidity in contrast with the logistic model.The regression analysis indicates better performance of the logistic model over the cubic model, with a higher slope, lower intercept and a larger coefficient of determination (R 2 ) (Table3and Fig.5).The RMSE (%) values also indicate a comparatively lower error for the logistic model (30.59 %) compared to the cubic polynomial model (32.68 %).The errors in the developed model could be attributed to other confounding factors such as seasonal effects, changes in atmospheric turbidity caused by air pollution or aerosol loading and location differences.The fact that a combined data set from different locations was used in this study can lead to minimization of the dependence of the k dp -k tp correlation to local conditions(Jacovides et al., 2006).The model coefficients developed over the seasons are similar in nature, and the fit

Figure 5 .
Figure 5.Comparison between measured and modeled diffuse PAR (a) logistic model (b) cubic polynomial model.The regression statistics presented are for the bootstrap regression between the measure and modeled variables.All units are in µmol m −2 s −1 .

Figure 6 .
Figure 6.Logistic model performance in terms of RMSE (%) and R 2 over every month of the year.

Figure 7 .
Figure 7. Logistic model performance in terms of RMSE (%) and R 2 over the various sites.The sites are arranged on the x axis following an increasing latitudinal gradient from left to right.

Table 1 .
Site location and ecosystem-type information.
Ridley et al. (2010)he model developed here is similar in structure to the multi-predictor logistic model (BRL) developed byRidley et al. (2010)for global solar irradiance, except we use additional predictors that directly affect the diffuse fraction, and we also use a considerably larger data set.The predictors in the BRL model include daily clearness index (K t ), sine of the solar elevation angle (sin β), persistence index (ψ) and apparent solar time (AST).

Table 2 .
Logistic model coefficients for clearness index classes.The values given in brackets are the 95 % confidence interval.

Table 4 .
Logistic model coefficients for clearness index classes for each season.The values given in brackets are the 95 % confidence interval.The R 2 and the RMSE obtained by comparing the model output to the observed data is also provided.