Geoscientific Model Development
Vegetation height and cover fraction between 60° S and 60° N from ICESat GLAS data

We present new coarse resolution (0.5° × 0.5°) vegetation height and vegetation-cover fraction data sets between 60° S and 60° N for use in climate models and ecological models. The data sets are derived from 2003–2009 measurements collected by the Geoscience Laser Altimeter System (GLAS) on the Ice, Cloud and land Elevation Satellite (ICESat), the only LiDAR instrument that provides close to global coverage. Initial vegetation height is calculated from GLAS data using a development of the model of Rosette et al. (2008) with further calibration on desert sites. Filters are developed to identify and eliminate spurious observations in the GLAS data, e.g. data that are affected by clouds, atmosphere and terrain and as such result in erroneous estimates of vegetation height or vegetation cover. Filtered GLAS vegetation height estimates are aggregated in histograms from 0 to 70 m in 0.5 m intervals for each 0.5° × 0.5° cell. The GLAS vegetation height product is evaluated in four ways. Firstly, the vegetation height data and data filters are evaluated using aircraft LiDAR measurements of vegetation height for ten sites in the Americas, Europe, and Australia. Application of filters to the GLAS vegetation height estimates increases the correlation with aircraft data from r = 0.33 to r = 0.78, decreases the root-mean-square error by a factor of 3 to about 6 m (RMSE) or 4.5 m (68 % error distribution) and decreases the bias from 5.7 m to −1.3 m. Secondly, the global aggregated GLAS vegetation height product is tested for sensitivity towards the choice of data quality filters; areas with frequent cloud cover and areas with steep terrain are the most sensitive to the choice of thresholds for the filters. The changes in height estimates by applying different filters are, for the main part, smaller than the overall uncertainty of 4.5–6 m established from the site measurements. Thirdly, the GLAS global vegetation height product is compared with a global vegetation height product typically used in a climate model, a recent global tree height product, and a vegetation greenness product and is shown to produce realistic estimates of vegetation height. Finally, the GLAS bare soil cover fraction is compared globally with the MODIS bare soil fraction (r = 0.65) and with bare soil cover fraction estimates derived from AVHRR NDVI data (r = 0.67); the GLAS tree-cover fraction is compared with the MODIS tree-cover fraction (r = 0.79). The evaluation indicates that filters applied to the GLAS data are conservative and eliminate a large proportion of spurious data, while only in a minority of cases at the cost of removing reliable data as well. The new GLAS vegetation height product appears more realistic than previous data sets used in climate models and ecological models and hence should significantly improve simulations that involve the land surface.


Introduction
Global biophysical parameters such as the fraction of absorbed photosynthetically active radiation (fAPAR) and leaf area index (LAI) are essential parameters in calculating fluxes in the global carbon cycle, water cycle and energy budget. They are closely linked to the amount of solar radiation absorbed and scattered by the vegetation canopy and can be estimated from data collected by passive optical radiometers that measure in visible and near-infrared wave bands. Examples of these sensors collecting global data are the Advanced Very High Resolution Radiometer (AVHRR; August 1981–present), the Sea-viewing Wide Field-of-view Sensor (SeaWiFS; September 1997–December 2010), the Système Pour l'Observation de la Terre Vegetation instrument (SPOT-VGT; April 1998–present), the Along Track Scanning Radiometer (ATSR-2 and AATSR; June 1995–present) and the Moderate Resolution Imaging Spectroradiometer (MODIS; February 2000–present); see e.g. Sellers et al. (1996); Myneni et al. (2003); Gobron et al. (2005). However, these sensors are not particularly suitable to obtain estimates of biophysical parameters linked to canopy structure (e.g. vegetation height, aboveground biomass, canopy inflection point and stem diameter), although there are approaches that exploit indirect relationships between measurements such as the Normalized Difference Vegetation Index (NDVI) and biomass with some degree of success for particular biomes (Tucker et al., 1986; Prince, 1991; van der Werf et al., 2006). Knowledge of structural vegetation parameters is, for example, essential to assess the amount of carbon stored in vegetation, to improve modelling of light absorption and scattering through the canopy and of photosynthesis (Alton et al., 2005) and to model the wind profile at the surface, which affects the exchange of water and carbon between the land and atmosphere (Sellers et al., 1996).
Published by Copernicus Publications on behalf of the European Geosciences Union. S. O. Los et al.: Vegetation height between 60° S and 60° N from GLAS.
A problem with using passive optical sensors to infer canopy structure is that different canopy structures can lead to the same spectral and bidirectional response; the inversion of biophysical parameters in these cases is a non-unique problem with more than one solution and this inhibits unambiguous estimation of canopy parameters. An active optical sensor, such as the Geoscience Laser Altimeter System (GLAS) on the Ice, Cloud and land Elevation Satellite (ICESat), emits a light pulse of known intensity and duration (Zwally et al., 2002; Brenner et al., 2003). The pulse is transmitted, absorbed and scattered at various depths throughout the vegetation canopy by leaves and branches and the returned waveform therefore provides information on canopy structure and height (Drake et al., 2003; Lefsky et al., 2005; Rosette et al., 2008). Compared to active microwave (RADAR) instruments, spaceborne LiDAR has the ability to obtain vegetation parameters at much higher biomass levels (Drake et al., 2003; Waring et al., 1995) but is also more sensitive to atmospheric interference by clouds, water vapour and aerosols (Spinhirne et al., 2005). Furthermore, interpretation of GLAS waveforms is not straightforward since the waveform is not only affected by the vegetation canopy, but also by other factors such as the occurrence of thin clouds and topography (Rosette et al., 2008; North et al., 2010; Rosette et al., 2010; Lee et al., 2011).
The objective of the present paper is to obtain a vegetation height and vegetation cover data set from the GLAS instrument for most of the land surface between 60° S and 60° N that can be used in global climate models and global ecological models. To estimate vegetation height from the GLAS data we use the vegetation height model developed by Rosette et al. (2008). The model was derived for a mixed forest in the United Kingdom over an area of moderate topographic complexity. Tree height was estimated with an accuracy (root mean square error) of about 4.5 m. The advantage of the vegetation height model for global applications is that vegetation height can be estimated directly from GLAS data without the requirement of a highly accurate high resolution digital elevation model (DEM).
To obtain a vegetation height data set for the land surface we set out to achieve the following four aims:
1. Test the vegetation height model by Rosette et al. (2008), derived for the Forest of Dean in the UK, to see if it has more general applicability. GLAS vegetation height obtained with the model is therefore compared with aircraft LiDAR measurements for ten sites with different tree-cover types (Sect. 4.1).
2. Develop and test data quality filters to screen GLAS data and thus reduce the effects of cloud contamination, aerosols and topography in estimates of vegetation height. Filters are obtained from the literature and from inspection of desert data (Sect. 3). The filters are tested on the same site data used to evaluate the vegetation height model (Sect. 4.1). The tests are applied to data collected for all GLAS laser campaigns.
3. Develop and test the derived near global (60° S–60° N) vegetation height product. Tests consist of a sensitivity analysis of global vegetation height fields to varying the thresholds of the data filters (Sect. 4.2) and of a comparison with other global vegetation data such as vegetation height (Sellers et al., 1996), tree height (Lefsky, 2010) and vegetation greenness (Los et al., 2000, 2005) (Sect. 4.3).
4. Derive bare soil fraction and tree cover fraction from the GLAS tree height product and compare this product with the MODIS vegetation-cover fraction estimates (Hansen et al., 2003, 2006) and the Fourier Adjusted, Sensor and Solar zenith angle corrected, Interpolated and Reconstructed (FASIR) vegetation-cover fraction estimates (Los et al., 2000, 2005) (Sect. 4.4).
A version of the data in NetCDF format is distributed as a Supplement to the present paper.

Data
We used the ICESat GLAS land data (GLA14) product, release 31 (Zwally et al., 2008; Brenner et al., 2003). GLAS emits a pulse waveform in the 532 or 1064 nm bands which is 1 m wide (corresponding to a duration of 5–6 ns) between the points where the signal is half the size of the maximum amplitude. The returned waveform is measured for a duration equivalent to a length of about 82 m at 15 cm intervals for the Laser 1A and 2A periods, and for an equivalent length of 150 m for the other periods (NSIDC, 2011). The footprint is an ellipse with dimensions of 95 by 52 m for the Laser 1A to 2C periods and 61 by 47 m for the other periods. The returned waveform contains various peaks which are fitted by up to 6 Gaussians (Fig. 1). The GLAS instrument collected data intermittently during 2003–2009, usually for 2 or 3 periods of about 1 month per year (Zwally et al., 2002; Harding and Carabajal, 2005). For the derivation of the filters we used data from the Laser 1A period; for testing the filters and the vegetation height model (Sect. 4) and for assembling the global vegetation height data we used data from all laser periods.
Table 1 provides a list of the GLA14 parameters. For easier processing, this subset of the GLA14 data is organised in 5° × 5° tiles which conform to the tiles of the SRTM version 4.1 data (Rodriguez et al., 2005; Jarvis et al., 2008). Data without geo-location, i.e. missing latitude and longitude values, are removed, as are data without a saturation elevation adjustment (GLAS quality flag i_satElevCorr > 2; see NSIDC, 2011, Sect. 3.2), since without this parameter it is not possible to calculate elevation. Data below 60° S and above 60° N are not analysed because two of the filters require SRTM data (Sect. 3.2).
The interpolated SRTM DEM version 4.1 distributed by the Consultative Group for International Agricultural Research Consortium for Spatial Information (CGIAR-CSI) (Jarvis et al., 2008; Rodriguez et al., 2005) was used to compare with the GLAS waveform reference elevation (i_elev) and to obtain an indication of the slope. The CGIAR-CSI data were used rather than the SRTM DEM data included in the GLA14 product because the agreement with the GLAS waveform reference elevation (i_elev) was closer.
The MODIS continuous fractional cover data (Hansen et al., 2003, 2006), FASIR Normalized Difference Vegetation Index (NDVI) and FASIR vegetation-cover fraction (Los et al., 2000, 2005) and global tree height data (Lefsky, 2010) were used to evaluate the vegetation height and vegetation cover fraction products derived in the present paper.
Aircraft LiDAR measurements of vegetation height from Canada, Peru, the United Kingdom, the Netherlands, Germany and Australia were used to test the GLAS vegetation height estimates and application of data quality filters. These globally distributed validation test sites incorporate boreal, temperate and tropical vegetation; managed and natural woodland; and varied canopy cover (e.g. sparse cover in the case of the Australian sites and near complete closure for the Peru site). The product is thus evaluated using a range of conditions including those known to be problematic for GLAS.
The Canadian sites, the former southern BOREAS study sites in Saskatchewan, consist of fairly homogeneous forested areas and flat topography with an aspen stand (Populus tremuloides Michx.), a black spruce stand (Picea mariana Mill.), an old jack pine site (Pinus banksiana Lamb.) and a re-grown jack pine site (Barr et al., 2006; Kljun et al., 2007). The Peru site is located in the Tambopata National Reserve and consists of dense mature forest, regenerating forest, part flood plain and wetland, in an area of flat topography (Hill et al., 2011). The UK sites are the Glen Affric and Aberfoyle sites, both measured by UK Forest Research. Glen Affric (Suárez et al., 2008) is an area of ancient woodland; it contains one of the largest ancient Caledonian pinewoods in Scotland. Common species are Scots pine (Pinus sylvestris), juniper (Juniperus communis), birch (Betula pubescens) and aspen (Populus tremula). The Aberfoyle site (Suárez, 2010) is a silviculture area where trees are planted and clearfelled in rotations of 40–60 yr. The dominant species is Sitka spruce (Picea sitchensis (Bong.) Carr.). At the Netherlands Loobos site, Scots pine (Pinus sylvestris) is the dominant species (89 %) and is planted on flat, sandy terrain with some open areas (Dolman et al., 2002). The German Tharandt site is a mixed forest stand with trees of different ages consisting mainly of spruce (Picea abies) with scattered pine (Pinus sylvestris) and European larch (Larix decidua) on undulating terrain (Grünwald and Bonhofer, 2007). The Australian data were collected 7 km east of Tumbarumba research station to coincide with the GLAS measurements. The area is located in Bago State Forest, New South Wales, and consists mainly of eucalyptus trees (Eucalyptus delegatensis R. T. Baker and Eucalyptus dalrympleana Maiden) in relatively complex terrain (Leuning et al., 2005).

Method
Estimation of vegetation height is based on the GLAS waveform (GLA14) data, version 31. Figure 1a illustrates the waveform data for a vegetated footprint. The returned waveform is the result of interaction of a light pulse emitted by the GLAS laser with a vegetation canopy and the ground surface. The GLAS GLA14 product contains parameters obtained from the raw waveform data such as the start and end of signal and the decomposition of the waveform by up to six Gaussians (Fig. 1b).

Estimating vegetation height
The accuracy of the estimation of vegetation height from GLAS waveforms is highly dependent on the ability to detect the uppermost canopy surface (the signal begin parameter) and a ground elevation which is representative of the terrain within the broad lidar footprint (Rosette et al., 2010). Regarding the latter, here we select the centroid of whichever of the first two Gaussians has the greater amplitude to represent the ground surface. The method is modified by calibration on desert sites (Sect. 3.2, Eq. 3). The limits of the waveform signal are determined using a threshold above the mean noise level (+4.5σ in the case of GLAS) (Brenner et al., 2003). The Signal Begin parameter within a waveform (i_SigBegOff) is assumed to represent the highest intercepted surface of the forest canopy. The certainty with which the Signal Begin can be placed is dependent on the gradient of the leading edge of the waveform (Lefsky et al., 2007; Hancock et al., 2011). The strength of the beginning of the waveform signal is a function of the intercepted surface area at this elevation plus its reflectivity and will vary with vegetation crown shape and surface roughness, canopy density, fractional cover and slope (e.g. if vegetation is uniformly distributed upon a sloped surface). Additionally, since the illumination of the pulse on the ground is Gaussian in form, the amplitude of the beginning of the waveform signal is also influenced by the distribution of vegetation within the footprint (Hyde et al., 2005), with tall vegetation towards the footprint limits contributing relatively less to the received waveform. The broad GLAS footprint poses challenges for the identification of the ground surface beneath a vegetation canopy. This is particularly the case upon sloped surfaces where vegetation and ground can occur at similar elevations, meaning that their signals are combined within the waveform. The accuracy of vegetation estimates from GLAS waveforms is therefore influenced by the conditions in mountainous environments (Hyde et al., 2005) and areas of low stature vegetation (Nelson, 2010). The necessity of allocating a single, representative ground elevation within a waveform is more challenging for sites with complex topography and vegetation distribution.
Various approaches exist to obtain vegetation height estimates from GLAS waveform data. Here, we estimate vegetation height according to Rosette et al. (2008):

$$h_V = r_1 - r_{A_{1,2}} \qquad (1)$$

with $h_V$ the vegetation height; $r_1$ the signal start (i_SigBegOff); and $r_{A_{1,2}}$ the centroid range increment (i_gpCntRngOff) of whichever of Gaussians 1 and 2 has the maximum amplitude.
The equation was derived for the Forest of Dean in the UK, an area with complex topography and mixed broadleaf and needleleaf trees. The choice of the maximum of the first two Gaussians to represent the elevation of the ground surface reduces the effect of slope for areas of low to moderate topography (Rosette et al., 2008).
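The height estimate described above can be sketched in a few lines of Python. This is a minimal illustration rather than the processing code used for the product: the function and argument names are hypothetical, the Gaussians are assumed to be ordered so that the first two list entries are Gaussians 1 and 2, and the absolute difference is taken so that the sketch does not depend on the sign convention of the GLA14 range offsets (an assumption).

```python
def vegetation_height(sig_begin, centroids, amplitudes):
    """Height as signal start minus the ground centroid (sketch of Eq. 1).

    sig_begin  -- signal begin range offset in metres (hypothetical input)
    centroids  -- centroid range offsets of the fitted Gaussians, Gaussian 1 first
    amplitudes -- amplitudes of the same Gaussians, in the same order
    """
    # Ground is represented by whichever of the first two Gaussians
    # has the larger amplitude.
    n = min(2, len(centroids))
    ground_idx = max(range(n), key=lambda i: amplitudes[i])
    # Assumption: abs() makes the result independent of offset sign convention.
    return abs(sig_begin - centroids[ground_idx])


# Gaussian 2 (amplitude 0.9) is stronger than Gaussian 1 (amplitude 0.4).
h = vegetation_height(-30.0, [-2.0, -5.5, -20.0], [0.4, 0.9, 0.2])
print(h)  # 24.5
```

Using the stronger of the two lowest Gaussians, rather than always the lowest one, is the design choice the text credits with reducing slope effects in low to moderate topography.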

Data filters
The tests below are intended to detect and eliminate spurious values, e.g. high vegetation height values over deserts, from the GLAS data. Where possible, thresholds for the data filters rely on error estimates from the peer reviewed literature. In cases where no estimates are available, the thresholds rely on visual interpretation of the data. A test of the filtered GLAS product is carried out in Sect. 4.1 and a sensitivity analysis of the filters in Sects. 4.2 and 4.4.
To design the filters for identification of spurious data, GLAS data from a desert site are explored. Vegetation height estimates for deserts should as a general rule be low; high values therefore indicate problems in the GLAS data. Occurrences of spurious, high vegetation height values are compared with other measures such as slope, the difference between the GLAS waveform reference elevation (i_elev) and the elevation indicated by a DEM, and the strength of the GLAS signal.
GLAS data from a 5° × 5° tile between 20° N–25° N and 0°–5° E are analysed; this tile covers a desert area with the northern part located in Algeria. Data collected over 41 days in February 2003 and March 2003 during the Laser 1A period are investigated (51 270 GLAS shots). The location of the data is shown in Fig. 2a. The waveform reference elevation (i_elev) measured by the GLAS instrument and the elevation in the SRTM DEM version 4.1 data (Rodriguez et al., 2005; Jarvis et al., 2008) are compared in Fig. 2b as a function of latitude. The waveform reference elevation (i_elev) is adjusted to match the SRTM ellipsoid using Eq. (2). The parameter names i_elev (the reference position of the waveform), i_satElevCorr and i_gdHt indicate records of the GLAS data (Table 1); a further description of these records and of elevation calculations can be found in Zwally et al. (2002) and the GLAS on-line documentation provided by the National Snow and Ice Data Center (http://nsidc.org/). Vegetation height as a function of latitude is shown in Fig. 2c. High vegetation height estimates are found in areas where the topography changes rapidly; note, for example, that the spikes in vegetation height in Fig. 2c occur in the same location as the spikes in topography in Fig. 2b. Thus a first inspection of the data indicates that a large proportion of high vegetation height values are spurious.

Slope test (Fig. 2d)
Slopes affect the GLAS waveform; the waveform from a slope without vegetation can look similar to that of a vegetation canopy over a flat surface (North et al., 2010; Rosette et al., 2010). Using the SRTM DEM 4.1 data, an approximation of the slope was calculated as the maximum of the 8 slopes between the grid cell for which the GLAS measurement was collected and its 8 surrounding neighbours. The grid cell size of the SRTM DEM 4.1 data is 90 m; thus in areas with variations in terrain at shorter lengths the SRTM slope will underestimate the topographic variations within the 50 to 60 m footprint most commonly produced by GLAS.
Grid cells with a slope exceeding 10° (17 %) were removed from further analysis. Based on theoretical grounds and analysis of the desert data, a threshold of a 10° slope appears a reasonable compromise between retaining a sufficient proportion of the signal and avoiding erroneous values (Nelson et al., 2009; North et al., 2010; Rosette et al., 2010). Figure 2d indicates that for a slope < 17 % both realistic low values and spurious high values are collected, whereas for a slope > 17 % a very low number of realistic values and a very large number of spurious high values for vegetation height are found.
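The slope test can be illustrated with a short Python sketch that takes the maximum slope between a DEM cell and its 8 neighbours and applies the 10° threshold. The function names and the nested-list DEM layout are hypothetical, and scaling the diagonal neighbour distance by √2 is an assumption, since the text does not state how diagonal neighbours were treated.

```python
import math

CELL_M = 90.0  # SRTM DEM 4.1 grid cell size quoted in the text


def max_neighbour_slope_deg(dem, row, col, cell=CELL_M):
    """Maximum slope (degrees) between a cell and its (up to) 8 neighbours."""
    z0 = dem[row][col]
    best = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            if 0 <= r < len(dem) and 0 <= c < len(dem[0]):
                # Assumption: diagonal neighbours are sqrt(2) cells away.
                dist = cell * math.hypot(dr, dc)
                slope = math.degrees(math.atan(abs(dem[r][c] - z0) / dist))
                best = max(best, slope)
    return best


def passes_slope_test(dem, row, col, max_deg=10.0):
    """Keep a shot only if the local slope does not exceed the threshold."""
    return max_neighbour_slope_deg(dem, row, col) <= max_deg


flat = [[100.0] * 3 for _ in range(3)]
steep = [[100.0, 100.0, 100.0], [100.0, 100.0, 116.0], [100.0, 100.0, 100.0]]
print(passes_slope_test(flat, 1, 1))   # True
print(passes_slope_test(steep, 1, 1))  # False
```

A 16 m rise over one 90 m cell corresponds to a gradient of about 17.8 %, just above tan(10°) ≈ 17.6 %, which is why the second example is rejected.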

Elevation test (Fig. 2e)
The GLAS waveform reference elevation (Eq. 2) is compared with the SRTM DEM version 4.1. It is assumed that large differences between the SRTM DEM version 4.1 data and the GLAS waveform reference elevation (i_elev) indicate problems in either data set. For the area shown in Fig. 2a, the root mean square error (RMSE) between GLAS and the SRTM 4. (Rodriguez et al., 2005; Jarvis et al., 2008). The difference between SRTM and GLAS elevation appears small and unbiased, although the root-mean-square error increases with topographic roughness and vegetation density (Carabajal and Harding, 2006). The errors in SRTM elevation include an error for geo-location (i.e. no adjustment for geo-location was made). Based on Rodriguez et al. (2005) and our analysis of the Sahara desert we set a threshold at 8 m, approximately the 95 % confidence interval; data are deemed spurious and are eliminated when the difference between the GLAS elevation and SRTM DEM version 4.1 data is larger than 8 m (Fig. 2e). In cases where dense canopy exists, the SRTM data and GLAS waveform reference elevation (i_elev) are affected by the dense canopy and may represent an elevation value about half way in the canopy; for these cases the 95 % range of the error distribution in both is likely larger than 8 m (Carabajal and Harding, 2006) and the elevation test may therefore be too conservative. Whether or not this is a problem is further investigated in the analysis of the site data from Peru (Sect. 4.1) and the comparison of vegetation height in tropical forests found in this study with values found in other studies (Sect. 4.3).
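As a minimal sketch, the elevation test reduces to a single comparison per shot. The function name is hypothetical, and the GLAS elevation passed in is assumed to have already been adjusted to the SRTM reference as described in the text.

```python
def passes_elevation_test(glas_elev_m, srtm_elev_m, max_diff_m=8.0):
    """Keep a shot only if GLAS and SRTM elevations agree to within 8 m.

    Assumption: glas_elev_m has already been adjusted to the SRTM reference.
    """
    return abs(glas_elev_m - srtm_elev_m) <= max_diff_m


print(passes_elevation_test(412.0, 405.0))  # True
print(passes_elevation_test(430.0, 405.0))  # False
```

Exposing the threshold as a parameter makes it straightforward to repeat the analysis with a relaxed value over dense forest, where the text suggests 8 m may be too conservative.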

Area under first Gaussian test (Fig. 2f)

Refinement of the height model
Estimates of vegetation height in the present paper use the difference between the start of signal and the centroid range increment of the first or second Gaussian (Rosette et al., 2008). The returned waveform will always have a measurable width even in cases where no vegetation is present because of the duration of the emitted signal, the atmospheric attenuation of the signal and the reflection of the signal from a surface that is rarely completely flat. The implication is that for bare soil a small difference between the signal start and the centre of the first Gaussian is found and this translates into an equivalent estimate of vegetation height. In Fig. 2f the estimated vegetation height is plotted as a function of the area under the first Gaussian (in units of V ns, i.e. volt × nanosecond) to obtain an indication of the magnitude of the effect. Figure 2f shows that, as the area under the first Gaussian increases, the estimate for the minimum vegetation height increases. It is assumed that the 5 % values of the height distributions (per interval of 0.1 V ns on the x-axis) provide an indication of the magnitude of the effect. A line is fitted and the estimated vegetation height (Eq. 1) is subsequently adjusted according to:

$$h_{0.05} = a + b A \qquad (3)$$

with $A$ the area under the first Gaussian (V ns) and fitted coefficients $a = 1.91$ and $b = 0.11$ estimated from about 1400 5 % values. The value for $h_{0.05}$ is subtracted from all GLAS vegetation height estimates.
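The adjustment can be sketched as follows, assuming that the fitted line has the form h_0.05 = a + b·A with a as the intercept and b as the slope. That form is an assumption based on the statement that a line is fitted; the function name is hypothetical, and no clamping of negative results is applied because the text does not describe any.

```python
A_COEF = 1.91  # fitted coefficient a from the text (assumed intercept, in m)
B_COEF = 0.11  # fitted coefficient b from the text (assumed slope, in m per V ns)


def adjusted_height(h_v, area_first_gaussian):
    """Subtract the bare-ground waveform width h_0.05 from the raw height."""
    h_005 = A_COEF + B_COEF * area_first_gaussian
    return h_v - h_005


# A raw height of 10 m with a first-Gaussian area of 10 V ns.
print(round(adjusted_height(10.0, 10.0), 2))  # 6.99
```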

Filter based on area under the first Gaussian
Figure 2f reveals a second potential problem: for low values of the area under the first Gaussian, the spread in estimated vegetation height is large. The higher values in this interval are likely to be unrealistic. A likely cause is that low values for the area under the first Gaussian indicate weak signal strengths, possibly caused by attenuation of the signal in the atmosphere or by low energy emitted. The latter problem occurred frequently during the last two years of the ICESat mission (Lefsky, 2010). A threshold is applied to eliminate values with low first Gaussian areas. Because a low area under the first Gaussian can also occur for vegetation with a dense canopy or multiple scattering delaying the signal response, the threshold cannot be too large so as not to eliminate values from tall, dense vegetation. As a compromise a value of 1 V ns was selected.
Amplitude of first Gaussian test (Fig. 2g)
A low amplitude of the first Gaussian indicates a data quality problem similar to the low area under the first Gaussian. The ability to separate the true returned waveform start and end from the background noise is reduced. A test was implemented to eliminate data with low amplitude (Fig. 2g), here set at 0.05 V. Figure 2g indicates a number of outliers over the entire range of amplitudes. A second test was applied to eliminate the highest 0.1 % of values per amplitude interval of 0.1 V; these values appear as outliers in Fig. 2g.
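The two amplitude tests can be sketched as a threshold followed by a per-bin outlier cut. The function name and the (amplitude, height) tuple layout are hypothetical, and the sketch assumes that the "highest 0.1 % of values" refers to the vegetation height estimates within each 0.1 V amplitude interval.

```python
from collections import defaultdict


def amplitude_filter(shots, min_amp=0.05, bin_width=0.1, top_fraction=0.001):
    """shots: list of (first_gaussian_amplitude_V, height_m) tuples.

    Step 1 drops shots below the amplitude threshold; step 2 drops the
    highest 0.1 % of heights in each amplitude bin (assumed interpretation).
    """
    strong = [s for s in shots if s[0] >= min_amp]
    bins = defaultdict(list)
    for amp, height in strong:
        bins[int(amp // bin_width)].append((amp, height))
    kept = []
    for members in bins.values():
        members.sort(key=lambda s: s[1])
        n_drop = int(len(members) * top_fraction)
        kept.extend(members[:len(members) - n_drop])
    return kept


# 1000 shots in one amplitude bin plus one shot that is too weak.
shots = [(0.55, float(h)) for h in range(1000)] + [(0.01, 5.0)]
kept = amplitude_filter(shots)
print(len(kept), max(h for _, h in kept))  # 999 998.0
```

The area test described earlier follows the same pattern as step 1, with the area under the first Gaussian compared against 1 V ns.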

Sigma test (Fig. 2h)
Gaussians with a large spread (range between the 5 % and 95 % values over 80 m or so) are unlikely to be from vegetation, which only in exceptional cases reaches these heights. A test was applied to all Gaussians to remove waveforms with high sigma values. The threshold for the sigma test was calculated as the 99.9 % value; this test eliminates the data with the highest 0.1 % sigma values. The thresholds for this test were calculated from frequency distributions of the unfiltered data.

Neighbour test (Fig. 2i)
Finally, data were removed where the along-track neighbour on either side failed any of the above tests.
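A minimal sketch of the neighbour test, assuming the shots are held in along-track order with one pass/fail flag per shot. The function name is hypothetical, and treating the two ends of a track as having passing neighbours is an assumption not stated in the text.

```python
def neighbour_test(passed):
    """passed: list of booleans in along-track order (True = passed all tests).

    A shot is kept only if it passed and both along-track neighbours passed.
    Assumption: a missing neighbour at a track end counts as passing.
    """
    n = len(passed)
    return [
        passed[i]
        and (i == 0 or passed[i - 1])
        and (i == n - 1 or passed[i + 1])
        for i in range(n)
    ]


print(neighbour_test([True, True, False, True, True]))
# [True, False, False, False, True]
```

A single failed shot therefore removes up to three shots, which is consistent with the conservative character of the filters described in the text.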

Choice of filters
The sequence in which the filters are applied starts with thresholds obtained from the peer reviewed literature (Sects. 3.2.1–3.2.2) and ends with the neighbour test. The thresholds for the data filters derived from the desert analysis are obtained from visual inspection rather than optimisation. The sensitivity of estimated vegetation height towards the choice of these filters is therefore further evaluated in Sect. 4.2. The scatter plots (Fig. 2) indicate that a large proportion of spurious data is removed but some spurious values are likely still to be present (Fig. 2i). The discussion in the next section and Table 2 provide further indications as to how much data are removed by the filters. If the filter thresholds are adjusted, a larger proportion of spurious values is removed, but this may be at the cost of removing too many reliable data. Prior to a potential adjustment of the thresholds, the filtered vegetation height values are evaluated in Sect. 4.

Application of filters to a temperate and a tropical area
The filters are applied to data from western Europe and the Amazon to obtain an indication of the amount of data removed by each of the processing steps. The elevation test is principally intended to eliminate cloud contaminated data. When more aircraft LiDAR data become available for these regions it may be justified to relax the 8 m uncertainty range over dense forests to acknowledge the greater uncertainty in the SRTM and GLAS elevation data. The large effect of the area under the first Gaussian test may indicate problems with the ground return of the waveform for dense vegetation canopies. Therefore, in Sect. 4.2 it is investigated how much the canopy height changes in response to changing the thresholds for the filters.

Testing the vegetation height model and the GLAS data filters
For convenience of processing the data, raw, unfiltered GLAS data were organised in 5° × 5° tiles similar to the SRTM DEM v 4.1 tiles. A selection of statistics from the GLA14 record were retained and a number of measures were added as well (Table 1). The filters and adjustments discussed in Sect. 3 were applied to the tiled GLAS data; data that did not pass the filters were removed. An estimate of vegetation height (Eq. 1) adjusted for the area under the first Gaussian (Eq. 3) was added. Measurements from individual laser shots were compared with aircraft data in Sect. 4.1 and were then aggregated to global histograms for 0.5° × 0.5° cells.
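The aggregation step can be sketched as a dictionary of height histograms keyed by grid cell, with 0.5 m bins from 0 to 70 m. The function name and the (lat, lon, height) tuple layout are hypothetical, and discarding heights outside the 0 to 70 m range is an assumption, since the text does not say how such values were handled.

```python
import math
from collections import defaultdict


def aggregate(shots, cell_deg=0.5, bin_m=0.5, max_h=70.0):
    """shots: iterable of (lat_deg, lon_deg, height_m) for filtered shots.

    Returns {(lat_index, lon_index): [count per 0.5 m height bin]}.
    """
    n_bins = int(max_h / bin_m)  # 140 bins of 0.5 m
    hist = defaultdict(lambda: [0] * n_bins)
    for lat, lon, height in shots:
        # Assumption: heights outside [0, 70) m are skipped.
        if not (0.0 <= height < max_h):
            continue
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        hist[cell][int(height / bin_m)] += 1
    return hist


shots = [(51.2, -3.4, 12.3), (51.4, -3.1, 12.4), (51.6, -3.1, 30.0)]
hist = aggregate(shots)
print(hist[(102, -7)][24])  # 2
```

Storing full histograms per cell, rather than a single mean, keeps the distribution available so that different height statistics can be derived later.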

Comparison with airborne LiDAR
Filtered GLAS vegetation height estimates obtained for all laser periods (2003–2009) were compared with airborne LiDAR measurements of vegetation height for 10 sites (Sect. 2): the former southern old aspen, old black spruce and two jack pine BOREAS sites in Canada; a tropical forest site in Tambopata near Puerto Maldonado, Peru; the Loobos needle-leaf forest site in the Netherlands (Dolman et al., 2002); the Tharandt mixed forest site in Germany; the Glen Affric (ancient woodland) and Aberfoyle (silviculture) sites in the UK; and a transect 7 km east of the Tumbarumba flux tower site in Australia. Airborne LiDAR data were collected at a point density of 0.25 m, 0.5 m or 1 m. LiDAR point data were sampled to a 50 m resolution by one of three methods: (1) by selecting the maximum vegetation height value (BOREAS, Loobos, Tharandt, Tumbarumba, Peru); (2) by first sampling to 1 m resolution by taking the 99.9 % value and then selecting the maximum vegetation height (BOREAS); or (3) by taking the 99.9 % value (Glen Affric and Aberfoyle).
Note that the BOREAS data are sampled in two ways to evaluate the sensitivity of the validation of GLAS data to the sampling of the airborne data. The Tharandt data were post-processed to remove erroneous data from sparse clouds during the airborne survey. The Peru data were matched with the centres of the GLAS footprints; reported GLAS footprint dimensions and azimuth for each laser campaign (NSIDC, 2011) were used to extract coincident subsets of the airborne LiDAR data. Vegetation height estimated from the GLAS waveforms and the airborne LiDAR point clouds could then be directly compared. For the other data sets, aircraft data were mapped to a universal transverse Mercator (UTM) projection. Latitude and longitude were calculated for the centres of all grid cells, and data were compared if the distance (in the horizontal plane) between the centre of the 50 by 50 m grid cell and the centre of the GLAS footprint was less than 20 m. The comparison was carried out for unfiltered GLAS data, using the difference between start of signal and end of signal to indicate vegetation height, and for GLAS data with the filters of Sect. 3.2 applied and vegetation height calculated with Eqs. (1) and (3).
Figure 3 and Table 3 summarise the results of the comparison. Overall, application of the filters led to a significant improvement in the agreement between the GLAS data and aircraft data. All correlations between GLAS data and aircraft data increased, except for the Tharandt data where the correlation remained the same (r = 0.71). The root-mean-square error decreased significantly in all cases; in one case (Glen Affric) by a factor of 10. The bias decreased in most cases; only for the Peru data did the bias become larger.
The effect of sampling the aircraft data is investigated with the BOREAS data. The first row of Table 3 shows the results when data are sampled to 1 m by selecting the 99.9th percentile of the height distribution and are then sampled to 50 m by selecting the maximum. The second row of Table 3 shows the result for selecting the maximum. The bias for the second case is larger (−6.6 m versus −0.8 m). This indicates two things: (1) the calculation of vegetation height from aircraft data is extremely sensitive to the statistic used and (2) the GLAS vegetation height is likely not indicative of the maximum height of vegetation, but more indicative of where the canopy starts to become more substantial.
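The sensitivity to the aggregation statistic can be illustrated with a minimal sketch, assuming airborne returns are available as (x, y, height) tuples in a projected coordinate system. The function names `percentile` and `grid_heights` and the synthetic points are hypothetical, and the sketch applies the statistic directly per 50 m cell rather than reproducing the two-stage (1 m, then 50 m) sampling used for the BOREAS data.

```python
import math
from collections import defaultdict


def percentile(values, q):
    """Percentile (q in 0..100) of a non-empty list, by linear interpolation."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def grid_heights(points, cell=50.0, stat=max):
    """Group (x, y, height) points into square cells and reduce each cell with `stat`."""
    cells = defaultdict(list)
    for x, y, h in points:
        cells[(math.floor(x / cell), math.floor(y / cell))].append(h)
    return {key: stat(heights) for key, heights in cells.items()}


# Synthetic cell (illustration only): 999 returns at 20 m plus a single 35 m return.
points = [(i % 50 + 0.5, (i // 50) % 50 + 0.5, 20.0) for i in range(999)]
points.append((10.0, 10.0, 35.0))

by_max = grid_heights(points, stat=max)
by_p999 = grid_heights(points, stat=lambda hs: percentile(hs, 99.9))
print(by_max[(0, 0)], round(by_p999[(0, 0)], 3))
```

In this toy cell a single tall return fully determines the maximum, whereas the 99.9th percentile stays close to the bulk of the canopy, which is the kind of sensitivity the two BOREAS rows of Table 3 point to.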
A possible reason for the outliers in the GLAS versus aircraft vegetation height scatter plots is the spatial variability in the scene. For early laser campaigns, the major axis of the GLAS footprint can be larger than 50 m and may incorporate a response from a tree within an adjacent 50 m grid cell. Anecdotal evidence for this effect can be found at the Glen Affric site, where the one outlier is located at or near an area with a small number of trees standing adjacent to the validation grid cell. The Tharandt site, which is the most problematic not only because data were collected under partly cloudy conditions but also because of the large variability in tree type, age and height, shows an improvement in values close to the 1:1 line, but contains various outliers that remain in the data. There is reason to assume that these outliers are related to small differences in footprint size in combination with a large variability in tree height (below). The overall improvement is demonstrated when all data (without Peru; not included because information from surrounding grid cells was missing) are combined (Fig. 4a); the correlation increases from r = 0.33 to r = 0.78 and the RMSE decreases from 22.2 to 6.2 m (Table 3). The vegetation height model, as well as the application of the filters, improves the correspondence between airborne data and GLAS data for all laser campaigns (Fig. 4b-d). The bias for GLAS laser campaign 3 is larger than for GLAS campaign 1 (Table 3); the GLAS laser 1 campaign is represented by BOREAS data only, hence the smaller bias can be explained by the smaller bias in the BOREAS data (row 1, Table 3).
Differences in vegetation height estimated from the GLAS instrument and aircraft LiDAR can be caused by errors in either instrument, registration errors, differences in the size of the footprint and land-cover changes between times of measurement. The geo-location error of the GLAS footprint has a bias smaller than 1 m and an RMSE around 4 m for all but three GLAS laser campaigns (laser 2D-2F; see NSIDC, 2011). Figure 4e shows the absolute difference between the height measurements as a function of the distance between the centres of the GLAS waveforms and the 50 m LiDAR grid cells derived from aircraft. There is no significant decrease in average accuracy with increasing distance, but there is an increase in the maximum error with distance. The average error increases significantly as a function of spatial variability, expressed as the standard deviation in vegetation height for a 3 × 3 grid cell window (Fig. 4f). The mismatch of some of the GLAS data with aircraft data can therefore be explained by errors in registration in combination with high spatial variability.

Table 3. Summary statistics comparing estimates of vegetation height from GLAS data with aircraft LiDAR measurements. Columns under "Raw" show statistics with no filter applied to the GLAS data and the vegetation height estimated from the difference between the beginning and end of signal. Columns under "Filtered" show the statistics with a filter k = 1 applied to the GLAS data (Sect. 3.2); "n" indicates the number of observations where the centres of the aircraft laser shots and the GLAS laser shots were located within 20 m; "r" is the coefficient of correlation, "RMSE" is the root mean square error and "bias" is the average difference between GLAS and aircraft measurements. The row with BOREAS (MAX) selects the maximum height in a 50 by 50 m pixel; the agreement is better when the top 0.1 % of the data is removed. See also Fig. 3.
Overall, the comparison with the aircraft data indicates a dramatic improvement in the estimates of vegetation height when the filters are applied to the GLAS data. A large amount of error, expressed as the RMSE in Table 3, is caused by high spatial variability in combination with a difference in what the GLAS waveform measures and what is represented by the 50 m aircraft grid cell. The RMSE values in Table 3 are therefore likely too high; an error estimate more resistant to outliers is the 68 % value of the distances in Fig. 4b, which is approximately 4.5 m. This value of 4.5 m is marginally larger than the RMSE of the elevation measured by GLAS (4 m) and is similar to the RMSE of 4.5 m reported by Rosette et al. (2008).
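The summary statistics used in this comparison can be sketched as follows. This is a minimal illustration, not the processing code of the study: the function name `comparison_stats` and the sample values are hypothetical, and the 68 % error is assumed here to be the nearest-rank 68th percentile of the absolute GLAS minus aircraft differences.

```python
import math


def comparison_stats(glas, aircraft):
    """Return r, RMSE, bias and the 68 % absolute error for paired height estimates."""
    n = len(glas)
    diffs = [g - a for g, a in zip(glas, aircraft)]
    bias = sum(diffs) / n                      # average GLAS minus aircraft difference
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    mean_g, mean_a = sum(glas) / n, sum(aircraft) / n
    cov = sum((g - mean_g) * (a - mean_a) for g, a in zip(glas, aircraft))
    var_g = sum((g - mean_g) ** 2 for g in glas)
    var_a = sum((a - mean_a) ** 2 for a in aircraft)
    r = cov / math.sqrt(var_g * var_a)

    # Assumption: nearest-rank 68th percentile of the absolute differences.
    abs_sorted = sorted(abs(d) for d in diffs)
    err68 = abs_sorted[math.ceil(0.68 * n) - 1]
    return {"r": r, "rmse": rmse, "bias": bias, "err68": err68}


# Made-up heights (m): four close matches and one gross outlier.
glas = [10.0, 12.0, 20.0, 25.0, 60.0]
aircraft = [11.0, 13.0, 19.0, 26.0, 30.0]
stats = comparison_stats(glas, aircraft)
print({k: round(v, 2) for k, v in stats.items()})
```

With one gross outlier among five pairs, the RMSE is dominated by that single pair while the 68 % value reflects the typical 1 m mismatch, which is why the latter is the more outlier-resistant error estimate.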

Sensitivity of vegetation height estimates to application of filters
The screened GLAS data are aggregated into frequency distributions from 0 to 70 m in 0.5 m intervals for each 0.5° × 0.5° land-surface cell between 60° S and 60° N. The 90th vegetation height percentile was determined from the histograms. The sensitivity of the 90th vegetation height percentile to the choice of data filters is explored. Thresholds for three filters are varied simultaneously by a factor k = 1, 2, 3, producing increased severity of the filters; the thresholds are expressed in terms of θ, the slope, $A_1$, the area of the first Gaussian (V ns), and $S_1$, the amplitude of the first Gaussian (V). Figure 5 compares the cumulative distributions of vegetation height per Simple Biosphere model (SiB) vegetation cover type (Loveland et al., 2001) for a filter factor k = 1 versus k = 2 in twelve quantile-quantile plots and Fig. 6 shows the same comparison but for a filter factor k = 2 versus k = 3. The quantile-quantile plots of vegetation height for a filter factor k = 1 versus k = 2 vary only slightly for most biomes, indicating that the choice of filters does not affect the height distributions much at the biome level. The exceptions are mostly in the shorter vegetation classes: for the shrubs and bare soil, and to a lesser extent for ground cover and shrubs and tundra. For these classes the larger height estimates for the filter factor k = 2 are somewhat lower. Changing the filter factor from k = 2 to k = 3 affects the broad-leaf deciduous class; for most other classes the height distributions are similar. Thus at the biome level, application of filters does not change the height distribution much.
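Reading a percentile from a binned height distribution can be sketched as below. The helper names `build_histogram` and `histogram_percentile` are hypothetical, and returning the upper edge of the bin in which the cumulative count first reaches the target is an assumption of this sketch; the text does not state which convention was used.

```python
def build_histogram(heights, bin_width=0.5, max_height=70.0):
    """Count heights (m) in bins of `bin_width` from 0 to `max_height`."""
    n_bins = int(round(max_height / bin_width))
    counts = [0] * n_bins
    for h in heights:
        if h < 0:
            continue  # negative heights are ignored in this sketch
        # Heights at or above max_height are placed in the last bin.
        counts[min(int(h / bin_width), n_bins - 1)] += 1
    return counts


def histogram_percentile(counts, q, bin_width=0.5):
    """Upper edge of the bin where the cumulative count first reaches q % of the total."""
    total = sum(counts)
    if total == 0:
        return None  # no valid observations in this cell
    target = q / 100.0 * total
    running = 0
    for i, c in enumerate(counts):
        running += c
        if running >= target:
            return (i + 1) * bin_width
    return len(counts) * bin_width


# Made-up footprint heights (m) for one 0.5 x 0.5 degree cell.
cell_heights = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(histogram_percentile(build_histogram(cell_heights), 90))
```

Working from the histogram rather than the individual footprints means the percentile is resolved only to the 0.5 m bin width, which is small compared with the 4.5-6 m uncertainty established from the site comparison.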
The effect of applying the filters for a specific locale is investigated by looking at the sensitivity of the global distribution of the 90th percentile of the height frequency distributions per 0.5° × 0.5° cell. The sensitivity is shown spatially as the difference in the 90th percentile between filter factors k = 1 and k = 3 in Fig. 7b. Most areas do not show a significant change. In some areas, mostly in the tropical forests, vegetation height increases by up to 4 m if k = 3 is used. In some other, mostly mountainous areas, vegetation height decreases by at most 4 m. For the majority of cases the change in height is smaller than the RMSE of 4.5-6 m.

Global vegetation height evaluation
Histograms of the 90th percentile of the globally retrieved vegetation height distributions (filter k = 3 to conform with Fig. 8) are shown per SiB biome type (Sellers et al., 1996) in Fig. 8. Where previous work used one vegetation height per biome, e.g. to obtain an estimate of surface roughness (Sellers et al., 1996), we find a wider, more realistic distribution of vegetation heights per biome. There is good agreement between vegetation cover types 1-6 (dominated by trees) and the occurrence of tall vegetation in the GLAS data; a similar agreement is found for land-cover types 7-12 (shrubs, grasses, tundra, agriculture, bare soil) and the occurrence of mostly short vegetation. The exceptions are agriculture and, to a lesser extent, tundra. It is likely, however, that these classes do contain a minority proportion of tall vegetation. Lefsky (2010) derives vegetation height for forests and woodlands at approximately 0.5 km resolution by merging the MODIS land-cover product (Friedl et al., 2010) with ICESat GLAS measurements. The MOD12Q1 product he uses is different from the SiB classification scheme used in the present paper. Nevertheless, for the more or less comparable tropical forest class Lefsky (2010) finds heights between 10 and 30 m with a peak at 25 m, whereas our estimates for broad-leaf evergreen forest show a range between 30 and 60 m with a peak at 40 m (Fig. 8a). Feldpausch et al. (2011) analysed field data obtained from tropical forests in America, Africa and Asia based on an inventory of field studies and found, for trees with a stem diameter over 40 cm, average tree height values between 30 and 40 m. Height estimates for tall vegetation classes outside the tropics have a similar range to the estimates by Lefsky (2010); differences can to some extent be attributed to differences in class definitions.
Figure 9 shows the spatial distribution in height differences between the 90th percentile of tree heights of Lefsky (2010) and the 90th percentile of the present vegetation height product. The 90th percentile of Lefsky's data was calculated for each 0.5° × 0.5° cell as the median of the 90th percentiles at 0.5 km resolution. For areas outside the tropics both higher values (North America and south-east Asia) and lower values (Eurasian boreal forest) are found in the Lefsky data. The comparison for these areas is not straightforward, however, since Lefsky's product pertains to tree height, whereas the product in the present study pertains to vegetation height. When the comparison is limited to areas with more than 40 % tree cover in the MODIS continuous fields product (Hansen et al., 2003, 2006), the differences between the two data sets are smaller and are for the main part limited to the tropics.
Figure 10 compares Lefsky's tree height product and the present vegetation height product with the mean NDVI fields for 1982-1999. The comparison is for areas with more than 40 % tree cover (Hansen et al., 2003, 2006). The NDVI is near linearly related to the fraction of photosynthetically active radiation absorbed by the vegetation canopy for photosynthesis and is linked to the amount of CO2 absorbed by vegetation (Sellers et al., 1996). The carbon absorbed by vegetation is allocated to leaves and woody biomass above and below ground. From these principles, it is expected that a positive relationship exists between mean annual NDVI and vegetation or tree height. Figure 10a shows a density scatter plot of Lefsky's tree height product as a function of mean annual NDVI. Tree height shows a modest increase with mean annual NDVI (r = 0.24). The relationship with the present vegetation height product is different; at high NDVI values the vegetation height shows an exponential increase; the coefficient of correlation is r = 0.51.

Comparison of GLAS cover fraction with MODIS data
The University of Maryland (UMD) MODIS continuous field land-cover product provides the percentage cover for three classes: bare soil, trees and other vegetation (Hansen et al., 2003, 2006). The Fourier Adjusted, Solar and sensor zenith angle corrected, Interpolated and Reconstructed (FASIR) vegetation-cover fraction (Los et al., 2000) can be used to calculate the bare soil fraction as well, as $1 - f_V$, with $f_V$ the vegetation-cover (all vegetation) fraction. From the GLAS height estimates, a bare soil fraction and a tree-cover fraction can be estimated and these can be compared with the MODIS continuous fields and the FASIR bare soil fraction. Bare soil fraction can be calculated as the fraction of GLAS measurements within each 0.5° × 0.5° cell with heights below a set threshold. This threshold is likely to be somewhat above zero, otherwise small unevenness of the soil topography may appear as low estimates of vegetation height. The bare soil fraction was calculated from the 0.5° × 0.5° GLAS height frequency distributions as the proportion of footprints below a height threshold, starting at 0 m and moving up at increments of 0.5 m, i.e. as $n_{h \le z} / N$, with $n_{h \le z}$ being the number of observations for a height interval smaller than z m, with z varying from 0 to 70 m in 0.5 m intervals, and $N$ the total number of observations per grid cell. Similarly, tree-cover fraction for each grid cell was calculated using the fraction of observations above a height threshold, $n_{h \ge z} / N$, with $n_{h \ge z}$ being the number of observations for a height interval larger than or equal to z m. The GLAS bare soil fraction and tree-cover fraction are compared with the MODIS bare soil and tree-cover fraction sampled to 0.5° × 0.5° resolution. Bare soil fraction and tree-cover fraction were estimated from the raw GLAS data and the filtered GLAS data (k = 1, 2, 3). For the 4 versions of GLAS bare soil fraction and tree-cover fraction, a coefficient of correlation with the MODIS data for land data between 60° S and 60° N was calculated for every height interval z. The correlation as a function of the threshold height is shown in Fig. 11a for bare soil and in Fig. 11b for tree cover. For bare soil the highest agreement was obtained for k = 3, where a threshold height z = 1 m resulted in the highest correlation (r = 0.66); for k = 2 the correlation was similar, r = 0.65 at 1.5 m. For the tree-cover fraction the maximum correlation for k = 1 was at 9 m (r = 0.794); the difference with k = 2 at 8 m was small (r = 0.789). In all cases, estimates of tree-cover fraction and bare soil fraction using filters were in much closer agreement with the MODIS data than estimates from the raw data (Fig. 11). Filter k = 2 appears an acceptable compromise between retaining sufficient high quality data to obtain reasonable height estimates and removing the bulk of spurious data.
The maximum correlations between the GLAS bare soil fraction and the FASIR bare soil fraction are only slightly higher than the correlations with the MODIS bare soil fraction (Fig. 11a).
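The threshold scan described in this section can be sketched as follows, assuming each grid cell is represented by a list of counts in 0.5 m bins. All function names, the three toy cells and the reference fractions are hypothetical; the sketch only illustrates how a cover fraction is read from a histogram and how the best-correlating height threshold is selected.

```python
import math


def bare_soil_fraction(counts, z, bin_width=0.5):
    """Fraction of observations in bins lying below height z."""
    k = int(round(z / bin_width))
    return sum(counts[:k]) / sum(counts)


def tree_cover_fraction(counts, z, bin_width=0.5):
    """Fraction of observations in bins at or above height z."""
    k = int(round(z / bin_width))
    return sum(counts[k:]) / sum(counts)


def pearson(x, y):
    """Coefficient of correlation; None when either series has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def best_threshold(histograms, reference, fraction, thresholds):
    """Return (z, r) for the first threshold giving the highest correlation."""
    best = (None, -math.inf)
    for z in thresholds:
        r = pearson([fraction(c, z) for c in histograms], reference)
        if r is not None and r > best[1]:
            best = (z, r)
    return best


def histogram(bin_counts, n_bins=140):
    """Build a 0-70 m histogram (0.5 m bins) from a {bin_index: count} mapping."""
    counts = [0] * n_bins
    for i, c in bin_counts.items():
        counts[i] = c
    return counts


# Three toy cells: mostly bare, mixed, and forested.
cells = [histogram({1: 10}), histogram({1: 5, 60: 5}), histogram({60: 10})]
reference_bare = [0.9, 0.5, 0.0]  # made-up reference bare soil fractions
thresholds = [0.5 * i for i in range(1, 141)]
z_best, r_best = best_threshold(cells, reference_bare, bare_soil_fraction, thresholds)
print(z_best, round(r_best, 3))
```

Because ties are resolved in favour of the lowest threshold, a plateau in the correlation curve (as in Fig. 11) yields the smallest height that reaches the maximum; a different tie-breaking rule would be an equally defensible choice.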

Discussion and conclusion
The present study describes the estimation of a global vegetation height data set from the ICESat GLAS instrument. The spatial extent of the data is limited to the spatial coverage of the SRTM DEM data between 60° S and 60° N. The vegetation height model, developed by Rosette et al. (2008) for a mixed forest in the UK, was tested on aircraft LiDAR data for ten sites. The test sites covered a range of land-cover types including boreal forests, mixed temperate forests, tropical forests and dense woodlands. Analysis of the test sites showed that the GLAS vegetation height estimates were in good agreement with the measurements from aircraft when the GLAS data were filtered prior to analysis. The RMSE for the ten sites was larger than that obtained in the initial study by Rosette et al. (2008), 6.2 versus 4.5 m, and the coefficient of correlation was slightly lower, 0.79 versus 0.86; most differences are explained by a few outliers which are, at least in part, the result of a mismatch between the location of the GLAS data and aircraft data. The robust estimate of the RMSE, the 68 % value of the error distribution, 4.5 m, is similar to the results obtained by Rosette et al. (2008). The vegetation height model is likely representative of the location where the canopy becomes more substantial, rather than of the maximum extent of the canopy. This measure of vegetation height is more useful for the calculation of aerodynamic roughness. The vegetation height model and applied filters result in consistent improvements for campaigns from all three GLAS lasers. Some of the filters developed to screen the GLAS data (based on slope and elevation) were based on the literature, whereas other filters (the area under the first Gaussian, peak of the first Gaussian, neighbour test) were based on a visual analysis of desert data. The filters are not optimised using an objective minimization criterion such as least squares, because of the large volumes of data that need to be handled. The most important filters are linked to slope (derived from the SRTM data, hence independent of a particular GLAS laser campaign), difference in elevation (likely less affected by the laser campaign as well) and the energy of the pulse (area of the first Gaussian), which should have a dependency on the age of the GLAS laser. A sensitivity analysis of the filters indicated that estimates of vegetation height were not overly sensitive to the choice of filters. As more data sets from air campaigns become available, optimisation of the filter thresholds and tuning filters for individual campaigns may lead to further improvements. However, the product has been thoroughly tested for a range of vegetation types and conditions found globally, including those known to be challenging for the GLAS instrument, and further improvements are therefore likely to be minor.
For global aggregates of GLAS vegetation height distributions various comparisons with other data products were made. Vegetation height histograms per 0.5° × 0.5° cell show more realistic values than existing products. For example, vegetation height derived by biome uses only one average value, whereas the GLAS data indicate that a large variation in vegetation height exists within land-cover classes. The latter is more realistic. Compared to the tree height product of Lefsky (2010), 10-30 m with a peak at 25 m for tropical forest, our estimate of the corresponding 90th height percentiles is almost twice as large: a range up to 60 m with 40 m heights being the most frequently occurring. We believe our estimates to be more realistic since they compare better with the average estimate of 35 m of Sellers et al. (1996), which is based on a review of the literature, and with the range of values published by Feldpausch et al. (2011), who, based on an inventory of field studies, found 0.05 quantiles between 15 and 60 m for trees with a diameter over 40 cm and average tree height values between 30 and 40 m.
Measuring tree height from waveform LiDAR in tropical forests is notoriously difficult because the ground return is hard to identify. Further improvements can be expected if ground elevation can be estimated with higher certainty. This is challenging for a large-footprint LiDAR such as GLAS. A future satellite waveform sensor producing a smaller footprint would improve the capability of detecting the ground for sloped and vegetated surfaces.
The GLAS vegetation height data show remaining problems over bare soil (r = 0.64 for a height threshold of 1 m). The "apparent" vegetation height over bare soil is most likely caused by unevenness of the ground and the presence of objects such as boulders. However, this is a significant improvement on the estimated vegetation heights of several metres for bare soil reported by other authors. Moreover, for some applications such as the calculation of roughness length, an indication of variations in height of solid objects at sub-footprint level may be beneficial. Combining an NDVI-based bare soil estimate or a land-cover classification-based bare soil estimate with the GLAS estimates should improve the overall product further. Compared to calculating the bare soil fraction, measuring the tree-cover fraction is more straightforward and correlation with the MODIS product is higher than for bare soil (r = 0.79 for an 8 m height threshold).
Only a small percentage of each 0.5° × 0.5° grid cell is sampled by the GLAS instrument. This can lead to uncertainties as to how representative the sample average is for the grid cell average. MacDonald and Hall (1980) found that crop yield for large areas could be estimated well with only a small percentage of land sampled. The limited sensitivity of the GLAS 0.5° × 0.5° vegetation height estimates to varying the data quality filters is further indication that reasonable estimates are obtained.
The GLAS vegetation height distributions derived in the present paper are a first attempt to obtain near-global estimates of vegetation height for all biomes without the use of additional vegetation data sets. Despite some limitations, the present product is a substantial improvement over existing products used in climate models and ecological models.
Fig. 1. (a) Example of GLAS waveform collected for a vegetated footprint and approximate indication of start and end of the waveform signal. The first return is reflected from the top of the canopy (Signal Begin), incremental parts of the waveform are reflected by lower parts of the canopy; the end of the signal usually provides an underestimate of the elevation of the ground surface. (b) Decomposition of the waveform by six Gaussians. Gaussians 1 and 2 are used to estimate the location of the ground.

S. O. Los et al.: Vegetation height between 60° S and 60° N from GLAS

with
$h$ = topographic elevation;
$h_e$ = GLAS elevation (i_elev);
$h_e$ = saturation elevation correction (i_satElevCorr);
$h_g$ = height of the EGM2008 geoid above the TOPEX/Poseidon ellipsoid (i_gdHt);
$h_l$ = difference between the WGS84 and TOPEX/Poseidon ellipsoids, $h_l = r_a (\cos\phi)^2 + r_b (\sin\phi)^2$, with
$r_a$ = difference in radius of the WGS84 and TOPEX/Poseidon ellipsoids at the equator (0.7 m),
$r_b$ = difference in radius for the meridian (0.713682 m),
$\phi$ = latitude.

1 DEM data was about 3.7 m for February 2003 only and was 4.2 m for data of February and March 2003 combined. The 95 % confidence interval of the SRTM data globally is estimated at approximately 8 m; it varies for different continents between 7 m and 8.8 m with the exception of New Zealand where the RMSE was about 12 m.

www.geosci-model-dev.net/5/413/2012/ Geosci. Model Dev., 5, 413-432, 2012

Fig. 2. (a) Location of the GLAS data collected between 20°-25° N and 0°-5° E prior to April 2003 (grey lines represent boundaries). (b) Elevation as a function of latitude for the measurements shown under (a); black circles are GLAS elevation measurements; they are overlain by grey dots (SRTM 4.1 values). (c) Vegetation height estimated from the GLAS data after Rosette et al. (2008); no filter was applied. (d) Estimated vegetation height as a function of slope. The slope was calculated as the maximum of the slope in 8 directions calculated from the 90 m SRTM version 4.1 data. Grey values show data for slope ≥ 17 %; black values are for slopes < 17 %. (e) Vegetation height as a function of the difference between the GLAS reference elevation and the SRTM version 4.1 elevation. Grey circles show values that passed the 17 % slope filter in (d); black circles show the data with a difference in DEM < 8 m. (f) Vegetation height as a function of the area of the first Gaussian; black circles pass the test, line indicates the best fit through the 5 % values per equal area interval of 10 V ns. (g) Amplitude test; threshold at 5 V, top 0.1 % of highest values per amplitude interval are removed. (h) Values with a very high signal width (sigma) are removed (grey values). (i) Remaining values after the neighbour test is applied (compare with c).

Fig. 3. Comparison of GLAS vegetation height retrievals with vegetation height measurements from aircraft LiDAR averaged to a 50 m by 50 m grid. Distance between the centre of the GLAS shot and centre of the 50 m grid cell is less than 20 m. Vegetation height from GLAS is estimated both from the raw data (grey triangles) and the filtered data (k = 1; black dots). Statistics are shown in Table 3. (a) Former BOREAS sites (Canada), (b) Loobos site (the Netherlands), (c) Tambopata (Peru), (d) Tharandt (Germany), (e) E of Tumbarumba (Australia), (f) Glen Affric and Aberfoyle (UK).
Fig. 4. (a) Combined aircraft data and GLAS data (excluding Peru) of Fig. 3. See Table 3 for statistics. (b-d) Combined aircraft data and GLAS data (excluding Peru) shown per GLAS laser campaign. Panels (b)-(d) indicate validity of the vegetation height model (Eq. 1) and of the application of filters (Sect. 3) across all laser campaigns. (e) Difference in GLAS (filter k = 1) and aircraft vegetation height estimates as a function of distance between the centre of the GLAS pulse and the centre of the aircraft 50 m by 50 m grid cell. The slope of the regression line is not statistically significant. The maximum error does increase with distance, however. (f) Variation in difference between the GLAS and aircraft vegetation height (absolute values) as a function of the spatial variability in the vegetation height aircraft measurements (standard deviation of a 3 by 3 window around the centre of the 50 m grid cell). The slope of the regression is statistically significant (p < 0.01); the coefficient of correlation is r = 0.3.
caused by high spatial variability in combination with a difference in what the GLAS waveform measures and what is represented by the 50 m aircraft grid cell.The RMSE values in Table
Fig. 7. (a) Spatial distribution of the 90th vegetation height percentiles (in m) for filtered data with k = 3; (b) difference in 90th height percentiles (in m) for filtered data with k = 3 and k = 1 (filter k = 3 minus filter k = 1); vegetation height in the tropics increases when a more conservative filter is used, whereas vegetation height in mountainous regions decreases at the same time.

Fig. 10. (a) Colour density plot showing the relationship between the average annual NDVI (Los et al) and the 90th percentile values for the height distribution per 0.5° × 0.5° cell obtained from Lefsky (2010) for areas with more than 40 % tree cover. The coefficient of correlation r = 0.24. (b) Same as (a) but showing the 90th vegetation height percentiles of the present study for cells with tree cover over 40 %. The coefficient of correlation r = 0.51.
Fig. 11. (a) Coefficient of correlation between University of Maryland (UMD) MODIS bare soil fraction and GLAS bare soil fraction as a function of the height threshold used to identify bare soil. For bare soil estimated from raw data, the maximum r = 0.42 is at 6 m; for filter k = 1 the maximum r = 0.64 is at 1.5 m; for filter k = 2 the maximum r = 0.65 is at 1.5 m (line not shown); for filter k = 3 the maximum r = 0.66 is at 1 m. Maximum correlation with FASIR $1 - f_V$ (r = 0.67) is at 2.0 to 2.5 m. (b) Coefficient of correlation between UMD MODIS tree-cover fraction and GLAS tree-cover fraction as a function of the height threshold used to identify trees. For raw data the maximum r = 0.584 is at 12.5 m; for filter k = 1 the maximum r = 0.794 is at 9 m; for filter k = 2 the maximum r = 0.789 is at 8 m (line not shown); for filter k = 3 the maximum r = 0.777 is at 6-7 m.
The most important filters are linked to slope (derived from www.geosci-model-dev.net/5

Table 1. List of GLAS parameters retained and of parameters added (last three rows).

Table 2. Cumulative percentage of data removed by subsequent filters (Sect. 3.2) for 3 test tiles (reported for data collected for 2003 only).
desert tile shown in Fig. 2, the tile in western Europe and the tile over the Amazon. Note that the statistics in Table 2 refer to the entire year of 2003, whereas Fig. 2 refers to data collected prior to April 2003. For the desert tile, the filters with the most impact are the elevation test (1.6 %), the area under the first Gaussian test (2.5 %) and the neighbour test (3 %). For the tile that covers part of western Europe most of the spurious data are removed by the slope test; a majority of the data removed by this test is removed because of missing SRTM DEM values over the sea. The elevation test, area under the first Gaussian test and neighbour test each remove approximately 3 % of the data. For tropical forests the largest amount of data, about 35 %, is removed by the area under the first Gaussian test. About 10 % is removed by the difference in elevation test, amplitude test and the neighbour test.
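The cumulative percentages of the kind reported in Table 2 can be obtained by applying the filters in sequence and tracking how many records remain after each step. The sketch below is illustrative only: the record fields `slope` and `d_elev`, the function name `cumulative_removed` and the ten records are hypothetical, and only two of the filters are shown, using the 17 % slope and 8 m elevation-difference thresholds quoted in the caption of Fig. 2.

```python
def cumulative_removed(records, filters):
    """Apply (name, keep) filters in order; return cumulative % of records removed."""
    total = len(records)
    remaining = list(records)
    result = []
    for name, keep in filters:
        remaining = [r for r in remaining if keep(r)]
        result.append((name, 100.0 * (total - len(remaining)) / total))
    return result


# Hypothetical footprint records: slope in %, |GLAS - SRTM| elevation difference in m.
records = [
    {"slope": 5, "d_elev": 1.0}, {"slope": 10, "d_elev": 2.0},
    {"slope": 20, "d_elev": 1.0}, {"slope": 3, "d_elev": 9.0},
    {"slope": 18, "d_elev": 12.0}, {"slope": 4, "d_elev": 3.0},
    {"slope": 6, "d_elev": 0.0}, {"slope": 7, "d_elev": 7.5},
    {"slope": 2, "d_elev": 1.0}, {"slope": 1, "d_elev": 4.0},
]

filters = [
    ("slope", lambda r: r["slope"] < 17),                  # threshold from Fig. 2d
    ("elevation difference", lambda r: r["d_elev"] < 8),   # threshold from Fig. 2e
]

for name, pct in cumulative_removed(records, filters):
    print(f"{name}: {pct:.1f} % removed (cumulative)")
```

Because the percentages are cumulative, the contribution of a single filter depends on its position in the sequence: a record that would fail two tests is attributed to whichever test is applied first.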