Submitted as: model evaluation paper
20 Apr 2022
Status: this preprint is currently under review for the journal GMD.

Estimation of missing building height in OpenStreetMap data: a French case study using GeoClimate 0.0.1

Jérémy Bernard1,3, Erwan Bocher2, Elisabeth Le Saux Wiederhold3, François Leconte4, and Valéry Masson5
  • 1University of Gothenburg, Department of Earth Sciences, Sweden
  • 2CNRS, Lab-STICC, UMR 6285, Vannes, France
  • 3Université Bretagne Sud, Lab-STICC, UMR 6285, Vannes, France
  • 4Université de Lorraine, INRAE, LERMaB, F88000, Epinal, France
  • 5Météo-France and CNRS, CNRM, UMR3589, Toulouse 31057, France

Abstract. Information describing the elements of the urban landscape is required input data for studying numerous physical processes (e.g. climate, noise, air pollution). However, the accessibility and quality of urban data are heterogeneous across the world. For example, a major open-source geographical data project (OpenStreetMap) has incomplete data for key urban properties such as building height. The present study implements and evaluates a statistical approach that models the missing values of building height in OpenStreetMap. A Random Forest method is applied to estimate building height based on each building's closest environment. 62 geographical indicators are calculated with the GeoClimate tool and used as independent variables. A training data set of 14 French communes is selected, and the reference building height is provided by the BDTopo IGN. An optimized Random Forest algorithm is proposed, and its outputs are compared with an evaluation data set. At building scale, for all cities, at least 50 % of the buildings have their height estimated with an error of less than 4 m (the city median building height ranges from 4.5 m to 18 m). Two communes (Paris and Meudon) show building height results that deviate from the main trend because of their specific urban fabric. Setting these two communes aside, when building height is averaged at regular grid scale (100 m × 100 m), the median absolute error is 1.6 m and at least 75 % of the cells of any city have an error lower than 3.2 m. This order of magnitude is reasonable when compared to the accuracy of the reference data (at least 50 % of the buildings have a height uncertainty of 5 m). This work offers insights into the estimation of missing urban data using statistical methods and contributes to the use of open-source data sets based on open-source software. The software used to produce the data is freely available at and the data set can be freely accessed at
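The grid-scale evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the GeoClimate implementation: the function names, the toy coordinates, and the choice of an unweighted mean per cell are assumptions made for illustration only (the study may, for instance, weight heights by footprint area).

```python
from collections import defaultdict
from statistics import mean, median

CELL_SIZE_M = 100.0  # grid resolution quoted in the abstract (100 m x 100 m)


def grid_mean_heights(buildings, cell_size=CELL_SIZE_M):
    """Average building height per grid cell.

    `buildings` is an iterable of (x, y, height) tuples in metres,
    with x and y in a projected coordinate system (assumption).
    The per-cell mean is unweighted (assumption).
    """
    cells = defaultdict(list)
    for x, y, height in buildings:
        key = (int(x // cell_size), int(y // cell_size))
        cells[key].append(height)
    return {key: mean(heights) for key, heights in cells.items()}


def grid_median_absolute_error(reference, estimated, cell_size=CELL_SIZE_M):
    """Median absolute error between reference and estimated cell means."""
    ref = grid_mean_heights(reference, cell_size)
    est = grid_mean_heights(estimated, cell_size)
    # Compare only cells present in both data sets.
    errors = [abs(est[key] - ref[key]) for key in ref if key in est]
    return median(errors)


# Hypothetical toy data: (x, y, height) for four buildings.
reference = [(10, 10, 6.0), (50, 80, 9.0), (150, 20, 12.0), (120, 250, 4.0)]
estimated = [(10, 10, 7.0), (50, 80, 9.0), (150, 20, 15.0), (120, 250, 4.5)]

mae_grid = grid_median_absolute_error(reference, estimated)
```

With these toy values, the three occupied cells have absolute errors of 0.5 m, 3.0 m and 0.5 m, so the median absolute error is 0.5 m. Averaging per cell before comparing lets over- and underestimates of individual buildings partly cancel, which is consistent with the grid-scale error in the abstract being lower than the building-scale error.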


Status: open (until 15 Jun 2022)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on gmd-2021-428', Anonymous Referee #1, 17 May 2022



Total article views: 318 (including HTML, PDF, and XML)
  • HTML: 273
  • PDF: 37
  • XML: 8
  • Total: 318
  • BibTeX: 2
  • EndNote: 1
Latest update: 20 May 2022
Short summary
OpenStreetMap is a collaborative project to create a free data set containing topographical information. Since these data are available worldwide, they can be used as standard data for geoscience studies. However, most buildings lack height information, although it is key data for numerous fields (urban climate, noise propagation, air pollution). In this work, building height is estimated by statistical modeling, using indicators that characterize each building's environment.