Geoscientific Model Development An interactive open-access journal of the European Geosciences Union

© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.

Submitted as: model description paper 31 Aug 2020


Review status
This preprint is currently under review for the journal GMD.

Machine learning models to replicate large-eddy simulations of air pollutant concentrations along boulevard-type streets

Moritz Lange¹, Henri Suominen¹, Mona Kurppa², Leena Järvi²,³, Emilia Oikarinen¹, Rafael Savvides¹, and Kai Puolamäki¹,²
  • ¹ Department of Computer Science, University of Helsinki, Finland
  • ² Institute of Atmospheric and Earth System Research (INAR)/Physics, Faculty of Science, University of Helsinki, Finland
  • ³ Helsinki Institute of Sustainability Science, Faculty of Science, University of Helsinki, Finland

Abstract. Running large-eddy simulations (LES) can be burdensome and too computationally expensive for applications such as supporting urban planning. In this study, regression models are used to replicate modelled air pollutant concentrations from LES in urban boulevards. We study the performance of the regression models and discuss how to detect situations where the models are applied outside their training domain and their outputs cannot be trusted. Regression models from 10 different model families are trained, and a cross-validation methodology is used to evaluate their performance and to find the best set of features needed to reproduce the LES outputs. We also test the regression models on an independent testing dataset. Our results suggest that, in general, log-linear regression gives the best and most robust performance on new independent data. It clearly outperforms the dummy model, which predicts constant concentrations for all locations (mRMSE of 0.76 vs. 1.78 for the dummy model). Furthermore, we demonstrate that it is possible to detect concept drift, i.e., situations where the model is applied outside its training domain and a new LES run may be necessary to obtain reliable results. Regression models can be used to replace LES in estimating air pollutant concentrations, unless higher accuracy is needed. To obtain reliable results, however, it is important to do the model and feature selection carefully to avoid over-fitting and to use methods to detect concept drift.
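The comparison between a log-linear regression and a constant-prediction dummy model can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, a single hypothetical feature (distance from the road) stands in for the paper's feature set, and plain RMSE on log-concentration is used rather than the paper's mRMSE, whose definition is not given on this page.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(pred, true):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def cross_validate(xs, concs, k=5):
    """Return (model RMSE, dummy RMSE) on log-concentration, averaged over k folds."""
    logs = [math.log(c) for c in concs]
    model_err, dummy_err = [], []
    for fold in range(k):
        test_idx = set(range(fold, len(xs), k))
        train = [i for i in range(len(xs)) if i not in test_idx]
        test = sorted(test_idx)
        # Log-linear model: fit a straight line to log(concentration).
        a, b = fit_line([xs[i] for i in train], [logs[i] for i in train])
        # Dummy model: predict the training mean everywhere.
        mean_log = sum(logs[i] for i in train) / len(train)
        true = [logs[i] for i in test]
        model_err.append(rmse([a + b * xs[i] for i in test], true))
        dummy_err.append(rmse([mean_log] * len(test), true))
    return sum(model_err) / k, sum(dummy_err) / k


# Synthetic data (assumption): concentration decays exponentially with distance.
rng = random.Random(0)
distance = [rng.uniform(0.0, 50.0) for _ in range(200)]
conc = [math.exp(3.0 - 0.05 * d + rng.gauss(0.0, 0.2)) for d in distance]

model_rmse, dummy_rmse = cross_validate(distance, conc)
print(f"log-linear RMSE: {model_rmse:.2f}, dummy RMSE: {dummy_rmse:.2f}")
```

Evaluating both models on held-out folds, rather than on the training data, is what makes the comparison with the dummy baseline meaningful.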


Interactive discussion

Status: open (until 26 Oct 2020)


Data sets

Input data for article "Large eddy simulation of the optimal street-tree layout for pedestrian-level aerosol particle concentrations"
Sasu Mikael Karttunen and Mona Liisa Vilhelmiina Kurppa

Model code and software

Datasets of Air Pollutants on Boulevard Type Streets and Software to Replicate Large-Eddy Simulations of Air Pollutant Concentrations Along Boulevard-Type Streets
Moritz Lange, Henri Suominen, Mona Kurppa, Leena Järvi, Emilia Oikarinen, Rafael Savvides, and Kai Puolamäki



Total article views: 267 (including HTML, PDF, and XML)
  • HTML: 214
  • PDF: 48
  • XML: 5
  • Total: 267
  • Supplement: 17
  • BibTeX: 3
  • EndNote: 3
Views and downloads (calculated since 31 Aug 2020)

Latest update: 28 Sep 2020
Short summary
This study aims to replicate computationally expensive high-resolution large-eddy simulations (LES) with regression models to simulate urban air quality and pollutant dispersion. The model development, including feature selection, model training and cross-validation, and detection of concept drift, has been described in detail. Of the models applied, log-linear regression shows the best performance. A regression model can replace LES unless high accuracy is needed.
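One simple way to flag inputs that fall outside a model's training domain is sketched below. This is not the drift-detection method used in the study, which is not described on this page; it is a hypothetical per-feature check on standard deviations, and the feature names, training rows, and threshold are all assumptions chosen for illustration.

```python
import statistics


def fit_domain(train_rows):
    """Store the per-feature mean and sample standard deviation of the training inputs."""
    columns = list(zip(*train_rows))
    return [(statistics.fmean(c), statistics.stdev(c)) for c in columns]


def outside_domain(row, domain, z_max=3.0):
    """True if any feature lies more than z_max standard deviations from its training mean."""
    return any(abs(v - m) > z_max * s for v, (m, s) in zip(row, domain))


# Hypothetical features (assumption): (wind speed in m/s, distance from road in m)
train = [(2.0, 5.0), (3.0, 10.0), (4.0, 15.0), (5.0, 20.0), (6.0, 25.0)]
domain = fit_domain(train)

print(outside_domain((4.5, 12.0), domain))   # input similar to the training data
print(outside_domain((15.0, 12.0), domain))  # wind speed far from the training data
```

A check of this kind only looks at the inputs; when it fires, the regression output should be treated with suspicion and a new LES run may be the safer option.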