Submitted as: model description paper | 31 Aug 2020
Review status: this preprint is currently under review for the journal GMD.
Machine learning models to replicate large-eddy simulations of air pollutant concentrations along boulevard-type streets
Moritz Lange1, Henri Suominen1, Mona Kurppa2, Leena Järvi2,3, Emilia Oikarinen1, Rafael Savvides1, and Kai Puolamäki1,2
Received: 18 Jun 2020 – Accepted for review: 20 Aug 2020 – Discussion started: 31 Aug 2020
Abstract. Running large-eddy simulations (LES) can be burdensome and computationally too expensive for practical applications, for example to support urban planning. In this study, regression models are used to replicate modelled air pollutant concentrations from LES in urban boulevards. We study the performance of regression models and discuss how to detect situations where the models are applied outside their training domain and their outputs cannot be trusted. Regression models from 10 different model families are trained, and a cross-validation methodology is used to evaluate their performance and to find the best set of features needed to reproduce the LES outputs. We also test the regression models on an independent testing dataset. Our results suggest that, in general, log-linear regression gives the best and most robust performance on new independent data. It clearly outperforms the dummy model, which predicts constant concentrations for all locations (mRMSE of 0.76 vs. 1.78 for the dummy model). Furthermore, we demonstrate that it is possible to detect concept drift, i.e., situations where the model is applied outside its training domain and a new LES run may be necessary to obtain reliable results. Regression models can be used to replace LES in estimating air pollutant concentrations, unless higher accuracy is needed. To obtain reliable results, however, it is important to do the model and feature selection carefully to avoid over-fitting and to use methods to detect concept drift.
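The comparison between a log-linear regression and a constant-prediction dummy model can be illustrated with a minimal sketch. It is not the authors' code: the synthetic data, the function names (`fit_log_linear`, `cv_rmse`) and the plain RMSE on log concentrations are all assumptions made for illustration, and the mRMSE metric reported above is not reproduced here.

```python
import numpy as np


def fit_log_linear(X, y):
    """Least-squares fit of log(y) on the features X, with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    return coef


def predict_log(coef, X):
    """Predicted log concentrations for the feature rows in X."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def cv_rmse(X, y, k=5, seed=0):
    """k-fold cross-validated RMSE on log(y) for (log-linear model, dummy)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    model_sq, dummy_sq = [], []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coef = fit_log_linear(X[train], y[train])
        log_true = np.log(y[test])
        model_sq.append((predict_log(coef, X[test]) - log_true) ** 2)
        # Dummy baseline: one constant (training mean of log y) everywhere.
        dummy_sq.append((np.log(y[train]).mean() - log_true) ** 2)
    return (float(np.sqrt(np.concatenate(model_sq).mean())),
            float(np.sqrt(np.concatenate(dummy_sq).mean())))


# Hypothetical synthetic data: 3 features, positive "concentrations".
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(500, 3))
y = np.exp(1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0.0, 0.3, 500))

model_rmse, dummy_rmse = cv_rmse(X, y)
print(f"log-linear: {model_rmse:.2f}, dummy: {dummy_rmse:.2f}")
```

Evaluating both models on the same held-out folds keeps the comparison fair: the dummy model sees only the training folds, just as the regression does.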
Input data for article "Large eddy simulation of the optimal street-tree layout for pedestrian-level aerosol particle concentrations"
Sasu Mikael Karttunen and Mona Liisa Vilhelmiina Kurppa, https://doi.org/10.5281/zenodo.3556287
Model code and software
Datasets of Air Pollutants on Boulevard Type Streets and Software to Replicate Large-Eddy Simulations of Air Pollutant Concentrations Along Boulevard-Type Streets
Moritz Lange, Henri Suominen, Mona Kurppa, Leena Järvi, Emilia Oikarinen, Rafael Savvides, and Kai Puolamäki, https://doi.org/10.5281/zenodo.3999302
This study aims to replicate computationally expensive high-resolution large-eddy simulations (LES) with regression models to simulate urban air quality and pollutant dispersion. The model development, including feature selection, model training and cross-validation, and detection of concept drift, has been described in detail. Of the models applied, log-linear regression shows the best performance. A regression model can replace LES unless high accuracy is needed.
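The idea of flagging inputs that fall outside the training domain can be sketched with a much simpler stand-in than the concept drift detection described in the paper: a per-feature range check. The function names `fit_domain` and `outside_domain`, the `margin` parameter and the toy data are hypothetical, and this check is not the authors' method.

```python
import numpy as np


def fit_domain(X_train, margin=0.0):
    """Per-feature [min, max] of the training data, optionally padded."""
    lo = X_train.min(axis=0)
    hi = X_train.max(axis=0)
    pad = margin * (hi - lo)
    return lo - pad, hi + pad


def outside_domain(X_new, domain):
    """True for each row with at least one feature outside the range."""
    lo, hi = domain
    return np.any((X_new < lo) | (X_new > hi), axis=1)


# Hypothetical toy data with two features.
X_train = np.array([[0.0, 1.0], [1.0, 3.0], [0.5, 2.0]])
domain = fit_domain(X_train)

X_new = np.array([[0.5, 2.5], [1.5, 2.0], [0.2, 0.5]])
flags = outside_domain(X_new, domain)
print(flags)  # rows 2 and 3 are flagged as outside the training range
```

A range check only catches inputs that leave the bounding box of the training features. It cannot detect a changed relationship between features and concentrations, so a flagged row is a hint that a new LES run may be needed, not proof that unflagged rows are reliable.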