Evaluating the use of Facebook’s Prophet model v0.6 in forecasting concentrations of NO2 at single sites across the UK and in response to the COVID-19 lockdown in Manchester, England

Time-series forecasting methods have often been used to mitigate some of the challenges associated with deploying chemical transport models at high resolution for use at local scales. In this study we deploy and evaluate Facebook’s Prophet model v0.6 in predicting hourly concentrations of nitrogen dioxide [NO2] over a 2 year period [2018-2019] across the UK’s Automatic Urban and Rural Network (AURN). Results indicate promising performance when comparing absolute values, diurnal trends and seasonality, with discrepancies increasing when the site is classified as having a larger contribution from regional sources and non-local sources. Using mobility and traffic volume data in the model fitting process allowed us to evaluate the ability of the model to forecast levels at two sites in Manchester where there were significant reductions in traffic levels during the COVID-19 lock-down, defined as a national state of restricted access. Prior to lock-down, hourly concentrations from the Prophet forecast agree significantly better with observations than predictions from the EMEP regional model. Despite the simplified approach of fitting to derived NO2-per-traffic volume over a 5 year period, trends in absolute NO2 reductions and diurnal profiles were captured well at Manchester Piccadilly. However, at a second site, in Sharston, we found that reliance on historical NO2-per-traffic volume resulted in errors in the prediction as the nature of local traffic changed under the COVID-19 lock-down, correlating with an increase in the Heavy Goods Vehicle [HGV] fleet relative to other forms of traffic. Ancillary meteorological information and predictions from the EMEP model enabled identification of significant contributions from regional sources during the lock-down period. These periods coincide with noticeable differences between measured and forecast values from Prophet.
Overall the Prophet model offers a relatively effective and simple way to make predictions about NO2 at local levels. The source code to reproduce and expand on the work presented in this paper is made openly available.

David Topping 1, David Watts 2, Hugh Coe 1, James Evans 3, Thomas J. Bannan 1, Douglas Lowe 4, Caroline Jay 5, and Jonathan W. Taylor 1

https://doi.org/10.5194/gmd-2020-270. Preprint. Discussion started: 2 September 2020. © Author(s) 2020. CC BY 4.0 License.


Introduction
The impacts of air quality on both human health and the environment remain important areas of research (Manisalidis et al., 2020; Rückerl et al., 2006; Anderson et al., 2012). For predicting how concentrations of pollutants change, there are now a number of modelling platforms that cover local to national scales (e.g., Silveira et al., 2019; Grell et al., 2004). These include those that are built around chemical and process models that reflect key sources, sinks and transformative processes, or are based on statistical or machine learning representations fit to empirical observations (e.g., Rybarczyk and Zalakeviciute, 2018).
The ability to forecast levels of pollution is, of course, important to a wide range of stakeholders, not least regional authorities who might want to implement adaptive traffic management strategies that minimise exposure of vulnerable groups. With the rise of internet-of-things [IoT] enabled devices, there has been growing debate around the usefulness of single point and distributed networks of sensors with varying provenance and fidelity (Lewis and Edwards, 2016), and methods that can utilise this information for 'hyper local' forecasting (Huang and Kuo, 2018).
Time-series forecasting methods have often been used to mitigate some of the challenges associated with deploying chemical transport models at high resolution for use at local scales. This does not remove the usefulness, or importance, of the latter methods in understanding wider source contributions and fate, but offers an alternative method for utilising historical local forecasts, and perhaps adapting to locally-driven forces that are important in understanding concentration and composition change. In addition, the boom in smart cities has resulted in often substantial investments in infrastructure for capturing a large number of ancillary data, which in theory can be useful for understanding changes in air-quality and potential routes for reducing exposure. Incorporating that data into time-series forecasting could enable the aforementioned stakeholders to develop systems for evaluating a series of interventions (Huang and Kuo, 2018).
There have been a number of studies developing and evaluating the use of time-series forecasting tools for air-quality.
These include recent demonstrations of Long Short-Term Memory [LSTM] methods, and new demonstrations of combining Convolutional Neural Networks [CNN] with LSTM methods, applied to PM2.5 forecasting (Huang and Kuo, 2018; Liu and Chen, 2020; Zhu et al., 2017; Khan et al., 2020; Bai et al., 2019). The availability of machine learning and statistical methods, as delivered through common programming languages, has improved significantly over the last decade. This includes widely used packages such as Scikit-Learn (Pedregosa et al., 2011) and Keras, to name a few. From a domain scientist's perspective, having the ability to prototype and tune any new method for forecasting, without requiring extensive training in building model architectures, is a bonus. These methods are similar in nature to the weather normalisation techniques developed by Grange and Carslaw (2019) and others.
In this study we apply and evaluate the use of a commonly used open-source time-series forecasting model, Prophet (Taylor and Letham, 2018), in forecasting concentrations of NO2 at sites across the UK. As we discuss in the following text, this includes incorporating data on traffic volume, supplemented by information on traffic type, to evaluate the ability to capture variations during the COVID-19 global pandemic. Overall the Prophet model offers a relatively effective and simple way to make predictions about NO2 at local levels.
Here we briefly describe the workflow for capturing the data from the Air Quality sites across the UK, fitting and applying the Prophet model and the EMEP regional model.

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily periodicity. Developed for Facebook by Taylor and Letham (2018), it has found applications across various domains, not least driven by the underlying rationale to develop a modular regression model with interpretable parameters that can be intuitively adjusted by analysts with domain knowledge about the time series (e.g., Makridakis et al., 2018; Žunić et al., 2020; Navratil and Kolkova, 2019). In this study we first use the 'vanilla' version 0.6 model variant in fitting to historical data, provided at hourly resolution, and evaluate its use in forecasting concentrations of nitrogen dioxide (NO2) a month in advance.
The internal cross-validation methods provided by Prophet are used to arrive at a set of performance metrics applied across all sites. Distinctly different from cross validation on standard regression or classification methods, this process includes fitting to historical data over a specified period, providing rolling forecasts 30 days in advance, and then repeating the process over a moving window of measurement periods. For example, in our study we look at predicting monthly forecasts at one hour resolution over the period 2018-2019. This includes fitting to the previous three years of data to forecast the following month, and then repeating the process by moving the start of the historical fit, and thus forecast, by 15 days. There are parameters one can tune when using Prophet which can improve performance. These include manually defining change-points, or points in time where the trend in measured concentrations changes significantly, and the weight given to those points. We refrain from performing a detailed sensitivity study of these parameters across all sites for a number of reasons, not least the local knowledge that might be needed to interpret significant interventions across regional authorities that should arise through detected change-points. We do, however, use two sites in Manchester to investigate model sensitivities, including the incorporation of traffic data to supplement the forecasting potential and the impact of pre-processing the measured data to reduce any skewness in the profile of NO2.

The three year (2016-2019) simulation period was split into 18 2-month periods, each preceded by a 7-day spin-up period to initialise the chemical fields. The 3-month simulation for 2020 (March-May) was run as a single period, with a 7-day spin-up period at the end of Feb 2020.
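The rolling forecast procedure described above can be sketched as a schedule of fit and forecast windows. The snippet below is a minimal illustration in plain Python, not the authors' code: the function name `rolling_windows` and the choice of 1 January 2018 as the first cutoff are assumptions, and a fixed three-year training window is assumed from the description in the text. In Prophet itself, the `cross_validation` helper in `fbprophet.diagnostics` takes `initial`, `period` and `horizon` arguments that play analogous roles.

```python
from datetime import datetime, timedelta


def rolling_windows(first_cutoff, last_forecast_end,
                    train_days=3 * 365, horizon_days=30, step_days=15):
    """Return (train_start, cutoff, forecast_end) tuples for rolling forecasts.

    Hypothetical helper: each window fits to `train_days` of history ending
    at `cutoff`, forecasts `horizon_days` ahead, then the cutoff is moved
    forward by `step_days`.
    """
    windows = []
    cutoff = first_cutoff
    horizon = timedelta(days=horizon_days)
    while cutoff + horizon <= last_forecast_end:
        train_start = cutoff - timedelta(days=train_days)
        windows.append((train_start, cutoff, cutoff + horizon))
        cutoff += timedelta(days=step_days)
    return windows


# Assumed evaluation period: forecasts covering 2018 and 2019.
windows = rolling_windows(datetime(2018, 1, 1), datetime(2020, 1, 1))
for train_start, cutoff, forecast_end in windows[:3]:
    print(train_start.date(), cutoff.date(), forecast_end.date())
```

With these assumed dates the schedule contains 47 forecast windows, each 30 days long and offset from the previous one by 15 days, so consecutive forecasts overlap by half a window.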

Data
The Air Quality data is taken from the Automatic Urban and Rural Network (AURN), hosted by the Department for Environment, Food and Rural Affairs (https://uk-air.defra.gov.uk/networks/network-info?view=aurn). The AURN is the UK's largest automatic monitoring network and is the main network used for compliance reporting against the Ambient Air Quality Directives. As can be found on the AURN web-page, this network provides measurements of oxides of nitrogen (NOx), sulphur dioxide (SO2), ozone (O3), carbon monoxide (CO) and particles (PM10, PM2.5), depending on the year and site studied. The downloaded data are also combined with meteorological data on wind speed, direction and temperature provided by the UK Met Office. The code used for downloading this data for all sites is discussed in section 4 and delivers parameters at hourly resolution. In the model fitting we use concentrations of NO2 and all meteorological variables by default. Sites in the AURN network are classified into a small number of types to reflect the pollution environment in the vicinity of the sampling location.
These are, briefly, combinations of the following categories listed in Table 1.
In this study, we evaluate data from the following station types between the years 2015 to 2020, with the number of sites given next to each category in Table 2.
Traffic data for this study is provided by Transport for Greater Manchester, the local government body responsible for delivering Greater Manchester's transport strategy and commitments (https://tfgm.com/about-tfgm). Through use of their cloud-hosted data platform, C2 (https://www.drakewell.com/), data from their estate of Intelligent Transport Systems (ITS) could be readily accessed. Included in this estate is a network of Automatic Traffic Count (ATC) sites, monitoring vehicles using Induction Loop Detectors (ILDs) in the carriageway, which provides vehicle flow, speed and classification data in real-time to the platform. Approximately 120 sites constitute this network of ATC sites, strategically located across main routes and the [...] (MAC) address, which is then randomised and encrypted. MAC addresses are collected across the network with an associated timestamp, allowing journey times to be generated across the Highway Network, utilising the C2 platform to filter outliers. A validation exercise completed internally at TfGM estimates that this method captures between 8 % and 15 % of vehicles making a journey between two sensors, depending on the route. This means that there is an error in traffic volume inferred from these measurement points, though we demonstrate in section 3.2 that using this metric of mobility, to infer a relative traffic volume contribution to levels of NO2, captures the decreasing trend in measured NO2 well. For the purpose of this study we use data from monitoring sites as close as possible to the Manchester Piccadilly and Sharston AURN sites. We supplement the analysis for Sharston with data on vehicle types from an ATC measurement node.

COVID-19
The UK Government responded to the COVID-19 pandemic with a 'contain, delay, research and mitigate' strategy (Health Foundation, 2020). On 12th March the response moved from contain to delay, with the announcement of social distancing measures for people with COVID-19 symptoms, and on 16th March people were advised to 'stop non-essential contact with others and to stop all unnecessary travel'. Schools, bars, restaurants, non-essential shops, leisure activities and many work premises were told to close from 20th March. There has since been a gradual relaxing of measures, with people being encouraged to return to work from 13th May, and non-essential shops allowed to open on 1st June. Bars and restaurants were able to open on 4th July. We refer to this national state of restricted movement as lock-down.

Site analysis for years 2018 and 2019
Figure 1 displays the mean percentage deviation when comparing measured and predicted hourly concentrations of NO2 over a forecast window of 1 month, as a function of AURN site type, when fit to 3 years of historical data. Over the years 2018 and 2019 this covers 35K data-points for comparison at each site. Whilst the profile for each site type can have multiple modes, the smallest error is found at Urban Traffic sites, followed by those classed as Urban Background. As shown in Table 1, these sites dominate the number studied, providing data from 85 sites out of a total of 114. The errors associated with Rural Background peak at near 40 percent difference, as compared with just under 20 percent for Urban Traffic sites. Despite the difference in the number of sites used, this may be a result of the implicit capture of local influences on the measured NO2 signal, predominantly from traffic, as compared with both Urban and Rural Background which might have a larger relative contribution from a range of both local non-traffic and regional background sources. There are only two Suburban Industrial sites, where results reveal two distributions offering some insight into the range of errors for a single site versus the error between sites. These statistics were generated using the absolute value of predicted NO2, but Prophet also provides a forecast uncertainty range. This is generated from uncertainty in the trend, uncertainty in the seasonality estimates, and additional observation noise. Figure 2 displays the percentage of observations that were within model uncertainty bounds, with all sites having a maximum close to 80 percent. As previously noted, there seems to be a variable performance according to site type, and Figure 3 [...] tendency for some level of over-prediction and the inter-quartile range shows the relatively tighter distribution of values from the forecasts.
As has already been stated, understanding the performance of individual sites requires local knowledge, but these results demonstrate promising capability. At the Piccadilly site, EMEP predictions are consistently lower than measured values, with discrepancies increasing during the middle of the day. The same is true at the Sharston site, though discrepancies are reduced. This helps to provide context in section 3.2 when comparing values during the COVID-19 lock-down.
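The two site-level statistics discussed in this section, the mean percentage deviation and the percentage of observations falling within the forecast uncertainty bounds, can be illustrated with a short sketch. The exact definitions used in the study are not given in the text, so the signed form of the deviation, the function names and the toy values below are all assumptions. In Prophet output the bounds would correspond to the `yhat_lower` and `yhat_upper` columns of the forecast.

```python
def mean_percentage_deviation(observed, predicted):
    """Mean of 100 * (predicted - observed) / observed.

    Assumed definition (signed, so positive values mean over-prediction);
    hours with a zero or negative observation are skipped to avoid
    division by zero.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted) if o > 0]
    return 100.0 * sum((p - o) / o for o, p in pairs) / len(pairs)


def coverage(observed, lower, upper):
    """Percentage of observations lying inside [lower, upper]."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return 100.0 * inside / len(observed)


# Toy hourly NO2 values, purely illustrative.
observed = [20.0, 40.0, 10.0, 50.0]
predicted = [22.0, 36.0, 12.0, 50.0]
lower = [15.0, 30.0, 11.0, 40.0]
upper = [30.0, 45.0, 20.0, 60.0]

mpd = mean_percentage_deviation(observed, predicted)
cov = coverage(observed, lower, upper)
print(f"mean percentage deviation: {mpd:.1f} %")
print(f"within uncertainty bounds: {cov:.1f} %")
```

In this toy case the deviation is 5 % and three of the four observations lie within the bounds (75 %). A signed deviation makes any tendency to over-predict visible, whereas an absolute form would measure error magnitude only.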

The multi-modal behaviour of percentage deviation by site type may be indicative of local interventions not captured by the default change-points used during the fitting process. Specifically, Prophet is designed to detect change-points of the sampled trend in observations. One can alter the weighting given to such changes. A manual analysis on an individual site level might identify significant changes in local activity that would be expected to change the seasonality in measured NO2 and thus define change-points that need to be captured during the fitting process. We also see in Figure 3 that predictions tend to follow a normal distribution, which leads to a higher deviation between measured and predicted values when the measured values are skewed. To investigate potential improvements from giving higher weighting to automatically detected change-points and pre-processing of the target variable, Figure 6 [...] Likewise we see that this variant produces a distribution of values that better matches the measured values. However, we still see that the predicted distribution is narrower than the measured values, leading to over-estimation at lower concentrations and under-estimation at higher concentrations.
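As an illustration of pre-processing a skewed target variable, the sketch below compares the sample skewness of a right-skewed set of concentrations before and after a log transform, and shows the back-transform that would be applied to forecasts. The text does not state which transformation was used in the study, so the `log1p` choice, the function names and the toy values are assumptions made purely for illustration. The weighting given to detected change-points is controlled in Prophet through the `changepoint_prior_scale` argument, which is not exercised here.

```python
import math


def skewness(values):
    """Population sample skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def to_log(values):
    """Assumed pre-processing step: log(1 + x), safe for zero concentrations."""
    return [math.log1p(v) for v in values]


def from_log(values):
    """Inverse transform, applied to forecasts made in log space."""
    return [math.expm1(v) for v in values]


# Toy right-skewed NO2 concentrations, purely illustrative.
no2 = [5.0, 8.0, 10.0, 12.0, 15.0, 20.0, 30.0, 60.0, 120.0]
no2_log = to_log(no2)

skew_raw = skewness(no2)
skew_log = skewness(no2_log)
recovered = from_log(no2_log)
print(f"skewness before: {skew_raw:.2f}, after log transform: {skew_log:.2f}")
```

Fitting in the transformed space and back-transforming the forecast keeps predicted concentrations positive. A transformation of this kind reduces skewness, though it need not remove the narrowing of the predicted distribution noted above.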
For any given location, attributing the change in measured NO2 to changes in traffic volume, for example, is implicitly captured in the use of detected change-points during the fitting process. From an urban planning perspective, being able to forecast expected changes in NO2 as a result of significant traffic interventions is useful. However, the measured signal implicitly captures the contribution from local and regional backgrounds, so this requires a method for including traffic data as an additional variable during the fitting process. In the following section we study changes measured in air quality and traffic during the COVID-19 lock-down in Manchester.

Incorporating traffic data and response to COVID-19
There have been many approaches used to estimate the fractional contribution to NO2 from traffic within forecasting techniques (e.g., Carslaw and Beevers, 2005). As Zhang et al. (2013) note, the proximity to roadways and traffic levels are sometimes used as proxies since, in general, NO2 levels decline with distance from a highway. Other methods include statistical interpolation (e.g., Jerrett et al., 2007), line dispersion models (e.g., Bellander et al., 2001) and integrated emission-meteorological models (e.g., Frohn et al., 2002). In this study we use the simple approximation of using Prophet to predict the ratio of measured NO2-per-traffic volume, assuming traffic from an individual site is representative of the dominant source, allowing us to then use the traffic volume data to calculate changes in NO2 directly. This simple approach relies on a number of assumptions, including the ability of Prophet to detect significant changes in trends from both historical interventions and changes in source profiles from local and regional sources of NO2. In the following section, we demonstrate the usefulness of this approach when focusing on two sites in Manchester, UK before and during the COVID-19 lock-down, using traffic data provided by Transport for Greater Manchester (TfGM).

[...] Time sensors are subject to some error when interpreting vehicle traffic volume. Nonetheless, we evaluate the use of said data in the fitting process and assume this mobility metric is representative of local traffic. Following Piccadilly, the Sharston site used traffic volume data 40 m away to fit the NO2-per-volume data, whilst vehicle type data from an ATC site ∼1.7 km away was also used to interpret results, assuming that site was indicative of behaviour of the broader industrial and transport hub.

The significant increases in measured NO2 in late March, as well as early- and mid-April, are attributable to periods of slack winds and easterly flow from continental Europe.
We investigated the variation in air-mass type throughout this period by running 72-hour back-trajectories using the Hybrid Single Particle Lagrangian Integrated Trajectory Model (HYSPLIT) v4.2.0 (Draxler and Hess, 1998). Three-hourly back-trajectories were initiated from a release point ∼500 m above south Manchester, roughly midway between the Manchester Piccadilly and Sharston AURN sites. Bearing in mind the uncertainty associated with individual trajectories, we used the Ward's method clustering described by Stunder (1996) to classify air-mass origin into broad regions, rather than to try to identify specific sources. The spatial variance of each cluster pair is calculated, defined as the sum of the squared distances between the end points of the potential new cluster's component trajectories and the mean of the trajectories in that cluster. The pair with the lowest spatial variance is merged into a new cluster with each iteration. As the clusters grow larger with each iteration, the total spatial variance (the sum of the spatial variance of all clusters) grows larger, and eventually reaches a point where it increases quickly as dissimilar clusters are merged, reaching a maximum as all clusters are merged into one. The ideal number of clusters is just before this sharp increase; in this case 6 clusters. Figure 10 shows the results of the clustering technique in terms of the broad region of each cluster and a time series of the cluster to which each trajectory was assigned. For most of March, the clusters from the north Atlantic and North Sea were dominant, and these were associated with clean background air-masses. Periods in late March, early April and mid April were dominated by more easterly and southerly air masses, with slack winds common (represented here by the 72-hour back-trajectory end points not being as far from the source).
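The clustering procedure can be sketched on a toy set of trajectories. The snippet below is a minimal illustration and not the HYSPLIT implementation: trajectories are short lists of (x, y) points treated as Cartesian coordinates, the function names are hypothetical, and the merge criterion is the smallest increase in total spatial variance, which is the usual statement of Ward's criterion. It records the total spatial variance after each merge so that the sharp increase used to pick the number of clusters can be seen.

```python
from itertools import combinations


def spatial_variance(cluster):
    """Sum of squared distances between each trajectory's points and the
    cluster-mean trajectory, taken point by point along the trajectory."""
    n = len(cluster)
    total = 0.0
    for k in range(len(cluster[0])):
        mean_x = sum(t[k][0] for t in cluster) / n
        mean_y = sum(t[k][1] for t in cluster) / n
        total += sum((t[k][0] - mean_x) ** 2 + (t[k][1] - mean_y) ** 2
                     for t in cluster)
    return total


def ward_merge_history(trajectories):
    """Merge clusters one pair at a time; return (n_clusters, total variance)."""
    clusters = [[t] for t in trajectories]
    history = [(len(clusters), 0.0)]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            increase = (spatial_variance(clusters[i] + clusters[j])
                        - spatial_variance(clusters[i])
                        - spatial_variance(clusters[j]))
            if best is None or increase < best[0]:
                best = (increase, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        history.append((len(clusters), sum(spatial_variance(c) for c in clusters)))
    return history


# Toy back-trajectories: two arriving from the west, two from the east.
trajectories = [
    [(0.0, 0.0), (-5.0, 0.0)],
    [(0.0, 0.0), (-5.0, 1.0)],
    [(0.0, 0.0), (5.0, 0.0)],
    [(0.0, 0.0), (5.0, 1.0)],
]
history = ward_merge_history(trajectories)
for n_clusters, tsv in history:
    print(n_clusters, tsv)
```

In this toy case the total spatial variance stays at or below 1.0 down to two clusters and jumps to 101.0 when the westerly and easterly groups are forced together, so two clusters would be retained. The same reasoning applied to the real trajectories gives the six clusters reported above.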
The slack, southerly winds in the cluster from the southern UK were particularly associated with higher levels of NOx in early April. Figure 9 (b) displays the hourly data from the Sharston site. In this case, whilst a decrease in measured NO2 is seen, this is smaller than the decrease in NO2 observed at the Piccadilly site, despite the decrease in traffic volume inferred from the nearby site. This is also confirmed in Figure 11 (b) which shows a marked under-prediction of concentrations in the early hours and towards late evening. To investigate this further we analysed data from an ATC site located ∼1.7 km away from the AURN and original traffic site. This site, whilst further away, provides data on segregated vehicle types. In Figures 12(a) [...] In these plots we see a significant shift in the fractional contribution of HGVs in the early hours. It is plausible this may be one reason behind the under-predictions of NO2 concentrations when fitting to inferred historical trends in NO2-per-traffic, though we repeat we are assuming this site is representative of the local traffic near the AURN site and using the relatively higher emission factors of NO2 from HGVs (Carslaw and Rhys-Tyler, 2013).
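The NO2-per-traffic approximation used in this section can be sketched as two steps: form the historical ratio of NO2 to traffic volume, then multiply a forecast of that ratio by the observed traffic volume. In the sketch below the mean historical ratio stands in for the Prophet forecast of the ratio, and the function names and toy values are assumptions, so this illustrates the arithmetic only and is not the study's implementation.

```python
def per_traffic_ratio(no2, volume):
    """Hourly NO2 divided by hourly traffic volume; hours without traffic
    are dropped because the ratio is undefined there."""
    return [n / v for n, v in zip(no2, volume) if v > 0]


def reconstruct_no2(ratio_forecast, volume):
    """Convert a forecast NO2-per-traffic ratio back to an NO2 concentration."""
    return [r * v for r, v in zip(ratio_forecast, volume)]


# Toy historical hours: NO2 and traffic counts, purely illustrative.
hist_no2 = [30.0, 60.0, 45.0]
hist_volume = [300.0, 600.0, 450.0]
ratios = per_traffic_ratio(hist_no2, hist_volume)

# Stand-in for the Prophet forecast of the ratio: the historical mean.
mean_ratio = sum(ratios) / len(ratios)

# Assumed lock-down traffic at 40 % of the historical volume.
lockdown_volume = [120.0, 240.0, 180.0]
predicted_no2 = reconstruct_no2([mean_ratio] * len(lockdown_volume), lockdown_volume)
print([round(p, 1) for p in predicted_no2])
```

Because the ratio is carried over from historical conditions, the reconstructed NO2 falls in direct proportion to traffic volume. If the emissions per vehicle rise, for example through a larger share of HGVs, the true ratio increases and this approach under-predicts, which is consistent with the behaviour reported at Sharston.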
[...] nitrogen dioxide [NO2], over 30 day forecasts, over a 2 year period [2018-2019] across the UK's Automatic Urban and Rural Network (AURN). Results indicate promising performance when comparing absolute values, diurnal trends and seasonality, with discrepancies increasing when the site is classified as having a larger contribution from regional sources and non-local sources.
Focusing on two sites in Manchester, we find the use of transport data in the fitting process could capture the measured decrease in NO2 well during the COVID-19 lock-down. Results at these sites also demonstrate improved performance over the standard regional model EMEP during 2018-2019, where some improvements in performance can be found by changing change-point prior scales and pre-processing the measured data. Discrepancies between forecasts that incorporate traffic and measured values could arise from errors associated with a number of factors, but data on vehicle traffic type suggests this could also be due to an increase in the ratio of Heavy Goods Vehicles [HGVs]. Despite the systematic error associated with EMEP predictions, combining regional and time-series forecasts helps identify periods of regional background influence that might be otherwise hidden in the inferred local traffic contributions in normal conditions.
There are a number of potential improvements that could be made in taking this work forward. As we already note, understanding the impact of any previous interventions at a local level might help better understand the reported errors. The study applied to Manchester might be replicated elsewhere, should traffic data be available. We note the errors associated with journey time to traffic volume inference. We have not used data on boundary layer height, or ancillary activity data during the fitting process. However, ongoing work is looking at the value of incorporating a range of data products.
Following the rationale behind its design, the Prophet model offers a relatively simple and effective solution for domain scientists, local authorities and transport authorities to predict the impact of measures to reduce traffic on air quality. The open access nature of the Prophet model plus the increasing availability of traffic data from transport authorities makes this kind of prediction possible across many towns and cities. It also offers a local level resolution for NO2 predictions that has been hard to achieve in the past, but which is essential for decision-makers seeking to reduce traffic and improve air quality in urban areas through geographically targeted interventions.
Code and data availability. In this paper we combine scripts to download the AURN data and then fit/evaluate the Prophet model to said data. The current version of these scripts is available from the project website https://github.com/loftytopping/Prophet_forecasting_AQ under the licence GPL v3.0. The exact version of the scripts used to produce the results used in this paper is archived on Zenodo (https://zenodo.org/record/3978645). This includes the EMEP generated and transport data from TfGM. For queries regarding additional data requests from TfGM please contact David Watts, david.watts@tfgm.com. This repository also contains a Conda environment .yml file for replicating the collection of packages needed to repeat the analysis, including the Prophet model v0.6. The project website for the Prophet model can be found at https://facebook.github.io/prophet/. Version 0.6 was used in this study. The project website for the EMEP model can be found at https://github.com/metno/emep-ctm. Version 4.33 2019 was used for this study. The operational scripts for the exact setup of [...]

Brief description

Site classification: Description
Urban area (U): Continuously built-up urban area meaning complete (or at least highly predominant) building-up of the street front side by buildings with at least two floors or large detached buildings with at least two floors. Urban sites should measure air quality which is representative of a few km2.
Suburban area (S): Largely built-up urban area, reflecting contiguous settlement of detached buildings of any size with a building density less than for 'continuously built-up' area. Suburban sites should measure air quality which is representative of some tens of km2.
Rural area (R): Sampling points targeted at the protection of vegetation and natural ecosystems shall be sited more than 20 km away from agglomerations and more than 5 km away from other built-up areas, industrial installations or motorways or major roads, so that the air sampled is representative of air quality in a surrounding area of at least 1 000 km2.
Traffic station (T): Located such that its pollution level is determined predominantly by the emissions from nearby traffic (roads, motorways, highways). Sampling probes shall be at least 25 m from the edge of major junctions and no more than 10 m from the kerbside.
Industrial station (I): Located such that its pollution level is influenced predominantly by emissions from nearby single industrial sources or industrial areas with many sources. Air sampled at industrial sites is representative of air quality for an area of at least 250 m × 250 m.
Background station (B): Located such that its pollution level is not influenced significantly by any single source or street, but rather by the integrated contribution from all sources upwind of the station. At rural background sites, the sampling point should not be influenced by agglomerations or industrial sites in its vicinity, i.e. sites closer than five kilometres.