Atmospheric aging of small-scale wood combustion emissions (model MECHA 1.0) – is it possible to distinguish causal effects from non-causal associations?

Primary emissions of wood combustion are complex mixtures of hundreds or even over a thousand compounds, which pass through a series of chemical reactions and physical transformation processes in the atmosphere ("aging"). This aging process depends on atmospheric conditions, such as the concentration of atmospheric oxidizing agents (OH radical, ozone, and nitrate radicals), humidity, and solar radiation, and is known to strongly affect the characteristics of atmospheric aerosols. However, only a few models are able to represent the aging of emissions during their lifetime in the atmosphere. In this work, we implemented a model (Model for aging of Emissions in environmental CHAmber, MECHA v1.0) to describe the evolution by a differential equation system. The model performance was first evaluated using two different simulated datasets. The purpose of the evaluation was to investigate the ability of the model 1) to find the correct relationships between the variables in the dataset and 2) to reproduce the evolution of the variables in time accurately. Subsequently, the model was applied to wood combustion exhaust in the atmosphere, based on a dataset from smog chamber experiments. The evaluation on simulated datasets served as a basis for the conclusions drawn from the modeled aging of the residential wood combustion emission. We found that the model was able to reproduce the evolution of the variables in time reasonably well. By using state-of-the-art detection algorithms for causal structures, we could unveil a large number of relationships between measured variables. However, as the emission data are complex in nature, with multiple processes interacting with each other, for many relationships it was not possible to say whether there was a causal pathway or whether the variables were merely covarying.
https://doi.org/10.5194/gmd-2020-13 Preprint. Discussion started: 24 January 2020. © Author(s) 2020. CC BY 4.0 License.

This study serves as the first step towards a comprehensive model for the description of the evolution of the whole emission in both the gas and particle phases during atmospheric aging. We present the challenges faced in this kind of modeling and discuss possible improvements and their expected importance for the model.

Introduction
Small-scale wood combustion is a common method for residential heating and has been identified as a substantial contributor to ambient levels of particulate matter (PM) in several European areas (Cordell et al., 2016; Glasius et al., 2018; Hovorka et al., 2015; Qadir et al., 2014; Reche et al., 2012). Wood combustion-derived aerosol, in particular from manually fired logwood stoves, contains substantial amounts of several air pollutants, such as black carbon (BC), polycyclic aromatic hydrocarbons (PAH), volatile organic compounds (VOC), and CO (Orasche et al., 2012; Tissari et al., 2008), with consequences for Earth's radiative forcing (Frey et al., 2014), cloud formation (Dusek et al., 2011), and human health (Kocbach Bølling et al., 2009). After release into the atmosphere, wood combustion emissions are immediately transformed ("atmospheric aging") in a complex process involving multiphase chemistry, leading to oxidation and functionalization of particulate and gaseous pollutants (Hallquist et al., 2009; Pöschl, 2005) and consequent secondary organic and inorganic aerosol (SOA and SIA) formation. The formation of SOA has been reported to double or triple the concentration of the particulate organic aerosol emitted by wood combustion after less than 1.5 days of photochemical aging (Bertrand et al., 2017; Bruns et al., 2015; Tiitta et al., 2016). Despite several known reaction pathways for single emission constituents or compound classes, such as PAH (Keyte et al., 2013), interactions between all constituents of a real-world combustion aerosol complicate the prediction of secondary aerosol formation and its physico-chemical properties.
Smog chambers or environmental chambers have been used to study the atmospheric transformation of single VOCs or combustion emissions for several decades, offering a controlled environment and conditions similar to the atmosphere to study the effects of, e.g., photochemical age, relative humidity, NOx, addition of seed aerosol, and oxidizing agents on emission transformation (Hinks et al., 2018; Lambe et al., 2015; Li et al., 2016; Platt et al., 2013; Tiitta et al., 2016). Previous studies have demonstrated that atmospheric aging alters the toxicological effects of wood combustion aerosols (Künzi et al., 2013; Nordin et al., 2015) and changes their optical properties as well (Kumar et al., 2018; Martinsson et al., 2015), with implications for climate and human health. Oxidants such as OH, O3, and the nitrate radical (NO3) are known to have an important role in SOA chemistry (Hallquist et al., 2009).
OH is the most important radical during photochemical aging, whereas NO3 and O3 are more important during dark aging (Atkinson, 2000; Tiitta et al., 2016). SOA from photochemical and dark aging differs in chemical composition. However, both aging mechanisms are important and need to be taken into account when evaluating the overall climate implications of aging processes (Vakkari et al., 2014). Nitrate radical chemistry is an efficient SOA formation mechanism and also an important pathway for the production of organic nitrates, serving as a NOx sink in the atmosphere (Kiendler-Scharr et al., 2016).
Only a few attempts have been made to capture the evolution of emissions in both the gas and particle phases. Modeling approaches for SOA formation and the evolution of the compounds leading to SOA can be divided into at least two groups. Some models, such as the volatility basis set (VBS), aim to describe one or a few features of the emission. In the VBS approach (Donahue et al., 2006), the evolution is modeled based on the volatilities of the compounds, considering the equilibrium concentrations of different compounds in the gas and particle phases and how different factors, such as chemical and physical reactions, affect the equilibrium state. Another approach to model SOA is the family of explicit chemical models. Several such models exist, for example the Master Chemical Mechanism (MCM) (Jenkin et al., 1997; Saunders et al., 2003) and GECKO-A (Aumont et al., 2005), which comprise large numbers of chemical reactions with pre-determined reaction coefficients to replicate the evolution of the system. MCM has recently been applied to wood burning emissions by running the model with the most important primary emission species to model the evolution of gas-phase species, using a smaller selection of reactions from the whole system (Coggon et al., 2019). The Statistical Oxidation Model (SOM) lies between single-property models and explicit chemical models: it uses several properties of the compounds (volatility, number of carbon and oxygen atoms) to predict SOA mass and the atomic O:C ratio (Cappa and Wilson, 2012).
All approaches, volatility-based, SOM, and explicit chemical modeling, are based on differential equations. These equations describe the evolution of some quantity of interest, which can be transformed into the end product of the model, here the amount of SOA. In the volatility approach this quantity is the volatility distribution of the particles; in chemical modeling it is the amounts of substances in the system.
In this kind of system, the changes of gases and particles in time can be thought of as a consequence of reactions occurring in the system. Reactions are caused by the initial compounds and certain properties of those compounds. Reaction coefficients and the concentrations of the initial compounds determine the number of reactions occurring per unit time. Compounds can be thought of as causally related, as they are related to each other through chemical reactions: initial compounds cause an increase in the products and a decrease in their own concentrations.
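This mass-action view can be written as a small ODE system. For a single generic bimolecular reaction A + B → C with rate coefficient k (an illustration of the principle, not a specific reaction from the dataset):

```latex
\frac{d[\mathrm{A}]}{dt} = \frac{d[\mathrm{B}]}{dt} = -k\,[\mathrm{A}][\mathrm{B}],
\qquad
\frac{d[\mathrm{C}]}{dt} = k\,[\mathrm{A}][\mathrm{B}]
```

The initial compounds A and B thus "cause" the product C in exactly the sense used above: their concentrations appear on the right-hand side of the equation for C, while their own time derivatives are negative.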
In this study, we described the connections between primary emissions and aging conditions by using a causal model (Pearl, 2009). The aim of this study was to evaluate whether it is possible to learn the causal relations of the variables in a system of atmospheric aging without explicit prior information about these relations. On that account, we created a model for the complex interactions between aerosol constituents in both the gas and particle phases (Hartikainen et al., 2018; Tiitta et al., 2016).
Here, we present the first version of our model, which is able to represent dependencies between the variables observed in the chamber studies of Tiitta et al. (2016). In addition, we discuss the issues in data processing related to causal modeling of wood combustion aging and, more generally, of emission aging datasets, by applying the model to artificial, simulated datasets and evaluating its accuracy.

Data and methods
The chamber experiments to study the atmospheric transformation of residential wood combustion emissions were conducted in the ILMARI environmental chamber (Leskinen et al., 2015) at the University of Eastern Finland, as described by Tiitta et al. (2016). Briefly, five chamber experiments were conducted using a modern heat-storing masonry heater (Reda et al., 2015) as the emission source. The masonry heater was operated with spruce logs, using both fast ignition and slow ignition for initiating the combustion experiment (detailed procedure described in Tiitta et al., 2016) to adjust the VOC-to-NOx ratio. In each experiment, a partial sample of the combustion exhaust was diluted and fed into the chamber for 35 min, including the ignition, flaming combustion, and residual char burning phases of logwood combustion. This was followed by stabilization of the sample, after which the oxidative aging of the sample was initiated by feeding ozone into the chamber.
Both dark and UV-light aging experiments were conducted, representing evolution at night-time and daytime in the atmosphere, respectively. We used data from two different types of experiments. In the first one, UV lights (blacklight lamps, 350 nm) were switched on immediately after feeding ozone into the chamber, and the wood combustion emission was photochemically aged for four hours (Experiments 4B and 5B, Tiitta et al., 2016), which, based on the measured OH-radical exposure, represented 0.6–0.8 days of photochemical aging in the atmosphere. These experiments simulate daytime ambient conditions, where OH radicals dominate the oxidative aging of emissions. In the second type of experiment, aging was conducted at first without UV radiation, and after 4 hours of dark aging the UV lights were switched on (Experiments 2B and 3B, Tiitta et al., 2016). The dark aging period represents night-time ambient conditions, where ozone and nitrate radicals dominate the oxidative aging of emissions (Brown and Stutz, 2012; Tiitta et al., 2016).
The evolution of gases and particles in the chamber was followed by comprehensive on-line measurements. Gas-phase organic and inorganic compounds were measured by single gas analyzers and a Proton-Transfer-Reaction Time-of-Flight Mass Spectrometer (PTR-TOF 8000, Ionicon Analytik, Innsbruck, Austria). The PTR dataset has been described in detail by Hartikainen et al. (2018). The particle size distribution was measured by a Scanning Mobility Particle Sizer (SMPS, CPC 3022, TSI) and the chemical composition and mass concentrations by a Soot Particle Aerosol Mass Spectrometer (SP-HR-TOF-AMS; Onasch et al., 2012). Dark and UV-induced aging were treated separately in further data analysis, because intensive UV radiation affects the transformation of the emission through photochemical reactions and, further, the dependence structure of the variables. The aging type can affect, for example, the amount of SOA formed, the reactions of OH and gaseous nitrous compounds, and their formation products (Ortega et al., 2013; Vakkari et al., 2014). We had two experiments with different ignition techniques, one with fast ignition and one with slow ignition, for both dark (2B and 3B) and UV-light-induced aging (4B and 5B). Both experiments with the same aging type were used as a single dataset for the model.
Ignition type affects the composition of the emission from wood combustion (Hartikainen et al., 2018;Tiitta et al., 2016). As we used both experiments with different ignition types as a single dataset, we assumed that the reactions in the chamber are similar in both datasets and only the concentrations of the compounds are different.

Data processing
Here we present step by step how the data were processed before modeling. In this study, the data processing consists of four steps, listed here and described in more detail in Sect. 2.1.1-2.1.4:
1. Using dimension reduction methods for the particle size distribution and the mass spectrometer datasets.
2. Forming an approximate time series for OH based on the measurement of butanol-d9.
3. Adjusting the time resolution of the data to be the same in the datasets from every measurement instrument.
4. Filtering the time series to reduce the amount of measurement error.

Dimension reduction of SMPS, PTR, and AMS datasets
The SMPS provided very detailed measurements of the size distribution, consisting of over 100 size bins by particle (mobility) diameter, from 14.1–14.6 nm to 710.5–736.5 nm. For this study, we used a simpler representation of the size distribution: the wall-loss-corrected (see Tiitta et al., 2016) SMPS time series were grouped into four larger size classes. These four classes roughly represent atmospherically relevant particle modes: under 25 nm (nucleation mode), 25 to 100 nm (Aitken mode), 100 to 300 nm (accumulation mode), and over 300 nm (coarse mode). The measured size bins were summed to form the number concentrations of particles in the four classes.
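The regrouping is a simple sum over bins falling within each class. The sketch below illustrates it in Python with synthetic data (the bin grid and counts are hypothetical stand-ins for the real instrument output):

```python
import numpy as np

# Illustrative SMPS matrix: rows = time points, columns = size bins.
# The bin midpoint diameters (nm) are hypothetical stand-ins for the real
# instrument grid (which ran from ~14 nm to ~730 nm).
diameters = np.logspace(np.log10(14.0), np.log10(730.0), 110)
counts = np.random.default_rng(0).poisson(50, size=(5, diameters.size))

# Class edges in nm: nucleation (<25), Aitken (25-100),
# accumulation (100-300), and over 300 nm.
edges = [0.0, 25.0, 100.0, 300.0, np.inf]
classes = np.stack([
    counts[:, (diameters >= lo) & (diameters < hi)].sum(axis=1)
    for lo, hi in zip(edges[:-1], edges[1:])
], axis=1)

# The total number concentration is preserved by the regrouping.
assert np.array_equal(classes.sum(axis=1), counts.sum(axis=1))
```

Because the four classes partition the diameter axis, no particles are lost or double-counted in the regrouping.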
The mass spectra of the AMS and PTR contain over a hundred variables each. In order to form a model that provides a simple representation of the emission's evolution, we do not want to represent the dependencies of all single ions of the mass spectra. Instead, we try to represent the evolution of the spectra by combining masses into a smaller number of variables, called factors, by applying dimension reduction techniques to the gas- and particle-phase measurements from the PTR and AMS, respectively. Compounds in the same factor are assumed to behave similarly, as the factors are formed using the dependencies in the dataset. However, this does not mean that all compounds in the same factor react with the same compounds, such as oxidizers, which can make the interpretation of the factors in the model more difficult.
Unfortunately, it is not necessarily self-evident which dimension reduction method should be applied to the dataset of interest. Isokääntä et al. (2019) have studied the importance of selecting the appropriate dimension reduction method for atmospheric measurement datasets produced by mass spectrometers in chamber studies. They found that fragmentation of compounds and rapid changes in chemical composition (e.g. caused by turning on UV lights) should be taken into account when selecting the dimension reduction technique. They suggest that for datasets where fragmentation is not such a problem, exploratory factor analysis (EFA) or principal component analysis (PCA) might extract those fast changes from the data more efficiently than, e.g., positive matrix factorization (PMF).
Our aim was to compare multiple experiments in further data analysis, and therefore the dimension reduction techniques were applied to the PTR datasets from the different experiments at once. Using all of the data to form the factors, instead of applying a dimension reduction method separately to each experiment, has some advantages: the same variables (masses) load onto the same factor in each experiment, which makes the comparison of factors between experiments easier. However, as the experiments contain different burning and aging conditions, the proportions of the masses loaded onto one factor can differ between experiments. For example, a different type of ignition might produce a large amount of a specific compound that is absent in the other experiments.
For the PTR-MS mass spectra, EFA and PCA were tested, but EFA was selected for further analysis due to its more interpretable factors.
EFA was applied using the function fa from package psych (Revelle, 2018) in the R environment (R Core Team, 2019).
The minimum residual technique was used for EFA, and oblique Oblimin rotation was used to enhance the separation of the masses. In addition, masses that had no significant contribution (i.e. loading) on any of the factors were removed from the analysis, and the factorization was performed again on the remaining masses. These removed compounds were mainly instrumental background or compounds with very small concentrations without any clear structure as a time series. Time series for the factors were calculated by the sum score method, in which the original data are multiplied directly with the loading values from the EFA, possibly with some threshold limit (Comrey, 1973). We selected an absolute value of 0.3 as the threshold for the loadings, meaning that any loading values smaller than that were suppressed to zero before the multiplication (Yong and Pearce, 2013). PTR factor 2 was scaled to be positive by adding the absolute value of its minimum to all the time points; the negative values were caused by the baseline correction of the PTR data and the factorization. The factors (see Fig. 1) were identified based on their temporal evolution and the compounds contributing to each factor, as well as their average carbon oxidation state (OSC). For the PTR, three factors were found: 1) primary VOCs, 2) photochemical aging products, and 3) dark aging products.
The factorization of the AMS spectra by PMF has been described in Tiitta et al. (2016). The OA spectra were analyzed by applying positive matrix factorization (PMF; Paatero, 1997; Ulbrich et al., 2009) to classify OA into subgroups according to their formation mechanisms. They found two factors describing the primary organic aerosol particles (POA): 1) biomass burning OA and 2) hydrocarbon-like OA. In addition, they found secondary organic aerosol (SOA) factors representing the three major oxidizers: 1) formation by ozonolysis, 2) formation by nitrate/peroxy radicals, and 3) formation by OH radicals. The PMF factorization was performed using the PMF Evaluation Tool v.2.08 (Ulbrich et al., 2009).
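The sum score step with a 0.3 loading threshold can be sketched as follows. The EFA itself was done in R (psych::fa with minimum residual extraction and Oblimin rotation); the Python snippet below only illustrates the projection step with a hypothetical loading matrix and synthetic data:

```python
import numpy as np

# Hypothetical loading matrix from an EFA fit (masses x factors); in the
# study these values came from psych::fa in R (minres, Oblimin rotation).
loadings = np.array([
    [0.85,  0.05, -0.10],
    [0.10,  0.72,  0.25],
    [-0.05, 0.28,  0.66],
    [0.31, -0.02,  0.40],
])
# Synthetic non-negative mass time series: time points x masses.
data = np.abs(np.random.default_rng(1).normal(size=(100, 4)))

# Sum score method: suppress |loading| < 0.3 to zero, then project the
# original data onto the thresholded loadings.
thresholded = np.where(np.abs(loadings) >= 0.3, loadings, 0.0)
factor_ts = data @ thresholded   # one time series per factor

# A factor estimated to be negative can be shifted non-negative, as was
# done for PTR factor 2 in the text.
factor_ts -= np.minimum(factor_ts.min(axis=0), 0.0)
```

Thresholding keeps each factor time series driven only by the masses that load clearly onto it, at the cost of discarding small cross-loadings.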
We assume that the factors from the AMS and PTR datasets represent the groups of products we have determined based on representative compounds and their evolution. An exact representation of the evolution of specific groups of compounds would require knowing the exact compounds in the chamber, which is not possible with the current measurement setup (Canagaratna et al., 2007; de Gouw and Warneke, 2007; Hatch et al., 2017; Yuan et al., 2017). Accurate optimization of the factors is out of the scope of this modeling study.

Forming OH time series
As the OH radical has an important role in the transformation of emissions during photochemical aging, we applied the OH concentration estimation formulas (1-3) from Barmet et al. (2012) to calculate the OH concentration time series. The estimation was based on the butanol-d9 tracer method, and the slope was determined from 20 observations (a time period of 30 min) around the time point at which the OH concentration was estimated. OH concentrations that were estimated to be negative were set to zero. Negative values were only estimated for the dark aging experiments, in which, in accordance with our initial assumption, OH was not allowed to affect the other variables.
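The idea of the tracer method is that ln C of butanol-d9 decays linearly with OH exposure, so a local slope of ln C divided by the OH rate constant gives [OH](t). The sketch below illustrates this with a noiseless synthetic decay; the rate constant value, the trailing (rather than centered) fit window, and the decay itself are assumptions of this illustration, not values from the experiments:

```python
import numpy as np

# Sketch of the butanol-d9 tracer method (Barmet et al., 2012):
# ln C(t) decays as -k_OH * integral([OH] dt), so a local slope of ln C
# gives [OH](t) = -slope / k_OH.
K_OH = 3.4e-12      # cm3 molecule-1 s-1, approximate literature value (assumed)
WINDOW = 20         # observations per local fit

t = np.arange(0, 4 * 3600, 90.0)             # 90 s resolution, 4 h (synthetic)
oh_true = 2.0e6                              # molecules cm-3, constant for the demo
conc = 100.0 * np.exp(-K_OH * oh_true * t)   # tracer decay, no noise

log_c = np.log(conc)
oh = np.full(t.size, np.nan)
for i in range(WINDOW, t.size):
    # Least-squares slope of ln C over the trailing window.
    slope = np.polyfit(t[i - WINDOW:i], log_c[i - WINDOW:i], 1)[0]
    oh[i] = max(-slope / K_OH, 0.0)          # negative estimates set to zero
```

With real, noisy tracer data the window length trades noise suppression against time resolution, which is why the study fixed it at 20 observations (about 30 min).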

Adjusting time resolution
The applied instruments measured concentrations and size distributions with different time resolutions. The PTR-MS measured every two seconds, whereas the SMPS scan time and sample analysis took a total of 5 minutes to produce a single measurement. To combine variables measured with multiple instruments into one dataset, the data must be transformed appropriately to bring the variables to the same time resolution. We applied simple cubic spline interpolation to interpolate the SMPS data to the same time resolution as the AMS data, which is 2 minutes. Data from the PTR-MS and the other gas analyzers were averaged over 2-minute time periods.
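The two operations, upsampling the slow instrument by spline interpolation and downsampling the fast instruments by block averaging, can be sketched as follows (all series here are synthetic placeholders for the real instrument data):

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Target: a common 2-min grid over one hour (values synthetic).
grid = np.arange(0, 3600, 120.0)

# SMPS: one value every 5 min -> cubic-spline interpolation onto the grid.
t_smps = np.arange(0, 3600, 300.0)
smps = 1000.0 + 5.0 * t_smps / 60.0              # slowly varying series
smps_2min = CubicSpline(t_smps, smps)(grid)

# PTR-MS: one value every 2 s -> block average over each 2-min period.
t_ptr = np.arange(0, 3600, 2.0)
ptr = np.sin(t_ptr / 600.0)
ptr_2min = np.array([ptr[(t_ptr >= a) & (t_ptr < a + 120.0)].mean()
                     for a in grid])
```

Averaging the fast instruments also suppresses some high-frequency noise, whereas the spline only fills in between the sparse SMPS scans and cannot recover variation faster than the 5-min scan cycle.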

Filtering time series
Measurements are discrete observations from the continuous transformation process of the emission and contain measurement error, which may arise from limitations regarding the representativeness of the measurement, the accuracy of a measurement instrument, or some other factor affecting the measurement event. The measurement y_{i,t} at time t for time series i is the sum of the true value x_{i,t} and the measurement error e_{i,t}. These measurement errors comprise the sampling from the chamber (e_sampling), the error related to the estimation of particle and gas losses to the chamber walls (e_wall-loss), and the error related to the processing of the measurement instrument data from raw data into a more useful form (e_processing).
Measurements y_{i,t} were thus represented as

y_{i,t} = x_{i,t} + e_{i,t},

where the error term e_{i,t} is independent in time and follows a specified distribution representing all the uncertainties. The state x_{i,t} describes the estimate of the real state of the variable in the chamber, and the error term e_{i,t} represents the error related to the estimation of the state.
Understanding the evolution of the state x_{i,t} of each variable is the question of interest: we would like to understand the factors affecting the change of the state during the aging of the emission. Therefore, we estimated the state x_{i,t} from the measurements y_{i,t} for each variable.
Multiple statistical methods can be applied to estimate the state from the data. We applied a filtering method that uses only already-observed measurements to determine the current state of the time series. The method is similar to LOESS (locally estimated scatterplot smoothing; Cleveland, 1979), where observations are weighted according to the proximity of a measurement y_{i,t1} to the measurement y_{i,t0} whose state is estimated. Instead of using all observations from the time series (smoothing), we used only the observations before the estimated state (filtering). We applied the method to every time series separately, and the number of previous measurements used (the time window) to estimate the current state x_{i,t0} was determined separately for each time series.
The filtering window was tuned manually by comparing the time series and the resulting trend and choosing the window such that the trend is representative of the time series. Because of the different measurement instruments and time resolutions, different time series had different fractions of noise relative to the total variation. Figure 2 shows the effect of the filtering on one variable during experiment 2B.
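A one-sided, LOESS-like filter of this kind can be sketched as below. This sketch uses tricube proximity weights, as in LOESS, but takes a weighted average of past points rather than fitting a local regression; both that simplification and the window length are illustrative assumptions:

```python
import numpy as np

def causal_filter(y, window=15):
    """One-sided filter: the state at time t is a tricube-weighted average
    of the last `window` observations only (no future values are used)."""
    state = np.empty_like(y, dtype=float)
    for i in range(y.size):
        lo = max(0, i - window + 1)
        seg = y[lo:i + 1]
        # Distance of each past point from the current time, scaled to [0, 1).
        d = (i - np.arange(lo, i + 1)) / max(window, 1)
        w = (1.0 - d**3) ** 3            # tricube weights, largest at d = 0
        state[i] = np.sum(w * seg) / np.sum(w)
    return state

rng = np.random.default_rng(2)
truth = np.linspace(0.0, 10.0, 200)
noisy = truth + rng.normal(0.0, 1.0, truth.size)
trend = causal_filter(noisy, window=15)
```

Because only past observations enter each estimate, the filtered state lags slightly behind a rising trend; that lag is the price of keeping the estimate causal, which matters when the filtered series is later used as a predictor of subsequent changes.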

Causal model and causal structure
A causal model is a model that allows the researcher to analyze complex dependence structures between observed and latent variables. Here we refer to the definition of the causal model from Pearl (2009), Def. 7.1.1. The causal model defines functional relationships between variables and thus determines the dependence structure between all the variables. The causal structure is a representation of the dependencies between variables without explicitly determining the exact functional form of each dependency. It can be presented as a directed acyclic graph, where nodes represent variables. Edges connect nodes and represent dependencies between variables, and the direction of an edge determines the causal direction of the dependency. Figure 3 shows an example of a causal graph.
The causal structure can represent general knowledge about the phenomenon and can be determined without using any data from experiments, or causal discovery algorithms can be applied to search for the structure or some part of it. In this study, we determined the causal structure between the variables by using both previously known dependence paths between variables and causal discovery algorithms to discover previously unknown dependencies from the measured data. Because of the uncertainty in the structure found by a causal discovery algorithm, the meaningfulness of each found dependency was evaluated.

Causal inference using a causal model and causal graph
An important feature that separates the causal model from a traditional multivariate model is its ability to support both interventional and counterfactual inference (Pearl, 2018). This enables the researcher to study questions such as how intervening on a variable changes the state of the observed process in the future (interventional), or how the currently observed state would have changed if we had changed some conditions beforehand (counterfactual). This kind of inference is possible because of the assumption that the causal model represents the effects of the variables on each other (Pearl, 2009). When using a causal model, one assumes that the causal structure contains all the observed and latent variables needed to explain the studied phenomenon.
Causal models lie between statistical and physical models (Peters et al., 2017). Like physical models, such as global climate models and the MCM (Jenkin et al., 1997; Saunders et al., 2003), causal models can consider interventional and counterfactual questions using simulations in which the researcher can alter initial parameters or go back in time from the observed state of the system. These questions cannot be addressed using statistical models alone. Causal models do not always provide the same physical insight into the phenomenon that physical models do. On the other hand, the calculation of simulations in a physical model may be time-consuming if the model is complex. In this study, the causal model provides a simpler representation that does not use as many parameters as a physical model and is thus computationally more feasible.

Causal discovery algorithms
Causal discovery algorithms search for the dependence structure of a given set of variables, using data and the observed dependencies between the variables to eventually find the causal structure. These algorithms are usually based on (in)dependence tests, data-fitness-based scores, or both, in addition to prior knowledge about the structure. The independence tests are often based on correlation, which implies that the data are assumed to be multivariate normally distributed and the observed dependencies between variables linear. Both the normality and linearity assumptions are often in place in score-based algorithms as well.
When determining the causal structure, it is possible to include initial knowledge in the structure. As mentioned above, the algorithms are based on observed dependencies in the dataset, and the distinction between a causal and a merely correlated dependency can be difficult. The r-causal package (Wongchokprasitti, 2019) enables the researcher to forbid or require edges previously known to be impossible or necessary, respectively. For example, previously known dependencies between variables can be added to the causal structure before applying the algorithm, and known non-causal directions of connections can be forbidden. Edges that are impossible for other reasons, e.g. because of the temporal structure of the variables, can also be forbidden.

Creating the model
The main aim of this study is to understand the evolution of the measured gases and particles. To describe the evolution of each variable, we devise differential equations. The change of a measured variable/concentration between subsequent time points is modeled using previously observed values of the relevant variables as predictors explaining the change. Figure 4a) illustrates the way the variables are linked in the causal graph. We assumed that only the observations measured right before a time interval can affect the differences in the variables during that time period. We modeled the evolution of each variable using linear differential equations, as defined in formula (1). Below we present the simple four-step procedure by which the model was applied to the processed version of the emission dataset. Sections 2.5.1-2.5.4 provide a more detailed description of selected steps in the model creation.

1. Use the causal discovery algorithm to search for potential dependencies between the measured variables and the measured changes Δ in time. For each measured change, we get its own set of potential variables, which are possible causes of that change.
2. Form interaction variables from the measured variables such that both variables in an interaction variable are suggested by the algorithm. In addition, prior assumptions are taken into account here, so forbidden variables cannot be part of an interaction variable. Interaction variables are formed separately for each Δ.
3. Use LASSO (least absolute shrinkage and selection operator) to select the predictor variables amongst the interaction variables (and the possible single variables suggested by the algorithm). Simultaneously, estimate the coefficients of each selected predictor using a linear model.
4. Calculate the modeled evolution as an ODE system with deSolve (Soetaert et al., 2010), using the estimated coefficients as reaction coefficients and the first observation of the experiment as the initial state.
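Steps 3 and 4 can be sketched on a toy linear system. The study used LASSO in a linear model and deSolve in R; the Python sketch below substitutes scikit-learn's Lasso and SciPy's solve_ivp, and the two-variable system and its coefficients are synthetic, not values from the experiments:

```python
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.linear_model import Lasso

# Synthetic "truth": a linear ODE dx/dt = A x (coefficients are arbitrary).
A_true = np.array([[-0.30, 0.00],
                   [0.25, -0.10]])
dt = 0.1
t_grid = np.arange(0.0, 20.0, dt)
sol = solve_ivp(lambda t, x: A_true @ x, (0.0, t_grid[-1]),
                [10.0, 0.0], t_eval=t_grid)
X = sol.y.T                                   # time points x variables

# Step 3: regress the observed changes on the previous observations,
# one LASSO fit per variable (small alpha so weak terms shrink to zero).
dX = np.diff(X, axis=0) / dt
A_hat = np.vstack([Lasso(alpha=1e-4, fit_intercept=False)
                   .fit(X[:-1], dX[:, j]).coef_
                   for j in range(X.shape[1])])

# Step 4: use the estimated coefficients as "reaction coefficients" and
# re-integrate the system from the first observation.
sim = solve_ivp(lambda t, x: A_hat @ x, (0.0, t_grid[-1]),
                X[0], t_eval=t_grid).y.T
```

The L1 penalty is what allows the model to drop interaction variables whose contribution to a measured change is negligible, which keeps the final differential equation system sparse.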

Using a causal discovery algorithm
The causal discovery algorithms implemented in the r-causal package (Wongchokprasitti, 2019) make it possible to "tune" the number of dependencies in the whole graph. In the PC algorithm, the tunable parameters are depth and alpha. According to the algorithm's documentation (Wongchokprasitti, 2019), the depth determines 'a number of nodes conditioned on in the search'. The depth values used in the algorithm were two, three, and infinite, the last meaning that every node can be further conditioned on in the search. Alpha, which is set between zero and one, indicates the statistical significance of the dependency between the searched variables; here, however, significance was used as a tuning parameter, allowing the number of edges to rise with larger alpha values. Alpha values of 0.01, 0.05, 0.2, and 0.3 were used for the wood combustion datasets. For the simulated datasets, alpha values of 0.05, 0.2, and 0.3 were used, with depths of 3 and unlimited (-1).
Causal discovery algorithms were used to search for potential cause variables for each variable. An important part of applying a causal discovery algorithm is the implementation of prior knowledge. We used prior information mostly for restricting dependencies that were thought to be impossible or negligible. Separate prior information was used for dark aging and photochemical aging. The main a priori difference between the two aging types is that in dark aging, OH is not allowed to affect any of the other variables, whereas in photochemical aging, OH can affect all variables other than the particle size distribution variables. OH was also forced to affect SOA3, which was characterized as OH radical formation products in a previous study (Tiitta et al., 2016). Furthermore, effects of the size distribution variables on other variables, of POA on SOA, of AMS variables on gas-phase variables, and of negligible variables (variables with small concentrations) on any other variable were not allowed. As the applied causal discovery algorithm is based on correlations, variables that have negligible effects in reality could otherwise be found to have a large effect in a correlation-based search.

Forming interaction variables
The causal discovery algorithm provides a list of potential edges between variables, which we wanted to turn into interaction variables. We used this list from the discovery algorithm as a starting point to form the final causal structure used in the model.
For each variable Y, the algorithm provides a list of variables X that could cause Y, denoted X → Y. We used these variables X to form interaction variables for explaining Y.
Interaction variables for Y were formed by pairing potential causes of Y. Pairs were formed so that both variables were suggested by the causal discovery algorithm. The only limitation is that variable interactions are not allowed to violate the prior assumptions. For example, if it is known that Z cannot cause Y, Z cannot be included in interaction variables that could cause Y. In this step, we formed a list of potential interaction variables for each variable Y. Figure 4b) shows an example of the interaction variables formed.
In addition to interaction variables, we also allowed some single variables to act as predictor variables. Many physical and chemical reactions involve more than one compound or phase state, but some reactions involve only one variable. As an example, particles of the same size can coagulate during evolution and form larger particles. Pairing all n variables with each other would produce on the order of n(n-1)/2 interaction variables, which would be already large for relatively small n. Therefore, a pre-selection of the most potential variables to form these interaction variables reduces the number of interaction variables formed remarkably.
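The pairing step described above can be sketched as follows (the variable names and the forbidden set are hypothetical; the real implementation works on the edge list returned by the discovery algorithm):

```python
from itertools import combinations

# Sketch: build candidate predictors for a response Y by pairing every two
# suggested causes, keeping single variables as well, and dropping anything
# that violates the prior constraints (forbidden causes of Y).
def interaction_variables(causes, forbidden):
    """causes: variables the discovery algorithm suggests for Y;
    forbidden: variables prior knowledge says cannot cause Y."""
    allowed = [c for c in causes if c not in forbidden]
    singles = [(c,) for c in allowed]                      # single predictors
    pairs = [tuple(p) for p in combinations(allowed, 2)]   # pairwise interactions
    return singles + pairs

# Hypothetical example: POA is forbidden as a cause of the SOA response
vars_for_Y = interaction_variables(["O3", "NO2", "POA"], forbidden={"POA"})
# -> [("O3",), ("NO2",), ("O3", "NO2")]
```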

Applying LASSO to the dataset
Interaction variables also brought some challenges for variable selection. Interaction variables are often highly correlated, as two interaction variables can share a common variable. Highly correlated variables can lead to multicollinearity in coefficient estimation. When the data suffer from multicollinearity, the coefficients of interaction variables might be biased, and small changes in the model can lead to large changes in its output. Therefore, it is important to note the possibility of multicollinearity in the predictors.
When the interaction variables have been formed, there are usually many correlated variables that can explain each response variable Δ(Y). We used LASSO regression (the Bayesian approach of Wang (2019), based on Park and Casella (2008)) to reduce the number of variables and the multicollinearity in the set of explanatory variables. LASSO shrinks the coefficients of redundant variables to (almost or exactly) zero and therefore reduces the number of effective variables. Variables with a coefficient close to zero were consequently left out of the structure. Variables that had a non-zero coefficient in the LASSO fit were used as predictor variables for the Δ(Y)s, and their coefficients were used as the coefficients in the final model. Negative coefficients are only allowed when the predictor variable includes the variable that is predicted, i.e., when the amount of the predicted variable is decreased by the process. Without this limitation, the modeled value of a variable could go below zero during the calculation, which is not physically realistic. If a variable with a negative coefficient does not include the predicted variable, the model attempts to fix this by changing the predictor to another predictor with a positive coefficient (usually negatively correlated with the original predictor) or to a predictor that includes the predicted variable.
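The study uses a Bayesian LASSO (Wang, 2019); as a simplified stand-in, the following plain coordinate-descent LASSO in Python illustrates how coefficients of redundant, correlated predictors are shrunk to zero (all data here are synthetic, not from the experiments):

```python
import numpy as np

# Sketch only: soft-thresholding coordinate descent, not the Bayesian LASSO
# actually used in the study.
def lasso(X, y, lam, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]     # partial residual for j
            rho = X[:, j] @ r / n
            z = X[:, j] @ X[:, j] / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z  # soft threshold
    return beta

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)     # nearly collinear copy of x1
x3 = rng.normal(size=200)                 # irrelevant predictor
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 0.1 * rng.normal(size=200)
beta = lasso(X, y, lam=0.1)
# the irrelevant predictor x3 is shrunk to (near) zero, while the effect of
# x1 is spread over the two collinear columns
```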

Calculating the evolution based on the obtained model
We evaluated the model fit by calculating the evolution of the system based on model's structure and estimated coefficients.
We used R package deSolve (Soetaert et al., 2010) to calculate the evolution of modeled linear ordinary differential equation system.
The selection of the best model was then done based on that evolution. We calculated the Root Mean Squared Error (RMSE) for each variation of the tuning parameters and selected the model with the smallest RMSE for the calculated evolution. As we had only two datasets for both dark and light-induced aging, we did not want to use separate datasets for training and evaluating the model fit. As the ignition type also affects, e.g., the chemical composition of the emission (Tiitta et al., 2016), the evolution between datasets might not be directly comparable. Therefore, we used both ignition types for searching the structure and estimating the coefficients of the model.
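The tuning-parameter selection can be sketched as follows (the (alpha, depth) settings and RMSE values are hypothetical placeholders, not results from the study):

```python
# Sketch: one model is fitted per tuning-parameter combination, the RMSE of
# its calculated evolution is computed, and the smallest-RMSE model is kept.
def select_best(models):
    """models: dict mapping (alpha, depth) -> RMSE of the calculated evolution."""
    return min(models, key=models.get)

rmse_by_setting = {(0.05, 3): 0.42, (0.2, 3): 0.35, (0.3, -1): 0.51}
best = select_best(rmse_by_setting)
# -> (0.2, 3), the setting with the smallest RMSE
```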

Simulation studies
For situations where the causal graph between variables is not completely known, causal discovery algorithms have been suggested as one solution for finding missing dependencies. The atmospheric aging process of wood combustion emissions is definitely one of these situations at present, as we have limited prior knowledge. For the reliability of the results obtained from our experimental dataset, it is important to know how the causal discovery algorithm combined with our model functions in datasets similar to our experimental dataset (see Table 1). Several questions of interest existed related to the properties of the input dataset and the data pretreatment (Table 2). Firstly, we were interested in how the precision of the measurements by the analytical instrumentation is related to the model fit, which was assessed by varying the measurement error and the number of measurements made per time bin. In order to mimic the measurement error, normally distributed random noise was added to the variables.
Secondly, we were interested to know whether the methods we applied to increase the quality of the data actually increase the quality, the accuracy of the fit, and the structure of the model. Does filtering or smoothing of the time series improve the fit and the accuracy of the prediction, and is there an optimal time resolution to which the data should be averaged? Thirdly, the amount of necessary prior information was a question of interest. We were interested in the importance of prior information for the modeled structure. Does the addition of prior information improve the accuracy of the modeled structure, and how many incorrect edges do we have? Additionally, the effect of prior information on the model fit was of interest. How much prior information is necessary to get a reasonably good fit for the model?
The question about the necessity of prior information is also related to the dependence between the model fit and the correctness of the structure. Intuitively, one might think that the correct structure would produce the best fit for the evolution. As many of the variables are highly dependent, it is probable that we will fail to obtain the real structure between variables. If the dependent variables we use in the model to explain the evolution are correlated with the real causes in the dataset, we might still be able to predict the evolution of the emission in the chamber. In addition to the differences between the obtained and the real structure in the model, we are interested in the predictive value of the obtained model compared to the simulated evolution. Two simulated datasets (see Table 1) were formed to study how the model would perform in a situation where we know the correct evolution and structure. Both datasets were formed using the R package deSolve (Soetaert et al., 2010). The datasets describe the evolution of an Ordinary Differential Equation (ODE) system over 100 time points. The difference between the datasets is the way the differential equations of the variables are linked to each other. In the smaller dataset, the differential equations form a mass action kinetics system. In the larger dataset, the equations were formed independently for each variable. We refer to these datasets as simulated datasets throughout the text.
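A minimal example of how a mass-action-kinetics dataset of this kind could be generated (the species, rate constant, and noise level are invented, not those of the study, which used deSolve in R):

```python
import numpy as np

# Sketch: simulate the reaction A + B -> C with mass-action rate k*A*B over
# 100 time points, then add normally distributed noise to mimic measurement
# error, as in the simulation studies.
def simulate_mass_action(k=0.5, a0=1.0, b0=0.8, n_points=100, dt=0.1):
    a, b, c = a0, b0, 0.0
    rows = []
    for _ in range(n_points):
        rows.append((a, b, c))
        rate = k * a * b                              # mass-action rate law
        a, b, c = a - dt * rate, b - dt * rate, c + dt * rate
    return np.array(rows)

data = simulate_mass_action()
noisy = data + np.random.default_rng(2).normal(scale=0.02, size=data.shape)
# mass balance: a + c stays constant along the true (noise-free) trajectory
```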
Accuracy tests for causal discovery algorithms have also been performed earlier (Scheines and Ramsey, 2016; Singh et al., 2017).
However, those tests depend on the dataset used. In our case, the variables that explain the evolution of a variable do not originate directly from the discovery algorithm. Instead, the variables from the algorithm are only used to form possible interaction variables, so this situation differs from the tests performed earlier. We measured the performance of the model in the simulated datasets in two ways. The first is the fit of the model to the simulated dataset: how well the model can capture the simulated evolution and how well it can predict the simulated evolution beyond the fitted dataset (ability to predict). The second can be called structural accuracy: how well the underlying causal structure of the variables can be recovered by the model.
For measuring the accuracy of the model fit, we compared the evolution obtained from the model to the measurements. The evolution was then compared to the true evolution, excluding the error added in the simulations, using RMSE over all time series. To weight each time series equally when calculating RMSE, each time series was scaled by dividing it by its standard deviation before calculating RMSE. In the further text, we refer to this scaled version as RMSE.
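The scaled RMSE described above can be sketched as follows (the measured and modeled arrays are toy values for illustration):

```python
import numpy as np

# Sketch: divide each time series by its standard deviation before computing
# the RMSE, so all series contribute equally regardless of their scale.
def scaled_rmse(modeled, measured):
    sd = measured.std(axis=0)
    diff = (modeled - measured) / sd        # scale each series by its sd
    return float(np.sqrt(np.mean(diff ** 2)))

# Two toy series on very different scales with proportionally equal errors
measured = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0]])
modeled = measured + np.array([[0.1, 1.0], [0.1, 1.0], [0.1, 1.0]])
err = scaled_rmse(modeled, measured)
# both series contribute the same scaled error despite the 10x scale gap
```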
In addition to the accuracy of the model, the accuracy of the model prediction was evaluated. We used the coefficients obtained from the model to predict further time steps of the evolution of the system. We then compared the prediction to the same time steps from the real system and evaluated the accuracy of the model prediction using RMSE. The prediction length was 30% of the simulated dataset.
As mentioned before, the RMSE calculated from the used dataset was compared to the real RMSE, which was calculated by comparing the evolution from the model to the error-free evolution from the simulations. In reality, we obviously do not have an error-free dataset; hence, RMSE is calculated from a dataset containing measurement errors. The purpose of this investigation was to see whether the RMSE calculated from the error-free dataset is in line with the metric calculated from the dataset containing measurement errors. The reason why these two estimators might give different results is related to modifications of the data before applying the model or to biases in the errors of the obtained dataset. In this case, the interest was to see whether the applied filtering and smoothing techniques bias the dataset, so that the goodness-of-fit measures would have significantly higher values in the error-free dataset than what the values should be based on the obtained dataset.
For structural accuracy, we used adjacency precision (AP), adjacency recall (AR) (Scheines and Ramsey, 2016), and the F-score (Singh et al., 2017). AP was defined as the fraction of correct edges in the model out of all proposed edges. AR was defined as the fraction of correct edges in the model out of all correct edges. The F-score was defined based on AP and AR as F = 2 · AP · AR / (AP + AR). In addition to the F-score, we wanted to study whether incorrect predictors for variables were close to the correct causes and whether the model is able to find a good replacement for each correct predictor that was not chosen for the modeled structure. Correlation was used to measure closeness here. For each correct predictor, we calculated the correlation between it and each predictor in the model (for the same variable). The maximum of these correlations was taken as the value for that correct predictor. CorMean is the mean value of these correlations over all the correct predictors.
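The structural-accuracy metrics can be sketched as follows (the edge sets are invented for illustration):

```python
# Sketch of the metrics defined above, on hypothetical edge sets:
# AP = correct proposed edges / all proposed edges,
# AR = correct proposed edges / all correct edges,
# F  = 2 * AP * AR / (AP + AR).
def structural_accuracy(proposed, correct):
    hits = len(proposed & correct)
    ap = hits / len(proposed)
    ar = hits / len(correct)
    f = 2 * ap * ar / (ap + ar) if ap + ar else 0.0
    return ap, ar, f

proposed = {("O3", "SOA1"), ("NO2", "SOA2"), ("POA", "SOA1")}
correct = {("O3", "SOA1"), ("NO2", "SOA2"), ("OH", "SOA3"), ("NO3", "SOA2")}
ap, ar, f = structural_accuracy(proposed, correct)
# ap = 2/3, ar = 2/4, f = 4/7
```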

Simulation studies
The effect of measurement error is reported for the simulated datasets (Tables 3 and 4). We found that, in general, the model fit to the data was better when the error was lower. This is not surprising, as the error makes the dataset noisier. The effect of the uncertainty was small for the F-score and for the correlation between the correct edges and the edges in the model. Furthermore, in both datasets the optimum number of measurements during one experiment was related to the amount of error applied (Tables 3 and 4). For lower error, a higher time resolution led to better results, as the real change between subsequent time points dominated the observed change and many observations of the evolution process were available. In the case of high error, fewer observations of the phenomenon were preferred, as the signal-to-noise ratio was low and increasing the number of observations would make it even lower. However, the dependence between sample size and fraction of error was weak.
Filtering or smoothing of only the time series representing measurements turned out to reduce the error when the measurement uncertainty was high (Table 5). These methods also improved the accuracy of the model prediction. Again, the importance of applying smoothing or filtering was larger when the measurement uncertainty increased.
To understand the importance of applying a smoothing or filtering method to a real dataset, we need to know whether the error in the real dataset is large enough that applying such a method is reasonable. We estimated the error in a time series as the difference between the raw and the filtered dataset. The standard deviation of this error was then compared to the standard deviation of the filtered dataset to estimate the fraction of the error in the whole standard deviation of the time series.
In the AMS and PTR datasets and in the largest size bin of the SMPS measurements, the standard deviation of the error was around 10-20% of the whole standard deviation. Variables measured by the gas analyzers contained the least error, approximately 1-5% of the standard deviation of the filtered time series. Because the fraction of error for some variables is high, applying the filtering methods to the real dataset was reasonable.
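The error-fraction estimate described above can be sketched as follows (the signal, noise level, and the moving-average filter are illustrative assumptions, not the study's actual filter):

```python
import numpy as np

# Sketch: treat the difference between the raw and the filtered series as the
# error, and report the ratio of its standard deviation to that of the
# filtered series.
def error_fraction(raw, filtered):
    return float((raw - filtered).std() / filtered.std())

rng = np.random.default_rng(3)
signal = np.sin(np.linspace(0, 6, 200))               # hypothetical slow signal
raw = signal + rng.normal(scale=0.1, size=200)        # noisy "measurement"
kernel = np.ones(5) / 5                               # moving-average stand-in
filtered = np.convolve(raw, kernel, mode="same")
frac = error_fraction(raw, filtered)
# frac is well below 1: most of the variance is signal, not error
```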
The reasons why applying the filtering methods to the dataset does not improve the fit until the error is relatively high could be assessed further, but this was omitted from this study. In addition to the fraction of error, the sample size is probably related to the necessity of the filtering method. These simulations for the larger dataset were performed using a sample size of 100, which is the same sample size that was used in the shortest experiment. Fourthly, prior information was closely related especially to the correctness of the obtained dependence structure between the variables in the dataset. In some situations, a large amount of prior information from previous studies might be available, which can help to construct the structure based on prior knowledge. This means that the correctness of the structure is high, and causal inference based on the structure and the observed dataset has higher quality.
For the wood combustion emission experiments, we do not have much prior knowledge about the phenomenon. Even though there has been, and currently is, a vast amount of research on the oxidative aging of combustion emissions, the number of possible reactions occurring during the experiments is very high and the reaction coefficients are therefore hard to estimate. Thus, the construction of the "true" causal structure between the variables might be out of reach based on the low amount of prior information.
One question of interest is how the incorrect edges in the structure are related to the missing or present correct edges. Based on the nature of the causal discovery algorithm, one might think that the algorithm is not able to separate causal from non-causal associations. Therefore, it might be that in the model the real cause is replaced with an indicative one. This, of course, decreases the accuracy of the structure, but does not necessarily affect the fit and the prediction accuracy of the model, if the dependence between the real and the indicative cause is strong enough. Decreased accuracy in the structure obviously causes incorrect interpretations of the causal effects. We found that there is a small difference in both the fit and the accuracy parameters when the amount of correct and incorrect edges in the prior information is varied in the smaller, artificial dataset. A larger amount of correct prior information led to slightly lower RMSE for the model and the prediction (Table 6). The addition of both correct and incorrect prior information led to lower RMSE and a higher F-score. For the prediction, the prior information did not have much effect.

Modeling results for combustion experiments
As the evolution of the emission is known to be different in dark and UV-aging conditions, the experiments were separated based on aging type. In addition, the prior knowledge for dark and UV aging experiments is of different quality (see Sect. 2.5.1).
The causal discovery algorithm was applied to both dark aging experiments (the dark aging parts of 2B and 3B), using both experiments as the input dataset for the model. Similarly, the UV aging parts of 4B and 5B were used to search the structure for UV aging. Thus, both experiments with the same aging type have the same dependence structure between variables, and the coefficients for the experiments of the same aging type were also the same. The dataset used to estimate the coefficients included the whole data from both experiments. After estimating the coefficients, the modeled evolution of the system was calculated and compared to the measured evolution of the experiment. The zero time in each experiment is the start of the aging period of the experiment. For the dark aging experiments, the causal graph for the SOA variables is shown in Fig. 5. The coefficients for the edges are provided in the Supplement. For SOA, most of the connected edges seem to be possible causes. For SOA1, the main increase of the particle concentration is due to the reaction with ozone. According to the model, some part of SOA1 might turn into SOA2 due to reactions with the nitrate radical. For SOA2, reactions of nitrogen species (NO2 in the gas phase) are involved in most of the reactions.
For SOA3, the change in concentration is minor compared to SOA2 during dark aging. According to the model, some increase in SOA2 and SOA3 occurs due to ozone reactions.
We see from Fig. 6 that the evolution based on the model follows the measured evolution well for both experiments. In general, some factors (SOA1, SOA3, and PTR2) show slight differences between the measured and modeled evolution in either one or both experiments. All coefficients for the dark aging model and figures for the other variables can be found in the Supplement (Table S1 and Fig. S1-S3). For the photochemical aging experiments, the difference between the measured and modeled evolution is slightly larger than for the dark aging experiments. The largest differences between the measured and modeled evolution were in the variables NO and NO2 in both experiments. Figure 7 shows the causal graph for the SOA variables in the photochemical aging experiments. For SOA3, which has the most significant increase during the experiments, reactions of OH with PTR1 and photochemical aging products reacting with SOA3 are the most important sources. SOA1 is also related to OH during photochemical aging, whereas SOA2 is mostly related to reactions with gas-phase products (PTR2). Figure 8 shows the evolution of PTR3 and SOA3. All coefficients for the photochemical aging model and figures for the other variables can be found in the Supplement (Table S2 and Fig. S4-S6).

Summary and conclusions
In this study, we created a model for describing the evolution of a complex system based on differential equations. The model creation is described step by step in Sect. 2.5. Dependences between the observed time series were studied using a causal discovery algorithm. We tested the accuracy of the fit, the prediction, and the structure returned by the model using simulated datasets. The predictive accuracy of the model can be considered a reasonable measure for many possible applications of the model where an explicit causal understanding of the system is not needed.
We have also assessed many of the uncertainties related to atmospheric measurements and studied how these affect our model in the simulated datasets (a step-by-step description of the data processing is provided in Sect. 2.1). Based on the simulations, we managed to reduce the error related to the measurement process by smoothing the initial time series before applying the model, and therefore made the model and the obtained structure more accurate. These errors are important to consider in similar studies in the future.
It is evident that modeling the structure and the real causal paths of the evolution of combustion emissions would need more initial knowledge about the dependencies between the different gaseous and particulate species. The model can be applied to test whether hypothesized paths can be seen in the data, though there are still many possible paths that cannot be distinguished from these data and that can lead to almost the same result at the end of the experiment. However, chamber experiments make it possible to study the effect of each variable on another and to reduce the number of correlated but incorrect variables used to explain the evolution. It must be considered that our model is not based on explicit knowledge of the chemical formulas of the compounds, or of the number of carbon and oxygen atoms of the species. In the case of complex systems, such as wood combustion, which contain hundreds to thousands of different emission species, explicit knowledge is not available. The presented model used the temporal evolution of the compounds in 1) factors made from particle- and gas-phase mass spectrometer data and 2) forming a structure of dependencies between compound groups and some specific compounds. In that sense, the model differs from SOM and MCM (Cappa and Wilson, 2012; Jenkin et al., 1997; Saunders et al., 2003).
Based on the simulated experiments we performed, it seems that the model obtained from the procedure explained here is not strictly a causal model, as the procedure is unable to find the correct edges among all possible edges without extensive knowledge about the phenomenon. Therefore, the results of the model cannot directly be interpreted as the causal evolution of the combustion emission in the atmospheric chamber. However, it seems that the model can capture the evolution quite well, and most of the incorrect edges in the simulated model are correlated with at least some of the correct edges and can most probably be interpreted as indicators for edges not measured in the data but affecting the evolution. Based on the simulations, it also seems that our current model does not process prior information in a very efficient way.
This was the first step of the model towards a more complete understanding of variable interactions in laboratory experiments of atmospheric aging. More research and datasets are still needed to better quantify the dependence structure between the studied variables. Recently, various studies have applied dimension reduction techniques to oxidation chamber datasets, showing that the selection of the statistical dimension reduction technique can affect the interpretation of the factors obtained from the dimension reduction of a mass spectrometer dataset (Isokääntä et al., 2019; Koss et al., 2019; Rosati et al., 2019). For the purpose of an emission oxidation study, a relevant way to form the groups would be to have each generation of oxidation products in the same factor. A more explicit study of dimension reduction is out of the scope of this work. However, in the future development of the model, the optimization of the choice of dimension reduction technique should be investigated in more detail.

Code and data availability
Codes are provided for reviewers and will be published in GitHub upon the publication of the manuscript. Data is partly uploaded to EUROCHAMP database (https://data.eurochamp.org/data-access/chamber-experiments/, chamber ILMARI / UEF, experiments Spruce log combustion aerosol + O3 -Aerosol study -physical properties) and will be completed before final publication of this manuscript.

Competing interests
The authors declare that they have no conflict of interest.

Measurement error (Tables 3 and 4): Normally distributed random noise (sd = unc_frac) was added to the simulated evolution. This random noise represented the possible measurement uncertainty. The purpose was to see whether the measurement uncertainty reduces both structural accuracy and fit accuracy.
Filtering and smoothing (Table 5): Filtering and smoothing were applied to the original variables of the simulated dataset. Would filtering or smoothing improve the fit and structure of the model? Is it reasonable to filter or smooth the dataset before building a model?
Prior information (Tables 6 and 7): Prior information about the edges possible in the model was used, including both correct and incorrect prior information. The fraction of all correct prior information used was taken as a measure of information. Does the addition of prior information help the model to obtain a correct structure and a good fit?

Table 5: Effect of filtering and smoothing techniques on the goodness-of-fit parameters (F-score, RMSE) of the model for the larger simulated dataset. Filter1 is a version where the filtering window was chosen manually for each time series, filter2 uses automatic selection of the filtering window, and smooth denotes smoothing with automatic selection of the filtering window. Rows 1-7 are results for the raw dataset where no technique has been applied. The variable nEdges describes the number of edges in each model (single and interaction variables affecting the Δ(Y)s). The RMSE of the predicted evolution of the model was also calculated (RMSE_pred).

CorMean is the mean correlation between a correct edge and the most correlated edge in the model. For each row, the number of replications is 10.