
Positive matrix factorization of large real-time atmospheric mass spectrometry datasets using error-weighted randomized hierarchical alternating least squares
Benjamin C. Sapper
Daven K. Henze
Manjula Canagaratna
Harald Stark
Jose L. Jimenez
Weighted positive matrix factorization (PMF) has been used by scientists to find small sets of underlying factors in environmental data. However, as the size of the data has grown, increasing computational costs have made it impractical to use traditional methods for this factorization. In this paper, we present a new external weighting method to dramatically decrease computational costs for these traditional algorithms. The external weighting scheme, along with the randomized hierarchical alternating least squares (RHALS) algorithm, was applied to the Southern Oxidant and Aerosol Study (SOAS 2013) dataset of gaseous highly oxidized multifunctional molecules (HOMs). The modified RHALS algorithm successfully reproduced six previously identified interpretable factors, with the total computation time of the nonoptimized code showing potential improvements of 1 to 2 orders of magnitude compared to competing algorithms. We also investigate rotational ambiguity in the solution and present a simple “pulling” method to rotate a set of factors. This method is shown to find alternative solutions and, in some cases, lower the weighted residual error of the algorithm.
1 Introduction
1.1 Problem statement
Low-rank matrix factorization has been widely used in data science to explain underlying factors in large datasets (Xie et al., 1998; Kim and Hopke, 2007; Wei et al., 2016). The process considers a data matrix, A, of size m×n, which is decomposed into two smaller matrices, W of size m×k and H of size k×n, where k ≪ min(m, n) and A ≈ WH. Traditionally, principal component analysis (PCA) and singular value decomposition (SVD) have been used to find these factors (Kumar, 2017; Wei et al., 2016). PCA finds the eigenvectors of the covariance matrix A^T A, which are called the principal components, representing the directions of maximum variance in the data. The vectors are ordered by how much variance they explain, and only the most important vectors are kept, which are identified as the underlying factors in the data. Closely related, the SVD finds the factorization A = USV^T, where U and V contain the left and right singular vectors, respectively, and S is a diagonal matrix containing the singular values of A in decreasing order. Thus, to find a low-rank approximation of A, one could keep only the k most significant singular values and vectors to form the truncated SVD, A_k = U_k S_k V_k^T. Mathematically, the truncated SVD is the optimal rank k factorization of A for minimizing squared error (Eckart and Young, 1936). However, the SVD is not appropriate for all factorization problems for several reasons (Paatero and Tapper, 1994):
-
The SVD produces factors with negative values. For some factor analysis problems, such as finding chemical sources for air pollution data, SVD results can be difficult to interpret as chemical concentrations can only be nonnegative.
-
The SVD produces orthogonal factors. Many factor analysis problems are not constrained by requirements of orthogonality between factors.
-
The SVD is not fit to solve the following weighted least squares problem. Suppose that accompanying the dataset A is an equally sized (m×n) matrix Σ with Σ_ij = σ_ij representing the uncertainty of the measurement for A_ij. If rank(Σ) > 1, the SVD cannot be scaled to find a solution minimizing the weighted residual error, which is defined as Σ_{i,j} ((A_ij − (WH)_ij)/σ_ij)^2 (Paatero and Tapper, 1994).
Positive matrix factorization (PMF) was introduced by Paatero and Tapper (1993) to address these concerns. Weighted PMF attempts to find two factor matrices, W and H, by minimizing the equation

Q = ‖(A − WH) ⊘ Σ‖_F^2. (1)
In Eq. (1), ⊘ represents elementwise division, the norm is the Frobenius norm, and all elements of W and H are constrained to be nonnegative. Further, we note that for consistency with nomenclature in the literature related to the use of this algorithm for factorization of real-time atmospheric mass spectrometry datasets, we refer to this approach as “positive” matrix factorization (i.e., PMF) while recognizing that a more precise name would be nonnegative matrix factorization (NMF).
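For concreteness, a minimal MATLAB sketch of evaluating the cost in Eq. (1), assuming the data matrix A, the uncertainty matrix Sigma, and candidate factors W and H are already in memory:

```matlab
% Weighted PMF cost of Eq. (1): Q = ||(A - W*H) ./ Sigma||_F^2
R = (A - W*H) ./ Sigma;      % residual divided elementwise by the uncertainties
Q = norm(R, 'fro')^2;        % squared Frobenius norm of the weighted residual
```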
Traditional factor analysis methods are known to be computationally expensive. Steps to speed up factor analysis have been explored, such as randomization and the use of graphical processing units (GPUs) (Halko et al., 2011; Tan et al., 2018). Developing efficient algorithms is especially critical in atmospheric mass spectrometry, as improvements in instrumentation and increases in the duration of their use in field campaigns have led to intractably large datasets. Currently, analysis of these datasets requires sacrificing data resolution or extensive manual preprocessing to operate within existing PMF software tools, and full analysis can routinely take days or weeks of computation time (Hopke et al., 2023). As a result, a variety of approaches have emerged for efficient source apportionment of atmospheric mass spectrometry data. Algorithms to solve the nonconvex optimization posed by PMF range from gradient descent and block coordinate descent to projected gradient methods (Guo et al., 2024). Attempts at using supervised ensemble machine learning approaches have been shown to be capable of replicating results from traditional (unsupervised) factorization methods while reducing computation time (Zhang et al., 2025). Recently, Erichson et al. (2018) applied randomization to PMF and introduced a new method, randomized hierarchical alternating least squares (RHALS), to solve the unweighted PMF problem. In this paper, we test the application of RHALS to atmospheric concentration data that contain uncertainties. Accounting for these uncertainties as regression weights, we introduce a method of externally weighting and unweighting the data, which to our knowledge is novel in its application to RHALS. We consider the accuracy and the reduced computational costs compared to other PMF algorithms commonly used in the field of atmospheric science.
1.2 Background
1.2.1 PMF2 and Paatero
The first widely accepted algorithm for PMF was derived in Paatero (1997) using the Gauss–Newton method. This algorithm, called PMF2, is commonly used with environmental data (Kim and Hopke, 2007; Ulbrich et al., 2009; Massoli et al., 2018). Paatero (1997) defines an enhanced objective function and attempts to find factor matrices W and H that minimize the cost function Q in

Q = Σ_{i,j} ((A_ij − (WH)_ij)/σ_ij)^2 − α Σ_{i,h} log(W_ih) − β Σ_{h,j} log(H_hj) + γ Σ_{i,h} W_ih^2 + δ Σ_{h,j} H_hj^2. (2)
In Eq. (2), logarithmic penalty terms are added to penalize factor values that become too close to zero (and therefore potentially negative), and L2 regularization is added to smooth out the factors and avoid overfitting (Paatero, 1997). Here, α and β control the strength of the penalty terms, while γ and δ control the strength of L2 regularization. To date, Paatero's exact algorithmic approach to solving Eq. (2) remains unpublished. However, pseudo-code for using the Gauss–Newton method to solve Eq. (2) is detailed in Lu and Wu (2004).
1.2.2 Multiplicative update
An alternative method for PMF was developed in Lee and Seung (1999). The multiplicative update (MU) method utilizes a special case of gradient descent where the learning rates are chosen to avoid subtraction in the gradient (Gillis, 2020). A multiplicative update estimate of a parameter θ (either W or H) is found by updates of the form (Gillis, 2020)

θ ← θ ⊙ [∇⁻Q(θ) ⊘ ∇⁺Q(θ)], (3)

where Q(θ) is a cost function to be minimized, ∇⁻Q(θ) consists of the negative terms of the gradient of the cost function, ∇⁺Q(θ) consists of the positive terms of the gradient, and ⊙ denotes elementwise multiplication. Here, θ is initialized with all positive entries as the MU cannot update an entry θ_ij if it is equal to zero (Gillis, 2020). As the data matrix A, the uncertainties (σ_ij), and the factor matrices are all nonnegative at each step of this algorithm, the factor matrices in the subsequent step are guaranteed to be positive because Eq. (3) only deals with the multiplication and division of positive numbers.
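As a sketch of how Eq. (3) looks for the weighted cost in Eq. (1), one MU sweep in MATLAB might be written as follows, assuming A, Sigma, and strictly positive W and H are in memory; the split of the gradient into its negative and positive parts follows the weighted Frobenius objective:

```matlab
% One weighted multiplicative-update sweep of the form of Eq. (3).
S    = 1 ./ (Sigma.^2);                          % elementwise weights 1/sigma_ij^2
tiny = 1e-12;                                    % guard against division by zero
W = W .* ((S .* A) * H') ./ ((S .* (W*H)) * H' + tiny);   % negative over positive gradient parts
H = H .* (W' * (S .* A)) ./ (W' * (S .* (W*H)) + tiny);
```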
It is possible to perform PMF using other forms of gradient descent – for example, the projected gradient method (PGM) sets the step size to the inverse of the maximum eigenvalue of the Hessian of the cost function and may lead to faster convergence than MU (Gillis, 2020). However, we choose to only test MU due to its widespread use and flexibility (Gillis, 2020).
1.2.3 Alternating least squares and hierarchical ALS
Alternating least squares (ALS) methods solve for the factor matrices W and H by iteratively updating each matrix until convergence is reached (Cichocki et al., 2009). The cost function Q, containing ‖(A − WH) ⊘ Σ‖_F^2, is minimized by setting the partial derivatives ∂Q/∂W and ∂Q/∂H to zero and solving for W and H. To satisfy the positivity constraint, negative elements in the factors are set to zero.
Nonnegative ALS has no theoretical convergence guarantee and, in some problems, may fail to converge to a feasible solution (Gillis, 2020). For this reason, the alternating nonnegative least squares (ANLS) method and the alternating direction method of multipliers (ADMM) are interesting alternatives. In ANLS, indices of an “active set” are set to zero, and the rest are updated via an unconstrained optimization (Kim and Park, 2011). The active set is then updated to contain the indices with the new negative factor elements. In ADMM, an auxiliary factor matrix Y is formed, and an additional term is added to the cost function, which penalizes the distance between the target factor matrix (W or H) and Y (Gillis, 2020). Both of these methods may lead to faster and better convergence than nonnegative ALS (Gillis, 2020). However, we find that the simple nonnegative ALS almost always converges to feasible solutions for our dataset, and we do not explore these alternative methods.
In recent years, hierarchical alternating least squares (HALS) has become increasingly popular as an efficient method for PMF (Cichocki and Phan, 2009). Instead of minimizing with respect to the entire factor matrices W and H, HALS minimizes the cost function with respect to one block, or an outer product, of individual factors at a time. The main component of the cost function is redefined as Q_j = ‖(R_j − w_j h_j) ⊘ Σ‖_F^2, where R_j = A − Σ_{l≠j} w_l h_l, and w_j (the jth column of W) and h_j (the jth row of H) are the jth factors. Q_j can be minimized for each factor j by setting the partial derivatives ∂Q_j/∂w_j and ∂Q_j/∂h_j to zero and solving for w_j and h_j.
The derivation of the ALS update rules is detailed in Appendix A, while the derivation of HALS is detailed in Sect. 2.
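As an unweighted illustration of the HALS block updates (the weighted variant used here is derived in Sect. 2.1), one sweep over the k factors can be sketched in MATLAB as follows, assuming A, W, and H are in memory:

```matlab
% One unweighted HALS sweep: each factor is updated from its own gradient and Hessian.
AHt = A * H';   HHt = H * H';                    % Gram products reused for all columns of W
for j = 1:size(W, 2)
    grad = AHt(:, j) - W * HHt(:, j);            % negative gradient with respect to w_j
    W(:, j) = max(0, W(:, j) + grad / max(HHt(j, j), eps));
end
WtA = W' * A;   WtW = W' * W;                    % Gram products reused for all rows of H
for j = 1:size(H, 1)
    grad = WtA(j, :) - WtW(j, :) * H;            % negative gradient with respect to h_j
    H(j, :) = max(0, H(j, :) + grad / max(WtW(j, j), eps));
end
```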
1.3 Random projections
To reduce the computational costs of a matrix factorization algorithm for large datasets, randomization methods have been used as a dimension reduction technique (Erichson et al., 2018; Halko et al., 2011; Kaloorazi and Chen, 2019). Below, we present a brief overview of the theory and results laid out in Halko et al. (2011).
When performing randomization techniques, we hope that much of the relevant information about the column space of the data matrix A can be stored in a much smaller subset of vectors that we can sample. This is only true if the “effective rank” of A is low (A only has a few nonnegligible singular values), but that is generally assumed to be the case in any PMF problem (Erichson et al., 2018). Mathematically, we seek the approximation

A ≈ P P^T A, (4)

where the relatively small number of columns in the matrix P are orthonormal and form an approximate basis of A. Choosing the columns of P to be the left singular vectors of A would minimize the L2 error: if the first k singular vectors were chosen, the error term ‖A − P P^T A‖_2 would equal σ_{k+1}, with σ_{k+1} being the singular value of A with index (k+1) (Halko et al., 2011). However, random sampling from the column space of A can also produce a suitable basis.
The assumption that there are k underlying factors within A implies that the effective rank of A is k. It then appears reasonable to use k random samples from the column space of A to form a basis. However, with underlying uncertainties, we can write A = B + E, where B is the rank k matrix spanned by the factors for which we wish to find a basis, and E is a perturbation matrix filled with the noise in A (Halko et al., 2011). Suppose we were to sample from the column space of A – that is, form the vector y = Aω = Bω + Eω for a random vector ω. Each vector y is slightly pushed out of the column space of B by the term Eω. Thus, to increase the likelihood of spanning the column space of B, an additional p vectors are sampled from A. In practice, choosing p to be 10 or 20 is sufficient (Erichson et al., 2018).
To construct this low-rank approximation, k+p samples of the column space of A are taken by multiplying A with a random normal test matrix Ω (dimensions n×(k+p)) and are stored in Y:

Y = AΩ. (5)
Next, the columns of Y are orthonormalized using a QR decomposition to form our projection matrix P. The algorithm can now be run on the lower dimensional matrix B = P^T A.
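A minimal MATLAB sketch of this randomized compression step, assuming A is in memory and k and p are chosen as above:

```matlab
% Randomized range finder of Halko et al. (2011): sample, orthonormalize, project.
k = 6;  p = 10;                       % target rank and oversampling
Omega = randn(size(A, 2), k + p);     % random normal test matrix, n-by-(k+p)
Y = A * Omega;                        % sampled columns approximately spanning the range of A
[P, ~] = qr(Y, 0);                    % economy-size QR gives an orthonormal basis P
B = P' * A;                           % small (k+p)-by-n matrix on which the algorithm is run
```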
1.4 Nonuniqueness of solutions
Unlike the SVD, there is no guarantee of uniqueness for the factor matrices W and H in PMF. That is, the factorization A = WH can also be expressed as A = (WT)(T^{-1}H) = W̃H̃, where T is a “rotational” matrix and W̃ = WT and H̃ = T^{-1}H are the new rotated factors. We note that T does not necessarily represent a true rotation in a mathematical form, which would require T to be orthogonal. To span the space of feasible solutions, previous approaches such as PMF2 have aimed to find new solutions W̃ and H̃ by varying T (Paatero, 1997) or varying initializations (Ulbrich et al., 2009).
The rotational matrix T is a k×k matrix where t_ii, a diagonal element of T, represents a scaling of the ith factor, and t_ij represents a rotation of the jth factor towards the ith factor in W and a rotation of the ith factor away from the jth factor in H. For example, consider the elementary rotation matrices (Paatero et al., 2002)

T = I + r E_ij  and  T^{-1} = I − r E_ij, (6)

where E_ij is the matrix with a one in entry (i, j) and zeros elsewhere (i ≠ j). All factors remain the same, except w̃_j = w_j + r w_i and h̃_i = h_i − r h_j.
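A small numerical check of this elementary rotation (using illustrative values of i, j, and r, and assuming factor matrices W and H are in memory) shows that the product WH is unchanged:

```matlab
% Elementary rotation of Eq. (6): T = I + r*E_ij and T^{-1} = I - r*E_ij with i ~= j.
k = size(W, 2);  i = 1;  j = 2;  r = -0.3;   % illustrative choices
T    = eye(k);  T(i, j)    =  r;
Tinv = eye(k);  Tinv(i, j) = -r;
Wrot = W * T;                          % jth factor of W pulled towards the ith factor
Hrot = Tinv * H;                       % ith factor of H pulled away from the jth factor
disp(norm(Wrot * Hrot - W * H, 'fro')) % ~0 up to round-off: the factorization is unchanged
```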
Regardless of whether r is positive or negative, values in either W or H will be pulled towards negative values. Thus, if a large proportion of the factors are filled with zeros, there may be few or no pure rotations, i.e., rotations that do not change the weighted residual ‖(A − WH) ⊘ Σ‖_F^2. Algorithms such as those developed by Paatero, where a logarithmic penalty term is added to push factor values to be more positive, will have few zeros in the factor matrices and will thus have more rotational ambiguity (Paatero et al., 2005). However, RHALS enforces nonnegativity merely by setting negative elements to zero, and thus many values in the factor matrices may end up being zero. For RHALS, therefore, a perfect rotational matrix T will almost certainly not exist, and only “approximate” rotations can be studied, in which the rotation will alter the value of the weighted error.
It is not feasible to span all possible variants that T can take. Thus, the problem is often simplified to considering only positive rotations (values of T greater than zero) and negative rotations (values of T less than zero). A rotational program in PMF2 called FPEAK uses the parameter ϕ to denote the rotation strength, with positive values leading to positive rotations in W (Paatero, 1997). Paatero further improved this method in the multilinear engine (ME) algorithm, where the strength of rotation is allowed to vary between factors (Paatero and Hopke, 2009). The pulling algorithm presented in Paatero and Hopke (2009) is a sophisticated rotational method; more rudimentary pulling methods that mimic varying the regularization of the factor matrices are presented in Paatero (1997) and Paatero et al. (2002). Recent attempts at controlling for rotational ambiguity have involved additional factorization of the time-series matrix W into a matrix incorporating shape regularization to reflect known diurnal patterns of factors and a diagonal scaling matrix (Nanra et al., 2024).
1.5 Scaling with uncertainties
Recall that to account for inaccuracies in real data, we measure the squared error Q of the algorithm by dividing the residual by the standard deviation of the uncertainty of each measurement:

Q = ‖(A − WH) ⊘ Σ‖_F^2 = Σ_{i,j} ((A_ij − (WH)_ij)/σ_ij)^2. (7)
To account for these uncertainties, one could incorporate them into each update rule of the factor matrices, as is done in PMF2 (Paatero, 1997). In this paper, we refer to this as “internal weighting”. However, this is computationally expensive due to repeated elementwise operations with the uncertainty matrix Σ. Elementwise operations of large arrays are inefficient processes compared to other operations of the same computational complexity, such as matrix–vector multiplication, due to the large allocation of memory towards intermediary results (Jia et al., 2020). We thus introduce an alternate approach to weighted PMF where the data are prescaled by the uncertainty matrix, the unweighted algorithm is applied to the scaled data, and the converged factors are scaled by the uncertainties. This approach, which we refer to as “external weighting”, dramatically reduces computational costs and allows for dimensionality reduction as weights are not included in the update rules.
It is noted in Paatero and Tapper (1993) that if the weighting matrix is rank 1, that is, if σ_ij = B_i C_j for vectors B and C, then an optimal scaling can be found. By forming the diagonal matrices D_L (with the elements of 1/B_i) and D_R (with the elements of 1/C_j), if Ã = D_L A D_R, then ‖(A − WH) ⊘ Σ‖_F = ‖Ã − D_L W H D_R‖_F. Thus, by first finding Ã and then running a PMF algorithm without weights on Ã, one can produce estimated factor matrices W̃ and H̃, where the unscaled estimates are W = D_L^{-1} W̃ and H = H̃ D_R^{-1}.
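A minimal MATLAB sketch of this rank-1 scaling, assuming Sigma = B*C' exactly for column vectors B and C and assuming a hypothetical unweighted PMF routine pmf_unweighted is available:

```matlab
% Rank-1 scaling of Paatero and Tapper (1993): scale, factorize without weights, unscale.
k = 6;                                 % number of factors sought
Atil = A ./ (B * C');                  % D_L * A * D_R, computed elementwise
[Wtil, Htil] = pmf_unweighted(Atil, k);% hypothetical unweighted PMF call
W = B  .* Wtil;                        % unscale: W = D_L^{-1} * Wtil (implicit expansion over columns)
H = Htil .* C';                        % unscale: H = Htil * D_R^{-1} (implicit expansion over rows)
```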
If the weighting matrix Σ is not rank 1 – which is likely for environmental data – and cannot be estimated as the outer product of two vectors, there is no scaling of the previous form that can be applied to the data matrix A (Paatero and Tapper, 1993). To address this, Paatero and Tapper (1993) presented a simple algorithm to find an approximate rank 1 factorization of Σ. We present a different method where the data are scaled by the full-rank matrix Σ and then unscaled after the algorithm is complete. This algorithm is described in detail in Sect. 2.3.
Expectation maximization
The expectation maximization (EM) approach was first designed for matrix factorization problems associated with missing entries (Zhang et al., 2006). Specifically, if A_o is the observed data and A_u is the unknown data within A, then the EM approach seeks to find factors W and H that satisfy the following (Zhang et al., 2006):

(W, H) = argmax_{W,H ≥ 0} E[ log ℙ(A_o, A_u | WH) | A_o, WH^(t−1) ], (8)
where WH^(t−1) is the product of the previous estimates of W and H and ℙ is a probability measure. This problem is equivalent to running a PMF algorithm on the following adjusted matrix (Zhang et al., 2006):

A_1 = C ⊙ A + (1 − C) ⊙ (WH^(t−1)), (9)
with Cij=1 if Aij is known and Cij=0 if Aij is unknown, and 1 is a matrix of ones.
Recent work has looked into expanding on this approach to continuous weights, as seen in most PMF problems of real-time atmospheric mass spectrometry data (Yahaya et al., 2019; Yahaya, 2021; Yahaya et al., 2021). To handle the continuous case, a variation of Eq. (8) is maximized (Yahaya et al., 2019):
where C is now a weight matrix containing estimates of confidence as a value between 0 and 1 in a given data point, and Atheo is the theoretical true data. Maximizing Eq. (10) is equivalent to running any PMF algorithm on the matrix A1 formed in Eq. (9).
To form the confidence matrix C from the uncertainty matrix Σ, Yahaya (2021) suggests scaling the weights (in this case 1/σ_ij^2) so that the maximum value is 1. However, previous testing has primarily focused on problems with binary weights (Yahaya et al., 2019; Yahaya, 2021; Yahaya et al., 2021).
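A minimal MATLAB sketch of the expectation step of Eq. (9), assuming A, Sigma, and current estimates W and H are in memory; the mapping of the uncertainties to confidences via 1./Sigma.^2 scaled to a maximum of 1 is an assumption consistent with the scaling suggested by Yahaya (2021):

```matlab
% EM data adjustment of Eq. (9): keep trusted entries, impute the rest from W*H.
weights = 1 ./ (Sigma.^2);             % assumed mapping from uncertainties to weights
C  = weights ./ max(weights(:));       % confidence in each data point, scaled into (0, 1]
A1 = C .* A + (1 - C) .* (W * H);      % adjusted matrix passed to an unweighted PMF step
```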
1.6 Determining the number of factors
The squared weighted residual error, Q = ‖(A − WH) ⊘ Σ‖_F^2, can be used to determine whether a given solution either overfits or underfits the data (Ulbrich et al., 2009). If measurement errors are normally distributed, then Q will follow a χ2 distribution with mn − k(m + n) degrees of freedom. Thus, to avoid overfitting, the number of factors in a solution is chosen such that Q ≈ mn − k(m + n) (Paatero et al., 2002). When additional error is present, measuring the convergence of Q or the weighted residual error with additional factors can determine if these additional factors add much information to the model.
Another method used in determining the number of factors is the lack of rotational ambiguity of a solution (Ulbrich et al., 2009). Consider a simple case where the data matrix A is the product of two rank-two matrices W = [a b] and H = [y z]^T, with a, b, y, and z being column vectors, so that A = a y^T + b z^T. An exact three-factor solution can also be obtained by finding W̃ = [c d b] and H̃ = [y y z]^T, where c + d = a, in a process known as factor splitting (Ulbrich et al., 2009). Thus, a three-factor solution introduces rotational ambiguity, as any two factors c and d can be chosen as long as they add up to a. The same analysis can be seen by analyzing solutions with four or more factors, and we can conclude that a large amount of rotational ambiguity is a potential sign of overfitting.
1.7 Data
In this paper, we use the data from “Ambient measurements of highly oxidized gas-phase molecules during the Southern Oxidant and Aerosol Study (SOAS) 2013”, measuring highly oxidized multifunctional molecules (HOMs) over a forest site in Alabama from 22 June to 7 July 2013 (Massoli et al., 2018). The dataset contains mass spectra concentrations of 1059 different ions over 27 336 different time stamps. Additionally, initial uncertainties associated with each measurement are also included. PMF was applied to the data using the PMF2 algorithm described in Paatero (1997), checking solutions from 2 to 10 factors, where a six-factor solution was obtained (Massoli et al., 2018). We note that some of the uncertainties were artificially increased for this PMF2 analysis. The authors concluded that a significant portion of the secondary organic aerosol (SOA) was the result of interactions between biogenic and anthropogenic emissions (Massoli et al., 2018).
We use this six-factor solution as a reference solution and test whether the RHALS algorithm can recreate both synthetically formulated factors and those found from PMF2. Analyses of results for different numbers of factors (other than the original six identified in Massoli et al., 2018) were not considered in order to maintain interpretability of the algorithm output. For reference, the PMF2 factor mass spectra and the time trends over all of the data are shown in Fig. 1. The factor time series, as well as the time series of the total mass concentration, is also shown in Fig. 2. Both plots show total concentration amounts over the entire time series and mass spectra, respectively. Discussion of the chemical interpretations of the data is not presented, and the scope of this effort is limited to the mathematical results from implementing the RHALS algorithm.

Figure 1(a–f) Mass spectra profiles of the six PMF2 factors labeled by order; (g) total mass spectra concentration of the data.

Figure 2Time series profiles of the six PMF2 factors, overlaid with the total time series concentration of the data. A rolling average is used, with values representing average concentrations over the previous 2 h. The individual time series are stacked on top of each other in order to compare the total PMF2 time series to the overall time series of the data.
In this section, we present the derivation of the basic weighted HALS algorithm in Sect. 2.1, a simple rotation algorithm in Sect. 2.2, our new external weighting algorithm in Sect. 2.3, and inclusion of L1 and L2 regularization in Sect. 2.4.
2.1 HALS algorithm
The derivation of the unweighted algorithm for HALS is detailed in Erichson et al. (2018). Here we present a similar derivation, taking into account uncertainties in the data that act as weights. Another derivation of weighted HALS is given in Ho (2008).
The HALS algorithm applies block coordinate descent methods in order to minimize the cost function Q_j by minimizing one “block”, or an outer product of individual factors, of W and H at a time while keeping the other factors fixed (Erichson et al., 2018):

Q_j = ‖(R_j − w_j h_j) ⊘ Σ‖_F^2. (11)

In Eq. (11), R_j is the jth residual, with

R_j = A − Σ_{l≠j} w_l h_l. (12)

w_j is the jth column (or jth factor) of W, h_j is the jth row and factor of H, and k is the number of factors that the algorithm is aiming to find. As defined previously, Σ contains the uncertainties associated with each element in the data matrix A.
To derive update rules for HALS, partial derivatives of Eq. (11) are taken with respect to the factors w_j and h_j. In order to incorporate the uncertainties (Σ), we present a variation on the derivation presented in Erichson et al. (2018) by considering just a row (i) and column (p) of the weighted residual:

Q_j^(i) = ‖(R_j^(i) − W_ij h_j) Σ_i^{-1}‖_2^2, (13)
Q_j^(p) = ‖Σ_p^{-1} (R_j^(p) − w_j H_jp)‖_2^2. (14)

In Eqs. (13) and (14), Σ_i and Σ_p are diagonal matrices (of size n and m, respectively), with the diagonal elements corresponding to the elements of the ith row (for Σ_i) and pth column (for Σ_p) of Σ, and R_j^(i) and R_j^(p) are the ith row and pth column of R_j. Expanding Eqs. (13) and (14) using the fact that ‖B‖_F^2 = Tr(B^T B), where Tr(B) denotes the trace of the matrix B, we get
Differentiating with respect to Wij and Hjp,
To eliminate the matrix traces, it is easy to show that Eqs. (17) and (18) can be rewritten as
Setting Eqs. (19) and (20) to zero and solving for the factor values yield

W_ij = (R_j^(i) Σ_i^{-2} h_j^T) / (h_j Σ_i^{-2} h_j^T), (21)
H_jp = (w_j^T Σ_p^{-2} R_j^(p)) / (w_j^T Σ_p^{-2} w_j). (22)
Substituting Eq. (12) into Eqs. (19) and (20) yields the following update rules:
where the operator [·]⁺ = max(0, ·) projects all negative update values to 0. In practice, the authors in Erichson et al. (2018) utilize the following simplified form of Eqs. (23) and (24):
Thus, one can add additional auxiliary functions to the cost function Q, such as regularization and rotation terms, and add them to the update rules based on the new gradient and Hessian values. Furthermore, writing the update rules as in Eqs. (25) and (26) includes the calculation of the projected gradient. This can be used as a stopping condition criterion that avoids the computational costs of calculating other convergence statistics (Erichson et al., 2018).
We note that when the uncertainties are equal to 1, Σ can be disregarded in Eqs. (23) and (24), and the update rules are identical to those in Erichson et al. (2018). In Eqs. (23) and (24), the Hessians h_j Σ_i^{-2} h_j^T and w_j^T Σ_p^{-2} w_j should be found prior to multiplication by W_ij and H_jp, respectively, to minimize computational costs. We do not preallocate the products WH and H^T W^T, which is also not done in Erichson et al. (2018).
2.2 Rotational considerations
Rotating solutions to induce slight variations in the output of a PMF algorithm may be necessary, especially in cases where the interpretation of the solutions yields some unrealistic results (e.g., a factor is zero during a period in which it is expected to be present, or two factors appear mixed in their time series and/or spectra). As mentioned in Sect. 1.4, we do not attempt to constrain these “rotations” to be norm preserving. However, it is possible to find approximate rotations. As detailed in Paatero and Hopke (2009), for a specific factor value W_ij or H_jp, an auxiliary term can be added to the cost function to pull the component towards a set value W̄_ij or H̄_jp. Defining

Q_aux = s (W_ij − W̄_ij)^2  or  Q_aux = s (H_jp − H̄_jp)^2, (27)

with s determining the strength of the pull, we find ∂Q_aux/∂W_ij = 2s (W_ij − W̄_ij), ∂^2 Q_aux/∂W_ij^2 = 2s, ∂Q_aux/∂H_jp = 2s (H_jp − H̄_jp), and ∂^2 Q_aux/∂H_jp^2 = 2s. Adding these terms to the gradient and Hessian in Eqs. (25) and (26), we can derive new update rules for W_ij and H_jp.
To probe possible rotations of entire factors, we introduce “pulling equations” to the cost function, which pull the elements of W and H to the desired rotations:
These equations correspond to r<0, with both a and b positive. For r>0, W and H would be interchanged. These pulling equations are similar to those introduced in Paatero et al. (2002), although their meanings are slightly changed.
Taking derivatives of Eqs. (28) and (29),
we add these terms to the gradient and Hessian in Eqs. (25) and (26) to obtain new update rules. Ideally, we would expect these equations to pull the values of H towards more positive values and the values of W towards zero, which mirrors the effect of setting r to be less than zero in Eq. (6). The end result may be different solutions that are more realistic and interpretable.
2.3 External weighting
To perform external weighting, we first find Ã = A ⊘ Σ (i.e., we divide the data elementwise by the uncertainties). Note that the uncertainty matrix Σ must only contain nonzero real entries, and therefore the algorithm cannot handle problems with binary weights (such as PMF problems with missing entries). One approach to PMF with binary weights is the expectation maximization (EM) approach detailed in the section titled “Expectation maximization” (Yahaya et al., 2021; Zhang et al., 2006).
After elementwise division, RHALS is applied to Ã = A ⊘ Σ to form the estimated scaled factor matrices W̃ and H̃. The unscaled estimates W and H are found by the relation

WH ≈ Σ ⊙ (W̃H̃).
W and H can then be found iteratively via alternating least squares:
Here, W† and H† denote the pseudo-inverses of W and H, and all negative elements are set to zero after the factor matrix is updated. Mathematically, the two methods for calculating W and H detailed in Eqs. (35) and (36), respectively, are identical, as long as the ranks of the factor matrices are equal to k. Evidence of this is briefly detailed in Appendix B. Of course, if the factor matrices were of a lower rank (or contained very small singular values), then a lower-rank factorization should first be found instead. Since both algorithms are identical mathematically, one could be favorable if it provided a speed advantage, depending on the computational efficiency of the pseudo-inverse algorithm called. Both update rules are 𝒪(mnk) (Feng et al., 2018), and running this code in MATLAB on a single CPU, we found the update rules using the pseudo-inverses were faster.
To begin the iteration, we have to initialize either W or H. The most intuitive way to do this is by assuming the unweighted factors are similar to the weighted factors and setting H (or W) to the weighted values but scaled to the magnitude of the original data. Since each entry of the data matrix is approximated as A_ip ≈ (WH)_ip, and assuming relatively equal magnitudes for W and H, the appropriate scale factor should be (mean(A)/mean(Ã))^{1/2}, where mean(A) denotes the element mean of A. When W and H contain values with greatly differing magnitudes, perhaps a different scaling factor should be used, although this has not been extensively tested. Thus, the algorithm should be initialized by

H = H̃ (mean(A)/mean(Ã))^{1/2}. (37)

In Eq. (37), mean(Ã) denotes the element mean of Ã = A ⊘ Σ. Measuring the change in W and H between iterations, the algorithm typically converges relatively quickly – within 20 to 40 iterations. In practice, one can use L2 regularization in the external weighting steps, equal to 0.01 for our data, as the least squares method may become increasingly ill-posed as the number of factors increases. This value may need to be altered based on the magnitude of the values in the data and the number of factors. However, we found that adding L2 regularization lowered the similarity of the factors to the given factors from the solution using PMF2.
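A minimal MATLAB sketch of this post-processing step, assuming A, Sigma, and the converged scaled factors Wtil and Htil from RHALS applied to A ./ Sigma are in memory (the full implementation also monitors the change in W and H and can include a small amount of L2 regularization):

```matlab
% External weighting post-processing: recover unweighted W, H with W*H ~ Sigma .* (Wtil*Htil).
Aw   = Sigma .* (Wtil * Htil);                    % unscaled low-rank target
Atil = A ./ Sigma;
H = Htil * sqrt(mean(A(:)) / mean(Atil(:)));      % initialization of Eq. (37)
for it = 1:40                                     % typically converges within 20 to 40 iterations
    W = max(0, Aw * pinv(H));                     % nonnegative least-squares update of W
    H = max(0, pinv(W) * Aw);                     % nonnegative least-squares update of H
end
```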
We used this method for all externally weighted algorithms tested in Sect. 4. Theoretically, any PMF algorithm could be used for the post-processing step. This may become relevant, since as noted in Sect. 1.2.3, the nonnegative ALS method described above may have convergence issues for certain factorization problems (Gillis, 2020).
2.4 Regularization
As in the RHALS algorithm presented in Erichson et al. (2018), we add L1 and L2 regularization. L1 regularization is added to the cost function through the L1 norm, ‖W‖_1 = max_j Σ_i |W_ij|, which is the maximum sum of the absolute values of the components in a given column of the factor matrices. L2 regularization is added through the L2 (or Euclidean) norm, ‖W‖_2. L1 regularization is typically added to promote sparsity, while L2 regularization is added to control the Euclidean norms of the factors and avoid overfitting ill-posed problems (Erichson et al., 2018). The cost function now becomes

Q_reg = ‖(A − WH) ⊘ Σ‖_F^2 + α‖W‖_1 + β‖H‖_1 + γ‖W‖_2^2 + δ‖H‖_2^2. (38)
In Eq. (38), α, β, γ, and δ control the amount of regularization that is added. Typically α and β, as well as γ and δ, are set equal. The gradient and Hessian contributions of these regularization terms are added to the update rules in Eqs. (25) and (26). Optimal parameter values for α, β, γ, and δ in Eq. (38) are often found using an L-curve analysis, which measures the tradeoff between minimizing a norm and minimizing the residual error (Hansen and O'Leary, 1993).
In the SOAS dataset, the average data value is of the order of 10−5 for the mass-to-charge ratio signals. When external weighting is applied, the average value of A⊘Σ is 1.4851. The L curves below are thus plotted using two different scales: one with a scale of 10−5 and one with a scale of 1. The deterministic, unweighted algorithm is first applied to a small dataset consisting of the first 100 rows and columns of the SOAS dataset, scaled to contain different average values, to produce the L curves shown in Fig. 3. Values of the regularization parameters are listed in the graphs, with “e” denoting a power of 10 and the number to the right being the exponent (e.g., e03 = 10^3). The optimal regularization value is one that is located in the bend of the graph, meaning it minimizes a solution’s norm while having low residual error. In our results, we put emphasis on minimizing the latter, so we choose regularization values towards the left side of the bend. Analyzing the figures, the L1 regularization parameters α and β are chosen to be of the order of 10−5 for the scale of 10−5 and between 0.1 and 1 for the scale of 1. The L2 regularization parameters γ and δ are chosen around 0.001 for the scale of 10−5 and 100 for the scale of 1. We note that choosing much smaller regularization values does not drastically increase the norms of the solution, suggesting that regularization is not especially necessary for this problem. However, for more ill-posed problems, the L1 and L2 norms may become extremely large as the amount of regularization tends to zero (Hansen and O'Leary, 1993).

Figure 3L curves for L1 and L2 regularization, applied to data with different average values. (a) L1 regularization for data with a magnitude of around 10−5, (b) L1 regularization for data with a magnitude of around 1, (c) L2 regularization for data with a magnitude of around 10−5, and (d) L2 regularization for data with a magnitude of around 1.
As a rule of thumb, the L1 regularization parameters should be chosen to be of the same order of magnitude as the data, and the L2 regularization parameters should be chosen to be around 100 times that. Note that neither the data matrix size nor the number of factors affect these optimal regularization parameter values. External and internal weighting will also not affect these values, as the magnitudes of the gradient and the Hessian will be around the same value.
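An L-curve sweep for choosing these parameters can be sketched in MATLAB as follows, assuming a hypothetical wrapper rhals(A, Sigma, k, alpha, gamma) that returns the converged factors for given L1 and L2 parameters:

```matlab
% L-curve for the L2 parameter gamma: solution norm versus weighted residual error.
gammas = 10.^(-4:4);
resid  = zeros(size(gammas));  fnorm = zeros(size(gammas));
for t = 1:numel(gammas)
    [W, H]   = rhals(A, Sigma, 6, 0, gammas(t));      % hypothetical RHALS wrapper
    resid(t) = norm((A - W*H) ./ Sigma, 'fro')^2;     % weighted residual error
    fnorm(t) = norm(W, 'fro')^2 + norm(H, 'fro')^2;   % norm being penalized
end
loglog(resid, fnorm, '-o');                           % the bend marks a balanced choice of gamma
```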
An increasingly popular alternative to traditional regularization is nuclear norm regularization, which can be applied to matrix factorization without the need for rank constraints (Hu et al., 2013; Sun and Mazumder, 2013; Fornasier et al., 2011). The nuclear norm is defined as

‖A‖_* = Σ_i σ_i(A),

where σ_i(A) is the ith singular value of A. It is the convex envelope of the rank function, and thus a PMF solution that minimizes the nuclear norm also tends towards minimal rank (Hu et al., 2013). Two possible ways to apply nuclear norm regularization to weighted PMF are by the alternating direction method of multipliers (ADMM), as demonstrated in Sun and Mazumder (2013), and by reconstruction of the nuclear norm into a Frobenius norm, which can then be treated as L2 regularization (Fornasier et al., 2011). However, both these approaches involve key computations with the large, low-rank product WH and may be computationally expensive to implement. Furthermore, both algorithms still involve some arbitrary choice of rank by requiring a preset amount of nuclear norm regularization. In traditional PMF, factor profiles are optimized from a predetermined number of factors. Our implementation and methodology are consistent with this approach.
The computer used for these calculations has an 11th Gen Intel® Core™ i7-1165G7 quad-core CPU with a speed of 2.80 GHz. MATLAB™ R2021a is used for all calculations. While no algorithms detailed in this paper explicitly utilize multicore processing, they may rely on built-in MATLAB functions (such as svd()) that utilize multicore processing for increased efficiency.
We measure the accuracy of RHALS using the weighted residual error of the final algorithm, again defined as ‖(A − WH) ⊘ Σ‖_F^2; the correlation coefficients between the time series factors produced by RHALS and those of the reference solution; and the cosine similarity between the mass spectra profiles produced by RHALS and the reference solution.
In the MU algorithm, the elements of W and H are initialized by taking the absolute value of random numbers drawn from a standard normal distribution. These values are then scaled to match the magnitude of the data. The ALS and HALS algorithms are initialized by the nonnegative double SVD (NNDSVD) approach detailed in Boutsidis and Gallopoulos (2008). Note that the latter approach will still vary between random seeds as the NNDSVD also utilizes the randomization technique of Sect. 1.3.
All algorithms cease updates when a stopping criterion is met or the maximum number of iterations is reached. For the MU and ALS algorithms, the value of the cost function is calculated after each iteration, and the algorithm halts when the percent change in this value, divided by the initial cost function value, is less than a set tolerance. The HALS algorithm differs by using the percent change in the projected gradient in its stopping criterion (detailed in Lin, 2007). The projected gradient is defined as (Erichson et al., 2018)

[∇^P_W Q]_ij = [∇_W Q]_ij if W_ij > 0,  and  [∇^P_W Q]_ij = min(0, [∇_W Q]_ij) if W_ij = 0,

and similarly for ∇^P_H Q. When the percentage change in the norm of the projected gradient (divided by its initial value) is less than a specified tolerance, the algorithm halts. Below, the maximum number of iterations allowed is set to be 100 (which is rarely reached by any algorithm), and the tolerance is set to be 10−4 for the HALS and ALS algorithms. A tolerance of 10−5 is used for the MU algorithm to account for a larger initial cost from the random initialization.
When comparing factors from different algorithms, it is important to note that the ratio of values between a time series factor and a mass spectra factor might be slightly different. Consider two algorithms that compute the identical factorizations

A = Σ_{l=1}^{k} w_l h_l = Σ_{l=1}^{k} (c_l w_l)(h_l/c_l),

with c_l being an arbitrary positive constant. It can be seen above that c_l w_l is identical to w_l, scaled in magnitude by c_l. It would appear that one factorization yields a stronger signal for factor l, despite the fact that the factorizations are identical. Thus, when comparing individual factors between algorithms, the time series concentrations are cumulative over the entire mass spectra, and the mass spectra concentrations are cumulative over the entire time series (as also presented with the PMF2 results). Specifically, we plot the time series of factor l as the sum over the columns of the outer product w_l h_l, and we plot the mass spectra of factor l as the sum over the rows. To avoid this issue, one can add scaling coefficients to rescale the factors at each iteration so that they will approximately have the same magnitude (Lu and Wu, 2004). This has not yet been implemented in our code.
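A minimal MATLAB sketch of these cumulative profiles for a single factor l, assuming converged factors W and H are in memory; the arbitrary constant c_l cancels because the paired factors are multiplied before summing:

```matlab
% Cumulative profiles of factor l (sums of the outer product w_l * h_l).
l  = 1;                                 % factor index to plot
ts = W(:, l) * sum(H(l, :));            % time series summed over the entire mass spectrum
ms = sum(W(:, l)) * H(l, :);            % mass spectrum summed over the entire time series
```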
4.1 Computational efficiency
Table 1 shows the number of operations required at each step (update of W and H). Preprocessing steps, such as finding A⊘Σ and Σ⊙Σ, are excluded from the table. External weighting eliminates almost all of the elementwise operations needed, which are slow, memory-bound operations (Jia et al., 2020). However, systems with an abundance of free memory and/or GPUs may find that internal weighting methods are sufficiently quick for small- or medium-sized problems. Furthermore, the performance of matrix operations within these algorithms may vary between programming languages and libraries, such as between the NumPy, NumExpr, and Theano libraries in Python (Bergstra et al., 2010).
Table 1Comparison of the number of operations per step for each algorithm used. We also track the number of elementwise multiplications and divisions (implemented in MATLAB as .* and ./) in the third column.

Analyzing Table 1 allows us to see why, in practice, it takes longer for internally weighted ALS and HALS to converge than internally weighted MU, as the former have more matrix multiplication operations and elementwise operations for our target rank, k=6. However, when the algorithms are externally weighted, ALS and HALS become much more computationally efficient, allowing runtime to diminish as well.
To show numerically how the cost of RHALS scales with the size of the system, we consider the size of the data matrix vs. the runtime of the RHALS algorithm, as well as the deterministic internally weighted HALS and MU algorithms. Figure 4 shows how the computational costs of the different algorithms vary with the number of rows (and columns) in a square data matrix. The number of factors is set to five, and the data are formed by multiplying two rank-five matrices filled with sampled values from the PMF2 solution of the SOAS dataset. Uncertainty values are also randomly sampled from the given uncertainty data, and noise is added to the data matrix through random normal values centered around 0 with standard deviations equal to the uncertainties. L1 regularization is set to 1 for all factors, and L2 regularization is set to 50.

Figure 4Size of data matrix vs. runtime (in seconds) in RHALS algorithm, performed with three different random seeds (containing the absolute value of random normal variables). The x axis shows the base 10 log of n, and the y axis plots the base 10 log of runtime. The median runtime is plotted, with the error bars plotting the maximum and minimum runtimes. Note that runtime initially decreases due to a coincidental reduction in the number of steps to convergence, but problems of environmental interest are generally located on the right of the graph or beyond its right edge.
As seen in Fig. 4, the logarithms of size and runtime are approximately linear for all three algorithms. RHALS clearly outperforms MU and HALS in terms of efficiency, requiring roughly 5 % and 1 % of the runtime of the MU and deterministic HALS algorithms, respectively. It should also be noted that the randomization step, which accounts for about 5 % of the algorithm computational costs, can be parallelized using GPUs (Erichson et al., 2018). The post-algorithm external weighting step can also be parallelized as it only involves matrix multiplication and an SVD (Lahabar and Narayanan, 2009). This post-weighting step accounts for around 60 % of the algorithm's computational costs.
4.2 Simple case
We now present the RHALS algorithm for a small test case. To do this, we form a data matrix by combining underlying “true” factors and random noise. We form a 5952 by 400 matrix using the first three factors from the PMF2 solution to the SOAS dataset. We choose the time stamps between 9385 and 15 336 from the PMF2 time series factor matrix WPMF2 as the PMF2 factors are clearly distinguishable from each other in this time interval. We choose the first 400 mass spectra profiles from the mass spectra factor matrix HPMF2 as the bulk of the mass spectra concentrations lies within this range.
A data matrix is then formed by combining the factors and adding random normal noise with a standard deviation equal to that of the uncertainty of the data:

A_test = W_3 H_3 + E,  with E_ij ∼ N(0, σ_ij^2),

where W_3 and H_3 contain the three selected PMF2 factors.
Next, we run RHALS over 20 trials, each with a different initialization, in order to obtain many different solutions that exist in the solution space. Here, initializations are formed using the randomized SVD method described earlier, which generated sufficiently different solutions. Figure 5 shows the weighted residual error of the RHALS algorithm over 20 trials.
The average weighted error over 20 trials is 1.4749×103, and the algorithm took an average time of 0.0561 s to converge with an average of 48.15 block coordinate descent steps per trial. As one can see, solutions can vary over different trials, and a smaller weighted error could be used as justification of one solution over another.
Next, we compare the similarities of the converged factors and the original factors that formed the test case. The averages of the similarities between the three factors are shown in Fig. 6. A cosine similarity is used for the mass spectra, while the correlation coefficient is used for the time series.

Figure 6(a) Average time series similarity and (b) average mass spectra similarity to formed factors in the small test case.
As one can see, the RHALS algorithm recreates the mass spectra and time series factors almost perfectly for all trials tested, regardless of the weighted residual error. In practice, a cosine similarity over 0.95 or a correlation coefficient over 0.90 between factor profiles represents almost identically interpretable solutions. Any trial is viable to be chosen as a “good” solution. Each solution yields a mass spectra similarity over 0.994 and a time series correlation over 0.974. There is low variance in the similarity metrics among trials, and a higher weighted error corresponds to differences in the less important parts of the factors.
4.3 Large dataset
4.3.1 Comparing different algorithms
Next, we analyze the complete dataset and compare the RHALS factors to all PMF2 factors. Table 2 shows a table of diagnostics for the HALS, RHALS, ALS, and MU algorithms applied to the complete SOAS dataset, averaged over three trials. As one can see, algorithms with external weighting demonstrate a dramatic reduction in computational costs, albeit at the cost of a larger error. The ALS algorithms were by far the slowest, and the internally weighted HALS algorithm yielded the lowest average weighted error. The internally weighted MU algorithm produced comparable results to the internally weighted HALS algorithm in less than two-thirds of the time; however, both the internally and the externally weighted cases took the most steps to converge. The two fastest algorithms were the externally weighted HALS and RHALS algorithms, taking 1.01 and 0.50 s, respectively.
Table 2Average statistics of different algorithms over five different SVD initializations, with a tolerance of 10−4 (for MU, tolerance is 10−5 and a random initialization), L1 regularization of 1, and L2 regularization of 50.

PMF factor profiles are often visually analyzed and compared to known candidate profiles for identification. Thus, the question arises as to whether or not an algorithm utilizing external weighting still contains interpretable factors in light of the decreased accuracy. Figure 7 shows the cumulative time series from one solution of the externally and internally weighted HALS algorithm, with the factors produced by the externally weighted algorithm and the internally weighted algorithm overlaid. Figure 8 shows the cumulative mass spectra of this solution, with the factors from the externally weighted algorithm and the internally weighted algorithm laid out side by side. Similar graphs for the ALS and MU algorithms are shown in Figs. A4 and A5 respectively in Appendix A.

Figure 7Comparison of 2 h rolling average of time series for externally weighted and internally weighted HALS factors. Externally weighted error: 7.1321×103. Internally weighted error: 6.4841×103.

Figure 8Comparison of mass spectra for internally weighted (on the left) and externally weighted (on the right) HALS factors.
Upon visual inspection, the time series produced by the externally weighted factors have peaks and troughs at almost identical times, but the magnitudes of these peaks and troughs can vary throughout the factors. Interestingly, there exists a consistent difference in the magnitude of the time series between internal and external weighting. Specifically, the external weighting algorithms seem to consistently overpredict the concentrations of the time series of the second factor compared to the internally weighted algorithms. The difference is small compared to the concentrations of the factors, but this may lead to an over-interpretation of the importance of the second factor with an externally weighted algorithm. It should be noted that there exists a (similarly) large variation in relative factor signals between trials of internally weighted algorithms, although caution should be taken with regard to the possibility that external weighting introduces extra error in this analysis. More research is needed to understand the overestimation or underestimation of factor magnitudes in external weighting.
Comparing the mass spectra yields a similar analysis. Most factors of externally weighted algorithms share the same spikes of key ions as the internally weighted factors, although sometimes at different magnitudes. Occasionally, some externally weighted mass spectra factors will look quite dissimilar to the corresponding internally weighted factor or contain noticeable spikes and divots in key ions, as seen in the second factor for the MU and HALS algorithm. It should also be noted that the solutions of the externally weighted algorithms could be rotated away from each other. These differences may be extreme enough to encourage users to utilize multiple trials to search for multiple solutions.
4.3.2 Comparison between expectation maximization and external weighting
To test the EM approach to uncertainty weighting, as mentioned in the section titled “Expectation maximization”, the weights 1/σ_ij^2 derived from the uncertainty matrix Σ are scaled so that their maximum value is 1. Since the bulk computational component of the algorithm constructs the matrix A1 in Eq. (9), the authors in Yahaya et al. (2019) and Yahaya et al. (2021) recommend updating A1 only after convergence or a maximum number of iterations of 20 or 50. They also note that applying the expectation step too early in the algorithm led to poorer performance due to the number of errors in the estimates of W and H. In order to apply the projected gradient stopping criterion discussed in Sect. 4, we chose to reconstruct A1 at fixed iterations – after 10 and 20 PMF steps for different experiments. The first construction of A1 is evaluated at the 1st, 5th, and 10th step. We used the NNDSVD initialization in Boutsidis and Gallopoulos (2008) and also varied the tolerance of the stopping condition between 10−4 and 10−6. We summarize these results by presenting the range of average values across the different variations.
We present a comparison of external weighting and the EM algorithm in Table 3, using the ranges of values from the different experiments listed above. Each value is an average over 20 trials. We compare the convergence times of the algorithms, the number of steps, the weighted errors, and the similarity to the PMF2 solution for both W (correlation) and H (cosine similarity).
Table 3Average statistics of external weighting (EW) versus expectation maximization (EM) algorithms over 20 trials. Internally weighted (IW) HALS is provided as a reference. Ranges of the values of the algorithms run using different variations of expectation maximization (EM) steps are presented. The correlation of the columns of W and similarity of the rows of H to the PMF2 solution are also listed.

As demonstrated in Table 3, some variations of the EM algorithm were able to outperform externally weighted HALS and RHALS in total time, as well as in weighted error. For instance, running the expectation maximization algorithm with the first calculation of A1 taking place at the 10th step and recalculating A1 every 20 additional steps until the convergence criterion (tolerance 10−5) was reached yielded an average weighted error of 6.92×103 in 0.6703 s. This computational speed is comparable to that of RHALS, while the accuracy bests both RHALS and externally weighted HALS. However, no expectation maximization algorithm was as successful at recreating the PMF2 time series factors, as seen in column five of Table 3. Externally weighted RHALS and HALS also provided mass spectra factors with a higher similarity to the PMF2 factors, with the exception of two runs of the expectation maximization algorithm, one recalculating A1 after 20 steps, with the first calculation at the fifth step, and the other after 10 steps, with the first calculation at the first step. These yielded average similarities of 0.8856 and 0.8861, respectively, although they both took over 2.90 s to run, much slower than any other algorithm tested besides internally weighted HALS.
We also tested how well each algorithm produced factors that were within a rotation of the PMF2 factors. As detailed in Sect. 1.4, the factor profiles WT^{-1} and TH for a square matrix T may be closer to the desired solution than the original factors W and H. To see the extent to which W can be rotated towards WPMF2, we find the matrix T that minimizes the total squared differences between HPMF2 and TH and then find the average correlation between the columns of WPMF2 and WT^{-1}. A symmetrical approach can be made to find the cosine similarity for H. For internally weighted HALS, externally weighted HALS, and RHALS, we found average post-rotation time series correlations to be 0.9391, 0.9021, and 0.8902, respectively, and post-rotation mass spectra similarities to be 0.9602, 0.9433, and 0.9518, respectively. When this approach was tested on HALS and RHALS using the EM algorithm, average post-rotation time series correlations varied within 0.8528–0.8819 and 0.8314–0.8544, respectively, and post-rotation mass spectra similarities within 0.9230–0.9363 and 0.9147–0.9287, respectively.
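A minimal MATLAB sketch of this rotation-matching test, assuming the reference factors Wpmf2 and Hpmf2 and the converged factors W and H are in memory:

```matlab
% Least-squares matching matrix T minimizing ||Hpmf2 - T*H||_F^2, then rotate W accordingly.
T    = Hpmf2 * pinv(H);                 % k-by-k matching matrix
Wrot = W / T;                           % W*inv(T), so that Wrot*(T*H) = W*H
rho  = zeros(1, size(W, 2));
for l = 1:size(W, 2)                    % correlation of each rotated time series factor
    c      = corrcoef(Wpmf2(:, l), Wrot(:, l));
    rho(l) = c(1, 2);
end
meanCorr = mean(rho);                   % average post-rotation time series correlation
```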
Ultimately, we found that external weighting recreated the PMF2 factors more consistently than EM, both before and after rotation. This may be due to the fact that the scaling of the weights used for the EM step is not perfectly analogous to creating a set of weights that represent the confidence of each data point. Thus, the EM method using this scaling may not capture the key error-weighted patterns in the data as well as external weighting.
4.3.3 Complete analysis of RHALS algorithm
Figure 9 shows the weighted residual error for a six-factor solution (equivalent to the number of factors in PMF2) over 20 different nonnegative SVD initializations. The average weighted residual error is 7.2743×103, with a convergence time of 0.5163 s over 38.45 steps per trial. If the solutions differ significantly, weighted residual error is a useful metric for choosing a solution from the 20 trials. The 11th and 13th trials appear promising, with the lowest weighted residual errors of 7.1474×103 and 7.1026×103, respectively. The average similarities between RHALS factors and PMF2 factors are presented in Fig. 10, with the same similarity metrics as listed above.

Figure 9RHALS error over 20 trials. Mean error: 7.2743×103, standard deviation of error: 120.8. Mean time: 0.5163, standard deviation of time: 0.0292. Mean steps: 38.45, standard deviation of steps: 6.88.
We see that different initializations can lead to different solutions, in terms of both similarity to the given PMF2 solution and weighted error, suggesting that convergence to a global minimum is not always achieved. This further emphasizes the importance of using multiple initializations in order to find an optimal solution.
Only the 2nd, 11th, 13th, and 17th trials have an average cosine similarity between mass spectra factors over 0.95, along with a correlation over 0.90 between time series factors. The 13th trial holds the highest similarity scores, with a time series correlation of 0.9468 and a cosine similarity of 0.9787. As seen in Fig. 9, these solutions have a small weighted residual error, further justifying picking a solution with low weighted error.

Figure 11(a) Average time series correlation, (b) average mass spectra similarity, and (c) average of time series and mass spectra similarities between the solutions of the different trials for RHALS.

Figure 12RHALS mass spectra series factors. (a) First factor similarity with PMF2=0.9631, (b) second factor similarity with PMF2=0.9582, (c) third factor similarity with PMF2=0.9912, (d) fourth factor similarity with PMF2=0.9970, (e) fifth factor similarity with PMF2=0.9795, and (f) sixth factor similarity with PMF2=0.9835.
As we saw with the small test case, every RHALS solution contained factors similar to the true factors. However, only 4 out of 20 trials produced solutions surpassing 0.95 in mass spectra similarity and 0.90 in time series correlation for the entire dataset. Running 100 trials, it was found that RHALS found a solution matching these criteria 27 % of the time. Thus the probability of not finding a good solution in 10 trials would be around (1 − 0.27)^10 ≈ 4.3 %, and ≈ 0.2 % in 20 trials. This rate could vary depending on the number of factors and the rotational ambiguity of the problem, as in the “bad” trials RHALS may simply find a rotated version of the “true” solution.
In the potential case that interest lies in factors that occur most frequently, an alternative approach to picking a solution given multiple trials would be to focus on solutions in which factors are found repeatedly. In Fig. 11, we compare the average similarities of a given trial to the other 19 trials. Again, cosine similarity is used for the mass spectra, and the correlation coefficient is used for the time series. The bottom plot shows the two graphs averaged to give a total metric of the similarity of a solution to the other solutions.
Analyzing these figures, most solutions have about the same similarity to each other, with a time series correlation of around 0.87 and a mass spectra similarity of around 0.93. A few solutions, such as those from the trials between 6 and 9, can be ruled out as outliers due to their low similarity to the other solutions. While the 12th trial yields the highest average similarity to the other solutions (0.9148 when the time series correlation and the mass spectra similarity are averaged), no solution drastically outperforms the others. Note that the solutions from the 2nd, 11th, 13th, and 17th trials all perform well in this analysis, with average similarities of 0.9028, 0.8993, 0.9016, and 0.9036, respectively.
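A plausible implementation of these cross-trial comparisons is sketched below: factors from two solutions are matched greedily by mass spectra cosine similarity, and the cosine similarity (spectra) and Pearson correlation (time series) are then averaged over the matched pairs. This is an illustrative sketch, not the code used for the paper, and the greedy matching scheme is an assumption.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def pearson(u, v):
    """Pearson correlation coefficient between two time series."""
    return float(np.corrcoef(u, v)[0, 1])

def solution_similarity(W1, H1, W2, H2):
    """Greedy factor matching by mass spectra cosine similarity.

    W* hold time series (columns = factors); H* hold mass spectra
    (rows = factors). Returns the average time series correlation,
    the average mass spectra similarity, and their mean.
    """
    k = H1.shape[0]
    unused = list(range(k))
    ts_corrs, ms_sims = [], []
    for i in range(k):
        sims = [cosine(H1[i], H2[j]) for j in unused]
        j = unused.pop(int(np.argmax(sims)))        # best remaining match
        ms_sims.append(cosine(H1[i], H2[j]))
        ts_corrs.append(pearson(W1[:, i], W2[:, j]))
    ts, ms = float(np.mean(ts_corrs)), float(np.mean(ms_sims))
    return ts, ms, (ts + ms) / 2
```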
From visual inspection of Figs. 9 and 11, the solution from the 13th trial is only barely outperformed in the above analysis while holding a clear lead in accuracy, so it is a reasonable choice for the "best" solution. This solution is plotted in Figs. 12 and 13: Fig. 12 shows the cumulative mass spectra profiles from the RHALS solution, while Fig. 13 shows the cumulative time series profiles from the RHALS solution compared with those from the PMF2 solution. Again, the profiles are plotted as sums over the entire time series and mass spectra, respectively.
The similarities of the RHALS factors to the "true" PMF2 factors are very promising; however, the magnitudes of some factors differ. Specifically, RHALS underestimates the contribution of the first factor while overestimating the contributions of the second and fifth factors. This additional error should be kept in mind when interpreting RHALS factors.
Another potential concern with the RHALS algorithm is whether the factors are systematically biased towards higher or lower magnitudes. Specifically, since RHALS enforces nonnegativity simply by setting negative elements to zero, in addition to applying regularization, one might hypothesize that RHALS would produce factors of lower magnitude than the actual data. In Fig. A1 in Appendix A, the time series of the six RHALS factors, stacked on top of each other, are plotted against the total time series of the data matrix. The RHALS reconstruction is sometimes larger and sometimes smaller than the data, but there is no general pattern of bias towards solutions of greater or lesser magnitude.
Finally, we examine whether a solution with a different number of factors would be preferable for this dataset. Figure A6 shows the converged weighted residual error of RHALS as the number of factors increases, using the random initialization from the 13th trial. The error decreases dramatically as the number of factors increases from one to six, while barely improving for solutions with more factors. A six-factor solution is therefore justified for the RHALS algorithm.
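The factor-number scan can be organized as in the sketch below, again passing the initialization and the weighted RHALS routine in as placeholder callables; the resulting error-versus-k curve is what Fig. A6 summarizes, with the elbow at six factors.

```python
import numpy as np

def scan_num_factors(A, Sigma, initialize, factorize, k_max=10, seed=13):
    """Converged weighted residual error as a function of the number of factors.

    `initialize(A, k, rng)` and `factorize(A, Sigma, W0, H0)` are placeholders
    for the nonnegative SVD initialization and the weighted RHALS routine.
    """
    rng = np.random.default_rng(seed)
    errors = {}
    for k in range(1, k_max + 1):
        W0, H0 = initialize(A, k, rng)
        W, H = factorize(A, Sigma, W0, H0)
        errors[k] = float(np.linalg.norm((A - W @ H) / Sigma))
    return errors  # pick k at the "elbow", where additional factors barely help
```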
4.4 Testing rotations
For rotations of entire factors, we first tested the rotational algorithm detailed in Paatero and Hopke (2009) to determine whether it was applicable to RHALS. Unfortunately, the pulling equations used in the multilinear engine (ME) do not appear to transfer well to RHALS.
Implementing the approach laid out in Sect. 2.2 on some of the solutions, we find that this method can potentially find better factorizations in the solution space. Varying the values of a and b in Eqs. (28) and (29) between 0 and 500 (of a similar magnitude to the optimal amount of L2 regularization), we test the rotational method on the RHALS solutions from the 11th, 13th, and 17th trials. In the resulting plots, a positive pulling value corresponds to a "pull-up" of W and a "pull-down" of H, and vice versa for negative pulling values; the values of a and b give the magnitude of the pulling parameter. We present the rotation of the 13th solution in Figs. 14 and 15, the rotations of the 11th solution in Figs. A7 and A8, and the rotations of the 17th solution in Figs. A9 and A10 in Appendix A.
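The pulling-parameter scan itself is simple to organize, as sketched below; `rhals_pull` is a hypothetical wrapper that re-runs the weighted RHALS updates from a converged solution with the pulling terms of Eqs. (28) and (29) set to strength p, and nothing in the sketch specifies the form of those terms, which is defined in Sect. 2.2.

```python
import numpy as np

def scan_pulling(A, Sigma, W0, H0, rhals_pull, strengths=range(-500, 501, 100)):
    """Weighted error and factor norms over a range of pulling strengths.

    `rhals_pull(A, Sigma, W0, H0, p)` is a placeholder that re-runs the
    weighted RHALS updates with pulling strength p (p > 0: pull W up /
    H down; p < 0: the reverse) and returns the rotated (W, H).
    """
    results = []
    for p in strengths:
        W, H = rhals_pull(A, Sigma, W0, H0, p)
        err = float(np.linalg.norm((A - W @ H) / Sigma))
        results.append((p, err, np.linalg.norm(W), np.linalg.norm(H)))
    return results  # inspect error and norms as functions of the pulling strength
```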

Figure 15(a) W, (b) H, and (c) total average similarities of the rotated solutions to the PMF2 solution from the 13th trial of RHALS. The total similarity refers to an average of the similarity metrics for the time series and mass spectra factors.
As can be seen from the rotations of the solutions from the three different trials, solutions that have lower weighted residual error and are closer to the target solution can be found through this simple pulling method, regardless of the direction of the pull. Interestingly, the weighted residual error could be decreased to below 7000 when rotating the solution from the 13th trial towards larger values of W with a pulling parameter of 400. Note that the matrix norms do not always increase with larger positive values of the pulling parameter (and vice versa); this is most clearly seen in Fig. A11 in Appendix A with the rotated factors from the 11th trial. As the factors are pulled in one direction, the matrix norms may respond in the opposite direction.
The change in similarity to the PMF2 factors is small for all solutions, so rotations may only be worth examining when no good solution exists from the outset; the benefit of the rotations is more apparent when poor solutions are rotated.
5 Conclusions
As the size of datasets has grown, computational costs have become increasingly expensive for traditional PMF algorithms. Thus, randomized and hierarchical algorithms are attractive alternative methods. Specifically, the RHALS algorithm was shown to provide a reduction in runtime compared to the multiplicative update algorithm, as well as the deterministic HALS algorithm. Furthermore, we proposed a novel approach to handling uncertainties in a weighted factorization problem. While this approach, coupled with randomization, slightly reduced the accuracy of the algorithm, it dramatically decreased the computational cost. Ultimately, we showed that our weighted RHALS algorithm was able to almost completely recreate the factors in both a formed test matrix and a real dataset and is a useful tool for finding nonnegative factors of large datasets, particularly in the context of real-time atmospheric mass spectrometry.
Appendix A: ALS derivation
The cost function Q is defined as
where , and δ are regularization parameters. As with the derivation of HALS, the elementwise division of Σ is eliminated by considering a row or column of the residual at a time. Thus we find
where Σi is a diagonal n×n matrix with the diagonal values equal to the ith row of Σ, and Σj is a diagonal m×m matrix with the diagonal values equal to the jth column of Σ. Using the fact that , with Tr being the trace of the matrix, Eqs. (A2) and (A3) can be rewritten as
where , , , and . Using vector derivative rules and the fact that Tr(ABC) = Tr(BCA) = Tr(CAB) when ABC is square, the following gradients are found.
In Eqs. (A6) and (A7), 1 is a vector of ones. Finally, the following update rules are found.
Here, I is the identity matrix.
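The displayed equations are not reproduced here, but for orientation, the ridge-regularized weighted least squares solution that a row-by-row derivation of this kind typically yields has the form below. This is a sketch under the assumption that the uncertainties enter as weights 1/σᵢⱼ² (i.e., through Σᵢ⁻² and Σⱼ⁻²) and that only the ridge term δ is shown; the paper's actual update rules may contain additional regularization terms, such as the L1 terms implied by the vector of ones in Eqs. (A6) and (A7).

```latex
W_{i,:} \leftarrow A_{i,:}\,\Sigma_i^{-2} H^{T}\!\left(H\,\Sigma_i^{-2} H^{T} + \delta I\right)^{-1},
\qquad
H_{:,j} \leftarrow \left(W^{T}\Sigma_j^{-2} W + \delta I\right)^{-1} W^{T}\Sigma_j^{-2} A_{:,j},
```

with negative entries subsequently set to zero to enforce nonnegativity.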

Figure A1 RHALS time series factors stacked together with the total time series of the data, plotted as a 2 h rolling average.

Figure A2 Comparison of 2 h rolling average of time series for externally weighted and internally weighted ALS factors. Externally weighted error: 7.3666×10³. Internally weighted error: 6.4768×10³.

Figure A3 Comparison of 2 h rolling average of time series for externally weighted and internally weighted MU factors. Externally weighted error: 6.9457×10³. Internally weighted error: 6.4292×10³.

Figure A4 Comparison of mass spectra for internally weighted (left) and externally weighted (right) ALS factors.

Figure A5 Comparison of mass spectra for internally weighted (left) and externally weighted (right) MU factors.

Figure A8(a) W, (b) H, and (c) total average similarities of the rotated solutions to the PMF2 solution from the 11th trial of RHALS.

Figure A10(a) W, (b) H, and (c) total average similarities of the rotated solutions to the PMF2 solution from the 17th trial of RHALS.
Appendix B: Mathematical equivalence of pseudo-inverse update and ordinary least squares
For simplicity, we label as . Let us consider the ordinary least squares update for W:
H is a k×n matrix that is assumed to have full row rank and can thus be decomposed into the exact SVD factorization H = USVᵀ, where U is a k×k orthogonal matrix, S is a k×k diagonal matrix, and V is an n×k matrix with orthonormal columns. The pseudo-inverse of H is then H⁺ = VS⁻¹Uᵀ. Note that VᵀV = I and UᵀU = UUᵀ = I. The update for W can be rewritten as
which is equal to the pseudo-inverse update described in Sect. 2.3. A similar argument can be made for the update for H.
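For completeness, the equivalence follows from a short chain of identities, assuming the ordinary least squares update takes the standard form W = AHᵀ(HHᵀ)⁻¹ and using the orthogonality relations above:

```latex
A H^{T}\left(H H^{T}\right)^{-1}
= A V S U^{T}\left(U S V^{T} V S U^{T}\right)^{-1}
= A V S U^{T}\left(U S^{2} U^{T}\right)^{-1}
= A V S U^{T} U S^{-2} U^{T}
= A V S^{-1} U^{T}
= A H^{+}.
```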
The source codes for the algorithms detailed in this paper are given at https://doi.org/10.5281/zenodo.7963474 (Sapper, 2023). The datasets used for this analysis are given at https://doi.org/10.1021/acsearthspacechem.8b00028 (Massoli et al., 2018).
BCS wrote the paper, conducted all of the numerical experiments, and derived the equations. DKH advised the numerical and theoretical approach. DKH, MC, HS, JLJ, and SY reviewed and edited the paper. MC and HS provided the real-time atmospheric mass spectrometry dataset and reference PMF2 solution.
The contact author has declared that none of the authors has any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.
The authors of this paper dedicate this work to the memory of Benjamin C. Sapper, who passed on 4 August 2023, while serving as a U.S. Forest Service firefighter.
This research has been supported by the National Aeronautics and Space Administration (grant no. 80NSSC20K0214) and the University of Colorado Boulder (CU Summer Program for Undergraduate Research (SPUR)).
This paper was edited by Klaus Klingmüller and reviewed by Nirav Lekinwala and two anonymous referees.
References
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., and Bengio, Y.: Theano: A CPU and GPU math compiler in Python, Proceedings of the 9th Python in Science Conference, 18–24, https://doi.org/10.25080/Majora-92bf1922-003, 2010.
Boutsidis, C. and Gallopoulos, E.: SVD based initialization: A head start for nonnegative matrix factorization, Pattern Recogn., 41, 1350–1362, https://doi.org/10.1016/j.patcog.2007.09.010, 2008.
Cichocki, A. and Phan, A.-H.: Fast Local Algorithms for Large Scale Nonnegative Matrix and Tensor Factorizations, IEICE Transactions, 92-A, 708–721, https://doi.org/10.1587/transfun.E92.A.708, 2009.
Cichocki, A., Zdunek, R., Phan, A. H., and Amari, S.-I.: Alternating Least Squares and Related Algorithms for NMF and SCA Problems, in: Nonnegative Matrix and Tensor Factorizations, chap. 4, John Wiley & Sons, Ltd, 203–266, ISBN 9780470747278, https://doi.org/10.1002/9780470747278.ch4, 2009.
Eckart, C. and Young, G.: The approximation of one matrix by another of lower rank, Psychometrika, 1, 211–218, https://doi.org/10.1007/BF02288367, 1936.
Erichson, N. B., Mendible, A., Wihlborn, S., and Kutz, J. N.: Randomized Nonnegative Matrix Factorization, Pattern Recogn. Lett., 104, 1–7, https://doi.org/10.1016/j.patrec.2018.01.007, 2018.
Feng, X., Yu, W., and Li, Y.: Faster Matrix Completion Using Randomized SVD, in: 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI), 608–615, https://doi.org/10.1109/ICTAI.2018.00098, 2018.
Fornasier, M., Rauhut, H., and Ward, R.: Low-rank Matrix Recovery via Iteratively Reweighted Least Squares Minimization, SIAM J. Optimiz., 21, 1614–1640, https://doi.org/10.1137/100811404, 2011.
Gillis, N.: Nonnegative Matrix Factorization, Society for Industrial and Applied Mathematics, Philadelphia, PA, https://doi.org/10.1137/1.9781611976410, 2020.
Guo, Y.-T., Li, Q.-Q., and Liang, C.-S.: The rise of nonnegative matrix factorization: Algorithms and applications, Information Systems, 123, 102379, https://doi.org/10.1016/j.is.2024.102379, 2024.
Halko, N., Martinsson, P. G., and Tropp, J. A.: Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions, SIAM Rev., 53, 217–288, https://doi.org/10.1137/090771806, 2011.
Hansen, P. C. and O'Leary, D. P.: The Use of the L-Curve in the Regularization of Discrete Ill-Posed Problems, SIAM J. Sci. Comput., 14, 1487–1503, https://doi.org/10.1137/0914086, 1993.
Ho, N.-D.: Nonnegative matrix factorization algorithms and applications, PhD thesis, Université Catholique de Louvain, 2008.
Hopke, P. K., Chen, Y., Rich, D. Q., Mooibroek, D., and Sofowote, U. M.: The application of positive matrix factorization with diagnostics to BIG DATA, Chemometr. Intell. Lab., 240, 104885, https://doi.org/10.1016/j.chemolab.2023.104885, 2023.
Hu, Y., Zhang, D., Ye, J., Li, X., and He, X.: Fast and accurate matrix completion via truncated nuclear norm regularization, IEEE T. Pattern Anal., 35, 2117–2130, https://doi.org/10.1109/TPAMI.2012.271, 2013.
Jia, L., Liang, Y., Li, X., Lu, L., and Yan, S.: Enabling Efficient Fast Convolution Algorithms on GPUs via MegaKernels, IEEE T. Comput., 69, 986–997, https://doi.org/10.1109/TC.2020.2973144, 2020.
Kaloorazi, M. F. and Chen, J.: Randomized Truncated Pivoted QLP Factorization for Low-Rank Matrix Recovery, IEEE Signal Proc. Let., 26, 1075–1079, https://doi.org/10.1109/LSP.2019.2920054, 2019.
Kim, E. and Hopke, P. K.: Source Identifications of Airborne Fine Particles Using Positive Matrix Factorization and U.S. Environmental Protection Agency Positive Matrix Factorization, J. Air Waste Manage. Assoc., 57, 811–819, https://doi.org/10.3155/1047-3289.57.7.811, 2007.
Kim, J. and Park, H.: Fast Nonnegative Matrix Factorization: An Active-Set-Like Method and Comparisons, SIAM J. Sci. Comput., 33, 3261–3281, https://doi.org/10.1137/110821172, 2011.
Kumar, K.: Principal component analysis: Most favourite tool in chemometrics, Resonance, 22, 747–759, https://doi.org/10.1007/s12045-017-0523-9, 2017.
Lahabar, S. and Narayanan, P. J.: Singular value decomposition on GPU using CUDA, in: 2009 IEEE International Symposium on Parallel & Distributed Processing, 1–10, https://doi.org/10.1109/IPDPS.2009.5161058, 2009.
Lee, D. D. and Seung, H. S.: Learning the parts of objects by non-negative matrix factorization, Nature, 401, 788–791, https://doi.org/10.1038/44565, 1999.
Lin, C.-J.: Projected Gradient Methods for Nonnegative Matrix Factorization, Neural Comput., 19, 2756–2779, https://doi.org/10.1162/neco.2007.19.10.2756, 2007.
Lu, J. and Wu, L.: Technical details and programming guide for a general two-way positive matrix factorization algorithm, J. Chemometr., 18, 519–525, https://doi.org/10.1002/cem.894, 2004.
Massoli, P., Stark, H., Canagaratna, M. R., Krechmer, J. E., Xu, L., Ng, N. L., Mauldin, R. L., Yan, C., Kimmel, J., Misztal, P. K., Jimenez, J. L., Jayne, J. T., and Worsnop, D. R.: Ambient Measurements of Highly Oxidized Gas-Phase Molecules during the Southern Oxidant and Aerosol Study (SOAS) 2013, ACS Earth Space Chem., 2, 653–672, https://doi.org/10.1021/acsearthspacechem.8b00028, 2018.
Nanra, M., Saha, S., Shukla, A., Tripathi, S., and Kar, P.: Robust Shape-regularized Non-negative Matrix Factorization for Real-time Source Apportionment, 192–201, https://doi.org/10.1145/3632410.3632457, 2024.
Paatero, P.: Least squares formulation of robust non-negative factor analysis, Chemometr. Intell. Lab., 37, 23–35, https://doi.org/10.1016/S0169-7439(96)00044-5, 1997.
Paatero, P. and Hopke, P. K.: Rotational tools for factor analytic models, J. Chemometr., 23, 91–100, https://doi.org/10.1002/cem.1197, 2009.
Paatero, P. and Tapper, U.: Analysis of different modes of factor analysis as least squares fit problems, Chemometr. Intell. Lab., 18, 183–194, https://doi.org/10.1016/0169-7439(93)80055-M, 1993.
Paatero, P. and Tapper, U.: Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, Environmetrics, 5, 111–126, https://doi.org/10.1002/env.3170050203, 1994.
Paatero, P., Hopke, P. K., Song, X.-H., and Ramadan, Z.: Understanding and controlling rotations in factor analytic models, Chemometr. Intell. Lab., 60, 253–264, https://doi.org/10.1016/S0169-7439(01)00200-3, 2002.
Paatero, P., Hopke, P. K., Begum, B. A., and Biswas, S. K.: A graphical diagnostic method for assessing the rotation in factor analytical models of atmospheric pollution, Atmos. Environ., 39, 193–201, https://doi.org/10.1016/j.atmosenv.2004.08.018, 2005.
Sapper, B.: bsapper77/ew_rhals: EW_RHALS_2 (v2.0.0), Zenodo [code], https://doi.org/10.5281/zenodo.7963474, 2023.
Sun, D. L. and Mazumder, R.: Non-negative matrix completion for bandwidth extension: A convex optimization approach, in: 2013 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 1–6, https://doi.org/10.1109/MLSP.2013.6661924, 2013.
Tan, W., Chang, S., Fong, L., Li, C., Wang, Z., and Cao, L.: Matrix Factorization on GPUs with Memory Optimization and Approximate Computing, Proceedings of the 47th International Conference on Parallel Processing, 26, 1–10, https://doi.org/10.1145/3225058.3225096, 2018.
Ulbrich, I. M., Canagaratna, M. R., Zhang, Q., Worsnop, D. R., and Jimenez, J. L.: Interpretation of organic components from Positive Matrix Factorization of aerosol mass spectrometric data, Atmos. Chem. Phys., 9, 2891–2918, https://doi.org/10.5194/acp-9-2891-2009, 2009.
Wei, S., Zheng, X., Chen, D., and Chen, C.: A hybrid approach for movie recommendation via tags and ratings, Electron. Commer. R. A., 18, 83–94, https://doi.org/10.1016/j.elerap.2016.01.003, 2016.
Xie, Y.-L., Hopke, P. K., and Paatero, P.: Positive matrix factorization applied to a curve resolution problem, J. Chemometr., 12, 357–364, https://doi.org/10.1002/(SICI)1099-128X(199811/12)12:6<357::AID-CEM523>3.0.CO;2-S, 1998.
Yahaya, F.: Compressive informed (semi-)non-negative matrix factorization methods for incomplete and large-scale data: with application to mobile crowd-sensing data, PhD thesis, Université du Littoral Côte d'Opale, https://theses.hal.science/tel-03616665 (last access: 31 July 2024), 2021.
Yahaya, F., Puigt, M., Delmaire, G., and Roussel, G.: How to Apply Random Projections to Nonnegative Matrix Factorization with Missing Entries?, in: 2019 27th European Signal Processing Conference (EUSIPCO), 1–5, https://doi.org/10.23919/EUSIPCO.2019.8903036, 2019.
Yahaya, F., Puigt, M., Delmaire, G., and Roussel, G.: Random Projection Streams for (Weighted) Nonnegative Matrix Factorization, in: ICASSP 2021 – 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3280–3284, https://doi.org/10.1109/ICASSP39728.2021.9413496, 2021.
Zhang, S., Wang, W., Ford, J., and Makedon, F.: Learning from Incomplete Ratings Using Non-negative Matrix Factorization, 549–553, https://doi.org/10.1137/1.9781611972764.58, 2006.
Zhang, Y., Fang, J., Meng, Q., Ge, X., Chebaicheb, H., Favez, O., and Petit, J.-E.: An Ensemble Machine Learning Approach for Predicting Sources of Organic Aerosols Measured by Aerosol Mass Spectrometry, ACS ES&T Air, 378–385, https://doi.org/10.1021/acsestair.4c00262, 2025.