Articles | Volume 17, issue 10
https://doi.org/10.5194/gmd-17-4383-2024
Development and technical paper
 | 
24 May 2024

Application of regional meteorology and air quality models based on the microprocessor without interlocked piped stages (MIPS) and LoongArch CPU platforms

Zehua Bai, Qizhong Wu, Kai Cao, Yiming Sun, and Huaqiong Cheng
Abstract

The microprocessor without interlocked piped stages (MIPS) and LoongArch are reduced instruction set computing (RISC) processor architectures, which have advantages in terms of energy consumption and efficiency. There are few studies on the application of MIPS and LoongArch central processing units (CPUs) in geoscientific numerical models. In this study, the Loongson 3A4000 CPU platform with the MIPS64 architecture and the Loongson 3A6000 CPU platform with the LoongArch architecture were used to establish the runtime environment for the air quality modelling system Weather Research and Forecasting–Comprehensive Air Quality Model with extensions (WRF-CAMx) in the Beijing–Tianjin–Hebei region. The results show that the relative errors for the major species (NO2, SO2, O3, CO, PNO3, and PSO4) between the MIPS and X86 benchmark platforms are within ±0.1 %. The maximum mean absolute error (MAE) of major species ranged up to 10⁻² ppbV or µg m⁻³, the maximum root mean square error (RMSE) ranged up to 10⁻¹ ppbV or µg m⁻³, and the mean absolute percentage error (MAPE) remained within 0.5 %. When simulating a 24 h case with four parallel processes using MPICH, CAMx takes about 195 min on the Loongson 3A4000 CPU, 71 min on the Loongson 3A6000 CPU, and 66 min on the Intel Xeon E5-2697 v4 CPU. As a result, the single-core computing capability of the Loongson 3A4000 CPU for the WRF-CAMx modelling system is about one-third of that of the Intel Xeon E5-2697 v4 CPU, and that of the Loongson 3A6000 CPU is slightly lower than that of the Intel Xeon E5-2697 v4 CPU. However, the thermal design power (TDP) of the Loongson 3A4000 is 40 W and that of the Loongson 3A6000 is 38 W, only about one-fourth of the 145 W TDP of the Intel Xeon E5-2697 v4. The results also verify the feasibility of cross-platform porting and the scientific usability of the ported model. 
This study provides a technical foundation for the porting and optimization of numerical models based on MIPS, LoongArch, or other RISC platforms.

1 Introduction

In recent years, with the increasing demand for high-performance computing resources and rapid development in the computer industry (especially supercomputers), the central processing unit (CPU) has undergone significant advancements in logical structure, operational efficiency, and functional capabilities, making it the core component of current computer technology development. There are two main types: one is the complex instruction set computer (CISC) CPU (George, 1990; Shi, 2008), which mainly uses the X86 architecture and whose representative vendors include Intel and AMD. It is widely used in high-performance computing platforms. The other is the reduced instruction set computer (RISC) CPU (Mallach, 1991; Liu et al., 2022), which mainly uses the ARM, microprocessor without interlocked piped stages (MIPS), RISC-V, and other architectures and whose representative vendors include Loongson. It is mainly used in high-performance computing platforms, which have high efficiency and excellent stability and scalability. The microprocessor without interlocked piped stages (MIPS) architecture is one of the significant representatives of the RISC architecture. MIPS was originally developed in the early 1980s by John Leroy Hennessy at Stanford University and his group (Hennessy et al., 1982). The simplicity of the MIPS instruction set contributes to its ability to process instructions quickly, thus achieving higher performance even in low-power conditions. In 1999, MIPS Technologies Inc. released the MIPS32 and MIPS64 architecture standards (MIPS Technology Inc., 2014). Compared to the CISC CPUs, RISC CPUs, which have gained popularity among chip manufacturers, demonstrate excellent performance and power efficiency.

The Loongson processor family developed by Loongson Technology is mainly designed using the MIPS architecture and the Linux operating system (Hu et al., 2011), which is rich in application tools in Linux open-source projects. The main reason that the development of CPUs that implement non-X86 instruction set architecture such as MIPS64 is currently restricted is the immature software ecosystem (Hu et al., 2016). Based on the strategy of open-source software, the Loongson platform has gained abundant software tools, making it possible to further develop scientific computing and numerical models.

Air quality model (AQM) systems use mathematical equations and algorithms to simulate and predict pollutant concentrations in the atmosphere. Current AQMs have become more complex, incorporating numerous factors such as emissions from industrial sources, vehicle traffic, and natural sources, as well as meteorological conditions, and they represent meteorology, emissions, chemical reactions, and removal processes (Zhang et al., 2012). Regional-scale AQMs have been widely used to predict air quality in cities, formulate emission reduction strategies, and evaluate the effectiveness of control policies (Wang et al., 2023); these include the Community Multiscale Air Quality (CMAQ) modelling system (Appel et al., 2017, 2021), the Comprehensive Air Quality Model with extensions (CAMx; RAMBOLL ENVIRON Inc., 2014a), and the Nested Air Quality Prediction Modelling System (Wang et al., 2006; Chen et al., 2015). Because AQMs require meteorological input, commonly used meteorological models such as WRF (Michalakes et al., 2001) are coupled offline with the regional AQMs to provide meteorological and chemical forecasts as a WRF-AQM modelling system, such as the WRF-CMAQ modelling system (Wu et al., 2014).

Both the meteorological and air quality numerical simulation rely heavily on high-performance computing systems. The WRF-AQM systems can run stably on high-performance computing platforms based on X86 or X86-compatible instruction set architecture (ISA) CPUs, which account for the highest percentage among the main processors of current high-performance computing platforms. There is relatively limited research on the application of WRF-AQM systems on the MIPS and LoongArch CPU platforms at present; this study focuses on the application of the WRF-CAMx model on the Loongson CPU platform based on the MIPS and LoongArch architectures. A simulation case covering the Beijing–Tianjin–Hebei region was set up to evaluate the differences and performance between the MIPS and X86 platforms. This study validated the stability of scientific computing on the MIPS and LoongArch CPU platforms, and it offered technical references and evaluation methods for the porting and application of numerical models on non-X86 platforms.

Section 2 provides the model descriptions of the Weather Research and Forecasting–Comprehensive Air Quality Model with extensions (WRF-CAMx) modelling system and the descriptions of the MIPS, LoongArch, and benchmark platforms. The configurations of the air quality numerical simulation system and simulation case are also presented in Sect. 2. Section 3 describes porting and optimization of the WRF-CAMx modelling system on the MIPS and LoongArch CPU platforms. Section 4 analyses the differences in model results between the MIPS CPU platform and the benchmark platform. Section 5 discusses MIPS and LoongArch CPUs' performance in scientific computing. The conclusions are presented in Sect. 6.

2 Model and porting platform description

The air quality modelling system was constructed using the WRF v4.0 model developed by the National Center for Atmospheric Research (NCAR) (Skamarock et al., 2019) and CAMx v6.10 developed by Ramboll Environ (RAMBOLL ENVIRON Inc., 2014a), as shown in Fig. 1. The Loongson 3A4000 CPU platform was chosen for the porting work in the study. This study introduces the porting of the WRF-CAMx modelling system to the MIPS and LoongArch CPU platforms.

https://gmd.copernicus.org/articles/17/4383/2024/gmd-17-4383-2024-f01

Figure 1. The framework of the WRF-CAMx modelling system. The core modules have been ported to the MIPS and LoongArch CPU platforms. The core modules are framed by the dotted red line in the figure.


In Xi'an, China, and Milan, Italy, the WRF-CAMx modelling system was applied, enabling a high-resolution hourly model output of pollutant concentration within specific local urban areas (Pepe et al., 2016; Yang et al., 2020). The modelling system is widely used to study the spatiotemporal variation in pollutant concentration and source apportionment; analyse the contribution of regional transport to pollution; and investigate the impact of initial conditions and emissions on pollution simulation in key regions such as the North China Plain, Sichuan Basin, and Fenwei Plain (Bai et al., 2021; Zhen et al., 2023; Zhang et al., 2022; Xiao et al., 2021).

2.1 Description of the WRF-CAMx modelling system

WRF and CAMx serve as the core components of the modelling system. WRF is a mesoscale numerical weather prediction system designed for atmospheric research and operational forecasting applications. Distinguished by its high temporal and spatial resolutions, WRF is suitable for multi-scale simulations of short-term weather forecasts, atmospheric processes, and long-term climate, making it an essential tool in the meteorological and atmospheric research communities (Powers et al., 2017). In the modelling system, WRF provided gridded meteorological field data for the air quality model CAMx. The relative humidity, a meteorological variable used in result validation, is calculated using the wrf-python package (the official website is https://wrf-python.readthedocs.io, last access: October 2023). CAMx is an atmospheric pollutant calculation model, which can be utilized for simulating and predicting the concentrations of various air pollutants. The WRF and CAMx models are distinguished by modularity and parallelism, using MPI in parallel computing, making them efficient (Skamarock et al., 2019; RAMBOLL ENVIRON Inc., 2014a).

In the modelling system, the Sparse Matrix Operator Kernel Emissions (SMOKE) model and the CMAQ2CAMx program are used to process emission data and provide model-ready gridded emission data for the CAMx model. The WRFCAMx program converts the WRF results into meteorological input files that are compatible with CAMx. The TUV (Tropospheric Ultraviolet and Visible radiation) model is a radiation transfer model capable of producing clear-sky photolysis rate input files for the chemical mechanisms in CAMx, and the O3MAP program prepares ozone column input files for TUV and CAMx. The icbcprep program prepares initial and boundary condition files for CAMx with the profile, and the effects of initial conditions have been studied by Xiao et al. (2021). The CAMx2IOAPI program converts the CAMx output files to the netCDF format, following the Models-3 I/O API convention; NCL or other software is then used to analyse the model results.

2.1.1 Model domain setup

The model domain focusing on the Beijing–Tianjin–Hebei region has been set up in this study. The WRF model has three nested domains with horizontal resolutions of 27 km (D1), 9 km (D2), and 3 km (D3), as shown in Fig. 2. The outer domain (D1) covers most parts of China, and the inner domain (D3) covers Beijing, Tianjin, and Hebei Province. The model domain is centred at (35° N, 110° E), with two true latitudes located at 20° N and 50° N. The vertical resolution of WRF is 34 vertical layers. The CAMx model has only one model domain, which is the innermost grid with a resolution of 3 km (D3), mainly covering the Beijing–Tianjin–Hebei region. The vertical resolution of CAMx is 14 vertical layers, which are extracted from the WRF output files using the WRFCAMx module, and the lower 7 layers of CAMx are the same as those in the WRF model.

https://gmd.copernicus.org/articles/17/4383/2024/gmd-17-4383-2024-f02

Figure 2. The domains of three-level nested grids in the WRF-CAMx modelling system. The respective horizontal resolutions are 27 km × 27 km (D1), 9 km × 9 km (D2), and 3 km × 3 km (D3).

2.1.2 Model configuration

From 00:00 on 3 November 2020 until 24:00 on 5 November 2020, the modelling system simulated meteorology and air quality for a period of 72 h. In the research of Wang et al. (2019), a 72 h test case was set for the scientific validation and performance evaluation of the chemistry transport models. A 72 h case represents a moderate-sized real scientific workload, which is short enough to simulate quickly yet sufficient to validate the results and assess computational efficiency on the MIPS and LoongArch platforms. For the meteorological model, the global meteorological initial and boundary fields for the WRF model are derived from the NCEP Final (FNL) global reanalysis data, with a spatial resolution of 0.5° × 0.5° and a temporal resolution of 6 h. The parameterization schemes of the WRF model used in the simulation case are shown in Table 1.

For the air quality model, the meteorological files provided by the WRF model are used for the chemical transport module in CAMx. The emission inventory used in the simulation case was obtained from Sun (2022). It contains basic emissions from Sun et al. (2022) and fugitive dust emissions from bare ground surfaces. The SMOKE model (v2.4) is used to process the emission inventory and provide gridded emissions for CAMx. The parameterization schemes of the CAMx model used in the simulation case are shown in Table 2.

Table 1. Parameterization schemes of WRF in the research case.


Table 2. Parameterization schemes of CAMx in the research case.


2.1.3 Statistical indicators for model results

To quantify the differences in the model results between the MIPS and benchmark platforms, three statistical indicators are used to analyse the differences in concentration time series: mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The MAPE quantifies the deviation between computational differences and simulated values. The smaller these indicators, the better the accuracy and stability of the scientific computing of the modelling system on the MIPS platform. The calculation formulas for these statistical indicators are provided in Eqs. (1) to (3).

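$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\mathrm{MIPS}(i)-\mathrm{Base}(i)\right| \tag{1}$$

$$\mathrm{RMSE} = \left[\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{MIPS}(i)-\mathrm{Base}(i)\right)^{2}\right]^{\frac{1}{2}} \tag{2}$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\left|\mathrm{MIPS}(i)-\mathrm{Base}(i)\right|}{\mathrm{MIPS}(i)}\times 100\,\% \tag{3}$$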

In the equations, n represents the number of grids in the domain. MIPS(i) represents the simulated value of a certain grid on the MIPS platform, and Base(i) represents the baseline value of a certain grid on the benchmark platform.
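As an illustration of Eqs. (1) to (3), the following minimal Python sketch computes the three indicators for two equally sized lists of grid values. The function name `platform_difference_stats` and the sample concentrations are hypothetical and are not taken from the model output; the sketch also assumes that no MIPS value is zero, since Eq. (3) divides by MIPS(i).

```python
import math


def platform_difference_stats(mips, base):
    """Return (MAE, RMSE, MAPE in %) following Eqs. (1)-(3)."""
    if len(mips) != len(base) or not mips:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(mips)
    diffs = [m - b for m, b in zip(mips, base)]
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Eq. (3) normalizes by the MIPS value; zero values would need masking.
    mape = sum(abs(d) / m for d, m in zip(diffs, mips)) / n * 100.0
    return mae, rmse, mape


# Made-up concentrations for four grid cells (illustration only).
mips_values = [10.0, 20.0, 40.0, 50.0]
base_values = [10.1, 19.9, 40.2, 49.8]
mae, rmse, mape = platform_difference_stats(mips_values, base_values)
print(f"MAE={mae:.4f}  RMSE={rmse:.4f}  MAPE={mape:.4f} %")
```

Applied to one output time step, `mips` and `base` would hold the concentrations of one species over all grid cells of the domain on the two platforms.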

Table 3. The main parameters of the Loongson 3A4000 CPU and the Loongson 3A6000 CPU.

 Source: https://www.loongson.cn (last access: January 2024).


2.2 MIPS and LoongArch CPU platform descriptions

The Loongson CPU platform was chosen for the porting work in the study. Currently, the Loongson processor family has three generations of CPU products, evolving from single-core to multi-core architectures and from experimental prototypes to mass-produced industrial products (Hu et al., 2011). The Loongson 2 processor is a 64-bit general-purpose RISC processor series, which is compatible with the MIPS instruction set. It can be used in personal computers, mobile terminals, and various embedded applications, running many operating systems such as Linux and Android smoothly (Zhi an Xu, 2012). Wu and Cheng (2019) report the application of the mesoscale model on the Loongson 2F CPU platform. The Loongson 3 processor features a scalable multi-core architecture, targeting high-throughput data centres, high-performance scientific computing, and other applications, with the significant advantage of achieving a high peak-performance-to-power ratio and striking a well-balanced trade-off between performance and power consumption (Hu et al., 2009).

The Loongson 3A series comprises multi-core processors designed for high-performance computers, featuring high bandwidth and low power consumption. The efficient design solution and the advantage of a high energy efficiency ratio make servers based on Loongson CPUs highly competitive in performance, power consumption, and cost-effectiveness (Li et al., 2014; Wang et al., 2014). In this study, the Loongson 3A4000 platform uses a Debian Linux operating system commercially known as Tongxin UOS (https://www.uniontech.com, last access: January 2024) and the Loongson 3A4000 processor, which is the first quad-core processor based on the GS464V 64-bit microarchitecture in the Loongson 3 processor family. The main technical parameters of the Loongson 3A4000 CPU are shown in Table 3. Compared to previously released CPUs, the processor improves frequency and performance by optimizing the on-chip interconnect and memory access path, integrating a 64-bit DDR4 memory controller and an on-chip security mechanism. The Loongson 3A6000 CPU platform uses Loongnix, an open-source community edition operating system released by Loongson (https://www.loongson.cn/system/loongnix, last access: January 2024), and the most recently released Loongson 3A6000 processor, which is a quad-core processor based on the LA664 microarchitecture. The main technical parameters of the Loongson 3A6000 CPU are shown in Table 3. The processor supports the LoongArch™ instruction set and hyper-threading, and its performance has significantly improved compared to the previously released processors (Hu et al., 2022).

2.3 Benchmark platform description

This study uses an X86 CPU platform as the benchmark platform compared to the MIPS and LoongArch CPU platforms. The benchmark platform is powered by an Intel Xeon E5-2697 v4 CPU, with strong floating-point performance and many technical features such as Intel Turbo Boost Technology (Intel Inc., 2023). The Intel Xeon E5-2697 v4 CPU has 18 cores, with a 2.3 GHz base frequency and a 3.6 GHz maximum Turbo Boost frequency, 45 MB of Intel Smart Cache, and 145 W design power consumption. The operating system is CentOS Linux 7.4.1708. The main information for all platforms is shown in Table 4.

Table 4. The comparison of the main configuration between the MIPS, LoongArch, and X86 platforms.


2.4 The difference between the MIPS, LoongArch, and X86 platforms

In this study, the numerical model's source code is written in Fortran, and commonly used compilers for the X86 architecture include Intel Compiler, PGI, and GNU Compiler. The compiler for the MIPS platform is built using the GCC 8.3 MIPS GNU/Linux cross toolchain based on the open-source GNU Project, called MIPS GNU, and the latest version is 8.3. The compiler for the LoongArch platform is built using GCC 8.3 LoongArch GNU/Linux cross toolchain based on an open-source GNU project called LoongArch GNU, and the latest version is 8.3. The compiler for the benchmark platform is set to X86 GNU, and the version is also 8.3. Table 5 shows the differences between all platforms' GNU compilers in terms of applicable platforms. Compared to X86 GNU, the default compilation options of the MIPS GNU compiler not only specify the platform architecture but also include additional instruction sets, such as the atomic operation instruction set LLSC and the shared library instruction set PLT, which can optimize target programs compiled by GNU for the MIPS architecture and improve computational efficiency. The default compilation options of the LoongArch GNU compiler not only specify the platform architecture but also include the target microarchitecture tuning option, which can also optimize target programs compiled by GNU for the LoongArch architecture.

Table 5. Comparison of the GNU compiler between the MIPS, LoongArch, and X86 CPU platforms.


The WRF-CAMx modelling system depends on several scientific computing libraries. Firstly, the general data formats netCDF and HDF5 are required to store the large-scale gridded data of the modelling system. NetCDF is a self-describing data format developed by UCAR Unidata, primarily used for storing multidimensional array data in fields like meteorology and earth sciences (UCAR/Unidata, 2021). HDF5 is a data format developed by the HDF Group that supports complex data structures with multiple data types and multidimensional datasets (The HDF Group, 2019). In this study, netCDF-C (v4.8.1), netCDF-Fortran (v4.5.3), HDF5 (v1.12.1), and IOAPI (v3.1) were successfully installed on the MIPS and LoongArch platforms by building from their sources, which were obtained from the official websites.

The MPICH library is required to support parallel computing in the modelling system. In order to fully utilize computing resources, MPI message communication is used in the WRF and CAMx models (Wu et al., 2012). MPICH is an open-source, portable parallel computing library that implements the MPI standard (Amer et al., 2021). It supports inter-process communication and data exchange in a parallel computing environment. Similarly, this study successfully installed MPICH (v3.4) on the MIPS and LoongArch platforms by building from its source. During the compilation and installation of the abovementioned libraries, the configure tool was used to check the basic information of the platform's CPU and compiler and to prepare for compatibility with the platform before compilation; the GNU compiler was used to compile the source code of the libraries, and the CMake tool was used to install the libraries. Additionally, the same runtime environment as that of the MIPS platform was also built on the benchmark platform.

3 Porting the WRF-CAMx modelling system on the MIPS and LoongArch CPU platforms

The simulation result is influenced by several factors, including the processor architecture, operating system, compiler, parallel environment, and scientific computing libraries. In order to ensure the stability and accuracy of numerical simulation, the models should be adapted to the new runtime environment when porting across platforms. Additionally, various operating systems have different tools, software, and libraries, which may impact the results of numerical simulations.

In this study, the runtime environment for the WRF-CAMx modelling system was built on the MIPS and LoongArch platforms, including parallel computing libraries such as MPICH3 (v3.4) and data format libraries such as HDF5 (v1.12.1) and NETCDF (C-v4.8.1, Fortran-v4.5.3). These libraries do not support the architecture (mips64el and LoongArch) and GNU compiler of the Loongson platform out of the box, so relevant information needs to be added to the free software config.guess and config.sub provided by GNU. Part of the information is shown in panel (a) in Fig. 3; it helps identify the platform architecture and system during the compilation and installation of libraries using the configure and make tools. The configuration files for making the models were modified to fit the compilers of the Linux system on the MIPS and LoongArch platforms. In order to verify the stability of scientific computing on the MIPS and LoongArch platforms, a control experiment was set up on the benchmark platform, minimizing the impact of other factors on the simulation results of both platforms.

The WRF model v4.0 and CAMx v6.10 were successfully deployed on the MIPS and LoongArch platforms through source code compilation and installation. In the WRF model, default GNU compiler options suitable for MIPS and LoongArch CPUs are not provided in the configure file of the source code package, and it is necessary to incorporate architecture-specific settings for the model. For example, the architecture presets are stored in the configure.defaults file, but settings for the Loongson platform are not included. Specific architecture details, including the CPU architecture, GNU compiler, and compilation flags, need to be added to ensure the correct display of the configuration during the building of the WRF model; part of this information is shown in panel (b) in Fig. 3. Table 5 provides the detailed information added to the configure file, mainly about MIPS and LoongArch GNU Fortran. When compiling Fortran programs on the MIPS and LoongArch platforms, the MIPS and LoongArch GNU Fortran compilers and the necessary compilation flags must be specified. These flags include common Fortran file format flags such as -fconvert=big-endian and -frecord-marker=4 and optimization flags such as -O2 -ftree-vectorize -funroll-loops. Once the appropriate compiler and flags for the MIPS and LoongArch architectures are specified, the configure tool provides the settings necessary to compile WRF. Correspondingly, when compiling the WRF model on the benchmark platform, the compilation flags are strictly consistent with those of the MIPS and LoongArch CPU platforms, which ensures that differences in the simulation results of the platforms are primarily attributed to the underlying hardware architecture rather than changes in compilation settings.
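The two file format flags mentioned above govern how GNU Fortran lays out unformatted sequential files: -fconvert=big-endian selects the byte order, and -frecord-marker=4 selects 4-byte record length markers placed before and after each record. The following Python sketch mimics that layout for a record of 32-bit reals, as an illustration of why the same flags are needed on every platform that exchanges such files. The helper names `pack_record` and `unpack_record` are hypothetical and are not part of WRF or CAMx.

```python
import struct


def pack_record(values):
    """Pack 32-bit reals as one unformatted sequential record:
    big-endian 4-byte length marker, payload, repeated length marker."""
    payload = struct.pack(f">{len(values)}f", *values)
    marker = struct.pack(">i", len(payload))
    return marker + payload + marker


def unpack_record(blob):
    """Inverse of pack_record; checks that both markers agree."""
    (head,) = struct.unpack(">i", blob[:4])
    (tail,) = struct.unpack(">i", blob[4 + head:8 + head])
    if head != tail:
        raise ValueError("record markers disagree: wrong byte order or marker size?")
    return list(struct.unpack(f">{head // 4}f", blob[4:4 + head]))


record = pack_record([1.0, 2.5, -3.0])
print(record.hex())
print(unpack_record(record))

# Read with the wrong byte order, the leading marker no longer gives the
# payload length, which is how mismatched flags typically surface.
print(struct.unpack("<i", record[:4])[0])
```

A reader that assumes little-endian markers would interpret the 12-byte payload length as a very large number, so a file written with one convention cannot be read reliably with the other.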

In the CAMx model, the makefile provides information about parallelism and compilers. Similarly, information about the CPU architecture, GNU compiler, and compilation flags on the MIPS and LoongArch platforms also needs to be added to the makefile. For the detailed information added to the makefile, please refer to Table 5. Additionally, the code of CAMx was modified to make it run smoothly on the MIPS and LoongArch platforms. For example, the model frequently uses the write function for formatted output. The format specifiers in its parameters consist of data types (I, F, E, A, X, etc.) followed by a character width. In the CAMx model, most format specifiers in the write function omit the character width and rely on the default, but this causes a compilation issue with MIPS GNU, which requires explicit character width descriptors. It is also essential to ensure that the chosen widths are consistent with the default precision. A specific example is shown in panel (c) in Fig. 3. With the modifications of the configuration files and source code described above, the WRF-CAMx model was successfully compiled and installed on the MIPS and LoongArch platforms.
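To illustrate the kind of check involved, the sketch below flags edit descriptors in a Fortran format string that carry no explicit width. It is a hypothetical helper, not the procedure used in this study: the function name `widthless_descriptors`, the set of descriptors checked (I, F, E, D, G), and the sample format string are all assumptions, and the simple comma split does not handle nested groups or quoted strings.

```python
import re

# An optional repeat count followed by a bare numeric edit descriptor.
# The descriptor set is an assumption made for this illustration.
WIDTHLESS = re.compile(r"^\d*[IFEDG]$", re.IGNORECASE)


def widthless_descriptors(format_spec):
    """Return the items of a Fortran format string that lack a width."""
    inner = format_spec.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    items = [item.strip() for item in inner.split(",")]
    return [item for item in items if WIDTHLESS.match(item)]


# Hypothetical format string: "I" and "F" have no width, the rest do.
print(widthless_descriptors("(A, I, F, E12.4, I5, 2X)"))
print(widthless_descriptors("(I5, F8.3)"))
```

Each flagged item would then be given an explicit width (and, for real descriptors, a number of decimals) chosen to match the default precision.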

https://gmd.copernicus.org/articles/17/4383/2024/gmd-17-4383-2024-f03

Figure 3. Sample codes containing the configuration index and architecture-specific settings and functions in the WRF-CAMx model. Panel (a) provides architecture information for configuration. Panel (b) shows architecture-specific settings for WRF. Panel (c) illustrates the sample code of functions in CAMx before and after modification.


https://gmd.copernicus.org/articles/17/4383/2024/gmd-17-4383-2024-f04

Figure 4. Spatial distribution of 2 m temperature, surface pressure, and relative humidity from WRF. (a, d, g) The MIPS platform. (b, e, h) The X86 platform. (c, f, i) The differences between the MIPS and benchmark (X86) platforms.

4 The differences in model results on the two platforms

4.1 Validation of the spatial distribution

A 72 h simulation case was designed to test the stability and availability of the WRF-CAMx modelling system on the MIPS CPU platform in Beijing. By analysing the differences in simulation results and computing time, the accuracy and performance of the modelling system on the MIPS platform were evaluated, which further verifies the feasibility and stability of the modelling system after porting to the MIPS platform.

Common meteorological variables, including 2 m temperature, land surface pressure, and relative humidity, were selected to verify the WRF model results. Figure 4 shows the spatial distribution of the three meteorological variables after a 72 h simulation on the different platforms, together with the absolute errors (AEs). As shown in Fig. 4, the meteorological variables from the modelling system on the different platforms exhibit a generally consistent spatial distribution in the Beijing–Tianjin–Hebei region.

Similarly, NO2, SO2, O3, CO, PNO3, and PSO4 were selected to verify the CAMx model results on the MIPS platform. Figure 5 shows the spatial distribution of the six species, as well as the absolute errors (AEs) between the two platforms after a 72 h simulation. When simulating the 72 h case with four parallel processes using MPICH, CAMx takes about 9 h on the Loongson 3A4000 CPU and 2.6 h on the Intel Xeon E5-2697 v4 CPU. As shown in Fig. 5, the spatial distributions of air pollutant concentrations from the different platforms are essentially consistent and visually very similar.
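The arithmetic behind the runtime comparison can be sketched in a few lines of Python. The wall-clock times are the approximate values reported in this paper (the 72 h case above and the 24 h case quoted in the abstract); the function name `relative_speed` is hypothetical, and the ratio is only a rough throughput indicator for four MPI processes.

```python
# CAMx wall-clock times reported in the text (four MPI processes).
RUNTIME_72H_HOURS = {"Loongson 3A4000": 9.0, "Intel Xeon E5-2697 v4": 2.6}
RUNTIME_24H_MINUTES = {
    "Loongson 3A4000": 195.0,
    "Loongson 3A6000": 71.0,
    "Intel Xeon E5-2697 v4": 66.0,
}


def relative_speed(runtimes, reference="Intel Xeon E5-2697 v4"):
    """Throughput of each platform relative to the reference platform."""
    ref_time = runtimes[reference]
    return {name: ref_time / elapsed for name, elapsed in runtimes.items()}


for label, runtimes in (("72 h case", RUNTIME_72H_HOURS),
                        ("24 h case", RUNTIME_24H_MINUTES)):
    for name, ratio in relative_speed(runtimes).items():
        print(f"{label}: {name} runs at {ratio:.2f}x the X86 speed")
```

Both cases place the Loongson 3A4000 at roughly one-third of the X86 speed, in line with the single-core comparison stated in the abstract.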

https://gmd.copernicus.org/articles/17/4383/2024/gmd-17-4383-2024-f05

Figure 5. Spatial distribution of NO2, SO2, O3, CO, PNO3, and PSO4 from CAMx on the MIPS and benchmark platforms. The left column shows the MIPS platform. The middle column shows the X86 platform. The right column shows the differences between the MIPS and the benchmark (X86) platforms.

Figure 6 contains scatter plots comparing the two platforms. For a total of 22 765 grids within the 145×157 simulation domain, the root mean square errors (RMSEs) of the six species between the MIPS platform and the benchmark platform are close to 0.001, which is essentially 0. A linear regression model was used to fit the scatters; the regression slopes for each species are nearly 1, with intercepts close to 0, and the R² values used for the goodness of fit are nearly 1. The fitted lines closely coincide with the y=x line, indicating that the differences between the MIPS and X86 platforms for each species are minimal to negligible.
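The regression check described above can be sketched as an ordinary least-squares fit. The Python example below is a minimal illustration that assumes the benchmark values are placed on the x axis and the MIPS values on the y axis; the function name `fit_line` and the sample values are hypothetical and are not model output.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: 1 - residual / total sum of squares.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Made-up benchmark values and slightly perturbed "MIPS" values.
base = [5.0, 12.0, 20.0, 33.0, 47.0]
mips = [b + d for b, d in zip(base, [0.001, -0.001, 0.0005, -0.0005, 0.001])]
slope, intercept, r2 = fit_line(base, mips)
print(f"slope={slope:.6f} intercept={intercept:.6f} R2={r2:.9f}")
```

When the platform differences are several orders of magnitude smaller than the concentrations themselves, the slope and R² approach 1 and the intercept approaches 0, which is the behaviour reported for all six species.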

https://gmd.copernicus.org/articles/17/4383/2024/gmd-17-4383-2024-f06

Figure 6. Scatter plots of the concentrations of NO2, SO2, O3, CO, PNO3, and PSO4 from CAMx on the MIPS and benchmark platforms. The density of the scatters is represented by the colour gradient.


Figure 7 contains the boxplots which show the absolute errors (AEs) and relative errors (REs) of the six species between the MIPS and benchmark platforms. According to Fig. 7, the absolute errors in the six species are generally in the range of ±10⁻³ ppbv (parts per billion by volume – the unit of NO2, SO2, O3, and CO concentration) or µg m⁻³ (the unit of particle composition of PNO3 and PSO4), and the relative errors are generally in the range of ±0.01 %. CO in particular exhibits more pronounced AEs than the other species. In some grid boxes, the AEs between the MIPS and benchmark platforms exceed the range of ±10⁻³ ppbv, but they remain in the range of ±10⁻² ppbv. In summary, there are some errors between the results of the modelling system on the MIPS and benchmark platforms after porting. However, these errors are minor relative to the simulated values. They are attributed primarily to differences in the CPU architecture and compiler characteristics between the two platforms, such as data operations and precision on different CPUs.

https://gmd.copernicus.org/articles/17/4383/2024/gmd-17-4383-2024-f07

Figure 7. The absolute errors and the relative errors for NO2, SO2, O3, CO, PNO3, and PSO4 concentrations in all grids between the MIPS and benchmark platforms.


Additionally, random grids in the domain were selected to assess the precision of simulation results in localized regions. The positions of these grids were determined based on 32 observation stations in Beijing, and the nearest grid was determined using the Euclidean shortest distance in the domain. The station map is presented in Fig. S1 in the Supplement. A Taylor diagram is used to assess the precision of concentrations for six species near the observation stations, and the scatters representing the six species at 32 stations overlap significantly. Statistical parameters used in the Taylor diagram, such as the correlation coefficient (R) approaching 1 and the normalized standard deviation (NSD) and the normalized root mean square error (NRMSE) approaching 0, indicate high precision of the simulation results at specific stations on the MIPS platform.
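The nearest-grid lookup described above can be sketched as follows. The Python example assumes that grid cell centres and the station position are expressed in the same projected coordinates (km); the function name `nearest_grid_index`, the small 4 × 3 grid, and the station position are hypothetical.

```python
import math


def nearest_grid_index(grid_points, station):
    """Index of the grid point with the shortest Euclidean distance."""
    return min(range(len(grid_points)),
               key=lambda k: math.dist(grid_points[k], station))


# Cell centres of a made-up 4 x 3 grid with 3 km spacing (row-major order).
grid_points = [(1.5 + 3.0 * i, 1.5 + 3.0 * j) for j in range(3) for i in range(4)]
station = (4.0, 2.0)  # hypothetical station position in km

k = nearest_grid_index(grid_points, station)
print(k, grid_points[k])
```

Repeating the lookup for each of the 32 stations gives the set of grid cells at which the concentrations from the two platforms are compared.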

4.2 Validation of the temporal distribution on the two platforms

The time series of the computational differences were also evaluated in this study. A grid cell in the domain was selected to examine the hourly concentrations of the six species. Taking the Beijing Olympic Center station (116.40° E, 39.99° N), one of the National Standard Air Quality (NSAQ) stations, as an example, Fig. 8 shows the time series of hourly concentrations in the grid cell of this station and the relative errors between the MIPS and benchmark platforms over the 72 h period. The time series of the air pollutant concentrations are highly consistent between the two platforms. Over the 72 h period, the relative errors for NO2, SO2, CO, and PSO4 remain within ±0.025 %. For PNO3, the relative errors remain within ±0.05 %, and for O3, they remain within ±0.1 %. This indicates that the errors caused by the different architectures are within a reasonable range.

Figure 8. Time series of NO2, SO2, O3, CO, PNO3, and PSO4 concentrations and their relative errors (REs) at the Beijing Olympic Sports Center site between the MIPS and X86 platforms. The solid red line and the dashed blue line represent the CAMx model results on the MIPS platform and the X86 platform, respectively. The solid black line shows the REs between the MIPS and X86 platforms.

Figure 9 shows the time series of the statistical indicators of the concentrations (MAE, RMSE, and MAPE) during the 72 h simulation. For NO2, SO2, O3, and PSO4, the MAEs and the RMSEs are all below 10⁻³ ppbv (µg m⁻³). The MAEs for CO and PNO3 are below 10⁻² ppbv (µg m⁻³); the RMSEs are below 10⁻² for PNO3 and below 10⁻¹ for CO. This is because PNO3 and CO have relatively higher background concentrations than the other species. The MAPE of the PNO3 concentration is mainly in the range of 0 %–0.5 %, whereas the MAPE of the CO concentration is the lowest, below 0.001 %; the MAPEs of the other species are in the range of 0 %–0.01 %. Overall, the time series analysis verifies the accuracy and stability of the modelling system on the MIPS platform.

Figure 9. Time series of MAEs, RMSEs, and MAPEs for NO2, SO2, O3, CO, PNO3, and PSO4 concentrations in the 72 h simulation. The yellow bars represent the MAE, the red lines the RMSE, and the blue lines the MAPE.
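The three indicators plotted in Fig. 9 can be computed per output hour as in the sketch below. It assumes the standard definitions of MAE, RMSE, and MAPE taken over all grid cells at each time step, with the benchmark platform as reference. The array layout (time, y, x), the function name, and the synthetic fields are illustrative assumptions.

```python
import numpy as np

def hourly_metrics(test, ref):
    """Return MAE, RMSE, and MAPE (%) for each time step.

    test, ref: arrays shaped (time, y, x). The MAPE assumes that the
    reference concentrations are non-zero everywhere.
    """
    test = np.asarray(test, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    nt = ref.shape[0]
    diff = (test - ref).reshape(nt, -1)
    ref2d = ref.reshape(nt, -1)
    mae = np.mean(np.abs(diff), axis=1)
    rmse = np.sqrt(np.mean(diff ** 2, axis=1))
    mape = 100.0 * np.mean(np.abs(diff) / np.abs(ref2d), axis=1)
    return mae, rmse, mape

# Synthetic two-hour example on a 2 x 2 grid (NOT data from the study).
ref = np.full((2, 2, 2), 10.0)
test = ref.copy()
test[0] += 0.01        # uniform offset in hour 0
test[1, 0, 0] += 0.04  # single-cell offset in hour 1
mae, rmse, mape = hourly_metrics(test, ref)
print(mae, rmse, mape)
```

In the synthetic example both hours have the same MAE, but the hour with the single-cell offset has the larger RMSE, which shows how the RMSE weights localized differences more heavily.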

Table 6. RMSE, SD, and RMSE/SD for NO2, SO2, O3, CO, PNO3, and PSO4.

In this study, the evaluation method proposed by Wang et al. (2021) was also used to assess the scientific applicability of the model results on the MIPS platform. The root mean square errors (RMSEs) of the NO2, SO2, O3, CO, PNO3, and PSO4 concentrations between the MIPS and benchmark platforms were computed, along with the standard deviations (SDs) describing the spatial variation in each species and the ratio of RMSE to SD, as shown in Table 6. The differences in the six species between the two platforms are negligible compared to their own spatial variations. Therefore, the results of the MIPS platform meet the accuracy requirements for research purposes.
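The RMSE-to-SD comparison can be sketched as follows. This is a minimal illustration, assuming that the SD is the population standard deviation of the benchmark field over all grid cells; the exact formulation in Wang et al. (2021) may differ. The function name and the random synthetic fields are assumptions and do not reproduce Table 6.

```python
import numpy as np

def rmse_to_sd(test, ref):
    """Return (RMSE, SD, RMSE/SD) for one species.

    RMSE measures the platform difference; SD measures the spatial
    variability of the reference field (assumed definition).
    """
    test = np.asarray(test, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    rmse = np.sqrt(np.mean((test - ref) ** 2))
    sd = np.std(ref)
    return rmse, sd, rmse / sd

# Synthetic field with a spatial SD near 10 and a platform
# difference near 1e-3 (NOT values from the study).
rng = np.random.default_rng(0)
ref = 30.0 + 10.0 * rng.standard_normal((50, 50))
test = ref + 1e-3 * rng.standard_normal((50, 50))
rmse, sd, ratio = rmse_to_sd(test, ref)
print(rmse, sd, ratio)
```

A ratio far below 1 means the platform-induced difference is small compared with the natural spatial variation of the field, which is the criterion applied in the text.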

In fact, the differences in model results cannot be completely eliminated, primarily because of the different CPU architectures and compilers. In practical applications, the differences in model results between platforms can be considered negligible compared with the errors arising from the inherent uncertainties in the modelling system and the input data. The comprehensive analysis demonstrates that the results of the WRF-CAMx modelling system on the MIPS CPU platform are reasonable.

5 The evaluation of computational performance

Scientific computing involves a large number of floating-point operations, and floating-point computational capability is a crucial indicator of CPU performance. In this study, the simulation case was used for computing tests on the MIPS, LoongArch, and benchmark platforms. These tests assessed the single-core performance of each CPU with the non-parallel model and the parallel performance of each platform with the parallel model using multiple processes. The elapsed time of the CAMx model for the 24 h simulation case is shown in Fig. 10. Under single-core conditions, the computing capability of the MIPS platform for CAMx is approximately one-third of that of the X86 benchmark platform, and that of the LoongArch platform is slightly lower than that of the X86 benchmark platform.

It is worth noting that the simulation time of the CAMx model is approximately the same when running two processes in parallel as when running non-parallel. This is because the MPI implementation in CAMx follows a primary–secondary parallel processing approach, in which one process is allocated to input/output and message communication during the runtime (Cao et al., 2023). This process does not perform any simulation in the model. Therefore, the time required with two parallel processes is comparable to that of the non-parallel run, and in some cases, it might even be slightly longer because of the overhead of MPI communication. Compared to the non-parallel run, the speedup with four parallel processes is approximately 2.8 using MPICH3 and about 2.9 using OpenMP on both the MIPS and LoongArch platforms. For the X86 benchmark platform, running four processes in parallel using MPICH3 gives a speedup of approximately 2.7.
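The effect of the primary–secondary design on the speedup can be made concrete with a short sketch. It assumes that speedup is defined as the non-parallel elapsed time divided by the parallel elapsed time and that, with n MPI processes, only n − 1 of them compute. The function names and the elapsed times are hypothetical; the times were chosen only so that the four-process speedup matches the reported value of about 2.8.

```python
def speedup(t_serial, t_parallel):
    """Speedup relative to the non-parallel run."""
    return t_serial / t_parallel

def worker_efficiency(t_serial, t_parallel, n_procs):
    """Speedup per computing process when one MPI process is reserved
    for input/output and communication (primary-secondary layout)."""
    workers = n_procs - 1
    if workers < 1:
        raise ValueError("at least two processes are needed in this layout")
    return speedup(t_serial, t_parallel) / workers

# Hypothetical elapsed times in minutes (NOT measurements from the study).
t_serial = 280.0
t_four = 100.0

s4 = speedup(t_serial, t_four)
e4 = worker_efficiency(t_serial, t_four, n_procs=4)
print(f"speedup = {s4:.2f}, efficiency per computing process = {e4:.2f}")
```

Under this assumption a four-process MPI run has three computing processes, so a speedup of about 2.8 is close to the upper bound of 3, and a two-process run has a single computing process, which is consistent with its elapsed time matching the non-parallel run.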

Additionally, the performance of the MIPS platform decreases significantly when the number of parallel processes exceeds four, because the modelling system is computing-intensive. The Loongson 3A4000 CPU has four cores, and when the number of MPI processes matches the number of CPU cores, the CPU utilization can approach 100 %. Further increasing the number of processes causes the processes to compete for CPU resources, resulting in additional overhead and reduced computational efficiency. On the LoongArch platform, the performance decreases only slightly when the number of parallel processes exceeds four. The Loongson 3A6000 CPU has four physical cores and eight logical cores, and when the number of MPI processes matches the number of physical cores, the computational load is evenly distributed across the cores. Because the Loongson 3A6000 supports hyper-threading, further increasing the number of processes makes the CPU schedule logical cores to carry the additional load, and this thread scheduling results in additional overhead and reduced computational efficiency. This explains why the elapsed time is slightly higher when CAMx runs with five parallel processes than with four, as shown in Sect. S2 in the Supplement.

Figure 10. Elapsed time of the CAMx model running the simulation case with MPICH and OpenMP for 24 h on the MIPS, LoongArch, and benchmark platforms.

In recent years, Loongson CPUs have been continuously upgraded, and their performance has improved significantly over that of previous generations. Wu and Cheng (2019) simulated a nested domain covering Beijing for 48 h using the PSU/NCAR mesoscale model (MM5) on the Loongson 3A quad-core CPU platform. The results showed that the computational capacity of the Loongson 3A platform for MM5 was approximately 1/12 of that of the Intel Core 2 Quad Q8400 quad-core CPU, which was released in the same year. Luo et al. (2011) compared the Loongson 3A and the Intel i5 by running the NPB benchmark on each platform, and the results showed that the performance of the 3A was nearly 1/10 of that of the i5. The rapid development of Loongson CPUs has provided a strong hardware foundation for the application of numerical simulation and scientific computing on the MIPS and LoongArch architecture CPU platforms. Based on the performance evaluation of the WRF-CAMx modelling system on the Loongson 3A4000 and Loongson 3A6000 platforms, the computing capability nearly tripled from the former to the latter, while the power consumption remained similar. The adaptation and optimization of models for RISC CPUs will also be an important research direction in the future. Many factors influencing parallel performance, such as the computing scale, I/O, and multiprocessor configurations, will be considered in future evaluations of platforms with stronger performance and more processors.
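The generation-to-generation comparison can be checked with simple arithmetic. The sketch below uses only the four-process MPICH elapsed times for the 24 h case and the thermal design power (TDP) values reported in the abstract of this paper. Treating the ratio of elapsed times as relative computing capability and the ratio of TDP values as relative power are simplifying assumptions of this example: TDP is a package-level design limit rather than measured power draw, and the variable and function names are illustrative.

```python
# Values as reported in the abstract: elapsed time (min) for the 24 h
# CAMx case with four MPICH processes, and thermal design power (W).
elapsed_min = {
    "Loongson 3A4000": 195.0,
    "Loongson 3A6000": 71.0,
    "Intel Xeon E5-2697 v4": 66.0,
}
tdp_w = {
    "Loongson 3A4000": 40.0,
    "Loongson 3A6000": 38.0,
    "Intel Xeon E5-2697 v4": 145.0,
}
REFERENCE = "Intel Xeon E5-2697 v4"

def relative_speed(elapsed, reference=REFERENCE):
    """Speed of each platform relative to the reference platform."""
    return {name: elapsed[reference] / t for name, t in elapsed.items()}

def tdp_fraction(tdp, reference=REFERENCE):
    """TDP of each platform as a fraction of the reference TDP."""
    return {name: p / tdp[reference] for name, p in tdp.items()}

speed = relative_speed(elapsed_min)
power = tdp_fraction(tdp_w)
generation_gain = elapsed_min["Loongson 3A4000"] / elapsed_min["Loongson 3A6000"]

for name in elapsed_min:
    print(f"{name}: speed {speed[name]:.2f}, TDP {power[name]:.2f} of reference")
print(f"3A6000 vs 3A4000 speed ratio: {generation_gain:.2f}")
```

With these inputs the 3A6000 is about 2.7 times faster than the 3A4000 at nearly the same TDP, which matches the "nearly tripled" statement, and both Loongson CPUs have roughly one-fourth of the TDP of the benchmark CPU.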

6 Conclusions

This study describes the application of the WRF-CAMx model on the MIPS CPU platform. The platform used in this study is the Loongson 3A4000 quad-core CPU, with a main frequency of 1.8–2.0 GHz, which can offer a peak operational speed of 128 Gflops. It is equipped with the MIPS GNU compiler. The benchmark platform used the Intel Xeon E5-2697 v4 CPU along with the same version of the X86 GNU compiler. Based on the characteristics of the CPU architecture and compiler, this study has successfully completed the construction of a runtime environment for the WRF-CAMx modelling system. The application of an air quality modelling system based on WRF-CAMx was successfully tested using a 72 h simulation case in the Beijing–Tianjin–Hebei region.

The results showed that the spatial distributions of the meteorological variables and air pollutant species were nearly identical, with relative errors in the range of ±0.1 %. Statistically, the maximum MAEs of the major species ranged from 10⁻³ to 10⁻² ppbv (µg m⁻³), the maximum RMSEs ranged from 10⁻² to 10⁻¹ ppbv (µg m⁻³), and the MAPEs remained within 0.5 %; the differences caused by the architectures and compilers were within a reasonable range. To simulate a 2 h case with four parallel processes using MPICH, CAMx takes about 15.2 min on the Loongson 3A4000 CPU and 4.8 min on the Intel Xeon E5-2697 v4 CPU. The single-core computing capability of the Loongson 3A4000 CPU for the WRF-CAMx modelling system is about one-third of that of the Intel Xeon E5-2697 v4 CPU.

Loongson Technology currently focuses on the LoongArch architecture, which is used in its latest product. It is foreseeable that the LoongArch architecture will lead to more significant performance improvements. In the future, as numerical models become more complex and computational scales become larger, more models will be tested on high-performance computing platforms equipped with LoongArch architecture CPUs.

Code and data availability

The source codes of CAMx version 6.10 are available at https://camx-wp.azurewebsites.net/download/source (RAMBOLL ENVIRON Inc., 2014b). The datasets related to this paper and the binary executable files of CAMx for MIPS and LoongArch CPUs are available online via Zenodo (https://doi.org/10.5281/zenodo.10722127, Bai and Wu, 2024).

Supplement

The supplement related to this article is available online at: https://doi.org/10.5194/gmd-17-4383-2024-supplement.

Author contributions

ZB and QW conducted the simulation and prepared the materials. QW planned and organized the project. ZB and QW completed the porting and application of the model for MIPS and LoongArch CPUs. YS collected and prepared the emission data for the simulation. ZB, QW, KC, and HC participated in the discussion.

Competing interests

The contact author has declared that none of the authors has any competing interests.

Disclaimer

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

Acknowledgements

The research is supported by the High-Performance Scientific Computing Centre (HSCC) of Beijing Normal University.

Financial support

This research has been supported by the National Key Research and Development Programme of China (grant no. 2020YFA0607804) and the Beijing Advanced Innovation Programme for Land Surface.

Review statement

This paper was edited by Xiaomeng Huang and reviewed by three anonymous referees.

References

Amer, A., Balaji, P., Bland, W., Gropp, W., Guo, Y., Latham, R., Lu, H., Oden, L., Pena, A. J., Raffenetti, K., Seo, S., Si, M., Thakur, R., Zhang, J., and Zhao, X.: MPICH User's Guide Version 3.4, https://www.mpich.org/static/downloads/3.4/mpich-3.4-userguide.pdf (last access: January 2024), 2021. 

Appel, K. W., Napelenok, S. L., Foley, K. M., Pye, H. O. T., Hogrefe, C., Luecken, D. J., Bash, J. O., Roselle, S. J., Pleim, J. E., Foroutan, H., Hutzell, W. T., Pouliot, G. A., Sarwar, G., Fahey, K. M., Gantt, B., Gilliam, R. C., Heath, N. K., Kang, D., Mathur, R., Schwede, D. B., Spero, T. L., Wong, D. C., and Young, J. O.: Description and evaluation of the Community Multiscale Air Quality (CMAQ) modeling system version 5.1, Geosci. Model Dev., 10, 1703–1732, https://doi.org/10.5194/gmd-10-1703-2017, 2017. 

Appel, K. W., Bash, J. O., Fahey, K. M., Foley, K. M., Gilliam, R. C., Hogrefe, C., Hutzell, W. T., Kang, D., Mathur, R., Murphy, B. N., Napelenok, S. L., Nolte, C. G., Pleim, J. E., Pouliot, G. A., Pye, H. O. T., Ran, L., Roselle, S. J., Sarwar, G., Schwede, D. B., Sidi, F. I., Spero, T. L., and Wong, D. C.: The Community Multiscale Air Quality (CMAQ) model versions 5.3 and 5.3.1: system updates and evaluation, Geosci. Model Dev., 14, 2867–2897, https://doi.org/10.5194/gmd-14-2867-2021, 2021. 

Bai, X., Tian, H., Liu, X., Wu, B., Liu, S., Hao, Y., Luo, L., Liu, W., Zhao, S., Lin, S., Hao, J., Guo, Z., and Lv, Y.: Spatial-temporal variation characteristics of air pollution and apportionment of contributions by different sources in Shanxi province of China, Atmos. Environ., 244, 117926, https://doi.org/10.1016/j.atmosenv.2020.117926, 2021. 

Bai, Z. and Wu, Q.: Application of regional meteorology and air quality models based on MIPS and LoongArch CPU Platform, Zenodo [data set], https://doi.org/10.5281/zenodo.10722127, 2024. 

Cao, K., Wu, Q., Wang, L., Wang, N., Cheng, H., Tang, X., Li, D., and Wang, L.: GPU-HADVPPM V1.0: a high-efficiency parallel GPU design of the piecewise parabolic method (PPM) for horizontal advection in an air quality model (CAMx V6.10), Geosci. Model Dev., 16, 4367–4383, https://doi.org/10.5194/gmd-16-4367-2023, 2023. 

Chen, H. S., Wang, Z. F., Li, J., Tang, X., Ge, B. Z., Wu, X. L., Wild, O., and Carmichael, G. R.: GNAQPMS-Hg v1.0, a global nested atmospheric mercury transport model: model description, evaluation and application to trans-boundary transport of Chinese anthropogenic emissions, Geosci. Model Dev., 8, 2857–2876, https://doi.org/10.5194/gmd-8-2857-2015, 2015. 

George, A. D.: An overview of RISC vs. CISC, in: [1990] Proceedings, The Twenty-Second Southeastern Symposium on System Theory, The Twenty-Second Southeastern Symposium on System Theory, Cookeville, TN, USA, 436–438, https://doi.org/10.1109/SSST.1990.138185, 1990. 

Hennessy, J., Jouppi, N., Przybylski, S., Rowen, C., Gross, T., Baskett, F., and Gill, J.: MIPS: A microprocessor architecture, SIGMICRO Newsl., 13, 17–22, https://doi.org/10.1145/1014194.800930, 1982. 

Hu, W., Wang, J., Gao, X., Chen, Y., Liu, Q., and Li, G.: Godson-3: A Scalable Multicore RISC Processor with x86 Emulation, IEEE Micro., 29, 17–29, https://doi.org/10.1109/MM.2009.30, 2009. 

Hu, W., Zhang, Y., and Fu, J.: An introduction to CPU and DSP design in China, Sci. China Inf. Sci., 59, 1–8, https://doi.org/10.1007/s11432-015-5431-6, 2016. 

Hu, W., Gao, X., and Zhang, G.: Building the software ecosystem for the Loongson instruction set architecture, Information and Communications Technology and Policy, 48, 43–48, https://doi.org/10.12267/j.issn.2096-5931.2022.04.008, 2022 (in Chinese).

Hu, W.-W., Gao, Y.-P., Chen, T.-S., and Xiao, J.-H.: The Godson Processors: Its Research, Development, and Contributions, J. Comput. Sci. Technol., 26, 363–372, https://doi.org/10.1007/s11390-011-1139-2, 2011. 

Intel Inc.: Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture, https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html (last access: January 2024), 2023. 

Li, L., Chen, Z., and Wang, S.: Power Consumption and Analysis of Server Based on Loongson CPU No. 3, Information Technology & Standardization, 5, 46–50, https://doi.org/10.3969/j.issn.1671-539X.2014.05.012, 2014 (in Chinese). 

Liu, Y., Ye, K., and Xu, C.-Z.: Performance Evaluation of Various RISC Processor Systems: A Case Study on ARM, MIPS and RISC-V, in: Cloud Computing – CLOUD 2021, vol. 12989, Springer, Cham, 61–74, https://doi.org/10.1007/978-3-030-96326-2_5, 2022. 

Luo, Q., Kong, C., Cai, Y., and Liu, G.: Performance Evaluation of OpenMP Constructs and Kernel Benchmarks on a Loongson-3A Quad-Core SMP System, in: 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, 2011 12th International Conference on Parallel and Distributed Computing, Applications and Technologies, 191–196, https://doi.org/10.1109/PDCAT.2011.66, 2011. 

Mallach, E. G.: RISC: Evaluation and Selection, Journal of Information Systems Management, 8, 8–16, https://doi.org/10.1080/07399019108964978, 1991. 

Michalakes, J., Chen, S., Dudhia, J., Hart, L., Klemp, J., Middlecoff, J., and Skamarock, W.: Development of a next-generation regional weather research and forecast model, in: Developments in Teracomputing, World Scientific, 269–276, https://doi.org/10.1142/9789812799685_0024, 2001. 

MIPS Technology Inc.: MIPS Architecture For Programmers Volume I-A, https://www.mips.com/products/architectures/mips64 (last access: January 2024), 2014. 

Pepe, N., Pirovano, G., Lonati, G., Balzarini, A., Toppetti, A., Riva, G. M., and Bedogni, M.: Development and application of a high resolution hybrid modelling system for the evaluation of urban air quality, Atmos. Environ., 141, 297–311, https://doi.org/10.1016/j.atmosenv.2016.06.071, 2016. 

Powers, J. G., Klemp, J. B., Skamarock, W. C., Davis, C. A., Dudhia, J., Gill, D. O., Coen, J. L., Gochis, D. J., Ahmadov, R., Peckham, S. E., Grell, G. A., Michalakes, J., Trahan, S., Benjamin, S. G., Alexander, C. R., Dimego, G. J., Wang, W., Schwartz, C. S., Romine, G. S., Liu, Z., Snyder, C., Chen, F., Barlage, M. J., Yu, W., and Duda, M. G.: The Weather Research and Forecasting Model: Overview, System Efforts, and Future Directions, B. Am. Meteorol. Soc., 98, 1717–1737, https://doi.org/10.1175/BAMS-D-15-00308.1, 2017. 

RAMBOLL ENVIRON Inc.: CAMx User's Guide Version 6.1, https://camx-wp.azurewebsites.net/Files/CAMxUsersGuide_v6.10.pdf (last access: January 2024), 2014a. 

RAMBOLL ENVIRON Inc.: CAMx v6.10 source code, ENVIRON [code], https://www.camx.com/download/source (last access: January 2024), 2014b. 

Shi, Z.: Technology comparison and research of RISC and CISC, China Science and Technology Information, 131–132, 2008 (in Chinese). 

Skamarock, C., Klemp, B., Dudhia, J., Gill, O., Liu, Z., Berner, J., Wang, W., Powers, G., Duda, G., Barker, D., and Huang, X.: A Description of the Advanced Research WRF Model Version 4, https://doi.org/10.5065/1dfh-6p97, 2019. 

Sun, Y.: Research on the contribution of soil fugitive dust in Beijing based on satellite identification and numerical simulation technology, Master, Beijing Normal University, https://etdlib.bnu.edu.cn (last access: January 2024), 2022.

Sun, Y., Wu, Q., Wang, L., Zhang, B., Yan, P., Wang, L., Cheng, H., Lv, M., Wang, N., and Ma, S.: Weather Reduced the Annual Heavy Pollution Days after 2016 in Beijing, Sola, 18, 135–139, https://doi.org/10.2151/sola.2022-022, 2022. 

The HDF Group: HDF5 User's Guide Version 1.1, https://portal.hdfgroup.org/display/HDF5/HDF5+User+Guides (last access: January 2024), 2019. 

UCAR/Unidata: NetCDF User's Guide Version 1.1, https://docs.unidata.ucar.edu/nug (last access: January 2024), 2021. 

Wang, H., Lin, J., Wu, Q., Chen, H., Tang, X., Wang, Z., Chen, X., Cheng, H., and Wang, L.: MP CBM-Z V1.0: design for a new Carbon Bond Mechanism Z (CBM-Z) gas-phase chemical mechanism architecture for next-generation processors, Geosci. Model Dev., 12, 749–764, https://doi.org/10.5194/gmd-12-749-2019, 2019. 

Wang, K., Gao, C., Wu, K., Liu, K., Wang, H., Dan, M., Ji, X., and Tong, Q.: ISAT v2.0: an integrated tool for nested-domain configurations and model-ready emission inventories for WRF-AQM, Geosci. Model Dev., 16, 1961–1973, https://doi.org/10.5194/gmd-16-1961-2023, 2023. 

Wang, P., Jiang, J., Lin, P., Ding, M., Wei, J., Zhang, F., Zhao, L., Li, Y., Yu, Z., Zheng, W., Yu, Y., Chi, X., and Liu, H.: The GPU version of LASG/IAP Climate System Ocean Model version 3 (LICOM3) under the heterogeneous-compute interface for portability (HIP) framework and its large-scale application, Geosci. Model Dev., 14, 2781–2799, https://doi.org/10.5194/gmd-14-2781-2021, 2021.

Wang, S., Li, L., and Chen, Z.: The Test and Analysis on Memory Access Performance Based on Loongson CPU, Information Technology & Standardization, 32–36, 2014 (in Chinese). 

Wang, Z., Xie, F., Wang, X., An, J., and Zhu, J.: Development and Application of Nested Air Quality Prediction Modeling System, Chinese J. Atmos. Sci., 30, 778–790, https://doi.org/10.3878/j.issn.1006-9895.2006.05.07, 2006. 

Wu, Q. and Cheng, H.: Transplantation and application of mesoscale mode on Loongson CPU platform, Journal of Beijing Normal University (Natural Science), 55, 11–18, https://doi.org/10.16360/j.cnki.jbnuns.2019.01.002, 2019. 

Wu, Q. Z., Xu, W. S., Shi, A. J., Li, Y. T., Zhao, X. J., Wang, Z. F., Li, J. X., and Wang, L. N.: Air quality forecast of PM10 in Beijing with Community Multi-scale Air Quality Modeling (CMAQ) system: emission and improvement, Geosci. Model Dev., 7, 2243–2259, https://doi.org/10.5194/gmd-7-2243-2014, 2014. 

Wu, Y., Xu, G., Zhao, Y., and Tan, Y.: Parallel Processing on WRF Meteorological Data Using MPICH, in: 2012 Sixth International Conference on Internet Computing for Science and Engineering, Zhengzhou, China, 2012, 262–265, https://doi.org/10.1109/ICICSE.2012.12, 2012.  

Xiao, H., Wu, Q., Yang, X., Wang, L., and Cheng, H.: Numerical study of the effects of initial conditions and emissions on PM2.5 concentration simulations with CAMx v6.1: a Xi'an case study, Geosci. Model Dev., 14, 223–238, https://doi.org/10.5194/gmd-14-223-2021, 2021. 

Yang, X., Xiao, H., Wu, Q., Wang, L., Guo, Q., Cheng, H., Wang, R., and Tang, Z.: Numerical study of air pollution over a typical basin topography: Source appointment of fine particulate matter during one severe haze in the megacity Xi'an, Sci. Total Environ., 708, 135213, https://doi.org/10.1016/j.scitotenv.2019.135213, 2020. 

Zhang, Y., Bocquet, M., Mallet, V., Seigneur, C., and Baklanov, A.: Real-time air quality forecasting, part I: History, techniques, and current status, Atmos. Environ., 60, 632–655, https://doi.org/10.1016/j.atmosenv.2012.06.031, 2012. 

Zhang, Z., Wang, X., Cheng, S., Guan, P., Zhang, H., Shan, C., and Fu, Y.: Investigation on the difference of PM2.5 transport flux between the North China Plain and the Sichuan Basin, Atmos. Environ., 271, 118922, https://doi.org/10.1016/j.atmosenv.2021.118922, 2022. 

Zhen, J., Guan, P., Yang, R., and Zhai, M.: Transport matrix of PM2.5 in Beijing-Tianjin-Hebei and Yangtze River Delta regions: Assessing the contributions from emission reduction and meteorological conditions, Atmos. Environ., 304, 119775, https://doi.org/10.1016/j.atmosenv.2023.119775, 2023. 

Zhi, Y. and Xu, J.: Android transplantation and analysis based on Loongson, in: 2012 International Conference on Information Management, Innovation Management and Industrial Engineering, Sanya, 2012, 59–61, https://doi.org/10.1109/ICIII.2012.6339777, 2012. 

Short summary
There is relatively limited research on the application of scientific computing on RISC CPU platforms. The MIPS architecture CPUs, a type of RISC CPUs, have distinct advantages in energy efficiency and scalability. The air quality modeling system can run stably on the MIPS and LoongArch platforms, and the experiment results verify the stability of scientific computing on the platforms. The work provides a technical foundation for the scientific application based on MIPS and LoongArch.