<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" dtd-version="3.0">
  <front>
    <journal-meta>
<journal-id journal-id-type="publisher">GMD</journal-id>
<journal-title-group>
<journal-title>Geoscientific Model Development</journal-title>
<abbrev-journal-title abbrev-type="publisher">GMD</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">Geosci. Model Dev.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">1991-9603</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>

    <article-meta>
      <article-id pub-id-type="doi">10.5194/gmd-9-2293-2016</article-id><title-group><article-title>Performance evaluation of a throughput-aware framework for ensemble data
assimilation: the case of NICAM-LETKF</article-title>
      </title-group><?xmltex \runningtitle{Performance evaluation of a throughput-aware ensemble DA system}?><?xmltex \runningauthor{H. Yashiro
et al.}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Yashiro</surname><given-names>Hisashi</given-names></name>
          <email>h.yashiro@riken.jp</email>
        <ext-link>https://orcid.org/0000-0002-2678-526X</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Terasaki</surname><given-names>Koji</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1 aff2 aff3">
          <name><surname>Miyoshi</surname><given-names>Takemasa</given-names></name>
          
        <ext-link>https://orcid.org/0000-0003-3160-2525</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Tomita</surname><given-names>Hirofumi</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>RIKEN Advanced Institute for Computational Science, Kobe, Japan</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Application Laboratory, Japan Agency for Marine-Earth Science and
Technology, Yokohama, Japan</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>University of Maryland, College Park, Maryland, USA</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Hisashi Yashiro (h.yashiro@riken.jp)</corresp></author-notes><pub-date><day>5</day><month>July</month><year>2016</year></pub-date>
      
      <volume>9</volume>
      <issue>7</issue>
      <fpage>2293</fpage><lpage>2300</lpage>
      <history>
        <date date-type="received"><day>8</day><month>January</month><year>2016</year></date>
           <date date-type="rev-request"><day>12</day><month>February</month><year>2016</year></date>
           <date date-type="rev-recd"><day>30</day><month>May</month><year>2016</year></date>
           <date date-type="accepted"><day>17</day><month>June</month><year>2016</year></date>
      </history>
      <permissions>
<license license-type="open-access">
<license-p>This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/3.0/">http://creativecommons.org/licenses/by/3.0/</ext-link></license-p>
</license>
</permissions><self-uri xlink:href="https://gmd.copernicus.org/articles/9/2293/2016/gmd-9-2293-2016.html">This article is available from https://gmd.copernicus.org/articles/9/2293/2016/gmd-9-2293-2016.html</self-uri>
<self-uri xlink:href="https://gmd.copernicus.org/articles/9/2293/2016/gmd-9-2293-2016.pdf">The full text article is available as a PDF file from https://gmd.copernicus.org/articles/9/2293/2016/gmd-9-2293-2016.pdf</self-uri>


      <abstract>
    <p>In this paper, we propose the design and implementation of an ensemble data
assimilation (DA) framework for weather prediction at high resolution and
with a large ensemble size. The design of this framework focuses on the data
throughput of file input/output (I/O) and multi-node communication. As an
instance of the application of the proposed framework, a local ensemble
transform Kalman filter (LETKF) was used with the Non-hydrostatic Icosahedral
Atmospheric Model (NICAM) for the DA system. Benchmark tests were performed
using the K computer, a massively parallel supercomputer with distributed
file systems. The results showed a reduction in the total time required for
the workflow, as well as satisfactory scalability of up to 10 K nodes (80 K
cores). On high-performance computing systems, where data throughput
performance increases at a slower rate than computational performance, our
new framework for ensemble DA systems promises a drastic reduction in total
execution time.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <title>Introduction</title>
      <p>Rapid advancements in high-performance computing (HPC) resources in recent
years have enabled the development of atmospheric models to simulate and
predict the weather at high spatial resolution. For effective use of massive
parallel supercomputers, parallel efficiency becomes a common but critical
issue in weather and climate modeling. Scalability for several large-scale
simulations has been accomplished to a certain extent thus far. For example,
the Community Earth System Model (CESM) performs high-resolution coupled
climate simulations by using over 60 K cores of an IBM Blue Gene/P system
(Dennis et al., 2012). Miyamoto et al. (2013) generated the first global
sub-kilometer atmosphere simulation by using the Non-hydrostatic Icosahedral
Atmospheric Model (NICAM) with 160 K cores of the K computer.</p>
      <p>Climate simulations at such high resolutions need to handle massive amounts
of input/output (henceforth, I/O) data. Since the throughput of file I/O is
much lower than that of the main memory, I/O performance is important for
maintaining the scalability of the simulations as well as for guaranteeing
satisfactory computational performance. Parallel I/O is necessary to improve
the total I/O throughput. To this end, several libraries have been developed
for climate models, e.g., the application-level parallel I/O (PIO) library
(Dennis et al., 2011), which was applied to each component model of the
CESM. The XML I/O server (XIOS, <uri>http://forge.ipsl.jussieu.fr/ioserver</uri>) is used in
European models, such as EC-EARTH (Hazeleger et al., 2010). XIOS
distinguishes an I/O node group from the simulation node group and
asynchronously transfers output data generated by the latter group to the
former. As models continue to increase in spatial resolution, the use of
parallel I/O libraries will become more common.</p>
      <p>In addition to the simulation itself, the performance of the data
assimilation (DA) system plays an important role in the speed of numerical
weather prediction. Many DA methods have been developed, e.g., variational
methods, Kalman filters, and particle filters. In particular, two advanced DA
methods – the four-dimensional variational (4D-Var) method (Lorenc, 1986)
and the ensemble Kalman filter (EnKF, Evensen, 1994, 2003) – are used at
operational forecasting centers. Hybrid ensemble/4D-Var systems have also
been recently developed (Clayton et al., 2013). 4D-Var systems require an
adjoint model that relies heavily on the simulation model. By contrast, DA
systems using the EnKF method are independent of the model. Ensemble size is
a critical factor in obtaining statistical information regarding the
simulated state in an ensemble DA system. Miyoshi et al. (2014, 2015)
performed 10 240-member EnKF experiments and proposed that the typical
choice of an ensemble size of approximately 100 members is insufficient to
capture the precise probability density function and long-range error
correlations. Thus, it is reasonable to increase not only the resolution of
the model, but also its ensemble size in accordance with performance
enhancement yielded by supercomputers. However, this enhancement in model
resolution and ensemble size leads to a tremendous increase in total data
input and output. For example, prevalent DA systems operating at high
resolution with a large number of ensemble members require terabyte-scale
data transfer between components. In the future, the volume of data in
large-scale ensemble DA systems is expected to reach the petabyte scale.</p>
      <p>In such cases, data movement between the simulation model and the ensemble DA
systems will become the most significant issue. This is because data
distribution patterns for inter-node parallelization in the two systems are
different. The processes of a simulation model together hold the entire
global grid of a given ensemble member. By contrast, each process of the DA
system requires data from all ensemble members. Even if the simulation model and the DA system use
the same processes, the data layout in each is different and, hence, needs to
be altered between them. Thus, a large amount of data exchange through
inter-node communication or file I/O is required. This problem needs to be
addressed in order to enhance the scalability of the ensemble DA system.</p>
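<p>As a minimal illustration (not NICAM–LETKF source code), the required change of data layout can be viewed as a transpose of a (member, subdomain) distribution; the member and subdomain counts below are hypothetical:</p>

```python
# Illustrative sketch: the layout change between the model and the DA
# system amounts to a transpose of a (member, subdomain) distribution.

N_MEMBERS = 4      # ensemble size (hypothetical small value)
N_SUBDOMAINS = 4   # horizontal subdomains per member (hypothetical)

# Model-side layout: the processes of member m jointly hold the global
# grid, one subdomain s per process, i.e., data indexed as model[m][s].
model = [[f"x(m={m},s={s})" for s in range(N_SUBDOMAINS)]
         for m in range(N_MEMBERS)]

# DA-side layout: each process owns one subdomain s but needs that
# subdomain from ALL members, i.e., data indexed as da[s][m].
da = [[model[m][s] for m in range(N_MEMBERS)]
      for s in range(N_SUBDOMAINS)]

print(da[0])  # subdomain 0 of every ensemble member
```

Realizing this transpose is exactly what requires either heavy file I/O or large inter-node communication in the DA cycle.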
      <p>As described above, data throughput between model simulations and ensemble DA
systems becomes much larger than that for single atmospheric simulations. We
are now confronted with the problem of data movement between the two
components. Hamrud et al. (2015) pointed out that file I/O limits the
scalability of the European Centre for Medium-Range Weather Forecasts'
(ECMWF) semi-operational ensemble Kalman filter (EnKF) system. By contrast,
Houtekamer et al. (2014) showed satisfactory scalability in a Canadian
operational EnKF DA system by using parallel I/O. This study aims to
investigate the performance of ensemble DA systems by focusing on reducing
data movement. NICAM (Satoh et al., 2014) and a local ensemble transform
Kalman filter (LETKF) (Hunt et al., 2007) were used as reference cases for
the model and the DA system, respectively. In Sect. 2, we summarize the
design and implementation of the conventional framework for ensemble DA
systems, and illuminate the problem from the perspective of data throughput.
To solve the problem, we propose our framework for DA systems in Sect. 3. In
order to test the effectiveness of our framework, we describe performance and
scalability in the case of NICAM and LETKF on the K computer, which has a
typical mesh torus topology for inter-node communication, in Sect. 4. We
summarize and discuss the results in Sect. 5.</p>
</sec>
<sec id="Ch1.S2">
  <title>NICAM–LETKF DA system</title>
      <p>NICAM (Satoh et al., 2014) is a global non-hydrostatic atmospheric model
developed mainly at the Japan Agency for Marine-Earth Science and Technology,
University of Tokyo, and the RIKEN Advanced Institute for Computational
Science. With the aid of state-of-the-art supercomputers, NICAM has been
contributing to atmospheric modeling at high resolutions. The first global
simulations with a 3.5 km horizontal mesh were carried out on the Earth
Simulator. The simulations showed a realistic multi-scale cloud structure
(Tomita, 2005; Miura et al., 2007). The K computer allowed many more
simulations at the same or higher resolutions. Miyakawa et al. (2014)
showed, using several case studies, that the skill score for the
Madden–Julian Oscillation (MJO) (Madden and Julian, 1972) improves when a
convection-resolving model is used in comparison with other models. As a climate simulation, the 30-year AMIP-type
simulation was conducted with a 14 km horizontal mesh (Kodama et al., 2015).
The global sub-kilometer simulation revealed that the essential change in
convection statistics occurred at a grid spacing of approximately 2 km
(Miyamoto et al., 2013). NICAM employs fully compressible non-hydrostatic
dynamics where the finite volume method is used for discretization on the
icosahedral grid system. The grid-point method has the advantage over a
spectral transform method of reducing data transfer between computational
nodes, since the spectral transform requires global communication between
nodes and constitutes one of the bottlenecks on a massively parallel
machine.</p>
      <p>The LETKF (Hunt et al., 2007) is an advanced data assimilation method based
on the local ensemble Kalman filter (LEKF; Ott et al., 2004), where the
ensemble update method of the ensemble transform Kalman filter (ETKF; Bishop
et al., 2001) is applied to
reduce computational cost. The LETKF has been coupled with a number of
weather and climate models. For example, Miyoshi and Yamane (2007) applied
the LETKF to the AFES global spectral model (Ohfuchi et al.,
2004), Miyoshi et al. (2010)
applied it to an operational global model developed by the Japan
Meteorological Agency (JMA), and Miyoshi and Kunii (2012) constructed the WRF
(Skamarock et al., 2005)–LETKF system. Kondo and Tanaka (2009) were the
first to conduct simulations under the perfect model scenario by using the
LETKF with NICAM. Terasaki et al. (2015) developed a NICAM–LETKF system for
experiments with real data. In addition to its physical performance, a
reason why many prevalent DA systems employ the LETKF lies in its
suitability for massively parallel computation: the analysis calculation is
executed separately for each grid point. The NICAM–LETKF system is based on
the code for the LETKF by Miyoshi (2005). Miyoshi and Yamane (2007) applied
a parallel algorithm to the LETKF for efficient parallel computation, and
Miyoshi et al. (2010) addressed load imbalance in the algorithm.</p>
      <p>The following describes the current NICAM–LETKF system and clarifies its
problems. Figure 1 shows a flow diagram of the DA system with the LETKF and
an atmospheric model. In this DA system, three application programs are
used: an atmospheric simulation model, a simulation-to-observation converter
(henceforth, StoO), and the LETKF. These programs are executed sequentially
in a DA cycle. Atmospheric models often use aggregated data for file I/O,
and this framework also assumes that each member has only a single file
containing the simulated state. The number of computational nodes to be used
is set separately for each program component. Since no component knows which
processes the others use for file I/O, the output must be located on the
shared file system; otherwise, the components cannot share information with
one another. The StoO program reads the
simulation results [<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi>f</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>] as a first guess, together with the observation data [<inline-formula><mml:math display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula>]. The
simulation results are diagnostically converted into observed variables
[<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>H</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>f</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>]. By using information regarding the horizontal and vertical
locations, the model grid data are interpolated to data at the position of
observation. Variable conversions, such as radiation calculations, are also
applied when necessary. Following the conversion, the difference between the
converted simulation results and the observations [<inline-formula><mml:math display="inline"><mml:mrow><mml:mi>H</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>f</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:math></inline-formula>] is
calculated for the output. The StoO program is independently executed for
each ensemble member. In the first version of the NICAM–LETKF system, raw
simulation data on the icosahedral grid are first interpolated onto the
latitude–longitude grid. Following this interpolation, the StoO program
generates variables at the observational point using another interpolation.
Although this enables the use of existing DA code, the redundant
interpolation takes time and yields additional interpolation error. Terasaki
et al. (2015) improved this by directly using data on the icosahedral grid
for interpolation at the observation point, instead of using pre-converted
data from the icosahedral to the latitude–longitude grid system. The LETKF
program reads the simulation results and the output of the StoO. Processes equal in
number to the ensemble size are selected to read the simulation results in
parallel. Each selected process reads a member of the simulation result
[<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi>f</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>] and distributes grid data to all other processes by scatter
communication. Following the data exchange, the main computational part of
the LETKF is executed separately in each process. The results are exchanged
once again by gather communication among all processes, and the selected
processes write the new initial grid states [<inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi>a</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>] in parallel.</p>
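<p>The core of the StoO step can be sketched in a hypothetical one-dimensional setting (the real StoO interpolates on the icosahedral grid and applies variable conversions where needed); all grid values and the observation below are invented for illustration:</p>

```python
# Hypothetical 1-D sketch of the StoO step: interpolate the first guess
# to the observation location and form the departure H(x_f) - y.

def interpolate(grid_pos, grid_val, obs_pos):
    """Linear interpolation of a 1-D model column to an observation point."""
    for i in range(len(grid_pos) - 1):
        if grid_pos[i] <= obs_pos <= grid_pos[i + 1]:
            w = (obs_pos - grid_pos[i]) / (grid_pos[i + 1] - grid_pos[i])
            return (1.0 - w) * grid_val[i] + w * grid_val[i + 1]
    raise ValueError("observation outside the model grid")

grid_pos = [0.0, 1.0, 2.0, 3.0]          # model grid coordinates (invented)
grid_val = [280.0, 278.0, 276.0, 274.0]  # first-guess values x_f (invented)
obs_pos, obs_val = 1.5, 276.5            # observation y and its location

h_xf = interpolate(grid_pos, grid_val, obs_pos)  # H(x_f)
departure = h_xf - obs_val                       # H(x_f) - y, the StoO output
print(h_xf, departure)
```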

      <?xmltex \floatpos{t}?><fig id="Ch1.F1"><caption><p>Schematic flow of the DA system with the LETKF.</p></caption>
        <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/9/2293/2016/gmd-9-2293-2016-f01.pdf"/>

      </fig>

      <p><?xmltex \hack{\newpage}?>The workflow described above has the following three bottlenecks:
<list list-type="order"><list-item>
      <p>limitation in the total throughput of I/O;</p></list-item><list-item>
      <p>collision of I/O requests due to a shared file system (FS); and</p></list-item><list-item>
      <p>global communication of large amounts of data.</p></list-item></list>
Improvement in the parallel efficiency of the LETKF has thus far been made
from the viewpoint of computation (Miyoshi and Yamane, 2007; Miyoshi et al.,
2010). The three bottlenecks above are related to data movement. We discuss
them in detail. First, the number of nodes for the input simulated state is
limited to the number of ensemble members. With a simulation model of
increasing resolution, the amount of output data increases. Nevertheless, the
number of available nodes is limited to the ensemble size in the DA system.
This limitation is due to the assumption that the model output is a single
file. As a result, the time to read grid data increases in the absence of
scalability. Second, the use of a shared file system (FS) slows down I/O.
I/O performance is related not only to throughput, but also to the flood of
I/O requests. Many HPC systems adopt a distributed parallel FS, such as
Lustre (<uri>http://lustre.org/</uri>), which enables parallel disk access
and improves I/O throughput. However, a latent bottleneck in the metadata
server appears when a large number of processes simultaneously access the
file system to write data. Third, global communication of a large amount of
data takes a long time. This problem becomes more serious at high resolution
and with large ensembles. A greater amount of grid data takes longer to
distribute to all processes. The increase
in the total number of processes also requires time to complete data
exchange. Note that this is more or less true of any network topology. The
scalability of the ensemble DA system on massive parallel supercomputers
worsens due to these bottlenecks.</p><?xmltex \hack{\newpage}?>
</sec>
<sec id="Ch1.S3">
  <title>Proposed NICAM–LETKF framework</title>
      <p>To solve the three problems with the current workflow explained in Sect. 2,
we design and implement a new framework for the NICAM–LETKF system. The key
concepts of data handling in the new framework are shown in Fig. 2. This
framework is based on the I/O pattern of NICAM, which handles horizontally
divided data such that each process separately reads and writes its own
files. In an ensemble simulation, the total number of processes is equal to
the number of horizontally divided processes multiplied by the ensemble
size, which is also equal to the number of output files. Output data from each process are written to a
local disk. We assume that this local disk is not shared by any other
process. In this framework, we use the same number of processes in each of
the three program components. All processes are used for I/O in every
program. We use MPI_Alltoall to exchange grid data (we call this
“shuffle”) in StoO and the LETKF. The processes that hold the same grid
positions in different ensemble members are grouped for MPI communication,
so all ensemble members of the same local region belong to the same group. This
grouping can minimize the number of communication partners and reduce the
total data transfer distance. We can hence avoid a global shuffle, which is
the third problem with conventional frameworks. Following the computation of
the LETKF, the data for analysis are shuffled again. Data for the next
simulation are then transferred to the local disk, in the reverse order of
the input stage.</p>
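<p>A minimal sketch of this grouping (the actual framework calls MPI_Alltoall on a communicator per group; the rank numbering and the member and region counts below are assumptions for illustration):</p>

```python
# Hypothetical sketch of the "shuffle" grouping: processes holding the
# same local region in different ensemble members form one communication
# group, so no global all-to-all exchange is needed.

N_MEMBERS = 4   # ensemble size (illustrative)
N_REGIONS = 3   # horizontally divided processes per member (illustrative)

def global_rank(member, region):
    # Assumed rank numbering: members are laid out consecutively.
    return member * N_REGIONS + region

# One shuffle group per local region; its size equals the ensemble size,
# which keeps the number of communication partners small.
groups = [[global_rank(m, r) for m in range(N_MEMBERS)]
          for r in range(N_REGIONS)]

for r, g in enumerate(groups):
    print(f"region {r}: ranks {g}")
```

Each group would then perform its own MPI_Alltoall, exchanging only the grid data of one local region among the ensemble members.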

      <?xmltex \floatpos{t}?><fig id="Ch1.F2"><caption><p>Schematic diagram of the proposed framework in NICAM–LETKF. PE
denotes an individual MPI process.</p></caption>
        <?xmltex \igopts{width=227.622047pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/9/2293/2016/gmd-9-2293-2016-f02.pdf"/>

      </fig>

      <p>The above concepts of the proposed framework can be applied to any simulation
model, whether it uses a structured or an unstructured grid system. Our
implementation in NICAM–LETKF is a typical example for models with a
structured but complicated grid system. NICAM adopts
the icosahedral grid configuration, where the grids quasi-homogeneously cover
the sphere and are horizontally divided into groups called “regions”
(Tomita et al., 2008). One or more regions are assigned to each process. The
global grid is constructed by a recursive method (Tomita et al., 2002; Stuhne
and Peltier, 1999). The regions are also constructed with a rule similar to
the recursive division method. Thus, the structure of the local grids is kept
in the region. We also adopt the same method for grid distribution in each
shuffling group. Figure 3 shows the schematic picture of grid division. By
using a mini-region as a unit, we can retain the grid mesh structure. This
method is advantageous when we interpolate the grid data from the icosahedral
grid system to the location of observation in StoO. However, this rule limits
the available number of processes in the shuffling group, which is equal to
the number of ensemble members. In the case of NICAM, there are
10 <inline-formula><mml:math display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 4<inline-formula><mml:math display="inline"><mml:msup><mml:mi/><mml:mi>n</mml:mi></mml:msup></mml:math></inline-formula> regions, where <inline-formula><mml:math display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> is an integer greater than zero. We
can use a divisor of the total number of regions as the number of regions to
assign to each process. The number of mini-regions depends on the number of
regions in a process. For example, we can configure the horizontal grid as
follows: the total number of regions is set to 160, two regions are assigned
to each process, and 16 <inline-formula><mml:math display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> 16 grids are contained in each region. With
this setting, we can use 1, 2, 4, 8, 16, 32, 64, 128, 256, or 512 as the
ensemble size. Any division method for a local grid group could be chosen,
but in this study we give priority to the efficiency of the interpolation
calculation and to load balancing.</p>
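<p>The example configuration above can be checked with a short sketch; reading the listed values, the admissible ensemble sizes are the powers of two up to the number of grids held by one process (this interpretation of the constraint is ours):</p>

```python
# Sketch of the example configuration in the text: 160 regions in total,
# 2 regions per process, 16 x 16 grids per region.

total_regions = 160          # 10 * 4**2 regions (n = 2 for NICAM)
regions_per_process = 2
grids_per_region = 16 * 16

processes_per_member = total_regions // regions_per_process   # per-member PEs
grids_per_process = regions_per_process * grids_per_region    # local grid count

# Admissible ensemble sizes: powers of two up to grids_per_process
# (our reading of the list 1, 2, 4, ..., 512 in the text).
ensemble_sizes = []
n = 1
while n <= grids_per_process:
    ensemble_sizes.append(n)
    n *= 2

print(processes_per_member, grids_per_process)
print(ensemble_sizes)  # [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
```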

      <?xmltex \floatpos{t}?><fig id="Ch1.F3"><caption><p>The concept of grid division.</p></caption>
        <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/9/2293/2016/gmd-9-2293-2016-f03.pdf"/>

      </fig>

      <p>In the proposed framework, only the master process among all MPI processes
manages the I/O of the observation data and the results of the StoO. Global
communication is used to broadcast and aggregate these data. In this study,
the size of these data is smaller than 50 MB, so the time needed for their
I/O and communication is short. We leave issues arising from large amounts
of observation data to future research and reflect on them in our
discussion.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1" specific-use="star"><caption><p>Configurations of the DA experiment used to measure the time taken on
the K computer. PE denotes an individual MPI process.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="9">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:thead>
       <oasis:row>  
         <oasis:entry colname="col1">Exp. name</oasis:entry>  
         <oasis:entry colname="col2">Horizontal</oasis:entry>  
         <oasis:entry colname="col3">Number of</oasis:entry>  
         <oasis:entry colname="col4">Number of</oasis:entry>  
         <oasis:entry colname="col5">Number of</oasis:entry>  
         <oasis:entry colname="col6">Number of</oasis:entry>  
         <oasis:entry colname="col7">Number of</oasis:entry>  
         <oasis:entry colname="col8">Number of</oasis:entry>  
         <oasis:entry colname="col9">Number of</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1"/>  
         <oasis:entry colname="col2">mesh size</oasis:entry>  
         <oasis:entry colname="col3">vertical</oasis:entry>  
         <oasis:entry colname="col4">horizontal</oasis:entry>  
         <oasis:entry colname="col5">horizontal</oasis:entry>  
         <oasis:entry colname="col6">PE</oasis:entry>  
         <oasis:entry colname="col7">ensemble</oasis:entry>  
         <oasis:entry colname="col8">PE</oasis:entry>  
         <oasis:entry colname="col9">horizontal grids</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">  
         <oasis:entry colname="col1"/>  
         <oasis:entry colname="col2">(km)</oasis:entry>  
         <oasis:entry colname="col3">layers</oasis:entry>  
         <oasis:entry colname="col4">grids (per PE)</oasis:entry>  
         <oasis:entry colname="col5">grids (total)</oasis:entry>  
         <oasis:entry colname="col6">(per member)</oasis:entry>  
         <oasis:entry colname="col7">members</oasis:entry>  
         <oasis:entry colname="col8">(total)</oasis:entry>  
         <oasis:entry colname="col9">(per PE, shuffled)</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>  
         <oasis:entry colname="col1">G7R0E3</oasis:entry>  
         <oasis:entry colname="col2">56</oasis:entry>  
         <oasis:entry colname="col3">40</oasis:entry>  
         <oasis:entry colname="col4">16 900</oasis:entry>  
         <oasis:entry colname="col5">169 000</oasis:entry>  
         <oasis:entry colname="col6">10</oasis:entry>  
         <oasis:entry colname="col7">64</oasis:entry>  
         <oasis:entry colname="col8">640</oasis:entry>  
         <oasis:entry colname="col9">324</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">G7R1E3</oasis:entry>  
         <oasis:entry colname="col2">56</oasis:entry>  
         <oasis:entry colname="col3">40</oasis:entry>  
         <oasis:entry colname="col4">4356</oasis:entry>  
         <oasis:entry colname="col5">174 240</oasis:entry>  
         <oasis:entry colname="col6">40</oasis:entry>  
         <oasis:entry colname="col7">64</oasis:entry>  
         <oasis:entry colname="col8">2560</oasis:entry>  
         <oasis:entry colname="col9">100</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">G7R2E3</oasis:entry>  
         <oasis:entry colname="col2">56</oasis:entry>  
         <oasis:entry colname="col3">40</oasis:entry>  
         <oasis:entry colname="col4">1156</oasis:entry>  
         <oasis:entry colname="col5">184 960</oasis:entry>  
         <oasis:entry colname="col6">160</oasis:entry>  
         <oasis:entry colname="col7">64</oasis:entry>  
         <oasis:entry colname="col8">10 240</oasis:entry>  
         <oasis:entry colname="col9">36</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">G7R0E4</oasis:entry>  
         <oasis:entry colname="col2">56</oasis:entry>  
         <oasis:entry colname="col3">40</oasis:entry>  
         <oasis:entry colname="col4">16 900</oasis:entry>  
         <oasis:entry colname="col5">169 000</oasis:entry>  
         <oasis:entry colname="col6">10</oasis:entry>  
         <oasis:entry colname="col7">256</oasis:entry>  
         <oasis:entry colname="col8">2560</oasis:entry>  
         <oasis:entry colname="col9">100</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">G7R1E4</oasis:entry>  
         <oasis:entry colname="col2">56</oasis:entry>  
         <oasis:entry colname="col3">40</oasis:entry>  
         <oasis:entry colname="col4">4356</oasis:entry>  
         <oasis:entry colname="col5">174 240</oasis:entry>  
         <oasis:entry colname="col6">40</oasis:entry>  
         <oasis:entry colname="col7">256</oasis:entry>  
         <oasis:entry colname="col8">10 240</oasis:entry>  
         <oasis:entry colname="col9">36</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">G6R0E3</oasis:entry>  
         <oasis:entry colname="col2">112</oasis:entry>  
         <oasis:entry colname="col3">40</oasis:entry>  
         <oasis:entry colname="col4">4356</oasis:entry>  
         <oasis:entry colname="col5">43 560</oasis:entry>  
         <oasis:entry colname="col6">10</oasis:entry>  
         <oasis:entry colname="col7">64</oasis:entry>  
         <oasis:entry colname="col8">640</oasis:entry>  
         <oasis:entry colname="col9">100</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">G6R1E3</oasis:entry>  
         <oasis:entry colname="col2">112</oasis:entry>  
         <oasis:entry colname="col3">40</oasis:entry>  
         <oasis:entry colname="col4">1156</oasis:entry>  
         <oasis:entry colname="col5">46 240</oasis:entry>  
         <oasis:entry colname="col6">40</oasis:entry>  
         <oasis:entry colname="col7">64</oasis:entry>  
         <oasis:entry colname="col8">2560</oasis:entry>  
         <oasis:entry colname="col9">36</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">G8R1E3</oasis:entry>  
         <oasis:entry colname="col2">28</oasis:entry>  
         <oasis:entry colname="col3">40</oasis:entry>  
         <oasis:entry colname="col4">16 900</oasis:entry>  
         <oasis:entry colname="col5">676 000</oasis:entry>  
         <oasis:entry colname="col6">40</oasis:entry>  
         <oasis:entry colname="col7">64</oasis:entry>  
         <oasis:entry colname="col8">2560</oasis:entry>  
         <oasis:entry colname="col9">324</oasis:entry>
       </oasis:row>
       <oasis:row>  
         <oasis:entry colname="col1">G8R2E3</oasis:entry>  
         <oasis:entry colname="col2">28</oasis:entry>  
         <oasis:entry colname="col3">40</oasis:entry>  
         <oasis:entry colname="col4">4356</oasis:entry>  
         <oasis:entry colname="col5">696 960</oasis:entry>  
         <oasis:entry colname="col6">160</oasis:entry>  
         <oasis:entry colname="col7">64</oasis:entry>  
         <oasis:entry colname="col8">10 240</oasis:entry>  
         <oasis:entry colname="col9">100</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="Ch1.S4">
  <title>Performance evaluation</title>
      <p>In this section, we describe experiments to test the proposed framework on
the K computer. This computer system is equipped with both a global and a local
FS. The user transfers input data from the global FS to the local FS through
the staging process. The local FS of a node provides a directory shared with
all other nodes, as well as a local (rank) directory used only by that node.
Although the shared directory allows all nodes to access one another's data,
its throughput degrades as the frequency of I/O requests from the nodes
increases. In contrast, the local directory can maximize the total I/O
bandwidth because I/O conflicts between nodes are avoided and the load on the
metadata server is reduced. In our comparative case study, the old framework
used the shared directory in the conventional manner, while the new framework
used only the rank directory.</p>
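<p>The difference between the two layouts can be sketched as path construction per MPI rank. The path templates below are our own illustration, not the actual file names used on the K computer staging area:</p>

```python
# Sketch of the two file layouts compared above (hypothetical paths).
# In the shared layout every process writes into one directory, so the
# metadata server must serialize all requests; in the rank layout each
# process owns a private directory, avoiding conflicts by construction.

def shared_path(rank: int, member: int) -> str:
    # Old framework: all processes write into one shared directory.
    return f"shared/history.mem{member:03d}.pe{rank:06d}"

def rank_path(rank: int, member: int) -> str:
    # New framework: one private "rank directory" per process.
    return f"rank{rank:06d}/history.mem{member:03d}"
```

<p>With the rank layout, no two processes ever touch the same directory, so directory-level metadata operations never collide.</p>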
      <p>Table 1 summarizes the experimental setup: the resolutions, the ensemble
sizes, the numbers of processes, and so forth. The NCEP PREPBUFR observation
data set (available
at <uri>http://rda.ucar.edu/datasets/ds337.0</uri>)
was assimilated into the model results. Data thinning was applied on a
112 km mesh, and the same total number of observations (50 000 per 6 h on
average) was used in all experiments. Covariance localization was applied
using a Gaussian function with a horizontal scale of 400 km and a vertical scale of 0.2
<inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="normal">ln</mml:mi><mml:mo>(</mml:mo><mml:mi>p</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> represents pressure. Note that
the simulation with the 28 km mesh employed a more sophisticated cloud
microphysics scheme.</p>
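<p>A minimal sketch of such a Gaussian localization weight, assuming the common form exp(−0.5 (d/L)²) with the scales quoted above; the exact functional form and cutoff used in the LETKF implementation may differ:</p>

```python
import math

def localization_weight(dist_km: float, dlnp: float,
                        l_h: float = 400.0, l_v: float = 0.2) -> float:
    """Combined horizontal/vertical Gaussian localization factor.

    dist_km : horizontal distance between grid point and observation (km)
    dlnp    : vertical separation in ln(pressure)
    l_h, l_v: localization scales (400 km and 0.2 ln(p) as in the text)
    """
    r2 = (dist_km / l_h) ** 2 + (dlnp / l_v) ** 2
    return math.exp(-0.5 * r2)
```

<p>Observations far outside the localization scales receive near-zero weight, which is what bounds the number of observations each local analysis must handle.</p>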
      <p>Figure 4 shows the breakdown of the elapsed time for a DA cycle in the case
with 256 members. The blue bar shows NICAM, whereas the green and red
bars show StoO and the LETKF, respectively. The shaded part represents the
time taken for communication and I/O. As a reference, we confirmed that the
112 km mesh experiment took comparable times for model simulation and data
assimilation when using 2560 processes. The computation times for
StoO and the LETKF increased fourfold in the 56 km mesh experiments, as
shown in Fig. 4. This was reasonable in light of the increase in horizontal
resolution. In contrast, the time required for the simulation increased
almost eightfold. If we halve the grid spacing, we also have to halve <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:math></inline-formula>,
so the number of simulation steps doubles. At higher resolution, the
simulation therefore incurred a longer execution time than the data assimilation.
Thus, we need to increase the number of computation nodes to shorten the elapsed
time. For example, as shown in Fig. 4, we increased the number of nodes
fourfold. Although we expected a fourfold reduction in time, the old
framework could not attain an effective reduction in data assimilation time due to
the bottleneck in its I/O and communication components. By
contrast, the proposed framework yielded scalability in terms of computation,
I/O, and communication. In particular, a significant reduction was observed in
the time needed for StoO. In this study, multiple time slots of the
observations and the model output were used for the StoO calculation following
Miyoshi et al. (2010). Thus, the input data size in StoO was 7 times larger
than that in the LETKF. The improvement in I/O throughput largely contributed
to the performance gain in StoO.</p>
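<p>The resolution-scaling argument above can be written out as a back-of-envelope cost model (our own illustration, not code from NICAM-LETKF): halving the grid spacing quadruples the number of horizontal grid points and, via the constraint on Δt, doubles the number of time steps, so simulation cost grows eightfold while per-cycle DA cost grows only fourfold.</p>

```python
def simulation_cost_factor(refine: int) -> int:
    """Simulation cost growth after `refine` halvings of the grid spacing."""
    cells = 4 ** refine   # horizontal grid points quadruple per halving
    steps = 2 ** refine   # time steps double per halving (dt is halved)
    return cells * steps  # equals 8 ** refine

def da_cost_factor(refine: int) -> int:
    """DA cost grows with grid points only (observation count held fixed)."""
    return 4 ** refine
```

<p>This gap is why the simulation, not the DA step, dominates elapsed time as resolution increases, and why more nodes are needed for the simulation component.</p>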

      <?xmltex \floatpos{t}?><fig id="Ch1.F4"><caption><p>The time taken by NICAM–LETKF on the old and new frameworks.</p></caption>
        <?xmltex \igopts{width=236.157874pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/9/2293/2016/gmd-9-2293-2016-f04.pdf"/>

      </fig>

      <p>From the viewpoint of computational efficiency, we obtained 36 TFLOPS (tera
floating-point operations per second) of sustained performance in a DA cycle
with 10 240 processes, corresponding to 81 920 cores. The ratio of
computation time to total time improved from 0.44 in the conventional
framework to 0.76 in the proposed framework. From Fig. 4, we see that the
computation time of the LETKF in the proposed framework increased in
comparison with that in the old framework. This is because of load imbalance
in the proposed framework. The number of observations used for data
assimilation at each grid point depends on the localization radius in the LETKF
and the spatial homogeneity of the observational data. The conventional
framework can avoid this imbalance by shuffling grid points among all nodes
(Miyoshi et al., 2010). However, the proposed framework cannot avoid it
because each process manages its own spatially localized region. In other words,
the proposed framework reduces data movement at the cost of load balancing.
No satellite observations were used in this study. The number of observations
will increase by 1 or 2 orders of magnitude if we use satellite observations.
When high-resolution data assimilation is conducted with frequent
assimilation cycles (e.g., a 3 h assimilation window), the
inhomogeneity of the observations becomes larger and the load-balancing issue
more critical. There is a trade-off between computational load balancing and
data movement. It is worth considering the balancing technique described by
Hamrud et al. (2015). On the other hand, we argue that speeding up the LETKF
computation is easier than speeding up global communication across
massive numbers of nodes. The ratio of floating-point operations to memory accesses in
the LETKF analysis is large, so this type of calculation is expected to become
faster with performance enhancements in future processors. We should select the
best method of load balancing according to the number of nodes used, the
number of observations, the analysis method used in the LETKF, and the
performance of the computer system.</p>
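<p>The two decompositions discussed above can be contrasted with a toy assignment of grid points to processes (function names are ours, not from the NICAM-LETKF code): round-robin shuffling spreads observation-dense regions over all processes at the cost of moving data, while contiguous blocks keep data local but inherit whatever observation density their region happens to have.</p>

```python
def shuffle_assignment(n_grid: int, n_proc: int) -> list[list[int]]:
    """Conventional framework: grid points dealt round-robin to processes,
    balancing the observation load at the cost of global data movement."""
    return [list(range(p, n_grid, n_proc)) for p in range(n_proc)]

def block_assignment(n_grid: int, n_proc: int) -> list[list[int]]:
    """Proposed framework: each process keeps one contiguous local region
    (n_grid assumed divisible by n_proc for simplicity)."""
    size = n_grid // n_proc
    return [list(range(p * size, (p + 1) * size)) for p in range(n_proc)]
```

<p>With blocks, a process whose region contains a dense observing network does strictly more work than its neighbors; with shuffling, that work is diluted across all processes.</p>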
      <p>Figure 5 shows the elapsed time for one DA cycle for all experiments listed
in Table 1. The experiments at every resolution yielded satisfactory
scalability. This suggests that the new framework provides
effective procedures for high-resolution and large-ensemble experiments on
massively parallel computers.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F5"><caption><p>The time taken by NICAM–LETKF on the new framework.</p></caption>
        <?xmltex \igopts{width=213.395669pt}?><graphic xlink:href="https://gmd.copernicus.org/articles/9/2293/2016/gmd-9-2293-2016-f05.pdf"/>

      </fig>

</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <title>Summary and discussion</title>
      <p>In this paper, we proposed a framework that maintains data locality and
maximizes the throughput of file I/O between the simulation model and the
ensemble DA system. Each process manages its data on a local disk. Such
separated parallel I/O is effective not only for read/write operations but also for
access to the metadata server. To reduce communication time, we replaced the
global communication of grid data with smaller group communication. The
movement of data is strongly related to energy consumption as well as
computational cost. Our approach is based on the concept of reducing the size
and distance of data movement in the entire system. We assessed the performance
of our framework on the K computer. Since the K computer is constructed with a
distributed FS and a three-dimensional mesh/torus network, it is not clear whether
the approach proposed in this paper is effective with other FSs and inter-node
network topologies. However, the underlying concept – that minimizing data
movement leads to better computational performance – will hold for most
other supercomputer systems. This suggests that the cooperative design and
development of the model and the DA system are necessary for optimization.</p>
      <p>To improve I/O throughput for temporary disk storage, large HPC
systems are increasingly equipped with node-local high-speed disks or non-volatile
memory, such as solid-state drives (SSDs) (e.g., Catalyst at the Lawrence
Livermore National Laboratory and TSUBAME2.5 at the Tokyo Institute of
Technology). We propose using these “buffer disks” in future HPC systems. Each
process occupies the local disk of its own node, and access collisions can hence
be avoided. We can also use the main memory as buffer storage, depending on the
problem size. If memory is sufficient, data can be exchanged between the
simulation model and the LETKF without any disk I/O. If only a limited
number of ensemble members can be executed at the same time, we should use
disk storage, which can hold more than one set of output files. The resilience
of the DA cycle is also an important issue. It is
better to back up analysis data during the cycle. We can copy the data as a
background process from the local FS to the global FS when the data files are
not busy. If the DA cycle is stopped due to a node failure, we can restart it
using the latest data in the global FS.</p>
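<p>The buffer-storage choice described above amounts to a simple policy; a hypothetical sketch (names and threshold logic are illustrative, not part of the framework):</p>

```python
def choose_buffer(data_bytes: int, free_mem_bytes: int) -> str:
    """Pick the exchange medium for model/LETKF data: main memory when the
    full set of exchanged files fits, otherwise the node-local buffer disk,
    which can also hold more than one set of output files."""
    return "memory" if data_bytes <= free_mem_bytes else "local_disk"
```

<p>A real implementation would additionally account for the number of members executed concurrently and reserve headroom for the model itself.</p>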
      <p>The observation-space data files used in this study were not distributed
because these files were relatively small (e.g., <inline-formula><mml:math display="inline"><mml:mo>&lt;</mml:mo></mml:math></inline-formula> 1 MB for observation
data, <inline-formula><mml:math display="inline"><mml:mo>&lt;</mml:mo></mml:math></inline-formula> 26 MB for sim-to-obs data with 256 members). However, the
amount of observational data continues to increase. For example, multi-channel
satellite instruments, such as the Atmospheric Infrared Sounder (AIRS,
Aumann et al., 2003), provide massive amounts of data over large areas. The
Himawari-8 geostationary satellite generates approximately 50 times more data
than its predecessor. Several hundred megabytes of observation data and
several gigabytes of data converted by StoO are then used in each assimilation
cycle. A node of a massively parallel supercomputer does not have sufficient
memory to store these amounts of observation data, and the time required by the
master node to read such volumes of data will become a bottleneck. We thus
need to consider dividing the observation data and applying parallel I/O to them.
The number of observations required by each process varies
according to the number of divided parts of the simulation data and the
spatial distribution of the observations. Each process of the StoO program applies
the conversion within its assigned area. By contrast, each process of the LETKF
requires data converted by multiple StoO processes,
according to the spatial localization range. A data-exchange library,
such as a MapReduce framework, would be effective for handling such many-to-many
relationships. To further increase the speed of data assimilation systems in
the future, preprocessing of the observation data, such as dividing,
grouping, and quality checking, will be incorporated into our framework.</p>
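<p>The many-to-many regrouping between StoO areas and LETKF localization regions can be sketched as a MapReduce-style shuffle (keys and record shapes are illustrative, not the actual file format):</p>

```python
from collections import defaultdict

def regroup(obs_records: list[tuple[int, float]],
            regions_for_area: dict[int, list[int]]) -> dict[int, list[float]]:
    """Route each (area, value) record produced by a StoO process to every
    LETKF region whose localization range overlaps that area."""
    grouped: dict[int, list[float]] = defaultdict(list)
    for area, value in obs_records:
        for region in regions_for_area.get(area, []):
            grouped[region].append(value)
    return dict(grouped)
```

<p>In a distributed setting, the dictionary keys would become the shuffle keys of the exchange library, so each LETKF process receives exactly the converted observations it needs.</p>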
</sec>
<sec id="Ch1.S6">
  <title>Code availability</title>
      <p>Information concerning NICAM can be found at <uri>http://nicam.jp/</uri>. The
source code for NICAM can be obtained upon request (see
<uri>http://nicam.jp/hiki/?Research+Collaborations</uri>). The source code for the
LETKF is open source and available
at <uri>https://code.google.com/archive/p/miyoshi/</uri>. The previous version of
the NICAM-LETKF DA system was based on this LETKF code and the NICAM.13 tag
version of the NICAM code. The new version of the DA system proposed in this
study is based on tag version NICAM.15, which includes LETKF
code.</p>
</sec>

      
      </body>
    <back><ack><title>Acknowledgements</title><p>The authors are grateful to the editors of Geoscientific Model Development
and anonymous reviewers. This work was partially funded by MEXT's program for
the Development and Improvement of Next-generation Ultra High-Speed Computer
System, under its Subsidies for Operating the Specific Advanced Large
Research Facilities. The experiments were performed using the K computer at
the RIKEN Advanced Institute for Computational Science. This study was partly
supported by JAXA/PMM.<?xmltex \hack{\newline}?><?xmltex \hack{\newline}?> Edited by: J. Annan</p></ack><ref-list>
    <title>References</title>

      <ref id="bib1.bib1"><label>1</label><mixed-citation>Aumann, H. H., Chahine, M. T., Gautier, C., Goldberg, M. D., Kalnay, E.,
McMillin, L. M., Revercomb, H., Rosenkranz, P. W., Smith, W. L., Staelin, D.
H., Strow, L. L., and Susskind, J.: AIRS/AMSU/HSB on the Aqua mission:
Design, science objectives, data products, and processing systems, IEEE T.
Geosci. Remote Sens., 41, 253–264, <ext-link xlink:href="http://dx.doi.org/10.1109/TGRS.2002.808356" ext-link-type="DOI">10.1109/TGRS.2002.808356</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bib2"><label>2</label><mixed-citation>Bishop, C. H., Etherton, B. J., and Majumdar, S. J.: Adaptive sampling with
the Ensemble Transform Kalman Filter. Part I: Theoretical aspects,
<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0493(2001)129&lt;0420:ASWTET&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0493(2001)129&lt;0420:ASWTET&gt;2.0.CO;2</ext-link>,
2001.</mixed-citation></ref>
      <ref id="bib1.bib3"><label>3</label><mixed-citation>Clayton, A. M., Lorenc, A. C., and Barker, D. M.: Operational implementation
of a hybrid ensemble/4D-Var global data assimilation system at the Met
Office, Q. J. Roy. Meteor. Soc., 139, 1445–1461, <ext-link xlink:href="http://dx.doi.org/10.1002/qj.2054" ext-link-type="DOI">10.1002/qj.2054</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib4"><label>4</label><mixed-citation>Dennis, J. M., Edwards, J., Loy, R., and Jacob, R.: An application-level
parallel I/O library for Earth system models, Int. J. High Perform. C., 26,
43–53, <ext-link xlink:href="http://dx.doi.org/10.1177/1094342011428143" ext-link-type="DOI">10.1177/1094342011428143</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib5"><label>5</label><mixed-citation>Dennis, J. M., Vertenstein, M., Worley, P. H., Mirin, A. A., Craig, A. P.,
Jacob, R., and Mickelson, S.: Computational performance of
ultra-high-resolution capability in the Community Earth System Model, Int. J.
High Perform. C., 26, 5–16, <ext-link xlink:href="http://dx.doi.org/10.1177/1094342012436965" ext-link-type="DOI">10.1177/1094342012436965</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib6"><label>6</label><mixed-citation>Evensen, G.: Sequential data assimilation with a nonlinear quasi-geostrophic
model using Monte Carlo methods to forecast error statistics, J. Geophys.
Res.-Atmos., 99, 10143–10162, <ext-link xlink:href="http://dx.doi.org/10.1029/94JC00572" ext-link-type="DOI">10.1029/94JC00572</ext-link>, 1994.</mixed-citation></ref>
      <ref id="bib1.bib7"><label>7</label><mixed-citation>Evensen, G.: The Ensemble Kalman Filter: Theoretical formulation and
practical implementation, Ocean Dynam., 53, 343–367,
<ext-link xlink:href="http://dx.doi.org/10.1007/s10236-003-0036-9" ext-link-type="DOI">10.1007/s10236-003-0036-9</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bib8"><label>8</label><mixed-citation>Hazeleger, W., Severijns, C., Semmler, T., Ştefănescu, S., Yang, S.,
Wang, X., Wyser, K., Dutra, E., Baldasano, J. M., Bintanja, R., Bougeault,
P., Caballero, R., Ekman, A. M. L., Christensen, J. H., van den Hurk, B.,
Jimenez, P., Jones, C., Kållberg, P., Koenigk, T., McGrath, R., Miranda,
P., Van Noije, T., Palmer, T., Parodi, J. A., Schmith, T., Selten, F.,
Storelvmo, T., Sterl, A., Tapamo, H., Vancoppenolle, M., Viterbo, P., and
Willén, U.: EC-Earth: A seamless earth-system prediction approach in
action, B. Am. Meteorol. Soc., 91, 1357–1363, <ext-link xlink:href="http://dx.doi.org/10.1175/2010BAMS2877.1" ext-link-type="DOI">10.1175/2010BAMS2877.1</ext-link>,
2010.</mixed-citation></ref>
      <ref id="bib1.bib9"><label>9</label><mixed-citation>Hamrud, M., Bonavita, M., and Isaksen, L.: EnKF and Hybrid Gain Ensemble Data
Assimilation. Part I: EnKF Implementation, Mon. Weather Rev., 143,
4847–4864, <ext-link xlink:href="http://dx.doi.org/10.1175/MWR-D-14-00333.1" ext-link-type="DOI">10.1175/MWR-D-14-00333.1</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib10"><label>10</label><mixed-citation>Houtekamer, P. L., He, B., and Mitchell, H. L.: Parallel Implementation of an
Ensemble Kalman Filter, Mon. Weather Rev., 142, 1163–1182,
<ext-link xlink:href="http://dx.doi.org/10.1175/MWR-D-13-00011.1" ext-link-type="DOI">10.1175/MWR-D-13-00011.1</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib11"><label>11</label><mixed-citation>Hunt, B. R., Kostelich, E. J., and Szunyogh, I.: Efficient data assimilation
for spatiotemporal chaos: A local ensemble transform Kalman filter, Physica
D, 230, 112–126, <ext-link xlink:href="http://dx.doi.org/10.1016/j.physd.2006.11.008" ext-link-type="DOI">10.1016/j.physd.2006.11.008</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bib12"><label>12</label><mixed-citation>Kodama, C., Yamada, Y., Noda, A. T., Kikuchi, K., Kajikawa, Y., Nasuno, T.,
Tomita, T., Yamaura, T., Takahashi, H. G., Hara, M., Kawatani, Y., Satoh, M.,
and Sugi, M.: A 20-year climatology of a NICAM AMIP-type simulation, J.
Meteorol. Soc. Jpn., 93, 393–424, <ext-link xlink:href="http://dx.doi.org/10.2151/jmsj.2015-024" ext-link-type="DOI">10.2151/jmsj.2015-024</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib13"><label>13</label><mixed-citation>Kondo, K. and Tanaka, H. L.: Applying the local ensemble transform Kalman
filter to the Nonhydrostatic Icosahedral Atmospheric Model (NICAM), SOLA, 5,
121–124, <ext-link xlink:href="http://dx.doi.org/10.2151/sola.2009-031" ext-link-type="DOI">10.2151/sola.2009-031</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bib14"><label>14</label><mixed-citation>Lorenc, A. C.: Analysis methods for numerical weather prediction, Q. J. Roy.
Meteor. Soc., 112, 1177–1194, <ext-link xlink:href="http://dx.doi.org/10.1002/qj.49711247414" ext-link-type="DOI">10.1002/qj.49711247414</ext-link>, 1986.</mixed-citation></ref>
      <ref id="bib1.bib15"><label>15</label><mixed-citation>Madden, R. A. and Julian, P. R.: Description of global-scale circulation
cells in the Tropics with a 40–50 day period, J. Atmos. Sci.,
<ext-link xlink:href="http://dx.doi.org/10.1175/1520-0469(1972)029&lt;1109:DOGSCC&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0469(1972)029&lt;1109:DOGSCC&gt;2.0.CO;2</ext-link>,
1972.</mixed-citation></ref>
      <ref id="bib1.bib16"><label>16</label><mixed-citation>Miura, H., Satoh, M., Nasuno, T., Noda, A. T., and Oouchi, K.: A
Madden–Julian oscillation event realistically simulated by a global
cloud-resolving model, Science, 318, 1763–1765, <ext-link xlink:href="http://dx.doi.org/10.1126/science.1148443" ext-link-type="DOI">10.1126/science.1148443</ext-link>,
2007.</mixed-citation></ref>
      <ref id="bib1.bib17"><label>17</label><mixed-citation>Miyakawa, T., Satoh, M., Miura, H., Tomita, H., Yashiro, H., Noda, A. T.,
Yamada, Y., Kodama, C., Kimoto, M., and Yoneyama, K.: Madden–Julian
oscillation prediction skill of a new-generation global model demonstrated
using a supercomputer, Nat. Commun., 5, 3769, <ext-link xlink:href="http://dx.doi.org/10.1038/ncomms4769" ext-link-type="DOI">10.1038/ncomms4769</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib18"><label>18</label><mixed-citation>Miyamoto, Y., Kajikawa, Y., Yoshida, R., Yamaura, T., Yashiro, H., and
Tomita, H.: Deep moist atmospheric convection in a subkilometer global
simulation, Geophys. Res. Lett., 40, 4922–4926, <ext-link xlink:href="http://dx.doi.org/10.1002/grl.50944" ext-link-type="DOI">10.1002/grl.50944</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib19"><label>19</label><mixed-citation>
Miyoshi, T.: Ensemble Kalman filter experiments with a primitive-equation global model, PhD dissertation, University of Maryland, 197 pp., 2005.</mixed-citation></ref>
      <ref id="bib1.bib20"><label>20</label><mixed-citation>Miyoshi, T. and Kunii, M.: The Local Ensemble Transform Kalman filter with
the weather research and forecasting model: Experiments with real
observations, Pure Appl. Geophys., 169, 321–333,
<ext-link xlink:href="http://dx.doi.org/10.1007/s00024-011-0373-4" ext-link-type="DOI">10.1007/s00024-011-0373-4</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib21"><label>21</label><mixed-citation>Miyoshi, T. and Yamane, S.: Local Ensemble Transform Kalman filtering with an
AGCM at a T159/L48 resolution, Mon. Weather Rev., 135, 3841–3861,
<ext-link xlink:href="http://dx.doi.org/10.1175/2007MWR1873.1" ext-link-type="DOI">10.1175/2007MWR1873.1</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bib22"><label>22</label><mixed-citation>Miyoshi, T., Sato, Y., and Kadowaki, T.: Ensemble Kalman filter and 4D-Var
intercomparison with the Japanese operational global analysis and prediction
system, Mon. Weather Rev., 138, 2846–2866, <ext-link xlink:href="http://dx.doi.org/10.1175/2010MWR3209.1" ext-link-type="DOI">10.1175/2010MWR3209.1</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bib23"><label>23</label><mixed-citation>Miyoshi, T., Kondo, K., and Imamura, T.: The 10,240-member ensemble Kalman
filtering with an intermediate AGCM, Geophys. Res. Lett., 41, 5264–5271,
<ext-link xlink:href="http://dx.doi.org/10.1002/2014GL060863" ext-link-type="DOI">10.1002/2014GL060863</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib24"><label>24</label><mixed-citation>Miyoshi, T., Kondo, K., and Terasaki, K.: Big Ensemble Data Assimilation in
Numerical Weather Prediction, Computer, 48, 15–21, <ext-link xlink:href="http://dx.doi.org/10.1109/MC.2015.332" ext-link-type="DOI">10.1109/MC.2015.332</ext-link>,
2015.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bib25"><label>25</label><mixed-citation>
Ohfuchi, W., Nakamura, H., and Yoshioka, M. K.: 10-km mesh meso-scale resolving
simulations of the global atmosphere on the Earth Simulator: Preliminary outcomes of AFES (AGCM for the Earth Simulator), J. Earth Simul., 1, 8–34, 2004.</mixed-citation></ref>
      <ref id="bib1.bib26"><label>26</label><mixed-citation>Ott, E., Hunt, B. R., Szunyogh, I., Zimin, A. V., Kostelich, E. J., Corazza,
M., Kalnay, E., Patil, D. J., and Yorke, J. A.: A local ensemble Kalman
filter for atmospheric data assimilation, Tellus A, 56, 415–428,
<ext-link xlink:href="http://dx.doi.org/10.3402/tellusa.v56i5.14462" ext-link-type="DOI">10.3402/tellusa.v56i5.14462</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bib27"><label>27</label><mixed-citation>Satoh, M., Tomita, H., Yashiro, H., Miura, H., Kodama, C., Seiki, T., Noda,
A. T., Yamada, Y., Goto, D., Sawada, M., Miyoshi, T.,
Niwa, Y., Hara, M., Ohno, T., Iga, S.-I., Arakawa, T., Inoue, T., and
Kubokawa, H.: The Non-hydrostatic Icosahedral Atmospheric Model: Description
and development, Prog. Earth Planet. Sci., 1, 1–32,
<ext-link xlink:href="http://dx.doi.org/10.1186/s40645-014-0018-1" ext-link-type="DOI">10.1186/s40645-014-0018-1</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib28"><label>28</label><mixed-citation>Skamarock, W. C., Klemp, J. B., Dudhia, J., Gill, D. O., Barker, D. M., Wang,
W., and Powers, J. G.: A description of the Advanced Research WRF Version 2,
NCAR Tech Notes-468<inline-formula><mml:math display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula>STR, 2005.</mixed-citation></ref>
      <ref id="bib1.bib29"><label>29</label><mixed-citation>Stuhne, G. R. and Peltier, W. R.: New icosahedral grid-point discretizations
of the shallow water equations on the sphere, J. Comput. Phys., 148, 23–58,
<ext-link xlink:href="http://dx.doi.org/10.1006/jcph.1998.6119" ext-link-type="DOI">10.1006/jcph.1998.6119</ext-link>, 1999.</mixed-citation></ref>
      <ref id="bib1.bib30"><label>30</label><mixed-citation>Terasaki, K., Sawada, M., and Miyoshi, T.: Local Ensemble Transform Kalman
filter experiments with the Nonhydrostatic Icosahedral Atmospheric Model
NICAM, SOLA, 11, 23–26, <ext-link xlink:href="http://dx.doi.org/10.2151/sola.2015-006" ext-link-type="DOI">10.2151/sola.2015-006</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib31"><label>31</label><mixed-citation>Tomita, H.: A global cloud-resolving simulation: Preliminary results from an
aqua planet experiment, Geophys. Res. Lett., 32, L08805,
<ext-link xlink:href="http://dx.doi.org/10.1029/2005gl022459" ext-link-type="DOI">10.1029/2005gl022459</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bib32"><label>32</label><mixed-citation>
Tomita, H., Satoh, M., and Goto, K.: An optimization of the icosahedral grid
modified by spring dynamics, J. Comput. Phys., 183, 307–331, 2002.</mixed-citation></ref>
      <ref id="bib1.bib33"><label>33</label><mixed-citation>Tomita, H., Goto, K., and Satoh, M.: A new approach to atmospheric general
circulation model: Global cloud resolving model NICAM and its computational
performance, SIAM J. Sci. Comput., 30, 2755–2776, <ext-link xlink:href="http://dx.doi.org/10.1137/070692273" ext-link-type="DOI">10.1137/070692273</ext-link>,
2008.</mixed-citation></ref>

  </ref-list><app-group content-type="float"><app><title/>

    </app></app-group></back>
    <!--<article-title-html>Performance evaluation of a throughput-aware framework for ensemble data
assimilation: the case of NICAM-LETKF</article-title-html>
<abstract-html><p class="p">In this paper, we propose the design and implementation of an ensemble data
assimilation (DA) framework for weather prediction at a high resolution and
with a large ensemble size. We consider the deployment of this framework on
the data throughput of file input/output (I/O) and multi-node communication.
As an instance of the application of the proposed framework, a local ensemble
transform Kalman filter (LETKF) was used with a Non-hydrostatic Icosahedral
Atmospheric Model (NICAM) for the DA system. Benchmark tests were performed
using the K computer, a massive parallel supercomputer with distributed file
systems. The results showed an improvement in total time required for the
workflow as well as satisfactory scalability of up to 10 K nodes (80 K
cores). With regard to high-performance computing systems, where data
throughput performance increases at a slower rate than computational
performance, our new framework for ensemble DA systems promises drastic
reduction of total execution time.</p></abstract-html>
<ref-html id="bib1.bib1"><label>1</label><mixed-citation>
Aumann, H. H., Chahine, M. T., Gautier, C., Goldberg, M. D., Kalnay, E.,
McMillin, L. M., Revercomb, H., Rosenkranz, P. W., Smith, W. L., Staelin, D.
H., Strow, L. L., and Susskind, J.: AIRS/AMSU/HSB on the Aqua mission:
Design, science objectives, data products, and processing systems, IEEE T.
Geosci. Remote Sens., 41, 253–264, <a href="http://dx.doi.org/10.1109/TGRS.2002.808356" target="_blank">doi:10.1109/TGRS.2002.808356</a>, 2003.
</mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>2</label><mixed-citation>
Bishop, C. H., Etherton, B. J., and Majumdar, S. J.: Adaptive sampling with
the Ensemble Transform Kalman Filter. Part I: Theoretical aspects,
<a href="http://dx.doi.org/10.1175/1520-0493(2001)129&lt;0420:ASWTET&gt;2.0.CO;2" target="_blank">doi:10.1175/1520-0493(2001)129&lt;0420:ASWTET&gt;2.0.CO;2</a>,
2001.
</mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>3</label><mixed-citation>
Clayton, A. M., Lorenc, A. C., and Barker, D. M.: Operational implementation
of a hybrid ensemble/4D-Var global data assimilation system at the Met
Office, Q. J. Roy. Meteor. Soc., 139, 1445–1461, <a href="http://dx.doi.org/10.1002/qj.2054" target="_blank">doi:10.1002/qj.2054</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>4</label><mixed-citation>
Dennis, J. M., Edwards, J., Loy, R., and Jacob, R.: An application-level
parallel I/O library for Earth system models, Int. J. High Perform. C., 26,
43–53, <a href="http://dx.doi.org/10.1177/1094342011428143" target="_blank">doi:10.1177/1094342011428143</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>5</label><mixed-citation>
Dennis, J. M., Vertenstein, M., Worley, P. H., Mirin, A. A., Craig, A. P.,
Jacob, R., and Mickelson, S.: Computational performance of
ultra-high-resolution capability in the Community Earth System Model, Int. J.
High Perform. C., 26, 5–16, <a href="http://dx.doi.org/10.1177/1094342012436965" target="_blank">doi:10.1177/1094342012436965</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>6</label><mixed-citation>
Evensen, G.: Sequential data assimilation with a nonlinear quasi-geostrophic
model using Monte Carlo methods to forecast error statistics, J. Geophys.
Res.-Atmos., 99, 10143–10162, <a href="http://dx.doi.org/10.1029/94JC00572" target="_blank">doi:10.1029/94JC00572</a>, 1994.
</mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>7</label><mixed-citation>
Evensen, G.: The Ensemble Kalman Filter: Theoretical formulation and
practical implementation, Ocean Dynam., 53, 343–367,
<a href="http://dx.doi.org/10.1007/s10236-003-0036-9" target="_blank">doi:10.1007/s10236-003-0036-9</a>, 2003.
</mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>8</label><mixed-citation>
Hazeleger, W., Severijns, C., Semmler, T., Ştefănescu, S., Yang, S.,
Wang, X., Wyser, K., Dutra, E., Baldasano, J. M., Bintanja, R., Bougeault,
P., Caballero, R., Ekman, A. M. L., Christensen, J. H., van den Hurk, B.,
Jimenez, P., Jones, C., Kållberg, P., Koenigk, T., McGrath, R., Miranda,
P., Van Noije, T., Palmer, T., Parodi, J. A., Schmith, T., Selten, F.,
Storelvmo, T., Sterl, A., Tapamo, H., Vancoppenolle, M., Viterbo, P., and
Willén, U.: EC-Earth: A seamless earth-system prediction approach in
action, B. Am. Meteorol. Soc., 91, 1357–1363, <a href="http://dx.doi.org/10.1175/2010BAMS2877.1" target="_blank">doi:10.1175/2010BAMS2877.1</a>,
2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>9</label><mixed-citation>
Hamrud, M., Bonavita, M., and Isaksen, L.: EnKF and Hybrid Gain Ensemble Data
Assimilation. Part I: EnKF Implementation, Mon. Weather Rev., 143,
4847–4864, <a href="http://dx.doi.org/10.1175/MWR-D-14-00333.1" target="_blank">doi:10.1175/MWR-D-14-00333.1</a>, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>10</label><mixed-citation>
Houtekamer, P. L., He, B., and Mitchell, H. L.: Parallel Implementation of an
Ensemble Kalman Filter, Mon. Weather Rev., 142, 1163–1182,
<a href="http://dx.doi.org/10.1175/MWR-D-13-00011.1" target="_blank">doi:10.1175/MWR-D-13-00011.1</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>11</label><mixed-citation>
Hunt, B. R., Kostelich, E. J., and Szunyogh, I.: Efficient data assimilation
for spatiotemporal chaos: A local ensemble transform Kalman filter, Physica
D, 230, 112–126, <a href="http://dx.doi.org/10.1016/j.physd.2006.11.008" target="_blank">doi:10.1016/j.physd.2006.11.008</a>, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>12</label><mixed-citation>
Kodama, C., Yamada, Y., Noda, A. T., Kikuchi, K., Kajikawa, Y., Nasuno, T.,
Tomita, T., Yamaura, T., Takahashi, H. G., Hara, M., Kawatani, Y., Satoh, M.,
and Sugi, M.: A 20-year climatology of a NICAM AMIP-type simulation, J. Meteorol.
Soc. Jpn., 93, 393–424, <a href="http://dx.doi.org/10.2151/jmsj.2015-024" target="_blank">doi:10.2151/jmsj.2015-024</a>, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>13</label><mixed-citation>
Kondo, K. and Tanaka, H. L.: Applying the local ensemble transform Kalman
filter to the Nonhydrostatic Icosahedral Atmospheric Model (NICAM), SOLA, 5,
121–124, <a href="http://dx.doi.org/10.2151/sola.2009-031" target="_blank">doi:10.2151/sola.2009-031</a>, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>14</label><mixed-citation>
Lorenc, A. C.: Analysis methods for numerical weather prediction, Q. J. Roy.
Meteor. Soc., 112, 1177–1194, <a href="http://dx.doi.org/10.1002/qj.49711247414" target="_blank">doi:10.1002/qj.49711247414</a>, 1986.
</mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>15</label><mixed-citation>
Madden, R. A. and Julian, P. R.: Description of global-scale circulation
cells in the Tropics with a 40–50 day period, J. Atmos. Sci., 29, 1109–1123,
<a href="http://dx.doi.org/10.1175/1520-0469(1972)029&lt;1109:DOGSCC&gt;2.0.CO;2" target="_blank">doi:10.1175/1520-0469(1972)029&lt;1109:DOGSCC&gt;2.0.CO;2</a>,
1972.
</mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>16</label><mixed-citation>
Miura, H., Satoh, M., Nasuno, T., Noda, A. T., and Oouchi, K.: A
Madden–Julian oscillation event realistically simulated by a global
cloud-resolving model, Science, 318, 1763–1765, <a href="http://dx.doi.org/10.1126/science.1148443" target="_blank">doi:10.1126/science.1148443</a>,
2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>17</label><mixed-citation>
Miyakawa, T., Satoh, M., Miura, H., Tomita, H., Yashiro, H., Noda, A. T.,
Yamada, Y., Kodama, C., Kimoto, M., and Yoneyama, K.: Madden–Julian
oscillation prediction skill of a new-generation global model demonstrated
using a supercomputer, Nat. Commun., 5, 3769, <a href="http://dx.doi.org/10.1038/ncomms4769" target="_blank">doi:10.1038/ncomms4769</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>18</label><mixed-citation>
Miyamoto, Y., Kajikawa, Y., Yoshida, R., Yamaura, T., Yashiro, H., and
Tomita, H.: Deep moist atmospheric convection in a subkilometer global
simulation, Geophys. Res. Lett., 40, 4922–4926, <a href="http://dx.doi.org/10.1002/grl.50944" target="_blank">doi:10.1002/grl.50944</a>, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>19</label><mixed-citation>
Miyoshi, T.: Ensemble Kalman filter experiments with a primitive-equation global model, PhD dissertation, University of Maryland, 197 pp., 2005.
</mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>20</label><mixed-citation>
Miyoshi, T. and Kunii, M.: The Local Ensemble Transform Kalman filter with
the weather research and forecasting model: Experiments with real
observations, Pure Appl. Geophys., 169, 321–333,
<a href="http://dx.doi.org/10.1007/s00024-011-0373-4" target="_blank">doi:10.1007/s00024-011-0373-4</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>21</label><mixed-citation>
Miyoshi, T. and Yamane, S.: Local Ensemble Transform Kalman filtering with an
AGCM at a T159/L48 resolution, Mon. Weather Rev., 135, 3841–3861,
<a href="http://dx.doi.org/10.1175/2007MWR1873.1" target="_blank">doi:10.1175/2007MWR1873.1</a>, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>22</label><mixed-citation>
Miyoshi, T., Sato, Y., and Kadowaki, T.: Ensemble Kalman filter and 4D-Var
intercomparison with the Japanese operational global analysis and prediction
system, Mon. Weather Rev., 138, 2846–2866, <a href="http://dx.doi.org/10.1175/2010MWR3209.1" target="_blank">doi:10.1175/2010MWR3209.1</a>, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>23</label><mixed-citation>
Miyoshi, T., Kondo, K., and Imamura, T.: The 10,240-member ensemble Kalman
filtering with an intermediate AGCM, Geophys. Res. Lett., 41, 5264–5271,
<a href="http://dx.doi.org/10.1002/2014GL060863" target="_blank">doi:10.1002/2014GL060863</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>24</label><mixed-citation>
Miyoshi, T., Kondo, K., and Terasaki, K.: Big Ensemble Data Assimilation in
Numerical Weather Prediction, Computer, 48, 15–21, <a href="http://dx.doi.org/10.1109/MC.2015.332" target="_blank">doi:10.1109/MC.2015.332</a>,
2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>25</label><mixed-citation>
Ohfuchi, W., Nakamura, H., and Yoshioka, M. K.: 10-km mesh meso-scale resolving
simulations of the global atmosphere on the Earth Simulator: Preliminary outcomes of AFES (AGCM for the Earth Simulator), J. Earth Simul., 1, 8–34, 2004.
</mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>26</label><mixed-citation>
Ott, E., Hunt, B. R., Szunyogh, I., Zimin, A. V., Kostelich, E. J., Corazza,
M., Kalnay, E., Patil, D. J., and Yorke, J. A.: A local ensemble Kalman
filter for atmospheric data assimilation, Tellus A, 56, 415–428,
<a href="http://dx.doi.org/10.3402/tellusa.v56i5.14462" target="_blank">doi:10.3402/tellusa.v56i5.14462</a>, 2004.
</mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>27</label><mixed-citation>
Satoh, M., Tomita, H., Yashiro, H., Miura, H., Kodama, C., Seiki, T., Noda,
A. T., Yamada, Y., Goto, D., Sawada, M., Miyoshi, T.,
Niwa, Y., Hara, M., Ohno, T., Iga, S.-I., Arakawa, T., Inoue, T., and
Kubokawa, H.: The Non-hydrostatic Icosahedral Atmospheric Model: Description
and development, Progress in Earth and Planetary Science, 1, 1–32,
<a href="http://dx.doi.org/10.1186/s40645-014-0018-1" target="_blank">doi:10.1186/s40645-014-0018-1</a>, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>28</label><mixed-citation>
Skamarock, W. C., Klemp, J. B., Dudhia, J., Gill, D. O., Barker, D. M., Wang,
W., and Powers, J. G.: A description of the Advanced Research WRF Version 2,
NCAR Tech. Note NCAR/TN-468+STR, 2005.
</mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>29</label><mixed-citation>
Stuhne, G. R. and Peltier, W. R.: New icosahedral grid-point discretizations
of the shallow water equations on the sphere, J. Comput. Phys., 148, 23–58,
<a href="http://dx.doi.org/10.1006/jcph.1998.6119" target="_blank">doi:10.1006/jcph.1998.6119</a>, 1999.
</mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>30</label><mixed-citation>
Terasaki, K., Sawada, M., and Miyoshi, T.: Local Ensemble Transform Kalman
filter experiments with the Nonhydrostatic Icosahedral Atmospheric Model
NICAM, SOLA, 11, 23–26, <a href="http://dx.doi.org/10.2151/sola.2015-006" target="_blank">doi:10.2151/sola.2015-006</a>, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>31</label><mixed-citation>
Tomita, H.: A global cloud-resolving simulation: Preliminary results from an
aqua planet experiment, Geophys. Res. Lett., 32, L08805,
<a href="http://dx.doi.org/10.1029/2005gl022459" target="_blank">doi:10.1029/2005gl022459</a>, 2005.
</mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>32</label><mixed-citation>
Tomita, H., Satoh, M., and Goto, K.: An optimization of the icosahedral grid
modified by spring dynamics, J. Comput. Phys., 183, 307–331, 2002.
</mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>33</label><mixed-citation>
Tomita, H., Goto, K., and Satoh, M.: A new approach to atmospheric general
circulation model: Global cloud resolving model NICAM and its computational
performance, SIAM J. Sci. Comput., 30, 2755–2776, <a href="http://dx.doi.org/10.1137/070692273" target="_blank">doi:10.1137/070692273</a>,
2008.
</mixed-citation></ref-html></article>
