TriCCo v1.0.0 – a cubulationbased method for computing connected components on triangular grids
 ^{1}Department of Meteorology and Geophysics, University of Vienna, Austria
 ^{2}Institute for Algebra and Geometry, Department of Mathematics, OttovonGuericke University, Magdeburg, Germany
 ^{3}Institute of Meteorology and Climate Research  Department Troposphere Research Karlsruhe Institute of Technology, Germany
 ^{1}Department of Meteorology and Geophysics, University of Vienna, Austria
 ^{2}Institute for Algebra and Geometry, Department of Mathematics, OttovonGuericke University, Magdeburg, Germany
 ^{3}Institute of Meteorology and Climate Research  Department Troposphere Research Karlsruhe Institute of Technology, Germany
Abstract. We present a new method to identify connected components on a triangular grid. Triangular grids are, for example, used in atmosphere and climate models to discretize the horizontal dimension. Because they are unstructured, neighbor relations are not selfevident and identifying connected components is challenging. Our method addresses this challenge by involving the mathematical tool of cubulation. We show that cubulation allows one to map the 2d cells of the triangular grid onto the vertices of the 3d cells of a cubic grid. The latter is structured and so connected components can be readily identified on the cubic grid by previously developed software packages. An advantage is that the cubulation, i.e., the mapping between the triangular and cubic grids, needs to be computed only once, which should be benifical for analysing many data fields for the same grid.We further implement our method in a python package that we name TriCCo and that is made available via pypi and gitlab. We document the package, demonstrate its application using cloud data from the ICON atmosphere model, and characterize its computational performance. This shows that TriCCo is ready for triangular grids with 100,000 cells, but that its speed and memory requirements need to be improved to analyse larger grids.
Aiko Voigt et al.
Status: closed

RC1: 'Comment on gmd2021349', Anonymous Referee #1, 23 Dec 2021
General comments:
This is a wellwritten manuscript with beautiful figures. I enjoyed reading it
and learning about cubulation.One major issue puzzles me. I apologize if I've just missed something obvious.
The content of this review focuses on this issue. If I've missed something
obvious, simply explain and that will be sufficient. If my point is substantial,
then I'd like the revised manuscript to state clearly at the outset that
alternative, much more efficient, methods exist to solve the problem of finding
connected components of a triangulation.Again, the text itself is well written, and even if I'm right that the
particular problem of finding connected components is much more efficiently done
using standard graphtheoretic means, I suggest the text be published to
document this cubulation procedure in case it is of use in other contexts.Specific comments:
1. According to Table 2, the most expensive part of the workflow by a large
factor is the cubulation: hours vs seconds or minutes for the other steps. Given
this cost, why is it better to cubulate the triangulation and then run a
connectedcomponents algorithm on the cubulation than simply to compute the
connected components of the triangulation directly, using any of a number of
standard approaches?A standard approach is something like this, with details dependent on setting.
Represent the triangulation by
* vertex positions: an nvertex x ndim matrix P and
* triangles: an ntriangle x 3 matrix T of indices into P.
From this representation, one can construct a triangletriangle adjacency matrix
G = (V,E) in time linear in the number of objects, with edges E determined by an
adjacency rule, e.g. edge or vertex adjacency. (In the edgerule case, the
resulting graph is the dual graph defined in Defn. 2.1.) I want to emphasize
that because this procedure has linear time, computing adjacency data from the
raw triangulation is extremely fast, likely about as fast as the fastest entries
in Table 2.Now consider the cloud segmentation task. Each node v in V corresponds to a
triangle. In the cloud segmentation task, remove v and its edges from G if the
cloud fraction is below the threshold. One can then use a standard algorithm,
such as breadthfirst search, to find components in the resulting pruned graph.
Each of these operations is again linear in the number of objects. Thus, here
again this operation would be about as fast as the fastest entries in Table 2.
See, e.g., https://en.wikipedia.org/wiki/Component_(graph_theory).It is not clear to me what the computational complexity of your cubulation
implementation is, nor what the optimal complexity is. But for the former, we
can estimate from Table 2 as follows. The cost factor increase when going from
20km to 10km (4x more triangles) is 180min/12min = 15, just short of the 4^2 =
16 that would imply quadratic increase in cost. On the other hand, the factor is
9.6 when going from 40km to 20km: still superlinear (i.e., > 4) but less than
quadratic. It's possible a constant cost dominates at lower resolution, and the
15x in the 20>10 jump shows that an asymptotic quadratic cost is eventually
exposed.In any case, it is clear that cubulation is expensive.
Thus, I pose to you this question: Why should we prefer cubulation followed by
computing components rather than computing the components directly from the
triangletriangle adjacency graph?2. As a smaller issue, the figures and text imply the triangulation needs to be
geometrically regular so that the index scheme works. Is this a true constraint?
That is, will this method not work for geometrically unstructured triangular
grids? The graphbased method described above does not require regular
triangulations, nor for that matter a fixed type of polygon.Typos and grammar:
50: "we first introduce...and [to] rigorously define"
Fig. 2: "in [+the] same colors"
300: "their cube coordinates differ[s]"  RC2: 'Comment on gmd2021349', Anonymous Referee #2, 03 Apr 2022

AC1: 'Comment on gmd2021349', Aiko Voigt, 19 Jun 2022
Answer letter to the reviewers' comments
Aiko Voigt on behalf of all authors
First of all, we would like to thank both reviewers for their sincere interest in our work and their very helpful remarks, which have motivated us to substantially revise the paper. We are confident that our revision answers the comments in a satisfying manner, and hope that our revised paper will be acceptable for publication in GMD. In particular, we have adapted the introduction and the conclusion (now named discussion and conclusion, section 6) to better carve out the motivation for our new method, and to characterize its strength and limitations. We have also added a new section 5 to compare TriCCo to alternative methods based on graphs. We have also made some smaller editorial changes to increase readability, and have changed figure 11 so that the connected components are now ordered by size (which we think makes the figure somewhat easier to read). When submitting the revised manuscript we intend to also submit a trackedchanges version (assuming this is possible at GMD) to help the reviewers to quickly spot our text changes. In the following, we describe our changes in response to each reviewer comment in more detail.
Reviewer 1:
General comment: "One major issue puzzles me. I apologize if I've just missed something obvious. The content of this review focuses on this issue. If I've missed something obvious, simply explain and that will be sufficient. If my point is substantial, then I'd like the revised manuscript to state clearly at the outset that alternative, much more efficient, methods exist to solve the problem of finding connected components of a triangulation. Again, the text itself is well written, and even if I'm right that the particular problem of finding connected components is much more efficiently done using standard graphtheoretic means, I suggest the text be published to document this cubulation procedure in case it is of use in other contexts."
Answer: Thank you, we agree that other graph based approaches exist. This is now clearly stated in the introduction, and also has motivated us to include the new section 5. In this new section, we compare TriCCo in particular to a graphbased approach using the NetworkX library. We find that notwithstanding the cubulation step, which is expensive, the actual connected component labeling is faster in TriCCo. However, this comparison remains limited, and in the reworked discussion and conclusion section we now point out this limitation. We also would like to stress that while the cubulation step is expensive, its relative cost is slow as it only needs to be done once for a given grid; it also seems the cubulation step could be accelerated relatively easily by someone with more computer science expertise. These points are made clear in several places of the revised manuscript (e.g., section 6, abstract).
Specific comment 1: "1. According to Table 2, the most expensive part of the workflow by a large factor is the cubulation."
Answer: As written above, it is important to remember that the cubulation needs to be done only once, and so its relative cost decreases strongly as more time steps or simulations with the same grid are analyzed.
Specific comment 2: "As a smaller issue, the figures and text imply the triangulation needs to be geometrically regular so that the index scheme works. Is this a true constraint? That is, will this method not work for geometrically unstructured triangular grids?"
Answer: Yes, the reviewer is right here. Our method cannot handle general triangulations of the sphere, as the construction of the hyperplanes requires that each vertex has six triangles. We now state this limitation in the conclusion sections.
Typos: Thanks! We have corrected them, as well as others that we found during the revision.
Reviewer 2:
Comment: "While I enjoyed reading through the manuscript, the authors are not particularly convincing about how their algorithm is an improvement among other simpler and more efficient methods."
Answer: Yes, this was a weak point of our original manuscript. We have reworked the introduction to better describe why we have developed TriCCo and why we believe it represents a useful addition to the connected component labeling problem. The latter point is also made stronger in the discussion and conclusion section. Furthermore we have added a new section 5 to compare TriCCo to graphbased methods using breadthfirst search. In this section we find that TriCCo compares well, but we also point out the limitations of this comparison in the discussion and conclusion section (section 6).
Comment on "Connectivity": "The authors seem to engage in a bit of circular logic to justify their approach: “Because [triangular meshes] are unstructured, neighbor relations are not selfevident and identifying connected components is challenging. Our method addresses this challenge by involving the mathematical tool of cubulation.” But the method itself doesn’t actually address this challenge since the connectivity information needs to be known a priori to build the cubulation."
Answer: We believe this was a misunderstanding due to our unclear writing. Of course the adjacency information must be available also on the triangular grid. The advantage of the cubic grid is that the adjacency information is directly encoded in the cube indices. This is now written clearly in the revised introduction section and the revised discussion and conclusion section.
Comment on "Connected Components": "While the authors have clearly shown that the cubulation can be used to identify connected components, no significant effort has been made to compare and contrast their approach with other simpler, robust and efficient algorithms from the literature."
Answer: Thanks, we agree that this was a shortcoming of our paper. In the revision we have added a new section 5 to attempt such a comparison, even though we admit in the discussion in section 6 that our comparison might be biased by the choice of external libraries (cc3d versus NetworkX). As a consequence, we stress that in section 6 that cannot and do not intend to claim that TriCCo beats existing algorithms, but rather that our message is that that TriCCo works with reasonable speed (apart from the cubulation step) and provides a new approach to connected component labeling on triangular grids." We have included He et al. (2017) as a reference.
Comment on "Robustness": "While the authors emphasize early on that their study is focused on unstructured meshes, the only examples given in the text are structured triangular meshes (i.e., isometric grids consisting of equilateral triangles). Is this algorithm applicable to common refined triangular grids used in atmospheric science, such as those depicted in Figure 1?"
Answer: Thank you, this is an important point. Our method can be quite easily extended to the global ICON grid and to regional refinements, but general triangulations cannot be handled. We now devote a new paragraph in the discussion and conclusion section (section 6) to clarify this point.
Changes independent of the reviewer comments:
Fig. 11: We have now sorted the connected components according to their size so that the largest connected component corresponds to the color with value 1.
Benchmarking: We have redone the benchmarks on the new DKRZ Levante system. The previously used Mistral system was decommissioned in May 2022. The numbers do not change substantially.
In section 3.3., L318322 were an old spurious textblock that we now removed. Also, the caption of Fig. 8 was changed as "make_cubical_coordinates" was an old outdated function name and actually needs to read "compute_cubulation".
The TriCCo version for the revised paper will be 1.1.0. The zenodo archive will be updated accordingly, as well as the pypi package.
Status: closed

RC1: 'Comment on gmd2021349', Anonymous Referee #1, 23 Dec 2021
General comments:
This is a wellwritten manuscript with beautiful figures. I enjoyed reading it
and learning about cubulation.One major issue puzzles me. I apologize if I've just missed something obvious.
The content of this review focuses on this issue. If I've missed something
obvious, simply explain and that will be sufficient. If my point is substantial,
then I'd like the revised manuscript to state clearly at the outset that
alternative, much more efficient, methods exist to solve the problem of finding
connected components of a triangulation.Again, the text itself is well written, and even if I'm right that the
particular problem of finding connected components is much more efficiently done
using standard graphtheoretic means, I suggest the text be published to
document this cubulation procedure in case it is of use in other contexts.Specific comments:
1. According to Table 2, the most expensive part of the workflow by a large
factor is the cubulation: hours vs seconds or minutes for the other steps. Given
this cost, why is it better to cubulate the triangulation and then run a
connectedcomponents algorithm on the cubulation than simply to compute the
connected components of the triangulation directly, using any of a number of
standard approaches?A standard approach is something like this, with details dependent on setting.
Represent the triangulation by
* vertex positions: an nvertex x ndim matrix P and
* triangles: an ntriangle x 3 matrix T of indices into P.
From this representation, one can construct a triangletriangle adjacency matrix
G = (V,E) in time linear in the number of objects, with edges E determined by an
adjacency rule, e.g. edge or vertex adjacency. (In the edgerule case, the
resulting graph is the dual graph defined in Defn. 2.1.) I want to emphasize
that because this procedure has linear time, computing adjacency data from the
raw triangulation is extremely fast, likely about as fast as the fastest entries
in Table 2.Now consider the cloud segmentation task. Each node v in V corresponds to a
triangle. In the cloud segmentation task, remove v and its edges from G if the
cloud fraction is below the threshold. One can then use a standard algorithm,
such as breadthfirst search, to find components in the resulting pruned graph.
Each of these operations is again linear in the number of objects. Thus, here
again this operation would be about as fast as the fastest entries in Table 2.
See, e.g., https://en.wikipedia.org/wiki/Component_(graph_theory).It is not clear to me what the computational complexity of your cubulation
implementation is, nor what the optimal complexity is. But for the former, we
can estimate from Table 2 as follows. The cost factor increase when going from
20km to 10km (4x more triangles) is 180min/12min = 15, just short of the 4^2 =
16 that would imply quadratic increase in cost. On the other hand, the factor is
9.6 when going from 40km to 20km: still superlinear (i.e., > 4) but less than
quadratic. It's possible a constant cost dominates at lower resolution, and the
15x in the 20>10 jump shows that an asymptotic quadratic cost is eventually
exposed.In any case, it is clear that cubulation is expensive.
Thus, I pose to you this question: Why should we prefer cubulation followed by
computing components rather than computing the components directly from the
triangletriangle adjacency graph?2. As a smaller issue, the figures and text imply the triangulation needs to be
geometrically regular so that the index scheme works. Is this a true constraint?
That is, will this method not work for geometrically unstructured triangular
grids? The graphbased method described above does not require regular
triangulations, nor for that matter a fixed type of polygon.Typos and grammar:
50: "we first introduce...and [to] rigorously define"
Fig. 2: "in [+the] same colors"
300: "their cube coordinates differ[s]"  RC2: 'Comment on gmd2021349', Anonymous Referee #2, 03 Apr 2022

AC1: 'Comment on gmd2021349', Aiko Voigt, 19 Jun 2022
Answer letter to the reviewers' comments
Aiko Voigt on behalf of all authors
First of all, we would like to thank both reviewers for their sincere interest in our work and their very helpful remarks, which have motivated us to substantially revise the paper. We are confident that our revision answers the comments in a satisfying manner, and hope that our revised paper will be acceptable for publication in GMD. In particular, we have adapted the introduction and the conclusion (now named discussion and conclusion, section 6) to better carve out the motivation for our new method, and to characterize its strength and limitations. We have also added a new section 5 to compare TriCCo to alternative methods based on graphs. We have also made some smaller editorial changes to increase readability, and have changed figure 11 so that the connected components are now ordered by size (which we think makes the figure somewhat easier to read). When submitting the revised manuscript we intend to also submit a trackedchanges version (assuming this is possible at GMD) to help the reviewers to quickly spot our text changes. In the following, we describe our changes in response to each reviewer comment in more detail.
Reviewer 1:
General comment: "One major issue puzzles me. I apologize if I've just missed something obvious. The content of this review focuses on this issue. If I've missed something obvious, simply explain and that will be sufficient. If my point is substantial, then I'd like the revised manuscript to state clearly at the outset that alternative, much more efficient, methods exist to solve the problem of finding connected components of a triangulation. Again, the text itself is well written, and even if I'm right that the particular problem of finding connected components is much more efficiently done using standard graphtheoretic means, I suggest the text be published to document this cubulation procedure in case it is of use in other contexts."
Answer: Thank you, we agree that other graph based approaches exist. This is now clearly stated in the introduction, and also has motivated us to include the new section 5. In this new section, we compare TriCCo in particular to a graphbased approach using the NetworkX library. We find that notwithstanding the cubulation step, which is expensive, the actual connected component labeling is faster in TriCCo. However, this comparison remains limited, and in the reworked discussion and conclusion section we now point out this limitation. We also would like to stress that while the cubulation step is expensive, its relative cost is slow as it only needs to be done once for a given grid; it also seems the cubulation step could be accelerated relatively easily by someone with more computer science expertise. These points are made clear in several places of the revised manuscript (e.g., section 6, abstract).
Specific comment 1: "1. According to Table 2, the most expensive part of the workflow by a large factor is the cubulation."
Answer: As written above, it is important to remember that the cubulation needs to be done only once, and so its relative cost decreases strongly as more time steps or simulations with the same grid are analyzed.
Specific comment 2: "As a smaller issue, the figures and text imply the triangulation needs to be geometrically regular so that the index scheme works. Is this a true constraint? That is, will this method not work for geometrically unstructured triangular grids?"
Answer: Yes, the reviewer is right here. Our method cannot handle general triangulations of the sphere, as the construction of the hyperplanes requires that each vertex has six triangles. We now state this limitation in the conclusion sections.
Typos: Thanks! We have corrected them, as well as others that we found during the revision.
Reviewer 2:
Comment: "While I enjoyed reading through the manuscript, the authors are not particularly convincing about how their algorithm is an improvement among other simpler and more efficient methods."
Answer: Yes, this was a weak point of our original manuscript. We have reworked the introduction to better describe why we have developed TriCCo and why we believe it represents a useful addition to the connected component labeling problem. The latter point is also made stronger in the discussion and conclusion section. Furthermore we have added a new section 5 to compare TriCCo to graphbased methods using breadthfirst search. In this section we find that TriCCo compares well, but we also point out the limitations of this comparison in the discussion and conclusion section (section 6).
Comment on "Connectivity": "The authors seem to engage in a bit of circular logic to justify their approach: “Because [triangular meshes] are unstructured, neighbor relations are not selfevident and identifying connected components is challenging. Our method addresses this challenge by involving the mathematical tool of cubulation.” But the method itself doesn’t actually address this challenge since the connectivity information needs to be known a priori to build the cubulation."
Answer: We believe this was a misunderstanding due to our unclear writing. Of course the adjacency information must be available also on the triangular grid. The advantage of the cubic grid is that the adjacency information is directly encoded in the cube indices. This is now written clearly in the revised introduction section and the revised discussion and conclusion section.
Comment on "Connected Components": "While the authors have clearly shown that the cubulation can be used to identify connected components, no significant effort has been made to compare and contrast their approach with other simpler, robust and efficient algorithms from the literature."
Answer: Thanks, we agree that this was a shortcoming of our paper. In the revision we have added a new section 5 to attempt such a comparison, even though we admit in the discussion in section 6 that our comparison might be biased by the choice of external libraries (cc3d versus NetworkX). As a consequence, we stress that in section 6 that cannot and do not intend to claim that TriCCo beats existing algorithms, but rather that our message is that that TriCCo works with reasonable speed (apart from the cubulation step) and provides a new approach to connected component labeling on triangular grids." We have included He et al. (2017) as a reference.
Comment on "Robustness": "While the authors emphasize early on that their study is focused on unstructured meshes, the only examples given in the text are structured triangular meshes (i.e., isometric grids consisting of equilateral triangles). Is this algorithm applicable to common refined triangular grids used in atmospheric science, such as those depicted in Figure 1?"
Answer: Thank you, this is an important point. Our method can be quite easily extended to the global ICON grid and to regional refinements, but general triangulations cannot be handled. We now devote a new paragraph in the discussion and conclusion section (section 6) to clarify this point.
Changes independent of the reviewer comments:
Fig. 11: We have now sorted the connected components according to their size so that the largest connected component corresponds to the color with value 1.
Benchmarking: We have redone the benchmarks on the new DKRZ Levante system. The previously used Mistral system was decommissioned in May 2022. The numbers do not change substantially.
In section 3.3., L318322 were an old spurious textblock that we now removed. Also, the caption of Fig. 8 was changed as "make_cubical_coordinates" was an old outdated function name and actually needs to read "compute_cubulation".
The TriCCo version for the revised paper will be 1.1.0. The zenodo archive will be updated accordingly, as well as the pypi package.
Aiko Voigt et al.
Model code and software
TriCCo v1.0.0 python package Aiko Voigt, Petra Schwer, Noam von Rotberg, Nicole Knopf https://gitlab.phaidra.org/climate/tricco
Aiko Voigt et al.
Viewed
HTML  XML  Total  BibTeX  EndNote  

942  120  23  1,085  12  9 
 HTML: 942
 PDF: 120
 XML: 23
 Total: 1,085
 BibTeX: 12
 EndNote: 9
Viewed (geographical distribution)
Country  #  Views  % 

Total:  0 
HTML:  0 
PDF:  0 
XML:  0 
 1