the Creative Commons Attribution 4.0 License.
A Unified System for Evaluating, Ranking and Clustering in Diverse Scientific Domains
Abstract. Evaluating, ranking, and clustering (ERC) are fundamental tasks in scientific research, each requiring a mathematical foundation. This study presents an ERC system anchored in the CCHZ-DISO (Chen, Chen, Hu, and Zhou-Distance between Indices of Simulation and Observation) system. Previous research underscores the optimality achieved by the CCHZ-DISO system (Hu et al., 2022). Since the inception of the CCHZ-DISO series of research by Hu et al. (2019), DISO has found extensive applications across various domains including geography, hydrology, and economics. Analogous to the CCHZ-DISO system's construction, the ERC system employs the Euclidean distance to perform evaluating, ranking, and clustering tasks. Furthermore, illustrative examples are provided to elucidate the application of the ERC system. The ERC system unifies the evaluating, ranking, and clustering tasks in one simple equation, which is more flexible and simpler than existing systems. It is expected to find wider application than CCHZ-DISO in diverse scientific domains.
Status: open (until 08 Aug 2024)
RC1: 'Comment on gmd-2024-82', Anonymous Referee #1, 21 Jun 2024
This study designed a tool for Evaluation, Ranking, and Clustering (ERC) tasks based on Euclidean distance. The main issues are as follows:
- What is the importance and necessity of using a common mathematical framework to complete Evaluation, Ranking, and Clustering (ERC) tasks from a scientific perspective? The discussion in this research is not sufficient.
- The method established in this paper essentially uses Euclidean distance for evaluation, ranking, and clustering. Have there been any evaluation methods based on Euclidean distance for the three fields mentioned above? This study lacks a comprehensive review and summary of existing methods.
- What are the advantages of the method proposed in this study compared to existing methods? What is the framework for evaluating different methods? And how can the superiority of this method be scientifically proven? There is still insufficient work in this study to address the aforementioned issues.
Citation: https://doi.org/10.5194/gmd-2024-82-RC1
AC1: 'Reply on RC1', Zengyun Hu, 23 Jun 2024
Comment 1: What is the importance and necessity of using a common mathematical framework to complete Evaluation, Ranking, and Clustering (ERC) tasks from a scientific perspective? The discussion in this research is not sufficient.
Reply: Thank you for the suggestion. In general, a common mathematical model can be applied across many scientific domains because it captures the essential laws governing how things change. Such models are especially advantageous in interdisciplinary areas.
Existing models and approaches for evaluation, ranking, and clustering typically focus on specific research areas. In essence, they can all be unified by a common mathematical model because they share the same Euclidean distance characteristics.
Moreover, the ERC system proposed in our study has several advantages over existing evaluation, ranking, and clustering models, as illustrated in our manuscript. It is therefore necessary and urgent to unify the many complex existing models for evaluation, ranking, and clustering in a common mathematical model: the ERC model/ERC system.
Comment 2: The method established in this paper essentially uses Euclidean distance for evaluation, ranking, and clustering. Have there been any evaluation methods based on Euclidean distance for the three fields mentioned above? This study lacks a comprehensive review and summary of existing methods.
Reply: Thank you for the suggestion. We also searched for methods that address all three fields of evaluation, ranking, and clustering. Unfortunately, no unified method for the three fields exists. That is why we propose our ERC system.
Previous approaches, such as the Taylor diagram (Taylor, 2001), Nash-Sutcliffe efficiency (NSE) (Nash and Sutcliffe, 1970), and the Kling-Gupta efficiency coefficient (KGE) (Gupta et al., 2009), employ only a limited set of statistical metrics, and their applications focus on specific research areas. They are not widely applied across disciplines.
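For readers unfamiliar with the two efficiency scores named above, the following is a minimal sketch of their standard textbook definitions, not code from the manuscript. The function names `pearson_r`, `nse`, and `kge` are hypothetical, and population standard deviations are assumed for the variability ratio. Note that KGE is itself one minus a Euclidean distance from the ideal point (1, 1, 1).

```python
import math
from statistics import fmean, pstdev


def pearson_r(obs, sim):
    """Pearson correlation coefficient between two equal-length sequences."""
    mo, ms = fmean(obs), fmean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors) / (observed sum of squares)."""
    mo = fmean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    """Kling-Gupta efficiency in the Gupta et al. (2009) form."""
    r = pearson_r(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)  # variability ratio (assumed: population std)
    beta = fmean(sim) / fmean(obs)     # bias ratio
    # One minus the Euclidean distance from the ideal point (1, 1, 1).
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

Both scores equal 1 for a perfect simulation; a simulation with a constant offset keeps r and alpha at 1, so KGE is penalized only through the bias ratio.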
References
Gupta, H., Kling, H., Yilmaz, K., and Martinez, G., Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling, Journal of Hydrology, 377, 80-91, doi:10.1016/j.jhydrol.2009.08.003, 2009.
Nash, J. E. and Sutcliffe, J. V., River flow forecasting through conceptual models. Part I. A discussion of principles, Journal of Hydrology, 10, 282-290, 1970.
Taylor, K. Summarizing multiple aspects of model performance in a single diagram. Journal of Geophysical Research, 106, 7183-7192, 2001.
Comment 3: What are the advantages of the method proposed in this study compared to existing methods? What is the framework for evaluating different methods? And how can the superiority of this method be scientifically proven? There is still insufficient work in this study to address the aforementioned issues.
Reply: Thank you for the suggestions. The ERC system proposed in this manuscript continues our series of research on DISO (Distance between Indices of Simulation and Observation) (Hu et al., 2019, 2022; Zhou et al., 2021). ERC not only retains all the advantages of DISO but also extends it to ranking and clustering. The advantages of the ERC system are as follows.
1 The dimension of ERC can range from one to infinity, which makes it more flexible and simpler than existing systems.
2 It can include any statistical metric, rather than only the specific metrics used in other models.
3 It handles multiple variables and multiple weights in all three fields.
The framework for evaluating different methods is Euclidean distance.
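The idea of evaluating and ranking by Euclidean distance can be sketched as follows. This is an illustration under stated assumptions, not the manuscript's equation: the function names, the choice of three normalized metrics (correlation, absolute error, RMSE) with reference point (1, 0, 0), the placement of the weights inside the square root, and all numbers are hypothetical.

```python
import math


def distance_to_reference(metrics, reference, weights=None):
    """Weighted Euclidean distance between a metric vector and a reference point.

    Works for any number of dimensions, including one.
    Assumption: weights multiply the squared differences.
    """
    if weights is None:
        weights = [1.0] * len(metrics)
    if not (len(metrics) == len(reference) == len(weights)):
        raise ValueError("metrics, reference and weights must have equal length")
    return math.sqrt(
        sum(w * (m - r) ** 2 for m, r, w in zip(metrics, reference, weights))
    )


def rank_models(distances):
    """Return model names ordered from smallest to largest distance."""
    return sorted(distances, key=distances.get)


# Hypothetical normalized metrics per model: (correlation, abs. error, RMSE).
reference = (1.0, 0.0, 0.0)  # a perfect model: r = 1, no error
models = {
    "model_a": (0.9, 0.1, 0.2),
    "model_b": (0.6, 0.3, 0.4),
    "model_c": (0.8, 0.0, 0.6),
}
distances = {name: distance_to_reference(m, reference) for name, m in models.items()}
ranking = rank_models(distances)  # closest to the reference point first
```

A smaller distance means the model sits closer to the reference point, so sorting by distance yields the ranking directly; adding or removing a metric only changes the length of the vectors.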
Third-party evaluations of DISO give an objective and impartial view of ERC's advantages, as follows.
Since its initial publication, CCHZ-DISO has gained significant traction and seen widespread application, with over 100 citations in the Web of Science within the past three years. In this section, we showcase a selection of notable objective and positive evaluations from third-party sources, underscoring the impact and utility of the CCHZ-DISO system.
Kalmar et al. (2021) first recommended the DISO index for assessing historical regional precipitation simulations with the RegCM4.5 model. They compared DISO and Taylor diagram methods and found that “The advantage of using DISO versus the Taylor diagram is that the comprehensive performances of the different models are still not quantified by the latter”. In a sensitivity analysis of soil water and heat transfer parameters in community land surface models, Deng et al. (2021) introduced DISO, as discussed in section 2.3 of their paper published in the “Journal of Advances in Modeling Earth Systems”. Moreover, they declared that “In this paper, the most important and best advantage of DISO is that after normalizing the observed and simulated data, the value of DISO can express the performance of the same model at different sites”.
Wu et al. (2023) suggested that DISO has more advantages than Taylor diagrams, noting that “DISO overcomes some disadvantages of Taylor diagrams and provides an intuitive way to measure differences between various GCMs in the same assessment system”; additionally, the limitations of Taylor diagrams were discussed: “Taylor diagrams (Taylor 2001) are the most common way to assess the performance of climate products. However, they inevitably have inherent drawbacks…. The DISO (Hu et al. 2019) algorithm was designed to overcome the drawbacks that exist in Taylor diagrams. First, DISO has a higher dimensionality than Taylor diagrams… In addition, DISO can evaluate the performance of a model based on multiple metrics at the same time, which can reflect the performance of the model in different aspects”. The comprehensive performance of the CCHZ-DISO has been explored in many studies (Zhuang et al., 2023; Liu et al., 2022; Longo-Minnolo et al., 2022; Ma et al., 2022; Yin et al., 2022).
The third-party evaluations unequivocally indicate that, in comparison to Taylor diagrams, CCHZ-DISO exhibits superior advantages. It stands as an efficient and highly effective approach for the comprehensive quantification of performance across diverse models, providing a holistic assessment of their overall capabilities.
References:
Deng, M., Meng, X. and Lu, Y., et al., 2021, Impact and Sensitivity Analysis of Soil Water and Heat Transfer Parameterizations in Community Land Surface Model on the Tibetan Plateau, Journal of Advances in Modeling Earth Systems, 13, e2021MS002670.
Kalmar, T., Pieczka, H. and Pongracz, R., 2021, A sensitivity analysis of the different setups of the RegCM 4.5 model for the Carpathian region, International Journal of Climatology, 41, E1180-E1201.
Hu, Z., Chen, X. and Zhou, Q., et al., 2019, DISO: A rethink of Taylor diagram. International Journal of Climatology, 39(5), 2825-2832.
Liu, Z., Huang, J. and Xiao, X., et al., 2022, The capability of CMIP6 models on seasonal precipitation extremes over Central Asia, Atmospheric Research, 278, 106364.
Longo-Minnolo, G., Vanella, D. and Consoli, S., et al., 2022, Assessing the use of ERA5-Land reanalysis and spatial interpolation methods for retrieving precipitation estimates at basin scale, Atmospheric Research, 271, 106131.
Ma, R., Xiao, J. and Liang, S., 2022, Pixel-level parameter optimization of a terrestrial biosphere model for improving estimation of carbon fluxes with an efficient model-data fusion method and satellite-derived LAI and GPP data, Geoscientific Model Development, 15, 6637-6657.
Taylor, K., 2001, Summarizing multiple aspects of model performance in a single diagram. Journal of Geophysical Research, 106, 7183-7192.
Citation: https://doi.org/10.5194/gmd-2024-82-AC1
RC2: 'Comment on gmd-2024-82', Anonymous Referee #2, 25 Jun 2024
The manuscript proposed an ERC system anchored in the CCHZ-DISO (Chen, Chen, Hu, and Zhou-Distance between Indices of Simulation and Observation) system. Analogous to the CCHZ-DISO system's construction, the ERC system employs the Euclidean distance to perform evaluating, ranking, and clustering tasks. Furthermore, three examples are provided to elucidate the application of the ERC system. In general, the structure of the manuscript is clear, and the expression is smooth. However, the biggest issue with the article is a lack of innovation. Readers will wonder what is truly new and what is particularly special. As one of the principal criteria in peer review, the scientific significance of this manuscript is Poor. It is crucial for the authors to provide a thorough justification for the novelty of their work and to clearly differentiate it from existing research and systems.
- The manuscript claims to present a unified system for evaluating, ranking, and clustering (ERC) based on the CCHZ-DISO system. The novelty would need to be assessed in terms of whether the integration of these three tasks into one system is indeed a new approach or if similar systems have been proposed before.
- The use of Euclidean distance as the mathematical foundation for the ERC tasks is presented as a simple yet potentially powerful method. The innovation here could be in how it is applied across different scientific domains, but it needs to be clear if this application is unique.
- The paper suggests that the ERC system is versatile and can be applied to various scientific fields. The innovation might lie in its adaptability, but it is important to scrutinize whether this cross-domain applicability is truly novel or if other systems have demonstrated similar flexibility. The manuscript posits that the ERC system simplifies complex tasks. Innovation could be in the simplification process itself, but it must be determined if this simplification is indeed novel or if it is a reiteration of existing methods. In section 5, it would be best to have a table comparing existing similar systems, as well as their advantages and disadvantages.
- To assert the novelty of the ERC system, it would be necessary to compare it with existing evaluation, ranking, and clustering methods. The manuscript should clearly articulate why and how the ERC system is different and potentially superior to these methods. In applications in section 6, the evaluation results should include comparisons with other systems. Or, at least, comparisons with other similar assessment systems should be discussed.
- If the manuscript introduces new theoretical concepts or frameworks that underpin the ERC system, these should be clearly outlined and compared with existing theories to highlight their novelty. Similarly, if the innovation lies in the practical application of the system, such as improved efficiency, accuracy, or ease of use, these benefits should be clearly demonstrated and compared with the existing state of the art. The manuscript should provide a clear explanation of why the approach taken is original, including any unique methodologies, algorithms, or data uses that have not been explored in previous research.
- In regard to the applicability of DISO, this paper should especially clarify that the majority of dimensionless measurement indices targeting multiple objectives (or different physical quantities) tend to create many ineffective search spaces or paths, rendering them less suitable for fields such as bionic or meta-heuristic artificial intelligence algorithms.
- The authors mentioned the flexibility in selecting statistical metrics for the ERC system. However, it would be helpful to provide guidance on the criteria for selecting these metrics in the context of geoscientific applications, where data characteristics can be quite varied.
- The discussion on the comparison between NSE, KGE, and CCHZ-DISO is appreciated. However, it would be advantageous to include a more comprehensive comparison with other prevalent methods in geosciences, such as the Taylor diagram (and is DISO's advantage that it is simpler?), to better position the ERC system within the field.
- The paper briefly touches on the significance testing for models with small differences in ERC values. It would be valuable to see a more detailed explanation of how this testing is conducted and its implications for geoscientific studies, where subtle differences can be critical.
- The authors state that the ERC system does not consider data characteristics such as outliers. Given the common presence of outliers in geoscience data, it is important to discuss how the ERC system's results might be affected and whether any adjustments or preprocessing are recommended.
- Technical corrections, i.e.,
In Line 340, repeated “clustering”
Figure 3, the position of its reference is inaccurate, and all abbreviations of models should be given their full names, as well as brief introductions of different models, for broader reading.
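The outlier concern raised in the list above can be made concrete with a small sketch. It is an illustration only: the data are hypothetical, and RMSE is used here simply as an example of an error metric that could enter a Euclidean-distance score; the manuscript's own metrics and any recommended preprocessing are not specified in this discussion.

```python
import math


def rmse(obs, sim):
    """Root-mean-square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


# Hypothetical data: the simulation is off by exactly 1 at every point.
obs = [10.0, 12.0, 11.0, 13.0, 12.0]
sim = [11.0, 13.0, 12.0, 14.0, 13.0]
clean = rmse(obs, sim)

# Same data, but the last observation is replaced by a spurious spike.
obs_outlier = obs[:-1] + [62.0]
spiked = rmse(obs_outlier, sim)

# Ratio showing how much a single bad value inflates the error metric.
inflation = spiked / clean
```

Because the error is squared before averaging, one spurious observation dominates the metric, and any distance built from that metric inherits the distortion.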
Citation: https://doi.org/10.5194/gmd-2024-82-RC2
Viewed
HTML | PDF | XML | Total | Supplement | BibTeX | EndNote |
---|---|---|---|---|---|---|
199 | 34 | 18 | 251 | 18 | 7 | 7 |