Evaluation


Task 1 assessment: Segmentation results for isolated structures will be assessed with two metrics:

  • volumetric Dice similarity coefficient (DSC): measures the overlap between prediction and ground truth as twice the shared voxel volume divided by the sum of the two volumes, providing a global measure of segmentation accuracy (a minimal computation sketch follows this list).
  • Betti matching error in dimension 0: quantifies the discrepancies in the number and spatial arrangement of individual segments [1].

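To make the Task 1 metrics concrete, here is a minimal sketch in Python (NumPy/SciPy), assuming binary voxel masks. The connected-component count difference is only a simplified stand-in for the dimension-0 Betti matching error of [1], which additionally matches components spatially via persistence barcodes; the function names are illustrative, not the challenge's official code.

    import numpy as np
    from scipy import ndimage

    def dice_coefficient(pred: np.ndarray, gt: np.ndarray) -> float:
        """Volumetric DSC: 2|P & G| / (|P| + |G|)."""
        pred, gt = pred.astype(bool), gt.astype(bool)
        denom = pred.sum() + gt.sum()
        if denom == 0:
            return 1.0  # both masks empty: treat as perfect agreement
        return 2.0 * np.logical_and(pred, gt).sum() / denom

    def betti0_count_error(pred: np.ndarray, gt: np.ndarray) -> int:
        # Proxy only: |#components(pred) - #components(gt)|; the true
        # Betti matching error also penalizes spatially mismatched pairs.
        _, n_pred = ndimage.label(pred.astype(bool))
        _, n_gt = ndimage.label(gt.astype(bool))
        return abs(n_pred - n_gt)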

Task 2 assessment: Segmentation results for contiguous structures will be assessed with four metrics:

  • volumetric Dice similarity coefficients (DSC)
  • Betti matching error [1]: evaluates how well the prediction matches the topological features of the ground truth, helping to identify whether the segmentation algorithm preserves the connectedness and branching patterns of contiguous structures.
  • centerline Dice similarity coefficient (clDice) [2]: evaluates the overlap between the extracted centerlines (skeletons) and the full masks of prediction and ground truth, assessing how well the centerline of the predicted structure matches that of the ground truth (see the sketch after this list).
  • 95th percentile Hausdorff distance (HD95): measures the 95th percentile of the distances between the surface points of the prediction and those of the ground truth, indicating how well the boundaries of the segmented structure align while remaining robust to outliers.

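As an illustration, here is a minimal sketch of clDice and HD95 in Python, assuming binary masks. Skeletons come from skimage.morphology.skeletonize, and HD95 is taken here as the 95th percentile over the pooled symmetric surface distances; exact conventions vary between implementations, and the helper names are illustrative.

    import numpy as np
    from scipy.ndimage import binary_erosion, distance_transform_edt
    from skimage.morphology import skeletonize

    def cl_dice(pred: np.ndarray, gt: np.ndarray) -> float:
        """clDice: harmonic mean of topology precision and sensitivity."""
        pred, gt = pred.astype(bool), gt.astype(bool)
        # Note: very old scikit-image versions use skeletonize_3d for volumes.
        skel_p = skeletonize(pred).astype(bool)
        skel_g = skeletonize(gt).astype(bool)
        tprec = (skel_p & gt).sum() / max(skel_p.sum(), 1)    # skeleton of P inside G
        tsens = (skel_g & pred).sum() / max(skel_g.sum(), 1)  # skeleton of G inside P
        if tprec + tsens == 0:
            return 0.0
        return 2.0 * tprec * tsens / (tprec + tsens)

    def _surface(mask: np.ndarray) -> np.ndarray:
        return mask & ~binary_erosion(mask)

    def hd95(pred: np.ndarray, gt: np.ndarray, spacing=None) -> float:
        """95th percentile of pooled surface-to-surface distances."""
        pred, gt = pred.astype(bool), gt.astype(bool)
        surf_p, surf_g = _surface(pred), _surface(gt)
        # Distance from every voxel to the nearest surface voxel of the other mask
        dt_to_g = distance_transform_edt(~surf_g, sampling=spacing)
        dt_to_p = distance_transform_edt(~surf_p, sampling=spacing)
        dists = np.concatenate([dt_to_g[surf_p], dt_to_p[surf_g]])
        return float(np.percentile(dists, 95))
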
Assessments are provided separately for each task. For every task, there are distinct evaluations for algorithms with and without SSL. The assessments of algorithms without SSL are included only as a reference to measure the improvement achieved through SSL.

[1] N. Stucki et al. Topologically faithful image segmentation via induced matching of persistence barcodes. In International Conference on Machine Learning, 2023, pp. 32698-32727.
[2] S. Shit et al. clDice - a novel topology-preserving loss function for tubular structure segmentation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 16555-16564.


Ranking


  • The final ranking will be determined based solely on the performance in the final test phase.

  • The ranking of a submitted algorithm is determined through the following process (a minimal rank-aggregation sketch is given at the end of this section):

    • Compute the metric scores for each test case;
    • Calculate the average of the metric scores across all test cases for each individual metric;
    • Rank the averaged scores for each metric independently according to its optimization direction (lower is better for Betti matching error and HD95, higher is better for the remaining metrics);
    • Determine the overall ranking of the submitted algorithm by calculating the mean rank across all metrics;
    • If two or more algorithms have equal final ranks, the prize will be shared equally among them.
  • The top 3 or top 5 algorithms for each task are determined based on the assessment results of the algorithms with SSL.
  • We will employ bootstrapping and leave-one-out analyses to assess the robustness and stability of the rankings; a simplified bootstrap sketch is given below. The results will be presented during the in-person challenge event.
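
A minimal sketch of the rank-then-aggregate procedure described above, assuming per-algorithm mean scores are available as a nested dict. The metric names in LOWER_IS_BETTER are illustrative; ties receive average ranks via scipy.stats.rankdata, which is an assumption about tie handling, not the challenge's official implementation.

    import numpy as np
    from scipy.stats import rankdata

    LOWER_IS_BETTER = {"betti_matching_error", "hd95"}  # illustrative names

    def final_ranks(mean_scores: dict) -> dict:
        """Map each algorithm to its mean rank across all metrics."""
        algos = sorted(mean_scores)
        metrics = sorted(next(iter(mean_scores.values())))
        rank_matrix = np.zeros((len(algos), len(metrics)))
        for j, metric in enumerate(metrics):
            scores = np.array([mean_scores[a][metric] for a in algos])
            if metric not in LOWER_IS_BETTER:
                scores = -scores  # flip so that lower is always better
            rank_matrix[:, j] = rankdata(scores)  # ties get average ranks
        return dict(zip(algos, rank_matrix.mean(axis=1)))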
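
And a simplified sketch of the bootstrap stability analysis from the last bullet, reusing final_ranks from the sketch above. It assumes per-case scores laid out as algorithm -> metric -> 1-D array over test cases, and reports how often each algorithm attains each rank position; the data layout and function name are assumptions for illustration only.

    import numpy as np

    def bootstrap_rank_stability(case_scores: dict, n_boot: int = 1000,
                                 seed: int = 0) -> dict:
        """Fraction of bootstrap resamples in which each algorithm
        lands at each rank position (position 0 = best)."""
        rng = np.random.default_rng(seed)
        algos = sorted(case_scores)
        n_cases = len(next(iter(case_scores[algos[0]].values())))
        counts = {a: np.zeros(len(algos)) for a in algos}
        for _ in range(n_boot):
            idx = rng.integers(0, n_cases, size=n_cases)  # resample cases
            means = {a: {m: float(v[idx].mean())
                         for m, v in case_scores[a].items()} for a in algos}
            ranking = final_ranks(means)  # defined in the sketch above
            for pos, a in enumerate(sorted(algos, key=ranking.get)):
                counts[a][pos] += 1
        return {a: c / n_boot for a, c in counts.items()}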