ASSESSMENT METHODS for Task 2: Aneurysm Segmentation
Although the aneurysm segmentation ground truth was provided in both NIFTI image and STL geometry format, the main format for this challenge is the NIFTI image format. Hence, submitted results must be in NIFTI image format and will be compared against NIFTI ground truth images. Please note that an instance segmentation of the individual aneurysms is expected (e.g., the predicted segmentation should label the first detected aneurysm as 1, the second as 2, and so on). More information on the submission format will follow.
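The expected labeling can be illustrated with a minimal sketch. It assumes that a method produces a binary mask and that each aneurysm forms its own 6-connected component; both are assumptions for illustration, not requirements of the challenge. The function name `label_instances` and the nested-list volume layout `mask[z][y][x]` are hypothetical, and reading or writing NIFTI files is not shown.

```python
from collections import deque


def label_instances(mask):
    """Assign labels 1, 2, ... to 6-connected foreground components.

    `mask` is a nested list indexed as mask[z][y][x], with 0 for
    background and any non-zero value for aneurysm voxels.
    Returns (labels, count), where labels has the same shape as mask.
    """
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    labels = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    # Face neighbours only (6-connectivity); an assumption of this sketch.
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    count = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not mask[z][y][x] or labels[z][y][x]:
                    continue
                # New unlabeled foreground voxel: start a new instance
                # and flood-fill it breadth-first.
                count += 1
                labels[z][y][x] = count
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in neighbours:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        inside = 0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                        if inside and mask[pz][py][px] and not labels[pz][py][px]:
                            labels[pz][py][px] = count
                            queue.append((pz, py, px))
    return labels, count


# Toy 1 x 1 x 5 volume containing two separate foreground regions.
toy_mask = [[[1, 1, 0, 0, 1]]]
toy_labels, toy_count = label_instances(toy_mask)
```

In this toy volume the two touching voxels receive label 1 and the isolated voxel receives label 2. For real volumes an optimized library routine would be preferable to this pure-Python loop.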
Runtime is a crucial parameter with regard to clinical applicability and shall be provided together with hardware requirements for all submissions.
The metrics for the segmentation assessment are based on a comparison of the segmentation masks M*_cA provided by the participants with the ground truth masks M_cA from the expert annotations. We intend to calculate standard metrics for segmentation results. Class probabilities will not be considered.
b. Hausdorff distance
c. Average distance
d. Pearson correlation coefficient r between the predicted volume V^* and the reference volume V of all aneurysms
e. Bias (b) computed as the mean absolute difference of predicted and reference volumes
f. Standard deviation (σ) of the difference between predicted and reference volumes
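The two distance metrics (items b and c) can be sketched as follows. The exact definitions are not spelled out in this section, so the sketch assumes the common symmetric forms: the Hausdorff distance as the largest nearest-neighbour distance in either direction, and the average distance as the mean of all nearest-neighbour distances in both directions. The inputs are assumed to be lists of surface point coordinates in physical units; extracting those points from the masks is not shown, and all function names are hypothetical.

```python
import math


def directed_distances(points_a, points_b):
    """Distance from every point in points_a to its nearest point in points_b."""
    return [min(math.dist(a, b) for b in points_b) for a in points_a]


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance (assumed definition)."""
    return max(max(directed_distances(points_a, points_b)),
               max(directed_distances(points_b, points_a)))


def average_distance(points_a, points_b):
    """Symmetric average nearest-neighbour distance (assumed definition)."""
    d_ab = directed_distances(points_a, points_b)
    d_ba = directed_distances(points_b, points_a)
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


# Toy surface point sets along one axis.
reference_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted_points = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

hd = hausdorff_distance(reference_points, predicted_points)
ad = average_distance(reference_points, predicted_points)
```

For these toy point sets the nearest-neighbour distances are 0 and 1 in one direction and 0 and 2 in the other, so `hd` is 2.0 and `ad` is 0.75. The brute-force search is quadratic in the number of points and is meant for illustration only.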
The segmentation is the basis for the quantitative assessment of the aneurysms. It should enable the extraction of shape and volume parameters for the assessment of change over time or the comparison with decision thresholds. Therefore, the overlap with and the distance from the reference segmentations are important. For the assessment of volumes, we also analyze how well the results correlate over the cohort and whether there is a bias.
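The volume-based metrics (items d to f) could be computed along the following lines. This is a sketch under stated assumptions: volumes are given as one value per aneurysm in matching order, the bias follows the mean-absolute-difference wording above, and the standard deviation uses the sample form (n - 1 in the denominator), which is an assumption because the text does not specify it. The function name `volume_metrics` is hypothetical.

```python
import math
import statistics


def volume_metrics(v_pred, v_ref):
    """Return (pearson_r, bias, sigma) for paired volume lists."""
    diffs = [p - q for p, q in zip(v_pred, v_ref)]

    # Pearson correlation coefficient, written out explicitly.
    mean_p = statistics.fmean(v_pred)
    mean_q = statistics.fmean(v_ref)
    cov = sum((p - mean_p) * (q - mean_q) for p, q in zip(v_pred, v_ref))
    ss_p = sum((p - mean_p) ** 2 for p in v_pred)
    ss_q = sum((q - mean_q) ** 2 for q in v_ref)
    pearson_r = cov / math.sqrt(ss_p * ss_q)

    # Bias as the mean absolute difference of the volumes.
    bias = statistics.fmean([abs(d) for d in diffs])

    # Sample standard deviation of the signed differences (assumption).
    sigma = statistics.stdev(diffs)
    return pearson_r, bias, sigma


# Toy volumes for three aneurysms (arbitrary units).
reference_volumes = [100.0, 200.0, 300.0]
predicted_volumes = [110.0, 190.0, 330.0]

r, b, sigma = volume_metrics(predicted_volumes, reference_volumes)
```

With these toy values the signed differences are 10, -10 and 30, giving a bias of 50/3 and a sample standard deviation of 20.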
For the ranking, we will normalize each metric according to its maximum among all participants so that it takes a value between 0 (worst case among all participants) and 1 (perfect fit between the reference and predicted segmentation). The ranking score is calculated as the average of the normalized metrics. We consider all metrics equally important for the application context and therefore integrate them with equal weight in the scoring system.
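One possible reading of this normalization is sketched below. It assumes a linear mapping in which the worst value among all participants becomes 0 and the ideal value of the metric becomes 1; for error-type metrics (ideal value 0) this reduces to 1 - value / maximum. How metrics where higher is better are handled is an assumption of this sketch, as are the function names, team names, and metric keys.

```python
def normalize(values, perfect, higher_is_better):
    """Map values linearly so that worst -> 0 and perfect -> 1."""
    worst = min(values) if higher_is_better else max(values)
    span = perfect - worst
    if span == 0:
        # Every participant already reaches the ideal value.
        return [1.0 for _ in values]
    return [(v - worst) / span for v in values]


def ranking_scores(results, metric_specs):
    """Average the normalized metrics per team with equal weight.

    results: {team: {metric: value}}
    metric_specs: {metric: (perfect_value, higher_is_better)}
    """
    teams = list(results)
    normalized = {team: [] for team in teams}
    for metric, (perfect, higher_is_better) in metric_specs.items():
        column = normalize([results[t][metric] for t in teams],
                           perfect, higher_is_better)
        for team, value in zip(teams, column):
            normalized[team].append(value)
    return {team: sum(vals) / len(vals) for team, vals in normalized.items()}


# Hypothetical results for two teams and two of the metrics.
example_results = {
    "team_a": {"hausdorff": 2.0, "pearson_r": 0.9},
    "team_b": {"hausdorff": 4.0, "pearson_r": 0.8},
}
example_specs = {"hausdorff": (0.0, False), "pearson_r": (1.0, True)}

scores = ranking_scores(example_results, example_specs)
```

Here team_b is the worst participant on both metrics and scores 0, while team_a lies halfway between the worst value and the ideal on both metrics and scores approximately 0.5.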