Vision metrics



calculate_dsc


def calculate_dsc(
    pred:Tensor, targ:Tensor
)->Tensor:

Calculate Dice score using MONAI’s compute_dice.

Accepts tensors of various shapes and automatically reshapes to 5D format.

Args:
    pred: Binary prediction tensor. Accepts:
        - [D, H, W] single 3D volume
        - [C, D, H, W] single volume with channel
        - [B, C, D, H, W] batched volumes
    targ: Binary target tensor (same shape options as pred).

Returns: Dice score(s). Single value for 3D/4D input, tensor of values for 5D batch.
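
The reshaping and scoring described above can be sketched in plain NumPy. This is a minimal illustration, not the library's implementation (which delegates to MONAI's compute_dice); the helper names `to_5d` and `dice_per_item` are hypothetical, and NumPy arrays stand in for tensors.

```python
import numpy as np

def to_5d(x: np.ndarray) -> np.ndarray:
    """Promote [D,H,W] or [C,D,H,W] to [B,C,D,H,W] by adding leading singleton axes."""
    if x.ndim not in (3, 4, 5):
        raise ValueError(f"expected 3D, 4D or 5D input, got {x.ndim}D")
    while x.ndim < 5:
        x = x[None]
    return x

def dice_per_item(pred: np.ndarray, targ: np.ndarray) -> np.ndarray:
    """Dice = 2 * |P & T| / (|P| + |T|), one value per (batch, channel) entry."""
    p = to_5d(pred).astype(bool)
    t = to_5d(targ).astype(bool)
    axes = (2, 3, 4)  # reduce over the three spatial axes
    inter = (p & t).sum(axis=axes)
    denom = p.sum(axis=axes) + t.sum(axis=axes)
    with np.errstate(invalid="ignore", divide="ignore"):
        return 2.0 * inter / denom  # nan when both masks are empty

# Two 32-voxel slabs in a 4x4x4 volume that share 16 voxels
pred = np.zeros((4, 4, 4), dtype=np.uint8)
pred[:2] = 1
targ = np.zeros((4, 4, 4), dtype=np.uint8)
targ[1:3] = 1
score = dice_per_item(pred, targ)  # shape (1, 1)
```

Here the overlap is 16 voxels out of 32 + 32, so the score is 2 * 16 / 64 = 0.5.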



calculate_haus


def calculate_haus(
    pred:Tensor, targ:Tensor, spacing:NoneType=None
)->Tensor:

Compute 95th percentile Hausdorff distance (HD95) using MONAI.

Deprecated: Use [calculate_surface_metrics](https://fastmonai.no/vision_metrics.html#calculate_surface_metrics) for canonical spacing-aware HD95 in mm. [calculate_haus](https://fastmonai.no/vision_metrics.html#calculate_haus) will be removed in a future release.

HD95 is more robust than standard Hausdorff distance as it ignores the top 5% of outlier distances.

Accepts tensors of various shapes and automatically reshapes to 5D format.

Args:
    pred: Binary prediction tensor. Accepts:
        - [D, H, W] single 3D volume
        - [C, D, H, W] single volume with channel
        - [B, C, D, H, W] batched volumes
    targ: Binary target tensor (same shape options as pred).
    spacing: Voxel spacing forwarded to MONAI. None (default) computes HD95 in voxel units (unchanged legacy behaviour). When set, it must match the spatial-dim order of the input: MONAI only checks its length, so a permuted tuple silently changes the result. For canonical spacing-aware HD95 in mm, prefer [calculate_surface_metrics](https://fastmonai.no/vision_metrics.html#calculate_surface_metrics); the two implementations do not agree numerically.

Returns: HD95 value(s). Single value for 3D/4D input, tensor of values for 5D batch.
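
To make the spacing warning concrete, here is a brute-force NumPy sketch of a percentile Hausdorff distance. It is an illustration under stated assumptions, not MONAI's algorithm: the helpers `surface_points` and `hd_percentile` are hypothetical, the surface is taken as foreground voxels with at least one 6-connected background neighbour, the symmetric value is the larger of the two directed percentiles, and empty masks are not handled.

```python
import numpy as np

def surface_points(mask: np.ndarray) -> np.ndarray:
    """Coordinates of foreground voxels that touch background (6-connectivity)."""
    m = np.pad(mask.astype(bool), 1)  # zero border so np.roll wraps in background
    interior = m.copy()
    for axis in range(m.ndim):
        interior &= np.roll(m, 1, axis) & np.roll(m, -1, axis)
    return np.argwhere(m & ~interior) - 1  # undo the padding offset

def hd_percentile(pred, targ, spacing=(1.0, 1.0, 1.0), q=95):
    """Max of the two directed q-th percentile surface distances."""
    scale = np.asarray(spacing, dtype=float)
    p = surface_points(pred) * scale
    t = surface_points(targ) * scale
    d = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)  # pairwise distances
    return max(np.percentile(d.min(axis=1), q), np.percentile(d.min(axis=0), q))

# Two 3x3x3 cubes, the target shifted by 2 voxels along axis 0
pred = np.zeros((10, 10, 10), dtype=bool)
pred[2:5, 2:5, 2:5] = True
targ = np.zeros((10, 10, 10), dtype=bool)
targ[4:7, 2:5, 2:5] = True

hd_voxels = hd_percentile(pred, targ)                             # voxel units
hd_correct = hd_percentile(pred, targ, spacing=(3.0, 1.0, 1.0))   # 3 mm along axis 0
hd_permuted = hd_percentile(pred, targ, spacing=(1.0, 1.0, 3.0))  # same values, wrong order
```

Both spacing tuples have the right length, yet the permuted one reports 2.0 where the correct one reports 6.0, which is the silent failure the spacing argument's description warns about.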



binary_dice_score


def binary_dice_score(
    act:Tensor, targ:Tensor
)->Tensor:

Calculates the mean Dice score for binary semantic segmentation tasks.

Args: act: Activation tensor with dimensions [B, C, W, H, D]. targ: Target masks with dimensions [B, C, W, H, D].

Returns: Mean Dice score (empty-target samples are ignored).
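
The effect of ignoring empty-target samples can be sketched as follows. This is a hedged NumPy illustration that takes already-binarised masks rather than activations; `mean_dice_ignore_empty` is a hypothetical helper, not part of the library.

```python
import numpy as np

def mean_dice_ignore_empty(pred: np.ndarray, targ: np.ndarray) -> float:
    """Mean per-sample Dice over a batch, skipping samples whose target is empty."""
    pred = pred.astype(bool)
    targ = targ.astype(bool)
    axes = tuple(range(1, pred.ndim))  # reduce everything except the batch axis
    inter = (pred & targ).sum(axis=axes).astype(float)
    denom = (pred.sum(axis=axes) + targ.sum(axis=axes)).astype(float)
    valid = targ.sum(axis=axes) > 0    # samples that have any foreground in the target
    if not valid.any():
        return float("nan")
    return float(np.mean(2.0 * inter[valid] / denom[valid]))

# Batch of two [B, C, W, H, D] samples
pred = np.zeros((2, 1, 2, 2, 2), dtype=np.uint8)
targ = np.zeros((2, 1, 2, 2, 2), dtype=np.uint8)
pred[0, 0, 0] = 1        # sample 0: prediction matches the target exactly
targ[0, 0, 0] = 1
pred[1, 0, 0, 0, 0] = 1  # sample 1: false positive on an empty target

mean_dice = mean_dice_ignore_empty(pred, targ)
```

Sample 1 would score 0 if it were included; because its target is empty it is dropped, and the mean is taken over sample 0 alone.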



multi_dice_score


def multi_dice_score(
    act:Tensor, targ:Tensor
)->Tensor:

Calculate the mean Dice score for each class in multi-class semantic segmentation tasks.

Args: act: Activation tensor with dimensions [B, C, W, H, D]. targ: Target masks with dimensions [B, C, W, H, D].

Returns: Mean Dice score for each class.

Sensitivity and Precision



calculate_confusion_metrics


def calculate_confusion_metrics(
    pred:Tensor, targ:Tensor, metric_name:str
)->Tensor:

Calculate confusion matrix-based metric using MONAI.

Args:
    pred: Binary prediction tensor [B, C, W, H, D].
    targ: Binary target tensor [B, C, W, H, D].
    metric_name: One of "sensitivity", "precision", "specificity", "f1 score".

Returns: Metric values for each sample in batch.
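
For reference, the four named metrics follow directly from the confusion-matrix counts. The sketch below works on a single sample in NumPy and is only an illustration of the definitions; `confusion_metric` is a hypothetical helper, and zero denominators are not handled.

```python
import numpy as np

def confusion_metric(pred: np.ndarray, targ: np.ndarray, metric_name: str) -> float:
    """Compute one confusion-matrix metric for a single pair of binary masks."""
    pred = pred.astype(bool)
    targ = targ.astype(bool)
    tp = np.sum(pred & targ)    # predicted positive, actually positive
    fp = np.sum(pred & ~targ)   # predicted positive, actually negative
    fn = np.sum(~pred & targ)   # predicted negative, actually positive
    tn = np.sum(~pred & ~targ)  # predicted negative, actually negative
    if metric_name == "sensitivity":
        return float(tp / (tp + fn))
    if metric_name == "precision":
        return float(tp / (tp + fp))
    if metric_name == "specificity":
        return float(tn / (tn + fp))
    if metric_name == "f1 score":
        return float(2 * tp / (2 * tp + fp + fn))
    raise ValueError(f"unknown metric_name: {metric_name!r}")

# TP=2, FP=2, FN=1, TN=3
pred = np.array([1, 1, 1, 1, 0, 0, 0, 0])
targ = np.array([1, 1, 0, 0, 1, 0, 0, 0])
results = {
    name: confusion_metric(pred, targ, name)
    for name in ("sensitivity", "precision", "specificity", "f1 score")
}
```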



binary_sensitivity


def binary_sensitivity(
    act:Tensor, targ:Tensor
)->Tensor:

Calculate mean sensitivity (recall) for binary segmentation.

Sensitivity = TP / (TP + FN) - measures the proportion of actual positives that are correctly identified.

Args: act: Activation tensor [B, C, W, H, D]. targ: Target masks [B, C, W, H, D].

Returns: Mean sensitivity score.



multi_sensitivity


def multi_sensitivity(
    act:Tensor, targ:Tensor
)->Tensor:

Calculate mean sensitivity for each class in multi-class segmentation.

Args: act: Activation tensor [B, C, W, H, D]. targ: Target masks [B, C, W, H, D].

Returns: Mean sensitivity for each class.



binary_precision


def binary_precision(
    act:Tensor, targ:Tensor
)->Tensor:

Calculate mean precision for binary segmentation.

Precision = TP / (TP + FP) - measures the proportion of positive predictions that are actually correct.

Args: act: Activation tensor [B, C, W, H, D]. targ: Target masks [B, C, W, H, D].

Returns: Mean precision score.



multi_precision


def multi_precision(
    act:Tensor, targ:Tensor
)->Tensor:

Calculate mean precision for each class in multi-class segmentation.

Args: act: Activation tensor [B, C, W, H, D]. targ: Target masks [B, C, W, H, D].

Returns: Mean precision for each class.

Lesion Detection Rate



calculate_lesion_detection_rate


def calculate_lesion_detection_rate(
    pred:Tensor, targ:Tensor, threshold:float=0.0
)->Tensor:

Calculate lesion-wise detection rate.

For each connected component (lesion) in the target, check whether it is detected by the prediction. The detection criterion depends on threshold:
- threshold=0: any overlap counts as detected
- threshold>0: per-lesion Dice score must exceed threshold

Args:
    pred: Binary prediction tensor [B, C, W, H, D].
    targ: Binary target tensor [B, C, W, H, D].
    threshold: Minimum Dice score for a lesion to be considered detected. Default 0.0 means any overlap counts as detected.

Returns: Detection rate (detected lesions / total lesions) for each sample.
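
The lesion-wise procedure can be sketched for a single 3D volume as below. Several choices here are assumptions made only for illustration and may differ from the library: components use 6-connectivity, labelling is a simple breadth-first search suited to small volumes, and the per-lesion Dice compares each target lesion with the union of predicted components that touch it. The helper names are hypothetical.

```python
from collections import deque
import numpy as np

OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def label_components(mask: np.ndarray):
    """Label 6-connected components by breadth-first search; returns (labels, count)."""
    mask = mask.astype(bool)
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # already reached from an earlier seed
        count += 1
        labels[seed] = count
        queue = deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in OFFSETS:
                nb = (x + dx, y + dy, z + dz)
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = count
                    queue.append(nb)
    return labels, count

def lesion_detection_rate(pred, targ, threshold=0.0) -> float:
    """Detected target lesions divided by total target lesions (nan if there are none)."""
    targ_labels, n_lesions = label_components(targ)
    if n_lesions == 0:
        return float("nan")
    pred_labels, _ = label_components(pred)
    detected = 0
    for k in range(1, n_lesions + 1):
        lesion = targ_labels == k
        touching = np.unique(pred_labels[lesion])
        touching = touching[touching > 0]
        if touching.size == 0:
            continue  # no overlap at all: missed
        if threshold == 0:
            detected += 1  # any overlap counts
            continue
        matched = np.isin(pred_labels, touching)  # assumed matching rule
        dice = 2.0 * np.sum(lesion & matched) / (lesion.sum() + matched.sum())
        detected += int(dice > threshold)
    return detected / n_lesions

# Two 8-voxel target lesions; the prediction covers half of the first and none of the second
targ = np.zeros((6, 6, 6), dtype=bool)
targ[0:2, 0:2, 0:2] = True
targ[4:6, 4:6, 4:6] = True
pred = np.zeros((6, 6, 6), dtype=bool)
pred[0:2, 0:2, 0:1] = True

rate_any = lesion_detection_rate(pred, targ)                    # any overlap
rate_strict = lesion_detection_rate(pred, targ, threshold=0.7)  # Dice must exceed 0.7
```

The first lesion has a per-lesion Dice of 2 * 4 / (8 + 4), about 0.67, so it counts as detected with the default threshold but not with 0.7.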



binary_lesion_detection_rate


def binary_lesion_detection_rate(
    act:Tensor, targ:Tensor, threshold:float=0.0
)->Tensor:

Calculate mean lesion detection rate for binary segmentation.

Args:
    act: Activation tensor [B, C, W, H, D].
    targ: Target masks [B, C, W, H, D].
    threshold: Minimum Dice score for a lesion to be considered detected. Default 0.0 means any overlap counts as detected.

Returns: Mean lesion detection rate.



multi_lesion_detection_rate


def multi_lesion_detection_rate(
    act:Tensor, targ:Tensor, threshold:float=0.0
)->Tensor:

Calculate mean lesion detection rate for each class in multi-class segmentation.

Args:
    act: Activation tensor [B, C, W, H, D].
    targ: Target masks [B, C, W, H, D].
    threshold: Minimum Dice score for a lesion to be considered detected. Default 0.0 means any overlap counts as detected.

Returns: Mean lesion detection rate for each class.

Signed Relative Volume Error (RVE)



calculate_signed_rve


def calculate_signed_rve(
    pred:Tensor, targ:Tensor
)->Tensor:

Calculate signed Relative Volume Error.

RVE = (pred_volume - targ_volume) / targ_volume

Positive values indicate over-segmentation (model predicts too large), negative values indicate under-segmentation (model predicts too small).

Args: pred: Binary prediction tensor [B, C, W, H, D]. targ: Binary target tensor [B, C, W, H, D].

Returns: Signed RVE for each sample in batch.
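
A short NumPy sketch of the formula above, computed per sample over a batch. `signed_rve` is a hypothetical helper for illustration; volumes are counted in voxels, and an empty target (division by zero) is not handled.

```python
import numpy as np

def signed_rve(pred: np.ndarray, targ: np.ndarray) -> np.ndarray:
    """(pred_volume - targ_volume) / targ_volume for each sample in the batch."""
    axes = tuple(range(1, pred.ndim))  # reduce everything except the batch axis
    pred_volume = pred.astype(bool).sum(axis=axes).astype(float)
    targ_volume = targ.astype(bool).sum(axis=axes).astype(float)
    return (pred_volume - targ_volume) / targ_volume

# Batch of two [B, C, W, H, D] samples with 16 voxels each
pred = np.zeros((2, 16), dtype=np.uint8)
targ = np.zeros((2, 16), dtype=np.uint8)
targ[:, :8] = 1   # both targets: 8 voxels
pred[0, :12] = 1  # sample 0 predicts 12 voxels (over-segmentation)
pred[1, :6] = 1   # sample 1 predicts 6 voxels (under-segmentation)
pred = pred.reshape(2, 1, 2, 2, 4)
targ = targ.reshape(2, 1, 2, 2, 4)

rve = signed_rve(pred, targ)
```

Sample 0 yields (12 - 8) / 8 = 0.5 and sample 1 yields (6 - 8) / 8 = -0.25, matching the sign convention described above.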



binary_signed_rve


def binary_signed_rve(
    act:Tensor, targ:Tensor
)->Tensor:

Calculate mean signed RVE for binary segmentation.

Args: act: Activation tensor [B, C, W, H, D]. targ: Target masks [B, C, W, H, D].

Returns: Mean signed RVE.



multi_signed_rve


def multi_signed_rve(
    act:Tensor, targ:Tensor
)->Tensor:

Calculate mean signed RVE for each class in multi-class segmentation.

Args: act: Activation tensor [B, C, W, H, D]. targ: Target masks [B, C, W, H, D].

Returns: Mean signed RVE for each class.

Surface Distance Benchmark Metrics (MONAI grid-based, mm)

Canonical, spacing-aware boundary metrics for final benchmark reporting (NOT the training loop): symmetric ASSD, HD95, and Normalized Surface Dice (NSD) at one or more tolerances, all in millimetres, computed with MONAI’s grid-based surface metrics (monai.metrics). Unlike calculate_haus (MONAI, voxel units by default), these report a single implementation with explicit voxel spacing and tolerance. Grid-based metrics carry a known voxel-discretization (staircasing) bias; for publication-grade surface distances use a mesh-based implementation (see the patch tutorials and arxiv:2410.02630).



calculate_surface_metrics


def calculate_surface_metrics(
    pred, targ, spacing_mm, nsd_tolerances_mm:tuple=(0.5, 1.0, 2.0), hd_percentile:int=95
)->dict:

Canonical spacing-aware surface metrics for one 3D case (benchmark reporting only).

Computes symmetric ASSD, HD95 and Normalized Surface Dice (NSD) at the given tolerance(s), all in millimetres, using MONAI’s grid-based surface metrics (monai.metrics). Intended for final benchmark evaluation, NOT inside the training loop (CPU, per-case).

Grid-based metrics have a known voxel-discretization (staircasing) bias; for publication-grade surface distances consider a mesh-based implementation.

Args:
    pred: Predicted mask (binary / argmax label). torch.Tensor or np.ndarray, shaped [W,H,D], [C,W,H,D] or [B,C,W,H,D] with singleton batch/channel.
    targ: Ground-truth mask (same shape options as pred).
    spacing_mm: Sequence of 3 positive floats, voxel size in mm, in the SAME axis order as the (squeezed) mask array. Derive per case from the image affine/header; do NOT hard-code, because a permuted spacing silently corrupts every distance.
    nsd_tolerances_mm: NSD tolerance(s) in mm; each yields an nsd_tau{t}_mm key (the tolerance is float-formatted, so 1 and 1.0 both give nsd_tau1.0_mm).
    hd_percentile: Percentile for the robust Hausdorff distance (default 95 -> HD95).

Returns: dict with assd_mm, hd{p}_mm (hd95_mm by default), one nsd_tau{t}_mm per tolerance, and the provenance fields surface_distance_source, tau_mm, spacing_mm and status ("ok" | "both_empty" | "one_empty"). Empty cases are scored by a guard and never enter the backend, which keeps inf/nan out of it:
- both empty: ASSD 0, HD 0, NSD 1.0
- exactly one empty: ASSD inf, HD inf, NSD 0.0
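
The key naming and the empty-case guard described above can be sketched in a few lines of plain Python. This is an assumption-laden illustration, not the library's code: `nsd_key` and `empty_case_guard` are hypothetical helpers, the guard takes voxel counts instead of masks, and the provenance fields other than status are omitted.

```python
import math

def nsd_key(tol_mm) -> str:
    """Float-format the tolerance so that 1 and 1.0 map to the same key."""
    return f"nsd_tau{float(tol_mm)}_mm"

def empty_case_guard(pred_voxels: int, targ_voxels: int,
                     nsd_tolerances_mm=(0.5, 1.0, 2.0), hd_percentile: int = 95):
    """Return fixed scores for empty cases, or None when both masks have foreground."""
    if pred_voxels > 0 and targ_voxels > 0:
        return None  # normal case: hand over to the surface-distance backend
    both_empty = pred_voxels == 0 and targ_voxels == 0
    distance = 0.0 if both_empty else math.inf
    nsd = 1.0 if both_empty else 0.0
    result = {
        "assd_mm": distance,
        f"hd{hd_percentile}_mm": distance,
        "status": "both_empty" if both_empty else "one_empty",
    }
    for tol in nsd_tolerances_mm:
        result[nsd_key(tol)] = nsd
    return result

both = empty_case_guard(0, 0)
one = empty_case_guard(120, 0, nsd_tolerances_mm=(1,))
```

Scoring empty cases up front means the distance computation only ever sees masks with at least one foreground voxel on each side.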

Accumulated Dice Metrics (nnU-Net Style)

These metrics accumulate true positives, false positives, and false negatives across all validation batches before computing Dice. This is more statistically robust than averaging per-batch Dice scores, especially for patch-based training where patches have variable foreground ratios.
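
A small worked example shows why the two approaches differ. The counts below are made up purely for illustration: one foreground-rich patch batch and one with almost no foreground.

```python
def dice_from_counts(tp: int, fp: int, fn: int) -> float:
    """Dice expressed with confusion counts: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# (TP, FP, FN) per validation batch; illustrative numbers only
batches = [
    (90, 5, 5),  # large foreground region, segmented well
    (1, 3, 0),   # tiny foreground region, a few false positives
]

# Averaging per-batch Dice lets the tiny batch count as much as the large one
mean_of_batch_dice = sum(dice_from_counts(*b) for b in batches) / len(batches)

# Accumulating counts first weights each batch by its foreground content
total_tp = sum(b[0] for b in batches)
total_fp = sum(b[1] for b in batches)
total_fn = sum(b[2] for b in batches)
accumulated_dice = dice_from_counts(total_tp, total_fp, total_fn)
```

The per-batch average is about 0.67, dragged down by three false-positive voxels in a nearly empty patch, while the accumulated value is about 0.93.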



AccumulatedDice


def AccumulatedDice(
    n_classes:int=2, include_background:bool=False
):

nnU-Net-style accumulated Dice metric for reliable pseudo dice during training.

Instead of averaging per-batch Dice scores, this metric accumulates true positives, false positives, and false negatives across ALL validation batches, then computes Dice from the totals. This gives more weight to batches with more foreground voxels and is more statistically robust.

Args: n_classes: Number of classes including background (default: 2 for binary). include_background: Whether to include background in metric (default: False).

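Example:
```python
import numpy as np
from fastai.learner import Learner
from fastai.callback.tracker import SaveModelCallback

# dls, model and loss_func are assumed to be defined elsewhere
learn = Learner(dls, model, loss_func=loss_func, metrics=[AccumulatedDice(n_classes=2)])

# For checkpoint selection based on accumulated dice:
save_best = SaveModelCallback(
    monitor='accumulated_dice',
    comp=np.greater,  # Higher dice is better
    fname='best_model'
)
```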


AccumulatedMultiDice


def AccumulatedMultiDice(
    n_classes:int=2, include_background:bool=False
):

Multi-class version of AccumulatedDice that returns per-class Dice scores.

Instead of returning a single mean Dice, this returns a tensor with the Dice score for each foreground class. Useful for monitoring per-class performance during training.

Example:
```python
# For 3-class segmentation (background + 2 foreground classes)
learn = Learner(dls, model, loss_func=loss_func, metrics=[AccumulatedMultiDice(n_classes=3)])
```

EMA Model Checkpoint

Exponential Moving Average (EMA) based model selection, inspired by nnU-Net’s approach of using smoothed Dice scores rather than noisy per-epoch values for checkpoint selection.



EMACheckpoint


def EMACheckpoint(
    monitor:str='accumulated_dice', momentum:float=0.9, comp:ufunc=greater, fname:str='best_model',
    with_opt:bool=False
):

Save model checkpoint based on EMA of a monitored metric (nnU-Net style).

Instead of saving the best model based on a single (noisy) epoch metric, this tracks the exponential moving average and saves when the EMA improves. More robust for patch-based training where per-epoch metrics fluctuate.

Formula: ema = momentum * previous_ema + (1 - momentum) * current_value

Unlike SaveModelCallback, this does NOT auto-load the best model after training. Load explicitly with learn.load(fname).

Args:
    monitor: Metric name to track (default: 'accumulated_dice').
    momentum: EMA momentum in the range (0, 1) (default: 0.9, matching nnU-Net). Higher momentum means more smoothing; 0.9 keeps 90% of the history and adds 10% of the current epoch.
    comp: Comparison function (default: np.greater for higher-is-better).
    fname: Model save filename (default: 'best_model').
    with_opt: Whether to save optimizer state (default: False).

Example:
```python
save_best = EMACheckpoint(
    monitor='accumulated_dice',
    momentum=0.9,
    fname='best_model'
)
learn.fit_one_cycle(30, lr, cbs=[save_best])

# Load best model after training:
learn.load('best_model')

# Access EMA history for plotting:
save_best.ema_history
```
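
The EMA formula given above can be traced on a toy metric curve in plain Python. This sketch is independent of the callback: `ema_checkpoint_epochs` is a hypothetical helper, the metric values are invented, and seeding the EMA with the first epoch's value is an assumption rather than documented behaviour.

```python
def ema_checkpoint_epochs(values, momentum=0.9):
    """Return (ema_history, epochs at which a higher-is-better checkpoint would be saved)."""
    ema = None
    best = None
    history, saved = [], []
    for epoch, value in enumerate(values):
        # Assumption: the first epoch seeds the EMA with the raw value
        ema = value if ema is None else momentum * ema + (1 - momentum) * value
        history.append(ema)
        if best is None or ema > best:
            best = ema
            saved.append(epoch)
    return history, saved

# A steady upward trend with one noisy spike at epoch 2
dice_per_epoch = [0.50, 0.52, 0.90, 0.55, 0.56, 0.57]
history, saved = ema_checkpoint_epochs(dice_per_epoch)
raw_best_epoch = dice_per_epoch.index(max(dice_per_epoch))
```

Selecting on the raw metric would keep the spike at epoch 2, whereas the smoothed curve keeps improving and the last save happens at epoch 5.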