Vision metrics
calculate_dsc
def calculate_dsc(
pred:Tensor, targ:Tensor
)->Tensor:
Calculate Dice score using MONAI’s compute_dice.
Accepts tensors of various shapes and automatically reshapes to 5D format.
Args:
pred: Binary prediction tensor. Accepts:
- [D, H, W] single 3D volume
- [C, D, H, W] single volume with channel
- [B, C, D, H, W] batched volumes
targ: Binary target tensor (same shape options as pred).
Returns: Dice score(s). Single value for 3D/4D input, tensor of values for 5D batch.
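For orientation, the Dice score of two binary masks P and T is 2|P ∩ T| / (|P| + |T|). The sketch below evaluates that formula on flattened Python lists. The helper name `dice_from_masks` is illustrative and not part of the library, and the both-empty convention is an assumption made for this sketch; MONAI's handling of empty masks may differ.

```python
def dice_from_masks(pred, targ):
    """Dice = 2*|P ∩ T| / (|P| + |T|) for flat sequences of 0/1 values."""
    if len(pred) != len(targ):
        raise ValueError("pred and targ must contain the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, targ))
    denom = sum(pred) + sum(targ)
    # Assumption for this sketch only: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2.0 * intersection / denom


pred = [1, 1, 1, 0, 0, 0]
targ = [0, 1, 1, 1, 0, 0]
score = dice_from_masks(pred, targ)  # 2 * 2 / (3 + 3)
print(score)
```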
calculate_haus
def calculate_haus(
pred:Tensor, targ:Tensor, spacing:NoneType=None
)->Tensor:
Compute 95th percentile Hausdorff distance (HD95) using MONAI.
Deprecated: use [calculate_surface_metrics](https://fastmonai.no/vision_metrics.html#calculate_surface_metrics) for canonical spacing-aware HD95 in mm. [calculate_haus](https://fastmonai.no/vision_metrics.html#calculate_haus) will be removed in a future release.
HD95 is more robust than standard Hausdorff distance as it ignores the top 5% of outlier distances.
Accepts tensors of various shapes and automatically reshapes to 5D format.
Args:
pred: Binary prediction tensor. Accepts:
- [D, H, W] single 3D volume
- [C, D, H, W] single volume with channel
- [B, C, D, H, W] batched volumes
targ: Binary target tensor (same shape options as pred).
spacing: Voxel spacing forwarded to MONAI. None (default) computes HD95 in voxel units (unchanged legacy behaviour). When set, it must match the spatial-dim order of the input: MONAI only checks its length, so a permuted tuple silently changes the result. For canonical spacing-aware HD95 in mm, prefer [calculate_surface_metrics](https://fastmonai.no/vision_metrics.html#calculate_surface_metrics); the two implementations do not agree numerically.
Returns: HD95 value(s). Single value for 3D/4D input, tensor of values for 5D batch.
binary_dice_score
def binary_dice_score(
act:Tensor, targ:Tensor
)->Tensor:
Calculates the mean Dice score for binary semantic segmentation tasks.
Args: act: Activation tensor with dimensions [B, C, W, H, D]. targ: Target masks with dimensions [B, C, W, H, D].
Returns: Mean Dice score (empty-target samples are ignored).
multi_dice_score
def multi_dice_score(
act:Tensor, targ:Tensor
)->Tensor:
Calculate the mean Dice score for each class in multi-class semantic segmentation tasks.
Args: act: Activation tensor with dimensions [B, C, W, H, D]. targ: Target masks with dimensions [B, C, W, H, D].
Returns: Mean Dice score for each class.
Sensitivity and Precision
calculate_confusion_metrics
def calculate_confusion_metrics(
pred:Tensor, targ:Tensor, metric_name:str
)->Tensor:
Calculate confusion matrix-based metric using MONAI.
Args: pred: Binary prediction tensor [B, C, W, H, D]. targ: Binary target tensor [B, C, W, H, D]. metric_name: One of “sensitivity”, “precision”, “specificity”, “f1 score”.
Returns: Metric values for each sample in batch.
binary_sensitivity
def binary_sensitivity(
act:Tensor, targ:Tensor
)->Tensor:
Calculate mean sensitivity (recall) for binary segmentation.
Sensitivity = TP / (TP + FN): the proportion of actual positives that are correctly identified.
Args: act: Activation tensor [B, C, W, H, D]. targ: Target masks [B, C, W, H, D].
Returns: Mean sensitivity score.
multi_sensitivity
def multi_sensitivity(
act:Tensor, targ:Tensor
)->Tensor:
Calculate mean sensitivity for each class in multi-class segmentation.
Args: act: Activation tensor [B, C, W, H, D]. targ: Target masks [B, C, W, H, D].
Returns: Mean sensitivity for each class.
binary_precision
def binary_precision(
act:Tensor, targ:Tensor
)->Tensor:
Calculate mean precision for binary segmentation.
Precision = TP / (TP + FP): the proportion of positive predictions that are actually correct.
Args: act: Activation tensor [B, C, W, H, D]. targ: Target masks [B, C, W, H, D].
Returns: Mean precision score.
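The sensitivity and precision definitions used in this section can be checked by hand on a small mask. The following is a dependency-free sketch; `confusion_counts`, `sensitivity` and `precision` are hypothetical helper names, not library functions, and zero denominators are deliberately left unhandled.

```python
def confusion_counts(pred, targ):
    """Count true positives, false positives and false negatives in flat 0/1 masks."""
    tp = sum(1 for p, t in zip(pred, targ) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, targ) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, targ) if p == 0 and t == 1)
    return tp, fp, fn


def sensitivity(tp, fn):
    # TP / (TP + FN); undefined when the target has no positives.
    return tp / (tp + fn)


def precision(tp, fp):
    # TP / (TP + FP); undefined when nothing is predicted positive.
    return tp / (tp + fp)


pred = [1, 1, 1, 1, 0, 0, 0, 0]
targ = [1, 1, 0, 0, 1, 0, 0, 0]
tp, fp, fn = confusion_counts(pred, targ)  # tp=2, fp=2, fn=1
print(sensitivity(tp, fn), precision(tp, fp))
```

Here two of the three target voxels are found (sensitivity 2/3), while only two of the four predicted voxels are correct (precision 1/2).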
multi_precision
def multi_precision(
act:Tensor, targ:Tensor
)->Tensor:
Calculate mean precision for each class in multi-class segmentation.
Args: act: Activation tensor [B, C, W, H, D]. targ: Target masks [B, C, W, H, D].
Returns: Mean precision for each class.
Lesion Detection Rate
calculate_lesion_detection_rate
def calculate_lesion_detection_rate(
pred:Tensor, targ:Tensor, threshold:float=0.0
)->Tensor:
Calculate lesion-wise detection rate.
For each connected component (lesion) in the target, check whether it is detected by the prediction. The detection criterion depends on threshold:
- threshold=0: any overlap counts as detected
- threshold>0: per-lesion Dice score must exceed threshold
Args: pred: Binary prediction tensor [B, C, W, H, D]. targ: Binary target tensor [B, C, W, H, D]. threshold: Minimum Dice score for a lesion to be considered detected. Default 0.0 means any overlap counts as detected.
Returns: Detection rate (detected lesions / total lesions) for each sample.
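The any-overlap case (threshold=0) can be sketched without dependencies. The example below works on a small 2D grid with 4-connectivity to stay short; the connectivity, the 2D simplification, the NaN result for a lesion-free target, and the helper names `label_lesions` and `detection_rate_any_overlap` are all assumptions for illustration, not the library's implementation. The threshold>0 case is not sketched here.

```python
from collections import deque


def label_lesions(mask):
    """Return the connected components of a 2D 0/1 grid as sets of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen, lesions = set(), []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            component, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] == 1 and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            lesions.append(component)
    return lesions


def detection_rate_any_overlap(pred, targ):
    """Detected lesions / total lesions, where any overlapping pixel counts."""
    lesions = label_lesions(targ)
    if not lesions:
        return float("nan")  # assumption: undefined without target lesions
    detected = sum(
        1 for lesion in lesions if any(pred[y][x] == 1 for y, x in lesion)
    )
    return detected / len(lesions)


targ = [
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
]
pred = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0],
]
rate = detection_rate_any_overlap(pred, targ)  # 2 of 3 lesions are touched
print(rate)
```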
binary_lesion_detection_rate
def binary_lesion_detection_rate(
act:Tensor, targ:Tensor, threshold:float=0.0
)->Tensor:
Calculate mean lesion detection rate for binary segmentation.
Args: act: Activation tensor [B, C, W, H, D]. targ: Target masks [B, C, W, H, D]. threshold: Minimum Dice score for a lesion to be considered detected. Default 0.0 means any overlap counts as detected.
Returns: Mean lesion detection rate.
multi_lesion_detection_rate
def multi_lesion_detection_rate(
act:Tensor, targ:Tensor, threshold:float=0.0
)->Tensor:
Calculate mean lesion detection rate for each class in multi-class segmentation.
Args: act: Activation tensor [B, C, W, H, D]. targ: Target masks [B, C, W, H, D]. threshold: Minimum Dice score for a lesion to be considered detected. Default 0.0 means any overlap counts as detected.
Returns: Mean lesion detection rate for each class.
Signed Relative Volume Error (RVE)
calculate_signed_rve
def calculate_signed_rve(
pred:Tensor, targ:Tensor
)->Tensor:
Calculate signed Relative Volume Error.
RVE = (pred_volume - targ_volume) / targ_volume
Positive values indicate over-segmentation (model predicts too large), negative values indicate under-segmentation (model predicts too small).
Args: pred: Binary prediction tensor [B, C, W, H, D]. targ: Binary target tensor [B, C, W, H, D].
Returns: Signed RVE for each sample in batch.
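A short worked example of the RVE formula, treating volume as the foreground voxel count. The helper `signed_rve` is a hypothetical stand-in written for this sketch, and raising on an empty target is an assumption; how the library treats that case is not stated here.

```python
def signed_rve(pred, targ):
    """(pred_volume - targ_volume) / targ_volume for flat 0/1 masks."""
    pred_volume = sum(pred)
    targ_volume = sum(targ)
    if targ_volume == 0:
        # Assumption for this sketch: refuse to divide by an empty target.
        raise ValueError("RVE is undefined for an empty target")
    return (pred_volume - targ_volume) / targ_volume


targ = [1] * 10 + [0] * 10
over = [1] * 12 + [0] * 8    # 12 voxels predicted against 10 in the target
under = [1] * 7 + [0] * 13   # 7 voxels predicted against 10 in the target

rve_over = signed_rve(over, targ)    # positive: over-segmentation
rve_under = signed_rve(under, targ)  # negative: under-segmentation
print(rve_over, rve_under)
```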
binary_signed_rve
def binary_signed_rve(
act:Tensor, targ:Tensor
)->Tensor:
Calculate mean signed RVE for binary segmentation.
Args: act: Activation tensor [B, C, W, H, D]. targ: Target masks [B, C, W, H, D].
Returns: Mean signed RVE.
multi_signed_rve
def multi_signed_rve(
act:Tensor, targ:Tensor
)->Tensor:
Calculate mean signed RVE for each class in multi-class segmentation.
Args: act: Activation tensor [B, C, W, H, D]. targ: Target masks [B, C, W, H, D].
Returns: Mean signed RVE for each class.
Surface Distance Benchmark Metrics (MONAI grid-based, mm)
Canonical, spacing-aware boundary metrics for final benchmark reporting (NOT the training loop): symmetric ASSD, HD95, and Normalized Surface Dice (NSD) at one or more tolerances, all in millimetres, computed with MONAI’s grid-based surface metrics (monai.metrics). Unlike calculate_haus (MONAI, voxel units by default), these metrics come from a single implementation that takes explicit voxel spacing and tolerances. Grid-based metrics carry a known voxel-discretization (staircasing) bias; for publication-grade surface distances, use a mesh-based implementation (see the patch tutorials and arxiv:2410.02630).
calculate_surface_metrics
def calculate_surface_metrics(
pred, targ, spacing_mm, nsd_tolerances_mm:tuple=(0.5, 1.0, 2.0), hd_percentile:int=95
)->dict:
Canonical spacing-aware surface metrics for one 3D case (benchmark reporting only).
Computes symmetric ASSD, HD95 and Normalized Surface Dice (NSD) at the given tolerance(s), all in millimetres, using MONAI’s grid-based surface metrics (monai.metrics). Intended for final benchmark evaluation, NOT inside the training loop (CPU, per-case).
Grid-based metrics have a known voxel-discretization (staircasing) bias; for publication-grade surface distances consider a mesh-based implementation.
Args:
pred: Predicted mask (binary / argmax label). torch.Tensor or np.ndarray, shaped [W,H,D], [C,W,H,D] or [B,C,W,H,D] with singleton batch/channel.
targ: Ground-truth mask (same shape options as pred).
spacing_mm: Sequence of 3 positive floats, voxel size in mm, in the SAME axis order as the (squeezed) mask array. Derive it per case from the image affine/header and do NOT hard-code it; a permuted spacing silently corrupts every distance.
nsd_tolerances_mm: NSD tolerance(s) in mm; each yields an nsd_tau{t}_mm key (the tolerance is float-formatted, so 1 and 1.0 both give nsd_tau1.0_mm).
hd_percentile: Percentile for the robust Hausdorff distance (default 95 -> HD95).
Returns: dict with assd_mm, hd{p}_mm (hd95_mm by default), one nsd_tau{t}_mm per tolerance, and the provenance fields surface_distance_source, tau_mm, spacing_mm and status (“ok” | “both_empty” | “one_empty”). Empty cases are scored by a guard and never enter the backend, which keeps inf/nan out of it:
- both empty -> ASSD 0, HD 0, NSD 1.0
- exactly one empty -> ASSD inf, HD inf, NSD 0.0
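The empty-case rules and key naming described above can be illustrated with a small stand-alone sketch. `empty_case_guard` is a hypothetical function written for this page, not the library's code; it takes foreground voxel counts rather than masks, omits the other provenance fields, and assumes the float formatting of tolerances is equivalent to `float(t)` inside an f-string.

```python
import math


def empty_case_guard(pred_voxels, targ_voxels,
                     nsd_tolerances_mm=(0.5, 1.0, 2.0), hd_percentile=95):
    """Return scores for empty cases, or None when both masks have foreground."""
    pred_empty, targ_empty = pred_voxels == 0, targ_voxels == 0
    if not (pred_empty or targ_empty):
        return None  # a real implementation would call the surface backend here
    if pred_empty and targ_empty:
        status, distance, nsd = "both_empty", 0.0, 1.0
    else:
        status, distance, nsd = "one_empty", math.inf, 0.0
    result = {
        "assd_mm": distance,
        f"hd{hd_percentile}_mm": distance,
        "status": status,
    }
    for tol in nsd_tolerances_mm:
        # Assumed formatting: float(1) and float(1.0) both render as "1.0".
        result[f"nsd_tau{float(tol)}_mm"] = nsd
    return result


both = empty_case_guard(0, 0, nsd_tolerances_mm=(1,))
one = empty_case_guard(0, 250)
print(both)
print(one)
```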
Accumulated Dice Metrics (nnU-Net Style)
These metrics accumulate true positives, false positives, and false negatives across all validation batches before computing Dice. This is more statistically robust than averaging per-batch Dice scores, especially for patch-based training where patches have variable foreground ratios.
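The difference between the two aggregation strategies shows up as soon as batches have unequal foreground. The sketch below uses made-up per-batch counts (not results from any experiment) and a hypothetical `dice` helper computing 2·TP / (2·TP + FP + FN).

```python
def dice(tp, fp, fn):
    """Dice from confusion counts: 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


# Illustrative counts: one foreground-rich patch batch, one nearly empty batch.
batches = [
    {"tp": 90, "fp": 10, "fn": 10},
    {"tp": 1, "fp": 4, "fn": 5},
]

# Strategy 1: average the per-batch Dice scores (each batch weighs the same).
mean_of_batch_dice = sum(dice(**b) for b in batches) / len(batches)

# Strategy 2: sum the counts over all batches, then compute Dice once.
totals = {key: sum(b[key] for b in batches) for key in ("tp", "fp", "fn")}
accumulated_dice = dice(**totals)

print(round(mean_of_batch_dice, 3))  # 0.541
print(round(accumulated_dice, 3))    # 0.863
```

The nearly empty batch drags the per-batch average down, while in the accumulated score it contributes only in proportion to its voxel counts.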
AccumulatedDice
def AccumulatedDice(
n_classes:int=2, include_background:bool=False
):
nnU-Net-style accumulated Dice metric for reliable pseudo dice during training.
Instead of averaging per-batch Dice scores, this metric accumulates true positives, false positives, and false negatives across ALL validation batches, then computes Dice from the totals. This gives more weight to batches with more foreground voxels and is more statistically robust.
Args: n_classes: Number of classes including background (default: 2 for binary). include_background: Whether to include background in metric (default: False).
Example:
```python
import numpy as np

learn = Learner(dls, model, loss_func=loss_func, metrics=[AccumulatedDice(n_classes=2)])
# For checkpoint selection based on accumulated dice:
save_best = SaveModelCallback(
monitor='accumulated_dice',
comp=np.greater, # Higher dice is better
fname='best_model'
)
```
AccumulatedMultiDice
def AccumulatedMultiDice(
n_classes:int=2, include_background:bool=False
):
Multi-class version of AccumulatedDice that returns per-class Dice scores.
Instead of returning a single mean Dice, this returns a tensor with the Dice score for each foreground class. Useful for monitoring per-class performance during training.
Example:
```python
# For 3-class segmentation (background + 2 foreground classes)
learn = Learner(dls, model, loss_func=loss_func, metrics=[AccumulatedMultiDice(n_classes=3)])
```
EMA Model Checkpoint
Exponential Moving Average (EMA) based model selection, inspired by nnU-Net’s approach of using smoothed Dice scores rather than noisy per-epoch values for checkpoint selection.
EMACheckpoint
def EMACheckpoint(
monitor:str='accumulated_dice', momentum:float=0.9, comp:ufunc=greater, fname:str='best_model',
with_opt:bool=False
):
Save model checkpoint based on EMA of a monitored metric (nnU-Net style).
Instead of saving the best model based on a single (noisy) epoch metric, this tracks the exponential moving average and saves when the EMA improves. More robust for patch-based training where per-epoch metrics fluctuate.
Formula: ema = momentum * previous_ema + (1 - momentum) * current_value
Unlike SaveModelCallback, this does NOT auto-load the best model after training. Load explicitly with learn.load(fname).
Args:
monitor: Metric name to track (default: 'accumulated_dice').
momentum: EMA momentum (default: 0.9, matching nnU-Net). Higher momentum = more smoothing. Range: (0, 1). nnU-Net uses 0.9 (keeps 90% of history, adds 10% of current epoch).
comp: Comparison function (default: np.greater for higher-is-better).
fname: Model save filename (default: 'best_model').
with_opt: Whether to save optimizer state (default: False).
Example:
```python
save_best = EMACheckpoint(
    monitor='accumulated_dice',
    momentum=0.9,
    fname='best_model'
)
learn.fit_one_cycle(30, lr, cbs=[save_best])
# Load best model after training:
learn.load('best_model')
# Access EMA history for plotting:
save_best.ema_history
```
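To see how the EMA formula changes which epochs trigger a save, here is a plain-Python sketch of the selection logic. `ema_checkpoint_epochs` is a hypothetical helper, the Dice values are made up, and seeding the EMA with the first epoch's raw value is an assumption; the callback's actual initialisation may differ.

```python
def ema_checkpoint_epochs(values, momentum=0.9):
    """Return (ema_history, epochs at which the EMA improved)."""
    ema_history, saved_epochs = [], []
    ema, best = None, None
    for epoch, value in enumerate(values):
        # Assumption: the first epoch seeds the EMA with the raw value.
        if ema is None:
            ema = value
        else:
            ema = momentum * ema + (1 - momentum) * value
        ema_history.append(ema)
        # Higher is better, mirroring the default comparison (np.greater).
        if best is None or ema > best:
            best = ema
            saved_epochs.append(epoch)
    return ema_history, saved_epochs


dice_per_epoch = [0.50, 0.60, 0.90, 0.30, 0.65]  # illustrative values only
history, saved = ema_checkpoint_epochs(dice_per_epoch)
print([round(v, 4) for v in history])
print(saved)
```

With these numbers the spike at epoch 2 lifts the EMA only to about 0.549, and the recovery at epoch 4 does not trigger a save because the smoothed value is still below that best EMA.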